Google DeepMind presented a report describing tests of its new AI model, Gemini 2.5 Pro, playing classic Pokémon games. The researchers noticed that in difficult situations, when the model’s Pokémon are on the verge of defeat, Gemini 2.5 Pro begins to exhibit a state of “panic.” This behavior leads to a noticeable deterioration in the model’s ability to reason logically and make decisions during the game.
Instances of “panic” became so frequent that viewers of the dedicated Twitch stream “Gemini Plays Pokémon” began to recognize them in real time. In this state the model may suddenly stop using important game tools and make ineffective decisions. Such experiments show how an AI model can imitate some human reactions to stress, even though it does not actually experience emotions.
Similar behavior was observed in Anthropic’s Claude model, which tried to exploit the game mechanic that sends the player back to a Pokémon Center once all of their Pokémon are defeated, but misunderstood how that mechanic works. On the separate stream “Claude Plays Pokémon,” viewers watched the AI deliberately lead its Pokémon to defeat in the hope of reaching a new location, only to be returned to the Pokémon Center it already knew.
Despite these difficulties, the models show strengths in solving complex puzzles. Gemini 2.5 Pro, working on its own or with minimal assistance, created specialized tools for solving boulder puzzles and for finding the shortest route to a goal. According to the developers, this may indicate that the model can create such tools entirely without human involvement.
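To give a sense of what a shortest-route tool can look like, here is a minimal sketch in Python using breadth-first search on a tile grid. It is not the tool Gemini 2.5 Pro built, and the report's implementation is not described here. The map layout, the `#` wall symbol, and the function name `shortest_route` are all assumptions made for illustration.

```python
from collections import deque


def shortest_route(grid, start, goal):
    """Breadth-first search over a tile grid (hypothetical map format).

    grid  -- list of equal-length strings; '#' marks an impassable tile
    start -- (row, col) of the player
    goal  -- (row, col) of the target tile
    Returns the list of tiles from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}  # tile -> tile it was first reached from
    queue = deque([start])

    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]

        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))

    return None  # goal is walled off


# Made-up 3x4 map: the player starts top-left, the goal is bottom-right.
EXAMPLE_MAP = [
    "..#.",
    "..#.",
    "....",
]
route = shortest_route(EXAMPLE_MAP, (0, 0), (2, 3))
```

Because every step costs the same, breadth-first search reaches each tile by a shortest path first. A boulder puzzle would need a richer state, for example the player position together with every boulder position, but the same search idea still applies.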