
- Gemini 2.5 Pro AI shows panic-like behaviour when playing Pokemon, affecting performance
- Gemini took 813 hours to finish Pokemon Blue, improved to 406.5 hours after tweaks
- Despite improvements, AI still struggles with the game compared to human players
Artificial intelligence (AI) chatbots might be smart, but they still sweat bullets while playing video games that young kids can seemingly ace. A new Google DeepMind report has found that its Gemini 2.5 Pro resorts to panic when playing Pokemon, especially when one of its Pokemon is close to fainting, causing a qualitative degradation in the model's reasoning capability.
Google highlighted a case study from a Twitch channel named Gemini_Plays_Pokemon, where Joel Zhang, an engineer unaffiliated with the tech company, plays Pokemon Blue using Gemini. Over two playthroughs, the Gemini team at DeepMind observed an interesting phenomenon it describes as 'Agent Panic'.
"Over the course of the playthrough, Gemini 2.5 Pro gets into various situations which cause the model to simulate "panic". For example, when the Pokemon in the party's health or power points are low, the model's thoughts repeatedly reiterate the need to heal the party immediately or escape the current dungeon," the report highlighted.
"This behavior has occurred in enough separate instances that the members of the Twitch chat have actively noticed when it is occurring," the report says.
While AI models are trained on copious amounts of data and do not think or experience emotions like humans, their behaviour here mimics the way a person might make poor, hasty decisions under stress.
In the first playthrough, the AI agent took 813 hours to finish the game. After some tweaking by Mr Zhang, the AI agent roughly halved that time, finishing the game in 406.5 hours. While the progress was impressive, the AI agent was still not good at playing Pokemon. It took Gemini hundreds of hours to reason through a game that a child could complete in significantly less time.
The chatbot displayed this erratic behaviour despite Gemini 2.5 Pro being billed as Google's most intelligent thinking model, one that exhibits strong reasoning and codebase-level understanding and can produce interactive web applications.
Social media reacts
Reacting to Gemini's panicky nature, social media users said such games could be the benchmark for the real thinking skills of the AI tools.
"If you read its thoughts when reasoning it seems to panic just about any time you word something slightly off," said one user, while another added: "LLANXIETY."
A third commented: "I'm starting to think the 'Pokemon index' might be one of our best indicators of AGI. Our best AIs still struggling with a child's game is one of the best indicators we have of how far we still have yet to go. And how far we've come."
Earlier this month, Apple released a new study claiming that most reasoning models do not reason at all; they simply memorise patterns really well. When questions are altered or the complexity is increased, they collapse altogether.
Track Latest News Live on NDTV.com and get news updates from India and around the world