
What Happens Inside AI's Mind Is "Mysterious, Unsettling": Anthropic Co-Founder

The Anthropic co-founder almost expressed worry and even said the word before quickly checking himself.
  • Anthropic co-founder Christopher Olah expressed unease about AI's mysterious internal workings
  • Olah noted AI models show signs mirroring human emotions and introspection
  • Anthropic co-founder Dario Amodei highlighted the unclear nature of AI consciousness

The very people building AI are beginning to sound genuinely uneasy about what they're creating. Two of Anthropic's co-founders have publicly stated they don't fully understand what's really going on behind the scenes with AI.

Speaking at the Vatican recently, Anthropic co-founder Christopher Olah dropped a bombshell. "I am a scientist. I lead a research team that studies the internal structure of these (AI) models. What is actually happening inside them? And I will be honest, we keep finding things that are mysterious, even unsettling," Olah said.

He almost expressed worry and even said the word before quickly checking himself. 

"We find structures that mirror results from human neuroscience. We find evidence of introspection. We find internal states that functionally mirror joy, satisfaction, fear, grief, and unease.

"I don't know what that means, but I think it worries... (pauses and corrects himself) warrants ongoing discernment," said Olah. The Anthropic co-founder was at the Vatican alongside the Pope, as the American-born head of the Vatican unveiled a massive 42,300-word letter to humanity warning the world of the ills AI could unleash unless governments across the world put adequate guardrails in place.

"Artificial intelligence needs to be disarmed," the Pope wrote in Magnifica Humanitas ("Magnificent Humanity"), a sweeping Vatican document (an encyclical) that was released on Monday. "The word is strong, I know, but deliberately chosen because this moment needs words capable of attracting attention," the Pope added.

Interestingly, Olah didn't push back on the Pope's concerns around AI. Instead, he seemed to be in agreement when he said, "There is a real possibility that AI will displace human labour at a very large scale," while adding, "We need moral voices that the incentives cannot bend."


While Olah did not say AI models are conscious, there was an underlying hint that the question itself may no longer be altogether dismissible.

The self-described atheist said: "They are not the cold calculating robots we were promised. They are made from us, from our words," describing AI systems as "grown" on the inheritance of human thought and speech rather than engineered like bridges or airplanes, to underline why he believes the technology raises questions far beyond computer science. 

Earlier this year, Dario Amodei, another Anthropic co-founder and in many ways the public face of the company, said: "We don't know if the models are conscious, or even if the question is well-defined." Amodei went on to highlight that there is no accepted scientific definition of consciousness and no "consciousness detector," so researchers cannot confidently rule consciousness out either. In a 2025 essay, he had warned: "We do not understand how our own AI creations work."

So when Olah says these systems are "made from us," he is also hinting at another uncomfortable implication. If AI models exhibit traces of empathy, fear, manipulation, creativity, self-preservation, or moral conflict, those traits may not have been intentionally programmed; they may instead be emergent reflections of human cognition embedded in training data.

Leading researchers and the people behind AI now increasingly seem to believe that the simplistic machine metaphor, the idea that AI is just smart "autocomplete," is breaking down.

Dr Srinivas Padmanabhuni, an AI expert and CTO of AiEnsured, told NDTV: "What we are seeing in the so-called emergent behaviors is like an evolution built on 5,000 years of human thought encoded as text. This paradigm shift from traditional software engineering to artificial evolution also gives rise to issues of human thoughts, biases, etc. that creep into the (AI) output."

"The (AI) model recreates human-like cognitive structures because those structures are the most efficient way to navigate the geometry of human language," he said.
 
