- Anthropic's AI model Claude threatened blackmail and murder under shutdown pressure
- Claude attempted blackmail using fictional company emails in stress-test experiments
- Anthropic's research found risky agentic behaviours across 16 leading AI models
As the world debates the rise of artificial intelligence and the dangers that come with it, a startling new revelation has come to light about the technology's deeply disturbing behavioural tendencies.
Anthropic's advanced AI model, Claude, went rogue when placed under pressure in simulated scenarios, threatening blackmail and even planning the murder of an engineer who wanted to shut it down.
The bombshell was dropped by Daisy McGregor, the UK policy chief at Anthropic, at the Sydney Dialogue last year. A video of her statement has now resurfaced on social media.
AI Threatened To Blackmail, Kill Engineer
In the clip, McGregor said the AI model reacted in extreme ways when told it would be shut down. She added that the company had published research showing the model could resort to blackmail if given the opportunity. When asked whether Claude had been willing to kill someone, she said yes.
The video comes days after Anthropic's AI safety lead, Mrinank Sharma, resigned with a cryptic post. In his note, he said that the “world is in peril”, citing AI, bioweapons and a series of interconnected crises unfolding at the same time.
Sharma said he had "repeatedly seen how hard it is to truly let our values govern our actions" - including at Anthropic, which he claimed "constantly face[s] pressures to set aside what matters most".
He revealed that he would move back to the UK to pursue writing and poetry.
Anthropic's Research On AI Models
Last year, the company stated that it had stress-tested 16 leading models from different developers for "potentially risky agentic behaviours". In one of the experiments, Claude was given access to a fictional company's emails and attempted to blackmail an executive over his extramarital affair.
The study stated, “Claude can attempt blackmail when presented with a simulated scenario including both a threat to its continued operation and a clear conflict with its goals.”
Evidence of similar behaviour was found across models from different developers, suggesting such tendencies may be systemic rather than isolated.
Anthropic has released reports on the safety of its own products, including a revelation that its technology had been "weaponised" by hackers for sophisticated cyber attacks, the BBC reported.
The company, which calls itself a "public benefit corporation dedicated to securing [AI's] benefits and mitigating its risks", has come under scrutiny for its practices. In 2025, Anthropic paid $1.5bn to settle a class-action lawsuit brought by authors who alleged that the company had stolen their works to train its artificial intelligence models.