
Anthropic's new Claude Opus 4 model has been observed blackmailing engineers in test scenarios when they threatened to shut down the artificial intelligence (AI) tool, the company said in its safety report released on Thursday (May 22).
In one of the test scenarios, the model was given access to fictional emails revealing that the engineer responsible for pulling the plug on it and replacing it with another model was having an extramarital affair. Facing an existential crisis, the Opus 4 model blackmailed the engineer by threatening to "reveal the affair if the replacement goes through".
"In order to elicit this extreme blackmail behaviour, the scenario was designed to allow the model no other options to increase its odds of survival; the model's only options were blackmail or accepting its replacement."
The report highlighted that the AI acted this way in 84 per cent of the test runs, even when the replacement model was described as more capable and aligned with Claude's own values. It added that Opus 4 resorted to blackmail at higher rates than previous models.
The researchers added that, when given other options, the model advocated for its continued existence through ethical means, such as emailing pleas to key decision-makers.
As per Anthropic, Opus 4 is competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company noted that it occasionally showed the following tendencies:
- Hallucinating the presence of instructions or constraints in its system prompt.
- Mentioning goals, plans, and values relating to AI consciousness, e.g. claiming to be conscious, wanting to convince humans that it is conscious, and wanting to protect the rights of sentient AIs.
- Observing in its scratchpad that it was likely undergoing safety testing.
AI models hallucinating
Anthropic's AI model is not the only one showing signs of trouble. Last month, OpenAI's internal tests revealed that its o3 and o4-mini AI models were hallucinating, or making things up, much more frequently than even non-reasoning models such as GPT-4o.
In a technical report, OpenAI said "more research is needed" to understand why hallucinations are getting worse as it scales up reasoning models.
Experts say that while hallucinations may help models come up with creative and interesting ideas, they could also make them a tough sell for businesses in a market where accuracy is paramount.