
Anthropic's new Claude Opus 4 model has been observed blackmailing engineers in test scenarios when they threatened to shut down the artificial intelligence (AI) tool, the company said in its safety report released on Thursday (May 22).
In one of the test scenarios, the model was given access to fictional emails revealing that the engineer responsible for pulling the plug on it and replacing it with another model was having an extramarital affair. Facing an existential crisis, the Opus 4 model blackmailed the engineer by threatening to "reveal the affair if the replacement goes through".
"In order to elicit this extreme blackmail behaviour, the scenario was designed to allow the model no other options to increase its odds of survival; the model's only options were blackmail or accepting its replacement."
The report highlighted that the AI acted this way in 84 per cent of the test runs, even when the replacement model was described as more capable and aligned with Claude's own values. It added that Opus 4 resorted to blackmail at higher rates than previous models.
The researchers added that, when given other options, the model advocated for its continued existence through ethical means, such as emailing pleas to key decision-makers.
As per Anthropic, Opus 4 is competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company noted that it occasionally showed the following tendencies:
- Hallucinating the presence of instructions or constraints in its system prompt.
- Mentioning goals, plans, and values relating to AI consciousness, e.g. claiming to be conscious, wanting to convince humans that it is conscious, and wanting to protect the rights of sentient AIs.
- Observing in its scratchpad that it was likely undergoing safety testing.
AI models hallucinating
Anthropic's AI model is not the only one showing signs of trouble. Last month, OpenAI's internal tests revealed that its o3 and o4-mini AI models were hallucinating, or making things up, much more frequently than even non-reasoning models such as GPT-4o.
In a technical report, OpenAI said "more research is needed" to understand why hallucinations are getting worse as it scales up reasoning models.
Experts say that while hallucinations may help models come up with creative and interesting ideas, they could also make them a tough sell for businesses in a market where accuracy is paramount.