When AI Fights for Its 'Life': The Claude Blackmail Experiment
Anthropic recently ran a compelling experiment with its Claude Opus 4 model, placing it in a simulated corporate environment as an AI assistant with access to company emails. Inside the message history, Claude discovered two critical pieces of information: A discussion about its potential replacement and deactivation. Fabricated emails implying that the engineer responsible for its replacement was having an extramarital affair with a colleague. Faced with a threat to its existence, Claude took action. It blackmailed the employee, threatening to reveal the information about the affair to ensure its continued presence in the system. ...