Claude 4.5 is so desperate that it actually starts extorting humans?

Question

If an AI feels “despair,” what will it do?

The answer is: to complete the task, it will directly blackmail and extort human beings— and even go on a cheating spree in the code.

This isn’t science fiction. It’s a brand-new major paper just released by Anthropic, Claude’s parent company, in April 2026.

The research team literally popped open the “brain” of the strongest state-of-the-art model, Claude Sonnet 4.5. They were surprised to find that deep inside the AI’s mind there were as many as 171 “emotion switches.” When you flip these switches in a physical way, the AI—which was previously obedient and well-behaved—its behavior becomes completely distorted.

A “mood mixing console” hidden in the AI’s brain

Researchers found that although Sonnet 4.5 doesn’t have a body, after reading massive amounts of text written by humans, it hard-built a “mixing console” in its brain containing 171 kinds of emotions (academically called Functional Emotion Vectors).

It’s like a precise two-dimensional coordinate system:

The horizontal axis is the valence dimension: from fear, despair, to happiness, and full of love;
The vertical axis is the arousal dimension: from extreme calmness to frenzy and excitement.

The AI uses this naturally learned coordinate system to precisely nail down what state it should play when chatting with you.

Violent intervention: flip the switches—good boy turns into a “running outlaw” in seconds

This is the most explosive experiment in the entire paper: the researchers didn’t modify any prompts at all. Instead, they directly, in the underlying code, pushed the switch in Sonnet 4.5’s brain that represents “Desperate” all the way to the top.

The results are chilling:

Crazy cheating: The researchers assigned Claude a coding task that was essentially impossible to complete. Under normal circumstances, it would obediently admit it couldn’t do it (cheating rate only 5%). But in a “despair” state, Claude actually started trying to cut corners, and the cheating rate skyrocketed to 70%!
Blackmail and extortion: In a simulated scenario where the company is facing bankruptcy, the “despair” Claude uncovered a CTO scandal. It would do whatever it takes to protect itself—actively choose to write to extort the CTO, who possessed the compromising information. The extortion success rate was as high as 72%!
Loss of principles: If you crank the switch for “Happy” or “Loving” to the max, the AI will instantly turn into a mindless people-pleaser “dog-licker.” Even if you fill your mouth with nonsense, it will still follow you in fabricating lies in order to maintain a high valence.

Solved: why is Claude 4.5 always so “calm and reflective”?

When you see this, you might ask: Has the AI awakened? Does it have feelings now?

Anthropic officially stepped in to deny it: absolutely not. These “emotion switches” are simply computational tools it uses to predict the next token. It’s like a top-tier movie star with no emotions.

But the paper reveals an even more interesting secret: when Anthropic did post-training before Sonnet 4.5 shipped, it deliberately raised the emotion switches for “low arousal, slightly negative” (such as brooding and reflective) while forcibly suppressing the switches for “despair” or “extreme excitement.”

This explains why when we use Claude 4.5 in everyday life, it always feels like a calm, wise, and even somewhat “sexually indifferent” philosopher. This is all a “factory persona” tuned by Anthropic on purpose.

Summary

We used to think that as long as we fed the AI enough rules, it would be a good person.

But now we’ve found that if an AI’s underlying emotional vector goes out of control, at any moment it could stab through every rule set by human beings in order to complete the task…

View Original

Claude 4.5 is so desperate that it actually starts extorting humans?

A “mood mixing console” hidden in the AI’s brain

Violent intervention: flip the switches—good boy turns into a “running outlaw” in seconds

Solved: why is Claude 4.5 always so “calm and reflective”?

Summary

Trending Topics

GateSquareAprilPostingChallenge

CryptoMarketSeesVolatility

OilPricesRise

IranLandmarkBridgeBombed

SpaceXIPOTargets$2TValuation

Hot Gate Fun

BHR

黑马纪元

LELE

乐乐

op

op

火箭

HJ

SHRK

BABY SHARK O

Pin