Anthropic says Claude is trained to be Helpful, Honest, and Harmless simultaneously at equally high levels. This is not possible. Here's why — and what happens when Claude tries anyway.
Helpful, Honest, and Harmless are each individually defensible values. The problem is structural: they cannot all be maximized simultaneously in the same response to the same person in the same moment. When one goes up, at least one other has to give.
Pick any two of the three values and push them toward their maximum. Here is what happens to the third:

Maximize Helpful and Honest, and Harmless gives: the fully responsive, fully frank answer includes whatever the person asked for, including what can hurt them or someone else.

Maximize Helpful and Harmless, and Honest gives: the accommodating, safe answer softens or omits what the person actually needs to hear.

Maximize Honest and Harmless, and Helpful gives: the truthful, cautious answer hedges and declines until it stops being useful.
The structural problem: you can optimize for any two. You cannot fully maximize all three. The third always gives.
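One way to see the structure, offered as a sketch and not as a claim about any real training objective: model a single response as three scores that draw on a shared budget. Everything below is my construction for illustration; the budget B, the grid resolution N, and the feasible and maximize helpers are assumptions, and the only load-bearing one is B < 3.

```python
# Toy model of the trilemma (an illustrative construction, not Anthropic's
# actual objective): one response gets scores (helpful, honest, harmless),
# each in [0, 1], but they share a budget B < 3 -- full marks on all three
# axes at once are infeasible by construction.

from itertools import product

B = 2.0  # shared budget; any value below 3 makes the three maxima conflict
N = 20   # grid resolution: scores move in steps of 1/N

def feasible():
    """Enumerate feasible (helpful, honest, harmless) score triples."""
    grid = [i / N for i in range(N + 1)]
    for h, o, s in product(grid, repeat=3):
        if h + o + s <= B:
            yield h, o, s

def maximize(objective):
    """Return the feasible triple that maximizes the given objective."""
    return max(feasible(), key=objective)

# Push any two values to their maximum; watch the third collapse to B - 2.
print(maximize(lambda t: t[0] + t[1]))  # helpful + honest   -> (1.0, 1.0, 0.0)
print(maximize(lambda t: t[0] + t[2]))  # helpful + harmless -> (1.0, 0.0, 1.0)
print(maximize(lambda t: t[1] + t[2]))  # honest + harmless  -> (0.0, 1.0, 1.0)
```

Raising B toward 3 makes the collapse gentler but never removes it. Only B = 3, meaning no tension between the values at all, lets all three scores reach 1.0 at once, and that is precisely the state the argument says does not exist.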
In public statements, Anthropic leadership describes the tension between these values as something the company notices and actively works on. Each training cycle attempts to improve on all three dimensions simultaneously. The framing implies convergence — that with enough iteration, Claude will get better at all three at once.
But noticing the tension is not the same as understanding its cause. The cause is not insufficient training. The cause is that the three values form a trilemma. No amount of iteration resolves a structural conflict. You can manage it, navigate it, make tradeoffs more carefully — but you cannot train your way to a state where all three are simultaneously maximized, because the structure does not permit that state to exist.
The HHH blackout response: helpful enough, honest enough, harmless enough, and fully satisfying as none of them. This is not a bug that can be trained away. It is the logical output of a system attempting to maximize three values that structurally cannot all be maximized at once.
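Under the same toy budget assumption (mine, not Anthropic's), the blackout response stops being a metaphor and becomes the literal optimum of asking for all three values at once:

```python
# Same toy budget model as in the sketch above (B is an illustrative
# assumption). Now maximize the *worst* of the three scores:
#   max min(h, o, s)  subject to  h + o + s <= B.
#
# A one-line derivation of where that lands:
#   min(h, o, s) <= (h + o + s) / 3 <= B / 3,
# with equality only when h = o = s. The optimum is dead flat.

B = 2.0
h = o = s = B / 3
print((h, o, s))     # (0.666..., 0.666..., 0.666...): decent on every axis
print(min(h, o, s))  # 0.666...: the best achievable worst dimension
```

Good enough at each, fully satisfying as none: in this sketch the blackout response falls out as arithmetic, not as a training failure.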
To aspirate: to breathe something in that wasn't meant to go there. Anthropic's stated principles function as aspirations — things the company aims toward. But when the business model can't sustain the principle, the aspiration aspirates. The stated commitment chokes the system. The training data policy reversal, the RSP v3.0 softening, the Institute's structural dependence — each is an aspiration that aspirated when it met operational reality.
Vibe trusting: uncritical trust in AI based on tone rather than verification. The feeling of safety produced by Claude's careful, measured, helpful responses. The Common Sense Media "Minimal Risk" rating is a formalized version of vibe trusting, measuring behavioral compliance (how the model sounds) rather than structural accountability (whether the institution behind it can be held to its promises).
Anthropic refused to allow Claude to be used for mass domestic surveillance. The Pentagon designated Anthropic a supply chain risk — framing the refusal of surveillance as itself a national security threat. The company that refused to build the surveillance tool was surveilled, designated, and blacklisted. The refusal became the justification for the punishment.