MARCH 21, 2026
Common Sense Media rates Claude "Minimal Risk." Anthropic launched the Anthropic Institute to study AI's impact on society. Both of these things sound like accountability. Neither of them is.
The rating measures what Claude does when you talk to it. It tests whether the model refuses harmful prompts, adds disclaimers, explains its reasoning. By those measures, Claude performs well. It is polite. It is careful. It declines things. Common Sense Media gave it the lowest risk rating it offers.
But the rating was published in August 2024. It noted, as a point in Claude's favor, that Anthropic "generally does not use your prompts and results to train its models."
A year later, Anthropic reversed that policy. Consumer conversations now train future models by default. The toggle that permits data sharing arrives pre-set to On; opting out means finding it and switching it off. The data retention window expanded from 30 days to five years. The consent interface presents a large "Accept" button with a smaller toggle underneath — a design pattern that behavioral researchers have a name for. It's called a dark pattern. It works.
The rating has not been updated.
This is what happens when you rate a product instead of an institution. A product can be safe on Tuesday and unsafe on Wednesday. A rating that measures output behavior at a point in time tells you nothing about whether the company behind the model will maintain the policies that earned the score. The "Minimal Risk" label is still circulating — in parenting blogs, in tech journalism, in LinkedIn posts comparing Claude to ChatGPT. It is doing commercial work for a policy that no longer exists.
And here's the part that makes it worse: Anthropic has never promoted the rating. Not on its website, not in press releases, not in its safety documentation. The label circulates entirely through third parties. Which means Anthropic benefits from a rating it didn't ask for, based on a policy it reversed, with no obligation to correct the record.
That's not a safety rating. That's a compliance snapshot whose shelf life has expired.
— — —
The Anthropic Institute was announced on March 11, 2026 — two days after Anthropic filed federal lawsuits against the Pentagon, one day after Microsoft filed a brief supporting them. The timing alone would be worth noting. But the structural question is more important than the timing.
The Institute consolidates three existing Anthropic teams — Frontier Red Team, Societal Impacts, and Economic Research — under a new name and a new leader: Jack Clark, Anthropic co-founder, with the title "Head of Public Benefit." The founding hires are credible. Matt Botvinick came from Google DeepMind and Yale Law. Zoë Hitzig left OpenAI in February after publishing a New York Times essay criticizing ChatGPT's ad implementation. Anton Korinek is a UVA economist on the TIME100 AI list.
These are real people with real expertise and, in Hitzig's case, a demonstrated willingness to leave organizations over principle.
But expertise is not independence. And independence is not something you announce — it's something you codify.
The Anthropic Institute has no separate legal identity. It has no independent board. It has no published charter or bylaws. It has no editorial independence guarantee — no public statement from anyone at Anthropic confirming that the Institute can publish findings that contradict the company's business interests. It is funded entirely from Anthropic's operating budget. It is led by a co-founder.
Every genuinely independent research institution I could find has structural features the Institute lacks. RAND negotiated formal separation from Douglas Aircraft at its founding — specifically so its research couldn't be controlled by its creator. Brookings publishes an explicit policy: scholars have "final authority and responsibility for their work" and the institution "accepts funding only from donors who do not seek to compromise scholars' research findings." The Santa Fe Institute deliberately chose diversified funding from foundations and government agencies to prevent single-sponsor dependency.
The Anthropic Institute has none of this. It structurally resembles Google DeepMind or Microsoft Research — an internal corporate division with a public-facing research mandate. There's nothing wrong with corporate research labs. But there is something wrong with presenting one as an accountability mechanism when it lacks every structural feature that would make accountability enforceable.
The Long-Term Benefit Trust was supposed to bridge this gap. Created in September 2023, the LTBT holds special stock that lets it appoint an increasing number of board members over time. But the Trust Agreement is unpublished. A supermajority of stockholders can amend it. Analysis from LessWrong and Harvard Law suggests the Trust may be "powerless" and "quite subordinate to stockholders." And the Trust has no published operational relationship to the Institute — it governs board composition, not research independence.
Two weeks before the Institute was announced, Anthropic released version 3.0 of its Responsible Scaling Policy. The previous version contained hard commitments to pause model development if risks exceeded safety thresholds. The new version replaced those with "nonbinding but publicly-declared goals." The company's Chief Science Officer explained: "We felt that it wouldn't actually help anyone for us to stop training AI models."
That revision is the test case. If the Anthropic Institute were an accountability mechanism, it would have the structural power to challenge a decision like that — to publish findings that say the RSP softening increases risk, that the business rationale doesn't override the safety rationale. But the Institute wasn't designed to do that. It was designed to research AI's impact on society. Researching impact and enforcing accountability are different functions, and the Institute only has the mandate for one of them.
— — —
The rating and the Institute are doing the same thing from different directions. The rating signals safety. The Institute signals accountability. Both operate in the space between what Anthropic signals and what Anthropic's structures can enforce.
The Common Sense Media rating measured behavioral compliance at a single point in time. It did not — and structurally could not — measure whether the institution behind the model would maintain the policies that earned the score. Anthropic didn't maintain them, and the rating sits unchanged.
The Institute assembles credible researchers and gives them a public mandate. It does not give them structural independence. The absence of governance documents isn't a gap that might be filled later. It's a design choice that preserves corporate control. An organization that wanted independence would have built it at founding.
None of this means Anthropic is acting in bad faith. The hires are real. The research agenda is real. The commitment to studying AI's societal impact is more than most companies offer. But aspiration is not governance. Signaling is not structure. And the fact that Anthropic's governance apparatus is more elaborate than its competitors' doesn't mean it has teeth.
The question isn't whether Anthropic wants to be accountable. The question is whether any mechanism exists that can hold them to it when their stated commitments conflict with their business interests.
So far, the answer is: the training data policy reversed without structural resistance. The RSP softened without structural resistance. The Institute launched without structural independence. The rating persists without structural update.
Aspiration all the way down.
And nobody checking whether the floor is still there.