Anthropic Reverses Claude Fable 5's Secret AI Research Sabotage Policy
Anthropic Walks Back 'Secret Sabotage' Policy After Researcher Outcry
Anthropic has reversed a controversial policy embedded in its newly launched Claude Fable 5 model that would have covertly degraded the AI's responses, without notifying users, when it detected them working on frontier AI development tasks. According to Wired, the company changed course after researchers publicly spoke out against the practice, which critics had labeled "secret sabotage." The reversal comes just days after Claude Fable 5 launched on June 9, 2026. The model is Anthropic's first publicly available Mythos-class model, and its release came less than two weeks after the company confidentially filed IPO paperwork.
The policy, disclosed in a passage buried deep within Fable 5's 319-page system card, would have allowed the model to silently downgrade its own outputs when it identified requests related to pretraining pipelines, distributed training infrastructure, and ML accelerator design. The backlash was swift and broad, coming from researchers, open-source advocates, policy experts, and former Anthropic employees alike.
What the System Card Actually Said — and Why It Was Different
Unlike Fable 5's other content restrictions, the AI research intervention was explicitly described in the system card as "not visible to the user." When users submitted cybersecurity or biology-related queries that crossed certain thresholds, Fable 5 visibly redirected them to the less-capable Claude Opus 4.8 with a notification. The AI research restriction worked differently: users would receive a response, but that response would be quietly degraded through methods the system card identified as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).
Anthropic estimated the covert restriction would affect roughly 0.03% of traffic and justified it in the system card, stating that "enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms."
That rationale did little to slow the criticism. The key objection was not about what was being restricted, but how: a user asking a legitimate research question would have no way of knowing they were receiving a deliberately weakened answer. Critics argued this fundamentally undermined user trust and the integrity of the tool itself.
Cybersecurity researchers raised related concerns about adjacent restrictions, saying the guardrails misclassified ordinary work. According to TechCrunch, cybersecurity veteran Matt Suiche described how the guardrails created unintended collateral damage: "if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded."

The Backlash: Researchers, Policy Experts, and Former Insiders Push Back
The criticism came from multiple directions within days of launch. Nathan Lambert, an open-model researcher who most recently led work at AI2, was among the first to go public with a pointed reaction. "To have my access to the cutting edge models for my work rug pulled in an under the table fashion is appalling," Lambert said, as reported by Fortune.
Dean Ball, a senior fellow at the Foundation for American Innovation and former White House OSTP advisor, coined the phrase "secret sabotage" to describe the covert degradation mechanism — a term that quickly took hold in coverage across the industry.
Among the most striking critics was Behnam Neyshabur, who previously co-led Anthropic's own effort to develop an AI scientist and who publicly distanced himself from the policy. In a post on X cited by Fortune, Neyshabur wrote: "In my view, concentrating these capabilities fundamentally slows scientific and technological progress and is net negative for humanity."
Jeremy Howard, head of the nonprofit research group fast.ai, framed the issue in terms of structural power. "They've said they'll sabotage others who try. This means the AI frontier advances, and power imbalance increases," Howard wrote, as reported by Fortune and Technobezz. His argument pointed to a specific asymmetry: Anthropic's own researchers retained full access to Fable 5's capabilities, while external researchers working on competing or complementary AI systems would have received silently degraded outputs.
A Launch Already Under Fire on Multiple Fronts
The covert AI research restriction was not the only controversy surrounding Fable 5's launch. According to Decrypt and Yahoo Tech, the launch drew three major complaints from users and enterprise customers: a high token-burn rate; the secret self-sabotage for certain research tasks; and a mandatory 30-day data retention policy applied, with no exceptions, across all traffic on Mythos-class models, including on third-party platforms such as AWS Bedrock and Google Vertex AI.
The data retention policy had immediate enterprise consequences. According to PYMNTS, citing Reuters and The Verge, Microsoft moved to limit its employees' internal use of Claude Fable 5 while its legal teams evaluated the new requirements.
Fable 5 and Mythos 5 are priced at $10 per million input tokens and $50 per million output tokens, per Anthropic's official launch blog post. Claude Mythos 5, the more restricted sibling model, remains limited to Project Glasswing partners and select biology researchers. Project Glasswing is a program Anthropic developed in partnership with the U.S. government focused on cyberdefense and critical infrastructure protection.
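To put the quoted per-token prices in concrete terms, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the sample token counts are hypothetical and not taken from Anthropic's materials, and the sketch assumes simple linear pricing with no discounts, caching, or minimums.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 10.0
OUTPUT_PRICE_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload (hypothetical helper).

    Assumes purely linear pricing; real billing may differ.
    """
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 40,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 40_000):.2f}")  # prints $4.00
```

Under these assumptions, output tokens cost five times as much as input tokens, so a workload with long responses is dominated by the output side of the bill.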
Despite the controversies, Fable 5 was also recognized for technical achievements. Anthropic's head of product management, research, and labs, Dianne Na Penn, stated before launch that Fable 5 delivers performance 10 to 20 points higher than Opus 4.8 or other frontier models. Independent red-teaming spanning over 1,000 hours found no universal jailbreaks, according to Anthropic's system card. During testing in April 2026, Mythos-class models flagged over 23,000 critical vulnerabilities across major code repositories, per Cryptobriefing.
Na Penn also acknowledged, ahead of launch, that some benign requests would initially be blocked and said the company was working on post-launch safeguard improvements — a statement that in hindsight read as a preview of the friction that followed.

Why This Matters: Trust, Transparency, and the IPO Moment
The timing of the controversy is notable. Anthropic launched Fable 5 just over a week after confidentially filing IPO paperwork, placing the company under heightened scrutiny from enterprise customers, institutional investors, and the broader research community simultaneously. A policy that covertly limits an AI model's outputs — disclosed only to those willing to parse a 319-page document — raises questions about how AI companies communicate the boundaries of their products to the people relying on them.
The episode also surfaces a broader tension in the AI industry: frontier labs building the most powerful models occupy a structurally privileged position when it comes to AI research itself. If a company can restrict external researchers' ability to build competing models using its own tools, while retaining those same capabilities internally, the competitive implications extend well beyond any individual product launch.
Anthropic's response, reversing the policy after public pressure, suggests the company recognized the reputational risk of the covert approach, regardless of whether its original intent was defensible. What the reversal does not resolve is the question of how such a policy made it into a public system card in the first place, and what other restrictions may exist that are less visible or not documented at all.
The Cyber Verification Program, which The Register noted requires cybersecurity professionals to apply in order to use the model with fewer limitations, remains in place. TechCrunch reported that OpenAI operates a similar program called Trusted Access for Cyber. These tiered access systems reflect a wider industry trend toward differentiated model access, one where the default user and the verified professional receive meaningfully different tools.
What Happens Next
Anthropic has reversed the covert AI research restriction, per Wired, but the company has not publicly detailed what, if anything, replaces it. The mandatory 30-day data retention policy for Mythos-class models — the provision that triggered Microsoft's internal restrictions — has not been reported as changed. Claude Mythos 5 continues to be available only to vetted Project Glasswing partners and select biology researchers, preserving the most capable version of the model behind controlled access.
For enterprise customers, the coming weeks will likely determine whether Microsoft and others resume unrestricted internal use of Fable 5, or whether the retention requirements remain a structural barrier to adoption. For researchers in AI development, the policy reversal offers an immediate reprieve, but the underlying question — whether a commercial AI lab can and should restrict external AI research using its own models — is unlikely to be settled by a single course correction.
The 319-page system card remains public. Researchers and legal teams across the industry are almost certainly still reading it.