
Anthropic Postmortem: Three Bugs That Made Claude Feel Dumber for a Month

April 24, 2026


If you’ve been using Claude Code lately and something felt off — you weren’t crazy. On April 23, 2026, Anthropic published a full engineering postmortem confirming what many developers had been complaining about for weeks: Claude genuinely performed worse. And the root cause was three separate bugs that compounded into what looked like one massive degradation.

Here’s the breakdown.

Bug #1: They Silently Downgraded Claude’s Reasoning (March 4)

On March 4, Anthropic changed Claude Code’s default reasoning effort from “high” to “medium” to reduce latency. Some users were experiencing UI freezes because Opus 4.6 in high-effort mode would think for so long it appeared broken.

The tradeoff made sense on paper: slightly less intelligent responses, but faster and less frustrating. In practice? Users noticed immediately.

“Users began reporting that Claude Code felt less intelligent,” Anthropic acknowledged. They tried adding UI indicators so people could see and change the setting, but most users never touched the default. After a month of feedback, they reversed course on April 7; Claude Code now defaults to “xhigh” effort for Opus 4.7.

The takeaway: Never silently reduce a product’s quality to solve a different UX problem. If your users choose the fast lane, fine. But don’t put them there without telling them.
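That takeaway can be made concrete. A minimal sketch, assuming nothing about Anthropic's real settings code (the `EffortSetting` class, `apply_new_default`, and the notice text are all hypothetical): track whether a value was explicitly chosen by the user, and when a default moves, leave explicit choices alone and surface a notice to everyone else.

```python
from dataclasses import dataclass

@dataclass
class EffortSetting:
    value: str       # e.g. "medium", "high", "xhigh" (illustrative values)
    user_set: bool   # True only if the user picked this value explicitly

notices = []  # stand-in for whatever UI surface announces the change

def apply_new_default(setting, new_default, notices):
    # Respect explicit user choices; move inherited defaults, but say so.
    if setting.user_set:
        return setting
    notices.append(f"Reasoning effort default changed to '{new_default}'.")
    return EffortSetting(value=new_default, user_set=False)

chosen = EffortSetting("high", user_set=True)
inherited = EffortSetting("high", user_set=False)

print(apply_new_default(chosen, "medium", notices).value)     # high
print(apply_new_default(inherited, "medium", notices).value)  # medium
print(notices)  # one notice, for the inherited default only
```

The point of the `user_set` flag is exactly the distinction Anthropic skipped: a user who picked the fast lane keeps it, and a user who never touched the setting at least finds out the road moved.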

Bug #2: A Caching Bug That Made Claude Forget Its Own Reasoning (March 26)

This one is wild. Anthropic shipped a change to clear older “thinking” from idle sessions — if someone walked away for an hour, pruning old reasoning would save tokens and speed up re-engagement.

Smart idea. Buggy implementation.

Instead of clearing thinking history once after an idle period, the bug cleared it on every single turn for the rest of the session. Claude would reason through a task, make decisions, execute tool calls — and then immediately forget why it made those decisions.

It got worse. The dropped thinking blocks also caused cache misses on every request, which is why users reported usage limits draining faster than expected. Every turn was essentially starting from scratch.
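The failure mode is easy to reproduce in miniature. A hedged sketch, with all names, the `Session` class, and the one-hour threshold invented for illustration (none of this is Anthropic's actual code): the intended design prunes thinking once when a user returns after an idle gap, while the buggy version never records new activity, so the idle check stays true and pruning fires on every later turn.

```python
IDLE_SECONDS = 3600  # prune thinking after roughly an hour away (illustrative)

class Session:
    def __init__(self, now):
        self.turns = []        # each turn: {"thinking": ..., "text": ...}
        self.last_active = now
        self.prune_count = 0   # how many times pruning actually ran

    def prune_thinking(self):
        # Drop prior reasoning blocks to save tokens on re-engagement.
        for turn in self.turns:
            turn["thinking"] = None
        self.prune_count += 1

def turn_buggy(session, now):
    # BUG: last_active is never refreshed, so once one idle gap has passed
    # the check stays true forever. Thinking is wiped on every later turn,
    # and the constantly changing prompt prefix also defeats caching.
    if now - session.last_active > IDLE_SECONDS:
        session.prune_thinking()
    session.turns.append({"thinking": "plan...", "text": "reply"})

def turn_fixed(session, now):
    # Fix: record activity on each turn, so pruning runs once per idle gap.
    if now - session.last_active > IDLE_SECONDS:
        session.prune_thinking()
    session.last_active = now
    session.turns.append({"thinking": "plan...", "text": "reply"})

# Three quick turns, an hour-plus away, then three more turns.
buggy, fixed = Session(0.0), Session(0.0)
for t in [0.0, 10.0, 20.0, 4000.0, 4010.0, 4020.0]:
    turn_buggy(buggy, t)
    turn_fixed(fixed, t)

print(buggy.prune_count)  # 3: pruned on every turn after the idle gap
print(fixed.prune_count)  # 1: pruned once, as intended
```

This toy version also shows why usage limits drained: every wipe changes the conversation prefix, so each request misses the prompt cache and re-pays for the full context.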

And the kicker: two unrelated internal experiments made it nearly impossible to reproduce the issue. It took Anthropic over a week to identify and confirm the root cause.

The takeaway: Context management bugs in AI systems are uniquely dangerous. A traditional software bug breaks a feature. A context bug breaks the model’s memory — and the symptoms can look like the AI is just… dumber.

Bug #3: A System Prompt Change Capped Responses at 25 Words (April 16)

Anthropic added a system prompt instruction: “keep text between tool calls to 25 words. Keep final responses to 100 words.”

The goal was to reduce verbosity. The result was a measurable 3% drop in coding quality evaluations across both Opus 4.6 and 4.7. Claude became terse to the point of being unhelpful — cutting corners on explanations and skipping important context.

Reverted April 20.

The takeaway: System prompt changes are not trivial configuration tweaks. They’re operating instructions that directly shape output quality. They need the same rigor as code changes.
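Anthropic's actual tooling isn't public, but the rigor argument can be sketched: gate a prompt change behind the same kind of regression check a code change gets from tests. Everything here is an assumption for illustration; `run_eval`, the scores, and the threshold are made up, with the fake eval hard-coding the roughly 3-point drop the verbosity cap reportedly caused.

```python
BASELINE_PROMPT = "You are a coding assistant."
CANDIDATE_PROMPT = BASELINE_PROMPT + (
    " Keep text between tool calls to 25 words."
    " Keep final responses to 100 words."
)

MAX_REGRESSION = 0.01  # block any change that costs more than 1 point

def run_eval(system_prompt):
    # Stand-in for a real coding-quality eval suite returning a 0-1 score.
    # The fake numbers mirror the reported effect: the cap costs ~3 points.
    return 0.70 if "25 words" in system_prompt else 0.73

def prompt_change_allowed(baseline, candidate):
    # Treat the prompt diff like a code diff: measure first, then gate.
    regression = run_eval(baseline) - run_eval(candidate)
    return regression <= MAX_REGRESSION

print(prompt_change_allowed(BASELINE_PROMPT, CANDIDATE_PROMPT))  # False
```

With a gate like this in CI, the April 16 change never ships: the eval flags the regression before any user sees a 25-word answer.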

Why This Was So Hard to Diagnose

Each bug affected different traffic slices on different schedules. The reasoning downgrade hit all users March 4. The caching bug hit stale sessions starting March 26. The verbosity prompt hit everything April 16. The combined effect looked like random, inconsistent degradation — impossible to reproduce, hard to pin down.

Anthropic says their internal usage and evals didn’t initially catch the issues. It was the developer community — particularly detailed technical analyses like Stella Laurenzo’s audit of 6,852 Claude Code session files and 234,000 tool calls — that built the evidence.

What Anthropic Is Doing Differently

The company is:

  • Resetting usage limits for all subscribers (as of April 23)
  • Running more internal testing before public builds of Claude Code
  • Improving Code Review tooling (Opus 4.7 actually caught the caching bug in back-testing when given full repo context)
  • Being more rigorous about evaluating system prompt changes
  • Launching @ClaudeDevs on X to explain product decisions in depth

My Take

This is rare. I can’t think of another AI company that has published this level of detail about quality regressions. Most companies would silently fix the bugs and move on — or worse, never acknowledge them.

Anthropic’s transparency here is notable. They named the exact dates, the exact changes, and the exact reasoning. They admitted it was “the wrong tradeoff.” They didn’t blame users for being confused.

That said, the damage to trust is real. A month of degraded performance across the most popular AI coding tool — while users were gaslit by some community members telling them it was a “skill issue” — isn’t something a blog post fully fixes. The usage limit reset is a good gesture, but the bar for “don’t silently break our product” shouldn’t be this low.

The bigger lesson: as AI tools become infrastructure, the harness matters as much as the model. The model weights never changed. The API never regressed. But three product-layer changes — a default setting, a caching bug, a system prompt — made the experience measurably worse for weeks. That’s an engineering culture problem, not an AI problem.

Sources

  • Anthropic Engineering Postmortem (April 23, 2026)
  • VentureBeat — “Mystery solved: Anthropic reveals changes to Claude’s harnesses and operating instructions likely caused degradation”
  • The Register — “Anthropic admits it dumbed down Claude with ‘upgrades’”
  • Implicator.ai — “Anthropic Pins Claude Code Quality Drop on Three Changes”
  • The Droid Guy — “Did Anthropic Secretly Nerf Claude?”
About the Author

Jonathan Alonso is a digital marketing strategist with 20+ years of experience in SEO, paid media, and AI-powered marketing. Follow him on X @jongeek.