<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Sre on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/sre/</link><description>Recent content in Sre on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sun, 10 May 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/sre/index.xml" rel="self" type="application/rss+xml"/><item><title>Anthropic's April 23 Postmortem — Three Overlapping Regressions and What Engineers on Claude Should Take Away</title><link>https://ice-ice-bear.github.io/posts/2026-05-10-anthropic-april-23-postmortem/</link><pubDate>Sun, 10 May 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-05-10-anthropic-april-23-postmortem/</guid><description>&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://www.anthropic.com/engineering/april-23-postmortem" target="_blank" rel="noopener"
 &gt;Anthropic&amp;rsquo;s April 23 postmortem&lt;/a&gt; attributes a month of Claude Code quality complaints to &lt;strong&gt;three independent product-layer changes&lt;/strong&gt;, not to the API or inference fleet. It&amp;rsquo;s not a capacity or region outage, but the failure modes — a silent default change, a run-once cache optimization that fired on every request, and a single system-prompt line causing a 3% eval drop — are the LLM analogue of classic SRE failure patterns. Anyone building on shared model infrastructure should read it twice.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;graph TD
 Trigger["User reports accumulate &amp;lt;br/&amp;gt; early March"] --&gt; Investigate["Signals not separable &amp;lt;br/&amp;gt; internal use/evals fail to reproduce"]
 Investigate --&gt; C1["Cause 1: reasoning effort default &amp;lt;br/&amp;gt; high → medium (3/4)"]
 Investigate --&gt; C2["Cause 2: thinking-clear bug on idle sessions &amp;lt;br/&amp;gt; (3/26)"]
 Investigate --&gt; C3["Cause 3: verbosity system prompt &amp;lt;br/&amp;gt; (4/16)"]
 C1 --&gt; F1["4/7 rollback: xhigh/high defaults"]
 C2 --&gt; F2["4/10 v2.1.101: clear runs once"]
 C3 --&gt; F3["4/20 v2.1.116: prompt removed"]
 F1 --&gt; Reset["4/23 reset usage limits &amp;lt;br/&amp;gt; + new governance"]
 F2 --&gt; Reset
 F3 --&gt; Reset&lt;/pre&gt;&lt;p&gt;All three issues hit &lt;a class="link" href="https://docs.claude.com/en/docs/claude-code/overview" target="_blank" rel="noopener"
 &gt;Claude Code&lt;/a&gt;, the &lt;a class="link" href="https://docs.claude.com/en/docs/agent-sdk" target="_blank" rel="noopener"
 &gt;Claude Agent SDK&lt;/a&gt;, and Claude Cowork. The &lt;a class="link" href="https://docs.claude.com/en/api/messages" target="_blank" rel="noopener"
 &gt;Messages API&lt;/a&gt; was untouched. That the signal stayed muddy for six weeks is the bigger story.&lt;/p&gt;
&lt;h2 id="1-default-reasoning-effort-high--medium-mar-4"&gt;1. Default reasoning effort: high → medium (Mar 4)
&lt;/h2&gt;&lt;p&gt;When &lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-6" target="_blank" rel="noopener"
 &gt;Opus 4.6 shipped in Claude Code&lt;/a&gt; it defaulted to &lt;code&gt;high&lt;/code&gt;. Tail-latency complaints (UI appearing frozen) accumulated. Anthropic&amp;rsquo;s internal evals showed &lt;code&gt;medium&lt;/code&gt; sitting at a better operating point on the latency-vs-intelligence curve:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&amp;ldquo;In our internal evals and testing, medium effort achieved slightly lower intelligence with significantly less latency for the majority of tasks.&amp;rdquo;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;User feedback disagreed. Most users never change defaults, and few reach for &lt;code&gt;/effort&lt;/code&gt; — so a &amp;ldquo;slightly lower&amp;rdquo; eval delta translated into a much larger perceived quality drop in the wild. On April 7 the change was reverted; &lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-7" target="_blank" rel="noopener"
 &gt;Opus 4.7&lt;/a&gt; now defaults to &lt;code&gt;xhigh&lt;/code&gt;, everything else to &lt;code&gt;high&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Takeaway.&lt;/strong&gt; Moving a default operating point on a model&amp;rsquo;s &lt;a class="link" href="https://arxiv.org/abs/2408.03314" target="_blank" rel="noopener"
 &gt;test-time compute curve&lt;/a&gt; is one of the easiest ways to ship a silent quality regression. Internal evals undercount the human-perceived gap because most users never change defaults — defaults &lt;em&gt;are&lt;/em&gt; the product promise.&lt;/p&gt;
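&lt;p&gt;One way to act on this downstream is to never inherit server-side defaults for behavior-relevant knobs. A minimal sketch, assuming a hypothetical client wrapper; the parameter names (&lt;code&gt;reasoning_effort&lt;/code&gt;, &lt;code&gt;max_tokens&lt;/code&gt;) are illustrative, not the real API surface:&lt;/p&gt;

```python
# Hypothetical client wrapper: pin every behavior-relevant parameter
# explicitly so a vendor-side default change cannot silently move your
# operating point. All names here are illustrative, not the real API.
PINNED_DEFAULTS = {
    "model": "claude-opus-4-6",
    "reasoning_effort": "high",  # pinned: never inherit the server default
    "max_tokens": 8192,
}

def build_request(prompt, **overrides):
    """Merge explicit overrides over pinned defaults; no knob is omitted."""
    request = dict(PINNED_DEFAULTS)
    request.update(overrides)
    request["prompt"] = prompt
    return request
```

&lt;p&gt;With every knob pinned, a vendor-side default flip shows up as a no-op for you: you opt into a new operating point deliberately, via a reviewed diff to &lt;code&gt;PINNED_DEFAULTS&lt;/code&gt;.&lt;/p&gt;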
&lt;h2 id="2-a-caching-optimization-that-dropped-thinking-history-every-turn-mar-26"&gt;2. A caching optimization that dropped thinking history every turn (Mar 26)
&lt;/h2&gt;&lt;p&gt;This is the most technically interesting failure. Anthropic leans hard on &lt;a class="link" href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" target="_blank" rel="noopener"
 &gt;prompt caching&lt;/a&gt; — the team literally wrote &lt;a class="link" href="https://claude.com/blog/lessons-from-building-claude-code-prompt-caching-is-everything" target="_blank" rel="noopener"
 &gt;&amp;ldquo;prompt caching is everything&amp;rdquo;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The intent was clean: when a session has been &lt;strong&gt;idle for more than an hour&lt;/strong&gt; and is bound for a cache miss anyway, prune older thinking blocks to reduce uncached tokens at resume time. They reached for &lt;a class="link" href="https://docs.claude.com/en/docs/build-with-claude/context-editing" target="_blank" rel="noopener"
 &gt;the &lt;code&gt;clear_thinking_20251015&lt;/code&gt; context-editing strategy&lt;/a&gt; with &lt;code&gt;keep:1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The bug.&lt;/strong&gt; Instead of running once when an idle session resumed, the clear header was attached to &lt;strong&gt;every subsequent request for the rest of the session&lt;/strong&gt;. Each request told the API to keep only the most recent reasoning block and discard the rest. If a follow-up arrived mid-tool-use, even the current turn&amp;rsquo;s reasoning got dropped. Claude kept executing, but with ever less memory of why it had chosen its earlier actions — surfacing as the forgetfulness, repetition, and odd tool choices users reported.&lt;/p&gt;
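&lt;p&gt;The shape of the fix is worth sketching. In a hypothetical client, deriving the decision from the idle timestamp on every request (rather than setting a sticky flag at resume time) makes the &amp;ldquo;runs forever&amp;rdquo; failure mode structurally impossible. Everything below (the class and the override payload shape) is illustrative, not Anthropic&amp;rsquo;s actual client code:&lt;/p&gt;

```python
import time

IDLE_THRESHOLD_S = 3600  # a gap longer than this is a cache miss anyway

class SessionContext:
    """Derive the clear directive from the idle gap on each request
    instead of caching a sticky 'stale' flag at resume time."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last_request_at = None

    def request_overrides(self):
        now = self._clock()
        idle_for = 0.0 if self._last_request_at is None else now - self._last_request_at
        self._last_request_at = now
        if idle_for > IDLE_THRESHOLD_S:
            # First request after a long idle: prune old thinking once.
            return {"context_management": {"edits": [
                {"type": "clear_thinking_20251015", "keep": 1},
            ]}}
        # Follow-up requests fall through cleanly; the buggy version kept
        # returning the edit here because its stale flag was never reset.
        return {}
```

&lt;p&gt;The buggy behavior amounted to setting a stale flag on resume and never clearing it, so the directive rode along on every later request.&lt;/p&gt;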
&lt;p&gt;A secondary effect: every such request became a cache miss, which is what drove the parallel reports of &lt;strong&gt;usage limits draining unexpectedly fast&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="why-it-slipped-through"&gt;Why it slipped through
&lt;/h3&gt;
 &lt;blockquote&gt;
 &lt;p&gt;&amp;ldquo;The changes it introduced made it past multiple human and automated code reviews, as well as unit tests, end-to-end tests, automated verification, and dogfooding.&amp;rdquo;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;Three coincidences combined:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;An &lt;strong&gt;internal-only message-queuing experiment&lt;/strong&gt; running concurrently muddied the signal&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;orthogonal change to thinking display&lt;/strong&gt; suppressed the bug in most CLI sessions&lt;/li&gt;
&lt;li&gt;The trigger was a &lt;strong&gt;stale-session corner case&lt;/strong&gt; that didn&amp;rsquo;t reproduce in dogfooding&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After the fact, Anthropic back-tested &lt;a class="link" href="https://code.claude.com/docs/en/code-review" target="_blank" rel="noopener"
 &gt;Claude Code Review&lt;/a&gt; on the offending PRs: &lt;strong&gt;Opus 4.7 found the bug when given enough repo context, Opus 4.6 did not.&lt;/strong&gt; One of the committed follow-ups is to ship multi-repo context support in Code Review to customers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Takeaway.&lt;/strong&gt; Don&amp;rsquo;t watch cache hit rate purely as a cost metric. A &lt;strong&gt;sudden jump in cache misses&lt;/strong&gt; is a first-class signal of a context-management regression. Code that prunes or preserves reasoning state lulls unit tests into false confidence — your multi-turn integration tests should explicitly assert how context evolves as turn count grows.&lt;/p&gt;
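&lt;p&gt;The multi-turn assertion this takeaway calls for can be sketched against a toy transcript model. The dict-based blocks and the &lt;code&gt;keep&lt;/code&gt; semantics here are deliberate simplifications of the real context-editing behavior:&lt;/p&gt;

```python
def count_thinking(transcript):
    return sum(1 for block in transcript if block["type"] == "thinking")

def apply_clear_thinking(transcript, keep):
    """Toy model of the clear_thinking edit: drop all but the most
    recent `keep` thinking blocks."""
    kept, out = 0, []
    for block in reversed(transcript):
        if block["type"] == "thinking":
            if kept == keep:
                continue  # older thinking block: discard
            kept += 1
        out.append(block)
    out.reverse()
    return out

def run_turn(transcript, clear=False):
    """One simulated turn: optionally apply the edit, then append the
    turn's new thinking and text blocks."""
    if clear:
        transcript = apply_clear_thinking(transcript, keep=1)
    return transcript + [{"type": "thinking"}, {"type": "text"}]

# Invariant: absent a clear edit, reasoning history grows with turn count.
clean = []
for _ in range(5):
    clean = run_turn(clean)
assert count_thinking(clean) == 5

# The sticky-directive bug pins history at keep+1 forever, no matter
# how many turns elapse.
buggy = []
for _ in range(5):
    buggy = run_turn(buggy, clear=True)
assert count_thinking(buggy) == 2
```

&lt;p&gt;Note that a unit test on &lt;code&gt;apply_clear_thinking&lt;/code&gt; alone passes in both worlds; only the turn-count invariant catches the sticky directive.&lt;/p&gt;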
&lt;h2 id="3-one-system-prompt-line-cost-3-of-evals-apr-16"&gt;3. One system-prompt line cost 3% of evals (Apr 16)
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-7" target="_blank" rel="noopener"
 &gt;Opus 4.7&amp;rsquo;s launch post&lt;/a&gt; calls out a verbose tendency in the new model — smarter on hard problems, more output tokens. Anthropic worked the problem across training, prompting, and product UX. One line in the system prompt did outsized damage:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;&amp;ldquo;Length limits: keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail.&amp;rdquo;&lt;/em&gt;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;The eval set in use during pre-release testing showed no regression, so it shipped on April 16. Post-incident ablation against a broader eval suite showed a &lt;strong&gt;3% drop on both Opus 4.6 and Opus 4.7&lt;/strong&gt;. Reverted in v2.1.116 on April 20.&lt;/p&gt;
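&lt;p&gt;The per-model ablation Anthropic now requires can be approximated downstream with a small CI gate. This sketch assumes you already have an &lt;code&gt;eval_fn(model, prompt)&lt;/code&gt; callable returning a score in [0, 1]; the one-point budget is an arbitrary illustration:&lt;/p&gt;

```python
def ablate_prompt_line(eval_fn, models, base_prompt, candidate_line, budget=0.01):
    """Per-model ablation gate: score each model with and without the
    candidate line; return the models whose score drops by more than
    `budget`. eval_fn and the one-point budget are illustrative."""
    regressions = {}
    for model in models:
        base = eval_fn(model, base_prompt)
        with_line = eval_fn(model, base_prompt + "\n" + candidate_line)
        if base - with_line > budget:
            regressions[model] = with_line - base
    return regressions
```

&lt;p&gt;Wired into CI, a non-empty return blocks the prompt change for the regressing models only; the passing models can still receive it, which is exactly the per-model gating the postmortem lands on.&lt;/p&gt;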
&lt;p&gt;&lt;strong&gt;Takeaway.&lt;/strong&gt; A single system-prompt line is a &lt;a class="link" href="https://martinfowler.com/articles/feature-toggles.html" target="_blank" rel="noopener"
 &gt;globally-applied config change&lt;/a&gt;, not an experiment. The same line affects each model differently — hence Anthropic&amp;rsquo;s new CLAUDE.md guidance that &lt;strong&gt;&amp;ldquo;model-specific changes are gated to the specific model they&amp;rsquo;re targeting.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
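&lt;p&gt;Gating a model-specific line is mechanical once prompt lines carry an allowlist. A minimal sketch, with hypothetical line contents and model ids:&lt;/p&gt;

```python
# Hypothetical prompt assembly: each line declares the models it applies
# to, so a line written for one model's verbosity never leaks to others.
PROMPT_LINES = [
    {"text": "You are a coding assistant.", "models": None},  # None = all models
    {"text": "Keep final responses brief.", "models": {"claude-opus-4-7"}},
]

def build_system_prompt(model, lines=PROMPT_LINES):
    """Assemble the system prompt from lines whose allowlist admits `model`."""
    return "\n".join(
        line["text"]
        for line in lines
        if line["models"] is None or model in line["models"]
    )
```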
&lt;h2 id="why-detection-took-a-month--anatomy-of-a-signal-separation-failure"&gt;Why detection took a month — anatomy of a signal-separation failure
&lt;/h2&gt;&lt;p&gt;Three changes, three rollout schedules, three different traffic slices:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Change&lt;/th&gt;
 &lt;th&gt;Affected models&lt;/th&gt;
 &lt;th&gt;Traffic slice&lt;/th&gt;
 &lt;th&gt;Time to find&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;effort default&lt;/td&gt;
 &lt;td&gt;Sonnet 4.6, Opus 4.6&lt;/td&gt;
 &lt;td&gt;default-mode users (majority)&lt;/td&gt;
 &lt;td&gt;~5 weeks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;thinking-clear bug&lt;/td&gt;
 &lt;td&gt;Sonnet 4.6, Opus 4.6&lt;/td&gt;
 &lt;td&gt;sessions resumed after 1hr idle&lt;/td&gt;
 &lt;td&gt;~2 weeks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;verbosity prompt&lt;/td&gt;
 &lt;td&gt;Sonnet 4.6, Opus 4.6, Opus 4.7&lt;/td&gt;
 &lt;td&gt;everything after Opus 4.7 ship&lt;/td&gt;
 &lt;td&gt;~4 days&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Each cohort suffered differently, and the aggregate looked like &lt;strong&gt;&amp;ldquo;broad, inconsistent degradation&amp;rdquo;&lt;/strong&gt; — the worst pattern for an incident commander to disentangle. In parallel, the community surfaced detailed external audits (e.g., &lt;a class="link" href="https://venturebeat.com/technology/mystery-solved-anthropic-reveals-changes-to-claudes-harnesses-and-operating-instructions-likely-caused-degradation" target="_blank" rel="noopener"
 &gt;Stella Laurenzo&amp;rsquo;s analysis of 6,852 session files and 234,000 tool calls&lt;/a&gt;) that became forcing functions.&lt;/p&gt;
&lt;p&gt;The &lt;a class="link" href="https://sre.google/sre-book/managing-incidents/" target="_blank" rel="noopener"
 &gt;Google SRE chapter on managing incidents&lt;/a&gt; frames &amp;ldquo;distinguish signal from noise&amp;rdquo; as the first job; for LLM products it gets harder because user satisfaction is inherently distributional. Reports right after a change blend &lt;a class="link" href="https://en.wikipedia.org/wiki/Confirmation_bias" target="_blank" rel="noopener"
 &gt;confirmation bias&lt;/a&gt; with real regressions.&lt;/p&gt;
&lt;h2 id="what-anthropic-committed-to-going-forward"&gt;What Anthropic committed to going forward
&lt;/h2&gt;&lt;p&gt;From the postmortem&amp;rsquo;s &amp;ldquo;Going forward&amp;rdquo; section:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Have more internal staff use the exact public Claude Code build&lt;/strong&gt; rather than the feature-test build — closing the dogfooding gap&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ship the internal Code Review improvements&lt;/strong&gt; (additional repo context) to customers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Per-model evals required for every system-prompt change&lt;/strong&gt;, with ablation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New tooling to review and audit prompt changes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CLAUDE.md guidance&lt;/strong&gt; to gate model-specific changes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Soak periods + gradual rollouts&lt;/strong&gt; for any change that could trade off against intelligence&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a class="link" href="https://twitter.com/ClaudeDevs" target="_blank" rel="noopener"
 &gt;@ClaudeDevs on X&lt;/a&gt; and GitHub&lt;/strong&gt; as centralized comm channels&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Compared with &lt;a class="link" href="https://status.openai.com/history" target="_blank" rel="noopener"
 &gt;OpenAI&amp;rsquo;s public incident pattern on its status page&lt;/a&gt; — mostly availability and latency events — Anthropic is unusual in formally extending the incident surface to include &lt;strong&gt;quality regressions&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="what-this-means-if-you-build-on-claude-or-any-frontier-api"&gt;What this means if you build on Claude (or any frontier API)
&lt;/h2&gt;&lt;p&gt;The blast radius of &lt;a class="link" href="https://en.wikipedia.org/wiki/Blast_radius_%28software%29" target="_blank" rel="noopener"
 &gt;shared infrastructure&lt;/a&gt; now includes &lt;strong&gt;harness and system prompt&lt;/strong&gt;, not just model weights. As a downstream operator:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Regression-test the output distribution.&lt;/strong&gt; Beyond latency and error rate, baseline the &lt;strong&gt;token distribution, tool-call patterns, and response lengths&lt;/strong&gt;, and diff them daily. LLM eval platforms like &lt;a class="link" href="https://docs.smith.langchain.com/" target="_blank" rel="noopener"
 &gt;LangSmith&lt;/a&gt; and &lt;a class="link" href="https://www.braintrust.dev/" target="_blank" rel="noopener"
 &gt;Braintrust&lt;/a&gt; exist for this.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature-flag your own &lt;a class="link" href="https://martinfowler.com/articles/feature-toggles.html" target="_blank" rel="noopener"
 &gt;prompt changes&lt;/a&gt;.&lt;/strong&gt; When your changes and the vendor&amp;rsquo;s overlap in time, signal separation becomes nearly impossible.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Plan for multi-provider routing.&lt;/strong&gt; Tools like &lt;a class="link" href="https://docs.litellm.ai/" target="_blank" rel="noopener"
 &gt;LiteLLM&lt;/a&gt;, &lt;a class="link" href="https://openrouter.ai/" target="_blank" rel="noopener"
 &gt;OpenRouter&lt;/a&gt;, and &lt;a class="link" href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener"
 &gt;AWS Bedrock&lt;/a&gt; let you fail over models. Single-vendor dependence creates exactly this &amp;ldquo;all users simultaneously worse&amp;rdquo; pattern.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elevate cache hit rate to a real SLI.&lt;/strong&gt; Sudden miss-rate jumps are both a cost signal and a &lt;strong&gt;context-management regression signal&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Idempotent retries + circuit breakers&lt;/strong&gt; still apply. &lt;a class="link" href="https://github.com/App-vNext/Polly" target="_blank" rel="noopener"
 &gt;Polly&lt;/a&gt; and &lt;a class="link" href="https://github.com/resilience4j/resilience4j" target="_blank" rel="noopener"
 &gt;resilience4j&lt;/a&gt; patterns work for LLM clients too — just budget for retries doubling token spend.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Combine user feedback with quantitative metrics.&lt;/strong&gt; Free-text reports are leading indicators of unseparated quality regressions, not noise to discard.&lt;/li&gt;
&lt;/ul&gt;
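&lt;p&gt;For the cache-SLI bullet, the Messages API &lt;code&gt;usage&lt;/code&gt; block already carries the needed counters. The field names below follow Anthropic&amp;rsquo;s prompt-caching docs but should be verified against your SDK version, and the 15-point alert threshold is illustrative:&lt;/p&gt;

```python
def cache_hit_rate(usages):
    """Fraction of prompt tokens served from cache across a window of
    Messages API `usage` blocks. Field names follow the prompt-caching
    docs; verify them against your SDK version."""
    read = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    written_or_uncached = sum(
        u.get("input_tokens", 0) + u.get("cache_creation_input_tokens", 0)
        for u in usages
    )
    total = read + written_or_uncached
    return read / total if total else 0.0

def miss_rate_alert(current, baseline, jump=0.15):
    """Page when the absolute miss rate jumps more than `jump` over the
    baseline window: a context-management regression signal, not just cost."""
    return (1.0 - cache_hit_rate(current)) - (1.0 - cache_hit_rate(baseline)) > jump
```

&lt;p&gt;Note the denominator counts &lt;code&gt;cache_creation_input_tokens&lt;/code&gt; as misses: a fleet of sessions that suddenly rewrites its cache on every turn looks exactly like the thinking-clear incident did.&lt;/p&gt;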
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;p&gt;All three causes are LLM-flavored versions of textbook operational failures. (1) A default change broke implicit user-behavior assumptions. (2) A classic &lt;a class="link" href="https://en.wikipedia.org/wiki/Off-by-one_error" target="_blank" rel="noopener"
 &gt;off-by-one-style bug&lt;/a&gt; (a run-once operation that fired on every request) sat deep in caching-optimization code and survived every layer of review and testing. (3) Eval-set coverage wasn&amp;rsquo;t broad enough to catch a 3% regression from one system-prompt line. &lt;strong&gt;Nothing here is new.&lt;/strong&gt; What&amp;rsquo;s new is the diagnostic difficulty. The moment model, harness, and prompt ship as a single bundle to users, overlapping slice regressions don&amp;rsquo;t light up a red dot on a status page. The controls Anthropic added — required per-model evals, automated ablation, soak periods, narrowing the dogfooding gap — all amount to &lt;strong&gt;&amp;ldquo;apply infrastructure-grade change management to everything that ships besides the model weights.&amp;rdquo;&lt;/strong&gt; Downstream builders should reach the same conclusion. The model is an external variable, but &lt;strong&gt;prompts, routing, and retry policy are ours&lt;/strong&gt;. Without SRE-grade change discipline on our side of the line, we&amp;rsquo;ll inflict our own six-week silent degradation on our own users.&lt;/p&gt;
&lt;h2 id="references"&gt;References
&lt;/h2&gt;&lt;h3 id="primary-anthropic-sources"&gt;Primary Anthropic sources
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/engineering/april-23-postmortem" target="_blank" rel="noopener"
 &gt;An update on recent Claude Code quality reports&lt;/a&gt; — the postmortem itself&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://claude.com/blog/lessons-from-building-claude-code-prompt-caching-is-everything" target="_blank" rel="noopener"
 &gt;Lessons from building Claude Code — prompt caching is everything&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-7" target="_blank" rel="noopener"
 &gt;Claude Opus 4.7 launch post&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/engineering" target="_blank" rel="noopener"
 &gt;Engineering at Anthropic index&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="anthropic-api-docs"&gt;Anthropic API docs
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://platform.claude.com/docs/en/build-with-claude/extended-thinking" target="_blank" rel="noopener"
 &gt;Extended thinking guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://platform.claude.com/docs/en/build-with-claude/context-editing" target="_blank" rel="noopener"
 &gt;Context editing — &lt;code&gt;clear_thinking_20251015&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" target="_blank" rel="noopener"
 &gt;Prompt caching docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.claude.com/en/api/messages" target="_blank" rel="noopener"
 &gt;Messages API reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.claude.com/en/docs/claude-code/overview" target="_blank" rel="noopener"
 &gt;Claude Code docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="sre--incident-response-background"&gt;SRE / incident-response background
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://sre.google/sre-book/managing-incidents/" target="_blank" rel="noopener"
 &gt;Google SRE Book — Managing Incidents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://martinfowler.com/articles/feature-toggles.html" target="_blank" rel="noopener"
 &gt;Feature Toggles (Martin Fowler)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://arxiv.org/abs/2408.03314" target="_blank" rel="noopener"
 &gt;Scaling Test-Time Compute (Snell et al., 2024)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="external-analysis--comparison"&gt;External analysis / comparison
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://venturebeat.com/technology/mystery-solved-anthropic-reveals-changes-to-claudes-harnesses-and-operating-instructions-likely-caused-degradation" target="_blank" rel="noopener"
 &gt;VentureBeat: Anthropic reveals harness changes likely caused degradation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://status.openai.com/history" target="_blank" rel="noopener"
 &gt;OpenAI status page history&lt;/a&gt; — pattern comparison&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.litellm.ai/" target="_blank" rel="noopener"
 &gt;LiteLLM multi-provider routing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>