<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Performance on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/performance/</link><description>Recent content in Performance on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 03 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/performance/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Code Cache Bug Analysis: 7 Confirmed Bugs and Their Impact</title><link>https://ice-ice-bear.github.io/posts/2026-04-03-claude-code-cache-analysis/</link><pubDate>Fri, 03 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-03-claude-code-cache-analysis/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post Claude Code Cache Bug Analysis: 7 Confirmed Bugs and Their Impact" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;On April 1, 2026, a developer using the Claude Code Max 20 plan ($200/month) burned through 100% of their usage in roughly 70 minutes during a normal coding session. JSONL log analysis revealed an average cache read ratio of 36.1% (minimum 21.1%) — far below the 90%+ that should be expected. Every token was billed at full price.&lt;/p&gt;
&lt;p&gt;That incident gave rise to &lt;a class="link" href="https://github.com/ArkNill/claude-code-cache-analysis" target="_blank" rel="noopener"
 &gt;ArkNill/claude-code-cache-analysis&lt;/a&gt;: a community-driven investigation that grew from personal debugging into a systematic, proxy-measured analysis confirming &lt;strong&gt;7 bugs across 5 layers&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="background-a-plan-drained-in-70-minutes"&gt;Background: A Plan Drained in 70 Minutes
&lt;/h2&gt;&lt;p&gt;The immediate workaround was downgrading from v2.1.89 to v2.1.68 (npm). Cache read immediately recovered to &lt;strong&gt;97.6% average&lt;/strong&gt; (119 entries), confirming the regression was v2.1.89-specific.&lt;/p&gt;
&lt;p&gt;A transparent monitoring proxy (cc-relay) was then configured using the &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; environment variable to capture per-request data. Combined with reports from 91+ related GitHub issues and contributors including &lt;a class="link" href="https://github.com/Sn3th" target="_blank" rel="noopener"
 &gt;@Sn3th&lt;/a&gt;, &lt;a class="link" href="https://github.com/rwp65" target="_blank" rel="noopener"
 &gt;@rwp65&lt;/a&gt;, and a dozen others, the scattered findings were consolidated into structured, measured analysis.&lt;/p&gt;
&lt;h2 id="the-7-confirmed-bugs-as-of-v2191"&gt;The 7 Confirmed Bugs (as of v2.1.91)
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TD
 A["Claude Code Request"] --&gt; B{"Version Check"}
 B --&gt;|"v2.1.89 standalone"| C["B1: Sentinel &amp;lt;br/&amp;gt; Cache prefix corruption &amp;lt;br/&amp;gt; → 4-17% cache read"]
 B --&gt;|"--resume flag"| D["B2: Resume &amp;lt;br/&amp;gt; Full context replayed uncached &amp;lt;br/&amp;gt; → 20x cost per resume"]
 B --&gt;|"v2.1.91"| E["Cache normal: 95-99%"]
 E --&gt; F{"Still active bugs"}
 F --&gt; G["B3: False RL &amp;lt;br/&amp;gt; Fake rate limit error &amp;lt;br/&amp;gt; 0 API calls made"]
 F --&gt; H["B4: Microcompact &amp;lt;br/&amp;gt; Tool results silently cleared &amp;lt;br/&amp;gt; mid-session"]
 F --&gt; I["B5: Budget Cap &amp;lt;br/&amp;gt; 200K aggregate limit &amp;lt;br/&amp;gt; → truncated to 1-41 chars"]
 F --&gt; J["B8: Log Inflation &amp;lt;br/&amp;gt; JSONL entry duplication &amp;lt;br/&amp;gt; → 2.87x local inflation"]&lt;/pre&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Bug&lt;/th&gt;
 &lt;th&gt;What It Does&lt;/th&gt;
 &lt;th&gt;Impact&lt;/th&gt;
 &lt;th&gt;Status (v2.1.91)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;B1&lt;/strong&gt; Sentinel&lt;/td&gt;
 &lt;td&gt;Standalone binary corrupts cache prefix&lt;/td&gt;
 &lt;td&gt;4-17% cache read (v2.1.89)&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Fixed&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;B2&lt;/strong&gt; Resume&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;--resume&lt;/code&gt; replays full context uncached&lt;/td&gt;
 &lt;td&gt;20x cost per resume&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Fixed&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;B3&lt;/strong&gt; False RL&lt;/td&gt;
 &lt;td&gt;Client blocks API calls with fake error&lt;/td&gt;
 &lt;td&gt;Instant &amp;ldquo;Rate limit reached&amp;rdquo;, 0 API calls&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Unfixed&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;B4&lt;/strong&gt; Microcompact&lt;/td&gt;
 &lt;td&gt;Tool results silently cleared mid-session&lt;/td&gt;
 &lt;td&gt;Context quality degrades&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Unfixed&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;B5&lt;/strong&gt; Budget Cap&lt;/td&gt;
 &lt;td&gt;200K aggregate limit on tool results&lt;/td&gt;
 &lt;td&gt;Older results truncated to 1-41 chars&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Unfixed&lt;/strong&gt; (MCP override only)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;B8&lt;/strong&gt; Log Inflation&lt;/td&gt;
 &lt;td&gt;Extended thinking duplicates JSONL entries&lt;/td&gt;
 &lt;td&gt;2.87x local token inflation&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Unfixed&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Server&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Peak-hour limits tightened + 1M billing bug&lt;/td&gt;
 &lt;td&gt;Reduced effective quota&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;By design&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="key-bug-deep-dives"&gt;Key Bug Deep Dives
&lt;/h2&gt;&lt;h3 id="b1-sentinel-bug-fixed"&gt;B1: Sentinel Bug (Fixed)
&lt;/h3&gt;&lt;p&gt;Claude Code ships in two forms. The standalone binary is a single ELF 64-bit executable (~228MB) with an embedded Bun runtime. It contained a Sentinel replacement mechanism (&lt;code&gt;cch=00000&lt;/code&gt;) that corrupted cache prefixes — causing dramatically low cache read rates.&lt;/p&gt;
&lt;p&gt;The npm package (&lt;code&gt;cli.js&lt;/code&gt;, ~13MB, executed by Node.js) does not contain this logic and was immune to Bug 1.&lt;/p&gt;
&lt;p&gt;In v2.1.91, routing &lt;code&gt;stripAnsi&lt;/code&gt; through &lt;code&gt;Bun.stripANSI&lt;/code&gt; appears to have closed the Sentinel gap. &lt;strong&gt;Both npm and standalone now achieve identical 84.7% cold-start cache read.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="b2-resume-bug-fixed"&gt;B2: Resume Bug (Fixed)
&lt;/h3&gt;&lt;p&gt;Using &lt;code&gt;--resume&lt;/code&gt; caused the entire conversation context to be sent as billable input with no cache benefit — up to 20x the expected cost per resume. Fixed in v2.1.91&amp;rsquo;s transcript chain break patch, but &lt;strong&gt;avoiding &lt;code&gt;--resume&lt;/code&gt; and &lt;code&gt;--continue&lt;/code&gt; entirely is still the recommended approach.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="b3-false-rate-limiting-unfixed"&gt;B3: False Rate Limiting (Unfixed)
&lt;/h3&gt;&lt;p&gt;The client generates &amp;ldquo;Rate limit reached&amp;rdquo; errors locally without ever making an API call. Measured across 151 entries / 65 sessions. The session appears throttled while the API has not been contacted at all.&lt;/p&gt;
&lt;h3 id="b4--b5-microcompact-and-budget-cap-unfixed"&gt;B4 &amp;amp; B5: Microcompact and Budget Cap (Unfixed)
&lt;/h3&gt;&lt;p&gt;Tool results are silently deleted mid-session (327 events detected), and a 200K aggregate limit causes older file read results to be truncated to 1-41 characters. &lt;strong&gt;After approximately 15-20 tool uses, earlier context is effectively gone without any warning.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="cache-ttl-not-a-bug"&gt;Cache TTL (Not a Bug)
&lt;/h3&gt;&lt;p&gt;Idle gaps of 13+ hours cause a full cache rebuild on resume. Cache write costs $3.75/M versus read at $0.30/M — a 12.5x difference. Shorter gaps (5-26 minutes) maintain 96%+ cache. This is by design (5-minute TTL), not a bug — but worth understanding.&lt;/p&gt;
&lt;h2 id="npm-vs-standalone-v2190-benchmark"&gt;npm vs Standalone: v2.1.90 Benchmark
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Metric&lt;/th&gt;
 &lt;th&gt;npm&lt;/th&gt;
 &lt;th&gt;Standalone&lt;/th&gt;
 &lt;th&gt;Winner&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Overall cache read %&lt;/td&gt;
 &lt;td&gt;86.4%&lt;/td&gt;
 &lt;td&gt;86.2%&lt;/td&gt;
 &lt;td&gt;Tie&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Stable session&lt;/td&gt;
 &lt;td&gt;95-99.8%&lt;/td&gt;
 &lt;td&gt;95-99.7%&lt;/td&gt;
 &lt;td&gt;Tie&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Sub-agent cold start&lt;/td&gt;
 &lt;td&gt;79-87%&lt;/td&gt;
 &lt;td&gt;47-67%&lt;/td&gt;
 &lt;td&gt;npm&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Sub-agent warmed (5+ req)&lt;/td&gt;
 &lt;td&gt;87-94%&lt;/td&gt;
 &lt;td&gt;94-99%&lt;/td&gt;
 &lt;td&gt;Tie&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Usage for full test suite&lt;/td&gt;
 &lt;td&gt;7% of Max 20&lt;/td&gt;
 &lt;td&gt;5% of Max 20&lt;/td&gt;
 &lt;td&gt;Tie&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In v2.1.91, the sub-agent cold start gap is also closed. &lt;strong&gt;Both achieve 84.7% cold-start cache read identically.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="anthropics-official-position"&gt;Anthropic&amp;rsquo;s Official Position
&lt;/h2&gt;&lt;p&gt;Lydia Hallie from Anthropic posted on X (April 2):&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&amp;ldquo;Peak-hour limits are tighter and 1M-context sessions got bigger, that&amp;rsquo;s most of what you&amp;rsquo;re feeling. We fixed a few bugs along the way, but none were over-charging you.&amp;rdquo;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;She recommended using Sonnet as default, lowering effort level, starting fresh instead of resuming, and capping context with &lt;code&gt;CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The analysis agrees that the cache bugs are fixed, but identifies five additional active mechanisms that Anthropic&amp;rsquo;s statement does not address.&lt;/p&gt;
&lt;h2 id="what-you-can-do-right-now"&gt;What You Can Do Right Now
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Update to v2.1.91&lt;/strong&gt; — fixes the cache regression responsible for the worst drain&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;npm and standalone are equivalent on v2.1.91&lt;/strong&gt; — either install method is fine&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do not use &lt;code&gt;--resume&lt;/code&gt; or &lt;code&gt;--continue&lt;/code&gt;&lt;/strong&gt; — replays full context as billable input&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Start fresh sessions periodically&lt;/strong&gt; — the 200K tool result cap (B5) means older file reads silently truncate after ~15-20 tool uses&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Avoid &lt;code&gt;/dream&lt;/code&gt; and &lt;code&gt;/insights&lt;/code&gt;&lt;/strong&gt; — silent background API calls that consume quota&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-jsonc" data-lang="jsonc"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// ~/.claude/settings.json — disable auto-update
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;env&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;DISABLE_AUTOUPDATER&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="closing-thoughts"&gt;Closing Thoughts
&lt;/h2&gt;&lt;p&gt;This analysis is a strong example of community-driven debugging at its best. A simple transparent proxy via &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt;, combined with systematic testing across v2.1.89 through v2.1.91, produced measured evidence behind phenomena reported across 91+ GitHub issues.&lt;/p&gt;
&lt;p&gt;The cache bugs (B1, B2) are fixed in v2.1.91. The remaining five bugs are still active. For Max plan users, applying the practical mitigations above and pinning a validated version with &lt;code&gt;DISABLE_AUTOUPDATER&lt;/code&gt; is the most reliable defensive posture until Anthropic addresses the remaining issues.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Source repository: &lt;a class="link" href="https://github.com/ArkNill/claude-code-cache-analysis" target="_blank" rel="noopener"
 &gt;ArkNill/claude-code-cache-analysis&lt;/a&gt;&lt;/p&gt;</description></item></channel></rss>