<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Proxy on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/proxy/</link><description>Recent content in Proxy on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 13 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/proxy/index.xml" rel="self" type="application/rss+xml"/><item><title>Reading the Claude Code Hidden Problem Analysis — 11 Bugs, Proxy Data, and a Quota Blind Spot</title><link>https://ice-ice-bear.github.io/posts/2026-04-13-claude-code-hidden-problems/</link><pubDate>Mon, 13 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-13-claude-code-hidden-problems/</guid><description>&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/ArkNill/claude-code-hidden-problem-analysis" target="_blank" rel="noopener"
 &gt;ArkNill/claude-code-hidden-problem-analysis&lt;/a&gt; is one of the most thorough pieces of community reverse engineering I&amp;rsquo;ve seen for any developer tool. It catalogs &lt;strong&gt;11 confirmed client-side bugs&lt;/strong&gt; in Claude Code, of which &lt;strong&gt;9 remain unfixed across six releases (v2.1.92–v2.1.97)&lt;/strong&gt;, and reconstructs the server-side quota system from intercepted HTTP headers. This post summarizes what&amp;rsquo;s actually in there.&lt;/p&gt;
&lt;h2 id="where-the-bugs-sit"&gt;Where the Bugs Sit
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;graph TD
 A[Claude Code Client] --&gt; B[Cache layer]
 A --&gt; C[Context manager]
 A --&gt; D[Rate limit handler]
 B --&gt; B1["B1 Sentinel &amp;lt;br/&amp;gt; (Fixed v2.1.91)"]
 B --&gt; B2["B2 Resume &amp;lt;br/&amp;gt; (Fixed v2.1.91)"]
 B --&gt; B2a["B2a SendMessage &amp;lt;br/&amp;gt; resume miss"]
 C --&gt; B4["B4 Microcompact &amp;lt;br/&amp;gt; silent clear"]
 C --&gt; B8["B8 Inflation"]
 C --&gt; B9["B9 /branch &amp;lt;br/&amp;gt; 6%→73% jump"]
 C --&gt; B10["B10 TaskOutput &amp;lt;br/&amp;gt; 21x injection"]
 D --&gt; B3["B3 False rate limit"]
 D --&gt; B5["B5 Budget enforcement"]
 A --&gt; B11["B11 Adaptive thinking &amp;lt;br/&amp;gt; zero reasoning"]&lt;/pre&gt;&lt;p&gt;The repo&amp;rsquo;s bug taxonomy hits three layers: &lt;strong&gt;cache&lt;/strong&gt; (B1, B2, B2a), &lt;strong&gt;context&lt;/strong&gt; (B4, B8, B9, B10), and &lt;strong&gt;rate limiting&lt;/strong&gt; (B3, B5, B11). Anthropic shipped fixes for B1 and B2 in v2.1.91; nothing else has moved across six subsequent releases. The maintainer cross-references the changelog to make this case explicitly.&lt;/p&gt;
&lt;h2 id="the-proxy-dataset"&gt;The Proxy Dataset
&lt;/h2&gt;&lt;p&gt;What separates this analysis from ordinary &amp;ldquo;Claude Code feels slower&amp;rdquo; complaints is the data. The maintainer runs a transparent HTTP proxy (&lt;strong&gt;cc-relay&lt;/strong&gt;) that captures every request between the Claude Code client and Anthropic&amp;rsquo;s API. The April 8 dataset covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;17,610 requests&lt;/strong&gt; across &lt;strong&gt;129 sessions&lt;/strong&gt; (April 1-8)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;532 JSONL files&lt;/strong&gt; (158.3 MB) of raw session logs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bulk bug detection&lt;/strong&gt; automated across the dataset&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The numbers that jump out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;B5 budget enforcement events:&lt;/strong&gt; went from 261 (single-day measurement on Apr 3) to &lt;strong&gt;72,839 (full week April 1-8)&lt;/strong&gt; — a 279× increase in detection volume as the dataset grew, suggesting the bug fires on virtually every long session&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;B4 microcompact events:&lt;/strong&gt; 3,782 events that silently cleared &lt;strong&gt;15,998 items&lt;/strong&gt; mid-session&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;B8 context inflation:&lt;/strong&gt; 2.37× average across 10 sessions, max 4.42× — universal, not isolated&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Synthetic rate limit (B3):&lt;/strong&gt; 183 of 532 files (34.4%) contain &lt;code&gt;&amp;lt;synthetic&amp;gt;&lt;/code&gt; model entries — pervasive&lt;/li&gt;
&lt;/ul&gt;
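&lt;p&gt;The bulk B3 detection idea is easy to reproduce. A minimal sketch in Python, assuming each JSONL record carries a top-level &lt;code&gt;model&lt;/code&gt; field; the repo&amp;rsquo;s actual schema and detector are not shown here, so the field name is an assumption:&lt;/p&gt;

```python
import json
from pathlib import Path

# Literal "<synthetic>" written with escapes so it survives XML embedding.
SYNTHETIC = "\x3csynthetic\x3e"

def count_synthetic_files(log_dir):
    """Count JSONL session files containing at least one synthetic model entry."""
    total = 0
    flagged = 0
    for path in sorted(Path(log_dir).glob("*.jsonl")):
        total += 1
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            # Hypothetical field name; the real logs may nest this differently.
            if record.get("model") == SYNTHETIC:
                flagged += 1
                break
    return flagged, total
```

&lt;p&gt;Run against a directory of session logs, this yields the flagged/total ratio the analysis reports as 183 of 532.&lt;/p&gt;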
&lt;p&gt;Cache efficiency held at 98-99% across all session lengths on v2.1.91, confirming the cache regression really is fixed. Per-request cost scales with session length — &lt;code&gt;$0.20/req&lt;/code&gt; for 0-30 minute sessions vs &lt;code&gt;$0.33/req&lt;/code&gt; for 5+ hour sessions. The maintainer attributes this to structural context growth, not version-specific bugs.&lt;/p&gt;
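&lt;p&gt;The cost-scaling claim is the kind of thing a single bucketing pass over session summaries can verify. A sketch with hypothetical keys (&lt;code&gt;duration_min&lt;/code&gt;, &lt;code&gt;total_cost_usd&lt;/code&gt;, &lt;code&gt;request_count&lt;/code&gt;); the repo&amp;rsquo;s real schema may differ:&lt;/p&gt;

```python
def cost_per_request_by_bucket(sessions):
    """Group sessions into duration buckets and average cost per request.

    `sessions` is a list of dicts with hypothetical keys: duration_min,
    total_cost_usd, request_count.
    """
    def bucket(minutes):
        if minutes >= 300:
            return "5h+"
        if minutes >= 30:
            return "30m-5h"
        return "0-30m"

    totals = {}
    for s in sessions:
        b = bucket(s["duration_min"])
        cost, reqs = totals.get(b, (0.0, 0))
        totals[b] = (cost + s["total_cost_usd"], reqs + s["request_count"])
    # Average within each bucket, skipping empty ones.
    return {b: round(cost / reqs, 2) for b, (cost, reqs) in totals.items() if reqs}
```

&lt;p&gt;Comparing the &lt;code&gt;0-30m&lt;/code&gt; and &lt;code&gt;5h+&lt;/code&gt; buckets is exactly the $0.20 vs $0.33 comparison above.&lt;/p&gt;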
&lt;h2 id="the-quota-architecture-reverse-engineered"&gt;The Quota Architecture Reverse Engineered
&lt;/h2&gt;&lt;p&gt;The most interesting single finding is the quota system reconstruction from &lt;code&gt;anthropic-ratelimit-unified-*&lt;/code&gt; headers across &lt;strong&gt;3,702 requests&lt;/strong&gt; (April 4-6). The headline:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dual sliding window system:&lt;/strong&gt; two independent counters running in parallel — a &lt;strong&gt;5-hour&lt;/strong&gt; window (&lt;code&gt;5h-utilization&lt;/code&gt;) and a &lt;strong&gt;7-day&lt;/strong&gt; window (&lt;code&gt;7d-utilization&lt;/code&gt;). The &lt;code&gt;representative-claim&lt;/code&gt; field is &lt;code&gt;five_hour&lt;/code&gt; in &lt;strong&gt;100% of requests&lt;/strong&gt; observed — i.e., the 5-hour window is &lt;em&gt;always&lt;/em&gt; the bottleneck, never the 7-day one.&lt;/p&gt;
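&lt;p&gt;The server-side implementation is not public, but the header behavior can be modeled as two independent sliding windows whose representative claim is whichever is more constrained. A toy sketch with made-up capacities; only the window lengths come from the analysis:&lt;/p&gt;

```python
import time
from collections import deque

class SlidingWindow:
    """Toy sliding-window usage counter; capacity values are illustrative."""

    def __init__(self, seconds, capacity):
        self.seconds = seconds
        self.capacity = capacity
        self.events = deque()  # (timestamp, amount) pairs

    def add(self, amount, now=None):
        self.events.append((time.time() if now is None else now, amount))

    def utilization(self, now=None):
        now = time.time() if now is None else now
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.seconds:
            self.events.popleft()
        return sum(a for _, a in self.events) / self.capacity

class DualQuota:
    """Two parallel counters, mirroring the 5h-utilization and 7d-utilization fields."""

    def __init__(self):
        self.w5h = SlidingWindow(5 * 3600, capacity=1_500_000)      # illustrative
        self.w7d = SlidingWindow(7 * 86400, capacity=150_000_000)   # illustrative

    def record(self, tokens, now=None):
        self.w5h.add(tokens, now)
        self.w7d.add(tokens, now)

    def representative_claim(self, now=None):
        u5 = self.w5h.utilization(now)
        u7 = self.w7d.utilization(now)
        return ("five_hour", u5) if u5 >= u7 else ("seven_day", u7)
```

&lt;p&gt;With these illustrative capacities the 5-hour window binds unless usage is spread very thinly across the week, which is consistent with &lt;code&gt;five_hour&lt;/code&gt; appearing in 100% of observed requests.&lt;/p&gt;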
&lt;p&gt;Per-1% utilization measurements on Max 20x ($200/mo):&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Metric&lt;/th&gt;
 &lt;th&gt;Range per 1%&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Output tokens&lt;/td&gt;
 &lt;td&gt;9K-16K&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Cache Read tokens&lt;/td&gt;
 &lt;td&gt;1.5M-2.1M&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Total Visible&lt;/td&gt;
 &lt;td&gt;1.5M-2.1M&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;7d accumulation ratio&lt;/td&gt;
 &lt;td&gt;0.12-0.17&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="the-thinking-token-blind-spot"&gt;The Thinking-Token Blind Spot
&lt;/h3&gt;&lt;p&gt;Here&amp;rsquo;s the unsettling part. Extended thinking tokens are &lt;strong&gt;not included&lt;/strong&gt; in the &lt;code&gt;output_tokens&lt;/code&gt; field returned by the API. At 9K-16K visible output per 1%, a full 100% 5-hour window equals only 0.9M-1.6M visible output tokens — implausibly low for several hours of Opus work. The pattern is consistent with thinking tokens being counted against the quota server-side without being reported client-side. The maintainer explicitly flags this as unconfirmed from the client and proposes a thinking-disabled isolation test.&lt;/p&gt;
&lt;p&gt;This matters because it means &lt;strong&gt;Max plan users have no way to predict when they&amp;rsquo;ll hit the wall&lt;/strong&gt; — the visible token counter understates true consumption by a factor that depends on how much thinking the model does, which the user cannot observe.&lt;/p&gt;
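&lt;p&gt;Users who want to see the wall coming can at least read the headers off each response. A sketch; the suffixes (&lt;code&gt;5h-utilization&lt;/code&gt;, &lt;code&gt;7d-utilization&lt;/code&gt;, &lt;code&gt;representative-claim&lt;/code&gt;) are taken from the fields the analysis reports and are assumptions about the exact wire format:&lt;/p&gt;

```python
PREFIX = "anthropic-ratelimit-unified-"

def quota_status(headers, warn_at=0.8):
    """Parse the unified rate-limit headers out of a response-headers dict.

    Header names below the prefix are assumptions based on the fields the
    analysis reports, not confirmed wire names.
    """
    normalized = {k.lower(): v for k, v in headers.items()}
    status = {}
    for field in ("5h-utilization", "7d-utilization", "representative-claim"):
        value = normalized.get(PREFIX + field)
        if value is None:
            continue
        try:
            status[field] = float(value)
        except ValueError:
            status[field] = value  # e.g. representative-claim is a string
    util = status.get("5h-utilization")
    status["near_limit"] = util is not None and util >= warn_at
    return status
```

&lt;p&gt;Pointing this at cc-relay captures (or any proxy log) gives a running view of the 5-hour window, which is the only counter that ever binds in the observed data.&lt;/p&gt;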
&lt;h2 id="community-cross-validation"&gt;Community Cross-Validation
&lt;/h2&gt;&lt;p&gt;Two independent contributors back the analysis with their own data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;@fgrosswig&lt;/strong&gt;: dual-machine 18-day JSONL forensics shows a &lt;strong&gt;64× budget reduction&lt;/strong&gt; between March 26 (3.2B tokens, no limit) and April 5 (88M tokens at 90%)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;@Commandershadow9&lt;/strong&gt;: separate cache-fix forensics shows &lt;strong&gt;34-143× capacity reduction&lt;/strong&gt;, independent of the cache bug, supporting the thinking-token hypothesis&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Anthropic acknowledged B11 (adaptive thinking zero-reasoning → fabrication) on Hacker News but has not followed up.&lt;/p&gt;
&lt;h2 id="why-this-analysis-matters"&gt;Why This Analysis Matters
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart LR
 A[Vendor changes &amp;lt;br/&amp;gt; quota silently] --&gt; B[Users notice &amp;lt;br/&amp;gt; slowness]
 B --&gt; C{Without proxy data}
 C --&gt;|Anecdote| D[Easy to dismiss]
 C --&gt;|Measured proxy| E[Hard to dismiss]
 E --&gt; F[Anthropic acknowledges B11]&lt;/pre&gt;&lt;p&gt;The repo is essentially a worked example of why &lt;strong&gt;transparent observability of vendor APIs matters&lt;/strong&gt;. Without &lt;code&gt;cc-relay&lt;/code&gt; capturing actual headers and JSONL forensics, every claim in the analysis would be dismissible as &amp;ldquo;user error&amp;rdquo; or &amp;ldquo;your prompts are different now.&amp;rdquo; With 17K requests on the record, the conversation shifts to &amp;ldquo;what is the server actually doing differently.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The companion repo &lt;a class="link" href="https://github.com/ArkNill/claude-code-cache-analysis" target="_blank" rel="noopener"
 &gt;ArkNill/claude-code-cache-analysis&lt;/a&gt; has the cache-specific deep dive and a &lt;a class="link" href="https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/09_QUICKSTART.md" target="_blank" rel="noopener"
 &gt;quickstart guide&lt;/a&gt; for users who want to skip the analysis and just apply the workarounds.&lt;/p&gt;
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;p&gt;This is what good developer-tool QA looks like when the vendor is opaque. The pattern — run a transparent proxy, log every header, automate bug detection across hundreds of sessions, cross-reference the changelog — is portable to any opaque API service. The thinking-token blind spot in particular is a case study in &lt;strong&gt;why client-side telemetry from a vendor is not enough&lt;/strong&gt;; you need server-side headers or you can&amp;rsquo;t see the bottleneck. For Claude Code users on Max plans, the practical implications are concrete: log your sessions, don&amp;rsquo;t assume &lt;code&gt;output_tokens&lt;/code&gt; reflects true cost, and watch the &lt;code&gt;5h-utilization&lt;/code&gt; header if you&amp;rsquo;re hitting walls. For everyone building on top of LLM APIs, the lesson is that &lt;strong&gt;observability infrastructure pays for itself the first time a vendor changes quota behavior without telling you.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="quick-links"&gt;Quick Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ArkNill/claude-code-hidden-problem-analysis" target="_blank" rel="noopener"
 &gt;ArkNill/claude-code-hidden-problem-analysis&lt;/a&gt; — main repo&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ArkNill/claude-code-cache-analysis" target="_blank" rel="noopener"
 &gt;ArkNill/claude-code-cache-analysis&lt;/a&gt; — cache-specific deep dive&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/ko/README.md" target="_blank" rel="noopener"
 &gt;Korean version (ko/README.md)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/13_PROXY-DATA.md" target="_blank" rel="noopener"
 &gt;13_PROXY-DATA.md&lt;/a&gt; — proxy dataset details&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>