<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Streaming on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/streaming/</link><description>Recent content in Streaming on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 06 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/streaming/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Code Harness Anatomy #1 — From Entry Point to Response: The Journey of a Single Request</title><link>https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-1/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-1/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post Claude Code Harness Anatomy #1 — From Entry Point to Response: The Journey of a Single Request" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;This is the first post in a series that systematically dissects Claude Code&amp;rsquo;s source structure across 27 sessions. In this post, we trace the &lt;strong&gt;complete call stack across 11 TypeScript files&lt;/strong&gt; that a &amp;ldquo;hello&amp;rdquo; typed into the terminal traverses before a response appears on screen.&lt;/p&gt;
&lt;h2 id="analysis-target-11-core-files"&gt;Analysis Target: 11 Core Files
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;#&lt;/th&gt;
 &lt;th&gt;Path&lt;/th&gt;
 &lt;th&gt;Lines&lt;/th&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;entrypoints/cli.tsx&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;302&lt;/td&gt;
 &lt;td&gt;CLI bootstrap, argument parsing, mode routing&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;main.tsx&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;4,683&lt;/td&gt;
 &lt;td&gt;Main REPL component, Commander setup&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;commands.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;754&lt;/td&gt;
 &lt;td&gt;Command registry&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;context.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;189&lt;/td&gt;
 &lt;td&gt;System prompt assembly, CLAUDE.md injection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;QueryEngine.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,295&lt;/td&gt;
 &lt;td&gt;Session management, SDK interface&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,729&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Core turn loop&lt;/strong&gt; — API + tool execution&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/api/client.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;389&lt;/td&gt;
 &lt;td&gt;HTTP client, 4-provider routing&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;8&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/api/claude.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;3,419&lt;/td&gt;
 &lt;td&gt;Messages API wrapper, SSE streaming, retries&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;9&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/tools/toolOrchestration.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;188&lt;/td&gt;
 &lt;td&gt;Concurrency partitioning&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/tools/StreamingToolExecutor.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;530&lt;/td&gt;
 &lt;td&gt;Tool execution during streaming&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/tools/toolExecution.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,745&lt;/td&gt;
 &lt;td&gt;Tool dispatch, permission checks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We trace a total of &lt;strong&gt;15,223 lines&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="1-entry-and-bootstrap-clitsx---maintsx"&gt;1. Entry and Bootstrap: cli.tsx -&amp;gt; main.tsx
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;cli.tsx&lt;/code&gt; is only 302 lines, yet it contains a surprising number of &lt;strong&gt;fast-path&lt;/strong&gt; branches:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:37 --version -&amp;gt; immediate output, 0 imports
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:53 --dump-system -&amp;gt; minimal imports
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:100 --daemon-worker -&amp;gt; worker-only path
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:112 remote-control -&amp;gt; bridge mode
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:185 ps/logs/attach -&amp;gt; background sessions
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:293 default path -&amp;gt; dynamic import of main.tsx
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Design intent&lt;/strong&gt;: Avoid loading &lt;code&gt;main.tsx&lt;/code&gt;&amp;rsquo;s 4,683 lines just for &lt;code&gt;--version&lt;/code&gt;. This optimization directly impacts the perceived responsiveness of the CLI tool.&lt;/p&gt;
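&lt;p&gt;The fast-path idea can be sketched as a small router that inspects the arguments before any heavy module is loaded. This is a minimal illustration only: &lt;code&gt;pickRoute&lt;/code&gt;, &lt;code&gt;needsMainModule&lt;/code&gt;, and the &lt;code&gt;Route&lt;/code&gt; type are hypothetical names, and only two of the flags listed above are modeled.&lt;/p&gt;

```typescript
// Hypothetical route names; the real CLI has more branches than shown here.
type Route = "version" | "daemon-worker" | "main";

// Hypothetical router: decides the path from argv alone,
// before any large module would need to be imported.
function pickRoute(argv: string[]): Route {
  if (argv.includes("--version")) return "version";
  if (argv.includes("--daemon-worker")) return "daemon-worker";
  return "main";
}

// Only the default route would pay the cost of loading the large main module.
function needsMainModule(argv: string[]): boolean {
  return pickRoute(argv) === "main";
}
```

&lt;p&gt;Under these assumptions, a call such as &lt;code&gt;needsMainModule(["--version"])&lt;/code&gt; returns false, so the expensive import never happens for that invocation.&lt;/p&gt;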
&lt;p&gt;The default path dynamically imports &lt;code&gt;main.tsx&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// cli.tsx:293-297
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;main&lt;/span&gt;: &lt;span class="kt"&gt;cliMain&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="kr"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;../main.js&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cliMain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The reason &lt;code&gt;main.tsx&lt;/code&gt; is 4,683 lines is that it includes all of the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Side-effect imports&lt;/strong&gt; (lines 1-209): &lt;code&gt;profileCheckpoint&lt;/code&gt;, &lt;code&gt;startMdmRawRead&lt;/code&gt;, &lt;code&gt;startKeychainPrefetch&lt;/code&gt; — parallel subprocesses launched at module evaluation time to hide the ~65ms macOS keychain read&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Commander setup&lt;/strong&gt; (line 585+): CLI argument parsing, 10+ mode-specific branches&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;React/Ink REPL rendering&lt;/strong&gt;: Terminal UI mount&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Headless path&lt;/strong&gt; (&lt;code&gt;-p&lt;/code&gt;/&lt;code&gt;--print&lt;/code&gt;): Uses &lt;code&gt;QueryEngine&lt;/code&gt; directly without UI&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="2-prompt-assembly-contexttss-dual-memoize"&gt;2. Prompt Assembly: context.ts&amp;rsquo;s dual-memoize
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;context.ts&lt;/code&gt; is a small file at 189 lines, but it handles all dynamic parts of the system prompt. Two memoized functions are at its core:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;getSystemContext()&lt;/code&gt;&lt;/strong&gt; (context.ts:116): Collects git state (branch, status, recent commits)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;getUserContext()&lt;/code&gt;&lt;/strong&gt; (context.ts:155): Discovers and parses CLAUDE.md files&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why the separation?&lt;/strong&gt; It&amp;rsquo;s directly tied to the Anthropic Messages API&amp;rsquo;s prompt caching strategy. Since the cache lifetimes of the system prompt and user context differ, &lt;code&gt;cache_control&lt;/code&gt; must be applied differently to each. Wrapping them in &lt;code&gt;memoize&lt;/code&gt; ensures each is computed only once per session.&lt;/p&gt;
&lt;p&gt;The call to &lt;code&gt;setCachedClaudeMdContent()&lt;/code&gt; at context.ts:170-176 is &lt;strong&gt;a mechanism to break circular dependencies&lt;/strong&gt; — yoloClassifier needs CLAUDE.md content, but a direct import would create a permissions -&amp;gt; yoloClassifier -&amp;gt; claudemd -&amp;gt; permissions cycle.&lt;/p&gt;
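&lt;p&gt;A minimal sketch of the dual-memoize idea follows. It assumes a hypothetical &lt;code&gt;memoizeOnce&lt;/code&gt; helper and two synchronous stand-in producers; the names, return values, and counters are illustrative and are not taken from &lt;code&gt;context.ts&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical helper: runs a zero-argument producer once and caches the result.
function memoizeOnce(producer: () => string): () => string {
  let cached: { value: string } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: producer() };
    }
    return cached.value;
  };
}

// Illustrative counters so the effect of caching is observable.
let gitReads = 0;
let claudeMdReads = 0;

// Stand-ins for the two context producers described above.
const getSystemContextSketch = memoizeOnce(() => {
  gitReads += 1;
  return "branch: main";
});
const getUserContextSketch = memoizeOnce(() => {
  claudeMdReads += 1;
  return "CLAUDE.md: use pnpm";
});

// Each producer is cached independently, so the two prompt sections
// can be assembled (and cache-tagged) separately.
function buildPromptParts(): string[] {
  return [getSystemContextSketch(), getUserContextSketch()];
}
```

&lt;p&gt;Calling &lt;code&gt;buildPromptParts()&lt;/code&gt; repeatedly in this sketch runs each producer only once, which mirrors the once-per-session behavior described above.&lt;/p&gt;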
&lt;h2 id="3-asyncgenerator-chain-the-architectural-spine"&gt;3. AsyncGenerator Chain: The Architectural Spine
&lt;/h2&gt;&lt;p&gt;Claude Code&amp;rsquo;s entire data flow is built on an &lt;code&gt;AsyncGenerator&lt;/code&gt; chain:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;QueryEngine.submitMessage()* -&amp;gt; query()* -&amp;gt; queryLoop()* -&amp;gt; queryModelWithStreaming()*
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Every core function is an &lt;code&gt;async function*&lt;/code&gt;. This isn&amp;rsquo;t just an implementation choice — it&amp;rsquo;s an &lt;strong&gt;architectural decision&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Backpressure&lt;/strong&gt;: When the consumer is slow, the producer waits&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cancellation&lt;/strong&gt;: Combined with AbortController for immediate cancellation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Composition&lt;/strong&gt;: &lt;code&gt;yield*&lt;/code&gt; naturally chains generators together&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;State management&lt;/strong&gt;: Local variables within loops naturally maintain state across turns&lt;/li&gt;
&lt;/ul&gt;
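&lt;p&gt;The composition and cancellation properties listed above can be sketched with two nested generators. This is an illustration under stated assumptions: &lt;code&gt;CancelFlag&lt;/code&gt; is a minimal stand-in for an abort signal (only its &lt;code&gt;aborted&lt;/code&gt; flag is read), and &lt;code&gt;modelStreamSketch&lt;/code&gt;, &lt;code&gt;turnSketch&lt;/code&gt;, and &lt;code&gt;collect&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
// Minimal stand-in for an abort signal: only the `aborted` flag is used here.
type CancelFlag = { aborted: boolean };

// Innermost generator: pretends to stream model output chunk by chunk.
async function* modelStreamSketch(signal: CancelFlag) {
  for (const chunk of ["Hel", "lo", "!"]) {
    if (signal.aborted) return; // cancellation is checked between yields
    yield chunk;
  }
}

// Outer generator: yield* forwards every inner value to the caller.
async function* turnSketch(signal: CancelFlag) {
  yield "[start]";
  yield* modelStreamSketch(signal);
  if (signal.aborted) return;
  yield "[end]";
}

// Consumer: each loop iteration pulls exactly one value, so the producers
// never run ahead of it. It cancels after `cancelAfter` received items.
async function collect(signal: CancelFlag, cancelAfter: number) {
  const received: string[] = [];
  for await (const item of turnSketch(signal)) {
    received.push(item);
    if (received.length === cancelAfter) signal.aborted = true;
  }
  return received;
}
```

&lt;p&gt;Because the consumer drives the chain, setting the flag after two items stops both generators at their next resume point; no further chunks are produced.&lt;/p&gt;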
&lt;p&gt;Looking at the signature of &lt;code&gt;QueryEngine.submitMessage()&lt;/code&gt; (QueryEngine.ts:209):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;async&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;submitMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;: &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;ContentBlockParam&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;uuid?&lt;/span&gt;: &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;isMeta?&lt;/span&gt;: &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AsyncGenerator&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;SDKMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;void&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In SDK mode, each message is &lt;strong&gt;streamed via yield&lt;/strong&gt;, and backpressure follows from the generator&amp;rsquo;s pull-based protocol: the producer does not advance until the consumer requests the next value.&lt;/p&gt;
&lt;h2 id="4-the-core-turn-loop-querytss-whiletrue"&gt;4. The Core Turn Loop: query.ts&amp;rsquo;s while(true)
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;queryLoop()&lt;/code&gt; in &lt;code&gt;query.ts&lt;/code&gt; (1,729 lines) is the actual API + tool loop:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// query.ts:307
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 1. Call queryModelWithStreaming() -&amp;gt; SSE stream
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 2. Yield streaming events
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 3. Detect tool calls -&amp;gt; runTools()/StreamingToolExecutor
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 4. Append tool results to messages
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 5. stop_reason == &amp;#34;end_turn&amp;#34; -&amp;gt; break
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// stop_reason == &amp;#34;tool_use&amp;#34; -&amp;gt; continue
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;State&lt;/code&gt; type (query.ts:204) is important. It manages loop state as an explicit record with fields like &lt;code&gt;messages&lt;/code&gt;, &lt;code&gt;toolUseContext&lt;/code&gt;, &lt;code&gt;autoCompactTracking&lt;/code&gt;, and &lt;code&gt;maxOutputTokensRecoveryCount&lt;/code&gt;, updating everything at once at continue sites.&lt;/p&gt;
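&lt;p&gt;The shape of the loop can be sketched in a few lines. This is a simplified, synchronous illustration: &lt;code&gt;LoopStateSketch&lt;/code&gt; has only two fields, &lt;code&gt;fakeModel&lt;/code&gt; is a hypothetical stand-in for the streaming API call, and the message strings are invented for the example.&lt;/p&gt;

```typescript
type StopReason = "end_turn" | "tool_use";

// Reduced stand-in for the loop state record; the real type has more fields.
type LoopStateSketch = { messages: string[]; turnCount: number };

// Hypothetical model: asks for one tool call, then ends the turn
// once a tool result is present in the history.
function fakeModel(messages: string[]): StopReason {
  const hasToolResult = messages.some((m) => m.startsWith("tool_result"));
  return hasToolResult ? "end_turn" : "tool_use";
}

function runLoopSketch(prompt: string): LoopStateSketch {
  let state: LoopStateSketch = { messages: [prompt], turnCount: 0 };
  while (true) {
    const stopReason = fakeModel(state.messages);
    if (stopReason === "end_turn") {
      // Exit path: append the final answer and leave the loop.
      return {
        messages: [...state.messages, "assistant: done"],
        turnCount: state.turnCount + 1,
      };
    }
    // Continue path: the whole record is replaced in one assignment,
    // so a forgotten field would be a type error rather than stale state.
    state = {
      messages: [...state.messages, "tool_result: ok"],
      turnCount: state.turnCount + 1,
    };
  }
}
```

&lt;p&gt;In this sketch a single prompt takes two iterations: one that runs a tool and continues, and one that ends the turn.&lt;/p&gt;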
&lt;h2 id="5-api-communication-4-providers-and-caching"&gt;5. API Communication: 4 Providers and Caching
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;getAnthropicClient()&lt;/code&gt; at &lt;code&gt;client.ts:88&lt;/code&gt; supports 4 providers:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Provider&lt;/th&gt;
 &lt;th&gt;SDK&lt;/th&gt;
 &lt;th&gt;Reason for Dynamic Import&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Anthropic Direct&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;Anthropic&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Default, loaded immediately&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AWS Bedrock&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;AnthropicBedrock&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;AWS SDK is several MB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Azure Foundry&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;AnthropicFoundry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Azure Identity is several MB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GCP Vertex&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;AnthropicVertex&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Google Auth is several MB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The core function chain in &lt;code&gt;claude.ts&lt;/code&gt; (3,419 lines):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;queryModelWithStreaming() (claude.ts:752)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; queryModel()
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; withRetry()
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; anthropic.beta.messages.stream() (SDK call)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The caching strategy is determined by &lt;code&gt;getCacheControl()&lt;/code&gt; (claude.ts:358), which decides whether the 1-hour TTL applies based on user type, feature flags, and query source.&lt;/p&gt;
&lt;h2 id="6-tool-orchestration-3-tier-concurrency"&gt;6. Tool Orchestration: 3-Tier Concurrency
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TD
 TC["Tool call array&amp;lt;br/&amp;gt;[ReadFile, ReadFile, Bash, ReadFile]"]
 P["partitionToolCalls()&amp;lt;br/&amp;gt;toolOrchestration.ts:91"]
 B1["Batch 1&amp;lt;br/&amp;gt;ReadFile + ReadFile&amp;lt;br/&amp;gt;isConcurrencySafe=true"]
 B2["Batch 2&amp;lt;br/&amp;gt;Bash&amp;lt;br/&amp;gt;isConcurrencySafe=false"]
 B3["Batch 3&amp;lt;br/&amp;gt;ReadFile&amp;lt;br/&amp;gt;isConcurrencySafe=true"]
 PAR["Promise.all()&amp;lt;br/&amp;gt;max 10 concurrent"]
 SEQ["Sequential execution"]
 PAR2["Promise.all()"]

 TC --&gt; P
 P --&gt; B1
 P --&gt; B2
 P --&gt; B3
 B1 --&gt; PAR
 B2 --&gt; SEQ
 B3 --&gt; PAR2

 style B1 fill:#e8f5e9
 style B2 fill:#ffebee
 style B3 fill:#e8f5e9&lt;/pre&gt;&lt;p&gt;&lt;code&gt;StreamingToolExecutor&lt;/code&gt; (530 lines) extends this batch partitioning into a &lt;strong&gt;streaming context&lt;/strong&gt;. When it detects tool calls while the API response is still streaming, it immediately starts execution:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;addTool()&lt;/code&gt; (StreamingToolExecutor.ts:76) — Add to queue&lt;/li&gt;
&lt;li&gt;&lt;code&gt;processQueue()&lt;/code&gt; (StreamingToolExecutor.ts:140) — Check concurrency, then execute immediately&lt;/li&gt;
&lt;li&gt;&lt;code&gt;getRemainingResults()&lt;/code&gt; (StreamingToolExecutor.ts:453) — Wait for all tools to complete&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Error propagation rules&lt;/strong&gt;: Only Bash errors cancel sibling tools (&lt;code&gt;siblingAbortController&lt;/code&gt;). Read/WebFetch errors don&amp;rsquo;t affect other tools. This reflects the implicit dependencies between Bash commands (if mkdir fails, subsequent commands are pointless).&lt;/p&gt;
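&lt;p&gt;The batch partitioning shown in the diagram above can be sketched as a single pass over the tool calls. This is an assumption-laden illustration rather than the code in &lt;code&gt;toolOrchestration.ts&lt;/code&gt;: &lt;code&gt;ToolCallSketch&lt;/code&gt; and &lt;code&gt;partitionSketch&lt;/code&gt; are hypothetical names, and the concurrency limit and sibling cancellation are left out.&lt;/p&gt;

```typescript
// Hypothetical shape of a tool call; only the fields needed for partitioning.
type ToolCallSketch = { name: string; isConcurrencySafe: boolean };

// Groups consecutive concurrency-safe calls into one batch and puts
// every unsafe call into a batch of its own, preserving the original order.
function partitionSketch(calls: ToolCallSketch[]): ToolCallSketch[][] {
  const batches: ToolCallSketch[][] = [];
  let current: ToolCallSketch[] = [];
  let currentSafe = false;
  for (const call of calls) {
    // A call joins the open batch only if both it and the batch are safe.
    const canJoin = [current.length > 0, currentSafe, call.isConcurrencySafe].every(Boolean);
    if (canJoin) {
      current.push(call);
      continue;
    }
    if (current.length > 0) batches.push(current);
    current = [call];
    currentSafe = call.isConcurrencySafe;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

&lt;p&gt;With the four calls from the diagram as input, this sketch produces three batches: the two leading reads together, the Bash call alone, and the trailing read alone. A runner could then execute safe batches in parallel and unsafe batches one at a time.&lt;/p&gt;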
&lt;h2 id="full-data-flow"&gt;Full Data Flow
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;sequenceDiagram
 participant User as User
 participant CLI as cli.tsx
 participant Main as main.tsx
 participant QE as QueryEngine
 participant Query as query.ts
 participant Claude as claude.ts
 participant API as Anthropic API
 participant Tools as toolOrchestration
 participant Exec as toolExecution

 User-&gt;&gt;CLI: Types "hello"
 CLI-&gt;&gt;Main: dynamic import
 Main-&gt;&gt;QE: new QueryEngine()
 QE-&gt;&gt;Query: query()
 Query-&gt;&gt;Claude: queryModelWithStreaming()
 Claude-&gt;&gt;API: anthropic.beta.messages.stream()
 API--&gt;&gt;Claude: SSE stream

 alt stop_reason == end_turn
 Claude--&gt;&gt;User: Output response
 else stop_reason == tool_use
 Claude--&gt;&gt;Query: tool_use blocks
 Query-&gt;&gt;Tools: partitionToolCalls()
 Tools-&gt;&gt;Exec: runToolUse()
 Exec-&gt;&gt;Exec: canUseTool() + tool.call()
 Exec--&gt;&gt;Query: Tool results
 Note over Query: Next iteration of while(true)
 end&lt;/pre&gt;&lt;h2 id="rust-gap-map-preview"&gt;Rust Gap Map Preview
&lt;/h2&gt;&lt;p&gt;Tracing the same request through the Rust port revealed &lt;strong&gt;31 gaps&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Priority&lt;/th&gt;
 &lt;th&gt;Gap Count&lt;/th&gt;
 &lt;th&gt;Key Examples&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;P0 (Critical)&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;Synchronous ApiClient, missing StreamingToolExecutor&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;P1 (High)&lt;/td&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;3-tier concurrency, prompt caching, Agent tool&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;P2 (Medium)&lt;/td&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;Multi-provider, effort control, sandbox&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;td&gt;Auto-compaction, SSE parser, OAuth, config loading&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Implementation coverage: 36% (11/31)&lt;/strong&gt;. The next post dives deep into the conversation loop at the heart of these gaps.&lt;/p&gt;
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AsyncGenerator is the architectural spine&lt;/strong&gt; — It&amp;rsquo;s not just an implementation technique but a design decision that simultaneously solves backpressure, cancellation, and composition. In Rust, the &lt;code&gt;Stream&lt;/code&gt; trait is the counterpart, but the ergonomics of &lt;code&gt;yield*&lt;/code&gt; composition differ significantly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;main.tsx at 4,683 lines is technical debt&lt;/strong&gt; — Commander setup, React components, and state management are all mixed in a single file. This is the result of organic growth and represents an opportunity for module decomposition.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tool concurrency is non-trivial&lt;/strong&gt; — The 3-tier model (read batches, sequential writes, Bash sibling cancellation) rather than &amp;ldquo;all parallel&amp;rdquo; or &amp;ldquo;all sequential&amp;rdquo; is a core design element of production agent harnesses.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;Next post: &lt;a class="link" href="https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-2/" &gt;#2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Claude Code Harness Anatomy #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths</title><link>https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-2/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-2/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post Claude Code Harness Anatomy #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;In the first post of this series, we traced the journey of a single &amp;ldquo;hello&amp;rdquo; through 11 files. This post fully dissects the heart of that journey: the &lt;code&gt;while(true)&lt;/code&gt; loop in &lt;code&gt;query.ts&lt;/code&gt;&amp;rsquo;s 1,729 lines. We analyze the resilient execution model created by 7 &lt;code&gt;continue&lt;/code&gt; paths, the 4-stage state machine of &lt;code&gt;StreamingToolExecutor&lt;/code&gt;, and the 3-tier concurrency model of &lt;code&gt;partitionToolCalls()&lt;/code&gt;, then compare how we reproduced these patterns in a Rust prototype.&lt;/p&gt;
&lt;h2 id="analysis-target-10-core-files"&gt;Analysis Target: 10 Core Files
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;#&lt;/th&gt;
 &lt;th&gt;Path&lt;/th&gt;
 &lt;th&gt;Lines&lt;/th&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query/config.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;46&lt;/td&gt;
 &lt;td&gt;Immutable runtime gate snapshot&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query/deps.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;40&lt;/td&gt;
 &lt;td&gt;Testable I/O boundary (DI)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query/tokenBudget.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;93&lt;/td&gt;
 &lt;td&gt;Token budget management, auto-continue/stop decisions&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query/stopHooks.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;473&lt;/td&gt;
 &lt;td&gt;Stop/TaskCompleted/TeammateIdle hooks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,729&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Core&lt;/strong&gt; &amp;ndash; while(true) turn loop&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;QueryEngine.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,295&lt;/td&gt;
 &lt;td&gt;Session wrapper, SDK interface&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;toolOrchestration.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;188&lt;/td&gt;
 &lt;td&gt;Tool partitioning + concurrency control&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;8&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;StreamingToolExecutor.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;530&lt;/td&gt;
 &lt;td&gt;SSE mid-stream tool pipelining&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;9&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;toolExecution.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,745&lt;/td&gt;
 &lt;td&gt;Tool dispatch, permission checks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;toolHooks.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;650&lt;/td&gt;
 &lt;td&gt;Pre/PostToolUse hook pipeline&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We dissect a total of &lt;strong&gt;6,789 lines&lt;/strong&gt; of core orchestration code.&lt;/p&gt;
&lt;h2 id="1-queryloops-7-continue-paths"&gt;1. queryLoop()&amp;rsquo;s 7 Continue Paths
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;queryLoop()&lt;/code&gt; function in &lt;code&gt;query.ts&lt;/code&gt; (query.ts:241) is not a simple API call loop. It&amp;rsquo;s a &lt;strong&gt;resilient executor&lt;/strong&gt; with 7 distinct &lt;code&gt;continue&lt;/code&gt; reasons, each handling a distinct recovery or continuation scenario:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Reason&lt;/th&gt;
 &lt;th&gt;Line&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;collapse_drain_retry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1114&lt;/td&gt;
 &lt;td&gt;Retry after context collapse drain&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;reactive_compact_retry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1162&lt;/td&gt;
 &lt;td&gt;Retry after reactive compaction (413 recovery)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_output_tokens_escalate&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1219&lt;/td&gt;
 &lt;td&gt;Token escalation from 8k -&amp;gt; 64k&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_output_tokens_recovery&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1248&lt;/td&gt;
 &lt;td&gt;Inject &amp;ldquo;continue writing&amp;rdquo; nudge message&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;stop_hook_blocking&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1303&lt;/td&gt;
 &lt;td&gt;Stop hook returned a blocking error&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;token_budget_continuation&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1337&lt;/td&gt;
 &lt;td&gt;Continue due to remaining token budget&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;next_turn&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1725&lt;/td&gt;
 &lt;td&gt;Next turn after tool execution completes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;The State type is key&lt;/strong&gt; (query.ts:204-217). Loop state is managed as a record with 10 fields. Why a record instead of individual variables? There are 7 &lt;code&gt;continue&lt;/code&gt; sites, and each one replaces the whole state at once via &lt;code&gt;state = { ... }&lt;/code&gt;. Assigning each variable individually at every site makes it easy to miss one. &lt;strong&gt;Record updates let the type system catch omissions.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="full-flow-of-a-single-loop-iteration"&gt;Full Flow of a Single Loop Iteration
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1. Preprocessing (365-447): snip compaction, micro-compact, context collapse
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2. Auto-compaction (454-543): on success, replace messages and continue
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;3. Blocking limit check (628-648): immediate termination if token threshold exceeded
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;4. API streaming (654-863): consume SSE events via for-await
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;5. No-tool exit paths (1062-1357): 413 recovery, max_output recovery, stop hooks
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;6. Tool continuation paths (1360-1728): execute remaining tools -&amp;gt; next_turn
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="2-streamingtoolexecutors-4-stage-state-machine"&gt;2. StreamingToolExecutor&amp;rsquo;s 4-Stage State Machine
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;StreamingToolExecutor.ts&lt;/code&gt; (530 lines) implements the most sophisticated concurrency pattern in Claude Code. The core idea: &lt;strong&gt;start executing each tool call as soon as its block is complete, while the rest of the API response is still streaming&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Suppose the model emits &lt;code&gt;[ReadFile(&amp;quot;a.ts&amp;quot;), ReadFile(&amp;quot;b.ts&amp;quot;), Bash(&amp;quot;make test&amp;quot;)]&lt;/code&gt; in a single response. Without pipelining, execution begins only after all three tool blocks have arrived. With pipelining, the first file read starts the instant the &lt;code&gt;ReadFile(&amp;quot;a.ts&amp;quot;)&lt;/code&gt; block completes.&lt;/p&gt;
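&lt;p&gt;The ordering effect of pipelining can be sketched with a stubbed stream and a stubbed tool. Everything here (&lt;code&gt;toolBlocks&lt;/code&gt;, &lt;code&gt;runTool&lt;/code&gt;, &lt;code&gt;pipelined&lt;/code&gt;, the &lt;code&gt;events&lt;/code&gt; log) is a hypothetical stand-in rather than code from &lt;code&gt;StreamingToolExecutor.ts&lt;/code&gt;, and the concurrency gating discussed later is left out on purpose.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;// Hypothetical event log used to observe ordering.
const events: string[] = [];

// Stub tool: records start and end; the timer stands in for real I/O.
function runTool(name: string) {
  events.push(`start ${name}`);
  return new Promise((resolve) => setTimeout(resolve, 0)).then(() => {
    events.push(`end ${name}`);
    return name;
  });
}

// Stub stream: each yielded value represents one completed tool block.
async function* toolBlocks() {
  yield "ReadFile(a.ts)";
  yield "ReadFile(b.ts)";
}

// Pipelined consumption: start each tool the moment its block arrives,
// without waiting for the stream to finish.
async function pipelined() {
  const running = [];
  for await (const block of toolBlocks()) {
    events.push(`received ${block}`);
    running.push(runTool(block));
  }
  return Promise.all(running);
}

const run = pipelined();
run.then(() => console.log(events.join(" | ")));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the printed log, the first tool starts before the second block is received, which is the whole point of pipelining. A non-pipelined consumer would log both &lt;code&gt;received&lt;/code&gt; entries before any &lt;code&gt;start&lt;/code&gt;.&lt;/p&gt;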
&lt;pre class="mermaid" style="visibility:hidden"&gt;stateDiagram-v2
 [*] --&gt; queued: addTool()
 queued --&gt; executing: processQueue()&amp;lt;br/&amp;gt;canExecuteTool() == true
 queued --&gt; completed: Pre-canceled&amp;lt;br/&amp;gt;getAbortReason() != null

 executing --&gt; completed: Tool execution finished&amp;lt;br/&amp;gt;or sibling abort

 completed --&gt; yielded: getCompletedResults()&amp;lt;br/&amp;gt;yield in order

 yielded --&gt; [*]

 note right of queued
 processQueue() auto-triggers
 on addTool() and prior
 tool completion
 end note

 note right of completed
 On Bash error:
 siblingAbortController.abort()
 cancels sibling tools only
 end note&lt;/pre&gt;&lt;h3 id="concurrency-decision-logic-canexecutetool-line-129"&gt;Concurrency Decision Logic (canExecuteTool, line 129)
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-gdscript3" data-lang="gdscript3"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Execution&lt;/span&gt; &lt;span class="n"&gt;conditions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;No&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="n"&gt;currently&lt;/span&gt; &lt;span class="n"&gt;executing&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;executingTools&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Or&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="k"&gt;tool&lt;/span&gt; &lt;span class="n"&gt;is&lt;/span&gt; &lt;span class="n"&gt;concurrencySafe&lt;/span&gt; &lt;span class="n"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt; &lt;span class="n"&gt;executing&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;also&lt;/span&gt; &lt;span class="n"&gt;concurrencySafe&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Concurrency-safe tools (typically read-only ones) can run in parallel. A tool that is not concurrency-safe runs alone: it waits until nothing else is executing, and every tool queued behind it waits until it finishes.&lt;/p&gt;
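&lt;p&gt;The two execution conditions above translate almost directly into a predicate. This is a minimal sketch: the &lt;code&gt;TrackedTool&lt;/code&gt; shape and the function signature are assumptions made for illustration, not the actual signature at line 129.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;// Hypothetical shape; the real tracked-tool type is not shown in this post.
type TrackedTool = { name: string; concurrencySafe: boolean };

// A tool may start if nothing is executing, or if it and every
// currently executing tool are concurrency-safe.
function canExecuteTool(candidate: TrackedTool, executing: TrackedTool[]): boolean {
  if (executing.length === 0) return true;
  return candidate.concurrencySafe ? executing.every((t) => t.concurrencySafe) : false;
}

const read = (name: string): TrackedTool => ({ name, concurrencySafe: true });
const bash: TrackedTool = { name: "Bash", concurrencySafe: false };

console.log(canExecuteTool(read("b.ts"), [read("a.ts")])); // true: both are safe
console.log(canExecuteTool(bash, [read("a.ts")]));         // false: Bash must run alone
console.log(canExecuteTool(read("c.ts"), [bash]));         // false: waits for Bash
console.log(canExecuteTool(bash, []));                     // true: nothing is executing
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;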
&lt;h3 id="siblingabortcontroller--hierarchical-cancellation"&gt;siblingAbortController &amp;ndash; Hierarchical Cancellation
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;siblingAbortController&lt;/code&gt; (lines 46-61) is a child of &lt;code&gt;toolUseContext.abortController&lt;/code&gt;. When a Bash tool throws an error, it calls &lt;code&gt;siblingAbortController.abort('sibling_error')&lt;/code&gt; to &lt;strong&gt;cancel only sibling tools&lt;/strong&gt;. The parent controller is unaffected, so the overall query continues.&lt;/p&gt;
&lt;p&gt;Why do only Bash errors cancel siblings? Bash calls tend to form an implicit dependency chain, much like &lt;code&gt;mkdir -p dir &amp;amp;&amp;amp; cd dir &amp;amp;&amp;amp; make&lt;/code&gt;: if &lt;code&gt;mkdir&lt;/code&gt; fails, the subsequent commands are pointless. ReadFile and WebFetch failures are independent of one another and shouldn&amp;rsquo;t affect other tools.&lt;/p&gt;
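&lt;p&gt;One way to get this parent/child behavior with the standard &lt;code&gt;AbortController&lt;/code&gt; API is sketched below. The helper &lt;code&gt;createChildAbortController&lt;/code&gt; and the variable names are hypothetical; the post does not show how the real controller is wired up, so treat this as an illustration of the idea only.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;// Hypothetical helper: the child aborts when the parent aborts,
// but aborting the child leaves the parent untouched.
function createChildAbortController(parent: AbortController): AbortController {
  const child = new AbortController();
  if (parent.signal.aborted) {
    child.abort(parent.signal.reason);
  } else {
    parent.signal.addEventListener("abort", () => child.abort(parent.signal.reason), { once: true });
  }
  return child;
}

// Stands in for toolUseContext.abortController.
const queryController = new AbortController();
const siblingController = createChildAbortController(queryController);

// A Bash failure cancels the siblings only.
siblingController.abort("sibling_error");
console.log(siblingController.signal.aborted); // true
console.log(siblingController.signal.reason);  // sibling_error
console.log(queryController.signal.aborted);   // false: the query continues

// Cancellation still flows downward from parent to child.
const outer = new AbortController();
const inner = createChildAbortController(outer);
outer.abort("user_cancel");
console.log(inner.signal.aborted); // true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;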
&lt;h2 id="3-partitiontoolcalls--3-tier-concurrency-model"&gt;3. partitionToolCalls &amp;ndash; 3-Tier Concurrency Model
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;toolOrchestration.ts&lt;/code&gt; (188 lines) defines the entire concurrency model for tool execution.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TD
 TC["Tool call array&amp;lt;br/&amp;gt;[ReadFile, ReadFile, Bash, ReadFile]"]
 P["partitionToolCalls()&amp;lt;br/&amp;gt;toolOrchestration.ts:91"]
 B1["Batch 1&amp;lt;br/&amp;gt;ReadFile + ReadFile&amp;lt;br/&amp;gt;isConcurrencySafe=true"]
 B2["Batch 2&amp;lt;br/&amp;gt;Bash&amp;lt;br/&amp;gt;isConcurrencySafe=false"]
 B3["Batch 3&amp;lt;br/&amp;gt;ReadFile&amp;lt;br/&amp;gt;isConcurrencySafe=true"]
 PAR["Promise.all()&amp;lt;br/&amp;gt;max 10 concurrent"]
 SEQ["Sequential execution"]
 PAR2["Promise.all()"]

 TC --&gt; P
 P --&gt; B1
 P --&gt; B2
 P --&gt; B3
 B1 --&gt; PAR
 B2 --&gt; SEQ
 B3 --&gt; PAR2

 style B1 fill:#e8f5e9
 style B2 fill:#ffebee
 style B3 fill:#e8f5e9&lt;/pre&gt;&lt;p&gt;The rule is simple: consecutive &lt;code&gt;isConcurrencySafe&lt;/code&gt; tools are grouped into a single batch, while non-safe tools each become independent batches. This decision comes &lt;strong&gt;from the tool definition itself&lt;/strong&gt; — determined by calling &lt;code&gt;tool.isConcurrencySafe(parsedInput)&lt;/code&gt;. The same tool may have different concurrency safety depending on its input.&lt;/p&gt;
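&lt;p&gt;The grouping rule can be sketched in a few lines. The &lt;code&gt;ToolCall&lt;/code&gt; and &lt;code&gt;Batch&lt;/code&gt; types and the hard-coded safety rule below are assumptions for illustration; in the real code the answer comes from each tool's own &lt;code&gt;isConcurrencySafe(parsedInput)&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;// Hypothetical types for illustration only.
type ToolCall = { name: string; input: string };
type Batch = { concurrent: boolean; calls: ToolCall[] };

// Assumed rule for this sketch: ReadFile is always safe, everything else is not.
function isConcurrencySafe(call: ToolCall): boolean {
  return call.name === "ReadFile";
}

// Consecutive safe calls share one batch; every non-safe call gets its own.
function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  let openSafeBatch: Batch | null = null; // the safe batch still accepting calls, if any
  for (const call of calls) {
    if (!isConcurrencySafe(call)) {
      batches.push({ concurrent: false, calls: [call] });
      openSafeBatch = null;
    } else if (openSafeBatch === null) {
      openSafeBatch = { concurrent: true, calls: [call] };
      batches.push(openSafeBatch);
    } else {
      openSafeBatch.calls.push(call);
    }
  }
  return batches;
}

const calls: ToolCall[] = [
  { name: "ReadFile", input: "a.ts" },
  { name: "ReadFile", input: "b.ts" },
  { name: "Bash", input: "make test" },
  { name: "ReadFile", input: "c.ts" },
];
const summary = partitionToolCalls(calls).map((b) => b.calls.map((c) => c.name).join("+"));
console.log(summary.join(" | ")); // ReadFile+ReadFile | Bash | ReadFile
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The output matches the three batches in the diagram: two reads together, Bash alone, then the trailing read in a batch of its own.&lt;/p&gt;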
&lt;h3 id="context-modifiers-and-race-conditions"&gt;Context Modifiers and Race Conditions
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Why are context modifiers applied in order only after the batch completes?&lt;/strong&gt; Applying each modifier the moment its tool finishes would create race conditions during parallel execution. If A completes first and modifies the context, B (still executing) started with the pre-modification context but would observe the post-modification state partway through its run. Applying the modifiers in original tool order after the whole batch completes guarantees deterministic results (toolOrchestration.ts:54-62).&lt;/p&gt;
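&lt;p&gt;A minimal sketch of deferred, ordered application follows. The &lt;code&gt;Context&lt;/code&gt;, &lt;code&gt;Modifier&lt;/code&gt;, and &lt;code&gt;ToolResult&lt;/code&gt; types are hypothetical, as is &lt;code&gt;applyModifiersInOrder&lt;/code&gt;; the sketch only assumes that each finished tool remembers its original position in the batch.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;// Hypothetical types for illustration only.
type Context = { cwd: string; log: string[] };
type Modifier = (ctx: Context) => Context;
type ToolResult = { index: number; modifier: Modifier };

// Tools may finish in any order; modifiers are applied by original index
// once the whole batch is done, so the final context is deterministic.
function applyModifiersInOrder(initial: Context, finished: ToolResult[]): Context {
  return [...finished]
    .sort((a, b) => a.index - b.index)
    .reduce((ctx, r) => r.modifier(ctx), initial);
}

const appendLog = (entry: string): Modifier => (ctx) => ({ ...ctx, log: [...ctx.log, entry] });

// Completion order is B then A, but the original call order was A then B.
const finished: ToolResult[] = [
  { index: 1, modifier: appendLog("B") },
  { index: 0, modifier: appendLog("A") },
];

const result = applyModifiersInOrder({ cwd: "/repo", log: [] }, finished);
console.log(result.log.join(",")); // A,B regardless of completion order
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;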
&lt;h2 id="4-tool-execution-pipeline-and-hooks"&gt;4. Tool Execution Pipeline and Hooks
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;runToolUse()&lt;/code&gt; (line 337 of &lt;code&gt;toolExecution.ts&lt;/code&gt;, a 1,745-line file) manages the complete lifecycle of each individual tool call:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-gdscript3" data-lang="gdscript3"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;runToolUse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="mf"&gt;1.&lt;/span&gt; &lt;span class="n"&gt;findToolByName&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt; &lt;span class="n"&gt;retry&lt;/span&gt; &lt;span class="n"&gt;with&lt;/span&gt; &lt;span class="n"&gt;deprecated&lt;/span&gt; &lt;span class="n"&gt;aliases&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;345&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;356&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="mf"&gt;2.&lt;/span&gt; &lt;span class="n"&gt;abort&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;already&lt;/span&gt; &lt;span class="n"&gt;canceled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;CANCEL_MESSAGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;415&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="mf"&gt;3.&lt;/span&gt; &lt;span class="n"&gt;streamedCheckPermissionsAndCallTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt; &lt;span class="n"&gt;permissions&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;execution&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hooks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;455&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;checkPermissionsAndCallTool&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Zod&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;615&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="k"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validateInput&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;custom&lt;/span&gt; &lt;span class="n"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;683&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Speculative&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bash&lt;/span&gt; &lt;span class="n"&gt;only&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;740&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;runPreToolUseHooks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;resolveHookPermissionDecision&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;921&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="k"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="n"&gt;execution&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1207&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;runPostToolUseHooks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="n"&gt;transformation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="the-core-invariant-of-resolvehookpermissiondecision"&gt;The Core Invariant of resolveHookPermissionDecision
&lt;/h3&gt;&lt;p&gt;In &lt;code&gt;resolveHookPermissionDecision()&lt;/code&gt; (toolHooks.ts:332), &lt;strong&gt;a hook&amp;rsquo;s &lt;code&gt;allow&lt;/code&gt; does not bypass settings.json deny/ask rules&lt;/strong&gt; (toolHooks.ts:373). Even if a hook allows, it must still pass &lt;code&gt;checkRuleBasedPermissions()&lt;/code&gt;. This reflects the design principle that &amp;ldquo;hooks are automation helpers, not security bypasses.&amp;rdquo;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;When hook result is allow:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; Call checkRuleBasedPermissions()
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; null means pass (no rules)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; deny means rule overrides hook
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; ask means user prompt required
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="5-rust-comparison--152-lines-vs-1729-lines"&gt;5. Rust Comparison &amp;ndash; 152 Lines vs 1,729 Lines
&lt;/h2&gt;&lt;p&gt;Rust&amp;rsquo;s &lt;code&gt;ConversationRuntime::run_turn()&lt;/code&gt; consists of &lt;strong&gt;152 lines in a single &lt;code&gt;loop {}&lt;/code&gt;&lt;/strong&gt; (conversation.rs:183-272). Of the 7 TS continue paths, only &lt;code&gt;next_turn&lt;/code&gt; (next turn after tool execution) exists in Rust.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;TS Continue Reason&lt;/th&gt;
 &lt;th&gt;Rust Status&lt;/th&gt;
 &lt;th&gt;Why&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;collapse_drain_retry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No context collapse&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;reactive_compact_retry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No 413 recovery&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_output_tokens_escalate&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No 8k-&amp;gt;64k escalation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_output_tokens_recovery&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No multi-turn nudge&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;stop_hook_blocking&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No stop hooks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;token_budget_continuation&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No token budget system&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;next_turn&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Implemented&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Re-calls API after tool results&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="the-most-critical-gap-synchronous-api-consumption"&gt;The Most Critical Gap: Synchronous API Consumption
&lt;/h3&gt;&lt;p&gt;The Rust &lt;code&gt;ApiClient&lt;/code&gt; trait signature says it all:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;: &lt;span class="nc"&gt;ApiRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AssistantEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RuntimeError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The return type is &lt;code&gt;Vec&amp;lt;AssistantEvent&amp;gt;&lt;/code&gt;. &lt;strong&gt;It&amp;rsquo;s not streaming.&lt;/strong&gt; It collects all SSE events and returns them as a vector. This means when the model calls 5 ReadFiles, TS can finish executing the first ReadFile while still streaming, but Rust must wait for all 5 to finish streaming before starting sequential execution. &lt;strong&gt;The latency gap grows proportionally with the number of tools.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="6-rust-prototype--bridging-the-gap"&gt;6. Rust Prototype &amp;ndash; Bridging the Gap
&lt;/h2&gt;&lt;p&gt;In the S04 prototype, we implemented an orchestration layer that bridges 3 P0 gaps:&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart LR
 subgraph TS["TS Streaming Pipeline"]
 direction TB
 ts1["SSE event stream"]
 ts2["StreamingToolExecutor&amp;lt;br/&amp;gt;4-state machine"]
 ts3["getCompletedResults()&amp;lt;br/&amp;gt;guaranteed yield order"]
 ts1 --&gt; ts2 --&gt; ts3
 end

 subgraph Rust["Rust Prototype"]
 direction TB
 rs1["EventStream&amp;lt;br/&amp;gt;tokio async"]
 rs2["StreamingPipeline&amp;lt;br/&amp;gt;tokio::spawn + mpsc"]
 rs3["Post-MessageEnd&amp;lt;br/&amp;gt;channel collect + sort"]
 rs1 --&gt; rs2 --&gt; rs3
 end

 subgraph Bridge["Core Mappings"]
 direction TB
 b1["yield -&gt; tx.send()"]
 b2["yield* -&gt; channel forwarding"]
 b3["for await -&gt; while let recv()"]
 end

 TS ~~~ Bridge ~~~ Rust

 style TS fill:#e1f5fe
 style Rust fill:#fff3e0
 style Bridge fill:#f3e5f5&lt;/pre&gt;&lt;h3 id="3-key-implementations-in-the-prototype"&gt;3 Key Implementations in the Prototype
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;1. Async streaming&lt;/strong&gt;: Extended the &lt;code&gt;ApiClient&lt;/code&gt; trait to an async stream. Since &lt;code&gt;MessageStream::next_event()&lt;/code&gt; is already async, only the consumer side needed changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Tool pipelining&lt;/strong&gt;: On receiving a &lt;code&gt;ToolUseEnd&lt;/code&gt; event, assembles a &lt;code&gt;ToolCall&lt;/code&gt; from accumulated input and immediately starts background execution via &lt;code&gt;tokio::spawn&lt;/code&gt;. Collects results in completion order via &lt;code&gt;mpsc::unbounded_channel&lt;/code&gt;, then sorts back to original order.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. 3-tier concurrency&lt;/strong&gt;: Partitions by &lt;code&gt;ToolCategory&lt;/code&gt; enum (ReadOnly/Write/BashLike). ReadOnly batches use &lt;code&gt;Semaphore(10)&lt;/code&gt; + &lt;code&gt;tokio::spawn&lt;/code&gt; for up to 10 parallel tasks. BashLike runs sequentially with remaining tasks aborted on error.&lt;/p&gt;
&lt;h3 id="prototype-coverage"&gt;Prototype Coverage
&lt;/h3&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;TS Feature&lt;/th&gt;
 &lt;th&gt;Prototype&lt;/th&gt;
 &lt;th&gt;Status&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;partitionToolCalls()&lt;/code&gt; 3-tier&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;partition_into_runs()&lt;/code&gt; + &lt;code&gt;ToolCategory&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;runToolsConcurrently()&lt;/code&gt; max 10&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;Semaphore(10)&lt;/code&gt; + &lt;code&gt;tokio::spawn&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;siblingAbortController&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;break&lt;/code&gt; on BashLike error&lt;/td&gt;
 &lt;td&gt;Simplified&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;StreamingToolExecutor.addTool()&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;tokio::spawn&lt;/code&gt; on &lt;code&gt;ToolUseEnd&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;PreToolUse hook deny/allow&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;HookDecision::Allow/Deny&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;PostToolUse output transform&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;HookResult::transformed_output&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4-state machine (queued-&amp;gt;yielded)&lt;/td&gt;
 &lt;td&gt;spawned/completed 2-state&lt;/td&gt;
 &lt;td&gt;Incomplete&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;413 recovery / max_output escalation&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;preventContinuation&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="stop-condition-comparison"&gt;Stop Condition Comparison
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Condition&lt;/th&gt;
 &lt;th&gt;TS&lt;/th&gt;
 &lt;th&gt;Rust&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;No tools (end_turn)&lt;/td&gt;
 &lt;td&gt;Execute &lt;code&gt;handleStopHooks()&lt;/code&gt; then exit&lt;/td&gt;
 &lt;td&gt;Immediate &lt;code&gt;break&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Token budget exceeded&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;checkTokenBudget()&lt;/code&gt; with 3 decisions&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_output_tokens&lt;/td&gt;
 &lt;td&gt;Escalation + multi-turn recovery&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;413 prompt-too-long&lt;/td&gt;
 &lt;td&gt;Context collapse + reactive compaction&lt;/td&gt;
 &lt;td&gt;Error propagation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;maxTurns&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;maxTurns&lt;/code&gt; parameter (query.ts:1696)&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;max_iterations&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Diminishing returns&lt;/td&gt;
 &lt;td&gt;3+ turns with &amp;lt;500 token increase&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;checkTokenBudget()&lt;/code&gt; in &lt;code&gt;tokenBudget.ts&lt;/code&gt; (93 lines) controls &lt;strong&gt;whether to continue responding, not prompt size&lt;/strong&gt;. Two constants drive the decision: &lt;code&gt;COMPLETION_THRESHOLD = 0.9&lt;/code&gt; (continue while usage is below 90% of the total budget) and &lt;code&gt;DIMINISHING_THRESHOLD = 500&lt;/code&gt; (stop once 3 or more consecutive turns each produce fewer than 500 tokens, a sign of diminishing returns). The &lt;code&gt;nudgeMessage&lt;/code&gt; explicitly instructs &amp;ldquo;do not summarize.&amp;rdquo;&lt;/p&gt;
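&lt;p&gt;How the two thresholds might combine is sketched below. The constant values come from the description above; the &lt;code&gt;BudgetState&lt;/code&gt; shape, the order of the two checks, and the two-valued return type are assumptions (the real function reportedly distinguishes three decisions, which this post does not name).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;const COMPLETION_THRESHOLD = 0.9;  // value quoted in the post
const DIMINISHING_THRESHOLD = 500; // value quoted in the post
const DIMINISHING_TURNS = 3;       // "3+ consecutive turns"; the constant name is made up

// Hypothetical input: total budget, tokens used so far, and tokens
// produced by each turn (oldest first).
type BudgetState = { budget: number; used: number; turnDeltas: number[] };

function checkTokenBudget(state: BudgetState): "continue" | "stop" {
  const recent = state.turnDeltas.slice(-DIMINISHING_TURNS);
  // Diminishing returns: the last three turns each stayed under the threshold.
  const diminishing =
    recent.length === DIMINISHING_TURNS ? recent.every((d) => DIMINISHING_THRESHOLD > d) : false;
  if (diminishing) return "stop";
  // Otherwise continue only while usage is under 90% of the budget.
  return COMPLETION_THRESHOLD * state.budget > state.used ? "continue" : "stop";
}

console.log(checkTokenBudget({ budget: 10000, used: 4000, turnDeltas: [2000, 2000] }));           // continue
console.log(checkTokenBudget({ budget: 10000, used: 9500, turnDeltas: [5000, 4500] }));           // stop: past 90%
console.log(checkTokenBudget({ budget: 10000, used: 4000, turnDeltas: [3100, 300, 300, 300] })); // stop: diminishing
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;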
&lt;h2 id="the-core-design-decision--why-asyncgenerator"&gt;The Core Design Decision &amp;ndash; Why AsyncGenerator
&lt;/h2&gt;&lt;p&gt;The entire pipeline is an &lt;code&gt;async function*&lt;/code&gt; chain:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;QueryEngine.submitMessage()* -&amp;gt; query()* -&amp;gt; queryLoop()* -&amp;gt; deps.callModel()*
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;runTools()* -&amp;gt; runToolUse()* -&amp;gt; handleStopHooks()* -&amp;gt; executeStopHooks()*
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The key benefit of this choice: &lt;strong&gt;implementing complex state machines without inversion of control&lt;/strong&gt;. At each of the 7 &lt;code&gt;continue&lt;/code&gt; paths, you construct state explicitly with &lt;code&gt;state = { ... }&lt;/code&gt; and &lt;code&gt;continue&lt;/code&gt;. With a callback-based approach, state management would be scattered, making it difficult to guarantee consistency across 7 recovery paths.&lt;/p&gt;
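&lt;p&gt;The &lt;code&gt;state = { ... }&lt;/code&gt; plus &lt;code&gt;continue&lt;/code&gt; pattern can be sketched as a heavily simplified loop. Only the transition names &lt;code&gt;next_turn&lt;/code&gt; and &lt;code&gt;max_output_tokens_escalate&lt;/code&gt; and the 8k to 64k escalation come from the post; the &lt;code&gt;State&lt;/code&gt; and &lt;code&gt;ModelReply&lt;/code&gt; shapes and the &lt;code&gt;stubModel&lt;/code&gt; function are hypothetical, and the real &lt;code&gt;queryLoop()&lt;/code&gt; is far larger.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;// Hypothetical shapes for illustration only.
type Transition = "start" | "next_turn" | "max_output_tokens_escalate";
type State = { turn: number; maxOutputTokens: number; transition: Transition };
type ModelReply = { stop: "end_turn" | "tool_use" | "max_tokens"; text: string };

// Stub model: truncates once, then requests a tool, then finishes.
async function stubModel(state: State) {
  const reply: ModelReply =
    state.transition === "start" ? { stop: "max_tokens", text: "truncated" }
    : state.transition === "max_output_tokens_escalate" ? { stop: "tool_use", text: "ReadFile a.ts" }
    : { stop: "end_turn", text: "done" };
  return reply;
}

async function* queryLoop(callModel: typeof stubModel) {
  let state: State = { turn: 0, maxOutputTokens: 8000, transition: "start" };
  while (true) {
    const reply = await callModel(state);
    yield `[${state.transition}] ${reply.text}`;
    if (reply.stop === "max_tokens") {
      // Recovery path: rebuild the full state in one place, then loop again.
      state = { ...state, maxOutputTokens: 64000, transition: "max_output_tokens_escalate" };
      continue;
    }
    if (reply.stop === "tool_use") {
      state = { ...state, turn: state.turn + 1, transition: "next_turn" };
      continue;
    }
    return; // end_turn: no tools requested, so the loop exits
  }
}

async function collect() {
  const seen: string[] = [];
  for await (const line of queryLoop(stubModel)) seen.push(line);
  return seen;
}

const collected = collect();
collected.then((seen) => console.log(seen.join("\n")));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each recovery path is a single assignment followed by &lt;code&gt;continue&lt;/code&gt;, so the next iteration always starts from a fully specified state, and the caller simply iterates with &lt;code&gt;for await&lt;/code&gt;.&lt;/p&gt;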
&lt;p&gt;Rust has no stable &lt;code&gt;yield&lt;/code&gt; keyword, so &lt;code&gt;tokio::sync::mpsc&lt;/code&gt; channels serve as the replacement: &lt;code&gt;yield&lt;/code&gt; -&amp;gt; &lt;code&gt;tx.send()&lt;/code&gt;, &lt;code&gt;yield*&lt;/code&gt; -&amp;gt; channel forwarding, &lt;code&gt;for await...of&lt;/code&gt; -&amp;gt; &lt;code&gt;while let Some(v) = rx.recv().await&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;query.ts&amp;rsquo;s 7 continue paths are not &amp;ldquo;error handling&amp;rdquo; but a &amp;ldquo;resilience engine&amp;rdquo;&lt;/strong&gt; &amp;ndash; The loop collapses context on 413 errors, escalates the output token limit when a response hits max_output, and feeds errors back to the model when a stop hook blocks. This recovery pipeline keeps long-running autonomous tasks stable. Reproducing this in Rust requires state management beyond a simple &lt;code&gt;loop {}&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;StreamingToolExecutor is a UX decision, not a performance optimization&lt;/strong&gt; &amp;ndash; Executing 5 tools sequentially makes users wait for the sum of all execution times. What pipelining shortens is not a benchmark number but the perceived &amp;ldquo;waiting for a response&amp;rdquo; time. In the Rust prototype, we implemented this in under 20 lines using &lt;code&gt;tokio::spawn&lt;/code&gt; + &lt;code&gt;mpsc&lt;/code&gt; channels.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The dual structure of static partitioning + runtime concurrency balances safety and performance&lt;/strong&gt; &amp;ndash; &lt;code&gt;partitionToolCalls()&lt;/code&gt; divides the calls into batches up front, before any of them runs, while &lt;code&gt;canExecuteTool()&lt;/code&gt; judges executability at runtime as each tool arrives. Thanks to this dual structure, the non-streaming path (&lt;code&gt;runTools&lt;/code&gt;) and the streaming path (&lt;code&gt;StreamingToolExecutor&lt;/code&gt;) share identical concurrency semantics.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;Next post: &lt;a class="link" href="https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-3/" &gt;#3 &amp;ndash; The Design Philosophy of 42 Tools, from BashTool to AgentTool&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description></item></channel></rss>