<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Tool-Orchestration on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/tool-orchestration/</link><description>Recent content in Tool-Orchestration on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 06 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/tool-orchestration/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Code Harness Anatomy #1 — From Entry Point to Response: The Journey of a Single Request</title><link>https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-1/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-1/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post Claude Code Harness Anatomy #1 — From Entry Point to Response: The Journey of a Single Request" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;This is the first post in a series that systematically dissects Claude Code&amp;rsquo;s source structure across 27 sessions. In this post, we trace the &lt;strong&gt;complete call stack across 11 TypeScript files&lt;/strong&gt; that a &amp;ldquo;hello&amp;rdquo; typed into the terminal traverses before a response appears on screen.&lt;/p&gt;
&lt;h2 id="analysis-target-11-core-files"&gt;Analysis Target: 11 Core Files
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;#&lt;/th&gt;
 &lt;th&gt;Path&lt;/th&gt;
 &lt;th&gt;Lines&lt;/th&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;entrypoints/cli.tsx&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;302&lt;/td&gt;
 &lt;td&gt;CLI bootstrap, argument parsing, mode routing&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;main.tsx&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;4,683&lt;/td&gt;
 &lt;td&gt;Main REPL component, Commander setup&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;commands.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;754&lt;/td&gt;
 &lt;td&gt;Command registry&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;context.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;189&lt;/td&gt;
 &lt;td&gt;System prompt assembly, CLAUDE.md injection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;QueryEngine.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,295&lt;/td&gt;
 &lt;td&gt;Session management, SDK interface&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,729&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Core turn loop&lt;/strong&gt; — API + tool execution&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/api/client.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;389&lt;/td&gt;
 &lt;td&gt;HTTP client, 4-provider routing&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;8&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/api/claude.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;3,419&lt;/td&gt;
 &lt;td&gt;Messages API wrapper, SSE streaming, retries&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;9&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/tools/toolOrchestration.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;188&lt;/td&gt;
 &lt;td&gt;Concurrency partitioning&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/tools/StreamingToolExecutor.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;530&lt;/td&gt;
 &lt;td&gt;Tool execution during streaming&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;services/tools/toolExecution.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,745&lt;/td&gt;
 &lt;td&gt;Tool dispatch, permission checks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We trace a total of &lt;strong&gt;15,223 lines&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="1-entry-and-bootstrap-clitsx---maintsx"&gt;1. Entry and Bootstrap: cli.tsx -&amp;gt; main.tsx
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;cli.tsx&lt;/code&gt; is only 302 lines, yet it contains a surprising number of &lt;strong&gt;fast-path&lt;/strong&gt; branches:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:37 --version -&amp;gt; immediate output, 0 imports
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:53 --dump-system -&amp;gt; minimal imports
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:100 --daemon-worker -&amp;gt; worker-only path
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:112 remote-control -&amp;gt; bridge mode
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:185 ps/logs/attach -&amp;gt; background sessions
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cli.tsx:293 default path -&amp;gt; dynamic import of main.tsx
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Design intent&lt;/strong&gt;: Avoid loading &lt;code&gt;main.tsx&lt;/code&gt;&amp;rsquo;s 4,683 lines just for &lt;code&gt;--version&lt;/code&gt;. This optimization directly impacts the perceived responsiveness of the CLI tool.&lt;/p&gt;
&lt;p&gt;The default path dynamically imports &lt;code&gt;main.tsx&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// cli.tsx:293-297
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;main&lt;/span&gt;: &lt;span class="kt"&gt;cliMain&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="kr"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;../main.js&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cliMain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The reason &lt;code&gt;main.tsx&lt;/code&gt; is 4,683 lines is that it includes all of the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Side-effect imports&lt;/strong&gt; (lines 1-209): &lt;code&gt;profileCheckpoint&lt;/code&gt;, &lt;code&gt;startMdmRawRead&lt;/code&gt;, &lt;code&gt;startKeychainPrefetch&lt;/code&gt; — parallel subprocesses launched at module evaluation time to hide the ~65ms macOS keychain read&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Commander setup&lt;/strong&gt; (line 585+): CLI argument parsing, 10+ mode-specific branches&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;React/Ink REPL rendering&lt;/strong&gt;: Terminal UI mount&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Headless path&lt;/strong&gt; (&lt;code&gt;-p&lt;/code&gt;/&lt;code&gt;--print&lt;/code&gt;): Uses &lt;code&gt;QueryEngine&lt;/code&gt; directly without UI&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="2-prompt-assembly-contexttss-dual-memoize"&gt;2. Prompt Assembly: context.ts&amp;rsquo;s dual-memoize
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;context.ts&lt;/code&gt; is a small file at 189 lines, but it handles all dynamic parts of the system prompt. Two memoized functions are at its core:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;getSystemContext()&lt;/code&gt;&lt;/strong&gt; (context.ts:116): Collects git state (branch, status, recent commits)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;getUserContext()&lt;/code&gt;&lt;/strong&gt; (context.ts:155): Discovers and parses CLAUDE.md files&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why the separation?&lt;/strong&gt; It&amp;rsquo;s directly tied to the Anthropic Messages API&amp;rsquo;s prompt caching strategy. Since the cache lifetimes of the system prompt and user context differ, &lt;code&gt;cache_control&lt;/code&gt; must be applied differently to each. Wrapping them in &lt;code&gt;memoize&lt;/code&gt; ensures each is computed only once per session.&lt;/p&gt;
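&lt;p&gt;The dual-memoize split can be sketched with a simple once-per-session cache. This is a toy reconstruction: the real &lt;code&gt;getSystemContext()&lt;/code&gt; and &lt;code&gt;getUserContext()&lt;/code&gt; do async git and file I/O, and the bodies below are invented:&lt;/p&gt;

```typescript
// Sketch of the dual-memoize pattern: two independent memo cells, so each
// context can be cached (and cache_control-tagged) on its own lifetime.
function memoize(fn: () => string): () => string {
  let cached: string | undefined;
  return () => {
    if (cached === undefined) cached = fn();
    return cached;
  };
}

let gitReads = 0;
const getSystemContext = memoize(() => {
  gitReads += 1; // expensive git calls run only once per session
  return "branch: main";
});
// separate memo cell: its cache lifetime is independent of the git state
const getUserContext = memoize(() => "CLAUDE.md contents");
```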
&lt;p&gt;The call to &lt;code&gt;setCachedClaudeMdContent()&lt;/code&gt; at context.ts:170-176 is &lt;strong&gt;a mechanism to break circular dependencies&lt;/strong&gt; — yoloClassifier needs CLAUDE.md content, but a direct import would create a permissions -&amp;gt; yoloClassifier -&amp;gt; claudemd -&amp;gt; permissions cycle.&lt;/p&gt;
&lt;h2 id="3-asyncgenerator-chain-the-architectural-spine"&gt;3. AsyncGenerator Chain: The Architectural Spine
&lt;/h2&gt;&lt;p&gt;Claude Code&amp;rsquo;s entire data flow is built on an &lt;code&gt;AsyncGenerator&lt;/code&gt; chain:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;QueryEngine.submitMessage()* -&amp;gt; query()* -&amp;gt; queryLoop()* -&amp;gt; queryModelWithStreaming()*
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Every core function is an &lt;code&gt;async function*&lt;/code&gt;. This isn&amp;rsquo;t just an implementation choice — it&amp;rsquo;s an &lt;strong&gt;architectural decision&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Backpressure&lt;/strong&gt;: When the consumer is slow, the producer waits&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cancellation&lt;/strong&gt;: Combined with AbortController for immediate cancellation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Composition&lt;/strong&gt;: &lt;code&gt;yield*&lt;/code&gt; naturally chains generators together&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;State management&lt;/strong&gt;: Local variables within loops naturally maintain state across turns&lt;/li&gt;
&lt;/ul&gt;
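&lt;p&gt;A toy version of the chain shows how &lt;code&gt;yield*&lt;/code&gt; splices each inner stream into the outer one. The function names mirror the call chain in the post, but the bodies and event strings are invented for illustration:&lt;/p&gt;

```typescript
// Each stage is an async generator; yield* forwards every event from the
// inner stage unchanged, so the whole pipeline composes naturally.
async function* queryModelWithStreaming() {
  yield "text_delta:hel";
  yield "text_delta:lo";
}

async function* queryLoop() {
  yield* queryModelWithStreaming(); // splice in the SSE events
  yield "stop_reason:end_turn";
}

async function* submitMessage(prompt: string) {
  yield "user:" + prompt;
  yield* queryLoop();
}
```
&lt;p&gt;A slow consumer iterating this chain with &lt;code&gt;for await&lt;/code&gt; automatically throttles every producer above it, which is the backpressure property the bullet list describes.&lt;/p&gt;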
&lt;p&gt;Looking at the signature of &lt;code&gt;QueryEngine.submitMessage()&lt;/code&gt; (QueryEngine.ts:209):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;async&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;submitMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;: &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;ContentBlockParam&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;uuid?&lt;/span&gt;: &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;isMeta?&lt;/span&gt;: &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AsyncGenerator&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;SDKMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;void&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In SDK mode, each message is &lt;strong&gt;streamed via yield&lt;/strong&gt;, so backpressure falls out of the generator protocol naturally.&lt;/p&gt;
&lt;h2 id="4-the-core-turn-loop-querytss-whiletrue"&gt;4. The Core Turn Loop: query.ts&amp;rsquo;s while(true)
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;queryLoop()&lt;/code&gt; in &lt;code&gt;query.ts&lt;/code&gt; (1,729 lines) is the actual API + tool loop:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-typescript" data-lang="typescript"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// query.ts:307
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 1. Call queryModelWithStreaming() -&amp;gt; SSE stream
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 2. Yield streaming events
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 3. Detect tool calls -&amp;gt; runTools()/StreamingToolExecutor
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 4. Append tool results to messages
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// 5. stop_reason == &amp;#34;end_turn&amp;#34; -&amp;gt; break
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;// stop_reason == &amp;#34;tool_use&amp;#34; -&amp;gt; continue
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;State&lt;/code&gt; type (query.ts:204) is important. It manages loop state as an explicit record with fields like &lt;code&gt;messages&lt;/code&gt;, &lt;code&gt;toolUseContext&lt;/code&gt;, &lt;code&gt;autoCompactTracking&lt;/code&gt;, and &lt;code&gt;maxOutputTokensRecoveryCount&lt;/code&gt;, updating everything at once at continue sites.&lt;/p&gt;
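&lt;p&gt;The continue-site update pattern can be sketched with a toy &lt;code&gt;State&lt;/code&gt; record. The field names echo the post, but the loop body, stop conditions, and helper below are stand-ins, not the real &lt;code&gt;query.ts&lt;/code&gt; logic:&lt;/p&gt;

```typescript
// Sketch of an explicit-State turn loop: instead of mutating scattered
// locals, the whole record is rebuilt at once at each continue site.
type State = { messages: string[]; turns: number };

function runTurnLoop(stopAfter: number): State {
  let state: State = { messages: [], turns: 0 };
  while (true) {
    // stand-in for a model call deciding end_turn vs tool_use
    const stopReason = state.turns + 1 >= stopAfter ? "end_turn" : "tool_use";
    state = {
      messages: state.messages.concat("assistant:" + stopReason),
      turns: state.turns + 1,
    };
    if (stopReason === "end_turn") break;
  }
  return state;
}
```
&lt;p&gt;Rebuilding the record in one place keeps every field (here just two, in the real loop also &lt;code&gt;toolUseContext&lt;/code&gt;, &lt;code&gt;autoCompactTracking&lt;/code&gt;, and friends) consistent across iterations.&lt;/p&gt;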
&lt;h2 id="5-api-communication-4-providers-and-caching"&gt;5. API Communication: 4 Providers and Caching
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;getAnthropicClient()&lt;/code&gt; at &lt;code&gt;client.ts:88&lt;/code&gt; supports 4 providers:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Provider&lt;/th&gt;
 &lt;th&gt;SDK&lt;/th&gt;
 &lt;th&gt;Reason for Dynamic Import&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Anthropic Direct&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;Anthropic&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Default, loaded immediately&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AWS Bedrock&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;AnthropicBedrock&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;AWS SDK is several MB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Azure Foundry&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;AnthropicFoundry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Azure Identity is several MB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GCP Vertex&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;AnthropicVertex&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Google Auth is several MB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
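&lt;p&gt;The lazy routing in the table can be sketched with dynamic &lt;code&gt;import()&lt;/code&gt;. Node builtins stand in here for the real multi-megabyte provider SDKs; the map and function name are illustrative assumptions:&lt;/p&gt;

```typescript
// Only the selected provider module is imported, keeping the other SDKs
// off the startup path entirely.
const providerModules: { [provider: string]: string } = {
  anthropic: "node:path", // stand-in for the default, cheap client
  bedrock: "node:os",     // stand-in for the multi-MB AWS SDK
  vertex: "node:url",     // stand-in for Google Auth
};

async function getClient(provider: string) {
  const moduleName = providerModules[provider];
  if (moduleName === undefined) throw new Error("unknown provider: " + provider);
  return import(moduleName); // the import cost is paid for this provider only
}
```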
&lt;p&gt;The core function chain in &lt;code&gt;claude.ts&lt;/code&gt; (3,419 lines):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;queryModelWithStreaming() (claude.ts:752)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; queryModel()
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; withRetry()
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; anthropic.beta.messages.stream() (SDK call)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The caching strategy is determined by &lt;code&gt;getCacheControl()&lt;/code&gt; (claude.ts:358), which determines whether the 1-hour TTL applies based on user type, feature flags, and query source.&lt;/p&gt;
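&lt;p&gt;The &lt;code&gt;withRetry()&lt;/code&gt; step in the chain can be sketched as a generic wrapper. This is a hypothetical reconstruction: the attempt budget, the (omitted) backoff, and the flaky stub are assumptions, not the real &lt;code&gt;claude.ts&lt;/code&gt; policy:&lt;/p&gt;

```typescript
// A minimal retry wrapper: re-invoke fn until it succeeds or the attempt
// budget is exhausted, then rethrow the last error.
async function withRetry(fn: () => unknown, maxAttempts: number) {
  let lastError: unknown;
  for (let attempt = 1; attempt !== maxAttempts + 1; attempt += 1) {
    try {
      return await fn(); // fn may be sync or async
    } catch (err) {
      lastError = err; // real code would sleep with exponential backoff here
    }
  }
  throw lastError;
}

// A stub that fails twice (as a 529 might) before succeeding.
let calls = 0;
const flaky = () => {
  calls += 1;
  if (calls !== 3) throw new Error("529 overloaded");
  return "streamed response";
};
```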
&lt;h2 id="6-tool-orchestration-3-tier-concurrency"&gt;6. Tool Orchestration: 3-Tier Concurrency
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TD
 TC["Tool call array&amp;lt;br/&amp;gt;[ReadFile, ReadFile, Bash, ReadFile]"]
 P["partitionToolCalls()&amp;lt;br/&amp;gt;toolOrchestration.ts:91"]
 B1["Batch 1&amp;lt;br/&amp;gt;ReadFile + ReadFile&amp;lt;br/&amp;gt;isConcurrencySafe=true"]
 B2["Batch 2&amp;lt;br/&amp;gt;Bash&amp;lt;br/&amp;gt;isConcurrencySafe=false"]
 B3["Batch 3&amp;lt;br/&amp;gt;ReadFile&amp;lt;br/&amp;gt;isConcurrencySafe=true"]
 PAR["Promise.all()&amp;lt;br/&amp;gt;max 10 concurrent"]
 SEQ["Sequential execution"]
 PAR2["Promise.all()"]

 TC --&gt; P
 P --&gt; B1
 P --&gt; B2
 P --&gt; B3
 B1 --&gt; PAR
 B2 --&gt; SEQ
 B3 --&gt; PAR2

 style B1 fill:#e8f5e9
 style B2 fill:#ffebee
 style B3 fill:#e8f5e9&lt;/pre&gt;&lt;p&gt;&lt;code&gt;StreamingToolExecutor&lt;/code&gt; (530 lines) extends this batch partitioning into a &lt;strong&gt;streaming context&lt;/strong&gt;. When it detects tool calls while the API response is still streaming, it immediately starts execution:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;addTool()&lt;/code&gt; (StreamingToolExecutor.ts:76) — Add to queue&lt;/li&gt;
&lt;li&gt;&lt;code&gt;processQueue()&lt;/code&gt; (StreamingToolExecutor.ts:140) — Check concurrency, then execute immediately&lt;/li&gt;
&lt;li&gt;&lt;code&gt;getRemainingResults()&lt;/code&gt; (StreamingToolExecutor.ts:453) — Wait for all tools to complete&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Error propagation rules&lt;/strong&gt;: Only Bash errors cancel sibling tools (&lt;code&gt;siblingAbortController&lt;/code&gt;). Read/WebFetch errors don&amp;rsquo;t affect other tools. This reflects the implicit dependencies between Bash commands (if &lt;code&gt;mkdir&lt;/code&gt; fails, subsequent commands are pointless).&lt;/p&gt;
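&lt;p&gt;The batch partitioning in the diagram can be reconstructed as a small pure function. &lt;code&gt;ToolCall&lt;/code&gt; and &lt;code&gt;isConcurrencySafe&lt;/code&gt; follow the post, but the code itself is an illustrative sketch, not the actual &lt;code&gt;toolOrchestration.ts&lt;/code&gt; implementation:&lt;/p&gt;

```typescript
// Adjacent concurrency-safe calls share a parallel batch; every unsafe
// call gets a batch of its own, preserving overall execution order.
type ToolCall = { name: string; isConcurrencySafe: boolean };

function partitionToolCalls(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (last !== undefined) {
      if (call.isConcurrencySafe) {
        if (last[0].isConcurrencySafe) {
          last.push(call); // extend the current parallel batch
          continue;
        }
      }
    }
    batches.push([call]); // batch boundary: start a new batch
  }
  return batches;
}
```
&lt;p&gt;On the example from the diagram, &lt;code&gt;[ReadFile, ReadFile, Bash, ReadFile]&lt;/code&gt; yields three batches: the two reads run in parallel, Bash runs alone, and the trailing read forms its own (parallelizable) batch.&lt;/p&gt;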
&lt;h2 id="full-data-flow"&gt;Full Data Flow
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;sequenceDiagram
 participant User as User
 participant CLI as cli.tsx
 participant Main as main.tsx
 participant QE as QueryEngine
 participant Query as query.ts
 participant Claude as claude.ts
 participant API as Anthropic API
 participant Tools as toolOrchestration
 participant Exec as toolExecution

 User-&gt;&gt;CLI: Types "hello"
 CLI-&gt;&gt;Main: dynamic import
 Main-&gt;&gt;QE: new QueryEngine()
 QE-&gt;&gt;Query: query()
 Query-&gt;&gt;Claude: queryModelWithStreaming()
 Claude-&gt;&gt;API: anthropic.beta.messages.stream()
 API--&gt;&gt;Claude: SSE stream

 alt stop_reason == end_turn
 Claude--&gt;&gt;User: Output response
 else stop_reason == tool_use
 Claude--&gt;&gt;Query: tool_use blocks
 Query-&gt;&gt;Tools: partitionToolCalls()
 Tools-&gt;&gt;Exec: runToolUse()
 Exec-&gt;&gt;Exec: canUseTool() + tool.call()
 Exec--&gt;&gt;Query: Tool results
 Note over Query: Next iteration of while(true)
 end&lt;/pre&gt;&lt;h2 id="rust-gap-map-preview"&gt;Rust Gap Map Preview
&lt;/h2&gt;&lt;p&gt;Tracing the same request through the Rust port revealed &lt;strong&gt;31 gaps&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Priority&lt;/th&gt;
 &lt;th&gt;Gap Count&lt;/th&gt;
 &lt;th&gt;Key Examples&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;P0 (Critical)&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;Synchronous ApiClient, missing StreamingToolExecutor&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;P1 (High)&lt;/td&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;3-tier concurrency, prompt caching, Agent tool&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;P2 (Medium)&lt;/td&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;Multi-provider, effort control, sandbox&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;td&gt;Auto-compaction, SSE parser, OAuth, config loading&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Implementation coverage: 36% (11/31)&lt;/strong&gt;. The next post dives deep into the conversation loop at the heart of these gaps.&lt;/p&gt;
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AsyncGenerator is the architectural spine&lt;/strong&gt; — It&amp;rsquo;s not just an implementation technique but a design decision that simultaneously solves backpressure, cancellation, and composition. In Rust, the &lt;code&gt;Stream&lt;/code&gt; trait is the counterpart, but the ergonomics of &lt;code&gt;yield*&lt;/code&gt; composition differ significantly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;main.tsx at 4,683 lines is technical debt&lt;/strong&gt; — Commander setup, React components, and state management are all mixed in a single file. This is the result of organic growth and represents an opportunity for module decomposition.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tool concurrency is non-trivial&lt;/strong&gt; — The 3-tier model (read batches, sequential writes, Bash sibling cancellation) rather than &amp;ldquo;all parallel&amp;rdquo; or &amp;ldquo;all sequential&amp;rdquo; is a core design element of production agent harnesses.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;Next post: &lt;a class="link" href="https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-2/" &gt;#2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description></item></channel></rss>