<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Tool-Execution on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/tool-execution/</link><description>Recent content in Tool-Execution on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 06 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/tool-execution/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Code Harness Anatomy #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths</title><link>https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-2/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-2/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post Claude Code Harness Anatomy #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;In the first post of this series, we traced the journey of a single &amp;ldquo;hello&amp;rdquo; through 11 files. This post dissects the heart of that journey: the &lt;code&gt;while(true)&lt;/code&gt; loop in the 1,729-line &lt;code&gt;query.ts&lt;/code&gt;. We analyze the resilient execution model built from 7 &lt;code&gt;continue&lt;/code&gt; paths, the 4-stage state machine of &lt;code&gt;StreamingToolExecutor&lt;/code&gt;, and the 3-tier concurrency model of &lt;code&gt;partitionToolCalls()&lt;/code&gt;, then compare them with how we reproduced these patterns in a Rust prototype.&lt;/p&gt;
&lt;h2 id="analysis-target-10-core-files"&gt;Analysis Target: 10 Core Files
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;#&lt;/th&gt;
 &lt;th&gt;Path&lt;/th&gt;
 &lt;th&gt;Lines&lt;/th&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query/config.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;46&lt;/td&gt;
 &lt;td&gt;Immutable runtime gate snapshot&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query/deps.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;40&lt;/td&gt;
 &lt;td&gt;Testable I/O boundary (DI)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query/tokenBudget.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;93&lt;/td&gt;
 &lt;td&gt;Token budget management, auto-continue/stop decisions&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query/stopHooks.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;473&lt;/td&gt;
 &lt;td&gt;Stop/TaskCompleted/TeammateIdle hooks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;query.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,729&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Core&lt;/strong&gt; &amp;ndash; while(true) turn loop&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;QueryEngine.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,295&lt;/td&gt;
 &lt;td&gt;Session wrapper, SDK interface&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;toolOrchestration.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;188&lt;/td&gt;
 &lt;td&gt;Tool partitioning + concurrency control&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;8&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;StreamingToolExecutor.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;530&lt;/td&gt;
 &lt;td&gt;SSE mid-stream tool pipelining&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;9&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;toolExecution.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1,745&lt;/td&gt;
 &lt;td&gt;Tool dispatch, permission checks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;toolHooks.ts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;650&lt;/td&gt;
 &lt;td&gt;Pre/PostToolUse hook pipeline&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We dissect a total of &lt;strong&gt;6,789 lines&lt;/strong&gt; of core orchestration code.&lt;/p&gt;
&lt;h2 id="1-queryloops-7-continue-paths"&gt;1. queryLoop()&amp;rsquo;s 7 Continue Paths
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;queryLoop()&lt;/code&gt; function in &lt;code&gt;query.ts&lt;/code&gt; (query.ts:241) is not a simple API call loop. It&amp;rsquo;s a &lt;strong&gt;resilient executor&lt;/strong&gt; with 7 distinct &lt;code&gt;continue&lt;/code&gt; reasons, each handling its own recovery or continuation scenario:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Reason&lt;/th&gt;
 &lt;th&gt;Line&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;collapse_drain_retry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1114&lt;/td&gt;
 &lt;td&gt;Retry after context collapse drain&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;reactive_compact_retry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1162&lt;/td&gt;
 &lt;td&gt;Retry after reactive compaction (413 recovery)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_output_tokens_escalate&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1219&lt;/td&gt;
 &lt;td&gt;Token escalation from 8k -&amp;gt; 64k&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_output_tokens_recovery&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1248&lt;/td&gt;
 &lt;td&gt;Inject &amp;ldquo;continue writing&amp;rdquo; nudge message&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;stop_hook_blocking&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1303&lt;/td&gt;
 &lt;td&gt;Stop hook returned a blocking error&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;token_budget_continuation&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1337&lt;/td&gt;
 &lt;td&gt;Continue due to remaining token budget&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;next_turn&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1725&lt;/td&gt;
 &lt;td&gt;Next turn after tool execution completes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;The State type is key&lt;/strong&gt; (query.ts:204-217). Loop state is managed as a single record with 10 fields. Why a record instead of individual variables? There are 7 &lt;code&gt;continue&lt;/code&gt; sites, and each one replaces the whole state at once via &lt;code&gt;state = { ... }&lt;/code&gt;. Assigning that many variables one by one at every site makes it easy to miss one. &lt;strong&gt;Record updates let the type system catch omissions:&lt;/strong&gt; an object literal that leaves out a required field fails to compile.&lt;/p&gt;
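&lt;p&gt;A minimal sketch of the whole-record update pattern. The &lt;code&gt;LoopState&lt;/code&gt; type, its four fields, and &lt;code&gt;continueAfterTools()&lt;/code&gt; are hypothetical stand-ins chosen for illustration. They are not the actual definitions in &lt;code&gt;query.ts&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical, trimmed-down loop state; the real State type has more fields.
type ContinueReason = "next_turn" | "max_output_tokens_recovery";

type LoopState = {
  messages: string[];
  turnCount: number;
  recoveryAttempts: number;
  reason: ContinueReason | null;
};

// Each continue site builds a complete record. Leaving out a field
// (for example recoveryAttempts) is a compile-time error, not a silent bug.
function continueAfterTools(prev: LoopState, toolResults: string[]): LoopState {
  return {
    messages: [...prev.messages, ...toolResults],
    turnCount: prev.turnCount + 1,
    recoveryAttempts: 0, // a fresh turn resets the recovery counter
    reason: "next_turn",
  };
}

let state: LoopState = {
  messages: ["hello"],
  turnCount: 0,
  recoveryAttempts: 2,
  reason: null,
};
state = continueAfterTools(state, ["tool_result: ok"]);
```

&lt;p&gt;With separate variables, forgetting to reset one counter at one of the 7 sites would compile cleanly. With a record, the same omission is rejected by the compiler.&lt;/p&gt;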
&lt;h3 id="full-flow-of-a-single-loop-iteration"&gt;Full Flow of a Single Loop Iteration
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-gdscript3" data-lang="gdscript3"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="mf"&gt;1.&lt;/span&gt; &lt;span class="n"&gt;Preprocessing&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;365&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;447&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;snip&lt;/span&gt; &lt;span class="n"&gt;compaction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;micro&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="n"&gt;collapse&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="mf"&gt;2.&lt;/span&gt; &lt;span class="n"&gt;Auto&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compaction&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;454&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;543&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="mf"&gt;3.&lt;/span&gt; &lt;span class="n"&gt;Blocking&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;628&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;648&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;immediate&lt;/span&gt; &lt;span class="n"&gt;termination&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="n"&gt;exceeded&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="mf"&gt;4.&lt;/span&gt; &lt;span class="n"&gt;API&lt;/span&gt; &lt;span class="n"&gt;streaming&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;654&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;863&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;consume&lt;/span&gt; &lt;span class="n"&gt;SSE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="n"&gt;via&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;await&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="mf"&gt;5.&lt;/span&gt; &lt;span class="n"&gt;No&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;tool&lt;/span&gt; &lt;span class="n"&gt;exit&lt;/span&gt; &lt;span class="n"&gt;paths&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1062&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1357&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mi"&gt;413&lt;/span&gt; &lt;span class="n"&gt;recovery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_output&lt;/span&gt; &lt;span class="n"&gt;recovery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop&lt;/span&gt; &lt;span class="n"&gt;hooks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="mf"&gt;6.&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt; &lt;span class="n"&gt;continuation&lt;/span&gt; &lt;span class="n"&gt;paths&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1360&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1728&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;execute&lt;/span&gt; &lt;span class="n"&gt;remaining&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;next_turn&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="2-streamingtoolexecutors-4-stage-state-machine"&gt;2. StreamingToolExecutor&amp;rsquo;s 4-Stage State Machine
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;StreamingToolExecutor.ts&lt;/code&gt; (530 lines) implements the most sophisticated concurrency pattern in Claude Code. The core idea: &lt;strong&gt;start executing completed tool calls while the API response is still streaming&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Suppose the model emits &lt;code&gt;[ReadFile(&amp;quot;a.ts&amp;quot;), ReadFile(&amp;quot;b.ts&amp;quot;), Bash(&amp;quot;make test&amp;quot;)]&lt;/code&gt; in a single response. Without pipelining, execution begins only after all three tool blocks have arrived. With pipelining, the file read starts the instant the &lt;code&gt;ReadFile(&amp;quot;a.ts&amp;quot;)&lt;/code&gt; block completes.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;stateDiagram-v2
 [*] --&gt; queued: addTool()
 queued --&gt; executing: processQueue()&amp;lt;br/&amp;gt;canExecuteTool() == true
 queued --&gt; completed: Pre-canceled&amp;lt;br/&amp;gt;getAbortReason() != null

 executing --&gt; completed: Tool execution finished&amp;lt;br/&amp;gt;or sibling abort

 completed --&gt; yielded: getCompletedResults()&amp;lt;br/&amp;gt;yield in order

 yielded --&gt; [*]

 note right of queued
 processQueue() auto-triggers
 on addTool() and prior
 tool completion
 end note

 note right of completed
 On Bash error:
 siblingAbortController.abort()
 cancels sibling tools only
 end note&lt;/pre&gt;&lt;h3 id="concurrency-decision-logic-canexecutetool-line-129"&gt;Concurrency Decision Logic (canExecuteTool, line 129)
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-gdscript3" data-lang="gdscript3"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Execution&lt;/span&gt; &lt;span class="n"&gt;conditions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;No&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="n"&gt;currently&lt;/span&gt; &lt;span class="n"&gt;executing&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;executingTools&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Or&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="k"&gt;tool&lt;/span&gt; &lt;span class="n"&gt;is&lt;/span&gt; &lt;span class="n"&gt;concurrencySafe&lt;/span&gt; &lt;span class="n"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt; &lt;span class="n"&gt;executing&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;also&lt;/span&gt; &lt;span class="n"&gt;concurrencySafe&lt;/span&gt;
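&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Concurrency-safe tools (typically read-only ones) can execute in parallel. A non-safe tool, such as one that writes, starts only when nothing else is executing, and while it runs every later tool waits until it finishes.&lt;/p&gt;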
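&lt;p&gt;The two execution conditions above can be sketched as a small predicate. The &lt;code&gt;TrackedTool&lt;/code&gt; shape and this function body are assumptions for illustration. Only the name &lt;code&gt;canExecuteTool&lt;/code&gt; and the two conditions come from the text.&lt;/p&gt;

```typescript
// Hypothetical shape; only the concurrency flag matters for this sketch.
type TrackedTool = { name: string; isConcurrencySafe: boolean };

// Mirrors the two conditions: run when nothing is executing, or when the
// candidate and every executing tool are all concurrency-safe.
function canExecuteTool(candidate: TrackedTool, executing: TrackedTool[]): boolean {
  if (executing.length === 0) return true;
  if (!candidate.isConcurrencySafe) return false;
  return executing.every((tool) => tool.isConcurrencySafe);
}

const readA = { name: "ReadFile", isConcurrencySafe: true };
const readB = { name: "ReadFile", isConcurrencySafe: true };
const bash = { name: "Bash", isConcurrencySafe: false };

console.log(canExecuteTool(readB, [readA])); // true: both are safe
console.log(canExecuteTool(bash, [readA]));  // false: Bash must wait for the read
console.log(canExecuteTool(readA, [bash]));  // false: nothing runs beside Bash
console.log(canExecuteTool(bash, []));       // true: nothing is executing
```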
&lt;h3 id="siblingabortcontroller--hierarchical-cancellation"&gt;siblingAbortController &amp;ndash; Hierarchical Cancellation
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;siblingAbortController&lt;/code&gt; (lines 46-61) is a child of &lt;code&gt;toolUseContext.abortController&lt;/code&gt;. When a Bash tool throws an error, the executor calls &lt;code&gt;siblingAbortController.abort('sibling_error')&lt;/code&gt; to &lt;strong&gt;cancel only sibling tools&lt;/strong&gt;. The parent controller is unaffected, so the overall query continues.&lt;/p&gt;
&lt;p&gt;Why do only Bash errors cancel siblings? Bash calls issued together often depend on each other, much like the steps of &lt;code&gt;mkdir -p dir &amp;amp;&amp;amp; cd dir &amp;amp;&amp;amp; make&lt;/code&gt;: if &lt;code&gt;mkdir&lt;/code&gt; fails, the subsequent commands are pointless. ReadFile or WebFetch failures are independent and shouldn&amp;rsquo;t affect other tools.&lt;/p&gt;
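&lt;p&gt;One way to get this one-directional cancellation is to link a child &lt;code&gt;AbortController&lt;/code&gt; to its parent&amp;rsquo;s signal. The sketch below uses only the standard &lt;code&gt;AbortController&lt;/code&gt; API. The helper name &lt;code&gt;createChildAbortController()&lt;/code&gt; is hypothetical, and the real wiring in &lt;code&gt;StreamingToolExecutor.ts&lt;/code&gt; may differ.&lt;/p&gt;

```typescript
// Creates a controller that aborts when its parent aborts, but not vice versa.
function createChildAbortController(parent: AbortController): AbortController {
  const child = new AbortController();
  if (parent.signal.aborted) {
    child.abort(parent.signal.reason);
  } else {
    parent.signal.addEventListener(
      "abort",
      () => child.abort(parent.signal.reason),
      { once: true },
    );
  }
  return child;
}

// Stands in for toolUseContext.abortController.
const queryController = new AbortController();
const siblingController = createChildAbortController(queryController);

// A failing Bash call cancels its siblings only.
siblingController.abort("sibling_error");
const siblingsCancelled = siblingController.signal.aborted; // true
const queryCancelled = queryController.signal.aborted;      // false
```

&lt;p&gt;Cancellation flows downward only: aborting the query-level controller also aborts the sibling controller, while a sibling abort leaves the query running.&lt;/p&gt;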
&lt;h2 id="3-partitiontoolcalls--3-tier-concurrency-model"&gt;3. partitionToolCalls &amp;ndash; 3-Tier Concurrency Model
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;toolOrchestration.ts&lt;/code&gt; (188 lines) defines the entire concurrency model for tool execution.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TD
 TC["Tool call array&amp;lt;br/&amp;gt;[ReadFile, ReadFile, Bash, ReadFile]"]
 P["partitionToolCalls()&amp;lt;br/&amp;gt;toolOrchestration.ts:91"]
 B1["Batch 1&amp;lt;br/&amp;gt;ReadFile + ReadFile&amp;lt;br/&amp;gt;isConcurrencySafe=true"]
 B2["Batch 2&amp;lt;br/&amp;gt;Bash&amp;lt;br/&amp;gt;isConcurrencySafe=false"]
 B3["Batch 3&amp;lt;br/&amp;gt;ReadFile&amp;lt;br/&amp;gt;isConcurrencySafe=true"]
 PAR["Promise.all()&amp;lt;br/&amp;gt;max 10 concurrent"]
 SEQ["Sequential execution"]
 PAR2["Promise.all()"]

 TC --&gt; P
 P --&gt; B1
 P --&gt; B2
 P --&gt; B3
 B1 --&gt; PAR
 B2 --&gt; SEQ
 B3 --&gt; PAR2

 style B1 fill:#e8f5e9
 style B2 fill:#ffebee
 style B3 fill:#e8f5e9&lt;/pre&gt;&lt;p&gt;The rule is simple: consecutive &lt;code&gt;isConcurrencySafe&lt;/code&gt; tools are grouped into a single batch, while non-safe tools each become independent batches. This decision comes &lt;strong&gt;from the tool definition itself&lt;/strong&gt; — determined by calling &lt;code&gt;tool.isConcurrencySafe(parsedInput)&lt;/code&gt;. The same tool may have different concurrency safety depending on its input.&lt;/p&gt;
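&lt;p&gt;The batching rule can be sketched in a few lines. The &lt;code&gt;ToolCall&lt;/code&gt; and &lt;code&gt;Batch&lt;/code&gt; shapes are hypothetical, and the standalone &lt;code&gt;isConcurrencySafe()&lt;/code&gt; below is a stand-in that treats only &lt;code&gt;ReadFile&lt;/code&gt; as safe. In the real code the answer comes from &lt;code&gt;tool.isConcurrencySafe(parsedInput)&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration; the real tool interface is richer.
type ToolCall = { name: string; input: string };
type Batch = { concurrent: boolean; calls: ToolCall[] };

// Stand-in for tool.isConcurrencySafe(parsedInput): here only ReadFile is safe.
function isConcurrencySafe(call: ToolCall): boolean {
  return call.name === "ReadFile";
}

// Consecutive safe calls share a batch; every non-safe call gets its own.
function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  let open: Batch | null = null; // the concurrent batch currently accepting calls
  for (const call of calls) {
    if (isConcurrencySafe(call)) {
      if (open === null) {
        open = { concurrent: true, calls: [] };
        batches.push(open);
      }
      open.calls.push(call);
    } else {
      batches.push({ concurrent: false, calls: [call] });
      open = null; // a non-safe call closes the running concurrent batch
    }
  }
  return batches;
}

const batches = partitionToolCalls([
  { name: "ReadFile", input: "a.ts" },
  { name: "ReadFile", input: "b.ts" },
  { name: "Bash", input: "make test" },
  { name: "ReadFile", input: "c.ts" },
]);
// Three batches with sizes 2, 1, 1, matching the diagram above.
```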
&lt;h3 id="context-modifiers-and-race-conditions"&gt;Context Modifiers and Race Conditions
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Why are context modifiers applied in order only after the batch completes?&lt;/strong&gt; Applying them immediately during parallel execution creates race conditions. Suppose tools A and B run in the same batch and A finishes first. If A&amp;rsquo;s modifier were applied right away, B, which started with the pre-modification context, would see the post-modification state partway through its run. Applying modifiers in original tool order after the whole batch completes guarantees deterministic results (toolOrchestration.ts:54-62).&lt;/p&gt;
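&lt;p&gt;A minimal sketch of deferred, ordered application. The &lt;code&gt;Context&lt;/code&gt;, &lt;code&gt;Modifier&lt;/code&gt;, and &lt;code&gt;Finished&lt;/code&gt; types and the function name are assumptions for illustration, not the shapes used in &lt;code&gt;toolOrchestration.ts&lt;/code&gt;.&lt;/p&gt;

```typescript
type Context = { log: string[] };
type Modifier = (ctx: Context) => Context;
// A finished tool: its position in the original call array plus an optional modifier.
type Finished = { index: number; modify?: Modifier };

// Apply modifiers in original call order, regardless of completion order.
function applyModifiersInOrder(ctx: Context, finished: Finished[]): Context {
  const ordered = [...finished].sort((a, b) => a.index - b.index);
  let next = ctx;
  for (const result of ordered) {
    if (result.modify) next = result.modify(next);
  }
  return next;
}

// Tool 1 finished before tool 0, but the modifier of tool 0 still applies first.
const completionOrder: Finished[] = [
  { index: 1, modify: (c) => ({ log: [...c.log, "B"] }) },
  { index: 0, modify: (c) => ({ log: [...c.log, "A"] }) },
];
const finalContext = applyModifiersInOrder({ log: [] }, completionOrder);
// finalContext.log is ["A", "B"] no matter which tool finished first.
```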
&lt;h2 id="4-tool-execution-pipeline-and-hooks"&gt;4. Tool Execution Pipeline and Hooks
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;runToolUse()&lt;/code&gt; in &lt;code&gt;toolExecution.ts&lt;/code&gt; (1,745 lines, line 337) manages the complete lifecycle of each individual tool call:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-gdscript3" data-lang="gdscript3"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;runToolUse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="mf"&gt;1.&lt;/span&gt; &lt;span class="n"&gt;findToolByName&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt; &lt;span class="n"&gt;retry&lt;/span&gt; &lt;span class="n"&gt;with&lt;/span&gt; &lt;span class="n"&gt;deprecated&lt;/span&gt; &lt;span class="n"&gt;aliases&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;345&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;356&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="mf"&gt;2.&lt;/span&gt; &lt;span class="n"&gt;abort&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;already&lt;/span&gt; &lt;span class="n"&gt;canceled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;CANCEL_MESSAGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;415&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="mf"&gt;3.&lt;/span&gt; &lt;span class="n"&gt;streamedCheckPermissionsAndCallTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt; &lt;span class="n"&gt;permissions&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;execution&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hooks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;455&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;checkPermissionsAndCallTool&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Zod&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;615&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="k"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validateInput&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;custom&lt;/span&gt; &lt;span class="n"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;683&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Speculative&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bash&lt;/span&gt; &lt;span class="n"&gt;only&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;740&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;runPreToolUseHooks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;resolveHookPermissionDecision&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;921&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="k"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="n"&gt;execution&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1207&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="n"&gt;runPostToolUseHooks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="n"&gt;transformation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="the-core-invariant-of-resolvehookpermissiondecision"&gt;The Core Invariant of resolveHookPermissionDecision
&lt;/h3&gt;&lt;p&gt;In &lt;code&gt;resolveHookPermissionDecision()&lt;/code&gt; (toolHooks.ts:332), &lt;strong&gt;a hook&amp;rsquo;s &lt;code&gt;allow&lt;/code&gt; does not bypass settings.json deny/ask rules&lt;/strong&gt; (toolHooks.ts:373). Even when a hook returns &lt;code&gt;allow&lt;/code&gt;, the tool call must still pass &lt;code&gt;checkRuleBasedPermissions()&lt;/code&gt;. This reflects the design principle that &amp;ldquo;hooks are automation helpers, not security bypasses.&amp;rdquo;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;When hook result is allow:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; Call checkRuleBasedPermissions()
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; null means pass (no rules)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; deny means rule overrides hook
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -&amp;gt; ask means user prompt required
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="5-rust-comparison--152-lines-vs-1729-lines"&gt;5. Rust Comparison &amp;ndash; 152 Lines vs 1,729 Lines
&lt;/h2&gt;&lt;p&gt;The Rust implementation&amp;rsquo;s &lt;code&gt;ConversationRuntime::run_turn()&lt;/code&gt; consists of &lt;strong&gt;152 lines in a single &lt;code&gt;loop {}&lt;/code&gt;&lt;/strong&gt; (conversation.rs:183-272). Of the 7 TS continue paths, only &lt;code&gt;next_turn&lt;/code&gt; (the next turn after tool execution) exists in Rust.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;TS Continue Reason&lt;/th&gt;
 &lt;th&gt;Rust Status&lt;/th&gt;
 &lt;th&gt;Why&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;collapse_drain_retry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No context collapse&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;reactive_compact_retry&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No 413 recovery&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_output_tokens_escalate&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No 8k-&amp;gt;64k escalation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_output_tokens_recovery&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No multi-turn nudge&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;stop_hook_blocking&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No stop hooks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;token_budget_continuation&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;td&gt;No token budget system&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;next_turn&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Implemented&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Re-calls API after tool results&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="the-most-critical-gap-synchronous-api-consumption"&gt;The Most Critical Gap: Synchronous API Consumption
&lt;/h3&gt;&lt;p&gt;The Rust &lt;code&gt;ApiClient&lt;/code&gt; trait signature says it all:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;: &lt;span class="nc"&gt;ApiRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AssistantEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RuntimeError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The return type is &lt;code&gt;Vec&amp;lt;AssistantEvent&amp;gt;&lt;/code&gt;. &lt;strong&gt;It&amp;rsquo;s not streaming.&lt;/strong&gt; Despite its name, &lt;code&gt;stream()&lt;/code&gt; collects all SSE events and returns them as a single vector. When the model calls 5 ReadFiles, TS can finish executing the first ReadFile while the response is still streaming, but Rust must wait for all 5 tool blocks to arrive before it starts executing them sequentially. &lt;strong&gt;The latency gap grows with the number of tool calls.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="6-rust-prototype--bridging-the-gap"&gt;6. Rust Prototype &amp;ndash; Bridging the Gap
&lt;/h2&gt;&lt;p&gt;In the S04 prototype, we implemented an orchestration layer that bridges 3 P0 gaps:&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart LR
 subgraph TS["TS Streaming Pipeline"]
 direction TB
 ts1["SSE event stream"]
 ts2["StreamingToolExecutor&amp;lt;br/&amp;gt;4-state machine"]
 ts3["getCompletedResults()&amp;lt;br/&amp;gt;guaranteed yield order"]
 ts1 --&gt; ts2 --&gt; ts3
 end

 subgraph Rust["Rust Prototype"]
 direction TB
 rs1["EventStream&amp;lt;br/&amp;gt;tokio async"]
 rs2["StreamingPipeline&amp;lt;br/&amp;gt;tokio::spawn + mpsc"]
 rs3["Post-MessageEnd&amp;lt;br/&amp;gt;channel collect + sort"]
 rs1 --&gt; rs2 --&gt; rs3
 end

 subgraph Bridge["Core Mappings"]
 direction TB
 b1["yield -&gt; tx.send()"]
 b2["yield* -&gt; channel forwarding"]
 b3["for await -&gt; while let recv()"]
 end

 TS ~~~ Bridge ~~~ Rust

 style TS fill:#e1f5fe
 style Rust fill:#fff3e0
 style Bridge fill:#f3e5f5&lt;/pre&gt;&lt;h3 id="3-key-implementations-in-the-prototype"&gt;3 Key Implementations in the Prototype
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;1. Async streaming&lt;/strong&gt;: Extended the &lt;code&gt;ApiClient&lt;/code&gt; trait to an async stream. Since &lt;code&gt;MessageStream::next_event()&lt;/code&gt; is already async, only the consumer side needed changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Tool pipelining&lt;/strong&gt;: On receiving a &lt;code&gt;ToolUseEnd&lt;/code&gt; event, assembles a &lt;code&gt;ToolCall&lt;/code&gt; from accumulated input and immediately starts background execution via &lt;code&gt;tokio::spawn&lt;/code&gt;. Collects results in completion order via &lt;code&gt;mpsc::unbounded_channel&lt;/code&gt;, then sorts back to original order.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. 3-tier concurrency&lt;/strong&gt;: Partitions by &lt;code&gt;ToolCategory&lt;/code&gt; enum (ReadOnly/Write/BashLike). ReadOnly batches use &lt;code&gt;Semaphore(10)&lt;/code&gt; + &lt;code&gt;tokio::spawn&lt;/code&gt; for up to 10 parallel tasks. BashLike runs sequentially with remaining tasks aborted on error.&lt;/p&gt;
&lt;h3 id="prototype-coverage"&gt;Prototype Coverage
&lt;/h3&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;TS Feature&lt;/th&gt;
 &lt;th&gt;Prototype&lt;/th&gt;
 &lt;th&gt;Status&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;partitionToolCalls()&lt;/code&gt; 3-tier&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;partition_into_runs()&lt;/code&gt; + &lt;code&gt;ToolCategory&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;runToolsConcurrently()&lt;/code&gt; max 10&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;Semaphore(10)&lt;/code&gt; + &lt;code&gt;tokio::spawn&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;siblingAbortController&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;break&lt;/code&gt; on BashLike error&lt;/td&gt;
 &lt;td&gt;Simplified&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;StreamingToolExecutor.addTool()&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;tokio::spawn&lt;/code&gt; on &lt;code&gt;ToolUseEnd&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;PreToolUse hook deny/allow&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;HookDecision::Allow/Deny&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;PostToolUse output transform&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;HookResult::transformed_output&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4-state machine (queued-&amp;gt;yielded)&lt;/td&gt;
 &lt;td&gt;spawned/completed 2-state&lt;/td&gt;
 &lt;td&gt;Incomplete&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;413 recovery / max_output escalation&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;preventContinuation&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;&lt;/td&gt;
 &lt;td&gt;Not implemented&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="stop-condition-comparison"&gt;Stop Condition Comparison
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Condition&lt;/th&gt;
 &lt;th&gt;TS&lt;/th&gt;
 &lt;th&gt;Rust&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;No tools (end_turn)&lt;/td&gt;
 &lt;td&gt;Execute &lt;code&gt;handleStopHooks()&lt;/code&gt; then exit&lt;/td&gt;
 &lt;td&gt;Immediate &lt;code&gt;break&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Token budget exceeded&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;checkTokenBudget()&lt;/code&gt; with 3 decisions&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_output_tokens&lt;/td&gt;
 &lt;td&gt;Escalation + multi-turn recovery&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;413 prompt-too-long&lt;/td&gt;
 &lt;td&gt;Context collapse + reactive compaction&lt;/td&gt;
 &lt;td&gt;Error propagation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;maxTurns&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;maxTurns&lt;/code&gt; parameter (query.ts:1696)&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;max_iterations&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Diminishing returns&lt;/td&gt;
 &lt;td&gt;3+ turns with &amp;lt;500 token increase&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;checkTokenBudget()&lt;/code&gt; in &lt;code&gt;tokenBudget.ts&lt;/code&gt; (93 lines) controls &lt;strong&gt;whether to continue responding, not prompt size&lt;/strong&gt;. Two constants drive the decision: &lt;code&gt;COMPLETION_THRESHOLD = 0.9&lt;/code&gt; means the loop continues while usage is below 90% of the total budget, and &lt;code&gt;DIMINISHING_THRESHOLD = 500&lt;/code&gt; means it stops once 3 or more consecutive turns each produce fewer than 500 tokens, a sign of diminishing returns. The &lt;code&gt;nudgeMessage&lt;/code&gt; explicitly instructs &amp;ldquo;do not summarize.&amp;rdquo;&lt;/p&gt;
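&lt;p&gt;The Rust prototype has no equivalent yet, so the following is only a sketch of how the two thresholds could combine. The constants come from the description above; &lt;code&gt;DIMINISHING_TURNS&lt;/code&gt;, &lt;code&gt;BudgetDecision&lt;/code&gt;, and the function signature are hypothetical, and the three decisions of the TS version are collapsed into two outcomes because their names are not given here.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// The two thresholds are quoted from the post; everything else is an assumed shape.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;const COMPLETION_THRESHOLD: f64 = 0.9;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;const DIMINISHING_THRESHOLD: u64 = 500;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;const DIMINISHING_TURNS: usize = 3; // hypothetical name for the 3+ turns rule
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;#[derive(Debug, PartialEq)]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;enum BudgetDecision { Continue, Stop }
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// `deltas` lists the tokens produced per turn, oldest first.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;fn check_token_budget(used: u64, budget: u64, deltas: &amp;amp;[u64]) -&amp;gt; BudgetDecision {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    // Diminishing returns: the last 3 turns each stayed under 500 tokens.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    let stalled = deltas.len() &amp;gt;= DIMINISHING_TURNS
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;        &amp;amp;&amp;amp; deltas.iter().rev().take(DIMINISHING_TURNS).all(|d| *d &amp;lt; DIMINISHING_THRESHOLD);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    // Continue only while usage is below 90% of the budget.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    let below = (used as f64) &amp;lt; COMPLETION_THRESHOLD * (budget as f64);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    if below &amp;amp;&amp;amp; !stalled { BudgetDecision::Continue } else { BudgetDecision::Stop }
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;fn main() {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    assert_eq!(check_token_budget(3000, 10000, &amp;amp;[2000, 1000]), BudgetDecision::Continue);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    assert_eq!(check_token_budget(9500, 10000, &amp;amp;[2000]), BudgetDecision::Stop);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    assert_eq!(check_token_budget(3000, 10000, &amp;amp;[400, 300, 100]), BudgetDecision::Stop);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;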
&lt;h2 id="the-core-design-decision--why-asyncgenerator"&gt;The Core Design Decision &amp;ndash; Why AsyncGenerator
&lt;/h2&gt;&lt;p&gt;The entire pipeline is an &lt;code&gt;async function*&lt;/code&gt; chain:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;QueryEngine.submitMessage()* -&amp;gt; query()* -&amp;gt; queryLoop()* -&amp;gt; deps.callModel()*
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;runTools()* -&amp;gt; runToolUse()* -&amp;gt; handleStopHooks()* -&amp;gt; executeStopHooks()*
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The key benefit of this choice: &lt;strong&gt;implementing complex state machines without inversion of control&lt;/strong&gt;. At each of the 7 &lt;code&gt;continue&lt;/code&gt; paths, you construct state explicitly with &lt;code&gt;state = { ... }&lt;/code&gt; and &lt;code&gt;continue&lt;/code&gt;. With a callback-based approach, state management would be scattered, making it difficult to guarantee consistency across 7 recovery paths.&lt;/p&gt;
&lt;p&gt;In Rust, since the &lt;code&gt;yield&lt;/code&gt; keyword isn&amp;rsquo;t stabilized, &lt;code&gt;tokio::sync::mpsc&lt;/code&gt; channels serve as the replacement: &lt;code&gt;yield&lt;/code&gt; becomes &lt;code&gt;tx.send()&lt;/code&gt;, &lt;code&gt;yield*&lt;/code&gt; becomes channel forwarding, and &lt;code&gt;for await...of&lt;/code&gt; becomes &lt;code&gt;while let Some(v) = rx.recv().await&lt;/code&gt;.&lt;/p&gt;
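&lt;p&gt;The mapping can be sketched with the standard library alone. The example below swaps in &lt;code&gt;std::sync::mpsc&lt;/code&gt; and threads for tokio so that it runs without dependencies; with tokio the same shape applies, except that tasks replace threads and &lt;code&gt;recv()&lt;/code&gt; is awaited. The functions &lt;code&gt;inner&lt;/code&gt;, &lt;code&gt;outer&lt;/code&gt;, and &lt;code&gt;collect_events&lt;/code&gt; are hypothetical names, not part of the prototype.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;use std::sync::mpsc;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;use std::thread;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Inner producer: each send stands in for a `yield`.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;fn inner(tx: mpsc::Sender&amp;lt;u32&amp;gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    for i in 1..3 {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;        tx.send(i).unwrap();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;// Outer producer: forwarding the inner channel stands in for `yield*`.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;fn outer(tx: mpsc::Sender&amp;lt;u32&amp;gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    tx.send(0).unwrap();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    let (inner_tx, inner_rx) = mpsc::channel();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    let handle = thread::spawn(move || inner(inner_tx));
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    // The loop ends once `inner` returns and drops its sender.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    for v in inner_rx {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;        tx.send(v).unwrap();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    handle.join().unwrap();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    tx.send(3).unwrap();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;fn collect_events() -&amp;gt; Vec&amp;lt;u32&amp;gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    let (tx, rx) = mpsc::channel();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    let handle = thread::spawn(move || outer(tx));
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    let mut out = Vec::new();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    // Consumer loop: the counterpart of `for await...of`.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    while let Ok(v) = rx.recv() {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;        out.push(v);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    handle.join().unwrap();
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    out
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;fn main() {
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;    assert_eq!(collect_events(), vec![0, 1, 2, 3]);
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Ordering is preserved because each stage forwards values in the order it receives them, which is the property the generator chain relies on.&lt;/p&gt;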
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;query.ts&amp;rsquo;s 7 continue paths are not &amp;ldquo;error handling&amp;rdquo; but a &amp;ldquo;resilience engine&amp;rdquo;&lt;/strong&gt; &amp;ndash; It collapses context on 413 errors, escalates tokens on max_output, and feeds back errors to the model on stop hook blocking. This recovery pipeline ensures stability during long-running autonomous tasks. Reproducing this in Rust requires state management beyond a simple &lt;code&gt;loop {}&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;StreamingToolExecutor is a UX decision, not a performance optimization&lt;/strong&gt; &amp;ndash; Executing 5 tools sequentially makes users wait for the sum of all execution times. Pipelining is aimed less at benchmark numbers than at the perceived &amp;ldquo;waiting for a response&amp;rdquo; time. In the Rust prototype, we implemented this in under 20 lines using &lt;code&gt;tokio::spawn&lt;/code&gt; + &lt;code&gt;mpsc&lt;/code&gt; channels.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The dual structure of static partitioning + runtime concurrency balances safety and performance&lt;/strong&gt; &amp;ndash; &lt;code&gt;partitionToolCalls()&lt;/code&gt; divides tool calls into batches up front, when the batches are built, while &lt;code&gt;canExecuteTool()&lt;/code&gt; judges executability at runtime. Thanks to this dual structure, the non-streaming path (&lt;code&gt;runTools&lt;/code&gt;) and the streaming path (&lt;code&gt;StreamingToolExecutor&lt;/code&gt;) share identical concurrency semantics.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;Next post: &lt;a class="link" href="https://ice-ice-bear.github.io/posts/2026-04-06-harness-anatomy-3/" &gt;#3 &amp;ndash; The Design Philosophy of 42 Tools, from BashTool to AgentTool&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;</description></item></channel></rss>