Streaming on ICE-ICE-BEAR-BLOG

Claude Code Harness Anatomy #1 — From Entry Point to Response: The Journey of a Single Request

Mon, 06 Apr 2026 00:00:00 +0900

Overview

This is the first post in a series that systematically dissects Claude Code’s source structure across 27 sessions. In this post, we trace the complete call stack across 11 TypeScript files that a “hello” typed into the terminal traverses before a response appears on screen.

Analysis Target: 11 Core Files

#	Path	Lines	Role
1	`entrypoints/cli.tsx`	302	CLI bootstrap, argument parsing, mode routing
2	`main.tsx`	4,683	Main REPL component, Commander setup
3	`commands.ts`	754	Command registry
4	`context.ts`	189	System prompt assembly, CLAUDE.md injection
5	`QueryEngine.ts`	1,295	Session management, SDK interface
6	`query.ts`	1,729	Core turn loop — API + tool execution
7	`services/api/client.ts`	389	HTTP client, 4-provider routing
8	`services/api/claude.ts`	3,419	Messages API wrapper, SSE streaming, retries
9	`services/tools/toolOrchestration.ts`	188	Concurrency partitioning
10	`services/tools/StreamingToolExecutor.ts`	530	Tool execution during streaming
11	`services/tools/toolExecution.ts`	1,745	Tool dispatch, permission checks

We trace a total of 15,223 lines.

1. Entry and Bootstrap: cli.tsx -> main.tsx

cli.tsx is only 302 lines, yet it contains a surprising number of fast-path branches:

cli.tsx:37 --version -> immediate output, 0 imports
cli.tsx:53 --dump-system -> minimal imports
cli.tsx:100 --daemon-worker -> worker-only path
cli.tsx:112 remote-control -> bridge mode
cli.tsx:185 ps/logs/attach -> background sessions
cli.tsx:293 default path -> dynamic import of main.tsx

Design intent: Avoid loading main.tsx’s 4,683 lines just for --version. This optimization directly impacts the perceived responsiveness of the CLI tool.

The default path dynamically imports main.tsx:

// cli.tsx:293-297
const { main: cliMain } = await import('../main.js');
await cliMain();

The reason main.tsx is 4,683 lines is that it includes all of the following:

Side-effect imports (lines 1-209): profileCheckpoint, startMdmRawRead, startKeychainPrefetch — parallel subprocesses launched at module evaluation time to hide the ~65ms macOS keychain read
Commander setup (line 585+): CLI argument parsing, 10+ mode-specific branches
React/Ink REPL rendering: Terminal UI mount
Headless path (-p/--print): Uses QueryEngine directly without UI

2. Prompt Assembly: context.ts’s dual-memoize

context.ts is a small file at 189 lines, but it handles all dynamic parts of the system prompt. Two memoized functions are at its core:

getSystemContext() (context.ts:116): Collects git state (branch, status, recent commits)
getUserContext() (context.ts:155): Discovers and parses CLAUDE.md files

Why the separation? It’s directly tied to the Anthropic Messages API’s prompt caching strategy. Since the cache lifetimes of the system prompt and user context differ, cache_control must be applied differently to each. Wrapping them in memoize ensures each is computed only once per session.

The call to setCachedClaudeMdContent() at context.ts:170-176 is a mechanism to break circular dependencies — yoloClassifier needs CLAUDE.md content, but a direct import would create a permissions -> yoloClassifier -> claudemd -> permissions cycle.

3. AsyncGenerator Chain: The Architectural Spine

Claude Code’s entire data flow is built on an AsyncGenerator chain:

QueryEngine.submitMessage()* -> query()* -> queryLoop()* -> queryModelWithStreaming()*

Every core function is an async function*. This isn’t just an implementation choice — it’s an architectural decision:

Backpressure: When the consumer is slow, the producer waits
Cancellation: Combined with AbortController for immediate cancellation
Composition: yield* naturally chains generators together
State management: Local variables within loops naturally maintain state across turns

Looking at the signature of QueryEngine.submitMessage() (QueryEngine.ts:209):

async *submitMessage(
 prompt: string | ContentBlockParam[],
 options?: { uuid?: string; isMeta?: boolean },
): AsyncGenerator<SDKMessage, void, unknown>

In SDK mode, each message is streamed via yield, and Node.js backpressure is naturally implemented.

4. The Core Turn Loop: query.ts’s while(true)

queryLoop() in query.ts (1,729 lines) is the actual API + tool loop:

// query.ts:307
while (true) {
 // 1. Call queryModelWithStreaming() -> SSE stream
 // 2. Yield streaming events
 // 3. Detect tool calls -> runTools()/StreamingToolExecutor
 // 4. Append tool results to messages
 // 5. stop_reason == "end_turn" -> break
 // stop_reason == "tool_use" -> continue
}

The State type (query.ts:204) is important. It manages loop state as an explicit record with fields like messages, toolUseContext, autoCompactTracking, and maxOutputTokensRecoveryCount, updating everything at once at continue sites.

5. API Communication: 4 Providers and Caching

getAnthropicClient() at client.ts:88 supports 4 providers:

Provider	SDK	Reason for Dynamic Import
Anthropic Direct	`Anthropic`	Default, loaded immediately
AWS Bedrock	`AnthropicBedrock`	AWS SDK is several MB
Azure Foundry	`AnthropicFoundry`	Azure Identity is several MB
GCP Vertex	`AnthropicVertex`	Google Auth is several MB

The core function chain in claude.ts (3,419 lines):

queryModelWithStreaming() (claude.ts:752)
 -> queryModel()
 -> withRetry()
 -> anthropic.beta.messages.stream() (SDK call)

The caching strategy is determined by getCacheControl() (claude.ts:358), which decides the 1-hour TTL based on user type, feature flags, and query source.

6. Tool Orchestration: 3-Tier Concurrency

flowchart TD
 TC["Tool call array<br/>[ReadFile, ReadFile, Bash, ReadFile]"]
 P["partitionToolCalls()<br/>toolOrchestration.ts:91"]
 B1["Batch 1<br/>ReadFile + ReadFile<br/>isConcurrencySafe=true"]
 B2["Batch 2<br/>Bash<br/>isConcurrencySafe=false"]
 B3["Batch 3<br/>ReadFile<br/>isConcurrencySafe=true"]
 PAR["Promise.all()<br/>max 10 concurrent"]
 SEQ["Sequential execution"]
 PAR2["Promise.all()"]

 TC --> P
 P --> B1
 P --> B2
 P --> B3
 B1 --> PAR
 B2 --> SEQ
 B3 --> PAR2

 style B1 fill:#e8f5e9
 style B2 fill:#ffebee
 style B3 fill:#e8f5e9

StreamingToolExecutor (530 lines) extends this batch partitioning into a streaming context. When it detects tool calls while the API response is still streaming, it immediately starts execution:

addTool() (StreamingToolExecutor.ts:76) — Add to queue
processQueue() (StreamingToolExecutor.ts:140) — Check concurrency, then execute immediately
getRemainingResults() (StreamingToolExecutor.ts:453) — Wait for all tools to complete

Error propagation rules: Only Bash errors cancel sibling tools (siblingAbortController). Read/WebFetch errors don’t affect other tools. This reflects the implicit dependencies between Bash commands (if mkdir fails, subsequent commands are pointless).

Full Data Flow

sequenceDiagram
 participant User as User
 participant CLI as cli.tsx
 participant Main as main.tsx
 participant QE as QueryEngine
 participant Query as query.ts
 participant Claude as claude.ts
 participant API as Anthropic API
 participant Tools as toolOrchestration
 participant Exec as toolExecution

 User->>CLI: Types "hello"
 CLI->>Main: dynamic import
 Main->>QE: new QueryEngine()
 QE->>Query: query()
 Query->>Claude: queryModelWithStreaming()
 Claude->>API: anthropic.beta.messages.stream()
 API-->>Claude: SSE stream

 alt stop_reason == end_turn
 Claude-->>User: Output response
 else stop_reason == tool_use
 Claude-->>Query: tool_use blocks
 Query->>Tools: partitionToolCalls()
 Tools->>Exec: runToolUse()
 Exec->>Exec: canUseTool() + tool.call()
 Exec-->>Query: Tool results
 Note over Query: Next iteration of while(true)
 end

Rust Gap Map Preview

Tracing the same request through the Rust port revealed 31 gaps:

Priority	Gap Count	Key Examples
P0 (Critical)	2	Synchronous ApiClient, missing StreamingToolExecutor
P1 (High)	6	3-tier concurrency, prompt caching, Agent tool
P2 (Medium)	7	Multi-provider, effort control, sandbox
Implemented	11	Auto-compaction, SSE parser, OAuth, config loading

Implementation coverage: 36% (11/31). The next post dives deep into the conversation loop at the heart of these gaps.

Insights

AsyncGenerator is the architectural spine — It’s not just an implementation technique but a design decision that simultaneously solves backpressure, cancellation, and composition. In Rust, the Stream trait is the counterpart, but the ergonomics of yield* composition differ significantly.
main.tsx at 4,683 lines is technical debt — Commander setup, React components, and state management are all mixed in a single file. This is the result of organic growth and represents an opportunity for module decomposition.
Tool concurrency is non-trivial — The 3-tier model (read batches, sequential writes, Bash sibling cancellation) rather than “all parallel” or “all sequential” is a core design element of production agent harnesses.

Next post: #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths

Claude Code Harness Anatomy #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths

Mon, 06 Apr 2026 00:00:00 +0900

Overview

In the first post of this series, we traced the journey of a single “hello” through 11 files. This post fully dissects the heart of that journey: the while(true) loop in query.ts’s 1,729 lines. We analyze the resilient execution model created by 7 continue paths, the 4-stage state machine of StreamingToolExecutor, and the 3-tier concurrency model of partitionToolCalls(), then compare how we reproduced these patterns in a Rust prototype.

Analysis Target: 10 Core Files

#	Path	Lines	Role
1	`query/config.ts`	46	Immutable runtime gate snapshot
2	`query/deps.ts`	40	Testable I/O boundary (DI)
3	`query/tokenBudget.ts`	93	Token budget management, auto-continue/stop decisions
4	`query/stopHooks.ts`	473	Stop/TaskCompleted/TeammateIdle hooks
5	`query.ts`	1,729	Core – while(true) turn loop
6	`QueryEngine.ts`	1,295	Session wrapper, SDK interface
7	`toolOrchestration.ts`	188	Tool partitioning + concurrency control
8	`StreamingToolExecutor.ts`	530	SSE mid-stream tool pipelining
9	`toolExecution.ts`	1,745	Tool dispatch, permission checks
10	`toolHooks.ts`	650	Pre/PostToolUse hook pipeline

We dissect a total of 6,789 lines of core orchestration code.

1. queryLoop()’s 7 Continue Paths

The queryLoop() function in query.ts (query.ts:241) is not a simple API call loop. It’s a resilient executor with 7 distinct continue reasons, each handling a unique failure scenario:

Reason	Line	Description
`collapse_drain_retry`	1114	Retry after context collapse drain
`reactive_compact_retry`	1162	Retry after reactive compaction (413 recovery)
`max_output_tokens_escalate`	1219	Token escalation from 8k -> 64k
`max_output_tokens_recovery`	1248	Inject “continue writing” nudge message
`stop_hook_blocking`	1303	Stop hook returned a blocking error
`token_budget_continuation`	1337	Continue due to remaining token budget
`next_turn`	1725	Next turn after tool execution completes

The State type is key (query.ts:204-217). Loop state is managed as a record with 10 fields. Why a record instead of individual variables? There are 7 continue sites, each updating via state = { ... } all at once. Individually assigning 9 variables makes it easy to miss one. Record updates let the type system catch omissions.

Full Flow of a Single Loop Iteration

1. Preprocessing (365-447): snip compaction, micro-compact, context collapse
2. Auto-compaction (454-543): on success, replace messages and continue
3. Blocking limit check (628-648): immediate termination if token threshold exceeded
4. API streaming (654-863): consume SSE events via for-await
5. No-tool exit paths (1062-1357): 413 recovery, max_output recovery, stop hooks
6. Tool continuation paths (1360-1728): execute remaining tools -> next_turn

2. StreamingToolExecutor’s 4-Stage State Machine

StreamingToolExecutor.ts (530 lines) is the most sophisticated concurrency pattern in Claude Code. The core idea: start executing completed tool calls while the API response is still streaming.

When the model calls [ReadFile("a.ts"), ReadFile("b.ts"), Bash("make test")] at once, without pipelining, execution only begins after all three tool blocks have arrived. With pipelining, file reading starts the instant the ReadFile("a.ts") block completes.

stateDiagram-v2
 [*] --> queued: addTool()
 queued --> executing: processQueue()<br/>canExecuteTool() == true
 queued --> completed: Pre-canceled<br/>getAbortReason() != null

 executing --> completed: Tool execution finished<br/>or sibling abort

 completed --> yielded: getCompletedResults()<br/>yield in order

 yielded --> [*]

 note right of queued
 processQueue() auto-triggers
 on addTool() and prior
 tool completion
 end note

 note right of completed
 On Bash error:
 siblingAbortController.abort()
 cancels sibling tools only
 end note

Concurrency Decision Logic (canExecuteTool, line 129)

Execution conditions:
 - No tools currently executing (executingTools.length === 0)
 - Or: this tool is concurrencySafe AND all executing tools are also concurrencySafe

Read-only tools can execute in parallel, but if even one write tool is present, the next tool waits until it finishes.

siblingAbortController – Hierarchical Cancellation

siblingAbortController (line 46-61) is a child of toolUseContext.abortController. When a Bash tool throws an error, it calls siblingAbortController.abort('sibling_error') to cancel only sibling tools. The parent controller is unaffected, so the overall query continues.

Why do only Bash errors cancel siblings? In mkdir -p dir && cd dir && make, if mkdir fails, subsequent commands are pointless. ReadFile or WebFetch failures are independent and shouldn’t affect other tools.

3. partitionToolCalls – 3-Tier Concurrency Model

toolOrchestration.ts (188 lines) defines the entire concurrency model for tool execution.

flowchart TD
 TC["Tool call array<br/>[ReadFile, ReadFile, Bash, ReadFile]"]
 P["partitionToolCalls()<br/>toolOrchestration.ts:91"]
 B1["Batch 1<br/>ReadFile + ReadFile<br/>isConcurrencySafe=true"]
 B2["Batch 2<br/>Bash<br/>isConcurrencySafe=false"]
 B3["Batch 3<br/>ReadFile<br/>isConcurrencySafe=true"]
 PAR["Promise.all()<br/>max 10 concurrent"]
 SEQ["Sequential execution"]
 PAR2["Promise.all()"]

 TC --> P
 P --> B1
 P --> B2
 P --> B3
 B1 --> PAR
 B2 --> SEQ
 B3 --> PAR2

 style B1 fill:#e8f5e9
 style B2 fill:#ffebee
 style B3 fill:#e8f5e9

The rule is simple: consecutive isConcurrencySafe tools are grouped into a single batch, while non-safe tools each become independent batches. This decision comes from the tool definition itself — determined by calling tool.isConcurrencySafe(parsedInput). The same tool may have different concurrency safety depending on its input.

Context Modifiers and Race Conditions

Why apply them in order after the batch completes? Applying context modifiers immediately during parallel execution creates race conditions. If A completes first and modifies the context, B (still executing) started with the pre-modification context but would see the post-modification state. Applying them in original tool order after batch completion guarantees deterministic results (toolOrchestration.ts:54-62).

4. Tool Execution Pipeline and Hooks

runToolUse() in toolExecution.ts (1,745 lines, line 337) manages the complete lifecycle of each individual tool call:

runToolUse() entry point
 1. findToolByName() -- retry with deprecated aliases (345-356)
 2. abort check -- if already canceled, return CANCEL_MESSAGE (415)
 3. streamedCheckPermissionsAndCallTool() -- permissions + execution + hooks (455)
 -> checkPermissionsAndCallTool():
 a. Zod schema input validation (615)
 b. tool.validateInput() custom validation (683)
 c. Speculative classifier (Bash only, 740)
 d. runPreToolUseHooks() (800)
 e. resolveHookPermissionDecision() (921)
 f. tool.call() actual execution (1207)
 g. runPostToolUseHooks() result transformation

The Core Invariant of resolveHookPermissionDecision

In resolveHookPermissionDecision() (toolHooks.ts:332), a hook’s allow does not bypass settings.json deny/ask rules (toolHooks.ts:373). Even if a hook allows, it must still pass checkRuleBasedPermissions(). This reflects the design principle that “hooks are automation helpers, not security bypasses.”

When hook result is allow:
 -> Call checkRuleBasedPermissions()
 -> null means pass (no rules)
 -> deny means rule overrides hook
 -> ask means user prompt required

5. Rust Comparison – 152 Lines vs 1,729 Lines

Rust’s ConversationRuntime::run_turn() consists of 152 lines in a single loop {} (conversation.rs:183-272). Of the 7 TS continue paths, only next_turn (next turn after tool execution) exists in Rust.

TS Continue Reason	Rust Status	Why
`collapse_drain_retry`	Not implemented	No context collapse
`reactive_compact_retry`	Not implemented	No 413 recovery
`max_output_tokens_escalate`	Not implemented	No 8k->64k escalation
`max_output_tokens_recovery`	Not implemented	No multi-turn nudge
`stop_hook_blocking`	Not implemented	No stop hooks
`token_budget_continuation`	Not implemented	No token budget system
`next_turn`	Implemented	Re-calls API after tool results

The Most Critical Gap: Synchronous API Consumption

The Rust ApiClient trait signature says it all:

fn stream(&mut self, request: ApiRequest) -> Result<Vec<AssistantEvent>, RuntimeError>;

The return type is Vec<AssistantEvent>. It’s not streaming. It collects all SSE events and returns them as a vector. This means when the model calls 5 ReadFiles, TS can finish executing the first ReadFile while still streaming, but Rust must wait for all 5 to finish streaming before starting sequential execution. The latency gap grows proportionally with the number of tools.

6. Rust Prototype – Bridging the Gap

In the S04 prototype, we implemented an orchestration layer that bridges 3 P0 gaps:

flowchart LR
 subgraph TS["TS Streaming Pipeline"]
 direction TB
 ts1["SSE event stream"]
 ts2["StreamingToolExecutor<br/>4-state machine"]
 ts3["getCompletedResults()<br/>guaranteed yield order"]
 ts1 --> ts2 --> ts3
 end

 subgraph Rust["Rust Prototype"]
 direction TB
 rs1["EventStream<br/>tokio async"]
 rs2["StreamingPipeline<br/>tokio::spawn + mpsc"]
 rs3["Post-MessageEnd<br/>channel collect + sort"]
 rs1 --> rs2 --> rs3
 end

 subgraph Bridge["Core Mappings"]
 direction TB
 b1["yield -> tx.send()"]
 b2["yield* -> channel forwarding"]
 b3["for await -> while let recv()"]
 end

 TS ~~~ Bridge ~~~ Rust

 style TS fill:#e1f5fe
 style Rust fill:#fff3e0
 style Bridge fill:#f3e5f5

3 Key Implementations in the Prototype

1. Async streaming: Extended the ApiClient trait to an async stream. Since MessageStream::next_event() is already async, only the consumer side needed changes.

2. Tool pipelining: On receiving a ToolUseEnd event, assembles a ToolCall from accumulated input and immediately starts background execution via tokio::spawn. Collects results in completion order via mpsc::unbounded_channel, then sorts back to original order.

3. 3-tier concurrency: Partitions by ToolCategory enum (ReadOnly/Write/BashLike). ReadOnly batches use Semaphore(10) + tokio::spawn for up to 10 parallel tasks. BashLike runs sequentially with remaining tasks aborted on error.

Prototype Coverage

TS Feature	Prototype	Status
`partitionToolCalls()` 3-tier	`partition_into_runs()` + `ToolCategory`	Implemented
`runToolsConcurrently()` max 10	`Semaphore(10)` + `tokio::spawn`	Implemented
`siblingAbortController`	`break` on BashLike error	Simplified
`StreamingToolExecutor.addTool()`	`tokio::spawn` on `ToolUseEnd`	Implemented
PreToolUse hook deny/allow	`HookDecision::Allow/Deny`	Implemented
PostToolUse output transform	`HookResult::transformed_output`	Implemented
4-state machine (queued->yielded)	spawned/completed 2-state	Incomplete
413 recovery / max_output escalation	–	Not implemented
`preventContinuation`	–	Not implemented

Stop Condition Comparison

Condition	TS	Rust
No tools (end_turn)	Execute `handleStopHooks()` then exit	Immediate `break`
Token budget exceeded	`checkTokenBudget()` with 3 decisions	None
max_output_tokens	Escalation + multi-turn recovery	None
413 prompt-too-long	Context collapse + reactive compaction	Error propagation
maxTurns	`maxTurns` parameter (query.ts:1696)	`max_iterations`
Diminishing returns	3+ turns with <500 token increase	None

checkTokenBudget() in tokenBudget.ts (93 lines) controls whether to continue responding, not prompt size. COMPLETION_THRESHOLD = 0.9 (continue if below 90% of total budget), DIMINISHING_THRESHOLD = 500 (stop if 3+ consecutive turns each produce fewer than 500 tokens, indicating diminishing returns). The nudgeMessage explicitly instructs “do not summarize.”

The Core Design Decision – Why AsyncGenerator

The entire pipeline is an async function* chain:

QueryEngine.submitMessage()* -> query()* -> queryLoop()* -> deps.callModel()*
runTools()* -> runToolUse()* -> handleStopHooks()* -> executeStopHooks()*

The key benefit of this choice: implementing complex state machines without inversion of control. At each of the 7 continue paths, you construct state explicitly with state = { ... } and continue. With a callback-based approach, state management would be scattered, making it difficult to guarantee consistency across 7 recovery paths.

In Rust, since the yield keyword isn’t stabilized, tokio::sync::mpsc channels serve as the replacement. yield -> tx.send(), yield* -> channel forwarding, for await...of -> while let Some(v) = rx.recv().

Insights

query.ts’s 7 continue paths are not “error handling” but a “resilience engine” – It collapses context on 413 errors, escalates tokens on max_output, and feeds back errors to the model on stop hook blocking. This recovery pipeline ensures stability during long-running autonomous tasks. Reproducing this in Rust requires state management beyond a simple loop {}.
StreamingToolExecutor is a UX decision, not a performance optimization – Executing 5 tools sequentially makes users wait for the sum of all execution times. Pipelining reduces not benchmark numbers but the perceived “waiting for a response” time. In the Rust prototype, we implemented this in under 20 lines using tokio::spawn + mpsc channels.
The dual structure of static partitioning + runtime concurrency balances safety and performance – partitionToolCalls() divides batches at build time, while canExecuteTool() judges executability at runtime. Thanks to this dual structure, the non-streaming path (runTools) and the streaming path (StreamingToolExecutor) share identical concurrency semantics.

Next post: #3 – The Design Philosophy of 42 Tools, from BashTool to AgentTool