Harness-Engineering on ICE-ICE-BEAR-BLOG

Claude Code Harness Anatomy #1 — From Entry Point to Response: The Journey of a Single Request

Mon, 06 Apr 2026 00:00:00 +0900

Overview

This is the first post in a series that systematically dissects Claude Code’s source structure across 27 sessions. In this post, we trace the complete call stack across 11 TypeScript files that a “hello” typed into the terminal traverses before a response appears on screen.

Analysis Target: 11 Core Files

#	Path	Lines	Role
1	`entrypoints/cli.tsx`	302	CLI bootstrap, argument parsing, mode routing
2	`main.tsx`	4,683	Main REPL component, Commander setup
3	`commands.ts`	754	Command registry
4	`context.ts`	189	System prompt assembly, CLAUDE.md injection
5	`QueryEngine.ts`	1,295	Session management, SDK interface
6	`query.ts`	1,729	Core turn loop — API + tool execution
7	`services/api/client.ts`	389	HTTP client, 4-provider routing
8	`services/api/claude.ts`	3,419	Messages API wrapper, SSE streaming, retries
9	`services/tools/toolOrchestration.ts`	188	Concurrency partitioning
10	`services/tools/StreamingToolExecutor.ts`	530	Tool execution during streaming
11	`services/tools/toolExecution.ts`	1,745	Tool dispatch, permission checks

We trace a total of 15,223 lines.

1. Entry and Bootstrap: cli.tsx -> main.tsx

cli.tsx is only 302 lines, yet it contains a surprising number of fast-path branches:

cli.tsx:37 --version -> immediate output, 0 imports
cli.tsx:53 --dump-system -> minimal imports
cli.tsx:100 --daemon-worker -> worker-only path
cli.tsx:112 remote-control -> bridge mode
cli.tsx:185 ps/logs/attach -> background sessions
cli.tsx:293 default path -> dynamic import of main.tsx

Design intent: Avoid loading main.tsx’s 4,683 lines just for --version. This optimization directly impacts the perceived responsiveness of the CLI tool.

The default path dynamically imports main.tsx:

// cli.tsx:293-297
const { main: cliMain } = await import('../main.js');
await cliMain();
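The fast-path idea can be sketched in a few lines. This is only an illustration, not the actual cli.tsx: `routeCli`, the `loadMain` callback, and the placeholder version string are hypothetical names introduced here.

```typescript
// Hypothetical sketch: cheap flags are answered before the heavy module loads.
type MainModule = { main: () => Promise<string> };

async function routeCli(
  argv: string[],
  loadMain: () => Promise<MainModule>, // stands in for `() => import('../main.js')`
): Promise<string> {
  // Fast path: --version never touches the heavy module.
  if (argv.includes("--version")) {
    return "0.0.0-placeholder";
  }
  // Default path: pay the import cost only when the REPL is actually needed.
  const { main } = await loadMain();
  return main();
}
```

Passing the loader as a parameter is a choice made for this sketch so that the "was main loaded?" question is easy to observe; the real entry point calls `import()` inline.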

The reason main.tsx is 4,683 lines is that it includes all of the following:

Side-effect imports (lines 1-209): profileCheckpoint, startMdmRawRead, startKeychainPrefetch — parallel subprocesses launched at module evaluation time to hide the ~65ms macOS keychain read
Commander setup (line 585+): CLI argument parsing, 10+ mode-specific branches
React/Ink REPL rendering: Terminal UI mount
Headless path (-p/--print): Uses QueryEngine directly without UI

2. Prompt Assembly: context.ts’s dual-memoize

context.ts is a small file at 189 lines, but it handles all dynamic parts of the system prompt. Two memoized functions are at its core:

getSystemContext() (context.ts:116): Collects git state (branch, status, recent commits)
getUserContext() (context.ts:155): Discovers and parses CLAUDE.md files

Why the separation? It’s directly tied to the Anthropic Messages API’s prompt caching strategy. Since the cache lifetimes of the system prompt and user context differ, cache_control must be applied differently to each. Wrapping them in memoize ensures each is computed only once per session.
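A minimal sketch of the dual-memoize shape, assuming a hand-rolled `memoizeOnce` helper. The helper, the `reads` counter, and the returned objects are illustrative stand-ins, not the contents of context.ts.

```typescript
// Zero-argument memoizer: the wrapped function runs at most once.
function memoizeOnce<T>(compute: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: compute() };
    }
    return cached.value;
  };
}

// Counts how often each expensive source is actually computed.
const reads = { git: 0, claudeMd: 0 };

// Stand-in for getSystemContext(): pretend this shells out to git.
const getSystemContextSketch = memoizeOnce(() => {
  reads.git += 1;
  return { branch: "main", status: "clean" };
});

// Stand-in for getUserContext(): pretend this discovers CLAUDE.md files.
const getUserContextSketch = memoizeOnce(() => {
  reads.claudeMd += 1;
  return { claudeMd: "# Project rules" };
});
```

Keeping two separate memoized functions means each half can be attached to the request with its own caching policy, while both are still computed once per session.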

The call to setCachedClaudeMdContent() at context.ts:170-176 is a mechanism to break circular dependencies — yoloClassifier needs CLAUDE.md content, but a direct import would create a permissions -> yoloClassifier -> claudemd -> permissions cycle.

3. AsyncGenerator Chain: The Architectural Spine

Claude Code’s entire data flow is built on an AsyncGenerator chain:

QueryEngine.submitMessage()* -> query()* -> queryLoop()* -> queryModelWithStreaming()*

Every core function is an async function*. This isn’t just an implementation choice — it’s an architectural decision:

Backpressure: When the consumer is slow, the producer waits
Cancellation: Combined with AbortController for immediate cancellation
Composition: yield* naturally chains generators together
State management: Local variables within loops naturally maintain state across turns

Looking at the signature of QueryEngine.submitMessage() (QueryEngine.ts:209):

async *submitMessage(
 prompt: string | ContentBlockParam[],
 options?: { uuid?: string; isMeta?: boolean },
): AsyncGenerator<SDKMessage, void, unknown>

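In SDK mode, each message is streamed to the caller via yield. Because an async generator only advances when the consumer requests the next value, backpressure comes for free, with no extra Node.js stream plumbing.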
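To make the chain concrete, here is a small self-contained sketch of three nested async generators with cooperative cancellation. The function names and the hard-coded chunks are assumptions for illustration; only the `yield*` / `for await` / `AbortController` mechanics mirror the text.

```typescript
// Innermost generator: stands in for the streaming model call.
async function* modelStream(signal: AbortSignal): AsyncGenerator<string> {
  for (const chunk of ["Hel", "lo", "!"]) {
    if (signal.aborted) return; // cooperative cancellation point
    yield chunk; // pauses here until the consumer asks for the next value
  }
}

// Middle layer: yield* forwards every chunk without buffering.
async function* turnLoop(signal: AbortSignal): AsyncGenerator<string> {
  yield* modelStream(signal);
}

// Outer layer: what an SDK-style caller would iterate.
async function* submit(signal: AbortSignal): AsyncGenerator<string> {
  yield* turnLoop(signal);
}

// Consumes the chain, optionally aborting after a number of chunks.
async function collect(abortAfter?: number): Promise<string[]> {
  const controller = new AbortController();
  const seen: string[] = [];
  for await (const chunk of submit(controller.signal)) {
    seen.push(chunk);
    if (abortAfter !== undefined && seen.length >= abortAfter) {
      controller.abort();
    }
  }
  return seen;
}
```

Calling `collect()` drains all three chunks, while `collect(1)` stops after the first one: the producer sees the aborted signal the next time it is resumed and returns.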

4. The Core Turn Loop: query.ts’s while(true)

queryLoop() in query.ts (1,729 lines) is the actual API + tool loop:

// query.ts:307
while (true) {
 // 1. Call queryModelWithStreaming() -> SSE stream
 // 2. Yield streaming events
 // 3. Detect tool calls -> runTools()/StreamingToolExecutor
 // 4. Append tool results to messages
 // 5. stop_reason == "end_turn" -> break
 // stop_reason == "tool_use" -> continue
}

The State type (query.ts:204) is important. It manages loop state as an explicit record with fields like messages, toolUseContext, autoCompactTracking, and maxOutputTokensRecoveryCount, updating everything at once at continue sites.
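The "explicit record, rebuilt at each continue site" pattern might look like the sketch below. `LoopState` has only three illustrative fields, the stop reasons are fed in from an array instead of a real API call, and the recovery limit of 3 is an assumption made for this example.

```typescript
type LoopState = {
  messages: string[];
  recoveryCount: number;
  transition: "start" | "next_turn" | "max_output_tokens_recovery";
};

type StopReason = "end_turn" | "tool_use" | "max_tokens";

const MAX_RECOVERY_ATTEMPTS = 3; // assumed limit, not taken from query.ts

function runLoop(stopReasons: StopReason[]): LoopState {
  let state: LoopState = { messages: ["user: hello"], recoveryCount: 0, transition: "start" };
  let call = 0;
  while (true) {
    // Stand-in for the streaming API call: read the next scripted stop reason.
    const reason = stopReasons[call++] ?? "end_turn";
    if (reason === "tool_use") {
      // Continue site: every field is restated, so an omission fails to compile.
      state = {
        messages: [...state.messages, "tool_result"],
        recoveryCount: 0,
        transition: "next_turn",
      };
      continue;
    }
    if (reason === "max_tokens" && state.recoveryCount < MAX_RECOVERY_ATTEMPTS) {
      // Continue site: nudge the model and count the attempt.
      state = {
        messages: [...state.messages, "user: continue"],
        recoveryCount: state.recoveryCount + 1,
        transition: "max_output_tokens_recovery",
      };
      continue;
    }
    return state; // end_turn, or recovery attempts exhausted
  }
}
```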

5. API Communication: 4 Providers and Caching

getAnthropicClient() at client.ts:88 supports 4 providers:

Provider	SDK	Reason for Dynamic Import
Anthropic Direct	`Anthropic`	Default, loaded immediately
AWS Bedrock	`AnthropicBedrock`	AWS SDK is several MB
Azure Foundry	`AnthropicFoundry`	Azure Identity is several MB
GCP Vertex	`AnthropicVertex`	Google Auth is several MB

The core function chain in claude.ts (3,419 lines):

queryModelWithStreaming() (claude.ts:752)
 -> queryModel()
 -> withRetry()
 -> anthropic.beta.messages.stream() (SDK call)

The caching strategy is determined by getCacheControl() (claude.ts:358), which decides whether the 1-hour TTL applies based on user type, feature flags, and query source.

6. Tool Orchestration: 3-Tier Concurrency

flowchart TD
 TC["Tool call array<br/>[ReadFile, ReadFile, Bash, ReadFile]"]
 P["partitionToolCalls()<br/>toolOrchestration.ts:91"]
 B1["Batch 1<br/>ReadFile + ReadFile<br/>isConcurrencySafe=true"]
 B2["Batch 2<br/>Bash<br/>isConcurrencySafe=false"]
 B3["Batch 3<br/>ReadFile<br/>isConcurrencySafe=true"]
 PAR["Promise.all()<br/>max 10 concurrent"]
 SEQ["Sequential execution"]
 PAR2["Promise.all()"]

 TC --> P
 P --> B1
 P --> B2
 P --> B3
 B1 --> PAR
 B2 --> SEQ
 B3 --> PAR2

 style B1 fill:#e8f5e9
 style B2 fill:#ffebee
 style B3 fill:#e8f5e9

StreamingToolExecutor (530 lines) extends this batch partitioning into a streaming context. When it detects tool calls while the API response is still streaming, it immediately starts execution:

addTool() (StreamingToolExecutor.ts:76) — Add to queue
processQueue() (StreamingToolExecutor.ts:140) — Check concurrency, then execute immediately
getRemainingResults() (StreamingToolExecutor.ts:453) — Wait for all tools to complete

Error propagation rules: Only Bash errors cancel sibling tools (siblingAbortController). Read/WebFetch errors don’t affect other tools. This reflects the implicit dependencies between Bash commands (if mkdir fails, subsequent commands are pointless).

Full Data Flow

sequenceDiagram
 participant User as User
 participant CLI as cli.tsx
 participant Main as main.tsx
 participant QE as QueryEngine
 participant Query as query.ts
 participant Claude as claude.ts
 participant API as Anthropic API
 participant Tools as toolOrchestration
 participant Exec as toolExecution

 User->>CLI: Types "hello"
 CLI->>Main: dynamic import
 Main->>QE: new QueryEngine()
 QE->>Query: query()
 Query->>Claude: queryModelWithStreaming()
 Claude->>API: anthropic.beta.messages.stream()
 API-->>Claude: SSE stream

 alt stop_reason == end_turn
 Claude-->>User: Output response
 else stop_reason == tool_use
 Claude-->>Query: tool_use blocks
 Query->>Tools: partitionToolCalls()
 Tools->>Exec: runToolUse()
 Exec->>Exec: canUseTool() + tool.call()
 Exec-->>Query: Tool results
 Note over Query: Next iteration of while(true)
 end

Rust Gap Map Preview

Tracing the same request through the Rust port revealed 31 gaps:

Priority	Gap Count	Key Examples
P0 (Critical)	2	Synchronous ApiClient, missing StreamingToolExecutor
P1 (High)	6	3-tier concurrency, prompt caching, Agent tool
P2 (Medium)	7	Multi-provider, effort control, sandbox
Implemented	11	Auto-compaction, SSE parser, OAuth, config loading

Implementation coverage: 35% (11/31). The next post dives deep into the conversation loop at the heart of these gaps.

Insights

AsyncGenerator is the architectural spine — It’s not just an implementation technique but a design decision that simultaneously solves backpressure, cancellation, and composition. In Rust, the Stream trait is the counterpart, but the ergonomics of yield* composition differ significantly.
main.tsx at 4,683 lines is technical debt — Commander setup, React components, and state management are all mixed in a single file. This is the result of organic growth and represents an opportunity for module decomposition.
Tool concurrency is non-trivial — The 3-tier model (read batches, sequential writes, Bash sibling cancellation) rather than “all parallel” or “all sequential” is a core design element of production agent harnesses.

Next post: #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths

Claude Code Harness Anatomy #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths

Mon, 06 Apr 2026 00:00:00 +0900

Overview

In the first post of this series, we traced the journey of a single “hello” through 11 files. This post fully dissects the heart of that journey: the while(true) loop in query.ts’s 1,729 lines. We analyze the resilient execution model created by 7 continue paths, the 4-stage state machine of StreamingToolExecutor, and the 3-tier concurrency model of partitionToolCalls(), then compare how we reproduced these patterns in a Rust prototype.

Analysis Target: 10 Core Files

#	Path	Lines	Role
1	`query/config.ts`	46	Immutable runtime gate snapshot
2	`query/deps.ts`	40	Testable I/O boundary (DI)
3	`query/tokenBudget.ts`	93	Token budget management, auto-continue/stop decisions
4	`query/stopHooks.ts`	473	Stop/TaskCompleted/TeammateIdle hooks
5	`query.ts`	1,729	Core – while(true) turn loop
6	`QueryEngine.ts`	1,295	Session wrapper, SDK interface
7	`toolOrchestration.ts`	188	Tool partitioning + concurrency control
8	`StreamingToolExecutor.ts`	530	SSE mid-stream tool pipelining
9	`toolExecution.ts`	1,745	Tool dispatch, permission checks
10	`toolHooks.ts`	650	Pre/PostToolUse hook pipeline

We dissect a total of 6,789 lines of core orchestration code.

1. queryLoop()’s 7 Continue Paths

The queryLoop() function in query.ts (query.ts:241) is not a simple API call loop. It’s a resilient executor with 7 distinct continue reasons, each covering a distinct recovery or continuation scenario:

Reason	Line	Description
`collapse_drain_retry`	1114	Retry after context collapse drain
`reactive_compact_retry`	1162	Retry after reactive compaction (413 recovery)
`max_output_tokens_escalate`	1219	Token escalation from 8k -> 64k
`max_output_tokens_recovery`	1248	Inject “continue writing” nudge message
`stop_hook_blocking`	1303	Stop hook returned a blocking error
`token_budget_continuation`	1337	Continue due to remaining token budget
`next_turn`	1725	Next turn after tool execution completes

The State type is key (query.ts:204-217). Loop state is managed as a record with 10 fields. Why a record instead of individual variables? There are 7 continue sites, each rebuilding the state in one step via state = { ... }. Assigning that many variables one by one makes it easy to miss one; with a record, the type system catches any omitted field.

Full Flow of a Single Loop Iteration

1. Preprocessing (365-447): snip compaction, micro-compact, context collapse
2. Auto-compaction (454-543): on success, replace messages and continue
3. Blocking limit check (628-648): immediate termination if token threshold exceeded
4. API streaming (654-863): consume SSE events via for-await
5. No-tool exit paths (1062-1357): 413 recovery, max_output recovery, stop hooks
6. Tool continuation paths (1360-1728): execute remaining tools -> next_turn

2. StreamingToolExecutor’s 4-Stage State Machine

StreamingToolExecutor.ts (530 lines) is the most sophisticated concurrency pattern in Claude Code. The core idea: start executing completed tool calls while the API response is still streaming.

When the model calls [ReadFile("a.ts"), ReadFile("b.ts"), Bash("make test")] at once, without pipelining, execution only begins after all three tool blocks have arrived. With pipelining, file reading starts the instant the ReadFile("a.ts") block completes.

stateDiagram-v2
 [*] --> queued: addTool()
 queued --> executing: processQueue()<br/>canExecuteTool() == true
 queued --> completed: Pre-canceled<br/>getAbortReason() != null

 executing --> completed: Tool execution finished<br/>or sibling abort

 completed --> yielded: getCompletedResults()<br/>yield in order

 yielded --> [*]

 note right of queued
 processQueue() auto-triggers
 on addTool() and prior
 tool completion
 end note

 note right of completed
 On Bash error:
 siblingAbortController.abort()
 cancels sibling tools only
 end note

Concurrency Decision Logic (canExecuteTool, line 129)

Execution conditions:
 - No tools currently executing (executingTools.length === 0)
 - Or: this tool is concurrencySafe AND all executing tools are also concurrencySafe

Concurrency-safe (typically read-only) tools can execute in parallel. While a non-safe tool is executing, every later tool waits until it finishes, and a non-safe tool itself waits until nothing else is running.
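The two execution conditions translate almost directly into a predicate. This is a sketch with a simplified `TrackedTool` shape assumed for illustration, not the class's real internal state.

```typescript
type TrackedTool = { id: string; isConcurrencySafe: boolean };

function canExecuteToolSketch(candidate: TrackedTool, executing: TrackedTool[]): boolean {
  // Nothing running: any tool may start.
  if (executing.length === 0) return true;
  // Otherwise the candidate and everything in flight must all be safe.
  return candidate.isConcurrencySafe && executing.every((tool) => tool.isConcurrencySafe);
}
```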

siblingAbortController – Hierarchical Cancellation

siblingAbortController (lines 46-61) is a child of toolUseContext.abortController. When a Bash tool throws an error, it calls siblingAbortController.abort('sibling_error') to cancel only sibling tools. The parent controller is unaffected, so the overall query continues.

Why do only Bash errors cancel siblings? In mkdir -p dir && cd dir && make, if mkdir fails, subsequent commands are pointless. ReadFile or WebFetch failures are independent and shouldn’t affect other tools.
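Hierarchical cancellation of this kind can be built from the standard AbortController alone. The helper below is a sketch under that assumption; `createChildAbortController` is a name made up for this example and may not match the helper the source actually uses.

```typescript
// Creates a child controller that aborts when the parent does, but not vice versa.
function createChildAbortController(parent: AbortController): AbortController {
  const child = new AbortController();
  if (parent.signal.aborted) {
    child.abort(parent.signal.reason);
  } else {
    // Propagate downward only: the child never touches the parent.
    parent.signal.addEventListener("abort", () => child.abort(parent.signal.reason), {
      once: true,
    });
  }
  return child;
}
```

Aborting the child with `abort("sibling_error")` leaves the parent signal untouched, which is the property that lets the overall query continue after one Bash failure.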

3. partitionToolCalls – 3-Tier Concurrency Model

toolOrchestration.ts (188 lines) defines the entire concurrency model for tool execution.

flowchart TD
 TC["Tool call array<br/>[ReadFile, ReadFile, Bash, ReadFile]"]
 P["partitionToolCalls()<br/>toolOrchestration.ts:91"]
 B1["Batch 1<br/>ReadFile + ReadFile<br/>isConcurrencySafe=true"]
 B2["Batch 2<br/>Bash<br/>isConcurrencySafe=false"]
 B3["Batch 3<br/>ReadFile<br/>isConcurrencySafe=true"]
 PAR["Promise.all()<br/>max 10 concurrent"]
 SEQ["Sequential execution"]
 PAR2["Promise.all()"]

 TC --> P
 P --> B1
 P --> B2
 P --> B3
 B1 --> PAR
 B2 --> SEQ
 B3 --> PAR2

 style B1 fill:#e8f5e9
 style B2 fill:#ffebee
 style B3 fill:#e8f5e9

The rule is simple: consecutive isConcurrencySafe tools are grouped into a single batch, while non-safe tools each become independent batches. This decision comes from the tool definition itself — determined by calling tool.isConcurrencySafe(parsedInput). The same tool may have different concurrency safety depending on its input.
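The grouping rule fits in a short function. The sketch below assumes the safety flag has already been computed per call (in the real code it comes from tool.isConcurrencySafe(parsedInput)); `ToolCall`, `Batch`, and `partitionSketch` are illustrative names.

```typescript
type ToolCall = { name: string; isConcurrencySafe: boolean };
type Batch = { concurrent: boolean; calls: ToolCall[] };

function partitionSketch(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.isConcurrencySafe && last !== undefined && last.concurrent) {
      last.calls.push(call); // extend the current run of safe tools
    } else {
      // A non-safe tool always gets its own batch; a safe tool starts a new run.
      batches.push({ concurrent: call.isConcurrencySafe, calls: [call] });
    }
  }
  return batches;
}
```

For the diagram's input [ReadFile, ReadFile, Bash, ReadFile], this yields three batches of sizes 2, 1, and 1.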

Context Modifiers and Race Conditions

Why are context modifiers applied in order only after the batch completes? Applying them immediately during parallel execution creates race conditions. If A completes first and modifies the context, B (still executing) started with the pre-modification context but would observe the post-modification state for the rest of its run. Applying the modifiers in original tool order after batch completion guarantees deterministic results (toolOrchestration.ts:54-62).
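A sketch of "run in parallel, apply modifiers afterwards in call order". The `SharedContext` and `ToolOutcome` shapes are assumptions for this example; the point is only that completion order never influences the final context.

```typescript
type SharedContext = { log: string[] };
type ToolOutcome = { index: number; modify: (ctx: SharedContext) => SharedContext };

async function runBatchThenApply(
  ctx: SharedContext,
  tasks: Array<() => Promise<ToolOutcome>>,
): Promise<SharedContext> {
  const completed: ToolOutcome[] = [];
  // Run the whole batch in parallel; outcomes are recorded in completion order.
  await Promise.all(
    tasks.map(async (run) => {
      completed.push(await run());
    }),
  );
  // Apply modifiers only now, sorted back into original call order.
  return completed
    .sort((a, b) => a.index - b.index)
    .reduce((acc, outcome) => outcome.modify(acc), ctx);
}
```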

4. Tool Execution Pipeline and Hooks

runToolUse() in toolExecution.ts (1,745 lines, line 337) manages the complete lifecycle of each individual tool call:

runToolUse() entry point
 1. findToolByName() -- retry with deprecated aliases (345-356)
 2. abort check -- if already canceled, return CANCEL_MESSAGE (415)
 3. streamedCheckPermissionsAndCallTool() -- permissions + execution + hooks (455)
 -> checkPermissionsAndCallTool():
 a. Zod schema input validation (615)
 b. tool.validateInput() custom validation (683)
 c. Speculative classifier (Bash only, 740)
 d. runPreToolUseHooks() (800)
 e. resolveHookPermissionDecision() (921)
 f. tool.call() actual execution (1207)
 g. runPostToolUseHooks() result transformation

The Core Invariant of resolveHookPermissionDecision

In resolveHookPermissionDecision() (toolHooks.ts:332), a hook’s allow does not bypass settings.json deny/ask rules (toolHooks.ts:373). Even if a hook allows, it must still pass checkRuleBasedPermissions(). This reflects the design principle that “hooks are automation helpers, not security bypasses.”

When hook result is allow:
 -> Call checkRuleBasedPermissions()
 -> null means pass (no rules)
 -> deny means rule overrides hook
 -> ask means user prompt required
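The allow branch above can be sketched as a pure function. Only the allow case is described in this post; the handling of a hook deny and of "no hook opinion" below is an assumption added to make the example complete, and all type names are hypothetical.

```typescript
type RuleDecision = "deny" | "ask" | null; // null: no rule matched
type HookOutcome = "allow" | "deny" | "none";
type FinalDecision = "allow" | "deny" | "ask";

function resolveDecisionSketch(hook: HookOutcome, rule: RuleDecision): FinalDecision {
  if (hook === "deny") return "deny"; // assumed: a hook deny is final
  if (hook === "allow") {
    // A hook's allow is still subject to rule-based deny/ask.
    return rule ?? "allow";
  }
  // Assumed default when no hook decided: follow rules, otherwise ask the user.
  return rule ?? "ask";
}
```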

5. Rust Comparison – 152 Lines vs 1,729 Lines

Rust’s ConversationRuntime::run_turn() consists of 152 lines in a single loop {} (conversation.rs:183-272). Of the 7 TS continue paths, only next_turn (next turn after tool execution) exists in Rust.

TS Continue Reason	Rust Status	Why
`collapse_drain_retry`	Not implemented	No context collapse
`reactive_compact_retry`	Not implemented	No 413 recovery
`max_output_tokens_escalate`	Not implemented	No 8k->64k escalation
`max_output_tokens_recovery`	Not implemented	No multi-turn nudge
`stop_hook_blocking`	Not implemented	No stop hooks
`token_budget_continuation`	Not implemented	No token budget system
`next_turn`	Implemented	Re-calls API after tool results

The Most Critical Gap: Synchronous API Consumption

The Rust ApiClient trait signature says it all:

fn stream(&mut self, request: ApiRequest) -> Result<Vec<AssistantEvent>, RuntimeError>;

The return type is Vec<AssistantEvent>. It’s not streaming. It collects all SSE events and returns them as a vector. This means when the model calls 5 ReadFiles, TS can finish executing the first ReadFile while still streaming, but Rust must wait for all 5 to finish streaming before starting sequential execution. The latency gap grows proportionally with the number of tools.

6. Rust Prototype – Bridging the Gap

In the S04 prototype, we implemented an orchestration layer that bridges three gaps: the two P0 items (synchronous ApiClient, missing StreamingToolExecutor) plus 3-tier concurrency:

flowchart LR
 subgraph TS["TS Streaming Pipeline"]
 direction TB
 ts1["SSE event stream"]
 ts2["StreamingToolExecutor<br/>4-state machine"]
 ts3["getCompletedResults()<br/>guaranteed yield order"]
 ts1 --> ts2 --> ts3
 end

 subgraph Rust["Rust Prototype"]
 direction TB
 rs1["EventStream<br/>tokio async"]
 rs2["StreamingPipeline<br/>tokio::spawn + mpsc"]
 rs3["Post-MessageEnd<br/>channel collect + sort"]
 rs1 --> rs2 --> rs3
 end

 subgraph Bridge["Core Mappings"]
 direction TB
 b1["yield -> tx.send()"]
 b2["yield* -> channel forwarding"]
 b3["for await -> while let recv()"]
 end

 TS ~~~ Bridge ~~~ Rust

 style TS fill:#e1f5fe
 style Rust fill:#fff3e0
 style Bridge fill:#f3e5f5

3 Key Implementations in the Prototype

1. Async streaming: Extended the ApiClient trait to an async stream. Since MessageStream::next_event() is already async, only the consumer side needed changes.

2. Tool pipelining: On receiving a ToolUseEnd event, assembles a ToolCall from accumulated input and immediately starts background execution via tokio::spawn. Collects results in completion order via mpsc::unbounded_channel, then sorts back to original order.

3. 3-tier concurrency: Partitions by ToolCategory enum (ReadOnly/Write/BashLike). ReadOnly batches use Semaphore(10) + tokio::spawn for up to 10 parallel tasks. BashLike runs sequentially with remaining tasks aborted on error.

Prototype Coverage

TS Feature	Prototype	Status
`partitionToolCalls()` 3-tier	`partition_into_runs()` + `ToolCategory`	Implemented
`runToolsConcurrently()` max 10	`Semaphore(10)` + `tokio::spawn`	Implemented
`siblingAbortController`	`break` on BashLike error	Simplified
`StreamingToolExecutor.addTool()`	`tokio::spawn` on `ToolUseEnd`	Implemented
PreToolUse hook deny/allow	`HookDecision::Allow/Deny`	Implemented
PostToolUse output transform	`HookResult::transformed_output`	Implemented
4-state machine (queued->yielded)	spawned/completed 2-state	Incomplete
413 recovery / max_output escalation	–	Not implemented
`preventContinuation`	–	Not implemented

Stop Condition Comparison

Condition	TS	Rust
No tools (end_turn)	Execute `handleStopHooks()` then exit	Immediate `break`
Token budget exceeded	`checkTokenBudget()` with 3 decisions	None
max_output_tokens	Escalation + multi-turn recovery	None
413 prompt-too-long	Context collapse + reactive compaction	Error propagation
maxTurns	`maxTurns` parameter (query.ts:1696)	`max_iterations`
Diminishing returns	3+ turns with <500 token increase	None

checkTokenBudget() in tokenBudget.ts (93 lines) controls whether to continue responding, not prompt size. COMPLETION_THRESHOLD = 0.9 (continue if below 90% of total budget), DIMINISHING_THRESHOLD = 500 (stop if 3+ consecutive turns each produce fewer than 500 tokens, indicating diminishing returns). The nudgeMessage explicitly instructs “do not summarize.”
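The two thresholds combine into a small decision function. This sketch reuses the constant names and values quoted above, but the decision labels, the `DIMINISHING_TURNS` constant, and the function signature are assumptions; the real checkTokenBudget() returns three decisions whose exact shape is not shown here.

```typescript
const COMPLETION_THRESHOLD = 0.9;
const DIMINISHING_THRESHOLD = 500;
const DIMINISHING_TURNS = 3; // assumed name for the "3+ consecutive turns" rule

type BudgetDecision = "continue" | "stop_budget_reached" | "stop_diminishing";

function checkBudgetSketch(totalBudget: number, tokensPerTurn: number[]): BudgetDecision {
  const used = tokensPerTurn.reduce((sum, n) => sum + n, 0);
  // At or above 90% of the budget: treat the response as complete.
  if (used >= totalBudget * COMPLETION_THRESHOLD) return "stop_budget_reached";
  // Last three turns all tiny: further continuation is unlikely to help.
  const recent = tokensPerTurn.slice(-DIMINISHING_TURNS);
  const diminishing =
    recent.length === DIMINISHING_TURNS && recent.every((n) => n < DIMINISHING_THRESHOLD);
  return diminishing ? "stop_diminishing" : "continue";
}
```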

The Core Design Decision – Why AsyncGenerator

The entire pipeline is an async function* chain:

QueryEngine.submitMessage()* -> query()* -> queryLoop()* -> deps.callModel()*
runTools()* -> runToolUse()* -> handleStopHooks()* -> executeStopHooks()*

The key benefit of this choice: implementing complex state machines without inversion of control. At each of the 7 continue paths, you construct state explicitly with state = { ... } and continue. With a callback-based approach, state management would be scattered, making it difficult to guarantee consistency across 7 recovery paths.

In Rust, since generators (and with them the yield keyword) aren’t stabilized, tokio::sync::mpsc channels serve as the replacement. yield -> tx.send(), yield* -> channel forwarding, for await...of -> while let Some(v) = rx.recv().await.

Insights

query.ts’s 7 continue paths are not “error handling” but a “resilience engine” – It collapses context on 413 errors, escalates tokens on max_output, and feeds back errors to the model on stop hook blocking. This recovery pipeline ensures stability during long-running autonomous tasks. Reproducing this in Rust requires state management beyond a simple loop {}.
StreamingToolExecutor is a UX decision, not a performance optimization – Executing 5 tools sequentially makes users wait for the sum of all execution times. Pipelining reduces not benchmark numbers but the perceived “waiting for a response” time. In the Rust prototype, we implemented this in under 20 lines using tokio::spawn + mpsc channels.
The dual structure of static partitioning + runtime concurrency balances safety and performance – partitionToolCalls() divides the calls into batches up front, before any tool runs, while canExecuteTool() judges executability as tools arrive and complete. Thanks to this dual structure, the non-streaming path (runTools) and the streaming path (StreamingToolExecutor) share identical concurrency semantics.

Next post: #3 – The Design Philosophy of 42 Tools, from BashTool to AgentTool

Claude Code Harness Anatomy #3 — The Design Philosophy of 42 Tools, from BashTool to AgentTool

Mon, 06 Apr 2026 00:00:00 +0900

Overview

Claude Code has 42 tools. This post dissects the “tools know themselves” pattern implemented by the 30+ member Tool.ts interface, classifies all 42 tools into 8 families, and deep-dives into the most complex ones: BashTool’s 6-layer security chain (12,411 lines), AgentTool’s 4 spawn modes (6,782 lines), FileEditTool’s string matching strategy, MCPTool’s empty-shell proxy pattern, and the Task state machine.

1. Tool Interface – “Tools Know Themselves”

Tool.ts (792 lines) is the contract for the tool system. The Tool type (Tool.ts:362-695) that every tool implements consists of 30+ members across four domains:

Domain	Key Members	Role
Execution contract	`call()`, `inputSchema`, `validateInput()`, `checkPermissions()`	Core tool logic
Metadata	`name`, `aliases`, `searchHint`, `shouldDefer`, `maxResultSizeChars`	Search and display
Concurrency/Safety	`isConcurrencySafe()`, `isReadOnly()`, `isDestructive()`, `interruptBehavior()`	Orchestration decisions
UI rendering	`renderToolUseMessage()` + 10 more	Terminal display

Why so many members in one interface? When the orchestrator (toolExecution.ts) calls a tool, it can read all metadata directly from the tool object without any external mapping tables. This is the foundation of a plugin architecture where adding a new tool is self-contained within a single directory.

ToolUseContext – 42 Fields of Execution Environment

ToolUseContext (Tool.ts:158-300) is the environment context injected during tool execution. Spanning 142 lines, it defines 42 fields:

abortController: Cancellation propagation for the 3-tier concurrency model
getAppState()/setAppState(): Global state access (permissions, todos, teams)
readFileState: LRU cache-based change detection
contentReplacementState: Save large results to disk, return summaries only

Tools are not isolated functions — they need access to the harness’s entire state. FileReadTool uses the cache to detect changes, AgentTool registers sub-agent state, and BashTool can interrupt sibling processes.

buildTool()’s fail-closed Defaults

buildTool() (Tool.ts:783) takes a ToolDef and returns a complete Tool with defaults filled in. The defaults follow a fail-closed principle (Tool.ts:757-768):

isConcurrencySafe -> false (assume unsafe)
isReadOnly -> false (assume writes)

If a new tool doesn’t explicitly declare concurrency/read-only status, it takes the most conservative path (sequential execution, write permission required). This structurally prevents the bug of accidentally running an unsafe tool in parallel.
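Fail-closed defaults fall out naturally from object spread. In this sketch, `ToolDefSketch` keeps only four members and `buildToolSketch` is a hypothetical stand-in for buildTool(), so treat it as an illustration of the principle only.

```typescript
type ToolDefSketch = {
  name: string;
  call: (input: unknown) => Promise<string>;
  isConcurrencySafe?: (input: unknown) => boolean;
  isReadOnly?: (input: unknown) => boolean;
};
type BuiltTool = Required<ToolDefSketch>;

function buildToolSketch(def: ToolDefSketch): BuiltTool {
  return {
    // Fail-closed defaults: unsafe and writing unless the tool says otherwise.
    isConcurrencySafe: () => false,
    isReadOnly: () => false,
    // Anything the definition declares explicitly overrides the defaults.
    ...def,
  };
}
```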

2. 42 Tools in 8 Families

flowchart LR
 subgraph safe["isConcurrencySafe: true (10)"]
 direction TB
 R1["FileReadTool"]
 R2["GlobTool / GrepTool"]
 R3["WebFetchTool / WebSearchTool"]
 R4["ToolSearchTool / SleepTool"]
 R5["TaskGetTool / TaskListTool"]
 R6["LSPTool"]
 end

 subgraph unsafe["isConcurrencySafe: false (32)"]
 direction TB
 W1["BashTool 12,411 lines"]
 W2["FileEditTool / FileWriteTool"]
 W3["AgentTool 6,782 lines"]
 W4["MCPTool / SkillTool"]
 W5["Task 5 / Todo"]
 W6["Config / PlanMode / Worktree"]
 end

 subgraph orch["Orchestrator"]
 O["partitionToolCalls()<br/>toolOrchestration.ts"]
 end

 O -->|"Parallel batch"| safe
 O -->|"Sequential execution"| unsafe

 style safe fill:#e8f5e9
 style unsafe fill:#ffebee

Family	Count	Representative Tool	Key Characteristic
Filesystem	5	FileReadTool (1,602 lines)	PDF/image/notebook support, token limits
Execution	3	BashTool (12,411 lines)	6-layer security, command semantics
Agent/Team	4	AgentTool (6,782 lines)	4 spawn modes, recursive harness
Task management	7	TaskUpdateTool (484 lines)	State machine, verification nudge
MCP/LSP	5	MCPTool (1,086 lines)	Empty-shell proxy
Web/External	2	WebFetchTool (1,131 lines)	Parallel safe
State/Config	5	ConfigTool (809 lines)	Session state changes
Infra/Utility	11	SkillTool (1,477 lines)	Command-to-tool bridge

Only 10 of 42 (24%) are parallel-safe, but these 10 are the most frequently called tools (Read, Glob, Grep, Web), so perceived parallelism is higher than the ratio suggests.

3. BashTool – 6-Layer Security Chain

BashTool is not a simple shell executor. Because arbitrary code execution is an inherent risk, more than half of its 12,411 lines are security layers.

flowchart TB
 A["Model: Bash call"] --> B{"validateInput"}
 B -->|"sleep pattern blocked"| B1["Return error"]
 B -->|"Pass"| C{"6-layer security chain"}

 subgraph chain["Security chain"]
 C1["1. bashSecurity.ts<br/>2,592 lines -- command structure analysis"]
 C2["2. bashPermissions.ts<br/>2,621 lines -- rule matching"]
 C3["3. readOnlyValidation.ts<br/>1,990 lines -- read-only determination"]
 C4["4. pathValidation.ts<br/>1,303 lines -- path-based security"]
 C5["5. sedValidation.ts<br/>684 lines -- sed-specific security"]
 C6["6. shouldUseSandbox.ts<br/>153 lines -- sandbox decision"]
 C1 --> C2 --> C3 --> C4 --> C5 --> C6
 end

 C --> chain
 chain --> D{"allow / ask / deny"}
 D -->|"allow"| E["runShellCommand()"]
 D -->|"ask"| F["Request user approval"]
 D -->|"deny"| G["Denied"]
 E --> H["Result processing<br/>interpretCommandResult()<br/>trackGitOperations()"]

 style chain fill:#fff3e0

Each layer handles a different threat:

bashSecurity.ts (2,592 lines): Blocks command substitution ($() and backticks) and Zsh module-based attacks. Key: only metacharacters in unquoted contexts are classified as dangerous
bashPermissions.ts (2,621 lines): Rule-based allow/deny/ask. stripAllLeadingEnvVars() + stripSafeWrappers() strip wrappers to extract the actual command
readOnlyValidation.ts (1,990 lines): Determines whether a command is read-only. Read-only commands report isConcurrencySafe: true, so they are allowed to run in parallel
pathValidation.ts (1,303 lines): Per-command path extraction rules for path safety judgment
sedValidation.ts (684 lines): sed’s w and e flags can write files/execute arbitrary code — blocked separately
shouldUseSandbox.ts (153 lines): Final isolation decision

Command semantics (commandSemantics.ts): grep and diff return exit code 1 as a normal result, not an error. The COMMAND_SEMANTICS Map defines per-command interpretation rules.
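A per-command exit-code table could look like the sketch below. The two entries rely on the documented behavior of grep (1 means no match) and diff (1 means the files differ); the map name, types, and fallback are assumptions and the real COMMAND_SEMANTICS map may be shaped differently.

```typescript
type Interpretation = { isError: boolean; note: string };
type Interpreter = (exitCode: number) => Interpretation;

// Default: any non-zero exit code is an error.
const defaultInterpreter: Interpreter = (code) => ({ isError: code !== 0, note: "" });

// Hypothetical entries for commands where exit code 1 is a normal result.
const COMMAND_SEMANTICS_SKETCH = new Map<string, Interpreter>([
  ["grep", (code) => ({ isError: code >= 2, note: code === 1 ? "no matches" : "" })],
  ["diff", (code) => ({ isError: code >= 2, note: code === 1 ? "files differ" : "" })],
]);

function interpretExit(command: string, exitCode: number): Interpretation {
  // Use the first word as the command name; real parsing is far more involved.
  const base = command.trim().split(/\s+/)[0] ?? "";
  const interpret = COMMAND_SEMANTICS_SKETCH.get(base) ?? defaultInterpreter;
  return interpret(exitCode);
}
```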

Rust porting implications: Either reproduce all 6 layers wholesale, or simplify to sandbox-only. Skipping intermediate layers creates security holes.

4. AgentTool – 4 Spawn Modes

AgentTool is less of a “tool” and more of an agent orchestrator. The key: runAgent() recursively calls the harness’s query() loop. Child agents receive the same tools, API access, and security checks as the parent.

Mode	Trigger	Context Sharing	Background
Synchronous	Default	None (prompt only)	No
Async	`run_in_background: true`	None	Yes
Fork	`subagent_type` omitted	Full parent context	Yes
Remote	`isolation: "remote"`	None	Yes

Fork Sub-agents – Byte-Identical Prefix

Forks inherit the parent’s full conversation context. To share prompt cache, all fork children are designed to produce byte-identical API request prefixes:

Tool use results replaced with placeholders
FORK_BOILERPLATE_TAG prevents recursive forking
Model kept identical (model: 'inherit') — different models cause cache misses

Memory System (agentMemory.ts)

Per-agent persistent memory is managed across 3 scopes:

user: ~/.claude/agent-memory/<type>/ — user-global
project: .claude/agent-memory/<type>/ — project-shared (VCS)
local: .claude/agent-memory-local/<type>/ — local-only

5. FileEditTool – Partial Replacement Pattern

FileEditTool (1,812 lines) performs old_string -> new_string patches rather than full file writes. The model doesn’t need to output the entire file, saving tokens and enabling diff-based review.

Matching strategy:

Exact string matching: fileContent.includes(searchString)
Quote normalization: Convert curly quotes -> straight quotes and retry, with preserveQuoteStyle() preserving the original style
Uniqueness validation: Fails if old_string is not unique in the file (unless replace_all)
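The three matching steps can be sketched as follows. This is a simplified illustration: it does not reproduce preserveQuoteStyle(), and `normalizeQuotes`, `findActualString`, and `applyEditSketch` are names chosen for this example.

```typescript
function normalizeQuotes(text: string): string {
  return text.replace(/[\u2018\u2019]/g, "'").replace(/[\u201C\u201D]/g, '"');
}

// Returns the substring of the file that actually matched, or null.
function findActualString(fileContent: string, search: string): string | null {
  if (search === "") return null;
  if (fileContent.includes(search)) return search; // 1. exact match
  // 2. Retry with quotes normalized on both sides. Each curly quote maps to
  // exactly one straight quote, so offsets in the normalized text line up.
  const index = normalizeQuotes(fileContent).indexOf(normalizeQuotes(search));
  return index === -1 ? null : fileContent.slice(index, index + search.length);
}

type EditResult = { ok: true; content: string } | { ok: false; reason: string };

function applyEditSketch(
  fileContent: string,
  oldString: string,
  newString: string,
  replaceAll = false,
): EditResult {
  const actual = findActualString(fileContent, oldString);
  if (actual === null) return { ok: false, reason: "old_string not found" };
  const parts = fileContent.split(actual);
  // 3. Without replace_all, the match must be unique.
  if (parts.length > 2 && !replaceAll) return { ok: false, reason: "old_string not unique" };
  return { ok: true, content: parts.join(newString) };
}
```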

Concurrency protection: The readFileState Map stores per-file last-read timestamps. During editing, it compares against the on-disk modification time to detect external changes. This is why the “Read before Edit” rule is enforced in the prompt.
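The staleness check reduces to a timestamp comparison. In this sketch the map holds plain millisecond timestamps and the caller supplies the on-disk mtime; both choices, along with the result labels, are assumptions for illustration.

```typescript
type FreshnessCheck = "ok" | "not_read" | "modified_since_read";

function checkFreshness(
  readTimestamps: Map<string, number>, // path -> time of last read (ms)
  path: string,
  diskMtimeMs: number,
): FreshnessCheck {
  const lastRead = readTimestamps.get(path);
  // Never read in this session: the edit must be preceded by a read.
  if (lastRead === undefined) return "not_read";
  // File changed on disk after our last read: the cached view is stale.
  return diskMtimeMs > lastRead ? "modified_since_read" : "ok";
}
```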

6. MCPTool – Empty-Shell Proxy

MCPTool (1,086 lines) is where a single tool definition represents hundreds of external tools. At build time it’s an empty shell; at runtime, mcpClient.ts clones and overrides it per server:

// MCPTool.ts:27-51 -- core methods have "Overridden in mcpClient.ts" comments
name: 'mcp', // replaced at runtime with 'mcp__serverName__toolName'
async call() { return { data: '' } }, // replaced at runtime with actual MCP call
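The clone-and-override step might look like this sketch. `McpClientSketch` and `materializeMcpTool` are hypothetical; the post does not show the real mcpClient.ts API, only the naming scheme and the fact that name and call() are replaced.

```typescript
type ToolShell = { name: string; call: (input: unknown) => Promise<{ data: string }> };

// Build-time placeholder, mirroring the "empty shell" idea.
const mcpShell: ToolShell = { name: "mcp", call: async () => ({ data: "" }) };

// Hypothetical client interface standing in for a connected MCP server.
type McpClientSketch = { callTool: (tool: string, input: unknown) => Promise<string> };

function materializeMcpTool(server: string, tool: string, client: McpClientSketch): ToolShell {
  return {
    ...mcpShell, // keep every member we do not override
    name: `mcp__${server}__${tool}`,
    call: async (input) => ({ data: await client.callTool(tool, input) }),
  };
}
```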

The UI collapse classification (classifyForCollapse.ts, 604 lines) uses 139 SEARCH_TOOLS and 280+ READ_TOOLS names to determine whether an MCP tool is a read/search operation. Unknown tools are not collapsed (conservative approach).

7. Task State Machine – Agent IPC

TaskUpdateTool (406 lines) state flow: pending -> in_progress -> completed or deleted.

Key behaviors:

Auto-assign owner: Current agent name is automatically assigned on in_progress transition
Verification nudge: After 3+ tasks completed without a verification step, recommends spawning a verification agent
Message routing (SendMessageTool 917 lines): By name, * broadcast, uds:path Unix domain socket, bridge:session remote peer, agent ID resume
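The state flow and the auto-assign behavior can be sketched as a transition table. Which states may jump straight to deleted is an assumption here, as are the `TaskRecord` shape and the `updateTask` signature.

```typescript
type TaskStatus = "pending" | "in_progress" | "completed" | "deleted";
type TaskRecord = { id: string; status: TaskStatus; owner: string | null };

// Assumed transition table based on the flow described above.
const ALLOWED_TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  pending: ["in_progress", "deleted"],
  in_progress: ["completed", "deleted"],
  completed: [],
  deleted: [],
};

function updateTask(task: TaskRecord, next: TaskStatus, agentName: string): TaskRecord {
  if (!ALLOWED_TRANSITIONS[task.status].includes(next)) {
    throw new Error(`invalid transition: ${task.status} -> ${next}`);
  }
  // Auto-assign the current agent as owner when work starts.
  const owner = next === "in_progress" && task.owner === null ? agentName : task.owner;
  return { ...task, status: next, owner };
}
```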

Task/SendMessage are not simple utilities but the inter-process communication (IPC) foundation of the multi-agent system.
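
The routing forms listed above can be pictured as a small address parser. The `Route` type and `parseRecipient` function are hypothetical names for illustration. The agent ID resume case is left out because the text does not describe its address format.

```typescript
// Hypothetical address parser; the route kinds mirror the list above,
// but the type and function names are assumptions.
type Route =
  | { kind: "broadcast" }
  | { kind: "uds"; path: string }
  | { kind: "bridge"; session: string }
  | { kind: "name"; name: string };

function parseRecipient(to: string): Route {
  if (to === "*") return { kind: "broadcast" };
  if (to.startsWith("uds:")) {
    return { kind: "uds", path: to.slice("uds:".length) };
  }
  if (to.startsWith("bridge:")) {
    return { kind: "bridge", session: to.slice("bridge:".length) };
  }
  // Anything else is treated as a plain agent name.
  return { kind: "name", name: to };
}
```

For example, `parseRecipient("uds:/tmp/agent.sock")` yields a `uds` route carrying the socket path, while `parseRecipient("reviewer")` falls through to a by-name route.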

TS vs Rust Comparison

Aspect	TS (42 tools)	Rust (10 tools)
Tool definition	`Tool` interface + `buildTool()`	`ToolSpec` struct + `mvp_tool_specs()`
Input schema	Zod v4 + `lazySchema()`	`serde_json::json!()` direct JSON Schema
Concurrency declaration	`isConcurrencySafe(parsedInput)`	None — sequential execution
Permission check	`checkPermissions()` -> `PermissionResult`	`PermissionMode` enum
UI rendering	10+ render methods (React/Ink)	None
MCP integration	MCPTool + `inputJSONSchema` dual path	None
Size comparison	~48,000 lines (tool code only)	~1,300 lines (single lib.rs)

Key gap: The Rust port only implements the execution contract (call equivalent); concurrency declarations, permission pipeline, UI rendering, and lazy-loading optimizations are all missing.

Insights

Security is a chain, not a single checkpoint – BashTool’s 6 layers each handle different threats. bashSecurity handles command structure, bashPermissions handles rule matching, pathValidation handles path safety. If any link in this chain is missing, an attack surface opens. Combined with the fail-closed principle, the conservative strategy of “block when uncertain” permeates the entire system.
Agents are recursive harness instances – The fact that AgentTool’s runAgent() recursively calls the harness’s query() loop means “agent” is not a separate system but a different configuration of the same harness. It swaps only the tool pool while reusing the same security, hooks, and orchestration.
Only 10 of 42 tools are concurrency-safe, yet perceived parallelism is high – The 10 tools representing only 24% of the total (Read, Glob, Grep, Web, LSP) happen to be the most frequently called. This asymmetry demonstrates the practical value of the 3-tier concurrency model. buildTool()’s fail-closed default (isConcurrencySafe: false) forms the safety boundary, structurally preventing new tool developers from incorrectly declaring concurrency safety.

Next post: #4 – Runtime Hooks: 26+ Events and CLAUDE.md 6-Stage Discovery

Claude Code Harness Anatomy #4 — Runtime Hooks: 26+ Events and CLAUDE.md 6-Stage Discovery

Mon, 06 Apr 2026 00:00:00 +0900

Overview

In Claude Code, the word “hook” refers to two completely different systems. Runtime hooks (toolHooks.ts + utils/hooks.ts) are a security/extension pipeline that executes shell scripts before and after tool execution, while React hooks (hooks/*.ts, 85+) are state management code for the terminal UI. Missing this distinction leads to an 85x overestimation of the Rust reimplementation scope. This post analyzes the PreToolUse/PostToolUse pipeline of runtime hooks, the security invariant of resolveHookPermissionDecision(), the 9-category classification of 85 React hooks, and CLAUDE.md’s 6-stage discovery with token budget management.

1. Runtime Hooks vs React Hooks – The Key Distinction

Dimension	Runtime Hooks (toolHooks.ts + utils/hooks.ts)	React Hooks (hooks/*.ts)
Executor	`child_process.spawn()`	React render cycle
Configuration	settings.json `hooks` field, shell commands	Source code `import`
Execution timing	Before/after tool use, session start, etc. (26+ events)	Component mount/update
User-defined	Yes — users register shell scripts	No — internal code
Result format	JSON stdout (allow/deny/ask/rewrite)	React state changes
Rust reimplementation	Required — core of tool execution pipeline	Not needed — TUI only

2. PreToolUse Pipeline – 7 Yield Variants

runPreToolUseHooks() (toolHooks.ts:435-650) is designed as an AsyncGenerator. Called before tool execution, it emits the following yield types:

message: Progress messages (hook start/error/cancel)
hookPermissionResult: allow/deny/ask decision
hookUpdatedInput: Input rewrite (changes input without a permission decision)
preventContinuation: Execution halt flag
stopReason: Halt reason string
additionalContext: Additional context to pass to the model
stop: Immediate halt

Why AsyncGenerator? Hooks execute sequentially, and each hook’s result affects subsequent processing. A promise chain resolves only with the final result, and an event emitter gives the caller neither typed results nor a pull-based way to stop the producer. An AsyncGenerator lets the caller consume each typed result as it arrives and halt mid-stream.
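
A minimal sketch of the consume-and-halt behavior, assuming a reduced set of yield types. `HookYield`, `Hook`, `runHooks`, and `consume` are hypothetical names, not the signatures in toolHooks.ts.

```typescript
// Hypothetical yield types, loosely modeled on the variants listed above.
type HookYield =
  | { type: "message"; text: string }
  | { type: "hookPermissionResult"; behavior: "allow" | "deny" | "ask" }
  | { type: "stop" };

type Hook = () => Promise<HookYield>;

// Runs hooks one at a time, yielding each result as soon as it is available.
async function* runHooks(hooks: Hook[]): AsyncGenerator<HookYield> {
  for (const hook of hooks) {
    yield await hook();
  }
}

// The consumer inspects every result and can halt before later hooks run.
async function consume(hooks: Hook[]): Promise<string[]> {
  const seen: string[] = [];
  for await (const result of runHooks(hooks)) {
    seen.push(result.type);
    if (result.type === "hookPermissionResult" && result.behavior === "deny") {
      // Leaving the loop finishes the generator, so remaining hooks never start.
      break;
    }
  }
  return seen;
}
```

Because the generator is pull-based, a hook is invoked only when the consumer asks for the next value. That is what makes the deny short-circuit cheap.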

flowchart TD
 subgraph "PreToolUse Pipeline"
 A["toolExecution.ts<br/>Tool call begins"]
 B["runPreToolUseHooks()<br/>toolHooks.ts:435"]
 C["getMatchingHooks()<br/>utils/hooks.ts:1603"]
 D["settings.json hooks<br/>event + pattern matching"]
 E["spawn() shell command<br/>stdin: JSON, stdout: result"]
 F["HookResult parsing<br/>allow / deny / ask / rewrite"]
 end

 subgraph "Permission Resolution"
 G["resolveHookPermission<br/>Decision()<br/>toolHooks.ts:332"]
 H{"Hook result?"}
 I["allow: checkRule<br/>BasedPermissions()<br/>rules override hooks"]
 J["deny: immediate rejection"]
 K["ask: canUseTool()<br/>user prompt"]
 end

 subgraph "Tool Execution"
 L["tool.call()"]
 end

 subgraph "PostToolUse"
 M["runPostToolUseHooks()<br/>result transform / block"]
 end

 A --> B --> C --> D --> E --> F --> G --> H
 H -->|"allow"| I
 H -->|"deny"| J
 H -->|"ask"| K
 I -->|"Rules pass"| L
 L --> M

resolveHookPermissionDecision – allow != bypass

The core invariant of resolveHookPermissionDecision() (toolHooks.ts:332-433): a hook’s allow does not bypass settings.json deny/ask rules (toolHooks.ts:325-327).

The processing logic has 3 stages:

Stage 1 – allow handling (toolHooks.ts:347-406):

hookResult.behavior === 'allow':
 -> Call checkRuleBasedPermissions()
 -> null -> no rules, hook allow passes
 -> deny -> rule overrides hook (security first!)
 -> ask -> user prompt required

Why doesn’t allow bypass? This is a deliberate security decision. If an external shell script returning {"decision":"allow"} could override settings.json deny rules, a malicious hook could circumvent security policies. Rules always take precedence over hooks.

Stage 2 – deny (toolHooks.ts:408-411): Immediate rejection, no further checks.

Stage 3 – ask/none (toolHooks.ts:413-432): Calls canUseTool() for user prompt.
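
The three stages reduce to a small decision function. The sketch below is a simplified model with hypothetical names and shapes (`resolveDecision`, `RuleDecision`, callbacks instead of the real permission context). It assumes the rule check returns only deny, ask, or null.

```typescript
// Hypothetical, simplified model of the three stages; names and shapes are assumptions.
type HookBehavior = "allow" | "deny" | "ask";
type RuleDecision = "deny" | "ask" | null; // null = no matching rule
type Final = "allow" | "deny";

function resolveDecision(
  hookBehavior: HookBehavior | undefined,
  checkRules: () => RuleDecision, // stand-in for checkRuleBasedPermissions()
  canUseTool: () => Final, // stand-in for the user prompt path
): Final {
  if (hookBehavior === "allow") {
    // Stage 1: a hook's allow is still checked against the configured rules.
    const rule = checkRules();
    if (rule === "deny") return "deny"; // rule overrides hook
    if (rule === "ask") return canUseTool(); // user prompt required
    return "allow"; // no rule objected, hook allow passes
  }
  if (hookBehavior === "deny") {
    return "deny"; // Stage 2: immediate rejection
  }
  return canUseTool(); // Stage 3: ask, or no hook decision at all
}
```

The invariant shows up in the first branch: there is no path from a hook allow to a final allow that skips the rule check.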

26+ Event Types

getMatchingHooks() (utils/hooks.ts:1603-1682) handles hook matching:

Tool events: PreToolUse, PostToolUse, PostToolUseFailure, PermissionRequest, PermissionDenied
Session events: SessionStart, SessionEnd, Setup
Agent events: SubagentStart, SubagentStop, TeammateIdle
Task events: TaskCreated, TaskCompleted
System events: Notification, ConfigChange, FileChanged, InstructionsLoaded
Compact events: PreCompact, PostCompact
Input events: UserPromptSubmit, Elicitation, ElicitationResult
Stop events: Stop, StopFailure

Matched hooks execute sequentially, and if one denies, subsequent hooks are not executed.

3. 85 React Hooks – 9 Category Classification

mindmap
 root(("TS Hook System"))
 Runtime Hooks
 toolHooks.ts 651 lines
 PreToolUse
 PostToolUse
 PostToolUseFailure
 utils/hooks.ts ~5000 lines
 26+ event types
 Shell spawn
 Async protocol
 React Hooks 85+
 Permission 3
 useCanUseTool
 PermissionContext
 UI Input 11
 useTextInput
 useVimInput
 useTypeahead
 UI Display 11
 useVirtualScroll
 useDiffData
 State/Config 12
 useSettings
 useSessionBackgrounding
 Integration/Remote 12
 useRemoteSession
 useReplBridge
 Features 20
 useVoice
 useSwarm
 useTasks
 Notifications 16
 notifs/ directory
 Tools/Keybindings 5
 useMergedTools
 Additional 5+
 fileSuggestions
 useManagePlugins

Category	Count	Rust Reimpl	Representative Hook
Permission	3	Partial (bridge)	`useCanUseTool` (203 lines)
UI Input	11	Not needed	`useTextInput` (529 lines), `useVimInput` (316 lines)
UI Display	11	Not needed	`useVirtualScroll` (721 lines)
State/Config	12	Not needed	`useSessionBackgrounding` (158 lines)
Integration/Remote	12	Not needed	`useRemoteSession` (605 lines)
Features	20	Not needed	`useVoice` (1,144 lines)
Notifications/Banners	16	Not needed	`notifs/` directory
Tools/Keybindings	5	Not needed	`useMergedTools` (44 lines)
Additional	5+	Not needed	`fileSuggestions` (811 lines)

Key takeaway: What Rust needs to reimplement is only the runtime pipeline of toolHooks.ts (651 lines) + utils/hooks.ts (~5,000 lines). The 85 React hooks totaling 15,000+ lines are out of scope.

4. CLAUDE.md 6-Stage Discovery

getMemoryFiles() in claudemd.ts (1,479 lines, L790-1074) loads CLAUDE.md through a 6-stage hierarchy:

Stage	Source	Path Example	Priority
1. Managed	Org policy	`/etc/claude-code/CLAUDE.md`	Lowest
2. User	Personal habits	`~/.claude/CLAUDE.md`, `~/.claude/rules/*.md`
3. Project	Project rules	`CLAUDE.md` and `.claude/rules/*.md` from cwd to root
4. Local	Local overrides	`CLAUDE.local.md` (gitignored)
5. AutoMem	Auto memory	`MEMORY.md` entrypoint
6. TeamMem	Team memory	Cross-org sync	Highest

Why this order? The file comment (L9) states explicitly: “Files are loaded in reverse order of priority.” LLMs pay more attention to later parts of the prompt, so the most specific instructions (Local > Project > User > Managed) are placed last. This is not CSS specificity — it’s a design that leverages LLM attention bias.

Upward Directory Traversal and Deduplication

Starting from originalCwd, it walks up to the filesystem root, then calls dirs.reverse() to process from root downward (L851-857). In monorepos, the parent CLAUDE.md loads first and the child project’s CLAUDE.md layers on top.
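
A minimal sketch of the walk-up-then-reverse step, assuming absolute POSIX-style paths and no filesystem access. `dirsRootFirst` is a hypothetical helper name.

```typescript
// Hypothetical helper: absolute POSIX-style paths only, no filesystem access.
function dirsRootFirst(cwd: string): string[] {
  const dirs: string[] = [];
  let current = cwd.replace(/\/+$/, "") || "/";
  // Walk upward from cwd to the filesystem root.
  while (true) {
    dirs.push(current);
    if (current === "/") break;
    const parent = current.slice(0, current.lastIndexOf("/"));
    current = parent === "" ? "/" : parent;
  }
  // Reverse so that ancestors are processed before descendants.
  return dirs.reverse();
}
```

For `/repo/packages/app` this returns the root first and the working directory last, which is the order that lets a child project's CLAUDE.md layer on top of its parent's.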

Worktree deduplication (L868-884): When a git worktree is nested inside the main repo, an isNestedWorktree check prevents the same CLAUDE.md from being loaded twice.

@include directive (L451-535): Lexes markdown tokens to ignore @path inside code blocks, recursively resolving only @path in text nodes. Maximum depth of 5.

5. System/User Context Separation – dual-memoize Cache

context.ts (189 lines) separates the system prompt into two independent contexts:

getSystemContext() (L116): Git state, cache breaker
getUserContext() (L155): CLAUDE.md merged string, current date

Why split into two? Because of the Anthropic API’s prompt caching strategy. Git state (session-fixed) and CLAUDE.md (invalidated only on file changes) have different cache lifetimes, so cache_control must be applied differently. Both functions are wrapped in memoize, so each executes only once per session unless one of the invalidation paths below clears its cache.

3 Cache Invalidation Paths

setSystemPromptInjection() (context.ts:29): Clears both caches
clearMemoryFileCaches() (claudemd.ts:1119): Clears memory files only
resetGetMemoryFilesCache() (claudemd.ts:1124): Clears memory files + fires InstructionsLoaded hook

This separation distinguishes between worktree switches (no reload needed) and actual reloads (after compaction).
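
The split-cache idea can be sketched with two independently clearable memoized values. Everything here is hypothetical: the inline `memoize` helper stands in for the library version, and the two invalidation functions only mirror the "both caches" versus "memory files only" distinction.

```typescript
// Minimal zero-argument memoize with an explicit clear(), written inline so
// the sketch has no dependencies. All names here are hypothetical.
function memoize<T>(compute: () => T): { get: () => T; clear: () => void } {
  let cached: { value: T } | undefined;
  return {
    get: () => {
      if (!cached) cached = { value: compute() };
      return cached.value;
    },
    clear: () => {
      cached = undefined;
    },
  };
}

let gitReads = 0;
let memoryReads = 0;
// Counters make recomputation visible in the returned strings.
const systemContext = memoize(() => `git-state#${++gitReads}`);
const userContext = memoize(() => `CLAUDE.md#${++memoryReads}`);

// Analogue of an invalidation path that clears both caches.
function invalidateAll(): void {
  systemContext.clear();
  userContext.clear();
}

// Analogue of an invalidation path that clears memory files only.
function invalidateMemoryFiles(): void {
  userContext.clear();
}
```

Keeping the caches separate means a CLAUDE.md reload does not force the git state to be recomputed.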

6. Token Budget – Response Continuation Decisions

checkTokenBudget() in tokenBudget.ts (93 lines) controls whether to continue responding, not prompt size:

COMPLETION_THRESHOLD = 0.9 -- continue if below 90%
DIMINISHING_THRESHOLD = 500 -- 3+ consecutive turns, <500 tokens each -> diminishing returns

if (!isDiminishing && turnTokens < budget * 0.9) -> continue
if (isDiminishing || continuationCount > 0) -> stop with event
else -> stop without event

Why 0.9? Models tend to start summarizing near the budget limit. Stopping at 90% prevents “wrapping up” summaries and keeps the work going. The nudgeMessage explicitly instructs “do not summarize.”

Diminishing returns detection prevents the model from falling into repetitive patterns. Sub-agents stop immediately (L51) — they don’t have their own budgets.
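
The decision table above can be written out as a small function. This is a sketch under stated assumptions: `checkBudget` and `BudgetDecision` are hypothetical names, the `recentDeltas` input is invented for illustration, and "diminishing" is modeled as the last three continuations each producing fewer than 500 tokens.

```typescript
const COMPLETION_THRESHOLD = 0.9;
const DIMINISHING_THRESHOLD = 500;

type BudgetDecision =
  | { action: "continue" }
  | { action: "stop"; emitEvent: boolean };

function checkBudget(
  turnTokens: number,
  budget: number,
  continuationCount: number,
  recentDeltas: number[], // tokens produced by each recent continuation (assumed input)
): BudgetDecision {
  // Assumption: "diminishing" means the last 3 continuations each produced < 500 tokens.
  const lastThree = recentDeltas.slice(-3);
  const isDiminishing =
    lastThree.length === 3 &&
    lastThree.every((delta) => delta < DIMINISHING_THRESHOLD);

  if (!isDiminishing && turnTokens < budget * COMPLETION_THRESHOLD) {
    return { action: "continue" };
  }
  if (isDiminishing || continuationCount > 0) {
    return { action: "stop", emitEvent: true };
  }
  return { action: "stop", emitEvent: false };
}
```

Note that a diminishing pattern stops the loop even when plenty of budget remains, which is what breaks repetitive low-output turns.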

Rust Comparison

Aspect	TS	Rust
Hook event types	26+	PreToolUse, PostToolUse (2 only)
Hook execution	AsyncGenerator (async)	Synchronous `Command::output()`
Hook results	7 yield variants + JSON	Allow/Deny/Warn (3 via exit code)
Input modification	`hookUpdatedInput`	Not possible
allow != bypass	Guaranteed	Not implemented (security vulnerability)
CLAUDE.md	6-stage discovery	4 candidates per dir
@include	Recursive, depth 5	Not supported
Token budget	`checkTokenBudget()` with 3 decisions	None
Prompt cache	memoize + 3 invalidation paths	Rebuilt every time

Insights

The dual meaning of “hook” is the biggest source of architectural confusion – The 85 React hooks are not in scope for Rust reimplementation. Only the runtime hooks (~5,600 lines) are porting targets. However, this runtime engine includes 26+ event types, an async protocol ({"async":true} background switching), and prompt requests (bidirectional stdin/stdout). Precisely scoping the meaning of “hooks” is the starting point for accurate estimation.
CLAUDE.md’s “last is strongest” pattern is deliberate exploitation of LLM attention bias – In the 6-stage hierarchical loading (Managed -> User -> Project -> Local -> AutoMem -> TeamMem), the most specific instructions are placed at the end of the prompt for maximum influence. This design emerges at the intersection of API prompt cache hit-rate optimization + LLM behavioral characteristics, not from architectural tidiness.
The “allow != bypass” invariant in resolveHookPermissionDecision() is the security cornerstone – The current Rust hooks.rs judges allow/deny solely by exit code. Without implementing JSON result parsing and the subsequent checkRuleBasedPermissions check, a malicious hook could bypass deny rules — a security vulnerability. Clearly delineating the boundary between automation convenience and security policy is the fundamental challenge of the hook system.

Next post: #5 – MCP Services and the Plugin-Skill Extension Ecosystem

Claude Code Harness Anatomy #5 — MCP Services and the Plugin-Skill Extension Ecosystem

Mon, 06 Apr 2026 00:00:00 +0900

Overview

Beyond its 42 built-in tools, Claude Code can be extended with any number of external tools via MCP (Model Context Protocol). This post analyzes the connection management architecture of client.ts (3,348 lines), the OAuth authentication system of auth.ts (2,465 lines), the 4-layer security model, and config deduplication. We then dissect the structural differences between plugins and skills, the 5-layer skill discovery engine, and the circular reference resolution pattern in mcpSkillBuilders.ts.

1. MCP Client – Connection Management Is Harder Than the Protocol

Memoization-Based Connection Pool

connectToServer is wrapped with lodash.memoize. The cache key is name + JSON(config). Since MCP servers are stateful (stdio processes, WebSocket connections), creating a new connection for every tool call would be catastrophically bad for performance.

onclose handler invalidates the cache -> next call automatically reconnects
fetchToolsForClient and fetchResourcesForClient each have their own LRU cache (20 entries)
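
A minimal sketch of memoized connections with close-triggered eviction. A hand-rolled `Map` stands in for lodash.memoize so the example has no dependencies. The `ServerConfig` and `Connection` shapes are hypothetical, and `Promise.resolve` is a placeholder for actually spawning a process or opening a socket.

```typescript
// Hypothetical types; the real client wraps connectToServer with lodash.memoize.
interface ServerConfig {
  command: string;
  args: string[];
}
interface Connection {
  id: number;
  close: () => void;
}

let nextConnectionId = 0;
const connectionCache = new Map<string, Promise<Connection>>();

function connectToServer(name: string, config: ServerConfig): Promise<Connection> {
  // Cache key mirrors the "name + JSON(config)" scheme described above.
  const key = name + JSON.stringify(config);
  const existing = connectionCache.get(key);
  if (existing) return existing;

  const connection: Connection = {
    id: ++nextConnectionId,
    // Closing evicts the entry, so the next call reconnects.
    close: () => {
      connectionCache.delete(key);
    },
  };
  // Stand-in for spawning a stdio process or opening a socket.
  const pending = Promise.resolve(connection);
  connectionCache.set(key, pending);
  return pending;
}
```

Caching the promise rather than the resolved connection also means concurrent callers share one in-flight connection attempt.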

Tool Proxy Pattern

MCP tools are converted to native Tool interfaces:

name: Format mcp__<normalized_server>__<normalized_tool>
call(): ensureConnectedClient -> callMCPToolWithUrlElicitationRetry -> callMCPTool
checkPermissions(): Always passthrough — MCP tools use a separate permission system
annotations: Maps MCP annotations like readOnlyHint, destructiveHint
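
The naming scheme can be illustrated with a short helper. The normalization rule below (replace anything outside `[A-Za-z0-9_-]` with an underscore) is an assumption for the sketch, since the text only gives the resulting format. Both function names are hypothetical.

```typescript
// Assumption: characters outside [A-Za-z0-9_-] are replaced with underscores.
function normalizeName(name: string): string {
  return name.replace(/[^a-zA-Z0-9_-]/g, "_");
}

// Builds the mcp__<server>__<tool> form described above.
function buildMcpToolName(server: string, tool: string): string {
  return `mcp__${normalizeName(server)}__${normalizeName(tool)}`;
}
```

Under that assumed rule, a server named `my server` exposing `list.files` would surface as `mcp__my_server__list_files`.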

URL Elicitation Retry: OAuth-based MCP servers can require authentication mid-tool-call (error code -32042). A retry loop shows the user the URL, waits for authentication to complete, and retries.

Connection State Machine and 3-Strike Terminal Error

stateDiagram-v2
 [*] --> Pending: Config loaded
 Pending --> Connected: connectToServer success
 Pending --> Failed: Connection timeout
 Pending --> NeedsAuth: 401 UnauthorizedError
 Pending --> Disabled: isMcpServerDisabled()

 Connected --> Connected: Tool call success
 Connected --> Failed: 3 consecutive terminal errors
 Connected --> NeedsAuth: 401 during callMCPTool
 Connected --> Pending: onclose cache invalidation

 NeedsAuth --> Pending: Auth completed
 NeedsAuth --> NeedsAuth: 15-min TTL cache

 Failed --> Pending: reconnectMcpServer()
 Disabled --> Pending: toggleMcpServer()

 note right of Connected
 Exists in memoize cache
 fetchTools/Resources also cached
 end note

3-strike rule: 3 consecutive terminal errors force a transition to Failed state. This prevents endlessly retrying against dead servers.
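
A minimal sketch of the consecutive-error counter, assuming any successful call resets the streak. `StrikeCounter` and its methods are hypothetical names.

```typescript
const MAX_TERMINAL_ERRORS = 3;
type ConnectionState = "connected" | "failed";

class StrikeCounter {
  private consecutive = 0;
  private state: ConnectionState = "connected";

  getState(): ConnectionState {
    return this.state;
  }

  // Assumption: any success breaks the streak.
  recordSuccess(): void {
    this.consecutive = 0;
  }

  recordTerminalError(): void {
    this.consecutive += 1;
    if (this.consecutive >= MAX_TERMINAL_ERRORS) {
      this.state = "failed"; // stop retrying until an explicit reconnect
    }
  }
}
```

Counting only consecutive errors keeps a flaky but alive server in the Connected state, while a dead one is retired after three attempts.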

15-minute needs-auth cache: Without a cache, every connection pass would retry servers that have already returned 401, and with 30+ connectors that means 30+ simultaneous network requests that are known to fail. The TTL cache suppresses those retries for 15 minutes.
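
The TTL check can be sketched with an injected clock. The function names and the `Map`-based store are hypothetical. Only the 15-minute window comes from the text.

```typescript
const NEEDS_AUTH_TTL_MS = 15 * 60 * 1000;
const needsAuthSince = new Map<string, number>();

// Record the time at which a server answered 401.
function markNeedsAuth(server: string, now: number): void {
  needsAuthSince.set(server, now);
}

// True while the server is still inside its 15-minute needs-auth window.
function shouldSkipConnect(server: string, now: number): boolean {
  const since = needsAuthSince.get(server);
  if (since === undefined) return false;
  if (now - since >= NEEDS_AUTH_TTL_MS) {
    needsAuthSince.delete(server); // window expired, allow a fresh attempt
    return false;
  }
  return true;
}
```

Passing `now` explicitly keeps the sketch deterministic. A real implementation would presumably read the system clock.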

2. OAuth – The Reality of 2,465 Lines

The reason auth.ts is 2,465 lines is that real-world OAuth servers don’t consistently implement the RFCs:

Component	Description
RFC 9728 + 8414 discovery	Server can run AS on a separate host -> discover AS URL via PRM
PKCE	Public client — code_verifier/code_challenge required
XAA (Cross-App Access)	Exchange IdP id_token for access_token at the MCP server’s AS
Non-standard error normalization	Slack returns HTTP 200 with `{"error":"invalid_grant"}`
Keychain storage	macOS Keychain integration (`getSecureStorage()`)

Rust porting implications: OAuth is not an SDK dependency but a complex async state machine. Discovery (2 stages) -> PKCE -> callback server -> token storage -> refresh -> revocation -> XAA. Porting the whole thing is impractical, so starting with stdio MCP + API key authentication is realistic.

3. 4-Layer Security Model

MCP security is not a single gate but a composition of trust levels:

flowchart TD
 subgraph L1["1. Enterprise"]
 E1["managed-mcp.json<br/>If present, blocks all other sources"]
 E2["denylist / allowlist<br/>name, command, URL patterns"]
 end

 subgraph L2["2. Project"]
 P1[".mcp.json loaded"]
 P2["pending -> user approval -> approved"]
 end

 subgraph L3["3. Server"]
 S1["Independent OAuth tokens per server"]
 S2["Keychain storage"]
 end

 subgraph L4["4. Channel"]
 C1["GrowthBook allowlist<br/>tengu_harbor_ledger"]
 C2["Structured events<br/>not plain text matching"]
 end

 L1 --> L2 --> L3 --> L4

 style L1 fill:#ffcdd2
 style L2 fill:#fff9c4
 style L3 fill:#c8e6c9
 style L4 fill:#e1f5fe

Each layer operates independently, and Enterprise takes highest priority. A server declared in the project’s .mcp.json is still blocked if it matches the enterprise denylist.

Config Sources and Deduplication (config.ts 1,578 lines)

Config source priority (higher wins):

Enterprise managed (managed-mcp.json)
Local (per-user project settings)
User (global ~/.claude.json)
Project (.mcp.json)
Plugin (dynamic)
claude.ai connectors (lowest)

Why is deduplication needed? The same MCP server can exist in both .mcp.json and claude.ai connectors. getMcpServerSignature creates stdio:[command|args] or url:<base> signatures, unwrapping CCR proxy URLs to original vendor URLs before comparison.
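
A minimal sketch of signature-based deduplication. The `McpConfig` shape and both function names are hypothetical, the "base" of a URL is assumed to be everything before the query string or fragment, and CCR proxy unwrapping is omitted.

```typescript
type McpConfig =
  | { type: "stdio"; command: string; args: string[] }
  | { type: "url"; url: string };

function serverSignature(config: McpConfig): string {
  if (config.type === "stdio") {
    return `stdio:[${[config.command, ...config.args].join("|")}]`;
  }
  // Assumption: the URL "base" is the part before any query string or fragment.
  return `url:${config.url.split(/[?#]/)[0]}`;
}

// First-wins deduplication: callers pass configs in priority order.
function dedupeServers(
  servers: Array<{ name: string; config: McpConfig }>,
): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const { name, config } of servers) {
    const signature = serverSignature(config);
    if (seen.has(signature)) continue; // same server reached via a lower-priority source
    seen.add(signature);
    kept.push(name);
  }
  return kept;
}
```

Comparing signatures instead of names is what catches the same server registered under two different labels.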

Environment variable expansion: Supports ${VAR} and ${VAR:-default} syntax. Missing variables are reported as warnings rather than errors to prevent partial connection failures.
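
The two syntaxes can be handled with a single regular expression. `expandEnv` is a hypothetical name. Leaving an unresolved placeholder in the output and collecting its name for a warning is an assumption about how "warnings rather than errors" might look.

```typescript
function expandEnv(
  input: string,
  env: Record<string, string | undefined>,
): { value: string; missing: string[] } {
  const missing: string[] = [];
  const value = input.replace(
    // Matches ${VAR} and ${VAR:-default}.
    /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g,
    (match: string, name: string, fallback: string | undefined) => {
      const found = env[name];
      if (found !== undefined) return found;
      if (fallback !== undefined) return fallback;
      missing.push(name); // collected for a warning, not thrown
      return match; // assumption: leave the placeholder untouched
    },
  );
  return { value, missing };
}
```

Returning the missing names alongside the expanded value lets the caller log one warning per variable without aborting the load.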

4. Plugins vs Skills – Structural Differences

Dimension	Skills	Plugins
Essence	Prompt extension (SKILL.md = text)	System extension (skills + hooks + MCP)
Installation	Drop a single file	Marketplace git clone
Runtime code	None (pure text)	Yes (MCP servers, hook scripts)
Toggle	Implicit (file existence)	Explicit (`/plugin` UI)
ID scheme	File path	`{name}@builtin` or `{name}@marketplace`

Skills are the embodiment of the “file = extension” principle. A single SKILL.md works as an extension immediately without installation or building.

Plugin Service Separation of Concerns

File	Role	Side Effects
`pluginOperations.ts`	Pure library functions	None
`pluginCliCommands.ts`	CLI wrappers	`process.exit`, console output
`PluginInstallationManager.ts`	Background coordinator	AppState updates

The pure functions in pluginOperations are reused by both CLI and interactive UI.

Marketplace coordination: diffMarketplaces() compares declared marketplaces against actual installations. New installs trigger auto-refresh; existing updates only set a needsRefresh flag. New installs need auto-refresh to prevent “plugin not found” errors, while updates let users choose when to apply.

5. 5-Layer Skill Discovery Engine

Loading source priority in loadSkillsDir.ts (1,086 lines):

flowchart TD
 subgraph Discovery["Skill Discovery"]
 A["1. policySettings<br/>managed-settings.json"]
 B["2. userSettings<br/>~/.claude/skills/"]
 C["3. projectSettings<br/>.claude/skills/<br/>project root to home"]
 D["4. --add-dir<br/>additional directories"]
 E["5. legacy<br/>/commands/ directory"]
 end

 subgraph Dedup["Deduplication"]
 F["realpath() symlink resolution"]
 G["File ID based first-wins"]
 end

 subgraph Parse["Frontmatter Parsing"]
 H["description, when_to_use"]
 I["allowed-tools"]
 J["model, context, hooks"]
 K["paths, shell"]
 end

 A --> B --> C --> D --> E
 E --> F --> G
 G --> H & I & J & K

 style Discovery fill:#e1f5fe
 style Parse fill:#fff3e0

Frontmatter System

15+ fields are extracted from SKILL.md’s YAML frontmatter:

description, when_to_use: Used by the model for skill selection
allowed-tools: List of tools permitted during skill execution
model: Force a specific model
context: fork: Execute in a separate context
hooks: Skill-specific hook configuration
paths: Path-based activation filter
shell: Inline shell command execution

Lazy Disk Extraction of Bundled Skills

17 bundled skills compiled into the CLI binary (skills/bundled/) are extracted to disk on first invocation if they have a files field:

O_NOFOLLOW | O_EXCL flags prevent symlink attacks
0o600 permissions restrict access
resolveSkillFilePath() rejects .. paths to prevent directory escape

Why extract to disk? So the model can read reference files using the Read/Grep tools. Keeping them only in memory would make them inaccessible to the model.
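
The directory-escape check can be sketched without touching the filesystem. This is a hypothetical validator for POSIX-style paths, and the real resolveSkillFilePath() may differ in detail. The open flags and file permissions from the list above are not modeled here.

```typescript
// Hypothetical validator; the real resolveSkillFilePath() may differ in detail.
function resolveSkillFile(baseDir: string, relative: string): string {
  const segments = relative.split(/[\\/]+/);
  // Reject absolute paths and any ".." segment before joining.
  if (relative.startsWith("/") || segments.includes("..")) {
    throw new Error(`unsafe skill file path: ${relative}`);
  }
  const clean = segments.filter((segment) => segment !== "" && segment !== ".");
  return [baseDir.replace(/\/+$/, ""), ...clean].join("/");
}
```

Rejecting `..` outright, instead of normalizing and then comparing prefixes, keeps the check simple enough to audit at a glance.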

mcpSkillBuilders – A 44-Line Circular Reference Solution

mcpSkillBuilders.ts (44 lines) is small but architecturally significant.

Problem: mcpSkills.ts needs functions from loadSkillsDir.ts, but a direct import creates a circular reference (client.ts -> mcpSkills.ts -> loadSkillsDir.ts -> ... -> client.ts).

Solution: A write-once registry. loadSkillsDir.ts registers functions at module initialization time, and mcpSkills.ts retrieves them when needed. Non-literal dynamic imports fail in the Bun bundler, and literal dynamic imports trigger dependency-cruiser’s circular dependency check, which left this approach as the only viable solution.

Leaf modules in the dependency graph import only types, and runtime registration happens exactly once at startup.
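
A minimal sketch of a write-once registry, collapsed into one file for readability. The `SkillBuilders` shape and both function names are hypothetical, and throwing on a second registration is one possible reading of "write-once".

```typescript
// Hypothetical builder shape; the real registry holds functions from loadSkillsDir.ts.
type SkillBuilders = {
  parseFrontmatter: (text: string) => Record<string, string>;
};

let registeredBuilders: SkillBuilders | null = null;

// Called once at startup by the module that owns the implementations.
function registerSkillBuilders(builders: SkillBuilders): void {
  if (registeredBuilders !== null) {
    throw new Error("skill builders already registered");
  }
  registeredBuilders = builders;
}

// Called by consumers that must not import the owning module directly.
function getSkillBuilders(): SkillBuilders {
  if (registeredBuilders === null) {
    throw new Error("skill builders not registered yet");
  }
  return registeredBuilders;
}
```

In a real module layout the registry file would import only the `SkillBuilders` type, so neither the owner nor the consumer gains a runtime import edge to the other.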

Rust Comparison

Area	TS (Complete)	Rust (Current)
Name normalization	`normalization.ts`	`mcp.rs` — same logic
Server signature	`getMcpServerSignature`	`mcp_server_signature` — includes CCR proxy unwrap
stdio JSON-RPC	SDK-dependent	`mcp_stdio.rs` — direct implementation (initialize, tools/list, tools/call)
OAuth	2,465-line full implementation	None — types only
Connection management	memoize + onclose reconnection	None
Skill loading	5-layer + 15-field frontmatter	2 directories, SKILL.md only
Bundled skills	17 built-in	None
Plugins	Built-in + marketplace	None
Security	4-layer (Enterprise->Channel)	None

Key gap: Rust has implemented bootstrap (config -> transport) and stdio JSON-RPC. The SDK-less JSON-RPC implementation in mcp_stdio.rs is meaningful progress. However, OAuth, connection lifecycle, channel security, and the full skill discovery system are all absent.

Insights

MCP is not a “protocol” but an “integration framework” – What client.ts’s 3,348 lines tell us is that the hard part is not JSON-RPC but connection lifecycle management. Memoization, auto-reconnect, session expiry detection, 401 retry, 3-strike terminal errors, needs-auth caching. External processes (stdio) and remote services (HTTP/SSE) die unpredictably, OAuth tokens expire, and networks drop. This is code that reflects the reality that “connect once and done” doesn’t exist.
Skills embody the “file = extension” principle – A single SKILL.md works as an extension immediately without installation or building. This simplicity, combined with incremental complexity via frontmatter (model specification, hooks, path filters), accommodates both beginners and power users. Plugins are the organizational layer above skills, packaging “skills + hooks + MCP servers” together.
mcpSkillBuilders.ts is a 44-line architecture lesson – The only solution that simultaneously satisfies Bun bundler’s dynamic import constraints and dependency-cruiser’s circular dependency check was a “write-once registry.” The pattern where leaf modules import only types and runtime registration happens once at startup is a broadly applicable approach to resolving circular references in complex module systems — worth remembering.

Next post: #6 – Beyond Claude Code: A Retrospective on Building an Independent 7-Crate Harness

Claude Code Harness Anatomy #6 — Beyond Claude Code: A Retrospective on Building an Independent 7-Crate Harness

Mon, 06 Apr 2026 00:00:00 +0900

Overview

This is the final post in the series that systematically dissected Claude Code’s TypeScript source across 27 sessions. In Phase 1 we understood the architecture of 100k+ lines of TS code, in Phase 2 we reimplemented core patterns in Rust, and in Phase 3 we designed and built an independent agent harness that overcomes the 8 limitations we discovered. This post covers the limitation analysis, 5 design principles, 7-crate architecture, 61 tests, and a full retrospective of the journey.

1. 8 Limitations of Claude Code’s Architecture

From 27 sessions of analysis, we distinguished strengths from limitations. The strengths (AsyncGenerator pipeline, 3-tier concurrency, hook extensibility, CLAUDE.md discovery, MCP support, self-contained tool interface, 7-path error recovery) represent excellent design. However, the following 8 limitations motivated the independent harness:

#	Limitation	Source Session	Impact
1	React/Ink dependency — heavy TUI	S08	Unnecessary dependency in headless mode
2	Single provider (effectively Anthropic-only)	S01	Cannot use OpenAI or local models
3	main.tsx 4,683-line monolith	S01	CLI/REPL/session mixed in one file
4	Synchronous tool execution (Rust port)	S03	No streaming pipelining
5	TS ecosystem-locked plugins	S13	No language-neutral extensions
6	85 React hooks mixing UI/runtime	S08	Dual meaning of “hook”
7	Implicit prompt caching dependencies	S10	3 cache invalidation paths are implicit
8	MCP OAuth 2,465-line complexity	S12	RFC inconsistency is the root cause

2. 5 Design Principles

We established 5 core principles to overcome these limitations:

Principle 1 – Multi-provider: Support Anthropic, OpenAI, and local models (Ollama) through a single abstraction.

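use async_trait::async_trait;

/// Provider-neutral interface, implemented once per LLM backend.
#[async_trait]
pub trait Provider: Send + Sync {
    /// Starts a streaming completion for the given request.
    async fn stream(&self, request: ProviderRequest)
        -> Result<EventStream, ProviderError>;
    /// Models this provider can serve.
    fn available_models(&self) -> &[ModelInfo];
    /// Provider name.
    fn name(&self) -> &str;
}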

ProviderRequest is a provider-neutral struct that each implementation converts to its own API format.

Principle 2 – Native async: Fully async, built on tokio. Channels replace the AsyncGenerator pattern: yield becomes tx.send(), and yield* becomes channel forwarding.

Principle 3 – Module separation: Conversation engine, tools, hooks, and prompts are each separate crates. No repeating the main.tsx monolith.

Principle 4 – Language-neutral extensions: SKILL.md compatibility + MCP servers as plugin units.

Principle 5 – Full MCP utilization: Leveraging not just tools but resources, prompts, and sampling across the full spec.

3. 7-Crate Architecture

graph TD
 CLI["harness-cli<br/>REPL binary"] --> CORE["harness-core<br/>Conversation engine + turn loop"]
 CORE --> PROV["harness-provider<br/>LLM provider abstraction"]
 CORE --> TOOLS["harness-tools<br/>Tool registry + built-in tools"]
 CORE --> HOOKS["harness-hooks<br/>Hook pipeline"]
 CORE --> PROMPT["harness-prompt<br/>CLAUDE.md discovery"]
 CORE --> MCP["harness-mcp<br/>MCP client"]
 MCP --> TOOLS

 style CLI fill:#b3e5fc
 style CORE fill:#fff9c4
 style PROV fill:#c8e6c9
 style TOOLS fill:#c8e6c9
 style HOOKS fill:#c8e6c9
 style PROMPT fill:#c8e6c9
 style MCP fill:#e1bee7

Core design: Among the library crates, only harness-core depends on the others, and harness-cli sits on top of harness-core. The remaining crates are independent of each other (except harness-mcp -> harness-tools). This structure enables:

Independent cargo test for each crate
No harness-core changes needed when adding providers
MCP tools implementing the same Tool trait as built-in tools

Crate	Core Responsibility	Test Count
`harness-provider`	LLM API calls, SSE parsing, retries	11
`harness-tools`	Tool registry, 3-tier concurrency	12
`harness-hooks`	Shell hook execution, deny short-circuit, rewrite chain	9
`harness-prompt`	6-stage CLAUDE.md, SHA-256 deduplication	9
`harness-core`	Conversation engine, `StreamingToolExecutor`	6
`harness-mcp`	JSON-RPC, stdio transport	14
`harness-cli`	REPL binary	–

Provider Trait – Multi-Provider

The existing Rust port’s ApiClient trait was Anthropic-specific (ApiRequest with Anthropic fields). The Provider trait accepts a provider-neutral ProviderRequest that each implementation converts to its own API format. Box<dyn Provider> enables runtime fallback chains.

ConversationEngine – Turn Loop

pub struct ConversationEngine {
 session: Session,
 provider: Box<dyn Provider>,
 tool_executor: StreamingToolExecutor,
 hook_pipeline: HookPipeline,
 prompt_builder: PromptBuilder,
 budget: TokenBudget,
}

Instead of the existing Rust port’s ConversationRuntime<C, T> generic pattern, we use trait objects. The provider must be swappable at runtime (model fallback), whereas generics fix the concrete type at compile time.

Streaming Tool Execution (Pipelining)

We solved the biggest constraint of the existing Rust port — “collect all SSE events then execute tools”:

When a ContentBlockStop(ToolUse) event arrives from EventStream, forward immediately
After is_concurrency_safe() check, parallel processing via tokio::spawn
Tool execution proceeds while the API is still streaming

4. Phase 2 Retrospective – Extending the Existing Port

Before Phase 3’s independent harness, we extended the existing rust/ prototype in Phase 2:

Sprint	Achievement	Core Pattern
S14-S15	Orchestration module + 3-tier concurrency	`tokio::task::JoinSet`-based parallel execution
S16-S17	Tool expansion (19 -> 26)	Added Task, PlanMode, AskUser
S18-S19	Hook execution pipeline	stdin JSON, deny short-circuit
S20-S21	Skill discovery	`.claude/skills/` scan, prompt injection

Most of Phase 2’s code was rewritten in Phase 3. However, the questions discovered during prototyping (“Why AsyncGenerator?”, “Why should tools be unaware of the UI?”) determined the final design.

5. 61 Tests and the MockProvider Pattern

All crates are independently testable. MockProvider enables verifying the conversation engine’s full turn loop without actual API calls:

harness-provider: 11 tests (SSE parsing, retries, streams)
harness-tools: 12 tests (registry, concurrency, execution)
harness-hooks: 9 tests (deny short-circuit, rewrite chain, timeouts)
harness-prompt: 9 tests (6-stage discovery, hash deduplication)
harness-core: 6 tests (turn loop, tool calls, max iterations)
harness-mcp: 14 tests (JSON-RPC, initialization, tool listing)

6. How Phase 1-2 Lessons Shaped the Design

flowchart LR
 subgraph Phase1["Phase 1 -- Understanding"]
 direction TB
 P1A["S02: AsyncGenerator chain"]
 P1B["S05: 42-tool classification"]
 P1C["S08: Runtime vs React hooks"]
 P1D["S10: 6-stage CLAUDE.md"]
 P1E["S12: MCP connection management"]
 P1F["S13: Skills = prompts"]
 end

 subgraph Phase3["Phase 3 -- Independent Harness"]
 direction TB
 P3A["EventStream + mpsc channels"]
 P3B["Tool trait + 3-tier"]
 P3C["HookPipeline (runtime only)"]
 P3D["PromptAssembler separation"]
 P3E["harness-mcp stdio"]
 P3F["SKILL.md compatible"]
 end

 P1A -->|"yield -> tx.send()"| P3A
 P1B -->|"fail-closed defaults"| P3B
 P1C -->|"scope reduction"| P3C
 P1D -->|"cache splitting"| P3D
 P1E -->|"implemented without SDK"| P3E
 P1F -->|"text injection"| P3F

 style Phase1 fill:#e1f5fe
 style Phase3 fill:#fff3e0

Lesson	Source	Design Impact
`StreamingToolExecutor` 4-stage state machine	S03	Async implementation in `harness-core`
`QueryDeps` callback DI’s type safety limits	S03	Trait object DI
6-layer Bash security chain	S06	`check_permissions()` + hook separation
Agent = recursive harness instance	S06	`ConversationEngine` reuse
`ApiClient` sync trait blocks pipelining	S03	`Provider` async trait
Deny short-circuit + Rewrite chaining	S09	Identical pattern in `HookPipeline`
SHA-256 content hash outperforms path hash	S11	Content hash in `harness-prompt`
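
The "Deny short-circuit + Rewrite chaining" row can be illustrated with a small sketch. It assumes a simplified model in which each pre-hook inspects a tool input string and returns a decision; `HookDecision`, `PipelineOutcome`, and `run_pre_hooks` are hypothetical names, and the real hooks are described as exchanging JSON over stdin rather than calling closures.

```rust
/// What a single hook can decide about a tool input (illustrative).
#[derive(Debug, Clone, PartialEq)]
pub enum HookDecision {
    Allow,
    Deny(String),
    Rewrite(String),
}

/// A hook is modeled here as a plain closure over the current input.
pub type Hook = Box<dyn Fn(&str) -> HookDecision>;

#[derive(Debug, PartialEq)]
pub enum PipelineOutcome {
    /// All hooks passed; carries the (possibly rewritten) input.
    Proceed(String),
    /// A hook denied; later hooks were never run.
    Denied { hook_index: usize, reason: String },
}

pub fn run_pre_hooks(hooks: &[Hook], input: &str) -> PipelineOutcome {
    let mut current = input.to_string();
    for (index, hook) in hooks.iter().enumerate() {
        match hook(&current) {
            HookDecision::Allow => {}
            // Rewrite chaining: the next hook sees the rewritten input.
            HookDecision::Rewrite(next) => current = next,
            // Deny short-circuit: stop immediately and report who denied.
            HookDecision::Deny(reason) => {
                return PipelineOutcome::Denied { hook_index: index, reason };
            }
        }
    }
    PipelineOutcome::Proceed(current)
}
```

With three hooks (rewrite, conditional deny, rewrite), an allowed input comes out carrying both rewrites, while a denied input stops at the second hook and the third never runs.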

7. Top 10 Architecture Patterns Learned

Core architecture patterns extracted from 27 sessions:

AsyncGenerator/Stream pipeline: The core abstraction for streaming LLM responses
3-tier tool concurrency: ReadOnly/Write/Dangerous classification balances safety and performance
ToolSpec + ToolResult duality: Separating metadata (for LLM) from execution results
Hook chain execution: Deny short-circuit, rewrite chain, independent post-hook transforms
6-stage prompt discovery: Managed -> user -> project -> local overrides
MCP adapter pattern: Unifying external protocol tools into the internal Tool trait
Provider abstraction: Swapping Anthropic/OpenAI behind the same interface
SSE incremental parsing: Assembling network chunks into event frames
MockProvider testing: Verifying engine behavior with predefined event sequences
Skills = prompts: Text injection sufficient instead of complex plugin systems
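
As an illustration of the 3-tier tool concurrency pattern, the sketch below partitions a sequence of tool calls into batches. The batching rule is an assumption made for this example: consecutive ReadOnly calls share one concurrent batch, while Write and Dangerous calls each run alone and in order. `Tier`, `ToolCall`, `Batch`, `partition`, and `needs_approval` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tier {
    ReadOnly,
    Write,
    Dangerous,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub tier: Tier,
}

#[derive(Debug, PartialEq)]
pub enum Batch {
    /// Calls that may run at the same time.
    Concurrent(Vec<ToolCall>),
    /// A call that must run alone, after everything before it finished.
    Serial(ToolCall),
}

/// Assumption for this sketch: only the Dangerous tier needs explicit approval.
pub fn needs_approval(tier: Tier) -> bool {
    tier == Tier::Dangerous
}

/// Groups consecutive ReadOnly calls; every other call becomes its own batch.
/// The original call order is preserved across batches.
pub fn partition(calls: Vec<ToolCall>) -> Vec<Batch> {
    let mut batches = Vec::new();
    let mut pending: Vec<ToolCall> = Vec::new();
    for call in calls {
        if call.tier == Tier::ReadOnly {
            pending.push(call);
        } else {
            if !pending.is_empty() {
                // Flush the read-only run before the mutating call.
                batches.push(Batch::Concurrent(std::mem::take(&mut pending)));
            }
            batches.push(Batch::Serial(call));
        }
    }
    if !pending.is_empty() {
        batches.push(Batch::Concurrent(pending));
    }
    batches
}
```

An executor would then run each `Concurrent` batch in parallel and each `Serial` batch on its own. Treating an unclassified tool as non-ReadOnly keeps the default fail-closed, since it never lands in a concurrent batch.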

8. Full Journey Retrospective

Phase	Sessions	Key Deliverables
Phase 1 – Understanding	S00-S13	14 analysis documents, Rust prototype
Phase 2 – Reimplementation	S14-S21	Orchestration, 26 tools, hooks, skills
Phase 3 – Independent Harness	S22-S27	7-crate workspace, 61+ tests

Claude Code is a prompt engineering runtime. The core loop assembles messages, the tool system grants the ability to interact with the world, and the permission system sets boundaries. CLAUDE.md injects context, MCP integrates external systems, hooks and agents enable automation/delegation, and plugins/skills transform it into a user extension platform.

Future Directions

True streaming: Processing SSE byte streams chunk by chunk
Permission system: Per-tool user approval workflows
MCP SSE transport: HTTP SSE support beyond stdio
Token budget integration: Automatic context window budget management
Multi-turn agent mode: Autonomous iteration + breakpoint system
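
For the "True streaming" item, a chunk-by-chunk SSE parser could look roughly like the sketch below. It is deliberately simplified and rests on stated assumptions: chunks arrive as valid UTF-8 text, line endings are `\n` (CRLF is not handled), and only the `event` and `data` fields are kept. `SseFrame` and `SseParser` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct SseFrame {
    pub event: Option<String>,
    pub data: String,
}

/// Buffers partial input and emits a frame once its blank-line terminator arrives.
#[derive(Default)]
pub struct SseParser {
    buffer: String,
}

impl SseParser {
    /// Feed one network chunk; returns every frame completed by it.
    pub fn feed(&mut self, chunk: &str) -> Vec<SseFrame> {
        self.buffer.push_str(chunk);
        let mut frames = Vec::new();
        // A frame ends at the first blank line ("\n\n").
        while let Some(end) = self.buffer.find("\n\n") {
            let raw: String = self.buffer.drain(..end + 2).collect();
            if let Some(frame) = parse_frame(raw.trim_end_matches('\n')) {
                frames.push(frame);
            }
        }
        frames
    }
}

/// Returns the value of `name: value`, dropping one optional leading space.
fn field<'a>(line: &'a str, name: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(name)?.strip_prefix(':')?;
    Some(rest.strip_prefix(' ').unwrap_or(rest))
}

fn parse_frame(raw: &str) -> Option<SseFrame> {
    let mut event = None;
    let mut data_lines: Vec<&str> = Vec::new();
    for line in raw.lines() {
        // Comment lines (starting with ':') and other fields fall through.
        if let Some(value) = field(line, "event") {
            event = Some(value.to_string());
        } else if let Some(value) = field(line, "data") {
            data_lines.push(value);
        }
    }
    if data_lines.is_empty() {
        return None; // e.g. a keep-alive comment with no payload
    }
    // Multiple data lines are joined with newlines.
    Some(SseFrame { event, data: data_lines.join("\n") })
}
```

The important property is that `feed` may be called with arbitrary split points: a chunk that ends mid-line yields no frame, and the remainder is completed by the next chunk. A production version would buffer bytes rather than `&str`, so that a multi-byte character split across chunks is handled.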

Insights

Good abstractions emerge at boundaries – Provider trait, Tool trait, HookRunner trait. Every core abstraction is a trait that defines a module boundary. The existing Rust port’s ConversationRuntime<C, T> generics provide strong compile-time guarantees but are limiting when providers must be swapped at runtime or MCP tools registered dynamically. Box<dyn Provider> + Box<dyn Tool> trait objects buy that runtime flexibility at a minor vtable cost. Relative to LLM API latency (hundreds of ms to seconds), the vtable overhead is negligible.
The value of prototypes lies in questions, not code – Most of Phase 1-2’s prototype code was rewritten in Phase 3. But questions like “Why AsyncGenerator?”, “Why should tools be unaware of UI?”, and “Why doesn’t allow bypass?” determined the final design. Reading 100k lines of code is not the answer in itself; the design intent (the why) discovered along the way is the true deliverable.
Most of the TS code’s complexity lies in its defensive lines – Permission layers, frontmatter parsing, deduplication, symlink prevention. These are not features; they are defenses. Rust can guarantee some of this at compile time through its type system and ownership model, but runtime policies like filesystem security and user config precedence must still be implemented explicitly. The 27 sessions were the process of mapping these defensive lines, and that map guided the independent harness’s design.
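
The runtime flexibility that trait objects buy can be sketched with a small registry. This is an assumption-laden illustration, not the harness's actual `Tool` trait: the method names, the `ToolRegistry` type, and the two example tools are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical minimal tool boundary.
pub trait Tool {
    fn name(&self) -> &str;
    fn call(&self, input: &str) -> String;
}

pub struct Echo;

impl Tool for Echo {
    fn name(&self) -> &str {
        "echo"
    }
    fn call(&self, input: &str) -> String {
        input.to_string()
    }
}

pub struct Upper;

impl Tool for Upper {
    fn name(&self) -> &str {
        "upper"
    }
    fn call(&self, input: &str) -> String {
        input.to_uppercase()
    }
}

/// Stores tools behind `Box<dyn Tool>`, so the set of tools is decided
/// at runtime rather than fixed by generic parameters at compile time.
#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    /// Registers (or replaces) a tool under its own name.
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    /// Dispatches through the vtable; unknown names return `None`.
    pub fn dispatch(&self, name: &str, input: &str) -> Option<String> {
        self.tools.get(name).map(|tool| tool.call(input))
    }
}
```

A tool discovered after startup, for instance one listed by an MCP server, can be added with `register` without changing the registry's type. With a generic `ConversationRuntime<C, T>`-style design, the same change would alter the type parameters.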

Series complete. The full analysis documents are available at the claw-code repository.