<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ai Agents on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/ai-agents/</link><description>Recent content in Ai Agents on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 16 Apr 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/ai-agents/index.xml" rel="self" type="application/rss+xml"/><item><title>AI Coding Agent Ecosystem Tools — openai-oauth and Happy</title><link>https://ice-ice-bear.github.io/posts/2026-04-16-agent-ecosystem-tools/</link><pubDate>Thu, 16 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-16-agent-ecosystem-tools/</guid><description>&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;Two community projects caught my attention this week, both extending the AI coding agent ecosystem in different directions. &lt;strong&gt;openai-oauth&lt;/strong&gt; runs a local proxy that reuses your ChatGPT subscription&amp;rsquo;s OAuth token for free API access, while &lt;strong&gt;Happy&lt;/strong&gt; gives you mobile control over Claude Code and Codex sessions with push notifications and E2E encryption.&lt;/p&gt;
&lt;h2 id="ecosystem-architecture"&gt;Ecosystem Architecture
&lt;/h2&gt;&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TB
 Dev["Developer"] --&gt; Happy["Happy CLI&amp;lt;br/&amp;gt;happy claude / happy codex"]
 Happy --&gt; CC["Claude Code"]
 Happy --&gt; Codex["OpenAI Codex"]
 Dev --&gt; Phone["Phone App&amp;lt;br/&amp;gt;Remote Control"]
 Phone --&gt;|"Push notifications&amp;lt;br/&amp;gt;Permission approvals"| Happy
 Codex --&gt; Proxy["openai-oauth Proxy&amp;lt;br/&amp;gt;127.0.0.1:10531"]
 Proxy --&gt;|"OAuth token&amp;lt;br/&amp;gt;reuse"| API["OpenAI API&amp;lt;br/&amp;gt;Free access"]&lt;/pre&gt;&lt;h2 id="openai-oauth--free-api-access-via-chatgpt-token"&gt;openai-oauth — Free API Access via ChatGPT Token
&lt;/h2&gt;&lt;p&gt;This tool uses your existing ChatGPT account&amp;rsquo;s OAuth token to access the OpenAI API without purchasing separate API credits. Run &lt;code&gt;npx openai-oauth&lt;/code&gt; and it starts a local proxy at &lt;code&gt;127.0.0.1:10531/v1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Uses the same OAuth endpoint that Codex CLI uses internally&lt;/li&gt;
&lt;li&gt;Authentication via &lt;code&gt;npx @openai/codex login&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Supports &lt;code&gt;/v1/responses&lt;/code&gt;, &lt;code&gt;/v1/chat/completions&lt;/code&gt;, &lt;code&gt;/v1/models&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Full support for streaming, tool calls, and reasoning traces&lt;/li&gt;
&lt;/ul&gt;
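&lt;p&gt;Because the proxy speaks the standard OpenAI wire format, any HTTP client can talk to it. Here is a minimal TypeScript sketch of building a request against the local endpoint — only the base URL and endpoint paths come from the project; the model name and message shape are illustrative assumptions:&lt;/p&gt;

```typescript
// Minimal sketch: constructing a chat-completions request for the
// openai-oauth local proxy. Model name and payload details are assumptions;
// the base URL and /v1/chat/completions path are from the project docs.
const PROXY_BASE = "http://127.0.0.1:10531/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the URL and fetch init for the proxy's OpenAI-compatible endpoint.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: `${PROXY_BASE}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Actual call (requires `npx openai-oauth` running locally):
// const { url, init } = buildChatRequest("gpt-4o", [{ role: "user", content: "hello" }]);
// const res = await fetch(url, init);
```

&lt;p&gt;No API key is needed here — the proxy injects the OAuth credentials obtained via &lt;code&gt;npx @openai/codex login&lt;/code&gt;.&lt;/p&gt;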
&lt;p&gt;&lt;strong&gt;Important caveats:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unofficial community project, not endorsed by OpenAI&lt;/li&gt;
&lt;li&gt;Personal use only — account risk exists&lt;/li&gt;
&lt;li&gt;Interestingly, Anthropic blocked similar approaches for Claude, but OpenAI appears to tolerate them (they acquired OpenClaw, a project in this space)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="happy--mobile-control-for-ai-coding-agents"&gt;Happy — Mobile Control for AI Coding Agents
&lt;/h2&gt;&lt;p&gt;Happy is a mobile and web client that wraps Claude Code and Codex, letting you monitor and control AI sessions from your phone.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CLI wrapper: &lt;code&gt;happy claude&lt;/code&gt; or &lt;code&gt;happy codex&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Push notifications for permission requests and errors&lt;/li&gt;
&lt;li&gt;E2E encryption for all communication&lt;/li&gt;
&lt;li&gt;Open source (MIT license), TypeScript codebase&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Components:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;App&lt;/strong&gt; — Expo-based mobile app&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CLI&lt;/strong&gt; — Terminal wrapper for AI agents&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent&lt;/strong&gt; — Bridge between CLI and server&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server&lt;/strong&gt; — Relay for remote communication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;npm install -g happy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then scan the QR code from the mobile app to pair your phone with your terminal session.&lt;/p&gt;
&lt;h2 id="why-these-matter"&gt;Why These Matter
&lt;/h2&gt;&lt;p&gt;Both tools address the same underlying need: AI coding agents are powerful but constrained. openai-oauth removes the cost barrier for API access (at the risk of account terms), while Happy removes the physical proximity requirement for managing agent sessions. Together they represent the community pushing AI agent tooling beyond what the providers officially support.&lt;/p&gt;
&lt;p&gt;The ecosystem is rapidly evolving, with developers building bridges between tools, creating mobile control planes, and finding creative ways to maximize the value of their existing subscriptions.&lt;/p&gt;</description></item><item><title>GBrain — Garry Tan's AI Agent Memory System</title><link>https://ice-ice-bear.github.io/posts/2026-04-16-gbrain/</link><pubDate>Thu, 16 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-16-gbrain/</guid><description>&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;&amp;ldquo;Your AI agent is smart but forgetful. GBrain gives it a brain.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;GBrain is an open-source AI agent memory system built by Garry Tan, President and CEO of Y Combinator. It is not a toy or a demo — Tan built it for the agents he actually uses in production. The repository has already gathered 8,349 stars and 931 forks on GitHub, written primarily in TypeScript and PLpgSQL.&lt;/p&gt;
&lt;h2 id="production-scale"&gt;Production Scale
&lt;/h2&gt;&lt;p&gt;GBrain&amp;rsquo;s production deployment speaks for itself:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Metric&lt;/th&gt;
 &lt;th&gt;Count&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Pages ingested&lt;/td&gt;
 &lt;td&gt;17,888&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;People tracked&lt;/td&gt;
 &lt;td&gt;4,383&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Companies indexed&lt;/td&gt;
 &lt;td&gt;723&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Cron jobs running&lt;/td&gt;
 &lt;td&gt;21&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Time to build&lt;/td&gt;
 &lt;td&gt;12 days&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This is not a proof-of-concept. It is a working knowledge graph that powers real agent workflows every day.&lt;/p&gt;
&lt;h2 id="architecture-the-signal-to-memory-loop"&gt;Architecture: The Signal-to-Memory Loop
&lt;/h2&gt;&lt;p&gt;The core loop is straightforward: every message is a signal, and every signal gets processed through the brain.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;graph TD
 A["Signal Arrives"] --&gt; B["Signal Detector &amp;lt;br/&amp;gt; runs on every message"]
 B --&gt; C["Brain-Ops &amp;lt;br/&amp;gt; check brain first"]
 B --&gt; D["Entity Extraction &amp;lt;br/&amp;gt; people, companies, topics"]
 C --&gt; E["Respond with &amp;lt;br/&amp;gt; brain context"]
 E --&gt; F["Write back &amp;lt;br/&amp;gt; to knowledge graph"]
 F --&gt; G["Sync &amp;lt;br/&amp;gt; cross-agent memory"]
 D --&gt; F&lt;/pre&gt;&lt;p&gt;The key insight is that the signal detector fires on &lt;strong&gt;every single message&lt;/strong&gt; in parallel, capturing the agent&amp;rsquo;s thinking and extracting entities before the main response even begins. This means the brain is always accumulating context, not just when explicitly asked.&lt;/p&gt;
&lt;h2 id="philosophy-thin-harness-fat-skills"&gt;Philosophy: Thin Harness, Fat Skills
&lt;/h2&gt;&lt;p&gt;GBrain follows a distinctive design philosophy: &lt;strong&gt;intelligence lives in skills, not in the runtime&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The harness itself is deliberately thin — it handles message routing, database connections, and the signal detection loop. Everything else is pushed into 25 skill files organized by a central &lt;code&gt;RESOLVER.md&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;signal-detector&lt;/strong&gt; — always-on, fires on every message&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;brain-ops&lt;/strong&gt; — the 5-step lookup protocol before any external call&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ingest&lt;/strong&gt; — pull in pages, documents, feeds&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;enrich&lt;/strong&gt; — add metadata, classify, link entities&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;query&lt;/strong&gt; — structured retrieval from the knowledge graph&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;maintain&lt;/strong&gt; — garbage collection, deduplication, health checks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;daily-task-manager&lt;/strong&gt; — recurring workflows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;cron-scheduler&lt;/strong&gt; — 21 cron jobs and counting&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;soul-audit&lt;/strong&gt; — personality and behavior consistency checks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The phrase &amp;ldquo;skill files are code&amp;rdquo; captures this well. Each skill is a fat markdown document that encodes an entire workflow — not just a prompt template, but a complete operational specification with decision trees, error handling, and output formats.&lt;/p&gt;
&lt;h2 id="brain-first-convention"&gt;Brain-First Convention
&lt;/h2&gt;&lt;p&gt;Before any agent reaches for an external API, it follows a strict 5-step brain lookup:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check the knowledge graph for existing information&lt;/li&gt;
&lt;li&gt;Check recent signals for context&lt;/li&gt;
&lt;li&gt;Check entity relationships&lt;/li&gt;
&lt;li&gt;Check temporal patterns&lt;/li&gt;
&lt;li&gt;Only then, if needed, call an external API&lt;/li&gt;
&lt;/ol&gt;
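&lt;p&gt;The five steps above amount to a lookup chain with an external fallback. A hedged TypeScript sketch — the store interfaces here are hypothetical stand-ins, not GBrain&amp;rsquo;s real API:&lt;/p&gt;

```typescript
// Sketch of the brain-first convention: consult each internal source in
// order and only fall through to an external API when every one misses.
// Source names and shapes are illustrative, not GBrain's actual interfaces.
type Lookup = (query: string) => string | null;

function brainFirst(
  sources: Lookup[],   // steps 1-4: knowledge graph, recent signals,
                       // entity relationships, temporal patterns
  externalApi: Lookup, // step 5: last resort
  query: string,
): { answer: string | null; usedExternal: boolean } {
  for (const source of sources) {
    const hit = source(query);
    if (hit !== null) return { answer: hit, usedExternal: false };
  }
  return { answer: externalApi(query), usedExternal: true };
}
```

&lt;p&gt;The ordering is the whole point: an external call only happens when the brain genuinely has nothing, which is what keeps responses grounded in accumulated knowledge.&lt;/p&gt;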
&lt;p&gt;This &amp;ldquo;brain-first&amp;rdquo; convention dramatically reduces redundant API calls and ensures the agent&amp;rsquo;s responses are grounded in accumulated knowledge rather than fresh (and potentially inconsistent) lookups.&lt;/p&gt;
&lt;h2 id="technical-stack"&gt;Technical Stack
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;PGLite&lt;/strong&gt; deserves special mention. Instead of requiring a Postgres server, GBrain uses PGLite for instant database setup — about 2 seconds from zero to a running knowledge graph. No Docker, no server provisioning, no connection strings.&lt;/p&gt;
&lt;p&gt;The system also ships as an &lt;strong&gt;MCP server&lt;/strong&gt;, meaning it integrates directly with Claude Code, Cursor, and Windsurf. Any MCP-compatible tool can tap into the brain.&lt;/p&gt;
&lt;p&gt;Installation takes roughly 30 minutes, and the agent handles its own setup — you point it at the repo and it bootstraps the database, installs skills, and configures cron jobs.&lt;/p&gt;
&lt;h2 id="why-it-matters"&gt;Why It Matters
&lt;/h2&gt;&lt;p&gt;Most AI agent frameworks focus on orchestration: how to chain LLM calls, how to manage tool use, how to handle errors. GBrain addresses a different problem entirely — &lt;strong&gt;persistent, structured memory across sessions and across agents&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The fact that it was built in 12 days and is already running at production scale (17,888 pages, 4,383 people) suggests that the &amp;ldquo;thin harness, fat skills&amp;rdquo; approach is not just philosophically clean but practically effective.&lt;/p&gt;
&lt;p&gt;GitHub: &lt;a class="link" href="https://github.com/garrytan/gbrain" target="_blank" rel="noopener"
 &gt;garrytan/gbrain&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Why Multi-Agent Orchestration Doesn't Work Well</title><link>https://ice-ice-bear.github.io/posts/2026-04-16-multiagent-orchestration/</link><pubDate>Thu, 16 Apr 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-04-16-multiagent-orchestration/</guid><description>&lt;p&gt;Multi-agent orchestration sounds like the natural next step for AI-powered development: break a complex task into subtasks, assign each to a specialized agent, and let them collaborate. In practice, however, the approach falls apart in predictable and structural ways. After burning through $5,000 worth of tokens testing systems like Claude Code agent teams, Gastown (city-style orchestration for web app development), and Paperclip (company-style orchestration), shalomeir identified three fundamental bottlenecks that plague every multi-agent system tested.&lt;/p&gt;
&lt;p&gt;This post examines those bottlenecks and explores why the answer may already exist in single-agent tools rather than elaborate orchestration frameworks.&lt;/p&gt;
&lt;h2 id="the-three-structural-bottlenecks"&gt;The Three Structural Bottlenecks
&lt;/h2&gt;&lt;p&gt;Multi-agent systems fail not because individual agents are weak, but because the connections between them introduce compounding failures. The three bottlenecks — Context Collapse, Ghost Delegation, and Verification Error — are not independent problems. They cascade into each other, creating a failure mode that is worse than the sum of its parts.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart TD
 A["Orchestrator assigns subtask"] --&gt; B["Agent receives partial context"]
 B --&gt; C{"Context Collapse &amp;lt;br/&amp;gt; Agent lacks full picture"}
 C --&gt;|incomplete work| D{"Ghost Delegation &amp;lt;br/&amp;gt; Handoff breaks silently"}
 D --&gt;|broken assumptions| E{"Verification Error &amp;lt;br/&amp;gt; QA passes broken output"}
 E --&gt;|bad output accepted| F["Downstream agents build &amp;lt;br/&amp;gt; on faulty foundations"]
 F --&gt;|compounds| C

 style C fill:#ff6b6b,stroke:#c92a2a,color:#fff
 style D fill:#ffa94d,stroke:#e67700,color:#fff
 style E fill:#ffd43b,stroke:#f08c00,color:#333
 style F fill:#868e96,stroke:#495057,color:#fff&lt;/pre&gt;&lt;p&gt;Each bottleneck deserves close examination, because understanding the mechanism is key to understanding why simply adding more agents or better prompts does not fix the problem.&lt;/p&gt;
&lt;h2 id="bottleneck-1-context-collapse"&gt;Bottleneck 1: Context Collapse
&lt;/h2&gt;&lt;p&gt;When an orchestrator delegates a subtask to an agent, it must decide what context to pass along. This is where the first failure occurs. The orchestrator cannot pass the entire project context — token limits, cost, and latency all prevent it. So it summarizes, truncates, or selectively forwards information. Every time it does this, critical details are lost.&lt;/p&gt;
&lt;p&gt;Consider a web application with a frontend component that depends on a specific backend API contract. The orchestrator assigns the frontend work to Agent A and the backend work to Agent B. Agent A receives a summary of the API spec, but not the nuanced discussion about error handling edge cases that shaped the spec. Agent A then makes reasonable assumptions that happen to be wrong, and the resulting code compiles but fails at integration.&lt;/p&gt;
&lt;p&gt;This is not a prompting problem. It is a fundamental information-theoretic constraint. The orchestrator is acting as a lossy compression layer between agents and the full project state. No amount of prompt engineering eliminates the information loss — it only shifts which details get dropped. A single agent working in one long context window does not face this problem because it can reference any prior decision or constraint directly.&lt;/p&gt;
&lt;p&gt;The irony is that the more complex a project becomes (and thus the more you want to parallelize), the more critical full context becomes, and the harder it is to distribute that context across agents without loss.&lt;/p&gt;
&lt;h2 id="bottleneck-2-ghost-delegation"&gt;Bottleneck 2: Ghost Delegation
&lt;/h2&gt;&lt;p&gt;Ghost delegation occurs when a handoff between agents appears to succeed but actually fails silently. Agent A completes its subtask and passes the result to the orchestrator, which passes it to Agent B. But the handoff loses nuance: Agent A&amp;rsquo;s implicit assumptions, the reasoning behind certain choices, and the constraints it discovered during execution.&lt;/p&gt;
&lt;p&gt;In the Gastown and Paperclip experiments, this manifested as agents confidently building on foundations that were subtly wrong. A database schema agent would produce a schema, a backend agent would build an API on it, and a frontend agent would build UI components — each step technically completing successfully, but with accumulated drift from the original intent.&lt;/p&gt;
&lt;p&gt;The core issue is that inter-agent communication is restricted to explicit artifacts — code files, JSON specs, text summaries. But software development involves enormous amounts of tacit knowledge: why a particular approach was chosen over alternatives, what trade-offs were considered, which edge cases are known but deferred. This tacit knowledge evaporates at every handoff boundary.&lt;/p&gt;
&lt;p&gt;Real-world software teams solve this through shared environments — the same codebase, the same issue tracker, the same Slack channel where context accumulates organically. Multi-agent systems that share conversations instead of environments lose this ambient context entirely.&lt;/p&gt;
&lt;h2 id="bottleneck-3-verification-error"&gt;Bottleneck 3: Verification Error
&lt;/h2&gt;&lt;p&gt;The final bottleneck is the most insidious. When Agent B completes its work based on Agent A&amp;rsquo;s output, something needs to verify that the result is correct. In most multi-agent frameworks, this verification is done by another agent — or by the orchestrator itself. But verification requires the same full context that was lost in the first bottleneck.&lt;/p&gt;
&lt;p&gt;A verifier agent that only sees the output and a specification cannot catch errors that stem from context that was never communicated. It can check syntax, run tests if they exist, and verify surface-level correctness. But it cannot detect that the architectural approach contradicts a constraint discussed three handoffs ago that never made it into the spec.&lt;/p&gt;
&lt;p&gt;In practice, this means multi-agent systems converge on outputs that pass automated checks but fail in integration or under real-world conditions. The verification step provides a false sense of confidence: the system reports success, the orchestrator moves on, and the error compounds through subsequent stages.&lt;/p&gt;
&lt;p&gt;This is where the cascade becomes truly destructive. A verification error feeds back into context collapse — downstream agents now have an expanded context that includes incorrect assumptions validated by the verifier. The error has been laundered into accepted truth.&lt;/p&gt;
&lt;h2 id="the-orchestrator-design-problem"&gt;The Orchestrator Design Problem
&lt;/h2&gt;&lt;p&gt;The experiments reveal a counterintuitive insight: the bottleneck is not agent quality or count, but orchestrator design. Adding more agents to a poorly designed orchestration makes things worse, not better, because each additional agent adds another handoff where context can collapse and delegation can ghost.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;flowchart LR
 subgraph bad["Conversation-Based Orchestration"]
 O1["Orchestrator"] --&gt;|summary| A1["Agent 1"]
 O1 --&gt;|summary| A2["Agent 2"]
 O1 --&gt;|summary| A3["Agent 3"]
 A1 --&gt;|result| O1
 A2 --&gt;|result| O1
 A3 --&gt;|result| O1
 end

 subgraph good["Environment-Based Orchestration"]
 E["Shared Environment &amp;lt;br/&amp;gt; (codebase, state, history)"]
 B1["Agent 1"] &lt;--&gt;|direct access| E
 B2["Agent 2"] &lt;--&gt;|direct access| E
 B3["Agent 3"] &lt;--&gt;|direct access| E
 end

 style bad fill:#fff5f5,stroke:#c92a2a
 style good fill:#f0fff4,stroke:#2b8a3e&lt;/pre&gt;&lt;p&gt;The key distinction is between conversation-based and environment-based orchestration. In conversation-based systems, agents communicate through the orchestrator, which becomes the bottleneck. In environment-based systems, agents share a common workspace — the filesystem, the git history, the running application — and context is preserved in the environment itself rather than in message passing.&lt;/p&gt;
&lt;p&gt;This is why tools like Claude Code already work better than most multi-agent frameworks for real development tasks. A single agent with direct access to the full codebase, the ability to run commands, and persistent context within a session avoids all three bottlenecks by design: there is no handoff at which context can be lost, no delegation to ghost, and no separate verifier starved of context.&lt;/p&gt;
&lt;h2 id="deep-within-domains-loose-across-boundaries"&gt;Deep Within Domains, Loose Across Boundaries
&lt;/h2&gt;&lt;p&gt;The practical takeaway is captured in one phrase: &amp;ldquo;deep within domains, loose across boundaries.&amp;rdquo; An AI agent should go deep on a well-scoped domain — understanding the full context of a particular module, service, or feature. But the boundaries between domains should be handled loosely: through well-defined interfaces, shared environments, and human oversight rather than tight agent-to-agent coupling.&lt;/p&gt;
&lt;p&gt;This maps well to how effective human teams work. A senior engineer goes deep on their component and communicates with other teams through APIs, design docs, and code review — not by having a manager relay summarized instructions. The manager (orchestrator) sets direction and resolves conflicts but does not serve as the communication channel for technical details.&lt;/p&gt;
&lt;p&gt;Five evaluation criteria emerge for deciding how much to delegate to agents: task scope clarity, context self-containment, verification tractability, rollback cost, and domain expertise depth. Tasks that score high on all five — clear scope, self-contained context, easy to verify, cheap to undo, deep domain match — are excellent candidates for agent delegation. Tasks that score low on any dimension are better handled by a human or a single agent with full context.&lt;/p&gt;
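&lt;p&gt;As a toy illustration of that rubric — the 1&amp;ndash;5 scale and the score floor are my own assumptions, not from the original analysis:&lt;/p&gt;

```typescript
// Sketch: score a task on the five delegation criteria (1-5 each) and
// flag it as delegable only when no dimension falls below a floor.
// The criteria come from the post; the scale and floor are assumptions.
interface TaskScores {
  scopeClarity: number;
  contextSelfContainment: number;
  verificationTractability: number;
  rollbackCheapness: number;
  domainMatch: number;
}

function isDelegable(s: TaskScores, floor = 4): boolean {
  // "Tasks that score low on any dimension" are poor candidates,
  // so gate on the minimum across dimensions rather than an average.
  return Math.min(
    s.scopeClarity,
    s.contextSelfContainment,
    s.verificationTractability,
    s.rollbackCheapness,
    s.domainMatch,
  ) >= floor;
}
```

&lt;p&gt;Taking the minimum rather than the mean encodes the post&amp;rsquo;s point: one weak dimension — say, poorly self-contained context — is enough to disqualify a task from delegation.&lt;/p&gt;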
&lt;h2 id="the-metaphor-itself-may-be-wrong"&gt;The Metaphor Itself May Be Wrong
&lt;/h2&gt;&lt;p&gt;Perhaps the most provocative insight is that the employee metaphor for AI agents is fundamentally misleading. We talk about &amp;ldquo;hiring&amp;rdquo; agents, &amp;ldquo;delegating&amp;rdquo; tasks, building &amp;ldquo;teams&amp;rdquo; and &amp;ldquo;companies&amp;rdquo; of agents. But agents are not employees. They do not accumulate institutional knowledge across sessions. They do not build relationships with other agents that improve collaboration over time. They do not have the ambient awareness that comes from sitting in the same office.&lt;/p&gt;
&lt;p&gt;Agents are more like pure functions with expensive invocations: they take an input context, produce an output, and forget everything. Orchestrating them like employees — with org charts, reporting structures, and delegation hierarchies — applies a metaphor that actively misleads system designers into architectures that maximize the three bottlenecks.&lt;/p&gt;
&lt;p&gt;A better metaphor might be a single expert with excellent tools. One skilled developer with a powerful IDE, good documentation, and access to the full codebase will outperform a &amp;ldquo;team&amp;rdquo; of ten agents with fragmented context every time. The future of AI-assisted development is not about building bigger agent teams. It is about making individual agents deeper, giving them richer environment access, and being thoughtful about when and where to introduce boundaries.&lt;/p&gt;
&lt;p&gt;The $5,000 in burned tokens was not wasted — it was the cost of learning that the answer was already in front of us.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Based on &lt;a class="link" href="https://shalomeir.substack.com/p/multi-agent-orchestration-problems" target="_blank" rel="noopener"
 &gt;shalomeir&amp;rsquo;s analysis&lt;/a&gt; of multi-agent orchestration failures across Claude Code agent teams, Gastown, and Paperclip.&lt;/em&gt;&lt;/p&gt;</description></item></channel></rss>