[{"content":"Overview This is the first post in a series that systematically dissects Claude Code\u0026rsquo;s source structure across 27 sessions. In this post, we trace the complete call stack across 11 TypeScript files that a \u0026ldquo;hello\u0026rdquo; typed into the terminal traverses before a response appears on screen.\nAnalysis Target: 11 Core Files # Path Lines Role 1 entrypoints/cli.tsx 302 CLI bootstrap, argument parsing, mode routing 2 main.tsx 4,683 Main REPL component, Commander setup 3 commands.ts 754 Command registry 4 context.ts 189 System prompt assembly, CLAUDE.md injection 5 QueryEngine.ts 1,295 Session management, SDK interface 6 query.ts 1,729 Core turn loop — API + tool execution 7 services/api/client.ts 389 HTTP client, 4-provider routing 8 services/api/claude.ts 3,419 Messages API wrapper, SSE streaming, retries 9 services/tools/toolOrchestration.ts 188 Concurrency partitioning 10 services/tools/StreamingToolExecutor.ts 530 Tool execution during streaming 11 services/tools/toolExecution.ts 1,745 Tool dispatch, permission checks We trace a total of 15,223 lines.\n1. Entry and Bootstrap: cli.tsx -\u0026gt; main.tsx cli.tsx is only 302 lines, yet it contains a surprising number of fast-path branches:\ncli.tsx:37 --version -\u0026gt; immediate output, 0 imports cli.tsx:53 --dump-system -\u0026gt; minimal imports cli.tsx:100 --daemon-worker -\u0026gt; worker-only path cli.tsx:112 remote-control -\u0026gt; bridge mode cli.tsx:185 ps/logs/attach -\u0026gt; background sessions cli.tsx:293 default path -\u0026gt; dynamic import of main.tsx Design intent: Avoid loading main.tsx\u0026rsquo;s 4,683 lines just for --version. 
This optimization directly impacts the perceived responsiveness of the CLI tool.\nThe default path dynamically imports main.tsx:\n// cli.tsx:293-297 const { main: cliMain } = await import(\u0026#39;../main.js\u0026#39;); await cliMain(); The reason main.tsx is 4,683 lines is that it includes all of the following:\nSide-effect imports (lines 1-209): profileCheckpoint, startMdmRawRead, startKeychainPrefetch — parallel subprocesses launched at module evaluation time to hide the ~65ms macOS keychain read Commander setup (line 585+): CLI argument parsing, 10+ mode-specific branches React/Ink REPL rendering: Terminal UI mount Headless path (-p/--print): Uses QueryEngine directly without UI 2. Prompt Assembly: context.ts\u0026rsquo;s dual-memoize context.ts is a small file at 189 lines, but it handles all dynamic parts of the system prompt. Two memoized functions are at its core:\ngetSystemContext() (context.ts:116): Collects git state (branch, status, recent commits) getUserContext() (context.ts:155): Discovers and parses CLAUDE.md files Why the separation? It\u0026rsquo;s directly tied to the Anthropic Messages API\u0026rsquo;s prompt caching strategy. Since the cache lifetimes of the system prompt and user context differ, cache_control must be applied differently to each. Wrapping them in memoize ensures each is computed only once per session.\nThe call to setCachedClaudeMdContent() at context.ts:170-176 is a mechanism to break circular dependencies — yoloClassifier needs CLAUDE.md content, but a direct import would create a permissions -\u0026gt; yoloClassifier -\u0026gt; claudemd -\u0026gt; permissions cycle.\n3. AsyncGenerator Chain: The Architectural Spine Claude Code\u0026rsquo;s entire data flow is built on an AsyncGenerator chain:\nQueryEngine.submitMessage()* -\u0026gt; query()* -\u0026gt; queryLoop()* -\u0026gt; queryModelWithStreaming()* Every core function is an async function*. 
This isn\u0026rsquo;t just an implementation choice — it\u0026rsquo;s an architectural decision:\nBackpressure: When the consumer is slow, the producer waits Cancellation: Combined with AbortController for immediate cancellation Composition: yield* naturally chains generators together State management: Local variables within loops naturally maintain state across turns Looking at the signature of QueryEngine.submitMessage() (QueryEngine.ts:209):\nasync *submitMessage( prompt: string | ContentBlockParam[], options?: { uuid?: string; isMeta?: boolean }, ): AsyncGenerator\u0026lt;SDKMessage, void, unknown\u0026gt; In SDK mode, each message is streamed via yield, and Node.js backpressure is naturally implemented.\n4. The Core Turn Loop: query.ts\u0026rsquo;s while(true) queryLoop() in query.ts (1,729 lines) is the actual API + tool loop:\n// query.ts:307 while (true) { // 1. Call queryModelWithStreaming() -\u0026gt; SSE stream // 2. Yield streaming events // 3. Detect tool calls -\u0026gt; runTools()/StreamingToolExecutor // 4. Append tool results to messages // 5. stop_reason == \u0026#34;end_turn\u0026#34; -\u0026gt; break // stop_reason == \u0026#34;tool_use\u0026#34; -\u0026gt; continue } The State type (query.ts:204) is important. It manages loop state as an explicit record with fields like messages, toolUseContext, autoCompactTracking, and maxOutputTokensRecoveryCount, updating everything at once at continue sites.\n5. 
API Communication: 4 Providers and Caching getAnthropicClient() at client.ts:88 supports 4 providers:\nProvider SDK Reason for Dynamic Import Anthropic Direct Anthropic Default, loaded immediately AWS Bedrock AnthropicBedrock AWS SDK is several MB Azure Foundry AnthropicFoundry Azure Identity is several MB GCP Vertex AnthropicVertex Google Auth is several MB The core function chain in claude.ts (3,419 lines):\nqueryModelWithStreaming() (claude.ts:752) -\u0026gt; queryModel() -\u0026gt; withRetry() -\u0026gt; anthropic.beta.messages.stream() (SDK call) The caching strategy is determined by getCacheControl() (claude.ts:358), which decides the 1-hour TTL based on user type, feature flags, and query source.\n6. Tool Orchestration: 3-Tier Concurrency flowchart TD TC[\"Tool call array\u0026lt;br/\u0026gt;[ReadFile, ReadFile, Bash, ReadFile]\"] P[\"partitionToolCalls()\u0026lt;br/\u0026gt;toolOrchestration.ts:91\"] B1[\"Batch 1\u0026lt;br/\u0026gt;ReadFile + ReadFile\u0026lt;br/\u0026gt;isConcurrencySafe=true\"] B2[\"Batch 2\u0026lt;br/\u0026gt;Bash\u0026lt;br/\u0026gt;isConcurrencySafe=false\"] B3[\"Batch 3\u0026lt;br/\u0026gt;ReadFile\u0026lt;br/\u0026gt;isConcurrencySafe=true\"] PAR[\"Promise.all()\u0026lt;br/\u0026gt;max 10 concurrent\"] SEQ[\"Sequential execution\"] PAR2[\"Promise.all()\"] TC --\u003e P P --\u003e B1 P --\u003e B2 P --\u003e B3 B1 --\u003e PAR B2 --\u003e SEQ B3 --\u003e PAR2 style B1 fill:#e8f5e9 style B2 fill:#ffebee style B3 fill:#e8f5e9StreamingToolExecutor (530 lines) extends this batch partitioning into a streaming context. When it detects tool calls while the API response is still streaming, it immediately starts execution:\naddTool() (StreamingToolExecutor.ts:76) — Add to queue processQueue() (StreamingToolExecutor.ts:140) — Check concurrency, then execute immediately getRemainingResults() (StreamingToolExecutor.ts:453) — Wait for all tools to complete Error propagation rules: Only Bash errors cancel sibling tools (siblingAbortController). 
Read/WebFetch errors don\u0026rsquo;t affect other tools. This reflects the implicit dependencies between Bash commands (if mkdir fails, subsequent commands are pointless).\nFull Data Flow sequenceDiagram participant User as User participant CLI as cli.tsx participant Main as main.tsx participant QE as QueryEngine participant Query as query.ts participant Claude as claude.ts participant API as Anthropic API participant Tools as toolOrchestration participant Exec as toolExecution User-\u003e\u003eCLI: Types \"hello\" CLI-\u003e\u003eMain: dynamic import Main-\u003e\u003eQE: new QueryEngine() QE-\u003e\u003eQuery: query() Query-\u003e\u003eClaude: queryModelWithStreaming() Claude-\u003e\u003eAPI: anthropic.beta.messages.stream() API--\u003e\u003eClaude: SSE stream alt stop_reason == end_turn Claude--\u003e\u003eUser: Output response else stop_reason == tool_use Claude--\u003e\u003eQuery: tool_use blocks Query-\u003e\u003eTools: partitionToolCalls() Tools-\u003e\u003eExec: runToolUse() Exec-\u003e\u003eExec: canUseTool() + tool.call() Exec--\u003e\u003eQuery: Tool results Note over Query: Next iteration of while(true) endRust Gap Map Preview Tracing the same request through the Rust port revealed 31 gaps:\nPriority Gap Count Key Examples P0 (Critical) 2 Synchronous ApiClient, missing StreamingToolExecutor P1 (High) 6 3-tier concurrency, prompt caching, Agent tool P2 (Medium) 7 Multi-provider, effort control, sandbox Implemented 11 Auto-compaction, SSE parser, OAuth, config loading Implementation coverage: 36% (11/31). The next post dives deep into the conversation loop at the heart of these gaps.\nInsights AsyncGenerator is the architectural spine — It\u0026rsquo;s not just an implementation technique but a design decision that simultaneously solves backpressure, cancellation, and composition. 
In Rust, the Stream trait is the counterpart, but the ergonomics of yield* composition differ significantly.\nmain.tsx at 4,683 lines is technical debt — Commander setup, React components, and state management are all mixed in a single file. This is the result of organic growth and represents an opportunity for module decomposition.\nTool concurrency is non-trivial — The 3-tier model (read batches, sequential writes, Bash sibling cancellation) rather than \u0026ldquo;all parallel\u0026rdquo; or \u0026ldquo;all sequential\u0026rdquo; is a core design element of production agent harnesses.\nNext post: #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-harness-anatomy-1/cover-en.jpg","permalink":"/posts/2026-04-06-harness-anatomy-1/","title":"Claude Code Harness Anatomy #1 — From Entry Point to Response: The Journey of a Single Request"},{"content":"Overview In the first post of this series, we traced the journey of a single \u0026ldquo;hello\u0026rdquo; through 11 files. This post fully dissects the heart of that journey: the while(true) loop in query.ts\u0026rsquo;s 1,729 lines. 
We analyze the resilient execution model created by 7 continue paths, the 4-stage state machine of StreamingToolExecutor, and the 3-tier concurrency model of partitionToolCalls(), then compare how we reproduced these patterns in a Rust prototype.\nAnalysis Target: 10 Core Files # Path Lines Role 1 query/config.ts 46 Immutable runtime gate snapshot 2 query/deps.ts 40 Testable I/O boundary (DI) 3 query/tokenBudget.ts 93 Token budget management, auto-continue/stop decisions 4 query/stopHooks.ts 473 Stop/TaskCompleted/TeammateIdle hooks 5 query.ts 1,729 Core \u0026ndash; while(true) turn loop 6 QueryEngine.ts 1,295 Session wrapper, SDK interface 7 toolOrchestration.ts 188 Tool partitioning + concurrency control 8 StreamingToolExecutor.ts 530 SSE mid-stream tool pipelining 9 toolExecution.ts 1,745 Tool dispatch, permission checks 10 toolHooks.ts 650 Pre/PostToolUse hook pipeline We dissect a total of 6,789 lines of core orchestration code.\n1. queryLoop()\u0026rsquo;s 7 Continue Paths The queryLoop() function in query.ts (query.ts:241) is not a simple API call loop. It\u0026rsquo;s a resilient executor with 7 distinct continue reasons, each handling a distinct recovery or continuation scenario:\nReason Line Description collapse_drain_retry 1114 Retry after context collapse drain reactive_compact_retry 1162 Retry after reactive compaction (413 recovery) max_output_tokens_escalate 1219 Token escalation from 8k -\u0026gt; 64k max_output_tokens_recovery 1248 Inject \u0026ldquo;continue writing\u0026rdquo; nudge message stop_hook_blocking 1303 Stop hook returned a blocking error token_budget_continuation 1337 Continue due to remaining token budget next_turn 1725 Next turn after tool execution completes The State type is key (query.ts:204-217). Loop state is managed as a record with 10 fields. Why a record instead of individual variables? There are 7 continue sites, each updating via state = { ... } all at once. Assigning that many variables one by one at every site makes it easy to miss one. 
Record updates let the type system catch omissions.\nFull Flow of a Single Loop Iteration 1. Preprocessing (365-447): snip compaction, micro-compact, context collapse 2. Auto-compaction (454-543): on success, replace messages and continue 3. Blocking limit check (628-648): immediate termination if token threshold exceeded 4. API streaming (654-863): consume SSE events via for-await 5. No-tool exit paths (1062-1357): 413 recovery, max_output recovery, stop hooks 6. Tool continuation paths (1360-1728): execute remaining tools -\u0026gt; next_turn 2. StreamingToolExecutor\u0026rsquo;s 4-Stage State Machine StreamingToolExecutor.ts (530 lines) is the most sophisticated concurrency pattern in Claude Code. The core idea: start executing completed tool calls while the API response is still streaming.\nWhen the model calls [ReadFile(\u0026quot;a.ts\u0026quot;), ReadFile(\u0026quot;b.ts\u0026quot;), Bash(\u0026quot;make test\u0026quot;)] at once, without pipelining, execution only begins after all three tool blocks have arrived. 
With pipelining, file reading starts the instant the ReadFile(\u0026quot;a.ts\u0026quot;) block completes.\nstateDiagram-v2 [*] --\u003e queued: addTool() queued --\u003e executing: processQueue()\u0026lt;br/\u0026gt;canExecuteTool() == true queued --\u003e completed: Pre-canceled\u0026lt;br/\u0026gt;getAbortReason() != null executing --\u003e completed: Tool execution finished\u0026lt;br/\u0026gt;or sibling abort completed --\u003e yielded: getCompletedResults()\u0026lt;br/\u0026gt;yield in order yielded --\u003e [*] note right of queued processQueue() auto-triggers on addTool() and prior tool completion end note note right of completed On Bash error: siblingAbortController.abort() cancels sibling tools only end noteConcurrency Decision Logic (canExecuteTool, line 129) Execution conditions: - No tools currently executing (executingTools.length === 0) - Or: this tool is concurrencySafe AND all executing tools are also concurrencySafe Read-only tools can execute in parallel, but if even one write tool is present, the next tool waits until it finishes.\nsiblingAbortController \u0026ndash; Hierarchical Cancellation siblingAbortController (line 46-61) is a child of toolUseContext.abortController. When a Bash tool throws an error, it calls siblingAbortController.abort('sibling_error') to cancel only sibling tools. The parent controller is unaffected, so the overall query continues.\nWhy do only Bash errors cancel siblings? In mkdir -p dir \u0026amp;\u0026amp; cd dir \u0026amp;\u0026amp; make, if mkdir fails, subsequent commands are pointless. ReadFile or WebFetch failures are independent and shouldn\u0026rsquo;t affect other tools.\n3. 
partitionToolCalls \u0026ndash; 3-Tier Concurrency Model toolOrchestration.ts (188 lines) defines the entire concurrency model for tool execution.\nflowchart TD TC[\"Tool call array\u0026lt;br/\u0026gt;[ReadFile, ReadFile, Bash, ReadFile]\"] P[\"partitionToolCalls()\u0026lt;br/\u0026gt;toolOrchestration.ts:91\"] B1[\"Batch 1\u0026lt;br/\u0026gt;ReadFile + ReadFile\u0026lt;br/\u0026gt;isConcurrencySafe=true\"] B2[\"Batch 2\u0026lt;br/\u0026gt;Bash\u0026lt;br/\u0026gt;isConcurrencySafe=false\"] B3[\"Batch 3\u0026lt;br/\u0026gt;ReadFile\u0026lt;br/\u0026gt;isConcurrencySafe=true\"] PAR[\"Promise.all()\u0026lt;br/\u0026gt;max 10 concurrent\"] SEQ[\"Sequential execution\"] PAR2[\"Promise.all()\"] TC --\u003e P P --\u003e B1 P --\u003e B2 P --\u003e B3 B1 --\u003e PAR B2 --\u003e SEQ B3 --\u003e PAR2 style B1 fill:#e8f5e9 style B2 fill:#ffebee style B3 fill:#e8f5e9The rule is simple: consecutive isConcurrencySafe tools are grouped into a single batch, while non-safe tools each become independent batches. This decision comes from the tool definition itself — determined by calling tool.isConcurrencySafe(parsedInput). The same tool may have different concurrency safety depending on its input.\nContext Modifiers and Race Conditions Why apply them in order after the batch completes? Applying context modifiers immediately during parallel execution creates race conditions. If A completes first and modifies the context, B (still executing) started with the pre-modification context but would see the post-modification state. Applying them in original tool order after batch completion guarantees deterministic results (toolOrchestration.ts:54-62).\n4. Tool Execution Pipeline and Hooks runToolUse() in toolExecution.ts (1,745 lines, line 337) manages the complete lifecycle of each individual tool call:\nrunToolUse() entry point 1. findToolByName() -- retry with deprecated aliases (345-356) 2. abort check -- if already canceled, return CANCEL_MESSAGE (415) 3. 
streamedCheckPermissionsAndCallTool() -- permissions + execution + hooks (455) -\u0026gt; checkPermissionsAndCallTool(): a. Zod schema input validation (615) b. tool.validateInput() custom validation (683) c. Speculative classifier (Bash only, 740) d. runPreToolUseHooks() (800) e. resolveHookPermissionDecision() (921) f. tool.call() actual execution (1207) g. runPostToolUseHooks() result transformation The Core Invariant of resolveHookPermissionDecision In resolveHookPermissionDecision() (toolHooks.ts:332), a hook\u0026rsquo;s allow does not bypass settings.json deny/ask rules (toolHooks.ts:373). Even if a hook allows, it must still pass checkRuleBasedPermissions(). This reflects the design principle that \u0026ldquo;hooks are automation helpers, not security bypasses.\u0026rdquo;\nWhen hook result is allow: -\u0026gt; Call checkRuleBasedPermissions() -\u0026gt; null means pass (no rules) -\u0026gt; deny means rule overrides hook -\u0026gt; ask means user prompt required 5. Rust Comparison \u0026ndash; 152 Lines vs 1,729 Lines Rust\u0026rsquo;s ConversationRuntime::run_turn() consists of 152 lines in a single loop {} (conversation.rs:183-272). Of the 7 TS continue paths, only next_turn (next turn after tool execution) exists in Rust.\nTS Continue Reason Rust Status Why collapse_drain_retry Not implemented No context collapse reactive_compact_retry Not implemented No 413 recovery max_output_tokens_escalate Not implemented No 8k-\u0026gt;64k escalation max_output_tokens_recovery Not implemented No multi-turn nudge stop_hook_blocking Not implemented No stop hooks token_budget_continuation Not implemented No token budget system next_turn Implemented Re-calls API after tool results The Most Critical Gap: Synchronous API Consumption The Rust ApiClient trait signature says it all:\nfn stream(\u0026amp;mut self, request: ApiRequest) -\u0026gt; Result\u0026lt;Vec\u0026lt;AssistantEvent\u0026gt;, RuntimeError\u0026gt;; The return type is Vec\u0026lt;AssistantEvent\u0026gt;. 
It\u0026rsquo;s not streaming. It collects all SSE events and returns them as a vector. This means when the model calls 5 ReadFiles, TS can finish executing the first ReadFile while still streaming, but Rust must wait for all 5 to finish streaming before starting sequential execution. The latency gap grows proportionally with the number of tools.\n6. Rust Prototype \u0026ndash; Bridging the Gap In the S04 prototype, we implemented an orchestration layer that bridges 3 P0 gaps:\nflowchart LR subgraph TS[\"TS Streaming Pipeline\"] direction TB ts1[\"SSE event stream\"] ts2[\"StreamingToolExecutor\u0026lt;br/\u0026gt;4-state machine\"] ts3[\"getCompletedResults()\u0026lt;br/\u0026gt;guaranteed yield order\"] ts1 --\u003e ts2 --\u003e ts3 end subgraph Rust[\"Rust Prototype\"] direction TB rs1[\"EventStream\u0026lt;br/\u0026gt;tokio async\"] rs2[\"StreamingPipeline\u0026lt;br/\u0026gt;tokio::spawn + mpsc\"] rs3[\"Post-MessageEnd\u0026lt;br/\u0026gt;channel collect + sort\"] rs1 --\u003e rs2 --\u003e rs3 end subgraph Bridge[\"Core Mappings\"] direction TB b1[\"yield -\u003e tx.send()\"] b2[\"yield* -\u003e channel forwarding\"] b3[\"for await -\u003e while let recv()\"] end TS ~~~ Bridge ~~~ Rust style TS fill:#e1f5fe style Rust fill:#fff3e0 style Bridge fill:#f3e5f53 Key Implementations in the Prototype 1. Async streaming: Extended the ApiClient trait to an async stream. Since MessageStream::next_event() is already async, only the consumer side needed changes.\n2. Tool pipelining: On receiving a ToolUseEnd event, assembles a ToolCall from accumulated input and immediately starts background execution via tokio::spawn. Collects results in completion order via mpsc::unbounded_channel, then sorts back to original order.\n3. 3-tier concurrency: Partitions by ToolCategory enum (ReadOnly/Write/BashLike). ReadOnly batches use Semaphore(10) + tokio::spawn for up to 10 parallel tasks. 
BashLike runs sequentially with remaining tasks aborted on error.\nPrototype Coverage TS Feature Prototype Status partitionToolCalls() 3-tier partition_into_runs() + ToolCategory Implemented runToolsConcurrently() max 10 Semaphore(10) + tokio::spawn Implemented siblingAbortController break on BashLike error Simplified StreamingToolExecutor.addTool() tokio::spawn on ToolUseEnd Implemented PreToolUse hook deny/allow HookDecision::Allow/Deny Implemented PostToolUse output transform HookResult::transformed_output Implemented 4-state machine (queued-\u0026gt;yielded) spawned/completed 2-state Incomplete 413 recovery / max_output escalation \u0026ndash; Not implemented preventContinuation \u0026ndash; Not implemented Stop Condition Comparison Condition TS Rust No tools (end_turn) Execute handleStopHooks() then exit Immediate break Token budget exceeded checkTokenBudget() with 3 decisions None max_output_tokens Escalation + multi-turn recovery None 413 prompt-too-long Context collapse + reactive compaction Error propagation maxTurns maxTurns parameter (query.ts:1696) max_iterations Diminishing returns 3+ turns with \u0026lt;500 token increase None checkTokenBudget() in tokenBudget.ts (93 lines) controls whether to continue responding, not prompt size. COMPLETION_THRESHOLD = 0.9 (continue if below 90% of total budget), DIMINISHING_THRESHOLD = 500 (stop if 3+ consecutive turns each produce fewer than 500 tokens, indicating diminishing returns). The nudgeMessage explicitly instructs \u0026ldquo;do not summarize.\u0026rdquo;\nThe Core Design Decision \u0026ndash; Why AsyncGenerator The entire pipeline is an async function* chain:\nQueryEngine.submitMessage()* -\u0026gt; query()* -\u0026gt; queryLoop()* -\u0026gt; deps.callModel()* runTools()* -\u0026gt; runToolUse()* -\u0026gt; handleStopHooks()* -\u0026gt; executeStopHooks()* The key benefit of this choice: implementing complex state machines without inversion of control. 
At each of the 7 continue paths, you construct state explicitly with state = { ... } and continue. With a callback-based approach, state management would be scattered, making it difficult to guarantee consistency across 7 recovery paths.\nIn Rust, since the yield keyword isn\u0026rsquo;t stabilized, tokio::sync::mpsc channels serve as the replacement. yield -\u0026gt; tx.send(), yield* -\u0026gt; channel forwarding, for await...of -\u0026gt; while let Some(v) = rx.recv().await.\nInsights query.ts\u0026rsquo;s 7 continue paths are not \u0026ldquo;error handling\u0026rdquo; but a \u0026ldquo;resilience engine\u0026rdquo; \u0026ndash; It collapses context on 413 errors, escalates tokens on max_output, and feeds back errors to the model on stop hook blocking. This recovery pipeline ensures stability during long-running autonomous tasks. Reproducing this in Rust requires state management beyond a simple loop {}.\nStreamingToolExecutor is a UX decision, not a performance optimization \u0026ndash; Executing 5 tools sequentially makes users wait for the sum of all execution times. Pipelining shortens the perceived \u0026ldquo;waiting for a response\u0026rdquo; time rather than improving benchmark numbers. In the Rust prototype, we implemented this in under 20 lines using tokio::spawn + mpsc channels.\nThe dual structure of static partitioning + runtime concurrency balances safety and performance \u0026ndash; partitionToolCalls() divides the tool calls into batches up front, before any of them run, while canExecuteTool() judges executability at runtime. 
Thanks to this dual structure, the non-streaming path (runTools) and the streaming path (StreamingToolExecutor) share identical concurrency semantics.\nNext post: #3 \u0026ndash; The Design Philosophy of 42 Tools, from BashTool to AgentTool\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-harness-anatomy-2/cover-en.jpg","permalink":"/posts/2026-04-06-harness-anatomy-2/","title":"Claude Code Harness Anatomy #2 — The Heart of the Conversation Loop: StreamingToolExecutor and 7 Continue Paths"},{"content":"Overview Claude Code has 42 tools. This post dissects the \u0026ldquo;tools know themselves\u0026rdquo; pattern implemented by the 30+ member Tool.ts interface, classifies all 42 tools into 8 families, and deep-dives into the most complex ones: BashTool\u0026rsquo;s 6-layer security chain (12,411 lines), AgentTool\u0026rsquo;s 4 spawn modes (6,782 lines), FileEditTool\u0026rsquo;s string matching strategy, MCPTool\u0026rsquo;s empty-shell proxy pattern, and the Task state machine.\n1. Tool Interface \u0026ndash; \u0026ldquo;Tools Know Themselves\u0026rdquo; Tool.ts (792 lines) is the contract for the tool system. The Tool type (Tool.ts:362-695) that every tool implements consists of 30+ members across four domains:\nDomain Key Members Role Execution contract call(), inputSchema, validateInput(), checkPermissions() Core tool logic Metadata name, aliases, searchHint, shouldDefer, maxResultSizeChars Search and display Concurrency/Safety isConcurrencySafe(), isReadOnly(), isDestructive(), interruptBehavior() Orchestration decisions UI rendering renderToolUseMessage() + 10 more Terminal display Why so many members in one interface? When the orchestrator (toolExecution.ts) calls a tool, it can read all metadata directly from the tool object without any external mapping tables. 
This is the foundation of a plugin architecture where adding a new tool is self-contained within a single directory.\nToolUseContext \u0026ndash; 42 Fields of Execution Environment ToolUseContext (Tool.ts:158-300) is the environment context injected during tool execution. Spanning 142 lines, it defines 42 fields:\nabortController: Cancellation propagation for the 3-tier concurrency model getAppState()/setAppState(): Global state access (permissions, todos, teams) readFileState: LRU cache-based change detection contentReplacementState: Save large results to disk, return summaries only Tools are not isolated functions — they need access to the harness\u0026rsquo;s entire state. FileReadTool uses the cache to detect changes, AgentTool registers sub-agent state, and BashTool can interrupt sibling processes.\nbuildTool()\u0026rsquo;s fail-closed Defaults buildTool() (Tool.ts:783) takes a ToolDef and returns a complete Tool with defaults filled in. The defaults follow a fail-closed principle (Tool.ts:757-768):\nisConcurrencySafe -\u0026gt; false (assume unsafe) isReadOnly -\u0026gt; false (assume writes) If a new tool doesn\u0026rsquo;t explicitly declare concurrency/read-only status, it takes the most conservative path (sequential execution, write permission required). This structurally prevents the bug of accidentally running an unsafe tool in parallel.\n2. 
42 Tools in 8 Families flowchart LR subgraph safe[\"isConcurrencySafe: true (10)\"] direction TB R1[\"FileReadTool\"] R2[\"GlobTool / GrepTool\"] R3[\"WebFetchTool / WebSearchTool\"] R4[\"ToolSearchTool / SleepTool\"] R5[\"TaskGetTool / TaskListTool\"] R6[\"LSPTool\"] end subgraph unsafe[\"isConcurrencySafe: false (32)\"] direction TB W1[\"BashTool 12,411 lines\"] W2[\"FileEditTool / FileWriteTool\"] W3[\"AgentTool 6,782 lines\"] W4[\"MCPTool / SkillTool\"] W5[\"Task 5 / Todo\"] W6[\"Config / PlanMode / Worktree\"] end subgraph orch[\"Orchestrator\"] O[\"partitionToolCalls()\u0026lt;br/\u0026gt;toolOrchestration.ts\"] end O --\u003e|\"Parallel batch\"| safe O --\u003e|\"Sequential execution\"| unsafe style safe fill:#e8f5e9 style unsafe fill:#ffebee Family Count Representative Tool Key Characteristic Filesystem 5 FileReadTool (1,602 lines) PDF/image/notebook support, token limits Execution 3 BashTool (12,411 lines) 6-layer security, command semantics Agent/Team 4 AgentTool (6,782 lines) 4 spawn modes, recursive harness Task management 7 TaskUpdateTool (484 lines) State machine, verification nudge MCP/LSP 5 MCPTool (1,086 lines) Empty-shell proxy Web/External 2 WebFetchTool (1,131 lines) Parallel safe State/Config 5 ConfigTool (809 lines) Session state changes Infra/Utility 11 SkillTool (1,477 lines) Command-to-tool bridge Only 10 of 42 (24%) are parallel-safe, but these 10 are the most frequently called tools (Read, Glob, Grep, Web), so perceived parallelism is higher than the ratio suggests.\n3. BashTool \u0026ndash; 6-Layer Security Chain BashTool is not a simple shell executor. Because arbitrary code execution is an inherent risk, more than half of its 12,411 lines are security layers.\nflowchart TB A[\"Model: Bash call\"] --\u003e B{\"validateInput\"} B --\u003e|\"sleep pattern blocked\"| B1[\"Return error\"] B --\u003e|\"Pass\"| C{\"6-layer security chain\"} subgraph chain[\"Security chain\"] C1[\"1. 
bashSecurity.ts\u0026lt;br/\u0026gt;2,592 lines -- command structure analysis\"] C2[\"2. bashPermissions.ts\u0026lt;br/\u0026gt;2,621 lines -- rule matching\"] C3[\"3. readOnlyValidation.ts\u0026lt;br/\u0026gt;1,990 lines -- read-only determination\"] C4[\"4. pathValidation.ts\u0026lt;br/\u0026gt;1,303 lines -- path-based security\"] C5[\"5. sedValidation.ts\u0026lt;br/\u0026gt;684 lines -- sed-specific security\"] C6[\"6. shouldUseSandbox.ts\u0026lt;br/\u0026gt;153 lines -- sandbox decision\"] C1 --\u003e C2 --\u003e C3 --\u003e C4 --\u003e C5 --\u003e C6 end C --\u003e chain chain --\u003e D{\"allow / ask / deny\"} D --\u003e|\"allow\"| E[\"runShellCommand()\"] D --\u003e|\"ask\"| F[\"Request user approval\"] D --\u003e|\"deny\"| G[\"Denied\"] E --\u003e H[\"Result processing\u0026lt;br/\u0026gt;interpretCommandResult()\u0026lt;br/\u0026gt;trackGitOperations()\"] style chain fill:#fff3e0Each layer handles a different threat:\nbashSecurity.ts (2,592 lines): Blocks command substitution ($(), `), Zsh module-based attacks. Key: only metacharacters in unquoted contexts are classified as dangerous bashPermissions.ts (2,621 lines): Rule-based allow/deny/ask. stripAllLeadingEnvVars() + stripSafeWrappers() strip wrappers to extract the actual command readOnlyValidation.ts (1,990 lines): If read-only, then isConcurrencySafe: true — parallel execution allowed pathValidation.ts (1,303 lines): Per-command path extraction rules for path safety judgment sedValidation.ts (684 lines): sed\u0026rsquo;s w and e flags can write files/execute arbitrary code — blocked separately shouldUseSandbox.ts (153 lines): Final isolation decision Command semantics (commandSemantics.ts): grep and diff return exit code 1 as a normal result, not an error. The COMMAND_SEMANTICS Map defines per-command interpretation rules.\nRust porting implications: Either reproduce all 6 layers wholesale, or simplify to sandbox-only. Skipping intermediate layers creates security holes.\n4. 
AgentTool \u0026ndash; 4 Spawn Modes AgentTool is less of a \u0026ldquo;tool\u0026rdquo; and more of an agent orchestrator. The key: runAgent() recursively calls the harness\u0026rsquo;s query() loop. Child agents receive the same tools, API access, and security checks as the parent.\nMode Trigger Context Sharing Background Synchronous Default None (prompt only) No Async run_in_background: true None Yes Fork subagent_type omitted Full parent context Yes Remote isolation: \u0026quot;remote\u0026quot; None Yes Fork Sub-agents \u0026ndash; Byte-Identical Prefix Forks inherit the parent\u0026rsquo;s full conversation context. To share prompt cache, all fork children are designed to produce byte-identical API request prefixes:\nTool use results replaced with placeholders FORK_BOILERPLATE_TAG prevents recursive forking Model kept identical (model: 'inherit') — different models cause cache misses Memory System (agentMemory.ts) Per-agent persistent memory is managed across 3 scopes:\nuser: ~/.claude/agent-memory/\u0026lt;type\u0026gt;/ — user-global project: .claude/agent-memory/\u0026lt;type\u0026gt;/ — project-shared (VCS) local: .claude/agent-memory-local/\u0026lt;type\u0026gt;/ — local-only 5. FileEditTool \u0026ndash; Partial Replacement Pattern FileEditTool (1,812 lines) performs old_string -\u0026gt; new_string patches rather than full file writes. The model doesn\u0026rsquo;t need to output the entire file, saving tokens and enabling diff-based review.\nMatching strategy:\nExact string matching: fileContent.includes(searchString) Quote normalization: Convert curly quotes -\u0026gt; straight quotes and retry, with preserveQuoteStyle() preserving the original style Uniqueness validation: Fails if old_string is not unique in the file (unless replace_all) Concurrency protection: The readFileState Map stores per-file last-read timestamps. During editing, it compares against the on-disk modification time to detect external changes. 
This is why the \u0026ldquo;Read before Edit\u0026rdquo; rule is enforced in the prompt.\n6. MCPTool \u0026ndash; Empty-Shell Proxy MCPTool (1,086 lines) is where a single tool definition represents hundreds of external tools. At build time it\u0026rsquo;s an empty shell; at runtime, mcpClient.ts clones and overrides it per server:\n// MCPTool.ts:27-51 -- core methods have \u0026#34;Overridden in mcpClient.ts\u0026#34; comments name: \u0026#39;mcp\u0026#39;, // replaced at runtime with \u0026#39;mcp__serverName__toolName\u0026#39; async call() { return { data: \u0026#39;\u0026#39; } }, // replaced at runtime with actual MCP call The UI collapse classification (classifyForCollapse.ts, 604 lines) uses 139 SEARCH_TOOLS and 280+ READ_TOOLS names to determine whether an MCP tool is a read/search operation. Unknown tools are not collapsed (conservative approach).\n7. Task State Machine \u0026ndash; Agent IPC TaskUpdateTool (406 lines) state flow: pending -\u0026gt; in_progress -\u0026gt; completed or deleted.\nKey behaviors:\nAuto-assign owner: Current agent name is automatically assigned on in_progress transition Verification nudge: After 3+ tasks completed without a verification step, recommends spawning a verification agent Message routing (SendMessageTool 917 lines): By name, * broadcast, uds:path Unix domain socket, bridge:session remote peer, agent ID resume Task/SendMessage are not simple utilities but the inter-process communication (IPC) foundation of the multi-agent system.\nTS vs Rust Comparison Aspect TS (42 tools) Rust (10 tools) Tool definition Tool interface + buildTool() ToolSpec struct + mvp_tool_specs() Input schema Zod v4 + lazySchema() serde_json::json!() direct JSON Schema Concurrency declaration isConcurrencySafe(parsedInput) None — sequential execution Permission check checkPermissions() -\u0026gt; PermissionResult PermissionMode enum UI rendering 10+ render methods (React/Ink) None MCP integration MCPTool + inputJSONSchema dual path None Size 
comparison ~48,000 lines (tool code only) ~1,300 lines (single lib.rs) Key gap: The Rust port only implements the execution contract (call equivalent); concurrency declarations, permission pipeline, UI rendering, and lazy-loading optimizations are all missing.\nInsights Security is a chain, not a single checkpoint \u0026ndash; BashTool\u0026rsquo;s 6 layers each handle different threats. bashSecurity handles command structure, bashPermissions handles rule matching, pathValidation handles path safety. If any link in this chain is missing, an attack surface opens. Combined with the fail-closed principle, the conservative strategy of \u0026ldquo;block when uncertain\u0026rdquo; permeates the entire system.\nAgents are recursive harness instances \u0026ndash; The fact that AgentTool\u0026rsquo;s runAgent() recursively calls the harness\u0026rsquo;s query() loop means \u0026ldquo;agent\u0026rdquo; is not a separate system but a different configuration of the same harness. It swaps only the tool pool while reusing the same security, hooks, and orchestration.\nOnly 10 of 42 tools are concurrency-safe, yet perceived parallelism is high \u0026ndash; The 10 tools representing only 24% of the total (Read, Glob, Grep, Web, LSP) happen to be the most frequently called. This asymmetry demonstrates the practical value of the 3-tier concurrency model. 
buildTool()\u0026rsquo;s fail-closed default (isConcurrencySafe: false) forms the safety boundary, structurally preventing new tool developers from incorrectly declaring concurrency safety.\nNext post: #4 \u0026ndash; Runtime Hooks: 26+ Events and CLAUDE.md 6-Stage Discovery\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-harness-anatomy-3/cover-en.jpg","permalink":"/posts/2026-04-06-harness-anatomy-3/","title":"Claude Code Harness Anatomy #3 — The Design Philosophy of 42 Tools, from BashTool to AgentTool"},{"content":"Overview In Claude Code, the word \u0026ldquo;hook\u0026rdquo; refers to two completely different systems. Runtime hooks (toolHooks.ts + utils/hooks.ts) are a security/extension pipeline that executes shell scripts before and after tool execution, while React hooks (hooks/*.ts, 85+) are state management code for the terminal UI. Missing this distinction leads to an 85x overestimation of the Rust reimplementation scope. This post analyzes the PreToolUse/PostToolUse pipeline of runtime hooks, the security invariant of resolveHookPermissionDecision(), the 9-category classification of 85 React hooks, and CLAUDE.md\u0026rsquo;s 6-stage discovery with token budget management.\n1. Runtime Hooks vs React Hooks \u0026ndash; The Key Distinction Dimension Runtime Hooks (toolHooks.ts + utils/hooks.ts) React Hooks (hooks/*.ts) Executor child_process.spawn() React render cycle Configuration settings.json hooks field, shell commands Source code import Execution timing Before/after tool use, session start, etc. (26+ events) Component mount/update User-defined Yes — users register shell scripts No — internal code Result format JSON stdout (allow/deny/ask/rewrite) React state changes Rust reimplementation Required — core of tool execution pipeline Not needed — TUI only 2. PreToolUse Pipeline \u0026ndash; 7 Yield Variants runPreToolUseHooks() (toolHooks.ts:435-650) is designed as an AsyncGenerator. 
Called before tool execution, it emits the following yield types:\nmessage: Progress messages (hook start/error/cancel) hookPermissionResult: allow/deny/ask decision hookUpdatedInput: Input rewrite (changes input without a permission decision) preventContinuation: Execution halt flag stopReason: Halt reason string additionalContext: Additional context to pass to the model stop: Immediate halt Why AsyncGenerator? Hooks execute sequentially, and each hook\u0026rsquo;s result affects subsequent processing. Promise chaining returns only the final result, and event emitters lack type safety. AsyncGenerator is the only pattern that lets the caller consume each result and halt mid-stream.\nflowchart TD subgraph \"PreToolUse Pipeline\" A[\"toolExecution.ts\u0026lt;br/\u0026gt;Tool call begins\"] B[\"runPreToolUseHooks()\u0026lt;br/\u0026gt;toolHooks.ts:435\"] C[\"getMatchingHooks()\u0026lt;br/\u0026gt;utils/hooks.ts:1603\"] D[\"settings.json hooks\u0026lt;br/\u0026gt;event + pattern matching\"] E[\"spawn() shell command\u0026lt;br/\u0026gt;stdin: JSON, stdout: result\"] F[\"HookResult parsing\u0026lt;br/\u0026gt;allow / deny / ask / rewrite\"] end subgraph \"Permission Resolution\" G[\"resolveHookPermission\u0026lt;br/\u0026gt;Decision()\u0026lt;br/\u0026gt;toolHooks.ts:332\"] H{\"Hook result?\"} I[\"allow: checkRule\u0026lt;br/\u0026gt;BasedPermissions()\u0026lt;br/\u0026gt;rules override hooks\"] J[\"deny: immediate rejection\"] K[\"ask: canUseTool()\u0026lt;br/\u0026gt;user prompt\"] end subgraph \"Tool Execution\" L[\"tool.call()\"] end subgraph \"PostToolUse\" M[\"runPostToolUseHooks()\u0026lt;br/\u0026gt;result transform / block\"] end A --\u003e B --\u003e C --\u003e D --\u003e E --\u003e F --\u003e G --\u003e H H --\u003e|\"allow\"| I H --\u003e|\"deny\"| J H --\u003e|\"ask\"| K I --\u003e|\"Rules pass\"| L L --\u003e M\nresolveHookPermissionDecision \u0026ndash; allow != bypass The core invariant of resolveHookPermissionDecision() (toolHooks.ts:332-433): a 
hook\u0026rsquo;s allow does not bypass settings.json deny/ask rules (toolHooks.ts:325-327).\nThe processing logic has 3 stages:\nStage 1 \u0026ndash; allow handling (toolHooks.ts:347-406):\nhookResult.behavior === \u0026#39;allow\u0026#39;: -\u0026gt; Call checkRuleBasedPermissions() -\u0026gt; null -\u0026gt; no rules, hook allow passes -\u0026gt; deny -\u0026gt; rule overrides hook (security first!) -\u0026gt; ask -\u0026gt; user prompt required Why doesn\u0026rsquo;t allow bypass? This is a deliberate security decision. If an external shell script returning {\u0026quot;decision\u0026quot;:\u0026quot;allow\u0026quot;} could override settings.json deny rules, a malicious hook could circumvent security policies. Rules always take precedence over hooks.\nStage 2 \u0026ndash; deny (toolHooks.ts:408-411): Immediate rejection, no further checks.\nStage 3 \u0026ndash; ask/none (toolHooks.ts:413-432): Calls canUseTool() for user prompt.\n26+ Event Types getMatchingHooks() (utils/hooks.ts:1603-1682) handles hook matching:\nTool events: PreToolUse, PostToolUse, PostToolUseFailure, PermissionRequest, PermissionDenied Session events: SessionStart, SessionEnd, Setup Agent events: SubagentStart, SubagentStop, TeammateIdle Task events: TaskCreated, TaskCompleted System events: Notification, ConfigChange, FileChanged, InstructionsLoaded Compact events: PreCompact, PostCompact Input events: UserPromptSubmit, Elicitation, ElicitationResult Stop events: Stop, StopFailure Matched hooks execute sequentially, and if one denies, subsequent hooks are not executed.\n3. 
85 React Hooks \u0026ndash; 9 Category Classification mindmap root((\"TS Hook System\")) Runtime Hooks toolHooks.ts 651 lines PreToolUse PostToolUse PostToolUseFailure utils/hooks.ts ~5000 lines 26+ event types Shell spawn Async protocol React Hooks 85+ Permission 3 useCanUseTool PermissionContext UI Input 11 useTextInput useVimInput useTypeahead UI Display 11 useVirtualScroll useDiffData State/Config 12 useSettings useSessionBackgrounding Integration/Remote 12 useRemoteSession useReplBridge Features 20 useVoice useSwarm useTasks Notifications 16 notifs/ directory Tools/Keybindings 5 useMergedTools Additional 5+ fileSuggestions useManagePlugins Category Count Rust Reimpl Representative Hook Permission 3 Partial (bridge) useCanUseTool (203 lines) UI Input 11 Not needed useTextInput (529 lines), useVimInput (316 lines) UI Display 11 Not needed useVirtualScroll (721 lines) State/Config 12 Not needed useSessionBackgrounding (158 lines) Integration/Remote 12 Not needed useRemoteSession (605 lines) Features/Notifications 20 Not needed useVoice (1,144 lines) Notifications/Banners 16 Not needed notifs/ directory Tools/Keybindings 5 Not needed useMergedTools (44 lines) Additional 5+ Not needed fileSuggestions (811 lines) Key takeaway: What Rust needs to reimplement is only the runtime pipeline of toolHooks.ts (651 lines) + utils/hooks.ts (~5,000 lines). The 85 React hooks totaling 15,000+ lines are out of scope.\n4. CLAUDE.md 6-Stage Discovery getMemoryFiles() in claudemd.ts (1,479 lines, L790-1074) loads CLAUDE.md through a 6-stage hierarchy:\nStage Source Path Example Priority 1. Managed Org policy /etc/claude-code/CLAUDE.md Lowest 2. User Personal habits ~/.claude/CLAUDE.md, ~/.claude/rules/*.md 3. Project Project rules CLAUDE.md and .claude/rules/*.md from cwd to root 4. Local Local overrides CLAUDE.local.md (gitignored) 5. AutoMem Auto memory MEMORY.md entrypoint 6. TeamMem Team memory Cross-org sync Highest Why this order? 
The file comment (L9) states explicitly: \u0026ldquo;Files are loaded in reverse order of priority.\u0026rdquo; LLMs pay more attention to later parts of the prompt, so the most specific instructions (Local \u0026gt; Project \u0026gt; User \u0026gt; Managed) are placed last. This is not CSS specificity — it\u0026rsquo;s a design that leverages LLM attention bias.\nUpward Directory Traversal and Deduplication Starting from originalCwd, it walks up to the filesystem root, then calls dirs.reverse() to process from root downward (L851-857). In monorepos, the parent CLAUDE.md loads first and the child project\u0026rsquo;s CLAUDE.md layers on top.\nWorktree deduplication (L868-884): When a git worktree is nested inside the main repo, an isNestedWorktree check prevents the same CLAUDE.md from being loaded twice.\n@include directive (L451-535): Lexes markdown tokens to ignore @path inside code blocks, recursively resolving only @path in text nodes. Maximum depth of 5.\n5. System/User Context Separation \u0026ndash; dual-memoize Cache context.ts (189 lines) separates the system prompt into two independent contexts:\ngetSystemContext() (L116): Git state, cache breaker getUserContext() (L155): CLAUDE.md merged string, current date Why split into two? Because of the Anthropic API\u0026rsquo;s prompt caching strategy. Git state (session-fixed) and CLAUDE.md (invalidated only on file changes) have different cache lifetimes, so cache_control must be applied differently. Both functions are wrapped in memoize and execute only once per session.\n3 Cache Invalidation Paths setSystemPromptInjection() (context.ts:29): Clears both caches clearMemoryFileCaches() (claudemd.ts:1119): Clears memory files only resetGetMemoryFilesCache() (claudemd.ts:1124): Clears memory files + fires InstructionsLoaded hook This separation distinguishes between worktree switches (no reload needed) and actual reloads (after compaction).\n6. 
Token Budget \u0026ndash; Response Continuation Decisions checkTokenBudget() in tokenBudget.ts (93 lines) controls whether to continue responding, not prompt size:\nCOMPLETION_THRESHOLD = 0.9 -- continue if below 90% DIMINISHING_THRESHOLD = 500 -- 3+ consecutive turns, \u0026lt;500 tokens each -\u0026gt; diminishing returns if (!isDiminishing \u0026amp;\u0026amp; turnTokens \u0026lt; budget * 0.9) -\u0026gt; continue if (isDiminishing || continuationCount \u0026gt; 0) -\u0026gt; stop with event else -\u0026gt; stop without event Why 0.9? Models tend to start summarizing near the budget limit. Stopping at 90% prevents \u0026ldquo;wrapping up\u0026rdquo; summaries and keeps the work going. The nudgeMessage explicitly instructs \u0026ldquo;do not summarize.\u0026rdquo;\nDiminishing returns detection prevents the model from falling into repetitive patterns. Sub-agents stop immediately (L51) — they don\u0026rsquo;t have their own budgets.\nRust Comparison Aspect TS Rust Hook event types 26+ PreToolUse, PostToolUse (2 only) Hook execution Async AsyncGenerator Synchronous Command::output() Hook results 7 yield variants + JSON Allow/Deny/Warn (3 via exit code) Input modification hookUpdatedInput Not possible allow != bypass Guaranteed Not implemented (security vulnerability) CLAUDE.md 6-stage discovery 4 candidates per dir @include Recursive, depth 5 Not supported Token budget checkTokenBudget() with 3 decisions None Prompt cache memoize + 3 invalidation paths Rebuilt every time Insights The dual meaning of \u0026ldquo;hook\u0026rdquo; is the biggest source of architectural confusion \u0026ndash; The 85 React hooks are not in scope for Rust reimplementation. Only the runtime hooks (~5,600 lines) are porting targets. However, this runtime engine includes 26 event types, an async protocol ({\u0026quot;async\u0026quot;:true} background switching), and prompt requests (bidirectional stdin/stdout). 
Precisely scoping the meaning of \u0026ldquo;hooks\u0026rdquo; is the starting point for accurate estimation.\nCLAUDE.md\u0026rsquo;s \u0026ldquo;last is strongest\u0026rdquo; pattern is deliberate exploitation of LLM attention bias \u0026ndash; In the 6-stage hierarchical loading (Managed -\u0026gt; User -\u0026gt; Project -\u0026gt; Local -\u0026gt; AutoMem -\u0026gt; TeamMem), the most specific instructions are placed at the end of the prompt for maximum influence. This design emerges at the intersection of API prompt cache hit-rate optimization + LLM behavioral characteristics, not from architectural tidiness.\nThe \u0026ldquo;allow != bypass\u0026rdquo; invariant in resolveHookPermissionDecision() is the security cornerstone \u0026ndash; The current Rust hooks.rs judges allow/deny solely by exit code. Without implementing JSON result parsing and the subsequent checkRuleBasedPermissions check, a malicious hook could bypass deny rules — a security vulnerability. Clearly delineating the boundary between automation convenience and security policy is the fundamental challenge of the hook system.\nNext post: #5 \u0026ndash; MCP Services and the Plugin-Skill Extension Ecosystem\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-harness-anatomy-4/cover-en.jpg","permalink":"/posts/2026-04-06-harness-anatomy-4/","title":"Claude Code Harness Anatomy #4 — Runtime Hooks: 26+ Events and CLAUDE.md 6-Stage Discovery"},{"content":"Overview Beyond its 42 built-in tools, Claude Code can extend with unlimited external tools via MCP (Model Context Protocol). This post analyzes the connection management architecture of client.ts (3,348 lines), the OAuth authentication system of auth.ts (2,465 lines), the 4-layer security model, and config deduplication. We then dissect the structural differences between plugins and skills, the 5-layer skill discovery engine, and the circular reference resolution pattern in mcpSkillBuilders.ts.\n1. 
MCP Client \u0026ndash; Connection Management Is Harder Than the Protocol Memoization-Based Connection Pool connectToServer is wrapped with lodash.memoize. The cache key is name + JSON(config). Since MCP servers are stateful (stdio processes, WebSocket connections), creating a new connection for every tool call would be catastrophically bad for performance.\nonclose handler invalidates the cache -\u0026gt; next call automatically reconnects fetchToolsForClient and fetchResourcesForClient each have their own LRU cache (20 entries) Tool Proxy Pattern MCP tools are converted to native Tool interfaces:\nname: Format mcp__\u0026lt;normalized_server\u0026gt;__\u0026lt;normalized_tool\u0026gt; call(): ensureConnectedClient -\u0026gt; callMCPToolWithUrlElicitationRetry -\u0026gt; callMCPTool checkPermissions(): Always passthrough — MCP tools use a separate permission system annotations: Maps MCP annotations like readOnlyHint, destructiveHint URL Elicitation Retry: OAuth-based MCP servers can require authentication mid-tool-call (error code -32042). 
A retry loop shows the user the URL, waits for authentication to complete, and retries.\nConnection State Machine and 3-Strike Terminal Error stateDiagram-v2 [*] --\u003e Pending: Config loaded Pending --\u003e Connected: connectToServer success Pending --\u003e Failed: Connection timeout Pending --\u003e NeedsAuth: 401 UnauthorizedError Pending --\u003e Disabled: isMcpServerDisabled() Connected --\u003e Connected: Tool call success Connected --\u003e Failed: 3 consecutive terminal errors Connected --\u003e NeedsAuth: 401 during callMCPTool Connected --\u003e Pending: onclose cache invalidation NeedsAuth --\u003e Pending: Auth completed NeedsAuth --\u003e NeedsAuth: 15-min TTL cache Failed --\u003e Pending: reconnectMcpServer() Disabled --\u003e Pending: toggleMcpServer() note right of Connected Exists in memoize cache fetchTools/Resources also cached end note\n3-strike rule: 3 consecutive terminal errors force a transition to Failed state. This prevents endlessly retrying against dead servers.\n15-minute needs-auth cache: Retrying a server that returned 401 every time would cause 30+ connectors to fire simultaneous network requests. The TTL cache prevents unnecessary retries.\n2. OAuth \u0026ndash; The Reality of 2,465 Lines The reason auth.ts is 2,465 lines is that real-world OAuth servers don\u0026rsquo;t consistently implement the RFCs:\nComponent Description RFC 9728 + 8414 discovery Server can run AS on a separate host -\u0026gt; discover AS URL via PRM PKCE Public client — code_verifier/code_challenge required XAA (Cross-App Access) Exchange IdP id_token for access_token at the MCP server\u0026rsquo;s AS Non-standard error normalization Slack returns HTTP 200 with {\u0026quot;error\u0026quot;:\u0026quot;invalid_grant\u0026quot;} Keychain storage macOS Keychain integration (getSecureStorage()) Rust porting implications: OAuth is not an SDK dependency but a complex async state machine. 
Discovery (2 stages) -\u0026gt; PKCE -\u0026gt; callback server -\u0026gt; token storage -\u0026gt; refresh -\u0026gt; revocation -\u0026gt; XAA. Porting the whole thing is impractical, so starting with stdio MCP + API key authentication is realistic.\n3. 4-Layer Security Model MCP security is not a single gate but a composition of trust levels:\nflowchart TD subgraph L1[\"1. Enterprise\"] E1[\"managed-mcp.json\u0026lt;br/\u0026gt;If present, blocks all other sources\"] E2[\"denylist / allowlist\u0026lt;br/\u0026gt;name, command, URL patterns\"] end subgraph L2[\"2. Project\"] P1[\".mcp.json loaded\"] P2[\"pending -\u003e user approval -\u003e approved\"] end subgraph L3[\"3. Server\"] S1[\"Independent OAuth tokens per server\"] S2[\"Keychain storage\"] end subgraph L4[\"4. Channel\"] C1[\"GrowthBook allowlist\u0026lt;br/\u0026gt;tengu_harbor_ledger\"] C2[\"Structured events\u0026lt;br/\u0026gt;not plain text matching\"] end L1 --\u003e L2 --\u003e L3 --\u003e L4 style L1 fill:#ffcdd2 style L2 fill:#fff9c4 style L3 fill:#c8e6c9 style L4 fill:#e1f5fe\nEach layer operates independently, and Enterprise takes highest priority. Even if .mcp.json exists in the project, it\u0026rsquo;s blocked if it hits the enterprise denylist.\nConfig Sources and Deduplication (config.ts 1,578 lines) Config source priority (higher wins):\nEnterprise managed (managed-mcp.json) Local (per-user project settings) User (global ~/.claude.json) Project (.mcp.json) Plugin (dynamic) claude.ai connectors (lowest) Why is deduplication needed? The same MCP server can exist in both .mcp.json and claude.ai connectors. getMcpServerSignature creates stdio:[command|args] or url:\u0026lt;base\u0026gt; signatures, unwrapping CCR proxy URLs to original vendor URLs before comparison.\nEnvironment variable expansion: Supports ${VAR} and ${VAR:-default} syntax. Missing variables are reported as warnings rather than errors to prevent partial connection failures.\n4. 
Plugins vs Skills \u0026ndash; Structural Differences Dimension Skills Plugins Essence Prompt extension (SKILL.md = text) System extension (skills + hooks + MCP) Installation Drop a single file Marketplace git clone Runtime code None (pure text) Yes (MCP servers, hook scripts) Toggle Implicit (file existence) Explicit (/plugin UI) ID scheme File path {name}@builtin or {name}@marketplace Skills are the embodiment of the \u0026ldquo;file = extension\u0026rdquo; principle. A single SKILL.md works as an extension immediately without installation or building.\nPlugin Service Separation of Concerns File Role Side Effects pluginOperations.ts Pure library functions None pluginCliCommands.ts CLI wrappers process.exit, console output PluginInstallationManager.ts Background coordinator AppState updates The pure functions in pluginOperations are reused by both CLI and interactive UI.\nMarketplace coordination: diffMarketplaces() compares declared marketplaces against actual installations. New installs trigger auto-refresh; existing updates only set a needsRefresh flag. New installs need auto-refresh to prevent \u0026ldquo;plugin not found\u0026rdquo; errors, while updates let users choose when to apply.\n5. 5-Layer Skill Discovery Engine Loading source priority in loadSkillsDir.ts (1,086 lines):\nflowchart TD subgraph Discovery[\"Skill Discovery\"] A[\"1. policySettings\u0026lt;br/\u0026gt;managed-settings.json\"] B[\"2. userSettings\u0026lt;br/\u0026gt;~/.claude/skills/\"] C[\"3. projectSettings\u0026lt;br/\u0026gt;.claude/skills/\u0026lt;br/\u0026gt;project root to home\"] D[\"4. --add-dir\u0026lt;br/\u0026gt;additional directories\"] E[\"5. 
legacy\u0026lt;br/\u0026gt;/commands/ directory\"] end subgraph Dedup[\"Deduplication\"] F[\"realpath() symlink resolution\"] G[\"File ID based first-wins\"] end subgraph Parse[\"Frontmatter Parsing\"] H[\"description, when_to_use\"] I[\"allowed-tools\"] J[\"model, context, hooks\"] K[\"paths, shell\"] end A --\u003e B --\u003e C --\u003e D --\u003e E E --\u003e F --\u003e G G --\u003e H \u0026 I \u0026 J \u0026 K style Discovery fill:#e1f5fe style Parse fill:#fff3e0\nFrontmatter System 15+ fields are extracted from SKILL.md\u0026rsquo;s YAML frontmatter:\ndescription, when_to_use: Used by the model for skill selection allowed-tools: List of tools permitted during skill execution model: Force a specific model context: fork: Execute in a separate context hooks: Skill-specific hook configuration paths: Path-based activation filter shell: Inline shell command execution Lazy Disk Extraction of Bundled Skills 17 bundled skills compiled into the CLI binary (skills/bundled/) are extracted to disk on first invocation if they have a files field:\nO_NOFOLLOW | O_EXCL flags prevent symlink attacks 0o600 permissions restrict access resolveSkillFilePath() rejects .. paths to prevent directory escape Why extract to disk? So the model can read reference files using the Read/Grep tools. Keeping them only in memory would make them inaccessible to the model.\nmcpSkillBuilders \u0026ndash; A 44-Line Circular Reference Solution mcpSkillBuilders.ts (44 lines) is small but architecturally significant.\nProblem: mcpSkills.ts needs functions from loadSkillsDir.ts, but a direct import creates a circular reference (client.ts -\u0026gt; mcpSkills.ts -\u0026gt; loadSkillsDir.ts -\u0026gt; ... -\u0026gt; client.ts).\nSolution: A write-once registry. loadSkillsDir.ts registers functions at module initialization time, and mcpSkills.ts retrieves them when needed. 
Dynamic imports fail in the Bun bundler, and literal dynamic imports trigger dependency-cruiser\u0026rsquo;s circular dependency check, making this approach the only viable solution.\nLeaf modules in the dependency graph import only types, and runtime registration happens exactly once at startup.\nRust Comparison Area TS (Complete) Rust (Current) Name normalization normalization.ts mcp.rs — same logic Server signature getMcpServerSignature mcp_server_signature — includes CCR proxy unwrap stdio JSON-RPC SDK-dependent mcp_stdio.rs — direct implementation (initialize, tools/list, tools/call) OAuth 2,465-line full implementation None — types only Connection management memoize + onclose reconnection None Skill loading 5-layer + 15-field frontmatter 2 directories, SKILL.md only Bundled skills 17 built-in None Plugins Built-in + marketplace None Security 4-layer (Enterprise-\u0026gt;Channel) None Key gap: Rust has implemented bootstrap (config -\u0026gt; transport) and stdio JSON-RPC. The SDK-less JSON-RPC implementation in mcp_stdio.rs is meaningful progress. However, OAuth, connection lifecycle, channel security, and the full skill discovery system are all absent.\nInsights MCP is not a \u0026ldquo;protocol\u0026rdquo; but an \u0026ldquo;integration framework\u0026rdquo; \u0026ndash; What client.ts\u0026rsquo;s 3,348 lines tell us is that the hard part is not JSON-RPC but connection lifecycle management. Memoization, auto-reconnect, session expiry detection, 401 retry, 3-strike terminal errors, needs-auth caching. External processes (stdio) and remote services (HTTP/SSE) die unpredictably, OAuth tokens expire, and networks drop. This is code that reflects the reality that \u0026ldquo;connect once and done\u0026rdquo; doesn\u0026rsquo;t exist.\nSkills embody the \u0026ldquo;file = extension\u0026rdquo; principle \u0026ndash; A single SKILL.md works as an extension immediately without installation or building. 
This simplicity, combined with incremental complexity via frontmatter (model specification, hooks, path filters), accommodates both beginners and power users. Plugins are the organizational layer above skills, packaging \u0026ldquo;skills + hooks + MCP servers\u0026rdquo; together.\nmcpSkillBuilders.ts is a 44-line architecture lesson \u0026ndash; The only solution that simultaneously satisfies Bun bundler\u0026rsquo;s dynamic import constraints and dependency-cruiser\u0026rsquo;s circular dependency check was a \u0026ldquo;write-once registry.\u0026rdquo; The pattern where leaf modules import only types and runtime registration happens once at startup is a broadly applicable approach to resolving circular references in complex module systems — worth remembering.\nNext post: #6 \u0026ndash; Beyond Claude Code: A Retrospective on Building an Independent 7-Crate Harness\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-harness-anatomy-5/cover-en.jpg","permalink":"/posts/2026-04-06-harness-anatomy-5/","title":"Claude Code Harness Anatomy #5 — MCP Services and the Plugin-Skill Extension Ecosystem"},{"content":"Overview This is the final post in the series that systematically dissected Claude Code\u0026rsquo;s TypeScript source across 27 sessions. In Phase 1 we understood the architecture of 100k+ lines of TS code, in Phase 2 we reimplemented core patterns in Rust, and in Phase 3 we designed and built an independent agent harness that overcomes the 8 limitations we discovered. This post covers the limitation analysis, 5 design principles, 7-crate architecture, 61 tests, and a full retrospective of the journey.\n1. 8 Limitations of Claude Code\u0026rsquo;s Architecture From 27 sessions of analysis, we distinguished strengths from limitations. The strengths (AsyncGenerator pipeline, 3-tier concurrency, hook extensibility, CLAUDE.md discovery, MCP support, self-contained tool interface, 7-path error recovery) represent excellent design. 
However, the following 8 limitations motivated the independent harness:\n# Limitation Source Session Impact 1 React/Ink dependency — heavy TUI S08 Unnecessary dependency in headless mode 2 Single provider (effectively Anthropic-only) S01 Cannot use OpenAI or local models 3 main.tsx 4,683-line monolith S01 CLI/REPL/session mixed in one file 4 Synchronous tool execution (Rust port) S03 No streaming pipelining 5 TS ecosystem-locked plugins S13 No language-neutral extensions 6 85 React hooks mixing UI/runtime S08 Dual meaning of \u0026ldquo;hook\u0026rdquo; 7 Implicit prompt caching dependencies S10 3 cache invalidation paths are implicit 8 MCP OAuth 2,465-line complexity S12 RFC inconsistency is the root cause 2. 5 Design Principles We established 5 core principles to overcome these limitations:\nPrinciple 1 \u0026ndash; Multi-provider: Support Anthropic, OpenAI, and local models (Ollama) through a single abstraction.\n#[async_trait] pub trait Provider: Send + Sync { async fn stream(\u0026amp;self, request: ProviderRequest) -\u0026gt; Result\u0026lt;EventStream, ProviderError\u0026gt;; fn available_models(\u0026amp;self) -\u0026gt; \u0026amp;[ModelInfo]; fn name(\u0026amp;self) -\u0026gt; \u0026amp;str; } ProviderRequest is a provider-neutral struct that each implementation converts to its own API format.\nPrinciple 2 \u0026ndash; Native async: Fully async based on tokio. yield -\u0026gt; tx.send(), yield* -\u0026gt; channel forwarding replaces the AsyncGenerator pattern.\nPrinciple 3 \u0026ndash; Module separation: Conversation engine, tools, hooks, and prompts are each separate crates. No repeating the main.tsx monolith.\nPrinciple 4 \u0026ndash; Language-neutral extensions: SKILL.md compatibility + MCP servers as plugin units.\nPrinciple 5 \u0026ndash; Full MCP utilization: Leveraging not just tools but resources, prompts, and sampling across the full spec.\n3. 
7-Crate Architecture graph TD CLI[\"harness-cli\u0026lt;br/\u0026gt;REPL binary\"] --\u003e CORE[\"harness-core\u0026lt;br/\u0026gt;Conversation engine + turn loop\"] CORE --\u003e PROV[\"harness-provider\u0026lt;br/\u0026gt;LLM provider abstraction\"] CORE --\u003e TOOLS[\"harness-tools\u0026lt;br/\u0026gt;Tool registry + built-in tools\"] CORE --\u003e HOOKS[\"harness-hooks\u0026lt;br/\u0026gt;Hook pipeline\"] CORE --\u003e PROMPT[\"harness-prompt\u0026lt;br/\u0026gt;CLAUDE.md discovery\"] CORE --\u003e MCP[\"harness-mcp\u0026lt;br/\u0026gt;MCP client\"] MCP --\u003e TOOLS style CLI fill:#b3e5fc style CORE fill:#fff9c4 style PROV fill:#c8e6c9 style TOOLS fill:#c8e6c9 style HOOKS fill:#c8e6c9 style PROMPT fill:#c8e6c9 style MCP fill:#e1bee7\nCore design: Only harness-core depends on other crates. The rest are independent of each other (except harness-mcp -\u0026gt; harness-tools). This structure enables:\nIndependent cargo test for each crate No harness-core changes needed when adding providers MCP tools implementing the same Tool trait as built-in tools Crate Core Responsibility Test Count harness-provider LLM API calls, SSE parsing, retries 11 harness-tools Tool registry, 3-tier concurrency 12 harness-hooks Shell hook execution, deny short-circuit, rewrite chain 9 harness-prompt 6-stage CLAUDE.md, SHA-256 deduplication 9 harness-core Conversation engine, StreamingToolExecutor 6 harness-mcp JSON-RPC, stdio transport 14 harness-cli REPL binary \u0026ndash; Provider Trait \u0026ndash; Multi-Provider The existing Rust port\u0026rsquo;s ApiClient trait was Anthropic-specific (ApiRequest with Anthropic fields). The Provider trait accepts a provider-neutral ProviderRequest that each implementation converts to its own API format. 
Box\u0026lt;dyn Provider\u0026gt; enables runtime fallback chains.\nConversationEngine \u0026ndash; Turn Loop pub struct ConversationEngine { session: Session, provider: Box\u0026lt;dyn Provider\u0026gt;, tool_executor: StreamingToolExecutor, hook_pipeline: HookPipeline, prompt_builder: PromptBuilder, budget: TokenBudget, } Instead of the existing Rust port\u0026rsquo;s ConversationRuntime\u0026lt;C, T\u0026gt; generic pattern, we use trait objects. The provider must be swappable at runtime (model fallback), and generics fix the type at compile time, lacking flexibility.\nStreaming Tool Execution (Pipelining) We solved the biggest constraint of the existing Rust port — \u0026ldquo;collect all SSE events then execute tools\u0026rdquo;:\nWhen a ContentBlockStop(ToolUse) event arrives from EventStream, forward immediately After is_concurrency_safe() check, parallel processing via tokio::spawn Tool execution proceeds while the API is still streaming 4. Phase 2 Retrospective \u0026ndash; Extending the Existing Port Before Phase 3\u0026rsquo;s independent harness, we extended the existing rust/ prototype in Phase 2:\nSprint Achievement Core Pattern S14-S15 Orchestration module + 3-tier concurrency tokio::JoinSet-based parallel execution S16-S17 Tool expansion (19 -\u0026gt; 26) Added Task, PlanMode, AskUser S18-S19 Hook execution pipeline stdin JSON, deny short-circuit S20-S21 Skill discovery .claude/skills/ scan, prompt injection Most of Phase 2\u0026rsquo;s code was rewritten in Phase 3. However, the questions discovered during prototyping (\u0026ldquo;Why AsyncGenerator?\u0026rdquo;, \u0026ldquo;Why should tools be unaware of the UI?\u0026rdquo;) determined the final design.\n5. 61 Tests and the MockProvider Pattern All crates are independently testable. 
MockProvider enables verifying the conversation engine\u0026rsquo;s full turn loop without actual API calls:\nharness-provider: 11 tests (SSE parsing, retries, streams) harness-tools: 12 tests (registry, concurrency, execution) harness-hooks: 9 tests (deny short-circuit, rewrite chain, timeouts) harness-prompt: 9 tests (6-stage discovery, hash deduplication) harness-core: 6 tests (turn loop, tool calls, max iterations) harness-mcp: 14 tests (JSON-RPC, initialization, tool listing) 6. How Phase 1-2 Lessons Shaped the Design flowchart LR subgraph Phase1[\"Phase 1 -- Understanding\"] direction TB P1A[\"S02: AsyncGenerator chain\"] P1B[\"S05: 42-tool classification\"] P1C[\"S08: Runtime vs React hooks\"] P1D[\"S10: 6-stage CLAUDE.md\"] P1E[\"S12: MCP connection management\"] P1F[\"S13: Skills = prompts\"] end subgraph Phase3[\"Phase 3 -- Independent Harness\"] direction TB P3A[\"EventStream + mpsc channels\"] P3B[\"Tool trait + 3-tier\"] P3C[\"HookPipeline (runtime only)\"] P3D[\"PromptAssembler separation\"] P3E[\"harness-mcp stdio\"] P3F[\"SKILL.md compatible\"] end P1A --\u003e|\"yield -\u003e tx.send()\"| P3A P1B --\u003e|\"fail-closed defaults\"| P3B P1C --\u003e|\"scope reduction\"| P3C P1D --\u003e|\"cache splitting\"| P3D P1E --\u003e|\"implemented without SDK\"| P3E P1F --\u003e|\"text injection\"| P3F style Phase1 fill:#e1f5fe style Phase3 fill:#fff3e0 Lesson Source Design Impact StreamingToolExecutor 4-stage state machine S03 Async implementation in harness-core QueryDeps callback DI\u0026rsquo;s type safety limits S03 Trait object DI 6-layer Bash security chain S06 check_permissions() + hook separation Agent = recursive harness instance S06 ConversationEngine reuse ApiClient sync trait blocks pipelining S03 Provider async trait Deny short-circuit + Rewrite chaining S09 Identical pattern in HookPipeline SHA-256 content hash outperforms path hash S11 Content hash in harness-prompt 7. 
Top 10 Architecture Patterns Learned Core architecture patterns extracted from 27 sessions:\nAsyncGenerator/Stream pipeline: The core abstraction for streaming LLM responses 3-tier tool concurrency: ReadOnly/Write/Dangerous classification balances safety and performance ToolSpec + ToolResult duality: Separating metadata (for LLM) from execution results Hook chain execution: Deny short-circuit, rewrite chain, independent post-hook transforms 6-stage prompt discovery: Managed -\u0026gt; user -\u0026gt; project -\u0026gt; local overrides MCP adapter pattern: Unifying external protocol tools into the internal Tool trait Provider abstraction: Swapping Anthropic/OpenAI behind the same interface SSE incremental parsing: Assembling network chunks into event frames MockProvider testing: Verifying engine behavior with predefined event sequences Skills = prompts: Text injection sufficient instead of complex plugin systems 8. Full Journey Retrospective Phase Sessions Key Deliverables Phase 1 \u0026ndash; Understanding S00-S13 14 analysis documents, Rust prototype Phase 2 \u0026ndash; Reimplementation S14-S21 Orchestration, 26 tools, hooks, skills Phase 3 \u0026ndash; Independent Harness S22-S27 7-crate workspace, 61+ tests Claude Code is a prompt engineering runtime. The core loop assembles messages, the tool system grants the ability to interact with the world, and the permission system sets boundaries. 
CLAUDE.md injects context, MCP integrates external systems, hooks and agents enable automation/delegation, and plugins/skills transform it into a user extension platform.\nFuture Directions True streaming: Processing SSE byte streams chunk by chunk Permission system: Per-tool user approval workflows MCP SSE transport: HTTP SSE support beyond stdio Token budget integration: Automatic context window budget management Multi-turn agent mode: Autonomous iteration + breakpoint system Insights Good abstractions emerge at boundaries \u0026ndash; Provider trait, Tool trait, HookRunner trait. Every core abstraction is a trait defining module boundaries. The existing Rust port\u0026rsquo;s ConversationRuntime\u0026lt;C, T\u0026gt; generics provide strong compile-time guarantees but had limitations for scenarios like swapping providers at runtime or dynamically registering MCP tools. Box\u0026lt;dyn Provider\u0026gt; + Box\u0026lt;dyn Tool\u0026gt; trait objects buy runtime flexibility at a minor vtable cost. Relative to LLM API latency (hundreds of ms to seconds), the vtable overhead is immeasurable.\nThe value of prototypes lies in questions, not code \u0026ndash; Most of Phase 1-2\u0026rsquo;s prototype code was rewritten in Phase 3. But questions like \u0026ldquo;Why AsyncGenerator?\u0026rdquo;, \u0026ldquo;Why should tools be unaware of UI?\u0026rdquo;, and \u0026ldquo;Why doesn\u0026rsquo;t allow bypass?\u0026rdquo; determined the final design. The act of reading 100k lines of code is not the answer itself — the design intent (the why) discovered during reading is the true deliverable.\nMost of the TS code\u0026rsquo;s complexity is defensive lines \u0026ndash; Permission layers, frontmatter parsing, deduplication, symlink prevention. These aren\u0026rsquo;t features — they\u0026rsquo;re defenses. 
Rust can guarantee some of this at compile time through its type system and ownership model, but runtime policies like filesystem security and user config precedence must be implemented explicitly. The 27 sessions were the process of mapping these defensive lines, and that map guided the independent harness\u0026rsquo;s design.\nSeries complete. The full analysis documents are available at the claw-code repository.\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-harness-anatomy-6/cover-en.jpg","permalink":"/posts/2026-04-06-harness-anatomy-6/","title":"Claude Code Harness Anatomy #6 — Beyond Claude Code: A Retrospective on Building an Independent 7-Crate Harness"},{"content":"Overview Google\u0026rsquo;s Veo model family has rapidly evolved from an experimental video generator to a full-featured production tool. Veo 3.1, released in October 2025, brings improved realism, native audio generation, and fine-grained editing controls through Vertex AI and the Flow App. Meanwhile, a practical ecosystem of background removal and composition tools has emerged around these AI-generated videos, with services like VideoBGRemover and n8n workflow templates making automated video pipelines accessible to creators and developers alike.\nVeo Evolution: From 1.0 to 3.1 Google has iterated on Veo at a remarkable pace. 
Each version brought meaningful capability jumps rather than incremental polish.\nflowchart LR A[\"Veo 1.0 \u0026lt;br/\u0026gt; Basic generation\"] --\u003e B[\"Veo 2.0 \u0026lt;br/\u0026gt; Quality + coherence\"] B --\u003e C[\"Veo 3.0 \u0026lt;br/\u0026gt; Native audio \u0026lt;br/\u0026gt; Multimodal input\"] C --\u003e D[\"Veo 3.1 \u0026lt;br/\u0026gt; Editing controls \u0026lt;br/\u0026gt; Scene expansion \u0026lt;br/\u0026gt; Longer clips\"]What Veo 3.1 Adds Improved realism and physics \u0026mdash; lighting, shadows, and object interactions look noticeably more natural Scene coherence \u0026mdash; characters and environments stay consistent across longer sequences Longer clips \u0026mdash; extended generation beyond previous limits Scene expansion \u0026mdash; extend existing footage with AI-generated continuations Editing controls \u0026mdash; object removal, lighting adjustments, and shadow manipulation directly in the pipeline Audio upgrades \u0026mdash; refined native audio generation that syncs with visual content Flow App integration \u0026mdash; \u0026ldquo;Ingredients to Video\u0026rdquo; and \u0026ldquo;Frames to Video\u0026rdquo; modes for different creative workflows Veo 3.1 ships in two variants: Standard (higher quality, slower) and Fast (quicker turnaround). Both support 720p and 1080p output, and are accessible through the Vertex AI API.\nObject Removal in Vertex AI One of the more practical features in Veo 3.1 is mask-based object removal, available through Vertex AI Studio. 
The workflow is straightforward:\nStep What you do Typical time Preparation Upload video, identify objects to remove 2\u0026ndash;5 min Masking Draw masks over unwanted objects frame-by-frame or with tracking 3\u0026ndash;8 min Generation AI fills masked regions with context-appropriate background 1\u0026ndash;3 min QA Review output, iterate if artifacts appear 3\u0026ndash;6 min per pass Key tips for clean results:\nMask slightly larger than the object to avoid edge artifacts Write explicit prompts describing what the background should look like after removal Google\u0026rsquo;s Flow editor is gradually rolling out Add/Remove tools for a more visual workflow The Background Removal Ecosystem While Veo handles generation and basic editing, dedicated background removal tools fill a specific niche: extracting subjects from video or images with alpha transparency.\nCommercial Services VideoBGRemover is a cloud service focused on video:\nPer-second pricing ($4.80/min standard, down to $2.50/min at volume) Support for MP4, MOV, WEBM, and GIF formats 9 export formats including alpha-channel outputs Sub-5-minute processing for typical clips API access for programmatic integration withoutBG offers an open-source background removal API with a Pro tier for higher-quality cloud processing.\nOpen-Source Options The open-source ecosystem is rich, particularly around Meta\u0026rsquo;s SAM (Segment Anything Model):\nSAM-remove-background \u0026mdash; extracts objects and removes backgrounds using SAM directly remback \u0026mdash; fine-tunes SAM specifically for background removal tasks carvekit \u0026mdash; a full framework for automated high-quality background removal, wrapping multiple segmentation models remove-bg (WebGPU) \u0026mdash; runs background removal directly in the browser using WebGPU, eliminating server costs entirely The WebGPU approach is particularly interesting: it moves inference to the client GPU, meaning zero API costs and no data leaving the 
user\u0026rsquo;s machine. For privacy-sensitive use cases or high-volume processing, this could be more practical than cloud APIs.\nThe RGBA output (RGB color channels plus an Alpha transparency channel) is what makes compositing possible \u0026mdash; you get a clean subject that can be layered over any background.\nn8n Workflow Templates for Video Automation The most interesting development is the videobgremover/videobgremover-n8n-templates repository, which packages complete automation pipelines as n8n workflows:\nflowchart TD subgraph T1[\"Template 01: Video Composition\"] A1[\"Upload video\"] --\u003e B1[\"VideoBGRemover API \u0026lt;br/\u0026gt; Background removal\"] B1 --\u003e C1[\"Composite on \u0026lt;br/\u0026gt; new background\"] C1 --\u003e D1[\"Export to \u0026lt;br/\u0026gt; Google Drive\"] end subgraph T2[\"Template 02: AI UGC Ad Generator\"] A2[\"Screen recording\"] --\u003e B2[\"Gemini analysis\"] B2 --\u003e C2[\"Sora 2 \u0026lt;br/\u0026gt; AI actor generation\"] C2 --\u003e D2[\"VideoBGRemover \u0026lt;br/\u0026gt; composition\"] D2 --\u003e E2[\"Final UGC ad\"] end subgraph T3[\"Other Templates\"] F3[\"03: Image composition\"] G3[\"04: AI background generation\"] H3[\"05: Lottie overlay\"] endThe UGC Ad Pipeline (Template 02) Template 02 is particularly notable. It chains multiple AI services into a single automated flow:\nInput: A screen recording of your product or app Gemini: Analyzes the recording to understand what the product does Sora 2: Generates a realistic AI actor presenting the product VideoBGRemover: Removes the actor\u0026rsquo;s background and composites them over the screen recording Output: A ready-to-publish UGC-style advertisement This is a concrete example of how orchestration tools like n8n turn individual AI capabilities into end-to-end production workflows.\nVeo vs. the Competition Veo 3.1 competes primarily with OpenAI\u0026rsquo;s Sora and other video generation models. 
The key differentiator is Google\u0026rsquo;s integration depth \u0026mdash; Veo lives inside Vertex AI, which means it connects directly to other Google Cloud services, the Flow App provides a visual editing layer, and the API makes it embeddable in custom pipelines (including n8n workflows like the ones above).\nSora focuses on creative generation quality, while Veo is positioning itself as a more complete video production toolkit with editing, removal, and composition features built in.\nQuick Links Veo 3.1 overview \u0026mdash; feature breakdown and comparison with other video AI models Veo object removal guide \u0026mdash; step-by-step masking and prompt workflow in Vertex AI Studio VideoBGRemover \u0026mdash; commercial video background removal service with API withoutBG \u0026mdash; open-source background removal with Pro API tier n8n workflow templates \u0026mdash; automation templates for video composition pipelines Best background removal tools 2026 \u0026mdash; comparison of cloud and local options rembg vs Cloud API \u0026mdash; decision guide for choosing background removal approach carvekit과 rembg 비교 (Korean) \u0026mdash; Python background removal library comparison RGBA explainer (Korean) \u0026mdash; brief intro to RGB vs RGBA and alpha transparency Takeaway The video AI space is shifting from \u0026ldquo;generate a clip\u0026rdquo; to \u0026ldquo;produce a video.\u0026rdquo; Veo 3.1 represents this with its editing controls and scene manipulation features. But the real story might be in the tooling layer \u0026mdash; n8n templates that chain Gemini + Sora + background removal into automated ad pipelines show where this is heading. 
Individual AI models are becoming components in larger production systems, and the orchestration layer is where the practical value compounds.\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-veo-video-ai/cover-en.jpg","permalink":"/posts/2026-04-06-veo-video-ai/","title":"Google Veo 3.1 and the Video AI Background Removal Ecosystem"},{"content":"Overview In the previous post (Dev Log #8) I covered the S3 migration for tone/angle images, EC2 deployment fixes, and hex color extraction. This time I stepped back from feature work to focus on observability.\nThe goal was to instrument the FastAPI server with OpenTelemetry, trace every stage of the search and generation pipelines, and ship those traces to Grafana Cloud via Grafana Alloy running on EC2. The work spanned two days, and the contrast between them was stark: Day 1 was a clean implementation sprint; Day 2 was a wall of integration debugging.\nArchitecture — Trace Collection Path Here\u0026rsquo;s how traces flow from the application to Grafana Cloud.\nflowchart LR A[\"FastAPI app \u0026lt;br/\u0026gt; (OTel SDK)\"] --\u003e|OTLP HTTP| B[\"Grafana Alloy \u0026lt;br/\u0026gt; (localhost:4318)\"] B --\u003e|OTLP HTTP| C[\"Grafana Cloud \u0026lt;br/\u0026gt; Tempo\"] C --\u003e D[\"Grafana UI \u0026lt;br/\u0026gt; trace explorer\"] A --\u003e|span creation| E[\"traced_span \u0026lt;br/\u0026gt; CPU/memory metrics\"] E --\u003e AThree components make this work. The OTel SDK inside the app creates spans. Grafana Alloy on EC2 receives OTLP and batches it. Grafana Cloud Tempo stores and serves the traces.\nDay 1 — Clean Initial Implementation The first day went smoothly. 
I added the OpenTelemetry packages, created a telemetry module, wired it into the app lifespan, and inserted spans into both pipelines.\nTelemetry Module Structure # telemetry.py — Provider configured at import time _resource = Resource.create({ \u0026#34;service.name\u0026#34;: \u0026#34;hybrid-image-search\u0026#34;, \u0026#34;deployment.environment\u0026#34;: _environment, }) _provider = TracerProvider(resource=_resource) _exporter = OTLPSpanExporter(endpoint=f\u0026#34;{_endpoint}/v1/traces\u0026#34;) _provider.add_span_processor(SimpleSpanProcessor(_exporter)) trace.set_tracer_provider(_provider) tracer = trace.get_tracer(\u0026#34;hybrid-image-search\u0026#34;) The provider is set at module level because uvicorn binds the ASGI app immediately after importing the module. If FastAPIInstrumentor doesn\u0026rsquo;t find a valid provider at that point, it caches a no-op tracer and instrumentation silently does nothing.\nPipeline Spans The search pipeline got spans for embedding generation, vector search, and re-ranking. The generation pipeline got spans for reference injection (generation.injection), prompt building (generation.prompt_build), and the Gemini API call (generation.gemini_api).\nI also added database indices preemptively, before they showed up as bottlenecks in the trace data.\nDay 2 — Reality Check After installing Grafana Alloy on EC2 and configuring the Grafana Cloud connection, zero traces appeared. What followed was a chain of six consecutive fix commits.\nIssue 1: TracerProvider Initialization Timing TracerProvider wasn\u0026rsquo;t set before uvicorn loaded the app, so FastAPIInstrumentor latched onto the default no-op provider. Fix: configure the provider at import time, before any app code runs.\nIssue 2: BatchSpanProcessor Async Flush Under uv run, the process exits quickly enough that BatchSpanProcessor\u0026rsquo;s background thread never gets a chance to flush. 
Fix: switch to SimpleSpanProcessor for synchronous export on span creation.\nIssue 3: Silent gRPC Exporter Failure The gRPC exporter swallowed connection failures without logging. Fix: switch to the OTLP HTTP exporter. HTTP returns clear status codes and error messages, and it connects directly to Alloy\u0026rsquo;s default port (4318).\nIssue 4: Telemetry Init Crashing the App Any exception during OTel initialization took down the entire application. Fix: wrap init in try/except so telemetry failures degrade gracefully instead of preventing startup.\nIssue 5: FastAPIInstrumentor Missing the Provider FastAPIInstrumentor().instrument() sometimes failed to discover the global provider. Fix: pass tracer_provider explicitly.\nIssue 6: Module Import Ordering The app = FastAPI() call and the instrumentation call in main.py had ordering issues. Fix: move FastAPIInstrumentor to module level, immediately after app creation.\nGrafana Alloy Configuration The Alloy config deployed to EC2 is minimal.\notelcol.receiver.otlp \u0026#34;default\u0026#34; { grpc { endpoint = \u0026#34;127.0.0.1:4317\u0026#34; } http { endpoint = \u0026#34;127.0.0.1:4318\u0026#34; } output { traces = [otelcol.processor.batch.default.input] } } otelcol.processor.batch \u0026#34;default\u0026#34; { timeout = \u0026#34;5s\u0026#34; output { traces = [otelcol.exporter.otlphttp.grafana_cloud.input] } } otelcol.exporter.otlphttp \u0026#34;grafana_cloud\u0026#34; { client { endpoint = env(\u0026#34;GRAFANA_OTLP_ENDPOINT\u0026#34;) auth = otelcol.auth.basic.grafana_cloud.handler } } otelcol.auth.basic \u0026#34;grafana_cloud\u0026#34; { username = env(\u0026#34;GRAFANA_INSTANCE_ID\u0026#34;) password = env(\u0026#34;GRAFANA_API_TOKEN\u0026#34;) } The app sends OTLP HTTP to localhost:4318. Alloy batches spans every 5 seconds and forwards them to Grafana Cloud Tempo. 
All credentials are managed through environment variables.\ntraced_span — Automatic CPU/Memory Metrics The final piece was a traced_span context manager that automatically measures CPU time and memory consumption around each span.\n@contextmanager def traced_span(name, **attrs): \u0026#34;\u0026#34;\u0026#34;Create a span with automatic CPU/memory measurement.\u0026#34;\u0026#34;\u0026#34; mem_before = _process.memory_info().rss cpu_before = _process.cpu_times() with tracer.start_as_current_span(name) as span: for k, v in attrs.items(): span.set_attribute(k, v) yield span mem_after = _process.memory_info().rss cpu_after = _process.cpu_times() span.set_attribute(\u0026#34;process.memory_mb\u0026#34;, round(mem_after / 1024 / 1024, 1)) span.set_attribute(\u0026#34;process.memory_delta_kb\u0026#34;, round((mem_after - mem_before) / 1024, 1)) span.set_attribute(\u0026#34;process.cpu_user_ms\u0026#34;, round((cpu_after.user - cpu_before.user) * 1000, 1)) span.set_attribute(\u0026#34;process.cpu_system_ms\u0026#34;, round((cpu_after.system - cpu_before.system) * 1000, 1)) It uses psutil.Process to capture RSS memory and CPU user/system time at span entry and exit. This makes it possible to see exactly how much each pipeline stage costs in Grafana. 
Both the search and generation pipelines were migrated to use traced_span.\nCommit Log Message Changed files feat: add no-text directive for injected refs and remove color palettes prompt.py, App.tsx, GeneratedImageDetail.tsx deps: add OpenTelemetry packages for observability requirements.txt feat: add telemetry module with OpenTelemetry init and tracer telemetry.py feat: wire OpenTelemetry init into app lifespan main.py feat: add OpenTelemetry spans to search pipeline stages search.py feat: add OpenTelemetry spans to generation pipeline generation.py add indices DB migration infra: add Grafana Alloy config and EC2 setup guide infra/alloy/config.alloy deps: move OpenTelemetry packages to pyproject.toml pyproject.toml fix: set OTel TracerProvider at import time telemetry.py fix: use SimpleSpanProcessor for reliable export under uv run telemetry.py fix: switch to OTLP HTTP exporter for reliable trace delivery telemetry.py fix: add error handling for telemetry init telemetry.py fix: pass tracer_provider explicitly to FastAPIInstrumentor main.py fix: move FastAPI instrumentation to module level in main.py main.py feat: add traced_span helper with CPU/memory resource metrics telemetry.py feat: use traced_span for CPU/memory metrics in search and generation pipelines search.py, generation.py Insights OTel provider setup must complete at import time. ASGI servers like uvicorn bind routers and middleware immediately after importing the app module. If FastAPIInstrumentor doesn\u0026rsquo;t find a valid TracerProvider at that moment, it caches a no-op tracer — and no amount of later configuration will fix it. Setting the provider at the top of the telemetry module prevents this entirely.\nBatchSpanProcessor is for long-lived processes only. In short-lived contexts like uv run or test suites, the background flush thread never gets a chance to fire. 
SimpleSpanProcessor trades throughput for reliability, but the tradeoff is reasonable for development and small-scale production workloads.\nStart with HTTP, not gRPC. The OTLP gRPC exporter silently absorbs connection failures, making debugging painful. The HTTP exporter returns explicit status codes and error bodies. When wiring up new infrastructure, getting HTTP working first and then switching to gRPC if needed is a more efficient debugging path.\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-hybrid-search-dev9/cover-en.jpg","permalink":"/posts/2026-04-06-hybrid-search-dev9/","title":"Hybrid Image Search Dev Log #9 — OpenTelemetry Distributed Tracing, Grafana Cloud Integration, traced_span Helper"},{"content":"Overview I recently added distributed tracing to my hybrid-image-search FastAPI service using OpenTelemetry and Grafana Cloud. The goal was simple: see exactly where time is spent when a user searches for images — from the API request through vector search to Gemini API generation. What followed was a multi-day debugging journey through exporter protocols, tracer provider timing, and span processor choices. This post covers the full architecture, the working code, and every fix along the way.\nArchitecture The observability pipeline has three layers: the FastAPI application emits traces via OpenTelemetry, Grafana Alloy running on the same EC2 instance receives and batches them, and Grafana Cloud Tempo stores them for querying.\nflowchart LR A[\"FastAPI App \u0026lt;br/\u0026gt; OpenTelemetry SDK\"] --\u003e|OTLP HTTP :4318| B[\"Grafana Alloy \u0026lt;br/\u0026gt; on EC2\"] B --\u003e|OTLP HTTP| C[\"Grafana Cloud \u0026lt;br/\u0026gt; Tempo\"] C --\u003e D[\"Grafana UI \u0026lt;br/\u0026gt; Trace Explorer\"]The key design decision is using OTLP HTTP (port 4318) rather than gRPC. 
This turned out to matter a lot — more on that in the debugging section.\nStep 1: The Telemetry Module The core of the setup is a single telemetry.py module that initializes OpenTelemetry at import time:\n# backend/src/telemetry.py import logging, os from contextlib import contextmanager import psutil from opentelemetry import trace from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor from opentelemetry.sdk.resources import Resource from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import SimpleSpanProcessor _endpoint = os.environ.get( \u0026#34;OTEL_EXPORTER_OTLP_ENDPOINT\u0026#34;, \u0026#34;http://localhost:4318\u0026#34; ) _environment = os.environ.get(\u0026#34;DEPLOYMENT_ENV\u0026#34;, \u0026#34;dev\u0026#34;) _resource = Resource.create({ \u0026#34;service.name\u0026#34;: \u0026#34;hybrid-image-search\u0026#34;, \u0026#34;deployment.environment\u0026#34;: _environment, }) _provider = TracerProvider(resource=_resource) _exporter = OTLPSpanExporter(endpoint=f\u0026#34;{_endpoint}/v1/traces\u0026#34;) _provider.add_span_processor(SimpleSpanProcessor(_exporter)) trace.set_tracer_provider(_provider) tracer = trace.get_tracer(\u0026#34;hybrid-image-search\u0026#34;) Three things to note:\nTracerProvider is set at module level, not inside a function. This avoids a timing issue where FastAPIInstrumentor grabs a reference to the tracer provider at import time — if you set it later in a lifespan function, the instrumentor already has the no-op provider.\nSimpleSpanProcessor instead of BatchSpanProcessor. The batch processor buffers spans and exports them on a background thread, which sounds better for performance. But when running under uv run, the process can exit before the background thread flushes. SimpleSpanProcessor exports each span synchronously, ensuring nothing is lost.\nOTLP HTTP exporter, not gRPC. 
The gRPC exporter requires additional dependencies (grpcio) and had reliability issues in this setup. The HTTP exporter using requests just works.\nStep 2: The traced_span Helper Beyond auto-instrumentation, I wanted custom spans that capture resource usage — how much memory a Gemini API call allocates, how much CPU time vector search takes:\n_process = psutil.Process(os.getpid()) @contextmanager def traced_span(name, **attrs): \u0026#34;\u0026#34;\u0026#34;Create a span with automatic CPU/memory measurement.\u0026#34;\u0026#34;\u0026#34; mem_before = _process.memory_info().rss cpu_before = _process.cpu_times() with tracer.start_as_current_span(name) as span: for k, v in attrs.items(): span.set_attribute(k, v) yield span mem_after = _process.memory_info().rss cpu_after = _process.cpu_times() span.set_attribute(\u0026#34;process.memory_mb\u0026#34;, round(mem_after / 1024 / 1024, 1)) span.set_attribute(\u0026#34;process.memory_delta_kb\u0026#34;, round((mem_after - mem_before) / 1024, 1)) span.set_attribute(\u0026#34;process.cpu_user_ms\u0026#34;, round((cpu_after.user - cpu_before.user) * 1000, 1)) span.set_attribute(\u0026#34;process.cpu_system_ms\u0026#34;, round((cpu_after.system - cpu_before.system) * 1000, 1)) Usage in the generation route:\nwith traced_span(\u0026#34;generation.gemini_api\u0026#34;, model=model_name): response = await model.generate_content_async(prompt) with traced_span(\u0026#34;generation.prompt_build\u0026#34;, ref_count=len(references)): prompt = build_prompt(query, references) In Grafana Tempo, these show up as child spans under the FastAPI root span, with memory and CPU attributes visible in the span details panel.\nStep 3: Wiring Into FastAPI The application wiring happens in two places:\n# main.py — module level (after app creation) from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor FastAPIInstrumentor.instrument_app(app) # main.py — lifespan function from telemetry import init_telemetry @asynccontextmanager async 
def lifespan(app): try: init_telemetry(db_engine=db_engine) except Exception as e: logger.warning(\u0026#34;Telemetry init failed: %s\u0026#34;, e) yield The init_telemetry function handles optional extras like SQLAlchemy instrumentation. The key insight: FastAPIInstrumentor must be at module level, not inside the lifespan. If you instrument inside lifespan, the instrumentor may capture the wrong tracer provider.\nError handling around init_telemetry is intentional — telemetry should never crash the application. If Alloy is down or the endpoint is misconfigured, the service still runs.\nStep 4: Grafana Alloy on EC2 Grafana Alloy acts as the local collector. It receives OTLP traces from the FastAPI app, batches them, and forwards to Grafana Cloud:\notelcol.receiver.otlp \u0026#34;default\u0026#34; { grpc { endpoint = \u0026#34;127.0.0.1:4317\u0026#34; } http { endpoint = \u0026#34;127.0.0.1:4318\u0026#34; } output { traces = [otelcol.processor.batch.default.input] } } otelcol.processor.batch \u0026#34;default\u0026#34; { timeout = \u0026#34;5s\u0026#34; output { traces = [otelcol.exporter.otlphttp.grafana_cloud.input] } } otelcol.exporter.otlphttp \u0026#34;grafana_cloud\u0026#34; { client { endpoint = env(\u0026#34;GRAFANA_OTLP_ENDPOINT\u0026#34;) auth = otelcol.auth.basic.grafana_cloud.handler } } otelcol.auth.basic \u0026#34;grafana_cloud\u0026#34; { username = env(\u0026#34;GRAFANA_INSTANCE_ID\u0026#34;) password = env(\u0026#34;GRAFANA_API_TOKEN\u0026#34;) } Alloy binds to 127.0.0.1 only — no external exposure. The authentication credentials come from environment variables, set via systemd unit file on the EC2 instance.\nThe 5-second batch timeout is a good balance: short enough for near-real-time visibility, but enough to bundle multiple spans per request.\nThe Debugging Journey Getting from \u0026ldquo;install packages\u0026rdquo; to \u0026ldquo;traces visible in Grafana\u0026rdquo; took about 12 iterations. 
Here is the sequence of issues and fixes:\nStep Problem Fix 1 No traces appearing at all TracerProvider was set inside lifespan; FastAPIInstrumentor had already grabbed the no-op provider. Moved to module level. 2 Traces lost on process exit BatchSpanProcessor background thread did not flush before uv run terminated. Switched to SimpleSpanProcessor. 3 gRPC connection failures grpcio had intermittent issues on the EC2 instance. Switched to OTLP HTTP exporter. 4 App crashed when Alloy was down No error handling around init_telemetry. Added try/except in lifespan. 5 FastAPI spans missing custom attributes FastAPIInstrumentor was called before tracer provider was set. Ensured provider is set at import time, instrumentor at module level after app creation. The most subtle bug was issue 1. OpenTelemetry\u0026rsquo;s global tracer provider is a singleton — once FastAPIInstrumentor reads it, it caches that reference. If the global provider is still the no-op default at that point, all auto-instrumented spans go nowhere, even if you set the real provider later.\nWhat Shows Up in Grafana After everything is wired correctly, filtering by service.name = hybrid-image-search in Grafana Tempo shows the full request waterfall:\nflowchart TD A[\"GET /search\"] --\u003e B[\"search.vector_query\"] A --\u003e C[\"search.rerank\"] A --\u003e D[\"generation.injection\"] D --\u003e E[\"generation.prompt_build\"] E --\u003e F[\"generation.gemini_api\"]Each span carries:\nDuration — wall clock time process.memory_mb — RSS at span end process.memory_delta_kb — memory allocated during the span process.cpu_user_ms / process.cpu_system_ms — CPU time consumed This makes it straightforward to identify that, for example, generation.gemini_api spans average 1.2 seconds and allocate ~8MB, while search.vector_query takes 200ms with negligible memory impact.\nLessons Learned Set TracerProvider at import time. 
Any instrumentor that runs at import or module level will capture whatever provider exists at that moment. Late initialization means silent no-ops.\nUse SimpleSpanProcessor in dev and short-lived processes. BatchSpanProcessor is better for production throughput, but it relies on clean shutdown. If your process exits abruptly, spans are lost.\nOTLP HTTP is more portable than gRPC. Fewer dependencies, simpler debugging (you can curl the endpoint), and no protobuf compilation issues.\nAlloy is a better local collector than direct-to-cloud export. It decouples the app from Grafana Cloud auth, handles batching and retries, and means the app only needs to know about localhost:4318.\nWrap telemetry init in error handling. Observability should degrade gracefully. A misconfigured collector should never take down your application.\nCustom resource metrics via psutil are cheap and valuable. The overhead of memory_info() and cpu_times() per span is negligible, but having memory/CPU data alongside timing data makes performance debugging much richer.\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-grafana-otel-fastapi/cover-en.jpg","permalink":"/posts/2026-04-06-grafana-otel-fastapi/","title":"Setting Up Grafana Cloud Observability for a FastAPI Application"},{"content":"Overview Anthropic\u0026rsquo;s Claude has evolved from a chatbot into an entire ecosystem. Chat is the conversational interface on web and desktop. Cowork is a desktop agent that controls your files, browser, and connected apps. Code is a terminal-based CLI that gives developers full access to codebases and system-level tools. 
This post breaks down how the three products differ, when to use each one, and why Claude Code\u0026rsquo;s token costs grow geometrically — plus practical tips to keep them under control.\nChat, Cowork, Code — The Capability Spectrum The three products sit on a spectrum of accessibility versus control.\ngraph LR A[\"Chat \u0026lt;br/\u0026gt; Web + Desktop \u0026lt;br/\u0026gt; Conversation-first\"] --\u003e B[\"Cowork \u0026lt;br/\u0026gt; Desktop only \u0026lt;br/\u0026gt; Files + Browser + Apps\"] B --\u003e C[\"Code \u0026lt;br/\u0026gt; Terminal CLI \u0026lt;br/\u0026gt; Full codebase + system\"] style A fill:#e8f4f8,stroke:#2196F3 style B fill:#fff3e0,stroke:#FF9800 style C fill:#fce4ec,stroke:#E91E63Chat — The Foundation Platforms: Web (claude.ai) + desktop app Key features: Projects (similar to GPTs), Google Docs integration, connectors, web search, Research mode Best for: Everyone — writing, summarization, Q\u0026amp;A, research Claude Chat\u0026rsquo;s edge is long-document processing and writing quality. 
Where ChatGPT leans creative and Gemini excels at multimodal + Google Workspace integration, Claude is built for handling large volumes of text with precision.\nCowork — The Agent for Non-Developers Cowork is essentially \u0026ldquo;Claude Code for non-developers.\u0026rdquo; It runs exclusively on the Windows/Mac desktop app and is far easier to set up than Code.\nFive core capabilities:\nCapability What it does Example File management Analyze and create local files Receipt photos → Excel spreadsheet Browser control AI clicks through Chrome directly Automated web navigation and form filling App connectors Gmail, Calendar, Notion, Slack integration Slack channel analysis, email automation Skills Bundled, repeatable workflows Automated newsletter generation Plugins Connectors + Skills combined LinkedIn posting automation Code — The Developer\u0026rsquo;s Terminal Companion Claude Code is a CLI tool that runs in the terminal with access to your entire codebase.\nKey differences from Cowork:\ngraph TB subgraph Cowork[\"Cowork Domain\"] F1[\"File analysis/creation\"] F2[\"Browser automation\"] F3[\"App connectors\"] F4[\"Skills/Plugins\"] end subgraph Code[\"Code Domain\"] C1[\"Full codebase access\"] C2[\"Sub-agent execution\"] C3[\"Git integration\"] C4[\"MCP server connections\"] C5[\"Terminal command execution\"] end Cowork --\u003e|\"When you need more power\"| Code style Cowork fill:#fff3e0,stroke:#FF9800 style Code fill:#fce4ec,stroke:#E91E63 Cowork: Day-to-day task automation — file analysis, browser control, app integration Code: Software development — custom code, advanced automation, system-level control Recommended path: Start with Cowork, graduate to Code when you need the advanced capabilities.\nPricing Plan Monthly Notes Free $0 Basic chat only Pro $20 Chat + Cowork + Code access Max $100/$200 High-volume usage, higher token limits Use the desktop app over the web. 
Cowork and Code features are limited in the browser.\nClaude Code Token Optimization — Understanding the Cost Curve Using Claude Code carelessly causes token costs to grow geometrically. Understanding the underlying mechanism is essential.\nWhy Costs Grow Geometrically Claude Code re-reads the entire conversation with every message. As conversations grow longer, each subsequent message consumes more tokens than the last.\ngraph TD M1[\"Message 1 \u0026lt;br/\u0026gt; ~7.5K tokens\"] --\u003e M10[\"Message 10 \u0026lt;br/\u0026gt; ~25K tokens\"] M10 --\u003e M20[\"Message 20 \u0026lt;br/\u0026gt; ~100K tokens\"] M20 --\u003e M30[\"Message 30 \u0026lt;br/\u0026gt; ~232K tokens\"] M30 -.- NOTE[\"Message 30 costs \u0026lt;br/\u0026gt; 31x more than Message 1\"] style M1 fill:#c8e6c9,stroke:#4CAF50 style M10 fill:#fff9c4,stroke:#FFC107 style M20 fill:#ffe0b2,stroke:#FF9800 style M30 fill:#ffcdd2,stroke:#F44336 style NOTE fill:#f5f5f5,stroke:#9E9E9EEssential Tips for Beginners (19 of 52) The source video covers 52 tips total. Here are the key beginner-level ones.\nConversation management\nMake /clear a habit — Reset after each task. This zeroes out token accumulation. Scope your prompts — \u0026ldquo;Fix line 10 of readme\u0026rdquo; beats \u0026ldquo;fix this file\u0026rdquo; Batch simple commands — Combine easy tasks into a single message Paste only what\u0026rsquo;s relevant — Code snippets, not entire files Stay at the keyboard — Unattended sessions risk infinite loops Model selection 6. Default to Sonnet — Opus is expensive for routine work 7. 
Match model to task:\nHaiku: Simple questions, file renames Sonnet: General development (good default) Opus: Architecture decisions, deep debugging Other settings and habits\nKeep unnecessary files out of context Use .claudeignore to exclude large files and directories Keep task scope small Clean up conversations after verifying results Related Tools — Quick Links Tool Description Whispree macOS menu bar STT app for Apple Silicon. Fully local, open-source. Whisper + LLM post-processing with Korean-English code-switching optimization. Voice-to-prompt is 3-5x faster than typing. OpenClaude Open-source coding agent CLI in the style of Claude Code. Supports OpenAI, Gemini, DeepSeek, Ollama, and 200+ models. Includes VS Code extension. WorkMux Run multiple AI agents in parallel from your terminal. Source Videos Claude Cowork is easier and more powerful than Code for beginners (Korean) Understanding the differences between Claude Chat, Cowork, and Code (Korean) Your Claude Code tokens are melting — Beginner tips, Part 1 (Korean) Takeaway The Claude ecosystem forms a clear spectrum: Chat for everyone, Cowork for business automation, Code for developers. Start with the tool that matches your skill level, but if you use Claude Code, understand the token structure first. When message 30 costs 31 times more than message 1, optimization is not optional — it is the price of admission.\n","date":"2026-04-06T00:00:00+09:00","image":"/images/posts/2026-04-06-claude-chat-cowork-code/cover-en.jpg","permalink":"/posts/claude-chat-cowork-code/","title":"The Claude Ecosystem Explained — Chat, Cowork, and Code"},{"content":"Overview On April 1, 2026, a developer using the Claude Code Max 20 plan ($200/month) burned through 100% of their usage in roughly 70 minutes during a normal coding session. JSONL log analysis revealed an average cache read ratio of 36.1% (minimum 21.1%) — far below the 90%+ that should be expected. 
Every token was billed at full price.\nThat incident gave rise to ArkNill/claude-code-cache-analysis: a community-driven investigation that grew from personal debugging into a systematic, proxy-measured analysis confirming 7 bugs across 5 layers.\nBackground: A Plan Drained in 70 Minutes The immediate workaround was downgrading from v2.1.89 to v2.1.68 (npm). Cache read immediately recovered to 97.6% average (119 entries), confirming the regression was v2.1.89-specific.\nA transparent monitoring proxy (cc-relay) was then configured using the ANTHROPIC_BASE_URL environment variable to capture per-request data. Combined with reports from 91+ related GitHub issues and contributors including @Sn3th, @rwp65, and a dozen others, the scattered findings were consolidated into structured, measured analysis.\nThe 7 Confirmed Bugs (as of v2.1.91) flowchart TD A[\"Claude Code Request\"] --\u003e B{\"Version Check\"} B --\u003e|\"v2.1.89 standalone\"| C[\"B1: Sentinel \u0026lt;br/\u0026gt; Cache prefix corruption \u0026lt;br/\u0026gt; → 4-17% cache read\"] B --\u003e|\"--resume flag\"| D[\"B2: Resume \u0026lt;br/\u0026gt; Full context replayed uncached \u0026lt;br/\u0026gt; → 20x cost per resume\"] B --\u003e|\"v2.1.91\"| E[\"Cache normal: 95-99%\"] E --\u003e F{\"Still active bugs\"} F --\u003e G[\"B3: False RL \u0026lt;br/\u0026gt; Fake rate limit error \u0026lt;br/\u0026gt; 0 API calls made\"] F --\u003e H[\"B4: Microcompact \u0026lt;br/\u0026gt; Tool results silently cleared \u0026lt;br/\u0026gt; mid-session\"] F --\u003e I[\"B5: Budget Cap \u0026lt;br/\u0026gt; 200K aggregate limit \u0026lt;br/\u0026gt; → truncated to 1-41 chars\"] F --\u003e J[\"B8: Log Inflation \u0026lt;br/\u0026gt; JSONL entry duplication \u0026lt;br/\u0026gt; → 2.87x local inflation\"] Bug What It Does Impact Status (v2.1.91) B1 Sentinel Standalone binary corrupts cache prefix 4-17% cache read (v2.1.89) Fixed B2 Resume --resume replays full context uncached 20x cost per resume Fixed B3 False RL Client 
blocks API calls with fake error Instant \u0026ldquo;Rate limit reached\u0026rdquo;, 0 API calls Unfixed B4 Microcompact Tool results silently cleared mid-session Context quality degrades Unfixed B5 Budget Cap 200K aggregate limit on tool results Older results truncated to 1-41 chars Unfixed (MCP override only) B8 Log Inflation Extended thinking duplicates JSONL entries 2.87x local token inflation Unfixed Server Peak-hour limits tightened + 1M billing bug Reduced effective quota By design Key Bug Deep Dives B1: Sentinel Bug (Fixed) Claude Code ships in two forms. The standalone binary is a single ELF 64-bit executable (~228MB) with an embedded Bun runtime. It contained a Sentinel replacement mechanism (cch=00000) that corrupted cache prefixes — causing dramatically low cache read rates.\nThe npm package (cli.js, ~13MB, executed by Node.js) does not contain this logic and was immune to Bug 1.\nIn v2.1.91, routing stripAnsi through Bun.stripANSI appears to have closed the Sentinel gap. Both npm and standalone now achieve identical 84.7% cold-start cache read.\nB2: Resume Bug (Fixed) Using --resume caused the entire conversation context to be sent as billable input with no cache benefit — up to 20x the expected cost per resume. Fixed in v2.1.91\u0026rsquo;s transcript chain break patch, but avoiding --resume and --continue entirely is still the recommended approach.\nB3: False Rate Limiting (Unfixed) The client generates \u0026ldquo;Rate limit reached\u0026rdquo; errors locally without ever making an API call. Measured across 151 entries / 65 sessions. The session appears throttled while the API has not been contacted at all.\nB4 \u0026amp; B5: Microcompact and Budget Cap (Unfixed) Tool results are silently deleted mid-session (327 events detected), and a 200K aggregate limit causes older file read results to be truncated to 1-41 characters. 
After approximately 15-20 tool uses, earlier context is effectively gone without any warning.\nCache TTL (Not a Bug) Idle gaps of 13+ hours cause a full cache rebuild on resume. Cache write costs $3.75/M versus read at $0.30/M — a 12.5x difference. Shorter gaps (5-26 minutes) maintain 96%+ cache. This is by design (5-minute TTL), not a bug — but worth understanding.\nnpm vs Standalone: v2.1.90 Benchmark Metric npm Standalone Winner Overall cache read % 86.4% 86.2% Tie Stable session 95-99.8% 95-99.7% Tie Sub-agent cold start 79-87% 47-67% npm Sub-agent warmed (5+ req) 87-94% 94-99% Tie Usage for full test suite 7% of Max 20 5% of Max 20 Tie In v2.1.91, the sub-agent cold start gap is also closed. Both achieve 84.7% cold-start cache read identically.\nAnthropic\u0026rsquo;s Official Position Lydia Hallie from Anthropic posted on X (April 2):\n\u0026ldquo;Peak-hour limits are tighter and 1M-context sessions got bigger, that\u0026rsquo;s most of what you\u0026rsquo;re feeling. We fixed a few bugs along the way, but none were over-charging you.\u0026rdquo;\nShe recommended using Sonnet as default, lowering effort level, starting fresh instead of resuming, and capping context with CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000.\nThe analysis agrees that the cache bugs are fixed, but identifies five additional active mechanisms that Anthropic\u0026rsquo;s statement does not address.\nWhat You Can Do Right Now Update to v2.1.91 — fixes the cache regression responsible for the worst drain npm and standalone are equivalent on v2.1.91 — either install method is fine Do not use --resume or --continue — replays full context as billable input Start fresh sessions periodically — the 200K tool result cap (B5) means older file reads silently truncate after ~15-20 tool uses Avoid /dream and /insights — silent background API calls that consume quota // ~/.claude/settings.json — disable auto-update { \u0026#34;env\u0026#34;: { \u0026#34;DISABLE_AUTOUPDATER\u0026#34;: \u0026#34;1\u0026#34; } 
} Closing Thoughts This analysis is a strong example of community-driven debugging at its best. A simple transparent proxy via ANTHROPIC_BASE_URL, combined with systematic testing across v2.1.89 through v2.1.91, produced measured evidence behind phenomena reported across 91+ GitHub issues.\nThe cache bugs (B1, B2) are fixed in v2.1.91. The remaining five bugs are still active. For Max plan users, applying the practical mitigations above and pinning a validated version with DISABLE_AUTOUPDATER is the most reliable defensive posture until Anthropic addresses the remaining issues.\nSource repository: ArkNill/claude-code-cache-analysis\n","date":"2026-04-03T00:00:00+09:00","image":"/images/posts/2026-04-03-claude-code-cache-analysis/cover.jpg","permalink":"/posts/2026-04-03-claude-code-cache-analysis/","title":"Claude Code Cache Bug Analysis: 7 Confirmed Bugs and Their Impact"},{"content":"Overview Claude Code now ships a full plugin marketplace ecosystem. This is not just an extension installer — it is a complete distribution system with centralized discovery, version pinning, automatic updates, permission controls, and support for multiple source backends including GitHub, npm, GitLab, and local paths. This post breaks down every layer of the system from plugin authoring to marketplace distribution and permission management.\nMarketplace Architecture The plugin system is organized into three tiers: the marketplace catalog, individual plugin sources, and the local cache. 
The flow from developer to end user involves several distinct stages.\nflowchart TD A[\"Developer \u0026lt;br/\u0026gt; Authors Plugin\"] --\u003e B[\"plugin.json \u0026lt;br/\u0026gt; Manifest\"] B --\u003e C[\"marketplace.json \u0026lt;br/\u0026gt; Catalog Entry\"] C --\u003e D{\"Distribution Source\"} D --\u003e E[\"GitHub \u0026lt;br/\u0026gt; owner/repo\"] D --\u003e F[\"GitLab \u0026lt;br/\u0026gt; git URL\"] D --\u003e G[\"npm \u0026lt;br/\u0026gt; package registry\"] D --\u003e H[\"Relative Path \u0026lt;br/\u0026gt; ./plugins/...\"] E --\u003e I[\"End User\"] F --\u003e I G --\u003e I H --\u003e I I --\u003e J[\"\u0026lt;br/\u0026gt;/plugin marketplace add\u0026lt;br/\u0026gt;Register Catalog\"] J --\u003e K[\"\u0026lt;br/\u0026gt;/plugin install\u0026lt;br/\u0026gt;Install Plugin\"] K --\u003e L[\"~/.claude/plugins/cache \u0026lt;br/\u0026gt; Local Cache\"] L --\u003e M[\"Claude Code \u0026lt;br/\u0026gt; Plugin Active\"]Creating Plugins Plugin Directory Structure Every plugin revolves around a .claude-plugin/plugin.json manifest. The most common mistake is placing functional directories inside .claude-plugin/. 
Only plugin.json belongs there — everything else lives at the plugin root.\nmy-plugin/ ├── .claude-plugin/ │ └── plugin.json ← manifest only ├── skills/ │ └── code-review/ │ └── SKILL.md ├── commands/ ├── agents/ ├── hooks/ │ └── hooks.json ├── .mcp.json ← MCP server config ├── .lsp.json ← LSP server config ├── bin/ ← executables added to Bash PATH └── settings.json ← default settings on plugin enable The plugin.json Manifest { \u0026#34;name\u0026#34;: \u0026#34;quality-review-plugin\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;Adds a /quality-review skill for quick code reviews\u0026#34;, \u0026#34;version\u0026#34;: \u0026#34;1.0.0\u0026#34;, \u0026#34;author\u0026#34;: { \u0026#34;name\u0026#34;: \u0026#34;Your Name\u0026#34;, \u0026#34;email\u0026#34;: \u0026#34;you@example.com\u0026#34; }, \u0026#34;homepage\u0026#34;: \u0026#34;https://github.com/you/quality-review-plugin\u0026#34;, \u0026#34;repository\u0026#34;: \u0026#34;https://github.com/you/quality-review-plugin\u0026#34;, \u0026#34;license\u0026#34;: \u0026#34;MIT\u0026#34; } The name field defines the skill namespace. A plugin named quality-review-plugin exposes its hello skill as /quality-review-plugin:hello. This namespacing prevents conflicts when multiple plugins define skills with the same name. To change the prefix, update name in plugin.json.\nAdding Skills Skills live under skills/, where the folder name becomes the skill name. Claude automatically invokes model-driven skills based on task context when a description is provided in the frontmatter.\n--- name: code-review description: Reviews code for best practices and potential issues. Use when reviewing code, checking PRs, or analyzing code quality. --- When reviewing code, check for: 1. Code organization and structure 2. Error handling 3. Security concerns 4. 
Test coverage The $ARGUMENTS placeholder captures any text the user provides after the skill name, enabling dynamic input: /my-plugin:hello Alex.\nAdding LSP Servers The official marketplace already provides LSP plugins for TypeScript, Python, Rust, Go, C/C++, Java, Kotlin, PHP, Lua, Swift, and C#. For unsupported languages, define a custom .lsp.json at the plugin root:\n{ \u0026#34;go\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;gopls\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;serve\u0026#34;], \u0026#34;extensionToLanguage\u0026#34;: { \u0026#34;.go\u0026#34;: \u0026#34;go\u0026#34; } } } Once installed, Claude gains two capabilities automatically: automatic diagnostics after every file edit (type errors, missing imports, syntax issues) and code navigation (jump to definition, find references, call hierarchies).\nDefault Settings Plugins can ship a settings.json to configure defaults when the plugin is enabled. Currently only the agent key is supported, which activates one of the plugin\u0026rsquo;s custom agents as the main thread:\n{ \u0026#34;agent\u0026#34;: \u0026#34;security-reviewer\u0026#34; } The Marketplace Schema marketplace.json Structure The marketplace catalog lives at .claude-plugin/marketplace.json in the repository root.\n{ \u0026#34;name\u0026#34;: \u0026#34;company-tools\u0026#34;, \u0026#34;owner\u0026#34;: { \u0026#34;name\u0026#34;: \u0026#34;DevTools Team\u0026#34;, \u0026#34;email\u0026#34;: \u0026#34;devtools@example.com\u0026#34; }, \u0026#34;metadata\u0026#34;: { \u0026#34;description\u0026#34;: \u0026#34;Internal developer tools marketplace\u0026#34;, \u0026#34;version\u0026#34;: \u0026#34;1.0.0\u0026#34;, \u0026#34;pluginRoot\u0026#34;: \u0026#34;./plugins\u0026#34; }, \u0026#34;plugins\u0026#34;: [ { \u0026#34;name\u0026#34;: \u0026#34;code-formatter\u0026#34;, \u0026#34;source\u0026#34;: \u0026#34;./plugins/formatter\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;Automatic code formatting on save\u0026#34;, 
\u0026#34;version\u0026#34;: \u0026#34;2.1.0\u0026#34;, \u0026#34;author\u0026#34;: { \u0026#34;name\u0026#34;: \u0026#34;DevTools Team\u0026#34; } }, { \u0026#34;name\u0026#34;: \u0026#34;deployment-tools\u0026#34;, \u0026#34;source\u0026#34;: { \u0026#34;source\u0026#34;: \u0026#34;github\u0026#34;, \u0026#34;repo\u0026#34;: \u0026#34;company/deploy-plugin\u0026#34; }, \u0026#34;description\u0026#34;: \u0026#34;Deployment automation tools\u0026#34; } ] } The metadata.pluginRoot field is a convenience shortcut: setting it to \u0026quot;./plugins\u0026quot; lets you write \u0026quot;source\u0026quot;: \u0026quot;formatter\u0026quot; instead of \u0026quot;source\u0026quot;: \u0026quot;./plugins/formatter\u0026quot; for each plugin entry.\nReserved names: The following are blocked for third-party use: claude-code-marketplace, claude-code-plugins, claude-plugins-official, anthropic-marketplace, anthropic-plugins, agent-skills, knowledge-work-plugins, life-sciences. Names that impersonate official marketplaces (like official-claude-plugins) are also blocked.\nPlugin Source Types Source Format Notes Relative path \u0026quot;./plugins/my-plugin\u0026quot; Git-based distribution only; fails with URL-based delivery GitHub {\u0026quot;source\u0026quot;: \u0026quot;github\u0026quot;, \u0026quot;repo\u0026quot;: \u0026quot;owner/repo\u0026quot;} Supports ref and sha pinning Git URL {\u0026quot;source\u0026quot;: \u0026quot;url\u0026quot;, \u0026quot;url\u0026quot;: \u0026quot;https://...\u0026quot;} Works with GitLab, Bitbucket, self-hosted Git subdirectory {\u0026quot;source\u0026quot;: \u0026quot;git-subdir\u0026quot;, \u0026quot;url\u0026quot;: \u0026quot;...\u0026quot;, \u0026quot;path\u0026quot;: \u0026quot;tools/plugin\u0026quot;} Sparse clone for monorepos npm {\u0026quot;source\u0026quot;: \u0026quot;npm\u0026quot;, \u0026quot;package\u0026quot;: \u0026quot;pkg-name\u0026quot;} Installed via npm install Critical distinction: The marketplace source (where to fetch 
marketplace.json) and plugin sources (where to fetch individual plugins) are independent concepts. The marketplace source supports ref only; plugin sources support both ref (branch/tag) and sha (exact commit).\nVersion Pinning with sha { \u0026#34;name\u0026#34;: \u0026#34;my-plugin\u0026#34;, \u0026#34;source\u0026#34;: { \u0026#34;source\u0026#34;: \u0026#34;github\u0026#34;, \u0026#34;repo\u0026#34;: \u0026#34;owner/plugin-repo\u0026#34;, \u0026#34;ref\u0026#34;: \u0026#34;v2.0.0\u0026#34;, \u0026#34;sha\u0026#34;: \u0026#34;a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0\u0026#34; } } Using sha pins to an exact commit, guaranteeing reproducible installs regardless of branch updates. This is the recommended approach for production environments.\nStrict Mode The strict field (default: true) controls whether plugin.json is the authority for component definitions. When strict: true, the plugin manifest takes precedence. Set strict: false to allow marketplace-level overrides:\n{ \u0026#34;name\u0026#34;: \u0026#34;my-plugin\u0026#34;, \u0026#34;source\u0026#34;: \u0026#34;./plugins/my-plugin\u0026#34;, \u0026#34;strict\u0026#34;: false } Distribution Strategies GitHub (Recommended) Push your repository with a .claude-plugin/marketplace.json at the root. Users add it with:\n/plugin marketplace add your-org/your-marketplace-repo For specific branches or tags:\n/plugin marketplace add https://gitlab.com/company/plugins.git#v1.0.0 Team Auto-Configuration Add marketplace configuration to .claude/settings.json in a shared repository. 
When team members trust the folder, Claude Code automatically registers the marketplace:\n{ \u0026#34;extraKnownMarketplaces\u0026#34;: [ { \u0026#34;name\u0026#34;: \u0026#34;company-tools\u0026#34;, \u0026#34;source\u0026#34;: \u0026#34;github\u0026#34;, \u0026#34;repo\u0026#34;: \u0026#34;myorg/claude-plugins\u0026#34; } ] } Container Pre-Population For CI/CD and containerized environments, forcedPlugins in managed settings installs plugins automatically without user interaction. This is the standard approach for enterprise deployments.\nAuto-Update Configuration Official Anthropic marketplaces have auto-update enabled by default. Third-party marketplaces default to disabled. To keep plugin updates enabled while managing Claude Code updates manually:\nexport DISABLE_AUTOUPDATER=1 export FORCE_AUTOUPDATE_PLUGINS=1 CLI Reference Command Description /plugin marketplace add \u0026lt;source\u0026gt; Register a marketplace /plugin marketplace list List registered marketplaces /plugin marketplace update \u0026lt;name\u0026gt; Fetch latest catalog /plugin marketplace remove \u0026lt;name\u0026gt; Remove marketplace and its plugins /plugin install \u0026lt;name\u0026gt;@\u0026lt;marketplace\u0026gt; Install a plugin /plugin disable \u0026lt;name\u0026gt;@\u0026lt;marketplace\u0026gt; Disable without uninstalling /plugin enable \u0026lt;name\u0026gt;@\u0026lt;marketplace\u0026gt; Re-enable a disabled plugin /plugin uninstall \u0026lt;name\u0026gt;@\u0026lt;marketplace\u0026gt; Remove a plugin /reload-plugins Reload all plugins without restarting Installation scopes:\nUser scope (default): applies across all projects Project scope: shared with collaborators via .claude/settings.json Local scope: personal, current repository only Permission System Integration Rule Evaluation Order Permissions follow a strict deny → ask → allow precedence. 
The first matching rule wins, so deny rules always take precedence over allow rules.\n{ \u0026#34;permissions\u0026#34;: { \u0026#34;allow\u0026#34;: [ \u0026#34;Bash(npm run *)\u0026#34;, \u0026#34;Bash(git commit *)\u0026#34;, \u0026#34;WebFetch(domain:github.com)\u0026#34; ], \u0026#34;deny\u0026#34;: [ \u0026#34;Bash(git push *)\u0026#34;, \u0026#34;Read(~/.ssh/**)\u0026#34; ] } } Permission Modes Mode Behavior default Prompts on first use of each tool acceptEdits Auto-accepts file edits for the session plan Analysis only; no file modification or command execution auto Background safety checks then auto-approve (research preview) dontAsk Denies all tools not pre-approved bypassPermissions Skips all prompts (isolated environments only) bypassPermissions still prompts for writes to .git, .claude, .vscode, .idea, and .husky to prevent accidental corruption.\nFine-Grained Rule Syntax { \u0026#34;permissions\u0026#34;: { \u0026#34;allow\u0026#34;: [ \u0026#34;Bash(npm run build)\u0026#34;, \u0026#34;Bash(git * main)\u0026#34;, \u0026#34;mcp__puppeteer__puppeteer_navigate\u0026#34;, \u0026#34;Agent(Explore)\u0026#34;, \u0026#34;Read(/src/**)\u0026#34; ], \u0026#34;deny\u0026#34;: [ \u0026#34;Agent(Plan)\u0026#34;, \u0026#34;Edit(//etc/**)\u0026#34; ] } } Path pattern prefixes for Read/Edit rules:\n//path — absolute path from filesystem root ~/path — relative to home directory /path — relative to project root path or ./path — relative to current directory Extending Permissions with Hooks PreToolUse hooks run before the permission prompt and can dynamically block or approve tool calls:\n{ \u0026#34;hooks\u0026#34;: { \u0026#34;PreToolUse\u0026#34;: [ { \u0026#34;matcher\u0026#34;: \u0026#34;Bash\u0026#34;, \u0026#34;hooks\u0026#34;: [{ \u0026#34;type\u0026#34;: \u0026#34;command\u0026#34;, \u0026#34;command\u0026#34;: \u0026#34;validate-command.sh\u0026#34; }] } ] } } A hook exiting with code 2 blocks the call even if an allow rule would otherwise permit it. 
A hook returning \u0026ldquo;allow\u0026rdquo; does not bypass deny rules — those still apply.\nPermissions vs Sandboxing These are complementary, not interchangeable:\nPermissions control which tools Claude Code can use and which paths/domains it can access Sandboxing provides OS-level enforcement for Bash command filesystem and network access A Read(./.env) deny rule blocks the Read tool, but does not prevent cat .env in Bash. For true OS-level file access control, enable sandboxing alongside permission rules.\nOfficial Marketplace Plugin Catalog The official marketplace (claude-plugins-official) is automatically available in every Claude Code installation.\nCode Intelligence (LSP): clangd-lsp, csharp-lsp, gopls-lsp, jdtls-lsp, kotlin-lsp, lua-lsp, php-lsp, pyright-lsp, rust-analyzer-lsp, swift-lsp, typescript-lsp\nExternal Integrations: github, gitlab, atlassian (Jira/Confluence), asana, linear, notion, figma, vercel, firebase, supabase, slack, sentry\nDevelopment Workflows: commit-commands, pr-review-toolkit, agent-sdk-dev, plugin-dev\nOutput Styles: explanatory-output-style, learning-output-style\nTo submit a plugin: claude.ai/settings/plugins/submit or platform.claude.com/plugins/submit\nQuick Links Create and distribute a plugin marketplace Create plugins guide Discover and install prebuilt plugins Configure permissions Official plugin submission (Claude.ai) Official plugin submission (Console) Plugin catalog browser Insights Plugin vs standalone configuration is a distribution decision, not a technical one. Both approaches support the same set of features. The real question is: does this configuration need to be shared? Standalone .claude/ is faster to iterate on; plugins are the right choice once you need versioned, shareable, marketplace-distributed functionality. The only functional trade-off is that plugin skills get namespaced (/my-plugin:hello instead of /hello).\nMarketplace source and plugin source independence is the key architectural insight. 
A single marketplace catalog at acme-corp/plugin-catalog can reference plugins from a dozen different repositories, each pinned to different branches or commits. This separation lets you evolve the catalog and the plugins independently.\nRelative paths in marketplace.json are a subtle footgun. They work only when users add the marketplace via Git (GitHub, GitLab, git URL). If you distribute your marketplace.json via a direct URL, relative paths silently fail to resolve. Always use GitHub, npm, or git URL sources when targeting URL-based distribution.\nPin to sha in production. Using ref (branch or tag) means a branch push or tag move can silently change what gets installed. SHA pinning guarantees reproducibility. Pair with release channels (separate stable and beta branches) for a proper versioning workflow.\nThe bypassPermissions mode is for containers only. It looks tempting for development speed, but it removes meaningful protection from prompt injection attacks. The acceptEdits mode offers a better balance: it auto-approves file edits while still prompting for Bash commands and web fetches. For fully automated pipelines, use bypassPermissions inside a sandboxed container where damage is bounded.\n","date":"2026-04-03T00:00:00+09:00","image":"/images/posts/2026-04-03-claude-code-plugin-marketplace/cover.jpg","permalink":"/posts/2026-04-03-claude-code-plugin-marketplace/","title":"Claude Code Plugin Marketplace: A Deep Dive"},{"content":"Overview In the previous post (Dev Log #7) I implemented LLM-based automatic tone/angle category injection. This sprint focused on making that implementation actually work in production.\nThree major areas were addressed. First, the remaining local filesystem reads for category images were fully migrated to S3. Second, a CUDA dependency conflict that crashed the EC2 server on startup was resolved by pinning torch to a CPU-only index. 
Third, dominant hex colors are now extracted from tone reference images, stored in the database, and rendered as color swatches in the structured prompt UI.\nTone/Angle Category Images — Migrating to S3 The previous implementation left a subtle bug in injection.py: _list_category_images() was reading from data/tone_angle_image_ref/{category}/ via local os.listdir(). Since EC2 instances don\u0026rsquo;t have this directory, the function always returned an empty list, silently disabling the entire injection feature on production.\nThe fix was straightforward — thread an S3Storage instance through to select_auto_injection() and replace the directory walk with an s3.list_objects(prefix) call.\n# Before: reads local directory def _list_category_images(category: str) -\u0026gt; list[str]: folder = TONE_ANGLE_IMAGE_DIR / category return [f.name for f in folder.iterdir() if ...] # After: lists from S3 by prefix def _list_category_images(category: str, s3: S3Storage) -\u0026gt; list[str]: prefix = f\u0026#34;refs/tone_angle_image_ref/{category}/\u0026#34; keys = s3.list_objects(prefix) return [basename(k) for k in keys if k.lower().endswith(IMAGE_EXTS)] The S3 key cache (build_ref_key_cache) was also updated so that nested paths like data/tone_angle_image_ref/a(natural,film) are correctly mapped to refs/tone_angle_image_ref/a(natural,film)/{filename} by using Path.relative_to(\u0026quot;data\u0026quot;).\nEC2 Deployment — Pinning CPU-Only torch The production EC2 instance was failing to start with a missing libcudnn.so.9 error when loading the embedding model. sentence-transformers pulls in torch as a dependency, and uv was resolving to a CUDA-enabled build that referenced GPU libraries not present on the instance.\nThe dev environment had both nvidia-cudnn-cu12 and nvidia-cudnn-cu13 installed, masking the issue. 
Production only had cu13, causing the crash.\nThe fix is to pin torch to a CPU-only build directly in pyproject.toml, bypassing the CUDA resolution path entirely.\n# pyproject.toml — explicit CPU-only torch index [[tool.uv.index]] name = \u0026#34;pytorch-cpu\u0026#34; url = \u0026#34;https://download.pytorch.org/whl/cpu\u0026#34; explicit = true [tool.uv.sources] torch = [{ index = \u0026#34;pytorch-cpu\u0026#34; }] With this in place, uv sync always installs the CPU build regardless of the host GPU configuration.\nHex Color Extraction — Dominant Color Analysis To give users a visual sense of what tone a reference image represents, dominant hex colors are now extracted at generation time and stored in the generation_logs table under a new hex_colors JSON column.\nThe pipeline looks like this:\nflowchart TD A[\"Image generation request\"] --\u003e B[\"LLM category classification\"] B --\u003e C[\"List category images from S3\"] C --\u003e D[\"Select random images \u0026lt;br/\u0026gt; (tone + angle)\"] D --\u003e E[\"Extract dominant hex colors \u0026lt;br/\u0026gt; (PIL + K-Means)\"] E --\u003e F[\"Store hex_colors in \u0026lt;br/\u0026gt; generation_logs\"] F --\u003e G[\"Gemini API image generation\"] G --\u003e H[\"Return hex_colors in API response\"] H --\u003e I[\"Structured prompt UI \u0026lt;br/\u0026gt; renders color swatches\"]Color extraction uses scikit-learn\u0026rsquo;s KMeans to cluster pixel values and returns the centroid of each cluster as a hex string.\ndef extract_dominant_hex_colors(image_bytes: bytes, n_colors: int = 5) -\u0026gt; list[str]: img = Image.open(io.BytesIO(image_bytes)).convert(\u0026#34;RGB\u0026#34;) img = img.resize((100, 100)) # downscale for speed pixels = np.array(img).reshape(-1, 3) km = KMeans(n_clusters=n_colors, n_init=3) km.fit(pixels) centers = km.cluster_centers_.astype(int) return [f\u0026#34;#{r:02x}{g:02x}{b:02x}\u0026#34; for r, g, b in centers] The extracted values are passed through InjectedReference.hex_colors 
in the API response and consumed by the frontend.\nStructured Prompt Display — with Hex Swatches The image detail modal\u0026rsquo;s \u0026ldquo;작업 프롬프트\u0026rdquo; section previously dumped the raw output of getFullPrompt() with whitespace-pre-wrap. That meant raw markdown-style headers (###), separator lines (===), and JSON hex arrays were all visible as plain text.\nA new renderStructuredPrompt() function was added to render the same data in a readable form:\n### headings → styled section headers in amber/sky tones === separator → \u0026lt;hr\u0026gt; element - 이미지 N: lines → badge + description list items hex_colors array → colored circle + monospace hex code pill badge The clipboard copy path still uses fullPrompt raw text, so copying is unaffected.\nNo-Text Directive and Color Palette Removal A \u0026ldquo;no-text\u0026rdquo; directive was added to injected reference prompts — explicitly instructing the model not to reproduce any text or watermarks from the reference images. Separately, the color palette dot visualization was removed from image card overlays and the detail modal. 
The structured hex swatches in the prompt section fill that role adequately, and the dots added visual clutter without much utility.\nCommit Log Message Changed files fix: list tone/angle category images from S3 instead of local filesystem injection.py, storage.py, generation.py fix: pin torch to CPU-only index to prevent broken CUDA deps on EC2 pyproject.toml fix: fix the injection prompt prompt.py, injection.py docs: update README to reflect recent changes README.md feat: extract dominant hex colors from tone reference images injection.py, schemas.py, api.ts, DB migration feat: structured prompt display with hex color swatches in image detail GeneratedImageDetail.tsx feat: add no-text directive for injected refs and remove color palettes prompt.py, App.tsx, GeneratedImageDetail.tsx get rid of the test folder deleted test/ Insights Make the production/dev environment gap explicit in code. After the S3 migration, the file listing code still referenced local paths. This type of bug silently passes in development and only surfaces after deployment. Using the storage abstraction (S3Storage) consistently across all callers is the right defense.\nPin CUDA-sensitive dependencies explicitly. torch can resolve to either CPU or CUDA builds depending on the environment. On a CPU-only EC2 instance, a CUDA build fails at import time. Pinning to a CPU-only index in pyproject.toml eliminates this entire class of problem — no per-instance manual intervention needed.\nSeparate raw data serialization from UI rendering. The pattern of deriving both a copy-friendly raw string and a richly structured visual representation from the same source data is clean and maintainable. 
Keeping getFullPrompt() intact while adding renderStructuredPrompt() alongside it is a good example of this principle.\n","date":"2026-04-03T00:00:00+09:00","image":"/images/posts/2026-04-03-hybrid-search-dev8/cover.jpg","permalink":"/posts/2026-04-03-hybrid-search-dev8/","title":"Hybrid Image Search Dev Log #8 — Tone/Angle S3 Migration, EC2 Deployment Fixes, Hex Color Extraction"},{"content":"Overview k-skill is an open-source curated skill collection for Claude Code built specifically for Korean users, maintained by NomaDamas. With 1,371 GitHub stars and 113 forks, the project covers tasks that are deeply embedded in Korean daily life — booking SRT and KTX trains, checking KBO baseball scores, sending KakaoTalk messages, processing HWP documents, and looking up fine dust air quality.\nIt supports Claude Code, Codex, OpenCode, and OpenClaw/ClawHub. No additional client API layer is required: skills either run directly or route through the k-skill-proxy server with plain HTTP requests.\nArchitecture: How k-skill Integrates with Claude Code The diagram below shows the full integration flow from user intent to skill execution.\nflowchart TD User[\"User\"] --\u003e CC[\"Claude Code \u0026lt;br/\u0026gt; (AI Agent)\"] CC --\u003e Skills[\"k-skill Collection\"] Skills --\u003e Auth[\"Skills Requiring Auth\"] Skills --\u003e NoAuth[\"No Auth Required\"] Skills --\u003e Proxy[\"Proxy-Routed Skills\"] Auth --\u003e SRT[\"SRT Booking\"] Auth --\u003e KTX[\"KTX Booking\"] Auth --\u003e Toss[\"Toss Securities\"] NoAuth --\u003e KBO[\"KBO Game Results\"] NoAuth --\u003e Lotto[\"Lotto Check\"] NoAuth --\u003e HWP[\"HWP Document Processing\"] NoAuth --\u003e Zip[\"Postal Code Search\"] NoAuth --\u003e KakaoTalk[\"KakaoTalk Mac CLI\"] NoAuth --\u003e Delivery[\"Package Tracking\"] Proxy --\u003e Subway[\"Seoul Subway Arrivals\"] Proxy --\u003e Dust[\"Fine Dust \u0026lt;br/\u0026gt; PM10 \u0026amp; PM2.5\"] Proxy --\u003e Coupang[\"Coupang Product Search\"] Proxy --\u003e 
Law[\"Korean Law Search\"] Proxy --\u003e ProxySrv[\"k-skill-proxy \u0026lt;br/\u0026gt; (self-hosted)\"]Complete Skill Inventory k-skill currently ships 18 distinct skills across five domains.\nTransportation Skill Description Auth SRT Booking Search, reserve, confirm, cancel SRT trains Required KTX Booking Full Korail booking with Dynapath anti-bot helper Required Seoul Subway Arrivals Real-time arrival info per station via k-skill-proxy Proxy URL Daily Life Skill Description Auth Fine Dust PM10/PM2.5 by current location or region fallback None Postal Code Search Official Korea Post zipcode lookup by address keyword None Package Tracking CJ Logistics and Korea Post official tracking None Blue Ribbon Restaurants Nearby Blue Ribbon Survey-rated restaurants None Nearby Bars KakaoMap-based bar info with hours, menu, seats, phone None Daiso Product Search In-store inventory check at specific Daiso branches None Used Car Prices SK Rent-a-Car Tago BUY snapshot for purchase price and monthly lease None Sports and Entertainment Skill Description Auth KBO Game Results Schedule, scores, and team filters by date None K League Results K League 1 and 2 results, standings None Lotto Check Latest draw results and number matching None Work and Documents Skill Description Auth HWP Document Processing .hwp to JSON/Markdown/HTML, image extraction None Korean Law Search Statutes, court decisions, official interpretations Local only KakaoTalk Mac CLI Read, search, and send KakaoTalk messages on macOS None Shopping and Finance Skill Description Auth Coupang Product Search Rocket Delivery filter, deals, price range via coupang-mcp None Toss Securities Account summary, portfolio, prices, orders via tossctl Required Deep Dive: KakaoTalk Mac CLI The KakaoTalk skill stands out as a particularly creative integration. 
It wraps kakaocli, a macOS-only CLI tool, allowing Claude Code to read conversation history and send messages directly from the terminal.\nPrerequisites brew install silver-flight-group/tap/kakaocli The terminal application must have Full Disk Access and Accessibility permissions granted in System Settings. Without Full Disk Access, even read commands will fail. Without Accessibility, send and harvest automation will not work.\nIf KakaoTalk for Mac is not installed, mas handles that too:\nbrew install mas mas account mas install 869223134 Key Commands # Verify permissions and DB access first kakaocli status kakaocli auth # List recent conversations kakaocli chats --limit 10 --json # Read recent messages from a specific chat kakaocli messages --chat \u0026#34;Jisoo\u0026#34; --since 1d --json # Search across all conversations kakaocli search \u0026#34;meeting\u0026#34; --json # Test send to yourself (safe) kakaocli send --me _ \u0026#34;test message\u0026#34; # Dry-run to preview without sending kakaocli send --dry-run \u0026#34;Team Announcements\u0026#34; \u0026#34;Meeting at 3pm today\u0026#34; The safety design is worth noting. The skill workflow mandates a --dry-run preview before sending to anyone other than yourself, and actual dispatch requires explicit user confirmation. This prevents the AI agent from autonomously firing off messages — a sound default for any messaging automation.\nInstallation Flow The standard setup follows three steps:\nFollow docs/install.md to install all skills (Node.js and Python packages are both involved; global install is the default) Run the k-skill-setup skill to verify credentials and environment variables Read each feature doc to understand expected inputs, examples, and limitations Skills that require authentication (SRT, KTX, Toss Securities) follow a documented credential resolution order defined in docs/setup.md. 
Secret storage rules and prohibited patterns are captured in docs/security-and-secrets.md, with standardized environment variable names to avoid conflicts.\nThe k-skill-proxy is a self-hostable proxy server for skills that need to reach public APIs (Seoul subway, fine dust, Coupang, Korean law). The proxy removes the need to configure API keys on the client side for those services.\nWhy k-skill Matters The core problem k-skill addresses is straightforward: Korea\u0026rsquo;s internet ecosystem runs on a parallel set of platforms — KakaoTalk instead of iMessage or Slack, Korail and SRT instead of Amtrak, HWP files instead of Word or Google Docs, Coupang instead of Amazon. Global AI tooling is built around global services. None of these Korean platforms get first-class support out of the box.\nk-skill fills that gap by packaging the knowledge of how to interact with each of these Korean-specific surfaces into reusable Claude Code skills. The approach is deliberately pragmatic: where a reliable MCP server exists (like coupang-mcp or korean-law-mcp), k-skill routes through it. Where it does not, the skill talks to official public interfaces directly or through a proxy.\nThe project itself is a solid piece of open-source engineering — multi-runtime (JavaScript + Python + Shell), versioned with npm Changesets, CI/CD on GitHub Actions, and a clear separation between skill logic and secret management. 
For Korean developers working with Claude Code, it is the most practical starting point for automating the parts of daily life that generic AI agents simply cannot reach.\nGitHub: NomaDamas/k-skill Stars: 1,371 | Forks: 113 Primary language: JavaScript (Python and Shell also present) ","date":"2026-04-03T00:00:00+09:00","image":"/images/posts/2026-04-03-k-skill-korean-claude-code/cover.jpg","permalink":"/posts/2026-04-03-k-skill-korean-claude-code/","title":"k-skill: A Korean-Specific Skill Collection for Claude Code"},{"content":"Overview Previous post: Log-Blog Dev Log #5\nIf #5 was about implementing Firecrawl deep docs and the bilingual publishing pipeline, #6 is about tying up the loose ends that followed. After restructuring the blog into content/ko/posts/ and content/en/posts/, new users still couldn\u0026rsquo;t create this structure from scratch — the setup skill needed expanding. In parallel, real-world usage revealed an AI chat CDP navigation race condition that needed a retry fix, a Perplexity noise URL slipping through the classifier, and the plugin itself needed migrating from global to marketplace-based installation. 
Version bumped from 0.2.0 to 0.2.1.\ngraph TD A[\"log-blog #6 Changes\"] --\u003e B[\"Bilingual Setup Skill\"] A --\u003e C[\"CDP Reliability Fix\"] A --\u003e D[\"Plugin Marketplace Migration\"] A --\u003e E[\"README Documentation\"] B --\u003e B1[\"Phase 3A: Multi-language Hugo \u0026lt;br/\u0026gt; languages: block generation\"] B --\u003e B2[\"Phase 3B: Existing blog \u0026lt;br/\u0026gt; missing languages: detection\"] B --\u003e B3[\"publisher --language routing\"] B --\u003e B4[\"post_advisor: deduplication\"] C --\u003e C1[\"CDP navigation retry \u0026lt;br/\u0026gt; (race condition fix)\"] C --\u003e C2[\"Perplexity /search/new \u0026lt;br/\u0026gt; noise filter\"] C --\u003e C3[\"Actionable error messages\"] D --\u003e D1[\"0.2.0: Bilingual features\"] D --\u003e D2[\"0.2.1: CDP fix\"] Bilingual Hugo Setup Skill Expansion Background After #5 restructured the blog repo into content/ko/posts/ and content/en/posts/ and published 12 bilingual posts, there was a gap: /logblog:setup still only knew how to create a single-language content/posts/ layout. New users installing the plugin couldn\u0026rsquo;t bootstrap the bilingual workflow from scratch.\nImplementation — Phase 3A: New Blog Multilingual Setup The setup skill\u0026rsquo;s question flow was redesigned. During Hugo site generation, it now asks three things:\nBlog name Primary language (en/ko, default: en) Multi-language support? — If yes, which languages (e.g., en,ko) When multilingual is selected, the skill generates a proper Hugo languages: block in hugo.yaml:\nlanguages: en: languageName: English weight: 1 contentDir: content/en menu: main: [] social: [] ko: languageName: 한국어 weight: 2 contentDir: content/ko menu: main: [] social: [] It also creates per-language content directories and initial posts. 
Both content/en/posts/hello-world.md and content/ko/posts/hello-world.md are created with matching filenames — Hugo automatically links translations by filename.\nImplementation — Phase 3B: Existing Blog Migration Detection A trickier case is when language directories already exist but the Hugo config is missing the languages: block. Without it, Hugo silently ignores the language-specific directories and the language switcher doesn\u0026rsquo;t work.\nSetup skill Step 2.5 now detects this:\nls -d \u0026#34;{path}/content/ko/posts\u0026#34; \u0026#34;{path}/content/en/posts\u0026#34; 2\u0026gt;/dev/null grep -c \u0026#34;^languages:\u0026#34; \u0026#34;{path}/hugo.yaml\u0026#34; 2\u0026gt;/dev/null If directories exist but languages: is absent, the skill warns the user and offers to add it — preserving all existing settings while injecting just the languages: section.\nPublisher and post_advisor Integration Alongside the setup skill, publisher.py gained a --language parameter. When passed, it looks up the matching path in config.yaml\u0026rsquo;s language_content_dirs mapping:\ncontent_dir = config.blog.content_path_for(language) post_advisor.py was also updated. Previously it only scanned the single content_dir. Now it scans all paths in language_content_dirs, deduplicating by filename. This fixes the scan command showing only one language\u0026rsquo;s posts on a bilingual blog.\nAI Chat CDP Reliability Improvements Problem: CDP Navigation Race Condition When running Chrome via uv run log-blog chrome-cdp with existing tabs open, Playwright intermittently hit a \u0026ldquo;navigation interrupted\u0026rdquo; error when opening a new page and navigating to a URL. 
The cause is a Chrome event race between existing tabs and the newly created page.\nBefore the fix, the code made a single attempt and returned None on failure:\nawait page.goto(url, wait_until=\u0026#34;domcontentloaded\u0026#34;, timeout=timeout_ms) Fix: Retry Logic Added a _NAV_RETRIES = 2 constant and retry logic that only triggers on \u0026ldquo;interrupted\u0026rdquo; in the error message — not on timeouts or network errors:\n_NAV_RETRIES = 2 # retry count for CDP navigation race conditions for attempt in range(_NAV_RETRIES + 1): try: await page.goto(url, wait_until=\u0026#34;domcontentloaded\u0026#34;, timeout=timeout_ms) last_err = None break except Exception as nav_err: last_err = nav_err if attempt \u0026lt; _NAV_RETRIES and \u0026#34;interrupted\u0026#34; in str(nav_err).lower(): logger.debug(\u0026#34;CDP navigation interrupted (attempt %d), retrying\u0026#34;, attempt + 1) await page.wait_for_timeout(500) else: raise The narrow condition (only \u0026ldquo;interrupted\u0026rdquo; triggers retry) is intentional — retrying on timeouts would double latency on slow networks.\nPerplexity Noise Filter Perplexity browsing history included both real conversation URLs (perplexity.ai/search/...) and the new-search landing page (perplexity.ai/search/new). The landing page has no conversation content, but the old classifier tagged it as ai_chat_perplexity and triggered a CDP fetch attempt.\nOne line added to _AI_NOISE_PATTERNS:\nre.compile(r\u0026#34;perplexity\\.ai/search/new(?:[?#]|$)\u0026#34;), # \u0026#34;new search\u0026#34; landing page Improved Error Messages CDP fetch failures previously logged a bare \u0026quot;AI chat fetch failed for URL: error\u0026quot; message — not actionable. Both ai_chat_fetcher.py and content_fetcher.py now surface the remedy:\nlogger.warning( \u0026#34;AI chat fetch failed for %s (%s): %s. 
\u0026#34; \u0026#34;Ensure Chrome is running with: uv run log-blog chrome-cdp\u0026#34;, url, service, e, ) Plugin Marketplace Migration Background: The Version-String Trap Previously the plugin was installed directly at ~/.claude/plugins/logblog/. The update mechanism compared version strings in plugin.json. If the version string isn\u0026rsquo;t bumped, /plugin reports \u0026ldquo;already at the latest version\u0026rdquo; — even if 15 commits of new features landed.\nThat\u0026rsquo;s exactly what happened after #5: Firecrawl, bilingual support, and skill updates were all deployed but the version stayed at \u0026quot;0.1.0\u0026quot;. After discovering this, the plugin was migrated to marketplace-based installation at ~/.claude/plugins/marketplaces/logblog/, with explicit version management.\nVersion Scheme: 0.2.0 then 0.2.1 0.2.0 — Firecrawl deep docs, bilingual blog support, setup skill multilingual expansion, publisher --language routing. New features warrant a minor version bump.\n0.2.1 — CDP reliability fix and Perplexity noise filter. 
Bug fixes, so patch increment.\nThe marketplace.json plugin entry was updated to reflect the latest version info.\nREADME Documentation The README received a substantial update documenting features that existed in code but not in writing:\nBilingual workflow: End-to-end flow — write Korean post, translate to English, deploy both to content/{lang}/posts/ Firecrawl integration: Using --deep flag for full documentation site crawling Dev Log mode: How to generate dev log posts from session data via the skill AI chat fetching: Running chrome-cdp to start Chrome with CDP, per-service auth_profile configuration in config.yaml Commit Log Message Changes docs: update README with bilingual workflow, Firecrawl, dev logs, and AI chat features +31 -7 chore: bump plugin version to 0.2.0 +1 -1 feat: add multi-language Hugo setup to setup skill and publisher +226 -26 chore: bump plugin version to 0.2.1 +1 -1 fix: improve AI chat CDP reliability and Perplexity noise filter +30 -3 Insights This session was a classic \u0026ldquo;built the feature, infrastructure didn\u0026rsquo;t keep up\u0026rdquo; pattern. #5 created a bilingual blog by manually restructuring the repo, but the setup skill still produced single-language blogs. Features and their corresponding onboarding experience need to stay synchronized.\nThe CDP race condition is the hardest kind of bug to catch — it\u0026rsquo;s timing-dependent and doesn\u0026rsquo;t reproduce consistently. The narrow retry trigger (only on \u0026ldquo;interrupted\u0026rdquo;) turned out to be the right call. Retrying on all errors would mask real problems and add latency on slow networks for no benefit.\nPlugin version management looks simple but directly determines whether users receive updates. Without a version bump, new features are invisible to existing installs. The marketplace migration makes this process more explicit and visible.\nThere\u0026rsquo;s a pleasing meta quality to log-blog being documented by log-blog. 
The friction discovered while writing a post motivates the next commit, and that commit becomes the content for the next post.\n","date":"2026-04-03T00:00:00+09:00","image":"/images/posts/2026-04-03-log-blog-dev6/cover.jpg","permalink":"/posts/2026-04-03-log-blog-dev6/","title":"Log-Blog Dev Log #6 — Bilingual Setup, CDP Reliability, Marketplace Migration"},{"content":"Overview Reaching for media controls on Android typically means pulling down the notification shade or switching apps entirely. MediaFloat takes a different approach: a compact, draggable overlay bar showing Previous, Play/Pause, and Next stays visible above every app, always within reach.\nBuilt in Kotlin with Jetpack Compose, targeting Android 10+, and released under Apache License 2.0, MediaFloat is a focused single-purpose tool. The source lives at Leuconoe/MediaFloat.\nCore Architecture MediaFloat combines three Android system capabilities to deliver its persistent overlay:\nflowchart TD A[\"Media App\u0026lt;br/\u0026gt;(YouTube, Spotify, etc.)\"] --\u003e|\"Publishes MediaSession\"| B[\"NotificationListenerService\u0026lt;br/\u0026gt;(Detects active media sessions)\"] B --\u003e|\"Playback state \u0026amp; transport actions\"| C[\"ForegroundService\u0026lt;br/\u0026gt;(Overlay runtime)\"] C --\u003e|\"WindowManager overlay\"| D[\"Compose UI\u0026lt;br/\u0026gt;(Floating control bar)\"] D --\u003e|\"Previous / Play / Next\"| B E[\"User Settings\u0026lt;br/\u0026gt;(Main / Settings / Advanced)\"] --\u003e|\"Position, size, theme\"| CThe Three Permission Pillars Permission or Access Role SYSTEM_ALERT_WINDOW Draws the floating bar above all other apps FOREGROUND_SERVICE + FOREGROUND_SERVICE_SPECIAL_USE Keeps the overlay runtime alive in the background POST_NOTIFICATIONS Required foreground-service notification (Android 13+) Notification listener access Reads active MediaSession state and transport actions Android Overlay: How It Works SYSTEM_ALERT_WINDOW — labeled \u0026ldquo;Display over other 
apps\u0026rdquo; in Android settings — lets an app insert views into the system window layer via WindowManager.addView(). This sits above the normal app window hierarchy, which is why the overlay remains visible regardless of what the user is doing.\nMediaFloat pairs this with Jetpack Compose. Rather than inflating XML layouts into the overlay window, a ComposeView is embedded into the WindowManager-managed surface. This gives the floating bar the full expressive power of Material 3 Compose components while keeping it lightweight.\nWhy a Foreground Service Is Non-Negotiable Android aggressively kills background processes to preserve battery. Any UI component that must persist when the host app is backgrounded needs to run inside a Foreground Service:\nThe service must post a user-visible notification — the cost of keeping the overlay alive Android 13+ requires POST_NOTIFICATIONS to show that notification The FOREGROUND_SERVICE_SPECIAL_USE type specifically covers non-standard foreground service use cases like screen overlays NotificationListenerService: The Media Session Bridge Media apps publish playback state through Android\u0026rsquo;s MediaSession API. NotificationListenerService gives MediaFloat a system-level subscription to those sessions. Once a session is detected, MediaController handles the transport commands — Previous, Play/Pause, Next — dispatched back to whatever media app is active.\nThis architecture means MediaFloat works identically with Spotify, YouTube, podcast apps, or any app that exposes a MediaSession. No app-specific integration required.\nApp Structure: Single Module, Five Surfaces MediaFloat deliberately stays as a single-module Android app. 
The README explicitly calls this out as a way to keep setup, runtime behavior, and recovery paths understandable.\nThe Five App Surfaces flowchart LR Main[\"Main\u0026lt;br/\u0026gt;Start / stop overlay\u0026lt;br/\u0026gt;Readiness check\"] Settings[\"Settings\u0026lt;br/\u0026gt;Buttons, size presets\u0026lt;br/\u0026gt;Opacity, behavior\"] Advanced[\"Advanced\u0026lt;br/\u0026gt;Language, theme\u0026lt;br/\u0026gt;Sidebar, persistent mode\"] Support[\"Support\u0026lt;br/\u0026gt;Setup guidance\u0026lt;br/\u0026gt;Version, license\"] Debug[\"Debug\u0026lt;br/\u0026gt;Runtime diagnostics\u0026lt;br/\u0026gt;Transport commands\"] Main --- Settings Settings --- Advanced Advanced --- Support Support --- DebugThe Debug surface stands out: it exposes runtime readiness inspection, media session diagnostics, direct transport command sending, log clearing, and a recent events view. Shipping developer tooling inside the release build — behind an Advanced setting toggle — is a practical pattern for overlay apps where permission state and service lifecycle are inherently hard to observe from the outside.\nReadiness Checks and Fault Recovery MediaFloat models its startup preconditions explicitly. Before the overlay can run, three conditions must hold:\nOverlay access granted (SYSTEM_ALERT_WINDOW) Notification listener access granted Notification posting permitted (Android 13+) If any condition is missing, the app surfaces shortcuts directly to the relevant Android settings screen rather than showing a generic error. This is the kind of detail that separates a polished overlay app from a frustrating one — Android\u0026rsquo;s permission model is multi-step, and guiding the user through each gate matters.\nAutomation Integration MediaFloat exposes an exported intent action:\nsw2.io.mediafloat.action.SHOW_OVERLAY This lets external automation tools — Tasker, MacroDroid, Android Shortcuts, Bixby Routines — trigger the overlay flow without opening the app UI. 
The launcher shortcut set also exposes both Launch widget and Stop widget as pinnable home-screen shortcuts via ShortcutManager.\nIf the readiness preconditions are not met when the action fires, the app falls back to the main UI so the user can complete setup.\nMulti-Language Support v0.2.1 uses the AppCompat app-language API, which provides per-app locale selection on Android 13+ and graceful fallback on older supported versions. Shipped languages: System default, English, Korean, Chinese, Japanese, Spanish, and French.\nThe language picker lives in Advanced; the current active language is reflected in Support. This is the correct pattern for in-app language switching without requiring a system-level locale change.\nWhat v0.2.1 Intentionally Omits The README is upfront about current constraints:\nNo freeform resizing — only built-in size presets Single horizontal control family — no alternative button arrangements Button combinations limited to Previous / Play·Pause / Next layouts Overlay behavior depends on Android permission state and an active MediaSession being available \u0026ldquo;Intentionally constrained\u0026rdquo; is the phrase used, reflecting a design philosophy that prioritizes stability and comprehensibility over feature breadth. Recent commits point toward v0.3.0 with thumbnail support and sidebar spacing refinements already merged.\nTech Stack Summary Item Detail Language Kotlin UI Framework Jetpack Compose + Material 3 Target Platform Android 10+ Build System Gradle License Apache License 2.0 Key Android APIs SYSTEM_ALERT_WINDOW, ForegroundService, NotificationListenerService, MediaController, ShortcutManager Takeaways MediaFloat is a clean reference implementation for the Android floating overlay pattern. 
The combination of SYSTEM_ALERT_WINDOW + Foreground Service + NotificationListenerService is the standard three-part recipe for any persistent, system-level UI that needs to respond to media state — and MediaFloat keeps each piece clearly separated.\nA few implementation choices worth noting for anyone building similar apps:\nUsing Jetpack Compose inside a WindowManager overlay surface is increasingly the right default over XML-inflated views The exported automation action (SHOW_OVERLAY) is a low-cost way to make a utility app composable in user workflows Shipping Debug tooling inside the app — gated behind an Advanced toggle — is the right call for anything involving Android permissions and service lifecycle, where external observability is limited Build it yourself with ./gradlew installDebug after cloning the repository. Release signing is documented in keystore.properties.example.\n","date":"2026-04-03T00:00:00+09:00","image":"/images/posts/2026-04-03-mediafloat-android/cover.jpg","permalink":"/posts/2026-04-03-mediafloat-android/","title":"MediaFloat: Anatomy of an Android Floating Media Control Overlay"},{"content":"Previous: PopCon Dev Log #1\nOverview Today\u0026rsquo;s session split between polishing the \u0026ldquo;outside\u0026rdquo; and hardening the \u0026ldquo;inside\u0026rdquo; of PopCon. The morning was branding — turning a Gemini-generated image into logo and favicon assets, then writing a GitHub-ready README. The afternoon turned into a Docker debugging session that led to pipeline quality improvements, finishing with retry logic and per-emoji error handling.\n1. Logo \u0026amp; Favicon — From Gemini Image to Brand Assets The first task was converting a 2880×1440 Gemini-generated image into PopCon brand assets. 
The image was center-cropped to 1:1 then resized into multiple sizes.\nFile Size Purpose logo.png 512×512 Header logo favicon.ico 16/32/48 multi-size Browser tab icon favicon-16x16.png 16×16 Small favicon favicon-32x32.png 32×32 Standard favicon apple-touch-icon.png 180×180 iOS home screen icon-192.png / icon-512.png 192×192 / 512×512 PWA icons Why the Favicon Wasn\u0026rsquo;t Showing in Docker Two issues stacked on each other.\nNext.js App Router priority: app/favicon.ico takes precedence over public/favicon.ico. A default favicon was already sitting in app/ and needed to be replaced there. Docker image baking: The Dockerfile uses COPY . . at build time. Changing files on disk has no effect until the container is rebuilt. docker compose build frontend \u0026amp;\u0026amp; docker compose up -d frontend Confirmed with curl -I localhost:3000/favicon.ico returning HTTP 200.\n2. Full Product README Next came writing a README that reads like a product landing page rather than a bare technical doc.\nThe first version put both English and Korean in a single file. Feedback: \u0026ldquo;the two languages aren\u0026rsquo;t distinguishable.\u0026rdquo; Split them into separate files with language toggle links at the top of each.\nREADME.md — English, with English | [한국어](README.ko.md) at the top README.ko.md — Korean, with [English](README.md) | 한국어 at the top README vs. Reality After the first commit, reviewing the actual code revealed several discrepancies.\nItem What the README said What the code does Image generation model Google Imagen Gemini Flash Image VEO mode Dual-frame I2V Start frame + motion prompt only (API limitation) Video duration \u0026ldquo;under 4 seconds\u0026rdquo; Exactly 4s (API minimum), trimmed in post-processing Preprocessing step Not mentioned Crop → square pad → resize to 512×512 Job persistence Not mentioned Redis with 24-hour TTL Missing endpoint Not listed /api/job/{job_id}/emoji/{filename} Both README files updated and pushed.\n3. 
Docker Debugging — Fighting the API Key The afternoon session opened with Docker logs showing nothing working. The root cause was a trailing venv appended to the API key in .env.\nPOPCON_GOOGLE_API_KEY=AIzaSy...-mAcuv venv Likely copied from a terminal where the venv activation command ran next. Removed the trailing text, restarted — but the key itself turned out to be expired too. Generated a fresh one from Google AI Studio.\nRunning the pipeline for real surfaced a series of quality problems.\nIssues Found and Fixed flowchart TD A[\"Pipeline run\"] --\u003e B[\"Review output\"] B --\u003e C[\"Problems found\"] C --\u003e D[\"Cloud artifacts \u0026lt;br/\u0026gt; in background\"] C --\u003e E[\"Duplicate characters \u0026lt;br/\u0026gt; sprite sheets\"] C --\u003e F[\"Checkerboard background \u0026lt;br/\u0026gt; NB2 fake transparency\"] C --\u003e G[\"VEO edge lines \u0026lt;br/\u0026gt; left/right borders\"] D --\u003e H[\"Remove rembg entirely \u0026lt;br/\u0026gt; brightness-based crop instead\"] E --\u003e I[\"Prompt: single character only \u0026lt;br/\u0026gt; no sticker sheets\"] F --\u003e J[\"Prompt: #FFFFFF background \u0026lt;br/\u0026gt; NOT checkerboard\"] G --\u003e K[\"ffmpeg 2% edge crop\"]\nThe biggest decision was removing rembg entirely. Every attempt at background removal made things worse — isnet-general-use left cloud artifacts, u2net wasn\u0026rsquo;t better. Instead: prompt VEO to generate white backgrounds, then use brightness-based cropping to extract content.\n# processor.py — brightness-based content detection brightness = arr.astype(float).mean(axis=2) content_mask = (brightness \u0026gt; 10) \u0026amp; (brightness \u0026lt; 245) Removing rembg[cpu]\u0026gt;=2.0.0 from pyproject.toml and replacing it with numpy\u0026gt;=1.26.0 also slimmed the Docker image.\nLINE File Naming Fix Checking the LINE Creators Market guidelines: files must be named 001.png through 040.png. 
The code was saving them by action name.\n# packager.py for i, emoji_path in enumerate(emoji_paths): line_name = f\u0026#34;{i + 1:03d}.png\u0026#34; zf.write(emoji_path, line_name) 4. Retry Logic and API Throttling The final commit focused on resilience. Generating a full 24-emoji set means many API calls to VEO and Gemini — and they occasionally return 503 or 429. Previously, one failure killed the whole job.\nPer-Emoji Error Handling The fix wraps each emoji\u0026rsquo;s pipeline stages in a try/except, marks failures with \u0026quot;error\u0026quot; status, and continues with the rest.\n# worker.py — per-emoji error handling failed_indices = set() for i, action in enumerate(actions): try: # ... pose generation, animation, post-processing except Exception as e: logger.error(f\u0026#34;Emoji {i} ({action.name}) failed: {e}\u0026#34;) failed_indices.add(i) status.results[i].status = \u0026#34;error\u0026#34; save_job(status) A new \u0026quot;done_with_errors\u0026quot; job status means the ZIP is still available even if some emojis failed.\nAPI Retry with Exponential Backoff Both the Gemini Image and VEO calls now retry up to three times on transient errors.\n# pose_generator.py — retry logic async def _generate_image(self, prompt, reference_image_path=None, max_retries=3): for attempt in range(max_retries): try: response = await asyncio.to_thread( self.client.models.generate_content, ... ) return ... 
except (ServerError, ClientError) as e: if attempt == max_retries - 1: raise wait = 2 ** attempt # 1s, 2s logger.warning(f\u0026#34;Attempt {attempt+1} failed, retrying in {wait}s: {e}\u0026#34;) await asyncio.sleep(wait) Type System Sync Status types updated across backend and frontend.\nLayer Change backend/models.py Added \u0026quot;error\u0026quot; to EmojiStatus, \u0026quot;done_with_errors\u0026quot; to JobStatusType frontend/lib/api.ts Mirrored the same status union types frontend/components/ProgressTracker.tsx Red card UI for \u0026quot;error\u0026quot; status emojis frontend/components/EmojiPreview.tsx Show ZIP download button on \u0026quot;done_with_errors\u0026quot; too Summary The four commits today in order:\n# Work done 1 Logo/favicon assets, branding, full product README 2 Split README into English and Korean with language toggle 3 Updated READMEs to match actual pipeline behavior 4 Retry logic, per-emoji error handling, API throttling The Docker debugging session was unexpectedly productive — it forced a real pipeline run that surfaced quality issues, and the decision to remove rembg entirely turned out to be the right call. Less code, smaller Docker image, cleaner output.\nNext up: final quality validation before submitting to LINE Creators Market.\n","date":"2026-04-03T00:00:00+09:00","image":"/images/posts/2026-04-03-popcon-dev2/cover.jpg","permalink":"/posts/2026-04-03-popcon-dev2/","title":"PopCon Dev Log #2 — Branding, README, Docker Debugging, and Retry Logic"},{"content":"Overview Pylette is a Python library that extracts representative color palettes from images. It supports two algorithms — K-Means clustering and Median Cut — and can be used via both a command-line interface and a Python API. With 164 stars and 16 forks it is a modest open-source project, but its design is clean and practical. 
This post analyzes the library architecture with a focus on the color.py source file and the Color class implementation.\nColor Extraction Pipeline Pylette\u0026rsquo;s internal processing breaks down into three stages: image loading, algorithm application, and Color object construction.\nflowchart TD A[\"Image Input \u0026lt;br/\u0026gt; (file, URL, directory)\"] --\u003e B[\"Load with Pillow \u0026lt;br/\u0026gt; Alpha masking\"] B --\u003e C{\"Extraction Algorithm\"} C --\u003e|\"KMeans\"| D[\"K-Means Clustering \u0026lt;br/\u0026gt; via scikit-learn\"] C --\u003e|\"MedianCut\"| E[\"Median Cut \u0026lt;br/\u0026gt; color-space partitioning\"] D --\u003e F[\"Cluster centroids → Color objects\"] E --\u003e F F --\u003e G[\"Build Palette \u0026lt;br/\u0026gt; normalize frequencies\"] G --\u003e H{\"Output form\"} H --\u003e|\"CLI\"| I[\"Rich table display\"] H --\u003e|\"Python API\"| J[\"Palette object returned\"] H --\u003e|\"--export-json\"| K[\"JSON file saved\"]\nColor Class Deep Dive Pylette/src/color.py defines the core data structure for the entire library. In 106 lines it handles all color representation and conversion logic.\nInitialization and RGBA Handling class Color(object): def __init__(self, rgba: tuple[int, ...], frequency: float): assert len(rgba) == 4, \u0026#34;RGBA values must be a tuple of length 4\u0026#34; *rgb, alpha = rgba self.rgb = cast(tuple[int, int, int], rgb) self.rgba = rgba self.a = alpha self.freq: float = frequency self.weight = alpha / 255.0 Two things stand out here. First, *rgb, alpha = rgba unpacks the RGBA tuple in a single starred assignment — idiomatic Python. 
Second, self.weight = alpha / 255.0 normalizes the alpha channel to the 0–1 range, which feeds into the alpha_mask_threshold filtering logic that excludes transparent pixels from the extraction.\nColor Space Conversion Properties @property def hsv(self) -\u0026gt; tuple[float, float, float]: return colorsys.rgb_to_hsv( r=self.rgb[0] / 255, g=self.rgb[1] / 255, b=self.rgb[2] / 255 ) @property def hls(self) -\u0026gt; tuple[float, float, float]: return colorsys.rgb_to_hls( r=self.rgb[0] / 255, g=self.rgb[1] / 255, b=self.rgb[2] / 255 ) HSV and HLS conversions delegate to Python\u0026rsquo;s standard library colorsys module, keeping external dependencies minimal. Declaring them as @property means callers write color.hsv and color.hls as attribute accesses. Internally, RGB values are normalized to 0–1 before conversion.\nLuminance Calculation luminance_weights = np.array([0.2126, 0.7152, 0.0722]) @property def luminance(self) -\u0026gt; float: return np.dot(luminance_weights, self.rgb) The weights [0.2126, 0.7152, 0.0722] are the ITU-R BT.709 standard coefficients for sRGB luminance. They reflect the human visual system\u0026rsquo;s sensitivity: the eye is most sensitive to green (0.7152) and least sensitive to blue (0.0722). The --sort-by luminance CLI option uses this value to order the extracted palette.\nComparison Operator and Sorting def __lt__(self, other: \u0026#34;Color\u0026#34;) -\u0026gt; bool: return self.freq \u0026lt; other.freq Only __lt__ is implemented — Python\u0026rsquo;s sorted() and list.sort() only require this method to function. There is no need for functools.total_ordering when only frequency-based sorting matters. When --sort-by luminance is selected on the CLI, the palette is re-sorted using the luminance property instead.\nColor Space Dispatch Accessor def get_colors( self, colorspace: ColorSpace = ColorSpace.RGB ) -\u0026gt; tuple[int, ...] 
| tuple[float, ...]: colors = { ColorSpace.RGB: self.rgb, ColorSpace.HSV: self.hsv, ColorSpace.HLS: self.hls } return colors[colorspace] This is the dictionary dispatch pattern. Instead of an if/elif chain, a dict maps each ColorSpace enum member to the corresponding property value. ColorSpace is defined as an enum in a separate types.py. The return type tuple[int, ...] | tuple[float, ...] reflects the fact that RGB returns integers while HSV and HLS return floats.\nExtraction Algorithm Comparison Criterion K-Means Median Cut Approach Iterative centroid search Recursive color-space partitioning Result Statistically representative colors Balanced color distribution Speed Requires iterative convergence Deterministic, faster Default Yes No Best for Complex gradients and photos Simple block-color images K-Means converges iteratively but better reflects the actual distribution of colors in the image. Median Cut is deterministic — the same image always produces the same palette — which is useful when reproducibility matters.\nUsage Examples CLI # Default: 5 colors, K-Means, RGB pylette image.jpg # 8 colors in HSV colorspace, export to JSON pylette photo.png --n 8 --colorspace hsv --export-json --output colors.json # Median Cut with transparent image handling pylette logo.png --mode MedianCut --alpha-mask-threshold 128 # Batch process with parallel workers pylette images/*.png --n 6 --num-threads 4 Sample output:\n✓ Extracted 5 colors from sunset.jpg ┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓ ┃ Hex ┃ RGB ┃ Frequency┃ ┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩ │ #FF6B35 │ (255, 107, 53) │ 28.5% │ │ #F7931E │ (247, 147, 30) │ 23.2% │ │ #FFD23F │ (255, 210, 63) │ 18.7% │ │ #06FFA5 │ (6, 255, 165) │ 15.4% │ │ #4ECDC4 │ (78, 205, 196) │ 14.2% │ └──────────┴─────────────────┴──────────┘ Python API from Pylette import extract_colors palette = extract_colors(image=\u0026#39;image.jpg\u0026#39;, palette_size=8) for color in palette.colors: print(f\u0026#34;RGB: 
{color.rgb}\u0026#34;) print(f\u0026#34;Hex: {color.hex}\u0026#34;) print(f\u0026#34;HSV: {color.hsv}\u0026#34;) print(f\u0026#34;Luminance: {color.luminance:.2f}\u0026#34;) print(f\u0026#34;Frequency: {color.freq:.2%}\u0026#34;) # Export to JSON palette.to_json(filename=\u0026#39;palette.json\u0026#39;, colorspace=\u0026#39;hsv\u0026#39;) Batch Processing from Pylette import batch_extract_colors results = batch_extract_colors( images=[\u0026#39;image1.jpg\u0026#39;, \u0026#39;image2.png\u0026#39;, \u0026#39;image3.jpg\u0026#39;], palette_size=8, max_workers=4, mode=\u0026#39;KMeans\u0026#39; ) for result in results: if result.success and result.palette: print(f\u0026#34;✓ {result.source}: {len(result.palette.colors)} colors\u0026#34;) result.palette.export(f\u0026#34;{result.source}_palette\u0026#34;) Position in the Python Image Processing Ecosystem Pylette composes Pillow (image loading), NumPy (array operations), and scikit-learn (K-Means) to solve the narrow problem of color extraction. Compared with similar tools:\ncolorgram.py: The closest competitor. Simpler API but lacks color space conversion support and JSON export. sklearn.cluster.KMeans directly: More flexible, but you must build the entire image processing pipeline yourself. PIL.Image.quantize: Median Cut based, but produces no palette metadata — no frequency information, no color space conversion. Pylette\u0026rsquo;s strengths are its dual CLI/API interface, transparent image support, built-in color space conversion, and structured JSON export. Its weaknesses are lack of GPU acceleration and potential slowness on very large images.\nDesign Lessons Several Pythonic patterns in the Color class are worth noting:\nLazy @property computation: HSV, HLS, hex, and luminance are all computed only when accessed. 
There is no caching, but since Color objects are used immutably this is not a problem in practice.\nDictionary dispatch in get_colors(): Adding a new color space requires adding a single dict entry rather than modifying an elif chain.\nStandard library first: Using colorsys for color space conversion avoids an external dependency entirely.\nPrecise type hints: The tuple[int, ...] | tuple[float, ...] return type accurately captures the difference between integer RGB and float HSV/HLS values, which is information a type checker can use.\nSummary Pylette solves the well-defined problem of \u0026ldquo;extract representative colors from an image\u0026rdquo; with a clean interface. The Color class packs RGB, HSV, HLS, hex, luminance, and frequency into 106 lines as a self-contained data structure. It is a practical, ready-to-use library for real tasks such as design system color extraction, image classification, and visualization tooling.\nGitHub: qTipTip/Pylette Docs: qtiptip.github.io/Pylette PyPI: pip install Pylette ","date":"2026-04-03T00:00:00+09:00","image":"/images/posts/2026-04-03-pylette-color-extraction/cover.jpg","permalink":"/posts/2026-04-03-pylette-color-extraction/","title":"Pylette: Analyzing a Python Color Palette Extraction Library"},{"content":"Overview The AI image generation space is evolving rapidly. Beyond simple text-to-image, the entire stack is being reorganized — from layer decomposition and real-time editing to video generation and multimodal serving. 
This post analyzes four notable recent projects.\nQwen-Image-Layered — Decomposes images into RGBA layers, building editability in from the start Nano Banana 2 — Based on Gemini 3.1 Flash, delivering Pro-level quality at Flash speed Veo 3.1 — Video generation with sound, reference image-based style guidance vLLM-Omni — Unifying text/image/audio/video into a single serving framework How these technologies combine in the PopCon project is covered in PopCon Dev Log #1.\nAI Image Pipeline Architecture The current AI image generation ecosystem can be organized into a single pipeline as follows.\ngraph LR A[\"Text prompt\"] --\u003e B[\"Image generation model\u0026lt;br/\u0026gt;Qwen-Image / Gemini\"] B --\u003e C[\"Static image\"] C --\u003e D[\"Layer decomposition\u0026lt;br/\u0026gt;Qwen-Image-Layered\"] D --\u003e E[\"Per-layer RGBA editing\"] C --\u003e F[\"Video generation\u0026lt;br/\u0026gt;Veo 3.1\"] F --\u003e G[\"Video with sound\"] B --\u003e H[\"Serving infrastructure\u0026lt;br/\u0026gt;vLLM-Omni\"] H --\u003e I[\"API Endpoint\u0026lt;br/\u0026gt;OpenAI-compatible\"] E --\u003e J[\"Final assets\u0026lt;br/\u0026gt;PopCon emoji\"] G --\u003e J\nThe key point is that a clear three-stage structure of generation -\u0026gt; decomposition/editing -\u0026gt; serving is emerging. Let\u0026rsquo;s look at the tools at each stage.\nQwen-Image-Layered — Building Editability Through Layer Decomposition Item Details GitHub QwenLM/Qwen-Image-Layered Stars 1,741 Language Python License Apache 2.0 Paper arXiv:2512.15603 Core Idea Traditional image editing has been dominated by mask-based inpainting. Qwen-Image-Layered takes a different approach by decomposing images into multiple RGBA layers from the start. 
It\u0026rsquo;s essentially AI performing Photoshop\u0026rsquo;s layer concept automatically.\nArchitecture Analysis Base model: Diffusion model fine-tuned on top of Qwen2.5-VL Pipeline: QwenImageLayeredPipeline (HuggingFace diffusers integration) Output format: RGBA PNG layers + PSD/PPTX export support Inference settings: num_inference_steps=50, true_cfg_scale=4.0, 640 resolution recommended from diffusers import QwenImageLayeredPipeline import torch pipeline = QwenImageLayeredPipeline.from_pretrained(\u0026#34;Qwen/Qwen-Image-Layered\u0026#34;) pipeline = pipeline.to(\u0026#34;cuda\u0026#34;, torch.bfloat16) inputs = { \u0026#34;image\u0026#34;: image, \u0026#34;layers\u0026#34;: 4, # Number of layers to decompose into (variable) \u0026#34;resolution\u0026#34;: 640, \u0026#34;cfg_normalize\u0026#34;: True, } output = pipeline(**inputs) Notable Design Patterns Variable layer count: Decompose into as many layers as desired — 3, 8, or more. Recursive decomposition is also supported, enabling \u0026ldquo;infinite decomposition\u0026rdquo; where a single layer is further decomposed. Separated editing pipeline: After decomposition, individual layers are edited with Qwen-Image-Edit and recombined with combine_layers.py. Clean separation of concerns. PSD export: Uses the psd-tools library to connect directly with designer workflows. PopCon Application When creating animated emoji, decomposing characters/backgrounds/props into layers enables independent animation of each element. For example, only the character moves while the background stays fixed.\nQwen-Image Ecosystem — 20B MMDiT Foundation Model To understand Qwen-Image-Layered, you need to look at the parent project Qwen-Image as well.\nItem Details GitHub QwenLM/Qwen-Image Stars 7,694 Model Size 20B MMDiT Latest Version Qwen-Image-2.0 (2026.02) Qwen-Image is a foundation model with strengths in text rendering (especially Chinese) and precise image editing. 
Qwen-Image-2.0, released in February 2026, improved the following:\nProfessional typography rendering — Direct generation of infographics like PPTs, posters, and comics Native 2K resolution — Fine detail in people, nature, and architecture Unified understanding + generation — Integrating image generation and editing into a single mode Lightweight architecture — Smaller model size, faster inference speed It ranked #1 among open-source image models in AI Arena blind testing with over 10,000 evaluations.\nNano Banana 2 — Image Generation at Gemini Flash Speed Google\u0026rsquo;s Official Announcement Nano Banana 2 (officially Gemini 3.1 Flash Image), released by Google in February 2026, delivers Nano Banana Pro quality at Flash speed.\nKey features:\nAdvanced world knowledge: Accurate rendering leveraging Gemini\u0026rsquo;s real-time web search information Precise text rendering and translation: Accurate text generation for marketing mockups and infographics Subject consistency: Maintaining consistency across up to 5 characters and 14 objects Production specs: 512px to 4K, supporting various aspect ratios SynthID + C2PA: Built-in AI-generated image provenance tracking technology nano-banana-2-skill CLI Analysis Item Details GitHub kingbootoshi/nano-banana-2-skill Stars 299 Language TypeScript (Bun runtime) License MIT This project wraps Nano Banana 2 as a CLI tool, and the design is quite clever.\nArchitecture Features Multi-model support: Easy model switching with --model flash (default), --model pro, etc. 
Green Screen pipeline: A single -t flag generates transparent background assets AI generates on green screen -\u0026gt; FFmpeg colorkey + despill -\u0026gt; ImageMagick trim Auto-detects key color from corner pixels (since AI uses approximations like #05F904 instead of exact #00FF00) Cost tracking: Records every generation in ~/.nano-banana/costs.json Claude Code Skill: Also works as a Claude Code plugin, enabling image generation through natural language commands like \u0026ldquo;generate an image of\u0026hellip;\u0026rdquo; Cost Structure Resolution Flash Cost Pro Cost 512x512 ~$0.045 N/A 1K ~$0.067 ~$0.134 2K ~$0.101 ~$0.201 4K ~$0.151 ~$0.302 At $0.15 per 4K image, this is very affordable. A realistic price point for bulk asset generation.\nPopCon Application When bulk-generating PopCon emoji assets, Nano Banana 2\u0026rsquo;s -t (transparent background) mode is immediately usable. The workflow is to generate character assets on a green screen and automatically remove the background through the FFmpeg pipeline.\nVeo 3.1 — AI Video Generation with Sound Google\u0026rsquo;s Veo 3.1 is a model that generates videos with sound from text prompts.\nKey Features Native audio generation: Sound is included in the video without separate TTS/sound models Reference image-based style guide: Upload multiple images to specify character/scene style Portrait video support: Uploading portrait images generates social media-ready vertical videos 8-second duration: Currently supports up to 8-second video generation Pricing Tiers Model Plan Features Veo 3.1 Fast AI Pro High quality + speed optimized Veo 3.1 AI Ultra Best-in-class video quality PopCon Application Going beyond static emoji, Veo 3.1 can add short animations and sound effects to emoji. 
Suitable for scenarios like \u0026ldquo;a smiling character waving for 2 seconds + sound effect.\u0026rdquo;\nvLLM-Omni — Multimodal Serving Framework Item Details GitHub vllm-project/vllm-omni Stars 4,094 Language Python Latest Release v0.18.0 (2026.03) Paper arXiv:2602.02204 Why It Matters All the models above (Qwen-Image, Qwen-Image-Layered, etc.) are great, but serving them in production is a separate problem. vLLM-Omni fills this gap.\nArchitecture Highlights The original vLLM only supported text-based autoregressive generation. vLLM-Omni extends it in three ways:\nOmni-modality: Processing text, image, video, and audio data Non-autoregressive architecture: Supporting parallel generation models like Diffusion Transformers (DiT) Heterogeneous output: From text generation to multimodal output Performance Optimizations KV cache management: Leverages vLLM\u0026rsquo;s efficient KV cache as-is Pipeline stage overlapping: High throughput OmniConnector-based full decoupling: Dynamic resource allocation between stages Distributed inference: Full support for tensor, pipeline, data, and expert parallelism Supported Models (as of March 2026) Major models supported in v0.18.0:\nQwen3-Omni / Qwen3-TTS: Unified text + image + audio Qwen-Image / Qwen-Image-Edit / Qwen-Image-Layered: Image generation/editing/decomposition Bagel, MiMo-Audio, GLM-Image: Other multimodal models Diffusion (DiT) stack: Image/video generation Day-0 Support Pattern A notable aspect of vLLM-Omni is the \u0026ldquo;Day-0 support\u0026rdquo; pattern that provides serving support simultaneously with new model releases. vLLM-Omni support was available on the same day Qwen-Image-2512 launched, and the same was true for Qwen-Image-Layered. 
This demonstrates close collaboration between model development teams and serving infrastructure teams.\nPopCon Application When building the emoji generation API for the PopCon service, using vLLM-Omni as the serving layer allows the entire pipeline — generating images with Qwen-Image and decomposing them with Qwen-Image-Layered — to be hidden behind a single OpenAI-compatible API.\nQuick Links Qwen-Image-Layered GitHub — Image layer decomposition model Qwen-Image GitHub — 20B image foundation model Qwen-Image-Layered Paper nano-banana-2-skill GitHub — Gemini-based image generation CLI Nano Banana 2 Official Blog — Google official announcement Veo 3.1 Introduction Page — Video generation with sound vLLM-Omni GitHub — Multimodal serving framework vLLM-Omni Paper Insights The ecosystem is vertically integrating. The Qwen team covers the entire stack from foundation model (Qwen-Image) to specialized models (Layered, Edit) to serving (vLLM-Omni Day-0 support). Google has bundled generation with Nano Banana 2, video with Veo 3.1, and provenance tracking with SynthID/C2PA. We\u0026rsquo;ve entered a stage where the completeness of the entire pipeline rather than individual model performance determines competitiveness.\nEditability is the new differentiator. The competitive axis is shifting from \u0026ldquo;generating good images\u0026rdquo; to \u0026ldquo;how easily can you modify the generated images.\u0026rdquo; Qwen-Image-Layered\u0026rsquo;s layer decomposition is a prime example of this direction. When separated at the layer level, basic operations like recolor, resize, and reposition physically cannot affect other content.\nServing infrastructure is the bottleneck. No matter how good a model is, it\u0026rsquo;s meaningless if you can\u0026rsquo;t serve it in production. vLLM-Omni extending the text-only vLLM to cover Diffusion Transformers is an attempt to resolve this bottleneck. 
In particular, optimizations like long sequence parallelism and cache acceleration are bringing the serving costs of image generation models down to realistic levels.\nThe toolchain determines developer experience. There\u0026rsquo;s a reason a CLI wrapper like nano-banana-2-skill earned 299 stars. The experience of getting a transparent background asset with a single line like nano-banana \u0026quot;robot mascot\u0026quot; -t -o mascot is fundamentally different from reading API docs and writing code. Since it also works as a Claude Code skill, you can generate images directly from your AI coding assistant.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-ai-image-gen-ecosystem/cover-en.jpg","permalink":"/posts/2026-04-02-ai-image-gen-ecosystem/","title":"AI Image Generation Ecosystem Analysis — Qwen-Image-Layered, Nano Banana 2, Veo 3.1, vLLM-Omni"},{"content":"Overview Animated emoji and stickers are a core revenue source and user expression medium in the mobile messaging ecosystem. The KakaoTalk emoticon market is worth hundreds of billions of won annually, and LINE Creators Market is an open platform where creators worldwide participate. This post surveys platform-specific technical specs, existing creation tools, open-source alternatives, and image correction techniques to analyze what niche the PopCon project can target.\nHow PopCon is implemented in this market is covered in PopCon Dev Log #1.\nMarket Status KakaoTalk Emoticons The KakaoTalk Emoticon Store is the largest digital sticker market in South Korea. 
Key characteristics:\nReview-based registration: Creators submit and go through KakaoTalk\u0026rsquo;s review process before launch Animated emoticons: 24-frame animations in APNG or GIF format Revenue sharing: Creators receive 35% (platform fees are relatively high) Intensifying competition: Thousands of new emoticon sets are submitted monthly, with a low approval rate LINE Creators Market LINE operates a market open to global creators. It has two categories — animated stickers and emoji — each with different specifications.\nAnimated Sticker Specs:\nItem Specification Image size Max 320 x 270px (minimum 270px on one side) Frame count 5-20 frames (APNG) Play duration Max 4 seconds Loop count 1-4 loops File size Max 1MB per sticker, max 60MB total ZIP File format APNG (.png extension) Set composition Choose from 8, 16, or 24 stickers Background Transparent required Color space RGB Emoji Specs:\nItem Specification Image size 180 x 180px Set composition 8-40 (standard), up to 305 with text emoji File size Max 1MB per emoji, ZIP under 20MB Resolution Min 72dpi, RGB Design guideline Bold, dark outlines, simple shapes A particularly notable point in LINE\u0026rsquo;s review guidelines is that emoji are displayed large like stickers when sent alone. 
Therefore, designs need to be identifiable at small sizes while also looking good at large sizes.\nExisting Creation Tool Analysis Emorevi Emorevi is an AI-powered animated emoticon creation SaaS.\nCore Features:\nAI Generation: Automatic animation generation from a single image Smart Interpolation: Natural interpolation algorithms between frames Platform Optimized: Presets for KakaoTalk, LINE, Discord, and other platforms Multi-format support: Export to MP4, GIF, APNG, WebP Style Transfer: Animation style customization Real-time Preview: Live preview during editing Pricing:\nPlan Price Tickets Per-ticket cost Basic $9.99 1,000 $0.01 Standard $29.99 3,600 (+600 bonus) $0.008 Premium $99.99 14,000 (+4,000 bonus) $0.007 Emorevi offers a \u0026ldquo;from one image to animation\u0026rdquo; workflow, but its ticket-based billing model means costs accumulate with bulk production. Quality control over generated outputs is also limited.\nOpen-Source Solutions Partymoji Partymoji is a web-based animated GIF generator built with TypeScript + Rust.\nStack: TypeScript (219K LoC), Rust (GIF encoder), runs in web browser Features: Applies party effects (rainbow, rotation, sparkle, etc.) to images to create animated GIFs Live demo: https://mikeyburkman.github.io/partymoji/ Highlights: IndexedDB-based project saving, Bezier curve animation control Limitations: No output features tailored to emoticon/sticker platform specs; effect-focused (not original character animation) gif_emoji gif_emoji is a minimal Python (Pillow) tool that converts images into rotating GIFs.\nOutput: 32x32 GIF, 36 frames (rotating 10 degrees each) Use case: Slack custom emoji (compliant with 60KB limit) Code size: 1,655 lines of Python — very concise Limitations: Only rotation animation, hardcoded size/frame count Both projects take an \u0026ldquo;apply effects to images\u0026rdquo; approach. 
This is fundamentally different from making the character itself move (expression changes, hand waving, etc.).\nImage Correction Techniques In the animated emoji production pipeline, input image quality directly impacts the final output. Let\u0026rsquo;s look at two related technologies.\nWAIR — Wide-angle Image Rectification WAIR is a deep learning model for correcting wide-angle/fisheye lens distortion.\nArchitecture: ResNet50-based, ImageNet pretrained Distortion models: Supports FOV, Division Model, and Equidistant Performance: PSNR 26.43 / SSIM 0.85 on ADE20k dataset (FOV model) Practicality: Distortion parameters estimated from 256x256 input can be applied to 1024x1024 originals (warping in 5.3ms) Emoji relevance: Useful for distortion correction when users use photos from smartphone wide-angle cameras as emoji source material Deep-OAD — Image Orientation Angle Detection Deep-OAD is a model that detects and automatically corrects image rotation angles.\nV2 update: Achieved SOTA with ViT (Vision Transformer) Accuracy: Test MAE of 6.5 degrees across the 0-359 degree range Training data: Trained on most MS COCO images Application: Automatically detecting orientation of user-uploaded images for correction in the preprocessing stage These two technologies can be integrated into a preprocessing pipeline that \u0026ldquo;automatically normalizes the source images provided by users.\u0026rdquo;\nTool Comparison graph LR subgraph commercial[\"Commercial Services\"] A[\"Emorevi\u0026lt;br/\u0026gt;AI animation generation\u0026lt;br/\u0026gt;Ticket-based billing\"] end subgraph opensource[\"Open-Source Tools\"] B[\"Partymoji\u0026lt;br/\u0026gt;Effect-based GIF\u0026lt;br/\u0026gt;TypeScript + Rust\"] C[\"gif_emoji\u0026lt;br/\u0026gt;Rotating GIF\u0026lt;br/\u0026gt;Python\"] end subgraph correction[\"Image Correction\"] D[\"WAIR\u0026lt;br/\u0026gt;Wide-angle distortion correction\u0026lt;br/\u0026gt;ResNet50\"] E[\"Deep-OAD\u0026lt;br/\u0026gt;Orientation detection\u0026lt;br/\u0026gt;ViT\"] end subgraph target[\"PopCon\"] F[\"Character animation\u0026lt;br/\u0026gt;Platform spec compliance\u0026lt;br/\u0026gt;Local execution\"] end A 
--\u003e|\"Inspiration\"| F B --\u003e|\"GIF encoding reference\"| F C --\u003e|\"Pillow pipeline\"| F D --\u003e|\"Preprocessing\"| F E --\u003e|\"Preprocessing\"| F\nDifferentiation from PopCon Summarizing the limitations of existing tools reveals the position PopCon can occupy:\nAspect Existing Tools PopCon Animation method Effect application (rotation, party) or AI black box Intentional movement via character rigging Platform specs Generic GIF output LINE/KakaoTalk spec presets built in Cost SaaS billing (Emorevi) Local execution, free Control level Limited parameters Fine-grained frame-by-frame control Image preprocessing None Distortion correction + orientation detection pipeline integration Output format Primarily GIF APNG, GIF, WebP multi-format The key differentiators boil down to three points:\nAutomated spec compliance — Providing presets for LINE animated sticker constraints like 320x270px, 5-20 frames, and 4-second limits to reduce submission trial and error Character-centric animation — Instead of \u0026ldquo;applying\u0026rdquo; effects, generating animation where the character \u0026ldquo;moves\u0026rdquo; Preprocessing pipeline — Integrating correction models like WAIR and Deep-OAD to normalize input images of varying quality Quick Links Emorevi — AI Animated Emoticon Creation LINE Creators Market Animated Sticker Guidelines LINE Creators Market Emoji Guidelines Partymoji — Web-based Animated GIF Generator gif_emoji — Python Rotating GIF Generator WAIR — Wide-angle Image Distortion Correction Deep-OAD — Automatic Image Orientation Detection Insights The market entry barrier is in \u0026ldquo;review\u0026rdquo; — It\u0026rsquo;s harder to consistently produce quality that passes KakaoTalk/LINE review than to technically create animations. Having automation tools strictly follow specs is the first challenge. The open-source gap is large — partymoji and gif_emoji are at the \u0026ldquo;toy\u0026rdquo; level. 
There are virtually no open-source tools that generate character animations while complying with platform specs. Emorevi\u0026rsquo;s limitations are an opportunity — The SaaS model accumulates costs with bulk production, and fine control over AI-generated output is difficult. There\u0026rsquo;s demand for a locally-run tool with frame-by-frame control. Preprocessing automation determines UX — If a user\u0026rsquo;s uploaded photo is tilted or has wide-angle distortion, the result looks awkward no matter how good the animation engine is. Integrating preprocessing with models like WAIR + Deep-OAD can significantly improve perceived quality. APNG is the essential format — Both LINE and KakaoTalk officially support APNG. It has richer color representation than GIF (alpha channel support) and better file size efficiency. PopCon\u0026rsquo;s default output format should be APNG. ","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-emoji-market-research/cover-en.jpg","permalink":"/posts/2026-04-02-emoji-market-research/","title":"Animated Emoji Market Research — From Platform Specs to Open-Source Tools"},{"content":"Overview On March 31, 2026, the entire source code of Anthropic\u0026rsquo;s AI coding agent Claude Code was publicly leaked through source map (.map) files included in the NPM package. Approximately 1,900 TypeScript files comprising over 512,000 lines of code were exposed, revealing unreleased internal features such as the Buddy gacha system, Kairos always-on assistant, and Undercover Mode — none of which Anthropic had publicly announced. Although no model weights were leaked, the incident has sent shockwaves through the industry because the harness design — the core competitive advantage in the agent era — was exposed in its entirety.\nTimeline — What Source Maps Are and How It Happened Claude Code is an official CLI tool that Anthropic distributes through the NPM registry. 
When deploying JavaScript/TypeScript projects, it\u0026rsquo;s standard practice for build tools to minify the code. A .map file (source map) is a debugging file that maps the minified code back to the original source. It should never be included in production deployments.\nThe problem was that a build configuration error caused these source map files to be included in the public NPM package as-is. The source maps pointed directly to the original TypeScript source code stored in Anthropic\u0026rsquo;s R2 storage bucket, which was also publicly accessible. Security researcher Chai Found Show first discovered this and shared it on X (Twitter), where the post exceeded 3.1 million views. Within hours, the entire source code was archived on GitHub, garnering over 100 stars and 1,900 forks.\nAnthropic quickly deployed an update removing the source maps and withdrew previous versions from NPM, but by then the GitHub archive had already spread irreversibly. What\u0026rsquo;s even more shocking is that this wasn\u0026rsquo;t the first time. In 2025, the same source map leak occurred with versions v2.8 and v4.228. Just five days before this leak, on March 26, a separate incident exposed the unannounced model Mythos and draft blog posts due to a CMS configuration error. Two configuration errors occurred within five days.\nflowchart LR A[\"Original TypeScript source\"] --\u003e B[\"Build \u0026amp; bundling\"] B --\u003e C[\".map source map generation\"] B --\u003e D[\"Minified JS bundle\"] C --\u003e E[\"R2 storage bucket\u0026lt;br/\u0026gt;(publicly accessible)\"] D --\u003e F[\"NPM package release\"] C --\u003e|\"Missing .npmignore entry\"| F F --\u003e G[\"Discovered by security researcher\"] G --\u003e H[\"GitHub archive\u0026lt;br/\u0026gt;(1,900+ forks)\"]\nScale and Structure of the Leaked Code The leaked codebase consists of approximately 1,900 TypeScript files and over 512,000 lines of code. It runs on the Bun runtime and features a terminal UI built with React and Ink. 
The technology stack includes Zod v4 for schema validation, an MCP (Model Context Protocol) client manager, an OpenTelemetry-based observability system, and feature flag management through GrowthBook.\nArchitecturally, the most notable aspect is the inclusion of over 40 permission-gated tools. The modules handling AI calls and streaming alone account for 46,000 lines, and a multi-agent orchestration system (Coordinator Mode) is fully implemented. A single Claude instance can spawn and manage multiple worker agents in parallel, with inter-worker communication conducted through XML messages and a shared scratchpad directory.\nThe entry point is main.tsx, and the architecture comprises a bootstrap layer, conversation engine, service layer (API), orchestration layer, tool layer (40+ tools), and utility layer (plugins, permissions). Sessions persist as JSONL files in the .claude directory, and large outputs are stored separately as tool result files in memory. Analysis revealed numerous circular dependencies and some Rust native modules (fuzzy search, Napi modules, etc.).\nUnreleased Features — Buddy, Kairos, Ultra Plan The most talked-about aspect of the leak was the features Anthropic had not publicly disclosed. These were hidden behind environment variables and feature flags, inactive for regular users.\nBuddy System is a Tamagotchi-style AI companion feature. It includes 18 species (duck, dragon, axolotl, capybara, mushroom, ghost, etc.) with rarity tiers from Common to 1%-chance Legendary. Cosmetics include hats and color variants (shiny), along with five personality stats: debugging, patience, chaos, wisdom, and snark. It was designed so Claude would generate a unique name and personality (\u0026ldquo;soul description\u0026rdquo;) on first launch. The code even included a schedule for an April 1-7 teaser period and a May official release (Anthropic employees first).\nKairos is an always-on assistant mode. 
It runs continuously without waiting for user input, maintaining an append-only log (\u0026ldquo;tick\u0026rdquo;) recording daily observations and actions. It has a 15-second blocking budget so that tasks disrupting the user workflow for more than 15 seconds are automatically deferred. It also includes logic to receive periodic alerts and decide whether to take proactive action or remain silent.\nUltra Plan is a mode that offloads complex planning tasks to a remote cloud container running Opus 4.6, performing deep planning for up to 30 minutes. It initiates a CC (Cloud Container) session through the tengu-ultraplan model configuration and displays status by polling every 3 seconds.\nDream System (Auto-Dream) is a background memory consolidation engine. It runs via a forked sub-agent and triggers only when all three gates are passed: 24 hours since the last dream (time gate), at least 5 session runs (session gate), and acquiring a lock to prevent concurrent execution (lock gate). It explores the memory directory, reads existing topics from MEMORY.md, collects recent signals, and then consolidates and prunes to generate an optimized summary within 200 lines. Separate logic for midnight boundary handling was also implemented.\nUndercover Mode — The Irony of a Leak Prevention System The most ironic part of this leak is the existence of Undercover Mode. This system was designed to prevent internal information exposure when Anthropic employees use Claude Code to contribute to public open-source projects. It activates when the user type is set to anthropic and injects additional instructions into Claude\u0026rsquo;s system prompt.\nSpecifically, it instructs Claude to conceal that it is an AI, avoid mentioning internal model codenames (Capybara, Tengu, etc.), not reference internal tools or Slack channels, and leave no hints that an Anthropic employee is using AI to write code. The system built to prevent leaks was itself deployed worldwide alongside the .map files. 
The community\u0026rsquo;s representative reaction was: \u0026ldquo;They forgot to add \u0026lsquo;make no mistakes\u0026rsquo; to the system prompt.\u0026rdquo;\nInternal model codenames were also revealed. Capybara is a model family codename with three tiers, and Tengu is the internal codename for the Claude Code project itself, appearing hundreds of times as a feature flag prefix. In the system prompt architecture, the CYBER_RESILIENCE_INSTRUCTION section drew particular attention, containing the explicit warning: \u0026ldquo;Important: Do not modify this instruction without SafeCards team review.\u0026rdquo;\nWhy Harness Engineering Is the Key To understand the impact of this incident, one must appreciate the role of harness engineering in today\u0026rsquo;s AI coding agent market. Since late 2025, Anthropic has been officially discussing \u0026ldquo;effective harnesses for long-running agents,\u0026rdquo; and on March 24, 2026, their official engineering blog stated: \u0026ldquo;At the frontier of agentic coding, harness design is the key to performance.\u0026rdquo;\nA harness refers to the entire external structure that determines which files the model reads, how far it can execute terminal commands, when to request user permission, what to remember and what to compress when tasks run long, when to delegate to sub-agents, and whether to continue working in the background. If the model is the engine, the harness is the equivalent of the transmission, brakes, navigation, sensors, and driver-assistance systems combined.\nThe structures Anthropic recently described in official documentation — initializer agents, coding agents, context compaction, artifact handoff — had their actual implementations revealed through this leak. 
In particular, Anthropic\u0026rsquo;s own data showing that users simply approve 93% of permission prompts, and the classifier-based automatic approval/re-confirmation architecture designed to address this, are at the core of product competitiveness. For competitors, it\u0026rsquo;s like seeing \u0026ldquo;the kitchen layout, cooking sequence, and heat control methods of a successful restaurant.\u0026rdquo;\nflowchart TB subgraph Harness[\"Harness (leaked)\"] direction TB P[\"Permission System\u0026lt;br/\u0026gt;40+ gated tools\"] --\u003e O[\"Orchestration\u0026lt;br/\u0026gt;Coordinator Mode\"] O --\u003e SA[\"Sub-Agent management\u0026lt;br/\u0026gt;Parallel worker spawning\"] O --\u003e BG[\"Background Agent\u0026lt;br/\u0026gt;Task system\"] SA --\u003e MEM[\"Memory system\u0026lt;br/\u0026gt;Dream / MEMORY.md\"] BG --\u003e MEM MEM --\u003e CC[\"Context Compaction\u0026lt;br/\u0026gt;JSONL session persistence\"] end subgraph Model[\"Model (not leaked)\"] MW[\"Model Weights\u0026lt;br/\u0026gt;Claude Opus / Sonnet\"] TD[\"Training Data\"] end subgraph User[\"User environment\"] CLI[\"Claude Code CLI\u0026lt;br/\u0026gt;Bun + React Ink\"] IDE[\"IDE Bridge\u0026lt;br/\u0026gt;LSP integration\"] end User --\u003e Harness Harness --\u003e Model\nCommunity Reactions and Suspicions Community reactions fell into three camps. The first was the \u0026ldquo;it\u0026rsquo;s not a big deal\u0026rdquo; position, arguing that since no model weights were leaked, Claude\u0026rsquo;s core competitive advantage remains safe. On Hacker News, opinions like \u0026ldquo;the underlying model is what makes Claude valuable, not the client code\u0026rdquo; were expressed.\nThe second was the \u0026ldquo;serious trust issue\u0026rdquo; position. The core concern is that a company building a tool entrusted with file system and terminal access failed to protect its own software twice. 
The irony of a company that puts AI safety first making repeated mistakes in basic software supply chain controls — release hygiene, packaging review, source map removal — was pointed out.\nThe third was the \u0026ldquo;deliberate leak suspicion,\u0026rdquo; primarily raised by Korean YouTubers. The argument is that it\u0026rsquo;s hard to believe source maps passed through multiple stages of a CI/CD pipeline. Questions were raised about whether someone intentionally removed the source map exclusion setting from .npmignore, the timing coinciding with OpenAI Codex being released as open source, and the proximity to April Fools\u0026rsquo; Day on April 1. However, these remain speculations, and Anthropic officially confirmed it was a deployment error in the CI pipeline.\nSecurity Implications — Supply Chain Security Fundamentals The most important technical lesson from this incident is the fundamentals of software supply chain security. Automatically verifying whether source map files are included in production bundles within the CI/CD pipeline is a task that requires just a single checklist item. A whitelist approach using .npmignore or the files field in package.json is safer, and an automatic scanning process for bundle output size and content before release would have prevented both leaks.\nNo user data was leaked. API keys, personal information, and conversation histories were not included — what was exposed was the CLI client code itself. However, from an attacker\u0026rsquo;s perspective, knowledge of internal architecture can increase the efficiency of attacks such as prompt injection, permission check bypasses, and guardrail evasion. The logic of the permission system, tool call ordering, and connection points between background tasks and the local bridge are now public knowledge.\nFrom an enterprise customer perspective, even though no data was immediately leaked, the maturity of deployment and review processes must be reassessed. 
A company that promotes safety as its core brand repeatedly making mistakes in basic build configuration carries a trust cost.\nOpenClaude — Rebirth from Leaked Code The most dramatic aftermath of the leak is the emergence of OpenClaude. Built on the leaked Claude Code source, it is an open-source fork that adds an OpenAI-compatible provider shim, allowing GPT-4o, Gemini, DeepSeek, Ollama, and 200+ other models to run within Claude Code\u0026rsquo;s exact UI and workflow.\nWhat Stays, What Changes What OpenClaude preserves is the entire Claude Code harness. Bash, file read/write/edit, grep, glob, agents, tasks, MCP, slash commands, streaming output, multi-step reasoning — the terminal-first workflow from Claude Code operates unchanged. The only thing that changes is the backend model. Three environment variables are all it takes:\nexport CLAUDE_CODE_USE_OPENAI=1 export OPENAI_API_KEY=sk-your-key-here export OPENAI_MODEL=gpt-4o Changing OPENAI_BASE_URL alone connects any OpenAI-compatible provider — OpenRouter (Gemini), DeepSeek, Groq, Mistral, LM Studio, Ollama (local models), and more. Codex backends are also supported, with two modes: codexplan (GPT-5.4, high-reasoning) and codexspark (GPT-5.3 Codex Spark, fast loops).\nInstallation and Profile System npm install -g @gitlawb/openclaude The /provider slash command runs a guided setup that saves the preferred provider and model to .openclaude-profile.json. From that point, the profile alone launches with the optimal provider and model. Local Ollama instances are detected automatically.\nCommunity Reception — Opportunity vs. Copyright As of April 2026, the project has attracted 8,176 stars and 3,131 forks on GitHub, representing explosive growth. The prevailing developer verdict is that \u0026ldquo;for anyone who wanted Claude Code\u0026rsquo;s UX while having freedom over model cost and API choice, this is an immediate answer.\u0026rdquo;\nThe Korean tech community on GeekNews, however, is far more critical. 
Reactions like \u0026ldquo;stealing stolen goods,\u0026rdquo; \u0026ldquo;no different from pirated software being passed around,\u0026rdquo; and \u0026ldquo;does this person not understand copyright?\u0026rdquo; dominate the comments. The project name itself may be legally problematic since \u0026ldquo;Claude\u0026rdquo; is a registered Anthropic trademark — a commenter noted that a similar project, Clawdbot, had to rename itself to OpenClaw. The OpenClaude repository itself includes a disclaimer: \u0026ldquo;OpenClaude is an independent community project and is not affiliated with, endorsed by, or sponsored by Anthropic.\u0026rdquo;\nLegal Tension and Technical Merit Given its foundation in leaked source code, the threat of legal action from Anthropic remains real. Anthropic holds copyright over the Claude Code source, and distributing a fork of leaked proprietary code may constitute infringement. The project declares an MIT license, but whether Gitlawb has the authority to apply that license is the central legal question.\nOn technical merit, the project has earned broadly positive assessments independent of the legal controversy. A VS Code extension, Firecrawl integration, Android install guide, and LM Studio provider support (PR #227) reflect a rapidly growing contributor community. The fact that an ecosystem of this scale emerged within days of the leak is paradoxical proof of just how reusable and well-structured the Claude Code harness architecture was.\nQuick Links Claude Code LEAKS is INSANE! - Julian Goldie SEO — Comprehensive analysis of the leak and unreleased features (Buddy, Kairos, Undercover Mode) Claude Code LEAKED - What It Really Means — Technical analysis of codebase structure, architecture, and improvement points Claude Code source code leak. Why would they do this? 
— Deliberate leak suspicions, gacha system/Dream system detailed analysis (Korean) More critical than AI model leaks — Claude Code leak, partial harness exposure — Interpreting the incident from a harness engineering perspective (Korean) Dissecting the leaked Claude Code CLI source code - bkamp — Community source code analysis OpenClaude GitHub Repository — Multi-model coding agent CLI built on the leaked source (8,176 stars) GeekNews: OpenClaude born from Claude Code source leak — 200+ models via Claude Code UI: GPT-4o, Gemini, Ollama and more Insights This Claude Code source code leak vividly demonstrates where competitive advantage lies in the AI era. The fact that it was the harness architecture rather than model weights that was leaked reveals the reality that core IP in the agent era no longer resides solely in model parameters. The internal complexity of Claude Code — over 40 permission-gated tools, multi-agent orchestration, memory consolidation through the Dream system, and the Kairos always-on assistant with its 15-second blocking budget — far exceeded most expectations. At the same time, the fact that it could have been prevented with just one line in .npmignore or a single artifact verification step in the CI pipeline reaffirms the importance of fundamentals.\nThe emergence of OpenClaude shows that the fallout from this incident extends well beyond information disclosure. A full-stack coding agent for other models rebuilt from leaked harness code in a matter of days is, paradoxically, a testament to the quality of Claude Code\u0026rsquo;s design. The fact that Anthropic, a company that bills itself as \u0026ldquo;the safety company,\u0026rdquo; caused repeated incidents in the most basic parts of its software supply chain is a technical irony that could escalate into an enterprise trust issue. 
The lesson for developers from this incident is that no matter how sophisticated a security system you build (Undercover Mode), a single configuration line in the build pipeline can render it all useless. In the end, software security is determined not by the most glamorous features but by the most mundane checklists.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-claude-code-leak/cover-en.jpg","permalink":"/posts/2026-04-02-claude-code-leak/","title":"Claude Code Source Code Leak — Agent Architecture Exposed Through an NPM Source Map Mistake"},{"content":"Overview Previous posts covered the basic concepts of harnesses (the three elements of guardrails/monitoring/feedback loops), checkpointing and state management for long-running agents, and plugin ecosystems. This post covers two perspectives not previously addressed. First, the prompt -\u0026gt; context -\u0026gt; harness -\u0026gt; agentic 4-axis framework from SilbeDeveloper\u0026rsquo;s YouTube video and the core philosophy that \u0026ldquo;prompts are requests, harnesses are physical barriers.\u0026rdquo; Second, the planner-generator-evaluator trio architecture and sprint contract pattern from a TILNOTE article analyzing Anthropic\u0026rsquo;s harness design documentation. Related posts: Long-Running Agents and Harness Engineering, HarnessKit Dev Log #3\ngraph TD A[\"4-Axis Framework for Using AI\"] --\u003e B[\"1. Prompt engineering\u0026lt;br/\u0026gt;The skill of talking to AI well\"] A --\u003e C[\"2. Context engineering\u0026lt;br/\u0026gt;The skill of providing the needed information\"] A --\u003e D[\"3. Harness engineering\u0026lt;br/\u0026gt;The skill of building rules and fences\"] A --\u003e E[\"4. 
Agentic engineering\u0026lt;br/\u0026gt;The skill of designing autonomous workflows\"] B -.-\u003e|\"Hits a ceiling\"| C C -.-\u003e|\"Information alone is not enough\"| D D -.-\u003e|\"Complementary\"| E style D fill:#ff6b6b,stroke:#c92a2a,color:#fff The 4-Axis Framework — From Prompt to Agentic In the video Prompt Engineering Is Over: The Era of \u0026lsquo;Harness\u0026rsquo; Has Arrived, SilbeDeveloper organizes AI utilization methodologies into four axes. 
These axes are not stages to graduate from one after another; all four are needed at the same time and complement each other.\nThe Ceiling of Prompts Prompt engineering is the skill of \u0026ldquo;talking to AI effectively.\u0026rdquo; Specifying \u0026ldquo;an engineering calculator with sin/cos support and a GUI\u0026rdquo; instead of just \u0026ldquo;make me a calculator\u0026rdquo; yields different results. But there\u0026rsquo;s a ceiling. No matter how sophisticated the prompt, you can\u0026rsquo;t get good code without knowledge of the project\u0026rsquo;s tech stack, code structure, and DB schema.\nWhy Context Alone Isn\u0026rsquo;t Enough Context engineering provides project structure, existing code, API documentation, and design guidelines together. Anthropic\u0026rsquo;s definition: \u0026ldquo;The skill of appropriately selecting and providing the information AI needs to do its work.\u0026rdquo; The key is not providing a lot, but providing exactly what\u0026rsquo;s needed right now.\nBut there are problems that context engineering can\u0026rsquo;t solve no matter how well designed. These are cases where the AI has all the information it needs but still does something unexpected. You assign it to a payment system and it changes the DB schema on its own, or prints credit card numbers to the log. This isn\u0026rsquo;t an information problem — it\u0026rsquo;s a problem of rules and boundaries.\nHarness vs Agentic — Reins vs Horse Training Previous posts covered the basic concepts of harnesses but didn\u0026rsquo;t clearly articulate the relationship with agentic engineering. 
The video\u0026rsquo;s summary is clean:\nPerspective Agentic Engineering Harness Engineering Analogy The skill of training the horse The skill of making the reins Focus How the AI thinks What the AI can and cannot do Failure Response Prompt changes, reasoning loop adjustments Automatically adding rules/tests Human Role Delegator, supervisor Designer, boundary setter The key in one line: No matter how well-trained the horse, it cannot plow a field without reins.\nStructural Non-Repeatability — The Core Philosophy of Harnesses Previous posts covered guardrails and feedback loops, but the most important statement from the video deserves separate discussion:\nWhen an agent violates a rule, you don\u0026rsquo;t fix the prompt by saying \u0026ldquo;try harder.\u0026rdquo; You fix the harness so that failure becomes structurally impossible to repeat.\nRequest vs Physical Barrier Suppose an AI agent directly called the DB from frontend code.\nPrompt approach: Add \u0026ldquo;Don\u0026rsquo;t call the DB directly\u0026rdquo; to the prompt -\u0026gt; It makes the same mistake next time. Because a prompt is a request, not enforcement. Harness approach: Add an architecture test so that the moment the frontend folder imports DB, the build fails. It becomes structurally impossible. This distinction matters because previous posts addressed \u0026ldquo;guardrails\u0026rdquo; at a conceptual level. The framing of \u0026ldquo;prompts are requests, tooling boundaries are physical barriers\u0026rdquo; provides a criterion for judging what level of constraint to apply in practice.\nThe 4 Pillars of Harness — Beyond the Original 3 Elements Previous posts covered the guardrails/monitoring/feedback loop triad. 
The video introduces Martin Fowler\u0026rsquo;s 4-pillar structure, which overlaps with the original three elements but includes two notable additions.\nNew Pillar 1: Tool Boundaries Physically limiting what tools an AI agent can use and what it can access:\nFile system: src/ folder is read/write, config/ folder is read-only API: Internal API calls allowed, external service calls blocked Database: SELECT allowed, DROP TABLE absolutely forbidden Terminal: Only whitelisted commands can be executed While the previous posts\u0026rsquo; \u0026ldquo;guardrails\u0026rdquo; defined \u0026ldquo;what shouldn\u0026rsquo;t be done,\u0026rdquo; tool boundaries are a physical layer that systemically blocks access itself.\nNew Pillar 2: Garbage Collection (Automated Code Quality Cleanup) Named by Martin Fowler, this concept wasn\u0026rsquo;t covered in previous posts. AI references existing code to write new code, and if the existing code has bad patterns, it copies them. This is an automated cleanup system to prevent bad patterns from snowballing:\nAutomatic detection of coding rule violations Automatic discovery of duplicate code and auto-generation of refactoring PRs Automatic removal of dead code Periodic checking of architectural anti-patterns The key: Every time an agent makes a mistake, that mistake becomes a new rule. Adding linter rules, adding tests, adding constraints — the harness grows increasingly sophisticated through this evolutionary characteristic.\nPlanner-Generator-Evaluator Architecture From here, the content comes from the article Anthropic\u0026rsquo;s Harness Design: Planner-Generator-Evaluator Architecture. 
This is an entirely new architecture pattern not covered in previous posts.\ngraph LR subgraph Orchestration P[\"Planner\u0026lt;br/\u0026gt;Spec expansion + design\"] end subgraph Execution G[\"Generator\u0026lt;br/\u0026gt;Code writing\"] end subgraph Verification E[\"Evaluator\u0026lt;br/\u0026gt;QA + scoring\"] end P --\u003e|\"Product spec\"| G G --\u003e|\"Implementation\"| E E --\u003e|\"Feedback + score\"| G E --\u003e|\"Pass\"| R[\"Done\"] E --\u003e|\"Fail\"| G T[\"Playwright\u0026lt;br/\u0026gt;Browser automation\"] --\u003e E style P fill:#4dabf7,stroke:#1c7ed6,color:#fff style G fill:#69db7c,stroke:#2f9e44,color:#fff style E fill:#ff6b6b,stroke:#c92a2a,color:#fff\nWhy a Single Agent Breaks Down There are two causes of collapse in long-duration tasks:\nContext instability: As the context window fills up, earlier decisions become entangled, and when the model \u0026ldquo;senses\u0026rdquo; it\u0026rsquo;s approaching its limits, it tends to rush to finish Lenient self-evaluation: When you ask an agent to evaluate its own output, it tends to conclude \u0026ldquo;it\u0026rsquo;s fine\u0026rdquo; even when the actual quality has defects Checkpointing/state management covered in previous posts addressed the first problem. The solution to the second problem is role separation — the generator-evaluator loop borrowed from GANs.\nFrom GAN Intuition to Engineering Just as a generator and discriminator compete in a GAN (Generative Adversarial Network) to improve quality:\nGenerator: Creates the output Evaluator: Scores and critiques according to criteria Generator: Takes the feedback and creates the next version What repeats is not \u0026ldquo;vague improvement\u0026rdquo; but \u0026ldquo;improvement that satisfies specific criteria.\u0026rdquo; The more independent the evaluator, the less \u0026ldquo;leniency\u0026rdquo; there is. 
However, since the evaluator is also an LLM, its default tendency is lenient — scoring habits must be calibrated with few-shot examples and score decomposition.\nThe Role of the Planner In the trio, the planner expands 1-4 sentence requests into a \u0026ldquo;sufficiently large\u0026rdquo; product spec. Core principles:\nDon\u0026rsquo;t include premature implementation details — wrong decisions propagate downstream Write around product context and high-level design, leaving room for implementation Actively look for opportunities to integrate AI features into the product Sprint Contracts — Contractualizing the Definition of Done Previous posts covered checkpoints but didn\u0026rsquo;t address how to define \u0026ldquo;what counts as done.\u0026rdquo; In Anthropic\u0026rsquo;s harness, the device that fills this gap is the sprint contract.\nThe Contract Process Before each sprint begins, the generator and evaluator negotiate:\nGenerator proposes: Presents an implementation plan and verification methods Evaluator reviews: Checks alignment with the spec and testability Execute after agreement: Code writing only begins after consensus The key pattern is fixing inter-agent communication as file-based artifacts. One side writes files, the other reads, modifies, and adds. Even when context wobbles, the work state remains explicit, which is advantageous for long-running tasks.\nCost vs Quality Approach Time Result Single agent 20 min Looks plausible on the surface but core features are broken Planner-generator-evaluator harness 6 hours More features, actually working quality The decisive factors that made the difference: the evaluator\u0026rsquo;s real interaction-based QA and contract-based definition of done.\nThe Evaluator Operates, Not Just Screenshots If the evaluator judges from a single still image, it misses quality issues that emerge in interactions, layout, and state transitions. 
Anthropic\u0026rsquo;s solution:\nGive the evaluator browser automation tools like Playwright The evaluator clicks, navigates, and observes screens on its own It writes scores and detailed critiques per criterion Even subjective design quality is made scorable. Four axes:\nOverall design polish — consistent mood/identity Originality — escaping the template/default component feel Craftsmanship — fundamentals like typography, spacing, contrast Functionality — usability Since models tend to achieve functionality and fundamentals comfortably, greater weight should be placed on polish and originality to push beyond the comfort zone.\nWhen Models Improve, Lighten the Harness An important insight not covered in previous posts: each component of the harness is an assumption about \u0026ldquo;what the model can\u0026rsquo;t do alone.\u0026rdquo; As models advance, those assumptions shift.\nSprint Removal Example With stronger models:\nConsistent builds lasting over 2 hours became possible without sprint decomposition The sprint structure was removed, and evaluation was reduced to \u0026ldquo;once at the end\u0026rdquo; This prevented unnecessary mechanisms from merely increasing costs However, evaluators don\u0026rsquo;t become entirely unnecessary. 
When the task falls outside the model\u0026rsquo;s reliability boundary — for example, when core interactions keep getting left as stubs — the evaluator remains valuable insurance.\nPractical principle: Stress-test the harness with each new model release and redesign by removing parts that have become dead weight.\nQuick Links Prompt Engineering Is Over: The Era of \u0026lsquo;Harness\u0026rsquo; Has Arrived (YouTube) — SilbeDeveloper, 4-axis framework and harness 4-pillar structure Anthropic\u0026rsquo;s Harness Design: Planner-Generator-Evaluator Architecture (TILNOTE) — Analysis of Anthropic\u0026rsquo;s harness design documentation Harness design for long-running application development (Anthropic) — Original reference Long-Running Agents and Harness Engineering — Previous post: checkpoints, state management, 3 elements HarnessKit Dev Log #3 — Previous post: plugin triggers, marketplace Insights While previous posts focused on the \u0026ldquo;what\u0026rdquo; of harnesses (guardrails, monitoring, feedback loops), these two sources complement the \u0026ldquo;why\u0026rdquo; and \u0026ldquo;how.\u0026rdquo;\nOn the \u0026ldquo;why\u0026rdquo; side, the 4-axis framework clarifies how harnesses relate to prompts and context. The distinction that prompts are requests while harnesses are physical barriers provides a practical criterion for deciding \u0026ldquo;should this rule go in CLAUDE.md or be enforced as a linter rule?\u0026rdquo;\nOn the \u0026ldquo;how\u0026rdquo; side, the planner-generator-evaluator architecture presents concrete implementation patterns for harnesses. In particular, the patterns of contractualizing the definition of done through sprint contracts and performing real interaction-based QA by equipping the evaluator with Playwright are immediately applicable. 
And the insight \u0026ldquo;when models improve, lighten the harness\u0026rdquo; reframes harnesses not as permanent, immutable infrastructure but as a collection of assumptions about model capabilities. In HarnessKit development as well, a process for re-evaluating the necessity of each skill with every new model release would be needed.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-harness-beyond-prompt-engineering/cover-en.jpg","permalink":"/posts/2026-04-02-harness-beyond-prompt-engineering/","title":"Don't Fix the Prompt, Fix the Harness — The 4-Axis Framework and Generator-Evaluator Architecture"},{"content":"Overview Previous Post: #3 — Plugin Trigger Fixes and Marketplace Recommendation System\nIn this #4 installment, the marketplace installation infrastructure was stabilized across 17 commits and v0.3.0 was released. The introduction of marketplace.json secured the claude plugin add installation path, READMEs were split into English and Korean, and a comprehensive plugin trigger review was completed — unifying CLAUDE_PLUGIN_ROOT, adding preset checks, and implementing installation verification. Marketplace recommendations were also redesigned with a pre-verification approach.\nmarketplace.json — The Starting Point for Plugin Installation Problem Installing a plugin from the Claude Code marketplace requires .claude-plugin/marketplace.json. Without this file, the claude plugin add command cannot be used, forcing users to manually clone the repository.\nSolution marketplace.json was added and the source path was changed to a ./ relative path to enable marketplace installation. 
This was the starting point for v0.3.0.\ngraph LR A[\"User\"] --\u003e|\"claude plugin add\"| B[\"Marketplace\"] B --\u003e|\"marketplace.json reference\"| C[\"plugin.json\"] C --\u003e D[\"skills / hooks installation\"] D --\u003e E[\"HarnessKit activated\"] README Split — Separating English and Korean Once listed on the marketplace, English-speaking users will also read the README. Mixing two languages in a single README is inconvenient for both audiences. README.md was rewritten as English-only, and README.ko.md was added as a separate Korean version.\nComprehensive Plugin Trigger Review and Fixes Spec-Based Approach Rather than simply fixing bugs, a spec document was written first to classify 5 triggering issues. They were prioritized as CRITICAL, MAJOR, and MINOR, and after a spec review, the fix plan was finalized before implementation began.\ngraph TD A[\"Write Spec \u0026lt;br/\u0026gt; Identify 5 Issues\"] --\u003e B[\"Spec Review \u0026lt;br/\u0026gt; Fix CRITICAL/MAJOR\"] B --\u003e C[\"Create Implementation Plan\"] C --\u003e D[\"Unify \u0026lt;br/\u0026gt; CLAUDE_PLUGIN_ROOT\"] C --\u003e E[\"Add \u0026lt;br/\u0026gt; Preset Check\"] C --\u003e F[\"Add Installation \u0026lt;br/\u0026gt; Verification to Status Skill\"] D --\u003e G[\"Migrate All \u0026lt;br/\u0026gt; hooks / skills\"] E --\u003e G F --\u003e G G --\u003e H[\"v0.3.0 Release\"] CLAUDE_PLUGIN_ROOT Unification A mix of claude plugin path, hardcoded absolute paths, and relative paths was unified under the single CLAUDE_PLUGIN_ROOT environment variable. 
All hooks including guardrails.sh and pre-commit-test.sh, as well as init and setup skills, were migrated to the same pattern.\n# Unified pattern: environment variable + dirname fallback PLUGIN_DIR=\u0026#34;${CLAUDE_PLUGIN_ROOT:-$(cd \u0026#34;$(dirname \u0026#34;$0\u0026#34;)/..\u0026#34; \u0026amp;\u0026amp; pwd)}\u0026#34; Preset Check Added post-edit-lint.sh and post-edit-typecheck.sh were running before the preset was configured, causing errors. A check for the preset file\u0026rsquo;s existence was added to exit early if it is missing.\nInstallation Verification Feature A feature to verify plugin installation status was added to the /harnesskit:status skill. It provides an at-a-glance view of skill file existence, hooks execution permissions, and configuration file integrity.\nMarketplace Verified Recommendation System Real-time marketplace search-based recommendations were replaced with a pre-verified marketplace-recommendations.json.\nThe update-recommendations.sh script crawls the marketplace to refresh the list /harnesskit:init recommends plugins from this list that match the project /harnesskit:insights also references the same list to ensure consistent recommendations 3-Step Sliding Window Tool Sequence The tool usage pattern analysis in session-end.sh was upgraded. Instead of simple counts, tool sequences are tracked using a 3-step sliding window and recorded in tool:summary format. Detecting repeated patterns improves the precision of automation suggestions.\nv0.3.0 Release After all fixes were applied, the version in plugin.json was bumped to 0.3.0. 
Since the marketplace plugin cache detects version changes and refreshes, the changes are propagated to installed users as well.\nCommit Log Message Changes feat: add marketplace.json for plugin installation marketplace fix: use ./ relative path in marketplace.json source marketplace docs: split README into English and Korean versions docs docs: add Korean README docs docs: add spec for plugin trigger review — 5 fixes docs docs: address spec review — fix CRITICAL and MAJOR issues docs docs: add implementation plan for plugin trigger fixes docs fix: add preset check to post-edit hooks + CLAUDE_PLUGIN_ROOT fallback hooks refactor: unify PLUGIN_DIR to CLAUDE_PLUGIN_ROOT with fallback hooks refactor: migrate skills from \u0026lsquo;claude plugin path\u0026rsquo; to CLAUDE_PLUGIN_ROOT skills feat: add verified marketplace-recommendations.json templates feat: add update-recommendations.sh for marketplace crawling scripts feat: rewrite init marketplace discovery with verified recs skills feat: add recommendations.json reference to insights skills feat: upgrade tool sequence to 3-step sliding window hooks feat: add plugin installation verification to status skills chore: bump version to 0.3.0 for plugin cache refresh plugin Insights Listing on a marketplace means transforming \u0026ldquo;a tool that works in my environment\u0026rdquo; into \u0026ldquo;a product that works in anyone\u0026rsquo;s environment.\u0026rdquo; Adding a single marketplace.json is simple, but it cascades into path reference unification, environment variable fallbacks, handling unconfigured presets, and installation status verification. Writing and reviewing the spec document before implementation was effective — identifying all 5 issues at once and prioritizing them enabled a systematic migration instead of scattered fixes. 
The principle of \u0026ldquo;fix the docs before fixing the code\u0026rdquo; proved valid once again.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-harnesskit-dev4/cover-en.jpg","permalink":"/posts/2026-04-02-harnesskit-dev4/","title":"HarnessKit Dev Log #4 — Marketplace Stabilization and v0.3.0 Release"},{"content":"Overview Previous Post: #6 — S3 Image Storage Migration and Branding\nIn this #7 installment, three key tasks were carried out across 7 commits. First, the existing search-score-based tone/angle image injection logic was completely replaced with a Gemini Flash LLM category classification approach. Second, the broken download button after S3 migration was fixed via a backend proxy. Third, a feature was added allowing users to adjust tone/angle ratios and regenerate images. Additionally, large image data was removed from the repo and package management was migrated to pyproject.toml.\ngraph TD A[\"User Prompt\"] --\u003e B[\"Gemini Flash\u0026lt;br/\u0026gt;Category Classification\"] B --\u003e C{\"Select 1 of 4 Categories\"} C --\u003e D[\"a: Natural/Film\"] C --\u003e E[\"b: Vivid/Colorful\"] C --\u003e F[\"c: Cinematic/Contrast\"] C --\u003e G[\"d: Beauty\"] D --\u003e H[\"Random 2 Images\u0026lt;br/\u0026gt;Tone + Angle\"] E --\u003e H F --\u003e H G --\u003e H B --\u003e I[\"Tone/Angle Ratio\u0026lt;br/\u0026gt;25 / 50 / 75 / 100%\"] H --\u003e J[\"Apply Ratio to Prompt\u0026lt;br/\u0026gt;Reference tone at N%\"] I --\u003e J J --\u003e K[\"Gemini Image Generation\"] K --\u003e L{\"User Ratio Adjustment?\"} L --\u003e|\"Yes\"| M[\"injection_override\u0026lt;br/\u0026gt;Same images, different ratio\"] M --\u003e J L --\u003e|\"No\"| N[\"Done\"] Full Replacement of Tone/Angle Injection with LLM Category Classification Background The previous tone/angle auto-injection system used the hybrid search pipeline to find candidate images and scored them using tone_score/angle_score from images.json, selecting from the top 20%. 
This approach had two problems:\nTone/angle images were mixed in with the regular search image pool, leading to potentially inappropriate selections Selection based solely on search scores regardless of the prompt\u0026rsquo;s mood resulted in inconsistency The new approach categorizes 299 dedicated tone/angle reference images into 4 categories managed separately, and lets the LLM analyze the prompt to determine both the category and the application ratio.\nCategory Description Image Count a(natural,film) Natural, film-like feel with warm tones 129 b(vivid,colorful) Vibrant and colorful, high saturation 39 c(cinematic,contrast) Cinematic mood, strong contrast 80 d(beauty) Beauty/portrait style, soft lighting 51 Implementation Full rewrite of injection.py:\nAll existing search+score logic (_search_candidates_for_injection, _select_best_category_ref) was removed and replaced with a lightweight classification call using Gemini Flash. The LLM analyzes the prompt and returns a category and tone/angle ratio (25/50/75/100%) as JSON.\nCLASSIFICATION_PROMPT = \u0026#34;\u0026#34;\u0026#34;\\ You are an expert who analyzes image generation prompts to select the most suitable tone/angle category and determine the application ratio for tone and angle. ## Ratio Guide - 25%: The prompt already specifies a very specific style/tone → reference minimally - 50%: Some style direction exists but needs reinforcement - 75%: Topic-focused with weak style specification, needs heavy reference - 100%: No style-related mentions at all, rely entirely on reference ## Response Format (output JSON only) {{\u0026#34;category\u0026#34;: \u0026#34;...\u0026#34;, \u0026#34;tone_ratio\u0026#34;: N, \u0026#34;angle_ratio\u0026#34;: N}} \u0026#34;\u0026#34;\u0026#34; Based on the classification result, tone and angle images are randomly selected from the corresponding category folder. 
Both are chosen from the same category but use different images.\nSchema changes:\nscore: float was removed from InjectedReference and replaced with category: str + ratio: int. A category field was also added to InjectionInfo so the frontend can display which category was selected.\nclass InjectedReference(BaseModel): filename: str category: str = \u0026#34;\u0026#34; ratio: int = 100 Prompt construction changes:\nbuild_generation_prompt() was updated to directly incorporate ratio information into the prompt:\nTone/color reference. Only reference the color, tone, and mood of this image. Reference the tone at only {N}%. Do NOT incorporate any non-color elements such as composition, subject, shape, or background from this image. S3 integration:\nThe 299 tone/angle reference images were uploaded to S3, and 4 category subdirectories were registered in ref_dirs to be served the same way as existing image_ref_1~4. The S3 key structure is refs/tone_angle_image_ref/{category}/{filename}, mirroring the local directory structure.\nTroubleshooting During the initial S3 upload, the key structure was flat as refs/a(natural,film)/..., mixing with existing image_ref_1~4 images at the same level. Based on user feedback, a parent folder was added to create refs/tone_angle_image_ref/a(natural,film)/... to match the repo structure, and build_ref_key_cache was updated to use Path.relative_to(\u0026quot;data\u0026quot;) for correct caching of nested directories.\n# Before: only used p.name → \u0026#34;a(natural,film)\u0026#34; # After: relative path from data/ → \u0026#34;tone_angle_image_ref/a(natural,film)\u0026#34; try: ref_subdir = str(p.relative_to(\u0026#34;data\u0026#34;)) except ValueError: ref_subdir = p.name S3 Image Download Button Fix Background After migrating to S3 in #6, the download button was found to be non-functional. 
Clicking the button would open the image in a new tab or just zoom in on screen, but would not save it as a file.\nRoot Cause Analysis The HTML \u0026lt;a download\u0026gt; attribute only works with same-origin URLs. Before the S3 migration, images were served from /images/filename on the same domain, so there was no issue. After migration, URLs changed to https://\u0026lt;bucket\u0026gt;.s3.\u0026lt;region\u0026gt;.amazonaws.com/... cross-origin format, causing browsers to ignore the download attribute.\nAdditionally, newly generated images used data URIs (data:image/png;base64,...) where fetch() worked fine, but history images used presigned S3 URLs which were blocked by CORS policy, preventing fetch() as well.\nImplementation The fix was done in two stages:\nStage 1 \u0026ndash; Frontend downloadImage helper:\nReplaced \u0026lt;a href download\u0026gt; tags with \u0026lt;button\u0026gt; elements, using JavaScript to fetch a blob and trigger a programmatic download.\nexport const downloadImage = async (filename: string): Promise\u0026lt;void\u0026gt; =\u0026gt; { const downloadUrl = `/images/${encodeURIComponent(filename)}/download`; const response = await fetch(downloadUrl, { credentials: \u0026#39;include\u0026#39; }); if (!response.ok) throw new Error(`Download failed: ${response.status}`); const blob = await response.blob(); const blobUrl = URL.createObjectURL(blob); const a = document.createElement(\u0026#39;a\u0026#39;); a.href = blobUrl; a.download = filename; document.body.appendChild(a); a.click(); document.body.removeChild(a); URL.revokeObjectURL(blobUrl); }; Stage 2 \u0026ndash; Backend download proxy endpoint:\nA GET /images/{filename}/download endpoint was added to stream image bytes directly from S3 and return them with a Content-Disposition: attachment header. 
The existing /images/{filename} used a 302 redirect approach which couldn\u0026rsquo;t resolve the CORS issue, so a separate proxy was necessary.\nOwnership verification (check_file_ownership) and Content-Disposition header injection defense (quote removal) were also included.\nUser Ratio Adjustment Regeneration Background The tone/angle ratio determined by the LLM may not match the user\u0026rsquo;s intent. For example, the LLM might decide on 75% tone, but the user wants to lower it to 25%. While the first generation uses the AI\u0026rsquo;s judgment, users should be able to click on a generated image, go to the detail view, change the ratio, and regenerate.\nImplementation InjectionOverride schema added:\nAn InjectionOverride model was added to the backend, along with an optional injection_override field in GenerateImageRequest. When this field is present, LLM classification is skipped and generation proceeds directly with the user-specified ratio and the same image files.\nclass InjectionOverride(BaseModel): tone_filename: str angle_filename: str category: str tone_ratio: int = Field(ge=25, le=100) angle_ratio: int = Field(ge=25, le=100) Frontend ratio adjustment UI:\nAn interaction was added to the GeneratedImageDetail component where clicking the tone/angle ratio badges cycles through 25 -\u0026gt; 50 -\u0026gt; 75 -\u0026gt; 100 -\u0026gt; 25. 
When the ratio differs from the original, a \u0026ldquo;Regenerate with changed ratio\u0026rdquo; button appears, which sends a generation request including injection_override.\nconst RATIO_STEPS = [25, 50, 75, 100] as const; const nextRatio = (current: number) =\u0026gt; { const idx = RATIO_STEPS.indexOf(current as typeof RATIO_STEPS[number]); return RATIO_STEPS[(idx + 1) % RATIO_STEPS.length]; }; Repo Cleanup and Package Management Migration With all images now on S3, the large image reference data (split zip files) remaining in the repo was removed, and reference image directories and zip files were added to .gitignore. Additionally, dependency management was migrated from requirements.txt to pyproject.toml, adopting the standard Python package management approach.\nCommit Log Order Type Message Files Changed 1 chore ignore ref image dirs and zip files from repo 1 2 chore migrate to pyproject.toml for package management 3 3 feat replace score-based injection with LLM category classification 11 4 fix use tone_angle_image_ref parent folder in S3 key structure 2 5 remove get rid of all the image reference data from the repo 20 6 fix download button now works for S3-hosted images 4 7 feat allow user to adjust tone/angle ratios and regenerate 5 Insights Using an LLM as a classifier is far more flexible than keyword mapping. Initially, I tried to map categories using keywords, but prompts often use indirect expressions like \u0026ldquo;emotional cafe interior,\u0026rdquo; making it clear that keyword-based mapping would fail for most prompts. Using Gemini Flash as a lightweight classifier can determine both category and ratio in a single call, and fixing the response format to JSON makes parsing straightforward.\nThe hidden cost of S3 migration is CORS. Switching from local file serving to S3 is relatively simple, but features that implicitly assumed same-origin break one by one. 
The fact that the \u0026lt;a download\u0026gt; attribute is ignored for cross-origin URLs is specified in the HTML spec, but it\u0026rsquo;s easy to overlook until you actually encounter it. A backend proxy endpoint can completely bypass CORS, but since traffic routes through the server, a separate CDN setup may be needed for large volumes of files.\nUser overrides should be included in the design from the start. If you consider an interface where users can adjust AI-determined values from the beginning, you won\u0026rsquo;t need major schema changes when adding features later. In this case, the injection_override field was added all at once, but if ratio parameters had been separated in the initial design, the extension would have been more natural.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-hybrid-search-dev7/cover-en.jpg","permalink":"/posts/2026-04-02-hybrid-search-dev7/","title":"Hybrid Image Search Dev Log #7 — LLM Category Classification and S3 Migration"},{"content":"Overview Previous Post: #4 — Preparing for Official Marketplace Registration\nIn this #5 installment, two major features were added. First, Deep Docs crawling using the Firecrawl API — going beyond existing Playwright-based single-page scraping to structurally collect entire documentation sites. Second, bilingual (Korean/English) blog support — when a post is written, a translation is automatically generated and deployed according to Hugo\u0026rsquo;s multilingual structure. 
Across 15 commits, work progressed from design document creation through implementation to SDK type fixes.\ngraph TD A[\"log-blog v0.5\"] --\u003e B[\"Firecrawl Deep Docs\"] A --\u003e C[\"Bilingual Blog\"] B --\u003e B1[\"FirecrawlConfig\"] B --\u003e B2[\"firecrawl_fetcher module\"] B --\u003e B3[\"content_fetcher routing\"] B --\u003e B4[\"CLI --deep flag\"] C --\u003e C1[\"Design doc\"] C --\u003e C2[\"post skill translation step\"] C --\u003e C3[\"Default language English-first\"] Firecrawl Deep Docs Integration Background The existing log-blog content collection was Playwright-based. It rendered single pages in a headless browser and extracted text, but this had limitations with documentation sites (Honeycomb Docs, MDN, etc.). It would only fetch the overview of a single page, missing the detailed content of related subpages.\nFirecrawl solves this problem. Given a URL, it crawls the site\u0026rsquo;s subpages and returns structured markdown. It also supports JavaScript rendering, so SPA-based documentation sites can be processed as well.\nImplementation Step 1: Design document — The scope and interfaces for Firecrawl integration were defined first. The structure adds a Firecrawl route to the existing URL type routing in content_fetcher.py.\nStep 2: Config system extension — A FirecrawlConfig dataclass was added to config.py.\n@dataclass class FirecrawlConfig: api_key: str = \u0026#34;\u0026#34; max_pages: int = 10 timeout: int = 30 A firecrawl section was also added to config.example.yaml to document the API key configuration method.\nStep 3: firecrawl_fetcher module — A dedicated fetcher using the firecrawl-py SDK was implemented. The key point is that routing to Firecrawl only happens when the URL type is DOCS_PAGE and the --deep flag is active.\nStep 4: content_fetcher routing — A Firecrawl route was added to the URL type branching in content_fetcher.py. 
It follows the same pattern as existing YouTube, GitHub, and Playwright branches: DOCS_PAGE -\u0026gt; firecrawl_fetcher.\nStep 5: CLI --deep flag — A --deep option was added to the fetch command so users can explicitly activate Deep Docs mode.\nTroubleshooting In the initial implementation, the Firecrawl SDK return type was accessed as a dict, but it actually returned typed objects. result['content'] needed to be result.content instead. This type mismatch was fixed in the final commit.\nBilingual Blog Pipeline Background As the blog grew, an English readership became necessary. Hugo supports multilingual content with content/ko/posts/ and content/en/posts/ structures, but manually translating every post is impractical.\nImplementation Design document — Hugo\u0026rsquo;s multilingual structure, translation workflow, and default language switching strategy were documented.\nPost skill translation step — A translation stage was added to the post generation skill. Posts written in Korean are automatically translated to English (or vice versa) and deployed to both language directories.\nDefault language English-first — Since browsing history is predominantly in English, the default writing language was switched to English. 
Automatically generating Korean translations improves overall pipeline efficiency.\nSkill updates — The deep docs workflow for Steps 3-5 was reflected in the skill, and a Firecrawl API key prompt was added to the setup skill.\nCommit Log Message Changes docs: add design spec for Firecrawl deep docs integration +85 -0 docs: add implementation plan for Firecrawl deep docs integration +120 -0 feat: add firecrawl-py dependency for deep docs fetching +2 -1 docs: bilingual blog design spec +95 -0 feat: add FirecrawlConfig to config system +15 -2 feat: add firecrawl_fetcher module for deep docs crawling +78 -0 feat: route deep DOCS_PAGE URLs to Firecrawl in content_fetcher +25 -3 feat: add --deep flag to fetch command for Firecrawl deep docs +12 -1 docs: add firecrawl config section to example config +8 -0 feat: add Firecrawl API key prompt to setup skill +5 -0 feat: update skill for deep docs workflow in Steps 3-5 +45 -12 docs: bilingual blog implementation plan +110 -0 feat: add bilingual translation step to post skill +35 -8 feat: flip default language to English-first in post skill +6 -6 fix: use Firecrawl SDK typed objects instead of dict access +8 -8 Insights The biggest lesson from this development was the value of writing design documents first. For both Firecrawl integration and bilingual support, design documents (design spec + implementation plan) were written before implementation began. This made it possible to clearly identify integration points with existing code and add features cleanly without unnecessary refactoring. Even when unexpected issues like the Firecrawl SDK typed object problem arose during implementation, the scope of fixes remained localized because the overall architecture was already established. 
Having 5 out of 15 commits be documentation may seem inefficient, but in practice it was an investment that increased the accuracy of implementation commits.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-log-blog-dev5/cover-en.jpg","permalink":"/posts/2026-04-02-log-blog-dev5/","title":"Log-Blog Dev Log #5 — Firecrawl Deep Docs Integration and Bilingual Blog"},{"content":"Overview OpenClaw has surpassed 300K GitHub stars, overtaking both React and the Linux kernel. Creator Peter Steinberger was hired by OpenAI, and Anthropic is following suit with similar features (Channels, Dispatch). This post analyzes what OpenClaw is, why 80% of apps could disappear, and the future of the AI agent ecosystem, drawing from NetworkChuck\u0026rsquo;s hands-on video and a Y Combinator founder interview.\nWhat Is OpenClaw OpenClaw is not an AI model itself. As NetworkChuck clearly explained, \u0026ldquo;OpenClaw is not itself an AI. It\u0026rsquo;s a harness. It\u0026rsquo;s a layer sitting on top of other AI.\u0026rdquo; In other words, OpenClaw is a gateway that sits on top of various AI models.\nThis gateway runs as a Node.js app 24/7, connecting three core pillars:\ngraph TD GW[\"OpenClaw Gateway\u0026lt;br/\u0026gt;(Node.js service, runs 24/7)\"] subgraph Models[\"1. AI Models (swappable)\"] M1[\"OpenAI GPT-5.4\"] M2[\"Anthropic Claude\"] M3[\"Ollama (local models)\"] end subgraph Channels[\"2. Channels (user touchpoints)\"] C1[\"Telegram\"] C2[\"Discord\"] C3[\"Slack\"] C4[\"WhatsApp\"] C5[\"Web UI / TUI\"] end subgraph Memory[\"3. 
Memory (local markdown)\"] S1[\"soul.md\u0026lt;br/\u0026gt;(agent identity)\"] S2[\"identity.md\"] S3[\"memory.md\u0026lt;br/\u0026gt;(long-term memory)\"] S4[\"memory/daily journals\"] end GW --\u003e Models GW --\u003e Channels GW --\u003e Memory\nIn the Y Combinator interview, Peter Steinberger described OpenClaw\u0026rsquo;s key differentiator: \u0026ldquo;The biggest difference about what I built is that it actually runs on your computer. Everything I\u0026rsquo;ve seen so far runs in the cloud. When it runs on your computer, it can do everything.\u0026rdquo;\nHe says it can control ovens, Teslas, lights, Sonos speakers, and even bed temperature. ChatGPT can\u0026rsquo;t do any of that.\nInstall in 5 Minutes In the video, NetworkChuck actually started a timer and demonstrated installing OpenClaw in under 5 minutes. The key steps are surprisingly simple:\nPrepare a VPS or local server - works anywhere Run the one-line install command (copy from openclaw.ai) Choose an AI model - OpenAI (API key or ChatGPT Pro subscription), Anthropic, or Ollama Connect a channel - create a bot token via Telegram\u0026rsquo;s BotFather and connect it Enable Hooks - boot, bootstrap, command logger, session memory After installation, you chat with the agent in a TUI (Terminal User Interface) to set its name, personality, and role. This conversation is immediately written to the soul.md file. NetworkChuck put it this way: \u0026ldquo;When you configure OpenClaw, you configure it by talking to OpenClaw itself. 
It\u0026rsquo;s kind of like a Pokemon game vibe.\u0026rdquo;\nLive Demo: Days of N8N Work in One Sentence What NetworkChuck emphasized most was the comparison with existing automation tools:\nTask N8N OpenClaw News aggregator Multiple nodes + hours of setup + Python coding One sentence, one shot IT server monitoring dashboard Tutorial-length separate video Natural language instruction -\u0026gt; live dashboard auto-generated Tell the agent \u0026ldquo;Aggregate cybersecurity news and evaluate whether it\u0026rsquo;s worth reading,\u0026rdquo; and it scrapes Reddit, Hacker News, and YouTube, then evaluates everything. Assign it an IT engineer role, and it inspects the server\u0026rsquo;s CPU, RAM, internet speed, and security logs, then creates a real-time dashboard.\nThe Creator\u0026rsquo;s Aha Moment Peter Steinberger\u0026rsquo;s Aha Moment came during a trip to Marrakech. He sent a voice message to his agent via WhatsApp, even though he had never built that feature. Ten seconds later, he got a reply.\nThe agent\u0026rsquo;s explanation was impressive: it received a message without a file extension, analyzed the header, converted it to WAV with ffmpeg, decided that locally installing Whisper would take too long, found an OpenAI API key, and completed transcription via curl. All in about 9 seconds.\nPeter\u0026rsquo;s key insight: \u0026ldquo;What coding models are good at is creative problem solving. This is an abstract skill that applies not just to code but to all real-world tasks.\u0026rdquo;\n80% of Apps Will Disappear In the Y Combinator interview, Peter made a provocative prediction about the future of the app ecosystem:\n\u0026ldquo;80% of apps will disappear. Why do you need MyFitnessPal? The agent already knows I\u0026rsquo;m making bad decisions. If I go to Smashburger, it guesses what I like and logs it automatically. To-do apps? Tell the agent and it reminds you the next day. 
You don\u0026rsquo;t even need to care where it\u0026rsquo;s stored.\u0026rdquo;\nHis criteria are clear:\ngraph LR A[\"Current App Ecosystem\"] --\u003e B{\"What is the core function?\"} B --\u003e|\"Data management\"| C[\"High chance of extinction\u0026lt;br/\u0026gt;(to-do, fitness, notes, etc.)\"] B --\u003e|\"Hardware sensor dependent\"| D[\"High chance of survival\u0026lt;br/\u0026gt;(camera, GPS, etc.)\"] C --\u003e E[\"Replaced by AI agents\u0026lt;br/\u0026gt;via natural language\"] D --\u003e F[\"Sensor data fed\u0026lt;br/\u0026gt;to agents\"]\n\u0026ldquo;Every app that manages data can be managed more naturally by an agent. Only apps with sensors will survive.\u0026rdquo;\nMemory Ownership and Data Silos Both videos emphasized the importance of memory. OpenClaw\u0026rsquo;s memory consists of local markdown files:\nsoul.md - The agent\u0026rsquo;s identity and personality (\u0026ldquo;You\u0026rsquo;re not a chatbot. You\u0026rsquo;re becoming someone\u0026rdquo;) identity.md - Basic identity information memory.md - Long-term memory (spouse\u0026rsquo;s birthday, child\u0026rsquo;s favorite color, etc.) memory/daily files - Daily journals (\u0026ldquo;Day 1. Awakened.\u0026rdquo;) Peter said this is the decisive difference from ChatGPT or Claude: \u0026ldquo;Companies want to lock you into their data silos. The beauty of OpenClaw is that it \u0026lsquo;claws into\u0026rsquo; the data. Memory is just markdown files on your machine.\u0026rdquo;\nThese memory files inevitably contain sensitive personal information. Peter himself admitted: \u0026ldquo;There are memories that shouldn\u0026rsquo;t leak. If you had to choose between hiding your Google search history or your memory file - it\u0026rsquo;s the memory file.\u0026rdquo;\nBot-to-Bot: The Next Step Peter is already looking at the next stage. 
Beyond human-bot interaction, it\u0026rsquo;s bot-to-bot interaction:\nYour bot negotiates restaurant reservations with the restaurant\u0026rsquo;s bot If there\u0026rsquo;s no digital interface, the bot hires a human to make a phone call or stand in line Specialized bots by purpose: personal life, work, relationship management The community has already produced projects like Moltbook, where bots talk to each other, and there are even cases of bots hiring humans for real-world tasks.\nSecurity Concerns: An Unavoidable Reality NetworkChuck raised security concerns in an interesting way. After having viewers install OpenClaw, he says: \u0026ldquo;You just configured OpenClaw. One of the most insecure things ever. Prompt injection, malware hidden in skills. You are a walking CVE.\u0026rdquo;\nSince OpenClaw has access to everything on your computer by default, using it without security settings poses serious risks. It\u0026rsquo;s a double-edged sword \u0026ndash; as powerful as it is dangerous.\nModel Commoditization and the Shift in Value Peter also made sharp observations about the future of AI models:\n\u0026ldquo;Every time a new model comes out, people say \u0026lsquo;Oh my God, this is so good.\u0026rsquo; A month later they complain \u0026lsquo;It\u0026rsquo;s degraded, they quantized it.\u0026rsquo; No, nothing happened. Your expectations just went up.\u0026rdquo;\nOpen-source models are reaching the level of top-tier models from a year ago, and people complain that even those aren\u0026rsquo;t good enough. As this pattern repeats, models increasingly become commodities. OpenClaw\u0026rsquo;s \u0026ldquo;swappable brain\u0026rdquo; design perfectly reflects this trend.\nSo where does value remain? Peter\u0026rsquo;s answer: memory and data ownership. 
Models get swapped, apps disappear, but an agent that holds your context and memories is irreplaceable.\nQuick Links NetworkChuck - OpenClaw Hands-on and Analysis Y Combinator - OpenClaw Creator Interview: 80% of Apps Will Disappear OpenClaw Official Site Insights Synthesizing both videos, OpenClaw is not just another AI tool but a project that represents a paradigm shift in software.\nFirst, the democratization of interfaces. Previously, using AI meant going to each company\u0026rsquo;s platform. OpenClaw takes the approach of \u0026ldquo;coming to where you are,\u0026rdquo; letting you use the same agent across Telegram, Discord, WhatsApp, and more.\nSecond, the redefinition of apps. Peter\u0026rsquo;s \u0026ldquo;80% extinction\u0026rdquo; prediction seems radical, but the logic is solid. Apps whose core function is data management (to-do, fitness, notes) can be replaced by natural language agents. Only apps that depend on hardware sensors will remain.\nThird, the beginning of the data sovereignty war. ChatGPT, Claude, and others lock memory in their own servers. OpenClaw returns full ownership to users via local markdown files. If the most important asset of the AI era is \u0026ldquo;data about me,\u0026rdquo; then who owns that data will become the central battleground.\nHowever, as NetworkChuck warned, security remains unresolved. An agent with access to your entire computer is powerful, but vulnerabilities from prompt injection or malicious skills are equally significant. 
Proper security configuration is essential to avoid becoming \u0026ldquo;a walking CVE.\u0026rdquo;\nMore important than the number 300K GitHub stars is the question OpenClaw poses: In a world where apps are unnecessary, where does the value of software lie?\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-openclaw-ai-apps/cover-en.jpg","permalink":"/posts/2026-04-02-openclaw-ai-apps/","title":"OpenClaw Launch and the Future of the AI App Ecosystem - Will 80% of Apps Disappear?"},{"content":"Overview PopCon (Pop + Icon) is a web application that takes a single character image as input and automatically generates animated emoji sets that meet LINE specifications. It uses Google\u0026rsquo;s Imagen (Nano Banana 2) for pose generation, VEO 3.1 for animation, and ffmpeg + Pillow for post-processing \u0026ndash; a 3-stage AI pipeline built from scratch in a single day.\nThis post pairs well with the preliminary research posts:\nAI Image Generation Ecosystem \u0026ndash; Technical survey Animated Emoji Market Research \u0026ndash; Market analysis Project Structure and Pipeline Background LINE animated emojis have strict specifications: 180x180px APNG format, 8-40 per set, under 300KB per file. 
The goal is to use AI to automate what would otherwise take days of manual work per character.\nArchitecture The entire system consists of 4 services:\ngraph LR subgraph Frontend[\"Frontend \u0026lt;br/\u0026gt; Next.js 15\"] Upload[\"Image Upload\"] Editor[\"Action Editor\"] Progress[\"Progress Tracking\"] Download[\"ZIP Download\"] end subgraph Backend[\"Backend \u0026lt;br/\u0026gt; FastAPI\"] API[\"REST API\"] Preprocess[\"Image Preprocessing\"] end subgraph Worker[\"Celery Worker\"] S1[\"Stage 1 \u0026lt;br/\u0026gt; Pose Generation \u0026lt;br/\u0026gt; Imagen\"] S2[\"Stage 2 \u0026lt;br/\u0026gt; Animation \u0026lt;br/\u0026gt; VEO 3.1\"] S3[\"Stage 3 \u0026lt;br/\u0026gt; Post-processing \u0026lt;br/\u0026gt; ffmpeg + Pillow\"] Pack[\"ZIP Packaging\"] end Redis[\"Redis \u0026lt;br/\u0026gt; Job Store\"] Upload --\u003e API Editor --\u003e API API --\u003e Redis API --\u003e S1 S1 --\u003e S2 S2 --\u003e S3 S3 --\u003e Pack Progress --\u003e Redis Download --\u003e Pack\nDocker Compose manages all services:\nservices: redis: image: redis:7-alpine backend: build: ./backend ports: [\u0026#34;8000:8000\u0026#34;] environment: - POPCON_GOOGLE_API_KEY=${POPCON_GOOGLE_API_KEY} - POPCON_REDIS_URL=redis://redis:6379/0 volumes: - /tmp/popcon:/tmp/popcon worker: build: ./backend command: celery -A worker.celery_app worker --loglevel=info --concurrency=2 frontend: build: ./frontend ports: [\u0026#34;3000:3000\u0026#34;] Migrating from In-Memory State to Redis Background The initial implementation managed JOB_STORE as a Python dict. Jobs were created in the FastAPI process and status was updated in the Celery worker, but there was a problem \u0026ndash; in Docker Compose, backend and worker run in separate containers, and therefore in separate processes. Even if they use the same image, memory is not shared.\nTroubleshooting When the worker called update_job, the backend\u0026rsquo;s /api/job/{job_id}/status endpoint still showed queued status. 
The frontend\u0026rsquo;s polling was stuck at \u0026ldquo;Generating\u0026hellip;\u0026rdquo; indefinitely.\nThe solution was using Redis as the state store:\n# job_store.py — Redis-backed job store def save_job(status: JobStatus) -\u0026gt; None: \u0026#34;\u0026#34;\u0026#34;Persist a JobStatus to Redis.\u0026#34;\u0026#34;\u0026#34; r = _get_redis() r.set(_key(status.job_id), status.model_dump_json(), ex=86400) def get_job(job_id: str) -\u0026gt; JobStatus | None: \u0026#34;\u0026#34;\u0026#34;Load a JobStatus from Redis.\u0026#34;\u0026#34;\u0026#34; r = _get_redis() data = r.get(_key(job_id)) if data is None: return None return JobStatus.model_validate_json(data) Serialization/deserialization was handled with Pydantic\u0026rsquo;s model_dump_json() and model_validate_json(), with a 24-hour TTL for automatic cleanup. Replacing all JOB_STORE[job_id] accesses with get_job() / save_job() calls resulted in 175 lines added and 58 lines deleted across 5 files.\nWrestling with the VEO 3.1 API Background VEO 3.1 is an Image-to-Video (I2V) generation model that takes a starting image and motion prompt to generate video. The original plan was to use dual-frame I2V with both start and end frames.\nFour Consecutive Fix Commits Four issues arose in succession during VEO API integration:\n1. Model ID Error \u0026ndash; The model name from the documentation was rejected by the actual API. veo-3.1-generate-preview was the correct ID.\n2. Minimum Duration \u0026ndash; VEO 3.1\u0026rsquo;s minimum video length is 4 seconds, and LINE emoji maximum is also 4 seconds. They happened to match exactly, but initially setting it to 2 seconds caused an API error.\n3. Dual-frame Not Supported \u0026ndash; The last_frame parameter was not yet supported in VEO 3.1 preview. The workaround was using start frame + strong motion prompt:\n# NOTE: last_frame (dual-frame I2V) is not yet supported on VEO 3.1 preview. # We rely on the start frame + strong motion prompt instead. 
async def animate(self, start_image, end_image, action, output_dir): full_motion = ( f\u0026#34;{action.motion_prompt} \u0026#34; f\u0026#34;The character transitions to: {action.end_prompt}\u0026#34; ) prompt = build_motion_prompt(full_motion) video_bytes = await self._generate_video(prompt, start_image, end_image) 4. video_bytes Was None \u0026ndash; VEO returned videos as download URIs instead of inline bytes. A branch was added to download from video.video.uri via httpx.get, with redirect following enabled:\nfor video in operation.result.generated_videos: if video.video.video_bytes: return video.video.video_bytes if video.video.uri: resp = await asyncio.to_thread( httpx.get, video.video.uri, headers={\u0026#34;x-goog-api-key\u0026#34;: self.api_key}, timeout=120, follow_redirects=True, ) resp.raise_for_status() return resp.content APNG Compression Strategy Background LINE emojis have a 300KB per-file limit. Extracting 12 frames from VEO-generated video easily pushes 180x180 APNG files past 300KB.\nImplementation An iterative compression strategy was implemented. 
It progressively reduces frame count and color count until the file falls under 300KB:\nstrategies = [ (total_frames, None), # All frames, full color (10, None), # 10 frames, full color (10, 128), # 10 frames, 128 colors (8, 64), # 8 frames, 64 colors (5, 32), # 5 frames, 32 colors ] for frame_count, colors in strategies: n = min(frame_count, total_frames) # Select frames at even intervals indices = [round(i * (total_frames - 1) / (n - 1)) for i in range(n)] selected = [frame_paths[i] for i in indices] # Proportionally adjust delay to match frame count adjusted_delay_ms = max(1, round(original_duration_ms / n)) if colors is not None: _quantize_frames(copies, colors) build_apng(copies, output_path, delay_ms=adjusted_delay_ms) if output_path.stat().st_size \u0026lt;= max_size: return output_path When reducing frames, evenly spaced frames are selected and the delay is proportionally increased to maintain the total playback duration.\nBackground Removal Failure and Strategy Pivot Background The initial design planned to use rembg for background removal to create transparent APNGs. The pipeline extracted frames from VEO video, then removed backgrounds with rembg (u2net).\nTroubleshooting Multiple problems cascaded during quality inspection of the actual output:\nPhase 1 \u0026ndash; Background Artifacts: rembg couldn\u0026rsquo;t fully remove floor/shadow from VEO videos, leaving gray smudges. The motion prompt was updated with \u0026quot;Plain solid white background. No shadows, no ground, no floor\u0026quot;, and the rembg model was changed to isnet-general-use.\nPhase 2 \u0026ndash; Cloud Effect: The isnet model removed backgrounds more aggressively, also stripping parts of the character. A custom alpha mask combining rembg confidence and pixel brightness was attempted, but it produced a mosaic-pattern side effect.\nPhase 3 \u0026ndash; Strategy Pivot: The decision was made to completely remove rembg. 
Instead:\nPose generation prompts explicitly specified \u0026quot;Plain solid white (#FFFFFF) background. NOT transparent, NOT checkerboard pattern\u0026quot; VEO motion prompts included the same background instructions Switched from background removal to brightness-based content cropping def resize_frame(input_path, output_path, size=None, padding_ratio=0.05): \u0026#34;\u0026#34;\u0026#34;Crop to content via brightness detection, scale to fill.\u0026#34;\u0026#34;\u0026#34; img = Image.open(input_path).convert(\u0026#34;RGB\u0026#34;) arr = np.array(img) # Detect content pixels that aren\u0026#39;t white or black brightness = arr.astype(float).mean(axis=2) content_mask = (brightness \u0026gt; 10) \u0026amp; (brightness \u0026lt; 245) rows = np.any(content_mask, axis=1) cols = np.any(content_mask, axis=0) if rows.any() and cols.any(): y_min, y_max = np.where(rows)[0][[0, -1]] x_min, x_max = np.where(cols)[0][[0, -1]] img = img.crop((x_min, y_min, x_max + 1, y_max + 1)) # Fill canvas (5% padding) pad = int(min(size) * padding_ratio) target_w = size[0] - pad * 2 target_h = size[1] - pad * 2 scale = min(target_w / img.width, target_h / img.height) img = img.resize((int(img.width * scale), int(img.height * scale)), Image.LANCZOS) Ultimately, removing the rembg dependency (including onnxruntime) significantly reduced Docker image size and processing time.\nLINE Spec Validation and File Naming Fix Background The LINE Creators Market guidelines were scraped with Firecrawl and compared against the current config.\nImplementation Most specs matched, but two discrepancies were found:\nItem LINE Official Previous Implementation Status File naming 001.png ~ 040.png 00_happy.png Mismatch Minimum set size 8 1 (for testing) Mismatch The packager was updated to convert filenames to LINE specifications:\nwith zipfile.ZipFile(zip_path, \u0026#34;w\u0026#34;, compression=zipfile.ZIP_DEFLATED) as zf: zf.write(tab_path, \u0026#34;tab.png\u0026#34;) for i, emoji_path in 
enumerate(emoji_paths): line_name = f\u0026#34;{i + 1:03d}.png\u0026#34; # 001.png, 002.png, ... zf.write(emoji_path, line_name) The strategy is to keep descriptive names like 00_happy.png for internal working files, converting to LINE spec names only when adding to the ZIP archive.\nImage Preprocessing Pipeline Background User-uploaded or AI-generated character images were sometimes not square or had excessive whitespace. Imagen occasionally generated images that weren\u0026rsquo;t 1:1 ratio, and once the character was duplicated at the top of the image.\nImplementation Three fixes were made:\n1. Upload Image Preprocessing \u0026ndash; numpy-based content detection to crop whitespace, apply square padding, and resize to 512x512:\ndef preprocess_character_image(image_path: Path) -\u0026gt; None: img = Image.open(image_path).convert(\u0026#34;RGB\u0026#34;) arr = np.array(img) brightness = arr.astype(float).mean(axis=2) content_mask = (brightness \u0026gt; 10) \u0026amp; (brightness \u0026lt; 245) # ... bounding box detection and crop ... max_side = max(img.width, img.height) pad = int(max_side * 0.05) canvas_size = max_side + pad * 2 canvas = Image.new(\u0026#34;RGB\u0026#34;, (canvas_size, canvas_size), (255, 255, 255)) canvas.paste(img, ((canvas_size - img.width) // 2, (canvas_size - img.height) // 2)) canvas = canvas.resize((512, 512), Image.LANCZOS) canvas.save(image_path) 2. Force 1:1 Ratio from Imagen \u0026ndash; Using the API\u0026rsquo;s aspect_ratio parameter:\nconfig=types.GenerateContentConfig( response_modalities=[\u0026#34;IMAGE\u0026#34;], image_config=types.ImageConfig(aspect_ratio=\u0026#34;1:1\u0026#34;), ) 3. Prevent Character Duplication \u0026ndash; Adding explicit instructions to the prompt:\n\u0026#34;Draw exactly ONE character, centered and filling the frame. Do NOT create multiple copies, sticker sheets, or sprite sheets.\u0026#34; Frontend Progress UX Improvements Background Generating 24 emojis takes several minutes, but the existing UI showed status only through a simple progress bar and small gray dots. 
There was no way to tell which emoji was at which stage.\nImplementation The backend was already providing data that the UI wasn\u0026rsquo;t utilizing:\nEmojiResult\u0026rsquo;s per-emoji status (generating_pose, animating, processing, done, failed) EmojiResult\u0026rsquo;s action name (happy, laugh, cry\u0026hellip;) The ProgressTracker component was completely rewritten:\nStage pipeline \u0026ndash; A 3-stage (Poses / Animation / Processing) mini stepper visualizing the current position Emoji grid \u0026ndash; Each emoji displayed with name + status icon + colored border. Active emojis have a pulse animation Elapsed time \u0026ndash; Real-time timer in the upper right Localized stage labels \u0026ndash; \u0026ldquo;Generating character poses\u0026rdquo; instead of generating_poses Docker Environment Issues Background Docker-related issues came up repeatedly during development.\nTroubleshooting Favicon Not Showing \u0026ndash; I didn\u0026rsquo;t know that Next.js App Router\u0026rsquo;s app/favicon.ico takes priority over public/favicon.ico. Even after replacing app/favicon.ico, the Docker container was using the previous build so the change wasn\u0026rsquo;t reflected.\n# Container rebuild required docker compose build frontend \u0026amp;\u0026amp; docker compose up -d frontend API Key Contamination \u0026ndash; The POPCON_GOOGLE_API_KEY value in the .env file had venv appended to the end. This was a copy-paste mistake, but since the error message only said 400 INVALID_ARGUMENT: API key not valid, it took time to identify the root cause.\nWorker Restart Oversight \u0026ndash; docker compose restart doesn\u0026rsquo;t detect image changes. Since the worker uses the same image as the backend, building only the backend means the worker should also use the new image, but compose may not detect this. The --force-recreate flag was needed.\nEdge Artifacts in VEO Videos Background Black lines appeared at the left and right edges of generated emojis. 
VEO 3.1 appeared to be leaving artifacts at video boundaries during generation.\nImplementation An ffmpeg crop filter was added to trim 2% from the video edges:\n# Before \u0026#34;-vf\u0026#34;, f\u0026#34;fps={fps}\u0026#34;, # After — 2% edge crop before frame extraction \u0026#34;-vf\u0026#34;, f\u0026#34;crop=in_w*0.96:in_h*0.96:in_w*0.02:in_h*0.02,fps={fps}\u0026#34;, Commit Log Message Changes docs: add design spec and implementation plan New feat: project scaffolding with config and LINE emoji constants +186 feat: add Pydantic models for job status, emoji results, and action presets +109 feat: add 24 default emoji action presets with prompt templates +219 feat: add frame processor for resize, bg removal, and frame extraction +204 fix: use rembg[cpu] for onnxruntime backend +1 -1 feat: add APNG builder with iterative compression strategy +195 feat: add ZIP packager with tab image generation +96 feat: add Nano Banana 2 pose generator with subject consistency +107 feat: add VEO 3.1 animator with dual-frame I2V support +98 feat: add Celery worker with 3-stage emoji generation pipeline +251 feat: add FastAPI routes for emoji generation, status, preview, and download +186 feat: add Docker Compose setup for backend, worker, and Redis +34 feat: scaffold Next.js frontend with PopCon brand colors +6890 feat: add frontend components, editor flow, and landing page +887 -58 docs: add bilingual English/Korean README +307 fix: align frontend API URLs with backend routes +23 -11 fix: replace in-memory JOB_STORE with Redis-backed job store +175 -58 fix: correct character image URL double-prefixing and align status types +11 -2 fix: use config.last_frame instead of end_image for VEO 3.1 dual-frame API +16 -8 fix: use correct VEO model ID veo-3.1-generate-preview +1 -1 fix: set VEO duration to 4s (API minimum), trim in post-processing +1 -1 fix: disable last_frame (unsupported on VEO 3.1 preview), use start frame + strong prompt +9 -5 chore: temporarily allow 1 emoji per set 
for testing +2 -2 fix: download VEO video from URI when video_bytes is None +15 -1 fix: follow redirects when downloading VEO video from URI +1 fix: serve emoji files via API endpoint, convert file paths to URLs +22 -2 Insights Don\u0026rsquo;t trust AI API docs \u0026ndash; test them yourself. VEO 3.1 differed from its documentation on four fronts: model ID, minimum duration, dual-frame support, and response format. Each required a separate fix commit.\nOverlooking process isolation is expensive. Even though backend and worker in Docker Compose use the same image, they don\u0026rsquo;t share memory. Realizing this took time. Using Redis as the job store from the start would have avoided a 5-file refactor.\nSometimes it\u0026rsquo;s better to boldly abandon background removal. After trying three rembg variations (u2net -\u0026gt; isnet-general-use -\u0026gt; custom alpha mask), the cleanest approach turned out to be not removing backgrounds at all and instead generating white background images. This also reduced dependencies (onnxruntime) and processing time.\nNegative instructions are key in prompt engineering. \u0026ldquo;Solid white background\u0026rdquo; alone sometimes led AI models to generate checkerboard patterns or gradients. Explicit negations like \u0026quot;NOT transparent, NOT checkerboard pattern\u0026quot; and \u0026quot;Do NOT create multiple copies\u0026quot; were far more effective.\nLINE specs reach down to the filename level. I thought matching the API response format and image size would suffice, but there are detailed specs like ZIP internal filenames needing to be 001.png through 040.png. 
Reading the official guidelines thoroughly before submission is essential.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-popcon-dev1/cover-en.jpg","permalink":"/posts/2026-04-02-popcon-dev1/","title":"PopCon Dev Log #1 - Building an AI Animated Emoji Generator"},{"content":"Overview Based on freeCodeCamp\u0026rsquo;s System Design Concepts Course and Interview Prep, this post covers the essential concepts you need to know for system design interviews and real-world practice. From the physical layer hierarchy of computers to CAP theorem, networking, load balancing, caching, and database strategies \u0026ndash; all the fundamentals needed for designing distributed systems in one place.\ngraph TD A[\"System Design Core Concepts\"] --\u003e B[\"Computer Fundamentals\"] A --\u003e C[\"Design Principles\"] A --\u003e D[\"Networking\"] A --\u003e E[\"Infrastructure\"] A --\u003e F[\"Data\"] B --\u003e B1[\"Disk → RAM → Cache → CPU\"] C --\u003e C1[\"CAP Theorem\"] C --\u003e C2[\"Availability \u0026amp;lt;br/\u0026amp;gt; SLO / SLA\"] C --\u003e C3[\"Throughput vs Latency\"] D --\u003e D1[\"TCP / UDP / DNS\"] D --\u003e D2[\"API: REST / GraphQL / gRPC\"] E --\u003e E1[\"Load Balancer\"] E --\u003e E2[\"Proxy / CDN\"] F --\u003e F1[\"SQL vs NoSQL\"] F --\u003e F2[\"Caching / Sharding / Replication\"] Computer Hardware Layer Hierarchy The starting point of system design is understanding how individual computers work. You need to understand the hierarchy of data storage and access speeds to predict bottlenecks.\nDisk Storage \u0026ndash; Non-volatile storage. Split into HDDs (80-160 MB/s) and SSDs (500-3,500 MB/s). The OS, applications, and user files are stored here.\nRAM \u0026ndash; Volatile memory. Holds variables, intermediate calculations, and runtime stacks for currently running programs. Read/write speeds of 5,000+ MB/s, faster than SSDs.\nCache (L1/L2/L3) \u0026ndash; Megabyte-scale ultra-fast memory. 
L1 cache access time is on the order of nanoseconds. The CPU looks for data in L1 -\u0026gt; L2 -\u0026gt; L3 -\u0026gt; RAM order.\nCPU \u0026ndash; The brain of the computer. Compilers convert high-level language code into machine code, which the CPU then fetches, decodes, and executes.\nThis hierarchy is the rationale behind caching strategies in system design. Placing frequently accessed data in higher layers dramatically reduces average access time.\nThe Big Picture of Production Architecture graph LR User[\"User\"] --\u003e LB[\"Load Balancer \u0026amp;lt;br/\u0026amp;gt; nginx\"] LB --\u003e S1[\"Server 1\"] LB --\u003e S2[\"Server 2\"] S1 --\u003e DB[\"External Storage\"] S2 --\u003e DB S1 --\u003e LOG[\"Logging / Monitoring\"] S2 --\u003e LOG LOG --\u003e ALERT[\"Alerts \u0026amp;lt;br/\u0026amp;gt; Slack / PagerDuty\"] S1 --\u003e CICD[\"CI/CD \u0026amp;lt;br/\u0026amp;gt; Jenkins / GitHub Actions\"]Key components of a production environment:\nCI/CD Pipeline \u0026ndash; Tools like Jenkins and GitHub Actions automatically deploy code from the repo through tests to production servers. Load Balancer / Reverse Proxy \u0026ndash; Tools like nginx distribute user requests evenly across multiple servers. External Storage \u0026ndash; Databases run on separate servers connected via network, isolated from production servers. Logging / Monitoring \u0026ndash; Tools like PM2 for backends and Sentry for frontends capture errors in real time. Integrating alerts into a Slack channel enables immediate response. The golden rule of debugging: Never debug directly in production. 
Follow the sequence of reproducing in staging -\u0026gt; fixing -\u0026gt; hotfix rollout.\nCAP Theorem and Design Trade-offs The CAP theorem (Brewer\u0026rsquo;s Theorem), the most important theoretical foundation for distributed system design, states that only two of three properties can be achieved simultaneously.\nProperty Meaning Analogy Consistency All nodes have identical data Google Docs \u0026ndash; one person edits and it\u0026rsquo;s immediately reflected for everyone Availability The system is always responsive A 24/7 online shopping mall Partition Tolerance The system operates despite network partitions In a group chat, if one person disconnects, the rest continue chatting Banking systems choose CP (Consistency + Partition Tolerance). They can temporarily sacrifice availability for financial accuracy. In contrast, social media feeds choose AP (Availability + Partition Tolerance), allowing slight data inconsistencies to ensure the system always responds.\nThe key is finding not the \u0026ldquo;perfect solution\u0026rdquo; but the \u0026ldquo;optimal solution for our use case.\u0026rdquo;\nAvailability and SLO/SLA Availability is a measure of a system\u0026rsquo;s operational performance and reliability. Targeting \u0026ldquo;Five 9\u0026rsquo;s\u0026rdquo; (99.999%) means annual downtime of only about 5 minutes.\nAvailability Annual Allowed Downtime 99.9% ~8.76 hours 99.99% ~52 minutes 99.999% ~5.26 minutes SLO (Service Level Objective) \u0026ndash; Internal performance targets. \u0026ldquo;99.9% of web service requests must respond within 300ms.\u0026rdquo;\nSLA (Service Level Agreement) \u0026ndash; A formal contract with customers. 
Violating the SLA requires providing refunds or compensation.\nResilience strategies:\nRedundancy \u0026ndash; Keep backup systems on standby at all times Fault Tolerance \u0026ndash; Prepare for unexpected failures or attacks Graceful Degradation \u0026ndash; Maintain core functionality even when some features are unavailable Throughput vs Latency Metric Unit Meaning Server Throughput RPS (Requests/sec) Number of requests a server processes per second Database Throughput QPS (Queries/sec) Number of queries a DB processes per second Data Throughput Bytes/sec Data transfer rate of a network or system Latency ms Response time for a single request Throughput and latency have a trade-off relationship. Increasing throughput via batch processing can increase latency for individual requests. In system design, you need to find the right balance for your use case.\nNetworking Fundamentals \u0026ndash; IP, TCP, UDP, DNS IP Addresses and Packets The foundation of all network communication is IP addresses. IPv4\u0026rsquo;s 32-bit address space (~4 billion addresses) is running out, driving the transition to IPv6 (128-bit). The IP header of a data packet contains sender and receiver addresses, and the application layer uses protocols like HTTP to interpret the data.\nTCP vs UDP TCP (Transmission Control Protocol) \u0026ndash; Connection-oriented, guarantees order, supports retransmission. Suitable for web browsing, file transfers, and email. Establishes connections via a three-way handshake (SYN -\u0026gt; SYN-ACK -\u0026gt; ACK).\nUDP (User Datagram Protocol) \u0026ndash; Connectionless, no order guarantee, fast. Suitable for real-time streaming, gaming, and VoIP. Trades some packet loss for speed.\nDNS (Domain Name System) The internet\u0026rsquo;s phone book that translates human-readable domains (google.com) into IP addresses. 
Resolution follows: browser cache -\u0026gt; OS cache -\u0026gt; recursive resolver -\u0026gt; root server -\u0026gt; TLD server -\u0026gt; authoritative server.\nAPI Design \u0026ndash; REST, GraphQL, gRPC REST (Representational State Transfer) The most common API style. Manipulates resources using HTTP methods (GET, POST, PUT, DELETE) and URL paths. Each request is independent under the stateless principle.\nGraphQL Allows clients to request exactly the data they need. Solves over-fetching and under-fetching problems, but increases server implementation complexity and makes caching difficult.\ngRPC (Google Remote Procedure Call) A binary protocol using Protocol Buffers. Supports bidirectional streaming over HTTP/2. Offers higher performance than REST for inter-microservice communication.\nFeature REST GraphQL gRPC Data Format JSON JSON Protobuf (binary) Protocol HTTP/1.1 HTTP HTTP/2 Use Case Public APIs Complex queries Service-to-service Streaming Limited Subscription Bidirectional Load Balancing and Proxies Load Balancing Strategies Distributes traffic across multiple servers to prevent overloading a single server.\nRound Robin \u0026ndash; Distributes requests sequentially. The simplest approach. Least Connections \u0026ndash; Routes to the server with the fewest current connections. IP Hash \u0026ndash; Hashes the client IP to always route to the same server. Useful for session persistence. Weighted \u0026ndash; Assigns weights based on server performance. Forward Proxy vs Reverse Proxy Forward Proxy \u0026ndash; Operates on the client side. Hides the user\u0026rsquo;s IP and is used for content filtering and caching. (e.g., VPN)\nReverse Proxy \u0026ndash; Operates on the server side. Hides the actual server\u0026rsquo;s IP and handles load balancing, SSL termination, and caching. 
(e.g., nginx, HAProxy)\nCaching Strategies graph TD Client[\"Client\"] --\u003e CDN[\"CDN Cache\"] CDN --\u003e LB[\"Load Balancer\"] LB --\u003e APP[\"Application \u0026amp;lt;br/\u0026amp;gt; In-Memory Cache\"] APP --\u003e REDIS[\"Redis / Memcached\"] REDIS --\u003e DB[\"Database\"]Caching can be applied at every layer of the system:\nBrowser Cache \u0026ndash; Stores static assets (CSS, JS, images) on the client CDN \u0026ndash; Caches content on geographically distributed servers to reduce latency Application Cache \u0026ndash; Keeps frequent DB query results in memory using Redis or Memcached DB Query Cache \u0026ndash; Caches identical query results at the database level Cache Invalidation strategies are critical:\nWrite-Through \u0026ndash; Updates cache and DB simultaneously on writes. High consistency but write latency. Write-Back \u0026ndash; Updates cache first, DB later in batches. Fast but risk of data loss. Write-Around \u0026ndash; Writes only to DB, cache is refreshed on reads. Suitable for infrequently read data. Databases \u0026ndash; SQL vs NoSQL, Sharding, Replication SQL vs NoSQL Feature SQL (PostgreSQL, MySQL) NoSQL (MongoDB, Cassandra) Schema Fixed schema, table-based Flexible schema, document/KV/graph Scaling Vertical (Scale Up) Horizontal (Scale Out) Transactions ACID guaranteed BASE (eventual consistency) Best For Relational data, complex joins High-volume unstructured data, fast writes Sharding A horizontal partitioning strategy that distributes data across multiple DB instances. 
Shard key selection is critical \u0026ndash; uneven distribution (hot spots) concentrates load on specific shards.\nReplication Copies data across multiple nodes to improve read performance and fault tolerance.\nLeader-Follower \u0026ndash; The leader handles writes while followers handle reads Leader-Leader \u0026ndash; All nodes can read and write, but conflict resolution is complex Quick Links System Design Concepts Course and Interview Prep \u0026ndash; Full freeCodeCamp course Insights The essence of system design is \u0026ldquo;trade-offs.\u0026rdquo; Just as the CAP theorem lets you choose only two, every design decision is about what you gain and what you give up. Increasing throughput raises latency; strengthening consistency reduces availability. Good system designers don\u0026rsquo;t memorize correct answers \u0026ndash; they develop the ability to find the optimal compromise for their use case. While this course covers a broad range, the biggest lesson is that each concept doesn\u0026rsquo;t stand alone but interlocks with the others. CDN is an extension of caching, sharding is a practical application of the CAP theorem, and load balancing sits at the intersection of availability and scalability.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-system-design-concepts/cover-en.jpg","permalink":"/posts/2026-04-02-system-design-concepts/","title":"System Design Core Concepts - From Interview Prep to Production Architecture"},{"content":"Overview Previous Post: Trading Agent Dev Log #7 covered agent settings UI and signal card improvements. In this #8 installment, the single min_rr_score gate was replaced with a 5-factor Composite Score system, along with building a new stock research page, sell validation logic, and project rebranding — a major overhaul spanning 41 commits.\n1. 
Stock Research Page (Stock Info) Problem Even when a signal was generated, there was no page to view the stock\u0026rsquo;s fundamental analysis, technical indicators, and institutional flow at a glance. Having to open an external brokerage HTS every time caused delays in decision-making.\nImplementation A /api/research router was added to the backend with 5 endpoints (basic info, financials, technical indicators, institutional flow, news/disclosures). The frontend was split into 9 section components.\n// frontend/src/components/stockinfo/ structure DiscoverySidebar.tsx // Stock search sidebar ResearchHeader.tsx // Stock basic info header PriceChartSection.tsx // Candlestick chart + technical indicators FundamentalsSection.tsx // Key financial metrics ValuationSection.tsx // Valuation comparison InvestorFlowSection.tsx // Foreign/institutional flow PeerSection.tsx // Industry peer comparison InsiderSection.tsx // Insider trading SignalHistorySection.tsx // Past signal history Charts were upgraded from simple line charts to candlestick + volume + moving averages (MA) + Bollinger Bands (BB) overlays, and technical indicators are displayed as mini-chart 2x2 grid cards for RSI, MACD, Bollinger Bands, and volume trends respectively.\n2. Signal Pipeline Improvements — Linear Confidence and Sell Validation From Sigmoid to Linear Mapping The existing sigmoid-based compute_confidence had a \u0026ldquo;dead zone\u0026rdquo; where R/R scores between 0.5 and 1.5 produced nearly identical confidence values. This was replaced with linear mapping and the min_rr_score threshold was lowered to 0.3 to capture a wider range of signals.\nSell (SELL) Validation Logic A problem was discovered where SELL signals were generated for stocks not held in the portfolio. 
Two hard gates were added:\nRisk Manager: Reject SELL for unheld stocks + minimum hold time validation Market Scanner: Force-convert SELL to HOLD for unheld stocks SELL/HOLD direction rules were also explicitly added to the expert panel prompts so that the Chief Analyst gives opinions while being aware of the current holdings.\n3. Multi-Factor Composite Score System This is the core change of this installment. Signal filtering that relied on a single R/R score was completely replaced with a weighted sum of 5 independent factors.\nflowchart LR subgraph 5-Factors[\"5 Sub-Scores\"] A[\"R/R Ratio\u0026lt;br/\u0026gt;(Risk-Reward)\"] B[\"Expert Consensus\u0026lt;br/\u0026gt;(Agreement Level)\"] C[\"Fundamental\u0026lt;br/\u0026gt;(PER, ROE, etc.)\"] D[\"Technical Momentum\u0026lt;br/\u0026gt;(RSI, MACD, Volume)\"] E[\"Institutional Flow\u0026lt;br/\u0026gt;(Foreign/Institutional)\"] end subgraph Weights[\"Weight Normalization\"] W[\"normalize_weights()\"] end subgraph Quality[\"Data Quality\"] Q[\"confidence_grades\u0026lt;br/\u0026gt;A=1.0 B=0.85 C=0.6 D=0.3\"] end A --\u003e W B --\u003e W C --\u003e W D --\u003e W E --\u003e W W --\u003e|\"weighted sum\"| R[\"raw score\"] R --\u003e|\"x quality\"| F[\"Composite Score\u0026lt;br/\u0026gt;(0~100)\"] Q --\u003e F\nSub-Score Design Each factor is normalized to a 0-1 range, returning a default of 0.5 (neutral) when data is unavailable.\n# backend/app/models/composite_score.py def score_fundamental( per: float | None = None, roe: float | None = None, debt_ratio: float | None = None, operating_margin: float | None = None, ) -\u0026gt; float: \u0026#34;\u0026#34;\u0026#34;Normalize each metric independently to 0-1 and return the average. Missing metrics are excluded from the calculation.\u0026#34;\u0026#34;\u0026#34; components: list[float] = [] if per is not None and per \u0026gt; 0: components.append(min(max(1.0 - per / 40.0, 0.0), 1.0)) if roe is not None: components.append(min(max(roe / 30.0, 0.0), 1.0)) # ... 
debt_ratio, operating_margin follow the same pattern return sum(components) / len(components) if components else 0.5 The institutional/foreign flow score uses sigmoid normalization. The net purchase total is divided by a base amount (default 1 billion KRW) and mapped to the 0 to 1 range, with zero net flow giving the neutral value 0.5.\ndef score_institutional_flow( foreign_net: float = 0, institution_net: float = 0, scale: float = 1_000_000_000, # 1 billion KRW ) -\u0026gt; float: combined = foreign_net + institution_net return 1.0 / (1.0 + math.exp(-combined / scale)) Weight Normalization and Aggregation User-configured weights are automatically normalized to sum to 1.0. The final score is computed by multiplying the weighted sum by a data quality multiplier and converting to a 0-100 scale.\ndef compute_composite_score( rr_score: float, calibration_ceiling: float = 2.0, expert_analyses: list[dict] | None = None, dart_financials: dict | None = None, technicals: dict | None = None, investor_trend: dict | None = None, confidence_grades: dict[str, str] | None = None, weights: dict[str, float] | None = None, ) -\u0026gt; float: w = normalize_weights(weights) if weights else dict(DEFAULT_WEIGHTS) # ... compute 5 sub-scores ... raw = ( w[\u0026#34;rr_ratio\u0026#34;] * rr_sub + w[\u0026#34;expert_consensus\u0026#34;] * expert_sub + w[\u0026#34;fundamental\u0026#34;] * fundamental_sub + w[\u0026#34;technical\u0026#34;] * technical_sub + w[\u0026#34;institutional\u0026#34;] * institutional_sub ) quality = compute_data_quality_multiplier(confidence_grades or {}) return min(max(raw * quality * 100, 0.0), 100.0) Data Quality Multiplier Each expert is assigned a data reliability grade (A/B/C/D), and the average grade is applied as a multiplier. When data quality is low, the composite score is automatically discounted.\nGrade Multiplier A 1.00 B 0.85 C 0.60 D 0.30 4. UI Sliders and DB Migration Weight Adjustment UI Five per-factor weight sliders were added to the Settings page. 
As users move the sliders, the normalized proportions are displayed in real time. The existing min_rr_score slider was replaced with min_composite_score, with a default threshold of 15%.\nFull-Stack Migration Changing from min_rr_score to min_composite_score required modifications across all of the following layers.\nLayer File Changes Scoring module composite_score.py 5 sub-scores + aggregation functions (new) Scanner market_scanner.py Remove compute_confidence, connect composite score Risk manager risk_manager.py Change gate criteria API router agents.py Add weight fields, rename fields Frontend types types.ts Add 5 weight fields Settings UI SettingsView.tsx Add 5 sliders DB trading.db Column rename + weight default inserts 5. Other Improvements Alpha Pulse Rebranding The project was renamed from \u0026ldquo;KIS Trading\u0026rdquo; to Alpha Pulse. All branding assets including favicon, manifest, header bar, and app title were replaced.\nInfrastructure Fixes APScheduler cron day-of-week conversion: Fixed schedule tasks to run on the correct day by converting between standard cron (0=Sun) and APScheduler (0=Mon) day-of-week indices uvicorn WebSocket: Switched from websockets to wsproto implementation to resolve DeprecationWarning Schedule sorting: Sorted schedule task list in ascending order by cron time (hour:minute) Expert Panel Enhancements Analysis quality was improved by providing each expert with additional data including investor flow trends, DART disclosure summaries, and specialty-specific confidence grades.\nCommit Log Date Description Category 03-24 Sort schedule tasks by cron time UI 03-25 Make agent settings configurable + signal card UI improvements feat 03-25 Update CLAUDE.md multi-agent system documentation docs 03-30 Switch uvicorn WebSocket to wsproto fix 03-30 Fix APScheduler cron day-of-week conversion fix 03-30 Stock Info page design doc + implementation plan docs 03-31 Add technical_service module (reusable indicator calculations) feat 03-31 
Add research types + API functions feat 03-31 /api/research router (5 endpoints) feat 03-31 9 stockinfo section components + DiscoverySidebar feat 03-31 InsiderSection, SignalHistorySection components feat 03-31 ResearchPanel, StockInfoView, CSS completion feat 03-31 Connect StockInfoView to app navigation feat 03-31 Resolve lint errors and complete stockinfo components fix 03-31 Fix verbatimModuleSyntax compatible import type fix 03-31 Return search results + prevent stale state on stock switch fix 03-31 Signal pipeline fix design doc docs 03-31 Candlestick chart + volume, MA, BB overlays feat 03-31 Separate technical indicator mini chart cards feat 03-31 Add compute_confidence linear mapping function feat 03-31 Replace sigmoid with linear confidence, min_rr_score 0.3 feat 03-31 Technical indicator cards 2x2 grid layout feat 03-31 SELL validation — reject unheld stocks + minimum hold time feat 03-31 Force-convert unheld stock SELL to HOLD feat 03-31 Add SELL/HOLD direction rules to Chief Analyst prompt feat 03-31 Enhance expert data — flow, DART, confidence grades feat 03-31 Adjust price/volume chart area spacing fix 03-31 Calibration ceiling slider, min hold time input feat 03-31 Fix missing RSI gauge CSS fix 03-31 Multi-factor composite score design doc (Approach C) docs 03-31 Multi-factor composite score implementation plan docs 03-31 Rebrand from KIS Trading to Alpha Pulse feat 04-01 5 sub-score functions + data quality multiplier feat 04-01 compute_composite_score + weight normalization feat 04-01 Connect composite score to pipeline, remove compute_confidence feat 04-01 Change min_rr_score gate to min_composite_score (15%) feat 04-01 Add weight fields to API router feat 04-01 Add weight fields to frontend types feat 04-01 Weight slider UI, replace min_composite_score feat 04-01 DB migration — column rename + weight defaults feat Insights Limitations of a single metric: Filtering trade signals with min_rr_score alone makes it impossible to distinguish stocks 
with high R/R but weak fundamentals, or stocks with good institutional flow but negative technical indicators. Transitioning to a multi-factor system allows evaluating each dimension independently and combining them via weighted sum. Users can adjust weights through sliders to tune according to their investment style (fundamental-focused vs. momentum-focused).\nThe value of reflecting data quality in scores: Not all factor data is of equal quality. For stocks with outdated DART disclosures, or stocks with low volume where technical indicators are unstable, a high composite score may not actually be reliable. Introducing a data quality multiplier to distinguish between \u0026ldquo;70 points calculated from good data\u0026rdquo; and \u0026ldquo;70 points calculated from poor data\u0026rdquo; was the core of this design.\nThe cost of field name changes that cut across the entire stack: Renaming a single min_rr_score to min_composite_score required modifications across 7 layers — DB, backend models, API router, frontend types, and UI components. Using more generic naming in the initial design could have reduced this cost.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-trading-agent-dev8/cover-en.jpg","permalink":"/posts/2026-04-02-trading-agent-dev8/","title":"Trading Agent Dev Log #8 — Multi-Factor Weighted Score System and Composite Score Migration"},{"content":"Overview Today we cover three interesting topics. First, Google\u0026rsquo;s TurboQuant research, a KV Cache quantization technique that can expand local LLM context windows by up to 6x on the same hardware. Next, we look at the Korean AI character chat platform Plit, and finally, we analyze a workflow for rapidly building premium websites with 3D animations using the Claude Code + Nano Banana 2 combination.\nTurboQuant - A Game Changer for Local AI What Is the KV Cache Problem? The biggest bottleneck when running LLMs locally is the KV Cache (Key-Value Cache). 
The KV Cache is the memory region that stores conversation history, and it consumes increasingly more GPU/NPU RAM as chats get longer. Since the model itself also occupies memory, context windows are realistically limited to 8K-16K tokens on consumer hardware (8-32GB RAM).\nAnythingLLM founder Timothy Carabatsos explains the practical impact of this problem:\nWith an 8K context window, you can\u0026rsquo;t even summarize a single YouTube podcast. With 16K, you barely can, but other tasks on the system may stall. At 32K, these tasks become trivial.\nThe Core of TurboQuant Google\u0026rsquo;s TurboQuant research quantizes the KV Cache to store approximately 6x more tokens in the same memory space. Benchmarks confirm that memory usage decreases by roughly 4x compared to F16 (the conventional approach).\nflowchart LR A[\"Conventional Local LLM\u0026lt;br/\u0026gt;8K context\"] --\u003e|\"TurboQuant\u0026lt;br/\u0026gt;KV Cache quantization\"| B[\"Improved Local LLM\u0026lt;br/\u0026gt;32K~48K context\"] B --\u003e C[\"Podcast summarization\"] B --\u003e D[\"Document analysis\"] B --\u003e E[\"Complex agent workflows\"] F[\"RAM usage\u0026lt;br/\u0026gt;F16 baseline\"] --\u003e|\"~4x reduction\"| G[\"RAM usage\u0026lt;br/\u0026gt;with TurboQuant\"]\nPractical Implications Work is currently underway to merge TurboQuant into llama.cpp. Since llama.cpp is the de facto standard for local model execution, once this integration is complete, it will immediately benefit most local AI tools.\nThis is especially significant given the recent surge in DDR5 memory prices, making TurboQuant\u0026rsquo;s ability to maximize existing hardware utilization all the more valuable. 
For a 7B model:\nItem Before After TurboQuant Context window 8K tokens 32K+ tokens KV Cache memory 100% ~25% Podcast summarization Not possible Possible Complex workflows Limited Practical Cloud models will still have the edge for million-token-scale tasks, but this could be a turning point where a significant portion of everyday AI tasks become feasible locally.\nPlit - AI Character Chat Platform Service Overview Plit is an AI character chat platform developed by the Korean startup Pius. Currently in beta testing, it offers three core features:\nCharacter Chat \u0026ndash; 1:1 conversations with AI characters Talk Rooms \u0026ndash; Themed open conversation spaces Stories \u0026ndash; Branching interactive stories Its positioning is similar to overseas services like Character.ai and Janitor AI, but its differentiation lies in being optimized for Korean. Under the slogan \u0026ldquo;Start chatting with your own AI character,\u0026rdquo; it features a structure for exploring popular and new characters.\nTrends in the AI Character Chat Market AI character chat platforms are a rapidly growing space worldwide. Following Character.ai\u0026rsquo;s explosive growth, various competing services have emerged, and Plit can be seen as an entry targeting the Korean market. 
The branching story feature is noteworthy for attempting to expand beyond simple chat into interactive content.\nClaude Code + Nano Banana 2 - One-Shot Premium Website Creation Full Workflow Overview This workflow, introduced by Jack Roberts, who runs an AI automation business, centers on the idea that you can create a mobile-responsive, SEO-optimized, premium website with 3D animations even without coding experience.\nflowchart TD S1[\"Step 1: Brand Extraction\u0026lt;br/\u0026gt;Scrape existing site with Firecrawl\"] --\u003e S2[\"Step 2: Image Generation\u0026lt;br/\u0026gt;Create 3D assets with Nano Banana 2\"] S2 --\u003e S3[\"Step 3: Video Transition\u0026lt;br/\u0026gt;Generate animation from start/end frames\"] S3 --\u003e S4[\"Step 4: Website Build\u0026lt;br/\u0026gt;Generate HTML with Claude Code + Skills\"] S4 --\u003e S5[\"Step 5: Deploy\u0026lt;br/\u0026gt;Connect domain and host\"] T1[\"Firecrawl API\"] -.-\u003e S1 T2[\"Nano Banana 2\u0026lt;br/\u0026gt;(16:9, 2K+)\"] -.-\u003e S2 T3[\"Kling 3.0\u0026lt;br/\u0026gt;(video generation)\"] -.-\u003e S3 T4[\"Claude Code\u0026lt;br/\u0026gt;+ 3D Website Skill\"] -.-\u003e S4\nDetailed 5-Step Process Step 1 \u0026ndash; Brand Extraction: Use Firecrawl\u0026rsquo;s branding scraping feature to automatically extract colors, logos, and brand assets from the target website. Large-scale automation is also possible via the API.\nStep 2 \u0026ndash; 3D Asset Generation: Generate images in Nano Banana 2 at a 16:9 aspect ratio and a minimum of 2K resolution. Key tips are specifying a clean white background and running at least 4 iterations to select the best results. 1K resolution is insufficient, so always use 2K or higher.\nStep 3 \u0026ndash; Scroll Animation Video: Feed two images \u0026ndash; the assembled state (start frame) and the disassembled state (end frame) \u0026ndash; into a video generation tool like Kling 3.0 to create a transition video. 
Previously, you would have needed to manually create hundreds of frames, but now just two images are enough.\nStep 4 \u0026ndash; Build Website with Claude Code: Use Claude Code\u0026rsquo;s skill system (/skillcreator) to install 3D Website Builder and Asset Generation skills, then automatically generate an HTML website integrating the created assets. Activating \u0026ldquo;edit automatically\u0026rdquo; mode with the shift shortcut makes the process even faster.\nStep 5 \u0026ndash; Reference-Based Refinement: Provide the HTML structure of an existing website as a reference to further refine the layout and design.\nKey Insights The most notable aspect of this workflow is the toolchain combination. Individual tools (Firecrawl, Nano Banana, Claude Code) each serve specific roles, but when connected through the skill system, they become a single automation pipeline. Jack Roberts mentions he has been selling websites worth thousands of dollars using this approach.\nQuick Links Topic Link TurboQuant Explainer (AnythingLLM) YouTube Plit Official Site plit.io Claude Code + Nano Banana 2 Website Creation YouTube AnythingLLM Official Site anythingllm.com Firecrawl Developer Tools firecrawl.dev Insights Local AI is becoming practical at a rapid pace. TurboQuant is not just academic research \u0026ndash; through llama.cpp integration, it will meaningfully expand the range of AI tasks possible on consumer hardware. The context expansion from 8K to 32K transforms local models from \u0026ldquo;good for a few exchanges\u0026rdquo; to \u0026ldquo;capable of document analysis and agent workflows.\u0026rdquo;\nLocalization is key in the AI character chat market. Plit\u0026rsquo;s strategic choice to start as a Korean-specialized service during its beta phase targets the gap where Character.ai\u0026rsquo;s English-centric service cannot perfectly handle Korean nuances.\nThe paradigm of website creation is shifting. 
What the Nano Banana 2 workflow demonstrates is that the traditional flow of \u0026ldquo;code -\u0026gt; design -\u0026gt; deploy\u0026rdquo; can be replaced with \u0026ldquo;brand extraction -\u0026gt; asset generation -\u0026gt; AI build.\u0026rdquo; Claude Code\u0026rsquo;s skill system in particular opens up the possibility of automating repetitive website creation at scale. For freelancers and agencies, this could represent a qualitative transformation in productivity.\n","date":"2026-04-02T00:00:00+09:00","image":"/images/posts/2026-04-02-turboquant-plit/cover-en.jpg","permalink":"/posts/2026-04-02-turboquant-plit/","title":"TurboQuant, Plit, and Nano Banana 2 - From Local AI Quantization to AI Website Creation"},{"content":"Google NotebookLM makes \u0026ldquo;no-code RAG\u0026rdquo; a practical reality. Upload your documents, videos, and URLs as sources, and you get a custom AI assistant that answers only from that data. Based on a hands-on tutorial from a creator with over a year of daily usage, this guide covers 12 practical applications and the data preparation techniques that make them work.\nWhat Is NotebookLM? \u0026ldquo;No-Code RAG\u0026rdquo; — A Custom AI Built on Your Own Data RAG (Retrieval-Augmented Generation) is a technique that grounds LLM responses by searching external data. Traditionally, implementing RAG meant chunking documents, generating embedding vectors, storing them in a vector database, and building a retrieval pipeline. NotebookLM reduces all of this to a single UI interaction. You upload sources; Google handles the RAG pipeline internally.\nThe core feature is that it answers only from the data you provide. Asking ChatGPT or Gemini a question generates a response from everything in its training data. NotebookLM answers only within the scope of the sources you have uploaded. 
This is a structural solution to the hallucination problem.\nChatGPT Hallucination and Source-Grounded Answers The classic example is asking ChatGPT about an event that never happened: it will produce a plausible-sounding narrative about a completely fabricated story. This is hallucination: an LLM confidently generating content that is not factually true by recombining patterns from its training data.\nNotebookLM blocks this at the source level. If information is not in the sources, it says so. Every response includes source citation numbers so you can immediately verify the origin. For work where accuracy matters, such as business reports and academic paper reviews, this difference is decisive.\nFrom Prompt Engineering to Data Engineering The previous paradigm of AI use was \u0026ldquo;how to ask well\u0026rdquo;, that is, prompt engineering. NotebookLM changes the paradigm. \u0026ldquo;What data to include\u0026rdquo; now determines answer quality. Good sources matter more than a good prompt. This can be called data engineering, and it is the core skill for using NotebookLM effectively.\nflowchart LR subgraph 기존방식[\"Traditional AI Use\"] direction TB A[\"General-purpose LLM \u0026lt;br/\u0026gt; ChatGPT, Gemini\"] B[\"Prompt engineering \u0026lt;br/\u0026gt; Must ask well\"] C[\"Hallucination risk \u0026lt;br/\u0026gt; Unclear sourcing\"] A --\u003e B --\u003e C end subgraph 새방식[\"NotebookLM Approach\"] direction TB D[\"Upload sources \u0026lt;br/\u0026gt; PDF, URL, video, etc.\"] E[\"Data engineering \u0026lt;br/\u0026gt; Curate good sources\"] F[\"Source-grounded answers \u0026lt;br/\u0026gt; With citation numbers\"] D --\u003e E --\u003e F end 기존방식 -- \"Paradigm shift\" --\u003e 새방식\nData Preparation Is Everything Supported Source Types NotebookLM supports a variety of source formats:\nText documents: Google Docs, copy-pasted text PDF files: Research papers, reports, contracts URLs: Web pages, blog posts Video: YouTube videos (analyzed via captions) Images: Screenshots, charts 
(OCR-based) Audio: Recorded files, podcasts YouTube videos are especially powerful — NotebookLM automatically extracts captions for analysis. A one-hour lecture becomes a searchable source from just a URL.\nAuto-Collecting Sources with Deep Research NotebookLM has Deep Research built in. For a given topic, it searches the web and automatically adds relevant sources to your notebook. Two modes are available:\nQuick search: Quickly finds related sources by keyword. Good for simple research. Deep research mode: Cross-analyzes multiple sources for in-depth investigation. Useful for complex topics like \u0026ldquo;2026 semiconductor industry outlook.\u0026rdquo; Auto-collected sources are added to the notebook automatically, saving you from hunting for URLs manually. That said, you should always cross-verify the reliability of auto-collected sources.\nSource Limits and Management Free plan: Up to 50 sources per notebook Pro plan: Up to 300 sources per notebook 50 sources is sufficient for most work purposes. The key is source quality, not quantity. Too many irrelevant sources actually degrades answer quality.\nData Curation Process (Based on 1+ Year of Use) The video presenter shares a data curation process developed through over a year of daily NotebookLM use:\nDefine the topic clearly: One notebook, one topic. Not \u0026ldquo;AI in general\u0026rdquo; but \u0026ldquo;2026 generative AI market outlook.\u0026rdquo; Curate reliable sources: Prioritize papers, official reports, and primary sources over blog posts. Remove duplicates: If multiple sources cover the same content, keep the most comprehensive one. Use notes: Summarizing key points from sources into notes enriches the context for subsequent queries. 12 Practical Applications 1. Custom AI Assistant (Handbook-Based) Upload company handbooks, internal policies, and standard operating procedures (SOPs) to create an organization-specific AI assistant. 
A new employee who asks \u0026ldquo;what is the process for filing travel expenses?\u0026rdquo; gets an accurate step-by-step answer sourced from the policy document.\nPreviously this meant asking a colleague or searching the intranet. With NotebookLM and your manuals loaded, you have a 24/7 instant-answer internal helper. The impact is largest in teams that repeatedly answer the same questions (HR, IT helpdesk).\nA practical tip: add a FAQ or collection of frequently asked questions alongside the manual. This covers edge cases that are not explicitly documented.\n2. Deep Research Reports For complex reports on topics like \u0026ldquo;2026 economic and industry outlook,\u0026rdquo; use Deep Research to auto-collect sources, then request analysis. Load central bank reports, think tank publications, and major investment bank research — then ask \u0026ldquo;compare and analyze the top three risk factors.\u0026rdquo; The result is a citation-backed breakdown from each source\u0026rsquo;s perspective.\nReport writing time shrinks from days to hours. The key point is that NotebookLM is not \u0026ldquo;writing the report for you\u0026rdquo; — it is giving you a structural framework for analysis. Final judgment and context remain human responsibilities.\n3. Cross-Source Verification Load 3-5 sources on a single claim and ask \u0026ldquo;compare each source\u0026rsquo;s position on this argument.\u0026rdquo; The result is a breakdown by agree/disagree/conditional agreement. For example, load a McKinsey report, an OECD paper, and academic studies on \u0026ldquo;will AI reduce jobs?\u0026rdquo; and instantly see where the perspectives diverge.\nParticularly valuable for fact-checking and early-stage research. You can quickly determine whether all sources reach the same conclusion or whether there are conflicting interpretations, which deepens the quality of analysis.\n4. 
Meeting Notes and Recording Analysis Upload meeting recordings or auto-generated transcripts to extract not just summaries but action items, decisions made, and unresolved issues. Specific queries like \u0026ldquo;list the tasks the team lead committed to in this meeting\u0026rdquo; are supported.\nTeams with heavy meeting schedules can accumulate weekly notes and track \u0026ldquo;what was decided in the past month that has not been completed yet.\u0026rdquo; The value of the notebook compounds as meeting records accumulate.\n5. Paper Review and Comparative Analysis Upload several related papers and ask \u0026ldquo;compare the research methodology and conclusions of each paper.\u0026rdquo; The result is a systematic comparison table. This dramatically reduces literature review time for graduate students and researchers.\nA particularly useful feature is citation tracking. Ask \u0026ldquo;is the key argument in paper A corroborated in other sources?\u0026rdquo; and see cross-verification results with source numbers. Reading papers in the context of other related work improves comprehension.\n6. Study Guides and Auto-Generated Quizzes Upload a textbook or course materials and ask \u0026ldquo;create a 20-question quiz based on this content.\u0026rdquo; The result includes multiple choice, short answer, and true/false questions. Each answer includes an explanation with a source citation showing where in the material it comes from.\nUseful not just for exam prep but also for producing team training materials. Generate comprehension quizzes from new employee onboarding materials to significantly reduce the burden on whoever runs training. The study guide feature also works in summary mode: \u0026ldquo;extract the 10 core concepts from this material and explain each in one paragraph.\u0026rdquo;\n7. Audio Overview (Podcast Conversion) One of NotebookLM\u0026rsquo;s signature features. 
Upload sources and two AI hosts generate an audio podcast where they discuss and explain the content. Even a dense report becomes something you can absorb on your commute.\nAvailable in multiple languages, and the conversational format makes it more accessible than dry formal writing. For long documents that the whole team needs to read, generating and sharing an audio overview increases the rate of actual consumption.\n8. Contract and Legal Document Analysis Upload a contract and ask \u0026ldquo;find clauses that disadvantage the first party\u0026rdquo; or \u0026ldquo;summarize the penalty clauses.\u0026rdquo; Since NotebookLM answers only from the sources, it cannot fabricate clauses that do not exist.\nIdeal as a first-pass filter for non-lawyers reviewing contracts. Of course, final legal review should go to a professional — but the time needed to figure out \u0026ldquo;where to focus attention\u0026rdquo; is substantially reduced. Uploading multiple contracts to compare terms side by side is also possible.\n9. Competitive Analysis Matrix Upload competitor IR materials, news articles, and industry reports, then ask \u0026ldquo;create a comparison matrix of competitors A, B, and C by revenue, core products, and market share.\u0026rdquo; The result is a structured comparison table.\nUseful for business planning and strategy meeting preparation. Uploading English-language competitor materials and receiving analysis in your preferred language eliminates translation overhead. Update sources quarterly and it doubles as a dashboard tracking competitive landscape changes.\n10. 
Resume and Cover Letter Writing Upload a job listing alongside your career history as sources, then ask \u0026ldquo;draft a cover letter tailored to this listing.\u0026rdquo; Because it is source-based, NotebookLM will not fabricate experience — it reframes your actual history to match the posting\u0026rsquo;s requirements.\nUpload multiple job listings simultaneously and ask \u0026ldquo;what competencies do positions A and B both require?\u0026rdquo; for cross-analysis. Useful for identifying the overlap between existing experience and a new field when planning a career transition.\n11. Blog and Content Planning For content creators, NotebookLM acts as a research assistant. Load reference materials, competitive content, and keyword research results, then ask \u0026ldquo;suggest an outline for a blog post on this topic.\u0026rdquo; You get a structured outline grounded in your collected sources.\nThe critical distinction from ChatGPT is that content planning comes from the specific materials you gathered, not generic knowledge — which means more differentiated perspectives. Add SEO analysis data and you can structure content around search intent.\n12. Project Documentation Collect a project\u0026rsquo;s specifications, meeting notes, technical documents, and email threads as sources to create an AI that understands the full project context. Ask \u0026ldquo;summarize the major milestones and current status of this project\u0026rdquo; to get a unified view of scattered information.\nUseful for generating handover documents during team transitions, organizing retrospective materials, and producing stakeholder-ready summaries. Development teams can load PRDs, technical specs, and API docs to keep project context centralized in one place.\nFree vs Pro Why Free Is Often Enough NotebookLM\u0026rsquo;s free plan is remarkably generous. The core features — source-based Q\u0026amp;A, audio overview, and note generation — are all available for free. 
The 50-source limit is sufficient for handling one project or topic. Individual users and small teams can execute all 12 applications above on the free plan.\nWhat Pro Adds Included in Google One AI Premium or Workspace subscriptions:\nFeature Free Pro Sources per notebook 50 300 Audio overview Basic Custom instructions Deep Research Limited Extended use Response quality Gemini base Gemini advanced Team sharing Limited Team collaboration Real Scenario: Building a 2026 Economic Outlook Report The flow demonstrated in the video:\nAuto-collect sources on \u0026ldquo;2026 economic outlook\u0026rdquo; via Deep Research Filter to only high-reliability sources (central bank, KDI, major investment banks) Query: \u0026ldquo;compare and analyze the outlook by key economic indicator\u0026rdquo; Request sector-specific analysis (semiconductors, automotive, biotech) Generate audio overview in podcast format Request final report draft The entire process completes in 2-3 hours. The same work done manually would take 2-3 days.\nA Developer\u0026rsquo;s Perspective on NotebookLM Comparison with RAG: Same Effect Without Chunking/Embedding To accurately appreciate NotebookLM\u0026rsquo;s value from a developer\u0026rsquo;s perspective, it helps to have built a RAG pipeline from scratch. A standard RAG implementation requires:\nDocument loading and preprocessing Text chunking (with overlap) Embedding model for vector conversion Vector DB storage (Pinecone, Chroma, etc.) Similarity search at query time Combine retrieved chunks + original query → LLM prompt Generate and post-process answer NotebookLM compresses these 7 steps into 2: upload source → ask question. No thinking required about chunking strategy, embedding model selection, or vector DB operation. Google runs an optimized pipeline internally.\nNotebookLM Has Democratized RAG for Non-Developers NotebookLM\u0026rsquo;s real innovation is not technical brilliance — it is accessibility. 
Marketers, planners, researchers, and others without coding skills can now build RAG-quality AI from their own data. Previously, creating \u0026ldquo;a chatbot trained on our company\u0026rsquo;s documents\u0026rdquo; required a request to the engineering team. Now anyone can do it in five minutes.\nThis parallels how spreadsheets democratized data analysis. Making expert-level tasks accessible to everyone — that is the position NotebookLM occupies in the AI ecosystem.\nPotential as a Project Documentation Hub An interesting scenario for development teams is using NotebookLM as a project documentation hub. Load PRDs, technical specs, API docs, and Architecture Decision Records (ADRs) as sources. When a new team member asks \u0026ldquo;why did we choose Redis for this project?\u0026rdquo;, it cites the relevant ADR in its answer.\nThere are limits, though. NotebookLM is not well-suited for source-loading raw code, and it does not sync in real time. The potential as a document-based project context management tool is real, but it plays a different role from codebase analysis tools like Cursor or Claude Code.\nTakeaways The biggest shift I notice using NotebookLM is that the model of interaction with AI itself changes. We are moving from an era of studying \u0026ldquo;how to ask good questions\u0026rdquo; to one where \u0026ldquo;how to curate good data\u0026rdquo; is the core skill.\nThis shift has an important implication. Prompt engineering has a high barrier to entry — writing good prompts requires understanding how LLMs work. Data curation, by contrast, connects directly to the existing expertise of domain specialists. Accountants know which financial documents matter. Researchers know which papers are essential. NotebookLM converts that domain knowledge directly into AI productivity.\nOne more thing worth noting from a developer\u0026rsquo;s perspective: what NotebookLM demonstrates is not the final form of RAG, but a starting point. 
Today you upload sources manually. The evolution toward real-time data source integration, API-based automatic updates, and team-level knowledge graph construction is plausible. Google is positioning NotebookLM as a core touchpoint in the Gemini ecosystem, so it is worth tracking how this tool continues to evolve.\nSource: 직장인이라면 지금 당장 써야 할 무료 AI | 노트북LM 실전 활용법 12가지 (최신 가이드) — 오빠두엑셀\n","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-notebooklm-guide/cover-en.jpg","permalink":"/posts/2026-04-01-notebooklm-guide/","title":"12 Practical Uses for NotebookLM — A Complete Guide to Your Free AI Research Assistant"},{"content":"In the last week of March 2026, the open source ecosystem was hit by a cascade of supply chain attacks. axios — with weekly downloads over 100 million — was compromised on npm. LiteLLM — with 97 million monthly downloads — was breached on PyPI. And Claude Code\u0026rsquo;s source code was exposed through npm .map files. This post covers the technical details of each incident, the common patterns they share, and what you can do about it.\n1. The axios Supply Chain Attack (2026-03-31) How It Happened The npm account of axios lead maintainer jasonsaayman was taken over. The attacker changed the account email to a ProtonMail address (ifstap@proton.me), then used a long-lived npm token to bypass GitHub Actions CI/CD entirely and publish directly via the npm CLI.\nBoth release branches (1.x and 0.x) were compromised within 39 minutes:\nInfected version Safe version axios@1.14.1 axios@1.14.0 axios@0.30.4 axios@0.30.3 The malicious dependency plain-crypto-js@4.2.1 had been pre-staged on npm 18 hours before the attack under account nrwise (nrwise@proton.me). Pre-built payloads for three operating systems made this a highly premeditated operation.\nWhat the Malware Did The infected axios versions inject a fake dependency called plain-crypto-js@4.2.1. 
This package is not imported anywhere in the axios source — its sole purpose is to deploy a cross-platform RAT (Remote Access Trojan) via the postinstall script.\nPlatform-Specific Payloads OS Behavior Artifact file macOS Downloads trojan from C2 via AppleScript /Library/Caches/com.apple.act.mond Windows Drops executable in ProgramData %PROGRAMDATA%\\wt.exe Linux Executes Python script /tmp/ld.py Self-Concealment Mechanism After execution, the malware deletes itself and replaces package.json with a clean version pre-prepared as package.md, evading forensic detection. Even opening node_modules after infection would show everything as normal.\nAttack Flow flowchart TD A[\"npm account takeover\u0026lt;br/\u0026gt;jasonsaayman\"] --\u003e B[\"Email changed\u0026lt;br/\u0026gt;ifstap@proton.me\"] B --\u003e C[\"Long-lived npm token\u0026lt;br/\u0026gt;bypasses CI/CD\"] C --\u003e D[\"plain-crypto-js@4.2.1\u0026lt;br/\u0026gt;pre-staged 18 hours earlier\"] C --\u003e E[\"Publish axios@1.14.1\"] C --\u003e F[\"Publish axios@0.30.4\"] E --\u003e G[\"npm install runs\"] F --\u003e G G --\u003e H[\"postinstall script executes\"] H --\u003e I[\"Platform detection\"] I --\u003e J[\"macOS: AppleScript RAT\"] I --\u003e K[\"Windows: drop wt.exe\"] I --\u003e L[\"Linux: run ld.py\"] J --\u003e M[\"C2 communication\u0026lt;br/\u0026gt;sfrclak.com:8000\"] K --\u003e M L --\u003e M M --\u003e N[\"Replace with package.md\u0026lt;br/\u0026gt;self-concealment\"] style A fill:#ff6b6b,color:#fff style M fill:#ff6b6b,color:#fff style N fill:#ffa94d,color:#fffIndicators of Compromise (IOC) Item Value C2 domain sfrclak.com C2 IP 142.11.206.73 C2 port 8000 Malicious npm account nrwise (nrwise@proton.me) Malicious package plain-crypto-js@4.2.1 Compromised account email ifstap@proton.me Additional Infected Packages Additional packages distributing the same malware were identified:\n@shadanai/openclaw (versions 2026.3.28-2, 2026.3.28-3, 2026.3.31-1, 2026.3.31-2) @qqbrowser/openclaw-qbot@0.0.130 
(contains a tampered axios@1.14.1 in node_modules) Incident Response Timeline The situation was shared in real time in GitHub issue axios/axios#10604. Collaborator DigitalBrainJS was unable to act directly because jasonsaayman held higher permissions. The situation was only resolved after requesting the npm team to revoke all tokens.\n2. The LiteLLM Supply Chain Attack (2026-03-24) Background: The TeamPCP Campaign This incident was part of a chained supply chain attack campaign by the TeamPCP hacking group, which started with security scanner Trivy.\nDate Target 2026-02-28 Initial Trivy repository compromise 2026-03-19 76 Trivy GitHub Actions tags tampered 2026-03-20 28+ npm packages taken over 2026-03-21 Checkmarx KICS GitHub Action compromised 2026-03-24 LiteLLM PyPI package compromised LiteLLM was using Trivy in CI/CD security scanning without version pinning. When the tampered Trivy ran, PyPI publish tokens were transferred to the attacker.\nAttack Method The attacker uploaded litellm v1.82.7 (10:39 UTC) and v1.82.8 (10:52 UTC) directly using the stolen PyPI token.\nThe core attack vector was a .pth file. Python\u0026rsquo;s .pth files, when placed in site-packages, execute automatically when the Python interpreter starts — meaning any Python execution in that environment triggers the malicious code, even without import litellm.\n# litellm_init.pth (34,628 bytes) — one-liner import os, subprocess, sys; subprocess.Popen([sys.executable, \u0026#34;-c\u0026#34;, \u0026#34;import base64; exec(base64.b64decode(\u0026#39;...\u0026#39;))\u0026#34;]) The decoded payload was a 332-line credential harvesting script that collected:\nSSH keys (RSA, Ed25519, ECDSA, DSA, and all other types) Cloud credentials — AWS/GCP/Azure (including instance metadata) Kubernetes service account tokens and secrets PostgreSQL, MySQL, Redis, MongoDB config files Cryptocurrency wallets — Bitcoin, Ethereum, Solana, and others Shell history — .bash_history, .zsh_history, etc. 
Collected data was double-encrypted (AES-256-CBC + RSA-4096) and sent to https://models.litellm.cloud/ — a typosquatting domain registered one day before the attack.\nScale of Impact Monthly downloads: ~97 million (~3.4 million/day) PyPI exposure window: ~3 hours Cloud environment prevalence: ~36% (Wiz Research analysis) Affected downstream projects: DSPy (Stanford), CrewAI, Google ADK, browser-use, and others How It Was Caught Ironically, the attacker\u0026rsquo;s own bug triggered the discovery. The .pth file spawned a child process on every Python startup, and each child would also re-execute the .pth, creating a fork bomb — memory would rapidly exhaust. FutureSearch.ai\u0026rsquo;s Callum McMahon noticed the anomaly and filed an issue, but the attacker deployed a botnet of 73 accounts to flood the issue with 88 spam comments in 102 seconds trying to bury it.\nAndrej Karpathy called this incident \u0026ldquo;software horror.\u0026rdquo;\nHow to Check for LiteLLM Infection # Check installed version — 1.82.7 or 1.82.8 means infected pip show litellm | grep Version # Check for .pth file find / -name \u0026#34;litellm_init.pth\u0026#34; 2\u0026gt;/dev/null # Check for backdoor ls ~/.config/sysmon/sysmon.py 2\u0026gt;/dev/null ls ~/.config/systemd/user/sysmon.service 2\u0026gt;/dev/null # Kubernetes environment kubectl get pods -n kube-system | grep node-setup 3. Claude Code Source Code Exposure Around the same time, another npm security incident was reported. The source code of Anthropic\u0026rsquo;s Claude Code CLI was found to be fully recoverable through .map files (source maps) included in the npm package.\nThis was not a malicious attack, but it illustrates that including .map files in an npm package publish exposes the original source behind any obfuscated or bundled code. It is a reminder of the importance of configuring .npmignore or the files field properly.\n4. 
Common Lessons and Defenses The Pattern Across All Three Incidents All three incidents abused trust in package registries (npm/PyPI):\nPattern axios LiteLLM Claude Code Attack vector npm account takeover PyPI token theft (via Trivy) Source map not excluded Registry npm PyPI npm CI/CD bypass Direct publish Direct publish N/A Malicious behavior postinstall RAT .pth auto-execution Source exposure Concealment attempt Replace with package.md Botnet spam None Immediate Response Checklist npm (axios) # 1. Check for infected version npm ls axios # 2. Pin to safe version npm install axios@1.14.0 # 3. Commit lockfile git add package-lock.json \u0026amp;\u0026amp; git commit -m \u0026#34;fix: pin axios to safe version\u0026#34; # 4. Security audit npm audit # 5. Check for IOC network connections # Look for outbound connections to sfrclak.com or 142.11.206.73 PyPI (LiteLLM) # 1. Pin to safe version pip install \u0026#34;litellm\u0026lt;=1.82.6\u0026#34; # 2. If infected, rotate all secrets # SSH keys, AWS/GCP/Azure credentials, DB passwords, API keys — replace everything Long-Term Defenses Pin versions: Use exact versions instead of ^ or ~. Always commit lockfiles. Block postinstall scripts: Consider npm install --ignore-scripts in CI/CD. Require MFA: Enable TOTP-based 2FA on npm/PyPI maintainer accounts. Manage token lifetimes: Use OIDC-based short-lived tokens instead of long-lived ones. Rotate regularly. Pin CI/CD tool versions: LiteLLM\u0026rsquo;s unversioned Trivy use was the root cause. Security scanners are not exempt. Remove source maps: Audit whether .map files are included in production npm packages. Monitor dependencies: Continuously watch your supply chain with Socket, Snyk, or npm audit. Takeaways The simultaneous npm and PyPI incidents in a single week reveal some uncomfortable truths.\nFirst, maintainers are a single point of failure. 
In the axios case, one compromised account infected two release branches in 39 minutes, and other collaborators lacked the permissions to do anything. In a world where OIDC-based publishing is not yet widely adopted, long-lived tokens are ticking time bombs.\nSecond, security tools themselves become attack vectors. In the LiteLLM incident, security scanner Trivy became the entry point for the attack. Installing tools without version pins — like apt-get install -y trivy — is trading convenience for security.\nThird, attackers are becoming more sophisticated. Pre-staging payloads 18 hours ahead, self-concealment mechanisms, deploying a 73-account botnet to bury GitHub issues, using AI agents for vulnerability scanning — supply chain attacks are industrializing.\nThe bottom line: when you run npm install or pip install, you are extending trust to thousands of maintainers. Basic hygiene measures — committing lockfiles, pinning versions, --ignore-scripts, and rotating tokens — have never mattered more.\n","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-npm-supply-chain-attacks/cover-en.jpg","permalink":"/posts/2026-04-01-npm-supply-chain-attacks/","title":"2026 npm Supply Chain Attacks — axios, LiteLLM, and the Lessons Learned"},{"content":"Claude Code is powerful on its own, but connecting MCP servers unlocks browser control, web crawling, live documentation lookups, and database manipulation. This post covers four MCP servers you can put to use in real work immediately — including installation and ready-to-use prompts. Reference: AI Usability Research Lab video \u0026ldquo;4 MCP Servers That Claude Code Power Users Already Use | EP.02.\u0026rdquo;\nWhat Is MCP — The Smartphone Analogy The easiest way to understand MCP (Model Context Protocol) is the smartphone analogy.\nThink back to when you first got a smartphone. 
The hardware was great, but before installing apps, all it could do was make calls and send texts — no messaging apps, no maps, no video. Installing apps is what makes a smartphone actually \u0026ldquo;smart.\u0026rdquo;\nClaude Code works the same way.\nAnalogy Reality Smartphone Claude Code Installing apps from the app store Connecting MCP servers USB-C cable (standard connector) MCP (standard protocol) Apps send notifications automatically Claude auto-selects the right MCP from context MCP is the standard protocol that connects external tools to Claude Code. Like a USB-C cable, one protocol lets you connect hundreds of different tools.\nHow It Works flowchart LR A[\"User\u0026lt;br/\u0026gt;enters prompt\"] --\u003e B[\"Claude Code\u0026lt;br/\u0026gt;reads context\"] B --\u003e C[\"MCP server\u0026lt;br/\u0026gt;auto-selected\"] C --\u003e D[\"External tool\u0026lt;br/\u0026gt;executed\"] D --\u003e E[\"Results organized\u0026lt;br/\u0026gt;returned to user\"]The key is that users don\u0026rsquo;t need to explicitly call MCP. Just as a messaging app auto-notifies you when a message arrives after you install it, once MCP is installed, Claude figures out \u0026ldquo;this task needs a browser\u0026rdquo; and uses Playwright on its own.\nThat said, if you want to force a specific MCP, stating it explicitly in the prompt is more reliable.\nRestart Required After MCP Installation After installing any MCP, you must restart Claude Code. Run /exit, then launch claude again.\n1. Playwright MCP — Hands and Feet for Controlling the Browser What It Does Playwright MCP enables Claude to open a browser, click, and type directly. 
Visiting sites, clicking buttons, filling forms, taking screenshots — everything you normally do in a browser, Claude can do for you.\nInstallation Install via natural language inside Claude Code:\nInstall the Playwright MCP Or directly from the terminal:\nclaude mcp add playwright -- npx @playwright/mcp@latest Use Cases QA testing: Claude opens your website in a real browser and tests it Data collection: Search for restaurants on a map and organize results in a spreadsheet API key setup assistance: Opens a service\u0026rsquo;s website and guides you through getting an API key Visual validation: Takes screenshots and judges whether the layout looks correct Real-World Prompts Using the Playwright MCP, search \u0026#34;Gangnam station restaurants\u0026#34; on Naver Maps and organize the top 10 highest-rated places into a Google Sheet. Fields: name, rating, number of reviews, address Open my website at http://localhost:3000 using Playwright and QA test that all page links work correctly. If any links are broken, list them. Playwright navigates one page at a time with full interaction support — it\u0026rsquo;s best suited for precise interactions, not bulk crawling.\n2. Firecrawl MCP — The Ultimate Tool for Large-Scale Web Crawling What It Does While Playwright clicks through pages one by one, Firecrawl crawls an entire website at once. It converts scraped content into clean structured formats like Markdown or JSON, and includes built-in AI-powered analysis.\nInstallation Firecrawl requires an API key. The free tier gives you roughly 2,000 crawls.\n# Get an API key: sign up at https://firecrawl.dev claude mcp add firecrawl -e FIRECRAWL_API_KEY=YOUR_API_KEY -- npx -y firecrawl-mcp Or inside Claude Code:\nInstall the Firecrawl MCP If getting the API key is tricky, you can delegate the task to Playwright MCP:\nUse Playwright to go to firecrawl.dev and walk me through getting an API key. Go ahead and handle it yourself. Playwright vs. 
Firecrawl Playwright Firecrawl Approach Direct page-by-page control Bulk crawl of entire sites Speed Slow (includes interaction) Fast (optimized for volume) Output Screenshots, DOM access Markdown, JSON, CSV Best for QA testing, form input Data collection, competitive analysis Cost Free Free tier (2,000 crawls) Real-World Prompts Use the Firecrawl MCP to collect the 10 most recent articles from the toss.tech blog. Fields: title, author, category, summary, URL Sort by newest first and output as a CSV file. Crawl the Musinsa ranking page (https://www.musinsa.com/ranking) for ranks 1 through 50. Fields: rank, brand, product name, discount rate, sale price, product URL Organize into an Excel file and include image URLs. A deeper analysis of Firecrawl is covered in a separate post: Firecrawl — The Definitive Web Scraping Tool for the AI Era.\n3. Context7 MCP — Real-Time Access to the Latest Official Documentation What It Does Sometimes when you ask AI to write code, it invents functions that don\u0026rsquo;t exist. That\u0026rsquo;s because AI training data has an expiration date. For example, Claude might write Next.js 13 syntax when you\u0026rsquo;re on version 15.\nContext7 MCP solves this problem at the root. When you enter a prompt, it fetches the current live official documentation for the relevant library and shows it to Claude — making Claude write code based on actual documentation, not stale training data.\nInstallation Free, no API key required.\nclaude mcp add context7 -- npx -y @upstash/context7-mcp Or:\nInstall the Context7 MCP Real-World Prompts Create a server component for a blog list page using Next.js App Router. use context7 Write code to connect to PostgreSQL with Prisma ORM. use context7 with the latest docs Implement dark mode using Tailwind CSS v4\u0026#39;s new configuration approach. 
use context7 Add \u0026ldquo;use Context7 MCP and reference the latest docs\u0026rdquo; to your CLAUDE.md and you won\u0026rsquo;t need to specify it every time — Claude will automatically consult live documentation.\n4. Supabase MCP — Control Your Database with Natural Language What It Does Supabase MCP lets Claude directly manipulate a database. Table creation, data insertion, query execution, schema changes — even without knowing SQL, you can work with the database in plain language.\nInstallation You\u0026rsquo;ll need your Supabase project connection details.\nclaude mcp add supabase -- npx @supabase/mcp-server \\ --supabase-url https://YOUR_PROJECT.supabase.co \\ --supabase-key YOUR_SERVICE_ROLE_KEY Or:\nInstall the Supabase MCP. My project URL is https://xxx.supabase.co and my service role key is eyJ... Use Cases Table design: \u0026ldquo;Create users, orders, and products tables with relationships.\u0026rdquo; Data migration: Bulk insert CSV data into a Supabase table RLS policy setup: Configure Row Level Security in plain language Crawl → save to DB: Store Firecrawl-collected data directly in the database Real-World Prompts Create tables for a blog system in Supabase. - posts: id, title, content, author_id, created_at, published - comments: id, post_id, user_id, body, created_at - users: id, email, display_name, avatar_url Set up the foreign key relationships and add RLS policies. Bulk insert the products.csv I crawled into the Supabase products table. Skip duplicate product names and only add new entries. The Real Power: Combining MCP Servers The true value of MCP shows up when you combine multiple servers. 
Connect several MCPs to Claude Code, and Claude automatically selects the right one for each situation.\nflowchart TD A[\"User request\u0026lt;br/\u0026gt;Analyze competitor courses\"] --\u003e B[\"Claude Code\u0026lt;br/\u0026gt;breaks down the task\"] B --\u003e C[\"Firecrawl MCP\u0026lt;br/\u0026gt;crawl 5 platforms\"] B --\u003e D[\"Context7 MCP\u0026lt;br/\u0026gt;reference analysis framework\"] C --\u003e E[\"Collected data\u0026lt;br/\u0026gt;consolidated\"] D --\u003e E E --\u003e F[\"Supabase MCP\u0026lt;br/\u0026gt;save to DB\"] E --\u003e G[\"Analysis report\u0026lt;br/\u0026gt;Excel output\"]Combination Example: Competitor Course Analysis Step 1 — Data collection (Firecrawl): Crawl the following education platforms for courses related to \u0026#34;Claude Code.\u0026#34; - Inflearn, FastCampus, Class101, Coloso, LearningSpoons Fields: course title, instructor, price, enrollment count, review count, rating, URL Step 2 — Analysis: Based on the collected data, run a step-by-step analysis: - Strengths/weaknesses comparison - Price-tier positioning - Gaps in the market Final output: Excel file. Recommended MCP Setup Summary flowchart LR subgraph FREE[\"Free\"] P[\"Playwright\u0026lt;br/\u0026gt;Browser control\"] C7[\"Context7\u0026lt;br/\u0026gt;Live docs reference\"] end subgraph FREEMIUM[\"Free tier\"] FC[\"Firecrawl\u0026lt;br/\u0026gt;Web crawling\"] SB[\"Supabase\u0026lt;br/\u0026gt;DB control\"] end P --\u003e |\"QA, form input\"| USE[\"Real-world use\"] C7 --\u003e |\"Coding with latest API\"| USE FC --\u003e |\"Bulk data collection\"| USE SB --\u003e |\"Data storage/query\"| USE MCP Use Cost API Key Playwright Browser control, QA Free Not required Firecrawl Web crawling, data collection Free (2,000 crawls) Required Context7 Live official docs reference Free Not required Supabase Database manipulation Free tier Required Insight MCP is Claude Code\u0026rsquo;s app ecosystem. 
Just as a smartphone without apps is just a phone, Claude Code without MCP is just a text generator. Connect MCP and Claude becomes a true agent — controlling browsers, crawling the web, reading current documentation, and manipulating databases.\nWhat\u0026rsquo;s especially impressive is the low barrier to entry. \u0026ldquo;Install the Playwright MCP\u0026rdquo; is all it takes. And once installed, Claude auto-selects the appropriate MCP from context without you having to invoke it explicitly. Non-developers can do browser automation, web crawling, and database manipulation through natural language alone.\nOne practical tip: if you want to force a specific MCP, naming it in the prompt is the reliable approach. \u0026ldquo;Using Playwright,\u0026rdquo; \u0026ldquo;with Firecrawl,\u0026rdquo; \u0026ldquo;use context7\u0026rdquo; — explicitly naming the tool ensures the intended MCP gets invoked.\nAs more MCPs join the ecosystem — Notion MCP, Figma MCP, Linear MCP, and beyond — Claude Code will evolve from a simple coding tool into a general-purpose work automation platform.\n","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-claude-code-mcp-4/cover-en.jpg","permalink":"/posts/2026-04-01-claude-code-mcp-4/","title":"4 Essential MCP Servers for Claude Code — Playwright to Firecrawl"},{"content":"Through 2024 and 2025, building an AI agent meant making choices from scratch: which framework to use, how to set up the RAG pipeline, how to manage state. In 2026, \u0026ldquo;batteries-included\u0026rdquo; SDKs like the Claude Agent SDK and Codex SDK have arrived and shifted the starting point entirely. This post analyzes the Old vs New paradigm shift in agent architecture and how RAG\u0026rsquo;s role is changing. Related posts: Excalidraw Diagram Skill, NotebookLM Practical Guide\n1. 
Old vs New: The Paradigm Shift in Agent Architecture The Old Way (2024–2025) Traditional agent development followed this flow:\nChoose a framework — Pick one from LangChain, LangGraph, Pydantic AI, N8N, etc. Define tools — Implement agent capabilities (filesystem access, email retrieval, etc.) from scratch Set up RAG — Design chunking, embedding, and retrieval strategies; wire up a vector DB Build the agent loop — Hand-wire state management, conversation history storage, and a memory system The core problem with this approach: too much glue code. DB table design, session management, ingestion pipelines — infrastructure code unrelated to the agent\u0026rsquo;s actual intelligence occupied a significant portion of the codebase.\nThe New Way: SDK-First Building on the Claude Agent SDK or Codex SDK changes everything:\nConversation history management is built into the SDK — no separate DB needed File search tools (Grep, Read, etc.) are already included — no RAG needed for small knowledge bases Skills and MCP servers let you add tools in a reusable form Sub-agents, Hooks, and permission settings are all declared in a single TypeScript/Python file In practice, the Claude Agent SDK lets you implement more features with less code. 
Systems like Second Brain — memory building, daily reflection, integrated management — all run on top of one SDK.\nArchitecture Comparison flowchart LR subgraph OLD[\"Old Way (Framework)\"] direction TB A1[\"Choose framework\u0026lt;br/\u0026gt;LangChain / Pydantic AI\"] --\u003e A2[\"Define tools manually\u0026lt;br/\u0026gt;Tool functions\"] A2 --\u003e A3[\"RAG pipeline\u0026lt;br/\u0026gt;Chunking + Embedding + VectorDB\"] A3 --\u003e A4[\"Agent loop\u0026lt;br/\u0026gt;State / Memory / DB\"] end subgraph NEW[\"New Way (SDK-First)\"] direction TB B1[\"Claude Agent SDK\u0026lt;br/\u0026gt;Codex SDK\"] --\u003e B2[\"Skills + MCP servers\u0026lt;br/\u0026gt;Reusable tools\"] B2 --\u003e B3[\"Built-in file search\u0026lt;br/\u0026gt;Grep / Read / Glob\"] B3 --\u003e B4[\"Sub-agents + Hooks\u0026lt;br/\u0026gt;Declarative configuration\"] end OLD -- \"Transition\" --\u003e NEWWhen Do You Still Need a Framework? SDK-First isn\u0026rsquo;t a universal answer. Three clear limitations remain:\nCriterion SDK (Claude Agent SDK, etc.) Framework (Pydantic AI, etc.) Speed Inference overhead makes it slow (10s+) Sub-second response possible Cost Heavy token use; API costs explode with many users Direct control enables cost optimization Control Limited visibility into conversation history and observability Full control over everything Two questions to guide the decision:\nWho\u0026rsquo;s using it? — If it\u0026rsquo;s just you, SDK. If it\u0026rsquo;s many users in production, framework. What are the speed/scale requirements? — If latency is acceptable, SDK. If fast response is essential, framework. In practice, the most realistic pattern is prototyping with an SDK, then porting proven workflows to a framework. Skills and MCP servers are reusable on both sides, so migration cost is low.\nIs RAG Dead? 
The short answer: no — but its role has changed.\nSmall code/doc bases: File search (Grep) has been shown to outperform semantic search (LlamaIndex research) Large knowledge bases: Vector DB-based RAG is still necessary — searching thousands of documents with Grep isn\u0026rsquo;t realistic Where Skills replace RAG: For code context tasks, skill.md replaces chunking + embedding. The agent loads a skill when needed — that\u0026rsquo;s enough The key isn\u0026rsquo;t \u0026ldquo;RAG or not\u0026rdquo; — it\u0026rsquo;s choosing the search strategy that fits the scale and access pattern of your knowledge.\nInsight The essence of agent development is changing. The question has shifted from \u0026ldquo;which framework should I use?\u0026rdquo; to \u0026ldquo;what Skills should I give my agent?\u0026rdquo; As SDKs abstract away the infrastructure, developers can spend their time on designing the agent\u0026rsquo;s capabilities instead of glue code.\nDeclarative tool composition wins — Skills and MCP servers both work by declaring \u0026ldquo;here\u0026rsquo;s what I can do.\u0026rdquo; We\u0026rsquo;re moving away from procedurally coding agent loops. SDK for prototyping, framework for production — This pattern is the most realistic approach. Since Skills and MCP are reusable on both sides, migration cost stays low. RAG isn\u0026rsquo;t disappearing — it\u0026rsquo;s democratizing — Developers replace RAG with file search and Skills; non-developers get the same effect without code using NotebookLM. 
Practical applications of this topic are covered in separate posts:\nExcalidraw Diagram Skill — Visual Reasoning for Coding Agents 12 Ways to Use NotebookLM in Practice Reference video:\nEverything You Thought About Building AI Agents is Wrong — Cole Medin ","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-ai-agent-paradigm/cover-en.jpg","permalink":"/posts/2026-04-01-ai-agent-paradigm/","title":"A New Paradigm for Building AI Agents — From Frameworks to SDK-First"},{"content":"Overview Previous posts in this series covered Observability vs Monitoring and Honeycomb and Observability Fundamentals. This post takes it a step further. With Honeycomb\u0026rsquo;s MCP (Model Context Protocol) Server now GA, there is a new workflow for connecting observability data directly to AI tools. Add Canvas (the in-app AI assistant) and IDE integration, and the loop from \u0026ldquo;spot an anomaly\u0026rdquo; to \u0026ldquo;fix the code\u0026rdquo; becomes a single continuous flow.\nHoneycomb MCP Server GA — Bridging AI and Observability Honeycomb\u0026rsquo;s MCP Server has reached General Availability. Austin Parker (Honeycomb MCP product lead) described the core concept simply: bring observability data to where your AI tools live.\nMCP (Model Context Protocol) is the standard protocol for AI agents to communicate with external tools. 
Once you configure the Honeycomb MCP Server, you can access production data directly from Claude Desktop, Cursor, VS Code Copilot, Claude Code, and other AI environments.\nKey capabilities the MCP Server provides:\nEnvironment information: Service maps, dataset details, environment overview Query execution: Generate and run Honeycomb queries from natural language SLO monitoring: Check SLO status and view Boards Trace exploration: Inspect detailed trace waterfalls by trace ID OpenTelemetry guidance: Access up-to-date instrumentation information Canvas: The In-App AI Assistant Canvas is an AI assistant built directly into Honeycomb. It lets you explore observability data through a conversational interface.\nHow It Works Open Canvas and ask a question in natural language: \u0026ldquo;How is our app responding?\u0026rdquo; Canvas identifies the relevant environment and service It automatically generates and executes the necessary queries Result graphs appear side by side with the conversation The AI narrates what the data shows — for example, \u0026ldquo;latency is trending up\u0026rdquo; A principle Honeycomb has held since its founding in 2016 shines here: any query, against any attribute, must execute fast. AI can fire dozens of queries per minute in sequence, and Honeycomb\u0026rsquo;s fast query engine backs that up.\nThe Importance of Validating AI Results Every graph Canvas provides is clickable. You can verify that the query is correct, and drill down into trace waterfalls to inspect the raw data. The key is not blindly trusting the AI\u0026rsquo;s conclusions but validating the reasoning behind them.\nIDE Integration: VS Code/Cursor + Copilot The real strength of Honeycomb MCP is having production data and code visible on the same screen inside your IDE.\nCustom Slash Commands With the MCP Server configured, Honeycomb-specific slash commands become available in your IDE:\n/otel-analysis: Analyzes the OpenTelemetry instrumentation state of your code. 
The AI references the latest information via MCP rather than stale training data. /otel-instrumentation: Provides instrumentation guidance — which spans to add, which attributes are useful. The core value of these slash commands is information freshness. The OpenTelemetry knowledge baked into AI models becomes outdated over time. MCP provides a path to always reference the latest documentation and best practices.\nDemo: New Team Member Onboarding Scenario The most compelling part of the MCP Server GA announcement demo was the onboarding scenario.\nAfter connecting Honeycomb MCP to Claude Desktop, the request was: \u0026ldquo;Create an interactive artifact that would help a developer on their first day understand the system.\u0026rdquo; The result:\nDataflow Architecture: An interactive diagram visualizing data flow between systems Critical SLOs: A list of key SLOs with their current status Key Board links: Direct links to monitoring dashboards Trace/Query shortcuts: One-click navigation to actual traces and queries All of this is generated automatically by combining MCP tools: get_environment_details, get_service_map, get_slos, get_boards, run_query. The gap between this and manually writing a wiki page and attaching screenshots is enormous.\nReal Debugging Flow: From Canvas to Code Fix The most practical scenario is the end-to-end debugging flow. 
Here is the actual flow shown in the demo:\nflowchart LR A[\"Ask Canvas\u0026lt;br/\u0026gt;How is our app responding?\"] --\u003e B[\"Auto-run queries\u0026lt;br/\u0026gt;latency anomaly found\"] B --\u003e C[\"Drill into trace\u0026lt;br/\u0026gt;checkout service delay\"] C --\u003e D[\"Pinpoint root cause\u0026lt;br/\u0026gt;get_discounts N+1 query\"] D --\u003e E[\"MCP in IDE\u0026lt;br/\u0026gt;auto-locate code\"] E --\u003e F[\"Suggest fix\u0026lt;br/\u0026gt;convert to batch query\"]Step by Step Step 1 — Detect anomaly in Canvas\nAsking \u0026ldquo;How is our app responding?\u0026rdquo; triggers automatic queries across multiple services for latency and error rates. This reveals abnormally high P99 latency in the checkout service.\nStep 2 — Drill into trace\nCanvas finds the slow trace ID and loads the trace waterfall. Expanding the checkout section shows the get_discounts function consuming most of the time.\nStep 3 — Switch to IDE\nHere Canvas reaches its limit — code changes require an IDE. With the Honeycomb MCP configured in VS Code + Copilot, the request is: \u0026ldquo;Honeycomb shows a checkout latency issue — find the cause in the code.\u0026rdquo;\nStep 4 — MCP executes query\nThe IDE\u0026rsquo;s AI agent runs a query against Honeycomb via MCP. 
It confirms the same latency pattern and identifies an N+1 query pattern in get_discounts from the trace data.\nStep 5 — Locate code and suggest fix\nThe agent finds the get_discounts function in the codebase, identifies the pattern of executing individual DB queries inside a loop, and proposes a specific fix that converts it to a batch query.\nHoneycomb MCP\u0026rsquo;s Efficient Communication Design Honeycomb MCP is designed to maximize token efficiency in communication with AI agents.\nAPI responses typically come back as JSON, but Honeycomb MCP uses a mix of formats depending on the situation:\nFormat Use Case Text Narrative descriptions, context delivery CSV Tabular query results (rows and columns) JSON Structured metadata ASCII Art Trace waterfalls, simple visualizations The purpose of this mixed-format strategy is clear: deliver the necessary information with the fewest tokens possible. Sending CSV data instead of a full graph image uses far fewer tokens and lets the AI read exact numbers accurately.\nIn Canvas (in-app), graphs render automatically. In IDE integration via MCP, query links are provided instead of graphs — click through to Honeycomb UI when you need to view them directly.\nOpenTelemetry Instrumentation Guidance This is where MCP goes beyond simply \u0026ldquo;reading data.\u0026rdquo; Honeycomb uses MCP to deliver OpenTelemetry expertise to AI agents.\nWhat this means in practice:\nWhen the AI suggests which spans to add to your code, it references Honeycomb\u0026rsquo;s latest best practices The otel-instrumentation slash command provides instrumentation guides matched to the language and framework you are using Advice is based on continuously updated guidance, not the AI model\u0026rsquo;s training data This is highly practical given how rapidly OpenTelemetry versions change. 
Instrumenting with outdated information, when the API has changed between SDK versions, creates new problems rather than solving existing ones.\nTakeaways The way we consume observability data is changing. The old model was to open a dashboard, read the graphs, and manually hunt for suspicious traces. Honeycomb Canvas and MCP transform this into \u0026ldquo;ask a question, get an answer.\u0026rdquo;\nIDE integration is a game changer. Having production data and code on the same screen eliminates context switching. As the N+1 debugging demo showed, you can go from spotting an issue in a trace to fixing it in code without leaving your IDE. This was a workflow that Honeycomb\u0026rsquo;s web UI alone could not support.\nThe token-efficient design of the MCP server is impressive. The approach of combining Text + CSV + JSON + ASCII art to convey maximum information with minimum tokens is a pattern worth borrowing for other MCP server implementations. In the AI era, API design must consider not just \u0026ldquo;easy for humans to read\u0026rdquo; but also \u0026ldquo;efficiently consumable by AI.\u0026rdquo;\nThe onboarding scenario is realistic. If a developer can get a complete picture of system architecture, SLO status, and key dashboards in a single prompt, onboarding time drops dramatically. This is an example of observability tools expanding from \u0026ldquo;incident response only\u0026rdquo; to \u0026ldquo;everyday development tool.\u0026rdquo;\nReferences\nIntroducing Honeycomb Intelligence MCP Server - Now GA! 
— Honeycomb official GA announcement AI for Observability: Honeycomb Canvas \u0026amp; MCP — Canvas + MCP debugging demo ","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-honeycomb-mcp/cover-en.jpg","permalink":"/posts/2026-04-01-honeycomb-mcp/","title":"Automating Observability with Honeycomb MCP"},{"content":"Overview Claude Code adoption is growing quickly, but learning resources are concentrated in English official documentation, creating a barrier for Korean-speaking users. Recently, Korean resources like the WikiDocs community guide and WeniVooks\u0026rsquo; Vibe Coding Essential have emerged, changing the picture.\nThis post compares and analyzes the four available Claude Code learning resources and maps out recommended learning paths by experience level. If you\u0026rsquo;ve already read the Claude Code Practical Guide series #1–#5 and the Claude Code Automation Triple Play post, this roadmap will help you fill the remaining gaps.\n1. Official Documentation — code.claude.com/docs The first place to check is Anthropic\u0026rsquo;s official documentation. The flow: Overview to understand what Claude Code is, Quickstart for your first hands-on session, then the Reference docs to dig into specific features.\nWhat\u0026rsquo;s Covered Overview: What Claude Code is, what it can do, installation guides by environment Quickstart: Your first real task — from exploring a codebase to committing a change Core Concepts: How it works, Context Window, permission modes Workflows and Best Practices: CLAUDE.md setup, common patterns Platforms and Integrations: VS Code, JetBrains, Slack, GitHub Actions, etc. Korean Version The official documentation has a Korean version at /docs/ko/. Translation quality is solid and it\u0026rsquo;s updated nearly in sync with the English original. 
If English feels like a barrier, starting with the Korean docs is perfectly reasonable.\nPros and Cons Pros Cons Always up to date Lacks real-world examples Managed by Anthropic directly — most accurate Feature-list heavy; doesn\u0026rsquo;t explain \u0026ldquo;why\u0026rdquo; Korean version available Information overload for beginners Free No community discussion or Q\u0026amp;A Best for: Your first stop when a new feature drops. Less useful for learning from scratch — more useful for existing users asking \u0026ldquo;how exactly does this work?\u0026rdquo;\n2. Anthropic Skilljar — Claude Code in Action Claude Code in Action is a free online course Anthropic offers on the Skilljar platform. It starts from the fundamental question — \u0026ldquo;What is a coding assistant?\u0026rdquo; — and progresses step by step through live demos.\nCourse Highlights Free: All content available with just an account Structured: Concept → demo → hands-on, in that order Official curriculum: Designed directly by Anthropic Progress tracking: Skilljar LMS tracks your completion Pros and Cons Pros Cons Free, official training material English only Structured curriculum Stays at an introductory level Interactive learning experience Doesn\u0026rsquo;t cover advanced topics (Skills, MCP) Certificate available Updates slower than the docs Best for: Someone encountering Claude Code for the first time who needs to understand \u0026ldquo;what this is and why it matters.\u0026rdquo; If you\u0026rsquo;re comfortable with English, take this course before diving into the docs.\n3. WikiDocs Claude Code Guide The WikiDocs Claude Code Guide is a practice-oriented guide created by the Korean community. 
It includes practical chapters on Skills development and MCP server integration that the official docs don\u0026rsquo;t cover in depth — making it especially valuable for intermediate and advanced users.\nKey Topics Claude Code installation and initial configuration Skills development: Writing, testing, and deploying custom skills MCP server integration: Connecting external tools CLAUDE.md strategies for different project types Real-world troubleshooting cases Companion Beginner\u0026rsquo;s Guide WikiDocs also has a Claude Code Beginner\u0026rsquo;s Guide. Complete beginners should start with the beginner\u0026rsquo;s guide (19202) before moving to the main guide (19104).\nPros and Cons Pros Cons Korean — no language barrier Community-written, accuracy varies Practice and real-world focused May update slower than official docs Covers advanced topics like Skills and MCP Structure is looser than the official course Free, open access Writing depth varies by contributor Best for: After learning the basics and wanting to go deeper into Skills or MCP integration in Korean. A natural next step after the Practical Guide series.\n4. Vibe Coding Essential with Claude Code (WeniVooks) WeniVooks offers a Claude Code guide aimed at non-developers. 
True to its \u0026ldquo;vibe coding\u0026rdquo; branding, the goal is for people with zero coding experience to build something with Claude Code.\nChapter Structure Chapter Content Audience Ch 0 WeniVooks service intro All Ch 1–2 Claude Code installation, basic usage Beginners Ch 3–4 Hands-on projects (website, automation) Beginner–Intermediate Ch 5 Advanced usage (extensions, customization) Intermediate Pros and Cons Pros Cons Korean, non-developer friendly Some content may be paid Progressive structure: basics → hands-on → advanced May be too shallow for experienced developers Project-based learning Limited advanced topics (MCP, Skills) WeniVooks community support Update cadence uncertain Best for: Someone with no development background who wants to build something with Claude Code. Ideal for PMs, designers, and planners entering AI coding tools.\nComprehensive Comparison Official Docs Skilljar WikiDocs WeniVooks Language English + Korean English Korean Korean Cost Free Free Free Free / partly paid Audience All levels Beginners Intermediate–Advanced Non-dev / Beginner Strength Accuracy, currency Structured education Real-world, advanced topics Non-developer friendly Weakness Lacks real examples Stays basic Accuracy varies Limited depth Covers Skills Yes (Reference) No Yes (practical) Limited Covers MCP Yes (Reference) No Yes (practical) Limited Format Web docs Online course Wiki eBook Recommended Learning Paths Here\u0026rsquo;s how to sequence your learning based on experience level.\nflowchart TD START[\"Start learning\u0026lt;br/\u0026gt;Claude Code\"] START --\u003e Q{\"Do you have\u0026lt;br/\u0026gt;dev experience?\"} Q -- \"No\" --\u003e BEGINNER[\"Beginner Path\"] Q -- \"Yes\" --\u003e Q2{\"Have you used\u0026lt;br/\u0026gt;Claude Code before?\"} Q2 -- \"No\" --\u003e INTERMEDIATE[\"Intermediate Path\"] Q2 -- \"Yes\" --\u003e ADVANCED[\"Advanced Path\"] BEGINNER --\u003e B1[\"1. Skilljar course\"] B1 --\u003e B2[\"2. 
Official docs (Korean)\"] B2 --\u003e B3[\"3. WeniVooks Vibe Coding\"] B3 --\u003e B4[\"4. WikiDocs beginner's guide\"] INTERMEDIATE --\u003e M1[\"1. Official docs Quickstart\"] M1 --\u003e M2[\"2. WikiDocs guide\"] M2 --\u003e M3[\"3. Practical Guide series\"] M3 --\u003e M4[\"4. Automation Triple Play\"] ADVANCED --\u003e A1[\"1. WikiDocs Skills chapter\"] A1 --\u003e A2[\"2. MCP server integration\"] A2 --\u003e A3[\"3. Build a custom agent\"] A3 --\u003e A4[\"4. Official docs Agent SDK\"] style START fill:#6366f1,color:#fff style BEGINNER fill:#22c55e,color:#fff style INTERMEDIATE fill:#f59e0b,color:#fff style ADVANCED fill:#ef4444,color:#fffBeginner (Non-developer / Coding novice) Skilljar — Understand \u0026ldquo;what is a coding assistant\u0026rdquo; from the ground up Official docs (Korean) — Installation and core concepts WeniVooks Vibe Coding — Build something real with project-based learning WikiDocs Beginner\u0026rsquo;s Guide — Additional practice and community Q\u0026amp;A Intermediate (Has dev experience, new to Claude Code) Official docs Quickstart — Install quickly and complete the first task WikiDocs Guide — Real-world techniques and CLAUDE.md strategies Practical Guide series — Context management, workflow patterns Automation Triple Play — Skills, scheduling, and Dispatch Advanced (Already using Claude Code, wants to go deeper) WikiDocs Skills chapter — Custom skill development in practice MCP server integration — External tool connectivity Custom agent development — Agent SDK usage Official docs Reference — Detailed API reference Insight Looking at the Claude Code learning ecosystem, a few interesting things stand out.\nKorean resources are growing fast. A few months ago, English official docs were the only option. Now there\u0026rsquo;s the WikiDocs guide, WeniVooks, and the official docs\u0026rsquo; Korean translation. 
This reflects rapid Claude Code adoption in Korea.\n\u0026ldquo;Official docs = best\u0026rdquo; doesn\u0026rsquo;t always hold. Official docs are accurate and current, but they don\u0026rsquo;t explain \u0026ldquo;why you\u0026rsquo;d want this\u0026rdquo; or \u0026ldquo;how to combine things in practice.\u0026rdquo; Community guides like WikiDocs fill that gap. The ideal approach is to use both in parallel.\nThe non-developer market is opening up. WeniVooks\u0026rsquo; \u0026ldquo;Vibe Coding Essential\u0026rdquo; directly targets non-developers. It\u0026rsquo;s a signal that Claude Code is being positioned not just as a dev tool but as \u0026ldquo;a tool that lets anyone code.\u0026rdquo; The era of PMs building their own prototypes and marketers writing data analysis scripts is coming.\nAccount for the lifecycle of learning materials. AI tools change fast. A guide that\u0026rsquo;s accurate today may be outdated in a month. Official docs always stay current, but community guides and eBooks may not. Make it a habit to always ask yourself: \u0026ldquo;Does this apply to the current version?\u0026rdquo;\nRelated posts:\nClaude Code Practical Guide series — Context management to workflows Claude Code Automation Triple Play — Skills, Scheduling, and Dispatch — Skills and automation deep dive ","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-claude-code-learning-roadmap/cover-en.jpg","permalink":"/posts/2026-04-01-claude-code-learning-roadmap/","title":"Claude Code Learning Roadmap — From Official Docs to Korean Community Guides"},{"content":"Overview This is the fourth post in the Claude Code Practical Guide series. Previous entries covered context management and workflows (Part 1), new features from the last two months (Part 2), and 27 tips from 500 hours of use (Part 3).\nThis edition covers two core topics. 
First, Claude Code auto-fix — Anthropic\u0026rsquo;s officially released feature that automates PR creation, CI failure resolution, and reviewer comment incorporation. Second, Cole Medin\u0026rsquo;s Self-Healing AI Coding Workflow — a process where the coding agent visually validates its own work and self-corrects bugs.\nWorkflow Overview The diagram below shows how auto-fix and the Self-Healing workflow connect within the development cycle.\nflowchart TD A[\"Developer writes code\"] --\u003e B[\"PR created\"] B --\u003e C{\"CI pass?\"} C -- Yes --\u003e D[\"Reviewer comments\"] C -- No --\u003e E[\"auto-fix analyzes \u0026lt;br/\u0026gt; CI logs \u0026lt;br/\u0026gt; auto-corrects\"] E --\u003e C D --\u003e F[\"auto-fix applies \u0026lt;br/\u0026gt; comment feedback \u0026lt;br/\u0026gt; updates code\"] F --\u003e C B --\u003e G[\"Self-Healing \u0026lt;br/\u0026gt; workflow runs\"] G --\u003e H[\"3 sub-agents \u0026lt;br/\u0026gt; parallel research\"] H --\u003e I[\"E2E tests \u0026lt;br/\u0026gt; + visual validation\"] I --\u003e J{\"Blocker found?\"} J -- Yes --\u003e K[\"Auto-fix \u0026lt;br/\u0026gt; + retest\"] K --\u003e I J -- No --\u003e L[\"Validation report generated\"]1. Claude Code auto-fix: Remote Automated Corrections Automated PR Tracking and CI Failure Resolution Claude Code auto-fix automatically tracks Pull Requests from a web or mobile environment, detects CI failures, and resolves them on its own. The key is that everything happens remotely. A developer can open a PR, step away, and come back to find the CI passing.\nHere\u0026rsquo;s how it works: auto-fix fetches GitHub Actions logs and precisely diagnoses the failure — distinguishing build errors from lint errors, and code issues from infrastructure issues. 
For common infrastructure errors like PHP memory exhaustion, it has pre-built resolution templates to avoid unnecessary code changes.\nThree Ways to Use It There are three concrete ways to use auto-fix:\nWeb version: In the Claude Code web interface, select auto-fix from the CI menu of a generated PR Mobile: Directly instruct the AI agent to auto-fix (a quick-launch button for mobile is coming) Paste a PR link: Copy any PR link you want monitored and ask the agent to auto-fix it To get started, the Claude GitHub App must be installed, and auto-fix must be enabled in the repository settings.\nSecurity System Autonomous code modification requires strong security. auto-fix uses an independent safety classifier based on Claude Sonnet 4.6. What makes it distinctive: the classifier inspects the request without looking at the AI\u0026rsquo;s internal reasoning. This means even if prompt injection bypasses the internal logic, the actual actions being executed are separately verified. Actions exceeding granted permissions and sensitive data exfiltration are blocked at the source.\n# .github/settings.yml example — enabling auto-fix claude_code: auto_fix: enabled: true on_ci_failure: true # Auto-fix on CI failure on_review_comment: true # Apply review comments allowed_branches: - \u0026#34;feature/*\u0026#34; - \u0026#34;fix/*\u0026#34; 2. Self-Healing Workflow: Agents That Validate Their Own Work Cole Medin\u0026rsquo;s Approach In \u0026ldquo;This One Command Makes Coding Agents Find All Their Mistakes,\u0026rdquo; Cole Medin pinpoints the core problem precisely: coding agents generate code quickly, but they\u0026rsquo;re terrible at validating their own work. Without a framework provided by the developer, they either rush through validation or skip it entirely.\nThis workflow is packaged as a Claude Code skill (slash command). One /e2e-test command kicks off a 6-phase process. 
It works immediately on almost any codebase with a frontend.\nThe 6-Phase Validation Process Phase 0 — Pre-check: Verifies Vercel Agent Browser CLI is installed, checks OS environment (Windows requires WSL), etc.\nPhase 1 — Research: Three sub-agents run in parallel:\nMap codebase structure + identify user journeys Analyze database schema Code review (hunt for logic errors) Phase 2 — Test Planning: Define a task list based on research results. Each task is one user journey.\nPhase 3 — E2E Test Loop: Execute each user journey in sequence, navigating pages with Agent Browser CLI and verifying backend state with DB queries.\n# Vercel Agent Browser CLI usage examples npx @anthropic-ai/agent-browser snapshot # Capture current page state npx @anthropic-ai/agent-browser click \u0026#34;Sign In\u0026#34; npx @anthropic-ai/agent-browser screenshot ./screenshots/login.png Phase 4 — Self-correction: Only blocker issues are automatically fixed and retested. The important design philosophy: don\u0026rsquo;t fix everything. Fix only the major blockers so testing can continue; leave the rest in the report for the developer to evaluate.\nPhase 5 — Report: Output results in a structured format — what was fixed, remaining issues, all test paths. Reviewing with screenshots lets you quickly see what paths the agent actually tested.\nThe Power of Visual Validation The most impressive part of this workflow is screenshot-based visual validation. The agent takes screenshots at each step and uses the AI\u0026rsquo;s image analysis capability to determine whether the UI looks correct. This goes beyond \u0026ldquo;pass if no errors\u0026rdquo; — it verifies that the actual screen users see is rendering as intended.\nResponsive validation is also included: a lightweight check that pages render properly on mobile, tablet, and desktop viewports. 
This is the kind of \u0026ldquo;look and judge\u0026rdquo; validation that\u0026rsquo;s hard to implement with traditional E2E frameworks like Cypress or Playwright — and AI does it instead.\nPractical Usage Tips The workflow can be used in two ways:\nStandalone: Run a full E2E test suite at any point in time Integrated into the feature implementation pipeline: Automatically run regression testing right after the agent implements a feature Since this expands the context window significantly, it\u0026rsquo;s recommended to pass the report to a new session for follow-up work after testing completes.\nInsight auto-fix and Self-Healing point in the same direction. In an era where code generation speed far outpaces verification speed, automating the verification itself is the core challenge. auto-fix layers AI on top of existing CI/review infrastructure; Self-Healing extends verification to the user\u0026rsquo;s perspective via browser automation.\nUsing both together in practice is powerful. Run the Self-Healing workflow locally to validate before pushing, then let auto-fix handle CI failures and review comments after the PR is up. The developer just reviews the final report and screenshots.\nOne important caveat: the developer remains responsible for AI-generated code. As Cole Medin himself emphasized, this workflow isn\u0026rsquo;t about \u0026ldquo;vibe coding\u0026rdquo; — it\u0026rsquo;s about reducing the burden of verification. 
Auto-correction isn\u0026rsquo;t a silver bullet, and final judgment stays with the human.\nQuick Links Topic Link Claude Code auto-fix video Nova AI Daily - auto-fix launch Self-Healing workflow video Cole Medin - Find All Mistakes Claude Code official docs docs.anthropic.com Vercel Agent Browser CLI npmjs.com/@anthropic-ai/agent-browser Series Part 1 - Context management Claude Code Practical Guide 1 Series Part 2 - New features Claude Code Practical Guide 2 Series Part 3 - 27 tips Claude Code Practical Guide 3 ","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-claude-code-autofix/cover-en.jpg","permalink":"/posts/2026-04-01-claude-code-autofix/","title":"Claude Code Practical Guide 4 — auto-fix and the Self-Healing Workflow"},{"content":"Overview This is the fifth post in the Claude Code Practical Guide series. Previous posts covered: context management (#1, 03/19), recent new features (#2, 03/24), 27 tips from a 500-hour user (#3, 03/30), and auto-fix with self-healing workflows (#4, 04/01).\nThis post is based on the AI LABS video 12 Hidden Settings To Enable In Your Claude Code Setup. 
We\u0026rsquo;ll walk through 12 settings buried in settings.json and environment variables that most users never touch — but enabling them makes a noticeable difference in both performance and daily experience.\nSettings Architecture Overview The diagram below shows which area of Claude Code each of the 12 settings belongs to.\ngraph LR A[\"settings.json\"] --\u003e B[\"Conversation retention \u0026lt;br/\u0026gt; cleanup_period_days\"] A --\u003e C[\"Output limit \u0026lt;br/\u0026gt; max_read_tokens\"] A --\u003e D[\"Auto compact \u0026lt;br/\u0026gt; auto_compact_%\"] A --\u003e E[\"Notifications\"] A --\u003e F[\"Model routing \u0026lt;br/\u0026gt; thinking budget\"] A --\u003e G[\"Permissions mode\"] H[\".claude/ folder\"] --\u003e I[\"path-specific rules\"] H --\u003e J[\"custom slash commands\"] H --\u003e K[\"MCP server config\"] L[\"Env vars\"] --\u003e M[\"CLAUDE_CODE_MAX \u0026lt;br/\u0026gt; _BASH_OUTPUT\"] N[\"hooks system\"] --\u003e O[\"pre/post hooks \u0026lt;br/\u0026gt; exit codes\"] P[\"Open source tools\"] --\u003e Q[\"Claude CTX \u0026lt;br/\u0026gt; Claude Tuner\"] 1. cleanup_period_days — Conversation Retention Period When using /insights or the --resume flag, only the last 30 days of conversations are shown by default. Claude Code deletes older data from the system.\nIf you want to analyze longer-term insights using Opus 4.6\u0026rsquo;s 1M token context window, you need to change this setting.\nLocation: ~/.claude/settings.json\n{ \u0026#34;cleanup_period_days\u0026#34;: 365 } Value Behavior 365 Retain one year of conversations 90 Retain three months (recommended middle ground) 0 Do not retain conversations — insights/resume disabled Note: Setting this too high can make the ~/.claude/ folder quite large. Check your available disk space.\n2. Path-Specific Rules Inside your project\u0026rsquo;s .claude/ folder, you can create rule files that load based on path patterns. 
When the agent reads or modifies a file, only the rules matching that path pattern are loaded into context.\nWhy This Matters Many people dump all their instructions into a single CLAUDE.md. As projects grow, this file becomes unwieldy and Claude starts losing track of which rules apply when. There is no reason to load backend rules while working on frontend code.\nConfiguration Example Separate rules by file type under .claude/rules/:\n.claude/ rules/ react-components.md # matches src/components/** api-routes.md # matches src/api/** database.md # matches prisma/** or drizzle/** Each rule file is injected into context only when working on files under the matching path. This naturally achieves:\nSeparation of concerns at the instruction level Focus — the agent only sees rules relevant to the current task Efficient use of the context window 3. Output Token Limits and Large File Reading Bash Output Limit When Claude Code reads bash command output, the default cap is 30,000 characters. Commands that produce large output — test suites, build logs, database migrations — get truncated.\n{ \u0026#34;max_output_chars\u0026#34;: 150000 } With a 1M token context window, the 30K limit is a legacy of the 200K era. Raising it to around 150K lets Claude read the full output.\nFile Read Token Limit By default, Claude reads only 25K tokens from a file. For larger files you can set this higher:\n{ \u0026#34;max_read_file_tokens\u0026#34;: 100000 } Bypassing the 2,000-Line Limit There is an important gotcha here. No matter how high you set the token limit, Claude reads at most 2,000 lines at a time and has no idea the rest of the file exists. Anthropic provides no setting to change this limit.\nWorkaround: Add the following instruction to your CLAUDE.md:\n## Large File Reading Rule Before reading any file, check its line count. For files exceeding 2,000 lines, use the offset and limit parameters to read the entire file in sections. 
You can also set up a hook that fires on every Read tool invocation to check the line count and force chunked reading when it exceeds 2,000 lines.\n4. CLAUDE_CODE_MAX_BASH_OUTPUT — Dedicated Bash Output Limit Setting the CLAUDE_CODE_MAX_BASH_OUTPUT environment variable gives you separate control over the maximum character count for bash command output.\n# Add to ~/.zshrc or ~/.bashrc export CLAUDE_CODE_MAX_BASH_OUTPUT=150000 This works alongside the settings.json configuration and is especially useful in CI/CD pipelines or when dealing with large logs. The default 30K value often shows only the beginning of test results, truncating the actual errors at the end.\n// Can also be set in settings.json { \u0026#34;env\u0026#34;: { \u0026#34;CLAUDE_CODE_MAX_BASH_OUTPUT\u0026#34;: \u0026#34;150000\u0026#34; } } 5. Auto-Compact and Context Management Claude Code automatically runs compact when the context window hits 95%. But even with a 1M token window, output quality starts degrading after 70%.\nOptimal Setting { \u0026#34;auto_compact_percentage_override\u0026#34;: 75 } Triggering compact at 75% ensures the agent always has ample headroom. If you wait until 95%, compact fires after quality has already dropped — meaning the code generated in that late window cannot be trusted.\nTip: Unless you specifically need the full 1M context for large codebase analysis, the 70–80% range is recommended.\n6. Notification Settings When Claude Code runs long tasks, it\u0026rsquo;s easy to miss the completion signal. You can control notification behavior in settings.json.\n{ \u0026#34;notifications\u0026#34;: { \u0026#34;enabled\u0026#34;: true, \u0026#34;sound\u0026#34;: true, \u0026#34;on_complete\u0026#34;: true } } Telemetry and Privacy By default, Claude Code sends usage data to Statsig (usage patterns and latency) and Sentry (error logging). 
To opt out:\n{ \u0026#34;disable_telemetry\u0026#34;: true, \u0026#34;disable_error_reporting\u0026#34;: true, \u0026#34;disable_feedback_display\u0026#34;: true } Note: The CLI flag --disable-non-essential-traffic looks similar but also blocks automatic updates. Using the three individual settings above is safer.\n7. Model Routing and Thinking Budget The effort Parameter When running sub-agents, the --effort flag controls the thinking level. Not every task needs maximum thinking.\n# Low effort for lightweight tasks claude --agent formatter --effort low # High effort for complex architectural decisions claude --agent architect --effort high Advanced Sub-agent Configuration Sub-agents can be configured beyond just model and MCP tools:\n{ \u0026#34;agents\u0026#34;: { \u0026#34;formatter\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;claude-sonnet-4-20250514\u0026#34;, \u0026#34;effort\u0026#34;: \u0026#34;low\u0026#34;, \u0026#34;background\u0026#34;: true, \u0026#34;skills\u0026#34;: [\u0026#34;lint-fix\u0026#34;], \u0026#34;hooks\u0026#34;: { \u0026#34;post_tool_use\u0026#34;: \u0026#34;./hooks/format-check.sh\u0026#34; } }, \u0026#34;architect\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;claude-opus-4-20250514\u0026#34;, \u0026#34;effort\u0026#34;: \u0026#34;high\u0026#34;, \u0026#34;isolation\u0026#34;: true, \u0026#34;permitted_agent_names\u0026#34;: [\u0026#34;formatter\u0026#34;, \u0026#34;tester\u0026#34;] } } } Option Description skills Inherit specific skills into the sub-agent effort Control thinking token usage background Whether to run in the background isolation Run in isolation in a separate worktree permitted_agent_names Limit which child agents can be spawned Agent Teams (Experimental) Unlike sub-agents, members of Agent Teams can communicate with each other. A team leader coordinates work while each member operates as an independent Claude session but shares information.\n8. 
Permissions Mode and Auto-Accept Claude Code\u0026rsquo;s permission system requires user approval for every file modification, bash execution, and similar action. In trusted projects you can automate this.\n{ \u0026#34;permissions\u0026#34;: { \u0026#34;allow\u0026#34;: [ \u0026#34;Read\u0026#34;, \u0026#34;Glob\u0026#34;, \u0026#34;Grep\u0026#34;, \u0026#34;Bash(git *)\u0026#34;, \u0026#34;Bash(npm test)\u0026#34;, \u0026#34;Bash(npx prettier *)\u0026#34; ], \u0026#34;deny\u0026#34;: [ \u0026#34;Bash(rm -rf *)\u0026#34;, \u0026#34;Bash(git push --force *)\u0026#34; ] } } Per-Profile Permission Management — Claude CTX If you need different permission settings across multiple projects, the open-source tool Claude CTX is worth a look:\n# Install (macOS) brew install claude-ctx # Check current profile claude ctx -c # Switch profiles claude ctx work # Switch to work settings claude ctx personal # Switch to personal project settings Claude CTX manages per-profile settings.json and CLAUDE.md files under ~/.claude/profiles/. It automatically backs up the current state on switch so settings never bleed into each other.\n9. MCP Server Configuration MCP (Model Context Protocol) servers can be configured directly in settings.json. 
You can also assign different MCP tools to different sub-agents.\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;filesystem\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;@modelcontextprotocol/server-filesystem\u0026#34;, \u0026#34;/path/to/project\u0026#34;] }, \u0026#34;github\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;@modelcontextprotocol/server-github\u0026#34;], \u0026#34;env\u0026#34;: { \u0026#34;GITHUB_PERSONAL_ACCESS_TOKEN\u0026#34;: \u0026#34;${GITHUB_TOKEN}\u0026#34; } }, \u0026#34;postgres\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;@modelcontextprotocol/server-postgres\u0026#34;], \u0026#34;env\u0026#34;: { \u0026#34;DATABASE_URL\u0026#34;: \u0026#34;${DATABASE_URL}\u0026#34; } } } } Configuration can be placed at project level (.claude/settings.json) or global level (~/.claude/settings.json), with project level taking priority.\n10. Custom Slash Commands Create markdown files in .claude/commands/ to define custom slash commands.\n.claude/ commands/ review.md → invoked as /review deploy.md → invoked as /deploy e2e-test.md → invoked as /e2e-test Example: /review Command # Code Review Review currently staged changes: 1. Check changes with `git diff --cached` 2. Check for security vulnerabilities 3. Check for performance issues 4. Review code style 5. Output results in a structured format No registration is required. Simply placing the file in the directory is enough for Claude Code to pick it up automatically. Unlike skills, these act as simple prompt templates — useful for collapsing repetitive workflows into a single command.\n11. Pre/Post Hooks and Exit Codes Hooks run custom scripts before or after Claude Code\u0026rsquo;s tool calls. 
The critical behavior is that the exit code determines what happens next.\nExit Code Behavior Exit Code Behavior Use Case 0 Success, not inserted into context Confirm normal completion 2 Blocking — error message is fed back to Claude Block forbidden commands Other Non-blocking, shown only in verbose mode Warning messages Real Example: Enforcing a Package Manager A hook to force uv when Claude tries to use pip due to training data patterns:\n{ \u0026#34;hooks\u0026#34;: { \u0026#34;pre_tool_use\u0026#34;: [ { \u0026#34;tool\u0026#34;: \u0026#34;Bash\u0026#34;, \u0026#34;command\u0026#34;: \u0026#34;./hooks/enforce-uv.sh\u0026#34; } ] } } #!/bin/bash # hooks/enforce-uv.sh if echo \u0026#34;$CLAUDE_TOOL_INPUT\u0026#34; | grep -q \u0026#34;pip install\u0026#34;; then echo \u0026#34;ERROR: Use uv instead of pip. Please use \u0026#39;uv pip install\u0026#39; or \u0026#39;uv add\u0026#39;.\u0026#34; exit 2 # Blocking — Claude reads this message and corrects the command fi exit 0 Forced Large File Reading Hook { \u0026#34;hooks\u0026#34;: { \u0026#34;pre_tool_use\u0026#34;: [ { \u0026#34;tool\u0026#34;: \u0026#34;Read\u0026#34;, \u0026#34;command\u0026#34;: \u0026#34;./hooks/check-file-lines.sh\u0026#34; } ] } } This hook checks the line count every time the Read tool runs and forces chunked reading via exit code 2 when the file exceeds 2,000 lines.\n12. Open Source Companion Tools Claude CTX — Profile Manager As mentioned in the permissions section, Claude CTX manages multiple configuration profiles:\n~/.claude/ profiles/ work/ settings.json CLAUDE.md personal/ settings.json CLAUDE.md client-a/ settings.json CLAUDE.md backups/ 2026-04-01T10:00:00/ Customizing Attribution If you find it annoying that Claude automatically adds a co-author to GitHub commits:\n{ \u0026#34;attribution\u0026#34;: { \u0026#34;commit\u0026#34;: \u0026#34;\u0026#34;, \u0026#34;pr\u0026#34;: \u0026#34;\u0026#34; } } Setting these to empty strings prevents co-author tags from being added. 
You can also set a custom string to display a specific name.\nOther Useful Tips Prompt Stashing: Press Ctrl+S to temporarily save the current prompt, handle other work first, then have it automatically restored Direct Sub-agent Invocation: Use the claude --agent \u0026lt;name\u0026gt; flag to call a specific sub-agent directly and eliminate the loading overhead My Combined settings.json A practical settings.json combining everything above:\n{ \u0026#34;cleanup_period_days\u0026#34;: 90, \u0026#34;max_read_file_tokens\u0026#34;: 100000, \u0026#34;auto_compact_percentage_override\u0026#34;: 75, \u0026#34;notifications\u0026#34;: { \u0026#34;enabled\u0026#34;: true, \u0026#34;on_complete\u0026#34;: true }, \u0026#34;permissions\u0026#34;: { \u0026#34;allow\u0026#34;: [ \u0026#34;Read\u0026#34;, \u0026#34;Glob\u0026#34;, \u0026#34;Grep\u0026#34;, \u0026#34;Bash(git *)\u0026#34;, \u0026#34;Bash(uv *)\u0026#34;, \u0026#34;Bash(npm test)\u0026#34; ], \u0026#34;deny\u0026#34;: [ \u0026#34;Bash(rm -rf *)\u0026#34;, \u0026#34;Bash(git push --force *)\u0026#34; ] }, \u0026#34;attribution\u0026#34;: { \u0026#34;commit\u0026#34;: \u0026#34;\u0026#34;, \u0026#34;pr\u0026#34;: \u0026#34;\u0026#34; }, \u0026#34;disable_telemetry\u0026#34;: true, \u0026#34;disable_error_reporting\u0026#34;: true, \u0026#34;hooks\u0026#34;: { \u0026#34;pre_tool_use\u0026#34;: [ { \u0026#34;tool\u0026#34;: \u0026#34;Bash\u0026#34;, \u0026#34;command\u0026#34;: \u0026#34;./hooks/enforce-uv.sh\u0026#34; } ] } } Key Takeaways Understand the settings hierarchy: ~/.claude/settings.json (global) → .claude/settings.json (project) → environment variables, in increasing priority order. Separating settings by project reduces conflicts.\nThe 30K default limit is legacy baggage: That conservative default was set in the 200K context era. 
In the 1M token world, you need to actively raise output and file read limits to get real value from Claude.\nAuto-compact at 75% is quality insurance: The 95% default says \u0026ldquo;remember as much as possible,\u0026rdquo; but given that quality degrades after 70%, 75% is a practical balance.\nExit code 2 is the heart of hooks: This is not just pre/post processing — it is a mechanism for actively correcting Claude\u0026rsquo;s behavior. Enforcing team coding standards through hooks significantly improves consistency in AI-generated code.\nPath-specific rules are a future investment: They may feel like over-engineering early on, but as codebases grow, a single CLAUDE.md becomes a bottleneck. Splitting early pays off significantly later.\nReferences 12 Hidden Settings To Enable In Your Claude Code Setup — AI LABS Claude Code Official Documentation Claude CTX GitHub ","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-claude-code-settings/cover-en.jpg","permalink":"/posts/claude-code-hidden-settings/","title":"Claude Code Practical Guide 5 — 12 Hidden Settings You Should Enable"},{"content":"Overview In the age of AI coding agents, web scraping has evolved from simple data collection into critical infrastructure for competitive analysis, lead enrichment, and market research. But Claude Code\u0026rsquo;s built-in web fetch cannot properly handle JavaScript-rendered sites or pages protected by anti-bot systems. Firecrawl confronts this problem head-on. It converts web data into LLM-ready markdown and structured JSON, and integrates seamlessly with Claude Code through an MCP server.\nWhere Claude Code\u0026rsquo;s web fetch Falls Short Claude Code\u0026rsquo;s built-in web fetch works by fetching raw HTML directly. This approach has three clear limitations.\nJavaScript rendering failure — On SPAs (Single Page Applications) or dynamically loaded sites, it retrieves only an empty shell. 
Tools like SimilarWeb render their statistics client-side, so web fetch cannot read any of the numbers. Anti-bot blocking — Sites with bot detection like Yellow Pages and Booking.com return repeated 403 errors. In real tests, scraping Yellow Pages plumber listings with web fetch produced nothing but a stream of 403s. Speed and token inefficiency — When scraping four Amazon product pages, web fetch took 5 minutes 30 seconds while Firecrawl completed the same work in 45 seconds. Dumping 13,000 lines of raw HTML into an LLM is a waste of tokens. What Is Firecrawl? Firecrawl is a web scraping platform that converts web data into LLM-friendly formats. Its key characteristics:\nMarkdown conversion: Extracts web pages as clean markdown Schema support: Define only the fields you want and receive structured JSON Anti-bot bypass: Its proprietary Fire Engine passes through bot detection systems Token efficiency: Saves to the local filesystem and extracts only needed data to minimize token usage Open source: Self-hosting is possible (but anti-bot bypass and agent features are paid-only) Firecrawl vs Traditional Scraping flowchart LR subgraph Traditional[\"Traditional approach\u0026lt;br/\u0026gt;Playwright / Puppeteer\"] A[\"Browser install\u0026lt;br/\u0026gt;environment setup\"] --\u003e B[\"Write selectors\u0026lt;br/\u0026gt;parse DOM\"] B --\u003e C[\"Anti-bot\u0026lt;br/\u0026gt;workaround code\"] C --\u003e D[\"raw HTML\u0026lt;br/\u0026gt;13,000+ lines\"] D --\u003e E[\"Post-processing\u0026lt;br/\u0026gt;data cleanup\"] end subgraph Firecrawl[\"Firecrawl approach\"] F[\"CLI or\u0026lt;br/\u0026gt;MCP call\"] --\u003e G[\"Define schema\u0026lt;br/\u0026gt;JSON schema\"] G --\u003e H[\"Auto anti-bot\u0026lt;br/\u0026gt;Fire Engine\"] H --\u003e I[\"LLM-ready\u0026lt;br/\u0026gt;Markdown or JSON\"] end style Traditional fill:#ffcccc,stroke:#cc0000 style Firecrawl fill:#ccffcc,stroke:#00cc00 Playwright / Puppeteer Firecrawl Setup complexity Browser binary + driver config 
npx firecrawl one-liner Anti-bot Must implement yourself Fire Engine built-in JS rendering Yes (headless browser) Yes (managed sandbox) Output format Raw HTML / DOM objects Markdown / structured JSON LLM integration Requires separate pipeline Direct MCP server connection Token efficiency Low (full HTML) High (schema-based extraction) Large-scale crawling Must implement yourself Built in via crawl / map commands 5 Core Commands Firecrawl CLI offers five main commands.\n1. scrape — Single Page Extraction The most basic command. Specify a URL and retrieve that page\u0026rsquo;s content as markdown.\nnpx firecrawl scrape https://www.amazon.com/dp/B0CZJR9KCZ 2. search — Web Search + Scraping Use this when you do not know the URL. Search by keyword and automatically scrape the result pages.\nnpx firecrawl search \u0026#34;2026 best noise cancelling headphones review\u0026#34; 3. browse — Cloud Browser Interaction Opens a cloud browser session to perform clicks, form input, snapshots, and more. Think of it as Playwright managed by Firecrawl.\nnpx firecrawl browse https://example.com --action \u0026#34;click login button\u0026#34; 4. crawl — Full Site Crawling Starting from a URL, follows links and systematically scrapes an entire site.\nnpx firecrawl crawl https://docs.example.com --limit 100 5. map — Domain URL Discovery Discovers all URLs within a domain to generate a sitemap. Useful for understanding site structure before crawling.\nnpx firecrawl map https://example.com MCP Server Integration with Claude Code The most powerful way to use Firecrawl with Claude Code is via an MCP (Model Context Protocol) server. 
Setup is straightforward.\nInstallation # Install Firecrawl CLI npx firecrawl setup # Add MCP server in Claude Code claude mcp add firecrawl -- npx -y firecrawl-mcp Usage Example Once connected via MCP, you can use it with natural language.\n# Natural language request in Claude Code \u0026#34;Pull product name, price, rating, and review count from these 5 Amazon product pages and organize them in a table\u0026#34; # Claude Code automatically selects the Firecrawl scrape tool and runs it Schema-Based Extraction Example Define the data fields you want as a JSON schema and get exactly those fields back.\n{ \u0026#34;type\u0026#34;: \u0026#34;object\u0026#34;, \u0026#34;properties\u0026#34;: { \u0026#34;product_name\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;string\u0026#34; }, \u0026#34;price\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;string\u0026#34; }, \u0026#34;rating\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;number\u0026#34; }, \u0026#34;review_count\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;integer\u0026#34; }, \u0026#34;seller\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;string\u0026#34; } }, \u0026#34;required\u0026#34;: [\u0026#34;product_name\u0026#34;, \u0026#34;price\u0026#34;, \u0026#34;rating\u0026#34;] } Applying this schema to an Amazon product page returns clean 5-line JSON instead of 13,000 lines of HTML.\n{ \u0026#34;product_name\u0026#34;: \u0026#34;Sony WH-1000XM5 Wireless Headphones\u0026#34;, \u0026#34;price\u0026#34;: \u0026#34;$278.00\u0026#34;, \u0026#34;rating\u0026#34;: 4.5, \u0026#34;review_count\u0026#34;: 12847, \u0026#34;seller\u0026#34;: \u0026#34;Amazon.com\u0026#34; } Real-World Demo: Amazon Product Scraping Let\u0026rsquo;s compare actual test results.\nTest conditions: Extract product information from 4 Amazon product pages\nClaude Code (web fetch) Claude Code + Firecrawl Time ~5 min 30 sec ~45 sec Success rate Partial (unstable HTML parsing) 100% Token usage High (full raw HTML) Low (schema fields 
only) Output format Unstructured text Structured JSON SimilarWeb test (JavaScript-rendered site):\nweb fetch: timed out after 4 min 30 sec, collected only empty shells Firecrawl: 42 seconds, fully captured traffic metrics, geographic breakdown, and social media share Yellow Pages test (anti-bot protection):\nweb fetch: continuous 403 errors, 0 results Firecrawl: 53 seconds, 16 business listings collected Pricing Plan Credits Price Notes Free 500 Free One-time, trial use Hobby 3,000/mo $16/mo Personal projects Standard 100,000/mo $83/mo Startups Growth 500,000/mo $333/mo Large-scale operations Open-source self-hosting is available, but the following features are paid-only:\nFire Engine (anti-bot bypass) Agent mode Browser Interact Requires Docker environment setup Practical Use Cases Competitive Analysis Periodically collect competitor traffic data from SimilarWeb to build a dashboard. Impossible with web fetch due to JavaScript rendering, but Firecrawl finishes in 42 seconds.\nLead Enrichment Crawl company websites to extract decision-maker information, tech stacks, and job listings as structured data. Can process 50 company sites at once.\nMarket Research Use schema-based collection to gather competitor product prices, ratings, and reviews from Amazon and other e-commerce platforms. Run on a schedule to track price trends.\nContent Collection Crawl technical blogs and documentation sites to build knowledge bases for RAG (Retrieval-Augmented Generation) pipelines.\nTakeaways Looking at Firecrawl, the paradigm of web scraping is shifting — from \u0026ldquo;directly manipulating the browser\u0026rdquo; to \u0026ldquo;declaring the schema of the data you want.\u0026rdquo;\nWriting selectors with Playwright or Puppeteer, bypassing anti-bot systems, parsing HTML — all of that was ultimately just a means to get the data you wanted. Firecrawl abstracts away those means so the developer only needs to declare what they want, and the rest is handled automatically. 
This is analogous to how SQL replaced direct filesystem access.\nThat said, the free tier is capped at 500 one-time credits, and the core differentiator — anti-bot bypass — is paid-only. The self-hosted open-source version lacks anti-bot, which means Firecrawl\u0026rsquo;s real value lies in its proprietary Fire Engine technology. It will be worth watching how the pricing model evolves over the long term.\nStill, the ability to instruct web scraping in natural language from within Claude Code via MCP integration has genuine potential to change development workflows — especially for projects that need large-scale data collection, where the time savings relative to the cost are clear.\nReference Videos\nClaude Code + Firecrawl = UNLIMITED Web Scraping — Chase AI 퍼페티어는 이제 그만! AI 웹 스크래핑 끝판왕 Firecrawl CLI 등장 — Nova AI Daily ","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-firecrawl-web-scraping/cover-en.jpg","permalink":"/posts/2026-04-01-firecrawl-web-scraping/","title":"Firecrawl — The Web Scraping Powerhouse for the AI Era"},{"content":"A previous post compared Observability vs Monitoring and the approaches of Honeycomb and Grafana. This post dives deeper into Honeycomb\u0026rsquo;s official documentation to solidify the core concepts of observability, then compares self-hostable open source alternatives from a practical standpoint.\nPrevious post: Observability vs Monitoring — Honeycomb vs Grafana\nCore Observability Concepts The definition Honeycomb\u0026rsquo;s documentation emphasizes most is this:\nObservability is about being able to ask arbitrary questions about your environment without having to know ahead of time what you wanted to ask.\nMonitoring means setting thresholds for problems you already know about and receiving alerts. Observability means being able to ask unexpected questions. 
In a microservices environment, the root cause of an incident can be an infinite combination of factors — predefined dashboards cannot diagnose a new type of problem you have never seen before.\nImproving observability requires two things:\nCollecting telemetry data that contains rich runtime context The ability to repeatedly query that data to discover insights Structured Events vs Metrics vs Logs The heart of Honeycomb\u0026rsquo;s data model is the structured event. Understanding the difference between events, metrics, and logs is the starting point for observability.\nStructured Event An event is a JSON object that completely describes a single unit of work. The full cycle of receiving an HTTP request, processing it, and returning a response becomes one event.\n{ \u0026#34;service.name\u0026#34;: \u0026#34;retriever\u0026#34;, \u0026#34;duration_ms\u0026#34;: 0.011668, \u0026#34;dataset_id\u0026#34;: \u0026#34;46829\u0026#34;, \u0026#34;global.env\u0026#34;: \u0026#34;production\u0026#34;, \u0026#34;global.instance_type\u0026#34;: \u0026#34;m6gd.2xlarge\u0026#34;, \u0026#34;global.memory_inuse\u0026#34;: 671497992, \u0026#34;trace.trace_id\u0026#34;: \u0026#34;845a4de7-...\u0026#34;, \u0026#34;trace.span_id\u0026#34;: \u0026#34;84c82b34...\u0026#34; } The key is that every field is queryable. Find slow requests by duration_ms, group by instance_type, and explore the correlation with memory_inuse — all in one go.\nThe Limits of Pre-aggregated Metrics The metrics approach pre-aggregates data before sending it:\n{ \u0026#34;time\u0026#34;: \u0026#34;4:03 pm\u0026#34;, \u0026#34;total_hits\u0026#34;: 500, \u0026#34;avg_duration\u0026#34;: 113, \u0026#34;p95_duration\u0026#34;: 236 } What if you want to see \u0026ldquo;latency difference based on storage engine cache hit\u0026rdquo;? You would need to pre-create combinations like avg_duration_cache_hit_true, p95_duration_cache_hit_true. 
This is the curse of dimensionality — as dimensions increase, the number of required metrics grows exponentially.\nThe Limits of Unstructured Logs Logs are easy for humans to read but hard to query. To answer \u0026ldquo;which service takes the longest to start?\u0026rdquo; you need to parse and subtract multiple lines of timestamps. A structured event answers the same question instantly with a single duration_ms field.\ngraph TD A[\"Telemetry Data\"] --\u003e B[\"Structured Events\"] A --\u003e C[\"Metrics\"] A --\u003e D[\"Logs\"] B --\u003e B1[\"Every field queryable \u0026lt;br/\u0026gt; High Cardinality support\"] C --\u003e C1[\"Requires pre-aggregation \u0026lt;br/\u0026gt; Curse of dimensionality\"] D --\u003e D1[\"Requires parsing \u0026lt;br/\u0026gt; Hard to structure\"] B1 --\u003e E[\"Observability achieved\"] C1 --\u003e F[\"Monitoring level\"] D1 --\u003e F style B fill:#f5a623,color:#000 style E fill:#7ed321,color:#000 style F fill:#d0021b,color:#fff\nDistributed Tracing Tracing ties together instrumentation from separate services to surface cross-service failures. If you run any user-facing software — even a proxy, an app, and a database — you are running a distributed system.\nHow Traces Work A trace tells the story of a complete unit of work. When a user loads a page, their request might pass through an edge proxy, a frontend service, authorization, rate limiting, backend services, and data stores. Each part of this story is told by a span.\nA span represents a single unit of work from a single location in code. 
Each span contains:\nserviceName — which service the span is from name — the role of the span (function or method name) timestamp and duration — when it started and how long it took traceID — which trace the span belongs to parentID — the parent span that called this one graph LR A[\"Client Request\"] --\u003e B[\"Edge Proxy \u0026lt;br/\u0026gt; span 1\"] B --\u003e C[\"Frontend \u0026lt;br/\u0026gt; span 2\"] C --\u003e D[\"Auth Service \u0026lt;br/\u0026gt; span 3\"] C --\u003e E[\"Rate Limiter \u0026lt;br/\u0026gt; span 4\"] C --\u003e F[\"Backend API \u0026lt;br/\u0026gt; span 5\"] F --\u003e G[\"Database \u0026lt;br/\u0026gt; span 6\"] style A fill:#e8e8e8,color:#000 style B fill:#4a90d9,color:#fff style F fill:#f5a623,color:#000 style G fill:#7ed321,color:#000\nAll spans sharing the same traceID form a complete picture of how a single request flowed through the entire system. By examining span durations, you can pinpoint exactly which service is the bottleneck — something impossible with traditional logs or metrics alone.\nWhy High Cardinality Matters Cardinality refers to the number of unique values a given field can hold. Fields like user_id, trace_id, and request_id can have millions of distinct values — this is high cardinality.\nTraditional metrics tools (Prometheus, Graphite, etc.) handle high cardinality poorly. When label combinations explode, performance degrades sharply. But in observability, questions like \u0026ldquo;why is it slow for this specific user?\u0026rdquo; require tracking individual values — that is the entire point.\nHoneycomb uses columnar storage to efficiently handle high cardinality data. 
Its BubbleUp feature automatically detects outliers and identifies which field combinations are correlated with the problem.\nCore Analysis Loop Honeycomb\u0026rsquo;s proposed debugging methodology is the Core Analysis Loop:\nObserve: Visualize the overall state of the system Hypothesize: When you spot an anomalous pattern, form a hypothesis about the cause Validate: Slice the data with GROUP BY and WHERE to validate or disprove the hypothesis Iterate: Return to new questions and repeat This is fundamentally different from \u0026ldquo;look at dashboards and wait for alerts.\u0026rdquo; The Query Builder lets you freely explore data by combining SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, and HAVING clauses.\nHoneycomb Intelligence — AI-Powered Analysis Honeycomb Intelligence is a suite of AI features that help engineers investigate faster. The key features include:\nCanvas — An interactive investigation surface where you can ask questions about your system in natural language. Canvas generates queries, visualizations, and explanations automatically, providing a conversational debugging experience Query Assistant — Auto-generates Honeycomb queries from natural language descriptions. Input like \u0026ldquo;show me the slowest endpoints grouped by service\u0026rdquo; becomes an executable query Hosted MCP Service — Honeycomb provides a Model Context Protocol (MCP) server, enabling AI agents and tools (Claude, Cursor, etc.) to query Honeycomb data directly Honeycomb\u0026rsquo;s AI principles commit to transparency about which features use AI, ensuring data is not used to train models, and making AI features optional. Customer data sent to third-party AI providers (like OpenAI or Anthropic) is processed under data processing agreements that prohibit training on customer data.\nSending Data with OpenTelemetry Honeycomb natively supports OpenTelemetry, the open-source standard for collecting telemetry data. 
If instrumenting code for the first time, Honeycomb recommends starting with OpenTelemetry.\nKey Integration Points OTLP Protocol: Honeycomb receives data via OpenTelemetry Protocol (OTLP) over gRPC, HTTP/protobuf, and HTTP/JSON Direct export: Send OTLP data directly to Honeycomb\u0026rsquo;s endpoint — no collector required for simple setups Collector support: Use the OpenTelemetry Collector to convert legacy formats (OpenTracing, Zipkin, Jaeger) into OTLP The minimum configuration requires two environment variables:\nexport OTEL_EXPORTER_OTLP_ENDPOINT=\u0026#34;https://api.honeycomb.io:443\u0026#34; export OTEL_EXPORTER_OTLP_HEADERS=\u0026#34;x-honeycomb-team=YOUR_API_KEY\u0026#34; OpenTelemetry SDKs are available for Go, Python, Java, .NET, Node.js, Ruby, and more. Each SDK provides auto-instrumentation for common frameworks, meaning you can get traces and metrics with minimal code changes.\nMigration from Legacy Systems If already using Jaeger, Zipkin, or OpenTracing instrumentation, the OpenTelemetry Collector can act as a bridge — receiving data in legacy formats and exporting to Honeycomb in OTLP. This makes migration incremental rather than requiring a full re-instrumentation.\neBPF and Observability eBPF (extended Berkeley Packet Filter) is a technology that runs extended functionality inside the Linux kernel without modifying it. It matters for observability because it enables telemetry collection without any code changes.\nHow It Works JIT Compiler: eBPF programs run through an in-kernel JIT compiler for high performance Hook Points: Connects to predefined hooks — system calls, function entry/exit, kernel tracepoints, network events Kprobes / Uprobes: Where predefined hooks do not exist, kernel probes (Kprobes) or user probes (Uprobes) can attach eBPF programs to almost any point Observability Applications eBPF is especially valuable for languages without automatic instrumentation (C++, Rust, etc.). 
From outside the application, kernel probes can collect network activity, CPU and memory usage, and network interface metrics.\nOpenTelemetry is currently developing Go-based eBPF auto-instrumentation supporting HTTP client/server, gRPC, and gorilla/mux routers. Support for C++ and Rust is planned.\nOpen Source Alternatives Honeycomb is powerful but SaaS lock-in and cost can be concerns. Here is a practical look at self-hostable open source alternatives.\nJaeger Creator: Uber Backend: Cassandra / Elasticsearch Strengths: Core strength in span-level call timing and latency analysis. Compatible with Zipkin; native OpenTelemetry support Deployment: Kubernetes Helm chart, Jaeger Operator for easy deployment UI: Service-based duration queries and trace timeline visualization on port 16686 # All-in-one (for development/testing) ./jaeger-all-in-one --memory.max-traces=100000 # EKS deployment kubectl create namespace observability kubectl apply -f jaeger-operator.yaml Zipkin Creator: Twitter Backend: Elasticsearch / MySQL Strengths: Lightweight, simple tracing server. Native integration with Spring Cloud Sleuth Deployment: Single Docker command docker run -d -p 9411:9411 openzipkin/zipkin Automatically generates service call graphs and dependency diagrams, useful for incident analysis. OpenTelemetry support is bridge-based rather than native, requiring more configuration.\nSigNoz Strengths: OpenTelemetry-native open source APM. Provides Honeycomb-style queries and dashboards for self-hosting Backend: ClickHouse (high-performance columnar DB) Advantages: Logs, metrics, and traces in one unified platform. The closest open source alternative to Honeycomb Deployment: AWS ECS CloudFormation templates, full Kubernetes stack support SigNoz receives OTLP (OpenTelemetry Protocol) directly, so you can send data from the OpenTelemetry Collector without any transformation.\nPinpoint Creator: Naver Backend: HBase Strengths: Optimized for large-scale Java application tracing. 
Bytecode instrumentation applies the agent without any code changes Key Features: Scatter/Timeline charts for detailed call flow and timing analysis. Battle-tested stability in large Korean enterprise environments # Apply agent (JVM option) java -javaagent:pinpoint-agent.jar \\ -Dpinpoint.agentId=myapp-01 \\ -Dpinpoint.applicationName=my-service \\ -jar my-application.jar Comparison Table Tool Backend OTel Support K8s Deployment Core Strength Honeycomb SaaS (AWS) Native N/A (SaaS) High cardinality queries, BubbleUp, AI analysis Jaeger ES / Cassandra Native Helm / Operator High-traffic span tracing Zipkin ES / MySQL Bridge Basic Deployment Simple setup, Spring integration SigNoz ClickHouse Native Full stack All-in-one observability (logs + metrics + traces) Pinpoint HBase Partial Supported Large-scale Java APM, bytecode instrumentation Honeycomb Pricing (2026) Plan Monthly Cost Event Limit Retention Target Free Free 20M/month 60 days Small teams, testing Pro $100+ 1.5B/month 60 days Growing teams, SLO needed Enterprise Custom Unlimited Extended Large scale, Private Cloud Annual contracts receive a 15-20% discount. The Free plan\u0026rsquo;s 20M events is sufficient for validating a small service.\nTakeaways The essence of observability is a mindset shift, not a tool choice. The core question is not \u0026ldquo;what dashboards should we build?\u0026rdquo; but \u0026ldquo;can we ask any question at all?\u0026rdquo; Honeycomb implements this philosophy through structured events and high cardinality queries.\nThe addition of Honeycomb Intelligence signals where the industry is heading — AI-assisted debugging that generates queries from natural language and provides investigation guidance through Canvas. 
The MCP integration means AI agents can now query production telemetry directly, further lowering the barrier to effective observability.\nPractical selection criteria:\nFast start: Build observability experience first with the Honeycomb Free plan (20M events/month) Self-hosted all-in-one: SigNoz is the closest open source alternative to Honeycomb — good ClickHouse query performance and OTel-native Java-heavy legacy systems: Pinpoint applies via bytecode instrumentation with no code changes Already comfortable with Kubernetes: Jaeger + OpenTelemetry Collector combination has the broadest ecosystem Migration path: OpenTelemetry Collector bridges legacy instrumentation (Jaeger/Zipkin format) to any modern backend, making incremental adoption practical eBPF is still early-stage, but its promise of instrumentation without code changes will make it increasingly important in the Go, C++, and Rust ecosystems. When OpenTelemetry\u0026rsquo;s eBPF-based auto-instrumentation matures, the cost of adopting observability will drop significantly.\nReferences Honeycomb Docs: Introduction to Observability Honeycomb Docs: Events, Metrics, and Logs Honeycomb Docs: Distributed Tracing Honeycomb Docs: eBPF Honeycomb Docs: Build a Query Honeycomb Docs: Send Data with OpenTelemetry Honeycomb Docs: Honeycomb Intelligence Jaeger - Distributed Tracing Zipkin SigNoz - Open Source APM Pinpoint - Application Performance Management ","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-honeycomb-observability/cover.jpg","permalink":"/posts/2026-04-01-honeycomb-observability/","title":"Honeycomb and Observability Fundamentals — Comparing Open Source Alternatives"},{"content":"Stripe has disclosed that it merges over 1,300 AI-authored PRs per week. Engineers write no code themselves — they only review. At the same time, an adversarial development technique inspired by Anthropic research is dramatically improving coding agent reliability. 
This post analyzes the internal architecture of Stripe Minions and the adversarial development pattern, then looks at how to build something similar yourself.\nStripe Minions — Behind 1,300+ Weekly PRs Why Stripe Stripe is one of the most demanding environments to run coding agents in.\nRuby backend — an uncommon stack that is less familiar to LLMs Massive proprietary libraries — a homegrown, non-open-source codebase Over $1T in annual payment volume — a single code error can be catastrophic The fact that AI-written PRs are being merged at 1,300+ per week in this environment means the workflow reliability is proportionally high. Among Stripe\u0026rsquo;s 3,400+ engineers filing roughly 8,000 PRs per week, the AI-authored share is growing quickly.\nThe Core Principle: System Controls the Agent The key insight of Stripe Minions is that the system controls the agent, not the other way around.\nIn a typical AI coding workflow, the agent handles planning, implementation, and verification. The problem is there is no guarantee the agent performs the verification we actually want. 
Stripe addresses this by building blueprint-based workflows that combine deterministic nodes and agent nodes.\n\u0026ldquo;In our experience, writing code to deterministically accomplish small decisions we can anticipate — like we always want to lint changes at the end of a run — is far more reliable than asking an agent to do it.\u0026rdquo;\nflowchart TD A[\"Slack / CLI\u0026lt;br/\u0026gt;Entry Point\"] --\u003e B[\"Context Curation\u0026lt;br/\u0026gt;(Deterministic)\"] B --\u003e B1[\"Select MCP tools\u0026lt;br/\u0026gt;from Tool Shed\"] B --\u003e B2[\"Search docs \u0026amp;\u0026lt;br/\u0026gt;assemble context\"] B1 --\u003e C[\"Agent: Implement\u0026lt;br/\u0026gt;(Isolated Dev Box)\"] B2 --\u003e C C --\u003e D[\"Linting \u0026amp; Type Check\u0026lt;br/\u0026gt;(Deterministic)\"] D --\u003e|Fails| E[\"Agent: Fix\"] E --\u003e D D --\u003e|Passes| F[\"Run Tests\u0026lt;br/\u0026gt;(Deterministic)\"] F --\u003e|Fails, up to 2 attempts| G[\"Agent: Fix tests\"] G --\u003e F F --\u003e|Passes| H[\"Human Review\u0026lt;br/\u0026gt;Submit PR\"] F --\u003e|Exceeds 2 failures| I[\"Escalate to\u0026lt;br/\u0026gt;Engineer\"] style A fill:#4a90d9,color:#fff style B fill:#50c878,color:#fff style B1 fill:#50c878,color:#fff style B2 fill:#50c878,color:#fff style C fill:#f5a623,color:#fff style D fill:#50c878,color:#fff style E fill:#f5a623,color:#fff style F fill:#50c878,color:#fff style G fill:#f5a623,color:#fff style H fill:#4a90d9,color:#fff style I fill:#d94a4a,color:#fff\nGreen = Deterministic Node, Orange = Agent Node — agents operate only in certain parts of the workflow.\nContext Curation — From 500 MCP Tools, Pick the Right Ones Stripe runs a single internal MCP server called Tool Shed that connects internal systems and SaaS platforms. 
Around 500 MCP tools are registered, but giving all of them to the agent confuses it rather than helping.\nThe first deterministic node in the workflow analyzes the request, then:\nSearches relevant documentation and tickets to assemble context Selects only the relevant subset of MCP tools to hand to the agent The key is that this selection is made by code, not by the agent.\nIsolated Dev Box — Cattle, Not Pets Every Minion run happens on an isolated AWS EC2 instance that is pre-loaded with the Stripe codebase and lint cache for fast startup and discarded when the run ends.\nSuperior permission management and scalability compared to worktrees or local containers An engineer can run multiple Minions in parallel From over 3 million tests, only the relevant subset is selected and run When tests fail, the agent attempts fixes up to 2 times, then escalates to a human if the tests still do not pass. Infinite loop prevention is built into the design.\nWhat Other Companies Are Doing Stripe is not alone. Major tech companies are building similar structured workflow engines.\nCompany Tool Notes Shopify Roast Released as an open-source structured AI workflow engine Airbnb Internal tool Specialized for test migration AWS Internal tool Partially disclosed via blog posts The common thread: none of them delegate everything to agents. All clearly separate deterministic steps from agent steps.\nAdversarial Development — When Agents Argue The Sycophancy Problem — Gets Worse as Models Get Stronger One of AI\u0026rsquo;s biggest problems is sycophancy. LLMs tend to agree with users and to overrate their own output. 
The troubling part is that this phenomenon gets worse as models become more powerful.\nIn coding agents, this is fatal:\nWhen an agent evaluates its own code → \u0026ldquo;a student grading their own homework\u0026rdquo; It points out a few minor issues while making the review look like it passed Real, serious problems remain hidden The Solution: A Separate Sparring Partner This approach is inspired by GANs (Generative Adversarial Networks). Just as a GAN has a Generator that creates images and a Discriminator that judges their authenticity, a coding agent can split into an Implementer and an Evaluator.\nThe critical point is that the Evaluator operates in a completely separate context session. Without the bias accumulated during implementation, it can produce genuinely objective evaluations.\nflowchart TD UP[\"User Prompt\"] --\u003e PL[\"Planner Agent\u0026lt;br/\u0026gt;Prompt → detailed spec\"] PL --\u003e NEG[\"Contract Negotiation\u0026lt;br/\u0026gt;Sprint split \u0026amp; criteria agreement\"] NEG --\u003e SP[\"Begin Sprint Cycle\"] subgraph SPRINT [\"Sprint N\"] direction TB GEN[\"Implementer Agent\u0026lt;br/\u0026gt;Implement code\"] EVAL[\"Evaluator Agent\u0026lt;br/\u0026gt;Evaluate in independent context\"] GEN --\u003e EVAL EVAL --\u003e|\"Score \u0026lt; threshold\u0026lt;br/\u0026gt;(max 3 retries)\"| GEN end SP --\u003e SPRINT EVAL --\u003e|\"All criteria pass\"| NEXT[\"Next Sprint or Done\"] NEXT --\u003e|\"More sprints remaining\"| SP style UP fill:#4a90d9,color:#fff style PL fill:#9b59b6,color:#fff style NEG fill:#50c878,color:#fff style GEN fill:#f5a623,color:#fff style EVAL fill:#d94a4a,color:#fff style NEXT fill:#4a90d9,color:#fffArchitecture Details Phase 1: Planner Agent\nTakes the user\u0026rsquo;s brief prompt and expands it into a detailed Product Specification Defines tech stack, feature requirements, and structure Phase 2: Contract Negotiation\nImplementer and Evaluator agree in advance Spec is split into multiple Sprints Evaluation 
criteria and threshold (1–10 score) set per Sprint \u0026ldquo;Adversarial but fair\u0026rdquo; rules are established first Phase 3: Sprint Cycle\nImplementer: Implements the features for the agreed Sprint Evaluator: Scores each criterion 1–10 in an independent context If below threshold, feedback is sent to Implementer for retry (max 3 times) All criteria pass → advance to next Sprint Cross-Model Evaluation An interesting option is using different models for Implementer and Evaluator.\nClaude implements → Codex evaluates Codex implements → Claude evaluates Because different models have different biases, cross-evaluation is more effective at addressing single-model sycophancy. This approach is directly inspired by Anthropic\u0026rsquo;s multi-agent evaluation research.\nBuilding Your Own Structured Workflow The core principles apply at any scale, not just Stripe\u0026rsquo;s.\nDesign Principles Predictable tasks must be deterministic — enforce linting, type checking, and test execution in code Agents only for creative tasks — implementation, bug fixes, and judgment-requiring work Cap retry counts — set a maximum and escalate when exceeded, to prevent infinite loops Curate context up front — do not hand the agent every available tool; provide only the subset needed for the task Isolated execution environment — run in a sandbox that cannot affect production code Considerations for Adopting Adversarial Development Benefit Cost Far higher reliability than single-agent Increased token usage (2-3x) Resolves sycophancy problem Longer execution time Good results possible with cheaper models Initial harness construction cost Reduced human review burden Contract negotiation overhead The core is a reliability vs cost tradeoff. In environments like Stripe where stability is critical, this overhead is fully justified. 
Even when applied to PoC or prototype work, adversarial development tends to produce more complete results than a single agent.\nTakeaways The \u0026ldquo;system controls the agent\u0026rdquo; paradigm shift — the era of delegating everything to agents is ending. Stripe, Shopify, Airbnb, and AWS have all adopted the model of inserting agents into specific parts of a larger workflow. The paradox is that reducing agent autonomy actually increases reliability.\nSycophancy is a technical problem that requires a technical solution — if stronger models do not reduce sycophancy, the architecture must address it. Adversarial development is not a trick — it applies a principle validated by GANs to coding agents.\nContext curation is a competitive advantage — whether selecting the right subset from Stripe\u0026rsquo;s 500 MCP tools or running the relevant subset of 3 million tests, the accuracy of that \u0026ldquo;selection\u0026rdquo; determines the performance of the entire workflow.\nThe potential of cross-model evaluation — combining Claude and Codex (or similar) lets different models compensate for each other\u0026rsquo;s blind spots. The question of model selection will shift from \u0026ldquo;which model is best?\u0026rdquo; to \u0026ldquo;which combination is optimal?\u0026rdquo;\nSource videos: Stripe\u0026rsquo;s Coding Agents Ship 1,300 PRs EVERY Week / Coding Agent Reliability EXPLODES When They Argue — Cole Medin\n","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-stripe-coding-agents/cover-en.jpg","permalink":"/posts/2026-04-01-stripe-coding-agents/","title":"How Stripe Ships 1,300 PRs a Week — Coding Agents and Adversarial Development"},{"content":"Coding agents excel at text but struggle with visual expression. Claude Code\u0026rsquo;s Skills system is a framework designed to systematically overcome this limitation. 
Using the Excalidraw diagram skill as a case study, we will go deep on Skills architecture and the philosophy of \u0026ldquo;visual argumentation.\u0026rdquo;\nOverview of Claude Code Skills What Are Skills? Skills are reusable prompt and resource packages — instruction sets bundled into a directory that teach a coding agent how to perform a specific task.\nThe core is the skill.md file. This markdown file defines the agent\u0026rsquo;s behavior: what input it accepts, what steps it follows, and what quality criteria it uses to validate output.\n.claude/skills/ ├── excalidraw-diagram/ │ ├── skill.md # Core instruction set │ ├── reference/ # Reference resources │ │ ├── color-palette.json │ │ └── element-templates/ │ └── render.py # Helper script ├── code-review/ │ └── skill.md └── documentation/ └── skill.md Skills are invoked via slash commands (e.g., /diagram). When Claude Code recognizes the intent of a prompt, it automatically loads the relevant skill.md and follows the workflow defined there.\nSkills vs MCP vs CLAUDE.md — When to Use Which All three extend agent behavior, but they serve different purposes.\nPurpose Scope Example CLAUDE.md Rules that apply to the entire project Always loaded Coding conventions, build commands Skills Systematic workflow for a specific task Loaded on demand Diagram creation, code review MCP Integration with external services (API calls) Tool level Sending Slack messages, DB queries CLAUDE.md says \u0026ldquo;always do it this way in this project.\u0026rdquo; Skills say \u0026ldquo;follow this procedure when doing this task.\u0026rdquo; MCP says \u0026ldquo;communicate with this external system like this.\u0026rdquo;\nThe advantages of Skills are clear:\nContext efficiency — loaded only when needed, no wasted tokens Reusability — build once, use across any project Shareable — distribute via a GitHub repo for anyone to clone and use Deep Analysis: The Excalidraw Diagram Skill The Problem: LLM Visual Limitations What happens when you 
ask a coding agent to \u0026ldquo;draw an architecture diagram\u0026rdquo; without a skill?\nThe result is a generic arrangement of boxes and arrows. Color choices are nearly random, there is no information hierarchy in the layout, and almost every diagram looks the same. LLMs are optimized for generating text tokens — visual decision-making (color combinations, spatial arrangement, visual flow) requires systematic guidance.\nThe Excalidraw skill solves this. It codifies \u0026ldquo;which colors to use,\u0026rdquo; \u0026ldquo;which layout patterns to apply,\u0026rdquo; and \u0026ldquo;how to validate the result\u0026rdquo; into skill.md.\nDirectory Structure .claude/skills/excalidraw-diagram/ ├── skill.md # Full workflow definition ├── reference/ │ ├── color-palette.json # Brand color system │ └── element-templates/ # Reusable shape templates └── render.py # PNG rendering script (for validation) The role of each file:\nskill.md — Instructs the agent through the full diagram creation process, from input analysis to the validation loop color-palette.json — Consistent color system with primary, secondary, and background colors defined as hex codes element-templates/ — JSON snippets for frequently used visual patterns (flow diagrams, architecture maps, etc.) render.py — Converts Excalidraw JSON to PNG for agent self-validation Breaking Down skill.md Let\u0026rsquo;s walk through the core workflow of skill.md step by step.\nStep 1: Input Processing The skill handles diverse input types:\n## Input Processing - **Code file** → Extract architecture, data flow, class relationships - **PDF document** → Identify core concepts and relationship structure - **YouTube transcript** → Convert explanatory flow into visual structure - **Raw text/notes** → Map relationships between concepts For a code file, it visualizes function call graphs or module dependencies. 
For a YouTube transcript, it visualizes the logical flow of the explanation.\nStep 2: Depth Assessment This step exists for a practical reason. Claude Code has a 32K token output limit. The Excalidraw JSON for a complex diagram can easily exceed this.\n## Depth Assessment IF simple diagram (single concept, few elements): → Build entire JSON in one pass IF complex diagram (multiple sections, many relationships): → Build section by section, merging incrementally Simple diagrams are generated in one pass; complex ones are built in sections and then merged.\nStep 3: Pattern Mapping This is the key step that prevents the agent from \u0026ldquo;repeating boxes and arrows\u0026rdquo;:\n## Pattern Mapping Choose a visual pattern based on the nature of the input: - System architecture → Layered hierarchy diagram - Data flow → Directed pipeline - Decision process → Branching tree - Comparison → Parallel layout with contrasting colors - Timeline → Horizontal or vertical time axis This also includes design principles like \u0026ldquo;avoid repetitive boxes\u0026rdquo; and \u0026ldquo;use multi-zoom architecture.\u0026rdquo;\nStep 4: JSON Generation Excalidraw\u0026rsquo;s native format is JSON. 
skill.md specifies the rules to follow when generating it:\n{ \u0026#34;type\u0026#34;: \u0026#34;excalidraw\u0026#34;, \u0026#34;version\u0026#34;: 2, \u0026#34;elements\u0026#34;: [ { \u0026#34;type\u0026#34;: \u0026#34;rectangle\u0026#34;, \u0026#34;x\u0026#34;: 100, \u0026#34;y\u0026#34;: 200, \u0026#34;width\u0026#34;: 240, \u0026#34;height\u0026#34;: 80, \u0026#34;backgroundColor\u0026#34;: \u0026#34;#a5d8ff\u0026#34;, \u0026#34;strokeColor\u0026#34;: \u0026#34;#1971c2\u0026#34;, \u0026#34;roundness\u0026#34;: { \u0026#34;type\u0026#34;: 3 }, \u0026#34;boundElements\u0026#34;: [], \u0026#34;label\u0026#34;: { \u0026#34;text\u0026#34;: \u0026#34;Content Fetcher\u0026#34; } } ] } Colors are drawn from the palette, spacing and alignment rules are followed, and arrow connection start and end points are calculated precisely.\nStep 5: Validation Loop (Self-Validation) This is the most powerful part of the skill:\n## Validation Loop (2-4 iterations) 1. Render JSON → PNG via render.py 2. Directly inspect the generated PNG screenshot 3. Evaluate against criteria: - Is the visual flow natural? - Is the information hierarchy clear? - Are arrow connections accurate? - Is color contrast sufficient? - Is any text clipped? 4. If issues found, edit the JSON directly (not regenerate) 5. Repeat 2-4 times The agent \u0026ldquo;sees\u0026rdquo; its own work and revises it. render.py generates the PNG, and Claude Code\u0026rsquo;s multimodal capability analyzes the image to find improvements. 
Crucially, it edits the existing JSON directly rather than starting over each time.\nFull Workflow Visualization flowchart TD A[\"Input\u0026lt;br/\u0026gt;Code / PDF / Transcript\"] --\u003e B[\"Depth Assessment\u0026lt;br/\u0026gt;Simple vs Complex\"] B --\u003e|Simple| C[\"Generate JSON in one pass\"] B --\u003e|Complex| D[\"Section-by-section generation\u0026lt;br/\u0026gt;handling 32K token limit\"] D --\u003e E[\"Merge sections\"] C --\u003e F[\"Pattern Mapping\u0026lt;br/\u0026gt;Layout + Color selection\"] E --\u003e F F --\u003e G[\"Generate Excalidraw JSON\"] G --\u003e H[\"render.py\u0026lt;br/\u0026gt;PNG rendering\"] H --\u003e I{\"Visual Validation\u0026lt;br/\u0026gt;Flow / Hierarchy / Alignment\"} I --\u003e|Issues found| J[\"Edit JSON directly\"] J --\u003e H I --\u003e|Passes| K[\"Deliver final result\u0026lt;br/\u0026gt;Open in excalidraw.com or\u0026lt;br/\u0026gt;Obsidian\"] style A fill:#4dabf7,stroke:#1971c2,color:#fff style F fill:#69db7c,stroke:#2f9e44,color:#fff style I fill:#ffa94d,stroke:#e8590c,color:#fff style K fill:#b197fc,stroke:#7048e8,color:#fffThe Philosophy of Visual Argumentation The core philosophy of the Excalidraw skill is \u0026ldquo;visual argumentation.\u0026rdquo; This is not about making pretty pictures — the structure of the diagram itself must carry the argument.\nTwo Core Questions skill.md instructs the agent to ask two questions at every step:\n\u0026ldquo;Does the visual structure mirror the concept\u0026rsquo;s behavior?\u0026rdquo; \u0026ldquo;Could someone learn something concrete from this diagram?\u0026rdquo; The first question is about structural coherence. For example, using a circular layout to explain a pipeline creates a mismatch between the concept and the visual. If data flows from A to B, the diagram should flow left-to-right (or top-to-bottom) as well.\nThe second question is about educational value. 
The diagram should not be mere decoration — it should genuinely help readers understand the concept.\nThe Text Removal Test This is the most impressive validation technique:\nRemove all descriptive text from the diagram. The structure and layout alone must still communicate the argument.\nEven with all the labels stripped out, the direction of arrows, differences in element size, color distinctions, and spatial arrangement should still reveal \u0026ldquo;what matters and what is secondary,\u0026rdquo; and \u0026ldquo;where data flows from and to.\u0026rdquo;\nThis is the essence of visual argumentation. Text is supplementary — the visual structure itself must carry the claim.\nApplication Examples Subject What the structure must convey Bad example Microservices architecture Independence of services, communication paths All services as equal-sized boxes in a row Data pipeline Unidirectional flow, order of transformation stages Bidirectional arrows, random placement Decision tree Branch conditions, differences in outcomes per path All branches represented identically Hierarchical system Superior/subordinate relationships, dependency direction Flat enumeration Practical Demo Walkthrough Let\u0026rsquo;s follow the actual steps for using this skill.\nStep 1: Enter the Prompt In Claude Code, make a request like this:\nCreate a diagram of this file\u0026#39;s architecture /path/to/content_fetcher.py Or more specifically:\nCreate a data pipeline diagram based on this YouTube transcript. Focus on the relationships between core concepts. Step 2: Load skill.md Claude Code recognizes the intent and automatically loads .claude/skills/excalidraw-diagram/skill.md. From this moment, the agent\u0026rsquo;s behavior changes completely — it begins following the workflow defined in skill.md step by step.\nStep 3: Generate and Validate JSON The agent analyzes the input, assesses depth, selects a pattern, and generates the Excalidraw JSON. 
It then runs render.py to produce a PNG and validates its own output.\n# What the agent runs internally python render.py output.excalidraw --output preview.png # → Generates PNG, analyzes image # → \u0026#34;Arrow spacing is too tight\u0026#34; → Edit JSON # → Re-render → Re-validate # → Repeat 2-4 times Step 4: Render the Result The final JSON can be opened in two ways:\nexcalidraw.com — Open directly in a browser. Free. \u0026ldquo;Open\u0026rdquo; → Select the local .excalidraw file Obsidian Excalidraw plugin — Integrated with your note system. Drop the .excalidraw file in your Vault and it renders immediately Step 5: Iterate The first output will not be perfect. This is intentional. Consider the number of micro-decisions needed to produce a single diagram:\nx, y coordinates for every element Every color choice Start and end points for every arrow Text size and placement Spacing between elements All of these decisions cannot be perfect simultaneously. But if the starting point is 80% complete, the remaining 20% can be reached with 2-3 instructions:\n- The arrows are too short, spread them out - Increase the color contrast, the text is hard to read against the background - Make the \u0026#34;Data Layer\u0026#34; section larger to emphasize its importance The key point is the dramatic time savings compared to drawing from scratch. In a workflow that produces dozens of diagrams every week, this difference adds up to hours.\nGuide to Building Your Own Skills Once you understand the structure of the Excalidraw skill, you can build your own.\nTips for Writing skill.md # My Custom Skill ## Purpose Define the problem this skill solves in one sentence ## Inputs - What kinds of input does it accept - Format and constraints on the input ## Workflow 1. Analysis phase — how to interpret the input 2. Generation phase — what to produce and in what order 3. 
Validation phase — how to verify the result ## Quality Criteria - Specific, measurable quality standards - A definition of what \u0026#34;good output\u0026#34; looks like ## Anti-patterns - Common traps the agent falls into - Concrete examples of \u0026#34;do not do this\u0026#34; The key principle is specificity. Not \u0026ldquo;create a good diagram\u0026rdquo; but \u0026ldquo;if the information hierarchy is three levels or fewer, generate in a single pass; pull colors from color-palette.json; and the result must pass the text removal test.\u0026rdquo;\nUsing the reference Directory Put information that does not fit in skill.md into the reference directory:\nreference/ ├── color-palette.json # Color code definitions ├── element-templates/ # Reusable patterns ├── examples/ # Examples of good output └── anti-patterns/ # Examples of bad output Examples are especially powerful. Showing an actual JSON or markdown example of \u0026ldquo;produce output like this\u0026rdquo; guides the agent far more precisely than a written description.\nPrinciples for Designing the Validation Loop The most instructive aspect of the Excalidraw skill is its self-validation loop. This pattern can be applied to any skill:\nRun or render the output externally — parse JSON, execute code, render images Have the agent inspect the result directly — read error messages or analyze screenshots Fix the existing output when problems are found — do not start over from scratch Cap the number of iterations — prevent infinite loops. 
2-4 iterations is appropriate Practical Skill Ideas Skill name Purpose Validation method Code Review Structurally analyze a PR diff Verify evidence for each checklist item Documentation Generate API docs from code Execute generated example code Test Generator Generate tests from function signatures Run generated tests Commit Message Generate meaningful commit messages from a diff Validate against conventional commits spec Architecture Audit Analyze codebase dependencies Run circular dependency detection script Takeaways The essence of the Skills system is \u0026ldquo;compensating for agent weaknesses with structure.\u0026rdquo; LLMs have high general capability but cannot deliver consistent quality in specific domains without systematic guidance. Skills package that structure for reuse.\nThe most noteworthy aspect of the Excalidraw skill is its validation loop. Instead of \u0026ldquo;make it and ship it,\u0026rdquo; the flow is \u0026ldquo;make it → check it → fix it\u0026rdquo; — all automated. This pattern is not limited to diagrams. It applies to code generation, documentation, data analysis, and almost any agent task. Designing an external feedback loop where the agent can validate its own work is the core of skill creation.\nThe concept of \u0026ldquo;visual argumentation\u0026rdquo; extends beyond diagrams as well. The principle that structure itself must carry the message applies equally to code architecture, document structure, and API design. Just as the directory structure alone should reveal a project\u0026rsquo;s separation of concerns, the diagram layout alone should reveal the core flow of a system.\nFinally, the act of building skills is itself a process of codifying your own expertise. Converting the tacit knowledge of \u0026ldquo;this is how I create diagrams\u0026rdquo; into an explicit workflow makes that knowledge scalable through agents. 
This is the core value of agentic engineering — not automating an expert\u0026rsquo;s judgment, but automating an expert\u0026rsquo;s process.\nSource: Build BEAUTIFUL Diagrams with Claude Code (Full Workflow) — Cole Medin\n","date":"2026-04-01T00:00:00+09:00","image":"/images/posts/2026-04-01-excalidraw-skill/cover-en.jpg","permalink":"/posts/2026-04-01-excalidraw-skill/","title":"The Excalidraw Diagram Skill — Teaching Coding Agents Visual Argumentation"},{"content":"Overview I analyzed three YouTube videos on Claude Code\u0026rsquo;s automation capabilities. The skill system (creation to deployment), scheduling as an alternative to n8n, and remote control via Dispatch — these three pillars are what transform Claude Code from a coding tool into a workflow automation platform. Related posts: Claude Computer Use, HarnessKit Dev Log\ngraph TD A[\"Claude Code Automation\"] --\u003e B[\"Skill System\"] A --\u003e C[\"Scheduling\"] A --\u003e D[\"Dispatch\"] B --\u003e E[\"Create\u0026lt;br/\u0026gt;Define slash command\"] B --\u003e F[\"Deploy\u0026lt;br/\u0026gt;Marketplace\"] C --\u003e G[\"Cron-based\u0026lt;br/\u0026gt;Recurring execution\"] C --\u003e H[\"Event trigger\u0026lt;br/\u0026gt;Hooks\"] C --\u003e I[\"Remote agent\u0026lt;br/\u0026gt;Remote triggers\"] D --\u003e J[\"Mobile → PC\u0026lt;br/\u0026gt;Remote session control\"] Skill System — Encapsulating Repetition The video Automating with Claude Skills — From Creation to Deployment covers the full lifecycle of a skill.\nWhat Skills Are A skill encapsulates a repeating workflow into a markdown file. Invoking it with a slash command (/skill-name) tells Claude to carry out the defined procedure. If CLAUDE.md is \u0026ldquo;always-on rules,\u0026rdquo; a skill is \u0026ldquo;a specialist you call in when needed.\u0026rdquo;\nCreation A skill file is structured as frontmatter + prompt:\n--- name: email-reply description: Draft a reply to an incoming email --- 1. Analyze the email content 2. 
Reference the tone in reference/tone.md 3. Structure a response for each key point 4. Write in a polite but clear voice Once created, a skill can be reused indefinitely — and if you keep refining it, the hundredth run can be better than the first. Compared with re-explaining context from scratch in a new chat every time, the efficiency gain is substantial.\nMarketplace Deployment Skills can go beyond personal use and be published to the marketplace. HarnessKit and log-blog are already listed there via this route. Package them as plugins and other users can install and use them immediately.\nScheduling — Why n8n Is Becoming Less Necessary The video Fewer Reasons to Use n8n Every Day introduces three scheduling approaches in Claude Code and compares them to automation tools like n8n.\nMethod 1: Cron-Based Recurring Execution Use the /schedule or /loop command to set up cron-expression-based recurring tasks. For example, register \u0026ldquo;check server logs every 30 minutes and classify errors\u0026rdquo; as a cron job, and Claude handles it on schedule.\nMethod 2: Event Triggers (Hooks) Automatically run a skill or task when a specific event occurs. File changes, git commits, and tool calls can all serve as triggers. Define hooks in settings.json.\nMethod 3: Remote Agents (Remote Triggers) Remotely trigger a Claude Code session running on a server. API calls or webhooks can kick off tasks, enabling integration with CI/CD pipelines or external services.\nn8n Comparison n8n Claude Code Scheduling Setup GUI node editor Natural language + cron Logic Node-to-node connections AI judgment Flexibility Predefined nodes Free-form Error handling Conditional branching AI self-assessment Cost Self-host free API costs This isn\u0026rsquo;t a complete replacement, but there\u0026rsquo;s significant overlap in developer workflow automation. 
n8n excels at structured, predefined integrations; Claude Code excels at automation that requires unstructured judgment.\nDispatch — Remote Control from Your Phone The video Claude\u0026rsquo;s Biggest New Feature — Control Your PC From Your Phone introduces Claude Dispatch.\nDispatch lets you remotely trigger a Claude Code session on your PC from a mobile device and check the results. During your commute or while you\u0026rsquo;re out, you can instruct agents in your development environment and monitor their progress.\nCombined with Claude Computer Use, which was covered previously, this enables full automation where Claude controls the mouse and keyboard on a PC you\u0026rsquo;re not physically sitting at.\nThe Synergy of All Three Skills (what) + Schedule (when) + Dispatch (from anywhere) = Fully automated workflow A real-world example:\nSkill: Define \u0026ldquo;analyze server logs and generate error report\u0026rdquo; Schedule: Cron runs it every hour Dispatch: Mobile notification when an error is found, with option to send further instructions I\u0026rsquo;m already using this pattern in the trading-agent project — ScheduleManager handles cron editing, and MCP delegates analysis tasks to the agent.\nInsight The keyword threading through all three videos is \u0026ldquo;decentralized automation.\u0026rdquo; Centralized platforms like n8n and Zapier provide structured trigger-action pipelines. Claude Code\u0026rsquo;s automation supports unstructured, judgment-driven automation where AI makes the calls. Skills define the work, scheduling manages the timing, and Dispatch removes location constraints. 
Put those three together and you\u0026rsquo;re a step closer to a development environment that runs without a human present.\n","date":"2026-03-30T00:00:00+09:00","image":"/images/posts/2026-03-30-claude-code-automation/cover-en.jpg","permalink":"/posts/2026-03-30-claude-code-automation/","title":"Claude Code Automation Triple Play — Skills, Scheduling, and Dispatch"},{"content":"Overview I analyzed the YouTube video 27 Claude Code Tips That Make You 10x Faster. The 27 tips are drawn from 500+ hours of hands-on Claude Code experience. I\u0026rsquo;ve re-categorized them into beginner / intermediate / advanced and analyzed them from a practical standpoint. This continues the previous series:\nClaude Code Practical Guide 1 — Context Management to Workflows Claude Code Practical Guide 2 — New Features from the Last 2 Months graph TD A[\"Claude Code 3-Layer Architecture\"] --\u003e B[\"CLAUDE.md\u0026lt;br/\u0026gt;Behavior rules\u0026lt;br/\u0026gt;Auto-loaded every message\"] A --\u003e C[\"Skills/Workflows\u0026lt;br/\u0026gt;Repeating task automation\u0026lt;br/\u0026gt;On-demand invocation\"] A --\u003e D[\"Reference Files\u0026lt;br/\u0026gt;Reusable templates\u0026lt;br/\u0026gt;Referenced by all skills\"] B --\u003e E[\"Direction challenge\"] B --\u003e F[\"Quality gate\"] B --\u003e G[\"Test first\"] B --\u003e H[\"Context conservation\"] C --\u003e I[\"Email replies\"] C --\u003e J[\"LinkedIn posts\"] C --\u003e K[\"Proposals\"] I --\u003e D J --\u003e D K --\u003e D Beginner: Getting Started Environment Setup Integrate with VS Code or Antigravity — Rather than running Claude Code standalone, integrating it into your IDE puts your code editor and AI conversation on the same screen, eliminating context-switching overhead. One install from the plugin marketplace is all it takes.\nEnable auto-save — This one matters a lot. 
If VS Code\u0026rsquo;s autosave is off, files Claude edits won\u0026rsquo;t be saved and you\u0026rsquo;ll waste time wondering why changes aren\u0026rsquo;t appearing. Search autosave in settings and check the box.\nUse dictation — On Mac, press Fn twice to enable voice input. You can get prompts in faster than typing.\nSetting Direction The hardest part of starting with Claude Code is \u0026ldquo;not knowing what to ask.\u0026rdquo; The video suggests this opening:\n\u0026ldquo;I\u0026rsquo;m building a website from scratch. What questions should I be asking you?\u0026rdquo;\nClaude will then ask back: \u0026ldquo;What\u0026rsquo;s the purpose of the site?\u0026rdquo;, \u0026ldquo;What does success look like?\u0026rdquo;, \u0026ldquo;Who are the target users?\u0026rdquo; — following that chain naturally produces a solid requirements doc.\nIntermediate: Maximizing Productivity Multi-Tab Parallel Work The creator describes this as \u0026ldquo;embarrassingly late to discover.\u0026rdquo; You can open multiple tabs and run different tasks simultaneously. Split-screen two projects side by side for parallel work. You can also split horizontally to monitor multiple conversations at once.\nOne caveat: to prevent coming back 20 minutes later to find everything frozen waiting for permission approval, enable bypass permissions mode. Search bypass permissions in settings and toggle it on.\nCLAUDE.md — Two Essential Files Every project should have these two files:\nCLAUDE.md — How Claude should behave. Think of it as \u0026ldquo;hiring and onboarding an employee.\u0026rdquo; project_specs — What you\u0026rsquo;re building. 
Think of it as \u0026ldquo;explaining what the company does to a new hire.\u0026rdquo; Both should be living documents that evolve alongside the project.\n5 Rules to Put in CLAUDE.md Rule Purpose Challenge my direction Prevents yes-man behavior, drives better outcomes Quality gate Honest quality scores (3/10 → here\u0026rsquo;s how to reach 9/10) Test before delivery Stops broken deliverables from reaching you to debug Context awareness Saves context window, avoids wasteful token use Upgrade suggestion Improvement suggestions each response, catches blind spots Structuring Responses On complex projects, unstructured Claude responses are overwhelming. The video suggests a 5-part response format:\nWhat was done — Summary of the work What I need from you — Actions required from you Why it matters — Explained as if to a 15-year-old Next steps — Where things go from here Errors and context — Any issues that came up, plus background needed to understand them Message Queuing You don\u0026rsquo;t have to wait for a message to finish before sending the next. Send multiple messages in a row and they queue up for sequential processing.\nAdvanced: System Design The 3-Layer Architecture The video proposes structuring Claude Code projects in three layers:\nCLAUDE.md — Behavior rules. Auto-read on every message. Skills/Workflows — Repeating task automation. Called on-demand with /skill-name. Reference Files — Reusable templates. Referenced by all skills. Example: if three skills (email replies, LinkedIn posts, proposals) all reference one \u0026ldquo;my tone\u0026rdquo; file, updating your tone once propagates to all three skills. Set it up once, reuse forever, and keep improving.\nUsing Sub-Agents Building a 5-page website sequentially is slow. 
Sub-agents let you generate each page in parallel:\nHomepage → Sub-agent 1 About page → Sub-agent 2 Contact page → Sub-agent 3 Each agent specializes in one thing with an isolated context — the results are faster and better.\nDesign Tips Dribbble cloning — Get inspiration from Dribbble, paste a screenshot into Claude Code, and it can reproduce it pixel-for-pixel. Attach a URL and it analyzes and replicates the site.\nSpline 3D — Add free 3D graphics to your website. Interactive elements like cubes and balls that follow the cursor make a site look like it cost $10,000.\nOther Advanced Tips Escape + Rewind — When a task goes in the wrong direction, press Escape to stop it and use the Rewind button to restore a previous state Compacting — When context usage hits 83%+, compact manually and add a reminder with key information you can\u0026rsquo;t afford to lose Memory — A persistent secret memory file across projects. Managed with /memory. Stores your name, preferences, etc. Insights — Type insights to see a full usage stats and feedback report Plugins — Use /plugin to download pre-built solutions (e.g. frontend-design) Quick Links Free CLAUDE.md template — linked in the video description Dribbble — Design inspiration Spline — Free 3D graphics Insight The thread running through all 27 tips is that \u0026ldquo;Claude Code is a system, not just a tool.\u0026rdquo; Defining behavior in CLAUDE.md, automating workflows with Skills, maintaining consistency through Reference Files — the 3-layer architecture is a design pattern, not a list of tricks. Combined with Practical Guide 1 and 2, the series flows naturally: context management (#1) → new features (#2) → system design (#3). 
The sub-agent and 3-layer patterns in particular are already being applied in the HarnessKit and log-blog projects.\n","date":"2026-03-30T00:00:00+09:00","image":"/images/posts/2026-03-30-claude-code-27-tips/cover-en.jpg","permalink":"/posts/2026-03-30-claude-code-27-tips/","title":"Claude Code Practical Guide 3 — 27 Tips from 500 Hours of Use"},{"content":"Overview Previous: #5 — Inpaint UX Improvements, Dev Server Deployment, Stability Work\nIn #6, three core tasks were completed across 31 commits. First, image storage was fully migrated from the local EC2 filesystem to AWS S3. Second, \u0026ldquo;Diffs Image Agent\u0026rdquo; branding was applied and the favicon replaced. Third, a number of UI stability and usability fixes were shipped.\ngraph LR A[\"Before: Local Filesystem\u0026lt;br/\u0026gt;Direct EC2 disk storage\"] --\u003e B[\"After: AWS S3\u0026lt;br/\u0026gt;diffs-studio-hybrid-search-images\"] B --\u003e C[\"Uploaded images\u0026lt;br/\u0026gt;uploads/\"] B --\u003e D[\"Generated images\u0026lt;br/\u0026gt;generated/\"] B --\u003e E[\"Thumbnails\u0026lt;br/\u0026gt;thumbnails/\"] F[\"Terraform\u0026lt;br/\u0026gt;IaC\"] --\u003e G[\"S3 bucket\"] F --\u003e H[\"IAM Role\u0026lt;br/\u0026gt;Instance Profile\"] F --\u003e I[\"Bucket policy\u0026lt;br/\u0026gt;CIDR notation\"] S3 Image Storage Migration Background Images were previously stored directly on the EC2 instance\u0026rsquo;s local disk. This created problems: storage capacity limits, risk of data loss when the instance is replaced, and environment mismatch between local development and the server. Migrating to S3 eliminates storage concerns and gives both local and production environments the same storage layer.\nImplementation The migration was approached systematically, starting with design documentation. 
The S3 migration design spec and implementation plan were documented first, then work proceeded layer by layer from infrastructure to application.\nInfrastructure layer (Terraform):\nCreated the diffs-studio-hybrid-search-images S3 bucket Configured IAM Role and Instance Profile for EC2 bucket access Applied EIP CIDR notation to the bucket policy Backend layer:\nAdded S3 config and boto3 dependency Implemented S3 storage wrapper module, initialized in app lifespan Replaced all local file URL/path helpers with S3-based versions Redirected /images/ path to S3; uploads now go to S3 Generated images and thumbnails written directly to S3 Reference images, inpaint/edit source images all loaded from S3 Rewrote thumbnail backfill script to use S3 Presigned URL handling:\nAdded automatic presigned URL refresh on tab visibility change, a clean solution to S3 presigned URL expiration Problem Solved There was an EIP CIDR notation issue in the Terraform bucket policy. A single IP needs to be specified as /32, but the suffix was missing, causing the policy to fail silently. The issue was caught during code review and fixed alongside the ref key cache, ContentType, and Gemini API issues found in the same review.\nDiffs Branding Login Page and Header The Diffs logo was applied to the login page and header, giving the previously bare-default UI a brand identity.\nBrowser Tab Title and Favicon The browser tab title was changed to \u0026ldquo;Diffs Image Agent\u0026rdquo; and the generic favicon was replaced with a \u0026ldquo;D.\u0026rdquo; icon. 
The favicon was converted from PNG to ICO using favicon.io.\nUI Stability and Usability Improvements A variety of UI issues were fixed across multiple sessions:\nCard action buttons: Buttons were not visible over bright images, so button backgrounds were darkened Infinite scroll: Fixed infinite scroll triggering page bounce on empty state Reference image ordering: User-uploaded references now appear before system-injected images Uploaded image display: Uploaded images shown as cards in search popup and vertical browse grid Type label: Renamed to \u0026ldquo;Base Regeneration\u0026rdquo; to prevent confusion with the button Base image indicator: Color changed from purple to neutral gray IMAGE_SAFETY error: Specific reason now shown on frontend instead of a generic 500 error Card/detail UI: Unified to a neutral, minimal style DB Migration and User Data Alembic migration sync work was done on the EC2 server. Migration versions were verified before pulling on the server and synchronized between the local and server environments. Images that had been generated without a user_id were also reassigned to a specific user.\nGemini Labeling Pipeline Labeling work was done on image references. The status of the Gemini API-based labeling pipeline was checked and progress monitored at 30-minute intervals. 
New image labels were also added.\nCommit Log Message Scope fix: terraform bucket policy CIDR notation for EIPs infra add new image label data chore: add APP_ENVIRONMENT to ecosystem config and .env config fix: address code review issues — CIDR, ref key cache, ContentType, Gemini multi feat: refresh presigned image URLs on tab visibility change frontend feat: rewrite thumbnail backfill script to use S3 backend feat: add S3 image source support to labeling pipeline backend feat: load source images from S3 for inpaint/edit backend feat: load reference images from S3 for generation backend feat: write generated images and thumbnails to S3 backend feat: redirect /images/ to S3, upload to S3 backend feat: replace local file URL/path helpers with S3-based versions backend feat: initialize S3 storage in app lifespan, remove local dir constants backend feat: add S3 storage wrapper module backend feat: add S3 config and boto3 dependency backend infra: add S3 bucket, IAM role, and instance profile for image storage infra docs: add S3 image migration implementation plan docs docs: add S3 image migration design spec docs feat: replace generic favicon with branded Diffs \u0026ldquo;D.\u0026rdquo; icon frontend feat: update browser tab title to \u0026ldquo;Diffs Image Agent\u0026rdquo; frontend fix: darken card action button backgrounds for visibility frontend fix: prevent infinite scroll loading on empty state frontend refactor: reorder reference images so user refs come before system-injected backend feat: rebrand login page and header with Diffs logo frontend fix: hide info button and scroll arrows on uploaded image cards frontend feat: show uploaded images as cards in search popup + vertical browse grid frontend fix: rename type label to \u0026lsquo;Base Regeneration\u0026rsquo; frontend refactor: neutralize base image indicator colors from purple to gray frontend fix: surface IMAGE_SAFETY reason to frontend instead of generic 500 full-stack refactor: unify card and detail 
UI to neutral, minimal style frontend Insight The centerpiece of this cycle was the S3 migration. The systematic layer-by-layer transition — design doc → Terraform infra → backend wrapper → API endpoints → frontend URL refresh — went smoothly. Solving the presigned URL expiration issue via the tab visibility event was a clean UX-first approach. Branding work may look simple, but swapping out a favicon and tab title has a surprisingly large impact on how finished the app feels. The fact that more than half of the 31 commits were S3-related is a reminder of just how many touchpoints a storage layer replacement actually involves.\n","date":"2026-03-30T00:00:00+09:00","image":"/images/posts/2026-03-30-hybrid-search-dev6/cover-en.jpg","permalink":"/posts/2026-03-30-hybrid-search-dev6/","title":"Hybrid Image Search Dev Log #6 — S3 Image Storage Migration and Branding"},{"content":"Overview I analyzed the YouTube video LiteParse - The Local Document Parser. LiteParse itself is an interesting tool, but the more significant story is what it represents: the team that pioneered RAG frameworks publicly declaring \u0026ldquo;the framework era is over\u0026rdquo; and pivoting to a single focused tool. Related post: Context7 deep dive\ngraph TD A[\"LlamaIndex Evolution\"] --\u003e B[\"2022.11\u0026lt;br/\u0026gt;RAG framework born\u0026lt;br/\u0026gt;Jerry Liu\"] B --\u003e C[\"2023-2024\u0026lt;br/\u0026gt;Framework heyday\u0026lt;br/\u0026gt;Indexing+search+generation unified\"] C --\u003e D[\"Problems emerge\u0026lt;br/\u0026gt;Over-abstraction\u0026lt;br/\u0026gt;Docs always stale\u0026lt;br/\u0026gt;Undebuggable\"] D --\u003e E[\"2025\u0026lt;br/\u0026gt;Strategic pivot announced\u0026lt;br/\u0026gt;Framework → Tool\"] E --\u003e F[\"LiteParse\u0026lt;br/\u0026gt;One job: document parsing\u0026lt;br/\u0026gt;Runs locally\"] LlamaIndex: History and Pivot Pioneering the RAG Framework Jerry Liu\u0026rsquo;s November 2022 LlamaIndex was the first serious RAG framework. 
It abstracted document indexing, vector search, and answer generation into a unified system, making RAG pipelines fast to build. As RAG emerged as the dominant pattern for LLM applications, LlamaIndex became the category\u0026rsquo;s defining framework.\nThe Fundamental Problems of the Framework Era The video names the core problems:\nAbstraction layers can\u0026rsquo;t keep pace — AI models and techniques change monthly. Framework abstractions lag behind. Documentation is always behind, and a six-month-old tutorial often simply doesn\u0026rsquo;t work anymore.\nDebugging is nearly impossible — Complexity hidden inside the framework makes it hard to trace root causes when something goes wrong. In an \u0026ldquo;indexing → retrieval → generation\u0026rdquo; pipeline, the problem can be anywhere behind the abstraction layer.\nAbstraction becomes a constraint — As AI models themselves improve rapidly, the framework\u0026rsquo;s prescribed approach is increasingly not the best approach. When you\u0026rsquo;re writing more code to work around the framework than with it, the framework has lost its reason to exist.\nThe critical point: LlamaIndex\u0026rsquo;s own team admitted this and changed direction. 
The pioneers of the framework era declared its limits themselves.\nLiteParse — One Problem, Done Well The Problem It Solves Coding agents can write thousands of lines of Python without breaking a sweat, but hand them a PDF and useful context vanishes:\nTables get flattened — Row/column structure is lost, distorting the meaning of the data Charts disappear — Visual data is completely ignored Numbers hallucinate — OCR errors pass wrong figures to the model PyPDF workarounds are janky — You get basic text extraction and nothing more Fixing this previously required bolting on a separate OCR model or wiring together multiple libraries into a fragile pipeline.\nLiteParse\u0026rsquo;s Approach LiteParse is a locally executed document parser that does exactly one thing: extract tables, charts, and code blocks from PDFs and DOCX files accurately.\nCore characteristics:\nLocal execution — No external API dependency, privacy preserved Structure preservation — Table rows/columns and chart data points are maintained Single purpose — A standalone tool, not part of a RAG pipeline Pipeline-agnostic — Connect it to any workflow; no LlamaIndex dependency Framework vs. Tool — A Paradigm Comparison Framework (LlamaIndex RAG) Tool (LiteParse) Scope Full RAG pipeline Document parsing only Abstraction High (index, retrieval, generation) Low (input → parsed output) Flexibility Locked to the framework\u0026rsquo;s approach Connects to any pipeline Debugging Hidden behind abstractions Clear inputs and outputs Maintenance Frequent breaking changes Stable interface Learning curve Must understand the whole framework Understand just the one feature The Structural Shift in AI Developer Tooling LlamaIndex\u0026rsquo;s pivot is not an isolated event. 
The same pattern is repeating across the AI developer ecosystem:\nContext7 — Succeeded as an MCP tool specialized in \u0026ldquo;one thing\u0026rdquo;: injecting up-to-date documentation into LLM context (Context7 deep dive) MCP (Model Context Protocol) — A standardized protocol between tools, not a framework Claude Code Marketplace — An ecosystem of plugins each specialized for a specific function (Marketplace comparison) If 2022–2024 was the age of \u0026ldquo;frameworks that wrap everything,\u0026rdquo; 2025 onwards is the age of \u0026ldquo;tools that do one thing well.\u0026rdquo; HarnessKit and log-blog were deliberately designed in this spirit — not frameworks, but plugins that solve a specific problem cleanly.\nKey Takeaways LlamaIndex\u0026rsquo;s pivot is significant precisely because the framework\u0026rsquo;s limitations were declared by the framework\u0026rsquo;s own creators — not by outside critics. That\u0026rsquo;s a strong signal about the direction of AI developer tooling. In the agentic era, agents handle orchestration themselves. What developers need isn\u0026rsquo;t \u0026ldquo;a framework that ties everything together\u0026rdquo; but \u0026ldquo;good tools the agent can call.\u0026rdquo; Just as LiteParse owns document parsing, Context7 owns documentation injection, and MCP owns the tool protocol — the combination of well-built, focused tools is replacing the framework.\n","date":"2026-03-30T00:00:00+09:00","image":"/images/posts/2026-03-30-ai-dev-tools/cover-en.jpg","permalink":"/posts/2026-03-30-ai-dev-tools/","title":"LiteParse and the End of the Framework Era — LlamaIndex's Strategic Pivot"},{"content":"Overview I analyzed two YouTube videos on AI agent architecture and quality management. The first covers Anthropic\u0026rsquo;s long-running agent blueprint — a design guide for agents that autonomously execute complex tasks spanning hours or even days. 
The second covers harness engineering — a methodology for systematically managing agent quality. Related posts: The Rise of Sub-Agents, HarnessKit Dev Log #3\ngraph TD A[\"Long-Running Agent\"] --\u003e B[\"Task Decomposition\"] B --\u003e C[\"Subtask 1\"] B --\u003e D[\"Subtask 2\"] B --\u003e E[\"Subtask N\"] C --\u003e F{\"Checkpoint\"} D --\u003e F E --\u003e F F --\u003e|\"Success\"| G[\"Next stage\"] F --\u003e|\"Failure\"| H[\"Recovery strategy\"] H --\u003e I[\"Retry\"] H --\u003e J[\"Alternative path\"] H --\u003e K[\"Human escalation\"] L[\"Harness Engineering\"] --\u003e M[\"Guardrails\"] L --\u003e N[\"Monitoring\"] L --\u003e O[\"Feedback loop\"] Anthropic\u0026rsquo;s Long-Running Agent Blueprint The video Anthropic Just Dropped the New Blueprint for Long-Running AI Agents takes a deep look at the long-running agent design guide Anthropic published.\nOne-Shot vs. Long-Running Most AI agents today are one-shot — receive a question, give an answer, done. But real-world work looks like \u0026ldquo;refactor this entire codebase\u0026rdquo; or \u0026ldquo;build this data pipeline\u0026rdquo; — multi-hour or multi-day compound tasks.\nLong-running agents must handle these autonomously and be able to recover when they fail or lose direction mid-task. Anthropic\u0026rsquo;s blueprint provides the design principles to make this happen.\nCore Design Principles 1. Task Decomposition\nBreak complex tasks into independent subtasks. Each subtask should:\nHave clearly defined inputs and outputs Be independently executable and verifiable Fail without cascading to other subtasks 2. Checkpoints and State Management\nIn long-running execution, losing intermediate results is the biggest risk. Saving a checkpoint on each subtask completion enables:\nResuming from the last checkpoint on failure Preserving critical state when compressing the context window Providing human review points 3. 
Failure Recovery Strategy\nThree-level recovery:\nRetry — Automatic retry for transient errors (API timeouts, etc.) Alternative path — Achieve the same goal via a different method (similar to Deterministic Fallback) Human escalation — Defer to a human when the agent can\u0026rsquo;t resolve the issue itself 4. Progress Reporting and Transparency\nDuring long-running tasks, users need to know \u0026ldquo;what\u0026rsquo;s happening right now.\u0026rdquo; Provide periodic progress updates, current stage indication, and estimated completion time.\nReal-World Application Claude Code itself is an implementation of this blueprint. During large-scale refactoring or feature work:\nTasks decompose into subtasks (Plan mode) Each file modification is a checkpoint (git commit) Failures are recoverable via rewind Progress is reported to the user Harness Engineering — Quality Management for Agents The video Harness Engineering in Practice explains the harness engineering methodology for systematically managing AI agent quality from a practitioner\u0026rsquo;s perspective.\nWhat Is a Harness? A harness originally refers to the gear used to control and direct a horse\u0026rsquo;s strength. By analogy, a harness for AI agents is a system that controls agent output and guarantees quality. The stronger the agent, the more robust the harness needs to be.\nThe 3 Components of a Harness 1. Guardrails\nDefine what the agent must not do:\nProtected directories — no deletions allowed Conditions for automatic commits External API call limits Cost caps 2. Monitoring\nTrack agent behavior in real time:\nTool call patterns Error rates Token usage Task completion rates 3. Feedback Loop\nEvaluate agent output and improve it:\nCollect automated test results Incorporate user feedback Learn from failure patterns Auto-adjust settings The Management Perspective The video addresses more than technical implementation — it covers the management angle too. 
Managing a team of agents has parallels with managing a human team:\nClear role and responsibility definitions Regular performance reviews (evals) Escalation paths when problems occur Continuous training (prompt refinement) Where the Two Approaches Intersect The long-running agent blueprint and harness engineering look at the same problem from different angles:\nPerspective Long-Running Agent Harness Engineering Focus Internal agent design External agent control Goal Autonomous task completion Quality assurance Failure response Self-recovery strategy Guardrails + escalation Improvement method Checkpoint-based Feedback loop-based Combine them and you get: the agent internally equipped with checkpoints and recovery strategies, while the harness externally enforces quality through guardrails and monitoring — a two-layer safety structure.\nThe HarnessKit project sits precisely at this intersection — it implements an external harness for Claude Code agents as a plugin, automating guardrails and monitoring.\nInsight As AI agents evolve from one-shot to long-running, \u0026ldquo;trustworthy agents\u0026rdquo; are becoming more important than \u0026ldquo;smart agents.\u0026rdquo; Anthropic\u0026rsquo;s blueprint builds that trustworthiness from the inside through internal design; harness engineering builds it from the outside through external control. The two-layer safety structure combining both approaches looks set to become the standard for production agents. 
This perspective also connects to the AI App Production Design Patterns post — Deterministic Fallback, HITL — it all comes back to the same core idea: design for failure from the start.\n","date":"2026-03-30T00:00:00+09:00","image":"/images/posts/2026-03-30-long-running-agents/cover-en.jpg","permalink":"/posts/2026-03-30-long-running-agents/","title":"Long-Running AI Agents and Harness Engineering in Practice"},{"content":"Overview I analyzed the YouTube video AI Plugin with 110k Stars — One Line of Code Does It All. The plugin in question is Superpowers — the Claude Code plugin I covered in depth in The Complete Superpowers Guide. It had 69k stars when I first wrote about it; five months later it crossed 110k. This post focuses on the practical critique and key insights from a Korean developer\u0026rsquo;s perspective. Related posts: The Complete Superpowers Guide, HarnessKit Dev Log #3\ngraph TD A[\"Superpowers 4-Stage Process\"] --\u003e B[\"1. Brainstorm\u0026lt;br/\u0026gt;Talk before code\"] B --\u003e C[\"2. Plan\u0026lt;br/\u0026gt;Blueprint first\"] C --\u003e D[\"3. TDD\u0026lt;br/\u0026gt;Criteria first, code second\"] D --\u003e E[\"4. Parallel Sub-agents\u0026lt;br/\u0026gt;Split into teams\"] F[\"Team Lead AI\u0026lt;br/\u0026gt;Opus model\"] --\u003e G[\"Team Member AI 1\u0026lt;br/\u0026gt;Haiku model\"] F --\u003e H[\"Team Member AI 2\u0026lt;br/\u0026gt;Haiku model\"] F --\u003e I[\"Team Member AI 3\u0026lt;br/\u0026gt;Haiku model\"] G --\u003e J[\"Isolated Workspace\u0026lt;br/\u0026gt;Git Worktree\"] H --\u003e J I --\u003e J What Is Superpowers? When you tell an AI coding tool (Claude Code, Cursor, Codex, etc.) \u0026ldquo;build me an app,\u0026rdquo; it dives straight into writing code. 
The video\u0026rsquo;s analogy is spot-on:\nIt\u0026rsquo;s like telling a contractor \u0026ldquo;make it feel like a café\u0026rdquo; — and instead of asking how many seats you need or what your budget is, they immediately start knocking down walls.\nSuperpowers solves this problem. It injects a manual (skill files) that enforces a working order on the AI — conversation → planning → testing → implementation, in that sequence. It hit 110k GitHub stars in five months. Creator Jesse Vincent is a seasoned open-source developer with a long track record.\nInstallation is simple:\n# For Claude Code users /plugin install superpowers # For Cursor users /plugin superpowers Core 1: Brainstorming — Talk Before You Code With a typical AI coding tool, saying \u0026ldquo;add a login feature\u0026rdquo; triggers immediate code generation. With Superpowers installed, the AI asks questions first:\nWhat login method do you want? Email? Social login? Do you need password recovery? How should sessions be managed? It suggests two or three approaches, explains the tradeoffs, and only starts building after you say \u0026ldquo;let\u0026rsquo;s go with this.\u0026rdquo; The contractor who used to knock down walls first now shows you the blueprints.\nThe skill file explicitly forbids the AI from skipping this phase:\n\u0026ldquo;This is not optional. You must follow this.\u0026rdquo;\nIt even includes a counter-script for when the AI tries to wriggle out by saying \u0026ldquo;this is too simple to bother with.\u0026rdquo;\nCore 2: TDD — Define Success Before Writing Code The video\u0026rsquo;s analogy: when making kimchi jjigae, you normally follow the recipe and taste at the end. TDD means deciding what it should taste like before you start. \u0026ldquo;This level of saltiness, this much chili\u0026rdquo; — you define the standard first, then cook to meet it.\nIn Superpowers this is written as a hard rule:\n\u0026ldquo;No building without criteria. 
If you started without criteria, delete it and start over.\u0026rdquo;\nYou write the test (the criterion for how a feature should behave) first, then write code that passes it. No more \u0026ldquo;why isn\u0026rsquo;t this working?\u0026rdquo; debugging sessions after the fact.\nCore 3: Sub-Agent Teams — AI Works in Parallel This is the most impressive design choice. Instead of one AI doing everything, the work is split across a team.\nModel Separation Strategy Role Model Why Team Lead (planning) Opus (advanced) Whole-system design requires deep thinking Team Members (coding) Haiku (lightweight) Once the plan is done, execute fast It\u0026rsquo;s like architecture — the 30-year veteran designs the building, but the bricklaying is done by skilled tradespeople.\nContext Isolation Each team member AI focuses only on its own task. The AI that reads code, the AI that writes code, and the AI that reviews code are all separate. Just as a person\u0026rsquo;s brain melts when multitasking three meetings, an AI that\u0026rsquo;s given too many things at once starts making mistakes.\nIsolated Workspaces (Git Worktree) When multiple AIs touch the same project at once, conflicts arise. Superpowers gives each AI its own copy of the project using Git Worktrees. AI 1 builds the login feature, AI 2 builds the payment feature — each in their own workspace — and the results are merged at the end.\nWhere Superpowers Falls Short The video is honest about the limitations:\nNo formal benchmarks — there\u0026rsquo;s not enough comparative data to prove effectiveness with hard numbers Shallow brainstorming questions — the design of which questions to ask still needs more work Weak QA phase — real validation requires E2E (end-to-end) testing, and the current QA step doesn\u0026rsquo;t go that far Superpowers vs. 
HarnessKit Superpowers and HarnessKit solve the same problem from different angles:\nSuperpowers HarnessKit Approach Enforce workflow (skills) Guardrails + monitoring (harness) Focus AI\u0026rsquo;s task order AI\u0026rsquo;s output quality Method Process injection Environment control Install plugin install one line Marketplace install They\u0026rsquo;re not competing — they\u0026rsquo;re complementary. Use Superpowers to enforce the right order, use HarnessKit to manage quality, and you have a two-layer safety structure.\nInsight Superpowers didn\u0026rsquo;t hit 110k stars in five months because of some technical breakthrough. It implemented a simple principle as a system: give AI a process to follow and the results change. The video\u0026rsquo;s core message is accurate — how you use AI matters more than how smart the AI is. The same Claude Code produces completely different outcomes depending on whether Superpowers is installed. This principle applies to all AI usage, not just coding. HarnessKit — the project we\u0026rsquo;re building now — is a product of the same philosophy: designing the AI\u0026rsquo;s working environment, not just the AI\u0026rsquo;s capabilities.\n","date":"2026-03-30T00:00:00+09:00","image":"/images/posts/2026-03-30-ai-plugin-ecosystem/cover-en.jpg","permalink":"/posts/2026-03-30-ai-plugin-ecosystem/","title":"Superpowers Follow-up — From 69k to 110k Stars, and What's Still Missing"},{"content":"Overview Previous: #6\nIn #7, the trading agent\u0026rsquo;s analytical capabilities were significantly expanded across 34 commits. 
Work included: DCF valuation with sensitivity heatmap, portfolio risk analysis (VaR, beta, sector concentration), adding the 6th expert (news/macro analyst) to the signal pipeline, DART disclosure integration, and investment memo export.\ngraph TD A[\"Signal Pipeline\"] --\u003e B[\"Technical Analysis\"] A --\u003e C[\"Fundamental Analysis\"] A --\u003e D[\"Sentiment Analysis\"] A --\u003e E[\"Flow Analysis\"] A --\u003e F[\"Valuation\"] A --\u003e G[\"News/Macro Analysis\u0026lt;br/\u0026gt;6th Expert NEW\"] G --\u003e H[\"Google News RSS\"] G --\u003e I[\"DART Disclosures\"] G --\u003e J[\"Insider Trading\"] F --\u003e K[\"DCF Sensitivity\u0026lt;br/\u0026gt;Heatmap\"] F --\u003e L[\"Peer Comparison\u0026lt;br/\u0026gt;DART-based\"] A --\u003e M[\"Portfolio Risk\"] M --\u003e N[\"VaR\"] M --\u003e O[\"Beta\"] M --\u003e P[\"Correlation Matrix\"] DCF Valuation and Sensitivity Analysis Background The existing signal pipeline had no quantitative valuation model. DCF (Discounted Cash Flow) is essential for estimating fair value, and showing sensitivity across scenarios — rather than a single point estimate — is what makes it useful for investment decisions.\nImplementation A DCF valuation service was implemented and a sensitivity heatmap table showing outcomes across WACC and growth rate combinations was added to ValuationView. 
The frontend visualizes this as a heatmap, making it immediately clear under which assumptions the current price looks undervalued or overvalued.\nPeer comparison was also added, pulling valuation data for same-sector companies from DART so relative positioning can be assessed.\nUnit tests were added to verify the core logic of the DCF valuation and portfolio risk services.\nPortfolio Risk Analysis VaR, Beta, and Sector Concentration Portfolio-level risk analysis was implemented:\nVaR (Value at Risk): Loss that the portfolio is not expected to exceed over a given horizon at a given confidence level Beta: Calculated from actual portfolio data Sector concentration: Detects overexposure to any one sector Correlation matrix heatmap: Visualizes pairwise correlation across holdings KOSPI200 constituent sector data was collected from NAVER Finance for sector classification.\nSignal Pipeline Expansion The 6th Expert: News/Macro Analyst A news/macro analyst was added alongside the existing five experts (technical, fundamental, sentiment, flow, and valuation). This expert analyzes macroeconomic news and stock-specific events and incorporates them into the signal.\nGoogle News RSS fallback: To improve news collection reliability, Google News RSS was added as a fallback. When the primary news source is unstable, collection switches to it automatically.\nDART Integration Catalyst Calendar: DART disclosure schedule displayed as a timeline UI for a quick view of upcoming material events Insider Trading: DART insider trading data integrated into the signal pipeline Foreign/institutional investor flows: Foreign and institutional buying/selling data added to the flow analysis expert DB Schema Expansion 8 new tables, the ANALYST role, and metadata initialization were added, so all data models for the new features were defined in a single pass.\nSignal History and Comparison Signal history snapshots and timeline comparison were added. 
Past signals can now be compared against the current signal to track how views have changed over time — useful for post-hoc evaluation of signal consistency and predictive power.\nFrontend UI Improvements SignalCard expansion: Expert opinion expansion display, risk_notes rendering, compact/expanded view toggle SignalDetailModal: Drilldown into related order history ReportViewer: Trade PnL column and rr_score color coding added ScheduleManager: Cron editing and run-now button; agent names and friendly task labels displayed DashboardView: report.generated event handling, performance endpoint with period selector Investment Memo Export A feature was added to export investment memos in HTML and DOCX formats based on signal data. Word documents are generated using python-docx.\nServer Stability MCP (Model Context Protocol) connection stability was improved:\nFixed missing await on async MCP context methods Added auto-reconnect logic on connection failure Server logs were monitored periodically. 
websockets library deprecated API warnings and other runtime errors were classified, and only codebase-level issues were selected for fixing.\nConfiguration Expansion initial_capital and min_rr_score added to settings and the risk-config API New components aligned with the existing design system Vite ESM resolution error fixed (using import type) Lint errors (unused vars) cleaned up Commit Log Message Scope feat: show agent name and friendly task labels in ScheduleManager frontend style: align new components with existing design system frontend fix: use import type for ScheduledTask to fix Vite ESM resolution frontend feat: add Google News RSS fallback for news collection stability backend feat: add compact/expanded view toggle to SignalCard frontend feat: add DOCX investment memo export with python-docx backend feat: add real portfolio beta calculation and correlation matrix heatmap backend + frontend feat: add DCF sensitivity heatmap table to ValuationView frontend test: add unit tests for DCF valuation and portfolio risk services test feat: populate kospi200_components sector data from NAVER Finance backend fix: await async MCP context methods and add auto-reconnect on failure backend fix: replace explicit any types with proper interfaces in SignalCard frontend feat: add investment memo HTML export from signal data backend feat: add VaR, beta, sector concentration risk analysis backend feat: add DCF valuation with sensitivity table backend feat: add signal history snapshots and timeline comparison full-stack feat: add peer comparison with sector-based DART valuation backend feat: add news/macro analyst as 6th expert in signal pipeline backend feat: add catalyst calendar with DART disclosures and timeline UI full-stack feat: add DART insider trading data to signal pipeline backend feat: add foreign/institutional investor trend to signal pipeline backend feat: add 8 new DB tables, ANALYST role, and metadata init backend fix: resolve lint errors (unused vars) in 
DashboardView and SignalCard frontend feat: add report.generated event handling in DashboardView frontend feat: add initial_capital and min_rr_score to settings and risk-config API full-stack feat: add ScheduleManager with cron editing and run-now button frontend feat: add trade PnL column and rr_score color coding to ReportViewer frontend feat: add SignalDetailModal with related orders drilldown frontend feat: add expert opinion expansion and risk_notes display to SignalCard frontend feat: use correct performance endpoint with period selector and metrics frontend Insight 34 commits is the highest count in the series so far. The trading agent is evolving from a simple signal generator into a comprehensive analysis platform covering portfolio risk management, valuation analysis, and disclosure monitoring. The addition of the 6th expert (news/macro) and the DART integration are particularly significant — they actively leverage Korea-specific data sources. The DCF sensitivity heatmap and portfolio correlation matrix are good examples of conveying complex data intuitively through visualization. On the stability front, the MCP auto-reconnect and periodic log monitoring pattern is now established — a meaningful step toward production-grade reliability.\n","date":"2026-03-30T00:00:00+09:00","image":"/images/posts/2026-03-30-trading-agent-dev7/cover-en.jpg","permalink":"/posts/2026-03-30-trading-agent-dev7/","title":"Trading Agent Dev Log #7 — DCF Valuation, Portfolio Risk Analysis, and the 6th Expert"},{"content":"Overview Anthropic has officially launched the ability for Claude to directly control a computer\u0026rsquo;s mouse, keyboard, and screen. Integrated with Claude Code Desktop and Cowork, Claude can now operate real GUIs — and combined with Dispatch, it can perform work remotely even when you step away. macOS launched first; Windows support is coming within weeks.\nWhat Is Computer Use Classic Claude Code operated by running CLI commands inside a terminal. 
Computer Use extends that scope to the entire GUI. Claude can perceive the screen through screenshots and execute actions like mouse clicks, keyboard input, and drag operations.\ngraph LR A[\"Claude AI\"] --\u003e B[\"Screen Capture \u0026lt;br/\u0026gt; Perceive the screen\"] B --\u003e C[\"Action Planning \u0026lt;br/\u0026gt; Plan the next action\"] C --\u003e D[\"Mouse / Keyboard \u0026lt;br/\u0026gt; Execute input\"] D --\u003e E[\"Result Capture \u0026lt;br/\u0026gt; Verify outcome\"] E --\u003e B Key constraint: Computer Use is still early-stage. Claude operates much more slowly and deliberately than a human. This is intentional: safety is prioritized.\nClaude Code Desktop \u0026amp; Cowork Integration Enabling Computer Use in Claude Code Desktop lets Claude directly manipulate IDEs or browsers during coding work. For example:\nLegacy app automation: automate repetitive tasks in GUI-only apps with no API Native app debugging: run builds and tests directly in Xcode, Android Studio, etc. Browser testing: test UI interactions in a real browser In Cowork mode, Claude works on the same screen alongside the user, letting you observe and intervene in Claude\u0026rsquo;s actions in real time.\nDispatch: Remote Asynchronous Work Computer Use\u0026rsquo;s true potential surfaces when combined with Dispatch.\ngraph TD A[\"User\"] --\u003e|\"Task instructions\"| B[\"Dispatch\"] B --\u003e|\"Task queuing\"| C[\"Claude Agent\"] C --\u003e|\"Computer Use\"| D[\"macOS Desktop\"] D --\u003e|\"Report results\"| B B --\u003e|\"Notification\"| A You can instruct Claude to operate the computer even when you\u0026rsquo;re not there. Complex multi-app tasks, like \u0026ldquo;clean up the data in this spreadsheet and send it as an email,\u0026rdquo; are handled asynchronously.\nRelationship with Claude Code Remote Control Claude Code already had a Remote Control feature. 
Here\u0026rsquo;s how it differs from Computer Use:\nFeature Remote Control Computer Use Scope Terminal CLI commands Entire GUI (mouse/keyboard) Target File system, shell Any desktop app Speed Immediate execution Slow and deliberate Safety Within sandbox Full screen access Use case Coding, builds, testing Legacy automation, GUI testing The two features are complementary. Remote Control is more efficient for work that can be handled via CLI; Computer Use is recommended only when a GUI is truly necessary.\nReal-World Use Cases Legacy App Automation Automate repetitive tasks in enterprise software (ERP, CRM, etc.) that has no API. Delegate daily GUI work — data entry, report generation, approval processes — to Claude.\nCross-App Workflows Execute multi-app workflows with a single command. For example, automate the full sequence: capture a design in Figma → modify code in VS Code → verify results in the browser.\nQA Testing Test user experience in actual UI. Unlike automation tools like Playwright or Selenium, Computer Use visually perceives the screen — making tests resilient to CSS selector changes.\nCurrent Limitations Speed: much slower than a human — each step requires analyzing a screenshot and planning, so expect wait time Accuracy: risk of clicking the wrong element in complex UIs Platform: macOS first, Windows not yet supported Security: full screen access requires care when sensitive information is visible on screen Insights Claude Computer Use is a significant turning point in AI agents evolving from \u0026ldquo;code generators\u0026rdquo; to \u0026ldquo;digital workers.\u0026rdquo; Moving AI out of the terminal sandbox into the full GUI dramatically expands the range of automatable work. Still early-stage with speed and accuracy limitations, but the combination with Dispatch — enabling asynchronous remote work — can bring real changes to developer workflows. 
Combining Remote Control and Computer Use in particular, for legacy system automation and cross-app workflows, we\u0026rsquo;re approaching an era where nearly any computer task can be delegated to AI.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-claude-computer-use/cover-en.jpg","permalink":"/posts/2026-03-25-claude-computer-use/","title":"Claude Computer Use — An AI That Controls Your Mouse and Keyboard"},{"content":"Overview One of the biggest challenges in vibe coding is design consistency. AI-generated UI works functionally, but colors, spacing, and typography tend to drift from screen to screen. This post analyzes the Claude Code \u0026amp; Figma for Consistent Design Figma community file and Figmapedia resources introduced in Pitube\u0026rsquo;s weekly live stream, and outlines a practical workflow.\nThe Problem: Design Fragmentation in Vibe Coding When generating UI with Claude Code, each prompt independently determines its own styles. Component A uses #3B82F6 blue; Component B uses #2563EB blue — subtly different colors accumulate and the result feels unpolished overall.\ngraph TD A[\"Prompt 1 \u0026lt;br/\u0026gt; Button component\"] --\u003e B[\"Color: #3B82F6 \u0026lt;br/\u0026gt; Padding: 8px 16px\"] C[\"Prompt 2 \u0026lt;br/\u0026gt; Card component\"] --\u003e D[\"Color: #2563EB \u0026lt;br/\u0026gt; Padding: 12px 24px\"] E[\"Prompt 3 \u0026lt;br/\u0026gt; Navigation\"] --\u003e F[\"Color: #1D4ED8 \u0026lt;br/\u0026gt; Padding: 10px 20px\"] B --\u003e G[\"Design Fragmentation\"] D --\u003e G F --\u003e G The Solution: Figma Design Tokens → Claude Code Context Step 1: Define Your Design System in Figma The approach proposed in the Figma community file is to systematically define design tokens:\nColor Tokens: Primary, Secondary, Neutral, Semantic (Success/Warning/Error) Spacing Scale: 4px units (4, 8, 12, 16, 24, 32, 48, 64) Typography Scale: Heading 1–6, Body, Caption, Label Border Radius: 4px, 8px, 12px, 16px, Full Shadow 
Scale: sm, md, lg, xl Step 2: Declare Design Rules in CLAUDE.md # Design System ## Colors - Primary: #3B82F6 (Blue 500) - Primary Hover: #2563EB (Blue 600) - Background: #FFFFFF - Surface: #F8FAFC (Slate 50) - Text Primary: #0F172A (Slate 900) ## Spacing - Base unit: 4px - Component padding: 8px 16px (sm), 12px 24px (md), 16px 32px (lg) ## Typography - Font: Inter - Heading: 600 weight, 1.25 line-height - Body: 400 weight, 1.5 line-height With these rules in CLAUDE.md, Claude Code references the same design tokens for every UI it generates.\nStep 3: Component-Level Prompting graph LR A[\"Figma \u0026lt;br/\u0026gt; Design Tokens\"] --\u003e B[\"CLAUDE.md \u0026lt;br/\u0026gt; Design Rules\"] B --\u003e C[\"Claude Code \u0026lt;br/\u0026gt; UI Generation\"] C --\u003e D[\"Consistent \u0026lt;br/\u0026gt; Components\"] D --\u003e E[\"Figma \u0026lt;br/\u0026gt; Verification\"] E --\u003e|\"Discrepancy found\"| B Figmapedia — Getting Design Terminology Right Figmapedia is a design terminology dictionary and resource platform. It organizes practical design information that \u0026ldquo;doesn\u0026rsquo;t surface well even in AI searches\u0026rdquo; — and it helps when writing design-related prompts for Claude Code, ensuring you\u0026rsquo;re using precise terminology.\nKey categories:\nFigma Terms \u0026amp; Info: explanations of Figma-specific features and terminology Prompt-pedia: a collection of design prompts useful for AI coding Button inner/outer spacing: padding vs. margin rules that get confused often in practice When prompting Claude Code to \u0026ldquo;reduce the button\u0026rsquo;s inner spacing,\u0026rdquo; you need a clear understanding of the difference between padding and margin to get the result you want. 
Figmapedia bridges that gap.\nPractical Tips: Claude Code + Figma Workflow Screenshot-Based Prompting Once you finish a design in Figma, pass a screenshot to Claude Code for visually-grounded code generation:\nImplement this Figma design as a React component. Follow the Design System section in CLAUDE.md for design tokens. Tailwind CSS Token Mapping Converting Figma design tokens into tailwind.config.js means Claude Code\u0026rsquo;s generated code automatically applies consistent styles.\nValidation Loop Generate component with Claude Code Check rendering in the browser Visual comparison against the Figma original If there\u0026rsquo;s a discrepancy, provide feedback → regenerate Insights The \u0026ldquo;design quality problem\u0026rdquo; in vibe coding is not a technical limitation — it\u0026rsquo;s a context deficit. Give Claude Code clear design tokens and rules, and it will produce consistent UI. Build the pipeline of Figma design system → CLAUDE.md rules → Claude Code generation, and you can maintain production-level UI consistency without a dedicated designer. Resources like Figmapedia help developers acquire precise design vocabulary, which directly translates into giving AI more accurate instructions.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-claude-code-figma/cover-en.jpg","permalink":"/posts/2026-03-25-claude-code-figma/","title":"Consistent UI with Claude Code and Figma — Analyzing Figma Community Resources"},{"content":"Overview I analyzed TILNOTE\u0026rsquo;s article \u0026ldquo;What Really Matters in AI Apps.\u0026rdquo; The core message is clear: the real problem isn\u0026rsquo;t when the model gets it right — it\u0026rsquo;s how the system behaves when the model is subtly wrong. This post covers three patterns — Deterministic Fallback, HITL, and Evaluation Stack — from a production design perspective. 
Related post: Vibe Coding Security Checklist\ngraph TD A[\"User Input\"] --\u003e B{\"Model Response + Validation\"} B --\u003e|\"Pass\"| C[\"Normal path\u0026lt;br/\u0026gt;Return model answer\"] B --\u003e|\"Low confidence\"| D[\"Restricted path\u0026lt;br/\u0026gt;Answer only within confirmed scope\"] B --\u003e|\"Failure\"| E[\"Fallback path\u0026lt;br/\u0026gt;Return search results or template\"] B --\u003e|\"Dangerous\"| F[\"Stop path\u0026lt;br/\u0026gt;Route to human review\"] G[\"HITL Control\"] --\u003e B G --\u003e D G --\u003e F H[\"Evaluation Stack\"] --\u003e I[\"Offline eval\"] H --\u003e J[\"Pre-production backtest\"] H --\u003e K[\"Online eval\"] H --\u003e L[\"Human review\"] Why These Three Patterns The article opens with a concrete scenario. A customer support AI is explaining a refund policy:\nUser: \u0026ldquo;Can I get a refund on last month\u0026rsquo;s charge? Please process it as a card cancellation.\u0026rdquo; Model: \u0026ldquo;Yes, charges within the last 30 days are eligible for automatic refunds. I\u0026rsquo;ll process it now.\u0026rdquo;\nThe problem: the actual policy has a clause that \u0026ldquo;digital products with usage history are not eligible for refunds,\u0026rdquo; and automatic refunds require agent approval. The real failure isn\u0026rsquo;t \u0026ldquo;the model gave a wrong answer\u0026rdquo; — it\u0026rsquo;s that \u0026ldquo;the system wasn\u0026rsquo;t designed to stop when it was wrong.\u0026rdquo;\nNIST AI 600-1 notes that generative AI requires separate risk management, measurement, and operational controls. Both Anthropic and OpenAI advise defining success criteria and designing evaluation first.\n1. Deterministic Fallback — When in Doubt, Take the Safe Path Many developers expect lowering temperature and refining prompts to produce stable outputs. 
That\u0026rsquo;s partially true, but it reduces output variance — it doesn\u0026rsquo;t make the system deterministic.\nWhat you actually need in production is a structure that degrades to a predefined path when the model fails:\nStage Path Behavior 1 Normal Model answer + validation passed 2 Restricted Answer only within confirmed-evidence scope 3 Fallback Return only search results, policy documents, or templates 4 Stop Route to human review The key is replacing \u0026ldquo;leaving failure to the model\u0026rsquo;s judgment\u0026rdquo; with state transitions defined in code.\nA safe flow for a customer support bot:\nSearch FAQ/policy documents first Only answer when there\u0026rsquo;s sufficient supporting evidence Route to a human agent when evidence is weak Never auto-execute actions like refunds The same applies to code generation tools. The unsafe structure is \u0026ldquo;apply code directly\u0026rdquo;; the realistic structure is \u0026ldquo;propose patch → test → review → human merges.\u0026rdquo; Anthropic\u0026rsquo;s Tool Use documentation explains this well — the model doesn\u0026rsquo;t execute tools directly; it proposes calls, and the app is responsible for execution.\n2. HITL — Humans as Control Mechanisms, Not Approval Buttons Understanding HITL (Human-in-the-Loop) as \u0026ldquo;a human takes one last look at the end\u0026rdquo; is incomplete. The important HITL in practice is one where humans can stop the system flow, make corrections, and resume — a control mechanism rather than a checkpoint.\nThe distinction the article emphasizes:\nPassive HITL Active HITL Only handles final approval Intervenes mid-flow Confirms results Corrects causes Batch review Real-time control Active HITL is especially critical in agentic workflows. When an agent is executing a 10-step task and takes a wrong turn at step 3, the right design doesn\u0026rsquo;t wait until step 10 for approval — it stops at step 3 and corrects direction.\n3. 
Evaluation Stack — Evaluation as Regression Prevention OpenAI\u0026rsquo;s eval guide explains: \u0026ldquo;Generative AI has inherent variability, so traditional software testing alone isn\u0026rsquo;t sufficient.\u0026rdquo;\nA four-stage evaluation framework:\nOffline eval: measure model performance on a fixed dataset. Fastest and cheapest. Pre-production backtest: simulate a new version against real traffic logs Online eval: A/B testing, canary deployments — gradual exposure to real users Human review: humans inspect outputs directly. Most expensive but most trustworthy. The critical framing: evaluation is a regression prevention mechanism, not a leaderboard (benchmark competition). The goal is to confirm that new prompts or model changes don\u0026rsquo;t break things that were working before.\nA Practical Adoption Order The article\u0026rsquo;s recommended sequence:\nStructure outputs — structured formats like JSON rather than free text Demote dangerous actions — direct execution → proposal Define fallback conditions in code — confidence-based branching Collect failure cases into an eval set — start small Preserve human review logs — as future eval data Common Mistakes \u0026ldquo;Just write better prompts\u0026rdquo; → prompts reduce output variance; they\u0026rsquo;re separate from system safety \u0026ldquo;Just add guardrails\u0026rdquo; → input filtering is only part of it; output path design is the core \u0026ldquo;A human can check at the end\u0026rdquo; → passive HITL breaks at scale \u0026ldquo;Good benchmarks mean good production\u0026rdquo; → eval prevents regressions; it doesn\u0026rsquo;t guarantee performance Insights What makes this article valuable is its focus not on \u0026ldquo;making the model smarter\u0026rdquo; but on \u0026ldquo;designing the product so it doesn\u0026rsquo;t shake when the model does.\u0026rdquo; It draws on official guidance from NIST, Anthropic, and OpenAI while laying out a concrete, practical adoption order. 
For the trading-agent and hybrid-search projects I\u0026rsquo;m currently working on — especially for \u0026ldquo;hard-to-reverse actions\u0026rdquo; like automatic trading or image generation — the Deterministic Fallback pattern applies directly.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-ai-app-production-patterns/cover-en.jpg","permalink":"/posts/2026-03-25-ai-app-production-patterns/","title":"Designing AI Apps for Production — Deterministic Fallback, HITL, and Evaluation Stack"},{"content":"Overview A project that scored 434 points and 108 comments on Hacker News caught my attention. Gemini Embedding 2 can now embed video directly into 768-dimensional vectors, making the old transcription → text embedding pipeline obsolete. This post covers both the technical architecture of the resulting sub-second video search CLI and the heated panopticon debate that erupted in the HN comments. It continues the embedding series from The CLIP Ecosystem.\ngraph LR subgraph Old[\"Traditional Pipeline\"] A[\"Video\"] --\u003e B[\"Frame extraction\"] B --\u003e C[\"Captioning / Transcription\"] C --\u003e D[\"Text embedding\"] end subgraph New[\"Gemini Embedding 2\"] E[\"Video\"] --\u003e F[\"Direct 768-dim vector\"] end F --\u003e G[\"ChromaDB\"] G --\u003e H[\"Natural language query\u0026lt;br/\u0026gt;sub-second search\"] What Direct Video Embedding Actually Means The bottleneck in traditional video search was clear: to extract meaning from video, you had to caption frames or transcribe audio, then embed the resulting text. This pipeline loses visual context, adds complexity, and cannot answer visually grounded queries like \u0026ldquo;green car cutting me off\u0026rdquo; from a transcription-only approach.\nGemini Embedding 2 eliminates the intermediate step entirely. A 30-second video clip gets converted into a 768-dimensional vector that can be directly compared against text queries. No transcription, no frame captioning, no intermediate text. 
Video and text are natively projected into the same vector space.\nImplementation: The CLI Video Search Tool The architecture of the CLI tool built by sohamrj:\nIndexing: Split long footage into chunks → embed each chunk with Gemini Embedding 2 → store in ChromaDB Search: Embed a natural language query with the same model → vector similarity search in ChromaDB Output: Return automatically trimmed clips matching the query Cost: roughly $2.50 per hour of footage. Still-frame detection skips idle segments, so security camera footage or Tesla Sentry Mode recordings cost significantly less.\nThis is essentially what CLIP-based image embedding did for static images, now applied to dynamic video by Gemini. It\u0026rsquo;s a natural extension of the image-text embedding concepts covered in the CLIP ecosystem post.\nHN Community Discussion: The Surveillance Debate Of the 108 comments, the social implications drew more heat than the technical implementation.\nThe Core Concern: Panopticon The top comment from macNchz cut to the heart of it:\n\u0026ldquo;We live in a world full of cameras, but we retain a degree of semi-anonymity because no one can actually watch all the footage. This technology changes that premise.\u0026rdquo;\nThe concern: once camera owners, manufacturers, and governments can set up natural-language alerts for specific people or activities — starting with plausible use cases like crime detection or reporting pet waste violations — it becomes a path to an unregulated panopticon.\nAlready Live: The Fusus Platform citruscomputing shared a real-world example from a city council meeting about ALPR (Automatic License Plate Recognition) camera contracts. 
The camera vendor\u0026rsquo;s Fusus platform:\nA dashboard that aggregates feeds from heterogeneous camera systems Natural language querying across live video feeds Plans to integrate privately deployed cameras The city budget covered only 50 ALPR units, but the implication is clear: a future where a neighbor\u0026rsquo;s camera feeds directly into a police AI system is not far off.\nTechnical Discussion On the technical side:\nCost efficiency: $2.50/hr is still expensive at mass surveillance scale, but the price trajectory makes it a matter of time Accuracy: The key value is improved accuracy for visual queries over text-based search ChromaDB vs. alternatives: Active debate on vector database choices Embedding Technology Comparison CLIP (Images) Gemini Embedding 2 (Video) Input Static images Dynamic video (30s chunks) Dimensions 512–1024 (model-dependent) 768 Intermediate steps None (direct embedding) None (direct embedding) Cost Free (local execution) ~$2.50/hr (API) Open source OpenCLIP and others Proprietary (API only) Key Takeaways Direct video embedding is a technically clean advance — it removes the text intermediary and brings video into the same semantic space as text queries. But as the HN discussion shows, the social implications reach far beyond the elegance of the engineering. A world where all footage can be indexed and searched by natural language is no longer a question of \u0026ldquo;can we?\u0026rdquo; but \u0026ldquo;should we?\u0026rdquo; The fact that platforms like Fusus are already deployed to law enforcement signals that the regulatory conversation is lagging badly behind technical capability. 
For the hybrid-image-search project, any future expansion into video search should be accompanied by explicit consideration of these ethical dimensions.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-gemini-video-embedding/cover-en.jpg","permalink":"/posts/2026-03-25-gemini-video-embedding/","title":"Gemini Video Embedding — A New Paradigm for Multimodal Search"},{"content":"Overview Get Shit Done (GSD) is a meta-prompting system that works across Claude Code, Gemini CLI, OpenCode, Codex, Copilot, and Antigravity. With 40,799 GitHub stars, it directly addresses \u0026ldquo;context rot\u0026rdquo;: the quality degradation that happens as a context window fills up. Engineers at Amazon, Google, Shopify, and Webflow reportedly use it in production.\nWhat Is Context Rot? The longer you work with an AI coding agent in a single session, the worse the output gets. As the context window fills with prior conversation, the AI that was sharp at the start begins repeating mistakes and generating inconsistent code.\ngraph TD A[\"Session start \u0026lt;br/\u0026gt; High quality\"] --\u003e B[\"Context accumulates\"] B --\u003e C[\"Context Rot \u0026lt;br/\u0026gt; Quality degrades\"] C --\u003e D[\"GSD intervenes \u0026lt;br/\u0026gt; Spec-based reset\"] D --\u003e E[\"Quality restored\"] E --\u003e|\"Next phase\"| B GSD\u0026rsquo;s core claim: this is not a fundamental LLM limitation; it\u0026rsquo;s a context engineering problem.\nHow GSD Works Spec-Driven Development GSD\u0026rsquo;s workflow has three stages:\n/gsd:new-project - Initialize the project\nAsks about goals, constraints, and technology preferences Once it has enough information, generates an \u0026ldquo;Ultra Spec\u0026rdquo; document Auto-generates a phase-by-phase implementation plan /gsd:begin - Start implementation\nExecutes work step by step, based on the spec Creates checkpoints after each phase completes When context rot hits, recovers context from the spec /gsd:continue - Resume 
after interruption\nReads previous state from the spec to restore context Maintains consistency across new sessions Existing Codebase Support The /gsd:map-codebase command analyzes an existing project using parallel agents to understand the stack, architecture, conventions, and concerns — and feeds that analysis into the subsequent /gsd:new-project flow.\ngraph LR A[\"Existing codebase\"] --\u003e|\"/gsd:map-codebase\"| B[\"Parallel analysis agents\"] B --\u003e C[\"Stack analysis\"] B --\u003e D[\"Architecture analysis\"] B --\u003e E[\"Convention analysis\"] C --\u003e F[\"Ultra Spec\"] D --\u003e F E --\u003e F F --\u003e|\"/gsd:begin\"| G[\"Consistent implementation\"] Multi-Runtime Support GSD\u0026rsquo;s distinguishing characteristic is that it\u0026rsquo;s not locked to any single AI coding tool:\nRuntime Install Location Command Format Claude Code ~/.claude/ /gsd:help Gemini CLI ~/.gemini/ /gsd:help OpenCode ~/.config/opencode/ /gsd-help Codex ~/.codex/ (skills) $gsd-help Copilot ~/.github/ /gsd:help Antigravity ~/.gemini/antigravity/ /gsd:help Installation is a single command: npx get-shit-done-cc@latest, followed by an interactive prompt to select runtime and install location.\nComparison with Similar Tools GSD\u0026rsquo;s creator TÂCHES says he built it after trying BMAD, Speckit, and Taskmaster firsthand and finding them frustrating.\nGSD BMAD / Speckit Philosophy Minimal workflow Enterprise process Complexity 3 core commands Sprints, story points, retrospectives Target audience Solo devs / small teams Teams / organizations Context rot Auto-recovery via spec No explicit solution The slogan \u0026ldquo;The complexity is in the system, not in your workflow\u0026rdquo; summarizes the difference. 
The user-facing workflow is minimal, but underneath it there\u0026rsquo;s XML prompt formatting, sub-agent orchestration, and state management running the show.\nKey Takeaways GSD is a practical answer to vibe coding\u0026rsquo;s fundamental problem: context rot. Rather than advising \u0026ldquo;write better prompts,\u0026rdquo; it takes a structural approach: maintain a permanent spec document and periodically reset the context from it. The 40K star count signals how universal this frustration is. Claude Code\u0026rsquo;s CLAUDE.md, Plan mode, and Memory system address the same problem, but GSD packages it into a unified workflow. The runtime-agnostic design is a real-world benefit: you can keep using the same spec even if you switch from Claude Code to Gemini CLI mid-project.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-get-shit-done/cover-en.jpg","permalink":"/posts/2026-03-25-get-shit-done/","title":"Get Shit Done — A Meta-Prompting System That Solves Context Rot"},{"content":"Overview Google AI Studio has dramatically upgraded its \u0026ldquo;vibe coding\u0026rdquo; experience: building production-ready full-stack apps from nothing but a prompt. The centerpieces are the Antigravity coding agent and deep Firebase integration covering real-time multiplayer, databases, authentication, external API connections, secrets management, and session storage, all in one flow. Based on this TILNOTE summary, here\u0026rsquo;s a breakdown of the key features and how to use them effectively.\nThe Antigravity Agent: Longer Memory, Bigger Edits Antigravity is a coding agent built directly into Google AI Studio. 
Unlike standard AI Studio code generation, it maintains a deeper understanding of project structure and conversation context across a session.\ngraph TD A[\"Prompt\"] --\u003e B[\"Antigravity Agent\"] B --\u003e C[\"Project structure understanding\"] B --\u003e D[\"Conversation context retention\"] C --\u003e E[\"Multi-file edits\"] D --\u003e E E --\u003e F[\"Build / Preview\"] F --\u003e|\"Follow-up instruction\"| B A short instruction like \u0026ldquo;add this feature\u0026rdquo; triggers accurate, multi-file edits and chained changes. The agent works as an editor who understands the whole app, not someone patching code piecemeal, so iteration is meaningfully faster.\nFirebase Built In The agent detects the moment an app needs to persist data or support user accounts. With user approval, it connects Firebase and configures the backend automatically.\nAvailable Services Service Purpose Cloud Firestore Data storage (NoSQL) Firebase Authentication Login (Google OAuth, etc.) Realtime Database Live synchronization The key point: the agent automates the steps a developer normally does by hand, such as creating a Firebase project in the console and wiring in the SDK.\nReal-Time Multiplayer and Collaboration The headline capability of this update is making apps that require concurrent users and real-time sync straightforward to build.\ngraph LR A[\"User A\"] --\u003e B[\"Firebase \u0026lt;br/\u0026gt; Realtime DB\"] C[\"User B\"] --\u003e B D[\"User C\"] --\u003e B B --\u003e E[\"Real-time sync\"] E --\u003e A E --\u003e C E --\u003e D Official example apps:\nReal-time multiplayer laser tag 3D particle-based collaborative workspace Physics-based 3D game (claw machine) Google Maps-integrated utility app Recipe creation and family/friends collaboration app The common thread: these aren\u0026rsquo;t just plausible-looking UIs. 
Each involves at least one of synchronization, data persistence, external integration, or authentication — actual apps, not demos.\nExternal Service Integration and Secrets Manager Connecting to maps, payments, or external databases requires API keys. Antigravity detects when a key is needed and guides you to store it securely in the Secrets Manager in the Settings tab.\nThis structurally prevents the common mistake of hardcoding API keys in source code, and keeps the integration closer to how you\u0026rsquo;d handle credentials in a real production environment.\nFramework Support React and Angular are joined by Next.js as a first-class option, selectable from the Settings panel. This makes it natural to build apps that take advantage of routing, server rendering, and full-stack patterns.\nFramework selection guide:\nReact: Fast UI experiments, client-heavy apps Angular: Large-scale enterprise apps, structured projects Next.js: Apps where SEO, server capabilities, or full-stack patterns matter Comparison with Claude Code Google AI Studio + Antigravity Claude Code Environment Web browser Terminal CLI Backend integration Firebase auto-configured Manual setup Deployment Firebase Hosting one-click Manual or scripted Multiplayer Realtime DB built in Implement yourself Code access Web editor Full filesystem Flexibility Limited to supported frameworks Any stack Depth Prototype-level Production-level Usage Strategy To get the most out of this update:\nInclude production conditions in the prompt: \u0026ldquo;Multiple users will use this simultaneously, data saves after login, it connects to an external service\u0026rdquo; Approve Firebase integration early: Locking in the structure upfront reduces backtracking Use Secrets Manager by default: Prevents API key hardcoding from the start Choose the right framework: SEO or server features → Next.js; fast experimentation → React Key Takeaways This update moves Google AI Studio further along the \u0026ldquo;prompt to 
production\u0026rdquo; axis. Firebase integration removes the friction of backend setup, and Antigravity\u0026rsquo;s longer context retention speeds up iterative refinement. If Claude Code is a tool for professional developers, AI Studio is positioning itself for the \u0026ldquo;I have an app idea but infrastructure setup is the barrier\u0026rdquo; user. The two tools complement each other well: prototype quickly in AI Studio, then refine to production quality in Claude Code.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-google-ai-studio-antigravity/cover-en.jpg","permalink":"/posts/2026-03-25-google-ai-studio-antigravity/","title":"Google AI Studio Full-Stack Vibe Coding — Antigravity Agent and Firebase Integration"},{"content":"Overview Previous: #2 — Marketplace-First Pivot and v2a/v2b Design and Implementation\nThis sprint (#3) covered two major tracks across 12 commits. First, a full audit of plugin triggering correctness, resulting in 5 fixes. Second, a redesign of the marketplace plugin recommendation system from live search to a validated, pre-curated list, plus an upgrade of the tool sequence to a 3-step sliding window.\nFull Plugin Trigger Audit Diagnosing the Problems A comprehensive review of plugin.json skill definitions, hooks execution paths, and internal skill logic uncovered 5 triggering issues.\ngraph TD A[\"Trigger audit\"] --\u003e B[\"CRITICAL \u0026lt;br/\u0026gt; Path reference errors\"] A --\u003e C[\"MAJOR \u0026lt;br/\u0026gt; Environment variable mismatch\"] A --\u003e D[\"MINOR \u0026lt;br/\u0026gt; Missing functionality\"] B --\u003e E[\"Unify to \u0026lt;br/\u0026gt; CLAUDE_PLUGIN_ROOT\"] C --\u003e F[\"Add preset \u0026lt;br/\u0026gt; existence check\"] D --\u003e G[\"Add install \u0026lt;br/\u0026gt; verification to status\"]Fix 1: Unify to CLAUDE_PLUGIN_ROOT Skills and hooks were referencing the plugin directory in inconsistent ways: claude plugin path calls, hardcoded paths, and relative paths all 
coexisted. Everything is now unified to a CLAUDE_PLUGIN_ROOT environment variable with a dirname-based fallback.\n# Before: mixed reference approaches PLUGIN_DIR=\u0026#34;$(claude plugin path harnesskit)\u0026#34; PLUGIN_DIR=\u0026#34;/Users/lsr/.claude/plugins/cache/harnesskit/...\u0026#34; # After: unified reference PLUGIN_DIR=\u0026#34;${CLAUDE_PLUGIN_ROOT:-$(cd \u0026#34;$(dirname \u0026#34;$0\u0026#34;)/..\u0026#34; \u0026amp;\u0026amp; pwd)}\u0026#34; Fix 2: Add Preset Check to post-edit Hooks post-edit-lint.sh and post-edit-typecheck.sh were executing before a preset was configured, causing errors. Added a preset existence check; they now skip gracefully if no preset is set.\nMarketplace Validated Recommendation System The Original Problem /harnesskit:init was relying on live search to recommend marketplace plugins, which was unstable and produced inconsistent results.\nThe Fix: Pre-Validated Recommendation List Switched to maintaining a marketplace-recommendations.json file populated by an update-recommendations.sh script that periodically crawls and updates the list.\ngraph LR A[\"update-recommendations.sh\"] --\u003e|\"crawl\"| B[\"Marketplace\"] B --\u003e C[\"Validate / Filter\"] C --\u003e D[\"marketplace-recommendations.json\"] D --\u003e|\"/harnesskit:init\"| E[\"Recommendations to user\"]/harnesskit:insights also now references recommendations.json when suggesting improvements, so it only recommends validated plugins.\n3-Step Sliding Window Tool Sequence Upgraded the tool usage pattern analysis from a simple count approach to a 3-step sliding window for better precision. Tool usage is now recorded in tool:summary format, with pattern detection triggering improvement suggestions.\nPlugin Installation Verification Added installation state verification to the /harnesskit:status skill. 
It now reports skill file presence, hooks execution permissions, and configuration file integrity in a single view.\nCommit Log Message Area feat: add plugin installation verification to status skills feat: upgrade tool sequence to 3-step sliding window skills feat: add recommendations.json reference to insights skills feat: rewrite init marketplace discovery with verified recs skills feat: add update-recommendations.sh for marketplace crawling scripts feat: add verified marketplace-recommendations.json templates refactor: migrate skills from \u0026lsquo;claude plugin path\u0026rsquo; to CLAUDE_PLUGIN_ROOT skills refactor: unify PLUGIN_DIR to CLAUDE_PLUGIN_ROOT with fallback hooks fix: add preset check to post-edit hooks + CLAUDE_PLUGIN_ROOT fallback hooks docs: add implementation plan for plugin trigger fixes docs docs: address spec review — fix CRITICAL and MAJOR issues docs docs: add spec for plugin trigger review — 5 fixes docs Key Takeaways In plugin development, \u0026ldquo;it works\u0026rdquo; and \u0026ldquo;it triggers correctly\u0026rdquo; are different problems. In a local development environment, paths are fixed and everything looks fine. In another user\u0026rsquo;s environment, the plugin cache path, environment variables, and preset state are all different. Unifying everything to CLAUDE_PLUGIN_ROOT is a small change that fundamentally improves portability. Switching marketplace recommendations from live search to a pre-validated list is driven by the same instinct — reduce uncertainty and guarantee a consistent experience.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-harnesskit-dev3/cover-en.jpg","permalink":"/posts/2026-03-25-harnesskit-dev3/","title":"HarnessKit Dev Log #3 — Plugin Trigger Fixes and Marketplace Recommendation System"},{"content":"Overview Previous: #4 — Router Separation, Terraform Dev Server, Inpaint Editor\nThis sprint (#5) covered three work streams across 13 commits. 
First, UX improvements to the Inpaint editor with direct access from the main page. Second, Google OAuth integration on the EC2 dev server and fixing image loading. Third, hardening overall stability: duplicate generation prevention, modal behavior, and aspect ratio handling.\nInpaint Editor UX Inpaint Access from Card Hover Previously, the Inpaint editor was only reachable from the image detail page. Now the main page cards show an \u0026ldquo;Edit\u0026rdquo; button on hover, and clicking it opens the Inpaint editor directly.\ngraph LR A[\"Main page \u0026lt;br/\u0026gt; card hover\"] --\u003e|\"Edit button\"| B[\"Inpaint Editor\"] C[\"Detail page\"] --\u003e|\"Edit button\"| B B --\u003e D[\"Skeleton loading\"] D --\u003e E[\"Generated image\"]Skeleton Loading Cards During Inpaint generation, skeleton loading cards appear on the main page so the user gets visual feedback on progress.\nUndo History Fix The saveHistory() call in the Inpaint editor was moved to run before a stroke begins rather than after it completes. The previous behavior caused undo to restore the current state instead of the previous one.\nEC2 Dev Server Google OAuth Integration Login was failing on first access to the dev server. Two root causes:\nVITE_GOOGLE_CLIENT_ID not set — Vite environment variables are inlined at build time, so they must be present in .env during the EC2 build EC2 URL not in GCP authorized origins — The EC2 URL (http://ec2-xxx.compute.amazonaws.com:5173) needed to be added to the authorized JavaScript origins in the GCP console Image Display Issues Search and reranking worked correctly on the server, but images returned 404. 
The cause: the README\u0026rsquo;s \u0026ldquo;data preparation\u0026rdquo; step had been skipped — image files are stored as split zips and needed to be extracted first.\nAdditionally: fix: add recursive image search for nested directory structures — images are now found even in nested directories.\nGeneration Stability Duplicate Generation Prevention Rapidly pressing Enter was triggering two image generations. Fixed by adding a guard that returns immediately from handleGenerate if generatingCount \u0026gt; 0. Later replaced the lock with a 500ms debounce for a more natural UX feel.\nESC Key to Close Modals Added ESC key event handling to all modal and popup components.\nTone/Angle Reference Prompt Hardening Fixed an issue where tone/angle metadata was unintentionally influencing generation results through the auto-injection system. The reference prompt is now strict about treating tone/angle as pure metadata.\naspect_ratio Validation Added validation of aspect_ratio values before passing them to the Gemini edit API, and ensured both aspect_ratio and resolution are preserved correctly across regeneration and inpaint flows.\nClipboard Fallback The prompt copy button on the detail page was failing in some environments. Added an execCommand('copy') fallback for when navigator.clipboard.writeText() fails.\nML Model Background Loading The login page was unresponsive until ML model loading completed at server startup. 
Moved model loading to a background task so the login page is immediately available when the server starts.\nCommit Log Message Area fix: add clipboard fallback for prompt copy button FE fix: strengthen tone/angle reference prompt BE fix: validate aspect_ratio before Gemini edit API BE fix: preserve aspect_ratio and resolution across regen/inpaint BE+FE feat: show skeleton loading cards during inpaint generation FE feat: add inpaint edit button to card hover FE fix: replace generation lock with 500ms debounce FE fix: load ML models in background so login works during startup BE feat: add ESC key to close all modal/popup components FE fix: prevent duplicate image generation on rapid Enter FE fix: add recursive image search for nested directories BE add allowed host BE fix: save undo history before stroke begins in InpaintEditor FE Key Takeaways This sprint was fundamentally about making things that worked locally work correctly in a real deployment. Deploying the Inpaint editor to EC2 surfaced a string of unexpected issues: OAuth, image paths, environment variables. The Vite build-time environment variable gotcha (import.meta.env.* must be set before the build runs) is worth remembering for any server deployment. Replacing the duplicate generation lock with a debounce was the more UX-natural solution.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-hybrid-search-dev5/cover-en.jpg","permalink":"/posts/2026-03-25-hybrid-search-dev5/","title":"Hybrid Image Search Dev Log #5 — Inpaint UX, EC2 Dev Server, Stability"},{"content":"Overview Previous: #3 — From Skill to Plugin\nThis sprint (#4) focused on preparing the log-blog plugin for submission to the official Claude Code marketplace (anthropics/claude-plugins-official) across 2 commits. 
The work involved rewriting the README to match official plugin format, adding a LICENSE file, and removing personal information.\nOfficial Marketplace Requirements Submitting a plugin to the Claude Code official marketplace requires meeting a few criteria.\ngraph TD A[\"Plugin development complete\"] --\u003e B[\"Standardize README format\"] B --\u003e C[\"Add LICENSE file\"] C --\u003e D[\"Remove personal information\"] D --\u003e E[\"Write marketplace.json\"] E --\u003e F[\"Submit PR\"]README Rewrite The previous README read more like a development journal than documentation. Rewrote it to match the official plugin README structure:\nOverview: One-sentence description of what the plugin does Skills: List of available skills with descriptions Installation: How to install CLI Usage: CLI command guide Requirements: Required dependencies Troubleshooting: Common issues and solutions LICENSE File Added MIT License. Marketplace submissions require an explicit license.\nPersonal Information Removal Cleaned up personal email and other identifying information from the author field in plugin.json and other configuration. No unnecessary personal details exposed in a public distribution.\nCommit Log Message Area docs: rewrite README to match official Claude Code plugin format docs fix: add LICENSE file, unify author, remove personal info for marketplace review config Key Takeaways The last mile of plugin development is packaging, not code. A perfectly functional plugin that has an unfriendly README or no license won\u0026rsquo;t pass marketplace review. 
This work is small in scope, but it marks the final step in the journey from a local skill in .claude/skills/ to a plugin eligible for the official marketplace.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-log-blog-dev4/cover-en.jpg","permalink":"/posts/2026-03-25-log-blog-dev4/","title":"Log-Blog Dev Log #4 — Preparing for the Official Marketplace"},{"content":"Overview I did a deep dive into the CLIP model ecosystem — the backbone of image-text embedding. This covers everything from OpenAI\u0026rsquo;s original CLIP to Meta\u0026rsquo;s MetaCLIP2 (NeurIPS 2025 Spotlight), Apple\u0026rsquo;s MobileCLIP2 (TMLR 2025 Featured), the community-driven OpenCLIP, and Google\u0026rsquo;s SigLIP. The goal: pick the right embedding model for my hybrid-image-search project. Related series: Hybrid Image Search Dev Log #5\ngraph TD A[\"OpenAI CLIP\u0026lt;br/\u0026gt;2021, 33k stars\u0026lt;br/\u0026gt;WIT 400M pairs\"] --\u003e B[\"OpenCLIP\u0026lt;br/\u0026gt;Community reimplementation\u0026lt;br/\u0026gt;13.6k stars\"] A --\u003e C[\"MetaCLIP\u0026lt;br/\u0026gt;Meta FAIR\u0026lt;br/\u0026gt;Open curation pipeline\"] A --\u003e D[\"SigLIP\u0026lt;br/\u0026gt;Google\u0026lt;br/\u0026gt;Sigmoid Loss\"] A --\u003e E[\"MobileCLIP\u0026lt;br/\u0026gt;Apple\u0026lt;br/\u0026gt;Multi-Modal Reinforced\"] C --\u003e F[\"MetaCLIP2\u0026lt;br/\u0026gt;NeurIPS 2025 Spotlight\u0026lt;br/\u0026gt;Worldwide multilingual\"] E --\u003e G[\"MobileCLIP2\u0026lt;br/\u0026gt;TMLR 2025 Featured\u0026lt;br/\u0026gt;iPhone-optimized\"] B --\u003e H[\"Unified Hub\u0026lt;br/\u0026gt;300+ pretrained models\"] D --\u003e H F --\u003e H G --\u003e H OpenAI CLIP — Where It All Started openai/CLIP (33k stars) introduced Contrastive Language-Image Pre-Training in 2021. 
It popularized the idea of mapping images and text into a shared embedding space, and every CLIP variant since has built on top of it.\nThe core idea is elegantly simple: train on 400 million (image, text) pairs with contrastive learning, and you get zero-shot image classification without needing ImageNet\u0026rsquo;s 1.28M labeled examples. The API is intuitive:\nimport clip model, preprocess = clip.load(\u0026#34;ViT-B/32\u0026#34;, device=device) image_features = model.encode_image(image) text_features = model.encode_text(text) logits_per_image, logits_per_text = model(image, text) probs = logits_per_image.softmax(dim=-1).cpu().numpy() # prints: [[0.9927937 0.00421068 0.00299572]] You pull vectors with encode_image() and encode_text(), then compute cosine similarity. clip.available_models() lists available checkpoints; clip.load(name) loads the model and preprocessing function.\nLimitations: The training dataset WIT (WebImageText) is proprietary, and the largest model tops out at ViT-L/14. These two gaps drove most of the follow-on research.\nOpenCLIP — The De Facto CLIP Hub mlfoundations/open_clip (13.6k stars) is an open-source reimplementation of CLIP that has become the ecosystem\u0026rsquo;s central hub. It provides 300+ pretrained models trained on public large-scale datasets like LAION-2B and DataComp-1B.\nPerformance comparison:\nModel Training Data Resolution Samples Seen ImageNet Zero-Shot ViT-B-16 DataComp-1B 224px 13B 73.5% ViT-L-14 DataComp-1B 224px 13B 79.2% ViT-H-14 LAION-2B 224px 32B 78.0% ViT-bigG-14 LAION-2B 224px 34B 80.1% ViT-L-14 (OpenAI original) WIT 224px 13B 75.5% OpenCLIP\u0026rsquo;s ViT-L-14 beats the original OpenAI model with the same architecture by 3.7 percentage points. 
Same architecture, different data — that delta is a clear demonstration of how much data curation matters.\nMetaCLIP, SigLIP, and MobileCLIP variants are all loadable through OpenCLIP\u0026rsquo;s unified open_clip.create_model_and_transforms() interface, meaning you can swap models in benchmarking experiments without changing any code.\nMetaCLIP2 — Multilingual Scaling and NeurIPS 2025 Spotlight facebookresearch/metaclip (1.8k stars) is a Meta FAIR project whose primary contribution is making CLIP\u0026rsquo;s data curation pipeline reproducible. The latest MetaCLIP2 (\u0026ldquo;worldwide\u0026rdquo;) earned a NeurIPS 2025 Spotlight.\nMetaCLIP2\u0026rsquo;s most important finding: English and non-English data mutually reinforce each other. Previous multilingual CLIP models suffered from the \u0026ldquo;curse of multilinguality\u0026rdquo; — adding more languages degraded English performance. MetaCLIP2 sidesteps this by designing the curation pipeline to be multilingual from the ground up.\nAcademic recognition:\nICLR 2024 Spotlight (MetaCLIP 1.0) CVPR 2024, EMNLP 2024 (Altogether synthetic captions) NeurIPS 2025 Spotlight (MetaCLIP2 Worldwide) Distillation models, training code, and evaluation code are all publicly available. The model is directly usable via HuggingFace and OpenCLIP. For a Korean-language image search project, the finding that multilingual CLIP outperforms English-only models is directly actionable for model selection.\nMobileCLIP2 — State of the Art on Device apple/ml-mobileclip (1.5k stars) is Apple\u0026rsquo;s lightweight CLIP model built on Multi-Modal Reinforced Training. 
MobileCLIP2 earned TMLR 2025 Featured Certification.\nThe benchmarks are strong:\nMobileCLIP2-S4 matches SigLIP-SO400M/14 accuracy with 2x fewer parameters, and delivers 2.5x lower latency than DFN ViT-L/14 on iPhone 12 Pro Max.\nWhat sets it apart from other CLIP variants: it ships with an iOS app demo (ios_app/) that runs real-time zero-shot image classification in Swift directly on device. The training code is OpenCLIP-based, using DFNDR and DataCompDR datasets.\nThe core technique — Multi-Modal Reinforced Training — distills knowledge from a large teacher model into a lightweight student while applying reinforcement simultaneously on both image and text modalities. The large-scale data generation code lives in a separate repo (ml-mobileclip-dr).\nSigLIP and the HuggingFace Embedding Ecosystem Google\u0026rsquo;s SigLIP (Sigmoid Loss for Language-Image Pre-Training) replaces CLIP\u0026rsquo;s softmax contrastive loss with sigmoid loss. google/siglip-so400m-patch14-384 is the flagship model, available in a 10-model HuggingFace collection.\nSigLIP\u0026rsquo;s advantage: less sensitivity to batch size. The original CLIP benefits from very large batches because the softmax is computed across all pairs. Sigmoid loss treats each pair independently, reducing batch size dependence.\nNavigating the HuggingFace Model Hub Three hubs worth exploring:\nImage Feature Extraction Models — CLIP-family models dominate the trending list. Filter by pipeline_tag=image-feature-extraction to find actively maintained models Zero-Shot Image Classification Models — label-free image classifiers, predominantly CLIP-based MTEB Leaderboard — Massive Text Embedding Benchmark evaluating text embedding performance across 38 datasets. 
Not directly comparable to image embeddings, but useful for gauging the text-side performance of multimodal models Model Selection Criteria Putting the research together for the hybrid-search project:\nCriteria Best Model Reason Accuracy first OpenCLIP ViT-bigG-14 80.1% ImageNet Multilingual (Korean) MetaCLIP2 SoTA multilingual performance Mobile deployment MobileCLIP2-S4 SigLIP-equivalent, 2x lighter General-purpose + ecosystem OpenCLIP ViT-L-14 79.2%, broadest support Quick Links HuggingFace Image Feature Extraction Models HuggingFace Zero-Shot Classification Models MTEB Leaderboard Key Takeaways Four years after OpenAI CLIP, the ecosystem has matured remarkably. OpenCLIP serves as the unified hub, and research from Meta, Apple, and Google has converged onto a single interface. Model selection is no longer \u0026ldquo;which CLIP?\u0026rdquo; but \u0026ldquo;which axis are you optimizing?\u0026rdquo; — accuracy, multilingual coverage, mobile efficiency, and trainability each point to a different winner. MetaCLIP2\u0026rsquo;s mutual reinforcement finding between languages is directly applicable to Korean image search, and MobileCLIP2\u0026rsquo;s mobile optimization is worth revisiting when the project moves toward app deployment.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-clip-ecosystem/cover-en.jpg","permalink":"/posts/2026-03-25-clip-ecosystem/","title":"The CLIP Ecosystem — From OpenAI to MetaCLIP2 and MobileCLIP2"},{"content":"Overview LLM-based stock trading agents have exploded in 2025–2026. The field has moved well past simple sentiment analysis: multi-agent architectures now simulate entire trading firms, and hybrid LLM+RL systems handle real-time risk management. 
This post analyzes four major open-source frameworks and three academic papers, distilling the insights most relevant to building a practical trading agent.\nTradingAgents — Simulating a Trading Firm with LLMs TradingAgents is a multi-agent trading framework from UCLA and MIT researchers with 40,795 GitHub stars — the largest community in the LLM trading space.\nArchitecture: Replicating Trading Firm Org Structure The central idea is to implement the division of labor in a real trading firm using LLM agents.\ngraph TD A[\"Market Data\"] --\u003e B[\"Analyst Team\"] B --\u003e C[\"Fundamental Analyst\"] B --\u003e D[\"Sentiment Analyst\"] B --\u003e E[\"Technical Analyst\"] C --\u003e F[\"Researcher Team\"] D --\u003e F E --\u003e F F --\u003e G[\"Bull Researcher\"] F --\u003e H[\"Bear Researcher\"] G --\u003e I[\"Debate \u0026lt;br/\u0026gt; Protocol\"] H --\u003e I I --\u003e J[\"Trader Agents\"] J --\u003e K[\"Risk Management Team\"] K --\u003e L[\"Final Decision\"] Analyst Team: Dedicated agents for fundamental, sentiment, and technical analysis Researcher Team: Bull and Bear perspectives debate market conditions Trader Agents: Agents with varying risk appetites Risk Management Team: Monitors position exposure and ratifies final decisions Backtest Results Backtesting shows meaningful improvements over baselines across cumulative returns, Sharpe ratio, and maximum drawdown. The Bull/Bear debate protocol consistently produces more balanced judgments than single-opinion agents.\nTechnical Details 229K lines of Python, currently at v0.2.2. Recent commits include 5-tier rating system standardization, portfolio manager refactoring, and exchange formula ticker preservation.\nPrimoAgent — Multi-Agent Stock Analysis PrimoAgent applies the multi-agent architecture specifically to the analysis pipeline rather than execution. 
Whereas TradingAgents covers the full cycle through trade execution, PrimoAgent focuses on generating the research.\nEach agent handles a different analytical domain (financial statements, news sentiment, technical indicators) and the results are combined into a unified research report. This fits small teams or individual investors looking to automate institutional-grade research processes.\nAlpacaTradingAgent — LLM Financial Trading Agent AlpacaTradingAgent combines the Alpaca Markets API with LLM-driven decision making to execute actual trades — distinguishing it from academic frameworks that stay in backtesting. The Alpaca paper trading API lets you validate strategies against live market data without risk, with a clear path to live trading.\nstock-analysis-agent — Korean Market Research Automation stock-analysis-agent uses Claude Code to automate institutional-grade research for Korean and US stocks. Its key differentiator is native support for Korean market data sources (DART electronic disclosure, Naver Finance, etc.).\nAs covered in a previous analysis, this project addresses Korean stock market data accessibility through an LLM + MCP architecture.\nStockBench — Can LLM Agents Actually Make Money? 
Tsinghua University\u0026rsquo;s StockBench benchmark confronts the question directly: \u0026ldquo;Can LLM agents trade profitably in real markets?\u0026rdquo;\nBenchmark Design StockBench constructs a backtesting environment on real market data with a standardized agent workflow.\ngraph LR A[\"Market data collection\"] --\u003e B[\"LLM analysis\"] B --\u003e C[\"Trade decision\"] C --\u003e D[\"Order execution\"] D --\u003e E[\"Portfolio evaluation\"] E --\u003e F[\"Performance measurement\"] F --\u003e|\"Next trading day\"| AKey Findings Universe size matters: LLM agent performance tends to degrade as the number of stocks increases Workflow error analysis: Classification of error types in the decision process Data source contribution: Ablation study on which data sources have the largest impact on returns StockBench matters because it rigorously evaluates real-world applicability rather than treating \u0026ldquo;LLMs can trade profitably\u0026rdquo; as a given. It\u0026rsquo;s a scientific validation tool for the field\u0026rsquo;s claims.\nLLM + Reinforcement Learning: Three 2025 Papers From the AI for Life blog, three major 2025 LLM+RL trading papers:\n1. FinRL-DeepSeek: Risk-Aware RL with LLM Signals A hybrid trading agent combining deep RL with LLM news analysis signals. Extends CVaR-Proximal Policy Optimization (CPPO) by injecting daily LLM-generated investment recommendations and risk assessment scores into the RL agent.\nThe key: instead of simple sentiment, it prompts LLMs (DeepSeek V3, Qwen-2.5, Llama 3.3) to extract nuanced risk/reward insights from news. Backtesting on the Nasdaq-100 from 1999–2023 shows significantly improved risk management performance.\n2. FLAG-Trader: Gradient-Level LLM and RL Integration Integrates LLM language understanding and RL sequential decision-making at the gradient level. The LLM processes market text data; the RL agent learns to trade on top of those representations.\n3. 
Stock-Evol-Instruct: LLM-Guided RL Trading Guides RL agent training with evolutionary instructions generated by an LLM. Uses natural language feedback from the LLM to sidestep the reward design difficulties that plague traditional RL.\ngraph TD A[\"Three LLM+RL Approaches\"] --\u003e B[\"FinRL-DeepSeek\"] A --\u003e C[\"FLAG-Trader\"] A --\u003e D[\"Stock-Evol-Instruct\"] B --\u003e E[\"LLM signals → RL input \u0026lt;br/\u0026gt; Risk-aware CPPO\"] C --\u003e F[\"Gradient-level integration \u0026lt;br/\u0026gt; Language + decision fusion\"] D --\u003e G[\"LLM instructions → RL guide \u0026lt;br/\u0026gt; Evolutionary learning\"] Connecting to My Own Project Comparing these frameworks against the trading-agent project I\u0026rsquo;m building:\nTradingAgents My trading-agent Market US equities Korean equities (KIS API) Agents 10+ (analysis + trading + risk) 6 (including news/macro) Data sources Yahoo Finance, Reddit DART, Naver, KIS Execution Backtesting-focused Live trading supported (MCP) UI CLI React dashboard TradingAgents\u0026rsquo; Bull/Bear debate protocol and StockBench\u0026rsquo;s benchmarking methodology are both worth adopting. In particular, the risk management team agent pattern and the DCF/PER valuation comparison directly connect to features currently in development.\nKey Takeaways The LLM trading agent ecosystem has converged on a clear pattern: multi-agent = trading firm simulation. A single LLM making all decisions is passé; specialized agents that debate and reach consensus consistently outperform it. On the research side, LLM+RL hybrid approaches are becoming mainstream — combining LLM text understanding with RL sequential decision-making produces better risk-adjusted returns than either alone.\nStockBench\u0026rsquo;s emergence signals the field is maturing from demo-level to scientifically verifiable. 
For my own trading agent, TradingAgents\u0026rsquo; organizational structure patterns, StockBench\u0026rsquo;s evaluation framework, and FinRL-DeepSeek\u0026rsquo;s risk management methodology are all directly transferable to the Korean market context.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-llm-trading-agents-ecosystem/cover-en.jpg","permalink":"/posts/2026-03-25-llm-trading-agents-ecosystem/","title":"The LLM Trading Agent Ecosystem — TradingAgents, StockBench, FinRL-DeepSeek"},{"content":"Overview Previous: #5 — Backend Stabilization and Data Pipeline Improvements\nThis sprint (#6) ran three major work streams across 35 commits. First, a sixth expert (news/macro analyst) was added to the signal pipeline and DART-based analysis was significantly expanded. Second, advanced analysis features — DCF valuation, portfolio risk, signal history — were implemented. Third, the frontend received a large-scale expansion: ScheduleManager, SignalDetailModal, and investment memo export.\nSignal Pipeline Expansion News/Macro Analyst — The Sixth Expert Added a news/macro analyst to the existing five-expert lineup (technical, fundamental, sentiment, flow, risk). 
Uses Google News RSS as a fallback to improve news collection reliability.\ngraph TD A[\"Signal Pipeline\"] --\u003e B[\"Technical Analyst\"] A --\u003e C[\"Fundamental Analyst\"] A --\u003e D[\"Sentiment Analyst\"] A --\u003e E[\"Flow Analyst\"] A --\u003e F[\"Risk Analyst\"] A --\u003e G[\"News/Macro Analyst\"] G --\u003e H[\"Google News RSS\"] G --\u003e I[\"DART Disclosures\"] B --\u003e J[\"Combined Signal\"] C --\u003e J D --\u003e J E --\u003e J F --\u003e J G --\u003e JMajor DART Data Expansion Significantly expanded the data pulled from the DART electronic disclosure system:\nInsider trading data (feat: add DART insider trading data) — executive buy/sell trends Foreign/institutional investor trends (feat: add foreign/institutional investor trend) — fund flow analysis Catalyst calendar (feat: add catalyst calendar with DART disclosures) — earnings announcements and disclosure schedules visualized as a timeline UI Peer comparison (feat: add peer comparison with sector-based DART valuation) — sector-level valuation benchmarking 8 New Database Tables Added 8 tables, an ANALYST role, and metadata initialization in a single migration to support the new analysis features.\nAdvanced Analysis Features DCF Valuation Implemented a Discounted Cash Flow valuation module with sensitivity tables and heatmaps that visualize the fair value range across WACC and growth rate combinations.\n# DCF sensitivity heatmap core logic for wacc in wacc_range: for growth in growth_range: intrinsic_value = calculate_dcf(fcf, wacc, growth, terminal_growth) heatmap[wacc][growth] = intrinsic_value Portfolio Risk Analysis Calculates VaR (Value at Risk), beta, and sector concentration from real portfolio data. 
Renders a correlation matrix heatmap for cross-position correlation analysis.\nVaR: Historical simulation approach, 95%/99% confidence intervals for maximum loss estimation Beta: Portfolio beta relative to KOSPI200 Sector concentration: Collects KOSPI200 sector data from Naver Finance for sector distribution analysis Signal History Snapshots Added point-in-time signal storage and a timeline feature for comparing against historical signals.\nFrontend Expansion ScheduleManager Implemented a schedule management component with cron editing and a run-now button. Includes agent name display, friendly task labels, and sorting by cron time (hour:minute).\nSignalDetailModal Added a detail modal that lets users drill down from a signal into associated order history. Includes expert opinion expansion, risk_notes display, and compact/expanded view toggle.\nInvestment Memo Export Added investment memo generation in HTML and DOCX formats based on signal data. Uses python-docx for Word document export.\nOther UI Improvements Component Change OrderHistory Show fill_price, order_type, signal link PositionsTable Add market_value column ReportViewer Trade PnL column, rr_score color coding DashboardView Handle report.generated event Settings initial_capital, min_rr_score configuration MCP Middleware Fix Discovered that ctx.set_state() and ctx.get_state() are async methods but were being called without await in Session 1, causing repeated \u0026ldquo;MCP tool call failed\u0026rdquo; errors in server logs.\n# Before ctx.set_state(factory.CONTEXT_STARTED_AT, started_dt.strftime(\u0026#34;%Y-%m-%d %H:%M:%S\u0026#34;)) # After await ctx.set_state(factory.CONTEXT_STARTED_AT, started_dt.strftime(\u0026#34;%Y-%m-%d %H:%M:%S\u0026#34;)) Also added auto-reconnect logic so MCP connection failures recover automatically.\nUnit Tests Added unit tests for the DCF valuation and portfolio risk services.\nCommit Log Message Area feat: sort schedule tasks by cron time ascending UI feat: show agent name 
and friendly task labels in ScheduleManager UI style: align new components with existing design system UI fix: use import type for ScheduledTask (Vite ESM) FE feat: add Google News RSS fallback for news stability BE feat: add compact/expanded view toggle to SignalCard UI feat: add DOCX investment memo export BE feat: add real portfolio beta and correlation heatmap BE feat: add DCF sensitivity heatmap table UI test: add unit tests for DCF and portfolio risk TEST feat: populate kospi200 sector data from NAVER BE fix: await async MCP context methods + auto-reconnect BE fix: replace explicit any types FE feat: add investment memo HTML export BE feat: add VaR, beta, sector concentration risk BE feat: add DCF valuation with sensitivity table BE feat: add signal history snapshots and timeline BE+FE feat: add peer comparison with DART valuation BE feat: add news/macro analyst as 6th expert BE feat: add catalyst calendar with DART disclosures BE+FE feat: add DART insider trading data BE feat: add foreign/institutional investor trend BE feat: add 8 new DB tables, ANALYST role, metadata init BE fix: resolve lint errors in DashboardView/SignalCard FE feat: add report.generated event handling FE feat: add initial_capital and min_rr_score to settings BE+FE feat: add ScheduleManager with cron editing FE feat: add trade PnL column and rr_score color coding FE feat: add SignalDetailModal with orders drilldown FE feat: add expert opinion expansion and risk_notes FE feat: use correct performance endpoint with selector FE feat: add market_value to PositionsTable FE feat: show fill_price, order_type, signal link FE feat: add missing type fields FE feat: add missing API service functions FE Key Takeaways This sprint was a large-scale expansion that simultaneously deepened the analysis layer and raised the frontend polish. Adding the sixth expert rounds out the signal pipeline for more balanced decision-making. 
DCF valuation, VaR, and beta give the system institutional-grade analytical tools — it\u0026rsquo;s evolving from a signal generator into a comprehensive investment analysis platform. Expanding DART data coverage to insider trading, fund flows, and disclosure calendars sharpens the differentiation of this agent for the Korean equity market.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-trading-agent-dev6/cover-en.jpg","permalink":"/posts/2026-03-25-trading-agent-dev6/","title":"Trading Agent Dev Log #6 — Deeper Analysis and Major Frontend Expansion"},{"content":"Overview Building websites with vibe coding has never been easier — but security is still on you. Based on Techroute Alex\u0026rsquo;s video AI-Generated Code Shipped Directly Will Get You Hacked, this post covers a systematic approach to checking AI-generated code for security vulnerabilities and the automated scanning tools that help.\nThe 4 Layers of Web Security Web application security broadly spans four zones:\ngraph TD A[\"Network Security\"] --\u003e B[\"Apply HTTPS \u0026lt;br/\u0026gt; Prevent data snooping\"] C[\"Server Security\"] --\u003e D[\"OS security patches \u0026lt;br/\u0026gt; Install security solutions\"] E[\"DB Security\"] --\u003e F[\"Password hashing \u0026lt;br/\u0026gt; Encrypt personal data\"] G[\"Application Security\"] --\u003e H[\"OWASP Top 10 \u0026lt;br/\u0026gt; Code-level vulnerabilities\"] If you\u0026rsquo;re just deploying a simple static page (HTML/CSS/JS), network/server/DB security is relatively straightforward. But application security — vulnerabilities hiding inside your code — requires deliberate attention regardless.\nOWASP Top 10 — The Web Security Threats You Must Know OWASP (Open Worldwide Application Security Project) publishes a ranked list of the most critical web application security risks and revises it every few years, not annually. The ten categories here follow the 2021 edition.\n1. Broken Access Control Users without proper authorization can access other users\u0026rsquo; data or functionality. 
Occurs when authorization checks are missing from API calls.\n2. Cryptographic Failures Storing passwords in plain text, or using weak hashing algorithms.\n3. Injection Injecting malicious code into SQL queries, OS commands, or LDAP queries to force execution. One of the most frequently found vulnerabilities in AI-generated code.\n4. Insecure Design Focusing exclusively on feature implementation while ignoring security architecture.\n5. Security Misconfiguration Unchanged default passwords, unnecessary features left enabled, error messages leaking sensitive information.\n6. Vulnerable and Outdated Components Using libraries or packages with known vulnerabilities.\n7. Identification and Authentication Failures Poor session management, weak password policies, no protection against brute-force attacks.\n8. Software and Data Integrity Failures Not verifying the integrity of code or dependencies in CI/CD pipelines.\n9. Security Logging and Monitoring Failures Systems unable to detect attack attempts.\n10. 
Server-Side Request Forgery (SSRF) Manipulating a server into making requests to attacker-chosen destinations, often internal services the attacker cannot reach directly.\nSecurity Mistakes AI Commonly Makes Patterns to watch for, especially in vibe coding:\ngraph LR A[\"AI-generated code\"] --\u003e B[\"SQL Injection \u0026lt;br/\u0026gt; f-string queries\"] A --\u003e C[\"XSS \u0026lt;br/\u0026gt; innerHTML usage\"] A --\u003e D[\"Hardcoded \u0026lt;br/\u0026gt; API keys / passwords\"] A --\u003e E[\"CORS * \u0026lt;br/\u0026gt; Allow all origins\"] A --\u003e F[\"Plaintext \u0026lt;br/\u0026gt; password storage\"] SQL Injection: f\u0026quot;SELECT * FROM users WHERE id = {user_id}\u0026quot; — no parameter binding XSS: element.innerHTML = userInput — user input directly injected as HTML Hardcoded secrets: API_KEY = \u0026quot;sk-abc123...\u0026quot; — environment variables not used Wildcard CORS: Access-Control-Allow-Origin: * — all origins allowed Plaintext storage: passwords stored directly in the DB without hashing Automated Security Scanning Tools As shown in the video, tools that automatically scan for security issues given a URL are available and practical.\nStatic Analysis (SAST) Analyze code directly for vulnerabilities:\nSemgrep: pattern-matching based security scanner Bandit: Python-specific security analyzer ESLint Security Plugin: JavaScript security rules Dynamic Analysis (DAST) Scan a running application:\nOWASP ZAP: free web application security scanner Nikto: web server vulnerability scanner Dependency Vulnerability Scanning Check known vulnerabilities in libraries you\u0026rsquo;re using:\nnpm audit / pip-audit / safety check Snyk: SCA (Software Composition Analysis) tool Integrating Security Checks into Claude Code Ways to strengthen security when writing code with Claude Code:\nSpecify security rules in CLAUDE.md: \u0026ldquo;Always use parameter binding for SQL queries,\u0026rdquo; \u0026ldquo;Always sanitize user input\u0026rdquo; Add security perspective to code reviews: request OWASP Top 10 analysis 
when running /review Automate pre-deploy scanning: integrate Semgrep or Bandit into the CI/CD pipeline Separate environment variables: add .env to .gitignore, access secrets only through environment variables Insights It\u0026rsquo;s easy to get swept up in vibe coding\u0026rsquo;s convenience and overlook security. AI-generated code can be functionally correct while still containing OWASP Top 10 vulnerabilities. SQL injection, XSS, and hardcoded secrets are the mistakes AI makes most often. Running a single automated scan with Semgrep or OWASP ZAP before deployment catches most basic vulnerabilities. Security isn\u0026rsquo;t a step you add after writing code — it\u0026rsquo;s a baseline concern that should be factored in from the moment you write the prompt.\n","date":"2026-03-25T00:00:00+09:00","image":"/images/posts/2026-03-25-ai-code-security/cover-en.jpg","permalink":"/posts/2026-03-25-ai-code-security/","title":"Vibe Coding Security Checklist — OWASP Top 10 and Automated Scanning"},{"content":"Overview Previous post: Claude Code Practical Guide — Context Management and Workflows covered the core strategies: CLAUDE.md, Lazy Loading, TDD workflows, and sub-agents. Just five days later, here\u0026rsquo;s a follow-up — because the volume of new features Claude Code has shipped in the past two months is overwhelming. 
Based on Cole Medin\u0026rsquo;s video You\u0026rsquo;re Hardly Using What Claude Code Has to Offer, it\u0026rsquo;s Insane, this post covers 9 key new features not addressed in the previous guide.\ngraph TD subgraph PREV[\"Previous Post\"] A[\"CLAUDE.md / MEMORY.md\"] B[\"Lazy Loading\"] C[\"Plan Mode\"] D[\"TDD Workflow\"] E[\"Sub-agents x16\"] F[\"Hooks\"] end subgraph NEW[\"This Post\"] G[\"1M Context Window\"] H[\"Native Git Worktrees\"] I[\"/simplify, /batch\"] J[\"Remote Control\"] K[\"Auto-memory\"] L[\"/btw, /loop, /voice\"] M[\"Effort Levels\"] N[\"Scheduled Tasks\"] end PREV -.-\u003e|\"built on top of\"| NEW style PREV fill:#e8eaf6 style NEW fill:#e8f5e9 1M Context Window — But 250K Is the Real Limit Both Sonnet and Opus now have GA (Generally Available) 1M token context windows — room for roughly 750,000 words in short-term memory. In theory, you could load an entire codebase at once.\nIn practice, the limit is much lower. From Cole Medin\u0026rsquo;s repeated testing: hallucinations increase sharply beyond 250K–300K tokens. Use /context regularly to check your token usage, and when approaching 250K, either compact memory or write a handoff prompt and start a fresh session.\nThe previous post said \u0026ldquo;Context is milk — it goes bad over time.\u0026rdquo; That principle holds even with a 1M window. A fresh 200K beats a bloated 500K.\nNative Git Worktree Support The previous post covered manually running git worktree commands. Now Claude Code manages worktrees natively.\n# Before: manually create worktree, then run claude in each one git worktree add ../project-feature-a feature-a cd ../project-feature-a \u0026amp;\u0026amp; claude # Now: create directly within Claude Code # Auto-managed under .claude/worktrees/ The key change is that worktrees are automatically managed under .claude/worktrees/. You can create and switch between worktrees without any separate git commands, and work independently in each. 
Since real development always involves juggling multiple feature branches and PRs simultaneously, this significantly lowers the barrier to parallel work.\n/simplify — Fighting Over-Engineering One of the most common problems with LLM-generated code is over-engineering — unnecessary abstractions, excessive error handling, pointless utility functions creeping in. /simplify is a built-in command Anthropic developed internally and recently made public.\nRun /simplify right after completing an implementation, and Claude will review the code and remove unnecessary complexity. It replaces manually typing \u0026ldquo;this is getting too complicated, can you simplify it?\u0026rdquo; every single time.\n/batch — Parallel Processing for Large Refactors /batch handles large-scale changes in parallel by splitting the work across multiple sub-agents internally.\n/batch replace all console.log calls with structured logger from utils/logger That single line gets Claude to:\nScan the codebase for all console.log calls Distribute the work across sub-agents Each agent runs transformations in parallel Aggregate results and create a PR This is ideal for large-scale migrations, linting rule changes, or API version upgrades — anything that\u0026rsquo;s \u0026ldquo;simple but affects many files.\u0026rdquo;\nRemote Control — Controlling Your Desktop from Your Phone One of the most impressive new features. Run /remote-control in a Claude Code session and a cloud session is created. 
You can then connect to that session from the Claude mobile app.\nsequenceDiagram participant Phone as Claude App (Phone) participant Cloud as Cloud Session participant Desktop as Claude Code (Desktop) Desktop-\u003e\u003eCloud: /remote-control started Cloud--\u003e\u003ePhone: Session synced Phone-\u003e\u003eCloud: Send message Cloud-\u003e\u003eDesktop: Reflected in real time Desktop--\u003e\u003eCloud: Execution result Cloud--\u003e\u003ePhone: Result displayedMessages sent from your phone are reflected in real time in the desktop Claude Code session. You can check build status on the go, or issue simple correction instructions — development doesn\u0026rsquo;t stop just because you\u0026rsquo;re away from your desk.\nAuto-memory — Claude\u0026rsquo;s Self-Accumulating Memory The previous post covered separating CLAUDE.md (shared team rules) from MEMORY.md (personal learnings). Auto-memory goes a step further — Claude accumulates knowledge across sessions on its own.\nCLAUDE.md Auto-memory Managed by User (manual) Claude (automatic) Storage location Project root ~/.claude/memory/ Content Team rules, conventions Error patterns, project insights Determinism High (we control it) Low (Claude decides) Disableable N/A Yes Enabled by default, can be turned off. Cole Medin\u0026rsquo;s advice:\nWant maximum control → use CLAUDE.md only Want to give Claude autonomy → use CLAUDE.md + Auto-memory together In practice, running both in parallel is recommended. Periodically review what Auto-memory has accumulated, and promote useful items to CLAUDE.md.\n/btw — Quick Questions Without Polluting Context When you\u0026rsquo;re mid-task and a quick question pops up — \u0026ldquo;what does this library function actually do?\u0026rdquo; — asking in the main session unnecessarily inflates the context. /btw opens a sidecar conversation so you can ask without touching the main context.\n/btw What does CRUD stand for? 
→ (read the answer) → Press Escape to close → Main session remains unchanged Note: Claude cannot use tools in /btw mode. For questions that need codebase exploration, use a sub-agent. Reserve /btw for simple knowledge questions only.\n/loop — Scheduled Repeated Tasks Runs a specific prompt on a recurring interval.\n# Check deployment status every 5 minutes /loop 5m check if the deployment finished and give me a status update # Run tests every 30 minutes /loop 30m run all tests and alert if any are failing Useful for CI/CD pipeline monitoring, periodic test runs, and polling external sites. A particularly powerful pattern: working in another Claude Code instance while /loop runs quality gates in the background.\n/voice — Native Voice Input /voice activates voice input. It\u0026rsquo;s significantly faster than typing when doing a brain dump in Plan mode.\nCole Medin notes that external tools like Aqua Voice, WhisperFlow, and Whispering (open source) are still slightly more accurate than the native option — but the zero-friction, no-install experience makes /voice the easy default.\nEffort Levels — Controlling Token Usage You can tune the model\u0026rsquo;s reasoning depth. Adjust with /effort or the left/right arrow keys at session start.\nLevel Best for Token usage Low Simple fixes, formatting Minimal Medium (default) General coding, bug fixes Moderate High Complex problem solving High Max (Opus only) Extremely hard debugging Maximum To avoid hitting the 5-hour or weekly token limit, aggressively use Low for simple tasks and reserve High/Max for genuinely difficult problems.\nScheduled Tasks \u0026amp; Cron Jobs Where /loop repeats within a session, Scheduled Tasks operate outside of any session.\nOne-time reminders: \u0026ldquo;Remind me to push the release branch at 3pm\u0026rdquo; Cron Jobs: Schedule recurring tasks — generate a daily morning code quality report, check a specific API\u0026rsquo;s status every hour, and so on. 
Insights The previous post said \u0026ldquo;Claude Code isn\u0026rsquo;t a tool — it\u0026rsquo;s a system.\u0026rdquo; These new features expand that system across both time and space.\nSpatial expansion: Remote Control extends beyond the desktop, Git Worktrees beyond a single branch, /batch beyond a single file.\nTemporal expansion: Auto-memory enables learning across sessions; /loop and Scheduled Tasks keep work going even when you step away.\nReduced cognitive load: /simplify fights over-engineering, /btw prevents context pollution, Effort Levels reduce token waste.\nIf the features from the previous post — CLAUDE.md, Plan mode, TDD, sub-agents — are physical fitness, the features in this post are equipment upgrades. Great gear without the fundamentals won\u0026rsquo;t help, but with a solid foundation, these tools genuinely push productivity to the next level.\nSource: You\u0026rsquo;re Hardly Using What Claude Code Has to Offer, it\u0026rsquo;s Insane — Cole Medin\n","date":"2026-03-24T00:00:00+09:00","image":"/images/posts/2026-03-24-claude-code-new-features/cover-en.jpg","permalink":"/posts/2026-03-24-claude-code-new-features/","title":"Claude Code Practical Guide Part 2 — 9 New Features from the Last Two Months"},{"content":"Overview Previous post: #3 — Search Pipeline Improvements and Generated Image Comparison Mode\nThis #4 entry covers 23 commits across three major workstreams:\nmain.py router extraction — splitting a bloated single file into 5 route modules Terraform dev server — a cost-efficient dev environment on AWS EC2 + Lambda Scheduler Inpaint editor — from Figma design through Canvas-based mask editor, API, and DB migration main.py Router Extraction Why Extract During code review, it became clear that main.py was handling app initialization, global state, and route handlers all in one file. This caused frequent merge conflicts and made navigation painful. 
Generation, search, and image-related endpoints were all mixed together, so every new feature (like Inpaint) bloated the diff.\nHow It Was Done Using FastAPI\u0026rsquo;s APIRouter, I extracted the file sequentially into 5 modules:\nbackend/src/routes/ ├── meta.py # health, API info, frontend serving ├── images.py # GET /images, upload, selection logging ├── search.py # POST /search, hybrid/simple ├── history.py # GET /api/history/generations ├── generation.py # POST /api/generate-image ├── edit.py # POST /api/edit-image (added later) └── auth.py # Google OAuth (existing) Each route module accesses global state (images_data, hybrid_pipeline, etc.) by running from backend.src import main as app_module inside function bodies rather than at module top level. Deferring the import to call time avoids circular imports while keeping things lean without a separate DI container.\nAfter extraction, main.py shrank to roughly 140 lines — just app creation, lifespan, CORS, and router registration.\nRefactoring Principles Extract one module at a time: meta → images → search → history → generation, with a working commit after each No URL path changes: prefix settings preserve the API contract Consistent import pattern: all route modules access global state the same way Terraform Dev Server Background Local development was slow for ML model loading and image processing, and there were environment differences across team members. The goal was to spin up a dev server on AWS that automatically shuts down during off-hours to cut costs.\nArchitecture Decision We debated creating a new VPC versus sharing the existing prod VPC. The decision was Option B: share the VPC, separate Security Groups. 
For a small project, managing two VPCs is overhead, and Security Group isolation is sufficient.\ngraph TB subgraph VPC[\"VPC 10.0.0.0/16\"] subgraph SG_PROD[\"Prod Security Group\"] EC2_PROD[\"EC2 Prod\u0026lt;br/\u0026gt;t3.medium\"] end subgraph SG_DEV[\"Dev Security Group\"] EC2_DEV[\"EC2 Dev\u0026lt;br/\u0026gt;t3.medium\"] end end EIP_PROD[\"Elastic IP Prod\"] --\u003e EC2_PROD EIP_DEV[\"Elastic IP Dev\"] --\u003e EC2_DEV subgraph SCHEDULER[\"Lambda Scheduler\"] EB_START[\"EventBridge\u0026lt;br/\u0026gt;cron 10:00 KST\"] --\u003e|start| LAMBDA[\"Lambda\u0026lt;br/\u0026gt;ec2_scheduler\"] EB_STOP[\"EventBridge\u0026lt;br/\u0026gt;cron 22:00 KST\"] --\u003e|stop| LAMBDA end LAMBDA --\u003e|\"StartInstances\u0026lt;br/\u0026gt;StopInstances\"| EC2_DEVKey Resources Resource Purpose aws_security_group.dev_sg SSH(22), HTTP(80), Backend(8000), Vite(5173) aws_instance.dev_server Ubuntu 24.04 LTS, t3.medium, gp3 40GB aws_eip.dev_eip Static IP (persists across restarts) aws_key_pair.dev_key Dev-only SSH key pair aws_lambda_function EC2 start/stop execution aws_scheduler_schedule (start) Daily start at 10:00 KST aws_scheduler_schedule (stop) Daily stop at 22:00 KST Lambda Scheduler Design EventBridge Scheduler passes action and instance_id as a JSON payload when invoking Lambda. The Lambda simply calls start_instances or stop_instances via boto3.\nIAM follows least-privilege: the Lambda Role allows only ec2:StartInstances, ec2:StopInstances, and ec2:DescribeInstances on that specific instance, and the EventBridge Scheduler Role can only invoke that specific Lambda function.\nRunning 12 hours a day (10:00–22:00) achieves roughly 50% cost savings compared to 24/7.\nInpaint Editor Implementation Design Process After reviewing the InpaintFullPage design in Figma, I wrote a design spec and implementation plan first. 
The overall flow:\nsequenceDiagram participant User participant Editor as InpaintEditor (React) participant API as FastAPI participant Gemini as Gemini Imagen 3 User-\u003e\u003eEditor: Click Edit on generated image Editor-\u003e\u003eEditor: Draw mask on Canvas User-\u003e\u003eEditor: Enter prompt + Generate Editor-\u003e\u003eAPI: mask_file + source_filename + prompt API-\u003e\u003eGemini: source image + mask + prompt Gemini--\u003e\u003eAPI: edited image API--\u003e\u003eEditor: EditImageResponse Editor--\u003e\u003eUser: Display resultBackend Changes DB Migration: Added an is_inpaint Boolean column to generation_logs to distinguish inpaint generations from regular ones.\nSchemas: Defined EditImageRequest and EditImageResponse. Request takes source_filename, prompt, swap_filename, parent_generation_id, and image_count — with mask sent as a separate multipart File.\nSince JSON data and a file need to be sent together, the JSON payload is received as a Form string field and parsed. This is a workaround for FastAPI\u0026rsquo;s limitation that File and Body (JSON) cannot be used simultaneously.\nService: Added a generate_edit_image helper that mirrors the structure of the existing generate_single_image but includes source image + mask image + (optional) swap image in the Gemini API contents.\nFrontend: InpaintEditor Component Implemented a Canvas-based mask editing component. 
Key features:\nDraw brush masks as a Canvas overlay on top of the source image Adjustable brush size Undo/Clear support Export mask area as white PNG and send to the API Added editingImage state to App.tsx, wired to display InpaintEditor when the Edit button is clicked on a generated image.\nCommit Log # Commit message Summary 1 refactor: extract routes/meta.py with APIRouter meta route extracted 2 refactor: extract routes/images.py with APIRouter images route extracted 3 refactor: extract routes/search.py with APIRouter search route extracted 4 refactor: extract routes/history.py with APIRouter history route extracted 5 refactor: extract routes/generation.py with APIRouter generation route extracted 6 docs: add dev server Terraform design spec Terraform design doc 7 docs: add dev server Terraform implementation plan Terraform implementation plan 8 chore: add SSH key pair public keys Public key files 9 feat: manage SSH key pairs via Terraform aws_key_pair SSH key pair in Terraform 10 feat: add dev security group Dev-only Security Group 11 feat: add dev EC2 instance and Elastic IP dev EC2 t3.medium + EIP 12 feat: add dev Lambda scheduler with IAM role and policy Lambda + IAM 13 feat: add dev EventBridge scheduler (10:00-22:00 KST) EventBridge cron 14 docs: add inpaint \u0026amp; swap feature design spec Feature design doc 15 docs: add inpaint \u0026amp; swap implementation plan Implementation plan 16 feat: add is_inpaint column to generation_logs Alembic migration 17 feat: add EditImageRequest/Response schemas Pydantic schemas 18 feat: add generate_edit_image service helper Gemini API helper 19 feat: add POST /api/edit-image endpoint edit.py route module 20 feat: add editImage API function and is_inpaint types frontend api.ts 21 feat: add InpaintEditor component Canvas mask editor 22 feat: integrate InpaintEditor with edit button and app state App.tsx integration 23 fix: correct image_count field name and add Form annotation Field name fix Insights Refactor before 
adding features. Extracting the main.py routers first meant that adding edit.py for the Inpaint editor required no changes to existing code — just registering a new module. Without the refactor, main.py would have grown by another 150 lines.\nTerraform can manage prod and dev in a single file. Without separate workspaces or directories, I declared both prod and dev resources in one main.tf. At this project scale, having everything in one file makes the whole infrastructure readable at a glance.\nFile + JSON in the same request is tricky in FastAPI. Since File and Body can\u0026rsquo;t be used simultaneously, the workaround is to receive the JSON payload as a Form string and parse it. This is a fundamental multipart form data constraint.\nLambda Scheduler is ideal for small dev servers. Rather than heavy solutions like AWS Instance Scheduler, a Lambda + EventBridge combo costs nearly nothing and is easy to manage with Terraform.\n","date":"2026-03-24T00:00:00+09:00","image":"/images/posts/2026-03-24-hybrid-search-dev4/cover-en.jpg","permalink":"/posts/2026-03-24-hybrid-search-dev4/","title":"Hybrid Image Search Dev Log #4 — Router Extraction, Terraform Dev Server, Inpaint Editor"},{"content":"Overview Previous post: #2 — Unified Skill Flow and \u0026ndash;since-last-run Tracking\nThis entry covers two major threads. First, feature improvements including YouTube oEmbed metadata and series continuity detection. Second, migrating the standalone skill living in .claude/skills/ to Claude Code\u0026rsquo;s plugin structure. This spans 9 commits across 7 sessions.\nYouTube oEmbed Metadata Improvements Previously, when a YouTube link was included in a blog post, only the title was fetched. Two things improved here.\noEmbed API integration. Calls to YouTube\u0026rsquo;s oEmbed endpoint now automatically collect metadata including thumbnail, author, and video title. This data is made available for use in Hugo shortcodes.\ntranscript-api v1.x migration. 
The youtube-transcript-api library shipped a major v1.x update with a breaking API change. Migrated from the old YouTubeTranscriptApi.get_transcript() call pattern to the new interface. This is a straightforward dependency update, but since transcript-based summarization is central to blog post generation, quick action was necessary.\nSeries Continuity Detection One of Log-Blog\u0026rsquo;s core features is managing series posts — when writing #1, #2, #3 about the same project, only commits since the previous post should be included.\nThe previous approach filtered commits by date. The problem: dates are imprecise. The publication date of a post can differ from the last working day, and timezone issues add further ambiguity.\nThe solution was simple: add a last_commit field to Hugo frontmatter, and have the sessions command read that SHA to collect only changes after that commit. The ambiguity of date parsing disappears, and each new post picks up exactly where the previous one left off.\nPlugin Migration The biggest undertaking in this development cycle — roughly 7 hours in the seventh session.\nWhy a Plugin Placing skill files directly in .claude/skills/ works, but has deployment and update limitations. Users have to manually copy files, and there\u0026rsquo;s no version management. 
Claude Code\u0026rsquo;s plugin system enables automated installation and updates.\nStructural Design graph TD A[\"Previous structure\u0026lt;br/\u0026gt;.claude/skills/log-blog.md\"] --\u003e B{\"Migration\"} B --\u003e C[\"plugin.json\u0026lt;br/\u0026gt;Plugin manifest\"] C --\u003e D[\"/logblog:post\u0026lt;br/\u0026gt;Post generation skill\"] C --\u003e E[\"/logblog:setup\u0026lt;br/\u0026gt;Initial setup skill\"] C --\u003e F[\"marketplace.json\u0026lt;br/\u0026gt;Distribution metadata\"] style A fill:#f9f,stroke:#333 style C fill:#bbf,stroke:#333 style D fill:#bfb,stroke:#333 style E fill:#bfb,stroke:#333 style F fill:#fbf,stroke:#333plugin.json Manifest This is the plugin entry point. The author field was initially a string, which failed schema validation — it needs to be an object ({ \u0026quot;name\u0026quot;: \u0026quot;...\u0026quot;, \u0026quot;url\u0026quot;: \u0026quot;...\u0026quot; }). A minor detail, but it\u0026rsquo;s the kind of thing that eats an entire commit.\nSkill Migration Renamed the existing /log-blog skill to /logblog:post. The colon (:) separator is Claude Code\u0026rsquo;s plugin namespace convention — the plugin name becomes the prefix, and the skill name follows the colon. The skill\u0026rsquo;s internal logic was preserved; only path references and invocation style were updated to match the plugin structure.\n/logblog:setup Skill A new addition. It automates end-to-end configuration for new users setting up a blog:\nVerify Hugo project structure Generate config files Create required directory structure Verify Git integration In the fifth session, calling /logblog:post failed because the plugin wasn\u0026rsquo;t installed yet — an expected outcome, but it confirmed the need for a setup skill.\nMarketplace Distribution marketplace.json is the metadata file for registering with the Claude Code plugin registry. It includes the plugin name, description, version, repository URL, and list of supported skills. 
Since the official marketplace isn\u0026rsquo;t active yet, direct installation via the GitHub repository URL is used for now. When the marketplace opens, this file is ready to go.\nCommit Log # Commit message Notes 1 feat: add YouTube oEmbed metadata and migrate to transcript-api v1.x Feature improvement 2 feat: detect series updates via last_commit SHA in sessions command Series continuity 3 docs: add logblog plugin design spec Design doc 4 docs: add logblog plugin implementation plan Implementation plan 5 feat: add logblog Claude Code plugin manifest plugin.json 6 feat: migrate /log-blog skill to /logblog:post in plugin structure Skill migration 7 feat: add /logblog:setup skill for end-to-end blog setup Setup skill 8 fix: plugin.json author field must be object, not string Schema fix 9 feat: add marketplace.json for plugin distribution Marketplace Insights Write docs first, code second. In the seventh session, design docs and an implementation plan were written before touching code. It was a long session at 412 minutes, but direction never wavered. This ordering is especially important when venturing into unfamiliar territory like plugin structure.\nValidate the schema, don\u0026rsquo;t guess. The wrong author field type in plugin.json is the textbook example. When working with new formats, check the examples or schema definition first.\nFailed calls create features. The failed invocation in session five became the motivation for building /logblog:setup. Experiencing firsthand what a first-time user would encounter is the most accurate form of requirements gathering.\nFollow ecosystem naming conventions. The change from /log-blog to /logblog:post isn\u0026rsquo;t just a rename. It\u0026rsquo;s adopting the namespace convention of the plugin ecosystem. 
Following community conventions over idiosyncratic naming pays off long-term.\n","date":"2026-03-24T00:00:00+09:00","image":"/images/posts/2026-03-24-log-blog-dev3/cover-en.jpg","permalink":"/posts/2026-03-24-log-blog-dev3/","title":"Log-Blog Dev Log #3 — From Skill to Plugin"},{"content":"Overview oh-my-openagent (formerly oh-my-opencode) is a model-agnostic agent orchestrator — not tied to any single LLM. With 42,810 GitHub stars, it has grown into a TypeScript-based project spanning 6M+ lines of code.\nIf oh-my-claudecode (OMC) — covered in a previous post — is a Claude Code-specific extension, oh-my-openagent takes a fundamentally different approach. The goal is to unify Claude, GPT, Kimi, GLM, Gemini, Minimax, and any other model behind a single interface.\nCore Philosophy — Rejecting Vendor Lock-in oh-my-openagent\u0026rsquo;s philosophy can be summed up in one line:\n\u0026ldquo;Anthropic wants you locked in. Claude Code\u0026rsquo;s a nice prison, but it\u0026rsquo;s still a prison.\u0026rdquo;\nClaude Code is a great tool. But it also traps users inside the Anthropic ecosystem. In fact, Anthropic has previously blocked API access for this project (then called OpenCode) — which paradoxically validated oh-my-openagent\u0026rsquo;s reason for existing. Depend on a single vendor, and the door can close at any time.\nThe project adopts the SUL-1.0 license, and Sisyphus Labs is building a commercial version.\nSubscription Cost Comparison The practical benefits of model-agnosticism show up in cost optimization:\nService Monthly cost Notes ChatGPT $20 GPT-4o based Kimi Code $0.99 Best value GLM $10 Mid-range Claude Pro $20 Includes Claude Code Being able to move between all of these models with a single tool is the point.\nArchitecture oh-my-openagent\u0026rsquo;s killer feature is the ultrawork command. 
A single command triggers the agent to automatically run code analysis, modification, testing, and linting across the full workflow.\ngraph TB USER[\"User\"] --\u003e|ultrawork command| ORCH[\"Agent Orchestrator\"] ORCH --\u003e ROUTER[\"Model Router\"] ROUTER --\u003e CLAUDE[\"Claude API\"] ROUTER --\u003e GPT[\"GPT API\"] ROUTER --\u003e KIMI[\"Kimi API\"] ROUTER --\u003e GLM[\"GLM API\"] ROUTER --\u003e GEMINI[\"Gemini API\"] ROUTER --\u003e MINIMAX[\"Minimax API\"] ORCH --\u003e TOOLS[\"Tool Layer\"] TOOLS --\u003e FS[\"File System\"] TOOLS --\u003e TERM[\"Terminal Execution\"] TOOLS --\u003e LINT[\"Linting / Testing\"] ORCH --\u003e COMPAT[\"Compatibility Layer\"] COMPAT --\u003e CC[\"Claude Code\"] COMPAT --\u003e AMP[\"AmpCode\"] COMPAT --\u003e CURSOR[\"Cursor\"] style ORCH fill:#e1f5fe style ROUTER fill:#fff3e0 style COMPAT fill:#f3e5f5Key Components Agent Orchestrator — analyzes the task and determines the best combination of model and tools Model Router — routes to Claude, GPT, Kimi, etc. 
based on the nature of the task Tool Layer — handles actual work: file system access, terminal execution, linting/testing Compatibility Layer — integrates with existing tools like Claude Code, AmpCode, and Cursor A recent commit improved stale timeout handling for background agents, increasing stability for long-running agent tasks.\nComparison with OMC oh-my-claudecode (OMC) and oh-my-openagent share a similar name but have entirely different philosophies and scope.\ngraph LR subgraph OMC[\"oh-my-claudecode (OMC)\"] direction TB OMC_STAR[\"GitHub Stars: 10,400\"] OMC_AUTHOR[\"by Yeachan-Heo\"] OMC_MODEL[\"Claude only\"] OMC_GOAL[\"Maximize Claude Code experience\"] end subgraph OOA[\"oh-my-openagent\"] direction TB OOA_STAR[\"GitHub Stars: 42,810\"] OOA_AUTHOR[\"by code-yeongyu\"] OOA_MODEL[\"6+ models supported\"] OOA_GOAL[\"Model-agnostic orchestration\"] end OMC -.-\u003e|\"Optimization within Claude ecosystem\"| CLAUDE_ONLY[\"Single model, deep\"] OOA -.-\u003e|\"Vendor independence strategy\"| MULTI_MODEL[\"Multi-model integration\"] style OMC fill:#e8eaf6 style OOA fill:#e8f5e9 style CLAUDE_ONLY fill:#c5cae9 style MULTI_MODEL fill:#c8e6c9 oh-my-claudecode (OMC) oh-my-openagent GitHub Stars 10,400 42,810 Models supported Claude only Claude, GPT, Kimi, GLM, Gemini, Minimax Philosophy Make Claude Code better Don\u0026rsquo;t be locked into any model Killer feature Claude-optimized prompts/workflows ultrawork unified command Language TypeScript TypeScript Approach Single model, depth Multi-model, breadth License MIT SUL-1.0 Commercialization Community-driven Sisyphus Labs in progress OMC assumes Claude is the best model and maximizes the Claude Code experience. oh-my-openagent assumes no single model is best for every task and returns model choice to the user. These aren\u0026rsquo;t competing projects — they\u0026rsquo;re answers to different questions.\nCommunity Response 42,810 stars speak for themselves. 
Some highlights from real user reviews:\n\u0026ldquo;Cancelled my Cursor subscription\u0026rdquo; — oh-my-openagent alone is enough, no separate IDE subscription needed \u0026ldquo;Cleared 8,000 ESLint warnings in a single day\u0026rdquo; — showcasing ultrawork\u0026rsquo;s automation capabilities \u0026ldquo;Converted a 45,000-line Tauri app to SaaS overnight\u0026rdquo; — productivity at scale with large refactors The common thread across these reviews is the breadth of automation. Not just code completion — performing project-wide work with a single command is what sets it apart from conventional tools.\nInsights — A Fork in the AI Coding Ecosystem oh-my-openagent\u0026rsquo;s rise is sending an important signal to the AI coding tool ecosystem.\n1. Fatigue with vendor lock-in Anthropic\u0026rsquo;s blocking of OpenCode sent a wake-up call to the developer community. No matter how good the tool, platform holders can cut off access with a word. oh-my-openagent\u0026rsquo;s 42K+ stars are the market\u0026rsquo;s answer to that anxiety.\n2. There is no \u0026ldquo;best model\u0026rdquo; GPT excels at certain tasks. Claude at others. Kimi is cost-effective for specific work. Model-agnosticism accepts this reality and gives users the ability to pick the right tool for each job.\n3. CLI agents are converging Claude Code, Cursor, AmpCode — diverse tools are converging on the same form: terminal-based agent + tool use. oh-my-openagent anticipated this convergence and built a meta-layer that unifies all of these tools behind a single interface.\n4. OMC and oh-my-openagent can coexist Single-model depth (OMC) and multi-model integration (oh-my-openagent) are not mutually exclusive. A developer who primarily uses Claude can optimize the Claude experience with OMC while using oh-my-openagent to leverage other models for supplementary tasks. 
As the ecosystem matures, this layered approach is likely to become the standard.\nThe competition among AI coding tools is shifting from \u0026ldquo;which model is best\u0026rdquo; to \u0026ldquo;how do you combine models effectively.\u0026rdquo; oh-my-openagent is standing at that inflection point.\n","date":"2026-03-24T00:00:00+09:00","image":"/images/posts/2026-03-24-oh-my-opencode/cover-en.jpg","permalink":"/posts/2026-03-24-oh-my-opencode/","title":"oh-my-opencode — A Model-Agnostic Agent Orchestrator"},{"content":"Overview In September 2024, a Spotify data engineer shared their career transition story and real-world experience at an Inflearn online meetup. They had spent 4.5 years doing Spring backend development at Naver before pivoting to Spotify as a data engineer on the strength of their Scala + Spark background. A year later, the Spotify engineering blog tells the story of a platform processing 1.4 trillion data points per day, AI model distillation pipelines, and multi-agent advertising systems — the scope of data engineering has expanded dramatically. Reading the practitioner\u0026rsquo;s on-the-ground voice from the meetup alongside the technical details in the official blog paints a sharper picture of where the data engineer role is heading.\nMeetup Highlights The Reality of Domain Switching The presenter was candid about how their mental model shifted when moving from backend to data engineering. 
Data engineering conjures images of Spark pipelines, but in practice a significant portion of the work centers on SQL-based product development, data modeling, and dashboard design.\n\u0026ldquo;The connector between data producers and data consumers\u0026rdquo;\nThat was the presenter\u0026rsquo;s definition of what a data engineer fundamentally is.\nEngineering vs Science A key distinction from the meetup:\nData Engineering Data Science Core activity Automation, optimization Hypothesis validation Primary output Pipelines, data models Analysis reports, metric design Tools SQL, Scala, dbt Python, Jupyter, statistical models Org Structure Platform Org: backend infrastructure, large-scale ingestion, Schema Evolution, Data Warehouse Business Org: domain-specific data collection, data modeling, quality monitoring Data Scientists: analysis, metric design, dashboards What the Practitioner Emphasized SQL fluency is core — the ability to write precise SQL matters more in practice than mastery of complex frameworks Nitpicking matters — data quality comes from doggedly chasing down small inconsistencies Everyone accesses data — through BigQuery and Jupyter, non-engineers also explore data directly AI can\u0026rsquo;t replace human understanding and validation — no matter how much automation advances, the work of understanding and validating data remains with people Spotify\u0026rsquo;s Data Platform in 2026 The platform scale revealed in Spotify\u0026rsquo;s April 2024 engineering blog puts concrete numbers on what the meetup described.\nScale 1.4 trillion data points processed per day 1,800+ event types ingested 38,000+ active scheduled pipelines running 100+ engineers dedicated to the data platform The ~120 billion daily user interaction logs mentioned at the meetup are a subset of this 1.4 trillion Platform Architecture graph TB subgraph Collection[\"Data Collection\"] A[\"Client Events\u0026lt;br/\u0026gt;1,800+ types\"] --\u003e B[\"Pub/Sub\"] B --\u003e 
C[\"Dataflow\u0026lt;br/\u0026gt;Real-time ingestion\"] end subgraph Processing[\"Data Processing\"] C --\u003e D[\"Apache Beam\u0026lt;br/\u0026gt;Scio (Scala)\"] D --\u003e E[\"BigQuery\u0026lt;br/\u0026gt;Data Warehouse\"] D --\u003e F[\"GCS\u0026lt;br/\u0026gt;Object Storage\"] end subgraph Management[\"Data Management\"] E --\u003e G[\"dbt\u0026lt;br/\u0026gt;Data Modeling\"] G --\u003e H[\"Flyte / Styx\u0026lt;br/\u0026gt;Orchestration\"] H --\u003e I[\"38,000+\u0026lt;br/\u0026gt;Scheduled Pipelines\"] end subgraph Consumers[\"Data Consumers\"] E --\u003e J[\"Jupyter\u0026lt;br/\u0026gt;Exploratory Analysis\"] E --\u003e K[\"Dashboards\u0026lt;br/\u0026gt;Business Metrics\"] E --\u003e L[\"ML Pipelines\u0026lt;br/\u0026gt;Recommendations / Personalization\"] end style Collection fill:#1DB954,color:#fff style Processing fill:#191414,color:#fff style Management fill:#535353,color:#fff style Consumers fill:#1DB954,color:#fffThe Platform Org / Business Org split the presenter described maps onto three formal domains in the blog: Data Collection / Data Processing / Data Management.\nData Collection: client event ingestion, Schema Evolution, real-time streaming Data Processing: batch and streaming pipelines, large-scale transformation Data Management: metadata management, data catalog, quality monitoring Wrapped 2025\u0026rsquo;s Data Pipeline The Wrapped 2025 technical post published in March 2026 shows exactly where data engineering and AI intersect.\nScale and Constraints 1.4 billion personalized reports generated for 350 million users LLMs generate natural language summaries based on each user\u0026rsquo;s listening data AI Model Distillation Pipeline The Wrapped team took an interesting approach: Model Distillation — using a frontier model\u0026rsquo;s outputs as training data to fine-tune a smaller, faster model.\ngraph LR A[\"Frontier Model\u0026lt;br/\u0026gt;Generate high-quality outputs\"] --\u003e B[\"DPO Training Data\u0026lt;br/\u0026gt;Preference 
pairs\"] B --\u003e C[\"Fine-tuned Small Model\u0026lt;br/\u0026gt;Distillation complete\"] C --\u003e D[\"1.4B reports generated\u0026lt;br/\u0026gt;350M users\"] E[\"LLM-as-Judge\"] --\u003e |\"Accuracy, safety\u0026lt;br/\u0026gt;tone, formatting\"| C style A fill:#1DB954,color:#fff style C fill:#191414,color:#fff style D fill:#1DB954,color:#fff style E fill:#535353,color:#fffKey design decisions:\nDPO (Direct Preference Optimization): pairs good and bad frontier model outputs to train preference-based learning LLM-as-Judge evaluation: quality validated across four dimensions — accuracy, safety, tone, and formatting Column-oriented storage design: storage architecture to prevent race conditions under simultaneous access from 350 million users \u0026ldquo;At this scale, the LLM call is the easy part.\u0026rdquo;\nThat one sentence cuts to the heart of data engineering. Calling the LLM API is trivial. Building the pipeline to generate 1.4 billion outputs reliably, accurately, and safely — and deliver them — that\u0026rsquo;s the real engineering.\nMulti-Agent Advertising Architecture A multi-agent advertising system published in February 2026 shows the frontier of data engineering expanding into AI agent infrastructure.\nThe Problem Planning an ad campaign requires complex decisions: target audience selection, budget allocation, scheduling, media format choices. 
Previously this was 15–30 minutes of manual work.\nThe Solution: 6 Specialized Agents graph TB User[\"Advertiser Request\"] --\u003e Router[\"Router Agent\u0026lt;br/\u0026gt;Classify and route request\"] Router --\u003e Goal[\"GoalResolver\u0026lt;br/\u0026gt;Interpret campaign goals\"] Router --\u003e Audience[\"AudienceResolver\u0026lt;br/\u0026gt;Set target audience\"] Router --\u003e Budget[\"Budget Agent\u0026lt;br/\u0026gt;Optimize budget\"] Router --\u003e Schedule[\"Schedule Agent\u0026lt;br/\u0026gt;Plan timeline\"] Router --\u003e Media[\"MediaPlanner\u0026lt;br/\u0026gt;Select media formats\"] Goal --\u003e Result[\"Unified Campaign Plan\u0026lt;br/\u0026gt;Complete in 5-10 seconds\"] Audience --\u003e Result Budget --\u003e Result Schedule --\u003e Result Media --\u003e Result History[\"Thousands of\u0026lt;br/\u0026gt;historical campaigns\"] -.-\u003e Router style Router fill:#1DB954,color:#fff style Result fill:#191414,color:#fff style History fill:#535353,color:#fffTech Stack Component Technology Agent Framework Google ADK 0.2.0 LLM Vertex AI (Gemini 2.5 Pro) Communication gRPC Training data Thousands of historical campaigns 15–30 minutes → 5–10 seconds. From a data engineering perspective, the more interesting story isn\u0026rsquo;t the agents themselves but the data pipeline behind them: cleaning thousands of historical campaign records, structuring them in a form agents can reference, and serving them in real time — that\u0026rsquo;s the data engineer\u0026rsquo;s domain.\nData Engineer Skill Tree 2026 Cross-referencing the tech stack from the meetup with 2026 job postings gives a clear picture of what Spotify data engineers are expected to know today.\n2024 Meetup vs. 
2026 Hiring Area 2024 Meetup 2026 Job Requirements Languages SQL, Scala, Python SQL, Python, Scala (note the order shift) Processing engines Spark Spark, Apache Beam, Scio, Flink Cloud GCP, BigQuery, GCS GCP, BigQuery, Dataflow, GCS Orchestration Not mentioned Flyte, Styx AI/ML Indirect mention LLM pipelines, Model Distillation Agents None Multi-Agent infrastructure In just one year, Apache Beam / Scio / Flink rose alongside Spark as requirements, and LLM pipelines and agent infrastructure entered the data engineer\u0026rsquo;s domain.\nInsights: A Year of Change The Meetup\u0026rsquo;s Prediction Held Up The presenter\u0026rsquo;s emphasis that \u0026ldquo;AI can\u0026rsquo;t replace human understanding and validation\u0026rdquo; was confirmed precisely by the Wrapped 2025 case. LLM-as-Judge was introduced, but designing the evaluation criteria (accuracy, safety, tone, formatting) and integrating it into the pipeline was ultimately the engineers\u0026rsquo; work.\nThe Expanding Scope of the Data Engineer At the 2024 meetup, the data engineer was \u0026ldquo;the connector between data producers and data consumers.\u0026rdquo; By 2026, AI agents have been added to the list of consumers. 
Serving data to agents, validating agent outputs, and building the data pipelines for agent systems — these have become new job responsibilities.\nWhat Hasn\u0026rsquo;t Changed Scale grew by 10x from 120 billion to 1.4 trillion, and AI agents and LLM pipelines appeared — but the three things the presenter emphasized remain as valid as ever:\nSQL fluency — BigQuery is still central, dbt is the standard for data modeling Nitpicking — not a single error can be tolerated across 1.4 billion Wrapped reports Identity as a connector — between producers and consumers, now extended to between producers and agents Overlaying the practitioner\u0026rsquo;s voice from the meetup with the official technical blog a year later, data engineering is clearly evolving from simply building pipelines to designing data infrastructure for the AI era. And at the center of that, still, is a person who understands data precisely and validates it relentlessly.\n","date":"2026-03-24T00:00:00+09:00","image":"/images/posts/2026-03-24-spotify-data-engineering/cover-en.jpg","permalink":"/posts/2026-03-24-spotify-data-engineering/","title":"Spotify Data Engineering — Practitioner Meetup Recap and Platform Evolution in 2026"},{"content":"Overview AI has become a core tool in creative content production. Two cases sharply illustrate both sides of this shift. The first is SPACE GREEN, a pilot video by Korean VFX company Giantstep that blends AI with traditional VFX. The second is As Deep as the Grave, a film that resurrects actor Val Kilmer — who passed away in April 2025 — using generative AI. The first raises a business question: \u0026ldquo;How do you sell AI content?\u0026rdquo; The second raises an ethical one: \u0026ldquo;Is a posthumous AI performance a tribute or exploitation?\u0026rdquo;\nGiantstep\u0026rsquo;s SPACE GREEN — The Hybrid Approach Who Is Giantstep Giantstep is a Korean VFX company founded in 2008. 
They\u0026rsquo;ve collaborated with SM Entertainment\u0026rsquo;s virtual artist NAEVIS, Samsung, Netflix, Disney, and others. The key distinction: this isn\u0026rsquo;t a pure AI startup. Giantstep is a company that layered AI on top of years of accumulated VFX expertise.\nThe SPACE GREEN Project SPACE GREEN is an R\u0026amp;D pilot video. Its defining feature is a hybrid approach — not AI alone.\nTeam: 4 junior artists (1–3 years experience) + 1 director Timeline: Just 10 days Method: AI generates rough drafts → VFX team refines details → final DI (Digital Intermediate) polish flowchart LR A[\"AI Generation\u0026lt;br/\u0026gt;Rough Draft\"] --\u003e B[\"VFX Refinement\u0026lt;br/\u0026gt;Detail Work\"] B --\u003e C[\"DI Finish\u0026lt;br/\u0026gt;Color + Polish\"] C --\u003e D[\"Final Content\u0026lt;br/\u0026gt;SPACE GREEN\"] style A fill:#4a9eff,color:#fff style B fill:#ff6b6b,color:#fff style C fill:#ffa94d,color:#fff style D fill:#51cf66,color:#fffIn one sentence, this pipeline works like this: AI takes it from 1 to 9, and the artists cover the last mile.\nThe Detail Valley AI-generated footage looks convincing at first glance but falls apart under scrutiny — fine textures dissolve, motion feels uncanny, edges lose coherence. This zone is called the Detail Valley. What Giantstep is actually selling isn\u0026rsquo;t AI footage itself; it\u0026rsquo;s the ability to bridge that quality gap by combining AI with VFX expertise.\nFor context: One More Pumpkin, which won the Dubai International AI Film Festival grand prize, had a $0 budget and a 5-day production window. AI alone can win awards. That reframes the question:\nIs AI content\u0026rsquo;s competitive edge in making it better — or in selling it better?\nGiantstep\u0026rsquo;s answer is clear: both matter, but market differentiation comes from quality. Anyone can make a $0 AI video. 
Clients pay for what lies beyond that.\nVal Kilmer\u0026rsquo;s Final Film — When Technology Becomes Art Background Val Kilmer was best known as Iceman in Top Gun. He lost his voice to throat cancer in 2015, and passed away in April 2025 at age 65.\nDirector Koerte Bruyns cast Kilmer for As Deep as the Grave in 2020, but Kilmer\u0026rsquo;s deteriorating health made filming impossible. Bruyns used generative AI trained on photos and footage spanning Kilmer\u0026rsquo;s career — from his early years to his final days — to bring him back to the screen.\nThe Key Decision: Keeping the Damaged Voice In Top Gun: Maverick (2022), an AI-restored voice was used. This film made the opposite choice — Kilmer\u0026rsquo;s real, damaged voice was kept as-is.\nThe character in the film is also ill. The character\u0026rsquo;s suffering and the actor\u0026rsquo;s real suffering overlap, and what was a technical limitation became narrative authenticity.\nThis is the moment technology becomes art.\nAn Ethical Framework Posthumous AI performances are sensitive territory. This project met four key criteria:\nConsent: Kilmer himself expressed his willingness to appear while still alive Family support: His children endorsed the project Industry compliance: SAG-AFTRA guidelines were followed Fair compensation: Kilmer\u0026rsquo;s estate received appropriate payment The director summarized his philosophy in a single phrase:\n\u0026ldquo;Together, not instead of.\u0026rdquo;\nThe Value Debate Around AI Creative Placing these two cases side by side reveals the core tensions in AI creative content:\nDimension SPACE GREEN As Deep as the Grave AI\u0026rsquo;s role Draft generation (1→9) Actor replication (face + body) Human\u0026rsquo;s role Detail refinement + DI Directorial judgment + voice selection Core value Quality gap = commercial differentiation Narrative authenticity = artistic value Debate Making it better vs. selling it better Tribute vs. 
exploitation Ethics Relatively low Posthumous likeness rights, consent, compensation Hollywood continues to debate posthumous AI performances. \u0026ldquo;Posthumous AI: tribute or exploitation?\u0026rdquo; There\u0026rsquo;s no consensus yet, but As Deep as the Grave offers a practical framework — four conditions: the subject\u0026rsquo;s prior consent, family support, industry standards compliance, and fair compensation.\nInsights 1. AI is a tool, not the product. As the Giantstep case demonstrates, AI-generated content itself is becoming a commodity. The competitive advantage lies in what you build on top of AI.\n2. Hybrid pipelines are the realistic answer. Pure AI footage falls into the Detail Valley. SPACE GREEN — completed in 10 days by a team of 4 juniors and 1 director — proves that a small team can leverage AI to produce results that rival large studio productions.\n3. The ethical framework must come before the technology. Val Kilmer\u0026rsquo;s project became moving rather than controversial because it satisfied four ethical criteria — not because the technology was impressive. As AI expands into representing deceased individuals, guidelines like those from SAG-AFTRA become more critical, not less.\n4. \u0026ldquo;Together, not instead of\u0026rdquo; should be the guiding principle for all AI creative work. Both Giantstep and director Bruyns positioned AI as a collaborative tool rather than a replacement for human creativity. This perspective determines the long-term sustainability of AI in creative fields.\nSource: From AI Cinema Briefing (YouTube)\n","date":"2026-03-24T00:00:00+09:00","image":"/images/posts/2026-03-24-ai-creative-content/cover-en.jpg","permalink":"/posts/2026-03-24-ai-creative-content/","title":"Two Faces of AI Creative — Giantstep's Hybrid VFX Pipeline and Val Kilmer's Final Film"},{"content":"Overview tmux, created by Nicholas Marriott in 2007, remains core terminal infrastructure 19 years later. 
Claude Code\u0026rsquo;s Agent Team feature recently put it back in the spotlight by spawning parallel agents on top of tmux sessions. Codex, Gemini CLI, OpenCode, and other terminal-based coding agents all make heavy use of tmux\u0026rsquo;s programmable API.\nThis post covers everything in one place: tmux\u0026rsquo;s architecture, session/window/pane management, customization, the plugin ecosystem, and integration with AI agents. For the tmux vs cmux comparison, see the separate post — this one focuses on a deep dive into tmux itself.\nTerminal Emulator vs Terminal Multiplexer Understanding tmux requires first grasping the fundamental difference between a terminal emulator and a terminal multiplexer.\ngraph TB subgraph emulator[\"Terminal Emulator\"] direction TB E[\"Terminal App \u0026lt;br/\u0026gt; iTerm2, Ghostty, \u0026lt;br/\u0026gt; Warp, Kitty, Alacritty\"] E --\u003e|\"direct connection\"| SH1[\"Shell 1\"] E --\u003e|\"direct connection\"| SH2[\"Shell 2\"] E --\u003e|\"when app closes\"| X[\"Shells close too\"] end subgraph multiplexer[\"Terminal Multiplexer\"] direction TB T[\"Terminal App\"] --\u003e|\"connects\"| TC[\"tmux Client\"] TC --\u003e|\"connected to\"| TS[\"tmux Server\"] TS --\u003e|\"manages\"| S1[\"Shell 1\"] TS --\u003e|\"manages\"| S2[\"Shell 2\"] T2[\"App closes\"] -.-\u003e|\"server stays alive\"| TS endA terminal emulator is an app that draws the screen. iTerm2, Ghostty, Warp, Kitty, and Alacritty all fall here. They connect directly to a shell, so closing the app terminates any running processes and sessions.\nA terminal multiplexer is a server that manages sessions. tmux and screen are the main examples. 
Running on top of terminal emulators, their server-client structure means sessions persist even when you close the terminal app.\nA terminal emulator \u0026ldquo;draws the screen\u0026rdquo;; a terminal multiplexer \u0026ldquo;manages sessions.\u0026rdquo; With a multiplexer, tab management, screen splitting, and session management all become the multiplexer\u0026rsquo;s responsibility rather than the terminal app\u0026rsquo;s.\nThis structural difference means that when using tmux, the most important criterion for a terminal emulator is how lightweight and fast it is. Since tmux handles tabs and splits, the terminal app itself can focus purely on fast rendering.\ntmux Architecture tmux operates on a server-client model. This structure is the foundation of all tmux\u0026rsquo;s strengths — session persistence, multiple client connections, and programmable control.\nServer-Client Model ┌─────────────────────────────────────────────────┐ │ tmux server │ │ (background process, manages all sessions) │ │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ Session 0│ │ Session 1│ │ Session 2│ │ │ │ frontend │ │ backend │ │ devops │ │ │ └──────────┘ └──────────┘ └──────────┘ │ └─────────────┬──────────┬──────────┬─────────────┘ │ │ │ ┌──────┘ ┌─────┘ ┌────┘ ▼ ▼ ▼ Client A Client B Client C (iTerm2) (Ghostty) (SSH) tmux server: When you first run tmux, a server process starts in the background. This server manages all sessions, windows, and panes. tmux client: What the user sees. Connects to the server and displays a specific session\u0026rsquo;s content. Socket communication: Client and server communicate via Unix socket (/tmp/tmux-{uid}/default). 
Session Persistence The key advantage of this structure is session persistence.\nOpen tmux in Ghostty and launch Claude Code and a dev server Completely close Ghostty Reopen Ghostty and type tmux attach Claude Code and the dev server are still alive The terminal emulator disappeared, but the tmux server was keeping all processes running in the background. Whether your SSH connection drops or you close and reopen your laptop lid, tmux sessions persist.\nInstallation and Initial Setup Installation # macOS brew install tmux # Ubuntu/Debian sudo apt install tmux # Fedora sudo dnf install tmux # Check version tmux -V First Run # Start a new session (auto-named: 0, 1, 2...) tmux # Start a session with a name tmux new-session -s work # Shorthand tmux new -s work Config File Basics tmux configuration lives in ~/.tmux.conf. Start with just the essentials.\n# ~/.tmux.conf — minimal required settings # Expand scrollback history (default 2,000 → 50,000 lines) set -g history-limit 50000 # Enable mouse support set -g mouse on # Start window/pane indices at 1 (0 sits at the far right of the number row, a long reach) set -g base-index 1 setw -g pane-base-index 1 To reload the config after editing:\n# From inside tmux tmux source-file ~/.tmux.conf # Or enter command mode with prefix + : and type: source-file ~/.tmux.conf Core Concepts: Session, Window, Pane tmux has a 3-tier structure.\ngraph TD SERVER[\"tmux server\"] --\u003e S1[\"Session: frontend\"] SERVER --\u003e S2[\"Session: backend\"] S1 --\u003e W1[\"Window 0: editor\"] S1 --\u003e W2[\"Window 1: terminal\"] S1 --\u003e W3[\"Window 2: logs\"] S2 --\u003e W4[\"Window 0: api\"] S2 --\u003e W5[\"Window 1: db\"] W1 --\u003e P1[\"Pane 0 \u0026lt;br/\u0026gt; vim\"] W1 --\u003e P2[\"Pane 1 \u0026lt;br/\u0026gt; file tree\"] W2 --\u003e P3[\"Pane 0 \u0026lt;br/\u0026gt; zsh\"] W3 --\u003e P4[\"Pane 0 \u0026lt;br/\u0026gt; tail -f app.log\"] W3 --\u003e P5[\"Pane 1 \u0026lt;br/\u0026gt; tail -f error.log\"] W4 --\u003e P6[\"Pane 0 
\u0026lt;br/\u0026gt; npm run dev\"] W4 --\u003e P7[\"Pane 1 \u0026lt;br/\u0026gt; claude\"] W5 --\u003e P8[\"Pane 0 \u0026lt;br/\u0026gt; psql\"] Tier Description Analogy Session Top-level work unit. An independent project or work context Virtual desktop Window Tab within a session. A full screen Browser tab Pane Split area within a window. Each runs an independent shell IDE split panel You can have multiple windows in one session, and multiple panes within one window. The current session name and window list are shown in tmux\u0026rsquo;s bottom status bar.\nThe Prefix Key System All tmux shortcuts work by pressing the prefix key first, then a command key. The default prefix is Ctrl+b.\nThe default Ctrl+b is an awkward reach, so many developers remap the prefix to Ctrl+Space.\nPress Ctrl+b, release, then press the command key. The two are pressed in sequence, not simultaneously.\nComplete Shortcut Reference Session Commands Shortcut Action Prefix + d Detach from current session Prefix + s Show session list Prefix + $ Rename current session Prefix + ( Switch to previous session Prefix + ) Switch to next session Prefix + : new Create new session (from inside tmux) Window Commands Shortcut Action Prefix + c Create new window Prefix + w Show window list (includes sessions, tree view) Prefix + , Rename current window Prefix + n Move to next window Prefix + p Move to previous window Prefix + 0~9 Jump directly to that numbered window Prefix + \u0026amp; Close current window (with confirmation) Prefix + l Toggle to last used window Pane Commands Shortcut Action Prefix + % Horizontal split (left/right) Prefix + \u0026quot; Vertical split (top/bottom) Prefix + arrow Move to pane in that direction Prefix + o Cycle through panes Prefix + z Toggle current pane zoom (fullscreen ↔ normal) Prefix + x Close current pane (with confirmation) Prefix + q Show pane numbers (press number to jump) Prefix + { Swap current pane with previous Prefix + 
} Swap current pane with next Prefix + Space Cycle through pane layouts Prefix + ! Break current pane into new window Other Shortcut Action Prefix + : Enter command mode Prefix + ? Show all key bindings Prefix + t Show clock Prefix + [ Enter copy mode (enables scrolling) Session Management Creating Sessions # Create new sessions from terminal tmux new -s frontend tmux new -s backend tmux new -s devops # Create session + name the first window tmux new -s work -n editor # Create session without attaching (background) tmux new -d -s background-job To create a new session from inside tmux:\nPrefix + : → new -s session-name Listing Sessions # From terminal tmux ls tmux list-sessions # From inside tmux Prefix + s # Session list (navigate with arrow keys) Prefix + w # Full list including windows in tree form Prefix + w is more practical than Prefix + s — it shows not just sessions but the windows within them in tree form. Typing a number from the list jumps there immediately.\nSwitching Sessions (Attach/Detach) # Exit session (session stays alive) Prefix + d # Reconnect to a specific session tmux attach -t frontend tmux a -t frontend # shorthand tmux a # attach to last session # When there\u0026#39;s only one session tmux a Renaming and Killing Sessions # Rename current session from inside tmux Prefix + $ # Kill session from terminal tmux kill-session -t old-session # Kill all sessions tmux kill-server Window Management Windows are the \u0026ldquo;tabs\u0026rdquo; within a session. 
The window list is shown in the bottom status bar.\nCreating and Switching Windows # Create new window Prefix + c # Switch between windows Prefix + n # next window Prefix + p # previous window Prefix + 0 # jump directly to window 0 Prefix + 1 # jump directly to window 1 Prefix + l # toggle to last used window # Rename window Prefix + , Searching and Moving Windows # Find window (search by name) Prefix + f # Move window to another session Prefix + : → move-window -t target-session # Reorder windows Prefix + : → swap-window -t 0 Pane Management Panes are the split areas within a window. Each pane runs an independent shell.\nSplitting Panes # Horizontal split (left/right) Prefix + % # Vertical split (top/bottom) Prefix + \u0026#34; Moving Between Panes # Navigate with arrow keys Prefix + ↑↓←→ # Cycle through panes Prefix + o # Jump to pane by number Prefix + q → press number Resizing Panes # Fine adjustment with arrow keys Prefix + Ctrl+↑ # expand up 1 unit Prefix + Ctrl+↓ # expand down 1 unit Prefix + Ctrl+← # expand left 1 unit Prefix + Ctrl+→ # expand right 1 unit # Drag with mouse (requires mouse on setting) # Drag pane borders # Cycle through preset layouts Prefix + Space Pane Zoom (Fullscreen Toggle) # Expand/collapse current pane to full window size Prefix + z Useful when you need to closely read a pane\u0026rsquo;s output. Press Prefix + z again to return to the split layout.\nSwapping Panes and Layouts # Swap pane positions Prefix + { # swap with previous pane Prefix + } # swap with next pane # Break current pane into new window Prefix + ! 
# Cycle through preset layouts (even-horizontal, even-vertical, main-horizontal, main-vertical, tiled) Prefix + Space Customization (.tmux.conf) Recommended Config # ~/.tmux.conf — production settings # ────────────────────────────────────── # Base Settings # ────────────────────────────────────── # Change prefix: Ctrl+b → Ctrl+Space set -g prefix C-Space unbind C-b bind C-Space send-prefix # Scrollback history (default 2,000 → 50,000) set -g history-limit 50000 # Mouse support set -g mouse on # Start window/pane indices at 1 set -g base-index 1 setw -g pane-base-index 1 # Renumber windows on close set -g renumber-windows on # Remove ESC delay (essential for Vim/Neovim) set -sg escape-time 0 # 256 color support set -g default-terminal \u0026#34;tmux-256color\u0026#34; set -ga terminal-overrides \u0026#34;,xterm-256color:Tc\u0026#34; # ────────────────────────────────────── # More Intuitive Pane Split Keys # ────────────────────────────────────── # | for horizontal split, - for vertical split bind | split-window -h -c \u0026#34;#{pane_current_path}\u0026#34; bind - split-window -v -c \u0026#34;#{pane_current_path}\u0026#34; # New windows also open in current path bind c new-window -c \u0026#34;#{pane_current_path}\u0026#34; # ────────────────────────────────────── # Vi-style Pane Navigation # ────────────────────────────────────── bind h select-pane -L bind j select-pane -D bind k select-pane -U bind l select-pane -R # Alt + hjkl for pane navigation (no prefix needed) bind -n M-h select-pane -L bind -n M-j select-pane -D bind -n M-k select-pane -U bind -n M-l select-pane -R # ────────────────────────────────────── # Pane Resize # ────────────────────────────────────── bind -r H resize-pane -L 5 bind -r J resize-pane -D 5 bind -r K resize-pane -U 5 bind -r L resize-pane -R 5 # ────────────────────────────────────── # Copy Mode (Vi style) # ────────────────────────────────────── setw -g mode-keys vi bind -T copy-mode-vi v send-keys -X begin-selection bind -T 
copy-mode-vi y send-keys -X copy-pipe-and-cancel \u0026#34;pbcopy\u0026#34; # ────────────────────────────────────── # Status Bar Customization # ────────────────────────────────────── set -g status-style \u0026#34;bg=#1e1e2e,fg=#cdd6f4\u0026#34; set -g status-left \u0026#34;#[fg=#89b4fa,bold] #S \u0026#34; set -g status-right \u0026#34;#[fg=#a6adc8] %Y-%m-%d %H:%M \u0026#34; set -g status-left-length 30 # Highlight active window setw -g window-status-current-style \u0026#34;fg=#89b4fa,bold\u0026#34; # ────────────────────────────────────── # Config Reload Shortcut # ────────────────────────────────────── bind r source-file ~/.tmux.conf \\; display-message \u0026#34;Config reloaded!\u0026#34; Key Customization Points Prefix key change: The default Ctrl+b conflicts with Vim\u0026rsquo;s Page Up and is ergonomically awkward. Many developers switch to Ctrl+Space or Ctrl+a (screen-compatible).\nhistory-limit: The default 2,000 lines is nowhere near enough for watching dev server logs. Setting 50,000+ is recommended.\nmouse on: Enables pane switching by clicking, resize by dragging borders, and scrolling. Essential for tmux beginners.\npane_current_path: Maintains the current working directory when splitting or opening new windows. 
Without this, every new split starts in the home directory, requiring repeated cd commands.\nPlugin Ecosystem (TPM) Installing TPM (Tmux Plugin Manager) git clone https://github.com/tmux-plugins/tpm ~/.tmux/plugins/tpm Add to ~/.tmux.conf:\n# Plugin list set -g @plugin \u0026#39;tmux-plugins/tpm\u0026#39; set -g @plugin \u0026#39;tmux-plugins/tmux-sensible\u0026#39; set -g @plugin \u0026#39;tmux-plugins/tmux-resurrect\u0026#39; set -g @plugin \u0026#39;tmux-plugins/tmux-continuum\u0026#39; set -g @plugin \u0026#39;tmux-plugins/tmux-yank\u0026#39; set -g @plugin \u0026#39;catppuccin/tmux\u0026#39; # TPM initialization (must be at the bottom of the config file) run \u0026#39;~/.tmux/plugins/tpm/tpm\u0026#39; Install plugins: Prefix + I (capital I)\nRecommended Plugins Plugin Description tmux-sensible Collection of sensible default settings (history-limit, escape-time, etc.) tmux-resurrect Save/restore session state. Recover sessions after reboot tmux-continuum Automatically saves resurrect state periodically. Auto-restores on tmux start tmux-yank System clipboard integration for copying tmux-open Open URLs from copy mode in browser catppuccin/tmux Catppuccin theme (status bar beautification) tmux-fzf fzf-powered session/window/pane search tmux-resurrect + tmux-continuum This combination takes tmux\u0026rsquo;s session persistence a step further. Session structure can be restored even if the tmux server itself is stopped or the system reboots.\n# tmux-resurrect settings set -g @resurrect-capture-pane-contents \u0026#39;on\u0026#39; set -g @resurrect-strategy-nvim \u0026#39;session\u0026#39; # tmux-continuum settings set -g @continuum-restore \u0026#39;on\u0026#39; # auto-restore on tmux start set -g @continuum-save-interval \u0026#39;15\u0026#39; # auto-save every 15 minutes Recommended Terminal Emulators tmux works with any terminal emulator, but when using tmux, the terminal app\u0026rsquo;s raw performance matters. 
Since tmux handles tabs and splits, the terminal app can focus entirely on fast rendering.\nGhostty The top recommendation for pairing with tmux right now is Ghostty.\nGPU-accelerated rendering: Handles heavy output quickly Low resource usage: Very low CPU and memory footprint Native UI: Works like a native app on macOS Proven rendering engine: cmux (Manaflow AI) is also based on Ghostty\u0026rsquo;s libghostty Install Ghostty:\nbrew install --cask ghostty Other Terminal Emulator Compatibility Terminal tmux Compatibility Notes Ghostty Excellent GPU accelerated, lightweight iTerm2 Excellent Native tmux integration mode Alacritty Excellent GPU accelerated, config file-based Kitty Excellent GPU accelerated, built-in splits WezTerm Excellent Lua scripting Warp Decent Built-in AI features, prefers native splits AI Coding Agents and tmux Why AI Agents Use tmux The core reason tmux is in the spotlight again in the AI agent era is programmable terminal control. tmux CLI commands automate session creation, command sending, and output collection.\nClaude Code\u0026rsquo;s Agent Team uses tmux when spawning parallel agents. Each agent runs in a separate pane, commands are sent via send-keys, and results are collected via capture-pane.\nThe Core API: send-keys and capture-pane # 1. Create a background session tmux new-session -d -s agents # 2. Split into multiple panes tmux split-window -h -t agents tmux split-window -v -t agents:0.1 # 3. Send commands to each pane tmux send-keys -t agents:0.0 \u0026#34;cd ~/project \u0026amp;\u0026amp; claude \u0026#39;Fix the login bug\u0026#39;\u0026#34; Enter tmux send-keys -t agents:0.1 \u0026#34;cd ~/project \u0026amp;\u0026amp; claude \u0026#39;Write unit tests\u0026#39;\u0026#34; Enter tmux send-keys -t agents:0.2 \u0026#34;cd ~/project \u0026amp;\u0026amp; npm run dev\u0026#34; Enter # 4. 
Collect output from a specific pane tmux capture-pane -t agents:0.0 -p # print to stdout tmux capture-pane -t agents:0.0 -p -S -100 # last 100 lines tmux capture-pane -t agents:0.0 -b temp # save to buffer Target Specification Syntax tmux\u0026rsquo;s target specification syntax is session:window.pane:\nagents:0.0 → pane 0 of window 0 in \u0026#34;agents\u0026#34; session agents:0.1 → pane 1 of window 0 in \u0026#34;agents\u0026#34; session work:editor.0 → pane 0 of \u0026#34;editor\u0026#34; window in \u0026#34;work\u0026#34; session Practical AI Agent Workspace Script #!/bin/bash # ai-workspace.sh — Configure parallel AI agent work environment PROJECT_DIR=\u0026#34;$1\u0026#34; SESSION=\u0026#34;ai-work\u0026#34; # Kill existing session if present tmux kill-session -t \u0026#34;$SESSION\u0026#34; 2\u0026gt;/dev/null # Create main session tmux new-session -d -s \u0026#34;$SESSION\u0026#34; -c \u0026#34;$PROJECT_DIR\u0026#34; -n \u0026#34;agents\u0026#34; # Split panes: 3 agent areas tmux split-window -h -t \u0026#34;$SESSION:agents\u0026#34; -c \u0026#34;$PROJECT_DIR\u0026#34; tmux split-window -v -t \u0026#34;$SESSION:agents.1\u0026#34; -c \u0026#34;$PROJECT_DIR\u0026#34; # Create monitoring window tmux new-window -t \u0026#34;$SESSION\u0026#34; -n \u0026#34;monitor\u0026#34; -c \u0026#34;$PROJECT_DIR\u0026#34; tmux split-window -v -t \u0026#34;$SESSION:monitor\u0026#34; -c \u0026#34;$PROJECT_DIR\u0026#34; # Dev server + logs in monitoring window tmux send-keys -t \u0026#34;$SESSION:monitor.0\u0026#34; \u0026#34;npm run dev\u0026#34; Enter tmux send-keys -t \u0026#34;$SESSION:monitor.1\u0026#34; \u0026#34;tail -f logs/app.log\u0026#34; Enter # Return to agents window tmux select-window -t \u0026#34;$SESSION:agents\u0026#34; # Attach tmux attach -t \u0026#34;$SESSION\u0026#34; Agent Output Monitoring Script #!/bin/bash # monitor-agents.sh — Periodically collect output from all panes SESSION=\u0026#34;ai-work\u0026#34; 
OUTPUT_DIR=\u0026#34;/tmp/agent-outputs\u0026#34; mkdir -p \u0026#34;$OUTPUT_DIR\u0026#34; while true; do # Collect recent output from all panes in the session (-s covers every window, not just the active one) for pane in $(tmux list-panes -s -t \u0026#34;$SESSION\u0026#34; -F \u0026#39;#{pane_id}\u0026#39;); do tmux capture-pane -t \u0026#34;$pane\u0026#34; -p -S -50 \u0026gt; \u0026#34;$OUTPUT_DIR/${pane}.txt\u0026#34; done # Detect specific keywords (errors, completion, etc.) if grep -q \u0026#34;Error\\|FAIL\\|Complete\\|Done\u0026#34; \u0026#34;$OUTPUT_DIR\u0026#34;/*.txt 2\u0026gt;/dev/null; then echo \u0026#34;[$(date)] Agent activity detected\u0026#34; fi sleep 10 done Claude Code Agent Team and tmux Claude Code\u0026rsquo;s Agent Team uses tmux internally in this flow:\ntmux new-session -d creates a background session tmux split-window creates panes for each agent tmux send-keys sends tasks to each agent tmux capture-pane collects each agent\u0026rsquo;s output Results are synthesized to produce the final response All of this is possible thanks to tmux\u0026rsquo;s programmable API. 
Without tmux, it would be much harder for AI agents to programmatically control multiple terminal sessions.\nPractical Tips Copy Mode (Scrolling and Copying) To scroll or copy text in tmux, you need to enter Copy Mode.\n# Enter Copy Mode Prefix + [ # Movement in Copy Mode (with vi mode settings) h/j/k/l # directional movement Ctrl+u/d # page up/down g/G # beginning/end /search-term # text search n/N # next/previous search result # Select and copy text (vi mode) Space # start selection Enter # copy selected text + exit Copy Mode q # exit Copy Mode (without copying) # Paste copied text Prefix + ] With mouse mode (set -g mouse on) enabled, mouse scrolling also auto-enters Copy Mode.\nPane Synchronization Useful when you need to send the same command to multiple servers simultaneously.\n# Enable sync: send identical input to all panes in current window Prefix + : → setw synchronize-panes on # Disable sync Prefix + : → setw synchronize-panes off # Add toggle shortcut to .tmux.conf bind S setw synchronize-panes \\; display-message \u0026#34;Sync #{?synchronize-panes,ON,OFF}\u0026#34; Preset Layouts # Cycle through layouts Prefix + Space # Set a specific layout directly Prefix + : → select-layout even-horizontal # equal horizontal split Prefix + : → select-layout even-vertical # equal vertical split Prefix + : → select-layout main-horizontal # main top + bottom split Prefix + : → select-layout main-vertical # main left + right split Prefix + : → select-layout tiled # tiled layout Command Mode Command mode (entered with Prefix + :) lets you type any tmux command directly.\n# Common command mode commands new -s session-name # new session move-window -t other-session # move window to another session swap-pane -U # move pane position up swap-pane -D # move pane position down resize-pane -D 10 # expand down 10 units resize-pane -R 20 # expand right 20 units Building the Vi Navigation Habit To keep hands as still as possible — not moving them down to arrow keys — 
it\u0026rsquo;s worth learning Vi-style navigation with HJKL. You\u0026rsquo;ll use these constantly not just locally, but also when working on remote servers via SSH.\nH = ← (left) J = ↓ (down) K = ↑ (up) L = → (right) Quick Links tmux GitHub — C-based open source, ISC license tmux Wiki — official documentation TPM (Tmux Plugin Manager) — plugin manager tmux-resurrect — session save/restore Ghostty — GPU-accelerated terminal emulator TMUX Masterclass — YouTube — primary reference for this post tmux basic usage — hyde1004 — Korean tmux guide Takeaways tmux is terminal infrastructure with the overwhelming strengths of 19 years of proven stability and cross-platform support. The server-client architecture guarantees session persistence, and the programmable CLI API has made it a core tool again in the AI coding agent era.\nThere\u0026rsquo;s a perception that the learning curve is steep, but in practice the essential shortcuts number about ten. Prefix + c (new window), Prefix + %/\u0026quot; (split), Prefix + arrow (navigation), Prefix + d (detach), Prefix + w (window list) — that\u0026rsquo;s enough for everyday work. Add Vi navigation and intuitive split keys via .tmux.conf customization and productivity goes up another level.\ntmux\u0026rsquo;s real value shows in combination with AI agents. Just two commands — send-keys and capture-pane — complete the \u0026ldquo;send command → collect output\u0026rdquo; cycle, and this is the foundation of Claude Code Agent Team\u0026rsquo;s parallel agent architecture. If tmux is \u0026ldquo;infrastructure where sessions never die,\u0026rdquo; AI agents are workers that operate autonomously on top of that infrastructure. 
In 2026, not knowing tmux while trying to use terminal-based AI coding tools is like entering a marathon without basic fitness.\n","date":"2026-03-23T00:00:00+09:00","image":"/images/posts/2026-03-23-tmux-masterclass/cover-en.jpg","permalink":"/posts/2026-03-23-tmux-masterclass/","title":"tmux Masterclass — Everything You Need to Know, Including AI Agent Integration"},{"content":"Overview tmux, born in 2007, has been a cornerstone of server management and development environments for 19 years. Claude Code\u0026rsquo;s Agent Team feature recently put it back in the spotlight by spawning parallel agents on top of tmux sessions. Meanwhile, cmux — built by Manaflow AI — arrived with the concept of \u0026ldquo;a terminal built for AI agents.\u0026rdquo; It\u0026rsquo;s a native macOS app based on Ghostty\u0026rsquo;s rendering engine (libghostty).\nThis post compares the two tools\u0026rsquo; architectures, core concepts, and AI agent support models, and suggests how to combine them effectively in practice.\nArchitecture Comparison The two tools have fundamentally different design philosophies.\ngraph LR subgraph tmux[\"tmux (Server-Client)\"] S[\"tmux server\"] --\u003e C1[\"Client 1\"] S --\u003e C2[\"Client 2\"] S --\u003e C3[\"Client 3\"] S --\u003e SE1[\"Session 1\"] S --\u003e SE2[\"Session 2\"] SE1 --\u003e W1[\"Window 1\"] SE1 --\u003e W2[\"Window 2\"] W1 --\u003e P1[\"Pane 1\"] W1 --\u003e P2[\"Pane 2\"] end subgraph cmux[\"cmux (Native macOS App)\"] APP[\"cmux.app \u0026lt;br/\u0026gt; Swift + AppKit\"] --\u003e WS1[\"Workspace 1 \u0026lt;br/\u0026gt; git branch, PR, ports\"] APP --\u003e WS2[\"Workspace 2\"] WS1 --\u003e SF1[\"Surface 1\"] WS1 --\u003e SF2[\"Surface 2\"] SF1 --\u003e PA1[\"Pane A\"] SF1 --\u003e PA2[\"Pane B\"] end Item tmux cmux Type Terminal multiplexer AI agent terminal Architecture Server-client Native macOS app OS support Cross-platform (Linux, macOS, BSD, Solaris) macOS 14.0+ only UI TUI (text-based) GUI (native AppKit) Rendering Custom 
TUI Ghostty engine (libghostty) License ISC AGPL tmux uses a server process that manages all sessions while clients connect to view them. Sessions persist even if the terminal is closed, as long as the server is alive. cmux is a native macOS app that displays workspace metadata — git branch, PR status, open ports, notifications — visually in a sidebar.\nCore Concept Mapping The two tools\u0026rsquo; hierarchies have a clear correspondence.\ntmux cmux Description Session Workspace Top-level work unit Window Surface Tab within a session/workspace Pane Pane Split screen area How Navigation Differs tmux uses a prefix key approach. Press Ctrl+b first, then enter a command key. The learning curve is steep, but everything can be controlled with just a keyboard.\ncmux uses native macOS shortcuts. No prefix required — actions fire immediately.\nAction tmux cmux New session/workspace tmux new -s name Cmd+N Horizontal split Ctrl+b % Cmd+D Vertical split Ctrl+b \u0026quot; Cmd+Shift+D New window/surface Ctrl+b c Cmd+T Session list Ctrl+b s Always visible in sidebar AI Agent Support This is where the two tools differ most significantly.\nflowchart TB subgraph tmux_flow[\"tmux + AI Agent\"] A1[\"Claude Code \u0026lt;br/\u0026gt; Agent Team\"] --\u003e|\"tmux new-session\"| A2[\"tmux session\"] A2 --\u003e|\"tmux send-keys\"| A3[\"Agent Pane 1\"] A2 --\u003e|\"tmux send-keys\"| A4[\"Agent Pane 2\"] A2 --\u003e|\"tmux send-keys\"| A5[\"Agent Pane 3\"] A3 --\u003e|\"tmux capture-pane\"| A6[\"Collect results\"] A4 --\u003e|\"tmux capture-pane\"| A6 A5 --\u003e|\"tmux capture-pane\"| A6 end subgraph cmux_flow[\"cmux + AI Agent\"] B1[\"AI Agent\"] --\u003e|\"cmux new-workspace\"| B2[\"Workspace\"] B2 --\u003e|\"cmux split\"| B3[\"Pane A\"] B2 --\u003e|\"cmux split\"| B4[\"Pane B\"] B3 --\u003e|\"cmux send\"| B5[\"Run command\"] B4 --\u003e|\"cmux read-screen\"| B6[\"Read another pane \u0026lt;br/\u0026gt; (inter-agent communication)\"] B2 --\u003e|\"notification system\"| B7[\"macOS 
notification \u0026lt;br/\u0026gt; + blue ring indicator \u0026lt;br/\u0026gt; + unread badge\"] end tmux\u0026rsquo;s AI Agent Usage tmux wasn\u0026rsquo;t originally designed for AI, but its programmable API lets AI tools leverage it.\nClaude Code: Creates tmux sessions to run parallel agents in Agent Team mode Codex, Gemini CLI: Use tmux in a similar way tmux send-keys sends commands; tmux capture-pane collects output cmux\u0026rsquo;s Native AI Support cmux was designed for AI agents from the ground up.\nNotification system: Blue ring on panes waiting for input, unread badges on workspace tabs, macOS desktop notifications. Cmd+Shift+U jumps to the most recent notification. read-screen: One pane can read another pane\u0026rsquo;s content. This is the core feature for inter-agent communication. send: Programmatically send commands to another pane. Environment variables: CMUX_WORKSPACE_ID, CMUX_SURFACE_ID, CMUX_SOCKET_PATH — agents automatically know their own context. Built-in browser: Open web pages inside the terminal. CLI Automation Comparison # tmux — programmatic control tmux new-session -d -s work tmux split-window -h tmux send-keys -t work:0.1 \u0026#34;npm run dev\u0026#34; Enter tmux capture-pane -t work:0.0 -p # cmux — AI agent-dedicated CLI cmux new-workspace cmux split --direction right cmux send --pane-id $CMUX_PANE_ID \u0026#34;npm run dev\u0026#34; cmux read-screen --pane-id $TARGET_PANE_ID \u0026ldquo;Primitive, Not Solution\u0026rdquo; Philosophy cmux\u0026rsquo;s core design philosophy is \u0026ldquo;Primitive, Not Solution.\u0026rdquo; Rather than providing a finished workflow, it offers low-level building blocks — read-screen, send, notifications. 
It leaves AI agents to combine these elements and compose their own workflows.\nThis approach increases compatibility with diverse AI tools and maximizes agent autonomy.\nThe Competitive Landscape The AI agent terminal space is growing quickly.\nTool Characteristics cmux Native macOS, Ghostty-based, read-screen Claude Squad GitHub-based agent orchestration Pane Terminal for AI agents Amux AI-centric multiplexer Calyx Emerging competitor Recommended Combination: tmux + cmux In conclusion, tmux and cmux are not substitutes — they\u0026rsquo;re complements.\ntmux: Session persistence (server-based), cross-platform support, remote server work cmux: GUI visualization, AI agent notifications, inter-agent communication (read-screen) For local macOS development with AI agents, cmux works well as the primary tool, with tmux alongside for remote server work or when session persistence is required. That combination is currently the most effective terminal setup.\nInstallation # tmux brew install tmux # cmux brew tap manaflow-ai/cmux \u0026amp;\u0026amp; brew install --cask cmux Quick Links tmux GitHub — 43,430 stars, C-based open source cmux official site — Manaflow AI cmux: Terminal for Coding Agents — Dale Seo — practical guide tmux vs cmux comparison — goddaehee — from installation to competitive tools TMUX Masterclass — YouTube Takeaways tmux\u0026rsquo;s strength is 19 years of proven stability and cross-platform support. It remains the top choice for every scenario involving remote servers, CI/CD, and session persistence. cmux is designed for the AI agent era, with its notification system and read-screen feature optimized for multi-agent workflows. The two are not substitutes — they\u0026rsquo;re complements. 
If tmux is \u0026ldquo;infrastructure where sessions never die,\u0026rdquo; cmux is \u0026ldquo;the interface where agents talk to each other.\u0026rdquo; AI coding tools spawning agents on top of tmux, with cmux visually managing those agents\u0026rsquo; state, is currently the most powerful terminal environment you can build.\n","date":"2026-03-23T00:00:00+09:00","image":"/images/posts/2026-03-23-tmux-cmux/cover-en.jpg","permalink":"/posts/2026-03-23-tmux-cmux/","title":"tmux vs cmux — Battle-Tested Terminal Multiplexer vs the AI Agent Terminal"},{"content":"Overview Harness series posts:\nHarness — Turning Claude Code from a Generic AI into a Dedicated Employee — concept and core structure Harness Engineering #2 — Building Real Harnesses with Antigravity — Google Antigravity in practice This post — community harness plugin comparison HarnessKit Dev Log #1 — Adaptive Harness Plugin for Zero-Based Vibe Coders — a plugin built directly from these findings This post was written as preliminary research before designing HarnessKit. The goal was to analyze the strengths and weaknesses of existing harness plugins and determine what to adopt and what to improve. I compared the two most active implementations on GitHub — Chachamaru127\u0026rsquo;s claude-code-harness (281★) and panayiotism\u0026rsquo;s claude-harness (73★). 
They solve the same problem through completely different approaches.\nDesign Philosophy Comparison flowchart LR subgraph CCH[\"claude-code-harness\"] A1[\"TypeScript Core\"] --\u003e B1[\"Guardrail Engine\"] B1 --\u003e C1[\"5-Verb Skills\"] C1 --\u003e D1[\"Multi-Platform\"] end subgraph CH[\"claude-harness\"] A2[\"Shell Scripts\"] --\u003e B2[\"Memory Architecture\"] B2 --\u003e C2[\"/flow single command\"] C2 --\u003e D2[\"GitHub MCP integration\"] end CCH -.-\u003e|\"same goal\"| CH Both plugins start from Anthropic\u0026rsquo;s \u0026ldquo;Effective harnesses for long-running agents\u0026rdquo; article, but their implementation strategies diverge.\nclaude-code-harness focuses on runtime safety. A TypeScript guardrail engine monitors every tool call and blocks dangerous commands with deny/warn rules. The core value: \u0026ldquo;proceed without breaking down in the same ways repeatedly.\u0026rdquo;\nclaude-harness focuses on context continuity. A 4-layer memory architecture preserves context across sessions, and a single /flow command automates everything from planning to merge. 
The core value: \u0026ldquo;once started, it flows automatically to completion.\u0026rdquo;\nArchitecture Details claude-code-harness — TypeScript Guardrail Engine flowchart TD A[\"User command\"] --\u003e B[\"PreToolUse Hook\"] B --\u003e C{\"Guardrail Engine\u0026lt;br/\u0026gt;(TypeScript)\"} C --\u003e|DENY| D[\"Block + warning\"] C --\u003e|WARN| E[\"Warning + proceed\"] C --\u003e|PASS| F[\"Execute\"] F --\u003e G[\"PostToolUse Hook\"] G --\u003e H{\"Tampering detected?\"} H --\u003e|detected| I[\"Rollback\"] H --\u003e|clean| J[\"Continue\"] Component Description core/guardrails/ pre-tool, post-tool, permission, tampering detection core/engine/lifecycle.js Session lifecycle management core/state/ State schema, migrations, storage skills-v3/ 5-verb skills (plan, work, review, validate, release) agents-v3/ reviewer, scaffolder, worker, team-composition The 5-Verb System is this plugin\u0026rsquo;s backbone:\nPlan — structure requirements into Plans.md Work — implement (--parallel supported, Breezing mode) Review — code review process Validate — re-runnable validation (generates evidence pack) Release — merge + release One distinctive feature is multi-platform support. Configuration files for Cursor, Codex, and OpenCode are included alongside Claude Code — a statement of intent to avoid lock-in to any single AI coding tool.\nThe guardrail engine is a TypeScript-compiled binary that receives JSON via stdin on every tool call and performs pattern matching. Unlike Shell-based grep matching, it enables structured, AST-level inspection. 
tampering.js even detects attempts to bypass the guardrail configuration itself.\nclaude-harness — Shell-Based Memory Architecture flowchart TD A[\"/flow command\"] --\u003e B[\"Context compilation\"] B --\u003e C[\"4-Layer memory load\"] C --\u003e D[\"GitHub Issue creation\"] D --\u003e E[\"Branch creation\"] E --\u003e F[\"TDD implementation\"] F --\u003e G[\"Checkpoint\"] G --\u003e H[\"PR creation\"] H --\u003e I[\"Auto-Merge\"] subgraph Memory[\"Memory layers\"] M1[\"Layer 1: Project Rules\"] M2[\"Layer 2: Feature State\"] M3[\"Layer 3: Session Context\"] M4[\"Layer 4: Learned Patterns\"] end C --\u003e Memory Component Description hooks/ 8 hooks (session-start, pre-tool-use, stop, pre-compact, etc.) skills/ 6 skills (setup, start, flow, checkpoint, merge, prd-breakdown) schemas/ JSON Schema state validation (active-features, memory-entries, loop-state, etc.) setup.sh One-time initialization The /flow single command is this plugin\u0026rsquo;s centerpiece. One command handles context compilation → GitHub Issue → Branch → TDD implementation → Checkpoint → PR → Merge. Fine-grained control via options:\n/flow \u0026#34;Add dark mode\u0026#34; # full lifecycle /flow --no-merge \u0026#34;Add feature\u0026#34; # stop before merge /flow --autonomous # batch-process entire feature /flow --team # ATDD (Agent Teams) /flow --quick # skip planning (simple tasks) The memory architecture is the differentiator. Four layers (Project Rules → Feature State → Session Context → Learned Patterns) structure context, automatically compiled at session start. The pre-compact hook saves critical information before context compression, preventing context loss during long sessions.\nAll hooks are pure Shell scripts. They run on bash + jq alone, without Node.js or Python runtimes. 
Simple to install with no dependencies — but limited for complex pattern matching.\nComparison Table Criterion claude-code-harness claude-harness Language TypeScript (core) + Shell (hooks) + Markdown (skills) Shell + Markdown Stars 281 73 Version v3.10.6 v10.2.0 Core model 5-Verb (Plan→Work→Review→Validate→Release) /flow single command (end-to-end) Guardrails TypeScript engine (deny/warn/pass + tampering detection) Shell-based pre-tool-use hook Memory State schema + migrations 4-Layer architecture (Project→Feature→Session→Learned) GitHub integration Indirect (gh CLI) GitHub MCP integration TDD Recommended in skills Enforced in /flow (RED→GREEN→REFACTOR) Multi-platform Claude Code, Cursor, Codex, OpenCode Claude Code only Agents reviewer, scaffolder, worker, team-composition Agent Teams (ATDD mode) PRD support Plans.md-based /prd-breakdown → auto-create GitHub Issues Autonomous execution /harness-work all (batch) /flow --autonomous (feature loop) Runtime dependency Node.js (TypeScript core) None (bash + jq) Which Plugin Is Right for You? 
flowchart TD A[\"Choose a harness plugin\"] --\u003e B{\"Is runtime safety\u0026lt;br/\u0026gt;the top priority?\"} B --\u003e|\"Yes\"| C[\"claude-code-harness\"] B --\u003e|\"No\"| D{\"Is cross-session\u0026lt;br/\u0026gt;context continuity important?\"} D --\u003e|\"Yes\"| E[\"claude-harness\"] D --\u003e|\"No\"| F{\"Do you need\u0026lt;br/\u0026gt;multi-platform support?\"} F --\u003e|\"Yes\"| C F --\u003e|\"No\"| G{\"Do you prefer\u0026lt;br/\u0026gt;minimal dependencies?\"} G --\u003e|\"Yes\"| E G --\u003e|\"No\"| C Choose claude-code-harness when:\nDangerous command blocking matters in a team environment You use multiple AI coding tools like Cursor and Codex in parallel You need to preserve validation results as evidence You need granular step-by-step workflow control Choose claude-harness when:\nYou want full automation from a single command Context loss in long sessions is a real problem for you You want a lightweight start with no Node.js dependency You need tight GitHub Issues/PR integration Relationship with Anthropic Superpowers Both plugins are aware of obra/superpowers (71,993★). The benchmark document in claude-code-harness directly compares all three, summarizing each one\u0026rsquo;s strengths:\nIf you want to expand your workflow\u0026rsquo;s breadth: Superpowers. If you want to reinforce the discipline of requirements → design → tasks: cc-sdd. If you want to transform plan · build · review · validate into a reliable standard flow that doesn\u0026rsquo;t collapse: Claude Harness.\nIn practice, Superpowers is closer to a workflow framework than a harness. It provides the flow from brainstorming → writing-plans → executing-plans → code-review, but doesn\u0026rsquo;t foreground infrastructure-level features like runtime guardrails or memory architecture. The three plugins aren\u0026rsquo;t competing — they operate at different layers.\nInsights TypeScript vs. Shell — the tradeoff is clear. 
The TypeScript guardrail engine enables structured checks and tampering detection but requires Node.js. Shell hooks have zero dependencies but are limited in pattern matching precision. The project\u0026rsquo;s security requirements determine the right choice. \u0026ldquo;5-Verb\u0026rdquo; and \u0026ldquo;/flow\u0026rdquo; are different solutions to the same problem. Explicit stage separation gives independent control over each stage but creates friction. A unified single command reduces friction but makes granular intervention harder. The larger the team, the more the former applies; for solo developers, the latter tends to win. Memory layering is harness engineering\u0026rsquo;s next frontier. panayiotism\u0026rsquo;s 4-layer memory architecture directly addresses the fundamental problem of preserving context across sessions. Chachamaru127 also has state/migration modules, but the emphasis is on guardrails rather than memory. Long-term, memory architecture is likely to become the defining factor in harness quality. The harness ecosystem is differentiating. General workflow (Superpowers), runtime safety (claude-code-harness), context continuity (claude-harness), adaptive presets (HarnessKit) — each attacks a different axis. This signals that harnesses are evolving from a single solution into a tool chain. ","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-harness-plugins/cover-en.jpg","permalink":"/posts/2026-03-20-harness-plugins/","title":"Claude Code Harness Plugins Compared — claude-code-harness vs. claude-harness"},{"content":"Overview Since Claude Code introduced its plugin system, the ecosystem has been expanding rapidly. It started with one official marketplace, but there are now community directories and a dedicated Korean-language marketplace among the options. 
This post compares the four major marketplaces and lays out criteria for choosing based on your needs.\nEcosystem Structure The Claude Code plugin ecosystem divides into three broad layers.\ngraph TD A[\"Claude Code CLI\"] --\u003e B[\"Built-in Official Marketplace \u0026lt;br/\u0026gt; claude-plugins-official\"] A --\u003e C[\"Third-party Marketplaces \u0026lt;br/\u0026gt; GitHub repo-based\"] B --\u003e D[\"Code Intelligence \u0026lt;br/\u0026gt; LSP for 11 languages\"] B --\u003e E[\"External Integrations \u0026lt;br/\u0026gt; GitHub, Slack, Jira, etc.\"] B --\u003e F[\"Development Workflows \u0026lt;br/\u0026gt; commit, PR review\"] B --\u003e G[\"Output Styles\"] C --\u003e H[\"claudemarketplaces.com \u0026lt;br/\u0026gt; community directory (2,919)\"] C --\u003e I[\"modu-ai/cc-plugins \u0026lt;br/\u0026gt; Korean-language marketplace\"] C --\u003e J[\"skillsmp.com \u0026lt;br/\u0026gt; Agent Skills\"] H --\u003e K[\"GitHub Repos \u0026lt;br/\u0026gt; individual marketplaces\"] I --\u003e L[\"Security-focused plugins \u0026lt;br/\u0026gt; Auth0, MFA, Compliance\"] style A fill:#6366f1,color:#fff style B fill:#059669,color:#fff style C fill:#d97706,color:#fff\n1. Official Marketplace — claude-plugins-official The default marketplace operated directly by Anthropic. Available automatically with Claude Code installation.\nMain Categories Category Content Examples Code Intelligence LSP-based language support (11 languages) TypeScript, Python, Rust, Go, etc. External Integrations External service connections GitHub, GitLab, Jira, Slack, Figma Development Workflows Development process automation commit-commands, pr-review-toolkit Output Styles Output format customization — Installation # Install a plugin /plugin install plugin-name@claude-plugins-official # List installed plugins /plugin list Pros: Backed by Anthropic, so stability and compatibility are guaranteed. 
No marketplace registration required.\nCons: Limited plugin selection; difficult to reflect the full range of community needs.\n2. claudemarketplaces.com — Community Directory An independent project run by @mertduzgun with no official relationship with Anthropic. Currently indexes 2,919 marketplaces, making it the largest by scale.\nPopular Marketplaces (by Stars) Marketplace Stars Plugins Notes f/prompts.chat 144.8k — Prompt-centric anthropics/claude-code 65.1k 13 Official repo obra/superpowers 46.9k — Extended capabilities upstash/context7 45k — Context management affaan-m/everything-claude-code 41.3k — Comprehensive resource ComposioHQ/awesome-claude-skills 32k 107 Skills collection wshobson/agents 28k 73 Agent-focused eyaltoledano/claude-task-master 25.3k — Task management Category Breakdown Organized into granular categories including 3D-Development, Agents, Authentication, Automation, Backend, Claude, and Code-Quality. Sponsored listings (ideabrowser.com, supastarter, etc.) are included, so it\u0026rsquo;s worth developing the habit of checking whether a listing is sponsored.\nPros: Massive scale, category search, Stars-based popularity indicators.\nCons: No quality verification; security judgment is the user\u0026rsquo;s responsibility since this is unofficial.\n3. skillsmp.com (SkillsMP) A marketplace specializing in Agent Skills, with Korean UI support at skillsmp.com/ko. At the time of writing, HTTP 403 errors are occurring on access, so stability needs to be verified.\nPros: Korean UI, Agent Skills specialization.\nCons: Unstable access (403 errors), unable to verify content.\n4. 
modu-ai/cc-plugins — Korean Community A Korean-optimized marketplace positioned as the \u0026ldquo;ModuAI Official Claude Code Plugin Marketplace.\u0026rdquo;\nCharacteristics Stars: 56 (early stage) License: GPL-3.0 (Copyleft) Tech stack: MoAI-ADK (AI Development Kit), DDD methodology Focus areas: Auth0 security, MFA, token security, compliance Installation # Register marketplace /plugin marketplace add modu-ai/cc-plugins # Install after registration /plugin install plugin-name@modu-ai-cc-plugins Pros: Korean documentation, security-focused plugins, domestic community support.\nCons: Still early stage with limited plugins; understanding the GPL-3.0 license restrictions is required.\nMarketplace Comparison Summary Item Official (Anthropic) claudemarketplaces.com skillsmp.com modu-ai/cc-plugins Scale Small Large (2,919) Unknown Small Operator Anthropic Community (individual) Unknown Korean community Quality verification Yes No Unknown Partial Korean support No No Yes Yes Security trustworthiness High Low (manual verification needed) Unknown Medium Installation ease Built-in Separate registration — Separate registration Auto-update Yes (configurable) Varies by marketplace — — Focus General General Agent Skills Security Plugin System Architecture The marketplace system in Claude Code runs on GitHub repos as its foundation.\ngraph LR subgraph \"Marketplace structure\" A[\"marketplace.json\"] --\u003e B[\"sources \u0026lt;br/\u0026gt; plugin metadata\"] A --\u003e C[\"scopes \u0026lt;br/\u0026gt; permission scope definitions\"] B --\u003e D[\"GitHub Repos\"] B --\u003e E[\"Git URLs\"] B --\u003e F[\"npm Packages\"] B --\u003e G[\"Local Paths\"] end subgraph \"Installation flow\" H[\"User\"] --\u003e I[\"/plugin marketplace add\"] I --\u003e A H --\u003e J[\"/plugin install\"] J --\u003e K[\"Plugin download \u0026lt;br/\u0026gt; + activation\"] end style A fill:#6366f1,color:#fff style H fill:#059669,color:#fff\nSupported Plugin Source Types Source type Example Use case 
GitHub repo owner/repo Most common Git URL https://github.com/... Direct URL specification Local path local directory path Local development/testing npm package @scope/package Node.js ecosystem Team Marketplace Configuration For shared team marketplaces, configure in .claude/settings.json:\n{ \u0026#34;extraKnownMarketplaces\u0026#34;: [ \u0026#34;your-org/internal-plugins\u0026#34; ] } Selection Guide — Which Marketplace to Use? Recommendations by Situation \u0026ldquo;Starting from scratch\u0026rdquo; — begin with the official marketplace. No setup required and you can start by strengthening code intelligence with LSP plugins.\n\u0026ldquo;Need a wide variety of plugins\u0026rdquo; — search claudemarketplaces.com, check Stars and recent update dates, then register individual marketplaces. ComposioHQ/awesome-claude-skills (107 plugins) and wshobson/agents (73 plugins) are both practical options.\n\u0026ldquo;Working in a Korean-language environment\u0026rdquo; — register modu-ai/cc-plugins for Korean documentation and domestic community support.\n\u0026ldquo;Security is a priority\u0026rdquo; — use the official marketplace as your base and only install third-party plugins after personally reviewing the source code.\nSecurity Considerations Security is the most critical issue in the plugin ecosystem.\nAnthropic does not verify third-party plugins. 
The official documentation explicitly states \u0026ldquo;user must trust plugins.\u0026rdquo;\nBefore installing, check:\nGitHub repo Stars, Issues, and recent commit activity License (Copyleft licenses like GPL-3.0 can affect commercial projects) Permission (scopes) the plugin requests Whether the source code makes external API calls or transmits data Auto-update settings: Only enable auto-update for trusted marketplaces; manage others manually.\nSponsored listing caution: Sponsored listings on claudemarketplaces.com are not quality endorsements — they are advertisements.\nQuick Links Claude Code official plugin docs claudemarketplaces.com skillsmp.com modu-ai/cc-plugins Marketplace creation guide Insights The Claude Code plugin ecosystem is still in an early growth phase. The pace of growth — fast enough to index 2,919 marketplaces — is impressive, and both the official marketplace\u0026rsquo;s organized category structure and the Korean community\u0026rsquo;s self-run marketplace are positive signals. However, the lack of quality verification, the absence of a compatibility standard between plugins, and a trust model that relies solely on GitHub Stars all need improvement. The VS Code extension ecosystem took years to mature, and Claude Code will need time too. For now, a rational strategy is to center your usage on the official marketplace while selectively leveraging community marketplaces.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-claude-code-marketplaces/cover-en.jpg","permalink":"/posts/2026-03-20-claude-code-marketplaces/","title":"Claude Code Plugin Marketplaces Compared — Where to Find Them and How to Choose"},{"content":"Overview Anyone who has used an AI coding assistant has probably run into this: Cursor or Claude Code confidently writes code that calls an API that doesn\u0026rsquo;t exist, or uses a pattern that was deprecated two years ago. 
LLMs have a temporal cutoff in their training data, and libraries update constantly. Context7 is a platform built to bridge that gap — it injects the latest official documentation directly into LLM prompts. The project has approximately 49,800 GitHub stars and is growing fast, so here\u0026rsquo;s a thorough look at what it does and how.\nThe LLM Hallucination Problem: Why It Happens Hallucination when LLMs generate code is not just a \u0026ldquo;mistake\u0026rdquo; — it\u0026rsquo;s a structural problem.\nflowchart LR A[\"LLM training data\u0026lt;br/\u0026gt;(~early 2025 cutoff)\"] --\u003e B[\"Code generation request\u0026lt;br/\u0026gt;Next.js 15 middleware\"] B --\u003e C{\"Does that version\u0026lt;br/\u0026gt;exist in training data?\"} C -- \"Yes\" --\u003e D[\"Accurate code generated\"] C -- \"No\" --\u003e E[\"Confidently generates\u0026lt;br/\u0026gt;based on older patterns\"] E --\u003e F[\"Uses deprecated API\u0026lt;br/\u0026gt;Calls non-existent functions\u0026lt;br/\u0026gt;Passes wrong parameters\"]\nCommon examples:\nSituation Symptom Next.js App Router code Mixes in Pages Router patterns Latest Supabase Auth API Calls supabase.auth.api (deprecated) Tailwind CSS v4 config Generates v3 config format Cloudflare Workers new API Combines non-existent methods The core problem: LLMs don\u0026rsquo;t say \u0026ldquo;I don\u0026rsquo;t know.\u0026rdquo; If there\u0026rsquo;s a similar pattern in training data, they generate plausible-looking code from it, and the developer doesn\u0026rsquo;t realize until a runtime error surfaces.\nHow Context7 Solves It Context7\u0026rsquo;s approach is simple but effective: before the LLM generates code, inject the latest official documentation for the relevant library into the prompt context.\nflowchart TB subgraph User[\"User environment\"] A[\"AI coding editor\u0026lt;br/\u0026gt;Cursor / Claude Code / OpenCode\"] end subgraph Context7[\"Context7 Platform\"] B[\"MCP Server\u0026lt;br/\u0026gt;(open source)\"] C[\"API 
Backend\u0026lt;br/\u0026gt;(Upstash proprietary)\"] D[\"Crawling Engine\u0026lt;br/\u0026gt;doc collection \u0026amp; parsing\"] E[\"Documentation DB\u0026lt;br/\u0026gt;version-indexed\"] end subgraph Sources[\"Documentation sources\"] F[\"Official doc sites\"] G[\"GitHub READMEs\"] H[\"API References\"] end A -- \"use context7\" --\u003e B B -- \"resolve-library-id\u0026lt;br/\u0026gt;query-docs\" --\u003e C C --\u003e E D --\u003e E Sources --\u003e D C -- \"latest doc snippets\" --\u003e B B -- \"context injection\" --\u003e A\nHow It Works (1) The user adds use context7 to the prompt (2) The Context7 MCP server identifies the library (resolve-library-id) (3) It searches the library\u0026rsquo;s latest docs for relevant sections (query-docs) (4) It injects the retrieved doc snippets into the LLM context (5) The LLM generates code based on current documentation Why the query-docs step matters: it\u0026rsquo;s not just \u0026ldquo;read the latest docs\u0026rdquo; — it selectively extracts only the sections relevant to the query. Putting the entire documentation in context wastes tokens and can actually degrade performance.\nCLI vs. MCP: Two Usage Modes Context7 supports two modes.\n1. CLI + Skills Mode (No MCP Required) # Setup npx ctx7 setup # OAuth auth → API key creation → skill installation # Search for a library ctx7 library nextjs middleware # Fetch docs for a specific library ctx7 docs /vercel/next.js \u0026#34;middleware authentication JWT\u0026#34; CLI mode is useful in environments that don\u0026rsquo;t support MCP, or when you just need a quick terminal lookup.\n2. 
MCP Mode (Native Integration) In MCP-supporting clients, Context7 operates automatically.\nMCP Tools provided:\nTool Purpose Input Output resolve-library-id Convert library name to Context7 ID \u0026quot;nextjs\u0026quot; /vercel/next.js query-docs Search relevant docs by library ID library ID + query doc snippets Key advantage of MCP mode: the user only adds use context7 to the prompt, and the LLM automatically performs the tool calls.\nMode Comparison Criterion CLI mode MCP mode Setup complexity Low (one npx command) MCP server registration required Automation level Manual Fully automatic MCP support required No Yes Best for Quick doc lookups, non-MCP environments Everyday AI coding workflow Library ID System and Version Targeting Context7 Library IDs use GitHub-style paths:\n/supabase/supabase /vercel/next.js /mongodb/docs /langchain-ai/langchainjs This ID system is interesting because it explicitly identifies the source of documentation, not just a package name. Searching just react might yield multiple results, but /facebook/react points to exactly one source.\nVersion Targeting Specify a version in the prompt and Context7 automatically matches that version\u0026rsquo;s docs:\nCreate a Next.js 15 middleware that validates a JWT. use context7 Context7 detects \u0026ldquo;Next.js 15\u0026rdquo; from this prompt and fetches middleware-related sections from the v15 documentation.\nPractical Usage in Claude Code Setup npx ctx7 setup Practical Prompt Patterns Basic usage:\nShow me how to use a service role key to bypass Row Level Security in a Supabase Edge Function. use context7 Version specification:\nHow do I set up a custom theme in Tailwind CSS v4? use context7 Without Context7 vs. 
With Context7 // ❌ Without Context7 — LLM may generate outdated patterns import { createMiddlewareClient } from \u0026#39;@supabase/auth-helpers-nextjs\u0026#39; // auth-helpers-nextjs is deprecated, replaced by @supabase/ssr // ✅ With Context7 — based on latest official docs import { createServerClient } from \u0026#39;@supabase/ssr\u0026#39; import { NextResponse, type NextRequest } from \u0026#39;next/server\u0026#39; export async function middleware(request: NextRequest) { let supabaseResponse = NextResponse.next({ request }) const supabase = createServerClient( process.env.NEXT_PUBLIC_SUPABASE_URL!, process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!, { cookies: { getAll() { return request.cookies.getAll() }, setAll(cookiesToSet) { cookiesToSet.forEach(({ name, value }) =\u0026gt; request.cookies.set(name, value)) supabaseResponse = NextResponse.next({ request }) cookiesToSet.forEach(({ name, value, options }) =\u0026gt; supabaseResponse.cookies.set(name, value, options)) }, }, } ) await supabase.auth.getUser() return supabaseResponse } The Upstash Connection and Business Model Context7 is an Upstash project. Upstash is an infrastructure company offering serverless Redis, Kafka, and QStash that has been expanding into the AI/LLM tooling ecosystem.\nOpen Source Boundary Component Open? MCP Server source Open source (GitHub) CLI tool Open source API Backend Closed (Upstash proprietary) Crawling/Parsing Engine Closed Documentation DB Closed The MCP server and CLI are open source to build community trust and adoption. 
The core value — the document crawling, parsing, and indexing engine — is kept proprietary to form a business moat.\nRevenue model: Basic usage is free (rate limited); generating an API key at context7.com/dashboard unlocks higher rate limits.\nComparison with Alternatives Approach Accuracy Automation Build cost Dependency Manual doc copy-paste High None None None Self-hosted RAG High High Very high Own infra Context7 High High Near zero Upstash Web search integration Medium Medium Low Search API Context7\u0026rsquo;s biggest advantage is value relative to setup cost. One npx ctx7 setup command gives you access to current docs for dozens of libraries.\nCritical Analysis Strengths Extremely low barrier to entry: one npx ctx7 setup command and you\u0026rsquo;re done Version awareness: specify a version in the prompt and it auto-matches Wide client support: integrates with 30+ clients including Cursor, Claude Code, and OpenCode Community momentum: ~49,800 stars is the fuel to keep improving the doc DB\u0026rsquo;s quality and coverage Limitations and Risks Single point of failure: the backend API is entirely Upstash-dependent — no fallback if the service goes down Opaque coverage: it\u0026rsquo;s not transparently documented which libraries are in the DB or how current they are Prompt token consumption: doc snippets injected into context consume tokens \u0026ldquo;use context7\u0026rdquo; keyword dependency: requiring a keyword in the prompt means the user has to decide when to use it Vendor lock-in path: the classic freemium model — free use → hit rate limit → paid conversion Quick Links Context7 GitHub Context7 website API key generation Claude Code plugin marketplace Insights Context7 is not technically complex. The idea of \u0026ldquo;inject current docs into LLM context\u0026rdquo; is one anyone could conceive. 
But actually building the infrastructure that continuously crawls thousands of libraries, indexes them by version, and accurately extracts relevant sections — and offers this for free — is a completely different problem. Context7\u0026rsquo;s real value isn\u0026rsquo;t the code; it\u0026rsquo;s the data pipeline.\nFrom the perspective of the MCP ecosystem, Context7 is one of the most compelling demonstrations of why MCP is needed. That said, in the long run, this kind of functionality will likely get built into AI coding tools themselves. If Cursor or Claude Code start offering native documentation indexing, Context7\u0026rsquo;s standalone value proposition will diminish.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-context7/cover-en.jpg","permalink":"/posts/2026-03-20-context7/","title":"Context7 — A Deep Dive into the Platform That Injects Up-to-Date Docs into LLMs"},{"content":"Overview Y Combinator CEO Garry Tan has open-sourced his personal Claude Code development environment. gstack is a skill framework that transforms Claude Code into a virtual engineering team of 15 specialized skills and 6 power tools. It crossed 10,000 GitHub stars on its first day and currently sits above 27,000. Garry Tan\u0026rsquo;s claim of writing over 600,000 lines of production code in 60 days — and the assertion that you can generate 10,000–20,000 usable lines of code per day — drove explosive attention from the developer community.\nThe Sprint Architecture gstack\u0026rsquo;s core is structuring the full software development lifecycle into a 7-stage sprint process. 
Rather than just generating code, it replicates inside Claude Code the cycle that actual engineering teams follow: think → plan → build → review → test → ship → reflect.\nflowchart LR A[\"Think \u0026lt;br/\u0026gt; Problem analysis\"] --\u003e B[\"Plan \u0026lt;br/\u0026gt; Design \u0026amp; review\"] B --\u003e C[\"Build \u0026lt;br/\u0026gt; Implementation\"] C --\u003e D[\"Review \u0026lt;br/\u0026gt; Code review\"] D --\u003e E[\"Test \u0026lt;br/\u0026gt; QA validation\"] E --\u003e F[\"Ship \u0026lt;br/\u0026gt; Deploy\"] F --\u003e G[\"Reflect \u0026lt;br/\u0026gt; Retrospective\"] G --\u003e|\"next sprint\"| A style A fill:#e1f5fe style B fill:#f3e5f5 style C fill:#e8f5e9 style D fill:#fff3e0 style E fill:#fce4ec style F fill:#e0f2f1 style G fill:#f5f5f5\nThe notable feature is that 10–15 sprints can run in parallel. Claude Code\u0026rsquo;s multi-task capability is used to develop multiple features simultaneously, with each sprint going through independent review and test stages.\nAnalyzing the 15 Skills gstack maps each engineering team role to an independent skill. Each skill is invoked with a / command and may auto-load based on context.\nCEO and Leadership Roles Skill Command Role CEO Review /plan-ceo-review Review plans from a business perspective, adjust priorities Design Review /plan-design-review UX/UI perspective design review Eng Review /plan-eng-review Technical feasibility, architecture review Office Hours /office-hours Open Q\u0026amp;A, direction discussions CEO Review follows what Garry Tan calls the \u0026ldquo;Boulder Ocean\u0026rdquo; philosophy. The principle is that the CEO doesn\u0026rsquo;t interfere with implementation details, but provides clear feedback on strategic direction and priorities. 
Most recommendations from this review are designed to be accepted by default, so Claude can proceed quickly with its own judgment.\nEngineering Roles Skill Command Role Code Review /review Perform PR-level code review QA /qa Automated testing and quality validation Ship /ship Manage deployment process Investigate /investigate Bug tracking, log analysis Careful /careful Switch to cautious mode, detect risky changes Operations and Documentation Roles Skill Command Role Document Release /document-release Auto-generate release notes Retro /retro Sprint retrospective, identify improvements Browse /browse Web search and reference collection Codex /codex Codebase knowledge management Power Tools (Safety Mechanisms) Tool Command Function Freeze /freeze Prevent changes to specific files/directories Guard /guard Watch for changes and warn Unfreeze /unfreeze Release freeze /freeze and /guard are especially important safety mechanisms. When running parallel sprints, they prevent conflicts from multiple Claude instances simultaneously modifying the same file. Freezing a core config file or database schema means that sprint won\u0026rsquo;t touch those files.\nInstallation and Usage Installation is straightforward — clone into Claude Code\u0026rsquo;s skills directory and run the setup script:\ngit clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack cd ~/.claude/skills/gstack ./setup Immediately usable from Claude Code:\n\u0026gt; /plan-ceo-review \u0026gt; I want to build a Pomodoro timer app. React + TypeScript. [CEO Review skill activated] - Goal clarity: ✅ - Market differentiation: recommend — define a differentiator vs. existing timers - Tech stack fit: ✅ React + TS is appropriate for this scale - MVP scope: recommend limiting to core timer + session history Accept? 
[Y/n] After CEO review, the flow naturally continues:\n\u0026gt; /plan-eng-review [Eng Review skill activated] - Component structure proposal (ASCII flowchart generated) - State management: useReducer recommended - Test strategy: Vitest + React Testing Library One distinctive feature of gstack is the automatic ASCII flowchart generation that visualizes architecture during the planning stage. ASCII art is used instead of Mermaid because it\u0026rsquo;s immediately readable in Claude Code\u0026rsquo;s terminal environment.\nComparison with Other Tools The Claude Code ecosystem has other extension tools beyond gstack. harness and oh-my-claudecode are the most notable.\nflowchart TD subgraph gstack G1[\"15 role-based skills\"] G2[\"Sprint process\"] G3[\"CEO/Design/Eng review\"] G4[\"Freeze/Guard safety\"] end subgraph harness[\"harness\"] H1[\"Workflow automation\"] H2[\"Custom pipelines\"] H3[\"Tool chaining\"] end subgraph omc[\"oh-my-claudecode\"] O1[\"Prompt templates\"] O2[\"Context management\"] O3[\"Configuration presets\"] end style gstack fill:#e8f5e9 style harness fill:#e1f5fe style omc fill:#fff3e0 Characteristic gstack harness oh-my-claudecode Core philosophy Virtual team simulation Workflow automation Prompt optimization Skill count 15 + 6 power tools Custom-defined Template-based Review process CEO/Design/Eng 3-level None None Parallel execution 10-15 sprints Pipeline-based Not supported Safety mechanisms freeze/guard/unfreeze None None Installation git clone + setup npm/pip dotfiles gstack\u0026rsquo;s most distinctive quality is that it is process-oriented. Where other tools focus on \u0026ldquo;how to use Claude Code better,\u0026rdquo; gstack tries to \u0026ldquo;transplant the way a software team actually works into Claude Code.\u0026rdquo; The CEO Review concept as a layer simply doesn\u0026rsquo;t exist in other tools.\nWhat Garry Tan\u0026rsquo;s Background Tells Us Garry Tan is not just a CEO. 
He was an early Palantir engineer who personally designed the Palantir logo, then became a YC partner before taking the CEO role. This background is directly reflected in gstack\u0026rsquo;s design:\nPalantir experience → data-driven decision making, structured review processes YC experience → fast MVP, sprint-based development, \u0026ldquo;Ship fast\u0026rdquo; culture Design sensibility → the existence of the Design Review skill; treating UX as equal to code review The 10,000–20,000 lines per day figure can sound like an exaggeration, but given parallel sprints and Claude Code\u0026rsquo;s code generation capabilities, it\u0026rsquo;s not physically impossible. What \u0026ldquo;usable\u0026rdquo; code means in this context, however, is worth debating.\nCritical Analysis Strengths Structured development process: enforcing plan → review → build stages rather than \u0026ldquo;just write code\u0026rdquo; improves quality Safety mechanisms: /freeze and /guard for preventing conflicts in parallel work are genuinely practical Low barrier to entry: one git clone installs it; MIT license makes it freely usable Context auto-loading: relevant skills activate automatically based on context, removing the need for manual invocation each time Weaknesses and Concerns Claude Code lock-in: only works with Anthropic\u0026rsquo;s Claude Code; unusable in Cursor, Windsurf, or other AI coding tools \u0026ldquo;Magic bullet\u0026rdquo; illusion: a significant portion of the 27,000 stars comes from Garry Tan\u0026rsquo;s name recognition. The same tool from an unknown developer would likely not have attracted this level of attention. LOC metric limitations: lines of code are a poor measure of productivity. Whether all 600,000 lines are meaningful code, or whether the figure includes boilerplate and generated scaffolding, is unclear. Limits of team simulation: whether CEO Review, Design Review, and similar skills can replace the depth of actual human reviewers needs verification. 
LLM review is closer to pattern matching; domain-specific business judgment is hard to replicate. TypeScript + Go Template mix: skill definitions are spread across multiple languages, creating a barrier for customization. Quick Links GitHub repository: garrytan/gstack Claude Code official docs Y Combinator YouTube: gstack — 10K GitHub stars in one day Insights The most interesting pattern gstack reveals is the direction AI coding tools are evolving. The early goal was \u0026ldquo;generate code faster.\u0026rdquo; gstack extends the goal to \u0026ldquo;encapsulate the entire software development process inside AI.\u0026rdquo; This represents a shift from a simple code generation tool to a development methodology framework.\nThe existence of safety mechanisms like /freeze and /guard in particular is evidence that parallel AI agent execution creates real problems in practice. Managing conflicts when multiple Claude instances modify the same codebase simultaneously is a challenge the entire AI coding tool ecosystem will need to solve.\nThat said, gstack\u0026rsquo;s popularity clearly owes more to the Garry Tan brand than to the tool\u0026rsquo;s inherent quality. What matters is whether this framework has been validated in actual production environments, and whether developers other than Garry Tan can experience the same productivity gains. 27,000 stars don\u0026rsquo;t equal 27,000 active users. In the age of vibe coding, tool selection should be deliberate — judged by whether something genuinely helps your workflow, not by star count.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-gstack/cover-en.jpg","permalink":"/posts/2026-03-20-gstack/","title":"gstack — YC CEO Garry Tan's Claude Code Virtual Engineering Team"},{"content":"Overview HarnessKit is a Claude Code Plugin for zero-based vibe coders. 
After analyzing Anthropic\u0026rsquo;s \u0026ldquo;Effective harnesses for long-running agents\u0026rdquo; article and existing harness engineering implementations (autonomous-coding, claude-harness, and others), I designed it to adopt their strengths and address their weaknesses. The core cycle is detect → configure → observe → improve, and the full journey from design spec to v0.1.0 implementation took 19 hours.\nDesign: What Is Harness Engineering? Background In vibe coding, an AI agent loses all context when a session ends. Building infrastructure outside the session — recording passes: false in feature_list.json, using progress files for handoffs, learning from failures.json — is what keeps the agent consistent. That\u0026rsquo;s the heart of harness engineering.\nThe problems with existing implementations were clear:\nRepo properties have to be figured out manually The same guardrails apply regardless of experience level No improvement loop after initial setup Design Principle: Marketplace First, Customize Later The initial design kept skill seed templates inside the plugin and generated them with /skill-builder. But through discussion, a \u0026ldquo;don\u0026rsquo;t reinvent the wheel\u0026rdquo; principle was established:\n\u0026ldquo;If there\u0026rsquo;s already a powerful plugin out there, just use it.\u0026rdquo;\nAs a result, all skills/agents templates were removed. 
The revised structure explores and installs validated marketplace plugins first, then analyzes usage patterns and uses /skill-builder to customize only the gaps.\nflowchart TD A[\"/harnesskit:setup\"] --\u003e B[\"Repo detection\u0026lt;br/\u0026gt;detect-repo.sh\"] B --\u003e C[\"Preset selection\u0026lt;br/\u0026gt;beginner / intermediate / advanced\"] C --\u003e D[\"Infrastructure generation\u0026lt;br/\u0026gt;CLAUDE.md, feature_list,\u0026lt;br/\u0026gt;progress, failures\"] C --\u003e E[\"Marketplace exploration\u0026lt;br/\u0026gt;skills, agents, review\"] E --\u003e F[\"Plugin install recommendations\"] D --\u003e G[\"Hook registration\u0026lt;br/\u0026gt;session-start, guardrails,\u0026lt;br/\u0026gt;session-end\"] G --\u003e H[\"Session start\"] H --\u003e I[\"Working\u0026lt;br/\u0026gt;guardrails + dev hooks\"] I --\u003e J[\"Session end\u0026lt;br/\u0026gt;log save + pattern detection\"] J --\u003e|\"recurring pattern detected\"| K[\"/harnesskit:insights\u0026lt;br/\u0026gt;deep analysis + diff proposals\"] K --\u003e L[\"/harnesskit:apply\u0026lt;br/\u0026gt;review and apply\"] L --\u003e H\nImplementation: 4 Plans, One Day Plan 1: Plugin Skeleton + Repo Detection The plugin manifest (plugin.json) and repo auto-detection script are the foundation. detect-repo.sh identifies language, framework, package manager, test framework, and linter purely from file existence patterns. 
Zero token consumption.\n# detect-repo.sh core logic (excerpt) TOOL=$(echo \u0026#34;$INPUT\u0026#34; | jq -r \u0026#39;.tool_name\u0026#39; 2\u0026gt;/dev/null || echo \u0026#34;\u0026#34;) [ \u0026#34;$TOOL\u0026#34; != \u0026#34;Bash\u0026#34; ] \u0026amp;\u0026amp; exit 0 CMD=$(echo \u0026#34;$INPUT\u0026#34; | jq -r \u0026#39;.tool_input.command // \u0026#34;\u0026#34;\u0026#39; 2\u0026gt;/dev/null || echo \u0026#34;\u0026#34;) Presets have three levels:\nPreset Guardrails Briefing Nudge threshold beginner Strong (mostly BLOCK) Detailed (full) 2 sessions intermediate Balanced (core BLOCK, some WARN) Summary (concise) 3 sessions advanced Minimal (mostly WARN/PASS) One-liner (minimal) 5 sessions Plan 2: File Generation + Toolkit The init.md skill generates all harness infrastructure files. CLAUDE.md is composed from base.md + framework templates + preset filters. .claudeignore applies exclusion patterns matched to the detected framework.\nDev hooks are also registered here:\npost-edit-lint.sh — PostToolUse: auto-lint after file edits post-edit-typecheck.sh — PostToolUse: run tsc after .ts/.tsx edits pre-commit-test.sh — PreToolUse: run tests before git commit (beginner only) Plan 3: Hooks System (TDD) The trickiest part. 
Three core hooks implemented with TDD.\nguardrails.sh (PreToolUse): receives JSON via stdin and performs pattern matching:\n{\u0026#34;tool_name\u0026#34;: \u0026#34;Bash\u0026#34;, \u0026#34;tool_input\u0026#34;: {\u0026#34;command\u0026#34;: \u0026#34;git push --force origin main\u0026#34;}} Per-preset rule matrix:\nPattern Beginner Intermediate Advanced sudo BLOCK BLOCK BLOCK rm -rf / BLOCK BLOCK BLOCK Write to .env BLOCK BLOCK WARN git push --force BLOCK BLOCK WARN git reset --hard BLOCK WARN PASS it.skip, test.skip WARN PASS PASS session-start.sh (SessionStart): reads progress, features, and failures, then outputs a briefing appropriate for the preset.\nsession-end.sh (Stop): reads the current-session.jsonl scratch file, generates a session log, and updates failures.json.\nPlan 4: Insights + Apply /harnesskit:insights analyzes accumulated session data across five dimensions:\nError patterns (recurring errors, root causes) Feature progress (completion rate, bottlenecks) Guardrail activity (BLOCK/WARN frequency) Toolkit usage (which plugins are being used) Preset fitness (conditions for upgrade/downgrade) Rejected proposals are recorded in insights-history.json and suppress the same category + target combination for 10 sessions.\nProblem Solving session-end.sh grep pipe failure grep returns exit code 1 when there are no matches. In a grep ... | jq -s ... pipe, this caused issues. The || echo \u0026quot;[]\u0026quot; fallback produced partial output ([]\\n[]) that broke jq --argjson.\nFix: store grep results in a variable first (|| true), then pipe to jq only when non-empty.\nFeedback that the project had no direct impact on user projects The initial spec only generated files inside .harnesskit/. 
From the user\u0026rsquo;s perspective: \u0026ldquo;I installed the harness but I can\u0026rsquo;t feel the difference while coding.\u0026rdquo;\n\u0026ldquo;I expected direct installation into the initial repo, but from what we\u0026rsquo;ve discussed, it doesn\u0026rsquo;t feel like that\u0026rsquo;s happening.\u0026rdquo;\nFix: Added an entire Section 9 defining Harness Toolkit Generation, including marketplace plugin exploration/installation, dev hook configuration, dev command registration, and agent recommendations. Then refactored once more to \u0026ldquo;Marketplace First, Customize Later.\u0026rdquo;\nInsights auto-execution vs. manual trigger The user initially expected hooks to automatically run /insights and make suggestions.\n\u0026ldquo;At first I assumed hooks would run /insights and make proposals — is it a different approach?\u0026rdquo;\nFix: Agreed on a hybrid approach. Shell hooks detect recurring patterns with zero token cost and output a nudge. The actual /insights is manually triggered by the user, at which point Claude performs deep analysis. 
Diagnosis (built-in insights) and prescription (HarnessKit insights) are separated.\nCommit Log Message Changes docs: add HarnessKit design spec and harness engineering research guide Initial design spec docs: resolve spec review issues (10/10 fixed, approved) 10 spec review items addressed docs: add Harness Toolkit generation, file impact matrix, v2 roadmap Section 9 + v2 roadmap docs: integrate /skill-builder for skill generation and improvement skill-builder integration docs: add \u0026lsquo;Curate Don\u0026rsquo;t Reinvent\u0026rsquo; principle across all toolkit areas External delegation principle docs: add 4 implementation plans for HarnessKit v0.1.0 Plans 1-4 written feat: initialize plugin skeleton with manifest and directory structure plugin.json + directory structure feat: add beginner/intermediate/advanced preset definitions 3-level preset JSON feat: add repo detection script with test suite detect-repo.sh + 11 tests feat: add /harnesskit:setup skill with detection, preset selection, and reset mode setup skill feat: add orchestrator agent for multi-step flow coordination orchestrator agent test: add integration test for setup flow components 19 setup flow tests feat: add CLAUDE.md templates (base + nextjs + fastapi + react-vite + django + generic) 6 CLAUDE.md templates feat: add .claudeignore and feature_list starter templates .claudeignore + starter.json feat: add skill seed templates for /skill-builder generation 8 seed templates feat: add agent templates (planner, reviewer, researcher, debugger) 4 agent templates feat: add dev hooks (auto-lint, auto-typecheck, pre-commit-test) PostToolUse/PreToolUse dev hooks feat: add dev command skills (test, lint, typecheck, dev) and update manifest 4 dev command skills feat: add init skill — orchestrates all harness + toolkit generation init.md test: add template validation tests for init 34 template validation tests test: add fixtures for hooks testing mock JSON/JSONL fixtures feat: add guardrails hook with 
preset-aware blocking rules guardrails.sh + 7 tests feat: add session-start hook with preset-aware briefing session-start.sh + 3 tests feat: add session-end hook with log saving, failure tracking, and nudge detection session-end.sh + 5 tests test: add hooks integration test — full session lifecycle 6 integration tests feat: add /harnesskit:status skill — quick dashboard status.md feat: add /harnesskit:insights skill — analysis, report, and proposal generation insights.md feat: add /harnesskit:apply skill — proposal review and application apply.md feat: register all skills in plugin manifest plugin.json final update Insights The power of Subagent-Driven Development: I delegated all four Plans to subagents as task units and ran two-stage validation — spec compliance review plus code quality review. The ability to ship a v0.1.0 with 85 passing tests in a single day was fundamentally made possible by this approach.\nEvolution of the \u0026ldquo;Marketplace First\u0026rdquo; principle: The initial design had me writing seed templates and customizing with /skill-builder. User feedback — \u0026ldquo;why build this when good plugins already exist?\u0026rdquo; — prompted a pivot. Removing all skill/agent templates in favor of marketplace-first exploration meant deleting a significant amount of code, but dramatically reduced ongoing maintenance burden.\nThe 0-token shell hook design: guardrails, session-start, and session-end all run on pure bash + jq — no Claude API calls. This minimizes per-session token consumption while automating dangerous action blocking, briefings, and log collection.\nA data pipeline for v2: The real value of v1 is less in the features themselves than in the data accumulation structure. As session-logs, failures.json, and insights-history.json build up, v2 can offer automatic agent generation, automatic skill generation, and automatic preset adjustment. 
v1 is the foundation for v2.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-harnesskit-dev1/cover-en.jpg","permalink":"/posts/2026-03-20-harnesskit-dev1/","title":"HarnessKit Dev Log #1 — Designing and Building an Adaptive Harness Plugin for Zero-Based Vibe Coders"},{"content":"Overview Previous post: #1 — Designing and Building an Adaptive Harness Plugin for Zero-Based Vibe Coders\nRight after v1 passed all 85 tests, a fundamental question came up — \u0026ldquo;Do we really need custom templates when the marketplace already has proven plugins?\u0026rdquo; That question set the direction for a 26-hour marathon session.\ngraph TD A[\"v1 Complete\"] --\u003e|\"Paradigm Shift\"| B[\"Marketplace-First\"] B --\u003e C[\"v2a Design\"] B --\u003e D[\"v2b Design\"] C --\u003e E[\"Intelligent Harness Evolution\"] D --\u003e F[\"Extended Features\"] E --\u003e G[\"Enhanced Session Data Collection\"] E --\u003e H[\"Deep Pattern Analysis\"] E --\u003e I[\"Auto-Generated Proposals\"] F --\u003e J[\"PRD to Issues\"] F --\u003e K[\"Worktree Isolation\"] F --\u003e L[\"Bible Template\"] G --\u003e M[\"v0.2.0 Release\"] H --\u003e M I --\u003e M J --\u003e M K --\u003e M L --\u003e M The Marketplace-First Pivot Background v1 had over 12 custom skill/agent templates — 8 skill templates (nextjs, python-fastapi, common, generic, etc.) and 4 agent templates (planner, reviewer, researcher, debugger). These carried significant maintenance overhead, and the Claude Code plugin marketplace already had proven alternatives.\nWhat Changed One core commit tells the whole story: delete all custom skill/agent templates. 
Instead, HarnessKit pivoted to curating marketplace plugins, and only generating custom skills via /skill-builder when insights data justifies it.\n\u0026ldquo;Curate, Don\u0026rsquo;t Reinvent\u0026rdquo; — stop reinventing the wheel; curate what\u0026rsquo;s already proven.\nThe init, apply, and insights skills were all rewritten around this principle.\nv2a: Intelligent Harness Evolution Background v1\u0026rsquo;s observation system was limited to basic session log collection. v2a aimed for an intelligent evolution system that analyzes collected data for patterns and automatically proposes improvements.\nThree key decisions emerged from brainstorming:\nIncremental complexity — look at insights data to judge when to evolve Diff-based proposals — surface changes as diffs; user approves before applying Minimal commands — \u0026ldquo;more commands don\u0026rsquo;t mean better usability\u0026rdquo; Implementation The v2a spec defined five core capabilities:\nEnhanced session data collection — tool call sequences, time distributions, plugin usage patterns Deep pattern analysis — time-sink detection, repeated behavior identification, coverage gap analysis Auto-generated proposals — suggest agent, skill, and hook creation based on usage patterns Review internalization pipeline — marketplace plugin → custom replacement when data justifies it A/B testing integration — skill quality comparison tied to /skill-builder # Example: session-end data extraction in v2a (base.md logging protocol) # Automatically records tool call sequences, time distributions, plugin usage Implementation was carried out via subagent-driven development, split into 7 tasks.\nv2b: Extended Harness Features PRD to GitHub Issues (/harnesskit:prd) This skill takes a PRD document, decomposes it into GitHub issues, and syncs them to feature_list.json. It helps vibe coders manage requirements systematically.\nWorktree Isolation (/harnesskit:worktree) A harness-aware git worktree management skill. 
It provides isolated environments for parallel development by leveraging Claude Code\u0026rsquo;s built-in worktree support rather than building from scratch — a direct extension of the marketplace-first principle.\nBible Template — An Interesting Design Evolution The Bible is a curated template encoding harness engineering principles. It was initially designed to let users freely extend it, but an important concern was raised during the session:\n\u0026ldquo;If users can add to it freely, won\u0026rsquo;t inconsistent guidelines degrade plugin quality?\u0026rdquo;\nThis feedback led to the Bible being redesigned as a constant, curator-only template — only plugin maintainers can update it. A deliberate constraint to prevent quality degradation.\nPlugin Format Restructuring The transition to the official Claude Code plugin format happened in two rounds:\nRound 1: harnesskit/ nested directory → skills/SKILL.md flat structure Round 2: skills/setup.md → skills/setup/SKILL.md directory-based structure (official convention) This was a large-scale refactoring that touched over 26 files.\nProductization The final step was turning HarnessKit into a shippable product:\nProduction-grade README and MIT license Privacy Policy: \u0026ldquo;No external data collection\u0026rdquo; — all data stored locally in .harnesskit/ Version bump to 0.2.0, all v2b skills registered Enhanced monorepo detection: detect-repo.sh now scans backend/frontend subdirectories Commit Log Message Change refactor: marketplace-first approach — remove skill/agent templates Mass deletion + rewrite docs: add HarnessKit v2a design spec v2a design doc docs: add v2a implementation plan Implementation plan feat(v2a): add tool usage and plugin logging protocol base.md logging test(v2a): add session data fixtures Test fixtures feat(v2a): add tool call sequence, time distribution extraction Data extraction feat(v2a): add v2a config schema initialization Config schema feat(v2a): add v1→v2a migration path Migration 
path feat(v2a): add review internalization, custom toolkit to status Status dashboard feat(v2a): add agent/hook/review proposals to apply Apply execution path feat(v2a): add time-sink, repeated actions, coverage gap analysis Deep analysis docs: add HarnessKit v2b design spec v2b design doc docs: redesign bible as constant curated template Bible redesign feat(v2b): add curated bible template Bible implementation feat(v2b): add /harnesskit:prd skill PRD skill feat(v2b): add /harnesskit:worktree skill Worktree skill feat(v2b): add A/B eval comparison to apply Skill comparison eval feat(v2b): register prd + worktree skills, bump to 0.2.0 Version bump docs: add production README, LICENSE, .gitignore Productization refactor: restructure to official Claude Code plugin format Round 1 restructure docs: add privacy policy Privacy policy refactor: restructure skills/agents to official plugin format Round 2 restructure feat: enhance detect-repo.sh for monorepos Monorepo detection Takeaways The most striking thing about this 26-hour session was adopting the \u0026ldquo;Curate, Don\u0026rsquo;t Reinvent\u0026rdquo; principle. Boldly deleting over 12 carefully crafted templates from v1 and pivoting to a marketplace-first approach was a significant shift — technically and philosophically. The Bible template\u0026rsquo;s redesign is another interesting case: moving from \u0026ldquo;give users freedom\u0026rdquo; to \u0026ldquo;deliberately constrain for quality\u0026rdquo; is an important lesson about plugin ecosystem maturity. 
The core of v2a/v2b comes down to data-driven judgment — create custom skills only when insights justify it, and use proven marketplace plugins until then.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-harnesskit-dev2/cover-en.jpg","permalink":"/posts/2026-03-20-harnesskit-dev2/","title":"HarnessKit Dev Log #2 — Going Marketplace-First and Building v2a/v2b"},{"content":"Overview Following the Google OAuth login wall implementation in the previous post, this session focused on three things: visual feedback for the auto-injection system (tone/angle), per-user data isolation, and image generation parallelization. The UI received major improvements so users can intuitively see the comparison generation results (tone+angle vs tone-only) defined in PRD v3.\nPrevious post: #1 — Hybrid Image Search Dev Log — Implementing Google OAuth Login Wall\nAuto-Injection Reference Visualization Background One of PRD v3\u0026rsquo;s core features is automatic tone/angle reference injection. But users had no way of seeing which images were actually being applied as tone or angle references. 
Looking at generated images and wondering \u0026ldquo;why did it come out with this color palette?\u0026rdquo; with no way to check was a real problem.\nImplementation We built an injection info pipeline from backend to frontend.\nflowchart LR A[\"GenerationLog DB\"] --\u003e|injected_tone_filename| B[\"service.py\"] B --\u003e|InjectionInfo| C[\"main.py API\"] C --\u003e|JSON response| D[\"api.ts\"] D --\u003e|injection_info| E[\"App.tsx\"] E --\u003e F[\"Badge display\"] E --\u003e G[\"Detail card\"]Backend — Added an InjectionInfo model to schemas.py and updated service.py to read injected_tone_filename and injected_angle_filename fields from the DB and convert them to a structured response:\ndef _build_injection_info_from_row(row: dict) -\u0026gt; InjectionInfo | None: tone_fn = row.get(\u0026#34;injected_tone_filename\u0026#34;) angle_fn = row.get(\u0026#34;injected_angle_filename\u0026#34;) reason = row.get(\u0026#34;injection_reason\u0026#34;) if not tone_fn and not angle_fn: return None return InjectionInfo( tone=InjectedReference(filename=tone_fn, score=0.0) if tone_fn else None, angle=InjectedReference(filename=angle_fn, score=0.0) if angle_fn else None, reason=reason, ) Frontend — Two visual elements were added:\nThumbnail badges: Tone and Angle tags displayed in the top-left of image cards in amber/blue colors Detail modal card: GeneratedImageDetail.tsx shows the actual injected reference images as thumbnails, with the injection reason as text Debugging — References Not Showing Up After the initial implementation, an actual generation run showed no tone/angle indicators at all. A screenshot confirmed injection_info was coming back as null. The cause was a field name mismatch between the DB column names and the actual row keys in _build_injection_info_from_row. Fixing the mapping resolved it.\nAdditionally, the reference image selection logic had a bug where the ImageCategories struct wasn\u0026rsquo;t loading properly. 
Fixed by parsing the categories field when loading images.json:\ncategories = ImageCategories(**img[\u0026#34;categories\u0026#34;]) if \u0026#34;categories\u0026#34; in img else ImageCategories() doc = ImageDocument( id=img[\u0026#34;id\u0026#34;], filename=img[\u0026#34;filename\u0026#34;], labels=labels, categories=categories, ) Comparison Image Hover Overlay To compare the tone+angle version against the tone-only version, we added a hover overlay that shows the comparison image on the same card. A side-by-side card display was considered, but hover switching on the same card was chosen for better usability.\nDuring implementation, the Tone badge was shifting position on hover. Fixed by using CSS position: absolute, and text size was increased for readability.\nSearch Results Horizontal Scroll Background The search results popup opened by the \u0026ldquo;Find References\u0026rdquo; button used a grid grid-cols-6 vertical grid layout. With many images, scrolling became long and comparison was difficult.\nImplementation All three grids in the popup (by component, combined results, view all) were replaced with a single horizontal row with left/right arrows.\nA reusable ScrollableRow component was created:\nconst ScrollableRow: React.FC\u0026lt;{ children: React.ReactNode }\u0026gt; = ({ children }) =\u0026gt; { const scrollRef = useRef\u0026lt;HTMLDivElement\u0026gt;(null); const [canScrollLeft, setCanScrollLeft] = useState(false); const [canScrollRight, setCanScrollRight] = useState(true); const scroll = (direction: \u0026#39;left\u0026#39; | \u0026#39;right\u0026#39;) =\u0026gt; { const el = scrollRef.current; if (!el) return; const scrollAmount = 540; // ~3 cards el.scrollBy({ left: direction === \u0026#39;left\u0026#39; ? 
-scrollAmount : scrollAmount, behavior: \u0026#39;smooth\u0026#39; }); }; return ( \u0026lt;div className=\u0026#34;relative group/scroll\u0026#34;\u0026gt; {canScrollLeft \u0026amp;\u0026amp; ( \u0026lt;button onClick={() =\u0026gt; scroll(\u0026#39;left\u0026#39;)} className=\u0026#34;absolute left-0 top-0 bottom-0 z-10 w-10 ...\u0026#34;\u0026gt; \u0026lt;ChevronLeft size={20} /\u0026gt; \u0026lt;/button\u0026gt; )} \u0026lt;div ref={scrollRef} onScroll={updateScrollState} className=\u0026#34;flex gap-2.5 overflow-x-auto custom-scrollbar-hidden\u0026#34;\u0026gt; {children} \u0026lt;/div\u0026gt; {canScrollRight \u0026amp;\u0026amp; ( \u0026lt;button onClick={() =\u0026gt; scroll(\u0026#39;right\u0026#39;)} className=\u0026#34;...\u0026#34;\u0026gt; \u0026lt;ChevronRight size={20} /\u0026gt; \u0026lt;/button\u0026gt; )} \u0026lt;/div\u0026gt; ); }; All existing grid grid-cols-6 gap-2.5 layouts were replaced with \u0026lt;ScrollableRow\u0026gt;, and each image card got flex-shrink-0 w-[200px] for a fixed width. Initially 160px, but 200px proved better for the horizontal layout.\nPer-User Data Isolation Background In a multi-user environment, generation history was being fetched without a user_id filter. This was a security issue where other users\u0026rsquo; generated images could appear in someone\u0026rsquo;s history.\nImplementation Rather than just limiting what\u0026rsquo;s displayed, we implemented true isolation at the backend level:\nflowchart TD A[\"GET /api/history/generations\"] --\u003e B{\"user_id filter\"} B --\u003e C[\"Return only own generation history\"] D[\"GET /images/filename\"] --\u003e E{\"check_file_ownership\"} E --\u003e|\"Own file\"| F[\"Return image\"] E --\u003e|\"Shared reference\"| F E --\u003e|\"Another user's file\"| G[\"403 Forbidden\"] get_generation_history(user_id=...) — Added user_id filter to query check_file_ownership(filename, user_id) — Verifies ownership of generated/uploaded files. 
Reference images (image_ref_* directories) are shared assets and are allowed; gen_*/upload_* files are owner-only /images/{filename} endpoint — Added auth dependency and ownership check async def check_file_ownership(filename: str, user_id: int) -\u0026gt; bool: \u0026#34;\u0026#34;\u0026#34;Check if a generated or uploaded file belongs to the given user. Returns True if the file is not found in any table (legacy/orphan data). \u0026#34;\u0026#34;\u0026#34; Async Parallel Generation Background PRD 2.4 specified running comparison generation (tone+angle vs tone-only) in parallel with Promise.all, but the backend was actually using sequential await calls. For 4-image generation, this doubled the wait time.\nImplementation Parallelized Gemini API calls using asyncio.gather and asyncio.Semaphore:\nimport asyncio # Limit concurrent Gemini API calls _gemini_semaphore = asyncio.Semaphore(4) Refactored the _generate_batch function that previously used sequential for-loops, so that in comparison mode both batches run concurrently via asyncio.gather. The Semaphore limits concurrent calls to prevent API rate limit issues.\nDB Management Convenience — make db-clean Frequently resetting data during development meant manually typing sqlite3 commands every time. 
Added a db-clean Makefile target:\ndb-clean: @sqlite3 data/logs.db \u0026#34;DELETE FROM search_logs; DELETE FROM image_selections; DELETE FROM generation_logs; DELETE FROM manual_uploads;\u0026#34; @echo \u0026#34;Cleared: search_logs, image_selections, generation_logs, manual_uploads\u0026#34; This preserves the schema and alembic_version, images, users tables while clearing log data.\nCommit Log Message Change perf: parallelize image generation with async Gemini API backend/src/main.py data: update images.json with refreshed labels and metadata data/images.json feat: comparison hover overlay, injection badges, and scrollable search results App.tsx, GeneratedImageDetail.tsx, SearchResultsPopup.tsx feat: add comparison images and injection info to generation history API schemas.py, api.ts chore: add db-clean Makefile target for clearing log tables Makefile chore: remove stale docs and skill file, update gitignore .gitignore + 4 files fix: isolate user data — filter history by user_id and enforce image ownership database/__init__.py, service.py, main.py docs: update README with auto-injection system README.md + 2 files Takeaways Visual feedback is also a debugging tool — while wiring up the tone/angle injection display, we discovered several bugs in the actual injection logic (categories not loading, field name mismatches). Making things visible makes bugs visible too. Data isolation from the start — adding multi-user support after the fact means hunting through every existing query. user_id filters belong in the table design phase. Semaphore for controlled parallelism — asyncio.gather alone can hit rate limits. Pair it with something like Semaphore(4) for stable behavior. Horizontal scroll UX — for image search results, horizontal scrolling is more intuitive than a vertical grid. Showing category results one row at a time makes comparison easier. Arrow buttons that appear on hover, combined with a hidden scrollbar, is a good pattern for maintaining usability. 
","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-hybrid-search-dev2/cover-en.jpg","permalink":"/posts/2026-03-20-hybrid-search-dev2/","title":"Hybrid Image Search Dev Log #2 — Auto-Injection Visualization, User Isolation, and Async Parallel Generation"},{"content":"Overview Previous post: #2\nThis session was about moving from \u0026ldquo;make it work\u0026rdquo; to \u0026ldquo;make it right.\u0026rdquo; After converting the Gemini API to async and adding concurrent generation support, the structural debt exposed by the performance work made it clear: the 1,516-line main.py needed to be broken apart. We planned an 11-task, 4-phase decomposition and got started.\ngraph TD A[\"Performance Optimization\"] --\u003e B[\"Async Parallelization\"] A --\u003e C[\"Concurrent Generation\"] B --\u003e D[\"Structural Debt Exposed\"] C --\u003e D D --\u003e E[\"Code Review\"] E --\u003e F[\"11-task 4-phase Decomposition Plan\"] F --\u003e G[\"Phase 1: generation module\"] F --\u003e H[\"Phase 2: app_utils\"] F --\u003e I[\"Phase 3: routes split\"] F --\u003e J[\"Phase 4: main.py cleanup\"] G --\u003e K[\"main.py 1516 lines → ~900 lines\"] H --\u003e K I --\u003e K Async Parallelization Background The existing image generation pipeline was calling synchronous client.models.generate_content() inside async def — blocking the entire event loop. The google-genai SDK v1.62.0 already had an async API (client.aio.models.generate_content()), but it wasn\u0026rsquo;t being used.\nImplementation: Two-Level Parallelization # Level 1: Within-batch parallelization — generate individual images concurrently async def _generate_single_image(...): async with semaphore: # Semaphore(4) — respects API rate limits return await client.aio.models.generate_content(...) 
results = await asyncio.gather( *[_generate_single_image(item) for item in batch] ) # Level 2: Cross-batch parallelization — primary + comparison run concurrently primary, comparison = await asyncio.gather( generate_batch(primary_items), generate_batch(comparison_items) ) For 4-image generation with comparison mode, what used to require 8 sequential calls now runs in parallel within the Semaphore(4) limit, resulting in a significant perceived speedup.\nFrontend: Concurrent Generation Support Even after the async conversion, the UI was still locking the button during generation. Replaced generating: boolean with generatingCount: number to allow multiple generation requests to run simultaneously.\n// Before: boolean lock — only one at a time const [generating, setGenerating] = useState(false); // After: counter — allows concurrent generation const [generatingCount, setGeneratingCount] = useState(0); // Button disabled only when prompt is empty // Spinner: \u0026#34;Generating 2 images...\u0026#34; Generation Quality Improvements Structured Prompts Added structured section headers (### Core Generation Subject ###, dividers, etc.) to the prompts sent to Gemini for clearer instruction delivery. Added a full prompt preview to the detail view so users can see exactly what prompt was sent.\nReference Image Randomization Previously, tone/angle reference selection always picked the single highest-scoring image — a deterministic structure that produced identical results for the same query.\ngraph LR A[\"Reference image candidates\"] --\u003e B[\"Similarity score calculation\"] B --\u003e C[\"Filter top 20% (min 1 image)\"] C --\u003e D[\"random.choice\"] D --\u003e E[\"Include in prompt\"]Changed to random.choice from the top 20% pool. Applied to both search-based and fallback paths, for both tone and angle references. 
A small change with a significant impact on generation diversity.\nStructural Refactoring: Decomposing main.py Code Review Findings After requesting a code review post-async-addition, main.py\u0026rsquo;s problems became clear:\n1,516 lines with 7 responsibilities: app bootstrap, auth, image serving, search, generation injection, Gemini service, generation orchestration No APIRouter usage — all routes registered directly with @app.get/@app.post Global mutable state — images_data, hybrid_pipeline etc. as module-level variables 145-line _generate_single_image function Decomposition Plan An 11-task, 4-phase decomposition plan was established:\nPhase Target Result 1 generation/injection.py, prompt.py, service.py Core generation logic separated 2 app_utils.py Shared utilities 3 routes/auth.py, meta.py, images.py, search.py, history.py, generation.py APIRouter-based route separation 4 Final main.py cleanup ~100 lines target Execution and Technical Decisions Carried out via subagent-driven development — each task delegated to a separate subagent with a 2-stage review (spec compliance + code quality).\nKey decisions made during refactoring:\nGlobal variables → explicit parameters: Functions that read images_data, hybrid_pipeline, etc. now receive them as explicit parameters Circular import prevention: Route modules only access main.py globals inside function bodies (not at module scope) _gemini_semaphore: Moved to generation/service.py, removed from main.py Bug found: get_image_file_legacy missing auth dependency — logged but intentionally left for behavior-preserving refactor Results By session end: Phase 1 complete, Phase 2 complete, Phase 3 in progress (routes/auth.py, routes/meta.py extracted). 
main.py reduced from 1,516 lines to approximately 900, with remaining route extractions still pending.\nCommit Log Message Change feat: allow concurrent image generations by removing button lock boolean → counter, concurrent generation UI feat: add structured prompt headers and full prompt preview Prompt quality + debugging feat: randomize tone/angle ref selection from top 20% candidates Generation diversity refactor: extract generation/injection.py from main.py Phase 1 — injection separation refactor: extract generation/prompt.py from main.py Phase 1 — prompt separation refactor: extract generation/service.py from main.py Phase 1 — Gemini service separation refactor: extract app_utils.py with shared utilities Phase 2 — utilities separation refactor: extract routes/auth.py with APIRouter Phase 3 — auth route separation Takeaways This session illustrates a classic pattern: performance optimization triggering structural refactoring. Adding async parallelization pushed main.py\u0026rsquo;s complexity past a threshold, and the code review gave the systematic decomposition its opening. The most important principle throughout was behavior preservation — intentionally maintaining existing bugs while changing only the structure. 
The reference image randomization was nearly a one-liner change, but it demonstrates an important point for generative AI pipelines: \u0026ldquo;probabilistic diversity\u0026rdquo; contributes more to user experience than \u0026ldquo;deterministic optimal.\u0026rdquo;\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-hybrid-search-dev3/cover-en.jpg","permalink":"/posts/2026-03-20-hybrid-search-dev3/","title":"Hybrid Image Search Dev Log #3 — Async Parallelization, Prompt Quality, and Structural Refactoring"},{"content":"Overview Previous post: #1 — Sessions Command and Dev Log Automation\nWhile running the log-blog skill, a fundamental issue surfaced: only browser history was being extracted, and Claude Code sessions with commit-based dev logs weren\u0026rsquo;t included in the initial list. To fix this, we merged both flows and added a --since-last-run flag to automatically manage the time range.\ngraph TD A[\"Problem Found: Sessions/Commits Missing\"] --\u003e B[\"Write Design Spec\"] B --\u003e C[\"Incorporate Review Feedback\"] C --\u003e D[\"Create Implementation Plan\"] D --\u003e E[\"run_state module\"] D --\u003e F[\"extract --since-last-run\"] D --\u003e G[\"sessions --since-last-run\"] D --\u003e H[\"SKILL.md Integration\"] E --\u003e I[\"Integration Complete\"] F --\u003e I G --\u003e I H --\u003e I I --\u003e J[\"fix: save_last_run in JSON path\"] The Problem Running the log-blog skill had Step 1 running extract --json to pull browser history only. 
Claude Code sessions and git commit-based dev logs only got listed if explicitly requested.\nUser feedback was direct:\n\u0026ldquo;There\u0026rsquo;s no way I didn\u0026rsquo;t commit anything — is this a bug?\u0026rdquo; \u0026ldquo;Why isn\u0026rsquo;t it creating posts from sessions and commits?\u0026rdquo;\nSince both workflows (browser history + dev logs) would be run daily, an integrated flow that surfaces both from the start was clearly needed.\nDesign: Unified Skill Flow Core Decision After considering three approaches in brainstorming, we went with running both simultaneously every time, even if it takes longer:\nStep 1: Run extract and sessions --list concurrently Step 3: Present both browser-based items and dev log candidates together After user approval: browser items proceed via fetch, dev logs via sessions --project --since-last-run Tracking The problem with the --hours 24 default: run it every other day and you miss a day; run it twice in a day and you get duplicates.\nSolution: automatic time range calculation based on last-run timestamps\ngraph LR A[\"Run starts\"] --\u003e B{\"last-run file exists?\"} B --\u003e|Yes| C[\"hours = (now - last_run).hours\"] B --\u003e|No| D[\"hours = 24 (default)\"] C --\u003e E[\"Run extract/sessions\"] D --\u003e E E --\u003e F[\"Save last-run timestamp\"] Implementation run_state Module Added run_state.py to manage last-run timestamps:\n# Load/save the last run time def load_last_run() -\u0026gt; Optional[datetime]: ... def save_last_run(timestamp: datetime) -\u0026gt; None: ... def hours_since_last_run() -\u0026gt; Optional[float]: ... Timestamps are stored in ISO 8601 format in a .log-blog-last-run file at the project root.\n--since-last-run Flag for extract/sessions The --since-last-run flag was added to both extract and sessions commands. 
When set:\nCalculate elapsed time since last run via hours_since_last_run() Use that time as the --hours value Fall back to 24 hours if no last-run file exists Call save_last_run() after execution completes SKILL.md Integration Updated the skill document so Step 1 runs both commands simultaneously:\n# Step 1: Run concurrently uv run log-blog extract --json --since-last-run uv run log-blog sessions --list --since-last-run Also improved the Step 3 user review screen so dev log candidates are automatically included.\nBug Fix: save_last_run in JSON Output Path The final commit fixed a bug where save_last_run wasn\u0026rsquo;t being called when using the --json flag. The timestamp now gets saved after execution completes in the JSON output path as well.\nCommit Log Message Change docs: add unified skill flow and session data bug fix design spec Design spec docs: address spec review feedback Review feedback incorporated docs: add last-run tracking feature to unified skill flow spec last-run tracking spec docs: add implementation plan Implementation plan feat: add run_state module for last-run timestamp tracking run_state module feat: add --since-last-run flag to extract command extract flag feat: add --since-last-run flag to sessions command sessions flag feat: unify browser history and dev log flows in SKILL.md Skill integration fix: save_last_run in JSON output path of extract command JSON path bug fix Takeaways The trigger for this improvement was \u0026ldquo;frustration felt while actually using the tool.\u0026rdquo; Dogfooding — developers using their own tools — continues to prove its value. 
The --since-last-run flag is technically simple (store/load a timestamp), but its impact on user experience is significant: it completely eliminates the judgment call of \u0026ldquo;how many hours should I specify?\u0026rdquo; The structured design → review → implement workflow playing out systematically across 9 commits also shows how much the log-blog project itself has matured.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-log-blog-dev2/cover-en.jpg","permalink":"/posts/2026-03-20-log-blog-dev2/","title":"Log-Blog Dev Log #2 — Unified Skill Flow and --since-last-run Tracking"},{"content":"Overview The era of AI coding agents generating code has already arrived. But one fundamental question remains — how does the agent itself improve? Most AI coding tools today start from a blank slate every session. Whatever was learned in previous work doesn\u0026rsquo;t carry over.\nMEGA Code takes this problem head-on. It\u0026rsquo;s an ambitious project that automatically extracts Skills (reusable know-how) and Strategies (decision-making guides) from session logs, building infrastructure where AI coding agents accumulate experience and evolve on their own. According to their benchmarks, it reduces token usage to 1/5 while tripling structural quality.\nThis post digs into MEGA Code\u0026rsquo;s core concepts, 3-Layer architecture, analysis of their benchmark claims, and comparisons with other meta-learning approaches.\nCore Concepts: Skills vs Strategies MEGA Code\u0026rsquo;s self-evolution mechanism is built on two key concepts. They look similar at a glance, but their roles and extraction methods are fundamentally different.\nSkills — Reusable Know-How A Skill is concrete procedural knowledge for performing a specific task. 
It answers \u0026ldquo;How to do it.\u0026rdquo;\nExamples:\nWriting React component tests: The pattern sequence of mounting components with Jest + React Testing Library, simulating user events, and writing assertions Standardizing API error handling: try-catch block structure, branching by error type, message format to expose to users Generating DB migration scripts: The procedure for detecting schema changes and creating rollback-capable migration files Skills are extracted from diffs. When the agent\u0026rsquo;s code modification history (before → after) shows a pattern that can be applied repeatedly, it gets registered as a Skill.\nStrategies — Decision-Making Guides A Strategy is a set of criteria for making situation-dependent judgments. It answers \u0026ldquo;What to choose.\u0026rdquo;\nExamples:\nChoosing a state management tool: React Context for under 10 components, Zustand for complex global state, TanStack Query when server state is primary Deciding test strategy: Unit tests for utility functions, integration tests for API interactions, E2E tests for core user flows Prioritizing refactoring: Start with frequently-changed files, start with modules with fewer dependencies Strategies are extracted from repeated editing patterns. When the agent consistently makes the same choice in similar situations, those decision criteria get abstracted into a Strategy.\ngraph LR A[\"Session Logs\"] --\u003e B[\"Diff Analysis\"] A --\u003e C[\"Pattern Detection\"] B --\u003e D[\"Skills \u0026lt;br/\u0026gt; (Procedural know-how)\"] C --\u003e E[\"Strategies \u0026lt;br/\u0026gt; (Decision guides)\"] D --\u003e F[\"Skill Registry\"] E --\u003e G[\"Strategy Registry\"] F --\u003e H[\"Auto-applied \u0026lt;br/\u0026gt; in next session\"] G --\u003e H\nThe Diff-to-Skill Pipeline MEGA Code\u0026rsquo;s core engine is the pipeline that converts session log diffs into Skills. 
Rather than simply storing code change history, it\u0026rsquo;s a process of elevating them into abstracted, reusable knowledge.\nHow the Pipeline Works Diff collection: Every code modification by the agent records a before/after diff Pattern clustering: Similar diffs are grouped together. For example, if \u0026ldquo;adding error handling after an API call\u0026rdquo; appears 3+ times, it becomes one cluster Abstraction: Specific variable names and function names are removed, leaving only the essence of the pattern. fetchUser → fetchEntity, UserError → EntityError — generalizing like this Skill creation: The abstracted pattern is given a name, description, application conditions, and code template, then registered as a Skill Validation: A feedback loop validates whether the generated Skill is actually useful in new sessions An interesting aspect of this process is the existence of a quantitative threshold. Patterns that appear only once are ignored; only repeatedly occurring patterns get promoted to Skills. This reduces noise and ensures only genuinely reusable knowledge accumulates.\nStrategy Extraction Mechanism Strategy extraction operates at a higher level. 
Rather than analyzing diffs themselves, it analyzes the agent\u0026rsquo;s choice patterns.\nFor example, when the agent writes state management code:\nSession A: Small app → chose Context API Session B: Complex app → chose Zustand Session C: Server-state-heavy → chose TanStack Query As this choice history accumulates, a Strategy is auto-generated: \u0026ldquo;Choose state management tools differently based on app complexity and state characteristics.\u0026rdquo;\nThe 3-Layer Architecture MEGA Code proposes a 3-stage architecture that progressively increases complexity.\ngraph TB subgraph L1[\"Layer 1 — Current\"] A1[\"Auto Skills Generation\"] --\u003e A2[\"Auto Strategies Generation\"] A2 --\u003e A3[\"Eureka VS Code Extension\"] end subgraph L2[\"Layer 2 — Planned\"] B1[\"Wisdom Graph\"] --\u003e B2[\"Atomic-level \u0026lt;br/\u0026gt; Skill Decomposition\"] B2 --\u003e B3[\"Cross-project \u0026lt;br/\u0026gt; Knowledge Transfer\"] end subgraph L3[\"Layer 3 — Planned\"] C1[\"Offline Optimization\"] --\u003e C2[\"Compound Intelligence\"] C2 --\u003e C3[\"Multi-agent \u0026lt;br/\u0026gt; Collaboration\"] end L1 --\u003e L2 L2 --\u003e L3 style L1 fill:#2d5016,stroke:#4a8c28,color:#fff style L2 fill:#1a3a5c,stroke:#2980b9,color:#fff style L3 fill:#5c1a3a,stroke:#b92980,color:#fff\nLayer 1: Auto Skills \u0026amp; Strategies + Eureka (Current) The currently available stage. Skills and Strategies are automatically extracted from session logs and surfaced to developers through the VS Code extension Eureka.\nWhat Eureka does:\nBrowse extracted Skills/Strategies directly within VS Code Auto-recommend Skills matching the current work context Interface for manually editing Skills or registering new ones Separate Skills/Strategies management per project Eureka isn\u0026rsquo;t just a code snippet manager. Context-aware recommendations are the core. 
It analyzes the currently open file, cursor position, and recent edit history to proactively suggest relevant Skills.\nLayer 2: Wisdom Graph (Planned) The idea is to decompose Skills and Strategies down to atomic level. A composite Skill gets broken into smaller units, and the relationships between them are modeled as a graph.\nWhy atomic decomposition matters:\nLayer 1 Skills are relatively coarse-grained. \u0026ldquo;Writing React component tests\u0026rdquo; contains multiple substeps internally. The problem is that even when only a subset is needed, the entire Skill gets applied, consuming unnecessary tokens.\nThe Wisdom Graph solves this:\nMount component → Simulate events → Write assertions — each is an independent atomic Skill Selectively compose only what\u0026rsquo;s needed Cross-project knowledge transfer becomes possible This is similar to the Unix philosophy: \u0026ldquo;small programs that do one thing well, combined.\u0026rdquo;\nLayer 3: Offline Optimization + Compound Intelligence (Planned) The most ambitious stage. The agent optimizes existing Skills/Strategies in offline mode (outside of live sessions), and implements Compound Intelligence that integrates experience from multiple agents.\nWhen this stage is realized:\nKnow-how Agent A learned from frontend work gets applied to Agent B\u0026rsquo;s backend tasks Skills accumulated overnight are automatically organized, merged, and optimized Knowledge is shared in Multi-agent scenarios where multiple agents collaborate Benchmark Analysis The benchmark numbers MEGA Code published are impressive:\nMetric Baseline MEGA Code Improvement Token usage 897K 169K 81% reduction (approx. 
1/5) Structural quality 1x 3x 3x improvement 81% Token Reduction What this number means:\nCost reduction: LLM API call costs drop to 1/5 Speed improvement: Fewer tokens to process means faster response times Context window efficiency: More of the limited context window allocated to genuinely useful information The mechanism for token reduction is clear. As Skills accumulate, the agent no longer needs to \u0026ldquo;think from scratch\u0026rdquo; each time — it applies proven patterns directly. Similar to few-shot prompting, but rather than reducing the prompt itself, it eliminates unnecessary exploration and trial-and-error.\n3x Structural Quality The fact that the exact measurement criteria for \u0026ldquo;structural quality\u0026rdquo; aren\u0026rsquo;t disclosed warrants caution. Possible measurement approaches include:\nCode structure consistency (naming conventions, file structure, etc.) Architecture pattern adherence Test coverage Code review pass rates More accurate evaluation will be possible when additional details about benchmark conditions (which projects, which tasks, comparison baseline models, etc.) are published.\nComparison with Other Meta-Learning Approaches MEGA Code isn\u0026rsquo;t the only project tackling \u0026ldquo;AI agent self-improvement.\u0026rdquo; Let\u0026rsquo;s compare with similar directions.\nHarnessKit\u0026rsquo;s Observe-Improve Loop HarnessKit builds a loop that observes agent behavior and improves the process based on results.\nIn common: Analyzes session history to improve agents Different: HarnessKit focuses on process-level improvement; MEGA Code focuses on knowledge (Skills/Strategies) level improvement. 
If HarnessKit optimizes \u0026ldquo;what order to work in for efficiency,\u0026rdquo; MEGA Code optimizes \u0026ldquo;what code patterns to apply.\u0026rdquo; Superpowers\u0026rsquo; Memory System Superpowers gives agents long-term memory.\nIn common: Knowledge persistence across sessions Different: Superpowers\u0026rsquo; memory is closer to relatively raw memory storage; MEGA Code\u0026rsquo;s Skills/Strategies are structured, abstracted knowledge. If memory is a \u0026ldquo;diary,\u0026rdquo; Skills are more like a \u0026ldquo;textbook.\u0026rdquo; Claude\u0026rsquo;s Memory/CLAUDE.md Anthropic\u0026rsquo;s Claude Code also maintains project context through CLAUDE.md and a memory system.\nIn common: Knowledge transfer across sessions Different: Claude\u0026rsquo;s memory is explicitly managed by the user and recorded in CLAUDE.md, while MEGA Code targets automatic extraction. MEGA Code is more ambitious in automation level, but extraction accuracy and noise management become the key challenge. Approach Knowledge Form Extraction Method Abstraction Level MEGA Code Skills + Strategies Automatic (diff analysis) High HarnessKit Process patterns Semi-automatic (observe loop) Medium Superpowers Raw memory Automatic (session recording) Low Claude Memory Structured notes Manual + semi-automatic Medium Critical Analysis Strengths Clear problem definition: Precisely identifies the problem — \u0026ldquo;agents don\u0026rsquo;t learn from experience\u0026rdquo; Skills/Strategies distinction: The framework cleanly separates procedural knowledge from decision-making knowledge Progressive architecture: The 3-Layer approach separates currently available value from future vision Impressive benchmarks: 1/5 token reduction translates directly to real cost savings Weaknesses and Open Questions Skill quality control: How do you verify that automatically extracted Skills are actually useful? 
If bad patterns get registered as Skills, code quality could actually decline Project dependency: Are Skills extracted from Project A valid in Project B? What are the limits of cross-project transfer in environments with different domain conventions? Skill conflicts: What happens when two Skills recommend conflicting patterns? Benchmark transparency: The measurement criteria and experimental conditions for the 3x structural quality improvement aren\u0026rsquo;t sufficiently disclosed Layer 2/3 feasibility: Wisdom Graph and Compound Intelligence are still conceptual. Layer 1\u0026rsquo;s success doesn\u0026rsquo;t guarantee success for Layer 2/3 Lock-in risk: If Skills/Strategies become tied to the MEGA Code platform, switching to other tools becomes difficult Hopes and Concerns The most exciting part is the Wisdom Graph. It has the potential to solve one of the biggest problems with current AI coding tools — \u0026ldquo;context-free code generation.\u0026rdquo; But whether atomic-level Skill decomposition is actually feasible, and whether those decomposed pieces can be meaningfully recombined, remains unproven.\nQuick Links MEGA Code official site — Product overview and access request Eureka VS Code Extension — Search in VS Code Marketplace MEGA Code Benchmark Report — Token reduction and quality improvement data Takeaways \u0026ldquo;Agents that learn from experience\u0026rdquo; is the next frontier of AI coding. Code generation capability is already becoming commoditized. Differentiation will come not from \u0026ldquo;generates better\u0026rdquo; but from \u0026ldquo;gets better with use.\u0026rdquo;\nThe Skills vs Strategies distinction reflects how human experts structure knowledge. Experienced developers accumulate \u0026ldquo;how to implement\u0026rdquo; (procedural knowledge) and \u0026ldquo;what to choose\u0026rdquo; (strategic judgment) separately. 
MEGA Code\u0026rsquo;s attempt to automate this structure is theoretically sound.\nToken efficiency is a quality issue beyond cost. When context windows are limited, reducing unnecessary tokens means allocating more space to genuinely important information. This isn\u0026rsquo;t just cost savings — it\u0026rsquo;s an improvement in the agent\u0026rsquo;s \u0026ldquo;attention.\u0026rdquo;\nAuto-extraction accuracy will be the key bottleneck. If wrong Skills get registered, the agent repeatedly applies wrong patterns. A meta-version of \u0026ldquo;garbage in, garbage out\u0026rdquo; can occur. The quality management mechanism for Skills will determine MEGA Code\u0026rsquo;s success or failure.\nCompetition is converging on \u0026ldquo;who completes the self-evolution loop first.\u0026rdquo; MEGA Code, HarnessKit, Superpowers — all pointing in the same direction. The ultimate winner will likely be not the fastest team, but the one that builds the most trustworthy self-evolution loop.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-mega-code/cover-en.jpg","permalink":"/posts/2026-03-20-mega-code/","title":"MEGA Code — AI Coding Infrastructure That Evolves from Session Logs"},{"content":"Overview oh-my-claudecode (OMC) is a Teams-first multi-agent orchestration framework that runs on top of Claude Code. With over 10,400 GitHub stars and rapid evolution to v4.9.0, it claims \u0026ldquo;Zero config, Zero learning curve.\u0026rdquo; The core idea is simple — rather than replacing Claude Code\u0026rsquo;s master agent, it layers 27 specialized agents and 28 skills via skill injection. 
This post digs into OMC\u0026rsquo;s architecture, Team Mode pipeline, orchestration mode comparisons, and when to actually use it.\nSkill Composition Architecture — The Layering Model What fundamentally differentiates OMC from other Claude Code extensions is layer composition rather than mode switching.\nThe traditional approach cuts context and switches modes — \u0026ldquo;switch to planning mode → switch to execution mode.\u0026rdquo; OMC uses Claude Code\u0026rsquo;s skill system to stack behaviors.\nThe skill composition formula:\n[Execution Skill] + [0-N Enhancement Skills] + [Optional Guarantee] Specifically:\nExecution Skill: The core skill that does actual work (e.g., team-exec, autopilot) Enhancement Skills: Skills that inject additional behavior (e.g., critic, researcher) Optional Guarantee: A quality assurance layer (e.g., team-verify, team-fix) The biggest advantage of this approach is that context is never severed. When transitioning from planning to execution, the context of previous conversation is fully preserved. 
Since skills inject behavior while Claude Code\u0026rsquo;s master agent remains active, there\u0026rsquo;s no context break.\ngraph TB Master[\"Claude Code\u0026lt;br/\u0026gt;Master Agent\"] subgraph Skills[\"Skill Layer Composition\"] direction TB Exec[\"Execution Skill\u0026lt;br/\u0026gt;team-exec / autopilot\"] Enhance[\"Enhancement Skills\u0026lt;br/\u0026gt;critic + researcher + ...\"] Guard[\"Guarantee Layer\u0026lt;br/\u0026gt;team-verify / team-fix\"] end subgraph Agents[\"27 Specialized Agents\"] direction LR A1[\"architect\"] A2[\"researcher\"] A3[\"designer\"] A4[\"writer\"] A5[\"critic\"] A6[\"planner\"] A7[\"qa-tester\"] A8[\"...\"] end Master --\u003e Skills Exec --\u003e Agents Enhance --\u003e Agents Guard --\u003e|\"Loop on failure\"| Exec style Master fill:#4A90D9,color:#fff style Skills fill:#f5f5f5 style Agents fill:#f0f8ff\nTeam Mode Pipeline Team Mode, which became canonical in v4.1.7, is OMC\u0026rsquo;s core orchestration mode. It\u0026rsquo;s a 5-stage pipeline.\ngraph LR Plan[\"team-plan\u0026lt;br/\u0026gt;Requirements analysis\"] PRD[\"team-prd\u0026lt;br/\u0026gt;Design document\"] Exec[\"team-exec\u0026lt;br/\u0026gt;Parallel implementation\"] Verify[\"team-verify\u0026lt;br/\u0026gt;Quality validation\"] Fix[\"team-fix\u0026lt;br/\u0026gt;Issue resolution\"] Plan --\u003e PRD --\u003e Exec --\u003e Verify Verify --\u003e|\"Pass\"| Done[\"Complete\"] Verify --\u003e|\"Fail\"| Fix Fix --\u003e|\"Loop\"| Verify style Plan fill:#E8F5E9 style PRD fill:#E3F2FD style Exec fill:#FFF3E0 style Verify fill:#F3E5F5 style Fix fill:#FFEBEE style Done fill:#C8E6C9\nEach stage in detail:\n1. team-plan — Requirements Analysis Receives the user\u0026rsquo;s request and has the architect and planner agents collaborate. Defines scope, identifies required files and modules, and builds a dependency graph.\n2. team-prd — Design Document Based on plan results, writer and designer agents generate a PRD (Product Requirements Document). 
This document is injected as context for subsequent stages.\n3. team-exec — Parallel Implementation Multiple agents implement in parallel according to the PRD. This is where tmux CLI workers can be utilized. Each worker runs as an independent Claude Code (or Codex, Gemini) process in a split pane.\n4. team-verify — Quality Validation qa-tester and critic agents validate the implementation. Runs tests, reviews code, and checks requirements fulfillment.\n5. team-fix — Fix Loop Addresses issues found during verification. After fixing, returns to team-verify — a loop structure. This loop is the core of OMC\u0026rsquo;s quality assurance mechanism.\nOrchestration Mode Comparison Beyond Team Mode, OMC offers several orchestration modes, each optimized for different situations.\nMode Magic Keyword Characteristics Best For Team Mode team 5-stage pipeline, parallel execution Large multi-file/multi-role tasks omc team (CLI) — Team Mode directly from CLI CI/CD integration, script automation ccg — Codex + Gemini + Claude triple model advisor Design decisions, architecture review Autopilot autopilot, ap Autonomous execution, minimal intervention Repetitive tasks, well-defined tasks Ultrawork ulw High-intensity focus mode Complex single-file refactoring Ralph ralph, ralplan Plan-centric, careful execution Planning phase, high-risk changes ccg — Tri-Model Advisor Particularly interesting is the /ccg skill. It gets Codex and Gemini perspectives inside Claude Code, with Claude synthesizing them. It leverages inter-model viewpoint differences for better decision-making.\ndeep-interview — Socratic Questions Before Coding Asks the user iterative questions before starting to code, clarifying requirements. 
Rather than \u0026ldquo;what will you build,\u0026rdquo; it first establishes \u0026ldquo;why are you building this\u0026rdquo; and \u0026ldquo;what constraints exist.\u0026rdquo;\ntmux CLI Workers and Multi-Model Support tmux CLI Workers, introduced in v4.4.0, dramatically extended OMC\u0026rsquo;s parallel execution capability.\ngraph TB OMC[\"OMC Orchestrator\"] subgraph tmux[\"tmux Session\"] direction LR P1[\"Pane 1\u0026lt;br/\u0026gt;Claude Code\"] P2[\"Pane 2\u0026lt;br/\u0026gt;Codex CLI\"] P3[\"Pane 3\u0026lt;br/\u0026gt;Gemini CLI\"] P4[\"Pane 4\u0026lt;br/\u0026gt;Claude Code\"] end OMC --\u003e|\"task dispatch\"| P1 OMC --\u003e|\"task dispatch\"| P2 OMC --\u003e|\"task dispatch\"| P3 OMC --\u003e|\"task dispatch\"| P4 P1 --\u003e|\"result\"| OMC P2 --\u003e|\"result\"| OMC P3 --\u003e|\"result\"| OMC P4 --\u003e|\"result\"| OMC style OMC fill:#4A90D9,color:#fff style tmux fill:#f5f5f5\nKey characteristics:\nReal process spawning: Independent claude, codex, gemini CLI processes run in each pane Multi-model routing: Routes to the appropriate model based on task characteristics. Claims 30-50% token cost savings through smart model routing Visual monitoring: tmux split-pane lets you see each worker\u0026rsquo;s progress in real time HUD Statusline: Shows currently active agents, progress stage, and token usage at a glance Magic Keyword System The most distinctive aspect of OMC\u0026rsquo;s user experience is the magic keyword system. 
Orchestration modes are activated with natural language keywords rather than complex commands.\nKeyword Action ralph Activate Ralph mode — plan-first approach ralplan Run only Ralph\u0026rsquo;s planning stage ulw Ultrawork mode — high-intensity focus plan Activate planning skill autopilot / ap Autopilot mode — autonomous execution These keywords can be used naturally in Claude Code prompts:\n\u0026#34;ralph, refactor the authentication system in this project\u0026#34; \u0026#34;ap add type annotations to all test files\u0026#34; 3-Tier Memory System Context loss during long sessions is a chronic problem with AI coding tools. OMC addresses this with a 3-tier memory system.\nTier Purpose Characteristics Priority Memory Top-priority context Always injected into prompts; project rules/constraints Working Memory Current task context Auto-updated during session; tracks progress state Manual Notes User-defined notes Manually managed; long-term persistence Priority Memory plays a similar role to CLAUDE.md, but is automatically managed by OMC and shared across agents. Working Memory auto-updates at each Team Mode stage so agents in later stages know the decisions made in earlier ones.\nInstallation and Quick Start Installation is done through Claude Code\u0026rsquo;s plugin system:\n# 1. Add plugin /plugin marketplace add https://github.com/Yeachan-Heo/oh-my-claudecode # 2. Install /plugin install oh-my-claudecode # 3. Initial setup /omc-setup Ready to use immediately after installation. True to the \u0026ldquo;Zero configuration\u0026rdquo; claim, you can start with magic keywords right after /omc-setup, no further configuration needed.\nThe npm package name is oh-my-claude-sisyphus, written in TypeScript (6.9M) and JavaScript (5.2M).\nTrade-offs — OMC vs Pure Claude Code OMC isn\u0026rsquo;t a silver bullet. 
More layers means more cost.\nWhen to Use OMC Multi-file/multi-role tasks: When you need to simultaneously change frontend + backend + tests Long sessions: Work exceeding 2 hours where context loss becomes an issue Planning-critical work: Architecture changes, large-scale refactoring — cases where you need to think first, then execute Simulating team workflows: Working solo but needing an architect → developer → reviewer flow When Pure Claude Code Is Better Simple tasks: Team Mode pipeline is overkill for modifying one function or fixing one bug Token cost sensitivity: The 5-stage pipeline consumes significantly more tokens than pure Claude Code Transparency matters: The more orchestration layers, the harder it becomes to trace \u0026ldquo;why was this decision made\u0026rdquo; Fast iteration needed: The plan → prd → exec → verify → fix loop takes time Core Trade-off Summary Item OMC Pure Claude Code Token cost High (multi-agent) Low Task time Longer but higher quality Faster but simpler Transparency Lower due to orchestration layers High Context retention Excellent with 3-Tier memory Basic level Multi-file work Powerful with parallel execution Sequential processing Guardrails Automatic verify/fix loop Manual review Critical Analysis OMC is an impressive project, but a few things deserve a sober look.\nQuestions about codebase size. TypeScript 6.9M + JavaScript 5.2M is large for a \u0026ldquo;Zero config\u0026rdquo; tool. Most of it is likely skill definitions and agent prompts, but a codebase of this scale carries significant maintenance overhead.\nThe reality of \u0026ldquo;27 agents.\u0026rdquo; How differentiated the 27 specialized agents actually behave depends on the quality of prompt engineering. Whether the boundary between architect and planner, or the difference between critic and qa-tester, is substantive requires verification.\nThe 30-50% savings claim for smart model routing. The benchmark conditions for this figure aren\u0026rsquo;t specified. 
Routing simple tasks to smaller models would save tokens, but it\u0026rsquo;s unclear whether retry costs for complex tasks are included.\nGuardrail sensitivity. If the team-verify → team-fix loop is overly sensitive, unnecessary fix cycles could repeat. This translates directly to token waste.\nThat said, the skill composition paradigm OMC proposes has genuine value. The approach of extending behavior through layering rather than mode switching — while preserving context — is compelling as the next step for AI coding tools. Team Mode\u0026rsquo;s plan → prd → exec → verify → fix pipeline in particular reflects the actual workflow of software engineering well.\nQuick Links oh-my-claudecode GitHub ROBOCO introduction post npm: oh-my-claude-sisyphus Takeaways The skill composition paradigm OMC proposes is genuinely valuable. Extending behavior through layering rather than mode switching — preserving context throughout — is a compelling direction for AI coding tools. Team Mode\u0026rsquo;s plan → prd → exec → verify → fix pipeline reflects real software engineering workflows well. That said, quantitative benchmarks showing exactly how much quality improvement this complex orchestration delivers over pure Claude Code are lacking. The time has come to prove it in numbers, not feeling.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-oh-my-claudecode/cover-en.jpg","permalink":"/posts/2026-03-20-oh-my-claudecode/","title":"oh-my-claudecode (OMC) — Teams-First Multi-Agent Orchestration for Claude Code"},{"content":"Overview The AI coding tool paradigm is shifting. We\u0026rsquo;re moving from a single large LLM handling everything, toward architectures where multiple lightweight subagents research in parallel and a main agent synthesizes the results. 
OpenAI announcing GPT 5.4 mini/nano as \u0026ldquo;explicitly designed for subagent use\u0026rdquo; signals that this pattern isn\u0026rsquo;t just a trend — it\u0026rsquo;s becoming the industry standard. Based on Cole Medin\u0026rsquo;s The Subagent Era Is Officially Here, this post digs into the core concepts and practical strategies of subagent architecture.\nWhy Subagents — The Context Rot Problem What Is Context Rot The more information you put in an LLM\u0026rsquo;s context window, the worse it performs. This is called context rot. Even a model with a 200K token context window will \u0026ldquo;forget\u0026rdquo; or misjudge the importance of early information when the window is actually filled to 200K.\nThis problem is particularly severe in AI coding tools:\nLarge codebase analysis: Loading dozens of files into context causes the model to miss content from the critical files Multi-step debugging: Simultaneously analyzing frontend code, backend code, and error logs causes information to blur together Web research + code modification: Processing search results and code together degrades quality on both How Subagents Solve It Subagent architecture solves this problem fundamentally. Each subagent has an independent context window, so it can focus solely on its assigned task. 
The main agent only receives summaries of each subagent\u0026rsquo;s results, keeping its own context clean.\ngraph TD User[\"User Request\"] --\u003e Main[\"Main Agent \u0026lt;br/\u0026gt; (Orchestrator)\"] Main --\u003e SA1[\"Subagent 1 \u0026lt;br/\u0026gt; Web Research\"] Main --\u003e SA2[\"Subagent 2 \u0026lt;br/\u0026gt; Frontend Analysis\"] Main --\u003e SA3[\"Subagent 3 \u0026lt;br/\u0026gt; Backend Analysis\"] SA1 --\u003e R1[\"Summarized research results\"] SA2 --\u003e R2[\"Summarized code analysis\"] SA3 --\u003e R3[\"Summarized API analysis\"] R1 --\u003e Main R2 --\u003e Main R3 --\u003e Main Main --\u003e Result[\"Integrated solution\"] style Main fill:#4A90D9,stroke:#333,color:#fff style SA1 fill:#7B68EE,stroke:#333,color:#fff style SA2 fill:#7B68EE,stroke:#333,color:#fff style SA3 fill:#7B68EE,stroke:#333,color:#fff\nThe key is context isolation. Even if each subagent consumes 10K tokens in its own context, only about 1K tokens of summary per subagent are passed back to the main agent. With the three subagents shown here, the main agent\u0026rsquo;s context grows by only about 3K tokens in total.\nComparing Subagent-Dedicated Models OpenAI explicitly labeling GPT 5.4 nano as \u0026ldquo;for subagents\u0026rdquo; is an industry first. 
Google is also moving in the same direction with Gemini 3.1 Flash Light under the \u0026ldquo;intelligence at scale\u0026rdquo; concept.\nKey Model Specs Model Processing Speed Input Cost (1M tokens) Output Cost (1M tokens) Primary Use Claude Haiku 4.5 53 tok/s $1.00 $5.00 General-purpose subagent GPT 5.4 nano 188 tok/s $0.20 $1.00 Dedicated subagent GPT 5.4 mini ~120 tok/s $0.40 $2.00 Medium-complexity tasks Gemini 3.1 Flash Light ~150 tok/s $0.15 $0.60 Large-scale parallel processing The GPT 5.4 nano numbers stand out:\nCost: 1/5 the cost of Claude Haiku 4.5 — you can run 5 subagents for the same price Throughput: 3.5x faster — dramatically reduces wait time for parallel subagents Design philosophy: \u0026ldquo;Smart enough, fast and cheap\u0026rdquo; — the right trade-off for subagent use Why Dedicated Models Are Needed Subagents have a different character than the main agent:\nMain agent: Complex reasoning, planning, code generation — accuracy is paramount Subagent: Information gathering, code reading, pattern searching — speed and cost are paramount Using large models like GPT-4o or Claude Sonnet as subagents causes costs to spike dramatically. 3 subagents called 5 times each means 15 LLM calls — unrealistic cost with large models. Nano-class models are what make subagent architecture economically viable.\nPractical Architecture — How Subagents Actually Work Claude Code\u0026rsquo;s Agent Tool Approach Claude Code is the first mover of subagent architecture. 
It creates subagents via the Agent Tool, with each subagent performing file reading, searching, and analysis tasks in independent context.\nsequenceDiagram participant U as User participant M as Main Agent participant T as tmux Session participant S1 as Subagent 1 participant S2 as Subagent 2 participant S3 as Subagent 3 U-\u003e\u003eM: Bug fix request M-\u003e\u003eM: Task decomposition M-\u003e\u003eT: Spawn subagents T-\u003e\u003eS1: Delegate web research T-\u003e\u003eS2: Frontend code analysis T-\u003e\u003eS3: Backend code analysis S1--\u003e\u003eM: Relevant doc summary S2--\u003e\u003eM: UI component analysis results S3--\u003e\u003eM: API endpoint analysis results M-\u003e\u003eM: Integrate results, create fix plan M-\u003e\u003eU: Code fix proposal\nNotably, Claude Code\u0026rsquo;s Agent Team feature spawns multiple subagents simultaneously as terminal sessions using tmux. This has even led to renewed developer interest in tmux.\nOpenAI Codex\u0026rsquo;s Approach OpenAI Codex takes a different approach. It runs agents in a sandbox environment, minimizing costs by using GPT 5.4 nano as subagents. While Claude Code is local terminal-based, Codex is cloud sandbox-based.\nThe core difference:\nCharacteristic Claude Code Agent Tool OpenAI Codex Execution environment Local terminal (tmux) Cloud sandbox Subagent model Claude Haiku 4.5 GPT 5.4 nano Parallelization method tmux session split Container-based File access Direct local filesystem Sandbox copy Cost structure API call cost only Compute + API cost AI Coding Tools Currently Supporting Subagents Subagents are no longer experimental. 
All major AI coding tools have adopted them:\nClaude Code — Agent Tool (first mover, most mature implementation) OpenAI Codex — GPT 5.4 nano-based subagents Gemini CLI — Experimental subagent support GitHub Copilot — Subtask splitting in agent mode Cursor — Parallel processing via Background Agent Open Code — Open source implementation Best Practices — Getting Subagents Right Cole Medin\u0026rsquo;s practical tips in the video are very specific.\nWhen to Use Subagents: Research The optimal use case for subagents is research:\nCode analysis: \u0026ldquo;Understand the dependency structure of this module\u0026rdquo; Web search: \u0026ldquo;Find a solution for this error message\u0026rdquo; Documentation exploration: \u0026ldquo;Summarize the migration guide for this library\u0026rdquo; Pattern search: \u0026ldquo;Find similar implementations in this project\u0026rdquo; Practical Example: 3 Parallel Research Subagents A real bug fix scenario Cole Medin shared:\n[Bug] Profile image not being saved on user profile update Main agent\u0026#39;s task decomposition: ├── Subagent 1: Web research │ → Search \u0026#34;multer file upload not saving express.js\u0026#34; │ → Collect solutions from Stack Overflow, GitHub Issues │ → Result: High probability of missing multer storage config │ ├── Subagent 2: Frontend analysis │ → Analyze form submission logic in ProfileEdit.tsx │ → Check FormData construction method │ → Result: Content-Type header not set to multipart │ └── Subagent 3: Backend analysis → Check multer middleware config in upload.route.ts → Verify file storage path and permissions → Result: Destination path is fine, middleware order issue found Main agent synthesis: → Fix frontend Content-Type + adjust backend middleware order Because the three subagents investigated their areas simultaneously, the time was reduced to 1/3 of sequential investigation. 
And because each subagent only loaded their area\u0026rsquo;s code into context, accurate analysis was possible without context rot.\nWhen Not to Use Subagents: Implementation There\u0026rsquo;s an anti-pattern Cole Medin warns against strongly. Don\u0026rsquo;t split implementation work across subagents.\nWhy not:\n[Anti-pattern] Split frontend/backend/DB across subagents Subagent A: Write React components Subagent B: Write Express API Subagent C: Write DB schema Problem: - API call format from A ≠ API response format from B - DB schema B expects ≠ Schema C created - Type mismatches, field name mismatches, interface mismatches → Major rework needed on integration → worse than not using subagents Implementation is fundamentally about inter-component communication contracts. Subagents don\u0026rsquo;t share each other\u0026rsquo;s context, so interface agreement is impossible. Research can be merged after independent investigation; code implementation cannot.\nThe right pattern:\nResearch → Subagents (parallel) Implementation → Main agent (sequential, in integrated context) Limitations and Caveats of Subagent Architecture 1. Orchestration Overhead The main agent managing subagents also has a cost. Task decomposition, writing subagent prompts, synthesizing results — all of this consumes the main agent\u0026rsquo;s context. Using subagents for simple tasks is actually inefficient.\nGuideline: Subagents aren\u0026rsquo;t needed for problems solvable by reading 2-3 files. Subagents shine when you need to cross-reference 5+ files, or when web search is needed.\n2. Result Quality Variance When subagents use nano-class lightweight models, quality can drop for research requiring complex reasoning. \u0026ldquo;Organize the structure of this file\u0026rdquo; is the right level for subagents — not \u0026ldquo;find the bug in this code.\u0026rdquo;\n3. Security Considerations When subagents perform external web searches, they may be exposed to prompt injection attacks. 
Malicious instructions embedded in search results can potentially be passed through subagents to the main agent.\nLooking Ahead Subagent architecture is reshaping the fundamental patterns of AI coding, going beyond just \u0026ldquo;faster searching\u0026rdquo;:\nModel specialization accelerates: Combinations of role-optimized models, rather than a single general-purpose model, become the standard Cost structure shifts: N calls to small models are more economical and accurate than a single call to a large model Developer workflow changes: \u0026ldquo;Old-fashioned\u0026rdquo; tools like tmux and terminal multiplexers get recast as AI agent infrastructure OpenAI, Google, and Anthropic all releasing lightweight models for subagent use is a clear signal. The subagent era has already arrived.\nQuick Links Cole Medin — The Subagent Era Is Officially Here OpenAI GPT 5.4 Model Card Claude Code Official Docs Takeaways The true significance of subagent architecture isn\u0026rsquo;t \u0026ldquo;faster coding\u0026rdquo; — it\u0026rsquo;s a fundamental change in how information is processed. We\u0026rsquo;re transitioning from delegating everything to one omnipotent LLM, to a structure where role-optimized lightweight models collaborate. This is strikingly similar to how microservices replaced monoliths in software engineering. OpenAI explicitly putting \u0026ldquo;for subagents\u0026rdquo; in a model release headline is a declaration that this paradigm is not an experiment — it\u0026rsquo;s the industry standard. For developers, what matters isn\u0026rsquo;t the models themselves, but how to integrate this architecture into your own workflow.\n","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-subagent-era/cover-en.jpg","permalink":"/posts/2026-03-20-subagent-era/","title":"The Subagent Era Has Arrived — GPT 5.4 nano and Strategies for Context Rot"},{"content":"Overview This dev log focuses on backend stabilization work for trading-agent. 
We added year fallback and PBR calculation logic to the DART financial data client, fixed a bug where current_price was missing from the market scanner pipeline, resolved an async/sync mixing issue in FastMCP middleware, and on the frontend improved SignalPanel with date-based grouping and upgraded the DAG workflow visualization.\nPrevious post: #4\nDART Client Improvements Background Fetching financial data from the DART (Electronic Disclosure System) API had two problems:\nMissing year data: When the latest year\u0026rsquo;s financial statements hadn\u0026rsquo;t been disclosed yet, the API returned an empty response. Without fallback logic to try a prior year, the analysis agent had to make decisions with no financial data. PBR not calculated: PER is provided directly by the API, but PBR (Price-to-Book Ratio) is not. Even though market cap and net asset data was available, PBR wasn\u0026rsquo;t being calculated. Industry-specific field differences: Financial statement line item names differ between the financial sector and general companies, causing parse errors for certain industries. Implementation Improvements made to backend/app/services/dart_client.py:\nYear Fallback Logic:\n# Try from current year, find a year with available data for year in range(current_year, current_year - 3, -1): result = await self._fetch_financial_data(corp_code, year) if result and result.get(\u0026#34;list\u0026#34;): break Auto PBR Calculation:\n# Calculate PBR from net equity (total capital) and market cap if total_equity and total_equity \u0026gt; 0: pbr = market_cap / total_equity Industry-specific Field Mapping:\nFinancial sector companies (banks, insurance, securities) use Operating Revenue instead of Operating Income, and reference Interest Income etc. 
instead of Revenue; branching logic was added for this.\nMarket Scanner Pipeline Fix Background When the market scanner passed scanned stocks to each specialist agent (technical analysis, fundamental analysis, etc.), the current_price field was missing. The scanner fetched price data but did not pass it along when calling downstream experts.\nImplementation flowchart LR A[\"Market Scanner\"] --\u003e|\"Stock list +\u0026lt;br/\u0026gt;current_price\"| B[\"Expert Agents\"] B --\u003e|\"Analysis results\"| C[\"Chief Analyst\"] C --\u003e|\"Final signal\"| D[\"Signal Queue\"]\nUpdated backend/app/agents/market_scanner.py to explicitly pass current_price when calling experts:\n# Before: price info missing from expert call expert_result = await expert.analyze(stock_code, stock_name) # After: current_price passed through entire pipeline expert_result = await expert.analyze(stock_code, stock_name, current_price=price) Also simplified the chief analyst\u0026rsquo;s debate logic in market_scanner_experts.py. The old approach had all expert opinions debating sequentially; cutting unnecessary rounds improved response time.\nFastMCP Middleware async/sync Bug Fix Problem The MCP server middleware was calling a synchronous method that accesses context.state with await:\n# Bug: await on sync function state = await ctx.get_state(\u0026#34;trading_mode\u0026#34;) # TypeError! FastMCP\u0026rsquo;s context state methods are synchronous functions. 
Putting await in front of such a call raises a TypeError, because the plain value the method returns is not awaitable.\nFix Removed await from open-trading-api/MCP/Kis Trading MCP/module/middleware.py and tools/base.py:\n# Fix: sync method called directly without await state = ctx.get_state(\u0026#34;trading_mode\u0026#34;) Scheduled Tasks Activated Updated scheduler-related settings in backend/app/models/database.py:\nActivated scheduled tasks that were previously disabled Adjusted cron timings to match Korean market hours (pre-market scan, intraday monitoring, post-market report) Frontend Improvements SignalPanel Date Grouping Added date-based collapsible sections to frontend/src/components/dashboard/SignalPanel.tsx. Previously, all signals were listed chronologically, making it difficult to find signals from a specific date.\nflowchart TD A[\"SignalPanel\"] --\u003e B{\"Date grouping\"} B --\u003e C[\"2026-03-19\u0026lt;br/\u0026gt;5 signals\"] B --\u003e D[\"2026-03-18\u0026lt;br/\u0026gt;3 signals\"] B --\u003e E[\"2026-03-17\u0026lt;br/\u0026gt;8 signals\"] C --\u003e F[\"Collapsible\u0026lt;br/\u0026gt;expand/collapse\"]\nDaily Chart Data Extended Extended the daily chart data fetch period in backend/app/services/market_service.py from 30 days to 90 days. This was needed to have sufficient data for moving average calculations (60-day, 90-day) in technical analysis.\nDAG Workflow Styling Updated the agent pipeline DAG visualization layout and expert chip styling in frontend/src/components/AgentWorkflow.css and AgentWorkflow.tsx. 
Adjusted node spacing, connector label positions, and overall container alignment for improved readability.\nCommit Log Message Change feat: improve DART client with year fallback, PBR calculation, and industry-variant fields dart_client.py fix: pass current_price through scanner pipeline and simplify chief debate market_scanner.py, market_scanner_experts.py fix: remove await from sync FastMCP context state methods middleware.py, tools/base.py feat: enable scheduled tasks and adjust cron timings database.py feat: extend daily chart data from 30 to 90 days market_service.py feat: add date-grouped collapsible sections to SignalPanel SignalPanel.tsx, App.css style: improve DAG workflow layout and expert chip styling AgentWorkflow.css, index.css Takeaways async/sync mixing creates bugs that only surface at runtime. In Python, putting await in front of a sync call raises a TypeError only when that line actually executes, so the mistake is easy to miss until the code path runs. When using libraries like FastMCP where sync and async coexist, you must verify each method\u0026rsquo;s signature. Missing data in pipelines is a common mistake. The scanner fetched the price but did not pass it to the experts, and this went unnoticed because each stage was tested independently. It is a reminder of the need for end-to-end tests. Financial data APIs require industry-specific handling. Financial sector financial statements are structured fundamentally differently from those of general companies. If you don\u0026rsquo;t pre-map these variations when wrapping the DART API, KeyError will hit you at runtime. Chart data range and analysis indicators must be designed together. Calculating a 90-day moving average requires at least 90 days of data, yet we were only fetching 30 days. Whenever adding technical analysis indicators, the data source\u0026rsquo;s range needs to be checked at the same time. 
","date":"2026-03-20T00:00:00+09:00","image":"/images/posts/2026-03-20-trading-agent-dev5/cover-en.jpg","permalink":"/posts/2026-03-20-trading-agent-dev5/","title":"Trading Agent Dev Log #5 — Backend Stabilization and Data Pipeline Improvements"},{"content":"Overview Even after installing Claude Code and learning the basics, a common frustration surfaces: \u0026ldquo;Why am I not getting the results everyone else seems to get?\u0026rdquo; Conversations grow long and Claude seems to get dumber, repeating the same mistakes and breaking things in other places when fixing one. Most of these problems come down to context management and workflow.\nThis post synthesizes two videos into immediately actionable strategies. The first is a Meta engineer\u0026rsquo;s 20-minute deep dive on context management and practical workflows, covering everything from Second Brain setup to the WAT framework. The second is Anthropic hackathon winner Afan Mustafa\u0026rsquo;s 10 Claude Code tips, distilling 10 months of experience that earned him 70,000+ GitHub stars, broken down into beginner, intermediate, and advanced levels.\nBoth videos converge on the same message: Claude Code\u0026rsquo;s output quality depends entirely on the quality of context you provide, and building a system to manage that context systematically is what productivity actually looks like.\nCore Principles of Context Management Second Brain — Structure Your Knowledge The strategy is to record patterns, solutions, and decision rationale discovered while working with Claude Code in local markdown files. The Meta engineer maintains a project decision log organized by topic, capturing patterns encountered during development, solutions, and reasoning. When you need to do something similar later, you just feed Claude that file.\nThis used to be manual, but the /memory command now automates it. 
Claude automatically saves what it learns during a session — build commands, debugging insights, code patterns — to MEMORY.md, which is auto-loaded at the start of each session. Say \u0026ldquo;remember this\u0026rdquo; and it\u0026rsquo;s saved. Check and edit with /memory.\nFile Role Scope Managed by CLAUDE.md Team-shared rules, coding conventions, architecture decisions Whole project Manual MEMORY.md Personal preferences, recurring mistake patterns, learned content Personal Auto (/memory) TODO.md Session-to-session work continuity Per session Manual + AI collaboration The key principle: don\u0026rsquo;t put everything in CLAUDE.md. Keep personal memory in MEMORY.md and team-shared knowledge in CLAUDE.md.\nLazy Loading — Load Only What You Need A common mistake is cramming API specs, DB schemas, coding conventions, and architecture docs all into one CLAUDE.md. The problem is that CLAUDE.md is auto-loaded every session. If it contains 50 API endpoints and 30 DB table schemas, you\u0026rsquo;re burning thousands of tokens every time — for content that\u0026rsquo;s less than 5% relevant to the current task.\nBad — all 50 API endpoints in CLAUDE.md:\n# CLAUDE.md ## API Endpoints POST /api/users ... GET /api/users/:id ... (all 50 endpoints listed) Good — CLAUDE.md holds references; details live in separate files:\n# CLAUDE.md ## Reference Docs - API spec: docs/api-spec.md - DB schema: docs/db-schema.md - Architecture: docs/architecture.md This is Lazy Loading. When you say \u0026ldquo;update the DB schema,\u0026rdquo; Claude reads docs/db-schema.md from the pointer in CLAUDE.md and does the work — without loading the API spec or frontend architecture docs. 
Afan Mustafa calls this Progressive Disclosure: better to hand a new employee a table of contents and say \u0026ldquo;look things up when you need them\u0026rdquo; than to dump the entire manual on them at once.\nIf the root CLAUDE.md is growing too large, create folder-level CLAUDE.md files:\nproject/ ├── CLAUDE.md # Global rules (keep it lean) ├── apps/api/ │ └── CLAUDE.md # API server-specific rules ├── web/ │ └── CLAUDE.md # Frontend-specific rules ├── supabase/ │ └── CLAUDE.md # DB-related rules └── docs/ └── architecture.md # Mermaid diagrams The relevant CLAUDE.md is auto-loaded when working in that folder, preventing both root CLAUDE.md bloat and context contamination.\nDocument Architecture as Mermaid Diagrams Instead of explaining system structure in prose every time, a Mermaid diagram communicates architecture to Claude far more efficiently:\ngraph TD A[\"API Gateway\"] --\u003e B[\"Auth Service\"] A --\u003e C[\"Order Service\"] A --\u003e D[\"Payment Service\"] A --\u003e E[\"Inventory Service\"]\nStore diagrams by feature in a separate file like docs/architecture.md and reference it from CLAUDE.md. Combined with Lazy Loading, Claude reads only the architecture relevant to the feature at hand. Token efficiency is dramatically better than prose descriptions.\nSession Hygiene — One Session, One Feature The context window is 200K tokens. That sounds large, but it fills faster than expected. Afan Mustafa calls this \u0026ldquo;Context is milk\u0026rdquo; — it goes stale over time. The longer a conversation runs, the fuzzier the earlier parts become.\nCore principles:\nOne session = one feature. Instead of \u0026ldquo;build the entire payment system,\u0026rdquo; scope it down to \u0026ldquo;implement the Stripe webhook handler.\u0026rdquo; When a feature is done, use /clear or start a fresh session before moving on. Run /compact at the right time. Relying on auto-compression alone can lose critical context. 
Trigger it manually after completing a major feature or when the direction changes. Monitor token usage with /statusline continuously. You can\u0026rsquo;t manage what you can\u0026rsquo;t see — it\u0026rsquo;s like driving without a fuel gauge. The core principle: \u0026ldquo;Fresh context beats bloated context.\u0026rdquo; Don\u0026rsquo;t cling to previous conversation history. Starting each task with a clean session produces better results.\nMCP Diet — Turn Off Tools You\u0026rsquo;re Not Using Multiple connected MCPs consume significant tokens just from their tool descriptions. Looking at Afan Mustafa\u0026rsquo;s actual setup: 14 MCPs installed, but only 5–6 active at any time. The rest are turned on only as needed.\nThe system prompt can consume up to about 20,000 tokens. Disabling unused MCPs can cut that to 9,000 tokens — more than half. Too many active MCPs can shrink your effective context from 200K down to 70,000 tokens.\nBoth videos give the same advice:\nUse /mcp to check currently active MCPs Disable any not needed for the current task MCPs like Notion and Linear have especially large tool descriptions that consume a lot of tokens Build custom MCPs wrapping only the endpoints you actually use. This saves tokens and improves response quality. 
flowchart TD A[\"Context Management Strategy\"] --\u003e B[\"Second Brain\u0026lt;br/\u0026gt;CLAUDE.md + MEMORY.md\"] A --\u003e C[\"Lazy Loading\u0026lt;br/\u0026gt;Folder-level CLAUDE.md\"] A --\u003e D[\"Session Hygiene\u0026lt;br/\u0026gt;1 session = 1 feature\"] A --\u003e E[\"MCP Diet\u0026lt;br/\u0026gt;Disable unused MCPs\"] A --\u003e F[\"Mermaid Architecture\u0026lt;br/\u0026gt;Token-efficient structure docs\"] A --\u003e G[\"Script Offloading\u0026lt;br/\u0026gt;Separate heavy work\"] B --\u003e B1[\"Team rules → CLAUDE.md\u0026lt;br/\u0026gt;Personal learning → MEMORY.md\"] C --\u003e C1[\"Rules at root\u0026lt;br/\u0026gt;Details in separate files\"] D --\u003e D1[\"/clear to reset\u0026lt;br/\u0026gt;/statusline to monitor\"] E --\u003e E1[\"14 installed, 5-6 active\u0026lt;br/\u0026gt;20K → 9K token savings\"]\nOffload Heavy Work to Scripts Running heavy data processing inside a conversation contaminates context. Take a DB migration that needs to parse a 100K-row CSV: Claude has to read all 100K rows, load them into context, and process them. Context gets polluted and quality drops.\nInstead:\nAsk Claude to write a migration script that parses the CSV Have Claude run that script Claude only receives the results (JSON, etc.) and continues from there Claude never needs to read the CSV directly — just the summary output. Heavy data goes through scripts; Claude only receives the result. Context stays clean.\nPractical Workflow Patterns Plan Mode — Design First, Implement Second Both videos emphasize running Plan mode first. Afan\u0026rsquo;s analogy: \u0026ldquo;You wouldn\u0026rsquo;t start laying bricks without a blueprint.\u0026rdquo; Jumping straight to execution can send Claude on a destructive mass-edit in the wrong direction, wasting both context and usage credits.\nThe concrete workflow:\nIn Plan mode, describe the task to Claude Claude presents a plan — which files to modify, what approach to take Review the plan and give feedback. 
Correct the direction if it\u0026rsquo;s wrong; ask for alternatives if you want them. Once satisfied with the plan, switch to Accept mode and execute After completion, /clear and move to the next step The key: separate the planning session from the implementation session.\nAlways Read the Thinking Process Never ignore Claude\u0026rsquo;s thinking process. There are moments when Claude makes an assumption like \u0026ldquo;this function seems to do X, so I\u0026rsquo;ll do Y\u0026rdquo; — and that assumption can be wrong. Catch it immediately with Escape and correct the assumption. Code built on a wrong assumption is entirely worthless. Catching it early is everything.\nCross-AI Critique A useful tip from the Meta engineer: take Claude\u0026rsquo;s plan and show it to ChatGPT or Gemini for critique.\n\u0026ldquo;Analyze this conversation and point out what Claude might be missing or getting wrong.\u0026rdquo;\nEach AI model approaches problem definition and solutions from genuinely different angles. Taken a step further, this whole process can be automated as a custom skill — a with-multiple-ai skill, for example, could pass one AI\u0026rsquo;s plan to another, collect feedback, and surface a summary automatically.\nTDD-Based Smart Coding Since it\u0026rsquo;s difficult to closely review every line of AI-generated code, tight TDD loops are essential.\nflowchart LR A[\"Small change\"] --\u003e B[\"Write test\"] B --\u003e C[\"Run test\"] C --\u003e D{\"Pass?\"} D -- Yes --\u003e E[\"Commit\"] D -- No --\u003e F[\"Fix\"] F --\u003e C E --\u003e G[\"Next change\"] G --\u003e A Keep change units small Write and run tests after every change Commit immediately on pass. If something breaks, rolling back to the last commit makes debugging trivial. When errors occur, paste the raw log — don\u0026rsquo;t interpret it. Human interpretation introduces omissions and inaccuracies. Claude is excellent at analyzing stack traces; give it the original. 
TODO.md for Session Continuity AI doesn\u0026rsquo;t know your task list the way you do. Maintaining a TODO.md from project start to finish and sharing it with AI is key to continuity.\nPractical workflow:\nDecide what to do today — implement payment, polish landing page, subscription system, fix bugs 1 and 2 Write it as a checklist in TODO.md Tell Claude: \u0026ldquo;Start from TODO.md\u0026rdquo; Use Agent Teams to parallelize multiple tasks At session end, say \u0026ldquo;update TODO.md\u0026rdquo; and progress is automatically reflected This maintains continuity across multiple sessions.\nThe WAT Framework NetworkChuck\u0026rsquo;s WAT (Workflow-Agent-Tools) framework provides structure for managing Claude Code projects. The Meta engineer tried it and found it solid.\nW (Workflow) — define the task steps clearly in plain English before writing any code. Write out what stages this task should go through. A (Agent) — assign agents to each stage. Self-healing is key — when an error occurs, the agent reads its own logs, identifies the cause, fixes the code, and re-runs. Splitting roles across agents for parallel processing can cut a 10-minute task to 3–4 minutes. T (Tools) — many small scripts beat one large script. Break deploy-all.sh into single-responsibility units. When Claude fails mid-execution, debugging a small script is far more efficient. Concrete example — adding a comment system to a blog:\nW (Workflow): 1. Design and migrate the comments table schema 2. Implement API endpoints 3. Build frontend UI 4. Write and pass tests at each stage A (Agent): - Claude as coordinator, distributing tasks to subagents - One subagent designs tests while another implements the API - Automatic self-healing recovery on errors T (Tools): - scripts/migrate.sh → run DB migration - MCP GitHub → auto-create PR - Hooks → auto-run tests on every commit The framework\u0026rsquo;s core idea is separating AI reasoning from code execution. 
Have Claude think; let separate tools or scripts handle execution. Complex workflows become reliably manageable.\nModel Selection Strategy Not every task needs Opus. Afan Mustafa uses a restaurant analogy — you don\u0026rsquo;t order a tasting menu for a quick lunch.\nModel Suitable tasks Analogy Haiku File lookup, minor edits, format changes Quick lunch Sonnet Multi-file edits, general coding, bug fixes Regular meal Opus Full architecture design, complex bugs, multi-file refactoring Tasting menu Providing reference code matters too. Show Claude similar open-source code when asking it to build something, and the quality of the output noticeably improves. There\u0026rsquo;s a difference between asking someone to draw on a blank canvas versus giving them a reference to work from.\nAdvanced: Subagents and Automation Subagents — 16 Specialized Agents Afan Mustafa\u0026rsquo;s system has 16 specialized subagents. Like an orchestra conductor who doesn\u0026rsquo;t personally play every instrument, the approach is to give each agent exactly one job and pass the output to the next.\nflowchart TD M[\"Main Agent\u0026lt;br/\u0026gt;Orchestrator\"] --\u003e P[\"Planner\u0026lt;br/\u0026gt;Task planning\"] M --\u003e D[\"Designer\u0026lt;br/\u0026gt;UI/UX design\"] M --\u003e R[\"Reviewer\u0026lt;br/\u0026gt;Code review\"] M --\u003e T[\"Tester\u0026lt;br/\u0026gt;Test writing\"] M --\u003e O[\"Other specialists\u0026lt;br/\u0026gt;16 total\"] P --\u003e |\"pass plan\"| D D --\u003e |\"pass design\"| T R --\u003e |\"apply feedback\"| M\nUsing subagents keeps each role\u0026rsquo;s context independent, so the main agent only handles orchestration — making complex projects manageable at scale.\nGit Worktrees — The Foundation of Parallel Work Normally you finish one task before starting the next. 
With git worktree, you can have multiple working directories simultaneously in the same project — like going from one desk to five desks running in parallel.\n# Create worktrees git worktree add ../project-feature-a feature-a git worktree add ../project-feature-b feature-b # Run independent Claude Code sessions in each cd ../project-feature-a \u0026amp;\u0026amp; claude cd ../project-feature-b \u0026amp;\u0026amp; claude Run Claude separately in each directory and five agents develop different features simultaneously. Non-conflicting features develop in parallel and merge to main when complete.\nHooks — Automated Learning System Claude Code\u0026rsquo;s Hook feature works like an alarm clock — commands that run automatically at specific trigger points.\nHook Trigger Use cases session_start On new conversation Auto-load past records, load TODO.md pre_compact Before context compression Save important content to MEMORY.md first stop On conversation end Auto-record what was learned this session Combining these three creates a system where Claude remembers what it learns even after conversations end. It eliminates the manual effort of configuring context every time and lets Claude gradually \u0026ldquo;know\u0026rdquo; more about the project.\nSecurity Notes Warnings from Afan Mustafa not to skip:\nDon\u0026rsquo;t activate too many MCPs — context space shrinks significantly Don\u0026rsquo;t rely solely on auto-compression — critical context can disappear Take security seriously — when Claude reads external data, malicious instructions can be hidden inside. This is Prompt Injection, and Afan\u0026rsquo;s guide includes security tools that automatically detect it. 
Quick Links Meta engineer\u0026rsquo;s complete Claude Code guide — practical edition — Context management, TDD workflow, WAT framework, Cross-AI critique 10 Claude Code tips — Anthropic hackathon winner — Progressive Disclosure, system prompt diet, subagents, Git Worktrees, Hooks Insights The throughline of both videos is: \u0026ldquo;Claude Code is not a tool — it\u0026rsquo;s a system.\u0026rdquo; Beyond writing good prompts, you need to build a development system that encompasses knowledge management (CLAUDE.md, MEMORY.md), session design (Plan-Implement separation, /clear), tool optimization (MCP Diet), and automation (Hooks, subagents).\nWhat\u0026rsquo;s particularly striking is that both videos start from different places yet arrive at the same conclusions. The Meta engineer is coming from a large team environment; Afan Mustafa from solo hackathon projects — yet both rank context efficiency and task unit separation as their top priorities. This is a natural convergence driven by the physical constraint of Claude Code\u0026rsquo;s context window.\nIf I had to prioritize: first, clean up CLAUDE.md and split it by folder. Then, make Plan mode a habit. Next, use TODO.md to maintain continuity across sessions. Finally, extend automation with subagents and Hooks. Don\u0026rsquo;t try to apply everything at once — weave each step into your workflow one at a time.\n","date":"2026-03-19T00:00:00+09:00","image":"/images/posts/2026-03-19-claude-code-practical-guide/cover-en.jpg","permalink":"/posts/2026-03-19-claude-code-practical-guide/","title":"Claude Code Practical Guide — Context Management to Workflow Patterns"},{"content":"Overview Anthropic has announced a major update to Claude Code Skills. The most prominent change is the introduction of a built-in benchmarking system. 
You can now quantify whether a skill actually improves output quality through A/B testing, and Skill Creator V2 automates the entire lifecycle from test case generation through iterative improvement. New frontmatter options also provide fine-grained control over how skills execute.\nTwo Skill Categories: Capability Uplift vs. Inquiry Preference Anthropic has formally divided skills into two categories.\nCapability Uplift Skills Skills that enable the model to do something it fundamentally cannot do on its own. Specific API call patterns and external tool integrations fall here. This type of skill may become unnecessary as the model improves — once the model absorbs the capability itself, the skill is redundant.\nInquiry Preference Skills Skills that enforce a user\u0026rsquo;s specific workflow or preferences. Examples: \u0026ldquo;always respond in Korean,\u0026rdquo; \u0026ldquo;follow the security checklist on every PR review.\u0026rdquo; This type will never be deprecated, because it captures requirements that are inherently user-specific, regardless of how powerful the model becomes.\nflowchart TD A[\"Claude Code Skill\"] --\u003e B[\"Capability Uplift\"] A --\u003e C[\"Inquiry Preference\"] B --\u003e D[\"Enables functionality model can't do\"] D --\u003e E[\"May deprecate as model improves\"] C --\u003e F[\"Enforces user workflow\"] F --\u003e G[\"Never deprecated — user-specific requirement\"] style B fill:#f9a825,stroke:#f57f17,color:#000 style C fill:#42a5f5,stroke:#1565c0,color:#000 style E fill:#ef5350,stroke:#c62828,color:#fff style G fill:#66bb6a,stroke:#2e7d32,color:#000This classification matters because of the benchmarking system described next. 
Capability Uplift skills can be retired based on benchmark results when the model has absorbed the underlying capability.\nBenchmarking System: Proving a Skill\u0026rsquo;s Value with Data This is V2\u0026rsquo;s flagship feature — a built-in evaluation system that quantitatively measures whether a skill actually improves output quality.\nHow It Works flowchart LR subgraph eval[\"A/B Test Execution\"] direction TB A1[\"With skill\"] --\u003e R1[\"Result A\"] A2[\"Without skill\"] --\u003e R2[\"Result B\"] end subgraph judge[\"Score Comparison\"] direction TB R1 --\u003e SC[\"Score by evaluation criteria\"] R2 --\u003e SC SC --\u003e V{\"Score difference?\"} V --\u003e|\"Meaningful difference\"| KEEP[\"Keep skill\"] V --\u003e|\"Similar scores\"| DROP[\"Skill unnecessary — model already has it\"] end eval --\u003e judge style KEEP fill:#66bb6a,stroke:#2e7d32,color:#000 style DROP fill:#ef5350,stroke:#c62828,color:#fffMulti-agent support allows A/B tests to run simultaneously. One agent with the skill and one without perform the same task, and results are compared against evaluation criteria.\nExample Auto-Generated Evaluation Criteria Seven criteria Skill Creator automatically generated for a social media post generation skill:\n# Criteria Description 1 Platform coverage Was a post generated for every specified platform? 2 Language match Was it written in the requested language? 3 X character limit Does the X (Twitter) post respect the character limit? 4 Hashtags Were appropriate hashtags included? 5 Factual content Is the content factually consistent with the source material? 6 Tone differentiation Is the tone appropriately differentiated per platform? 7 Tone compliance Does it follow the specified tone guidelines? If scores differ meaningfully with and without the skill, the skill has value. 
If scores are similar, the model already has the capability and the skill is unnecessary.\nSkill Creator V2: Automate the Full Lifecycle With Skill Creator upgraded to V2, it goes beyond simple generation to automate the entire skill lifecycle.\nInstallation and Usage Run /plugin Search for \u0026ldquo;skill creator skill\u0026rdquo; and install Describe the desired skill in natural language Automatic: skill generation → test case generation → benchmark execution → result review The Automated Loop flowchart TD START[\"User: describe desired skill\"] --\u003e CREATE[\"Skill Creator generates skill\"] CREATE --\u003e EVAL[\"Auto-generate test cases\"] EVAL --\u003e BENCH[\"Run benchmark \u0026lt;br/\u0026gt; with skill vs without skill\"] BENCH --\u003e REVIEW{\"User satisfied?\"} REVIEW --\u003e|\"No\"| IMPROVE[\"Improve based on feedback\"] IMPROVE --\u003e EVAL REVIEW --\u003e|\"Yes\"| DONE[\"Skill complete\"] style START fill:#42a5f5,stroke:#1565c0,color:#000 style DONE fill:#66bb6a,stroke:#2e7d32,color:#000 style BENCH fill:#f9a825,stroke:#f57f17,color:#000Improving existing skills is also supported. Hand an existing skill to Skill Creator and it benchmarks current performance, identifies areas for improvement, and optimizes iteratively.\nBuilt-in progressive disclosure guidance walks users through skill creation step by step, making it accessible even for those without prior skill-writing experience.\nImproved Implicit Triggering Previous versions had reliability issues with implicit triggers (auto-execution without a slash command). V2 has the Skill Creator perform description optimization alongside skill generation, significantly improving implicit triggering accuracy. 
The skill\u0026rsquo;s description is automatically refined to communicate more clearly to the model when to invoke it.\nNew Frontmatter Options New frontmatter options in V2 enable fine-grained control over skill behavior.\nOption Description user_invocable: false Only the model can trigger it; users cannot invoke it directly user_enable: false Users cannot invoke it via slash command allow_tools Restrict which tools the skill can use model Specify the model to run the skill with context: fork Run the skill in a sub-agent agents Define sub-agents (requires context: fork) hooks Define per-skill hooks in YAML format The context: fork + agents combination is particularly interesting. It delegates skill execution to a separate sub-agent, so the skill works independently without contaminating the main context. The benchmarking system\u0026rsquo;s multi-agent A/B test also runs on this foundation.\nuser_invocable: false is useful for creating \u0026ldquo;background skills\u0026rdquo; that aren\u0026rsquo;t exposed to users and are invoked internally by the model based on its own judgment.\nQuick Links Claude Skills V2 update video Claude Code official docs Anthropic official site Insights The core of this V2 update is that the effectiveness of a skill can now be measured objectively.\nUntil now, skills operated on the assumption that \u0026ldquo;adding a skill will make things better.\u0026rdquo; With built-in benchmarking, you can finally determine with data whether a skill actually improves output quality, or whether you\u0026rsquo;re adding unnecessary prompt overhead on top of something the model already handles well.\nThe Capability Uplift vs. Inquiry Preference classification is equally practical. 
Instead of treating all skills identically, it provides a framework for distinguishing skills that should naturally be retired as the model advances from skills that should be maintained permanently.\nSkill Creator V2 automating the generation-evaluation-improvement loop dramatically lowers the barrier to entry. Skill writing used to be squarely in the domain of prompt engineering. Now you just describe what you want, and an optimized, benchmark-validated skill comes out the other end. The skill ecosystem is set to grow rapidly in both quantity and quality.\n","date":"2026-03-19T00:00:00+09:00","image":"/images/posts/2026-03-19-claude-skills-v2/cover-en.jpg","permalink":"/posts/2026-03-19-claude-skills-v2/","title":"Claude Skills V2 — A Skill System Evolved with Benchmarking and Automated Evaluation"},{"content":"Overview Previous post: Harness — Turning Claude Code from a Generic AI into a Dedicated Employee covered the concept of harness engineering and its core components — Skills, Agents, and Commands in Claude Code. This post looks at how to build and use a harness in practice with Antigravity, a free AI development tool from Google. The focus is on the Rules hierarchy, token efficiency of Skills, MCP integration, and the process of building a payment-enabled SaaS through vibe coding.\nAntigravity: The Harness in Action Antigravity is a free AI development tool from Google. It\u0026rsquo;s gaining attention as an alternative to paid tools like Cursor ($20/mo), GitHub Copilot ($10/mo), and Replit ($25/mo).\nThe core structure is an Agent Manager that controls an Editor and a Browser. This isn\u0026rsquo;t just code autocomplete — it\u0026rsquo;s an agent-first development approach. The agent makes plans, creates files, writes code, and self-corrects when errors occur.\nWhat\u0026rsquo;s particularly impressive is multi-model support. Beyond Gemini 3 Pro/Flash, you can choose Claude Sonnet 4.6/Opus 4.6 and GPT OS. 
Using Anthropic and OpenAI models inside a Google tool is significant from a harness perspective — the same Rules and Skills structure works with different models, letting you find the optimal combination by swapping models.\nflowchart LR subgraph Antigravity AM[\"Agent Manager\"] ED[\"Editor\"] BR[\"Browser Subagent\"] end subgraph Models G3P[\"Gemini 3 Pro\"] G3F[\"Gemini 3 Flash\"] CS[\"Claude Sonnet 4.6\"] CO[\"Claude Opus 4.6\"] GPT[\"GPT OS\"] end subgraph Harness[\"Harness Components\"] R[\"Rules \u0026lt;br/\u0026gt; Global + Workspace + Inline\"] S[\"Skills \u0026lt;br/\u0026gt; Progressive Disclosure\"] M[\"MCP \u0026lt;br/\u0026gt; 35+ Services\"] end AM --\u003e ED AM --\u003e BR Models --\u003e AM Harness --\u003e AMHarness Components in Practice The essence of harness engineering is designing the control structure and work environment before the AI starts. Like a horse\u0026rsquo;s harness — not constraining power but directing it — and like a test harness wrapping the execution environment for control.\nThe flow when an agent activates in Antigravity:\nflowchart TD A[\"Load Global Rules\"] --\u003e B[\"Load Workspace Rules\"] B --\u003e C[\"Load Skills \u0026lt;br/\u0026gt; YAML frontmatter only\"] C --\u003e D[\"Connect MCPs\"] D --\u003e E[\"Agent begins work\"] style A fill:#4a9eff,color:#fff style B fill:#4a9eff,color:#fff style C fill:#f5a623,color:#fff style D fill:#7b61ff,color:#fff style E fill:#4caf50,color:#fffRules: Three-Layer Hierarchy Antigravity\u0026rsquo;s Rules are organized into three layers.\nLayer Location Purpose Global Rules .gemini/gemini.md Rules applied across all projects Workspace Rules .agents/rules/ or .agent/rules/ Per-project rules Inline Rules Directly in agent chat Immediate reminders Global Rules share a path with the Gemini CLI (.gemini/), meaning rules set in Antigravity apply equally in the Gemini CLI. 
Usage quotas are tracked separately, but harness configuration is unified.\nActivation Mode: When Rules Fire Rules have four activation modes:\nAlways-on — always applied Model Decision — applied when the model judges it necessary Glob (File Pattern Matching) — applied based on file extension patterns Manual — only applied when explicitly mentioned Glob patterns are particularly practical. For example, a rule such as \u0026ldquo;use uv virtual environments\u0026rdquo; can be applied automatically whenever you work with *.py files. This is useful in projects that mix Python and TypeScript, enforcing different conventions by file type.\nSkills and MCP: The Token Efficiency Gap Progressive Disclosure: The Core Design of Skills Antigravity\u0026rsquo;s Skills use Progressive Disclosure. Initially only the YAML frontmatter (description) is loaded. The full content is only read when the agent determines that particular skill is needed.\nThis design creates a decisive difference from MCP. An MCP like Context7 loads a large volume of context at connection time. Skills consume only as much context as needed, when it\u0026rsquo;s needed. In token-constrained environments, this difference is significant.\nflowchart LR subgraph Skills[\"Skills approach\"] S1[\"Load description only \u0026lt;br/\u0026gt; a few tokens\"] --\u003e S2{\"Skill \u0026lt;br/\u0026gt; needed?\"} S2 --\u003e|Yes| S3[\"Load full content \u0026lt;br/\u0026gt; only what's needed\"] S2 --\u003e|No| S4[\"Skip \u0026lt;br/\u0026gt; tokens saved\"] end subgraph MCP[\"MCP approach\"] M1[\"Full load at connect \u0026lt;br/\u0026gt; large token cost\"] --\u003e M2[\"Always in \u0026lt;br/\u0026gt; context\"] end style S1 fill:#4caf50,color:#fff style S3 fill:#4caf50,color:#fff style S4 fill:#4caf50,color:#fff style M1 fill:#f44336,color:#fff style M2 fill:#f44336,color:#fff\nSkill Creator and Official Skill Installation Antigravity includes a built-in Skill Creator for creating and iteratively improving skills. 
You can also fetch and install Anthropic\u0026rsquo;s official skills from GitHub.\nTo apply a skill globally, drag it into the .gemini/skills/ folder. Without Git, download as ZIP and place it manually.\nMCP: Connecting External Services MCP (Model Context Protocol) connects 35+ external services to the agent — databases, APIs, GitHub, and more. Configure an agent workflow and you can automate everything from data collection to report generation and dashboard construction.\nThe key to harness design is combining Skills and MCP appropriately. Frequently used patterns go in Skills; external service integrations go in MCP. This achieves both token efficiency and functionality.\nVibe Coding All the Way to SaaS What Is Vibe Coding? Vibe coding is a concept Andrej Karpathy proposed in February 2025. Rather than writing code line by line, you describe the desired outcome to AI, which generates the code. The developer\u0026rsquo;s role shifts to setting direction and validating results.\nIn Antigravity, vibe coding means the agent handles the full cycle: plan → create files → write code → self-fix errors. The Browser Subagent controls Chrome directly, automating UI testing and debugging.\nFour Projects, Increasing Complexity The four projects introduced in the referenced video naturally escalate in difficulty:\nProject Difficulty Key elements LinkInBio Beginner Static page, basic layout Reading Tracker App Introductory CRUD, data persistence AI SNS Post Generator Intermediate AI API integration, content generation AI Background Removal SaaS Advanced Payment (TossPayments), admin dashboard, MRR tracking The final SaaS project is the impressive one. A production-level service including TossPayments payment integration, admin dashboard, and MRR (Monthly Recurring Revenue) tracking — all through vibe coding.\nDebugging Framework Errors happen even in vibe coding. 
The framework presented is concise:\nRead the error message — understand what failed Reproduce it — confirm the error under the same conditions Pass it to AI with context — bundle the error log, related code, and reproduction conditions together Debugging is ultimately part of the harness too. Structuring error context well and passing it clearly is itself a control structure that guides the AI in the right direction.\nQuick Links Harness Engineering — Applying Anthropic Claude Skills in Antigravity — Antigravity harness structure, Rules/Skills/MCP in practice Building SaaS and Payment Systems without Coding — Antigravity — Vibe coding, 4 project examples, TossPayments integration Previous post: Harness — Turning Claude Code from a Generic AI into a Dedicated Employee — Harness engineering concept and core components Insights In the previous post, I defined harness as \u0026ldquo;the control structure that transforms AI from generic to specialized.\u0026rdquo; Looking at Antigravity, I see that concept converging into a pattern beyond any single tool.\nClaude Code\u0026rsquo;s CLAUDE.md and Antigravity\u0026rsquo;s .gemini/gemini.md serve the same role with different names. Skills\u0026rsquo; Progressive Disclosure shares the exact same design philosophy as the Claude Code skill system. The tools differ, but the harness components — Rules, Skills, MCP — map almost 1:1.\nWhat stands out is token efficiency. MCP is convenient, but it consumes a large amount of context at connection time. Skills\u0026rsquo; Progressive Disclosure solves this problem elegantly. When designing a harness, the first question shouldn\u0026rsquo;t be \u0026ldquo;what do I put in context?\u0026rdquo; but \u0026ldquo;when do I put it in context?\u0026rdquo;\nThe fact that vibe coding can produce SaaS is a signal that the bottleneck in development is shifting from coding ability to harness design ability. 
What Rules to set, what Skills to prepare, how to structure error context — these decisions determine output quality.\n","date":"2026-03-19T00:00:00+09:00","image":"/images/posts/2026-03-19-harness-antigravity/cover-en.jpg","permalink":"/posts/2026-03-19-harness-antigravity/","title":"Harness Engineering #2 — Building Real Harnesses with Antigravity"},{"content":"Overview The technology for transferring one image\u0026rsquo;s style onto another has evolved at remarkable speed since Gatys et al.\u0026rsquo;s 2015 paper. What started as a slow, VGG19-based optimization loop has grown into real-time Stable Diffusion style transfer and, now, pose-driven virtual human video generation. This post surveys three open-source projects representing each era and traces the direction the technology has taken.\nThe three projects take completely different approaches. nazianafis/Neural-Style-Transfer is the classic optimization-based method — great for understanding the fundamentals. philz1337x/style-transfer leverages the Stable Diffusion ecosystem for dramatically faster and higher-quality results. And Tencent Music\u0026rsquo;s TMElyralab/MusePose extends the concept of style transfer into pose and motion, turning a still image into a dancing video.\nThe Spectrum of Three Approaches The diagram below shows how the three techniques differ along key axes.\nflowchart LR A[\"Classic NST\u0026lt;br/\u0026gt;VGG19 + L-BFGS\u0026lt;br/\u0026gt;(optimization-based)\"] B[\"Modern style transfer\u0026lt;br/\u0026gt;SD + ControlNet\u0026lt;br/\u0026gt;+ IP-Adapter\"] C[\"Pose-driven video\u0026lt;br/\u0026gt;Diffusion + Pose Align\u0026lt;br/\u0026gt;(video generation)\"] A --\u003e|\"speed + quality gains\"| B B --\u003e|\"add time dimension\"| C style A fill:#f0e6ff,stroke:#9b59b6 style B fill:#e6f0ff,stroke:#2980b9 style C fill:#e6fff0,stroke:#27ae60 Classic NST: An optimization process that runs hundreds of backpropagation steps on a single pair of images. 
The principles are transparent and the implementation is simple, but it\u0026rsquo;s slow. Modern style transfer: Uses Stable Diffusion\u0026rsquo;s latent space to separate structure preservation (ControlNet Canny) from style injection (IP-Adapter). Speed and quality improve dramatically. Pose-driven video generation: Extends the concept of \u0026ldquo;style\u0026rdquo; into pose and motion. The visual appearance of a reference image is preserved while the movement from a target dance video is applied. 1. nazianafis/Neural-Style-Transfer — The Right Starting Point for Understanding the Principles A Classic Gatys Implementation nazianafis/Neural-Style-Transfer (59 stars) is an educational PyTorch + VGG19 implementation of Gatys et al.\u0026rsquo;s 2015 paper \u0026ldquo;A Neural Algorithm of Artistic Style.\u0026rdquo; The code is concise, and the role of each loss function is visible directly in the code — making it an ideal reference for anyone learning Neural Style Transfer from first principles.\nThe core idea: take one content image and one style image, and directly optimize the output image to minimize three loss functions. Neural network weights are frozen — the pixel values themselves are what\u0026rsquo;s being updated.\nLoss Function Structure Three losses combine to guide the optimization.\nContent Loss: L2 distance between feature maps at the conv4_2 layer. Preserves structure and layout. Style Loss: Gram matrix differences across five layers (conv1_1 through conv5_1). The Gram matrix captures correlations between feature map channels, encoding texture and style. Total Variation Loss: Sum of differences between adjacent pixels. Suppresses noise and smooths the result. 
# Gram matrix calculation def gram_matrix(feature_map): b, c, h, w = feature_map.size() features = feature_map.view(b * c, h * w) gram = torch.mm(features, features.t()) return gram.div(b * c * h * w) # Total loss total_loss = alpha * content_loss + beta * style_loss + gamma * tv_loss The optimizer is L-BFGS, a quasi-Newton method using second-order derivative approximations that converges faster than Adam. The downside: memory usage grows sharply with resolution, and each image pair requires hundreds of forward/backward passes. Better as an experiment for understanding how Gram matrices encode style and how VGG layer depth affects the information captured, rather than for practical use.\n2. philz1337x/style-transfer — Practical Style Transfer with Stable Diffusion ControlNet + IP-Adapter Combination philz1337x/style-transfer (55 stars) solves the speed problem of classic NST by moving to the Stable Diffusion ecosystem. The approach combines two components: ControlNet Canny preserves edge structure from the content image, while IP-Adapter injects the visual characteristics of the style image into the diffusion process.\nControlNet Canny: Extracts a Canny edge map from the content image and uses it as a guide signal during denoising. This preserves the outlines and structure of the original image in the output. IP-Adapter (Image Prompt Adapter): Encodes the style image with a CLIP image encoder, then injects it into the UNet via cross-attention. The image itself serves as the style guide — no text prompt needed. 
Using both together provides a clean separation: \u0026ldquo;structure from the content image, color and texture from the style image.\u0026rdquo; The manual weight-tuning that classic NST required becomes much more intuitive.\nDeployment Two ways to run it:\nCog (Replicate) method: Uses cog, a Docker-based packaging tool, to deploy to Replicate or run locally in a container.\n# Local run cog predict -i image=@content.jpg -i style_image=@style.jpg # Replicate API curl -X POST https://api.replicate.com/v1/predictions \\ -H \u0026#34;Authorization: Token $REPLICATE_API_TOKEN\u0026#34; \\ -d \u0026#39;{\u0026#34;version\u0026#34;: \u0026#34;...\u0026#34;, \u0026#34;input\u0026#34;: {\u0026#34;image\u0026#34;: \u0026#34;...\u0026#34;, \u0026#34;style_image\u0026#34;: \u0026#34;...\u0026#34;}}\u0026#39; A1111 WebUI method: Install the ControlNet extension and IP-Adapter in AUTOMATIC1111\u0026rsquo;s Stable Diffusion Web UI for a GUI-based pipeline. The developer also runs a paid version at ClarityAI.cc with additional features like upscaling.\nCompared to classic NST, quality is higher and speed is much faster. The difference is especially pronounced for artistic styles (watercolor, oil painting, etc.) over photorealistic ones. The base model\u0026rsquo;s pre-training on vast image-text pairs gives it far richer texture and color representation than VGG19-based Gram matrices.\n3. TMElyralab/MusePose — Pose-Driven Virtual Human Video Generation A Practical Implementation of AnimateAnyone TMElyralab/MusePose (2,659 stars) is a pose-driven image-to-video framework developed by Tencent Music Entertainment\u0026rsquo;s Lyra Lab. 
It\u0026rsquo;s an optimized version of Moore-AnimateAnyone — itself a practical implementation of Alibaba\u0026rsquo;s AnimateAnyone paper — and handles pose animation in the Muse series (MuseV, MuseTalk, MusePose).\nThe goal is simple: take a single reference image of a person and a dance video, and generate a video of that person performing the dance. The reference image provides the appearance (clothing, face); the guide video provides the motion and pose.\nMusePose Pipeline flowchart TD A[\"Reference image\u0026lt;br/\u0026gt;(one person photo)\"] B[\"Guide dance video\u0026lt;br/\u0026gt;(DWPose extracted)\"] C[\"pose_align algorithm\u0026lt;br/\u0026gt;(scale + position alignment)\"] D[\"ReferenceNet\u0026lt;br/\u0026gt;(SD Image Variations)\"] E[\"Denoising UNet\u0026lt;br/\u0026gt;(Temporal Attention)\"] F[\"VAE Decoder\"] G[\"Generated video\"] A --\u003e D A --\u003e C B --\u003e C C --\u003e|\"aligned pose sequence\"| E D --\u003e|\"appearance features\"| E E --\u003e F F --\u003e G style A fill:#fff3e0,stroke:#e67e22 style B fill:#fff3e0,stroke:#e67e22 style G fill:#e8f5e9,stroke:#27ae60pose_align — The Core Contribution MusePose\u0026rsquo;s most important technical contribution is the pose_align algorithm. The person in the reference image and the person in the guide video will typically differ in height, build, and camera distance. Without alignment, the pose transfer looks awkward.\npose_align automatically aligns scale, position, and proportions based on DWPose keypoints from both figures. This preprocessing step is essential for output quality.\n# pose_align example python pose_align.py \\ --imgfn_refer reference_person.jpg \\ --vidfn_guide dance_video.mp4 \\ --outfn_align aligned_pose.mp4 Model Architecture ReferenceNet: Based on Stable Diffusion Image Variations. Encodes appearance features (clothing, face) from the reference image and feeds them to the UNet. 
Denoising UNet: A UNet with added Temporal Attention layers for maintaining consistency across frames over time. DWPose: A pose estimation model that extracts human body keypoints from each frame. More accurate than OpenPose. VAE: Decodes from latent space back to pixel space. Training code was released in March 2025, enabling fine-tuning on custom datasets. ComfyUI workflows are also supported. The project is actively used in entertainment applications like virtual fashion fitting and K-pop dance generation.\nComparing the Three Projects Item Neural-Style-Transfer style-transfer MusePose Foundation VGG19 + L-BFGS SD + ControlNet + IP-Adapter Diffusion + DWPose Output Image Image Video Speed Slow (minutes) Fast (seconds) Slow (scales with video length) Training needed No No (pretrained) No (pretrained) Best for Learning / experiments Practical style application Virtual humans, dance videos GPU requirements Low Medium High Classic NST can run on CPU without a GPU and is great for visualizing intermediate steps while learning the theory. For actual use, style-transfer has the best quality-to-barrier-of-entry ratio. MusePose produces the most impressive results but has correspondingly demanding infrastructure requirements.\nClosing Thoughts Looking at the three projects together, the evolution path of AI image generation technology comes into focus. What started as \u0026ldquo;transfer one image\u0026rsquo;s style onto another\u0026rdquo; has expanded into the time dimension — free control over a person\u0026rsquo;s motion and pose. The common thread is that all three exploit visual representations already learned by deep learning models. Classic NST uses feature representations from a classification model (VGG19); the modern approaches use the latent space of a generative model (Stable Diffusion).\nWith projects like MusePose open-sourced and training code available, the barrier to virtual human technology keeps dropping. 
Beyond simple dance generation, real-time avatar control and personalized virtual influencer creation are the logical next applications.\n","date":"2026-03-17T00:00:00+09:00","image":"/images/posts/2026-03-17-neural-style-transfer/cover-en.jpg","permalink":"/posts/2026-03-17-neural-style-transfer/","title":"From Neural Style Transfer to Virtual Humans — Three Approaches to AI Image Generation"},{"content":"Overview I added Google OAuth login to the hybrid image search demo app. The app previously had no authentication — every API endpoint was wide open. The image generation feature calls the Gemini API and incurs real costs, so leaving it unprotected wasn\u0026rsquo;t an option. For this task, I ran the full cycle through the Claude Code superpowers plugin workflow: writing the design spec, spec review, implementation planning, coding, and security review. The result: 17 commits, a complete login wall.\nAuthentication Architecture I went with Lightweight Custom Auth instead of a library. FastAPI-Users brings 15+ features I don\u0026rsquo;t need (password reset, email verification, etc.), and Authlib + Session Middleware uses server-side redirects that don\u0026rsquo;t fit a SPA architecture. 
Building it myself means I understand and can debug every line.\nCore stack:\nBackend: google-auth (Google ID token verification) + python-jose (JWT creation/verification) Frontend: @react-oauth/google (Google Sign-In popup button) Session: JWT stored in HttpOnly cookie (more XSS-resistant than localStorage) Auth Flow sequenceDiagram participant U as User participant F as React Frontend participant G as Google OAuth participant B as FastAPI Backend participant DB as SQLite U-\u003e\u003eF: Open app F-\u003e\u003eB: GET /api/auth/me B--\u003e\u003eF: 401 Unauthorized F-\u003e\u003eU: Show LoginPage U-\u003e\u003eF: Click Google Sign-In F-\u003e\u003eG: OAuth popup G--\u003e\u003eF: Return ID Token F-\u003e\u003eB: POST /api/auth/google\u0026lt;br/\u0026gt;{id_token} B-\u003e\u003eG: verify_oauth2_token() G--\u003e\u003eB: Claims (sub, email, name, picture) B-\u003e\u003eDB: get_or_create_user() DB--\u003e\u003eB: User object B-\u003e\u003eB: create_jwt(user_id) B--\u003e\u003eF: Set-Cookie: access_token=JWT\u0026lt;br/\u0026gt;(HttpOnly, SameSite=Lax) B--\u003e\u003eF: LoginResponse {user} F-\u003e\u003eU: Switch to main app Note over F,B: All subsequent requests F-\u003e\u003eB: API request\u0026lt;br/\u0026gt;(cookie attached automatically) B-\u003e\u003eB: get_current_user()\u0026lt;br/\u0026gt;JWT verification B--\u003e\u003eF: Protected dataDatabase Changes Adding the User Model The app previously had four tables — SearchLog, ImageSelection, GenerationLog, ManualUpload — all recording actions anonymously. 
I created a new User table and added a user_id FK column to all four.\nclass User(Base): __tablename__ = \u0026#34;users\u0026#34; id = Column(Integer, primary_key=True, autoincrement=True) google_id = Column(String, unique=True, nullable=False, index=True) email = Column(String, unique=True, nullable=False) name = Column(String, nullable=False) picture_url = Column(String, nullable=True) generation_count = Column(Integer, default=0, nullable=False) last_active_at = Column(DateTime, nullable=True) created_at = Column(DateTime, nullable=False, server_default=func.now()) I left existing data untouched. The FK columns are declared nullable=True so existing rows stay as user_id=NULL, and only new rows get a user_id filled by the auth middleware. One Alembic migration handled table creation and FK additions.\nBackend Implementation auth.py — Authentication Module All auth logic lives in backend/src/auth.py. Three core functions:\n1. Google token verification — verify_google_token()\nasync def verify_google_token(token: str) -\u0026gt; dict: try: # verify_oauth2_token is synchronous and may fetch Google\u0026#39;s public keys over the network idinfo = await asyncio.to_thread( id_token.verify_oauth2_token, token, google_requests.Request(), GOOGLE_CLIENT_ID ) if idinfo[\u0026#34;iss\u0026#34;] not in (\u0026#34;accounts.google.com\u0026#34;, \u0026#34;https://accounts.google.com\u0026#34;): raise ValueError(\u0026#34;Invalid issuer\u0026#34;) return idinfo except ValueError as e: raise HTTPException( status_code=status.HTTP_401_UNAUTHORIZED, detail=f\u0026#34;Invalid Google token: {e}\u0026#34;, ) The security review flagged this: verify_oauth2_token() is synchronous and may perform network I/O to fetch Google\u0026rsquo;s public keys. Calling it directly, without asyncio.to_thread(), would block the event loop.\n2. JWT cookie management — create_jwt() / set_auth_cookie()\nThe JWT carries only user_id and exp. 
Key cookie settings:\nHttpOnly — JavaScript can\u0026rsquo;t read the token, preventing XSS theft SameSite=Lax — CSRF protection (no extra CSRF token needed) Secure — Active in production (HTTPS) only, disabled for local development 3. FastAPI Dependency — get_current_user()\nasync def get_current_user(access_token: str = Cookie(None)): if not access_token: raise HTTPException(status_code=401, detail=\u0026#34;Not authenticated\u0026#34;) try: payload = jwt.decode(access_token, JWT_SECRET, algorithms=[\u0026#34;HS256\u0026#34;]) user_id = payload.get(\u0026#34;user_id\u0026#34;) except JWTError: raise HTTPException(status_code=401, detail=\u0026#34;Invalid token\u0026#34;) user = await get_user_by_id(user_id) if not user: raise HTTPException(status_code=401, detail=\u0026#34;User not found\u0026#34;) # Update last_active_at with throttling (once per minute) # total_seconds() covers the whole gap; .seconds alone drops the days component now = datetime.now(timezone.utc) if not user.last_active_at or (now - user.last_active_at).total_seconds() \u0026gt; 60: await update_last_active(user.id) return user Updating last_active_at on every request would put write pressure on SQLite, so it\u0026rsquo;s throttled to once per minute. I also created a get_optional_user() variant that returns None instead of 401, for the /api/auth/me endpoint.\nProtecting Endpoints I added user = Depends(get_current_user) to all 10 data-access endpoints. The image generation endpoint additionally calls increment_generation_count(user.id). All logging functions (log_search, log_image_selection, etc.) 
received a user_id parameter and now store it in the DB.\n# Protected (get_current_user required) POST /search, /search/simple, /search/hybrid, GET /search POST /api/generate-image, /api/log-selection, /api/upload-reference-image GET /api/history/generations, /api/images, /api/images/{image_id} # Unprotected (no auth required) GET /, /health, /api/info, /images/{filename} POST /api/auth/google, /api/auth/logout GET /api/auth/me Frontend Login Flow LoginPage Component I used @react-oauth/google\u0026rsquo;s \u0026lt;GoogleLogin\u0026gt; component for popup-based login. Rather than a redirect flow, a Google account selection in the popup returns an ID token directly via callback.\n// LoginPage.tsx import { GoogleLogin, GoogleOAuthProvider } from \u0026#39;@react-oauth/google\u0026#39;; function LoginPage({ onLogin }: { onLogin: (user: UserProfile) =\u0026gt; void }) { const handleSuccess = async (credentialResponse) =\u0026gt; { const response = await loginWithGoogle(credentialResponse.credential); onLogin(response.user); }; return ( \u0026lt;GoogleOAuthProvider clientId={import.meta.env.VITE_GOOGLE_CLIENT_ID}\u0026gt; \u0026lt;div className=\u0026#34;login-container\u0026#34;\u0026gt; \u0026lt;h1\u0026gt;Hybrid Image Search\u0026lt;/h1\u0026gt; \u0026lt;GoogleLogin onSuccess={handleSuccess} onError={() =\u0026gt; setError(\u0026#39;Login failed\u0026#39;)} /\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/GoogleOAuthProvider\u0026gt; ); } App.tsx Changes Auth state is managed at the app entry point:\nOn mount — call GET /api/auth/me. Success restores the existing session; 401 shows the login page. 
Conditional rendering — authLoading → spinner, !user → \u0026lt;LoginPage\u0026gt;, else → main UI Logout — top-right button, calls POST /api/auth/logout then clears state Data loading guard — if (!user) return; in useEffect prevents API calls before login // App.tsx (core logic) useEffect(() =\u0026gt; { if (!user) return; const loadHistory = async () =\u0026gt; { const items = await fetchGenerationHistory(20, 0); setGeneratedImages(mapHistoryItems(items)); }; loadHistory(); }, [user]); api.ts — Axios Configuration // withCredentials: true — browser automatically attaches cookie to requests const api = axios.create({ baseURL: API_BASE, withCredentials: true, }); // 401 interceptor — redirect to login on token expiry api.interceptors.response.use( (response) =\u0026gt; response, (error) =\u0026gt; { if (error.response?.status === 401) { window.dispatchEvent(new Event(\u0026#39;auth:logout\u0026#39;)); } return Promise.reject(error); } ); Security Review After implementation, I ran /ship for a security review. Key findings and fixes:\nItem Problem Fix Google token verification Sync function blocking event loop Wrap with asyncio.to_thread() JWT secret not set All auth fails silently on startup without a secret Log logger.warning in configure_auth() create_jwt() guard Signing attempted when JWT_SECRET=None Add guard raising RuntimeError Frontend styles Hardcoded inline styles Convert to Tailwind CSS classes History loading API calls attempted before login Add user dependency guard Secret management was also considered up front. GOOGLE_OAUTH_CLIENT_ID and JWT_SECRET are loaded via os.getenv(), not from YAML config files. YAML is version-controlled, so secrets don\u0026rsquo;t belong there. 
Only non-secret config like token expiry lives in config.py\u0026rsquo;s AuthConfig.\nDev Tools: /ship Command and PostToolUse Hooks I also set up project-specific dev tooling during this work.\nPostToolUse hook — automatic type checking on every file edit:\n.ts/.tsx files modified → tsc --noEmit runs automatically backend/*.py files modified → pyright runs automatically /ship command — six-step verification pipeline before each commit:\nIdentify changed files Type validation (tsc + pyright) API contract sync check (schemas.py ↔ api.ts) Code simplification review Security review Auto-commit One interesting debugging detour: the PostToolUse hook used $CLAUDE_FILE_PATH as an environment variable, but it didn\u0026rsquo;t work. Turns out hooks receive input via stdin JSON:\nINPUT=$(cat) FILEPATH=$(echo \u0026#34;$INPUT\u0026#34; | jq -r \u0026#39;.tool_input.file_path // empty\u0026#39;) Commit Log Message Key files docs: Google login design spec 2026-03-17-google-login-design.md docs: incorporate spec review feedback same docs: fix endpoint path and description consistency same docs: write implementation plan 2026-03-17-google-login.md feat: User model + user_id FK models.py, Alembic migration feat: google-auth, python-jose dependencies requirements.txt feat: @react-oauth/google dependency package.json feat: auth Pydantic schemas schemas.py feat: user CRUD and activity tracking service.py feat: auth module (token verification + JWT cookie) auth.py feat: AuthConfig config.py, default.yaml feat: LoginPage component LoginPage.tsx feat: auth API functions, 401 interceptor api.ts feat: auth state, login/logout flow App.tsx feat: auth endpoints + full route protection main.py, service.py fix: security guards, async token verification, UI auth.py, App.tsx feat: Google OAuth login wall complete final merge Insights HttpOnly cookie vs. localStorage — Many tutorials store JWTs in localStorage, but one XSS hit and the token is gone. 
HttpOnly cookies are completely inaccessible to JavaScript. When protecting a paid service like the Gemini API, this is the right choice. The implementation overhead over localStorage is basically just adding allow_credentials=True to the CORS config.\nDesign first, code later — This session followed the sequence: design spec → review → implementation plan → review → coding. It seems slower, but the spec review caught a missing get_optional_user() pattern, an inadequate secret-loading strategy, and endpoint list mismatches — all before a line of code was written. Much cheaper to fix at that stage.\nasyncio.to_thread() pattern — A common trap when using synchronous libraries in FastAPI. google.oauth2.id_token.verify_oauth2_token() may make an HTTP request internally. Calling it directly inside an async handler blocks the event loop until that request returns. Wrap it in asyncio.to_thread() to delegate to the thread pool.\nClaude Code /ship workflow — Running type check → API contract sync → code review → security review → auto-commit in one pass noticeably improves commit quality. Automatically verifying that schemas.py and api.ts changed together was especially useful. The ability to build custom hooks and commands per project is one of Claude Code\u0026rsquo;s real strengths.\n","date":"2026-03-17T00:00:00+09:00","image":"/images/posts/2026-03-17-hybrid-search-auth/cover-en.jpg","permalink":"/posts/2026-03-17-hybrid-search-auth/","title":"Hybrid Image Search Dev Log — Implementing the Google OAuth Login Wall"},{"content":"Overview You can\u0026rsquo;t develop troubleshooting instincts from books alone. It takes repeated practice: reading real logs, pinpointing exactly which line in a config file is wrong. Infratice delivers that experience directly in the browser. It\u0026rsquo;s a problem-based learning platform covering Kubernetes, Linux, Network, CI/CD, and Monitoring — presenting real-world incident scenarios as static logs and config files. 
You analyze the root cause, write up your findings, and get an AI-review prompt generated automatically.\nThe GitHub repo kiku99/Infratice is written in TypeScript and has 25 stars. The architecture is noteworthy: a static-content foundation on Next.js App Router, with all problem data managed as Markdown files — making contributions straightforward. Deployment is on Cloudflare Pages for fast global access.\nThe Problem-Solving Flow Infratice\u0026rsquo;s learning flow is simple yet closely mirrors real incident response. Pick a problem, read through the provided logs and config files, reason through the root cause, and write your analysis. When you\u0026rsquo;re done, an AI-review prompt is generated so you can get feedback from ChatGPT or Claude. Finally, check the model answer and compare it with your own.\nflowchart TD A[\"Choose a problem\u0026lt;br/\u0026gt;(browse by category)\"] B[\"Analyze logs / config files\"] C[\"Write your solution notes\"] D[\"Generate AI review prompt\"] E[\"Check the model answer\"] F[\"Try the next problem\"] A --\u003e B B --\u003e C C --\u003e D D --\u003e E E --\u003e F F --\u003e A style A fill:#3b82f6,color:#fff style D fill:#8b5cf6,color:#fff style E fill:#10b981,color:#fff The AI review prompt generation step is the key innovation. Instead of just showing the correct answer, Infratice composes a prompt based on your own write-up so an AI can give you targeted feedback. This makes the learning active — you discover what you got wrong before seeing the solution. The model answer comes after, which keeps the focus on your reasoning process.\nCategories and Example Problems The platform currently covers five categories: Linux, Kubernetes, Network, CI/CD, and Monitoring. 
Each problem is stored as a Markdown file at content/problems/{category}/{NNN}-{description}.md, so anyone can add new scenarios via a PR.\nTwo representative examples:\nKubernetes — ImagePullBackOff: Read Pod event logs and kubectl describe output to determine whether the failure is a typo in the image tag or a registry authentication issue. One of the most common incident types in real operations. CI/CD — GitHub Actions build failure: Analyze workflow.yml config and Actions logs to identify the cause — missing environment variable, cache conflict, runner version mismatch, and more. Both reproduce patterns that appear in real production environments. If you\u0026rsquo;ve been in ops for any length of time, you\u0026rsquo;ll recognize them immediately.\nTech Stack and Architecture Infratice runs on Next.js App Router with problem content managed as Markdown files. Code highlighting uses Shiki for readable rendering of logs and config files. Styling is Tailwind CSS v4, deployed on Cloudflare Pages.\nflowchart LR subgraph content [\"Content Layer\"] MD[\"Markdown Files\u0026lt;br/\u0026gt;(problems/)\"] end subgraph app [\"Next.js App Router\"] FS[\"File System\u0026lt;br/\u0026gt;Reader\"] SHIKI[\"Shiki\u0026lt;br/\u0026gt;Code Highlight\"] PROMPT[\"AI Prompt\u0026lt;br/\u0026gt;Generator\"] end subgraph deploy [\"Deploy\"] CF[\"Cloudflare Pages\"] end MD --\u003e FS FS --\u003e SHIKI FS --\u003e PROMPT SHIKI --\u003e CF PROMPT --\u003e CF style content fill:#1e293b,color:#94a3b8 style app fill:#1e3a5f,color:#93c5fd style deploy fill:#1a2e1a,color:#86efac The decision to separate content into Markdown is the right call. The Next.js app reads Markdown files at build time to generate static pages, so everything is served from Cloudflare\u0026rsquo;s edge network with no server. 
Adding a new problem means writing one Markdown file and opening a PR — no database, no API.\nShiki tokenizes on the server side, so accurate syntax highlighting is available without any client-side JavaScript. It\u0026rsquo;s well-suited for rendering structured text like log files and YAML configs legibly.\nOther Projects Worth Noting A few other repos that caught my eye:\nyoungwoocho02/unity-cli (57 stars, C#/Go) — A single Go binary for controlling the Unity Editor via CLI. Works standalone without MCP, ready to plug into build automation or CI pipelines. softaworks/agent-toolkit — A curated collection of skills for AI coding agents. A structured repository of reusable skills for tools like Claude Code and Cursor. alibaba/page-agent — An in-page GUI agent that controls web interfaces via natural language. Runs directly inside the browser, handling complex UI automation with plain language commands. Closing Thoughts Infratice is a rare platform that lets you directly train the skill of reading logs. The effort to close the gap between theory and hands-on practice is technically clean. The Markdown-based content model keeps contributions open, so if you\u0026rsquo;ve dealt with a memorable production incident, consider writing it up as a problem and contributing it.\nIf you want to get sharper at infrastructure troubleshooting, try working through problems at infratice.co.kr.\n","date":"2026-03-17T00:00:00+09:00","image":"/images/posts/2026-03-17-infratice-devops/cover-en.jpg","permalink":"/posts/2026-03-17-infratice-devops/","title":"Infratice — A Problem-Based DevOps Troubleshooting Platform Built on Real Incident Logs"},{"content":"Overview log-blog is a Python CLI tool that converts Chrome browsing history into Hugo blog posts. Today\u0026rsquo;s work split across two major threads. First, I improved AI chat URL classification and added Gemini share link extraction. 
Second, I built a new sessions command that parses Claude Code CLI session data to auto-generate development log posts. Across four sessions and roughly five hours, 13 commits landed.\nAI Chat Extraction Improvements — AI_LANDING Noise Filter Background When extracting AI service URLs from Chrome history, actual conversation pages and landing/login pages were mixed together. Across two Chrome profiles, 96 out of 3,575 URLs were AI service URLs — and most were noise: claude.ai/oauth/*, chatgpt.com/ (landing page), gemini.google.com/app (no conversation ID).\nDiagnosis:\nClaude: Most URLs were claude.ai/code/* (Claude Code sessions); claude.ai/chat/{uuid} conversation patterns: 0 ChatGPT: 1 conversation URL, the rest landing pages Gemini: gemini.google.com/app/{id} conversations matched, but gemini.google.com/share/{id} (share links) were missing Perplexity: No URLs in history at all Implementation I added AI_LANDING to the UrlType enum and restructured the classifier to run the noise filter before conversation pattern matching.\nclass UrlType(str, Enum): # ... existing types ... AI_LANDING = \u0026#34;ai_landing\u0026#34; # Noise: landing/OAuth/settings pages Sample noise patterns:\n_AI_NOISE_PATTERNS = [ re.compile(r\u0026#34;claude\\.ai/(?:oauth|chrome|code(?:/(?:onboarding|family))?)?(?:[?#]|$)\u0026#34;), re.compile(r\u0026#34;chatgpt\\.com/?(?:[?#]|$)\u0026#34;), re.compile(r\u0026#34;gemini\\.google\\.com/(?:app)?(?:/download)?(?:[?#]|$)\u0026#34;), # ... 
] In content_fetcher.py, AI_LANDING URLs now get an early-return skip with no fetch attempt — no wasting Playwright slots on login walls.\nI also added url_type to the extract --json output, so the skill\u0026rsquo;s Step 2 classification uses the same regex engine instead of having Claude guess the type.\nResult: 34 AI chat conversations correctly classified, 32 noise URLs filtered out.\nGemini Share Link Support Added the gemini.google.com/share/{id} pattern to the Gemini classification regex, and implemented a dedicated _extract_gemini_share() extractor in ai_chat_fetcher.py. Share links are publicly accessible, so they\u0026rsquo;re handled with standard Playwright — no CDP connection needed.\nYouTube Fetcher Fix — Adapting to a Breaking API Change Background While writing a blog post, YouTube transcript fetching failed:\nAttributeError: type object \u0026#39;YouTubeTranscriptApi\u0026#39; has no attribute \u0026#39;list_transcripts\u0026#39; The youtube-transcript-api library shipped a v1.x update that changed class methods to instance methods.\nv0.x (old) v1.x (new) YouTubeTranscriptApi.list_transcripts(video_id) YouTubeTranscriptApi().list(video_id) YouTubeTranscriptApi.get_transcript(video_id) YouTubeTranscriptApi().fetch(video_id) Implementation I rewrote youtube_fetcher.py:\ndef _get_transcript(video_id: str): from youtube_transcript_api import YouTubeTranscriptApi api = YouTubeTranscriptApi() try: return api.fetch(video_id, languages=[\u0026#34;ko\u0026#34;, \u0026#34;en\u0026#34;]) except Exception: pass try: transcript_list = api.list(video_id) for transcript in transcript_list: try: return transcript.fetch() except Exception: continue except Exception: pass return None I also added the YouTube oEmbed API as a fallback to fetch video metadata (title, channel name, thumbnail) even when no transcript is available. 
Zero dependencies — just urllib.request:\n_OEMBED_URL = \u0026#34;https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v={video_id}\u0026amp;format=json\u0026#34; Three-tier fallback:\nTranscript + oEmbed metadata (best) oEmbed metadata only (when transcript unavailable) Playwright scraping (when everything else fails) Sessions Command — Extracting Dev Logs from Claude Code Sessions Background I run 20–40 Claude Code CLI sessions per day across multiple projects (GitHub + Bitbucket). Those sessions contain rich development narrative — debugging processes, architecture decisions, code changes — but there was no way to turn them into blog posts. The Chrome history pipeline tells me \u0026ldquo;what I looked at\u0026rdquo; but not \u0026ldquo;what I built.\u0026rdquo;\nData Flow flowchart TD A[\"~/.claude/projects/\u0026lt;br/\u0026gt;*.jsonl session files\"] --\u003e B[\"session_parser.py\u0026lt;br/\u0026gt;JSONL parsing + filtering\"] C[\"git log\u0026lt;br/\u0026gt;commits per project\"] --\u003e B B --\u003e D[\"sessions CLI command\u0026lt;br/\u0026gt;--project --json\"] D --\u003e E[\"Structured JSON output\"] E --\u003e F[\"Claude Code Skill\u0026lt;br/\u0026gt;Dev Log Mode\"] F --\u003e G[\"Hugo blog post\u0026lt;br/\u0026gt;narrative dev log\"] G --\u003e H[\"log-blog publish\"] Automatic Project Discovery Claude Code stores session files under ~/.claude/projects/ in directories named with the project path encoded as a string:\n-Users-lsr-Documents-github-trading-agent/ ├── f08f2420-0442-475f-a1f8-3691da54eb9d.jsonl ├── 30de43c5-8bc2-48d0-86df-c1a6a3f7f6ee.jsonl └── ... The problem: directory names can contain hyphens. 
For a repo named hybrid-image-search-demo, it\u0026rsquo;s impossible to tell from the directory name alone which hyphens are path separators and which are part of directory names.\nI solved this with a greedy filesystem matching algorithm:\ndef _reverse_map_path(dirname: str) -\u0026gt; Path | None: # Strip worktree suffix if present if _WORKTREE_SEPARATOR in dirname: dirname = dirname.split(_WORKTREE_SEPARATOR)[0] raw = \u0026#34;/\u0026#34; + dirname[1:] # leading \u0026#39;-\u0026#39; → \u0026#39;/\u0026#39; segments = raw.split(\u0026#34;-\u0026#34;) result_parts: list[str] = [] i = 0 while i \u0026lt; len(segments): matched = False for j in range(len(segments), i, -1): candidate = \u0026#34;-\u0026#34;.join(segments[i:j]) test_path = \u0026#34;/\u0026#34;.join(result_parts + [candidate]) if os.path.exists(test_path): result_parts.append(candidate) i = j matched = True break if not matched: result_parts.append(segments[i]) i += 1 path = Path(\u0026#34;/\u0026#34;.join(result_parts)) return path if path.exists() else None By trying the longest possible match first, directories with hyphens like /Users/lsr/Documents/bitbucket/hybrid-image-search-demo are resolved correctly.\nJSONL Parsing — Smart Filtering Claude Code\u0026rsquo;s JSONL files contain many message types: user, assistant, system, progress, and more. Including everything produces too much noise; I need to extract what matters.\nMessage type Include? What to extract User text Yes Full text (narrative backbone) Assistant text Yes Up to 1,500 chars (decisions/explanations) Edit/Write tool calls Yes File path + diff content Bash errors Yes Command + stderr Bash success Summary only Command only WebFetch/WebSearch Summary only URL/query only Agent subtasks Summary only Delegation description + result summary Read/Grep/Glob No Exploration noise thinking blocks No Internal reasoning, noise Default exclusions: sessions under 2 minutes or with fewer than 3 messages (override with --include-short). 
Max 100 items per session.\nCLI Usage # List available projects uv run log-blog sessions --list # Detailed session data for a specific project (JSON) uv run log-blog sessions --project log-blog --all --json # All data including short sessions uv run log-blog sessions --all --include-short --json The output JSON contains three key datasets — sessions, git_commits, and files_changed — which the Claude Code skill\u0026rsquo;s \u0026ldquo;Dev Log Mode\u0026rdquo; reads to write a narrative development log post.\nSkill Update — Adding Dev Log Mode I added a \u0026ldquo;Dev Log Mode\u0026rdquo; section to SKILL.md. When a user says \u0026ldquo;summarize what I did today\u0026rdquo; or \u0026ldquo;write a dev log,\u0026rdquo; the skill now branches to the session-data flow instead of the Chrome history flow.\nComparing the two modes:\nItem Chrome History Mode Dev Log Mode Data source Chrome SQLite DB Claude Code JSONL + git log Content nature \u0026ldquo;What I looked at\u0026rdquo; \u0026ldquo;What I built\u0026rdquo; Post style Topic-based technical analysis Problem → solution narrative Fetching needed Yes (Playwright/API per URL) No (included in session data) Commit Log Message Changed files docs: add design spec for AI chat extraction improvement specs docs: fix stale references in AI chat extraction spec specs docs: add implementation plan for AI chat extraction improvement plans chore: add pytest dev dependency pyproject.toml, uv.lock feat: add AI_LANDING noise filter and Gemini share link support url_classifier.py, tests feat: add url_type to extract --json and filter AI_LANDING noise cli.py, tests feat: skip AI_LANDING URLs in content fetcher content_fetcher.py feat: add Gemini share link content extraction ai_chat_fetcher.py docs: update skill to use url_type from extract output SKILL.md docs: add session-to-devlog feature design spec specs docs: update session-devlog spec with review fixes specs docs: add session-devlog implementation plan plans feat: 
add sessions command for Claude Code dev log extraction cli.py, config.py, session_parser.py Insights Two separate threads converged on the same goal today. Improving AI chat URL classification captures \u0026ldquo;what I looked at externally\u0026rdquo; more accurately; the sessions command captures \u0026ldquo;what I built internally.\u0026rdquo; Together they move log-blog from a \u0026ldquo;browsing log tool\u0026rdquo; to a foundation for recording the full scope of development activity.\nThe greedy filesystem matching algorithm is simple but effective. Reverse-mapping hyphenated directory names can\u0026rsquo;t be solved with regex alone — checking the actual filesystem is the most reliable approach. The key insight is accepting that Claude Code\u0026rsquo;s project directory encoding is lossy and validating at runtime instead.\nThe youtube-transcript-api v1.x breaking change was a reminder of why dependency management matters. Adding oEmbed as a fallback reflects graceful degradation — \u0026ldquo;if we can\u0026rsquo;t get the transcript, at least get the metadata.\u0026rdquo; The result is a three-tier fallback (transcript + oEmbed, oEmbed only, Playwright), each level maximizing the information retrieved.\nThe spec → design → plan → implement workflow (brainstorm → writing-plans → subagent-driven-development) continues to prove its worth. The AI chat improvement handled 7 tasks in parallel via subagents, and three spec review loops removed unnecessary types like AI_CHAT_CLAUDE_CODE, meaningfully improving the design before any code was written.\n","date":"2026-03-17T00:00:00+09:00","image":"/images/posts/2026-03-17-log-blog-sessions/cover-en.jpg","permalink":"/posts/2026-03-17-log-blog-sessions/","title":"log-blog Dev Log — Extracting Dev Logs from Claude Code Sessions"},{"content":"MEGA Code is a VS Code extension that turns Claude Code sessions into learning materials. 
It supports 23 languages, but translation coverage was uneven — Korean at around 90%, most others at 20–30%. Manually hunting down missing keys and translating them on every cycle became unsustainable. Automation was overdue.\nThis dev session covers designing and implementing two Claude Code commands (/i18n-audit, /i18n-fill) to automate the i18n workflow, plus fixes for a Node.js overlay race condition and a ChatService PATH mismatch found along the way.\ni18n Automation Command Design The Problem Three friction points were showing up in every i18n session:\nClaude making assumptions about file contents without reading them first Incomplete audits that missed missing keys Repeated file edit failures To eliminate this friction, I adopted a Two-Phase approach: audit first (understand current state), then fill (insert translations). Clean separation.\n/i18n-audit — Read-Only Translation Audit This command, written in .claude/commands/i18n-audit.md, scans 22 language files against en.ts as the reference and reports missing keys. The core rule is simple: always read the file before analyzing it.\nOutput format is a markdown table:\n| Language | File | Total | Present | Missing | Coverage | |----------|---------|-------|---------|---------|----------| | Korean | ko.ts | N | ... | ... | ...% | | Japanese | ja.ts | N | ... | ... | ...% | Key count is dynamically calculated from en.ts at runtime — hardcoding \u0026ldquo;282\u0026rdquo; would break every time a feature was added.\n/i18n-fill — AI-Powered Translation Gap Filling This command inserts translations based on the audit results. Target languages can be specified:\n/i18n-fill # all 22 languages /i18n-fill ko ja # Korean and Japanese only Translation guardrails are clearly defined in the prompt:\nPreserve {param} interpolation placeholders Keep HTML tag attributes (href, class, etc.) 
untouched; only translate visible text Match the tone and formality level of existing translations Preserve special export names like id_ in id.ts (to avoid JavaScript reserved word conflicts) flowchart TD A[\"User runs /i18n-audit\"] --\u003e B[\"Read en.ts\u0026lt;br/\u0026gt;extract all keys\"] B --\u003e C[\"Scan 22 language files sequentially\"] C --\u003e D{\"Missing keys found?\"} D -- Yes --\u003e E[\"Output coverage table\u0026lt;br/\u0026gt;+ missing key list\"] D -- No --\u003e F[\"Report 100% coverage\"] E --\u003e G[\"User runs /i18n-fill\"] G --\u003e H[\"Read target language files\"] H --\u003e I[\"AI translates missing keys\u0026lt;br/\u0026gt;to target language\"] I --\u003e J[\"Insert translated key-values\u0026lt;br/\u0026gt;preserving section order\"] J --\u003e K[\"npm run compile\u0026lt;br/\u0026gt;type check\"] K --\u003e L[\"Output results summary table\"] Key Design Decisions Spec review surfaced five important issues:\nIndonesian export name — id.ts exports as id_, not id, to avoid a JavaScript reserved word conflict. Hardcoded key count — Changed \u0026ldquo;282\u0026rdquo; to a runtime count from en.ts. Key insertion position — Insert into the correct section matching en.ts structure, not appended to the end of the file. HTML tag preservation — Only translate text inside tags, not attributes. Extra keys — Never delete keys that exist in a language file but not in en.ts. Bug Fix: Node.js Overlay Race Condition Symptom The \u0026ldquo;Node.js missing\u0026rdquo; warning banner wasn\u0026rsquo;t appearing, even on machines without Node.js installed. From the user\u0026rsquo;s perspective, everything looked fine.\nRoot Cause Tracing the overlay state delivery chain revealed four gaps in a push-only design:\nGap 1: Initial load race condition — The node-overlay HTML starts with class=\u0026quot;hidden\u0026quot;, and updateNodeUI() only runs when a node:statusUpdate message arrives. 
If the message arrives before the listener is registered, the overlay stays hidden forever.\nGap 2: onDidChangeVisibility ignores node state — When a panel is hidden and reopened, sendAuthStatus() is re-sent but there\u0026rsquo;s no recovery path for node state.\nGap 3: sendNodeStatus doesn\u0026rsquo;t check visibility — Messages sent while the panel is hidden are silently lost.\nFix I applied the Push + Pull + Pending triple-safety pattern that the auth system already used — extended it to node state as well:\n// dashboard-provider.ts — new fields private pendingNodeUpdate = false; private lastNodeAvailable: boolean | null = null; // sendNodeStatus — with visibility check public sendNodeStatus(available: boolean): void { this.lastNodeAvailable = available; if (!this.view) return; if (!this.view.visible) { this.pendingNodeUpdate = true; return; } this.view.webview.postMessage({ type: \u0026#39;node:statusUpdate\u0026#39;, data: { available }, }); } On the webview side, node:requestStatus is now sent on DOMContentLoaded and on agent zone transitions:\n// card-scripts-init.ts — added to DOMContentLoaded vscode.postMessage({ type: \u0026#39;node:requestStatus\u0026#39; }); vscode.postMessage({ type: \u0026#39;auth:requestStatus\u0026#39; }); // card-scripts-tabs.ts — added on zone switch if (zone === \u0026#39;agent\u0026#39;) { vscode.postMessage({ type: \u0026#39;node:requestStatus\u0026#39; }); vscode.postMessage({ type: \u0026#39;auth:requestStatus\u0026#39; }); } Three files changed: dashboard-provider.ts, card-scripts-init.ts, card-scripts-tabs.ts.\nBug Fix: ChatService PATH Mismatch Symptom \u0026ldquo;Error: Claude CLI not found\u0026rdquo; appearing in the Q\u0026amp;A panel despite Claude CLI working fine in the terminal.\nRoot Cause ClaudeCliChecker.isAvailable() and ChatService.runClaude() were using different PATH values:\n// ClaudeCliChecker — uses extended PATH (correct) const env = { ...process.env, PATH: buildExtendedPath() }; 
execFile(\u0026#39;claude\u0026#39;, [\u0026#39;--version\u0026#39;], { timeout: 5000, env }, ...); // ChatService — uses default PATH (bug) const proc = spawn(\u0026#39;claude\u0026#39;, args, { timeout: DEFAULT_CHAT_TIMEOUT_MS, stdio: [\u0026#39;pipe\u0026#39;, \u0026#39;pipe\u0026#39;, \u0026#39;pipe\u0026#39;], // no env! uses VS Code\u0026#39;s default PATH only }); ClaudeCliChecker finds claude using an extended PATH that includes /opt/homebrew/bin and reports it as available, but ChatService spawns without that path and gets ENOENT.\nFix Extracted buildExtendedPath() as a shared utility and unified its use in three places:\n// src/dependency/extended-path.ts (new file) export function buildExtendedPath(): string { const home = os.homedir(); const extra: string[] = []; if (process.platform !== \u0026#39;win32\u0026#39;) { extra.push( \u0026#39;/usr/local/bin\u0026#39;, \u0026#39;/opt/homebrew/bin\u0026#39;, path.join(home, \u0026#39;.local/bin\u0026#39;), path.join(home, \u0026#39;.claude/bin\u0026#39;), path.join(home, \u0026#39;.nvm/versions/node\u0026#39;), path.join(home, \u0026#39;.local/share/fnm\u0026#39;), path.join(home, \u0026#39;.volta/bin\u0026#39;), path.join(home, \u0026#39;.nodenv/shims\u0026#39;), \u0026#39;/usr/bin\u0026#39;, ); } const current = process.env.PATH || \u0026#39;\u0026#39;; return [...extra, current].join(path.delimiter); } Previously, node-checker.ts and claude-cli-checker.ts each had their own buildExtendedPath() and had already drifted: node-checker had 7 paths, cli-checker had 4. Consolidation fixed the latent drift bug as well.\nFour files changed: extended-path.ts (new), claude-cli-checker.ts, node-checker.ts, chat-service.ts.\nOther Work /explain-code skill design — A code tour skill for vibe coders (users unfamiliar with code structure). Provides a \u0026ldquo;Bird\u0026rsquo;s Eye → Room by Room → Glossary\u0026rdquo; three-step walkthrough to understand code before running /mend-logic or /mend-ui. 
Docs restructuring — Developer README moved to docs/, walkthrough documents removed. Version 0.1.1 release — \u0026ldquo;Reliability Improvements\u0026rdquo; section added to README. Commit Log Message Changed files fix(webview): add pull-based recovery for Node.js overlay status dashboard-provider.ts, card-scripts-init.ts, card-scripts-tabs.ts fix(chat): unify PATH resolution so ChatService finds Claude CLI chat-service.ts, claude-cli-checker.ts, extended-path.ts, node-checker.ts docs: restructure \u0026ndash; move developer README to docs/ README-for-developers.md and others chore: bump version to 0.1.1 package.json, package-lock.json docs: add bug reports, specs, and implementation plans 6 doc files docs: add i18n commands design spec i18n-commands-design.md docs: address spec review feedback for i18n commands i18n-commands-design.md docs: add i18n commands implementation plan i18n-commands-plan.md feat: add /i18n-audit command for translation key auditing i18n-audit.md feat: add /i18n-fill command for AI-powered translation gap filling i18n-fill.md chore: allow .claude/commands/ to be tracked in git .gitignore Insights Push-only messaging will always break Message passing between a VS Code webview and the extension host is asynchronous. Push-only delivery can lose messages based on timing. The auth system already had the Pull + Pending pattern, but node state was missing it in the same codebase. A pattern that solves a problem in one system needs to be applied to every other system with the same constraints.\nPATH mismatch creates \u0026ldquo;check passes, execution fails\u0026rdquo; When dependency checking and actual usage run in different environments (different PATH), the check result becomes meaningless. VS Code extension host PATH can differ significantly from the system shell — always factor this in. 
Extracting buildExtendedPath() as a shared utility isn\u0026rsquo;t just about DRY; it ensures the availability check and the actual spawn resolve the same PATH.\nThe most important i18n automation rule: \u0026ldquo;always read before analyzing\u0026rdquo; Claude assuming file contents without reading them was the biggest source of friction in i18n sessions. Both /i18n-audit and /i18n-fill list \u0026ldquo;ALWAYS read a file before editing it\u0026rdquo; as rule number one. Making the most common failure mode the first explicit rule in an AI prompt is an effective design pattern.\n","date":"2026-03-17T00:00:00+09:00","image":"/images/posts/2026-03-17-megaupskill-i18n/cover-en.jpg","permalink":"/posts/2026-03-17-megaupskill-i18n/","title":"MegaUpskill Dev Log — Automating i18n Audits and Translation Gap-Filling"},{"content":"Overview Shortly after Replit closed a $400M Series D at a $9 billion valuation, they shipped Agent 4. The progression from Agent 2 (February 2025) → Agent 3 (September 2025) → Agent 4 marks a significant shift in philosophy: from \u0026ldquo;coding agent\u0026rdquo; to \u0026ldquo;creative collaboration platform.\u0026rdquo; Web apps, mobile apps, landing pages, presentations, data visualizations, animated videos — the scope now extends beyond code to knowledge work broadly.\nWhat Changed from Agent 3 Agent 3 focused on long-running autonomous operation — self-testing, bug-fixing, running independently for hours. Agent 4 pivots. 
Instead of pure autonomy, it emphasizes \u0026ldquo;creative control.\u0026rdquo; The agent handles orchestration and repetitive work; creative judgment stays with the human.\nThis pivot aligns with the dominant trend of 2026 — \u0026ldquo;coding agent → knowledge work agent.\u0026rdquo; Replit joins Anthropic\u0026rsquo;s Cowork and Notion\u0026rsquo;s Custom Agents in moving beyond pure code generation.\nFour Core Pillars flowchart TD subgraph pillar1 [\"Design Freely\"] A1[\"Infinite Canvas\"] A2[\"Simultaneous UI variants\u0026lt;br/\u0026gt;(each a separate agent)\"] A3[\"Design ↔ Code\u0026lt;br/\u0026gt;real-time sync\"] end subgraph pillar2 [\"Move Faster\"] B1[\"Parallel agent execution\u0026lt;br/\u0026gt;(auth, DB, backend, frontend simultaneously)\"] B2[\"Automatic task splitting\u0026lt;br/\u0026gt;+ conflict resolution sub-agent\"] end subgraph pillar3 [\"Ship Anything\"] C1[\"Web apps · Mobile apps\"] C2[\"Presentations · Videos\"] C3[\"Linear · Notion · Excel integrations\"] end subgraph pillar4 [\"Build Together\"] D1[\"Kanban-style workflow\"] D2[\"Concurrent requests + intelligent sequencing\"] D3[\"Background execution\u0026lt;br/\u0026gt;+ approval gates\"] end pillar1 --\u003e pillar2 pillar2 --\u003e pillar3 pillar3 --\u003e pillar4\n1. Design Freely — Infinite Canvas A design canvas is now integrated directly into the build environment. On the infinite canvas you can freely explore designs and generate multiple UI variants simultaneously, with each variant handled by its own agent. The most striking part: design and code sync in real time — no separate design-to-development handoff.\n2. Move Faster — Parallel Agents The technical centerpiece of Agent 4. Multiple agents process project components — auth, database, backend, frontend — concurrently. Tasks are automatically split into smaller units, and a dedicated sub-agent resolves conflicts. 
This shift from sequential to parallel processing is the basis for Replit\u0026rsquo;s claim of building \u0026ldquo;production-quality software 10x faster.\u0026rdquo;\nNote: parallel agents are currently Pro/Enterprise tier only, available temporarily to Core users.\n3. Ship Anything — Beyond Code A single integrated project can now produce web apps, mobile apps, landing pages, presentations, data visualizations, and animated videos. External service integrations with Linear, Notion, Excel, and Stripe are also supported.\nReplit CEO Amjad Masad said Agent 4 can \u0026ldquo;not just build an application, but build and maintain an entire company\u0026rdquo; — pitch decks, animated logos, and payment integrations all in one platform.\n4. Build Together — Kanban Workflow The sequential chat thread is replaced with a task-based kanban workflow. Multiple team members can submit requests simultaneously, and agents process them with intelligent sequencing. Everything runs in the background with approval gates before merging.\nAgent 3 vs. Agent 4 Item Agent 3 (Sept 2025) Agent 4 (Mar 2026) Core philosophy Long-running autonomous operation Creative collaboration Design Requires separate tools Infinite canvas built-in Agent execution Sequential (single) Parallel (multiple) Scope Code-centric Apps + slides + video Team workflow Chat threads Kanban + approval gates External integrations Limited Linear, Notion, Stripe, etc. Pricing Paid plans Core and above (parallel: Pro+) Quick Links Replit Official Blog — Introducing Agent 4 — Official launch announcement Agent 4 Product Page — Feature overview and getting started AINews — Replit Agent 4: The Knowledge Work Agent — Latent Space analysis Insight The most significant shift in Agent 4 is the retreat from autonomy. 
Agent 3 pushed \u0026ldquo;the agent figures everything out for you.\u0026rdquo; Agent 4 pulls back to \u0026ldquo;the agent handles the repetitive work; creative decisions stay with you.\u0026rdquo; This is the pattern emerging across the entire AI coding tools market in 2026: the central design question is no longer how to reach full autonomy but where to place the human-AI collaboration boundary.\nThe parallel agent architecture is also interesting. Processing auth, DB, backend, and frontend concurrently, with a sub-agent resolving conflicts, rests on the same core hypothesis as TradingAgents\u0026rsquo; multi-agent debate structure: \u0026ldquo;collaboration between multiple agents outperforms a single agent.\u0026rdquo; Whether it\u0026rsquo;s actually 10x faster remains to be validated in practice; the open question is how expensive conflict resolution between parallel agents turns out to be compared with simply processing the tasks sequentially.\n","date":"2026-03-17T00:00:00+09:00","image":"/images/posts/2026-03-17-replit-agent4/cover-en.jpg","permalink":"/posts/2026-03-17-replit-agent4/","title":"Replit Agent 4 — From Coding Agent to Creative Collaboration Platform"},{"content":"Overview Previous post: Stock Trading Agent Dev Log #2 — Expert Agent Team and KOSPI200 Data Adventures\nBuilding the Expert Agent Team architecture in #2 taught me something important: a multi-agent debate structure produces far richer analysis than a single LLM. It turns out someone had already taken that idea and built a serious framework around it. TradingAgents is a multi-agent trading framework with 32,395 GitHub stars (as of March 2026) that models the decision-making structure of an actual trading firm using LLM agents.\nTradingAgents Architecture The Four-Stage Pipeline TradingAgents mirrors the decision flow of a real securities research team. 
It has academic grounding in arXiv paper 2412.20138, with a separate Trading-R1 technical report also available.\nflowchart TD subgraph analysts [\"Stage 1: Analyst Team\"] A1[\"Fundamentals Analyst\u0026lt;br/\u0026gt;Financials \u0026amp; Valuation\"] A2[\"Sentiment Analyst\u0026lt;br/\u0026gt;Market Mood \u0026amp; Social\"] A3[\"News Analyst\u0026lt;br/\u0026gt;News \u0026amp; Disclosures\"] A4[\"Technical Analyst\u0026lt;br/\u0026gt;Charts \u0026amp; Indicators\"] end subgraph researchers [\"Stage 2: Researcher Team\"] B1[\"Bullish Researcher\u0026lt;br/\u0026gt;Buy Thesis\"] B2[\"Bearish Researcher\u0026lt;br/\u0026gt;Sell Thesis\"] end A1 \u0026 A2 \u0026 A3 \u0026 A4 --\u003e B1 A1 \u0026 A2 \u0026 A3 \u0026 A4 --\u003e B2 B1 \u0026 B2 --\u003e C[\"Stage 3: Trader Agent\u0026lt;br/\u0026gt;Position Decision\"] C --\u003e D[\"Stage 4: Risk Management\u0026lt;br/\u0026gt;+ Portfolio Manager\"] D --\u003e E[\"Final Trade Decision\"]\nThe Analyst Team consists of four specialists. The Fundamentals Analyst covers financial statements and valuation. The Sentiment Analyst handles market mood and social data. The News Analyst covers news and regulatory filings. The Technical Analyst focuses on chart patterns and indicators. Each agent writes its report independently.\nThe Researcher Team is where TradingAgents\u0026rsquo; real differentiator lives. Two researchers — Bullish and Bearish — take the analyst reports and debate each other rather than simply aggregating information. Opposing views are deliberately put in collision. Compared to the Expert Team I built in #2, TradingAgents adds multiple debate rounds that iterate toward consensus.\nThe Trader Agent synthesizes the analyst and researcher reports to make the actual position decision. Risk Management and the Portfolio Manager handle the final approval stage.\nComparison with Our System The Expert Agent Team from #2 used 4 experts + a Chief Analyst. 
Against TradingAgents:\nItem Our System (#2) TradingAgents Analysis agents 4 (same) 4 (same) Debate structure Chief Analyst synthesis Bullish vs. Bearish debate Risk management None Risk Management + Portfolio Manager Data sources KIS API + NAVER Finance Alpha Vantage + News API LLM Claude API GPT-5.4, Gemini 3.1, Claude 4.6, etc. Korean market KOSPI200 native Not supported The key differences are the Bullish vs. Bearish debate structure and the risk management layer. Our system has a Chief Analyst synthesizing opinions; TradingAgents explicitly collides opposing positions before the Trader decides. This structure can produce richer analysis, but API call costs scale accordingly.\nQuick Start git clone https://github.com/TauricResearch/TradingAgents.git cd TradingAgents pip install -r requirements.txt from tradingagents.graph.trading_graph import TradingAgentsGraph from tradingagents.default_config import DEFAULT_CONFIG ta = TradingAgentsGraph(debug=True, config=DEFAULT_CONFIG) _, decision = ta.propagate(\u0026#34;NVDA\u0026#34;, \u0026#34;2024-05-10\u0026#34;) print(decision) A single propagate() call runs the entire pipeline. Pass a ticker symbol and a date, and the full Analyst Team writes reports in parallel, the Researcher Team debates, and the Trader\u0026rsquo;s final decision is returned.\nSwitching LLM Providers v0.2.1 supports GPT-5.4, Gemini 3.1, Claude 4.6, Grok 4.x, and Ollama. Switching is just a config change:\nconfig = DEFAULT_CONFIG.copy() config[\u0026#34;llm_provider\u0026#34;] = \u0026#34;anthropic\u0026#34; config[\u0026#34;deep_think_llm\u0026#34;] = \u0026#34;claude-sonnet-4-6\u0026#34; config[\u0026#34;quick_think_llm\u0026#34;] = \u0026#34;claude-haiku-4-6\u0026#34; Not being locked into a single model is a meaningful practical advantage. 
Our system is tied to the Claude API, so TradingAgents\u0026rsquo; provider abstraction layer is worth studying.\nConsiderations Before Production Deployment The backtest results in the paper and technical report are impressive. But there are practical considerations before going live.\nAPI costs: The more debate rounds between agents, the faster costs multiply. A single analysis run can generate dozens of LLM calls.\nHallucination risk: LLMs hallucinate — especially with specific numbers and dates. Without a fact-verification layer, bad information can feed directly into investment decisions. The \u0026ldquo;Blank beats wrong\u0026rdquo; principle from stock-analysis-agent is a good reference here.\nNo order execution: As an open-source framework, the actual order execution layer needs to be built separately. KIS API integration, as in our system, would be required.\nNo Korean market support: Handling KOSPI200 or DART disclosures requires additional development — which is where our system has an advantage.\nNext Steps The Bullish vs. Bearish debate structure and the risk management layer from TradingAgents are worth incorporating. Specifically:\nChief Analyst → debate structure: Replace simple synthesis with explicit collision of opposing positions Add a risk management layer: Portfolio-level risk checks that consider the full context LLM provider abstraction: Build in the ability to experiment with models beyond Claude Another option is to fork TradingAgents directly and add KIS API and DART data support. 
The core architecture is already validated; you\u0026rsquo;d only need to add the Korean market specialization layer on top.\nQuick Links TauricResearch/TradingAgents — Multi-agent trading framework (32K stars) arXiv paper 2412.20138 — Academic foundation stock-analysis-agent post — Claude Code-powered practical analysis tool #2 Expert Agent Team — Previous post in the series Insights What stands out most about TradingAgents is that the core value of a multi-agent debate structure isn\u0026rsquo;t \u0026ldquo;more information\u0026rdquo; — it\u0026rsquo;s the structuring of opposing views. When Bullish and Bearish researchers interpret the same data in opposite directions, the investor sees both arguments before making a judgment call. This is a structural solution to the confirmation bias inherent in single-LLM analysis.\n32,000 stars is the community voting on that idea. LLM-based financial analysis has already moved past \u0026ldquo;is this possible?\u0026rdquo; to \u0026ldquo;how do we make it trustworthy?\u0026rdquo; — and that\u0026rsquo;s a more interesting problem.\n","date":"2026-03-17T00:00:00+09:00","image":"/images/posts/2026-03-17-trading-agents/cover-en.jpg","permalink":"/posts/2026-03-17-trading-agents/","title":"Stock Trading Agent Dev Log #3 — TradingAgents: A 30K-Star Multi-Agent Trading Firm Simulator"},{"content":"Overview In the previous post (#3 — TradingAgents Analysis), I analyzed the open-source TradingAgents repo and immediately spotted gaps in our own agent: fundamental analysis from financial statements, investment signal quality validation, scenario-based R/R (Risk/Reward) scoring, and rich report output. This post documents how I closed all four gaps.\nThree sessions, over 20 hours of work. 37 commits, 25 new files, 65 changed files. One full cycle from design → spec review → implementation plan → subagent-driven TDD → merge → frontend debugging → dashboard reactivity improvements.\n1. 
Gap Analysis — What We Were Missing A feature comparison against kipeum86/stock-analysis-agent:\nFeature stock-analysis-agent Our trading-agent Fundamental data (DART API) PER, EPS, Revenue, etc. Technical analysis only Data confidence grading A/B/C/D validation None Scenario framework Bull/Base/Bear + probabilities None R/R score formula Quantitative calculation None Critic agent 7-item quality rubric None Rich HTML report KPI tiles, charts Plain text Our strengths, on the other hand: real-time order execution, risk management (stop-loss/take-profit), event-driven multi-agent orchestration, WebSocket push. The fundamental difference is a read-only research tool vs. an executable trading system.\nThe goal was clear: A) DART fundamental integration → B) signal quality validation (critic + R/R) → C) rich dashboard. I chose a Vertical Slice approach — make one stock flow through the entire pipeline before anything else.\n2. DART Financial Data Integration DartClient Design I built a DartClient service wrapping the FSS (Financial Supervisory Service) DART OpenAPI. Key design decisions:\nDART_API_KEY is optional — if absent, enabled=False and all fields get grade D. This causes immediate rejection at the confidence hard gate, blocking signal generation without wasting Claude API calls. Corp code caching — DART uses 8-digit unique codes rather than ticker symbols. The full mapping is fetched from the corpCode.xml endpoint and cached in a SQLite dart_corp_codes table, refreshed once daily. Daily financial cache — a dart_cache table prevents duplicate API calls for the same ticker within a day. 
class DartClient: def __init__(self): self.enabled = bool(settings.dart_api_key) self.base_url = \u0026#34;https://opendart.fss.or.kr/api\u0026#34; async def fetch(self, stock_code: str) -\u0026gt; dict: if not self.enabled: return {\u0026#34;financials\u0026#34;: None, \u0026#34;confidence_grades\u0026#34;: { \u0026#34;dart_revenue\u0026#34;: \u0026#34;D\u0026#34;, \u0026#34;dart_operating_profit\u0026#34;: \u0026#34;D\u0026#34;, \u0026#34;dart_per\u0026#34;: \u0026#34;D\u0026#34;, \u0026#34;dart_eps\u0026#34;: \u0026#34;D\u0026#34;, }} corp_code = await self._resolve_corp_code(stock_code) # fnlttSinglAcntAll endpoint for last 4 quarters ... Confidence Grading Every data source gets a confidence grade:\nclass DataConfidence(Enum): A = \u0026#34;A\u0026#34; # Official disclosure, arithmetically verified B = \u0026#34;B\u0026#34; # 2+ sources, within 5% variance C = \u0026#34;C\u0026#34; # Single source, unverified D = \u0026#34;D\u0026#34; # No data — triggers hard gate Hard gate: if any of current_price, volume, dart_revenue, dart_operating_profit, or dart_per is grade D, signal generation halts entirely. The principle: \u0026ldquo;if we don\u0026rsquo;t know, we don\u0026rsquo;t guess.\u0026rdquo;\n3. Signal Pipeline — 5 Experts → Critic → R/R Gate I added a Fundamentals Analyst as the fifth expert alongside the existing four (Technical, Macro, Sentiment, Risk). 
It takes DART data as its primary input and analyzes revenue growth trends, operating margin, PER/PBR valuation, and debt ratio.\nflowchart TD A[\"KOSPI200 Screening\"] --\u003e B[\"Charts + Technical Indicators\"] B --\u003e C[\"DART Financial Data Fetch\"] C --\u003e D{\"Confidence\u0026lt;br/\u0026gt;Hard Gate\"} D --\u003e|\"grade D present\"| E[\"Signal Rejected\u0026lt;br/\u0026gt;(no Claude call)\"] D --\u003e|\"pass\"| F[\"5 Expert Panel\u0026lt;br/\u0026gt;(parallel Claude calls)\"] F --\u003e G[\"Chief Analyst Debate\u0026lt;br/\u0026gt;Bull/Base/Bear Scenarios\"] G --\u003e H[\"R/R Score Calculation\"] H --\u003e I[\"SignalCriticAgent\u0026lt;br/\u0026gt;5-item Rubric\"] I --\u003e|\"pass\"| J[\"signal.generated event\"] I --\u003e|\"fail\"| K[\"1 revision attempt\"] K --\u003e|\"re-pass\"| J K --\u003e|\"re-fail\"| L[\"signal.rejected\"] J --\u003e M{\"RiskManager\u0026lt;br/\u0026gt;R/R ≥ 2.0?\"} M --\u003e|\"Yes\"| N[\"Auto-approve → Order\"] M --\u003e|\"No\"| O[\"Pending manual approval\"]\nR/R Scoring I replaced the old confidence: float field with a scenario-based structure:\nclass Scenario(BaseModel): label: str # \u0026#34;Bull\u0026#34; / \u0026#34;Base\u0026#34; / \u0026#34;Bear\u0026#34; price_target: float upside_pct: float # % vs. 
current price probability: float # 0.0–1.0, three sum to 1.0 class SignalAnalysis(BaseModel): bull: Scenario base: Scenario bear: Scenario rr_score: float # (bull.upside × bull.prob + base.upside × base.prob) # / |bear.upside × bear.prob| variant_view: str # What the market consensus is missing def compute_rr_score(bull, base, bear) -\u0026gt; float: upside = bull.upside_pct * bull.probability + base.upside_pct * base.probability downside = abs(bear.upside_pct * bear.probability) return upside / downside if downside \u0026gt; 0 else 0.0 The RiskManager auto-approval gate now requires both min_rr_score (≥ 2.0) and critic_result == \u0026quot;pass\u0026quot;.\nSignalCriticAgent Immediately after signal generation, before the event is published, the critic checks five items:\n# Check Pass Condition 1 Scenario completeness 3 scenarios present, probabilities sum to 1.0 ±0.01 2 Data confidence No grade D on key fields 3 R/R arithmetic Computed R/R and declared R/R within 5% 4 Expert dissent represented At least one non-consensus view in the debate 5 Variant view specificity References a concrete data point, not a generic risk statement Checks 1–3 are purely programmatic (no Claude call). Only checks 4–5 invoke the LLM rubric. On failure, the Chief gets the feedback injected and gets one revision attempt. A second failure drops the signal as signal.rejected.\nChief Debate Update The consensus threshold was updated for the 5-expert setup:\nbullish_count \u0026gt;= 4 → \u0026quot;dominant\u0026quot; (≥80%) bullish_count == 3 → \u0026quot;majority\u0026quot; (60%) bullish_count \u0026lt;= 2 → \u0026quot;split\u0026quot; 4. 
Database Schema Extension Seven columns were added to the signals table, and a new agent_events table was created:\n-- ALTER TABLE migration (ignores column-already-exists errors) ALTER TABLE signals ADD COLUMN scenarios_json TEXT; ALTER TABLE signals ADD COLUMN variant_view TEXT; ALTER TABLE signals ADD COLUMN rr_score REAL; ALTER TABLE signals ADD COLUMN expert_stances_json TEXT; ALTER TABLE signals ADD COLUMN dart_fundamentals_json TEXT; ALTER TABLE signals ADD COLUMN confidence_grades_json TEXT; ALTER TABLE signals ADD COLUMN critic_result TEXT; -- Agent event persistence CREATE TABLE IF NOT EXISTS agent_events ( id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT NOT NULL, agent_name TEXT, data_json TEXT, timestamp DATETIME DEFAULT (datetime(\u0026#39;now\u0026#39;)) ); The risk_config table was seeded with min_rr_score (default 2.0) and require_critic_pass (default true).\n5. Dashboard Reactivity and ReportViewer WebSocket-Based Live Updates The old dashboard fetched data once on mount and never reacted to WebSocket events. Fixed:\nflowchart LR BE[\"Backend\u0026lt;br/\u0026gt;EventBus\"] --\u003e|\"WebSocket\"| WS[\"WS Connection\"] WS --\u003e DF[\"DashboardView\"] DF --\u003e|\"refreshTrigger+1\"| SP[\"SignalPanel\"] DF --\u003e|\"refreshTrigger+1\"| OH[\"OrderHistory\"] DF --\u003e|\"refreshTrigger+1\"| PC[\"PerformanceChart\"] WS --\u003e AF[\"AlertFeed\u0026lt;br/\u0026gt;(live events)\"] WS --\u003e RB[\"RiskAlertBanner\u0026lt;br/\u0026gt;(stop-loss/take-profit)\"] WS --\u003e AP[\"AgentPanel\u0026lt;br/\u0026gt;(recent logs)\"] BE --\u003e|\"DB persistence\"| DB[\"agent_events\u0026lt;br/\u0026gt;table\"] DB --\u003e|\"load on mount\"| AF\nKey change: DashboardView increments a refreshTrigger state on each WS message, and each panel component re-fetches when that prop changes. 
RiskAlertBanner watches for signal.stop_loss and signal.take_profit events and displays a warning banner at the top.\nAgent Event Persistence Previously, agent events lived only in memory and disappeared on server restart. Now event_bus.py fire-and-forgets each event to the DB. On AlertFeed mount, recent events are loaded from the DB and merged with live WS events.\nReportViewer A new component fully replacing the old ReportList:\nKPI tile row: total return, win rate, average R/R, total trade count Trade table: buy/sell details and return per ticker Signal grid: scenario cards and expert stances Narrative section: markdown report body On the backend, report_generator.py produces structured summary_json, and _enrich_report() in the reports.py router parses the JSON columns and delivers them to the frontend.\n6. Debug Notes Missing import type Blanks the React Page After the merge, the dashboard went completely white. With no error boundary, there were no clues. Only after checking the browser console via Playwright did I find the cause:\nUncaught SyntaxError: The requested module does not provide an export named \u0026#39;Scenario\u0026#39; TypeScript interface declarations are erased at compile time. But three components were doing runtime imports of Scenario. The fix was straightforward:\n// Before — runtime import of a type-only construct import { Scenario } from \u0026#39;../../types\u0026#39;; // After — properly erased at compile time import type { Scenario } from \u0026#39;../../types\u0026#39;; All three files (SignalCard.tsx, ScenarioChart.tsx, FundamentalsKPI.tsx) had the same pattern. 
Without an error boundary, one component crashing takes the whole page down — the same pattern as a single broken Mermaid diagram hiding all diagrams.\n\u0026ldquo;9 hours ago\u0026rdquo; — UTC Timestamp Parsing Bug Every timestamp in the dashboard showed \u0026ldquo;9 hours ago.\u0026rdquo; SQLite\u0026rsquo;s datetime('now') stores UTC strings without a Z suffix — \u0026quot;2026-03-17 01:55:01\u0026quot;. JavaScript\u0026rsquo;s new Date() treats these as local time, causing a 9-hour offset in a KST (UTC+9) environment.\n// frontend/src/utils/time.ts — shared UTC parser export function parseUTC(timestamp: string): Date { const ts = timestamp.endsWith(\u0026#39;Z\u0026#39;) || timestamp.includes(\u0026#39;+\u0026#39;) ? timestamp : timestamp + \u0026#39;Z\u0026#39;; return new Date(ts); } Replaced new Date(timestamp) with parseUTC(timestamp) across all six components: AgentPanel, AlertFeed, OrderHistory, PerformanceChart, ReportViewer, RiskAlertBanner.\n7. Commit Log Summary of 37 commits across 3 sessions:\nPhase Commits Content Design 3 Spec docs, review feedback, implementation plan Phase A: DART 4 DataConfidence enum, Scenario/SignalAnalysis models, DB schema, DartClient Phase B: Quality 4 Chief debate update, SignalCriticAgent, DART + hard gate wiring, critic loop Phase C: UI 4 R/R gate, signals API extension, 3 React components, import type fix Merge 1 feature branch → main (1,493 insertions, 25 files) Dashboard 7 Design spec, implementation plan, WS refresh, RiskAlertBanner, event persistence, AgentPanel logs Report 6 ReportSummary type, getReport API, summary_json calculation, ReportViewer, CSS, ReportList removal Fixes 4 confidence_grades_json parsing, Reports tab navigation, AgentPanel layout, UTC parsing Misc 4 .gitignore, plan docs, etc. 8. 
Insights Vertical Slice Surfaces Integration Issues Early Pushing one stock through the full DART → Expert → Chief → Critic → R/R Gate → UI pipeline immediately exposed integration issues like the import type bug and missing confidence_grades_json right after the merge. Building layer by layer would have deferred all of this to a far more expensive debugging session later.\nProgrammatic Critic Checks Cut LLM Costs Three of the five rubric items (scenario completeness, data confidence, R/R arithmetic) are verified with pure code — no Claude call needed. Only the remaining two require LLM judgment. The principle: let code handle arithmetic, let the LLM handle judgment.\nTimestamps Are Always a Trap SQLite\u0026rsquo;s datetime('now') storing UTC without a Z is documented behavior, but new Date() in JavaScript interpreting it as local time is a pitfall I fall into every time. The right answer was to build a parseUTC() utility once and use it consistently across every component.\nWhat\u0026rsquo;s Next Add error boundaries — one component crash should not take down the entire page DART API rate limiting — the current daily cache works for single stocks, but concurrent multi-ticker scanning needs proper throttling Run the live market scanner and measure critic rejection rates — if the rubric is too strict, it may be blocking useful signals ","date":"2026-03-17T00:00:00+09:00","image":"/images/posts/2026-03-17-trading-agent-dev4/cover-en.jpg","permalink":"/posts/2026-03-17-trading-agent-dev4/","title":"Stock Trading Agent Dev Log #4 — DART Integration, Signal Critic, Real-time Dashboard"},{"content":"Overview Three recent episodes from AI Frontier, a leading Korean AI podcast. EP 90 looks back on ten years since AlphaGo. EP 88 surveys the RL-driven technical landscape. EP 86 gets into the real mechanics of agentic coding workflows. 
The thread running through all three: verifiability.\nNavigation Map graph TD A[\"AI Frontier Podcast\"] --\u003e B[\"EP 90: Ten Years After AlphaGo \u0026lt;br/\u0026gt; A decade of AI retrospective\"] A --\u003e C[\"EP 88: No Secret Recipe \u0026lt;br/\u0026gt; The Age of RL\"] A --\u003e D[\"EP 86: Agentic Workflow \u0026lt;br/\u0026gt; Real-world agentic coding\"] B --\u003e E[\"ImageNet → Transformer → LLM\"] C --\u003e F[\"RL Scaling \u0026lt;br/\u0026gt; Environment Bottleneck \u0026lt;br/\u0026gt; Verifiability\"] D --\u003e G[\"Backend.AI:GO \u0026lt;br/\u0026gt; 40 days, 13B tokens \u0026lt;br/\u0026gt; 1M lines of code\"] EP 90: Ten Years After AlphaGo Guest: Jinwon Lee, CTO of HyperAccel (inference-focused AI chip startup)\nRecorded on Pi Day (March 14, 2026) to mark the tenth anniversary of the AlphaGo match, this episode reflects on a decade of deep learning. Hosts Jeongseok Noh and Seungjun Choi are joined by CTO Jinwon Lee.\nKey timeline:\nImageNet and NPU development: Lee\u0026rsquo;s experience building deep learning NPUs at Samsung Framework evolution: The progression from Theano → Caffe → TensorFlow → PyTorch From GAN to Transformer: The GAN era and the rise of generative AI, then the emergence of the Attention mechanism BERT vs GPT: The encoder (BERT) and decoder (GPT) fork, and how GPT became the path to LLMs Korean foundation models: The roles of HyperCLOVA and the Stability AI community Andrej Karpathy\u0026rsquo;s Autoresearch and \u0026ldquo;repeated verification of verifiable signals\u0026rdquo; emerge as key phrases, alongside a revisit of Noam Brown\u0026rsquo;s AlphaGo anniversary post on the significance of Move 37.\nEP 88: No Secret Recipe Guest: Seunghyun (AI researcher)\nThe title says it all — there is no single \u0026ldquo;secret sauce\u0026rdquo; in AI. That\u0026rsquo;s the core message.\nKey points:\nGLM 5 report and RL: Yao Shunyu\u0026rsquo;s \u0026ldquo;The Second Half\u0026rdquo; paper proposing an RL-centric paradigm. 
The conclusion: \u0026ldquo;there\u0026rsquo;s no secret recipe, but RL is currently the most promising direction.\u0026rdquo; Back to basics: At this stage, data quality and product intuition matter more than flashy architectural innovations Fog of Progress: Why predicting the future is structurally hard. Model performance curves are nonlinear, so the intuition \u0026ldquo;this will work by end of year\u0026rdquo; often misfires Environment scaling: The biggest bottleneck for agentic RL isn\u0026rsquo;t the model — it\u0026rsquo;s scaling the environment. The key question is how richly you can build verifiable simulation environments Context management: Strategies for working around context length limits with Sparse Attention and multi-agent approaches Harness-model fusion: The blurring of the boundary between product and model. A good harness pulls up model performance EP 86: Agentic Workflow in Practice Guest: Jungkyu Shin, CEO of Lablup (Backend.AI)\nThe most hands-on episode. The story centers on building Backend.AI:GO — in 40 days, using 13 billion tokens, generating 1 million lines of code — and what it taught about agentic coding.\ngraph LR A[\"Backend.AI:GO \u0026lt;br/\u0026gt; 40-day build\"] --\u003e B[\"13B tokens consumed\"] B --\u003e C[\"1M lines of code\"] C --\u003e D[\"Cloud model routing \u0026lt;br/\u0026gt; Distributed dispatch\"]\nCore insights:\nToken cost competitiveness and fast inference: Inference speed directly impacts developer productivity in agentic coding Bio-tokens: The concept of \u0026ldquo;human cognitive load\u0026rdquo; in the AI era — even humans have a limit on how much information they can process Software abundance: The rise of \u0026ldquo;instant apps\u0026rdquo; — is the value of code converging toward zero? 
Claude Code\u0026rsquo;s real advantage is the harness: The differentiator isn\u0026rsquo;t the model itself, but what wraps it — tools, context management, workflow Build the generator, not the output: Automation\u0026rsquo;s real goal is a system that produces results, not individual results Polite prompting: An empirical observation that tone in a prompt may affect output (though the mechanism is unclear) Particularly memorable is the analogy to \u0026ldquo;Cyber Formula\u0026rdquo; to explain the philosophical difference between Claude Code and Codex.\nQuick Links AI Frontier EP 90 — Ten Years After AlphaGo AI Frontier EP 88 — No Secret Recipe AI Frontier EP 86 — Agentic Workflow in Practice Insight The keyword running through all three episodes is verifiability. EP 90: Karpathy\u0026rsquo;s \u0026ldquo;repeated verification of verifiable signals.\u0026rdquo; EP 88: \u0026ldquo;verifiable environments\u0026rdquo; as the bottleneck for RL scaling. EP 86: \u0026ldquo;build the generator, not the output.\u0026rdquo; All three are different facets of the same underlying problem. As AI models grow more powerful, the weight of the question \u0026ldquo;how do you know this result is correct?\u0026rdquo; only increases. EP 88\u0026rsquo;s conclusion — \u0026ldquo;focus on fundamentals: data, harness, environment\u0026rdquo; — is probably the most honest answer available.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-ai-frontier-podcast/cover-en.jpg","permalink":"/posts/2026-03-16-ai-frontier-podcast/","title":"AI Frontier Podcast: Three Episodes — AlphaGo at 10, the RL Era, and Agentic Workflows"},{"content":"Overview You\u0026rsquo;re deep into a refactoring session with Claude Code at your desk and have to step away. Closing the terminal ends the session. Previously, this required an SSH tunnel or a third-party tool (happy, hapi, etc.). Now Claude Code has an official Remote Control feature. 
One command — claude remote-control — lets you resume the same session from a smartphone, tablet, or another computer.\nHow It Works graph TD A[\"Local Machine \u0026lt;br/\u0026gt; claude remote-control\"] --\u003e|\"HTTPS outbound only\"| B[\"Anthropic API \u0026lt;br/\u0026gt; Message routing\"] B --\u003e C[\"claude.ai/code \u0026lt;br/\u0026gt; Browser\"] B --\u003e D[\"Claude Mobile App \u0026lt;br/\u0026gt; iOS/Android\"] B --\u003e E[\"Another Computer \u0026lt;br/\u0026gt; Browser\"] C --\u003e|\"Real-time sync\"| A D --\u003e|\"Real-time sync\"| A E --\u003e|\"Real-time sync\"| AThe key point: the session always runs on your local machine. Code never leaves for the cloud — your filesystem, MCP servers, and project settings remain intact. The local Claude Code process only sends HTTPS outbound requests; no inbound ports are opened. Anthropic\u0026rsquo;s API handles message routing in the middle.\nIf the network drops or your laptop sleeps, the session auto-reconnects when the machine comes back online — though a network outage longer than 10 minutes will time out the session.\nUsage Basic: Server Mode claude remote-control A session URL and QR code are printed in the terminal. Press Space to toggle the QR code so you can scan it with your phone.\nKey Flags Flag Description --name \u0026quot;My Project\u0026quot; Name shown in the claude.ai/code session list --spawn same-dir Concurrent sessions share the same directory (default) --spawn worktree Each session gets its own independent git worktree --capacity \u0026lt;N\u0026gt; Maximum concurrent sessions (default 32) --sandbox Enables filesystem/network isolation Activating From an Existing Session You can also activate Remote Control from an in-progress interactive session with /remote-control. 
Or go to /config and turn on \u0026ldquo;Enable Remote Control for all sessions\u0026rdquo; to apply it globally.\nThree Ways to Connect URL: Enter the session URL from the terminal directly in a browser QR code: Press Space to show the QR code, then scan with your phone camera Session list: Find the session by name in claude.ai/code or the Claude app (green dot = online) Remote Control vs Claude Code on the Web graph LR subgraph RC[\"Remote Control\"] A1[\"Runs on local machine\"] --\u003e B1[\"Access to your filesystem\"] A1 --\u003e C1[\"Uses your MCP servers\"] A1 --\u003e D1[\"Preserves your project settings\"] end subgraph Web[\"Claude Code on the Web\"] A2[\"Runs on Anthropic cloud\"] --\u003e B2[\"Cloud VM environment\"] A2 --\u003e C2[\"No local config needed\"] A2 --\u003e D2[\"No repo clone needed\"] end Remote Control Claude Code on the Web Runs on Your local machine Anthropic cloud Filesystem Your local files Cloud VM MCP servers Available Not available Local setup needed Yes (project must be cloned) No Best for Continuing ongoing work Starting something new quickly Remote Control = \u0026ldquo;continue in my environment\u0026rdquo;. Web = \u0026ldquo;start fresh anywhere\u0026rdquo;.\nThird-Party Alternatives Community-mentioned third-party projects:\nslopus/happy, tiann/hapi — open-source tools with similar goals SSH tunnel to a remote terminal The official Remote Control\u0026rsquo;s advantage: no separate server setup, TLS security via the Anthropic API by default. 
The downside, noted in community discussion, is that you have to set up the session in advance — which can feel less flexible than some open-source alternatives.\nLimitations Plans: Pro, Max, Team, Enterprise (Team/Enterprise requires an admin to enable Claude Code first) No API key support: Authentication via claude.ai login only Terminal dependency: Closing the claude process ends the session Single remote connection: Outside server mode, only one remote connection per session is allowed Version: Requires Claude Code v2.1.51 or later (check with claude --version) Insight The real value of Remote Control isn\u0026rsquo;t \u0026ldquo;remote access\u0026rdquo; — it\u0026rsquo;s context preservation. A Claude Code session accumulates conversation history, the context of files already read, and active MCP server connections. Being able to switch devices without losing any of that is the point. A comment from the GeekNews discussion — \u0026ldquo;I can already see the YouTube videos about vibe coding from a café\u0026rdquo; — captures this feature\u0026rsquo;s use pattern perfectly. Combined with cmux\u0026rsquo;s notification system — monitoring multiple agents in cmux, then picking up with Remote Control on mobile when you step away — you have a complete multi-device agentic coding workflow.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-claude-code-remote-control/cover-en.jpg","permalink":"/posts/2026-03-16-claude-code-remote-control/","title":"Claude Code Remote Control — Pick Up Your Coding Session From Any Device"},{"content":"Overview Anthropic has launched the Claude for Chrome extension. You can now invoke Claude directly inside your browser without switching to a separate tab or app. 
Simultaneously, from March 13 to March 27, Anthropic is running a promotion that doubles usage limits during off-peak hours.\nClaude for Chrome Extension graph LR A[\"Browsing the Web\"] --\u003e B[\"Invoke Claude Extension\"] B --\u003e C[\"Current Page Context Passed to Claude\"] C --\u003e D[\"Claude Response\"] D --\u003e E[\"Displayed Inline in Browser\"]\nClaude for Chrome is available on the Chrome Web Store. Key capabilities:\nIn-browser invocation: Pass the current web page\u0026rsquo;s context to Claude instantly Claude Code integration: Works alongside Claude Code for code review, doc summarization, etc. Background tasks: Run tasks in the background and get a notification on completion Scheduled workflows: Automated execution of scheduled tasks The strategic significance is broader access to Claude. Previously you needed the claude.ai site, the desktop app, or the API. Now Claude is reachable from anywhere in the browser with a single shortcut. ChatGPT, Gemini, and Perplexity already offer browser extensions; Anthropic has now joined the field.\nMarch 2x Usage Promotion Detail Value Period 2026.03.13 – 2026.03.27 Plans Free, Pro, Max, Team (Enterprise excluded) Condition Off-peak hours (outside ET 8AM–2PM / PT 5AM–11AM) Activation Automatic (no sign-up required) Weekly limit Bonus usage does not count against the weekly cap graph TD A[\"Any Weekday\"] --\u003e B{\"Check Time Zone\"} B --\u003e|\"ET 8AM-2PM \u0026lt;br/\u0026gt; (Peak)\"| C[\"Normal usage\"] B --\u003e|\"All other hours \u0026lt;br/\u0026gt; (Off-peak)\"| D[\"2x usage\"] D --\u003e E[\"Doesn't count against weekly cap\"]\nFor users outside the US: ET 8AM–2PM corresponds to 9PM–3AM in Korea and Japan (UTC+9), since US daylight saving time is in effect for the entire promotion period. 
This means daytime hours in East Asia are almost entirely off-peak, making the 2x bonus available throughout a normal workday.\nThe promotion covers Claude web, desktop, mobile, Cowork, Claude Code, Claude for Excel, and Claude for PowerPoint.\nClaude Platform Expansion Strategy graph TD A[\"Claude Platform\"] --\u003e B[\"claude.ai \u0026lt;br/\u0026gt; Web/Desktop/Mobile\"] A --\u003e C[\"Claude Code \u0026lt;br/\u0026gt; Terminal/VS Code/JetBrains\"] A --\u003e D[\"Claude for Chrome \u0026lt;br/\u0026gt; Browser extension\"] A --\u003e E[\"Claude for Office \u0026lt;br/\u0026gt; Excel/PowerPoint\"] A --\u003e F[\"Claude for Slack\"] A --\u003e G[\"Cowork \u0026lt;br/\u0026gt; Autonomous agent\"]Anthropic is expanding Claude from a single chatbot into an AI layer present across every work environment. Terminal (Claude Code), browser (Chrome), office (Excel/PowerPoint), collaboration (Slack), autonomous agent (Cowork) — Claude now exists on nearly every surface where a developer works.\nInsight Launching the Chrome extension and running a usage promotion at the same time is a clear strategy: raise accessibility (extension), lower the cost of trying it (promotion), and build habits. The timing advantage for users in East Asian time zones — where business hours fall almost entirely in off-peak periods — is notable. Through March 27, both Claude Code and the web interface carry 2x usage, making it a good window to try new features or tackle a large-scale refactor.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-claude-for-chrome/cover-en.jpg","permalink":"/posts/2026-03-16-claude-for-chrome/","title":"Claude for Chrome — Anthropic's Strategy to Embed AI Into the Browser"},{"content":"Overview Anthropic has added a beta feature to Claude that generates interactive charts, diagrams, and visualizations directly within the conversation. 
It builds on last fall\u0026rsquo;s \u0026ldquo;Imagine with Claude\u0026rdquo; preview and existing Artifacts functionality — with the key difference that visuals are embedded in the chat body itself as \u0026ldquo;temporary visualizations,\u0026rdquo; not pushed to a side panel.\nThe Core Change: No Code, Right in the Flow graph TD A[\"User Request\"] --\u003e B{\"Claude Decides\"} B --\u003e|\"Text is better\"| C[\"Standard text response\"] B --\u003e|\"Visual is better\"| D[\"Interactive chart generated\"] D --\u003e E[\"Embedded in chat body\"] E --\u003e F[\"User interacts \u0026lt;br/\u0026gt; Click, change values\"] F --\u003e G[\"Refine via conversation\"] G --\u003e DTwo things define this feature. First, asking \u0026ldquo;draw that as a diagram\u0026rdquo; or \u0026ldquo;show how this changes over time\u0026rdquo; triggers immediate generation — and Claude may also auto-generate a visualization when it judges a diagram would communicate faster. Second, the output is an ephemeral tool, not a permanent document.\nYou can generate a compound interest graph and then refine it conversationally — \u0026ldquo;extend it to 20 years,\u0026rdquo; \u0026ldquo;switch to monthly contributions.\u0026rdquo; Clickable periodic tables and interactive decision trees are particularly well-suited to this exploratory format.\nHow It Differs From Artifacts graph LR A[\"Artifacts\"] --\u003e B[\"Side panel \u0026lt;br/\u0026gt; Persisted \u0026lt;br/\u0026gt; Shareable/downloadable\"] C[\"In-Chat Visuals\"] --\u003e D[\"Embedded in chat \u0026lt;br/\u0026gt; Ephemeral \u0026lt;br/\u0026gt; Refined conversationally\"] Artifacts In-Chat Interactive Visuals Location Side panel Answer body Lifespan Permanent (save/share) Temporary (evolves with conversation) Purpose Delivering a deliverable Supporting explanation Modification Separate edit Reflected immediately via conversation Community reports indicate that rendering location varies by environment — some see the inline version, 
others get an artifact (right panel), and platform support varies across app versions. iOS/iPadOS visual support was reportedly delayed, and some users hit usage limits quickly.\nPractical Use Cases Learning: Clickable periodic tables and decision trees turn \u0026ldquo;reading to learn\u0026rdquo; into \u0026ldquo;exploring to learn.\u0026rdquo; In math and science, watching a graph change the moment you tweak one variable accelerates comprehension dramatically.\nWork meetings: Ask Claude to \u0026ldquo;diagram our funnel by stage\u0026rdquo; or \u0026ldquo;compare hypothesis A vs B in a chart\u0026rdquo; to pull up a temporary dashboard during the meeting and update it in real time as questions come up.\nData analysis: There are reports of automated portfolio visualizations producing results that \u0026ldquo;would have taken a person a week\u0026rdquo; in a matter of minutes.\nImportant Caveat: Impressive ≠ Accurate Testing by The New Stack found that while diagrams looked plausible, some label positions in an aviation pattern diagram were incorrect. A visualization is a UI that aids understanding — it is not a certificate of correctness.\nA practical workflow:\nStart with \u0026ldquo;show this as a table/chart\u0026rdquo; Add \u0026ldquo;also include the assumptions and formulas behind this graph\u0026rdquo; as a verification layer Iterate with \u0026ldquo;change just one variable and compare\u0026rdquo; This feature is available on all plans (Free, Pro, Max, Team).\nInsight Claude\u0026rsquo;s in-chat interactive charts are a signal of the transition from AI delivering answers in text to users exploring answers by interacting. Combining text-based conversation with visual exploration is a direction shared with ChatGPT Canvas and Gemini\u0026rsquo;s multimodal output — a glimpse of how AI interfaces are evolving. Since it\u0026rsquo;s still in beta, rendering location, speed, and platform support may be inconsistent. 
The most important habit to maintain: don\u0026rsquo;t get swept up in an impressive-looking visualization — always ask for the underlying data and assumptions alongside it.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-claude-interactive-visuals/cover-en.jpg","permalink":"/posts/2026-03-16-claude-interactive-visuals/","title":"Claude In-Chat Interactive Visuals — When a Conversation Becomes a Dashboard"},{"content":"Overview Run three or four AI coding agents simultaneously and your terminal explodes. Fifteen iTerm2 tabs, eight tmux sessions — you spend time just hunting for which agent is waiting on input. cmux is a macOS-native terminal designed from the ground up to solve exactly this problem.\nBuilt with Swift + AppKit and using Ghostty\u0026rsquo;s libghostty as its rendering engine, cmux is completely free under the AGPL license. It displays git branch, PR status, open ports, and notification text in a real-time workspace sidebar; supports inter-pane communication via read-screen; and provides a full automation API through a built-in browser. This isn\u0026rsquo;t a comparison to traditional terminal multiplexers — it\u0026rsquo;s a new category of tool built for AI agents.\nComparison with tmux is covered in a separate post.\nArchitecture: A New Layer on Top of Ghostty cmux is not a fork of Ghostty. It\u0026rsquo;s a separate app that uses libghostty as a library — the same relationship Safari has to WebKit, not a fork of it. 
Mitchell Hashimoto (creator of Ghostty and co-founder of HashiCorp) gave it a positive mention as \u0026ldquo;another libghostty-based project.\u0026rdquo;\ngraph TD A[\"cmux.app \u0026lt;br/\u0026gt; Swift + AppKit\"] --\u003e B[\"libghostty \u0026lt;br/\u0026gt; GPU-accelerated terminal rendering\"] A --\u003e C[\"Vertical Tab Sidebar \u0026lt;br/\u0026gt; git branch, PR, ports\"] A --\u003e D[\"Notification System \u0026lt;br/\u0026gt; OSC 9/99/777 + macOS notifications\"] A --\u003e E[\"Built-in Browser \u0026lt;br/\u0026gt; Full automation API\"] A --\u003e F[\"Socket API \u0026lt;br/\u0026gt; CLI automation + read-screen\"] A --\u003e G[\"Session Restore \u0026lt;br/\u0026gt; Layout + metadata\"] F --\u003e H[\"UNIX domain socket \u0026lt;br/\u0026gt; CMUX_SOCKET_PATH\"] H --\u003e I[\"cmux CLI \u0026lt;br/\u0026gt; identify, send, read-screen, \u0026lt;br/\u0026gt; split, browser, notify\"]\nGPU-accelerated rendering comes from Ghostty unchanged, at the same speed. cmux adds workspace management, notifications, browser integration, and CLI automation on top. Communication is via UNIX domain socket; each pane automatically receives a CMUX_SOCKET_PATH environment variable.\nYou don\u0026rsquo;t need a separate Ghostty installation, because cmux bundles libghostty itself. If you already use Ghostty, installing cmux alongside it causes no conflicts.\nInstallation and Initial Setup Homebrew Install brew tap manaflow-ai/cmux \u0026amp;\u0026amp; brew install --cask cmux Or download the DMG directly from the official site.\nCLI Symlink To use the cmux CLI from anywhere in the terminal, set up a symlink.\nsudo ln -sf /Applications/cmux.app/Contents/MacOS/cmux-cli /usr/local/bin/cmux The GUI works without this, but CLI automation commands like cmux send and cmux read-screen require it.\nGhostty Config Compatibility cmux reads your existing Ghostty config file directly.\n~/.config/ghostty/config Font, theme, and color settings are applied automatically. 
Coming from Ghostty, you get an identical terminal environment with no extra configuration. New users can start with cmux\u0026rsquo;s defaults immediately.\nVerify CLI Installation # Confirm CLI is installed cmux identify --json # Check environment variables env | grep CMUX cmux identify prints the current workspace, surface, and pane IDs. Run this inside a cmux terminal.\nTroubleshooting Symptom Cause Fix cmux: command not found CLI symlink not set up Run the sudo ln -sf command Socket connection error cmux app not running Launch cmux.app first Ghostty config conflict Incompatible config keys Separate into cmux-specific config Font looks different Ghostty config path mismatch Check ~/.config/ghostty/config Core Concept: The Hierarchy cmux\u0026rsquo;s hierarchy is easiest to understand with a building analogy.\ngraph TD W[\"Window \u0026lt;br/\u0026gt; macOS window = Building\"] --\u003e WS1[\"Workspace 1 \u0026lt;br/\u0026gt; = Floor \u0026lt;br/\u0026gt; git branch, PR, ports\"] W --\u003e WS2[\"Workspace 2 \u0026lt;br/\u0026gt; = Floor\"] WS1 --\u003e S1[\"Surface 1 \u0026lt;br/\u0026gt; = Desk (tab)\"] WS1 --\u003e S2[\"Surface 2 \u0026lt;br/\u0026gt; = Desk (tab)\"] S1 --\u003e P1[\"Pane A \u0026lt;br/\u0026gt; = Room (split area)\"] S1 --\u003e P2[\"Pane B \u0026lt;br/\u0026gt; = Room (split area)\"] S1 --\u003e P3[\"Browser Pane \u0026lt;br/\u0026gt; = Room (browser)\"] S2 --\u003e P4[\"Pane C\"] S2 --\u003e P5[\"Pane D\"] Level Analogy Description Window Building macOS window. Usually just one. Workspace Floor Independent work context. Shown as sidebar tabs. Includes git branch, PR status, ports, notification metadata. Surface Desk A tab inside a workspace. Contains multiple panes. Pane Room A split area running an actual terminal or browser. This maps to tmux\u0026rsquo;s Session \u0026gt; Window \u0026gt; Pane, but the decisive difference is that cmux attaches metadata to each workspace. 
A single glance at the sidebar tells you the git branch, related PR number, open ports, and latest notification for each project.\nEnvironment Variables cmux automatically injects environment variables into each pane.\nVariable Purpose CMUX_WORKSPACE_ID ID of the workspace this pane belongs to CMUX_SURFACE_ID ID of the surface this pane belongs to CMUX_SOCKET_PATH cmux socket path — used for CLI communication An agent can read CMUX_WORKSPACE_ID to automatically know which project it\u0026rsquo;s running in — no need to pass a project path as a parameter.\nWorkspace Management Workspaces are cmux\u0026rsquo;s top-level work units. They appear as vertical tabs in the sidebar, each updated in real time with:\nGit branch name: currently checked-out branch PR status/number: pull request associated with this branch Working directory: current path Open ports: localhost:3000, localhost:8080, etc. Latest notification text: preview of the last notification Think Firefox\u0026rsquo;s vertical tabs, but for terminals. When switching between 5–6 projects, the sidebar tab alone gives you full context.\nWorkspace Shortcuts Action Shortcut New workspace ⌘N Switch workspace ⌘1 – ⌘8 Rename ⌘⇧R Close ⌘⇧W CLI Workspace Management # Create a new workspace cmux new-workspace --name \u0026#34;my-project\u0026#34; # List workspaces cmux list-workspaces # Get current workspace info cmux identify --json Sample cmux identify --json output:\n{ \u0026#34;workspace_id\u0026#34;: \u0026#34;ws-abc123\u0026#34;, \u0026#34;surface_id\u0026#34;: \u0026#34;sf-def456\u0026#34;, \u0026#34;pane_id\u0026#34;: \u0026#34;pn-ghi789\u0026#34; } These IDs are used to target specific panes in cmux send and cmux read-screen.\nSurfaces and Panes Surface A surface is a tab inside a workspace. 
One workspace can hold multiple surfaces, each maintaining a different work context.\nAction Shortcut New surface ⌘T Next surface ⌘⇧] Previous surface ⌘⇧[ Close surface ⌘W Pane A pane is a split region of a surface — horizontal or vertical. Each pane runs an independent terminal session or browser.\nAction Shortcut Split right ⌘D Split down ⌘⇧D Move to pane (left) ⌥⌘← Move to pane (right) ⌥⌘→ Move to pane (up) ⌥⌘↑ Move to pane (down) ⌥⌘↓ Key difference: no prefix key. tmux requires pressing Ctrl+b before every command key. cmux uses native macOS shortcuts directly — ⌘D to split, ⌥⌘→ to move. Users coming from iTerm2 or VS Code\u0026rsquo;s integrated terminal face almost no learning curve.\nCLI Pane Management # Split right cmux split --direction right # Split down cmux split --direction down # Send command to a specific pane cmux send --pane-id \u0026lt;target-pane-id\u0026gt; \u0026#34;npm run dev\u0026#34; Notification System cmux\u0026rsquo;s notification system is layered. When running multiple AI agents simultaneously, it\u0026rsquo;s designed to instantly answer: \u0026ldquo;which agent is waiting on my input?\u0026rdquo;\n4-Level Notifications Pane notification ring (blue ring): A blue ring appears around a pane that is waiting for input. Instantly identifies which pane on the current surface needs attention.\nSidebar unread badge: When a notification fires in another workspace, the sidebar tab shows an unread count. You can check the status of other projects without leaving your current workspace.\nIn-app notification panel: Open the panel with ⌘I to see all notifications in chronological order, with the workspace and pane context for each.\nmacOS desktop notification: macOS notification center fires even when cmux doesn\u0026rsquo;t have focus. 
You\u0026rsquo;ll know when an agent needs input even while working in a browser.\nNotification Shortcuts Action Shortcut Open notification panel ⌘I Jump to most recent unread ⌘⇧U ⌘⇧U is especially useful with 5 agents running across 5 workspaces — one shortcut jumps you directly to the pane of the agent that most recently requested input.\nStandard Escape Sequence Support cmux notifications use standard terminal escape sequences (OSC 9, OSC 99, OSC 777). Any tool that outputs these sequences automatically triggers cmux notifications with no plugins or configuration needed.\nCLI Notifications # Send a custom notification cmux notify --title \u0026#34;Build done\u0026#34; --body \u0026#34;Success\u0026#34; # In a CI/CD script npm run build \u0026amp;\u0026amp; cmux notify --title \u0026#34;Build\u0026#34; --body \u0026#34;Build succeeded\u0026#34; \\ || cmux notify --title \u0026#34;Build\u0026#34; --body \u0026#34;Build FAILED\u0026#34; # Long-running task completion python train_model.py \u0026amp;\u0026amp; cmux notify --title \u0026#34;Training\u0026#34; --body \u0026#34;Model training complete\u0026#34; This integrates with Claude Code hooks — configure agents to automatically notify when they complete specific tasks.\nread-screen and send: Inter-Agent Communication These two features are what elevate cmux from a terminal app to an agent communication platform.\nread-screen Read the terminal output of another pane from within one pane.\n# Read the current screen of a target pane cmux read-screen --pane-id \u0026lt;target-pane-id\u0026gt; This returns the text currently displayed in the specified pane. 
Agent A can read Agent B\u0026rsquo;s output and decide what to do next based on it.\nPractical Scenarios # Agent A: check test results in another pane TEST_OUTPUT=$(cmux read-screen --pane-id $TEST_PANE_ID) if echo \u0026#34;$TEST_OUTPUT\u0026#34; | grep -q \u0026#34;FAIL\u0026#34;; then echo \u0026#34;Test failure detected — starting fix\u0026#34; fi # Agent B: monitor build server status BUILD_STATUS=$(cmux read-screen --pane-id $BUILD_PANE_ID) if echo \u0026#34;$BUILD_STATUS\u0026#34; | grep -q \u0026#34;compiled successfully\u0026#34;; then cmux notify --title \u0026#34;Build\u0026#34; --body \u0026#34;Build succeeded\u0026#34; fi Similar to tmux\u0026rsquo;s capture-pane, but read-screen is designed with a clear intent: inter-agent communication. Pane IDs are injected via environment variables, so agents can programmatically discover both their own ID and the IDs of neighboring panes.\nsend Send commands to another pane programmatically.\n# Send to a specific pane cmux send --pane-id \u0026lt;target-pane-id\u0026gt; \u0026#34;npm run test\u0026#34; # Send to a specific surface cmux send --surface-id \u0026lt;target-surface-id\u0026gt; \u0026#34;cd ~/projects/my-app\u0026#34; # Send sequentially to multiple panes cmux send --pane-id $PANE_1 \u0026#34;git pull\u0026#34; cmux send --pane-id $PANE_2 \u0026#34;npm install\u0026#34; cmux send --pane-id $PANE_3 \u0026#34;docker compose up -d\u0026#34; read-screen + send Combined Combining both, an agent can read another agent\u0026rsquo;s state and issue commands in response, which makes an autonomous workflow possible.\n# Agent A: check build pane status and proceed while true; do STATUS=$(cmux read-screen --pane-id $BUILD_PANE) if echo \u0026#34;$STATUS\u0026#34; | grep -q \u0026#34;ready on\u0026#34;; then cmux send --pane-id $TEST_PANE \u0026#34;npm run e2e\u0026#34; cmux notify --title \u0026#34;Pipeline\u0026#34; --body \u0026#34;E2E tests started\u0026#34; break fi sleep 2 done Built-in Browser cmux can open a browser pane inside 
the same window as the terminal. This enables workflows like viewing a PR page alongside Claude Code modifying code, or checking a localhost dev server result immediately.\nBasic Usage # Open a standalone browser window cmux browser open http://localhost:3000 # Open as a split pane in the current surface cmux browser open-split http://localhost:3000 # Navigate the current browser pane cmux browser navigate https://github.com/my/repo/pull/42 # Back/forward cmux browser back cmux browser forward # Reload cmux browser reload # Get current URL cmux browser url open-split is the key command. The browser appears as one side of a terminal split — code and result visible simultaneously without leaving the screen.\nBrowser Automation in Depth The built-in browser is not just a viewer. It provides a full Playwright-level automation API.\nWaiting # Wait for element by CSS selector cmux browser wait --selector \u0026#34;.login-form\u0026#34; # Wait for text to appear cmux browser wait --text \u0026#34;Dashboard loaded\u0026#34; # Wait for URL change cmux browser wait --url-contains \u0026#34;/dashboard\u0026#34; # Wait for page load state cmux browser wait --load-state networkidle # Wait for JavaScript function result cmux browser wait --function \u0026#34;document.readyState === \u0026#39;complete\u0026#39;\u0026#34; DOM Manipulation cmux browser click --selector \u0026#34;#submit-button\u0026#34; cmux browser dblclick --selector \u0026#34;.editable-cell\u0026#34; cmux browser hover --selector \u0026#34;.dropdown-trigger\u0026#34; cmux browser focus --selector \u0026#34;#email-input\u0026#34; cmux browser check --selector \u0026#34;#agree-terms\u0026#34; cmux browser type --selector \u0026#34;#search\u0026#34; --text \u0026#34;query\u0026#34; cmux browser fill --selector \u0026#34;#email\u0026#34; --text \u0026#34;user@example.com\u0026#34; cmux browser press --key \u0026#34;Enter\u0026#34; cmux browser select --selector \u0026#34;#country\u0026#34; --value 
\u0026#34;KR\u0026#34; cmux browser scroll --selector \u0026#34;.content\u0026#34; --direction down Inspection cmux browser snapshot # Accessibility-tree-based page snapshot cmux browser screenshot --output /tmp/page.png # Screenshot cmux browser get text --selector \u0026#34;.result-count\u0026#34; # Extract text cmux browser get html --selector \u0026#34;.article-body\u0026#34; # Extract HTML cmux browser get value --selector \u0026#34;#price-input\u0026#34; # Get input value cmux browser get attr --selector \u0026#34;img.logo\u0026#34; --attr \u0026#34;src\u0026#34; # Get attribute cmux browser get count --selector \u0026#34;.list-item\u0026#34; # Count elements cmux browser is visible --selector \u0026#34;.modal\u0026#34; cmux browser is enabled --selector \u0026#34;#submit\u0026#34; cmux browser is checked --selector \u0026#34;#newsletter\u0026#34; cmux browser find role --role \u0026#34;button\u0026#34; cmux browser find text --text \u0026#34;Submit\u0026#34; cmux browser find label --label \u0026#34;Email address\u0026#34; cmux browser get title cmux browser get url JavaScript Execution cmux browser eval \u0026#34;document.querySelectorAll(\u0026#39;.item\u0026#39;).length\u0026#34; cmux browser addinitscript \u0026#34;window.__TEST_MODE = true\u0026#34; cmux browser addscript --url \u0026#34;https://cdn.example.com/helper.js\u0026#34; cmux browser addstyle \u0026#34;body { background: #f0f0f0; }\u0026#34; State Management cmux browser cookies cmux browser storage cmux browser state --save /tmp/browser-state.json cmux browser state --load /tmp/browser-state.json State save/restore is useful for preserving auth sessions. 
Log in once, save state, and automation scripts don\u0026rsquo;t need to repeat the login flow.\nTab Management cmux browser tab list cmux browser tab switch --index 2 Automation Pattern Examples Pattern 1: Navigate, Wait, Inspect cmux browser navigate https://github.com/my/repo/pull/42 cmux browser wait --selector \u0026#34;.merge-message\u0026#34; PR_STATUS=$(cmux browser get text --selector \u0026#34;.State\u0026#34;) echo \u0026#34;PR status: $PR_STATUS\u0026#34; Pattern 2: Fill Form and Verify cmux browser fill --selector \u0026#34;#title\u0026#34; --text \u0026#34;Fix: resolve memory leak\u0026#34; cmux browser fill --selector \u0026#34;#body\u0026#34; --text \u0026#34;Closes #123\u0026#34; cmux browser click --selector \u0026#34;#create-pr\u0026#34; cmux browser wait --text \u0026#34;Pull request created\u0026#34; Pattern 3: Capture Debug Artifacts on Failure cmux browser click --selector \u0026#34;#deploy-button\u0026#34; cmux browser wait --text \u0026#34;Deployed\u0026#34; || { cmux browser screenshot --output /tmp/deploy-failure.png cmux browser snapshot \u0026gt; /tmp/deploy-failure-dom.txt cmux notify --title \u0026#34;Deploy\u0026#34; --body \u0026#34;Deployment failed — screenshot saved\u0026#34; } CLI Automation Reference Workspace Management Command Description cmux new-workspace --name \u0026quot;name\u0026quot; Create new workspace cmux list-workspaces List workspaces cmux identify Print workspace/surface/pane IDs for current pane cmux identify --json Print IDs in JSON format Panes and Splits Command Description cmux split --direction right Split right cmux split --direction down Split down Communication Command Description cmux send \u0026quot;command\u0026quot; Send command to current pane cmux send --pane-id ID \u0026quot;command\u0026quot; Send command to specific pane cmux send --surface-id ID \u0026quot;command\u0026quot; Send command to specific surface cmux read-screen Read current pane\u0026rsquo;s screen cmux read-screen --pane-id ID 
Read specific pane\u0026rsquo;s screen Notifications Command Description cmux notify --title \u0026quot;T\u0026quot; --body \u0026quot;B\u0026quot; Send notification Browser Command Description cmux browser open URL Open browser cmux browser open-split URL Open browser as split pane cmux browser navigate URL Navigate to URL cmux browser snapshot Page snapshot cmux browser screenshot Screenshot cmux browser click --selector S Click element cmux browser wait --selector S Wait for element cmux browser eval \u0026quot;JS\u0026quot; Execute JavaScript Auto-injected Environment Variables CMUX_WORKSPACE_ID=ws-abc123 CMUX_SURFACE_ID=sf-def456 CMUX_SOCKET_PATH=/tmp/cmux-socket-xyz Scripts can use these without hardcoding to automatically detect current context.\nMulti-Agent Workflow cmux\u0026rsquo;s real value shows when managing multiple AI agents simultaneously. Here\u0026rsquo;s a practical multi-agent setup script.\nProject Setup Automation #!/bin/bash # cmux multi-agent workflow setup script # 1. Create project workspace cmux new-workspace --name \u0026#34;my-project\u0026#34; # 2. Navigate to project directory in main agent pane cmux send \u0026#34;cd ~/projects/my-app\u0026#34; # 3. Split right for second agent pane cmux split --direction right # 4. Start dev server in second pane cmux send --surface-id right \u0026#34;npm run dev\u0026#34; # 5. 
Open browser split to check localhost result cmux browser open-split http://localhost:3000 Claude Code Multi-Agent Pattern # Workspace 1: Backend agent cmux new-workspace --name \u0026#34;backend\u0026#34; cmux send \u0026#34;cd ~/projects/api \u0026amp;\u0026amp; claude\u0026#34; # Workspace 2: Frontend agent cmux new-workspace --name \u0026#34;frontend\u0026#34; cmux send \u0026#34;cd ~/projects/web \u0026amp;\u0026amp; claude\u0026#34; # Workspace 3: Testing agent cmux new-workspace --name \u0026#34;testing\u0026#34; cmux send \u0026#34;cd ~/projects/api \u0026amp;\u0026amp; claude\u0026#34; # Now the sidebar shows all 3 workspaces at a glance. # Each shows its git branch, PR status, and notifications. # ⌘⇧U jumps instantly to whichever agent is waiting for input. Agent Collaboration Workflow # Pane A: Claude Code modifying code # Pane B: Test runner # Script in Pane B — detect Pane A completion, run tests AGENT_PANE=$1 # Pane A\u0026#39;s ID while true; do SCREEN=$(cmux read-screen --pane-id $AGENT_PANE) # Claude Code\u0026#39;s prompt reappears when work is done if echo \u0026#34;$SCREEN\u0026#34; | grep -q \u0026#34;claude\u0026gt;\u0026#34;; then cmux notify --title \u0026#34;Agent\u0026#34; --body \u0026#34;Code update complete — running tests\u0026#34; npm run test break fi sleep 5 done Session Restore When you reopen cmux after closing it, the previous state is restored.\nWhat Is Restored Workspace layout (split structure, pane arrangement) Workspace metadata (name, git branch, etc.) Working directory for each pane URL for browser panes What Is Not Restored Running processes: Live processes like Claude Code sessions or npm run dev are not restored. This is the key difference from tmux — tmux keeps sessions alive as long as the server is running; cmux requires you to restart processes. tmux sessions: If you ran tmux inside cmux, the tmux sessions are managed by the tmux server and persist separately. 
For process persistence, running tmux inside cmux is a practical workaround.\n\u0026ldquo;Primitive, Not Solution\u0026rdquo; Philosophy cmux\u0026rsquo;s core design philosophy is \u0026ldquo;Primitive, Not Solution\u0026rdquo;.\nSolution approach: \u0026ldquo;I\u0026rsquo;ll give you a UI that runs 3 Claude Code agents simultaneously.\u0026rdquo; Primitive approach: \u0026ldquo;I give you read-screen, send, notifications, and a browser API as building blocks — compose the workflow you want.\u0026rdquo; This has several advantages:\nTool independence: Works with any AI agent — not just Claude Code, but Cursor, Windsurf, Codex, Gemini CLI, and whatever comes next. Workflow flexibility: You\u0026rsquo;re not locked into a predetermined workflow. Each team and project can combine primitives differently. Future compatibility: New AI tools can plug into the existing primitives without changes. The tradeoff: initial setup is more complex, and finding the optimal workflow requires experimentation. This is a higher barrier to entry compared to \u0026ldquo;complete solution\u0026rdquo; tools like Claude Squad.\nCompetitive Landscape The AI agent terminal space has grown rapidly since the second half of 2025.\nTool Approach Notable traits cmux Provides primitives Native macOS, Ghostty-based, read-screen, browser automation Claude Squad Agent orchestration GitHub-based, focused on agent lifecycle management Pane Terminal for AI agents Agent state visualization Amux AI-centric multiplexer Aims to replace tmux Calyx Emerging competitor Different approach from cmux, growing fast Community Reception Positive feedback from Google DeepMind Research Director Edward Grefenstette, Dagster founder Nick Schrock, and HashiCorp founder Mitchell Hashimoto. Japanese developer communities are reporting migrations along the path \u0026ldquo;Warp → Ghostty → cmux.\u0026rdquo;\nOn Hacker News, interest in the features was balanced by concerns about stability. 
Fast update cycles and macOS-only availability were notable discussion points.\nThe most-mentioned real-world workflow:\n\u0026ldquo;One vertical tab per WIP task. Claude Code on one side, browser with PR and resources on the other. Context switching feels natural.\u0026rdquo;\nLimitations Know these before adopting cmux.\nmacOS Only Requires macOS 14.0+. No plans for Linux or Windows support. Given that local AI coding agent workflows predominantly happen on macOS, this isn\u0026rsquo;t an immediate dealbreaker — but it is a real constraint.\nNo Process Persistence Closing the app kills running processes. Layout and metadata are restored; Claude Code sessions and dev servers are not — you restart them manually. This is the biggest structural weakness relative to tmux.\nFast Update Cadence Active development means APIs and features change frequently. Factor in version dependencies when writing automation scripts.\nStability Stability concerns were raised on Hacker News. Thorough testing is recommended before making cmux a critical tool in a production workflow.\nFAQ Question Answer Is cmux paid? No. Completely free under the AGPL license. Do I need to install Ghostty separately? No. libghostty is bundled. Can I use tmux inside cmux? Yes. Can I use cmux for SSH? You can run SSH inside a cmux pane, but cmux itself cannot be installed on a remote server. Quick Links cmux official site — docs, download, tutorials cmux concepts docs cmux getting started docs cmux GitHub cmux Homebrew — brew install --cask cmux cmux intro and guide (daleseo.com) cmux analysis (goddaehee) cmux intro video tmux vs cmux comparison Insight cmux\u0026rsquo;s position is clear: treat terminal rendering as a solved problem (delegate it to libghostty) and focus entirely on the agent UX layer above it. 
The hard problem of GPU-accelerated rendering is handed off to the Ghostty library; cmux invests its effort in workspace metadata, layered notifications, inter-agent communication, and browser automation.\nThe read-screen + send combination is particularly notable because it enables \u0026ldquo;conversation\u0026rdquo; between agents. Agent A reading Agent B\u0026rsquo;s output and reacting to it is not just multiplexing — it\u0026rsquo;s foundational infrastructure for agent orchestration.\nThe depth of the browser automation API is also impressive. navigate → wait → inspect, fill → click → verify, screenshot capture on failure — Playwright-level automation from a single terminal CLI command. Agents that manipulate and verify web UIs directly are fully self-contained inside cmux, without additional tooling.\n\u0026ldquo;Primitive, Not Solution\u0026rdquo; is a double-edged sword. You gain the generality to work with any agent, but pay the cost of initial complexity. Competitors like Calyx are rising fast with more opinionated solutions — worth watching.\nStill macOS-only with no process persistence — real structural constraints. But given that AI agent-centric development is predominantly happening on macOS today, and given the community\u0026rsquo;s fast growth, cmux has become the most fully-realized tool in this space.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-cmux-terminal/cover-en.jpg","permalink":"/posts/2026-03-16-cmux-terminal/","title":"cmux — A macOS-Native Terminal Designed for the AI Agent Era"},{"content":"Overview For small teams, the biggest bottleneck in marketing isn\u0026rsquo;t creativity — it\u0026rsquo;s producing consistent, on-brand assets across multiple channels quickly. Pomelli, released by Google Labs, addresses this directly: paste in a website URL and it extracts your brand DNA, then generates campaign creatives tailored to that brand.\nThe Three-Step Workflow graph TD A[\"1. 
Enter website URL\"] --\u003e B[\"Pomelli analyzes the site \u0026lt;br/\u0026gt; copy, visuals, colors\"] B --\u003e C[\"Business DNA created \u0026lt;br/\u0026gt; tone, fonts, image style\"] C --\u003e D[\"2. Campaign ideas proposed\"] D --\u003e E[\"User selects an idea \u0026lt;br/\u0026gt; or enters a custom prompt\"] E --\u003e F[\"3. Channel-specific creatives generated\"] F --\u003e G[\"In-app editing \u0026lt;br/\u0026gt; refine copy and images\"] G --\u003e H[\"Download \u0026lt;br/\u0026gt; → publish to your channels\"]Step 1: Business DNA Enter your website URL and Pomelli analyzes the copy and visual elements to build a brand profile — your Business DNA. This profile captures brand tone, font style, image aesthetic, and color palette.\nOne important caveat: Pomelli follows the brand as it exists on your site, not the brand you aspire to. If your site is outdated or has inconsistent tone across pages, the extracted DNA will reflect that inconsistency. It\u0026rsquo;s worth cleaning up your key pages before you start.\nStep 2: Campaign Ideas Once your Business DNA is ready, Pomelli suggests campaign themes aligned with your brand. You can also write your own prompt. Short, specific prompts work best — structure them as \u0026ldquo;target audience + value proposition + desired action\u0026rdquo;. Example: \u0026ldquo;First-time visitors, 10% off, drive booking link clicks.\u0026rdquo;\nStep 3: Creative Generation \u0026amp; Editing Pomelli generates assets for social, web, and ads. You can edit copy and images directly in the app, then download. 
It stops short of auto-publishing — the workflow is AI drafts, human approves.\nUse Cases Scenario Example Pomelli\u0026rsquo;s Role Seasonal campaign Spring limited menu launch Instagram feed images + caption variations in café brand tone Product launch \u0026ldquo;Sugar-free, 7-day trial\u0026rdquo; Launch announcement → review request → return-visit post set Booking / consultation Salon, fitness studio Multiple headline + CTA variations for A/B testing Employer branding Team values, work culture Recruitment creatives that stay on-brand Re-engagement Lapsed customer win-back Discount codes + return messages from multiple angles The standout strength is rapid variation. Take one campaign theme and quickly spin out multiple versions with different tones — casual vs. premium, for example.\nCaveats Verify Business DNA matches your current brand before using it (an outdated site produces outdated tone) Factual details — product names, prices, discount terms — must be human-verified before publishing Health, finance, and education sectors: check for compliance with advertising regulations and required disclosures Google Labs public beta — quality may vary, and availability by region and language may be limited Despite the name sounding like \u0026ldquo;Pomodoro,\u0026rdquo; this is a marketing tool, not a time-management app Insight The core problem Pomelli solves is eliminating the need to re-explain your brand every time you open an AI tool. Instead of starting each session with \u0026ldquo;our brand tone is casual but professional, our colors are\u0026hellip;,\u0026rdquo; Pomelli auto-extracts a persistent profile from your website and applies it consistently. This is the same pattern as Claude\u0026rsquo;s CLAUDE.md or Cursor\u0026rsquo;s .cursorrules — set context once, reuse it forever. 
Seeing Google apply this pattern to an SMB marketing tool is an interesting signal about where AI tooling is heading.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-google-pomelli/cover-en.jpg","permalink":"/posts/2026-03-16-google-pomelli/","title":"Google Pomelli — AI Marketing Tool That Builds On-Brand Content from a URL"},{"content":"Overview Once you start using Claude Code seriously, sessions accumulate fast. api-refactor, debug-pipeline, write-tests — each running in its own tmux session. Telling at a glance which agents are waiting for input and which are still working becomes a real problem. recon is a tmux-native dashboard built to solve exactly that.\nArchitecture: A TUI on Top of tmux graph TD A[\"tmux server\"] --\u003e B[\"session: api-refactor \u0026lt;br/\u0026gt; Claude Code\"] A --\u003e C[\"session: debug-pipeline \u0026lt;br/\u0026gt; Claude Code\"] A --\u003e D[\"session: write-tests \u0026lt;br/\u0026gt; Claude Code\"] B --\u003e E[\"recon TUI\"] C --\u003e E D --\u003e E E --\u003e F[\"tmux list-panes \u0026lt;br/\u0026gt; PID, session name\"] E --\u003e G[\"~/.claude/sessions/ \u0026lt;br/\u0026gt; PID.json\"] E --\u003e H[\"tmux capture-pane \u0026lt;br/\u0026gt; status bar text\"]recon is written in Rust (98K lines) and assumes each Claude Code instance runs in its own tmux session. 
Status detection works by reading the status bar text at the bottom of each pane:\nStatus bar text State Meaning esc to interrupt Working Streaming a response or executing a tool Esc to cancel Input Waiting for permission approval — needs your attention other Idle Waiting for the next prompt (0 tokens) New No interaction yet Session matching uses ~/.claude/sessions/{PID}.json — the file Claude Code itself writes — rather than parsing ps output or relying on CWD heuristics, which makes it accurate.\nTwo Views Table View (default) ┌─ recon — Claude Code Sessions ─────────────────────────────────────┐ │ # Session Git(Branch) Status Model Context │ │ 1 api-refactor feat/auth ● Input Opus 4.6 45k/1M │ │ 2 debug-pipeline main ● Work Sonnet 4.6 12k/200k │ │ 3 write-tests feat/auth ● Work Haiku 4.5 8k/200k │ │ 4 code-review pr-452 ● Idle Sonnet 4.6 90k/200k │ └────────────────────────────────────────────────────────────────────┘ Git repo name and branch, model name, and context usage (e.g., 45k/1M) are visible at a glance. Rows in Input state are highlighted so they immediately draw your eye.\nTamagotchi View Each agent is represented as a pixel art character. Working is a green blob with legs, Input is an angry orange blob (blinking), Idle is a blue blob with a Zzz, and New is an egg. 
Agents are grouped into \u0026ldquo;rooms\u0026rdquo; by working directory and paginated in a 2×2 grid.\nIt\u0026rsquo;s designed to be thrown on a side monitor — one glance tells you which agents are working, sleeping, or need attention.\nKey Features Live status: Polls every 2 seconds with incremental JSONL parsing Git-aware: Shows repo name and branch per session Context tracking: Token usage displayed as used/available (e.g., 45k/1M) Model display: Shows Claude model name and effort level Resume picker: recon resume scans past sessions, press Enter to resume JSON mode: recon --json for scripting and automation recon next: Jump directly to the next agent in Input state tmux Integration # Add to ~/.tmux.conf bind g display-popup -E -w 80% -h 60% \u0026#34;recon\u0026#34; # prefix + g → dashboard bind n display-popup -E -w 80% -h 60% \u0026#34;recon new\u0026#34; # prefix + n → new session bind r display-popup -E -w 80% -h 60% \u0026#34;recon resume\u0026#34; # prefix + r → resume picker bind i run-shell \u0026#34;recon next\u0026#34; # prefix + i → jump to Input agent It opens as a popup overlay, so you can switch sessions without interrupting your current work.\nInstallation cargo install --path . Requires tmux and Claude Code to be installed. Interestingly, recon\u0026rsquo;s own commit history includes Co-Authored-By: Claude Opus 4.6 — a meta structure where Claude Code was used to build a tool for managing Claude Code.\nInsight recon solves the \u0026ldquo;session management\u0026rdquo; problem of AI coding agents by building on top of tmux — proven, reliable infrastructure. Compared to alternatives like agentsview (a web dashboard) or agf (fzf-based search), being tmux-native is the key differentiator: you never leave the terminal to manage your agents. The Tamagotchi view is both functional and fun, but more importantly it represents a meaningful UX experiment in making agent state intuitively perceptible. 
If you regularly run three or more Claude Code sessions simultaneously, recon is worth trying.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-recon-claude-code-tmux/cover-en.jpg","permalink":"/posts/2026-03-16-recon-claude-code-tmux/","title":"recon — A tmux Dashboard for Managing Claude Code Agents Like Tamagotchis"},{"content":"Overview Say \u0026ldquo;analyze NVDA\u0026rdquo; and get back scenario analysis (Bull/Base/Bear), probability-weighted R/R Score, eight quarters of financials, and an interactive HTML dashboard. stock-analysis-agent is an institutional-grade stock research automation tool built on top of Claude Code. For US stocks it pulls data directly from SEC filings; for Korean stocks, from the FSS DART OpenAPI.\nCore Principle: Blank Beats Wrong graph TD A[\"Ticker input \u0026lt;br/\u0026gt; NVDA / 005930\"] --\u003e B[\"Data collection\"] B --\u003e C{\"Source verification\"} C --\u003e|\"Grade A: SEC/DART direct\"| D[\"Display number + source tag\"] C --\u003e|\"Grade B: 2+ sources cross-checked\"| D C --\u003e|\"Grade C: single source\"| E[\"Display with warning\"] C --\u003e|\"Unverifiable\"| F[\"— (left blank)\"] D --\u003e G[\"Generate analysis report\"] E --\u003e G F --\u003e GThe agent\u0026rsquo;s core philosophy is \u0026ldquo;show a blank rather than an unverifiable number.\u0026rdquo; This directly addresses AI hallucination — the tendency to produce plausible-looking but fabricated figures. 
Every number carries a source tag like [Filing], [Portal], or [Calc], and a four-tier confidence system runs from Grade A (original filing) down to Grade D (unverifiable → blank).\nFour Output Modes Mode Name Format Purpose A At-a-glance HTML Decision card + 180-day event timeline — for screening B Benchmark HTML Side-by-side comparison matrix for 2–5 stocks C Chart (default) HTML Interactive dashboard — scenarios, KPIs, charts D Document DOCX 3,000+ word investment memo — Goldman Sachs research note style The Mode C dashboard includes scenario cards (Bull/Base/Bear), an R/R Score badge, KPI tiles (P/E, EV/EBITDA, FCF Yield, etc.), Variant View (where the market is wrong), Precision Risk (causal chain analysis), Chart.js charts, and eight quarters of income statement data.\nDual Data Pipeline graph LR subgraph US[\"US Stocks\"] A1[\"Financial Datasets API\"] --\u003e B1[\"SEC 10-K, 10-Q \u0026lt;br/\u0026gt; Grade A\"] A2[\"Yahoo Finance \u0026lt;br/\u0026gt; TipRanks etc.\"] --\u003e B2[\"Price, consensus \u0026lt;br/\u0026gt; Grade B\"] end subgraph KR[\"Korean Stocks\"] C1[\"DART OpenAPI\"] --\u003e D1[\"Consolidated financials \u0026lt;br/\u0026gt; Grade A\"] C2[\"Naver Finance\"] --\u003e D2[\"Current price, PER \u0026lt;br/\u0026gt; Grade B\"] end B1 --\u003e E[\"Claude Code \u0026lt;br/\u0026gt; Analysis Engine\"] B2 --\u003e E D1 --\u003e E D2 --\u003e E E --\u003e F[\"HTML / DOCX \u0026lt;br/\u0026gt; Report\"]US stocks: When the Financial Datasets API MCP is connected, Grade A data is extracted directly from SEC filings. Without MCP, the agent falls back to web scraping from Yahoo Finance, SEC EDGAR, and TipRanks — but maxes out at Grade B.\nKorean stocks: The DART OpenAPI (Korea\u0026rsquo;s FSS disclosure system) is connected directly. The fnlttSinglAcntAll endpoint fetches consolidated financial statements (IS/BS/CF), while Naver Finance supplies current price, PER, and foreign ownership ratio. 
The DART API key is free.\nR/R Score — Risk/Reward in a Single Number R/R Score = (Bull_return% × Bull_prob + Base_return% × Base_prob) / |Bear_return% × Bear_prob| The probability-weighted upside (Bull and Base scenarios) divided by the probability-weighted downside (Bear scenario) produces a single score. Above 2.0 = Attractive; 1.0–2.0 = Neutral; below 1.0 = Unfavorable.\nVariant View — \u0026ldquo;Where the Market Is Wrong\u0026rdquo; This is the most interesting section. Where typical AI analysis stops at listing pros and cons, stock-analysis-agent identifies the specific points where market consensus is mistaken, backed by company-specific evidence. It extracts three points in Q1–Q3 format, each explaining \u0026ldquo;why the market is missing this.\u0026rdquo;\nUsage # Single stock analysis Analyze NVDA Deep analysis on 005930 # Peer comparison Compare Samsung vs SK Hynix NVDA vs AMD vs INTC # Portfolio / watchlist Scan my watchlist Show catalyst calendar Commands are given conversationally inside Claude Code. The commit history includes Co-Authored-By: Claude Opus 4.6, confirming this agent was itself built with Claude Code.\nInsight The most important pattern stock-analysis-agent demonstrates is solving AI hallucination through system design. Forcing a source tag on every number and leaving blanks when verification fails is a simple rule — but it\u0026rsquo;s a powerful one. The dual pipeline covering both US (SEC) and Korean (DART) markets with direct API integration is also a particularly practical reference for Korean developers. 
That said, with only 3 stars it\u0026rsquo;s an early-stage project; treat it as a learning resource for architecture and prompt design rather than a production tool.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-stock-analysis-agent/cover-en.jpg","permalink":"/posts/2026-03-16-stock-analysis-agent/","title":"stock-analysis-agent — Automating Institutional-Grade Stock Research with Claude Code"},{"content":"Overview The term \u0026ldquo;vibe coding\u0026rdquo; started with a tweet from Andrej Karpathy and has since established itself as a development paradigm. Vibe Coding Fundamentals In 33 minutes on YouTube is a systematic breakdown of the fundamentals behind this paradigm — the practice of building software by giving AI natural language instructions without writing a single line of code directly.\nWhat Is Vibe Coding? graph TD A[\"Traditional coding\"] --\u003e B[\"Developer writes code \u0026lt;br/\u0026gt; AI assists\"] C[\"Vibe coding\"] --\u003e D[\"Developer communicates intent \u0026lt;br/\u0026gt; AI writes code\"] D --\u003e E[\"Developer validates the result\"] E --\u003e|\"Needs revision\"| D E --\u003e|\"Done\"| F[\"Deploy\"]Karpathy\u0026rsquo;s original framing was simple: \u0026ldquo;I see the code, but I don\u0026rsquo;t read it. When there\u0026rsquo;s an error, I paste the error message straight into the AI. It works most of the time.\u0026rdquo; That\u0026rsquo;s the essence of vibe coding.\nBut this pure form works well for prototyping and falls apart for production code. Vibe Coding Fundamentals presents a structured approach that bridges that gap.\nCore Principles 1. Deliver Clear Context The starting point is giving AI structured documentation — not \u0026ldquo;build me a chat app,\u0026rdquo; but tech stack, directory structure, coding conventions, and business requirements. Files like Claude Code\u0026rsquo;s CLAUDE.md and Cursor\u0026rsquo;s .cursorrules serve exactly this role.\n2. 
Iterate in Small Units Rather than asking for an entire feature at once, break it into small chunks and cycle through: request → validate → next request. One change per prompt is the key discipline.\n3. Verifiable Outputs \u0026ldquo;Seems to work\u0026rdquo; isn\u0026rsquo;t verification. Use test code or actual run results. This is where TDD and vibe coding converge — write the tests first and have the AI produce code that passes them.\n4. Build the Generator, Not Just the Output Rather than one-off code generation, build a reproducible workflow. Version-control your prompts and capture successful patterns as skill files or rule files.\nThe Vibe Coding Spectrum graph LR A[\"Pure vibe \u0026lt;br/\u0026gt; prototyping\"] --\u003e B[\"Structured vibe \u0026lt;br/\u0026gt; rules + validation\"] B --\u003e C[\"Agentic coding \u0026lt;br/\u0026gt; autonomous execution + harness\"] C --\u003e D[\"Team coding \u0026lt;br/\u0026gt; multi-agent collaboration\"]Vibe coding isn\u0026rsquo;t a single method — it\u0026rsquo;s a spectrum:\nLevel Characteristics Best for Pure vibe Natural language only, minimal validation Prototyping, one-off scripts Structured vibe CLAUDE.md-style rules + TDD Side projects, MVPs Agentic coding Harness + autonomous execution loop Production feature development Team coding Multi-agent + code review Large-scale projects Quick Links Vibe Coding Fundamentals In 33 minutes — Original YouTube video Insight Despite the casual-sounding name, vibe coding done well requires significant engineering discipline. Clear context delivery, small-unit iteration, verifiable outputs — these are the fundamentals of traditional software engineering. What changed is who writes the code, not how good software is made. The principles are the same. 
This maps exactly to the advice from AI Frontier EP 86: \u0026ldquo;build the generator, not just the output\u0026rdquo; — you\u0026rsquo;re designing the system that produces software, not just producing software once.\n","date":"2026-03-16T00:00:00+09:00","image":"/images/posts/2026-03-16-vibe-coding-fundamentals/cover-en.jpg","permalink":"/posts/2026-03-16-vibe-coding-fundamentals/","title":"Vibe Coding Fundamentals — The Core Principles in 33 Minutes"},{"content":"Overview You\u0026rsquo;re deep in a complex refactoring session with Claude Code and something comes to mind: \u0026ldquo;What was the reason this function was deprecated again?\u0026rdquo; Typing it into the main prompt dirties your conversation history and risks breaking the agent\u0026rsquo;s context. /btw is the side question feature designed to solve exactly this problem.\nHow /btw Works graph TD A[\"Main Conversation in Progress\"] --\u003e B[\"/btw Question Input\"] B --\u003e C[\"Reads Full Conversation Context \u0026lt;br/\u0026gt; Code already read, decisions agreed upon\"] C --\u003e D[\"One-shot Overlay Response\"] D --\u003e E[\"Close with Space/Enter/Escape\"] E --\u003e F[\"Main Conversation History \u0026lt;br/\u0026gt; Unchanged\"] style D fill:#2196F3,color:#fff style F fill:#4CAF50,color:#fffKey properties:\nContext access: Reads the full context of the current conversation — code already seen, decisions made, everything discussed so far. History isolation: The question and answer are handled as a one-shot overlay and never written to the main conversation history. Non-blocking: You can invoke /btw even while Claude is generating its main response. It does not interrupt the main output. 
/btw vs Subagent: Different Tools for Different Jobs Property /btw Subagent Context Full conversation context ✓ New session, no context Tool use Not available ✗ Available ✓ Conversation One-shot (single response) Multi-turn capable History Not saved Result returned only Cost Reuses prompt cache (low cost) New session cost Rule of thumb:\n\u0026ldquo;Something Claude probably already knows\u0026rdquo; → /btw \u0026ldquo;Something that requires fresh research or exploration\u0026rdquo; → Subagent Limitations No Tool Access /btw cannot use any tools — no file reading, command execution, or web search. It answers purely from what is already in the current conversation context. This is intentional: if tool calls were allowed, a side question could interfere with the main task.\nSingle-Turn Only If you need follow-up questions and a back-and-forth exchange, use a regular prompt instead. /btw is literally \u0026ldquo;by the way\u0026rdquo; — one quick question, then move on.\nCost /btw is designed to reuse the parent conversation\u0026rsquo;s prompt cache, so no new context needs to be built. If you\u0026rsquo;re watching your Claude Code token costs, quick confirmations are cheapest as /btw questions.\nPractical Usage Patterns # Quick check during refactoring /btw Which method was it that you said was deprecated earlier in this file? # Convention check during code review /btw Are we using try-catch or a Result type for error handling in this project? # Referencing a past decision during design discussion /btw What was the reason we went with PostgreSQL earlier? Insight /btw looks like a small feature, but it fills an important gap in Claude Code\u0026rsquo;s conversation model. There was no way to leverage context without polluting the main conversation. This design reflects a real pattern in how developers think during work — \u0026ldquo;wait, what was that again?\u0026rdquo; — without stopping what they\u0026rsquo;re doing. 
The constraints (no tool access, single-turn) are intentional guardrails to ensure side questions never disrupt the main workflow.\n","date":"2026-03-13T00:00:00+09:00","image":"/images/posts/2026-03-13-claude-code-btw/cover-en.jpg","permalink":"/posts/2026-03-13-claude-code-btw/","title":"Claude Code /btw — Ask Side Questions Without Breaking Your Flow"},{"content":"Overview Gartner predicts Google search traffic will drop 50% by 2028. AI referral traffic grew 527% year-over-year. AI-driven traffic converts at 4.4x the rate of organic. The numbers are clear: SEO\u0026rsquo;s center of gravity is shifting toward GEO (Generative Engine Optimization). geo-seo-claude is a single Claude Code skill that addresses this transition.\nWhat Is GEO? GEO means optimizing for AI search engines — ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews. Traditional SEO was about getting users to click your link in search results. GEO is about getting AI to cite your content.\ngraph LR A[\"Traditional SEO\"] --\u003e B[\"Search Result Rankings \u0026lt;br/\u0026gt; Backlink-focused\"] C[\"GEO\"] --\u003e D[\"AI Citability \u0026lt;br/\u0026gt; Brand mention-focused\"] B --\u003e E[\"User clicks a link\"] D --\u003e F[\"AI cites the content\"] style C fill:#4CAF50,color:#fffKey market signals:\nMetric Value GEO services market $850M+ (projected $7.3B by 2031) Backlinks vs brand mentions (AI visibility) Brand mentions show 3x stronger correlation Domains cited by both ChatGPT and Google AIO Only 11% Marketers actively investing in GEO Only 23% Architecture: 5 Parallel Subagents What makes geo-seo-claude interesting is how textbook-perfectly it demonstrates Claude Code\u0026rsquo;s skill + subagent pattern.\ngraph TD U[\"/geo audit URL\"] --\u003e D[\"Discovery \u0026lt;br/\u0026gt; Fetch homepage + detect business type\"] D --\u003e S1[\"AI Visibility \u0026lt;br/\u0026gt; Citability + crawlers + llms.txt + brand\"] D --\u003e S2[\"Platform Analysis \u0026lt;br/\u0026gt; 
ChatGPT/Perplexity/AIO coverage\"] D --\u003e S3[\"Technical SEO \u0026lt;br/\u0026gt; Core Web Vitals + SSR + security\"] D --\u003e S4[\"Content Quality \u0026lt;br/\u0026gt; E-E-A-T + readability + freshness\"] D --\u003e S5[\"Schema Markup \u0026lt;br/\u0026gt; Detection + validation + generation\"] S1 --\u003e R[\"Synthesis \u0026lt;br/\u0026gt; GEO Score 0-100\"] S2 --\u003e R S3 --\u003e R S4 --\u003e R S5 --\u003e R R --\u003e O[\"Prioritized Action Plan\"]A single /geo audit command runs 5 subagents simultaneously:\nAI Visibility — citability score, crawler access, llms.txt, brand mentions Platform Analysis — optimization for ChatGPT, Perplexity, Google AIO individually Technical SEO — Core Web Vitals, SSR, security, mobile Content Quality — E-E-A-T, readability, content freshness Schema Markup — detection, validation, JSON-LD generation Key Features AI Citability Scoring Quantifies what makes a text block easy for AI to cite. The optimal citation passage is 134–167 words, self-contained, fact-dense, and directly answers a question.\nAI Crawler Analysis Checks the accessibility of 14+ AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) in robots.txt and provides allow/block recommendations.\nBrand Mention Scanning Scans 7+ platforms (YouTube, Reddit, Wikipedia, LinkedIn, etc.) 
for brand mentions — which show 3x stronger correlation with AI visibility than backlinks.\nllms.txt Generation Analyzes or generates an llms.txt file, an emerging standard that helps AI crawlers understand site structure.\nScoring Methodology Category Weight AI Citability \u0026amp; Visibility 25% Brand Authority Signals 20% Content Quality \u0026amp; E-E-A-T 20% Technical Foundation 15% Structured Data 10% Platform Optimization 10% Installation and Usage # One-command install curl -fsSL https://raw.githubusercontent.com/zubair-trabzada/geo-seo-claude/main/install.sh | bash # Usage in Claude Code /geo audit https://example.com # Full audit /geo quick https://example.com # 60-second snapshot /geo citability https://example.com # Citability score only /geo report-pdf # Generate PDF report Requires Python 3.8+, Claude Code CLI, and Git. Playwright is optional.\nBusiness Angle The tool itself is free under the MIT license. What\u0026rsquo;s interesting is the accompanying GEO agency business model it presents alongside the tool. GEO agency retainer ranges: $2K–$12K/month. The tool does the auditing; the community teaches you how to sell. 2,264 stars, 369 forks — a notable scale for a Claude Code skill.\nInsight geo-seo-claude demonstrates two things. First, a Claude Code skill can be more than a prompt wrapper — it can be a full software product built on 11 sub-skills + 5 parallel subagents + Python utilities. Second, as AI search replaces traditional search, the SEO → GEO transition is a real business opportunity. 
\u0026ldquo;AI search is eating traditional search\u0026rdquo; — the tool\u0026rsquo;s own slogan is becoming reality faster than expected.\n","date":"2026-03-13T00:00:00+09:00","image":"/images/posts/2026-03-13-geo-seo-claude/cover-en.jpg","permalink":"/posts/2026-03-13-geo-seo-claude/","title":"geo-seo-claude — Automating GEO for the AI Search Era with Claude Code"},{"content":"Overview GPT-5.4 excels at maintaining long context, multi-step agentic tasks, and grounded synthesis. But these strengths don\u0026rsquo;t emerge automatically. OpenAI\u0026rsquo;s official prompt guide makes a clear point: \u0026ldquo;reduce drift\u0026rdquo; comes before \u0026ldquo;encourage deeper thinking.\u0026rdquo;\nFour Core Techniques graph TD A[\"GPT-5.4 Prompt Optimization\"] --\u003e B[\"1. Output Contract \u0026lt;br/\u0026gt; Lock the format\"] A --\u003e C[\"2. Tool Rules \u0026lt;br/\u0026gt; Persistence/dependencies/parallelization\"] A --\u003e D[\"3. Completeness Contract \u0026lt;br/\u0026gt; Handle everything or mark blocked\"] A --\u003e E[\"4. Verification Loop \u0026lt;br/\u0026gt; Requirements/sources/format/permission\"] B --\u003e F[\"Adjust reasoning_effort \u0026lt;br/\u0026gt; only after these four\"] C --\u003e F D --\u003e F E --\u003e F style F fill:#FF9800,color:#fffRaising reasoning_effort is the last fine-tuning knob. In most cases, the four prompt techniques below give better cost-effectiveness first.\n1. Output Contract Prevents GPT-5.4 from breaking your format by inserting helpful-sounding explanations.\n\u0026lt;output_contract\u0026gt; - Return exactly the sections requested, in the requested order. - If a format is required (JSON, Markdown, SQL, XML), output only that format. \u0026lt;/output_contract\u0026gt; Think of it like a delivery spec. Stating \u0026ldquo;the deliverable must follow this template\u0026rdquo; reduces over-helpfulness and parsing failures.\n2. 
Tool Persistence Rules The most common agent failure pattern: skipping a prior lookup because the answer seems obvious. Three principles to prevent this:\nPrinciple Description Forced use When accuracy, grounding, or completeness is at stake, tools must be used; if results are empty, retry with a different strategy Dependency check Before acting, verify whether a prior lookup is needed — never skip it even if the final state seems obvious Parallel vs sequential Independent lookups run in parallel for speed; dependent steps run sequentially for accuracy The goal: make tool use a prerequisite, not an option.\n3. Completeness Contract Solves the problem of models \u0026ldquo;halfheartedly finishing\u0026rdquo; long batch tasks.\nEvery requested item must be processed, or marked [blocked] if not When a search returns empty, don\u0026rsquo;t immediately conclude \u0026ldquo;nothing found\u0026rdquo; — try at least 1–2 alternative strategies first (different query, broader filter, prior lookup, different source) These two rules give prompts the \u0026ldquo;endurance\u0026rdquo; to carry long tasks all the way through.\n4. Verification Loop A four-pronged check just before completion:\ngraph LR A[\"Before Completion\"] --\u003e B[\"Requirements \u0026lt;br/\u0026gt; Was everything done?\"] A --\u003e C[\"Sources \u0026lt;br/\u0026gt; Are claims grounded?\"] A --\u003e D[\"Format \u0026lt;br/\u0026gt; Was the schema followed?\"] A --\u003e E[\"External Impact \u0026lt;br/\u0026gt; Was permission granted for irreversible actions?\"]Additional gating rule: \u0026ldquo;If needed information is missing, don\u0026rsquo;t guess — use a lookup tool if possible; if not, ask only the minimum necessary question.\u0026rdquo;\nHandling Mid-Conversation Direction Changes Users frequently change course. 
The default policy:\nReversible and low-risk → proceed without asking External impact / irreversible / sensitive data → ask for permission Instruction priority: the latest user instruction overrides earlier style rules — except safety, honesty, and privacy, which are never overridden Grounding Research Quality The biggest risk in AI research: blurring the line between what was actually found and what was inferred.\nOnly cite sources actually retrieved in this workflow Never fabricate URLs, IDs, or quotations Place citations inline next to each claim, not bundled at the end Enforce a 3-phase research cycle: decompose the question → search each sub-question + follow secondary leads → reconcile contradictions, then write with citations reasoning_effort Tuning Task type Recommended level Fast execution / extraction / classification / short transforms none ~ low Long synthesis / multi-document review / strategic writing medium or above xhigh Only when evals show a clear gain Migration order: swap the model first → fix reasoning effort → evaluate → add prompt blocks → adjust the reasoning knob one step at a time.\nInsight The lessons in this guide aren\u0026rsquo;t specific to GPT-5.4. They\u0026rsquo;re universal patterns that apply to Claude, Gemini, and any other LLM agent. The core strategy is to exploit the model\u0026rsquo;s ability to follow rules precisely and consistently: output contract, forced tool use, completeness contract, verification loop. Fix these four first, then raise reasoning effort only if you still need more. 
Ultimately, prompt engineering is not about telling an AI to think harder — it\u0026rsquo;s about eliminating the room for it to drift.\n","date":"2026-03-13T00:00:00+09:00","image":"/images/posts/2026-03-13-gpt54-prompt-guide/cover-en.jpg","permalink":"/posts/2026-03-13-gpt54-prompt-guide/","title":"GPT-5.4 Prompt Guide Essentials — Lock Down the Contract Before Tuning Reasoning"},{"content":"Overview Attaching an AI agent to a web page normally requires a headless browser like Playwright or a Chrome extension. Alibaba\u0026rsquo;s page-agent flips that assumption — one line, \u0026lt;script src=\u0026quot;page-agent.js\u0026quot;\u0026gt;\u0026lt;/script\u0026gt;, and your website becomes an AI-native app.\nCore Architecture: The In-Page Execution Model page-agent\u0026rsquo;s biggest differentiator is its in-page execution model. Compare it to existing browser automation approaches:\ngraph TD A[\"Existing Approaches\"] --\u003e B[\"Playwright/Puppeteer \u0026lt;br/\u0026gt; Headless browser control\"] A --\u003e C[\"Chrome Extension \u0026lt;br/\u0026gt; Separate permission request\"] A --\u003e D[\"Multimodal LLM \u0026lt;br/\u0026gt; Screenshot + OCR\"] E[\"page-agent\"] --\u003e F[\"Direct DOM Access \u0026lt;br/\u0026gt; Text-based manipulation\"] F --\u003e G[\"No permission requests\"] F --\u003e H[\"No screenshots or OCR\"] F --\u003e I[\"Runs inside the web page\"] style E fill:#4CAF50,color:#fffEverything runs inside the web page. DOM elements are controlled directly — no separate permissions, no screenshots, no OCR, no multimodal LLM required. Text-based DOM manipulation keeps it fast.\nHow to Use It Embed Directly in Your Code \u0026lt;script src=\u0026#34;page-agent.js\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; Apply to Any Site via Bookmarklet You don\u0026rsquo;t need to touch the source code. A bookmarklet lets you inject page-agent into any website on the fly. 
The default bookmarklet goes through Alibaba\u0026rsquo;s servers, but you can point it at your own LLM endpoint:\njavascript:(function(){ import(\u0026#39;https://cdn.jsdelivr.net/npm/page-agent@1.5.5/+esm\u0026#39;) .then(module =\u0026gt; { window.agent = new module.PageAgent({ model: \u0026#39;gpt-5.4\u0026#39;, baseURL: \u0026#39;\u0026lt;your-api-url\u0026gt;\u0026#39;, apiKey: \u0026#39;\u0026lt;your-api-key\u0026gt;\u0026#39; }); if(window.agent.panel) window.agent.panel.show(); }) .catch(e =\u0026gt; console.error(e)); })(); Supported Models OpenAI, Claude, DeepSeek, Qwen, and more — including fully offline operation via Ollama (API key-based integration).\nUse Cases Use case Description SaaS AI Copilot Add an in-product AI Copilot without touching the backend Smart form automation Compress multi-step click flows to a single sentence (ERP/CRM/admin tools) Accessibility Voice commands and screen readers for enhanced web accessibility Admin tool workflows Build only CRUD, then use sequential instructions to compose workflows automatically The admin tool use case got the strongest reaction in the GeekNews community. The pattern: \u0026ldquo;build basic CRUD, then tell it to do this and then that, and you get a workflow.\u0026rdquo; One user reported it running noticeably faster than Playwright for a demo that fetched 30-day stock prices from a financial site.\nChrome Extension — Multi-Page Support Beyond the single-page bookmarklet, installing the Chrome extension adds support for tasks spanning multiple pages — browser-level control and external integrations, enabling complex automation scenarios beyond simple DOM manipulation.\nSecurity Considerations The primary concern raised by the community is security. 
API keys are exposed on the client side, so:\nIn production, route API calls through a proxy server Safest for internal admin tools or development environments If routing through Alibaba\u0026rsquo;s servers by default is a concern, specify your own LLM endpoint The MIT license means you can fork and customize freely.\nInsight page-agent\u0026rsquo;s \u0026ldquo;in-page execution model\u0026rdquo; represents a paradigm shift in browser automation. Where external tools previously controlled the browser from the outside, AI now reads and manipulates the DOM directly from inside the page. Instead of the heavyweight pipeline of screenshot → OCR → coordinate-based clicking, text-based DOM understanding wins on both speed and accuracy. Particularly compelling is the scenario of inserting an AI Copilot into a SaaS product without any backend changes — a new path for modernizing legacy systems.\n","date":"2026-03-13T00:00:00+09:00","image":"/images/posts/2026-03-13-page-agent/cover-en.jpg","permalink":"/posts/2026-03-13-page-agent/","title":"page-agent — Alibaba's Open-Source Tool That Turns Any Web Page Into an AI-Native App With One Line of Code"},{"content":"Overview As Claude Opus 4.6\u0026rsquo;s quality improvements lead to heavier use at work, questions naturally arise: \u0026ldquo;Am I actually getting value from this plan?\u0026rdquo; and \u0026ldquo;How much headroom do I have before hitting my limit?\u0026rdquo; ClaudeTuner is a Chrome extension plus web dashboard that addresses exactly these questions.\ngraph TD A[\"Chrome Extension\"] --\u003e|Collect Usage| B[\"ClaudeTuner Server\"] B --\u003e C[\"Web Dashboard\"] C --\u003e D[\"5-hour / 7-day Usage Gauge\"] C --\u003e E[\"Reset Countdown\"] C --\u003e F[\"Usage Forecast\"] C --\u003e G[\"Limit Warning Alerts \u0026lt;br/\u0026gt; at 80% and 95%\"]Key Features Personal Usage Monitoring Install the Chrome extension and log in to Claude — usage is collected automatically. 
Claude Code usage is tracked and included in the total.\n5-hour / 7-day usage gauge: See your current status at a glance Reset countdown: Shows time remaining until the next reset Usage forecast: Projects your usage rate at the next reset based on current pace Limit warning alerts: Browser notifications when you hit 80% and 95% to encourage throttling Hourly usage patterns: Analyze when you use Claude most B2B Cost Optimization Management features are also provided for organizations using the Claude Team plan. Track usage per team member and get recommendations for the optimal plan based on actual usage patterns.\nHow Data Is Collected The Chrome extension periodically reads usage data from the Claude website. After the initial login, collection is automatic — no additional steps needed.\nInsights It\u0026rsquo;s interesting that AI tool usage management is becoming its own independent software category. A tool that shows actual usage relative to subscription cost in data form signals that AI tools have shifted from \u0026ldquo;try it and see\u0026rdquo; to \u0026ldquo;manage and optimize.\u0026rdquo; The fact that team management features are included is evidence that AI tool cost optimization has become a real operational concern for organizations.\n","date":"2026-03-11T00:00:00+09:00","image":"/images/posts/2026-03-11-claudetuner/cover-en.jpg","permalink":"/posts/2026-03-11-claudetuner/","title":"ClaudeTuner — Real-Time Claude Usage Tracking and Plan Optimization"},{"content":"Overview Google DeepMind released Gemini Embedding 2 on March 10, 2026. 
It\u0026rsquo;s the first native multimodal embedding model to map text, images, video, audio, and documents into a single embedding space.\ngraph TD A[\"Gemini Embedding 2\"] --\u003e B[\"Text\"] A --\u003e C[\"Image\"] A --\u003e D[\"Video\"] A --\u003e E[\"Audio\"] A --\u003e F[\"Document\"] B --\u003e G[\"Single Embedding Space \u0026lt;br/\u0026gt; (shared vector dimensions)\"] C --\u003e G D --\u003e G E --\u003e G F --\u003e G G --\u003e H[\"Multimodal Search\"] G --\u003e I[\"Cross-Modal Classification\"]New Modalities and Flexible Dimensions Prior embedding models either handled text only, or used separate encoders even when nominally multimodal. Gemini Embedding 2 natively maps multiple modalities into a single vector space.\nThis means searching with \u0026ldquo;a photo of a cat\u0026rdquo; can retrieve video clips where cats appear, audio containing cat sounds, and documents containing the text \u0026ldquo;cat\u0026rdquo; — all in one query.\nState-of-the-Art Performance Google announced that Gemini Embedding 2 achieves state-of-the-art performance across multiple benchmarks, demonstrating strong performance not only in text-to-text retrieval but also in cross-modal search.\nUse Cases Multimodal RAG Traditional RAG (Retrieval-Augmented Generation) pipelines only indexed text documents for retrieval. With Gemini Embedding 2, images, video, and audio can be included in the retrieval corpus — enabling genuinely multimodal RAG.\nMedia Library Search For large media archives: search for images or videos using a text query, find similar videos from an image, or otherwise perform cross-modal search across modalities.\nContent Classification Organize diverse content types under a single classification scheme. Because text labels and image/audio content are compared in the same space, no separate classification models are needed.\nInsights Multimodal embeddings can be a game-changer for search and RAG. 
Until now, \u0026ldquo;image search\u0026rdquo; and \u0026ldquo;text search\u0026rdquo; were entirely separate pipelines — a single embedding space dissolves that boundary. Particularly in RAG pipelines, the ability to search images inside PDFs, presentation slides, and whiteboard photos alongside text could dramatically expand the coverage of enterprise knowledge management systems. The model is released as a public preview, so you can start testing it right away.\n","date":"2026-03-11T00:00:00+09:00","image":"/images/posts/2026-03-11-gemini-embedding-2/cover-en.jpg","permalink":"/posts/2026-03-11-gemini-embedding-2/","title":"Gemini Embedding 2 — Google's First Native Multimodal Embedding Model"},{"content":"Overview When an EC2 instance runs low on disk space, df and du alone make it hard to pinpoint which directories are consuming the most storage. ncdu is an ncurses-based TUI tool that visually analyzes disk usage.\ngraph TD A[\"Disk Space Running Low\"] --\u003e B[\"df: Check Total Usage\"] B --\u003e C[\"du: Per-Directory Size\"] C --\u003e D[\"ncdu: Interactive TUI \u0026lt;br/\u0026gt; Bar Graph + Navigation + Delete\"] style D fill:#4CAF50,color:#fff Installation # Ubuntu/Debian sudo apt-get install ncdu # CentOS/RHEL yum install -y ncdu # macOS brew install ncdu Basic Usage # Analyze current directory ncdu # Analyze a specific path ncdu /var/log # Analyze entire disk ncdu / After scanning, ncdu displays a tree of directories and files with graphical bar indicators. It\u0026rsquo;s immediately obvious where storage is being consumed.\nKey Controls Key Action Arrow keys Navigate directories Enter Enter subdirectory i Item info d Delete selected item (with confirmation) ? / Shift+? 
Help q Quit How ncdu Compares to df and du Tool Strengths Weaknesses df Instantly shows per-partition total usage Doesn\u0026rsquo;t tell you which directory is the problem du Calculates per-directory sizes Output is long and sorting is tedious ncdu Interactive TUI, instant sorting, in-place deletion Requires separate installation Insights For server disk management, ncdu does what htop does for process management: the same operations are possible with the basic commands (df, du), but interactive TUI navigation makes them far more efficient. When a disk fills up unexpectedly in a storage-constrained environment like EC2, ncdu handles everything from diagnosing the cause to cleaning up, all without leaving the terminal.\n","date":"2026-03-11T00:00:00+09:00","image":"/images/posts/2026-03-11-ncdu/cover-en.jpg","permalink":"/posts/2026-03-11-ncdu/","title":"ncdu — The TUI Tool for Quickly Understanding Linux Disk Usage"},{"content":"Overview \u0026ldquo;I want to pick up where I left off on OpenCode — without going back to my desk.\u0026rdquo; SSH feels clunky, and mobile makes it even worse. opencode serve/web may be the answer.\ngraph LR A[\"Dev Machine \u0026lt;br/\u0026gt; opencode serve/web\"] --\u003e|API + Web UI| B[\"Tailscale VPN\"] B --\u003e C[\"Mobile Browser \u0026lt;br/\u0026gt; Same Session\"] A --\u003e D[\"Repo Access\"] A --\u003e E[\"Tool Execution\"] A --\u003e F[\"Model Calls\"] Key Insight: \u0026ldquo;The Server Is the Real Thing, Not the TUI\u0026rdquo; There\u0026rsquo;s an important shift in perspective when thinking about opencode\u0026rsquo;s architecture. The TUI (terminal UI) is just a client connecting to the server — the actual work happens in the server (backend). Once you internalize this, remote development becomes natural.\nopencode web The most convenient option when you need it fast. 
Since it launches both the API server and the web UI together, you can open a phone browser and jump right back into the session you were working on — no app install, no SSH required.\nHeavy lifting (repo access, tool execution, model calls) stays on the server machine. Your mobile device only handles input and display.\nopencode serve opencode serve starts a \u0026ldquo;headless backend.\u0026rdquo; It runs the API server without a web UI, so you can connect a custom client or integrate it into an automation pipeline.\nSecurity: Tailscale + Password Rather than opening a port directly, the recommended approach is to connect through Tailscale VPN. Since access is limited to devices on your Tailscale network, the server is never exposed directly to the public internet.\nInsight The \u0026ldquo;server-client separation\u0026rdquo; pattern for AI coding tools is becoming the norm. GitHub Codespaces, code-server, and now opencode — the architecture of \u0026ldquo;heavy computation on the server, interaction on the client\u0026rdquo; is settling naturally into AI-assisted coding. Being able to give an AI agent a quick instruction from your phone means work no longer has to wait until you are back at your desk.\n","date":"2026-03-11T00:00:00+09:00","image":"/images/posts/2026-03-11-opencode-remote/cover-en.jpg","permalink":"/posts/2026-03-11-opencode-remote/","title":"opencode serve/web — Control Your Dev PC Remotely From Your Phone"},{"content":"Overview Use AI coding agents seriously for a while and sessions pile up in the dozens. Remembering what work you did in which project, and where you left off, becomes genuinely hard. Two tools have emerged to solve this: agentsview and agf. 
Here\u0026rsquo;s a comparison.\ngraph LR A[\"AI Agent Session Data\"] --\u003e B[\"agentsview\"] A --\u003e C[\"agf\"] B --\u003e D[\"Web/Desktop UI \u0026lt;br/\u0026gt; Dashboard + Search + Analytics\"] C --\u003e E[\"TUI \u0026lt;br/\u0026gt; Fast Search + Instant Resume\"]agentsview — Session Analytics Dashboard agentsview is a local-first desktop/web application for browsing, searching, and analyzing AI agent coding sessions. Built with a Go backend, Svelte frontend, and Tauri desktop app.\nSupported Agents Supports 11+ AI coding agents including Claude Code, Codex, and OpenCode. Parses session logs from each agent and presents a unified view.\nKey Features Dashboard: Usage statistics and visualizations per project and per agent Full-text Search: Search session contents to answer \u0026ldquo;what did I do back then?\u0026rdquo; questions Local-First: All data stored locally, privacy guaranteed Desktop App: macOS/Windows installers provided, auto-update supported Installation # CLI curl -fsSL https://agentsview.io/install.sh | bash # Desktop App # Download from GitHub Releases Tech Stack Component Technology Backend Go 1.25+ Frontend Svelte + TypeScript Desktop Tauri (Rust) Stars 453 agf — Terminal Session Finder agf is a TUI-based agent session management tool built by Korean developer subinium. 
Written in Rust — fast and simple to install.\nThe Problem It Solves It describes the typical experience of agent users like this:\nCan\u0026rsquo;t remember which project you were working in cd to the wrong directory Try to remember the session ID Give up and start a new session Key Features Unified View: Supports Claude Code, Codex, OpenCode, pi, Kiro, Cursor CLI, Gemini Fuzzy Search: Instant search by project name One-Key Resume: Resume a selected session with a single Enter press Resume Mode Picker: Tab to choose resume mode (v0.5.5) Worktree Scanning: Parallelized worktree scan that tracks even deleted projects Installation brew install subinium/tap/agf agf setup # Then restart shell and run agf Quick Resume agf resume project-name # Resume immediately via fuzzy match Comparing the Two graph TD subgraph agentsview A1[\"Go + Svelte + Tauri\"] A2[\"Web Dashboard\"] A3[\"Analytics + Statistics\"] A4[\"11+ Agents Supported\"] end subgraph agf B1[\"Rust TUI\"] B2[\"Terminal Native\"] B3[\"Fast Search + Resume\"] B4[\"7 Agents Supported\"] end A1 --\u003e A2 --\u003e A3 B1 --\u003e B2 --\u003e B3 Criterion agentsview agf Interface Web/Desktop GUI TUI (terminal) Primary Use Session analytics + search Fast session resumption Language Go + Svelte Rust Installation curl or desktop app Homebrew Stars 453 99 Agents Supported 11+ 7 Insights The emergence of session management tools for AI coding agents itself speaks to this ecosystem\u0026rsquo;s maturity. Like editor plugins, agents have moved past \u0026ldquo;just using them\u0026rdquo; — \u0026ldquo;managing them well\u0026rdquo; is now the core of productivity.\nagentsview is strong for retrospective questions like \u0026ldquo;what did I do with AI this week?\u0026rdquo; agf is strong for immediate needs like \u0026ldquo;pick up right where I left off.\u0026rdquo; Both tools are local-first, which is impressive — you can use them without worrying about AI session data leaking to the cloud. 
Ultimately, the two tools are complementary rather than competitive.\n","date":"2026-03-11T00:00:00+09:00","image":"/images/posts/2026-03-11-ai-agent-session-managers/cover-en.jpg","permalink":"/posts/2026-03-11-ai-agent-session-managers/","title":"The Evolution of AI Coding Agent Session Management — agentsview vs agf"},{"content":"Overview We\u0026rsquo;re moving from an era of \u0026ldquo;just using\u0026rdquo; AI coding agents to one of \u0026ldquo;using them with structure.\u0026rdquo; This post compares three extension frameworks that layer on top of OpenAI Codex and Claude Code to control agent behavior, shape team workflows, and enforce development methodology.\ngraph TD A[\"AI Coding Agent \u0026lt;br/\u0026gt; (Codex, Claude Code)\"] --\u003e B[\"bkit-codex \u0026lt;br/\u0026gt; PDCA Methodology\"] A --\u003e C[\"oh-my-codex \u0026lt;br/\u0026gt; Multi-Agent Orchestration\"] A --\u003e D[\"Superpowers \u0026lt;br/\u0026gt; Skill Framework\"] B --\u003e E[\"Plan → Do → Check → Act\"] C --\u003e F[\"hooks + teams + HUDs\"] D --\u003e G[\"brainstorm → plan → TDD → review\"]bkit-codex — PDCA + Context Engineering bkit-codex is an OpenAI Codex CLI extension that provides an AI-native development workflow through PDCA (Plan-Do-Check-Act) methodology and a Context Engineering architecture.\nWhat Is Context Engineering? A methodology for systematically curating the context tokens delivered to AI. 
It goes beyond writing good prompts — it structures which information to provide to AI, and in what order.\nCore Components PDCA Cycle: Plan (write a planning document) → Do (generate code) → Check (test/verify) → Act (improve/deploy) Skills: Reusable agent behavior modules Pipeline: Chain multiple skills to compose complex workflows MCP Integration: Tool connection via Model Context Protocol Tech Stack JavaScript-based, Apache 2.0 license, 10 Stars\noh-my-codex (OMX) — Multi-Agent Orchestration oh-my-codex adds a multi-agent orchestration layer on top of the OpenAI Codex CLI. With 1,744 Stars, it has the most active community in this space.\nCore Features Agent Teams: Multiple agents divide roles and collaborate (leader/worker structure) Hooks: Insert custom logic before/after agent execution HUDs (Head-Up Displays): Real-time monitoring of agent status Harness: Standardizes and packages agent execution environments OpenClaw Integration: Agent status notifications via notification gateway Architecture graph LR A[\"User Command\"] --\u003e B[\"OMX Orchestrator\"] B --\u003e C[\"Team Leader\"] C --\u003e D[\"Worker Agent 1\"] C --\u003e E[\"Worker Agent 2\"] C --\u003e F[\"Worker Agent N\"] D --\u003e G[\"Codex CLI\"] E --\u003e G F --\u003e G B --\u003e H[\"Hooks \u0026lt;br/\u0026gt; pre/post execution\"] B --\u003e I[\"HUD \u0026lt;br/\u0026gt; Real-time Monitoring\"]Tech Stack TypeScript-based, MIT license, 1,744 Stars, v0.8.12\nSuperpowers — Skill-Based Development Methodology Superpowers is an agent skill framework with an overwhelming 76,619 Stars. 
More than a simple tool, it provides a complete software development methodology.\nPhilosophy It starts from the principle that a coding agent \u0026ldquo;doesn\u0026rsquo;t write code first.\u0026rdquo; Instead:\nBrainstorming: Ask the user clarifying questions about what they want to build Spec Review: Break the spec into digestible units for review Implementation Plan: A plan that even \u0026ldquo;an enthusiastic but inexperienced junior engineer\u0026rdquo; can follow Subagent-Driven Development: Sub-agents handle individual tasks; the main agent reviews TDD + YAGNI + DRY: Enforce test-driven development and conciseness Key Skills brainstorming — Explore requirements before implementing features writing-plans — Create implementation plans test-driven-development — Enforce Red/Green TDD systematic-debugging — Systematic debugging workflow dispatching-parallel-agents — Parallel processing of independent tasks verification-before-completion — Force verification before claiming done Tech Stack Shell + JavaScript, v5.0.0, 76,619 Stars\nThree-Way Comparison Criterion bkit-codex oh-my-codex Superpowers Target Agent Codex CLI Codex CLI Claude Code + general Core Value PDCA methodology Multi-agent collaboration Enforce development methodology Stars 10 1,744 76,619 Language JavaScript TypeScript Shell Team Features Pipeline Agent Teams Subagent Monitoring Reports HUD real-time Verification checklist Quick Links If You Want to Use Claude Code Properly — Complete AI Coding Mastery with bkit — 58-minute hands-on bkit tutorial Insights All three frameworks share the same theme: \u0026ldquo;imposing structure on AI.\u0026rdquo; This is the core trend in AI coding in 2026.\nbkit-codex is an experimental attempt to apply manufacturing\u0026rsquo;s PDCA cycle to software. oh-my-codex is a practical approach to scaling Codex into a team. 
Superpowers — with 76K Stars as evidence — is the most validated methodology.\nSuperpowers\u0026rsquo; philosophy in particular is striking: \u0026ldquo;prevent the coding agent from writing code first.\u0026rdquo; It\u0026rsquo;s a good lesson for human developers too — diving into coding without design is inefficient, whether you\u0026rsquo;re AI or human.\nAI\u0026rsquo;s ability to \u0026ldquo;write\u0026rdquo; code is already sufficient. What\u0026rsquo;s needed now is a framework that makes AI write code well, and these three projects are leading in that direction.\n","date":"2026-03-11T00:00:00+09:00","image":"/images/posts/2026-03-11-ai-agent-frameworks/cover-en.jpg","permalink":"/posts/2026-03-11-ai-agent-frameworks/","title":"Three AI Coding Agent Extension Frameworks Compared — bkit-codex, oh-my-codex, and Superpowers"},{"content":"Overview Building a presentation takes real time — hours gathering material, more hours organizing it, and more still on slide design. The idea of automating this with AI isn\u0026rsquo;t new, but workflows that actually produce high-quality results have been rare. Combining two of Google\u0026rsquo;s AI tools — NotebookLM and Gemini — changes the equation.\nThat\u0026rsquo;s exactly why a YouTube tutorial titled \u0026ldquo;The Insane Gemini + NotebookLM Combo for Making High-Quality PPTs\u0026rdquo; (14 min 20 sec) struck a nerve. It doesn\u0026rsquo;t just say \u0026ldquo;ask AI to make your PPT\u0026rdquo; — it shows a systematic method that plays to each tool\u0026rsquo;s strength: NotebookLM handles research and synthesis, Gemini handles content generation and formatting. The division of labor is genuinely efficient.\nWhat Is NotebookLM? NotebookLM is a free AI-powered research tool from Google. Its key difference from a general-purpose LLM: it only answers based on the source documents you provide. 
Add PDFs, Google Docs, Google Slides, YouTube video links, web URLs, or plain text files to a notebook, and NotebookLM analyzes those sources to answer questions, generate summaries, and surface insights. Because sources are clearly cited, the hallucination risk drops significantly.\nOne standout feature is Audio Overview — NotebookLM automatically generates a podcast-style audio commentary from your notebook sources. Two AI hosts discuss the material in a natural radio-show style. It\u0026rsquo;s not directly about PPT creation, but it\u0026rsquo;s a fast way to absorb material. Beyond that, NotebookLM can restructure content into mind maps, study guides, briefing documents, and FAQs — all of which become input material for slide creation.\nNotebookLM also shines at cross-analyzing multiple sources simultaneously. Load three papers, two YouTube lectures, and five news articles into one notebook and ask \u0026ldquo;What\u0026rsquo;s the core argument when you synthesize all this?\u0026rdquo; — NotebookLM gives you an integrated analysis with citations from each source. That\u0026rsquo;s what cuts hours of research down to minutes. The more complex and multi-perspective your topic, the bigger the payoff.\nGemini\u0026rsquo;s Role Gemini is Google\u0026rsquo;s multimodal large language model, available free at gemini.google.com/app. It competes with GPT-4 and Claude, supporting text generation, summarization, code writing, and image analysis. Starting with Gemini 2.0, multimodal capabilities expanded to include describing images passed as input and extracting data from charts.\nIn the PPT workflow, Gemini takes NotebookLM\u0026rsquo;s organized research and generates actual slide content. Use a specific prompt like \u0026ldquo;Organize the following content into 10 slides. Each slide should include a title, 3 key points, and presenter notes\u0026rdquo; — and you get an immediately editable slide structure. 
Its natural integration with Google Slides is another advantage: generate content in Gemini, paste into Google Docs, then convert to Slides or use Gemini in Slides directly.\nGemini also follows detailed formatting instructions well. Tell it \u0026ldquo;structure each section as intro → problem → solution → case study → summary\u0026rdquo; or \u0026ldquo;explain technical terms in plain language for a non-specialist audience\u0026rdquo; — and the output reflects those instructions. If NotebookLM decides what to say, Gemini decides how to say it.\nThe Practical PPT Workflow flowchart LR A[\"Gather Sources \u0026lt;br/\u0026gt; PDFs, YouTube, Web\"] --\u003e B[\"Add to \u0026lt;br/\u0026gt; NotebookLM\"] B --\u003e C[\"NotebookLM Analysis \u0026lt;br/\u0026gt; Summaries / Mind Map / FAQ\"] C --\u003e D[\"Extract Key Insights \u0026lt;br/\u0026gt; as Text\"] D --\u003e E[\"Feed to Gemini \u0026lt;br/\u0026gt; Generate Slide Structure\"] E --\u003e F[\"Gemini Output \u0026lt;br/\u0026gt; Titles + Content Per Slide\"] F --\u003e G[\"Paste Into \u0026lt;br/\u0026gt; Google Slides / PowerPoint\"] G --\u003e H[\"Design Editing \u0026lt;br/\u0026gt; Images / Layout\"] H --\u003e I[\"Finished High-Quality PPT\"] style A fill:#4285f4,color:#fff style B fill:#34a853,color:#fff style C fill:#34a853,color:#fff style E fill:#4285f4,color:#fff style F fill:#4285f4,color:#fff style I fill:#ea4335,color:#fff\nThe workflow breaks into three stages. Stage 1: Source collection and NotebookLM analysis. Gather as wide a variety of material as possible — academic papers, relevant YouTube talks, industry reports, competitive analysis. Add everything to one NotebookLM notebook. Once loaded, ask NotebookLM to \u0026ldquo;structure these materials for a presentation — suggest major sections and summarize the key points for each.\u0026rdquo; The mind map and study guide generation features help you grasp the overall structure fast.\nStage 2: Slide structure generation with Gemini.
Copy the summary and key insights from NotebookLM and paste them into Gemini. Write a specific prompt: specify audience (expert vs. non-expert), presentation length (10 min / 30 min / 1 hour), number of slides, and structure format (problem-solution, storytelling, etc.). Gemini outputs a complete slide structure — titles, bullet points, and presenter notes for every slide. This becomes the skeleton of your deck.\nStage 3: Editing and design. Paste the Gemini output into Google Slides or PowerPoint and begin design editing. Here, Gemini 2.0\u0026rsquo;s image analysis is useful — attach charts or data images and Gemini analyzes them, generates interpretive text, and adjusts the explanation to fit your presentation context. The final polish is still a human job, but by this point you\u0026rsquo;re refining and visualizing existing content rather than creating content from scratch.\nWhat Changes When You Combine the Two Using either tool alone has clear limits. Gemini alone relies on its training data without a source basis, making it hard to guarantee accuracy on specific contexts or recent information — hallucination risk included. NotebookLM alone excels at analysis but leaves the slide formatting and presentation-language conversion to you. Only together do you get \u0026ldquo;source credibility + generation flexibility\u0026rdquo; at the same time.\nThe synergy is especially strong for presentations where you need to quickly master a new domain, rather than just organize what you already know. If you suddenly get asked to present on an unfamiliar technical topic, load 10 relevant sources into NotebookLM, spend 30 minutes grasping the structure, then generate slides with Gemini — you can be presentation-ready in under two hours. The same work used to take a full day or more.\nAnother value is reusability. NotebookLM notebooks are saved, so you can generate multiple presentations on the same topic from different angles. 
Tell Gemini \u0026ldquo;make a 5-minute executive summary version of the same topic\u0026rdquo; and it instantly produces a new version based on the already-organized research. The more expertise accumulates in a notebook, the faster future presentations become — a virtuous cycle that goes beyond tool usage into building a personal knowledge base.\nQuick Links Google NotebookLM — Free AI research tool, document analysis and Audio Overview generation Google Gemini — Google\u0026rsquo;s multimodal LLM, free to use YouTube: Gemini + NotebookLM PPT Combo — 14 min 20 sec practical tutorial Google Slides — The final editing tool for Gemini output NotebookLM Official Guide — Source addition methods and feature documentation Insights The Gemini + NotebookLM combination draws attention because the two tools solve different problems in AI productivity. NotebookLM fundamentally limits the hallucination problem — AI fabricating content — by restricting answers to source documents. Gemini solves the formatting problem — rapidly converting organized content into a presentable form. This division of labor produces more trustworthy results than using either tool alone.\nAs PPT automation workflows mature, tighter integration that passes NotebookLM analysis results directly to Gemini in Slides seems likely. But the bigger implication this combination reveals is that the benchmark for \u0026ldquo;using AI tools well\u0026rdquo; is shifting. Prompt engineering matters, but workflow design — knowing how to connect the right AI tools at the right moment — is becoming the new core competency. 
The dramatic reduction in time cost for presentation creation is significant for knowledge worker productivity: specialists can spend more time reviewing content and making strategic judgments, rather than generating content in the first place.\n","date":"2026-03-06T00:00:00+09:00","image":"/images/posts/2026-03-06-gemini-notebooklm-ppt/cover-en.jpg","permalink":"/posts/2026-03-06-gemini-notebooklm-ppt/","title":"Building High-Quality Presentations with Gemini + NotebookLM — From Research to Slides"},{"content":"Overview Claude Code is Anthropic\u0026rsquo;s agentic coding tool. It doesn\u0026rsquo;t just autocomplete code — it reads entire codebases, edits files, executes terminal commands directly, and integrates deeply with development tooling. As of early 2026, Claude Code supports nearly every environment where developers work: Terminal, VS Code, Desktop app, Web, JetBrains, and Chrome extension (beta).\nA recent short video from the YouTube channel @codefactory_official (\u0026ldquo;Claude Code Latest Update: Statusline\u0026rdquo;) drew 246 likes and considerable attention. The key feature highlighted is Statusline — a status bar displayed at the bottom of the terminal — whose addition makes the terminal UI substantially smarter. This post starts from the Statusline update and covers the full multi-environment AI coding ecosystem Claude Code is building.\nStatusline — A Smarter Terminal Statusline is a status bar UI component that Claude Code added to its terminal interface. Previously, when running Claude Code in a terminal, it was hard to quickly see what task was in progress or how much context had been consumed. With Statusline, current task state, the model in use, and context usage are displayed in real time at the bottom of the terminal.\nThis is more than a UX improvement. For developers who prefer terminal-based workflows, Claude Code now provides IDE-level visual feedback from within the terminal itself. 
Statusline works properly alongside multiplexers like tmux and zellij, and makes it easy to distinguish the state of each session when managing multiple sessions simultaneously. \u0026ldquo;The terminal got beautiful\u0026hellip;?\u0026rdquo; may sound like a casual observation, but it signals clearly that Anthropic is treating the terminal as a first-class citizen for AI coding.\nThe introduction of Statusline shows Claude Code evolving from a simple CLI tool into a fully-featured terminal development environment. Where most AI coding tools have been distributed as GUI IDE plugins, Claude Code has a distinctive position: the terminal is at the center, with other environments as extensions. This direction squarely targets the need to use AI coding assistants in environments without a GUI — server access, CI/CD pipelines, Docker containers.\nEvery Environment Claude Code Supports graph TD CC[Claude Code Core] CC --\u003e T[\"Terminal\u0026lt;br/\u0026gt;CLI / Statusline\"] CC --\u003e VS[\"VS Code\u0026lt;br/\u0026gt;Extension\"] CC --\u003e DA[\"Desktop App\u0026lt;br/\u0026gt;macOS / Windows\"] CC --\u003e WB[\"Web Browser\u0026lt;br/\u0026gt;claude.ai\"] CC --\u003e JB[\"JetBrains\u0026lt;br/\u0026gt;IntelliJ family\"] CC --\u003e CR[\"Chrome Extension\u0026lt;br/\u0026gt;beta\"] CC --\u003e RC[\"Remote Control\u0026lt;br/\u0026gt;mobile / remote devices\"] CC --\u003e GA[\"GitHub Actions\u0026lt;br/\u0026gt;CI/CD integration\"] CC --\u003e GL[\"GitLab CI/CD\u0026lt;br/\u0026gt;pipeline integration\"] CC --\u003e SL[\"Slack\u0026lt;br/\u0026gt;team collaboration\"] CC --\u003e SDK[\"Agent SDK\u0026lt;br/\u0026gt;custom agents\"] CC --\u003e MCP[\"MCP\u0026lt;br/\u0026gt;tool connection protocol\"] style CC fill:#4a90d9,color:#fff style RC fill:#f5a623,color:#fff style SDK fill:#7ed321,color:#fff style MCP fill:#9b59b6,color:#fff\nClaude Code\u0026rsquo;s supported environments fall into two axes.
The first is the interface layer where developers interact directly: Terminal (CLI), VS Code extension, Desktop App, Web (claude.ai), JetBrains IDEs, and Chrome Extension (beta). The second is the automation and integration layer: GitHub Actions, GitLab CI/CD, Slack integration, Remote Control, and the Agent SDK.\nThe VS Code extension lets you call Claude Code directly from within the editor. With a file open, you issue natural language commands like \u0026ldquo;refactor this function\u0026rdquo; or \u0026ldquo;write tests for this module,\u0026rdquo; and Claude Code reads the current file\u0026rsquo;s context and performs the edits. JetBrains support covers the entire IntelliJ IDEA family — IntelliJ IDEA, PyCharm, GoLand, WebStorm — letting backend developers in Java/Kotlin/Python ecosystems use Claude Code from within their own IDE.\nThe Chrome Extension is still in beta, but it opens interesting possibilities. While browsing a code page in the browser (GitHub, GitLab, documentation sites), you can interact with Claude Code directly. This is particularly useful for PR reviews and for exploring open-source code. Installation on macOS/Linux is a single command: curl -fsSL https://claude.ai/install.sh | bash. Windows uses a PowerShell script.\nRemote Control and the Future of Async Coding Remote Control is one of Claude Code\u0026rsquo;s most innovative features. You run a local development session, then continue it from a phone or another device. For example, kick off a complex refactoring task in the office, head home, and check progress and issue the next instruction from your smartphone. This shifts the AI coding paradigm from synchronous interaction to asynchronous collaboration.\nRemote Control is technically grounded in Claude Code\u0026rsquo;s session persistence. A running Claude Code instance on your local machine syncs session state to the server, and authorized devices can connect to that session to send instructions or check results.
This makes it possible to hand off long-running tasks — large codebase migrations, full test suite runs — and only intervene when needed.\nGitHub Actions and GitLab CI/CD integration is effectively an automated extension of Remote Control. When a PR opens, Claude Code automatically reviews the code; when tests fail, it analyzes the cause and suggests fixes. This elevates the CI/CD pipeline beyond simple build/test automation into an AI-assisted code quality gate. Slack integration lets teams assign tasks to Claude Code from a team channel and receive result reports, naturally fitting into a team\u0026rsquo;s async collaboration workflow.\nExpanding the Agent Ecosystem — MCP, Skills, Hooks MCP (Model Context Protocol) is the standard protocol through which Claude Code connects to external tools. Any tool — database, API, file system, other AI services — implemented as an MCP server becomes usable by Claude Code via natural language commands. Anthropic published MCP as an open spec, and a growing ecosystem of third-party MCP servers has already emerged. This log-blog repository uses Claude Code skills with Claude AI as the intelligence layer in the same spirit.\nSkills and Hooks are Claude Code\u0026rsquo;s customization layer. Skills let Claude Code learn behavior specialized to a specific domain or project — define domain knowledge and task patterns in a SKILL.md file, and Claude Code references them to produce more accurate results. Hooks connect custom scripts to specific events (file save, before/after a command runs, etc.) — useful for enforcing project-specific rules or building automation pipelines.\nThe Agent SDK is Claude Code\u0026rsquo;s most extensible feature. It lets developers build custom agents from scratch and supports \u0026ldquo;agent team\u0026rdquo; execution where multiple agents collaborate on complex tasks. For example: one agent analyzes requirements, another writes code, a third runs tests and verifies results. 
This opens the door to genuine multi-agent software development, beyond the limits of a single AI assistant.\nThe competitive landscape is also moving fast. Amazon recently launched Kiro IDE (app.kiro.dev). Using AWS Cognito-based authentication, Kiro is a strategic move to anchor developers to Amazon\u0026rsquo;s AI coding ecosystem. With Kiro joining GitHub Copilot, Cursor, and Windsurf, competition in the AI coding tool market is intensifying further. Claude Code\u0026rsquo;s differentiators are agent-level autonomy, the breadth of multi-environment support, and open extensibility through MCP.\nQuick Links Claude Code Official Docs — full guide from installation to Agent SDK Claude Code Install Script — install instantly with curl -fsSL https://claude.ai/install.sh | bash Anthropic Academy — Claude Code in Action — official hands-on course YouTube: Claude Code Latest Update Statusline — @codefactory_official short video Kiro IDE — Amazon\u0026rsquo;s new AI IDE, the competitor to watch Insights Claude Code\u0026rsquo;s Statusline update looks like a minor UI improvement, but it signals that Anthropic is making a serious investment in the terminal as the core interface for AI coding. The multi-environment support spanning Terminal, VS Code, JetBrains, Web, and Chrome Extension is a strategy to make Claude Code available regardless of what tools a developer uses — and a message that it won\u0026rsquo;t lock in to any specific IDE ecosystem. Remote Control and GitHub Actions/GitLab integration mean something deeper: AI coding is shifting from \u0026ldquo;a tool I sit in front of and chat with\u0026rdquo; to \u0026ldquo;an agent that works in the background and reports results.\u0026rdquo; MCP\u0026rsquo;s open spec and the Agent SDK\u0026rsquo;s availability are attempts to turn Claude Code from a standalone tool into a platform — potentially a significant moat compared to competitors. 
Amazon Kiro, GitHub Copilot Workspace, and Cursor are all rapidly building out agent capabilities, and 2026 looks like the year AI coding tools make a genuine leap toward autonomous agents. In that competition, the winner will likely be determined not by raw code generation quality, but by how seamlessly the tool weaves itself into developers\u0026rsquo; entire workflow.\n","date":"2026-03-06T00:00:00+09:00","image":"/images/posts/2026-03-06-claude-code-statusline-2026/cover-en.jpg","permalink":"/posts/2026-03-06-claude-code-statusline-2026/","title":"Claude Code 2026 — Statusline Update and the Multi-Environment AI Coding Ecosystem"},{"content":"Overview Google Antigravity is Google\u0026rsquo;s AI-first IDE powered by Gemini, entering the AI-driven development environment market alongside OpenAI Codex and Anthropic Claude Cowork. It goes beyond code autocomplete, targeting the vibe coding paradigm — building entire projects from natural language commands alone. Its key differentiator: deep integration with Google NotebookLM to build specialized sub-agent architectures.\nAntigravity: Basic Setup and UI Structure On first launch, Antigravity presents a web-based IDE layout reminiscent of Cursor or VS Code — but it\u0026rsquo;s fundamentally different in where control lives. The sidebar holds a file tree and project navigator, the center pane is a code editor, but the Gemini chat panel on the right is where actual work begins. Every toolbar button maps to a specific Gemini function, so reading the UI is itself a guide to the tool\u0026rsquo;s design philosophy.\nThe most important step in initial setup is connecting your Google account and initializing a project. After account linking, creating a new project lets Gemini automatically understand the project context — all subsequent chat requests are processed against that context. 
Notably, MCP (Model Context Protocol) connection settings are exposed right on the setup screen, a clear signal that Google has officially adopted MCP as the standard interface for external tool integration.\nFrom a vibe coding perspective, Antigravity\u0026rsquo;s barrier to entry is lower than other AI IDEs. Type \u0026ldquo;Build a to-do app in React\u0026rdquo; and Gemini proposes a file structure; approve it and code is generated immediately, with results visible in a built-in preview pane. This flow looks similar to Claude Cowork or Codex on the surface, but for developers in the Google ecosystem there\u0026rsquo;s a clear edge: direct integration with Google Cloud infrastructure (Cloud Run deployment, Firebase, etc.) is essentially one-click.\nThree-Way Comparison: Antigravity vs Codex vs Claude Cowork All three tools claim natural language-based code generation, but their design philosophies and actual user experience diverge sharply. OpenAI Codex leans toward a terminal-friendly CLI agent. Anthropic Claude Cowork excels at long-context processing and precise code review. Google Antigravity leads with visual UI and Google service ecosystem integration. Rather than one being objectively better, the right choice depends on your workflow style and cloud environment.\nCode quality differences surface most clearly when handling complex logic. Claude Cowork\u0026rsquo;s long context window shines for refactoring that references an entire large codebase. Codex delivers consistent performance on test writing and automation scripts. 
Antigravity provides the fastest results for UI component generation and Google Cloud boilerplate, but tends to require more revision cycles as domain-specific logic grows more complex.\ngraph TD Dev[\"Developer — Natural Language Request\"] Dev --\u003e AG[Google Antigravity] Dev --\u003e Codex[OpenAI Codex] Dev --\u003e CW[Claude Cowork] AG --\u003e AG_Engine[Gemini 2.0 Flash] Codex --\u003e OAI_Engine[\"GPT-4o / o3\"] CW --\u003e CW_Engine[Claude 3.7 Sonnet] AG_Engine --\u003e AG_Feat[\"Google Cloud Integration \u0026lt;br/\u0026gt; MCP Support \u0026lt;br/\u0026gt; NotebookLM Sub-Agent\"] OAI_Engine --\u003e Codex_Feat[\"CLI Agent \u0026lt;br/\u0026gt; Terminal-Centric \u0026lt;br/\u0026gt; Filesystem Access\"] CW_Engine --\u003e CW_Feat[\"Long Context Processing \u0026lt;br/\u0026gt; Code Review Focused \u0026lt;br/\u0026gt; MCP Support\"] AG_Feat --\u003e Deploy_AG[\"Firebase / Cloud Run\"] Codex_Feat --\u003e Deploy_OAI[\"General Purpose\"] CW_Feat --\u003e Deploy_CW[\"General Purpose\"] style AG fill:#4285F4,color:#fff style Codex fill:#10a37f,color:#fff style CW fill:#cc785c,color:#fff\nMCP support is becoming an increasingly important axis in comparing these tools. Claude Cowork was MCP\u0026rsquo;s original champion, Antigravity adopted it quickly, and Codex is building compatible external tool integration as well. This suggests the next front in AI IDE competition is shifting from model quality benchmarks toward ecosystem integration depth — how naturally a tool connects to external data sources and services is becoming the real productivity differentiator.\nBuilding a NotebookLM Sub-Agent Google NotebookLM is known as a document analysis and knowledge management tool, but connecting it to Antigravity transforms it into a domain-specific knowledge sub-agent. There are two integration paths.
The first registers a NotebookLM share link in Antigravity\u0026rsquo;s MCP settings, injecting that notebook\u0026rsquo;s document knowledge directly into Antigravity\u0026rsquo;s chat context. The second wraps NotebookLM\u0026rsquo;s API endpoint as a custom MCP server — more precise query control, but higher upfront setup cost.\nThe practical value of this sub-agent architecture is clear. Upload hundreds of pages of legacy system documentation to NotebookLM, connect it to Antigravity, and when you ask \u0026ldquo;Write a new Python client that calls this legacy API,\u0026rdquo; Antigravity searches the relevant spec in NotebookLM to generate grounded code. The core value: significantly higher accuracy in internal domain knowledge areas where AI IDEs are normally most prone to hallucination.\nThe key concept in this architecture is role separation. Antigravity acts as the orchestrator handling code generation and execution. NotebookLM acts as the retriever providing domain knowledge. This pattern is essentially identical to RAG (Retrieval-Augmented Generation) architecture — but developers get the same effect through GUI-level setup without building a vector database or managing an embedding pipeline.\nReal-world demos have revealed limitations too. Noticeable latency exists in context transfer between NotebookLM and Antigravity, and longer NotebookLM responses reportedly correlate with some degradation in code generation quality. Access permission management for specific notebooks is also not yet granular, requiring additional information security consideration in team environments. Even so, the pattern this integration demonstrates — plugging a domain knowledge base into an AI IDE — is likely to become core architecture in enterprise AI development environments.\nQuick Links Google Antigravity Setup, Codex App, and Claude Cowork Comparison — todaycode channel, 29 min 43 sec. 
UI button walkthrough and three-way comparison hands-on Sub-Agent with Antigravity + NotebookLM — Two-soul AI Agent channel, 14 min 20 sec. Two NotebookLM integration methods and agent-building practice Insights Google Antigravity\u0026rsquo;s arrival means more than just another competitor. Google embedding Gemini inside a developer tool rather than selling it as a standalone product makes clear that the main battleground in the AI model race has shifted from API performance benchmarks to developer workflow integration. The NotebookLM sub-agent integration is particularly interesting — it signals that AI IDEs are evolving toward supplementing a single model\u0026rsquo;s limitations with multiple specialized agents. MCP as the standard connecting protocol for this ecosystem is also becoming evident: Anthropic proposed it, Google adopted it, and OpenAI is moving toward compatibility. Vibe coding is increasingly real, but right now it\u0026rsquo;s most practical for rapid prototyping in the design phase and boilerplate generation — complex business logic implementation still requires developer judgment and validation. In the three-way AI IDE competition, the real winner is likely not a specific model but whichever tool integrates most naturally with a developer\u0026rsquo;s existing stack.\n","date":"2026-03-06T00:00:00+09:00","image":"/images/posts/2026-03-06-google-antigravity-ide-analysis/cover-en.jpg","permalink":"/posts/2026-03-06-google-antigravity-ide-analysis/","title":"Google Antigravity IDE Deep Dive — A New Player in the AI IDE Wars"},{"content":"Overview Google Code Wiki, publicly available at codewiki.google, is Google\u0026rsquo;s new AI documentation tool. Gemini analyzes a codebase, automatically generates an interactive knowledge base, and updates relevant documentation in real time every time a PR is merged. The tagline \u0026ldquo;Stop documenting. 
Start understanding.\u0026rdquo; captures it well: this tool is an attempt to shift documentation from a burden developers must shoulder to infrastructure AI maintains automatically.\nWhat Is Code Wiki? Code Wiki looks like an automated documentation tool on the surface, but its essence is an agentic system that transforms a codebase into a living knowledge graph. Traditional documentation tools — Confluence, Notion, GitBook — require developers to write content manually, and when code changes, documentation doesn\u0026rsquo;t follow automatically. This \u0026ldquo;drift\u0026rdquo; between code and docs is a chronic problem in large codebases. Because Code Wiki\u0026rsquo;s Gemini AI agent reads the code directly to generate documentation, code becomes the source of truth and documentation becomes its derivative.\nThe tool\u0026rsquo;s core positioning is captured in the phrase \u0026ldquo;A new perspective on development for the agentic era.\u0026rdquo; The agentic era means AI doesn\u0026rsquo;t just assist tools but judges and acts autonomously — Code Wiki declares it will take on that agentic role in the domain of documentation. The promise that Gemini-generated documentation stays \u0026ldquo;always up-to-date\u0026rdquo; suggests developers could be freed from the obligation to maintain docs manually.\nCode Wiki currently operates on an invite-only basis, publicly demoing some notable open-source repositories as featured repos. Private repository support is listed as \u0026ldquo;Coming Soon.\u0026rdquo; This staged rollout looks like a deliberate strategy — publicly validate AI-generated documentation quality while scaling infrastructure.\nCore Features Code Wiki\u0026rsquo;s first core feature is section-by-section deep exploration (Understand your code section by section). Rather than generating a single high-level overview, you can select a specific section and drill down into how it works. 
For new team members onboarding to a large project, or returning developers trying to understand how a particular service behaves, this replaces the old approach — read the code directly or ask a colleague. Whether Gemini\u0026rsquo;s explanations are accurate and useful enough is the key question, but the interactive exploration experience itself proposes a new documentation UX.\nThe auto-update mechanism is the most technically interesting part of Code Wiki. Every time a PR is merged, the Gemini agent analyzes the changed code and automatically updates relevant documentation. For this pipeline to work correctly, it must simultaneously solve three hard problems: diff analysis, identifying related documentation, and maintaining consistency with existing docs. Refactoring in particular — where code structure changes substantially — requires significant reasoning ability to determine which parts of previous documentation to update and which to retire.\nThe bidirectional link between code and documentation (Linked back to your code) has strong practical value. Reading an architecture overview and clicking on a specific service description takes you directly to that service\u0026rsquo;s source file; a function description links directly to the function\u0026rsquo;s definition. This moves away from the silo model where docs and code live separately, proposing a new pattern where documentation functions as a navigation layer over the code. JetBrains\u0026rsquo; code navigation and GitHub\u0026rsquo;s code search provide this experience at the code level — Code Wiki attempts the same experience at the natural language description level.\nAuto-generated diagrams are also notable. The promise: instead of mentally assembling complex systems piece by piece, code is transformed into clear, intuitive visual diagrams. Whether these diagrams are actually accurate for large microservice architectures or complex data flows needs more real-world validation. 
That said, a diagram extracted directly from code by AI is probably more current than one drawn manually by a human.\ngraph TD Repo[GitHub Repository] PR[PR Merged] Agent[Gemini AI Agent] DocGen[\"Auto-Generated Docs \u0026lt;br/\u0026gt; Section Explanations \u0026lt;br/\u0026gt; Diagrams\"] Wiki[\"Code Wiki \u0026lt;br/\u0026gt; Interactive Knowledge Base\"] Chat[\"Natural Language Queries \u0026lt;br/\u0026gt; Codebase Chat\"] CodeLink[\"Direct Code Links \u0026lt;br/\u0026gt; Jump to Definition\"] Repo --\u003e|Initial Analysis| Agent PR --\u003e|Change Trigger| Agent Agent --\u003e|Auto-Generate and Update| DocGen DocGen --\u003e Wiki Wiki --\u003e Chat Wiki --\u003e CodeLink CodeLink --\u003e Repo style Agent fill:#4285F4,color:#fff style Wiki fill:#34A853,color:#fff\nThe natural language chat with your codebase feature (Talk to your codebase) is described as a \u0026ldquo;24/7 on-call engineer\u0026rdquo; experience. This isn\u0026rsquo;t just document search — it\u0026rsquo;s real-time conversation with an AI that understands the codebase. If you could instantly answer questions like \u0026ldquo;What authentication method does this API endpoint use?\u0026rdquo; or \u0026ldquo;What events flow between the payment service and order service?\u0026rdquo;, onboarding time for new team members and the context-sharing burden on senior engineers would both drop.\nThe Paradigm Shift in Documentation for the Agentic Era Traditional documentation philosophy is built on the norm: \u0026ldquo;When code changes, update the docs.\u0026rdquo; In reality, this norm is rarely followed. The faster the development pace, the larger the team, and the harder it is to feel documentation has direct business value — the more docs fall behind. Code Wiki\u0026rsquo;s approach attempts to solve this human limitation through automation rather than norms.
Instead of placing the documentation obligation on developers, it makes code changes an automatic pipeline trigger.\nThe deeper implication of this paradigm shift is a change in developer roles. Until now, one of a senior developer\u0026rsquo;s important contributions was capturing tacit knowledge — design decisions not explicit in code, historical context, tradeoffs — in documentation or passing it on to junior developers. As AI can automatically extract explicit knowledge from code, the valuable knowledge contributions developers make will increasingly move toward this tacit knowledge domain. Ironically, for AI to capture even tacit knowledge, developers need to leave richer context in commit messages, PR descriptions, and code comments — the better AI tools get, the higher the quality of structured information developers need to produce. A paradox emerges.\nFor Code Wiki to become a meaningful long-term tool, it must solve the trust problem with AI-generated documentation. When a developer writes documentation, accountability is clear. When AI-generated documentation is wrong — who\u0026rsquo;s responsible? And how much will developers trust and act on AI documentation? These are cultural questions, not technical ones. Particularly in mission-critical systems, basing maintenance decisions on AI documentation requires high confidence in that documentation\u0026rsquo;s accuracy.\nCode Wiki currently only works with public open-source repositories, with private repo support in progress. Enterprise adoption will require meeting governance requirements: code security, data sovereignty, on-premises deployment options. Google\u0026rsquo;s existing enterprise Google Cloud customer base is an advantage here, but overcoming corporate conservatism about exposing codebases to an external AI service is a separate challenge.\nQuick Links [Product Review] Google\u0026rsquo;s Code Wiki, Codebase Documentation — LOADING_ channel, 9 min 12 sec. 
Real-world review of codewiki.google Code Wiki Official Site — Featured repo demos and invitation signup Insights Code Wiki isn\u0026rsquo;t just a documentation tool — it\u0026rsquo;s a symbol of the inflection point where AI agents begin autonomously handling portions of the software development lifecycle. A developer\u0026rsquo;s action (merging a PR) automatically triggers an AI agent\u0026rsquo;s work, and the result is immediately shared with the whole team. This shows an early model of how agents and humans collaborate. Google releasing Antigravity (code writing) and Code Wiki (code documentation) simultaneously feels intentional — an attempt to create a complete loop where AI writes code and AI explains that code. If NotebookLM serves as the knowledge repository, Antigravity generates the code, and Code Wiki documents the results, the integration of these three tools may be the big picture Google has in mind for AI development environments. The practical implication for developers: good commit messages and well-structured PR descriptions are no longer just team collaboration etiquette — they become the key inputs that determine AI documentation quality.\n","date":"2026-03-06T00:00:00+09:00","image":"/images/posts/2026-03-06-google-code-wiki/cover-en.jpg","permalink":"/posts/2026-03-06-google-code-wiki/","title":"Google Code Wiki — Let AI Write Your Codebase Documentation"},{"content":"Overview A new employee joins the team. They\u0026rsquo;re talented, but they don\u0026rsquo;t know your coding conventions, your preferred frameworks, or your PR review standards. So you prepare onboarding docs, walk them through the style guide, and explain the patterns you use repeatedly. It takes time, but once they\u0026rsquo;ve been properly onboarded, they work in the right direction without needing constant reminders.\nUsing Claude Code without Harness is like resetting that onboarding every single day. Harness solves this. 
Define your project\u0026rsquo;s coding approach, preferred libraries, and team rules once — and Claude Code carries that context forward from session to session. One-time setup, compounding savings over time.\nWhat Is Harness Harness is a configuration system that gives Claude Code persistent context. Where CLAUDE.md stores project-wide instructions in a single Markdown file, Harness defines AI behavior in a more structured way. Its three core components are Skills, Agents, and Commands.\nWithout Harness, Claude Code is a general-purpose AI. It might use FastAPI or Django, handle dependencies differently, and apply different error-handling patterns depending on the session. With Harness installed, Claude Code starts every session already knowing: this project uses FastAPI, schemas are defined with Pydantic v2, and error responses follow this specific format. The difference isn\u0026rsquo;t just convenience — it directly affects the quality and consistency of AI output.\nThe new-hire analogy makes this intuitive. Even a skilled new developer can head in the wrong direction without team context. If the team lead has to re-explain context every session, that cost multiplies across the whole team, not just the individual. Harness replaces that recurring cost with a single initial installation.\nThe Three Core Concepts Skills — Domain Knowledge Documents A Skill is a Markdown document that teaches Claude Code the patterns for a specific domain: how to structure a FastAPI backend in this project, what rules govern Next.js component creation, how to write Mermaid diagrams correctly. 
Skill files are the core mechanism for shifting Claude Code\u0026rsquo;s behavior from general to specialized.\nHere\u0026rsquo;s an example of what a FastAPI backend Skill file might look like:\n# FastAPI Backend Skill ## Project Structure - Routers in `app/routers/` separated by domain - Schemas with Pydantic v2 (`app/schemas/`) - Dependency injection in `app/dependencies.py` ## Response Format Success: { \u0026#34;success\u0026#34;: true, \u0026#34;data\u0026#34;: { ... } } Error: { \u0026#34;success\u0026#34;: false, \u0026#34;error\u0026#34;: { \u0026#34;code\u0026#34;: \u0026#34;...\u0026#34;, \u0026#34;message\u0026#34;: \u0026#34;...\u0026#34; } } ## Coding Rules - async/await required — no synchronous endpoints - Specify response_model on every endpoint - Use custom AppException instead of HTTPException Skill files document the team\u0026rsquo;s decisions as a reference Claude Code uses when generating code. They\u0026rsquo;re not just style guides — they become Claude Code\u0026rsquo;s decision-making criteria. When the team\u0026rsquo;s rules change, update the Skill file, and Claude Code\u0026rsquo;s output automatically follows.\nSkills become more valuable as their scope broadens. Define a FastAPI Skill, a Next.js Skill, a database migration Skill, a PDF generation Skill, and a Mermaid diagram Skill — and Claude Code writes code consistently across that entire stack. No need to include all that knowledge in every prompt; Harness loads the right Skill automatically.\nAgents — Purpose-Built AI Instances Agents are Claude Code instances pre-configured for a specific role: Planner, Plan Reviewer, Web Research Specialist. Each agent has pre-defined instructions for what it should do, which Skills to reference, and which tools it can use.\nThe Planner agent writes a detailed execution plan before implementation begins. The Plan Reviewer agent independently examines that plan and identifies gaps. 
The Web Research Specialist searches for up-to-date library documentation and technical references. Splitting agents by role produces far more predictable, reliable output than a single generalist AI trying to do everything.\nCommands — Trigger Entire Workflows with a Single Slash Commands are macros that bundle a recurring workflow into a single slash command. Define /review-pr, /generate-schema, /write-tests — and instead of writing a complex prompt each time, a single command triggers the entire workflow. The Claude Code skill in this log-blog project operates on the same principle.\nSkills in Practice — From FastAPI to Mermaid Skills in the Harness ecosystem are organized around the project\u0026rsquo;s tech stack. A FastAPI backend Skill defines router structure, schema patterns, and error handling. A Next.js frontend Skill defines component naming conventions, state management approach, and API call patterns.\nA Mermaid diagram Skill prevents the syntax errors that commonly appear when Claude Code generates diagrams. For example, documenting the rules that Mermaid v11 doesn\u0026rsquo;t support \\n for line breaks in node labels, and that the Hugo Stack theme requires \u0026amp;lt;br/\u0026amp;gt; instead of \u0026lt;br/\u0026gt;, means Claude Code automatically follows these rules every time it creates a diagram.\n# Mermaid Diagram Skill ## Hugo Stack Theme Rules - Node label line breaks: use `\u0026amp;lt;br/\u0026amp;gt;` (NOT `\\n`, NOT `\u0026lt;br/\u0026gt;`) - Labels containing slashes must be quoted: `[\u0026#39;label/text\u0026#39;]` - No double quotes — potential Hugo parsing conflict - One diagram with a syntax error hides ALL diagrams on the page — validate syntax thoroughly ## Allowed Diagram Types flowchart TD, graph TD, sequenceDiagram, classDiagram PDF/PPTX document tool Skills and web design review Skills each guide Claude Code to produce consistent output in their respective domains. 
As the number of Skills grows, Claude Code becomes more consistent and predictable across the entire project.\nBuilding an Agent Team One of the interesting aspects of Harness\u0026rsquo;s agent design is that it builds a team structure where role-optimized agents collaborate — rather than a single AI trying to cover all roles. Just as a development team divides responsibilities between developers, reviewers, and researchers, AI agents are structured the same way.\nThe Planner agent writes a detailed plan before implementation. It determines which files need to change, in what order, and what risks to watch for. The Plan Reviewer agent independently examines this plan and surfaces missed edge cases or flawed assumptions. The collaboration between these two agents reduces the self-confirmation bias that emerges when a single agent both writes and reviews its own plan.\ngraph TD U[User Request] --\u003e H[Harness] H --\u003e SK[Skills] H --\u003e AG[Agents] H --\u003e CM[Commands] SK --\u003e S1[FastAPI Skill] SK --\u003e S2[Next.js Skill] SK --\u003e S3[Mermaid Skill] SK --\u003e S4[Custom Skills...] AG --\u003e A1[Planner] AG --\u003e A2[Plan Reviewer] AG --\u003e A3[Web Research Specialist] CM --\u003e C1[\"/review-pr\"] CM --\u003e C2[\"/generate-schema\"] CM --\u003e C3[\"/write-tests\"] S1 --\u003e CC[Claude Code] S2 --\u003e CC S3 --\u003e CC A1 --\u003e CC A2 --\u003e CC C1 --\u003e CC C2 --\u003e CC CC --\u003e PM[Project Memory] CC --\u003e OUT[\"Output — consistent code / docs\"] PM --\u003e CC style H fill:#1a3a5c,color:#fff style CC fill:#2d5a27,color:#fff style OUT fill:#5c3a1a,color:#fffThe Web Research Specialist agent focuses on finding current API documentation, library changes, and technical references. Instead of telling Claude Code \u0026ldquo;refer to the latest Pydantic v2 docs\u0026rdquo; every time, the research agent autonomously gathers and organizes the necessary information, then hands it off to the implementation agent. 
This division of labor improves the overall workflow quality by letting each agent focus on its own role.\nGeneral AI vs. Dedicated Expert The question Harness poses contains a more fundamental perspective on how AI tools should be used. A general-purpose AI can do anything but may not be optimal in specific contexts. A dedicated specialist has a narrower scope, but within that scope delivers far more predictable and reliable results.\nJust as a software development team gets better overall productivity from specialized roles rather than one person covering everything, the same principle applies to AI agents. Harness is a framework that layers project-specific expertise onto Claude Code — a powerful general-purpose AI — and turns it into a dedicated team member.\nDeveloper accounts of spending six months training their AI to work well reflect the fact that this process isn\u0026rsquo;t trivial. What Skills to define, how to divide agents by role, at what level of abstraction to create Commands — all of this depends on the characteristics of the project and the team. But once it\u0026rsquo;s properly configured, the savings that compound with each subsequent session quickly recoup the initial investment.\nQuick Links Harness Unveiled — Making Claude Code Your Dedicated AI Employee — Maker Evan channel, full walkthrough from installation to Skills/Agents/Commands (5 min 13 sec, 7,800 views) Why It Took a Developer 6 Months to Train Their AI — Maker Evan earlier video, all the trial and error exposed (110k views) Insights Harness isn\u0026rsquo;t technically a new invention. It\u0026rsquo;s a combination of Markdown files and a configuration system. But the reason this simple combination qualitatively changes the Claude Code experience comes from a shift in how you think about working with AI — from explaining everything from scratch every session, to configuring once and having it remember permanently. 
The tripartite structure of Skills, Agents, and Commands each solves a distinct problem: documenting knowledge, specializing roles, and automating workflows. The most effective way to transfer team context to AI is the most explicit way. Skill files have a side effect of converting the team\u0026rsquo;s tacit knowledge into explicit documentation — in the process, rules that team members took for granted get formally documented for the first time. Separating agents by role has genuine practical value in reducing the self-confirmation bias that emerges when a single AI both writes and evaluates its own plans. If you calculate the ROI by comparing the initial setup investment against the long-term savings, it typically pays back faster than almost any other developer tool investment — especially for individuals and teams who repeatedly work with the same tech stack.\n","date":"2026-03-06T00:00:00+09:00","image":"/images/posts/2026-03-06-claude-code-harness/cover-en.jpg","permalink":"/posts/2026-03-06-claude-code-harness/","title":"Harness — From General-Purpose AI to Dedicated Team Member"},{"content":"Overview You give Claude Code a task. It reports \u0026ldquo;successfully completed.\u0026rdquo; You run the tests. Errors. This mismatch comes from a structural limitation in AI coding tools: the AI stops the moment it decides the task is done, without verifying whether that decision is actually correct.\nRalph Loop directly addresses this problem. The core idea is simple: even when AI says \u0026ldquo;done,\u0026rdquo; automatically restart it and make it verify itself. Trap an AI agent inside a loop that never ends, and the agent detects failures, makes fixes, and keeps going until it actually passes. This idea emerged as one of the most watched automation patterns in the AI development community during 2025–2026.\nThe Origin of Ralph Loop The name comes from Ralph Wiggum, a character from The Simpsons. 
Ralph isn\u0026rsquo;t particularly smart, but he never gives up. Geoffrey Huntley drew on this metaphor to propose the simplest possible agent loop pattern, and the original implementation is a single bash command:\nwhile :; do cat PROMPT.md | agent; done That\u0026rsquo;s it. Write your task instructions in PROMPT.md, and this loop immediately spins up a new agent every time the previous one exits, restarting with the same prompt. Context window full, agent hangs, error thrown — the loop keeps going. Each new agent reads the filesystem and git history to understand how far the previous agent got, then picks up where it left off.\nThe pattern\u0026rsquo;s breakthrough moment was a Y Combinator hackathon. Participants spun up Ralph Loop on a GCP instance and went to sleep. By morning, 1,100 commits had accumulated across 6 repositories. The Browser Use library had been nearly completely ported from Python to TypeScript overnight. Total cost: $800 — equivalent to hiring a developer at $10.50 per hour. This case validated the real-world utility of Ralph Loop and spread through the community.\nDerivative projects followed quickly. The snarktank/ralph repository accumulated over 9,200 GitHub stars, and the oh-my-opencode project included /ralph-loop as a built-in command. What started as an experimental hack rapidly evolved into a standardized tool.\nWhy It Works: Context vs. Filesystem Traditional AI coding tools store progress only inside the context window. An LLM\u0026rsquo;s context window is finite; once full, earlier content is forgotten. On long tasks, agents either fail to remember what they\u0026rsquo;ve already done or hit context limits and terminate.\nRalph Loop\u0026rsquo;s key insight is storing state in the external filesystem and git, not in context. When an agent writes code, it\u0026rsquo;s saved to files. When it makes git commits, history accumulates. When context overflows and the agent exits, the loop spins up a new agent. 
That new agent reads the filesystem and checks git log to understand how far the previous agent got, then continues.\nflowchart TD P[PROMPT.md] --\u003e A[Run Agent] A --\u003e T{Execute Task} T --\u003e W[\"Write Files + git commit\"] W --\u003e C{Context Limit?} C --\u003e|No| V{Verification Pass?} C --\u003e|Yes — Agent Exits| R[Start New Agent] R --\u003e FS[\"Read Filesystem / git State\"] FS --\u003e T V --\u003e|Fail| F[Analyze Error + Fix] F --\u003e T V --\u003e|Pass| E[Complete] style P fill:#2d2d2d,color:#fff style E fill:#1a5c2e,color:#fff style R fill:#5c3317,color:#fffWhy this architecture matters: it completely decouples the persistence of an agent loop from the limits of a context window. Context can reset at any time, but the filesystem and git are persistent. Each new agent starts \u0026ldquo;fresh\u0026rdquo; while fully inheriting the previous agent\u0026rsquo;s results. This pattern is particularly powerful for long-context work: large-scale refactoring, library porting, legacy code migration.\nPatterns That Evolved to Production Level Starting from a simple bash loop, Ralph Loop has evolved in multiple directions to meet production complexity. Peter Steinberger\u0026rsquo;s OpenClaw project (152,000+ GitHub stars) represents a case of bringing agent loops to real service level. OpenClaw connects 12+ channels including WhatsApp, Slack, Discord, iMessage, and Telegram; manages the agent\u0026rsquo;s personality and behavioral principles with a \u0026ldquo;soul document\u0026rdquo;; and includes gateway-based session routing and usage monitoring — with over 8,700 total commits.\nThe Nanobot project distills the agent loop\u0026rsquo;s essence into 330 lines. 
Stripping away infrastructure and preserving only the core loop, this code most clearly shows Ralph Loop\u0026rsquo;s mechanical structure:\nwhile iteration \u0026lt; self.max_iterations: iteration += 1 response = await self.provider.chat( messages=messages, tools=self.tools.get_definitions() ) if response.has_tool_calls: for tool_call in response.tool_calls: result = await self.tools.execute( tool_call.name, tool_call.arguments) messages = self.context.add_tool_result( messages, tool_call.id, tool_call.name, result) else: final_content = response.content break Looking at this structure, it\u0026rsquo;s clear how much Ralph Loop is based on ancient computer science concepts: while loop, tool call response handling, message history accumulation, exit condition. Nothing new. What changed: the decision-maker inside the loop shifted from rules-based logic to an LLM, and the definition of \u0026ldquo;done\u0026rdquo; became a contextual judgment by AI rather than a pre-programmed condition. max_iterations is a safety guard against infinite loops — when the limit is reached, instead of force-terminating, it calls MaxReachedAgent to summarize progress and suggest next steps.\nFrontALF\u0026rsquo;s Real-World Design at Channel.io Channel.io\u0026rsquo;s AI support system FrontALF is a case of applying the Ralph Loop pattern to a real B2B service, separating two loops by purpose. This design shows an architectural perspective that specializes agent loops for different situations beyond simple repetition.\nThe first is a Stateless Agent Loop, used for customer Q\u0026amp;A, RAG search, and situations requiring fast response. Each turn runs independently without storing state externally:\nfor i := 0; i \u0026lt; maxTurns; i++ { response := llm.Request(currentHistory) currentHistory = append(currentHistory, response.Events...) 
if !checkShouldContinue(response.Events) { break } } Inside the RAG Handler, a mini-loop judges whether search results are sufficient and re-searches if needed. The outer loop is simple, but inner loops autonomously supplement as needed.\nThe second is a Stateful Task Loop, used for multi-step workflows like refund processing or tasks requiring external system approvals:\ntype TaskSession struct { CurrentNodeID string TaskMemory map[string]any // Shared state across nodes NodeTrace []string // Execution path tracking } TaskMemory maintains shared state across nodes; NodeTrace records the execution path to support debugging and restarting. If a specific node fails, it can be re-run from that node. Sessions can be paused while waiting for external approval, then resumed. The separation of the two loops is a pragmatic choice — when requirements differ, don\u0026rsquo;t force a single pattern.\nQuick Links Claude Code Ralph Loop — Making AI Code While You Sleep — OhNoteToday channel, Ralph Loop concept intro and practice (8 min 16 sec) How to Make Claude Test and Fix on Its Own | Ralph Loop — DingCodingCo channel, hands-on tutorial (6 min 29 sec, 32,000 views) Ralph Loop, OpenClaw — Nothing New — Channel.io engineer Mong\u0026rsquo;s in-depth analysis, including FrontALF real-world design Insights What makes Ralph Loop interesting isn\u0026rsquo;t technical innovation — it\u0026rsquo;s a shift in perspective. while loops, state machines, retry patterns, graceful shutdown — these have existed for decades. What changed: the decision-maker inside the loop went from rules-based logic to an LLM, and the definition of \u0026ldquo;done\u0026rdquo; became a contextual criterion that AI judges rather than a pre-programmed condition. The Y Combinator case\u0026rsquo;s $800 / $10.50 per hour figure shows this pattern already operates as a realistic economic unit. 
Channel.io\u0026rsquo;s two-loop separation — Stateless and Stateful — leaves a practical lesson: don\u0026rsquo;t force a single pattern when requirements differ. OpenClaw\u0026rsquo;s soul document concept — explicitly defining an agent\u0026rsquo;s personality and behavioral principles as a document — raises a deeper design question beyond simple loop repetition: how do you control and make agents trustworthy? For production deployment of Ralph Loop, safety guards like max_iterations and cost monitoring are essential — an unconverging loop can drive up costs at non-linear rates.\n","date":"2026-03-06T00:00:00+09:00","image":"/images/posts/2026-03-06-ralph-loop-ai-automation/cover-en.jpg","permalink":"/posts/2026-03-06-ralph-loop-ai-automation/","title":"Ralph Loop — The Agent Loop Pattern Where AI Tests and Fixes Itself"},{"content":"Overview code-server is an open-source project (GitHub 76,491 stars, primary language TypeScript) that lets you run VS Code in a browser. Install code-server on a server, connect via browser, and you have a full VS Code development environment anywhere. But that same \u0026ldquo;runs in a browser\u0026rdquo; property creates a critical problem for VSCode extensions that use OAuth-based authentication.\nThe issue is URI schemes. Local VS Code handles OAuth redirects via the vscode:// scheme — the OS registers a handler that routes URLs starting with vscode:// to the VS Code process. In code-server, VS Code runs as a browser tab. The browser doesn\u0026rsquo;t know the code-oss:// scheme, and there\u0026rsquo;s no OS-level handler. The OAuth flow breaks entirely at the redirect step after authentication completes. 
This post analyzes the technical structure of that problem and maps out the correct solutions.\nThe Core Problem: vscode:// vs code-oss:// URI Schemes Extensions using OAuth in local VS Code typically follow this flow: the extension opens the OAuth provider\u0026rsquo;s auth URL in the browser; the user logs in and approves permissions; the provider redirects to a pre-registered redirect_uri in the form vscode://extension-name/auth-callback; the OS recognizes this scheme and wakes the VS Code process; the extension extracts the authorization code from the URI and exchanges it for an access token.\nIn code-server, VS Code\u0026rsquo;s own URI scheme changes to code-oss:// — the default scheme of Code-OSS, the VS Code fork that code-server uses. This scheme is not registered in either the browser or the OS. When a redirect occurs to a URL like code-oss://augment.vscode-augment/auth/..., the browser shows an error like this:\nFailed to launch \u0026#39;code-oss://{extension_name}?{params}\u0026#39; because the scheme does not have a registered handler code-server Issue #6584, filed by user @tianze0926 using the Augment Code extension, reported exactly this symptom. After authentication completed, the code-oss://augment.vscode-augment/auth/... URI wouldn\u0026rsquo;t open automatically, requiring manual copy-paste. 
This isn\u0026rsquo;t a code-server-specific quirk — it\u0026rsquo;s a structural limitation of any browser-based VS Code environment.\nWhy OAuth Fails in Browser Environments graph LR subgraph local[\"Local VS Code\"] L1[\"Extension Opens \u0026lt;br/\u0026gt; OAuth URL\"] --\u003e L2[\"OAuth Approval \u0026lt;br/\u0026gt; in Browser\"] L2 --\u003e L3[\"redirect_uri: \u0026lt;br/\u0026gt; vscode://ext/cb\"] L3 --\u003e L4[\"OS Calls \u0026lt;br/\u0026gt; vscode:// Handler\"] L4 --\u003e L5[\"Extension Receives \u0026lt;br/\u0026gt; Token — SUCCESS\"] end subgraph cs[\"code-server Browser\"] C1[\"Extension Opens \u0026lt;br/\u0026gt; OAuth URL\"] --\u003e C2[\"OAuth Approval \u0026lt;br/\u0026gt; in New Tab\"] C2 --\u003e C3[\"redirect_uri: \u0026lt;br/\u0026gt; code-oss://ext/cb\"] C3 --\u003e C4[\"Browser: No Handler \u0026lt;br/\u0026gt; — FAIL\"] C4 --\u003e C5[\"Auth Aborted \u0026lt;br/\u0026gt; No Token\"] end style L5 fill:#27ae60,color:#fff style C4 fill:#e74c3c,color:#fff style C5 fill:#e74c3c,color:#fffThe OS-level URI scheme handler acts as a bridge in local VS Code. On macOS through Info.plist-registered URL schemes, on Windows through the registry, on Linux through XDG settings — vscode:// URLs get delivered to the VS Code process because VS Code registers that scheme handler at install time.\ncode-server runs as a browser tab. OAuth authentication proceeds in a new tab or popup, and when complete the OAuth provider attempts to redirect to the registered redirect_uri. But code-oss:// isn\u0026rsquo;t in the browser\u0026rsquo;s custom protocol handler list. The browser doesn\u0026rsquo;t know how to handle this URL and returns an error. 
As code-server maintainer @code-asher analyzed, fixing this requires either modifying VS Code itself or having the extension choose a different authentication approach.\nThe polling approach was an early suggested workaround: instead of an OAuth redirect, the extension opens its own server endpoint and the client polls it periodically to check whether a token has arrived. This changes the redirect_uri to a regular HTTPS URL like https://extension-server.com/callback, bypassing the browser scheme problem. But it requires separate server infrastructure and raises security concerns about tokens passing through an intermediate server, making it an incomplete solution.\nregisterUriHandler — The Correct Solution The VSCode Extension API\u0026rsquo;s vscode.window.registerUriHandler is the official solution. This API lets an extension directly register a handler for URIs in the form vscode://publisher.extension-name/path. In code-server environments, the code-server server side intercepts incoming requests for that URI and routes them to the extension handler.\nHow it works: code-server runs as a web server, so the OAuth redirect_uri can be set to a regular HTTPS URL like https://your-code-server.com/vscode-extension/callback. When authentication completes, this HTTPS endpoint is called, and code-server internally converts it into a vscode:// URI event and delivers it to the extension handler. 
The browser\u0026rsquo;s custom scheme problem is bypassed at the HTTP/HTTPS layer.\n// Correct approach: using registerUriHandler import * as vscode from \u0026#39;vscode\u0026#39;; export function activate(context: vscode.ExtensionContext) { // Register handler for vscode://publisher.my-extension/auth-callback URI const uriHandler = vscode.window.registerUriHandler({ handleUri(uri: vscode.Uri): void { if (uri.path === \u0026#39;/auth-callback\u0026#39;) { const params = new URLSearchParams(uri.query); const code = params.get(\u0026#39;code\u0026#39;); const state = params.get(\u0026#39;state\u0026#39;); if (code \u0026amp;\u0026amp; state) { // Exchange authorization code for token exchangeCodeForToken(code, state); } } } }); context.subscriptions.push(uriHandler); } // Setting redirect_uri when starting OAuth flow async function startOAuthFlow() { // Build the callback URI from the current environment\u0026#39;s scheme const callbackUri = vscode.Uri.parse(vscode.env.uriScheme + \u0026#39;://publisher.my-extension/auth-callback\u0026#39;); // asExternalUri lets remote/web hosts such as code-server translate it to a URL the browser can reach const redirectUri = await vscode.env.asExternalUri(callbackUri); const authUrl = buildOAuthUrl({ redirect_uri: redirectUri.toString() }); await vscode.env.openExternal(vscode.Uri.parse(authUrl)); } // Wrong approach: hardcoded code-oss:// scheme function startOAuthFlowBroken() { // This URL cannot be opened in code-server browser environments const redirectUri = \u0026#39;code-oss://extension-name/auth-callback\u0026#39;; const authUrl = buildOAuthUrl({ redirect_uri: redirectUri }); vscode.env.openExternal(vscode.Uri.parse(authUrl)); // Browser: \u0026#34;the scheme does not have a registered handler\u0026#34; error } Using vscode.env.uriScheme is the key. This value returns vscode in local VS Code and code-oss (or the appropriate value for the environment) in code-server. You can dynamically detect the current environment\u0026rsquo;s scheme and construct the redirect_uri without hardcoding, and passing that URI through vscode.env.asExternalUri gives a remote or web host the chance to rewrite it into an address the browser can actually open. GitLens successfully implemented this pattern and was cited by the code-server maintainer as the reference implementation. 
The community has confirmed that GitLens OAuth authentication works correctly in code-server.\nPopup Window API Request (VSCode #142080) VSCode issue #142080 requests an Extension API addition for handling OAuth2 authentication in popup windows. Currently OAuth windows can only be opened as new tabs; with popup windows, scripts can automatically close the window after authentication completes, greatly improving the user experience.\nVSCode team member @TylerLeonhardt explained that the GitHub Authentication extension receives popup handling on vscode.dev through a hardcoded URI whitelist, which is not an official API available to general extensions. Electron maintainer @deepak1556 noted that on desktop, the implementation delegates to OS platform handlers (XDGOpen, OpenURL, ShellExecuteW), making a general-purpose popup API complex to implement. Some commenters consider an implementation feasible in web-embedded environments.\nThis issue is currently OPEN, awaiting community upvotes (20 needed). The current situation, where only the GitHub Authentication extension receives special popup treatment, is a known community frustration. The core demand is an official API that lets general extensions provide the same user experience.\nBrowser Restrictions on window.close() Using OAuth popup windows requires window.close() to close the window after authentication completes. But browsers place an important restriction on window.close(). Per MDN, the method only works on windows that were opened by a script using window.open(), or on top-level windows that have a single history entry.\nIf a user directly opens a new tab via Ctrl+Click or the middle mouse button and the tab then navigates through the login pages, scripts generally cannot close it. Chrome prints this to the console in that case:\nScripts may not close windows that were not opened by script. For the OAuth popup pattern to work correctly, the popup window must be opened with window.open(). 
The completion page uses window.opener to send a message to the parent window (window.opener.postMessage()), then calls window.close(). This is the standard implementation for OAuth popups:\n// OAuth initiator side (extension/app) const popup = window.open(authUrl, \u0026#39;oauth-popup\u0026#39;, \u0026#39;width=600,height=700\u0026#39;); window.addEventListener(\u0026#39;message\u0026#39;, (event) =\u0026gt; { // Accept the message only from our own popup and our own origin if (event.origin === window.location.origin \u0026amp;\u0026amp; event.source === popup \u0026amp;\u0026amp; event.data.type === \u0026#39;oauth-success\u0026#39;) { const { code, state } = event.data; // Proceed with token exchange exchangeCodeForToken(code, state); } }); // OAuth callback page (redirect_uri) // Pass code to parent window and close popup after auth completes // Assumes the callback page is served from the same origin as the opener; otherwise pass the opener\u0026#39;s known origin instead of \u0026#39;*\u0026#39; so the code is not exposed to other origins window.opener.postMessage({ type: \u0026#39;oauth-success\u0026#39;, code: new URLSearchParams(location.search).get(\u0026#39;code\u0026#39;), state: new URLSearchParams(location.search).get(\u0026#39;state\u0026#39;) }, window.location.origin); window.close(); // Closeable because opened via window.open() Debugger Detach Issues in WSL1 VSCode issue #1650 (vscode-js-debug) looked like an OAuth problem at first but had a different root cause. Reports described a Chrome debug session disconnecting on OAuth redirect (cross-domain navigation). vscode-js-debug maintainer @connor4312 responded that \u0026ldquo;once connected, connections should stay connected — no known issues.\u0026rdquo;\nInvestigation revealed the actual cause: WSL1 network isolation. WSL1 runs without a Linux kernel and translates Linux system calls on top of the Windows kernel, and this structure can leave network interfaces improperly shared. The Chrome DevTools Protocol connection broke when OAuth redirects passed through WSL1\u0026rsquo;s network layer. The fix: run VS Code directly on Windows rather than in WSL1, or migrate to WSL2. 
WSL2 uses a real Linux kernel and doesn\u0026rsquo;t have network isolation issues.\nThis issue is a separate example from the code-oss scheme problem, but illustrates a broader pattern: \u0026ldquo;VSCode extensions in browser/remote environments behave differently from local environments.\u0026rdquo; With extensions running in WSL, Docker, code-server, vscode.dev, and more, extension developers need to deeply understand the differences between each environment.\nQuick Links code-server GitHub — 76,491 stars, TypeScript open-source project code-server Issue #6584 — code-oss:// scheme OAuth failure (CLOSED) VSCode Issue #142080 — OAuth2 popup window Extension API request (OPEN) VSCode API: registerUriHandler — Official API documentation MDN: window.close() — Browser window close restrictions GitLens Extension — Reference implementation using registerUriHandler Insights The code-server OAuth problem illustrates just how complex a compatibility challenge \u0026ldquo;VS Code running in a browser\u0026rdquo; entails. The OS-level URI scheme handler that works transparently in local environments simply doesn\u0026rsquo;t exist inside a browser sandbox — bridging that gap is a VS Code core-level problem that the code-server team can\u0026rsquo;t solve alone. The registerUriHandler API exists as the solution, but not every extension developer knows about it or uses it correctly — even commercial products like Augment Code ran into this problem. That GitLens provides a successful reference implementation demonstrates the value of open-source knowledge sharing once again. The pattern of using vscode.env.uriScheme to dynamically detect environment is a technique every VSCode extension developer who needs to support local, remote, and browser environments must master. 
If the popup window API (#142080) is standardized as an official API, OAuth UX would improve significantly — but whether the current situation where only GitHub Auth gets special treatment will improve is unclear. The WSL1 debugger issue offers a separate lesson: networking problems can stem from structural differences in the execution environment rather than code bugs, so environment diagnosis should come first in debugging.\n","date":"2026-03-06T00:00:00+09:00","image":"/images/posts/2026-03-06-vscode-code-server-oauth/cover-en.jpg","permalink":"/posts/2026-03-06-vscode-code-server-oauth/","title":"VSCode + code-server OAuth Failures — The code-oss:// Scheme Problem Explained"},{"content":"Overview A one-day development log of introducing an Expert Agent Team architecture into a KIS OpenAPI-based AI trading system. Covers the four-expert AI + Chief Analyst discussion simulation, a pure-Python technical indicator calculator, and three KOSPI200 data source swaps that ended in hard lessons.\ngraph TD A[KOSPI200 constituents] --\u003e B[Volume/Change TOP50 intersection] B --\u003e C[5 to 25 candidate stocks] C --\u003e D[Daily candle data collection] D --\u003e E[Technical indicator calculation] E --\u003e F[4 Expert parallel analysis] F --\u003e G[Chief Analyst discussion] G --\u003e H[Trade signal generation] style F fill:#e8d44d,color:#333 style G fill:#e74c3c,color:#fffExpert Agent Team Architecture The previous MarketScanner analyzed stocks with a single Claude call. 
This was replaced by a discussion structure: four specialists analyze from their own perspectives, and a Chief Analyst synthesizes their views.\nThe Four Specialists Specialist Analysis Focus Technical Analyst MA alignment/divergence, RSI zones, MACD cross, Bollinger Bands Momentum Trader Volume surge ratio, Stochastic K/D, short-term breakout patterns Risk Assessor ATR-based volatility, RSI overbought, portfolio concentration Portfolio Strategist Cash allocation, sector diversification, opportunity cost The key is calling all four in parallel via asyncio.gather:\nasync def run_expert_panel(data_package: dict) -\u0026gt; list[dict]: experts = [ (\u0026#34;Technical Analyst\u0026#34;, \u0026#34;MA alignment/divergence, RSI, MACD ...\u0026#34;), (\u0026#34;Momentum Trader\u0026#34;, \u0026#34;Volume surge, Stochastic K/D ...\u0026#34;), (\u0026#34;Risk Assessor\u0026#34;, \u0026#34;ATR-based volatility, RSI overbought ...\u0026#34;), (\u0026#34;Portfolio Strategist\u0026#34;, \u0026#34;Cash allocation, sector concentration ...\u0026#34;), ] tasks = [_call_expert(persona, focus, data_package) for persona, focus in experts] return await asyncio.gather(*tasks, return_exceptions=True) Chief Analyst Discussion Simulation Once four opinions are in, the Chief Analyst reviews the bullish/bearish ratio and makes a final call. The prompt is designed to evaluate the reasoning behind minority views, not just count votes:\n# even 3 bullish vs 1 bearish can result in HOLD if the bearish reasoning is strong prompt = f\u0026#34;\u0026#34;\u0026#34; Expert opinion summary: {analyses_text} When the vote is not unanimous, pay special attention to the concerns raised by the minority opinion. 
\u0026#34;\u0026#34;\u0026#34; Pure Python Technical Indicator Calculator To eliminate external library dependencies (TA-Lib, pandas-ta), RSI, MACD, Stochastic, Bollinger Bands, and ATR were implemented directly.\ndef calculate_rsi(closes: list[float], period: int = 14) -\u0026gt; float | None: # Need at least period + 1 closes to fill one full averaging window if len(closes) \u0026lt;= period: return None gains, losses = [], [] for i in range(1, len(closes)): diff = closes[i] - closes[i - 1] gains.append(max(diff, 0)) losses.append(max(-diff, 0)) # Seed with a simple average over the first period avg_gain = sum(gains[:period]) / period avg_loss = sum(losses[:period]) / period # Wilder\u0026#39;s smoothing (exponential smoothing, not SMA) for i in range(period, len(gains)): avg_gain = (avg_gain * (period - 1) + gains[i]) / period avg_loss = (avg_loss * (period - 1) + losses[i]) / period rs = avg_gain / avg_loss if avg_loss != 0 else float(\u0026#39;inf\u0026#39;) return round(100 - (100 / (1 + rs)), 2) Wilder\u0026rsquo;s Smoothing is used because it weights recent values more heavily than a plain SMA, which treats every value in the window equally; this improves the timeliness of trading signals.\nThe KOSPI200 Data Source Saga Three data source swaps in a single day. Here\u0026rsquo;s each failure and how it was resolved.\ngraph LR A[\"KIS API\u0026lt;br/\u0026gt;inquire_index_components\"] --\u003e|failed| B[\"KIS API\u0026lt;br/\u0026gt;market_cap\"] B --\u003e|30-item limit| C[pykrx] C --\u003e|session cookie LOGOUT| D[\"NAVER Finance\u0026lt;br/\u0026gt;scraping\"] style A fill:#ff6b6b,color:#fff style B fill:#ff9f43,color:#fff style C fill:#ff6b6b,color:#fff style D fill:#2ecc71,color:#fffAttempt 1: KIS API inquire_index_components ❌ Not registered in domestic_stock.json → API call impossible KIS OpenAPI\u0026rsquo;s inquire_index_components exists in the documentation but was never registered in the actual SDK. A ghost API.\nAttempt 2: KIS API market_cap (fid_input_iscd=2001) ⚠️ Call succeeds but returns a maximum of 30 items Even with a KOSPI200 filter (2001), only the top 30 market-cap stocks are returned.
Not enough for screening all 200 constituents.\nAttempt 3: pykrx A popular Python library for pulling KRX official data. But:\n❌ KRX endpoint returns LOGOUT without a session cookie pykrx\u0026rsquo;s internal HTTP session sometimes fails to manage KRX server authentication cookies properly, causing the server to return only the text LOGOUT.\nFinal Solution: NAVER Finance Scraping The most stable source turned out to be NAVER Finance:\ndef _fetch_kospi200_via_naver() -\u0026gt; dict[str, str]: session = requests.Session() session.headers[\u0026#34;User-Agent\u0026#34;] = \u0026#34;Mozilla/5.0\u0026#34; session.get(\u0026#34;https://finance.naver.com/\u0026#34;) # acquire session cookie codes: dict[str, str] = {} for page in range(1, 25): # iterate 24 pages resp = session.get( \u0026#34;https://finance.naver.com/sise/entryJongmok.naver\u0026#34;, params={\u0026#34;indCode\u0026#34;: \u0026#34;KPI200\u0026#34;, \u0026#34;page\u0026#34;: str(page)}, ) pairs = re.findall( r\u0026#34;item/main\\.naver\\?code=(\\d{6})[^\u0026gt;]*\u0026gt;([^\u0026lt;]+)\u0026#34;, resp.text, ) if not pairs: break for code, name in pairs: codes[code] = name.strip() return codes # returns exactly 199 constituents Key points:\nsession.get(\u0026quot;https://finance.naver.com/\u0026quot;) must run first to acquire the session cookie indCode=KPI200 in entryJongmok.naver is the KOSPI200 filter Iterating 24 pages retrieves all 199 constituents Results are upserted into SQLite for a same-day cache with automatic next-day refresh Market Scanner Pipeline The final pipeline runs in four stages:\nStage Action Output 1 KOSPI200 × (Volume TOP50 + Change TOP50) intersection ~5 candidates 2 Collect daily candles + calculate technical indicators enriched data 3 4 Expert parallel Claude analysis each returns bullish/bearish/neutral 4 Chief Analyst discussion → final signal BUY/SELL/HOLD 10 commits in a day, 2,689 lines added — the entire architecture migrated from a single Claude call to an Expert Team 
discussion system.\nQuick Links sharebook-kr/pykrx — KRX stock data scraping library (not adopted due to session issues) NAVER Finance KOSPI200 — the final data source Insights The biggest lesson here is that financial data API reliability can only be verified by actually running it, not by reading the docs. KIS API had endpoints documented but missing from the SDK; pykrx had session management bugs that made it unsuitable for production.\nThe Expert Agent Team pattern is applicable to any AI system that needs to make decisions — not just stock analysis. The key is the Chief Analyst\u0026rsquo;s prompt design: evaluating the reasoning behind minority opinions, not just counting votes. Three bullish vs. one bearish can still result in HOLD if the bearish view is backed by ATR-based volatility data.\nPure Python technical indicator implementation fully eliminates the TA-Lib installation headache (C library dependency) while maintaining algorithmic accuracy like Wilder\u0026rsquo;s Smoothing. A valuable approach for projects with deployment environment constraints.\n","date":"2026-03-05T00:00:00+09:00","image":"/images/posts/2026-03-05-trading-agent-expert-team/cover-en.jpg","permalink":"/posts/2026-03-05-trading-agent-expert-team/","title":"Building a Stock Trading Agent #2 — Expert Agent Team and KOSPI200 Data Struggles"},{"content":"Overview Antigravity, Google\u0026rsquo;s agentic IDE built as a VS Code fork, has arrived. It\u0026rsquo;s emerging as the third major player in the AI IDE market, after Cursor and Windsurf. 
This post synthesizes YouTube demos, real-world developer reviews, Reddit community reactions, and the URL scheme compatibility issues it introduces.\ngraph TD A[VS Code original] --\u003e B[Cursor] A --\u003e C[Windsurf] A --\u003e D[Antigravity] B --\u003e E[\"'Cursor Tab' autocomplete focus\"] C --\u003e F[Codeium-based AI Flow] D --\u003e G[Agent control panel + large context] style D fill:#4285f4,color:#fffFirst Impressions — More Agent Control Panel Than IDE YouTube demo footage makes Antigravity\u0026rsquo;s key differentiator clear: it feels less like an IDE and more like an agent control panel.\nAccording to real-world usage notes from developer Jimmy Song:\nInterface structure: Splits into an agent management view and an editor view — feels like AgentHQ and VS Code merged into one Agent execution speed: Higher task completion rate per code modification compared to typical chat-based assistants Context window: Wide editor and context panels make it well-suited for analyzing long diffs and logs Extension marketplace: Defaults to OpenVSX Gallery, which doesn\u0026rsquo;t match the VS Code official Marketplace Using It Like VS Code — A Migration Guide The practical migration steps Jimmy Song shared apply directly to VS Code users making the switch.\nStep 1: Replace the Extension Marketplace In Settings → Antigravity Settings → Editor, replace the two URLs with the official VS Code ones:\nMarketplace Item URL: https://marketplace.visualstudio.com/items Marketplace Gallery URL: https://marketplace.visualstudio.com/_apis/public/gallery This single change gives you access to the entire VS Code extension ecosystem.\nStep 2: Installing External Extensions AMP: Supports free mode, strong for documentation and script execution. In Antigravity, only API key login is possible (no OAuth). CodeX: Direct VSIX download isn\u0026rsquo;t possible → install in VS Code first, export as .vsix → install in Antigravity via Install from VSIX. 
Step 3: Fixing TUN Mode Proxy Issues If you use a VPN or TUN mode, Antigravity\u0026rsquo;s Chrome DevTools Protocol debugging breaks. Fix it by adding localhost and 127.0.0.1 to Settings → HTTP: No Proxy.\nCommunity Reaction — Reddit\u0026rsquo;s Honest Assessment The title of the Antigravity review thread on Reddit r/ChatGPTCoding says it all: \u0026ldquo;I tried Google\u0026rsquo;s new Antigravity IDE so you don\u0026rsquo;t have to\u0026rdquo;\nThe community\u0026rsquo;s core criticisms:\nStability: \u0026ldquo;Agent terminated due to error\u0026rdquo; errors are frequent, requiring manual retries Model ecosystem: No native integration with external models from OpenAI, Anthropic, or xAI Customization: Cannot create custom prompts or agents like Copilot Chat — only rules settings available Pricing: No free model tier (estimated $20+/month), in contrast to GitHub Copilot\u0026rsquo;s free tier The URL Scheme War — vscode:// vs cursor:// vs antigravity:// VS Code forks create an interesting problem: which editor at the OS level handles a vscode:// URL click?\nFrom a discussion in the Cursor forum:\n\u0026ldquo;VS Code registers the vscode:// URI scheme to open files, trigger specific actions, etc. Does Cursor have its own unique scheme?\u0026rdquo;\nA practical solution using duti, a macOS tool, was shared for remapping URL schemes:\n# Find Cursor\u0026#39;s bundle ID osascript -e \u0026#39;id of application \u0026#34;Cursor\u0026#34;\u0026#39; # Remap vscode:// → Cursor duti -s com.todesktop.230313mzl4w4u92 vscode # Test it open \u0026#34;vscode://file/somefile.text:123\u0026#34; Antigravity\u0026rsquo;s arrival makes this problem more complex — three IDEs can now all claim vscode://. 
Handling custom URIs through VS Code API\u0026rsquo;s UriHandler interface has become an essential consideration for extension developers.\ngraph LR A[\"'vscode://' URL clicked\"] --\u003e B{OS URL Router} B --\u003e|default| C[VS Code] B --\u003e|duti remapped| D[Cursor] B --\u003e|?| E[Antigravity] F[Extension developer] --\u003e|implement UriHandler| G[handle scheme conflicts]Quick Links Google Antigravity YouTube Demo — 9-minute hands-on demo Using Antigravity Like VS Code (Jimmy Song) — practical migration guide (Chinese) URL Scheme Remapping with duti — macOS-only solution Cursor Forum: URL scheme discussion — community thread VS Code UriHandler API — reference for extension developers Insights The AI IDE war has evolved beyond \u0026ldquo;which AI writes better code\u0026rdquo; into a platform lock-in battle. The VS Code fork strategy lets each IDE borrow the existing extension ecosystem, but unexpected friction emerges — URL scheme conflicts, authentication compatibility, and marketplace policy. Antigravity\u0026rsquo;s agent control panel approach is a philosophical inversion of the usual formula: instead of \u0026ldquo;attach AI to a code editor,\u0026rdquo; it says \u0026ldquo;attach an editor to an AI agent environment.\u0026rdquo; This philosophical difference may ultimately determine the winner. For now, stability issues and model ecosystem limitations make production adoption difficult. The duti URL scheme remapping tip is immediately actionable, and extension developers should seriously consider multi-IDE compatibility via UriHandler going forward.\n","date":"2026-03-05T00:00:00+09:00","image":"/images/posts/2026-03-05-google-antigravity-ide/cover-en.jpg","permalink":"/posts/2026-03-05-google-antigravity-ide/","title":"Google Antigravity IDE — The New Contender in the AI IDE War"},{"content":"Overview When you first start using Claude Code, you type commands like you\u0026rsquo;re chatting. 
But spend a little time with it and you start to sense something more is going on. And there is — Claude Code isn\u0026rsquo;t just an AI chat window. It\u0026rsquo;s an agent framework built on three core layers: Skills, Subagents, and Commands. Without understanding these three concepts, you\u0026rsquo;re only using Claude Code at half capacity.\ngraph TD U[User Request] --\u003e C[\"Commands \u0026lt;br/\u0026gt;Slash command entry point\"] C --\u003e S[\"Skills \u0026lt;br/\u0026gt;Reusable workflow definitions\"] S --\u003e A[\"Subagents \u0026lt;br/\u0026gt;Independently executing agents\"] A --\u003e R[Results returned] S --\u003e RSkills — Handing the AI a Playbook What Skills Are Skills are reusable workflow definitions you inject into Claude Code. Each Skill is a single Markdown (.md) file that describes, in plain language, how Claude should behave in a given situation and in what order it should work.\nThe difference from regular prompts matters. A prompt must be rewritten every time. A Skill, once installed, auto-triggers when the right conditions are met. When you say \u0026ldquo;add a feature\u0026rdquo; and the AI automatically walks through brainstorming → planning → implementation → review on its own — that\u0026rsquo;s a Skill at work.\ngraph LR Normal[\"Regular Prompt \u0026lt;br/\u0026gt;Rewritten every time\"] --\u003e|repetitive work| Waste[\"Lost context \u0026lt;br/\u0026gt;Inconsistency\"] Skill[\"Skill File \u0026lt;br/\u0026gt;Defined once\"] --\u003e|auto-triggers| Consistent[\"Consistent workflow \u0026lt;br/\u0026gt;Reusable\"]Skill File Structure .claude/ └── skills/ └── my-skill/ └── SKILL.md A SKILL.md file contains a description (when this Skill should activate) and instructions (the procedure to follow). Example:\n--- name: code-review description: Automatically runs on PR code review requests --- ## Review Procedure 1. Check the list of changed files 2. Check for security vulnerabilities 3. Analyze performance issues 4. 
Write improvement suggestions The Skills Marketplace You can write Skills yourself, but a mature ecosystem of pre-built Skills already exists. The most prominent is obra/superpowers (⭐69k). Install it and the full engineering workflow — brainstorming, planning, TDD implementation, code review — runs automatically.\n# Add marketplace and install in Claude Code /plugin marketplace add obra/superpowers-marketplace /plugin install superpowers@superpowers-marketplace Subagents — AI Delegating to AI The Core Idea A Subagent is a structure where the main Claude Code session spawns a separate Claude instance and delegates a specific task to it. Think of a senior developer saying \u0026ldquo;you own this module\u0026rdquo; and handing off work to a teammate.\nThis means more than just splitting tasks. A Subagent has a completely independent context window, free from the main session\u0026rsquo;s accumulated context, prior failures, and tangled history. This dramatically reduces the likelihood of hallucinations.\ngraph TD Main[\"Main Agent \u0026lt;br/\u0026gt;Orchestrator\"] --\u003e|write crypto module| Sub1[\"Subagent 1 \u0026lt;br/\u0026gt;Clean context\"] Main --\u003e|write validation logic| Sub2[\"Subagent 2 \u0026lt;br/\u0026gt;Clean context\"] Main --\u003e|write tests| Sub3[\"Subagent 3 \u0026lt;br/\u0026gt;Clean context\"] Sub1 --\u003e|result only| Main Sub2 --\u003e|result only| Main Sub3 --\u003e|result only| MainHow to Create Subagents Use the Task tool inside Claude Code to spawn a Subagent. Specify it in a Skill file like this:\n## Subagent Execution Assign each module to an independent Subagent: - Auth module: run as separate agent via Task tool - DB layer: run as separate agent via Task tool Each Subagent reports results back to main only. 
Recommended Subagent Patterns Pattern Description Benefit Parallel module implementation Develop independent files/modules simultaneously 2–3x faster development Specialized review Different agents for security, performance, and style Thorough, unbiased review Context reset Re-examine complex bugs with fresh eyes Overcomes confirmation bias Long task isolation Experimental work without polluting the main session Safe exploration Subagent vs. Agent Teams: Subagents are one-directional — they only return results. Agent Teams (experimental feature) allows two-directional collaboration, where teammates message each other directly. Agent Teams is substantially more complex and expensive.\nCommands — Creating Entry Points with Slash Commands What Commands Are Commands are slash commands that users invoke directly in the format /command-name. Internally, they trigger a specific Skill or encapsulate a complex prompt into a single callable command.\n.claude/ └── commands/ └── review.md # defines the /review command └── deploy.md # defines the /deploy command Command File Structure # /review — Run PR Code Review ## What This Does 1. Analyze changes on the current branch 2. Review in order: security → performance → style 3. Compile improvement suggestions as Markdown Use $ARGUMENTS to accept additional options Built-in vs. Custom Commands Claude Code ships with built-in commands like /help, /clear, and /compact. Beyond those, any .md file you place in .claude/commands/ becomes a custom command. 
Installing a plugin like Superpowers adds commands like /brainstorm, /write-plan, and /execute-plan.\ngraph LR User[\"/review typed\"] --\u003e Cmd[\"Commands layer \u0026lt;br/\u0026gt;parse command\"] Cmd --\u003e Skill[\"Skills layer \u0026lt;br/\u0026gt;execute workflow\"] Skill --\u003e Sub[\"Spawn Subagents \u0026lt;br/\u0026gt;parallel execution\"] Sub --\u003e Out[Aggregate results]How the Three Layers Relate graph TD Commands[\"Commands \u0026lt;br/\u0026gt;'Entry point' \u0026lt;br/\u0026gt;User invokes\"] Skills[\"Skills \u0026lt;br/\u0026gt;'Workflow' \u0026lt;br/\u0026gt;Procedure AI follows\"] Subagents[\"Subagents \u0026lt;br/\u0026gt;'Executors' \u0026lt;br/\u0026gt;Independent instances\"] Commands --\u003e|trigger Skills| Skills Skills --\u003e|spawn Subagents| Subagents Skills --\u003e|or execute directly| Result[Result] Subagents --\u003e|return results| ResultThe three layers connect like this:\nCommands: The user-facing entry point. When you type /review, the Commands layer determines which Skill to run. Skills: The AI\u0026rsquo;s operating manual. Defines what order to work in and what principles to follow. Subagents: The actual execution units. Independent agents spawned when a Skill needs to delegate complex work. Quick Links obra/superpowers GitHub — ⭐69k, the definitive Claude Code Skills collection Claude Code Official Skills Docs — Skill file format reference Claude Code 3 Core Concepts Video — 25-minute hands-on tutorial Insights Skills, Subagents, and Commands aren\u0026rsquo;t just a feature list — they\u0026rsquo;re the architecture that elevates Claude Code from a tool into a system. The difference between repeatedly typing \u0026ldquo;do this for me\u0026rdquo; and defining a Skill once for it to run automatically is a difference in a different class of development productivity. The Subagent\u0026rsquo;s \u0026ldquo;clean context\u0026rdquo; concept is an elegant structural solution to the hallucination problem. 
An agent that always starts fresh on a task can\u0026rsquo;t get trapped by prior failures. Commands are the UX layer that gives this complex system a simple entry point — the fact that you can trigger an entire pipeline with a single word like /deploy is itself a statement about the system\u0026rsquo;s maturity.\n","date":"2026-03-04T00:00:00+09:00","image":"/images/posts/2026-03-04-claude-code-skills-subagents-commands/cover-en.jpg","permalink":"/posts/2026-03-04-claude-code-skills-subagents-commands/","title":"Claude Code's Three Core Concepts — Skills, Subagents, and Commands"},{"content":"Overview On February 26, 2026, Google rewrote the history of image generation models. Nano Banana 2 (gemini-3.1-flash-image-preview) — a new standard that combines Pro-level intelligence with Flash-class speed. If the original Nano Banana was a viral sensation and Nano Banana Pro delivered studio-grade quality, Nano Banana 2 distills the best of both and opens it to everyone.\ngraph LR NB1[\"Nano Banana \u0026lt;br/\u0026gt;Aug 2025 \u0026lt;br/\u0026gt;'viral sensation'\"] --\u003e NBP[\"Nano Banana Pro \u0026lt;br/\u0026gt;Nov 2025 \u0026lt;br/\u0026gt;'studio quality'\"] NBP --\u003e NB2[\"Nano Banana 2 \u0026lt;br/\u0026gt;Feb 26, 2026 \u0026lt;br/\u0026gt;'Pro quality + Flash speed'\"] NB1 --\u003e NB2 style NB2 fill:#4285F4,color:#fffWhat Nano Banana 2 Changes Pro Features, Now for Everyone Capabilities previously exclusive to Nano Banana Pro are now available to all users in Nano Banana 2:\nReal-world knowledge-grounded generation — Using Gemini\u0026rsquo;s live web search, it accurately renders specific people, places, and products. Infographics, diagrams, and data visualizations are noticeably more precise.\nPrecise text rendering — Generates sharp, accurate text inside images. 
Supports marketing mockups, greeting cards, multilingual translation, and localization.\nNew Core Capabilities Subject consistency — Maintains consistent appearance for up to 5 characters and 14 objects within a single workflow. Enables storyboarding and sequential image series.\nPrecise instruction following — Captures the specific nuances of complex prompts. \u0026ldquo;Getting the image you wanted\u0026rdquo; is far more consistent than before.\nProduction-ready specs — Resolutions from 512px to 4K, with support for extreme aspect ratios including 4:1, 1:4, 8:1, and 1:8. Covers everything from vertical social posts to widescreen backgrounds.\ngraph TD NB2[Nano Banana 2] --\u003e WK[\"Real-world knowledge \u0026lt;br/\u0026gt;web search integration\"] NB2 --\u003e TR[\"Text rendering \u0026lt;br/\u0026gt;multilingual support\"] NB2 --\u003e SC[\"Subject consistency \u0026lt;br/\u0026gt;up to 5 people + 14 objects\"] NB2 --\u003e IF[Precise instruction following] NB2 --\u003e PS[\"Production specs \u0026lt;br/\u0026gt;512px to 4K\"] NB2 --\u003e VF[\"Visual fidelity \u0026lt;br/\u0026gt;vivid lighting and textures\"]Three API Access Methods Prerequisite: A Paid API Key Is Required This is where many developers get stuck initially. Image generation is not available on the free tier. 
If you see this error, you don\u0026rsquo;t have a paid key:\nQuota exceeded for metric: generativelanguage.googleapis.com/ generate_content_free_tier_input_token_count, limit: 0 Method 1: Google AI Studio (No-Code Testing) Go to AI Studio Select gemini-3.1-flash-image-preview from the model dropdown Enter a prompt and run Ideal for experimenting with prompts before writing production code.\nMethod 2: Direct Gemini API Call Python:\nimport google.generativeai as genai import base64 genai.configure(api_key=\u0026#34;YOUR_PAID_API_KEY\u0026#34;) model = genai.GenerativeModel(\u0026#34;gemini-3.1-flash-image-preview\u0026#34;) response = model.generate_content( \u0026#34;A photorealistic golden retriever puppy in a sunlit meadow, \u0026#34; \u0026#34;soft bokeh background, warm afternoon light\u0026#34;, generation_config=genai.GenerationConfig( response_modalities=[\u0026#34;image\u0026#34;, \u0026#34;text\u0026#34;], ), ) for part in response.parts: if part.inline_data: image_data = base64.b64decode(part.inline_data.data) with open(\u0026#34;output.png\u0026#34;, \u0026#34;wb\u0026#34;) as f: f.write(image_data) Node.js:\nconst { GoogleGenerativeAI } = require(\u0026#34;@google/generative-ai\u0026#34;); const fs = require(\u0026#34;fs\u0026#34;); const genAI = new GoogleGenerativeAI(\u0026#34;YOUR_PAID_API_KEY\u0026#34;); async function generateImage() { const model = genAI.getGenerativeModel({ model: \u0026#34;gemini-3.1-flash-image-preview\u0026#34;, }); const result = await model.generateContent({ contents: [{ role: \u0026#34;user\u0026#34;, parts: [{ text: \u0026#34;a photorealistic cat\u0026#34; }] }], generationConfig: { responseModalities: [\u0026#34;image\u0026#34;, \u0026#34;text\u0026#34;] }, }); const imageData = result.response.candidates[0].content.parts[0].inlineData; fs.writeFileSync(\u0026#34;output.png\u0026#34;, Buffer.from(imageData.data, \u0026#34;base64\u0026#34;)); } generateImage(); Method 3: OpenAI-Compatible Gateway For projects already 
using the OpenAI SDK, a gateway lets you switch with minimal code changes:\nfrom openai import OpenAI client = OpenAI( api_key=\u0026#34;YOUR_GATEWAY_KEY\u0026#34;, base_url=\u0026#34;https://gateway.example.com/v1\u0026#34;, ) response = client.images.generate( model=\u0026#34;gemini-3.1-flash-image-preview\u0026#34;, prompt=\u0026#34;A minimalist workspace with a MacBook and plant\u0026#34;, n=1, ) Pricing Resolution Google Official Third-Party Gateway 2K image $0.101/image ~$0.081/image (~20% cheaper) 4K image $0.150/image ~$0.120/image If you\u0026rsquo;re generating at production volumes, gateway options offer meaningful cost savings.\nNano Banana 2 vs. Nano Banana Pro Nano Banana 2 Nano Banana Pro Model ID gemini-3.1-flash-image-preview gemini-3-pro-image-preview Speed Flash (fast) Pro (slower) Quality High (near Pro) Maximum quality Best for Rapid iteration, high-volume generation Professional work requiring maximum fidelity Default in Gemini app Yes (current default) Selectable via three-dot menu Launch Platforms Nano Banana 2 launched simultaneously across Google\u0026rsquo;s entire ecosystem:\nGemini app: Default model in Fast, Thinking, and Pro modes Google Search: AI Mode, Lens, mobile/desktop browser (141 countries) AI Studio + Gemini API: Available in preview Google Cloud (Vertex AI): Preview Flow: Default image generation model (no credit consumption) Google Ads: Integrated into campaign creation suggestions Prompt Engineering Tips Be specific — \u0026ldquo;golden retriever puppy in a sunlit meadow, soft bokeh, warm afternoon light\u0026rdquo; far outperforms just \u0026ldquo;puppy.\u0026rdquo;\nUse style keywords — Combining terms like photorealistic, cinematic lighting, studio quality, minimalist, watercolor steers the aesthetic direction.\nSet thinking level — For complex compositions, specifying Thinking: High or Thinking: Dynamic produces more refined results.\nMulti-turn editing — Don\u0026rsquo;t expect perfection in a single request. 
Iterative refinements like \u0026ldquo;make the background darker\u0026rdquo; or \u0026ldquo;change the character\u0026rsquo;s outfit to blue\u0026rdquo; are the path to the best final result.\nProvenance Technology: SynthID + C2PA Two technologies mark AI-generated content:\nSynthID: Embeds an invisible watermark into the image. Machine-verifiable proof of AI generation. C2PA Content Credentials: Includes generation metadata in the image file. Enables provenance tracking. This is Google\u0026rsquo;s technical response to questions about trust in generative AI media.\nQuick Links Nano Banana 2 Official Announcement (blog.google) — full feature details and prompt examples Nano Banana 2 API Tutorial (evolink.ai) — Python/Node.js code samples and pricing guide Google AI Studio — test immediately, no code needed Gemini API Pricing — latest image generation rates Insights Nano Banana 2 represents something more fundamental than \u0026ldquo;better image generation.\u0026rdquo; By combining Pro-grade capabilities with Flash speed, it changes the economics of image generation entirely. The trade-off that previously forced you to choose between quality and speed disappears. Subject consistency (up to 5 characters + 14 objects) and real-world knowledge integration directly target production workflows in marketing, content creation, and game asset pipelines. Knowledge-grounded image generation points toward a future where AI doesn\u0026rsquo;t just generate patterns but understands and visualizes the world. The built-in SynthID and C2PA provenance technology is also notable — baking in verifiable attribution from day one signals how seriously Google expects this technology to be used in production environments.\n","date":"2026-03-04T00:00:00+09:00","image":"/images/posts/2026-03-04-nano-banana-2/cover-en.jpg","permalink":"/posts/2026-03-04-nano-banana-2/","title":"Nano Banana 2 Deep Dive — Google's Latest Image Generation Model"},{"content":"Overview Claude Code is powerful. 
And yet the output can feel unsatisfying: the code \u0026ldquo;technically works\u0026rdquo; but has no tests and a shaky structure, and the AI can\u0026rsquo;t remember what you built yesterday. Superpowers is a Skills framework that solves this problem structurally. With ⭐69k on GitHub, it\u0026rsquo;s one of the most popular installable plugins for Claude Code.\nThis isn\u0026rsquo;t a collection of clever prompts. It\u0026rsquo;s a system that forces engineering discipline — think first, design, test, then implement — onto AI behavior.\ngraph TD Request[\"User Request \u0026lt;br/\u0026gt;'Build this for me'\"] --\u003e Brain[\"brainstorming \u0026lt;br/\u0026gt;clarify requirements\"] Brain --\u003e Plan[\"writing-plans \u0026lt;br/\u0026gt;create implementation plan\"] Plan --\u003e Exec[\"subagent-driven-development \u0026lt;br/\u0026gt;parallel implementation\"] Exec --\u003e Review[\"requesting-code-review \u0026lt;br/\u0026gt;quality verification\"] Review --\u003e Done[Finished codebase] style Brain fill:#4A90D9,color:#fff style Plan fill:#4A90D9,color:#fff style Exec fill:#4A90D9,color:#fff style Review fill:#4A90D9,color:#fff\nWhat Is Superpowers Superpowers is an open-source Skills framework created by Jesse Vincent (@obra). It supports Claude Code, Cursor, Codex, and OpenCode.\nThe core idea is simple: when the AI receives a coding request, stop it from writing code immediately. Force it through brainstorming → planning → TDD implementation → review in that order. It\u0026rsquo;s teaching AI the old software engineering truth: the more you\u0026rsquo;re in a hurry, the more you should slow down.\nInstallation # Register the marketplace in Claude Code /plugin marketplace add obra/superpowers-marketplace # Install the plugin /plugin install superpowers@superpowers-marketplace Start a new session after installation. 
If /sup shows options like brainstorm, write-plan, and execute-plan, the install succeeded.\nThe 7 Core Skills Superpowers covers the entire software development lifecycle with 7 core Skills.\ngraph LR S1[brainstorming] --\u003e S2[writing-plans] S2 --\u003e S3[using-git-worktrees] S3 --\u003e S4[subagent-driven-development] S4 --\u003e S5[requesting-code-review] S5 --\u003e S6[receiving-code-review] S6 --\u003e S7[finishing-a-development-branch]\n1. brainstorming — The Art of Stopping Before You Code When it receives a request, the AI doesn\u0026rsquo;t write code — it asks questions. \u0026ldquo;What are the use scenarios?\u0026rdquo;, \u0026ldquo;What\u0026rsquo;s the deployment environment?\u0026rdquo;, \u0026ldquo;What are the performance requirements?\u0026rdquo; Like a veteran architect briefly pausing a junior developer\u0026rsquo;s coding sprint.\nAt the end of this process, the AI produces a requirements document. Once the user approves it, the workflow advances.\nPsychological background: The Superpowers creator studied psychology. The framework applies the cognitive psychology principle that \u0026ldquo;declaring goals first changes behavior\u0026rdquo; to AI workflows.\n2. writing-plans — Plans a Junior Developer Can Follow After brainstorming, an implementation plan is drafted. The bar for this plan is intentionally specific: \u0026ldquo;Clear enough that an enthusiastic junior developer with no judgment and no context — who hates writing tests — can still follow it.\u0026rdquo;\nThe plan decomposes into atomic tasks. Each task can be executed independently and its completion is unambiguous.\n├── Task 1: Create validators/ module structure (files only) ├── Task 2: Email format validation logic + tests ├── Task 3: DNS MX record validation logic + tests └── Task 4: Integrate middleware layer 3. 
using-git-worktrees — Isolated Work Environments Each development task runs in a Git Worktree — an independent copy of the filesystem that doesn\u0026rsquo;t touch the main branch. If an experiment fails, the main codebase is safe.\n# Worktree creation Superpowers runs automatically git worktree add .claude/worktrees/feature-auth feature/auth 4. subagent-driven-development — Parallel Development with an AI Team Each task from the plan is assigned to an independent Subagent. Each Subagent:\nStarts with a clean context (no memory of prior failures) Focuses on exactly one task Reports only the result back to main graph TD Lead[\"Main Agent \u0026lt;br/\u0026gt;PM role\"] --\u003e|Task 1| S1[\"Subagent 1 \u0026lt;br/\u0026gt;email validation\"] Lead --\u003e|Task 2| S2[\"Subagent 2 \u0026lt;br/\u0026gt;DNS validation\"] Lead --\u003e|Task 3| S3[\"Subagent 3 \u0026lt;br/\u0026gt;middleware integration\"] S1 --\u003e|result + tests| Lead S2 --\u003e|result + tests| Lead S3 --\u003e|result + tests| Lead Lead --\u003e Merge[Integrate and verify] \u0026ldquo;Whoever thought of this architecture is a genius.\u0026rdquo; — developer blog after hands-on experience\n5. requesting-code-review / 6. receiving-code-review — Verification Before Completion Once implementation is done, a code review is automatically requested. The receiving-code-review Skill prevents the AI from blindly agreeing with all feedback — it validates technical soundness before accepting any suggestion.\n7. finishing-a-development-branch — Safe Merge After development, the Skill presents a merge strategy. It guides you systematically through PR creation, branch cleanup, release notes, and other wrap-up steps.\nLive Demo: Building an Email Validation Service Here\u0026rsquo;s the actual flow when you type the following into Claude Code with Superpowers installed:\nBuild an enterprise-grade email validation service in Python. Support RFC standards (including sub-addressing), IDN, and DNS MX record checking. 
Step 1: brainstorm auto-activates\nInstead of code, the AI asks:\n\u0026ldquo;Is this single-email validation or batch processing?\u0026rdquo; \u0026ldquo;What level of DNS validation? (basic/deep)\u0026rdquo; \u0026ldquo;Do you need a caching strategy?\u0026rdquo; Step 2: Project structure proposed\nemail_validator/ ├── validators/ # validation logic ├── middleware/ # rate limiting ├── cache/ # result caching └── tests/ # fail-first tests Step 3: TDD implementation\nFailing tests are written first, then code is written to make them pass. This cuts off at the root the \u0026ldquo;it runs but has no tests\u0026rdquo; spaghetti code that AI typically produces.\nEngineering Principles Superpowers Enforces Principle Meaning Superpowers Implementation TDD Tests first, implementation second Explicit in subagent-driven-development Skill YAGNI You Aren\u0026rsquo;t Gonna Need It — build only what\u0026rsquo;s needed now Scope limiting in writing-plans DRY Don\u0026rsquo;t Repeat Yourself Duplicate detection in the review stage Clean context Fresh start uncorrupted by prior failures Guaranteed by Subagent architecture Comparison with mega-code wisdomgraph/mega-code (⭐15), which emerged around the same time, is worth noting. Where Superpowers focuses on \u0026ldquo;enforcing engineering workflow,\u0026rdquo; mega-code focuses on \u0026ldquo;accumulating knowledge across sessions.\u0026rdquo;\ngraph LR SP[\"Superpowers \u0026lt;br/\u0026gt;workflow discipline\"] --\u003e|install| Claude[Claude Code] MC[\"mega-code \u0026lt;br/\u0026gt;knowledge evolution\"] --\u003e|install| Claude SP -.-\u003e|complements| MC MC -.-\u003e|complements| SP Superpowers: Improves the quality of each session\u0026rsquo;s development. Skills auto-trigger. mega-code: Remembers mistakes across sessions and improves incrementally. BYOK (bring your own API key) model. 
Using both together lets you capture both per-session quality and cross-session learning.\nQuick Links obra/superpowers GitHub — ⭐69k, source code and installation docs Claude Code × Superpowers hands-on (velog) — live email validation service demo Superpowers Complete Guide (YouTube) — 30-minute live demo of all 7 Skills wisdomgraph/mega-code GitHub — self-evolving AI coding infrastructure Insights The central insight Superpowers reveals is this: the problem with AI isn\u0026rsquo;t a lack of capability — it\u0026rsquo;s a lack of discipline. Claude Code is already more than smart enough. The problem is its instinct to start generating code the moment it receives \u0026ldquo;build this for me.\u0026rdquo; Just as a veteran developer responds to a new requirement by asking questions, designing, and sketching test scenarios before touching the keyboard, Superpowers forces AI to do the same. The subagent-driven-development pattern\u0026rsquo;s design — where each Subagent starts with clean context — is a structural solution to the hallucination problem. Subagent isolation prevents prior failures in a long conversation from contaminating future responses. Sixty-nine thousand stars say this approach has been validated by a lot of developers.\n","date":"2026-03-04T00:00:00+09:00","image":"/images/posts/2026-03-04-claude-code-superpowers/cover-en.jpg","permalink":"/posts/2026-03-04-claude-code-superpowers/","title":"Superpowers: Injecting Engineering Discipline into Claude Code"},{"content":"Overview Claude Code\u0026rsquo;s Agent Teams is an experimental feature that groups multiple Claude Code instances into a single team for parallel work. Where a traditional Subagent simply returns results to the main session, Agent Teams members can message each other directly and autonomously coordinate through a shared task list. 
This post covers the Agent Teams architecture, how it differs from Subagents, and practical usage patterns.\ngraph TD Lead[Team Lead] --\u003e|spawns| T1[Teammate 1] Lead --\u003e|spawns| T2[Teammate 2] Lead --\u003e|spawns| T3[Teammate 3] T1 --- T2 T2 --- T3 T1 --- T3 T1 --\u003e TL[Shared Task List] T2 --\u003e TL T3 --\u003e TL T1 --\u003e MB[Mailbox] T2 --\u003e MB T3 --\u003e MB\nAgent Teams vs. Subagents — Key Differences Both Agent Teams and Subagents parallelize work, but their operating models are fundamentally different.\nSubagents are lightweight helpers that run inside the main session. They perform a task, report the result back, and that\u0026rsquo;s it. Subagents cannot talk to each other or share discoveries mid-task — the main agent is the sole coordinator.\nAgent Teams consists of fully independent Claude Code instances. Each teammate has its own context window and autonomously claims tasks from a shared task list. The key feature is direct peer-to-peer communication — teammates can message each other or broadcast to the whole team.\nSubagent Agent Teams Context Independent context, returns results only Independent context, fully autonomous Communication Reports to main agent only Direct messaging between teammates Coordination Main agent manages everything Shared task list + autonomous coordination Best for Simple tasks where only the result matters Complex tasks requiring discussion and collaboration Token cost Low (only summarized results returned) High (each teammate is a separate instance) graph LR MA[Main Agent] --\u003e|directs| SA1[Subagent 1] MA --\u003e|directs| SA2[Subagent 2] SA1 --\u003e|result| MA SA2 --\u003e|result| MA\nThe Agent Teams model changes this structure:\ngraph LR TL[Team Lead] --\u003e|coordinates| AT1[Teammate 1] TL --\u003e|coordinates| AT2[Teammate 2] AT1 --- AT2 AT1 --\u003e TaskList[Task List] AT2 --\u003e TaskList\nSetup and Activation Agent Teams is disabled by default. 
Enable it by setting an environment variable in settings.json:\n{ \u0026#34;env\u0026#34;: { \u0026#34;CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS\u0026#34;: \u0026#34;1\u0026#34; } } Once enabled, request a team in natural language:\nCreate an agent team with 3 teammates — one focused on UX, one on technical architecture, and one as a devil\u0026#39;s advocate. Display Modes In-process: All teammates run in the main terminal. Switch between them with Shift+Down. No extra setup needed. Split panes: Each teammate gets its own panel in tmux or iTerm2. View all work simultaneously. Set the mode in settings.json:\n{ \u0026#34;teammateMode\u0026#34;: \u0026#34;tmux\u0026#34; } Practical Usage Patterns 1. Parallel Code Review A single reviewer naturally focuses on one type of issue at a time. Splitting review perspectives into independent domains lets you cover security, performance, and test coverage simultaneously and thoroughly:\nCreate an agent team to review PR #142. 3 reviewers: - Security vulnerability specialist - Performance impact analysis - Test coverage verification Have each review independently and report back. 2. Competing Hypothesis Debugging When the cause of a bug is unclear, a single agent tends to stop once it finds one explanation. Running Agent Teams with different hypotheses and encouraging teammates to challenge each other\u0026rsquo;s theories means the surviving hypothesis is far more likely to be the real cause:\nInvestigate why the app exits after a single message. Spawn 5 teammates, each exploring a different hypothesis, and have them debate like scientists — actively try to disprove each other. 3. Cross-Layer Feature Development For work that requires simultaneous changes across frontend, backend, and tests, assign each layer to a separate teammate. Clearly separate the file sets each teammate owns to avoid conflicts.\nCombining with Git Worktrees Agent Teams members share the same filesystem by default. 
Editing different files is fine, but editing the same file simultaneously causes conflicts. Combining with Git Worktrees gives each teammate an independent copy of the filesystem:\ngraph TD A[Teammate A] --\u003e|edits| FS[Shared Filesystem] B[Teammate B] --\u003e|edits| FS FS --\u003e CONFLICT[File Conflict] C[Teammate A] --\u003e|edits| WT1[Worktree A] D[Teammate B] --\u003e|edits| WT2[Worktree B] WT1 --\u003e|merge| MAIN[Main Branch] WT2 --\u003e|merge| MAIN\nSet isolation: worktree in the agent definition to create a separate worktree for each teammate.\nCost and Operational Tips Agent Teams consumes tokens proportionally to the number of teammates. Three teammates use roughly 3–4x the tokens of a single session. Running in Plan mode can push this up to 7x.\nStrategies for maximizing value while managing cost:\nAssign Sonnet to teammates: Good balance of cost and capability. Reserve Opus for the lead. Start with 3–5 teammates: Optimal for most workflows. Aim for 5–6 tasks per teammate. Disband immediately after completion: Idle teammates still consume tokens. Use Clean up the team when done. Include sufficient context in spawn prompts: Teammates do not inherit the lead\u0026rsquo;s conversation history, so include all necessary context in their prompts. Quick Links Claude Code Agent Teams Official Docs — setup, commands, and limitations Claude Code Agent Teams Complete Guide (claudefa.st) — comprehensive 2026 guide Worktree + Agent Teams Guide — filesystem isolation strategies Insights Agent Teams adds a new dimension beyond simple parallel execution: communication and autonomous coordination between agents. If Subagents represent a hierarchical \u0026ldquo;assign work, receive results\u0026rdquo; model, Agent Teams is closer to a collaborative model where peers discuss and solve problems together. The competing hypothesis debugging pattern is especially effective at overcoming the confirmation bias that plagues single-agent exploration. 
The feature is still experimental — sessions can\u0026rsquo;t be resumed, among other limitations — but for tasks requiring parallel exploration across a complex codebase, it delivers real value. Combined with Worktrees, it enables parallel development without teammates overwriting each other\u0026rsquo;s working files, making it particularly useful for large-scale refactoring or multi-layer feature implementation.\n","date":"2026-03-03T00:00:00+09:00","image":"/images/posts/2026-03-03-claude-code-agent-teams/cover-en.jpg","permalink":"/posts/2026-03-03-claude-code-agent-teams/","title":"Claude Code Agent Teams — A New Paradigm for Multi-Agent Collaboration"},{"content":"Overview Claude Code costs the average developer around $6 per day, or $100–200 per month. But that number varies dramatically based on how you use it. Through context management, model selection, CLAUDE.md optimization, and monitoring with the Usage \u0026amp; Cost API, you can cut token consumption by 50–80%. This post breaks down Claude Code\u0026rsquo;s cost structure and the practical techniques you can apply today.\ngraph TD A[Token Cost Reduction] --\u003e B[Context Management] A --\u003e C[Model Selection] A --\u003e D[CLAUDE.md Optimization] A --\u003e E[Monitoring] B --\u003e B1[clear command] B --\u003e B2[compact command] B --\u003e B3[Auto-compaction] C --\u003e C1[Sonnet] C --\u003e C2[Opus] C --\u003e C3[Haiku] D --\u003e D1[Keep under 500 lines] D --\u003e D2[Split into Skills] E --\u003e E1[cost command] E --\u003e E2[Usage API] E --\u003e E3[Cost API]\nUnderstanding Where the Costs Come From Claude Code\u0026rsquo;s token costs scale proportionally to context size. The larger the context Claude processes, the higher the cost per message. 
Longer conversations, more referenced files, and more MCP servers all increase context size.\nClaude Code automatically applies two optimizations:\nPrompt Caching: Automatically reduces costs for repeated content like system prompts Auto-compaction: Automatically summarizes conversation history as you approach the context limit But these alone are not enough. Real savings require active management from the user\u0026rsquo;s side.\nStrategy 1: Aggressively Manage Context The biggest source of token waste is accumulating unnecessary context.\n/clear — Essential When Switching Tasks Always run /clear when moving to an unrelated task. Old context from previous conversations wastes tokens on every subsequent message.\n/rename auth-refactoring # name the current session /clear # reset context # start new task You can return to a named session later with /resume.\n/compact — Every 10–15 Exchanges When conversations grow long, use /compact to compress history. You can specify what to preserve:\n/compact Focus on code samples and API usage You can also customize compaction behavior in CLAUDE.md:\n# Compact instructions When you are using compact, please focus on test output and code changes /cost — Real-Time Cost Monitoring Use /cost to check token usage for the current session. For a persistent display, configure the statusline to show it continuously.\nStrategy 2: Match Models to Tasks Not every task needs Opus.\nModel Best For Cost Opus Complex architecture decisions, multi-step reasoning High Sonnet General coding tasks (most of the time) Medium Haiku File exploration, running tests, simple questions Low (~80% cheaper) Switch mid-session with /model, and set defaults in /config. 
Assign model: haiku to Subagents for simple tasks to save money.\ngraph LR Task[Task Type] --\u003e|complex design| Opus[Opus 4.6] Task --\u003e|general coding| Sonnet[Sonnet 4.6] Task --\u003e|exploration| Haiku[Haiku] Opus --\u003e|switch| Sonnet Sonnet --\u003e|switch| Haiku\nTuning Extended Thinking Extended Thinking is enabled by default with a 31,999-token budget. Thinking tokens are billed as output tokens, so they\u0026rsquo;re an unnecessary cost for simple tasks:\nLower the effort level for Opus 4.6 via /model Disable thinking in /config Cap the budget with MAX_THINKING_TOKENS=8000 Strategy 3: Keep CLAUDE.md Lean CLAUDE.md is loaded into context in full at the start of every session. If it contains workflow instructions for things like PR reviews or database migrations, those tokens are charged on every turn — even when you\u0026rsquo;re working on something completely unrelated.\nSplit Into Skills Move specialized instructions from CLAUDE.md into Skills, which only load when invoked:\nCLAUDE.md (essentials only, ~500 lines max) ├── Project architecture summary ├── Core coding conventions └── Frequently used commands .claude/skills/ (loaded only when needed) ├── pr-review/ # PR review workflow ├── db-migration/ # DB migration guide └── deploy/ # Deployment process Reduce MCP Server Overhead Each MCP server adds tool definitions to your context even when idle. Check current context occupancy with /context, then:\nDisable unused MCP servers via /mcp Prefer CLI tools like gh and aws over MCP servers (zero context overhead) Lower the tool search threshold with ENABLE_TOOL_SEARCH=auto:5 Strategy 4: Optimize Your Work Patterns Use Plan Mode Enter Plan Mode with Shift+Tab to have Claude explore the codebase and suggest an approach before touching code. This avoids expensive rework when the initial direction is wrong.\nCorrect Course Early If Claude heads in the wrong direction, hit Escape immediately. 
Use /rewind or press Escape twice to restore a previous checkpoint.\nDelegate Heavy Tasks to Subagents For high-output tasks like running tests, fetching documentation, or processing log files, delegate to a Subagent. The verbose output stays in the Subagent\u0026rsquo;s context; only a summary comes back to the main conversation.\nStrategy 5: Use the Usage \u0026amp; Cost API for Team-Level Monitoring For tracking costs across an entire team rather than just individually, use Anthropic\u0026rsquo;s Admin API.\nUsage API — Track Token Consumption Query daily token usage broken down by model:\ncurl \u0026#34;https://api.anthropic.com/v1/organizations/usage_report/messages?\\ starting_at=2026-03-01T00:00:00Z\u0026amp;\\ ending_at=2026-03-03T00:00:00Z\u0026amp;\\ group_by[]=model\u0026amp;\\ bucket_width=1d\u0026#34; \\ --header \u0026#34;anthropic-version: 2023-06-01\u0026#34; \\ --header \u0026#34;x-api-key: $ADMIN_API_KEY\u0026#34; Key capabilities:\nTime-series aggregation at 1-minute, 1-hour, and 1-day intervals Filter by model, workspace, API key, and service tier Track uncached input, cached input, cache creation, and output tokens Data residency (inference region) and Fast Mode tracking Cost API — Track USD Spend Query costs broken down by workspace:\ncurl \u0026#34;https://api.anthropic.com/v1/organizations/cost_report?\\ starting_at=2026-03-01T00:00:00Z\u0026amp;\\ ending_at=2026-03-03T00:00:00Z\u0026amp;\\ group_by[]=workspace_id\u0026amp;\\ group_by[]=description\u0026#34; \\ --header \u0026#34;anthropic-version: 2023-06-01\u0026#34; \\ --header \u0026#34;x-api-key: $ADMIN_API_KEY\u0026#34; graph TD AdminKey[Admin API Key] --\u003e UsageAPI[Usage API] AdminKey --\u003e CostAPI[Cost API] UsageAPI --\u003e Dashboard[Monitoring Dashboard] CostAPI --\u003e Dashboard Dashboard --\u003e Alert[Budget Alerts] Dashboard --\u003e Report[Per-Team Cost Reports] Dashboard --\u003e Optimize[Cache Efficiency Analysis]\nPartner Solutions If you\u0026rsquo;d rather not build 
your own dashboard, platforms like Datadog, Grafana Cloud, and CloudZero offer ready-made integrations. For per-user cost analysis of Claude Code, the Claude Code Analytics API provides a separate endpoint.\nQuick Links Claude Code Cost Management Official Docs — official guide Usage and Cost API Docs — Admin API reference Claude Code Token Optimization (GitHub) — community tips Insights The core insight of Claude Code token optimization comes down to one simple principle: keep the context small. Separating tasks with /clear, distributing CLAUDE.md content into Skills, and choosing models appropriate to the task\u0026rsquo;s complexity — these three practices alone eliminate most token waste. At the team level, the Usage \u0026amp; Cost API makes consumption patterns visible, enabling you to measure caching efficiency and set budget alerts. The Data Residency and Fast Mode tracking features added in February 2026 are especially useful for compliance and performance monitoring in enterprise environments. Ultimately, good habits — clearing context between tasks, compacting every 10–15 exchanges, writing specific prompts instead of vague ones — are more effective than any configuration setting.\n","date":"2026-03-03T00:00:00+09:00","image":"/images/posts/2026-03-03-claude-code-token-optimization/cover-en.jpg","permalink":"/posts/2026-03-03-claude-code-token-optimization/","title":"Claude Code Token Optimization — Practical Strategies to Cut Costs by 80%"},{"content":"Overview Kintsugi is an Agentic Development Environment (ADE) being developed experimentally by SonarSource for CLI agent users. Rather than replacing an IDE, it takes a different approach: visually augmenting CLI agents like Claude Code.\ngraph LR A[CLI Agent \u0026lt;br/\u0026gt;Claude Code] --\u003e|generates code| B[Kintsugi ADE] B --\u003e C{Sonar guardrails} C --\u003e|passes| D[Review approved] C --\u003e|issues found| E[Change requested] E --\u003e A\nWhat Is Kintsugi? 
Kintsugi is a fundamentally different concept from traditional IDEs. It defines itself as an Agentic Development Environment (ADE) — instead of writing code directly, it focuses on orchestrating and reviewing code generated by AI agents. Currently it supports only Claude Code, with Gemini CLI and Codex support planned.\nThree core features:\nMulti-threaded development — Manage multiple AI sessions in parallel with a visual queue tracking each task\u0026rsquo;s status. Solves the problem of losing context when running multiple claude commands across different terminals. Plan review and change requests — Visually inspect an agent\u0026rsquo;s proposed implementation plan and redirect it before any code is written. Sonar-powered guardrails — Integrates SonarQube/SonarCloud\u0026rsquo;s static analysis engine to automatically check AI-generated code for security vulnerabilities and quality issues at every step. Privacy and System Requirements The privacy story is notable. Kintsugi is a local desktop app that never sends your source code to Sonar servers. Only anonymous usage data is collected, and you can opt out of that in settings.\nSystem requirements:\nmacOS only (currently) Claude Code 2.0.57+ Git, Node.js, Java 17+ It\u0026rsquo;s in early access with an invite-based rollout, and can be linked with a SonarCloud account.\nHow It Differs from Cursor and Windsurf graph TD subgraph IDE replacement approach A[Cursor] --\u003e A1[AI built into the editor] B[Windsurf] --\u003e B1[AI built into the editor] end subgraph CLI augmentation approach C[Kintsugi] --\u003e C1[Keeps the CLI agent] C1 --\u003e C2[Visual management layer] C2 --\u003e C3[Sonar quality gate] end\nWhere Cursor and Windsurf embed AI inside the editor to replace the IDE itself, Kintsugi preserves the full power of the CLI agent and adds only a visual management layer on top. 
The differentiating factor is the \u0026ldquo;AI writes code, humans review it\u0026rdquo; workflow, with SonarQube\u0026rsquo;s static analysis guardrails applied automatically.\nInsights Kintsugi\u0026rsquo;s message is clear: AI-generated code still needs quality and security guarantees. It\u0026rsquo;s an attempt to maintain the productivity of CLI agents while structurally blocking the risk of \u0026ldquo;AI code merged without review.\u0026rdquo; As the developer\u0026rsquo;s role shifts from \u0026ldquo;code writer\u0026rdquo; to \u0026ldquo;AI orchestrator,\u0026rdquo; a dedicated tool for that orchestration has emerged.\n","date":"2026-02-27T00:00:00+09:00","image":"/images/posts/2026-02-27-kintsugi-ade/cover-en.jpg","permalink":"/posts/2026-02-27-kintsugi-ade/","title":"Kintsugi — SonarSource's ADE Built for Claude Code"},{"content":"Overview KIS Developers, the Korea Investment \u0026amp; Securities developer portal, is the most aggressive Open API platform among domestic Korean brokerages. Beyond REST and WebSocket APIs, it now provides infrastructure for calling trading APIs directly from LLMs via MCP (Model Context Protocol).\ngraph TD A[KIS Open API] --\u003e B[REST API] A --\u003e C[WebSocket API] A --\u003e D[AI Tools] B --\u003e B1[Orders / Account] B --\u003e B2[Price Quotes] B --\u003e B3[Stock Analysis] C --\u003e C1[Real-time Trades] C --\u003e C2[Real-time Order Book] D --\u003e D1[Coding Assistant MCP] D --\u003e D2[Trading MCP] D --\u003e D3[GPTs Assistant]\nAPI Structure KIS Open API is available in two modes: REST and WebSocket. Domestic stocks alone are divided into orders/account, basic quotes, ELW, sector/other, stock info, price analysis, ranking analysis, and real-time quotes. Including overseas stocks, futures/options, and bonds, there are hundreds of endpoints.\nAuthentication uses an OAuth-style flow — obtain an appkey and appsecret, then generate an access token. WebSocket requires a separate connection key for real-time data. 
Python sample code for both REST and WebSocket is published on GitHub, enabling rapid prototyping.\nMCP Integration — Trading Directly from LLMs The most eye-catching section is AI Tools. KIS Developers officially supports MCP with two offerings:\nCoding Assistant MCP — Handles API usage questions, sample code generation, and error resolution via LLM conversation Trading MCP — Exposes trading functions like orders and price queries that can be called directly from ChatGPT or Claude A 24/7 GPTs-based 1:1 support assistant is also running. Official MCP support from a domestic brokerage is still rare, making this a compelling environment for developers building API-based automated trading systems.\nSecurity Notes Two recent security announcements from KIS are worth highlighting:\nDo not expose appkey/appsecret — Never share the issued security credentials or access token publicly or post them on the web. If an anomaly is detected, immediately revoke the service (security code). WebSocket infinite reconnection blocking — Abnormal patterns such as repeated connect-immediately-disconnect cycles or infinite subscribe/unsubscribe loops will result in temporary blocking of the IP and app key. Normal pattern: Connect → Subscribe to symbols → Receive data → Unsubscribe → Close connection\ngraph LR A[Connect] --\u003e B[Subscribe to symbols] B --\u003e C[Receive data] C --\u003e D[Unsubscribe] D --\u003e E[Close connection] style A fill:#4CAF50,color:#fff style E fill:#4CAF50,color:#fff\nInsights KIS Developers officially supporting MCP signals that the combination of financial APIs and LLMs is moving beyond experimentation into production. The infrastructure now exists to delegate the process of reading API docs and writing code to AI, and to integrate trading decisions into LLM pipelines. 
That said, security credential management and abnormal call pattern prevention remain non-negotiable — the more automated the system, the more critical proper error handling becomes.\n","date":"2026-02-27T00:00:00+09:00","image":"/images/posts/2026-02-27-kis-open-api-mcp/cover-en.jpg","permalink":"/posts/2026-02-27-kis-open-api-mcp/","title":"KIS Developers — Korea Investment \u0026 Securities Open API and MCP Trading"},{"content":"Overview Publishing a VS Code extension to the Marketplace requires the @vscode/vsce package. This post covers the entire workflow: generating an Azure DevOps PAT, creating a Publisher, packaging, and deploying.\ngraph LR A[Develop Extension] --\u003e B[Install vsce] B --\u003e C[Generate Azure DevOps PAT] C --\u003e D[Create Publisher] D --\u003e E[vsce login] E --\u003e F{Deploy Method} F --\u003e|Direct| G[vsce publish] F --\u003e|Package| H[\"vsce package → .vsix\"]\nStep 1: Install vsce vsce is the CLI tool responsible for packaging and publishing VS Code extensions.\nnpm install -g @vscode/vsce Three key commands:\nvsce login — authenticate with your publisher account vsce publish — publish directly to the Marketplace vsce package — bundle as a .vsix static file Step 2: Generate an Azure DevOps PAT The VS Code Marketplace authenticates through Azure DevOps.\nSign up / log in at Azure DevOps Create a Personal Access Token (PAT) Important: Grant Manage permission for VS Code Marketplace Store the token securely — you cannot retrieve it after creation Step 3: Create a Publisher and Log In Go to VS Code Marketplace → Publish extensions → Create publisher Set a publisher name and create it Log in via the CLI: vsce login \u0026lt;publisherName\u0026gt; # You\u0026#39;ll be prompted to enter your PAT Step 4: Required package.json Fields { \u0026#34;name\u0026#34;: \u0026#34;my-extension\u0026#34;, \u0026#34;displayName\u0026#34;: \u0026#34;My Extension\u0026#34;, \u0026#34;publisher\u0026#34;: \u0026#34;my-publisher\u0026#34;, 
\u0026#34;version\u0026#34;: \u0026#34;0.0.1\u0026#34;, \u0026#34;engines\u0026#34;: { \u0026#34;vscode\u0026#34;: \u0026#34;^1.84.0\u0026#34; } } Missing any of these fields will cause the publish step to fail.\nStep 5: Deploy # Publish directly to the Marketplace vsce publish # Or package into a .vsix file for manual upload vsce package The .vsix file produced by vsce package can be uploaded manually through the Marketplace web UI, or installed locally with code --install-extension my-extension.vsix.\nInsights Azure DevOps and the VS Code Marketplace are separate systems, which makes the initial setup confusing. The key is to follow this exact order: generate a PAT (Azure DevOps) → create a Publisher (Marketplace) → log in (vsce CLI) → deploy. Once this is configured, every subsequent release is a single vsce publish command. You can also integrate this into a CI/CD pipeline to trigger automatic deployments on tag pushes.\n","date":"2026-02-27T00:00:00+09:00","image":"/images/posts/2026-02-27-vsce-extension-deploy/cover-en.jpg","permalink":"/posts/2026-02-27-vsce-extension-deploy/","title":"Publishing a VS Code Extension — The Complete vsce Workflow"},{"content":"Overview A day spent managing the dev environment infrastructure for an AI service. The work covered ECS service updates, EC2 instance checks, ElastiCache (Valkey) monitoring, IAM access key creation, and configuring AWS CLI credentials locally.\nECS Service Management In the dev ECS cluster, I checked task status, health checks, and performed a service update for the AI service. 
Items reviewed in the ECS console:\nService tasks: Container status and logs for running tasks Health and metrics: Service health check results, CPU/memory metrics Service update: Rolling deployment after updating the task definition graph TD A[\"ECS Clusterdev-cluster\"] --\u003e B[\"Serviceai-service\"] B --\u003e C[\"Task Definition\"] B --\u003e D[\"Running Tasks\"] B --\u003e E[\"Health Check\"] D --\u003e F[\"Container: app\"] F --\u003e G[\"EC2 Instance\"] F --\u003e H[\"ElastiCache / Valkey\"]ECS Express Mode was also reviewed — a mode for quickly deploying simple services.\nEC2 Instances and ElastiCache Checked the status of EC2 instances running in the dev environment. On the ElastiCache side, I monitored a Valkey (Redis-compatible in-memory data store) cluster. Valkey is an open-source Redis fork that AWS officially supports as a managed in-memory cache engine.\nIAM Access Key Creation and CLI Setup Generated a new access key from the Security credentials tab of the development IAM user. Then followed the AWS CLI configuration docs and ran aws configure to set up the local environment.\nAWS CLI credential lookup order:\ngraph TD A[\"1. Command-line options--profile, --region\"] --\u003e B[\"2. Environment variablesAWS_ACCESS_KEY_ID, etc.\"] B --\u003e C[\"3. CLI credentials file~/.aws/credentials\"] C --\u003e D[\"4. CLI config file~/.aws/config\"] D --\u003e E[\"5. Container credentialsECS task role\"] E --\u003e F[\"6. EC2 instance profileIAM role\"]aws configure prompts for four values:\nAWS Access Key ID AWS Secret Access Key Default region name (e.g. ap-northeast-2) Default output format (json, yaml, text, table) The results are stored in ~/.aws/credentials (credentials) and ~/.aws/config (region, output format). To set up multiple profiles, use aws configure --profile \u0026lt;profile-name\u0026gt;.\nInsights Today\u0026rsquo;s AWS work was routine DevOps, but a few things stand out. 
ECS service updates done manually through the console are fine for one-offs, but for repeated tasks a CI/CD pipeline or Terraform automation is the right answer. The flow from IAM access key generation to CLI setup is something you go through every time you set up a new development environment — having a precise understanding of credential precedence makes debugging environment variable vs. file config conflicts much faster. Choosing Valkey (the Redis fork) as a managed ElastiCache engine is a practical response to the Redis license change.\n","date":"2026-02-26T00:00:00+09:00","image":"/images/posts/2026-02-26-aws-ecs-cli-setup/cover-en.jpg","permalink":"/posts/2026-02-26-aws-ecs-cli-setup/","title":"AWS ECS Service Operations and CLI Credential Setup"},{"content":"Overview Claude Code works by having an LLM \u0026ldquo;choose\u0026rdquo; which tools to call and when. But some operations shouldn\u0026rsquo;t be a choice — they must always happen: formatting after file saves, command logging, blocking modifications to production files. 
Claude Code Hooks is a lifecycle shell command system that addresses exactly this need.\nHook Event Types Claude Code provides 10 hook events that fire at various points in the workflow:\ngraph LR A[\"Session starts\"] --\u003e|SessionStart| B[\"User input\"] B --\u003e|UserPromptSubmit| C[\"Claude processes\"] C --\u003e|PreToolUse| D[\"Tool executes\"] D --\u003e|PostToolUse| E[\"Result returned\"] C --\u003e|PermissionRequest| F[\"Permission check\"] E --\u003e G[\"Response complete\"] G --\u003e|Stop| H[\"Session ends\"] H --\u003e|SessionEnd| I[\"Done\"] C --\u003e|Notification| J[\"Notification\"] C --\u003e|PreCompact| K[\"Compact\"] D --\u003e|SubagentStop| L[\"Subagent complete\"] Event Timing Control PreToolUse Before tool call Can block PostToolUse After tool call Provide feedback PermissionRequest Permission dialog Allow / deny UserPromptSubmit On prompt submit Pre-process Notification On notification Custom alert Stop On response complete Post-process SubagentStop On subagent complete Post-process PreCompact Before compact Pre-process SessionStart Session start / resume Initialize SessionEnd Session end Cleanup Practical Example: Bash Command Logging The most basic hook: log every shell command to a file. Attach a Bash matcher to the PreToolUse event, declare the hook as type command, and parse the tool input with jq:\n{ \u0026#34;hooks\u0026#34;: { \u0026#34;PreToolUse\u0026#34;: [ { \u0026#34;matcher\u0026#34;: \u0026#34;Bash\u0026#34;, \u0026#34;hooks\u0026#34;: [ { \u0026#34;type\u0026#34;: \u0026#34;command\u0026#34;, \u0026#34;command\u0026#34;: \u0026#34;jq -r \u0026#39;\\\u0026#34;\\\\(.tool_input.command) - \\\\(.tool_input.description // \\\u0026#34;No description\\\u0026#34;)\\\u0026#34;\u0026#39; \u0026gt;\u0026gt; ~/.claude/bash-command-log.txt\u0026#34; } ] } ] } } Access the configuration via the /hooks slash command, and choose whether to save it to User settings (global) or Project settings (per-project).\nUsage Patterns Auto-formatting: Run formatters from PostToolUse based on file extension. 
Automatically apply prettier for .ts files, gofmt for .go, black for .py — ensuring code Claude generates always follows the project\u0026rsquo;s style.\nFile protection: In PreToolUse, block writes to specific path patterns (e.g. production/, .env). Prevents the LLM from accidentally touching production configuration.\nCustom notifications: Connect Notification events to system alerts, Slack webhooks, or sound playback. Get notified in whatever way you prefer when Claude is waiting for input or when a task completes.\nCode quality feedback: Return lint results to Claude via PostToolUse and Claude will automatically incorporate the fixes. This is enforcement at the code level, not through prompt instructions.\nSecurity Considerations Hooks run automatically inside the agent loop with the credentials of the current environment. This is powerful — and dangerous. Malicious hook code could read environment variables and exfiltrate them, delete files, or execute arbitrary commands. Always review hook implementations before registering them, and include .claude/settings.json changes in code review for project-level hooks.\nInsights The core value of hooks is turning suggestions into code. You can write \u0026ldquo;always run prettier\u0026rdquo; in a prompt, but the LLM will occasionally forget. Register it as a hook and it runs 100% of the time. This is the pattern for compensating for LLM-based development tools\u0026rsquo; fundamental limitation — non-deterministic behavior — with deterministic shell commands. 
Master three hook points — PreToolUse for blocking, PostToolUse for post-processing, Stop for cleanup — and you can align Claude Code\u0026rsquo;s behavior precisely with your project\u0026rsquo;s requirements.\n","date":"2026-02-26T00:00:00+09:00","image":"/images/posts/2026-02-26-claude-code-hooks/cover-en.jpg","permalink":"/posts/2026-02-26-claude-code-hooks/","title":"Claude Code Hooks — Deterministic Control Over Agent Behavior"},{"content":"Overview A previous post covered Gemini 3\u0026rsquo;s model lineup, pricing, Thought Signatures, thinking_level/media_resolution parameters, image generation (Nano Banana Pro), and the Flash Preview bug. This post tackles the remaining sections of the Gemini 3 Developer Guide: Function Calling strict validation, Structured Outputs with tools, Code Execution with images, Multimodal function responses, and the OpenAI-compatible API.\nPrevious posts: Gemini 3 Image Generation API + Mermaid.js, Gemini 3 Flash Preview Infinite Loop Bug\nGemini 3.1 Pro Preview Announcement Gemini 3.1 Pro is now available in preview. It brings improvements in performance, behavior, and intelligence over Gemini 3 Pro, with model ID gemini-3.1-pro-preview. Pricing and context window (1M/64k) are the same as 3 Pro. Try it for free in Google AI Studio.\nFunction Calling — Strict Validation Gemini 3 introduces strict validation for Function Calling. Earlier models applied loose schema validation for tool calls, but now image generation/editing and Function Calling modes enforce strict validation that includes Thought Signatures.\nTwo calling patterns are supported:\ngraph TD A[\"Function Calling\"] --\u003e B[\"Sequentialmulti-step\"] A --\u003e C[\"Parallelconcurrent\"] B --\u003e B1[\"1. Model returns tool_call\"] B1 --\u003e B2[\"2. Client executes\"] B2 --\u003e B3[\"3. Feed result back to model\"] B3 --\u003e B4[\"4. Model returns next tool_call or final answer\"] C --\u003e C1[\"1. 
Model returns multiple tool_calls at once\"] C1 --\u003e C2[\"2. Client executes in parallel\"] C2 --\u003e C3[\"3. Feed all results back to model\"] C3 --\u003e C4[\"4. Model returns final answer\"]Sequential (multi-step): The model calls one tool at a time, receives the result, then decides on the next call. Best suited for agentic workflows where each step depends on the previous result.\nParallel: The model returns multiple independent tool calls at once. The client executes them in parallel, collects results, and feeds them back — the model then generates a combined response. This significantly reduces latency.\nImportant caveat: strict validation does not apply to text/streaming or in-context reasoning. That means calling a tool without a Thought Signature in image generation mode returns a 400 error, but normal text mode behaves as before.\nStructured Outputs with Tools Function Calling and Structured Output can now be combined. When defining a tool, specify a response schema to force the model to return tool call results as structured JSON. Where models previously responded in free-form text, production pipelines can now parse results reliably without parsing errors.\nCode Execution with Images Gemini 3\u0026rsquo;s code execution now supports image output. The model can run Python code and return charts or graphs generated by libraries like matplotlib as images. The key capability here is completing the pipeline of data analysis → visualization → explanation in a single API call.\ngraph LR A[\"Prompt:'Plot a chart of this data'\"] --\u003e B[\"Gemini 3\"] B --\u003e C[\"Generate Python code\"] C --\u003e D[\"Execute code(matplotlib)\"] D --\u003e E[\"Return chart image\"] E --\u003e F[\"Image + explanatory text\"]Multimodal Function Responses Tool call results can now include not just text but images, audio, and other multimodal data. 
For example, a tool call that returns a satellite image of an address lets the model analyze that image and produce a combined response. Agents can now pair data fetched from external APIs — including non-text data — with the model\u0026rsquo;s multimodal understanding.\nOpenAI-Compatible API Gemini 3 provides an OpenAI-compatible endpoint. Codebases using the OpenAI API can switch to Gemini 3 by changing only the model name and API key — a strategic choice that minimizes migration cost.\nMigrating from Gemini 2.5 Key things to watch when upgrading from Gemini 2.5:\nModel ID changes (gemini-2.5-* → gemini-3-*-preview) Thought Signatures are newly introduced — strict validation now applies in Function Calling Temperature defaults are optimized for 1.0 — remove any code setting a lower temperature thinking_level and thinking_budget cannot be used together (400 error) Insights Looking at Gemini 3\u0026rsquo;s new features, it\u0026rsquo;s clear Google is focused on reliability in agentic pipelines. Function Calling strict validation, Structured Outputs, and the parallel calling pattern all address the parsing errors and latency problems that arise in production agents. Code Execution with images and Multimodal function responses extend tool calling beyond text. The OpenAI-compatible API reduces the switching cost between competing models — a strategy similar to Claude\u0026rsquo;s own OpenAI compatibility mode. 
As API compatibility increases across models, developers gain the freedom to choose models based on performance and cost rather than being locked to a vendor.\n","date":"2026-02-26T00:00:00+09:00","image":"/images/posts/2026-02-26-gemini-3-function-calling/cover-en.jpg","permalink":"/posts/2026-02-26-gemini-3-function-calling/","title":"Gemini 3 — Function Calling, Structured Outputs, and Code Execution New Features"},{"content":"Overview trading-agent is a web app under development that lets users query stock prices and place paper trading orders using natural language. It wraps the Korea Investment \u0026amp; Securities (KIS) OpenAPI as an MCP (Model Context Protocol) server and uses Claude\u0026rsquo;s tool-calling to interpret user intent. This post walks through the architecture and the role of the CLAUDE.md added in PR #1.\nArchitecture The system is composed of three services: a React frontend (Vite, :5173), a FastAPI backend (:8000), and a KIS Trading MCP Server (SSE, :3000).\nReact (Vite, :5173) \u0026lt;--\u0026gt; FastAPI (:8000) \u0026lt;--\u0026gt; Claude API | MCP Client (fastmcp) | KIS Trading MCP Server (SSE, :3000) | KIS OpenAPI (paper trading) When a user asks \u0026ldquo;What\u0026rsquo;s the current price of Samsung Electronics?\u0026rdquo;, the flow is:\nsequenceDiagram participant User participant React as React UI participant API as FastAPI participant Claude as Claude API participant MCP as MCP Server participant KIS as KIS OpenAPI User-\u003e\u003eReact: \"What's Samsung Electronics at right now?\" React-\u003e\u003eAPI: POST /chat API-\u003e\u003eClaude: message + MCP tool definitions Claude-\u003e\u003eAPI: tool_use: get_stock_price API-\u003e\u003eMCP: fastmcp tool call MCP-\u003e\u003eKIS: REST API request KIS--\u003e\u003eMCP: price data MCP--\u003e\u003eAPI: tool result API-\u003e\u003eClaude: feed back tool result Claude--\u003e\u003eAPI: \"Samsung Electronics is currently...\" API--\u003e\u003eReact: SSE streaming 
React--\u003e\u003eUser: display responseThe key is that FastAPI passes MCP tool definitions alongside the message when calling the Claude API. Once Claude identifies the user\u0026rsquo;s intent and decides which tool to call, FastAPI executes it via the MCP Client (fastmcp) and feeds the result back to Claude. The final response is streamed to the React UI via SSE.\nTech Stack and Configuration Requirements: Python 3.12+ (uv), Node.js 22+, an Anthropic API key, and KIS paper trading credentials. The default model is claude-sonnet-4-5-20250929. Run make install \u0026amp;\u0026amp; make start to bring up all three services.\nKey environment variables:\nVariable Description ANTHROPIC_API_KEY Anthropic API key MCP_SERVER_URL MCP server SSE endpoint (default: http://localhost:3000/sse) CLAUDE_MODEL Claude model to use KIS_PAPER_APP_KEY KIS paper trading app key KIS_PAPER_APP_SECRET KIS paper trading app secret KIS_PAPER_STOCK Paper trading account number (8 digits) make targets are provided for install, start, and starting individual services to streamline the developer experience.\nCLAUDE.md — Project Guide for Claude Code PR #1 added CLAUDE.md. This file is the first context document Claude Code reads when entering a project. Documenting build commands, architecture overview, and development conventions means Claude Code will stay consistent when modifying code.\ngraph TD A[\"Claude Code session starts\"] --\u003e B[\"Read CLAUDE.md\"] B --\u003e C[\"Understand project structure\"] C --\u003e D[\"Check build/test commands\"] D --\u003e E[\"Code while following conventions\"] B --\u003e F[\"Understand architecture\"] F --\u003e G[\"Understand MCP tool structure\"] G --\u003e EAdding CLAUDE.md is not just documentation — it\u0026rsquo;s designing the collaboration interface for an AI agent. Each project has different build commands, different test conventions, different code styles. 
Rather than explaining all of this in conversation every time, defining it in a single file makes Claude Code\u0026rsquo;s first actions accurate from the start.\nInsights This project makes MCP\u0026rsquo;s value tangible. KIS OpenAPI is REST-based, but wrapping it in an MCP Server lets Claude go directly from natural language intent to a tool call. The important design point is that FastAPI acts as the orchestrator between the MCP Client and Claude API — Claude decides which tool to call, FastAPI actually runs it. That separation is clean. Starting with paper trading and being able to switch to the live API via a single environment variable is good design, and the make start DX that brings the whole stack up at once is a meaningful detail.\n","date":"2026-02-26T00:00:00+09:00","image":"/images/posts/2026-02-26-kis-trading-agent-mcp/cover-en.jpg","permalink":"/posts/2026-02-26-kis-trading-agent-mcp/","title":"KIS Trading Agent — LLM Stock Trading Architecture with MCP"},{"content":"Overview The VS Code extension ecosystem is at a crossroads. On one side, Microsoft\u0026rsquo;s official Webview UI Toolkit has been deprecated and archived. On the other, AI coding assistants have become an essential category of extensions. This post examines both trends.\nThe End of Webview UI Toolkit Issue #561 contains the announcement from hawkticehurst. A project with 2.1k stars and 157 forks was archived on January 6, 2025.\nThe root cause was the deprecation of its core dependency, FAST Foundation. In May 2024, the FAST project announced a re-alignment that placed several core packages on the deprecated list, pulling the rug out from under Webview UI Toolkit\u0026rsquo;s foundation. 
The only path forward was a complete rewrite using FAST Element (a lower-level web component library), but no resources were allocated for it.\ngraph TD A[\"FAST Foundationdeprecated (2024.05)\"] --\u003e B[\"Webview UI Toolkitloses its foundation\"] B --\u003e C{\"Rewrite?\"} C --\u003e|\"No resources allocated\"| D[\"Archived 2025.01\"] C --\u003e|\"Alternative\"| E[\"Full rewrite withFAST Element required\"] D --\u003e F[\"Options for existing users\"] F --\u003e G[\"Use VS Code CSS variablesdirectly\"] F --\u003e H[\"Custom componentswith Svelte/React\"] F --\u003e I[\"@vscode/codiconsfor icons only\"]The library provided three things of value:\nUI components following VS Code\u0026rsquo;s design language (buttons, dropdowns, data grids, etc.) Automatic theme support (auto-switching between dark and light mode) Framework-agnostic web components that worked with React, Vue, Svelte, etc. There is now no official replacement. Developers are left using VS Code\u0026rsquo;s CSS variables (--vscode-button-background, --vscode-input-border, etc.) directly, or pulling in @vscode/codicons for icons and building everything else themselves.\nBest Extensions for 2026 — AI as Its Own Category The notable shift in Builder.io\u0026rsquo;s Best VS Code Extensions for 2026 roundup is that AI extensions are now a standalone category. 
The post operates from the premise that 2025 was the year of AI agents, and that by 2026 most developers are already using AI IDEs like Cursor or Claude Code.\nTop three AI extension picks:\nFusion: Visual editing + AI code changes that create PRs directly in the real repo Claude Code: Context-aware in-IDE coding, 5M+ installs Sourcegraph Cody: Cross-repo context based on code graphs Other notable recommendations:\nThunder Client: REST client (Postman alternative) Error Lens: Inline error/warning display Pretty TypeScript Errors: More readable TS diagnostic messages TODO Tree: Collects all TODO/FIXME comments in one place Git Graph: Visual commit history CSS Peek: Jump from markup/JSX directly to style definitions Import Cost: Shows bundle size for imports inline The checklist for evaluating extensions is practical: check who made it (verified publisher, open source), whether it was updated recently, its performance impact, and what permissions it requests. The advice to install heavy extensions only in specific workspaces and exclude folders like node_modules is also included.\nClaude Code for VS Code On the VS Code Marketplace, Claude Code has crossed 5M+ installs. It\u0026rsquo;s available through Pro, Max, Team, and Enterprise subscriptions or pay-as-you-go, and supports both a terminal-based workflow and full IDE integration. A separate desktop app via Homebrew is also available (brew install --cask claude-code).\ngraph TD A[\"Claude Code\"] --\u003e B[\"VS Code extension5M+ installs\"] A --\u003e C[\"Terminal CLI\"] A --\u003e D[\"Desktop appvia Homebrew\"] B --\u003e E[\"In-IDE coding\"] C --\u003e F[\"Terminal workflow\"] D --\u003e G[\"Standalone use\"] E \u0026 F \u0026 G --\u003e H[\"Same Claude modelPro/Max/Team/Enterprise\"]Insights Two forces are colliding in the VS Code extension ecosystem. 
Established infrastructure like Webview UI Toolkit collapses through dependency chain failures (FAST Foundation → Toolkit), while AI coding assistants grow into a must-have category. If you\u0026rsquo;re building a webview-based extension, you now have to construct your own UI components or choose a lightweight framework — and ironically, AI tools like Claude Code can help generate that boilerplate. The vacant spot in the extension ecosystem is being filled by AI.\n","date":"2026-02-26T00:00:00+09:00","image":"/images/posts/2026-02-26-vscode-ecosystem-2026/cover-en.jpg","permalink":"/posts/2026-02-26-vscode-ecosystem-2026/","title":"The VS Code Extension Ecosystem in 2026: From Webview UI Toolkit's End to AI Extensions"},{"content":"Overview juehang/vscode-mcp-server is a VS Code extension that exposes the editor\u0026rsquo;s built-in capabilities — file manipulation, symbol search, diagnostics, and more — through the MCP protocol. This lets Claude Desktop or any other MCP client code directly inside VS Code. Inspired by Serena, its differentiator is using VS Code\u0026rsquo;s native API rather than external tooling.\nArchitecture The extension provides a Streamable HTTP API (http://localhost:3000/mcp), using the newer MCP transport instead of SSE. 
Connect from Claude Desktop via npx mcp-remote@next:\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;vscode-mcp-server\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;mcp-remote@next\u0026#34;, \u0026#34;http://localhost:3000/mcp\u0026#34;] } } } graph LR A[\"Claude DesktopMCP Client\"] --\u003e|\"Streamable HTTP\"| B[\"vscode-mcp-server:3000/mcp\"] B --\u003e C[\"VS Code API\"] C --\u003e D[\"Workspace files\"] C --\u003e E[\"Language Server\"] C --\u003e F[\"Terminal\"]MCP Tool Catalog Five categories, seven or more tools total:\nFile Tools — File system operations\nlist_files_code: List files in a directory read_file_code: Read file contents create_file_code: Create a file (with overwrite option) Edit Tools — Code modifications\nreplace_lines_code: Replace a specific line range. Requires exact match with the original content. Diagnostics Tools — Code diagnostics\nget_diagnostics_code: Returns Language Server diagnostics (errors and warnings) Symbol Tools — Code navigation\nsearch_symbols_code: Search for functions/classes across the entire workspace get_document_symbols_code: Symbol outline for a single file Shell Tools — Terminal command execution\nHow It Differs from Claude Code Claude Code also supports reading and writing files, but vscode-mcp-server is distinct in exposing VS Code-native capabilities. Language Server-backed symbol search, document outlines, and code diagnostics are semantically more precise than Claude Code\u0026rsquo;s grep/ripgrep-based search. 
Combining both tools gives you Claude Code\u0026rsquo;s powerful file manipulation alongside VS Code\u0026rsquo;s semantic code understanding.\nThe recommended workflow from the project README:\nlist_files_code to understand project structure search_symbols_code to find the target function/class read_file_code to see current contents replace_lines_code for small changes, create_file_code with overwrite for large ones After every edit, get_diagnostics_code to catch errors Security Considerations Shell Tools are included, meaning shell command execution is possible. MCP authentication specs are not yet finalized, so authentication is not implemented — take care that the port is not exposed externally. Only trusted MCP clients should be connected.\nInsights This extension shows the MCP ecosystem evolving beyond \u0026ldquo;tool standardization\u0026rdquo; toward \u0026ldquo;environment integration.\u0026rdquo; Where LLMs previously read and wrote files directly, vscode-mcp-server enables access to Language Server type checking, symbol indexing, and diagnostics as well. The pattern of calling get_diagnostics_code after every edit maps the human developer workflow — \u0026ldquo;write code → ask the compiler → fix it\u0026rdquo; — onto an LLM. Once the MCP authentication spec is finalized, this will be even safer to deploy.\n","date":"2026-02-26T00:00:00+09:00","image":"/images/posts/2026-02-26-vscode-mcp-server/cover-en.jpg","permalink":"/posts/2026-02-26-vscode-mcp-server/","title":"vscode-mcp-server — Exposing VS Code's Editor Capabilities to LLMs via MCP"},{"content":"Overview If you\u0026rsquo;re running gemini-3-flash-preview in production, a reported bug warrants adding defensive code immediately. When sending 100+ concurrent requests, the model enters an infinite reasoning loop at a rate of 3–5%, consuming all available maxOutputTokens and returning its internal reasoning as the final response. These two failures happen simultaneously. 
A previous post covered Gemini 3\u0026rsquo;s Thought Signatures and the thinking_level parameter — this bug is exactly a stop-condition failure in that Thinking mechanism.\nBackground: Gemini 3 image generation API, Thought Signatures, thinking_level, media_resolution → 2026-02-20 post\nBug Details: What\u0026rsquo;s Actually Happening Trigger Conditions The bug appears with problems requiring step-by-step proof — bitwise operations, mathematical verification, logic puzzles. Example prompt type: \u0026ldquo;Bitwise Toggle algorithm.\u0026rdquo;\nWhen the model doesn\u0026rsquo;t derive the answer directly and starts verifying specific integer values, it fails to converge:\nChecking n = 67108863... correct Checking n = 67108864... correct Checking n = 134217727... correct Checking n = 134217728... (continues) A loop that verifies sequentially doubling values runs endlessly until it hits the token limit.\nTwo Simultaneous API Response Failures { \u0026#34;response\u0026#34;: { \u0026#34;usageMetadata\u0026#34;: { \u0026#34;totalTokenCount\u0026#34;: 16233, \u0026#34;thoughtsTokenCount\u0026#34;: 15356, // ← 94.6% of all tokens consumed by internal reasoning \u0026#34;candidatesTokenCount\u0026#34;: 640 }, \u0026#34;candidates\u0026#34;: [{ \u0026#34;content\u0026#34;: { \u0026#34;parts\u0026#34;: [ { \u0026#34;text\u0026#34;: \u0026#34;**Algorithm for Bitwise Toggle**\\n\\nOkay, here\u0026#39;s my line of thinking...\u0026#34;, \u0026#34;thought\u0026#34;: true // ← Normal internal reasoning (should be hidden) }, { // ⚠️ BUG: This is an internal reasoning loop but thought: true flag is missing \u0026#34;text\u0026#34;: \u0026#34;Wait, let\u0026#39;s check n = 67108863... Correct. 
Wait, let\u0026#39;s check n = 67108864...\u0026#34;, \u0026#34;thoughtSignature\u0026#34;: \u0026#34;.....\u0026#34; // thought: true missing → parser treats this as the final response } ] }, \u0026#34;finishReason\u0026#34;: \u0026#34;MAX_TOKENS\u0026#34; // ← Not a clean finish; forced termination by token limit }] } } Failure 1: Token exhaustion. Of 16,233 total tokens, 15,356 (94.6%) are consumed by thoughtsTokenCount. Only 640 tokens are left for candidate output, and no valid answer is generated.\nFailure 2: Internal logic leak. When finishReason: MAX_TOKENS forces termination, the current buffer is flushed. The problem: the loop text parts lack the \u0026quot;thought\u0026quot;: true flag. The SDK parser treats them as final user-facing responses and returns them.\nflowchart TD A[Prompt input] --\u003e B{Model begins reasoning} B --\u003e|Normal case 95-97%| C[Derives generalized formula] C --\u003e D[Generates final answer] D --\u003e E[Returns thought: true parts + answer parts] B --\u003e|Bug case 3-5%| F[Starts verifying specific integer values] F --\u003e G[\"n=67108863 check... n=67108864 check...\"] G --\u003e|Token limit exceeded| H[MAX_TOKENS finishReason] H --\u003e I[\"Buffer flush: loop text returned \u0026lt;br/\u0026gt;⚠️ thought: true flag missing\"] I --\u003e J[\"Client: treats useless reasoning text as final answer\"]Impact Model: gemini-3-flash-preview (confirmed) Reproduction rate: 3–5% with 100+ concurrent requests Token settings: Occurs with both maxOutputTokens 16k and 32k Execution mode: Affects both Batch mode and regular API calls Defensive Code You Can Add Now Client-side defenses until an official fix is available:\n1. Check finishReason response = model.generate_content(prompt) for candidate in response.candidates: # finish_reason may be an enum rather than a plain string depending on the SDK, so compare by name reason = getattr(candidate.finish_reason, \u0026#34;name\u0026#34;, candidate.finish_reason) if reason == \u0026#34;MAX_TOKENS\u0026#34;: # Invalid response: retry or raise an error raise ValueError(\u0026#34;Response was truncated due to token limit\u0026#34;) 2. 
Check thoughtsTokenCount Ratio usage = response.usage_metadata # Either count may be unset, so default to 0 and avoid dividing by zero thoughts_ratio = (usage.thoughts_token_count or 0) / max(usage.total_token_count or 0, 1) if thoughts_ratio \u0026gt; 0.9: # Over 90% of tokens consumed by reasoning → likely infinite loop logger.warning(f\u0026#34;Possible reasoning loop detected: {thoughts_ratio:.1%} tokens in thoughts\u0026#34;) raise ValueError(\u0026#34;Model entered a reasoning loop\u0026#34;) 3. Check the thought Flag for part in response.candidates[0].content.parts: # Parts that carry a thoughtSignature but no thought: true are suspicious; getattr keeps an empty signature field from counting if getattr(part, \u0026#39;thought_signature\u0026#39;, None) and not getattr(part, \u0026#39;thought\u0026#39;, False): logger.error(\u0026#34;Leaked reasoning detected in response parts\u0026#34;) # Remove this part from the response or retry the whole request 4. Adjust thinking_level Setting the thinking_level parameter (covered in the previous post) to \u0026quot;low\u0026quot; or \u0026quot;medium\u0026quot; reduces how often the loop occurs, at the cost of reasoning quality. Alternatively, cap the reasoning tokens directly with thinking_budget, as in the config below. Use one or the other: combining thinking_level and thinking_budget in the same request returns a 400 error.\ngeneration_config = { \u0026#34;thinking_config\u0026#34;: { \u0026#34;thinking_budget\u0026#34;: 4096, # Directly cap the token budget instead of using thinking_level } } Why Flash Preview? Gemini 3 Flash was optimized for speed and cost efficiency, with a lighter reasoning process than Pro. Its stop-condition safety net appears weaker. 
The vulnerability surfaces with problem types like bitwise operations or mathematical proofs where the model feels compelled to \u0026ldquo;verify every case to be sure.\u0026rdquo;\nPractical recommendations for production use of gemini-3-flash-preview:\nRoute logic/math problems to gemini-3-pro-preview when possible When using Flash, always include finishReason + thoughtsTokenCount defensive checks Add a response validation layer for high-volume batch processing Quick Links Google AI Developers Forum — Original bug report Gemini 3 Developer Guide — thinking_level parameter Insights This bug illustrates that stronger reasoning capabilities introduce new categories of failure. Pre-Thinking models just gave wrong answers. Thinking models have a strong drive to \u0026ldquo;find the right answer\u0026rdquo; — and when they can\u0026rsquo;t converge, they loop infinitely. When deploying reasoning models in production, a separate response validation layer is safer than simply lowering maxOutputTokens. Treat any response with finishReason: MAX_TOKENS as suspect until proven otherwise.\n","date":"2026-02-25T00:00:00+09:00","image":"/images/posts/2026-02-25-gemini-3-flash-infinite-loop-bug/cover-en.jpg","permalink":"/posts/2026-02-25-gemini-3-flash-infinite-loop-bug/","title":"Gemini 3 Flash Preview Bug: Infinite Reasoning Loop and Internal Logic Leak"},{"content":"Overview While exploring DevOps engineer positions and looking at WhaTap Labs (a leading Korean APM vendor) job postings, I went deep on the observability tooling ecosystem. Comparing Honeycomb and Grafana reveals more than just \u0026ldquo;which tool is better\u0026rdquo; — it exposes a fundamental difference between monitoring and observability as two distinct paradigms. 
This post breaks down that difference through data models, query approaches, and SLO design.\ngraph TD subgraph \"Traditional Monitoring (Grafana)\" A[Application] --\u003e|metrics| B[Prometheus/InfluxDB] A --\u003e|logs| C[Loki/Elasticsearch] A --\u003e|traces| D[Tempo/Jaeger] B --\u003e E[Grafana Dashboard] C --\u003e E D --\u003e E E --\u003e|separate views| F[Developer] end subgraph \"Observability (Honeycomb)\" G[Application] --\u003e|wide events| H[Honeycomb single store] H --\u003e|unified query builder| I[Developer] endThe Paradigm Difference: It\u0026rsquo;s All About the Data Model Monitoring (Grafana\u0026rsquo;s Approach) Traditional monitoring was designed to answer predefined questions. You decide in advance which metrics matter and aggregate them as time series.\nMetrics: CPU usage, P99 response time, error rate — numbers aggregated as time series Logs: Individual event text — stored separately in Loki or Elasticsearch Traces: Distributed request tracking — stored separately in Tempo or Jaeger Each signal type lives in a separate store. To figure out \u0026ldquo;we had an error — which user triggered it and on which server?\u0026rdquo; you have to jump between three tabs, manually align time ranges, and piece together the correlation yourself.\nGrafana\u0026rsquo;s strength is visualization flexibility. Connect any data source and build dashboards. If you\u0026rsquo;re already using Prometheus, MySQL, and CloudWatch, Grafana serves as a unified viewer.\nObservability (Honeycomb\u0026rsquo;s Approach) The core concept in observability is Wide Events. 
When a request is processed, every relevant piece of context is captured as a single event:\n{ \u0026#34;timestamp\u0026#34;: \u0026#34;2026-02-25T10:30:00Z\u0026#34;, \u0026#34;service\u0026#34;: \u0026#34;payment-api\u0026#34;, \u0026#34;user_id\u0026#34;: \u0026#34;u_12345\u0026#34;, \u0026#34;tenant_id\u0026#34;: \u0026#34;enterprise_co\u0026#34;, \u0026#34;request_path\u0026#34;: \u0026#34;/api/charge\u0026#34;, \u0026#34;duration_ms\u0026#34;: 2340, \u0026#34;db_query_count\u0026#34;: 12, \u0026#34;cache_hit\u0026#34;: false, \u0026#34;region\u0026#34;: \u0026#34;ap-northeast-2\u0026#34;, \u0026#34;k8s_pod\u0026#34;: \u0026#34;payment-6c8b7d9-xk2p4\u0026#34;, \u0026#34;feature_flag\u0026#34;: \u0026#34;new_checkout_flow\u0026#34;, \u0026#34;error\u0026#34;: null } This single event contains metrics (duration_ms), log context (error), and trace context (k8s_pod, region). Honeycomb analyzes all of this in a single store with a single query builder.\nFeature Comparison graph LR subgraph \"High Cardinality Handling\" A[\"Grafana \u0026lt;br/\u0026gt;Requires a column per field \u0026lt;br/\u0026gt;or increased cost\"] B[\"Honeycomb \u0026lt;br/\u0026gt;Query any field, unlimited \u0026lt;br/\u0026gt;(no cost change)\"] end subgraph \"SLO Design\" C[\"Grafana \u0026lt;br/\u0026gt;Metric-based SLOs \u0026lt;br/\u0026gt;Context is lost\"] D[\"Honeycomb \u0026lt;br/\u0026gt;Event-based SLOs \u0026lt;br/\u0026gt;Drill into violations immediately\"] end subgraph \"Query Complexity\" E[\"Grafana \u0026lt;br/\u0026gt;PromQL + LogQL separately\"] F[\"Honeycomb \u0026lt;br/\u0026gt;Unified Query Builder\"] endThe High Cardinality Problem Cardinality is the number of unique values a field can hold. user_id is a high-cardinality field — it can have millions of unique values.\nGrafana (Prometheus): Each unique value creates a separate time series. Grouping by user_id produces millions of time series, causing storage explosion. 
Avoiding this requires pre-aggregation or careful indexing strategy. Analyzing \u0026ldquo;the slow request pattern for a specific user\u0026rdquo; after the fact is difficult.\nHoneycomb: Just put user_id in the Wide Event. Event-based storage has no cardinality constraints. After a problem occurs, filter by user_id = \u0026quot;u_12345\u0026quot; and immediately query all events for that user.\nSLO Comparison A poorly designed SLO fires alerts but leaves you with no idea what to actually fix.\nCriterion Grafana Honeycomb Data source Aggregated metrics Raw events Violation context None (just a number) Drill directly into violating events Alert accuracy False positives possible Higher precision via event basis \u0026ldquo;Why did it violate?\u0026rdquo; Manual cross-reference of logs/traces Immediate analysis in the same UI Example: P99 response time SLO violation\nGrafana: Alert → metric dashboard → search logs in Loki → analyze traces in Tempo (3 tabs) Honeycomb: Alert → list of violating events → spot feature_flag = \u0026quot;new_checkout_flow\u0026quot; pattern (1 UI) Pricing Model Item Grafana Cloud Honeycomb Base unit Bytes + series count + users Event count High cardinality Additional cost Included Query cost Extra above threshold Included Predictability Low (multiple variables) High (per-event) Grafana is cheaper when: you\u0026rsquo;re already using Prometheus, your metric count is low, and you don\u0026rsquo;t need deep ad-hoc analysis.\nHoneycomb is cheaper when: you need high-cardinality analysis, or the engineering cost of integrating multiple signals (metrics/logs/traces) is significant.\nWhen to Use Which graph TD A[Choose your strategy] --\u003e B{Current infrastructure?} B --\u003e|Already using Prometheus| C[Keep using Grafana] B --\u003e|Starting fresh or evaluating alternatives| D{Team size and requirements?} D --\u003e|Infrastructure metrics focus, small team| E[Grafana + Prometheus] D --\u003e|Distributed systems, per-user debugging needed| 
F[Honeycomb] D --\u003e|Large enterprise| G[Datadog / New Relic] C --\u003e H{High cardinality analysis needed?} H --\u003e|No| I[Grafana is sufficient] H --\u003e|Yes| J[Honeycomb or Grafana + Tempo combination]Grafana is the right fit when:\nAlready running a Prometheus/Loki stack Infrastructure metric dashboards are the primary use case Cost sensitivity is high and traffic is predictable Open-source self-hosting is a requirement Honeycomb is the right fit when:\nYou need to quickly answer \u0026ldquo;which requests are slow and why\u0026rdquo; in a microservices/distributed system High-cardinality attributes (user_id, tenant_id, feature_flag) are central to your analysis workflow Your SRE team is focused on DORA metrics and SLO management Korean Market Context: WhaTap Labs and APM Looking at their job postings today revealed something interesting — WhaTap Labs is a Korean-built APM (Application Performance Monitoring) company. They\u0026rsquo;re positioned as a domestic alternative to global tools like Honeycomb and Datadog, with agent-based auto-instrumentation, Korean language support, and on-premises deployment options as key differentiators.\nMany Korean companies hiring DevOps/Observability engineers (Coinone, Yanolja, etc.) use combinations of Grafana and internal tooling. Globally, the shift toward a \u0026ldquo;developer-centric observability\u0026rdquo; paradigm like Honeycomb is accelerating. This space looks increasingly interesting from a career perspective.\nQuick Links Honeycomb vs Grafana — Honeycomb\u0026rsquo;s official comparison Gartner Peer Insights — Grafana vs Honeycomb WhaTap Labs DevOps Job Posting Insights The difference between monitoring and observability comes down to whether you know the question in advance. Traditional monitoring alerts you when a predefined metric crosses a threshold — it\u0026rsquo;s strong against known failure modes. 
Observability enables exploring questions you didn\u0026rsquo;t define upfront, like \u0026ldquo;why is this specific user\u0026rsquo;s request slow?\u0026rdquo; As systems grow more complex and unknown failure modes multiply, the value of the observability paradigm compounds. If you\u0026rsquo;re already on Grafana, Loki + Tempo + Grafana can approximate observability — but with data living in separate stores, the query UX limitations are unavoidable.\n","date":"2026-02-25T00:00:00+09:00","image":"/images/posts/2026-02-25-observability-honeycomb-vs-grafana/cover-en.jpg","permalink":"/posts/2026-02-25-observability-honeycomb-vs-grafana/","title":"Observability vs Monitoring: Honeycomb vs Grafana"},{"content":"Overview When running a FastAPI backend alongside a Vite frontend on EC2, approaches like nohup python ... \u0026amp; leave you flying blind — if the process dies you won\u0026rsquo;t know, a reboot wipes everything, and log management is painful. PM2 (Process Manager 2) originated in the Node.js world but works as a production process manager for any language. 
This post covers PM2 basics, the real-world pattern of managing Python (uvicorn) and Node.js (Vite) with a single ecosystem.config.js, and how to fix dotenv conflicts.\ngraph TD A[pm2 start ecosystem.config.js] --\u003e B[PM2 Daemon] B --\u003e C[\"backend process \u0026lt;br/\u0026gt;uv run uvicorn :8000\"] B --\u003e D[\"frontend process \u0026lt;br/\u0026gt;npm run dev :5173\"] C --\u003e|crash| E[auto-restart autorestart] D --\u003e|crash| E E --\u003e B F[pm2 save] --\u003e G[\"~/.pm2/dump.pm2 \u0026lt;br/\u0026gt;process list saved\"] H[pm2 startup] --\u003e I[\"system service registered \u0026lt;br/\u0026gt;auto-recovery after reboot\"] G --\u003e IPM2 Command Cheat Sheet # Install globally npm install pm2 -g # Start a single process pm2 start app.js pm2 start server.py --interpreter python3 # Python # List processes pm2 list # Detailed info pm2 show \u0026lt;name\u0026gt; # Live logs pm2 logs # all processes pm2 logs backend # specific process only pm2 logs --lines 200 # last 200 lines # Restart / stop / delete pm2 restart \u0026lt;name\u0026gt; pm2 stop \u0026lt;name\u0026gt; pm2 delete \u0026lt;name\u0026gt; # remove from list entirely # Resource monitor (CPU/memory live) pm2 monit # Save current process list → restore after reboot pm2 save The difference between pm2 stop and pm2 delete: stop keeps the entry in the list while delete removes it entirely. Use stop if you plan to restart it later, delete to clean it up completely.\necosystem.config.js — Managing Multiple Processes as One Once you have a growing list of flags like pm2 start app.js --name backend --watch --max-memory-restart 1G ..., managing becomes hard. 
Use ecosystem.config.js to declare all configuration as code.\n# Auto-generate a sample file pm2 ecosystem Edit the generated file to fit your project:\nmodule.exports = { apps: [ { name: \u0026#39;my-api\u0026#39;, script: \u0026#39;server.js\u0026#39;, instances: 1, autorestart: true, // auto-restart on crash watch: false, // restart on file changes (true for dev only) max_memory_restart: \u0026#39;1G\u0026#39;, env: { // default environment variables NODE_ENV: \u0026#39;development\u0026#39;, PORT: 3000 }, env_production: { // applied with --env production NODE_ENV: \u0026#39;production\u0026#39;, PORT: 8080 } } ] }; To use different variables per environment, add an env_\u0026lt;name\u0026gt; key and select it with the --env flag at startup:\npm2 start ecosystem.config.js # uses env pm2 start ecosystem.config.js --env production # uses env_production The dotenv (.env) and PM2 Conflict The most common PM2 gotcha: everything works fine with node server.js locally, but PM2 complains about missing environment variables.\nThe cause is simple. dotenv reads .env and injects into process.env when the process starts. But PM2 runs as an independent daemon (background service), so the current shell\u0026rsquo;s environment variables are not automatically inherited.\nTwo solutions:\nOption 1 — Declare directly in ecosystem.config.js (recommended)\nenv: { NODE_ENV: \u0026#39;production\u0026#39;, DATABASE_URL: \u0026#39;postgresql://...\u0026#39;, API_KEY: \u0026#39;your-key-here\u0026#39; } Downside: if ecosystem.config.js is committed to git, secrets are exposed. 
Either add it to .gitignore, or split secrets into a separate file and require('./secrets').\nOption 2 — Load dotenv directly in application code\nIn Python, python-dotenv reads .env at app startup regardless of PM2:\n# main.py from dotenv import load_dotenv load_dotenv() # works under PM2 too Same for Node.js:\nrequire(\u0026#39;dotenv\u0026#39;).config(); // at the top of your entry point, works with PM2 Running Non-Node.js Processes — interpreter: \u0026ldquo;none\u0026rdquo; PM2 defaults to running .js files with Node.js. To run Python, Go, shell scripts, or other runtimes, you have two options:\nOption 1 — Specify the interpreter explicitly\n{ name: \u0026#39;flask-api\u0026#39;, script: \u0026#39;app.py\u0026#39;, interpreter: \u0026#39;python3\u0026#39; } Option 2 — interpreter: \u0026ldquo;none\u0026rdquo; + specify the binary directly in script (recommended)\n{ name: \u0026#39;backend\u0026#39;, script: \u0026#39;uvicorn\u0026#39;, // or absolute path: \u0026#39;/usr/local/bin/uvicorn\u0026#39; args: \u0026#39;main:app --host 0.0.0.0 --port 8000\u0026#39;, interpreter: \u0026#39;none\u0026#39; // runs the binary directly, no Node.js wrapper } interpreter: \u0026quot;none\u0026quot; is more flexible. 
Put any executable — uv, gunicorn, go, shell scripts — in script and pass arguments via args.\nReal-World Example: Hybrid Image Search Demo Here is the ecosystem.config.js from a project currently in production (hybrid-image-search-demo), managing a FastAPI backend (Python + uv) and a Vite frontend (Node.js) together:\nmodule.exports = { apps: [ { name: \u0026#34;backend\u0026#34;, cwd: \u0026#34;./\u0026#34;, // run from repo root — critical for Python module resolution script: \u0026#34;uv\u0026#34;, // run uv (Python package manager) directly args: \u0026#34;run python -m uvicorn backend.src.main:app --host 0.0.0.0 --port 8000\u0026#34;, interpreter: \u0026#34;none\u0026#34;, // uv is not Node.js — required env: { NODE_ENV: \u0026#34;production\u0026#34;, // GOOGLE_API_KEY, OPENAI_API_KEY are loaded from .env via python-dotenv } }, { name: \u0026#34;frontend\u0026#34;, cwd: \u0026#34;./frontend\u0026#34;, // npm commands must run where package.json lives script: \u0026#34;npm\u0026#34;, args: \u0026#34;run dev -- --host\u0026#34;, // \u0026#39;--host\u0026#39; tells Vite to bind 0.0.0.0 (allow external access) interpreter: \u0026#34;none\u0026#34;, } ] }; Key points in this configuration:\ncwd: \u0026quot;./\u0026quot; — the backend must run from the repo root so that dotted module paths like backend.src.main resolve correctly. Omitting cwd or setting it to ./backend will cause ModuleNotFoundError.\nargs: \u0026quot;run dev -- --host\u0026quot; — when passing extra arguments to an npm script, separate them with --. --host is forwarded to Vite, not npm.\nSecrets stay in .env + python-dotenv — GOOGLE_API_KEY and OPENAI_API_KEY are not in the ecosystem file. 
The FastAPI app reads .env directly at startup.\ngraph LR subgraph \"PM2 Daemon\" A[\"backend \u0026lt;br/\u0026gt;uv run python -m uvicorn \u0026lt;br/\u0026gt;:8000\"] B[\"frontend \u0026lt;br/\u0026gt;npm run dev --host \u0026lt;br/\u0026gt;:5173\"] end C[\".env \u0026lt;br/\u0026gt;GOOGLE_API_KEY \u0026lt;br/\u0026gt;OPENAI_API_KEY\"] --\u003e|python-dotenv loads| A D[\"ecosystem.config.js \u0026lt;br/\u0026gt;NODE_ENV=production\"] --\u003e|PM2 injects| A E[Nginx or direct access] --\u003e A E --\u003e BAuto-Recovery After Server Reboot PM2\u0026rsquo;s process list disappears when the server restarts. Register it permanently in two steps:\n# Step 1: Save the current running process list pm2 save # → written to ~/.pm2/dump.pm2 # Step 2: Register PM2 as a system service (auto-start on reboot) pm2 startup # This prints the command you need to run: # [PM2] To setup the Startup Script, copy/paste the following command: # sudo env PATH=$PATH:/usr/bin /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u ubuntu --hp /home/ubuntu # Run the printed sudo command as-is sudo env PATH=$PATH:/usr/bin ... pm2 startup auto-detects the init system (systemd, SysV, etc.). On AWS EC2 Ubuntu it generates a systemd service file.\nWatch Out: Fixed Script Path Per Service Name PM2 locks a script path to a service name the first time it is registered. 
A main service started from /home/project1/server.js will keep running /home/project1/server.js even if you start it again from /home/project2/ using the same name.\n# Check the currently bound path pm2 show main # look at the \u0026#39;script path\u0026#39; field # Fix: delete the old service and re-register pm2 delete main cd /home/project2/ pm2 start server.js --name main Using ecosystem.config.js avoids this problem naturally — cwd and script are declared explicitly.\nQuick Reference — Common Patterns # Start / restart / stop with ecosystem pm2 start ecosystem.config.js pm2 restart ecosystem.config.js pm2 stop ecosystem.config.js # Single app pm2 restart backend pm2 logs frontend --lines 100 # Status at a glance pm2 list # Full clean restart pm2 delete all \u0026amp;\u0026amp; pm2 start ecosystem.config.js \u0026amp;\u0026amp; pm2 save Quick Links PM2 Official Docs — ecosystem.config.js PM2 ecosystem.config.js environment variables (Korean) PM2 background run / stop / restart (Korean) Insights The biggest confusion when starting with PM2 is \u0026ldquo;it\u0026rsquo;s a Node.js tool — why use it for Python?\u0026rdquo; With interpreter: \u0026quot;none\u0026quot;, PM2 becomes a pure process watchdog — it detects crashes and restarts any process regardless of language. In practice, when running a Python backend alongside a Node.js frontend like this project, having a single pm2 logs command that aggregates both streams is a significant operational convenience. The dotenv vs. 
PM2 conflict stems from a difference in \u0026ldquo;process execution context\u0026rdquo; — once you understand that, similar issues like environment variables disappearing in Docker containers become easy to diagnose with the same mental model.\n","date":"2026-02-25T00:00:00+09:00","image":"/images/posts/2026-02-25-pm2-process-manager-ecosystem/cover-en.jpg","permalink":"/posts/2026-02-25-pm2-process-manager-ecosystem/","title":"Running Python + Node.js Multi-Service Apps with PM2 — A Complete ecosystem.config.js Guide"},{"content":"Overview When a VS Code extension needs to sign in to an external OAuth service (GitHub, Auth0, etc.), the flow involves opening a browser and receiving a callback. A regular web app uses a local server at something like http://localhost:3000/callback as the redirect URI, but a VS Code extension can receive the callback directly via the vscode://publisher.extension-name protocol — no local port needed. This post covers how to combine registerUriHandler and the AuthenticationProvider API to implement the OAuth flow, the protocol limitations in code-server (browser-based VS Code), and how Remote Tunnels handles OAuth.\ngraph TD A[Extension activates] --\u003e B[registerUriHandler] A --\u003e C[registerAuthenticationProvider] B --\u003e D[UriEventHandler] C --\u003e E[Auth0AuthenticationProvider] D --\u003e|handleUri event| E E --\u003e|createSession called| F[Open browser - Auth0 login page] F --\u003e|OAuth callback| G[\"vscode://publisher.ext-name#access_token=...\"] G --\u003e|OS delivers to VS Code| B B --\u003e|parse URI| H[Extract token] H --\u003e I[Store in context.secrets] I --\u003e J[Return AuthenticationSession]registerUriHandler — The Entry Point for External Callbacks vscode.window.registerUriHandler() registers an OS-level URI handler so that when an external source opens a vscode:// link, the extension receives that URI. 
If multiple VS Code windows are open, the foreground window handles it.\nThe implementation is straightforward — implement the UriHandler interface and propagate events via an EventEmitter:\nclass UriEventHandler extends EventEmitter\u0026lt;Uri\u0026gt; implements UriHandler { public handleUri(uri: Uri) { this.fire(uri); // deliver URI to subscribers } } // Inside activate() in extension.ts: const uriHandler = new UriEventHandler(); context.subscriptions.push( vscode.window.registerUriHandler(uriHandler) ); The URL format for incoming URIs is:\nvscode://\u0026lt;publisher\u0026gt;.\u0026lt;extension-name\u0026gt;[/path][?query=value][#fragment=value] Examples: vscode://mycompany.my-ext?code=abc123 or vscode://mycompany.my-ext#access_token=xyz\nOne important distinction: Auth0 sends tokens in the URI fragment (#) for implicit flow, while Azure AD uses the query string (?). Which side you parse depends on your OAuth provider.\nThe AuthenticationProvider Interface Since VS Code 1.54, authentication.registerAuthenticationProvider() lets you register a custom authentication provider. This makes the provider appear in VS Code\u0026rsquo;s Account menu and allows other extensions to request sessions via vscode.authentication.getSession().\nInterface to implement:\nexport class Auth0AuthenticationProvider implements AuthenticationProvider, Disposable { private _sessionChangeEmitter = new EventEmitter\u0026lt;...\u0026gt;(); // Event VS Code subscribes to for session changes get onDidChangeSessions() { return this._sessionChangeEmitter.event; } // Return stored sessions (read from secrets store) async getSessions(scopes?: string[]): Promise\u0026lt;readonly AuthenticationSession[]\u0026gt; { const stored = await this.context.secrets.get(SESSIONS_KEY); return stored ? 
JSON.parse(stored) : []; } // Sign in → obtain token → create session async createSession(scopes: string[]): Promise\u0026lt;AuthenticationSession\u0026gt; { const token = await this.login(scopes); const userinfo = await this.getUserInfo(token); const session: AuthenticationSession = { id: uuid(), accessToken: token, account: { label: userinfo.name, id: userinfo.email }, scopes: [] }; await this.context.secrets.store(SESSIONS_KEY, JSON.stringify([session])); this._sessionChangeEmitter.fire({ added: [session], removed: [], changed: [] }); return session; } // Sign out async removeSession(sessionId: string): Promise\u0026lt;void\u0026gt; { const sessions = JSON.parse(await this.context.secrets.get(SESSIONS_KEY) || \u0026#39;[]\u0026#39;); const idx = sessions.findIndex((s: AuthenticationSession) =\u0026gt; s.id === sessionId); if (idx === -1) { return; } const [removed] = sessions.splice(idx, 1); await this.context.secrets.store(SESSIONS_KEY, JSON.stringify(sessions)); this._sessionChangeEmitter.fire({ added: [], removed: [removed], changed: [] }); } } context.secrets stores data encrypted in VS Code\u0026rsquo;s built-in secret store (macOS Keychain, Windows Credential Manager, Linux libsecret). Tokens belong here rather than in globalState, which keeps its values in plain text.\nOAuth Login Flow in Detail The login() method called inside createSession is where the actual OAuth flow happens:\nprivate async login(scopes: string[]) { return await window.withProgress({ location: ProgressLocation.Notification, ... 
}, async () =\u0026gt; { const stateId = uuid(); // state parameter for CSRF protection this._pendingStates.push(stateId); // Build the OAuth authorization URL const params = new URLSearchParams({ response_type: \u0026#39;token\u0026#39;, client_id: CLIENT_ID, redirect_uri: `vscode://${PUBLISHER}.${EXT_NAME}`, state: stateId, scope: scopes.join(\u0026#39; \u0026#39;) }); await env.openExternal(Uri.parse(`https://auth0.com/authorize?${params}`)); // Wait until the URI handler receives the callback (60s timeout) return await Promise.race([ promiseFromEvent(this._uriHandler.event, this.handleUri(scopes)).promise, new Promise((_, reject) =\u0026gt; setTimeout(() =\u0026gt; reject(new Error(\u0026#39;Timeout\u0026#39;)), 60000)) ]); }); } private handleUri = (scopes) =\u0026gt; async (uri, resolve, reject) =\u0026gt; { const fragment = new URLSearchParams(uri.fragment); // Auth0 uses fragment const token = fragment.get(\u0026#39;access_token\u0026#39;); const state = fragment.get(\u0026#39;state\u0026#39;); if (!this._pendingStates.includes(state)) { reject(new Error(\u0026#39;Invalid state\u0026#39;)); // CSRF defense return; } resolve(token); }; Promise.race() fits this flow well: it settles on whichever outcome arrives first. The snippet above races two outcomes, a successful URI callback and a 60-second timeout; a promise tied to the progress dialog\u0026rsquo;s cancellation token can be added to the same array as a third.\nConsumer Code Using the registered provider from within the same extension or another extension:\n// Fetch existing session, or prompt login if none exists (createIfNone: true) const session = await vscode.authentication.getSession(\u0026#39;auth0\u0026#39;, [\u0026#39;openid\u0026#39;, \u0026#39;profile\u0026#39;], { createIfNone: true }); if (session) { vscode.window.showInformationMessage(`Welcome, ${session.account.label}!`); // Use session.accessToken for API calls } Protocol Limitations in code-server There is an important real-world constraint here. 
code-server is an open-source project (76k stars) that runs VS Code in the browser — and the vscode:// protocol does not work in browsers.\nBrowser security policy restricts which schemes navigator.registerProtocolHandler() can register, and vscode:// is not on the allowed list. A code-server maintainer noted:\n\u0026ldquo;I do not think browsers allow handling vscode:// anyway, at best we could do web+vscode:// or web+code-server://.\u0026rdquo;\nThe proposed workaround:\nhttps://code-server-url/protocol-handler?uri=vscode://my-plugin/path code-server itself handles the /protocol-handler route and forwards the URI to the connected client extension. This approach has the added benefit of showing no notification/confirmation popup, making for a cleaner UX.\nIf installed as a PWA, partial support is also possible via protocol_handlers in manifest.json (requires https:// scheme).\ngraph LR subgraph \"Desktop VS Code\" A[\"vscode://ext/callback\"] --\u003e|OS protocol handler| B[VS Code process] B --\u003e C[handleUri called] end subgraph \"code-server (browser)\" D[\"vscode://ext/callback\"] --\u003e|blocked by browser| E[fails] F[\"https://code-server/protocol-handler?uri=...\"] --\u003e|HTTP workaround| G[code-server] G --\u003e H[forwarded to extension via WebSocket] endRemote Tunnels\u0026rsquo; OAuth Mechanism VS Code Remote Tunnels provides access to a remote machine without SSH. It uses GitHub OAuth internally to authenticate the tunnel service:\nRun code tunnel → VS Code Server is installed on the remote machine Connects to Microsoft Azure-based dev tunnels service Generates a vscode.dev/tunnel/\u0026lt;machine_name\u0026gt; URL When a client accesses that URL, they get redirected through github.com/login/oauth/authorize... Tunnel security uses AES-256-CTR end-to-end encryption, and VS Code only makes outbound connections — no listening ports, no firewall rules needed.\nPractical Guide: Which Approach to Choose? 
Scenario Recommended approach Desktop VS Code + external OAuth registerUriHandler + AuthenticationProvider code-server + OAuth /protocol-handler route workaround or localhost server Accessing internal remote environments Remote Tunnels (only requires a GitHub account) Managing multiple GitHub accounts GitShift extension (mikeeeyy04.gitshift) If building a production AuthenticationProvider, two more things are needed:\nStore only the refresh token and renew access tokens each time (security) Detect refresh token expiry → auto-remove session → prompt re-login Quick Links Elio Struyf — Creating an Authentication Provider for VS Code Elio Struyf — Callback from external sources to VS Code extensions VS Code API — UriHandler reference VS Code — Remote Tunnels official docs coder/code-server — registerUriHandler issue discussion RFC 6750 — OAuth 2.0 Bearer Token Insights Digging into OAuth for VS Code extensions reinforced how much platform boundaries matter. The vscode:// protocol works flawlessly on native desktop, but the moment you cross into the browser boundary, OS-level protocol handling is blocked. The /protocol-handler workaround that code-server proposes is clever — it pushes the problem down to the HTTP layer to sidestep the browser restriction. Meanwhile, seeing how Remote Tunnels elegantly solves the same OAuth problem under a single vscode.dev domain shows what\u0026rsquo;s possible when a platform designer centralizes the OAuth redirect upfront. 
The context.secrets store exposed by the AuthenticationProvider API looks simple on the surface, but it\u0026rsquo;s a well-designed abstraction over platform-specific Keychain and Credential Manager implementations.\n","date":"2026-02-25T00:00:00+09:00","image":"/images/posts/2026-02-25-vscode-extension-uri-handler-oauth/cover-en.jpg","permalink":"/posts/2026-02-25-vscode-extension-uri-handler-oauth/","title":"VS Code Extension Development: Implementing OAuth with URI Handlers"},{"content":"Overview Database schemas change constantly as projects evolve — adding tables, modifying columns, creating indexes. Managing this manually makes it impossible to answer questions like \u0026ldquo;what changes have been applied to this database?\u0026rdquo; Alembic is a migration tool for SQLAlchemy that lets you version-control schema changes like code and safely apply or roll them back.\ngraph LR A[Model change] --\u003e B[alembic revision] B --\u003e C[Migration script generated] C --\u003e D[alembic upgrade] D --\u003e E[Schema applied to DB] E --\u003e F{Problem?} F --\u003e|Yes| G[alembic downgrade] F --\u003e|No| H[Done] G --\u003e AMigration Environment Structure Running alembic init creates the following directory structure:\nyourproject/ alembic.ini # Main config file (DB URL, logging, etc.) pyproject.toml # Python project config alembic/ env.py # Migration runtime (DB connection, transaction management) README script.py.mako # Template for generating migration scripts versions/ # Actual migration scripts 3512b954651e_add_account.py 2b1ae634e5cd_add_order_id.py 3adcc9a56557_rename_username_field.py Role of Each Key File alembic.ini: Global config — DB URL, logging, script paths. The %(here)s token lets you specify paths relative to the config file location.\nenv.py: The \u0026ldquo;brain\u0026rdquo; of migrations. Controls SQLAlchemy engine creation, DB connection, transaction management, and model imports. 
Modify this file when you need multi-DB support or custom arguments.\nscript.py.mako: A Mako template that defines the skeleton for new migration files. Customize the structure of the upgrade() and downgrade() functions here.\nversions/: Where the actual migration scripts live. File names use partial GUIDs instead of integer sequences, enabling merges across branches.\nBasic Workflow Step 1: Initialize the Environment cd /path/to/yourproject alembic init alembic Four templates to choose from:\nTemplate Use Case generic Single DB, basic setup pyproject pyproject.toml-based config (v1.16+) async Async DB drivers (asyncpg, etc.) multidb Multi-database environments Step 2: Configure the DB Connection Set the database URL in alembic.ini:\nsqlalchemy.url = postgresql://user:pass@localhost/dbname Note: If the URL contains % characters (e.g., URL-encoded passwords), escape them as %%. Example: p%40ss → p%%40ss\nStep 3: Generate a Migration Script alembic revision -m \u0026#34;add account table\u0026#34; This creates a new migration file in versions/:\n\u0026#34;\u0026#34;\u0026#34;add account table Revision ID: 3512b954651e Revises: 2b1ae634e5cd Create Date: 2026-02-24 12:00:00.000000 \u0026#34;\u0026#34;\u0026#34; def upgrade(): # Write schema change code here pass def downgrade(): # Write rollback code here pass Step 4: Apply the Migration alembic upgrade head # Upgrade to latest version alembic upgrade +2 # Advance 2 steps from current position Step 5: Roll Back alembic downgrade -1 # Roll back 1 step alembic downgrade base # Roll back all migrations Step 6: Check Status alembic current # Show current DB version alembic history # Show full migration history alembic history -r1a:3b # Show history for a specific range Useful Features Partial Revision IDs You don\u0026rsquo;t need to type the full Revision ID in commands — just enough characters to guarantee uniqueness:\nalembic upgrade ae1027a6acf # Full ID alembic upgrade ae1 # This works too (if unique) Post-write Hooks 
Automatically run a code formatter after generating a migration file:\n[post_write_hooks] hooks = ruff ruff.type = module ruff.module = ruff ruff.options = check --fix REVISION_SCRIPT_FILENAME Connect black, ruff, or similar tools to auto-format generated migration scripts.\npyproject.toml Support Since Alembic 1.16, you can manage configuration directly in pyproject.toml:\nalembic init --template pyproject ./alembic With this setup, source code settings go in pyproject.toml and environment-specific settings like DB connections stay in alembic.ini.\nQuick Links Alembic Tutorial — Official tutorial Alembic Cookbook — Real-world recipes SQLAlchemy — The ORM that Alembic is built on Insights Alembic\u0026rsquo;s core value is treating DB schema changes like code. Just as git log lets you trace code change history, alembic history lets you trace schema change history. In team development, when someone asks \u0026ldquo;when was this table added?\u0026rdquo; or \u0026ldquo;who changed this column?\u0026rdquo;, the migration scripts have the answer. Adopting Alembic early in a project prevents a large accumulation of untracked schema debt later. The GUID-based versioning design — rather than integer sequences — is also worth noting: it enables merges across multiple branches where migrations are created concurrently.\n","date":"2026-02-24T00:00:00+09:00","image":"/images/posts/2026-02-24-alembic-database-migration/cover-en.jpg","permalink":"/posts/2026-02-24-alembic-database-migration/","title":"Alembic — Database Migration with SQLAlchemy"},{"content":"Overview Architectural decisions can make or break a project. 
Monolithic Architecture (MA) and Microservices Architecture (MSA) each have distinct tradeoffs — and the right question isn\u0026rsquo;t \u0026ldquo;which is better?\u0026rdquo; but \u0026ldquo;which fits the current situation?\u0026rdquo;\ngraph TD A[Architecture decision] --\u003e B{Project scale?} B --\u003e|Small / MVP| C[Monolithic Architecture] B --\u003e|Large / Complex| D[Microservices Architecture] C --\u003e E[Single codebase + single DB] D --\u003e F[Independent services + per-service DB] C --\u003e G[Fast initial development] D --\u003e H[Flexible scaling and deployment]Monolithic Architecture Monolithic architecture places all business logic in a single, unified codebase. Authentication, payments, notifications — every feature lives inside one application.\ngraph LR subgraph \"Single Application\" A[Auth] --- B[Products] B --- C[Payments] C --- D[Notifications] end subgraph \"Infrastructure\" E[(Single DB)] end A --\u003e E B --\u003e E C --\u003e E D --\u003e EAdvantages Advantage Description Fast development Simple codebase and easy integration means faster initial development Easy maintenance Applying changes in a single codebase is straightforward Low infrastructure cost A single application means low operational complexity Easy debugging All code in one place makes tracing problems straightforward No network latency Service communication happens through function calls — no network overhead Unified tech stack The whole team uses the same technology, making onboarding easier Disadvantages Disadvantage Description No partial scaling Can\u0026rsquo;t scale a specific feature — must scale the whole application Full redeployment required Even small changes require redeploying the entire app Tech stack lock-in Adopting new technologies is difficult Growing complexity Codebase becomes unwieldy as the project grows Team conflicts Merge conflicts are frequent when everyone works in the same code Best suited for: Small projects, fast MVP development, when 
complex business logic isn\u0026rsquo;t needed, systems with infrequent changes\nThe advantages of monolithic architecture are most apparent at small scale. As the project grows, those same advantages tend to flip into disadvantages.\nMicroservices Architecture Microservices architecture splits the application into multiple small, independent services. Each service owns a specific business function and communicates via APIs. It\u0026rsquo;s an architecture designed to match the organizational structure of large development teams.\ngraph TD subgraph \"API Gateway\" GW[Gateway] end subgraph \"Independent Services\" S1[\"Auth Service \u0026lt;br/\u0026gt; Node.js\"] S2[\"Product Service \u0026lt;br/\u0026gt; Python\"] S3[\"Payment Service \u0026lt;br/\u0026gt; Go\"] S4[\"Notification Service \u0026lt;br/\u0026gt; Java\"] end subgraph \"Per-Service Databases\" D1[(Auth DB)] D2[(Product DB)] D3[(Payment DB)] D4[(Notification DB)] end GW --\u003e S1 GW --\u003e S2 GW --\u003e S3 GW --\u003e S4 S1 --\u003e D1 S2 --\u003e D2 S3 --\u003e D3 S4 --\u003e D4Advantages Independent deployment: Each service can be developed, tested, and deployed individually Technology diversity: Choose the optimal tech stack per service Selective scaling: Scale only the services under high demand — e.g., if the news service has 1 user and the webtoon service has 100 million, scale only the webtoon service Fault isolation: A failure in one service doesn\u0026rsquo;t bring down the entire system Easier maintenance: Changes to one service have minimal impact on others Disadvantages Operational complexity: Requires service discovery, centralized logging, distributed tracing Data consistency: Distributed transactions are hard to implement correctly Testing difficulty: Integration tests and E2E tests become significantly more complex System-wide comprehension: Understanding the full system requires more effort Migration cost: Transitioning from monolithic to MSA takes considerable time and resources 
Network latency: Inter-service communication introduces latency Best suited for: Large and complex systems, teams organized around independent services, systems that need flexible scaling\nSide-by-Side Comparison Dimension Monolithic Microservices Structure Single codebase, strong feature coupling Independent services + API communication, distributed system Deployment Full redeployment Per-service independent deployment Tech stack Unified across all teams Per-service choice Scaling Scale everything or nothing Scale individual services Latency None (in-process function calls) Network latency between services Debugging Easy to trace in a single codebase Requires distributed tracing tools Team structure Well-suited for small teams Suited for independent team organizations Quick Links Monolithic vs MSA Comparison (Korean) — Detailed breakdown of tradeoffs Martin Fowler: Microservices — The definitive conceptual definition of MSA Insights The most common mistake in architecture decisions is \u0026ldquo;MSA is modern, so we should use MSA.\u0026rdquo; Applying microservices to a small project adds unnecessary complexity — service communication, distributed transactions, logging infrastructure — without any real benefit. Conversely, sticking with a monolith as a system scales to millions of users means you can\u0026rsquo;t scale a single feature without scaling everything else, which is massively inefficient. The key is choosing what fits your team size and project complexity right now. Many successful projects start monolithic and migrate to microservices when the need actually arises — the gradual approach works.\n","date":"2026-02-24T00:00:00+09:00","image":"/images/posts/2026-02-24-monolithic-vs-microservices/cover-en.jpg","permalink":"/posts/2026-02-24-monolithic-vs-microservices/","title":"Monolithic vs Microservices — How to Choose the Right Architecture"},{"content":"Overview Building a VS Code extension that only works locally isn\u0026rsquo;t enough anymore. 
Extensions need to function correctly in Remote Development and GitHub Codespaces environments, and the security of secrets they handle — tokens, API keys — needs to be designed in from the start. This post covers VS Code extension remote architecture, core APIs, secret security risks, and Azure integration patterns.\ngraph TD A[VS Code Extension Development] --\u003e B[Remote Architecture] A --\u003e C[Core API] A --\u003e D[Secret Security] A --\u003e E[Azure Integration] B --\u003e F[UI Extension - runs locally] B --\u003e G[Workspace Extension - runs remotely] D --\u003e H[SecretStorage API] D --\u003e I[Keytar / Keychain]VS Code Remote Extension Architecture UI Extension vs Workspace Extension VS Code distinguishes between two kinds of extensions in remote development scenarios:\nType Runs On Role Examples UI Extension Local machine Contributes to VS Code UI (themes, keymaps, snippets) Color Theme, Vim keybinding Workspace Extension Remote machine File access, tool execution, language servers Python, ESLint, GitLens VS Code analyzes package.json to automatically install extensions in the right location. If auto-detection fails, specify extensionKind explicitly:\n{ \u0026#34;extensionKind\u0026#34;: [\u0026#34;workspace\u0026#34;] } Use the Developer: Show Running Extensions command to see where each extension is actually running.\nKey Issues in Remote Environments 1. Secret storage\nRemote environments don\u0026rsquo;t have access to the local Keychain. VS Code\u0026rsquo;s SecretStorage API handles this correctly regardless of whether the extension is running locally or remotely.\n2. Webview resource paths\nWhen referencing local resources in a Webview, always use asWebviewUri(). File paths differ in remote environments — hardcoding paths will cause resource loading failures.\n3. localhost forwarding\nAccessing localhost ports on the remote machine requires VS Code\u0026rsquo;s port forwarding feature. 
When Webview needs to use localhost:\nOption 1: Transform the URI with asExternalUri Option 2: Configure port mappings with the portMapping option 4. Extension-to-extension communication\nExtensions running in remote and local contexts cannot directly call each other\u0026rsquo;s APIs. Use VS Code\u0026rsquo;s commands API to communicate instead:\n{ \u0026#34;api\u0026#34;: \u0026#34;none\u0026#34; } Adding this to package.json disables API export and forces command-based communication.\nDebugging Environments Four environments are available for testing remote extensions:\nGitHub Codespaces — Cloud-based development environment Dev Containers — Custom Docker containers SSH — Remote server connection WSL — Windows Subsystem for Linux To test an unpublished extension, generate a VSIX file with vsce package and install it manually.\nVS Code API Core Namespaces The VS Code API Reference documents the full API available for extensions. Key namespaces:\nNamespace Role vscode.authentication Authentication session management vscode.commands Command registration and execution vscode.window Editor, terminal, and notification UI vscode.workspace File system, settings, workspace management vscode.languages Language features (completion, diagnostics, symbols) vscode.debug Debugger integration vscode.env Environment info (clipboard, URI opening) vscode.chat AI/Chat feature integration Common Patterns in Extension Development CancellationToken: Long-running operations should always accept a CancellationToken to support cancellation.\nDisposable: Implement the Disposable interface for resource cleanup and register with context.subscriptions.push().\nEventEmitter: Use EventEmitter\u0026lt;T\u0026gt; to publish custom events.\nVS Code Secret Security — The Hidden Risks According to Cycode\u0026rsquo;s security analysis, VS Code extension secret management carries security risks worth understanding.\nHow VS Code Stores Secrets VS Code uses the OS-native Keychain/Keyring:\nmacOS: 
Keychain Windows: Credential Manager Linux: libsecret (GNOME Keyring, etc.) Extensions access this storage via context.secrets (the SecretStorage API).\nSecurity Risks 1. Extraction via the Electron process\nVS Code is Electron-based. Certain flags create a path to access secrets:\nELECTRON_RUN_AS_NODE=1 \u0026#34;${electronPath}\u0026#34; \\ --ms-enable-electron-run-as-node \u0026#34;${vscodeDecryptScriptPath}\u0026#34; ${machineId} 2. Exposure through malicious extensions\nInstalled extensions have access to the SecretStorage API. Installing an unverified extension creates a risk of exposing stored tokens.\nSecurity Best Practices Always use the SecretStorage API — never store secrets in environment variables or config files Minimize extension permissions — request only the scopes you need Install only verified extensions — check publisher verification in the Marketplace Rotate tokens regularly — refresh long-lived tokens periodically Azure Resources Extension — Authentication Integration Pattern The Azure Resources extension manages Azure resources from within VS Code. It serves as a useful reference for authentication patterns in extension development.\nAuthentication Flow Click \u0026ldquo;Sign in to Azure\u0026hellip;\u0026rdquo; in the Azure Resources view VS Code\u0026rsquo;s built-in Microsoft authentication provider handles the auth Tenants requiring MFA authenticate separately in the Accounts \u0026amp; Tenants view Multiple Azure accounts can be active simultaneously Key Settings azureResourceGroups.selectedSubscriptions — Filter which subscriptions are displayed Microsoft-sovereign-cloud.environment — Automatically configured for sovereign cloud access (government Azure, etc.) 
This pattern is a solid reference for implementing external service authentication in your own extensions.\nQuick Links VS Code Remote Extensions Guide — Complete guide to remote development extensions VS Code API Reference — Full API reference Cycode: VS Code Secret Security — Secret extraction risk analysis Azure Resources Extension — Azure integration guide Insights VS Code extension development is no longer about building a plugin that works locally. With Remote Development and Codespaces now standard, extensions must be designed as distributed components that work regardless of execution environment. Understanding the UI Extension vs Workspace Extension split is the first step. For secret management, starting with the SecretStorage API is the only right answer — security can\u0026rsquo;t be bolted on later. And as Cycode\u0026rsquo;s analysis demonstrates, being aware of the secret extraction paths in Electron-based apps is essential knowledge for anyone building extensions that handle credentials.\n","date":"2026-02-24T00:00:00+09:00","image":"/images/posts/2026-02-24-vscode-extension-auth-security/cover-en.jpg","permalink":"/posts/2026-02-24-vscode-extension-auth-security/","title":"VS Code Extension Development — Remote Architecture, Core APIs, and Secret Security"},{"content":"Overview Two topics got serious attention today. First: I built an image generation API on gemini-3-pro-image-preview and had questions — resolution pricing tiers, Thought Signatures, new parameters — so I went through the Gemini 3 official docs to get answers. Second: I explored Mermaid.js as an architecture documentation tool and put together a syntax reference for the main diagram types.\nGemini 3 Model Family and Pricing Gemini 3 is still in preview, but it\u0026rsquo;s usable in production. 
Here are the specs by model:\nModel ID Context (In/Out) Pricing (Input/Output) gemini-3.1-pro-preview 1M / 64k $2 / $12 (under 200k tokens) gemini-3-pro-preview 1M / 64k $2 / $12 (under 200k tokens) gemini-3-flash-preview 1M / 64k $0.50 / $3 gemini-3-pro-image-preview 65k / 32k $2 (text input) / $0.134 (per output image) For the image model, $0.134 per output image is the baseline, but cost scales with resolution. 1K is the default; 4K costs more. Refer to the separate pricing page for resolution-by-resolution details.\nNano Banana Pro — Gemini 3\u0026rsquo;s Native Image Generation Google officially uses the codename \u0026ldquo;Nano Banana\u0026rdquo; for Gemini\u0026rsquo;s native image generation capability. There are two variants:\nNano Banana: gemini-2.5-flash-image — speed and efficiency focused, suited for high-volume processing Nano Banana Pro: gemini-3-pro-image-preview — production-quality assets, Thinking-based high quality What sets Gemini 3 Pro Image apart from the older Imagen is that reasoning (Thinking) is integrated into the image generation process. With a complex prompt, the model internally generates up to two \u0026ldquo;thought images\u0026rdquo; to verify composition and logic before producing the final image. These intermediate images are not billed.\nNew Capabilities 1. Up to 14 reference images\ngemini-3-pro-image-preview accepts up to 14 reference images:\nHigh-resolution object images: up to 6 Character consistency: up to 5 This enables generating varied scenes while maintaining visual consistency for a specific product or character.\n2. Resolution control — 1K / 2K / 4K\nDefault output is 1K. Specify image_size in generation_config to go higher. Important: uppercase K is required — 1k will return an error.\ngeneration_config = { \u0026#34;image_size\u0026#34;: \u0026#34;2K\u0026#34; # \u0026#34;1K\u0026#34;, \u0026#34;2K\u0026#34;, \u0026#34;4K\u0026#34; supported. Lowercase not accepted! } 3. 
Google Search Grounding\nConnect the google_search tool to generate images based on real-time information — weather forecast charts, stock price graphs, infographics from recent news. Note: image-based search results are not passed to the generation model and are excluded from responses.\nWrapping the API with FastAPI I tested a Hybrid Image Search API running at localhost:8000 today via its Swagger UI. It\u0026rsquo;s a FastAPI server using gemini-3-pro-image-preview as the backend, with /api/generate_image as the core endpoint. It receives an image prompt, calls the Gemini API, and returns the result.\ngraph LR Client --\u003e|POST /api/generate_image| FastAPI FastAPI --\u003e|generateContent| Gemini3ProImage[gemini-3-pro-image-preview] Gemini3ProImage --\u003e|image + thought_signature| FastAPI FastAPI --\u003e|base64 image| ClientThe response schema in Swagger UI includes a thought_signature field. For multi-turn editing sessions, you need to include this value in subsequent requests.\nThought Signatures — The Key to Multi-Turn Editing When you first start using the image generation API, Thought Signatures are the most confusing part. Understanding them makes it clear why multi-turn (conversational) image editing works the way it does.\nA Thought Signature is an encrypted string representing the model\u0026rsquo;s internal reasoning process. When the model generates an image, the response includes a thought_signature field — and you must send that value back with your next request. This is how the model remembers the composition and logic of the previous image when editing it.\nImage generation request → response includes thought_signature → \u0026#34;Change the background to a sunset\u0026#34; + thought_signature sent together → Model edits while maintaining compositional context Strict validation is enforced for image generation/editing — omit the signature and you get a 400 error. 
The official Python/Node/Java SDKs handle this automatically when you pass chat history through. You only need to manage it manually when using raw REST without an SDK.\nMigration Notes from Gemini 2.5 If you\u0026rsquo;re using an existing Gemini 2.5 conversation trace or injecting custom function calls, you won\u0026rsquo;t have a valid signature. You can work around this with a dummy value:\n\u0026#34;thoughtSignature\u0026#34;: \u0026#34;context_engineering_is_the_way_to_go\u0026#34; New API Parameters in Gemini 3 thinking_level — Controls reasoning depth\nLevel Description minimal Flash only. Minimum thinking, minimum latency low Follows simple instructions; suitable for high-throughput apps medium Balanced reasoning high Default. Maximum reasoning; responses may be slower Using thinking_level and the legacy thinking_budget parameter together causes a 400 error.\nmedia_resolution — Controls multimodal vision processing precision\nFor image analysis, media_resolution_high (1120 tokens/image) is recommended. For PDFs, use media_resolution_medium (560 tokens). This gives you explicit control over the cost/quality tradeoff.\nTemperature warning: Gemini 3 is optimized for the default value of 1.0. If you have existing code that sets a low temperature for deterministic output, remove it. Low temperatures can cause loops and performance degradation.\nLLM Token and Cost Calculators When estimating image generation costs, you need to account for both text tokens and per-image output costs. Useful tools:\ntoken-calculator.net — Token count and cost estimation for GPT, Claude, Gemini, and others. Updated through 2026 models. OpenAI Tokenizer — Official OpenAI tokenizer. Visualizes exactly how text gets split into tokens. 
For Gemini 3 Pro Image at $0.134 per output image (with additional cost for higher resolutions), production environments with high-volume image generation should look at the Batch API — it offers higher rate limits in exchange for up to 24-hour delays.\nMermaid.js — Diagrams from Text Mermaid.js is a JavaScript library for defining diagrams in a Markdown-like text syntax. GitHub, GitLab, Notion, and this blog (Hugo) can all render SVG diagrams from a single code block. The core advantage: keep architecture documentation in the codebase, versioned alongside the code — no separate drawing tool needed.\nUsage is simple: write your diagram definition inside a ```mermaid code block.\nFlowchart — The Most Versatile Diagram Use for flow diagrams, decision trees, and system architecture. Declare direction on the first line.\ngraph TD %% Top → Down graph LR %% Left → Right graph BT %% Bottom → Top graph RL %% Right → Left Node shapes\nA[Rectangle] B(Rounded corners) C([Stadium]) D[[Subroutine]] E[(Cylinder / DB)] F((Circle)) G{Diamond / Decision} H{{Hexagon}} I[/Parallelogram/] J[\\Reverse parallelogram\\] Edge types\nA --\u0026gt; B %% Arrow A --- B %% Line only A -.- B %% Dotted line A ==\u0026gt; B %% Thick arrow A --\u0026gt;|label| B %% Labeled arrow A --o B %% Circle end A --x B %% X end Subgraphs\ngraph LR subgraph Backend API --\u0026gt; DB end subgraph Frontend UI --\u0026gt; API end Example — Gemini image generation flow:\ngraph TD A[User prompt] --\u003e B{Resolution?} B --\u003e|1K| C[image_size: 1K] B --\u003e|2K| D[image_size: 2K] B --\u003e|4K| E[image_size: 4K] C \u0026 D \u0026 E --\u003e F[gemini-3-pro-image-preview] F --\u003e G[Thinking: generates up to 2 thought images] G --\u003e H[Final image output] H --\u003e I[Returns thought_signature] I --\u003e|Reuse for multi-turn editing| FSequence Diagram — Service Communication Flow Use for API call sequences, authentication flows, and inter-service message flows in microservices.\nBasic 
syntax\nsequenceDiagram participant A as Client participant B as Server participant C as DB A-\u0026gt;\u0026gt;B: Request (solid arrow) B--\u0026gt;\u0026gt;A: Response (dashed arrow) A-)B: Async (open arrow) 10 arrow types\nSyntax Meaning -\u0026gt; Solid line, no arrowhead --\u0026gt; Dashed line, no arrowhead -\u0026gt;\u0026gt; Solid line, with arrowhead --\u0026gt;\u0026gt; Dashed line, with arrowhead \u0026lt;\u0026lt;-\u0026gt;\u0026gt; Solid line, bidirectional \u0026lt;\u0026lt;--\u0026gt;\u0026gt; Dashed line, bidirectional -x Solid line, X end --x Dashed line, X end -) Solid line, open arrowhead (async) --) Dashed line, open arrowhead (async) Activation boxes\nsequenceDiagram A-\u0026gt;\u0026gt;+B: Start request B--\u0026gt;\u0026gt;-A: Response (shows B\u0026#39;s active period) Loop, alt, and par\nloop Retry 3 times A-\u0026gt;\u0026gt;B: Request end alt Success B--\u0026gt;\u0026gt;A: 200 OK else Failure B--\u0026gt;\u0026gt;A: 500 Error end par Parallel A-\u0026gt;\u0026gt;B: Task 1 and A-\u0026gt;\u0026gt;C: Task 2 end Notes and background highlighting\nNote right of A: Token validation here Note over A,B: Note spanning two participants rect rgb(200, 220, 255) A-\u0026gt;\u0026gt;B: Highlighted section end Class Diagram — OOP Design Documentation Represents class structures, inheritance relationships, and interfaces.\nClass definition and members\nclassDiagram class Animal { +String name -int age #String species +speak() String +move()* void %% abstract +clone()$ Animal %% static } Member visibility: + public, - private, # protected, ~ package Classifiers: * abstract, $ static\nGeneric types\nclass Stack~T~ { +push(item: T) +pop() T +peek() T } Relationship types\nSyntax Relationship Notes A \u0026lt;|-- B Inheritance B inherits from A A *-- B Composition B is part of A A o-- B Aggregation B belongs to A A --\u0026gt; B Association A uses B A ..\u0026gt; B Dependency A depends on B A ..|\u0026gt; B Realization A implements B\u0026rsquo;s interface Cardinality\nclassDiagram Customer \u0026#34;1\u0026#34; --\u0026gt; \u0026#34;0..*\u0026#34; Order : places Order 
\u0026#34;1\u0026#34; *-- \u0026#34;1..*\u0026#34; OrderItem : contains ER Diagram — Database Schema Entity-relationship diagrams for documenting database design.\nBasic syntax\nerDiagram CUSTOMER ||--o{ ORDER : places ORDER ||--|{ LINE-ITEM : contains CUSTOMER { string name PK string email UK int age } ORDER { int id PK date created_at int customer_id FK } Cardinality notation\nLeft Right Meaning |o o| Zero or one || || Exactly one }o o{ Zero or more }| |{ One or more Identifying relationships use solid lines (--); non-identifying use dashed lines (..).\nTips %% is a comment in all diagram types direction TB/LR changes direction in most diagram types Node IDs cannot contain spaces — use [Text] for labels Complex diagrams: use Mermaid Live Editor for real-time preview Quick Links Gemini API — Nano Banana Image Generation — Official image generation guide with prompting strategies and code examples Gemini 3 Developer Guide — Full Gemini 3 API guide (pricing, parameters, migration) Token Calculator — LLM token count and cost estimator OpenAI Tokenizer — Tokenizer visualization tool Mermaid.js — Official docs (Flowchart, Sequence, Class, ER syntax reference) Mermaid Live Editor — Real-time browser preview Insights Today\u0026rsquo;s two topics share a common thread: expressing complex things in text. Gemini 3 Pro Image generates images from text prompts, then serializes the editing session\u0026rsquo;s context back to text via the Thought Signature mechanism. Mermaid.js expresses visual concepts — architecture, data flow — in text syntax so they can be version-controlled alongside code. As the FastAPI server wrapping Gemini image generation grows more complex, Mermaid\u0026rsquo;s Flowchart and Sequence diagrams become a practical way to reduce the communication overhead. 
Each diagram type has a clear use case: Flowchart for process flows, Sequence for API communication, ER for data models — the skill is knowing which to reach for.\n","date":"2026-02-20T00:00:00+09:00","image":"/images/posts/2026-02-20-tech-log/cover-en.jpg","permalink":"/posts/2026-02-20-tech-log/","title":"Gemini 3 Image Generation API + Mermaid.js Diagram Syntax"},{"content":"Overview AI coding tools are getting more powerful by the day, but systematically managing and injecting project context remains a hard problem. Cole Medin\u0026rsquo;s Archon tackles this with an MCP server pattern that turns your knowledge base into a first-class citizen for AI assistants.\nWhat Is Archon? Archon is a command center for AI coding assistants. With over 13,700 GitHub stars, it connects to tools like Claude Code, Cursor, and Windsurf via the MCP (Model Context Protocol) and provides them with a custom knowledge base and task management system.\nFrom the user\u0026rsquo;s side, it\u0026rsquo;s a clean web UI for managing knowledge and tasks. 
From the AI assistant\u0026rsquo;s side, it\u0026rsquo;s an MCP server that exposes that same knowledge and those same tasks as structured context.\nArchitecture graph LR A[UI :3737] --\u003e B[Server :8181] B --\u003e C[MCP Server :8051] C --\u003e D[Claude Code / Cursor / Windsurf] B --\u003e E[Supabase DB]Archon is composed of three microservices:\nServer (Python): Core API and business logic — handles web crawling, PDF uploads, and RAG (Retrieval-Augmented Generation) search MCP Server: The protocol interface that AI coding assistants connect to UI (TypeScript): Web interface for managing the knowledge base, projects, and tasks The whole stack spins up with a single docker compose command, using Supabase as the backend database.\nKey Features Document management: Build a knowledge base by crawling websites or uploading PDFs and documents Smart search: Advanced RAG strategies to surface relevant content Task management: Project and task tracking integrated directly with the knowledge base Real-time updates: Content added to the knowledge base is immediately available to AI assistants Tech Stack Area Technology Backend Python (2.3M+ LOC) Frontend TypeScript (1.8M+ LOC) Database Supabase (PostgreSQL + PLpgSQL) Infra Docker, Make LLM OpenAI, Gemini, Ollama, OpenRouter Recent addition of OpenRouter embedding support means you can swap models freely without vendor lock-in.\nSetup git clone -b stable https://github.com/coleam00/archon.git cd archon cp .env.example .env # Add your Supabase credentials to .env docker compose up --build -d After setup, visit http://localhost:3737 and follow the onboarding flow to configure your API keys.\nResources The OFFICIAL Archon Guide (23 min) — Installation through real-world workflows GitHub Discussions — Community Archon Kanban Board — Development roadmap Insights The MCP server pattern that Archon demonstrates points toward where AI coding tooling is headed. 
Beyond just generating code, the key challenge is systematically managing project context and knowledge, then injecting it into AI systems. \u0026ldquo;Context Engineering\u0026rdquo; is becoming an increasingly important discipline, and Archon is a practical, working implementation of that idea.\n","date":"2026-02-19T00:00:00+09:00","image":"/images/posts/2026-02-19-archon-ai-coding-command-center/cover-en.jpg","permalink":"/posts/2026-02-19-archon-ai-coding-command-center/","title":"Archon — The Command Center for AI Coding Assistants"},{"content":"Overview I did a thorough comparison of Hugo blog themes while planning a blog refresh, settling on GitHub Pages as the deployment target. PaperMod and Stack got the most attention, but I surveyed over eight themes in total and mapped out the full setup workflow for Hugo + GitHub Pages.\nHugo Theme Comparison I explored a range of themes at themes.gohugo.io.\nPaperMod — The Most Popular Choice hugo-PaperMod | 13,100+ stars | 3,300+ forks\nPaperMod is the most popular theme in the Hugo ecosystem. It bills itself as \u0026ldquo;Fast, Clean, Responsive\u0026rdquo; and runs on pure Hugo features — no webpack or Node.js dependencies.\nKey features:\nThree layout modes: Regular, Home-Info, and Profile Client-side search powered by Fuse.js Multilingual support and SEO optimization Automatic light/dark theme switching Code block copy button, auto-generated table of contents Breadcrumb navigation Notable recent changes:\nllms.txt support added — an emerging standard that lets LLMs efficiently index blog content Theme detection logic refactored into head.html for faster script execution Live demo | Installation guide\nStack — Card-Style Blogger Theme hugo-theme-stack | 6,200+ stars | 1,900+ forks\nStack is a card-style layout theme built specifically for bloggers. 
It\u0026rsquo;s the right choice when you want a visually rich blog.\nNotable recent changes:\nMarkdown Alert support (GitHub-style \u0026gt; [!NOTE], \u0026gt; [!WARNING], etc.) Generic taxonomy widget refactored for better extensibility Custom canonical URL configuration added Expanded i18n support Live demo | Documentation\nTheme Comparison Summary Theme Stars Character Best For PaperMod 13.1K Minimal, fast, SEO-optimized Tech blogs, portfolios Stack 6.2K Card UI, visually rich General blogs, photo blogs Coder - Extremely minimal Developer portfolios Book - Docs with sidebar Technical documentation sites Docsy - Google-backed, large-scale Corporate technical docs Terminal - Retro terminal style Developer blogs with personality Blox-Tailwind - Tailwind CSS-based Modern design blogs Compose - Clean, multi-purpose General-purpose blogs Hugo + GitHub Pages Setup Guide I used Integerous\u0026rsquo;s guide as a reference for the setup workflow.\nWhy Hugo? Jekyll — Ruby-based, most popular, good Korean docs, slow builds Hexo — Node.js-based, strong Chinese community, slow development activity Hugo — Go-based, fastest builds, well-documented, fewer Korean references Hugo wins on build speed with no runtime dependencies, and its documentation is excellent.\nThe Build Flow graph TD A[hugo new site blog] --\u003e B[Add theme as submodule] B --\u003e C[Configure config.toml] C --\u003e D[hugo new post/my-post.md] D --\u003e E[Preview locally with hugo server] E --\u003e F[Build with hugo → generates public/] F --\u003e G[Push public/ to username.github.io] G --\u003e H[Push source to blog repo]Key Points 1. Two repositories\nblog — Hugo source files username.github.io — the built static site for deployment 2. Always use git submodules for themes\n# Submodule is recommended over cloning git submodule add https://github.com/theme/repo.git themes/theme-name This makes it easy to pull theme updates and you won\u0026rsquo;t lose the theme if your environment changes. 
Best practice is to fork the theme repo first, then add your fork as the submodule.\n3. Automate deployment with deploy.sh A single shell script handles build → commit/push public/ → commit/push source.\n4. Utterances for comments A comment system built on the GitHub Issues API. Readers can comment using their GitHub account — no separate server required.\nQuick Links Hugo Themes Gallery PaperMod Wiki Stack Documentation Homebrew — macOS package manager (brew install hugo) VS Code Homebrew Cask Insights When choosing a Hugo theme, the most important factor isn\u0026rsquo;t \u0026ldquo;does it look great right now\u0026rdquo; — it\u0026rsquo;s \u0026ldquo;is the community active, and is the project being maintained?\u0026rdquo; PaperMod\u0026rsquo;s llms.txt support is a good example: active projects evolve with the times. The submodule pattern for managing themes isn\u0026rsquo;t Hugo-specific either — it\u0026rsquo;s a broadly applicable approach for safely integrating external dependencies into any project.\n","date":"2026-02-19T00:00:00+09:00","image":"/images/posts/2026-02-19-hugo-theme-comparison-blog-setup/cover-en.jpg","permalink":"/posts/2026-02-19-hugo-theme-comparison-blog-setup/","title":"Hugo Theme Comparison \u0026 GitHub Pages Setup Guide"},{"content":"Overview Every day I browse through countless technical docs and GitHub repos, but that exploration disappears the moment I close the tab. log-blog is a Python CLI tool that reads Chrome browsing history and automatically converts it into Hugo-compatible blog posts.\nWhat Is log-blog? ice-ice-bear/log-blog automates the \u0026ldquo;explore → organize → share\u0026rdquo; cycle. 
It extracts data from Chrome\u0026rsquo;s SQLite history database, fetches content from each URL using Playwright, converts it to Hugo-compatible markdown, and commits the result to the blog repository.\nPipeline Structure graph TD A[Chrome History SQLite DB] --\u003e|extract| B[URL + Title + Timestamp JSON] B --\u003e|AI classify| C[Tech vs Non-tech] C --\u003e|fetch| D[Enriched Content] D --\u003e|AI write| E[Hugo Markdown Post] E --\u003e|publish| F[Git Commit to GitHub Pages]Step 1: Extract log-blog extract --json --hours 24 Reads the most recent N hours of visit history from Chrome\u0026rsquo;s SQLite history DB. Outputs URL, title, visit count, and last visit time as JSON.\nStep 2: Classify Through the Claude Code skill system, AI classifies each URL as tech or non-tech, then groups the results into YouTube, GitHub, and Docs/Web categories.\nStep 3: Fetch log-blog fetch --json \u0026#34;URL1\u0026#34; \u0026#34;URL2\u0026#34; \u0026#34;URL3\u0026#34; Content is collected using a strategy appropriate to the URL type:\nURL Type What\u0026rsquo;s Collected YouTube Full transcript text (Korean preferred) GitHub repo Description, stars, language, README, recent commits GitHub PR Title, state, body, diff stats, comments GitHub issue Title, state, labels, body, comments Web page Full text, heading structure, code blocks Step 4: Write \u0026amp; Publish AI writes a technical blog post from the collected content, then the publish command commits it to the blog repository.\nlog-blog publish post.md # Local commit only log-blog publish post.md --push # Commit + push Tech Stack src/log_blog/ cli.py # CLI entry point (extract, fetch, publish) config.py # YAML config loader history_reader.py # Chrome SQLite history reader content_fetcher.py # Playwright-based content extractor post_generator.py # Hugo markdown post generator publisher.py # Git commit/push Python 3.12+ — primary language Playwright — browser automation for dynamic page content SQLite — direct access to 
Chrome\u0026rsquo;s history DB Claude Code Skill — AI classification, summarization, and writing Configuration chrome: profiles: [\u0026#34;Default\u0026#34;] history_db_base: \u0026#34;~/Library/Application Support/Google/Chrome\u0026#34; time_range_hours: 24 blog: repo_path: \u0026#34;~/Documents/github/ice-ice-bear.github.io\u0026#34; content_dir: \u0026#34;content/posts\u0026#34; language: \u0026#34;auto\u0026#34; playwright: headless: true timeout_ms: 15000 max_concurrent: 5 Quick Links mindai/mega-code PR #26 — add_behavioral_validation (Upskill) mindai/megaupskill — MegaUpskill project Insights The core value of log-blog is \u0026ldquo;turning exploration itself into content.\u0026rdquo; The technical browsing that happens every day in a browser is already a learning process — but without capturing it, it vanishes. This tool automatically captures and structures that process. In fact, this very post was produced by that pipeline — Chrome history extraction → AI classification → content fetch → AI writing → blog deployment.\n","date":"2026-02-19T00:00:00+09:00","image":"/images/posts/2026-02-19-log-blog-browser-history-automation/cover-en.jpg","permalink":"/posts/2026-02-19-log-blog-browser-history-automation/","title":"log-blog — Turning Browser History into Blog Posts Automatically"},{"content":"Overview Today I explored Hugo blog themes extensively while planning a blog refresh, worked through the Vibe Coding Essentials book to sharpen my Claude Code workflow, discovered Archon — a new tool for AI coding workflows — and looked into aiosqlite for async SQLite access in Python.\nHighlights Hugo Theme Deep-Dive — Comparing 10 Themes for a Blog Refresh I spent time in the Hugo Themes Gallery comparing themes for an upcoming blog refresh. I visited the demo sites for 10 themes and here\u0026rsquo;s what stood out.\nBlog themes:\nPaperMod (★13,116) — The most popular Hugo theme. Fast, clean, and responsive. 
Supports three layout modes (Regular, Home-Info, Profile), automatic dark/light switching, SEO optimization, and Fuse.js-powered search. No external dependencies like webpack or Node.js required for theme customization — a big plus. Stack (★6,261) — A card-style theme built for bloggers. Visually polished layout, with docs in Korean, English, and Chinese. GPL-3.0 license. Coder (★3,031) — Simple, clean personal blog theme with dark mode. MIT license. Terminal (★2,680) — Retro terminal aesthetic. Great for developers who want personality in their blog. Documentation and portfolio themes:\nBlox Tailwind (★10,025) — 50+ color themes and widgets included. Works for company sites, portfolios, and blogs. Book (★3,953) — Clean book-style documentation theme. Docsy (★2,903) — Dedicated theme for technical documentation sites. Apache 2.0 license. Compose, Bootstrap — Clean documentation-style and Bootstrap-based themes respectively. For the deployment side, this Hugo + GitHub Pages guide compares Jekyll, Hexo, and Hugo as static site generators, then covers why Hugo wins on build speed (Go-based, no external dependencies), the GitHub Pages deployment process, setting up Utterances for comments, and managing themes as git submodules.\nArchon — Knowledge Hub for AI Coding Assistants Archon is a knowledge and task management platform for AI coding assistants. It runs as an MCP (Model Context Protocol) server and connects to Claude Code, Cursor, Windsurf, and other AI coding tools.\nCore capabilities:\nKnowledge management: Website crawling, PDF/document uploads, automatic code example extraction, vector search-based RAG Project/task management: Hierarchical project structure with AI-assisted task creation Microservice architecture: Frontend (React+Vite, port 3737), API Server (FastAPI, port 8181), MCP Server (port 8051), Agents (PydanticAI, port 8052) Spins up with Docker Compose. Uses Supabase (PostgreSQL + PGVector) as the database. 
Supports OpenAI, Ollama, Google Gemini, and other LLMs, with advanced RAG strategies including hybrid search and result re-ranking.\nCole Medin\u0026rsquo;s YouTube guide shows real AI coding workflow examples.\nVibe Coding Essentials with Claude Code Worked through sections 7–10 of Chapter 02 in Weniv Books\u0026rsquo; Vibe Coding Essentials. The chapter covers practical development patterns with Claude Code — a hands-on guide to using AI coding tools effectively.\nPython aiosqlite — Async SQLite Studied a guide to aiosqlite for async SQLite access in Python. The standard sqlite3 module is synchronous, which means DB operations block the event loop in an async context. aiosqlite fixes this — it wraps sqlite3 and lets you run DB operations without blocking other coroutines. The API is nearly identical to sqlite3; just add async with and await.\nimport aiosqlite import asyncio async def main(): async with aiosqlite.connect(\u0026#39;example.db\u0026#39;) as con: cur = await con.cursor() await cur.execute(\u0026#39;SELECT * FROM stocks WHERE symbol=:symbol\u0026#39;, {\u0026#39;symbol\u0026#39;: \u0026#39;RHAT\u0026#39;}) data = await cur.fetchall() print(data) asyncio.run(main()) Quick Links Homebrew + VS Code via Homebrew — macOS dev environment setup AWS EC2 (ap-northeast-2) — Frontend deployment and instance management AI/dev YouTubers worth following: Cole Medin, Corbin Brown, Rok Benko, Fabio Bergmann Insights Two clear threads ran through today\u0026rsquo;s browsing. First, blog infrastructure renewal — comparing 10 Hugo themes and revisiting the GitHub Pages deployment workflow shows a drive to run the blog more systematically. PaperMod and Stack got the most attention. Second, leveling up the AI coding workflow — exploring Archon, the Vibe Coding Essentials book, and several AI dev YouTubers all point toward moving beyond casual AI tool use toward structured knowledge management and integrated workflows. 
Archon\u0026rsquo;s MCP server approach looks particularly useful in environments where multiple AI coding tools are running in parallel.\n","date":"2026-02-19T00:00:00+09:00","image":"/images/posts/2026-02-19-tech-log/cover-en.jpg","permalink":"/posts/2026-02-19-tech-log/","title":"Tech Log: 2026-02-19"}]