<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Llm Reliability on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/llm-reliability/</link><description>Recent content in Llm Reliability on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 28 May 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/llm-reliability/index.xml" rel="self" type="application/rss+xml"/><item><title>The Orchestration Layer Is Racing Ahead of Its Primitives — bmad-method and Two Bugs Filed the Same Week</title><link>https://ice-ice-bear.github.io/posts/2026-05-28-orchestration-ahead-of-primitives/</link><pubDate>Thu, 28 May 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-05-28-orchestration-ahead-of-primitives/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post The Orchestration Layer Is Racing Ahead of Its Primitives — bmad-method and Two Bugs Filed the Same Week" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;The agentic coding ecosystem is moving in two directions at once. Frameworks like &lt;a class="link" href="https://github.com/bmad-code-org/bmad-method" target="_blank" rel="noopener"
 &gt;bmad-method&lt;/a&gt; pile more personas and workflows on top of a base coding agent, growing the orchestration layer. Yet two bugs filed the same week — an OAuth crash in &lt;a class="link" href="https://github.com/openai/codex" target="_blank" rel="noopener"
 &gt;OpenAI Codex&lt;/a&gt; and permanent session corruption in &lt;a class="link" href="https://github.com/anthropics/claude-code" target="_blank" rel="noopener"
 &gt;Claude Code&lt;/a&gt; — show that the primitives underneath are still brittle.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;graph TD
 A["Orchestration layer &amp;lt;br/&amp;gt; more agents more workflows"] --&gt; B["bmad-method v6 &amp;lt;br/&amp;gt; 12+ personas Party Mode"]
 A --&gt; C["Base coding agents &amp;lt;br/&amp;gt; Claude Code / Codex / Cursor"]
 C --&gt; D["Codex OAuth &amp;lt;br/&amp;gt; NoneType not iterable"]
 C --&gt; E["Claude Code session &amp;lt;br/&amp;gt; thinking empty text signature kept"]
 D --&gt; F["Whole team blocked &amp;lt;br/&amp;gt; HTTP status None"]
 E --&gt; G["400 on resume &amp;lt;br/&amp;gt; session poisoned forever"]
 B -.filed the same week.-&gt; D
 B -.filed the same week.-&gt; E&lt;/pre&gt;&lt;h2 id="building-up-bmad-methods-orchestration"&gt;Building Up: bmad-method&amp;rsquo;s Orchestration
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/bmad-code-org/bmad-method" target="_blank" rel="noopener"
 &gt;bmad-method&lt;/a&gt; — short for &amp;ldquo;Breakthrough Method for Agile AI-Driven Development&amp;rdquo; — is an MIT-licensed &lt;a class="link" href="https://nodejs.org" target="_blank" rel="noopener"
 &gt;JavaScript&lt;/a&gt; project with more than 48,000 GitHub stars. Its core idea is to have multiple role-separated AI personas collaborate rather than to call a single coding agent. It defines 12-plus specialized personas (PM, Architect, Developer, UX, and others) and even offers a &lt;strong&gt;Party Mode&lt;/strong&gt; where several personas talk in one session.&lt;/p&gt;
&lt;p&gt;V6 leads with &lt;strong&gt;scale-adaptive planning&lt;/strong&gt; that tunes planning depth to the size of the task. The module ecosystem is broad too: a BMM core packing 34-plus workflows ships alongside BMad Builder, Test Architect, Game Dev, and a Creative Intelligence Suite. Installation is a single &lt;code&gt;npx bmad-method install&lt;/code&gt;, which drops the framework into a base agent such as &lt;a class="link" href="https://github.com/anthropics/claude-code" target="_blank" rel="noopener"
 &gt;Claude Code&lt;/a&gt; or &lt;a class="link" href="https://cursor.com" target="_blank" rel="noopener"
 &gt;Cursor&lt;/a&gt;. The package lives on &lt;a class="link" href="https://www.npmjs.com/package/bmad-method" target="_blank" rel="noopener"
 &gt;npm&lt;/a&gt;, and the docs sit at &lt;a class="link" href="https://docs.bmad-method.org" target="_blank" rel="noopener"
 &gt;docs.bmad-method.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The key point is that bmad-method does &lt;em&gt;not replace&lt;/em&gt; the base agent; it is a coordination layer stacked on top. For 12-plus personas and 34-plus workflows to run smoothly, the model calls and session-state handling beneath them have to be solid first. And it is precisely down there that two failures surfaced the same week.&lt;/p&gt;
&lt;h2 id="breaking-below-part-1-codex-oauth-spits-a-nonetype"&gt;Breaking Below, Part 1: Codex OAuth Spits a NoneType
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/openai/codex/issues/24665" target="_blank" rel="noopener"
 &gt;OpenAI Codex issue #24665&lt;/a&gt; (now CLOSED) is a case where an entire team lost access to &lt;a class="link" href="https://github.com/openai/codex" target="_blank" rel="noopener"
 &gt;Codex&lt;/a&gt; at once. The auth path is central: the setup used ChatGPT/Codex &lt;strong&gt;OAuth&lt;/strong&gt;, not an API key. The logs showed provider &lt;code&gt;openai-codex&lt;/code&gt;, model &lt;code&gt;gpt-5.5&lt;/code&gt;, endpoint &lt;code&gt;[&amp;quot;chatgpt.com/backend-api/codex&amp;quot;]&lt;/code&gt;, and this error:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;TypeError: &amp;#39;NoneType&amp;#39; object is not iterable
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;HTTP status: None
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Non-retryable client error (HTTP None). Aborting.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The shape of the symptom is telling. The HTTP status is not a number but &lt;code&gt;None&lt;/code&gt;, and the error is classified non-retryable, so it aborts immediately. This is the classic pattern of &lt;strong&gt;the backend returning null/malformed data, or the client failing to handle a missing field&lt;/strong&gt;, then blowing up at parse time when it tries to iterate over &lt;code&gt;None&lt;/code&gt;. That the HTTP layer never even populated a status code suggests the failure happened before body validation — at the stage of constructing the response object itself.&lt;/p&gt;
&lt;p&gt;The most painful part is the blast radius. Not just one user but the whole team sharing the same OAuth setup was blocked simultaneously. No matter how cleanly an orchestration framework partitions personas, if all of those personas ultimately call the model through the same OAuth backend, a single mishandled &lt;code&gt;None&lt;/code&gt; from that backend stalls the entire stack above it.&lt;/p&gt;
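&lt;p&gt;A minimal Python sketch of the kind of guard that would turn this crash into a readable error. Everything named here is hypothetical: the &lt;code&gt;items&lt;/code&gt; field, the &lt;code&gt;MalformedResponse&lt;/code&gt; exception, and the retry policy are illustrations, not the actual code of the Codex client. Treating a missing status as retryable is an assumption, not something the issue prescribes.&lt;/p&gt;

```python
from typing import Optional


class MalformedResponse(Exception):
    """The backend payload is missing a field the client needs."""


def extract_items(payload: Optional[dict], field: str = "items"):
    """Return payload[field] as a list, or raise a descriptive error.

    The field name "items" is hypothetical; it stands in for whatever
    collection the client expected to iterate over.
    """
    if not isinstance(payload, dict):
        raise MalformedResponse(
            f"expected a JSON object, got {type(payload).__name__}"
        )
    value = payload.get(field)
    if value is None:
        # Iterating over None is what raises
        # "'NoneType' object is not iterable" in the unguarded version.
        raise MalformedResponse(f"field {field!r} is missing or null")
    return list(value)


def is_retryable(status: Optional[int]):
    """Assumed policy: no status means no HTTP response was parsed,
    so treat it like a transient transport failure."""
    if status is None:
        return True
    return status == 429 or status in range(500, 600)
```

&lt;p&gt;With a guard of this shape, a null field surfaces as a named error that points at the missing data, and a status of &lt;code&gt;None&lt;/code&gt; is no longer lumped in with genuine client errors.&lt;/p&gt;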
&lt;h2 id="breaking-below-part-2-how-claude-code-poisons-a-session-forever"&gt;Breaking Below, Part 2: How Claude Code Poisons a Session Forever
&lt;/h2&gt;&lt;p&gt;The technically more interesting one is &lt;a class="link" href="https://github.com/anthropics/claude-code/issues/63147" target="_blank" rel="noopener"
 &gt;Claude Code issue #63147&lt;/a&gt; (OPEN, Claude Code 2.1.153). A session that combined extended thinking with tool calls becomes &lt;strong&gt;permanently broken when you resume or continue it.&lt;/strong&gt; Once the failure starts, every new prompt, even a no-op, returns the identical 400.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;API Error: 400 messages.1.content.5: `thinking` or `redacted_thinking` blocks in the
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;latest assistant message cannot be modified.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The root cause is in how the transcript is persisted. Claude Code stores thinking blocks in the session transcript JSONL file (&lt;code&gt;projects/&amp;lt;slug&amp;gt;/&amp;lt;id&amp;gt;.jsonl&lt;/code&gt;), but it &lt;strong&gt;empties the &lt;code&gt;thinking&lt;/code&gt; text to &lt;code&gt;&amp;quot;&amp;quot;&lt;/code&gt; while retaining the original &lt;code&gt;signature&lt;/code&gt; field.&lt;/strong&gt; A block on disk looks like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;type&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;thinking&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;thinking&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nt"&gt;&amp;#34;signature&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;lt;base64, ~600-4000 chars&amp;gt;&amp;#34;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;On resume, Claude Code replays this block to the API verbatim — &lt;code&gt;{type:&amp;quot;thinking&amp;quot;, thinking:&amp;quot;&amp;quot;, signature:&amp;lt;original&amp;gt;}&lt;/code&gt;. But the signature was &lt;strong&gt;computed over the original, non-empty thinking text.&lt;/strong&gt; The API validates the signature against the (now empty) text, the two no longer match, and it returns 400. Because the original text is already gone from disk, there is no way to reconstruct the request into a valid form — &lt;strong&gt;the session is permanently poisoned.&lt;/strong&gt;&lt;/p&gt;
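&lt;p&gt;The mismatch can be illustrated with a generic keyed signature. This is only a stand-in: the real signing scheme runs server-side and is opaque to the client, so the HMAC, the key, and the &lt;code&gt;sign&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; helpers below are assumptions chosen to show the principle, not a description of how the API works internally.&lt;/p&gt;

```python
import hashlib
import hmac

# Hypothetical stand-in key; the real scheme is server-side and opaque.
SECRET = b"demo-key-not-a-real-secret"


def sign(text: str):
    """Return a hex signature bound to the exact bytes of `text`."""
    return hmac.new(SECRET, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(text: str, signature: str):
    """True only if `signature` was computed over this exact `text`."""
    return hmac.compare_digest(sign(text), signature)


original = "step 1: read the file; step 2: call the tool"
block = {"type": "thinking", "thinking": original, "signature": sign(original)}

# Simulate the persistence step described above: text blanked, signature kept.
persisted = dict(block, thinking="")

intact_ok = verify(block["thinking"], block["signature"])
blanked_ok = verify(persisted["thinking"], persisted["signature"])
print(intact_ok, blanked_ok)
```

&lt;p&gt;The intact block verifies and the blanked one does not. Nothing the client still holds can make the second check pass, which is why the damage cannot be undone after the fact.&lt;/p&gt;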
&lt;p&gt;The reporter dumps a broken transcript with &lt;code&gt;jq&lt;/code&gt; and shows every thinking block has text length 0 but a signature of hundreds to thousands of characters:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;$ jq ... &amp;#39;select(.type==&amp;#34;thinking&amp;#34;)|[(.thinking|length),(.signature|length)]&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;0 3932
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;0 1196
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;0 620
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The scarier detail: across many seemingly healthy sessions, the &lt;em&gt;trailing&lt;/em&gt; thinking block is also frequently stored in the same &amp;ldquo;empty text plus retained signature&amp;rdquo; state. In other words, a large number of perfectly fine-looking sessions are &lt;strong&gt;latent landmines that detonate the moment they are resumed.&lt;/strong&gt; The proposed fixes split three ways — (1) persist the full signed thinking text so the signed block round-trips intact, (2) drop thinking blocks entirely from reconstructed prior turns (the API permits omitting earlier-turn thinking), or (3) add a defensive guard at request-build time that detects empty-text-with-signature blocks and strips them before sending.&lt;/p&gt;
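&lt;p&gt;Option (3) and the reporter&amp;rsquo;s transcript check can both be sketched in a few lines of Python. The message layout used here (a &lt;code&gt;content&lt;/code&gt; list of block objects, nested under a &lt;code&gt;message&lt;/code&gt; key in each transcript line) and all function names are assumptions for illustration, not Claude Code internals. Whether the API accepts a request after stripping may also depend on where the block sits in the conversation, so treat this as a starting point only.&lt;/p&gt;

```python
import json


def is_hollow_thinking(block):
    """True for a thinking block with empty text but a retained signature."""
    return (
        isinstance(block, dict)
        and block.get("type") == "thinking"
        and block.get("thinking", "") == ""
        and bool(block.get("signature"))
    )


def strip_hollow_thinking(messages):
    """Return a copy of `messages` without hollow thinking blocks.

    Assumes each message is a dict whose "content" is either a string
    or a list of block dicts (an assumption for this sketch).
    """
    cleaned = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            content = [b for b in content if not is_hollow_thinking(b)]
        cleaned.append({**message, "content": content})
    return cleaned


def count_hollow_blocks(jsonl_text):
    """Count hollow thinking blocks in transcript text (one JSON object per line).

    Assumes blocks live under record["message"]["content"]; adjust the
    lookup if the on-disk layout differs.
    """
    count = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        message = record.get("message") if isinstance(record, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if isinstance(content, list):
            count += sum(1 for b in content if is_hollow_thinking(b))
    return count
```

&lt;p&gt;Running &lt;code&gt;count_hollow_blocks&lt;/code&gt; over existing transcripts would give a rough count of sessions at risk before anyone tries to resume them.&lt;/p&gt;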
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;p&gt;Set the two bugs side by side and one tension comes into focus: the orchestration layer races ahead while the reliability of the primitives it rides on cannot keep pace. &lt;a class="link" href="https://github.com/bmad-code-org/bmad-method" target="_blank" rel="noopener"
 &gt;bmad-method&lt;/a&gt; coordinates 12-plus personas and 34-plus workflows, but every one of those calls ultimately runs on a thin foundation: one OAuth token and one transcript file. The &lt;a class="link" href="https://github.com/openai/codex" target="_blank" rel="noopener"
 &gt;Codex&lt;/a&gt; &lt;code&gt;None&lt;/code&gt; crash is about &lt;em&gt;a client failing to die gracefully when the backend breaks its promised shape&lt;/em&gt;; the &lt;a class="link" href="https://github.com/anthropics/claude-code" target="_blank" rel="noopener"
 &gt;Claude Code&lt;/a&gt; thinking bug is about &lt;em&gt;a client irreversibly mis-serializing its own state&lt;/em&gt;. The former is input validation, the latter is state persistence — neither is a flashy agent feature, just the 30-year-old fundamentals of software engineering.&lt;/p&gt;
&lt;p&gt;The Claude Code bug in particular compresses a classic distributed-systems trap. A signature is an integrity promise about some data; erase the data while keeping the signature and the promise becomes a lie. And because the failure surfaces &lt;em&gt;at resume time, not at save time&lt;/em&gt;, the user loses an entire long working session with no warning and no recovery path. The more orchestration encourages long, complex sessions, the larger the blast radius of these latent mines. The honest conclusion: before stacking more agents, the code that parses a response from the auth backend and the code that serializes a conversation transcript have to be boringly robust first. More than a flashy Party Mode, what is urgent right now is a parser that does not die when it meets a &lt;code&gt;None&lt;/code&gt;, and a persistence layer that round-trips signatures together with their text.&lt;/p&gt;
&lt;h2 id="references"&gt;References
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Framework / orchestration&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/bmad-code-org/bmad-method" target="_blank" rel="noopener"
 &gt;bmad-method (GitHub)&lt;/a&gt; — 12+ personas, Party Mode, scale-adaptive planning, MIT, 48k+ stars&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.bmad-method.org" target="_blank" rel="noopener"
 &gt;bmad-method docs&lt;/a&gt; — module and workflow reference&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.npmjs.com/package/bmad-method" target="_blank" rel="noopener"
 &gt;bmad-method (npm)&lt;/a&gt; — package shipped via &lt;code&gt;npx bmad-method install&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/bmad-code-org" target="_blank" rel="noopener"
 &gt;bmad-code-org&lt;/a&gt; — the org maintaining the project&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Bug reports (same week)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/openai/codex/issues/24665" target="_blank" rel="noopener"
 &gt;OpenAI Codex issue #24665&lt;/a&gt; — OAuth NoneType crash, whole team blocked (CLOSED)&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/anthropics/claude-code/issues/63147" target="_blank" rel="noopener"
 &gt;Claude Code issue #63147&lt;/a&gt; — session permanently poisoned by empty-text-but-signed thinking blocks (OPEN, 2.1.153)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Base agents / runtimes&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/anthropics/claude-code" target="_blank" rel="noopener"
 &gt;Claude Code&lt;/a&gt; · &lt;a class="link" href="https://docs.claude.com/en/docs/claude-code/overview" target="_blank" rel="noopener"
 &gt;Claude Code docs&lt;/a&gt; — agent CLI supporting extended thinking and tool calls&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/openai/codex" target="_blank" rel="noopener"
 &gt;OpenAI Codex&lt;/a&gt; — gpt-5.5-based coding agent&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://cursor.com" target="_blank" rel="noopener"
 &gt;Cursor&lt;/a&gt; — editor-based agent that bmad-method installs into&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://nodejs.org" target="_blank" rel="noopener"
 &gt;Node.js&lt;/a&gt; · &lt;a class="link" href="https://docs.astral.sh/uv/" target="_blank" rel="noopener"
 &gt;uv&lt;/a&gt; — JS/Python agent-toolchain runtimes&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com" target="_blank" rel="noopener"
 &gt;Anthropic&lt;/a&gt; · &lt;a class="link" href="https://openai.com" target="_blank" rel="noopener"
 &gt;OpenAI&lt;/a&gt; — base model providers&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>