<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Claude Opus 4 8 on ICE-ICE-BEAR-BLOG</title><link>https://ice-ice-bear.github.io/tags/claude-opus-4-8/</link><description>Recent content in Claude Opus 4 8 on ICE-ICE-BEAR-BLOG</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 28 May 2026 00:00:00 +0900</lastBuildDate><atom:link href="https://ice-ice-bear.github.io/tags/claude-opus-4-8/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Opus 4.8 — One Step on Capability, One on Policy: A More Honest Model at the Same Price, Plus a 2028 Scenario</title><link>https://ice-ice-bear.github.io/posts/2026-05-28-claude-opus-4-8-launch/</link><pubDate>Thu, 28 May 2026 00:00:00 +0900</pubDate><guid>https://ice-ice-bear.github.io/posts/2026-05-28-claude-opus-4-8-launch/</guid><description>&lt;img src="https://ice-ice-bear.github.io/" alt="Featured image of post Claude Opus 4.8 — One Step on Capability, One on Policy: A More Honest Model at the Same Price, Plus a 2028 Scenario" /&gt;&lt;h2 id="overview"&gt;Overview
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://www.anthropic.com" target="_blank" rel="noopener"
 &gt;Anthropic&lt;/a&gt; shipped &lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-8" target="_blank" rel="noopener"
 &gt;Claude Opus 4.8&lt;/a&gt;: an incremental upgrade that lifts coding, reasoning, and agentic performance while holding pricing flat ($5/$25 per million input/output tokens). What makes the week interesting is that, around the same time, Anthropic also published something that is not a model at all — the policy paper &lt;a class="link" href="https://www.anthropic.com/research/2028-ai-leadership" target="_blank" rel="noopener"
 &gt;2028: Two Scenarios for Global AI Leadership&lt;/a&gt;. One release on capability, one on policy. Read side by side, they make it much clearer what game a frontier lab is actually playing right now.&lt;/p&gt;
&lt;pre class="mermaid" style="visibility:hidden"&gt;graph TD
 A["Claude Opus 4.8 &amp;lt;br/&amp;gt; same price higher judgment"] --&gt; B["Coding agent &amp;lt;br/&amp;gt; Online-Mind2Web 84%"]
 A --&gt; C["Honesty &amp;lt;br/&amp;gt; 4x fewer missed code flaws"]
 A --&gt; D["Product features &amp;lt;br/&amp;gt; dynamic workflows / effort / fast mode"]
 E["2028 AI leadership &amp;lt;br/&amp;gt; policy paper"] --&gt; F["Export controls &amp;lt;br/&amp;gt; chips and equipment"]
 E --&gt; G["Distillation defense"]
 E --&gt; H["Global adoption of US AI"]
 A -.around the same time.-&gt; E&lt;/pre&gt;&lt;h2 id="same-price-higher-judgment"&gt;Same Price, Higher Judgment
&lt;/h2&gt;&lt;p&gt;The headline message of &lt;a class="link" href="https://www.anthropic.com/claude/opus" target="_blank" rel="noopener"
 &gt;Opus 4.8&lt;/a&gt; is &amp;ldquo;we raised judgment, not the price.&amp;rdquo; The &lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-8" target="_blank" rel="noopener"
 &gt;official announcement&lt;/a&gt; touts gains across coding, reasoning, and agentic tasks, but it leads with &lt;strong&gt;honesty&lt;/strong&gt; rather than benchmark numbers. Early testers praised the model for flagging its own uncertainties and avoiding unsupported claims, and Anthropic reports it is roughly four times less likely than the previous generation, Opus 4.7, to overlook a code flaw. In agentic coding, a &amp;ldquo;plausible but wrong&amp;rdquo; answer costs more than an unnecessary refusal, because the error carries silently into later steps while a refusal is visible immediately. This improvement aims squarely at that asymmetry.&lt;/p&gt;
&lt;p&gt;Holding the price is itself a signal. On the &lt;a class="link" href="https://www.anthropic.com/pricing" target="_blank" rel="noopener"
 &gt;Anthropic pricing page&lt;/a&gt;, the Opus tier sits at $5 in / $25 out per million tokens — identical to 4.7. Against a backdrop of frontier labs raising prices generation over generation, layering capability onto a flat price reads as positioning that is acutely aware of per-token cost comparisons with &lt;a class="link" href="https://openai.com" target="_blank" rel="noopener"
 &gt;competing models&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="opus-as-an-agent-benchmarks-and-new-features"&gt;Opus as an Agent: Benchmarks and New Features
&lt;/h2&gt;&lt;p&gt;Among the benchmarks called out, the web-agent evaluation stands out. Anthropic reports 84% on &lt;a class="link" href="https://huggingface.co/datasets/osunlp/Online-Mind2Web" target="_blank" rel="noopener"
 &gt;Online-Mind2Web&lt;/a&gt;, the live-web benchmark in the &lt;a class="link" href="https://osu-nlp-group.github.io/Mind2Web/" target="_blank" rel="noopener"
 &gt;Mind2Web&lt;/a&gt; family. Because it runs multi-step tasks against real websites, it speaks more directly to &amp;ldquo;how usable is this as an agent&amp;rdquo; than static QA does.&lt;/p&gt;
&lt;p&gt;The product layer changed too. &lt;a class="link" href="https://github.com/anthropics/claude-code" target="_blank" rel="noopener"
 &gt;Claude Code&lt;/a&gt; gained &lt;strong&gt;dynamic workflows&lt;/strong&gt; that split large tasks into parallel subagents (&lt;a class="link" href="https://docs.claude.com/en/docs/claude-code/overview" target="_blank" rel="noopener"
 &gt;Claude Code docs&lt;/a&gt;). &lt;a class="link" href="https://claude.ai" target="_blank" rel="noopener"
 &gt;claude.ai&lt;/a&gt; added an &lt;strong&gt;effort control&lt;/strong&gt; that lets users trade quality against speed directly, alongside a &lt;strong&gt;fast mode&lt;/strong&gt; priced at roughly a third of what it cost before. In effect, the dial for making one model &amp;ldquo;think harder&amp;rdquo; or &amp;ldquo;answer faster&amp;rdquo; now sits in the user&amp;rsquo;s hands.&lt;/p&gt;
&lt;p&gt;The announcement frames Opus 4.8 as a &amp;ldquo;modest improvement&amp;rdquo; and positions it as a preview of Mythos-class models slated for broader release within weeks. Explicitly staging an incremental release lines up with the recent cadence visible in the &lt;a class="link" href="https://www.anthropic.com/news" target="_blank" rel="noopener"
 &gt;Anthropic newsroom&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="the-policy-paper-placed-right-beside-it"&gt;The Policy Paper Placed Right Beside It
&lt;/h2&gt;&lt;p&gt;The real story of the week is that the &lt;a class="link" href="https://www.anthropic.com/research/2028-ai-leadership" target="_blank" rel="noopener"
 &gt;2028 AI leadership scenarios&lt;/a&gt; paper sat right next to the capability release. Its core claim: &amp;ldquo;the political systems in which the most advanced AI is created will shape the rules and norms for how the technology is developed and deployed.&amp;rdquo; The paper sketches two futures — one where the US holds a 12-to-24-month intelligence lead and democracies set global AI norms, and one where the gap closes and authoritarian surveillance becomes possible at scale.&lt;/p&gt;
&lt;p&gt;The recommendations compress to three. First, tighten &lt;a class="link" href="https://www.bis.doc.gov/" target="_blank" rel="noopener"
 &gt;export controls&lt;/a&gt; on advanced chips and manufacturing equipment. Second, defend against &lt;strong&gt;distillation&lt;/strong&gt; attacks that illegally harvest the outputs of US models to replicate their capabilities. Third, promote global adoption of American AI systems. The paper goes further, naming compute as the decisive variable and grounding the argument in more than a decade of &amp;ldquo;model capability scaling with compute.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="insights"&gt;Insights
&lt;/h2&gt;&lt;p&gt;Read together, the two releases make it clear that a frontier lab&amp;rsquo;s strategy runs on more than one axis. On one side is &lt;strong&gt;product competition&lt;/strong&gt; — pricing held flat while honesty and agentic performance climb, as in &lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-8" target="_blank" rel="noopener"
 &gt;Opus 4.8&lt;/a&gt;. On the other is &lt;strong&gt;policy competition&lt;/strong&gt; running in parallel — &lt;a class="link" href="https://www.anthropic.com/research/2028-ai-leadership" target="_blank" rel="noopener"
 &gt;export controls and distillation defense&lt;/a&gt;. The honesty gains reported in the announcement and the &amp;ldquo;democracies should lead&amp;rdquo; thesis in the policy paper grow from the same root: who builds trustworthy AI, and under whose norms.&lt;/p&gt;
&lt;p&gt;For practitioners, the more consequential shift is the product-layer dial. The &lt;a class="link" href="https://www.anthropic.com/claude/opus" target="_blank" rel="noopener"
 &gt;effort control&lt;/a&gt;, &lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-8" target="_blank" rel="noopener"
 &gt;fast mode&lt;/a&gt;, and &lt;a class="link" href="https://github.com/anthropics/claude-code" target="_blank" rel="noopener"
 &gt;Claude Code&lt;/a&gt; dynamic workflows break the &amp;ldquo;one model, one speed&amp;rdquo; assumption. Cost, latency, and quality trade-offs are likely to migrate from &lt;em&gt;which model you pick&lt;/em&gt; to &lt;em&gt;how you set the dial within the same model&lt;/em&gt;. If the claim that the model is roughly four times less likely to overlook a code flaw holds, the distribution of human review time across an agentic coding pipeline changes shape, with less of it spent hunting for defects the model missed. That said, the honesty and benchmark figures are all vendor-reported, so a number like 84% on &lt;a class="link" href="https://huggingface.co/datasets/osunlp/Online-Mind2Web" target="_blank" rel="noopener"
 &gt;Online-Mind2Web&lt;/a&gt; is best treated as directional until independently reproduced. And as the &lt;a class="link" href="https://www.anthropic.com/research/2028-ai-leadership" target="_blank" rel="noopener"
 &gt;2028 scenarios&lt;/a&gt; suggest, the &lt;a class="link" href="https://deepmind.google/models/gemini/" target="_blank" rel="noopener"
 &gt;compute&lt;/a&gt; and the norms that dial runs on will increasingly depend on variables that are not technical at all.&lt;/p&gt;
&lt;h2 id="references"&gt;References
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Official announcement / product&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/news/claude-opus-4-8" target="_blank" rel="noopener"
 &gt;Claude Opus 4.8 announcement&lt;/a&gt; — flat price, improved coding/reasoning/honesty, new features (dynamic workflows / effort / fast mode)&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/claude/opus" target="_blank" rel="noopener"
 &gt;Claude Opus model page&lt;/a&gt; — Opus tier overview&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/pricing" target="_blank" rel="noopener"
 &gt;Anthropic pricing&lt;/a&gt; — Opus $5/$25 per million tokens&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/anthropics/claude-code" target="_blank" rel="noopener"
 &gt;Claude Code&lt;/a&gt; — agentic coding CLI that gained dynamic workflows / parallel subagents&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://docs.claude.com/en/docs/claude-code/overview" target="_blank" rel="noopener"
 &gt;Claude Code docs&lt;/a&gt; — feature and workflow reference&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://claude.ai" target="_blank" rel="noopener"
 &gt;claude.ai&lt;/a&gt; — consumer interface exposing the effort control&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Policy / research&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/research/2028-ai-leadership" target="_blank" rel="noopener"
 &gt;2028: Two Scenarios for Global AI Leadership&lt;/a&gt; — export controls, distillation defense, global adoption of US AI&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/responsible-scaling-policy" target="_blank" rel="noopener"
 &gt;Anthropic Responsible Scaling Policy&lt;/a&gt; — background on capability/risk scaling policy&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.bis.doc.gov/" target="_blank" rel="noopener"
 &gt;US BIS export controls&lt;/a&gt; — agency overseeing semiconductor and equipment controls&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Benchmarks / background&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://osu-nlp-group.github.io/Mind2Web/" target="_blank" rel="noopener"
 &gt;Mind2Web&lt;/a&gt; — web-agent evaluation project&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://huggingface.co/datasets/osunlp/Online-Mind2Web" target="_blank" rel="noopener"
 &gt;Online-Mind2Web dataset&lt;/a&gt; — live-web multi-step agent benchmark&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.anthropic.com/news" target="_blank" rel="noopener"
 &gt;Anthropic newsroom&lt;/a&gt; — recent release cadence&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://openai.com" target="_blank" rel="noopener"
 &gt;OpenAI&lt;/a&gt; · &lt;a class="link" href="https://deepmind.google/models/gemini/" target="_blank" rel="noopener"
 &gt;Google DeepMind Gemini&lt;/a&gt; — frontier labs for per-token cost and compute comparison&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>