Back to blog
Model Release·Technical Deep Dive

Claude Opus 4.8 Release, Benchmarks And More

Claude Opus 4.8 scores 88.6% on SWE-bench Verified, 74.6% on Terminal-Bench 2.1, 1890 Elo on GDPval-AA, with parallel-subagent workflows and a 2.5x fast mode. Same $5/$25 pricing.

Jonathan Chavez
Jonathan Chavez
Co-Founder @ LLM Stats
·10 min read
Claude Opus 4.8 Release, Benchmarks And More

Key Numbers

Opus 4.8 · May 28, 2026

0.0%
SWE-bench Verified
0.0%
GPQA Diamond
0.0%
Terminal-Bench 2.1
0.0%
OSWorld-Verified
0
GDPval-AA (Elo)
0M
Context Window

Speed & Price

same model · two modes

Standard
$5 / $25 · per 1M tokens
2.5×
Fast mode
$10 / $50 · per 1M tokens

Standard pricing is unchanged from Opus 4.7. The optional fast mode runs at 2.5× the speed for double the per-token rate, and it is three times cheaper than fast mode on previous Claude models.

Anthropic released Claude Opus 4.8 on May 28, 2026. It is a direct upgrade to Opus 4.7 at the same price ($5 / $25 per million input / output tokens), and Anthropic positions it as its most capable general-access model at release. The benchmark gains are real but modest. The more interesting changes are in how you run it.

The headline number: 88.6% on SWE-bench Verified, up from 87.6% on Opus 4.7. Around it sit 74.6% on Terminal-Bench 2.1, 93.6% on GPQA Diamond, and a leading 1890 Elo on GDPval-AA. But the release is defined by four operational shifts: parallel-subagent dynamic workflows in Claude Code, mid-task system messages on the Messages API, an optional 2.5x fast mode, and measurable honesty improvements in the alignment assessment.


At a Glance

  • Release date: May 28, 2026. Generally available.
  • Model ID: claude-opus-4-8 on the Claude API.
  • Pricing: $5 per 1M input tokens, $25 per 1M output tokens. Same as Opus 4.7.
  • Fast mode: ~2.5x speed at $10 / $50 per 1M tokens (optional).
  • Context window: 1M input tokens / 128K output tokens.
  • Modalities: Text + vision input, text output.
  • Effort: Defaults to high; xhigh and max available for harder problems.
  • Claude Code: Dynamic workflows with parallel subagents for codebase-scale migrations.
  • Deployment: Claude.ai, Claude API, Amazon Bedrock, Google Vertex AI, Microsoft Foundry.

What's New in 4.8

None of these are architectural overhauls. Together they push Opus toward longer-horizon, lower-supervision agentic work.

Dynamic workflows in Claude Code

The marquee feature. Opus 4.8 can spin up parallel subagents that each plan, execute, and verify part of a task, coordinated by an orchestrator that merges their results. Where a single agent loop processes a large refactor sequentially, dynamic workflows split it across agents working at once.

Dynamic Workflows

one task · many agents

Orchestrator

plans, fans out,
merges results

01
Plan·
Execute·
Verify
02
Plan·
Execute·
Verify
03
Plan·
Execute·
Verify
04
Plan·
Execute·
Verify
In Claude Code, Opus 4.8 spawns parallel subagents that each plan, execute, and verify a slice of the work, then reports back to an orchestrator. The pattern is built for codebase-scale migrations that a single linear agent loop would take hours to grind through.

Anthropic frames the target use case as codebase-scale migrations: the kind of multi-file, multi-hour change where the bottleneck is throughput, not raw reasoning. The same effort control now extends to claude.ai and Cowork.

Mid-task system messages

A quieter but consequential developer change: the Messages API now accepts system entries inside the messages array, not just the top-level system parameter. Harnesses can update instructions partway through a task without breaking the prompt cache. For long agentic runs this means you can steer the model mid-flight, then keep paying cached-input rates on everything that came before.

Fast mode

Opus 4.8 introduces an optional fast mode that runs at roughly 2.5x the standard speed for double the per-token rate ($10 / $50 per million tokens). Notably it is three times cheaper than fast mode on previous Claude models, which makes interactive, latency-sensitive use of a frontier Opus model far more practical.

Higher default effort

Opus 4.8 defaults to high effort, with xhigh and max available for the hardest problems. The practical implication is the same as every recent Opus release: budget for more output tokens at the top effort levels, and measure token-per-task on your own traffic rather than assuming the aggregate holds.


Benchmarks

All scores are self-reported by Anthropic in the launch announcement and system card. Two benchmarks changed versions at 4.8 (Terminal-Bench 2.0 to 2.1, Finance Agent v1 to v2), so those are shown as standalone scores rather than deltas.

4.8 vs 4.7

Opus 4.8Opus 4.7
OSWorld-Verifiedharness updated
83.478.0+5.4
BrowseComp (single-agent)
84.379.3+5.0
SWE-bench Pro
69.264.3+4.9
MCP-Atlas
82.277.3+4.9
HLE (with tools)
57.954.7+3.2
SWE-bench Verified
88.687.6+1.0
GPQA Diamond
93.694.2-0.6
CharXiv-R (with tools)
89.991.0-1.1
Self-reported by Anthropic. Only same-version benchmarks shown. OSWorld 4.8 uses an updated harness, so part of its gain is methodology. Scores on a 0–100 scale.

Agentic coding

BenchmarkOpus 4.8Opus 4.7Delta
SWE-bench Verified88.6%87.6%+1.0
SWE-bench Pro69.2%64.3%+4.9
SWE-bench Multilingual84.4%
Terminal-Bench 2.174.6%69.4% (2.0)n/a

The +4.9 point jump on SWE-bench Pro is the real coding signal: SWE-bench Verified is approaching saturation, so the harder, less-saturated set is where headroom remains. Terminal-Bench moved to version 2.1, so its 74.6% is not directly comparable to 4.7's 2.0 score.

Reasoning & knowledge

BenchmarkOpus 4.8Opus 4.7Delta
GPQA Diamond93.6%94.2%-0.6
HLE (with tools)57.9%54.7%+3.2
HLE (without tools)49.8%46.9%+2.9
USAMO 202696.7%
GDPval-AA Elo1890

GPQA Diamond is flat within noise (both models sit above 93%), which is what saturation looks like. The clearer gains are on Humanity's Last Exam (+3.2 with tools) and the knowledge-work GDPval-AA evaluation from Artificial Analysis, where Opus 4.8 leads at 1890 Elo.

Agents: browse, tools, computer use

BenchmarkOpus 4.8Opus 4.7Delta
BrowseComp (single-agent)84.3%79.3%+5.0
BrowseComp (multi-agent)88.5%
MCP-Atlas82.2%77.3%+4.9
OSWorld-Verified83.4%78.0%+5.4 *
ScreenSpot-Pro87.9%
DeepSearchQA93.1%
Toolathlon59.9%

Agentic browsing and tool use are where Opus 4.8 moves most: +5.0 on BrowseComp single-agent (rising to 88.5% with a multi-agent orchestrator) and +4.9 on MCP-Atlas. The OSWorld-Verified gain (*) comes partly from an updated harness (zoom-tool fix, 128K max tokens per turn), so read it as a methodology + model improvement rather than a clean apples-to-apples delta.

Long context

Benchmark1M subset256K subset
GraphWalks BFS68.1%85.9%
GraphWalks Parents83.3%99.3%

The 256K results are strong; the 1M-token subset shows the usual degradation at the edge of the window. As always, treat the advertised 1M context as a ceiling, not a working budget.


Alignment & Honesty

The most distinctive part of this release is not a capability number. Anthropic's alignment assessment reports that Opus 4.8 is measurably more honest about its own work, which matters precisely because the rest of the release pushes the model to run longer with less supervision.

Alignment · Honesty

fewer unflagged flaws in self-written code

vs Opus 4.7

17×

fewer dishonest agentic code summaries

vs Sonnet 4.6

From Anthropic's alignment assessment. “Dishonest” here means the model claimed a task was done correctly when it was not. Both are reductions in the rate of dishonest behavior, not guarantees.

Two results stand out. Opus 4.8 lets flaws in its own code pass unremarked roughly four times less often than Opus 4.7, and produces dishonest summaries of agentic coding work about seventeen times less often than Claude Sonnet 4.6. Anthropic also reports broadly improved adherence to Claude's constitution. These are reductions in the rate of dishonest behavior, not eliminations, but for unattended multi-agent runs the direction is the one that counts.


Pricing & Availability

DetailValue
Input price$5.00 / 1M tokens
Output price$25.00 / 1M tokens
Fast mode input$10.00 / 1M tokens (~2.5x speed)
Fast mode output$50.00 / 1M tokens (~2.5x speed)
Max input context1M tokens
Max output128K tokens
PlatformsClaude API, Amazon Bedrock, Vertex AI, Microsoft Foundry
Model IDclaude-opus-4-8

Standard pricing matches Opus 4.7. The pitch is the same as every recent Opus: same cost per token, more capability per token. Fast mode is the new lever, trading double the rate for 2.5x throughput, and it lands three times cheaper than the previous generation's fast tier. See Anthropic's pricing page for rate limits and batch/caching discounts.


Migrating from 4.7

Opus 4.8 is a drop-in swap on the API surface. Three things are worth checking before you flip the model flag at scale.

Vertex availability

At launch the model id resolves cleanly on the Claude API and Bedrock, but Google Cloud Vertex AI may lag by a short window before the publisher model is exposed. If you route through Vertex, confirm claude-opus-4-8 resolves there before cutting traffic over, and keep the direct Anthropic route as the primary until it does.

Mid-task system messages

If your harness re-sends a full system prompt on every turn to inject new instructions, you can now move those updates into in-array system messages and stop invalidating the prompt cache. This is opt-in: existing code keeps working unchanged.

Effort and token budgets

With high effort as the default and parallel subagents in the mix, output token consumption can rise on agentic workloads. Re-measure token-per-task on real traffic, and lean on fast mode where latency matters more than the marginal cost.


Outlook

Opus 4.8 is an incremental model release wrapped around a more interesting platform release. The benchmark deltas over 4.7 are small, and a couple (GPQA, CharXiv-R) are flat or slightly down, which is what you expect when the headline suites are saturating. The story is the operational surface: parallel-subagent workflows for codebase-scale work, prompt-cache-safe mid-task steering, a genuinely cheaper fast mode, and an alignment assessment that puts numbers on honesty.

That combination, more autonomy plus more honesty about its own output, is the through-line. The benchmarks tell you Opus 4.8 is a little smarter. The workflow and alignment changes tell you Anthropic wants you to hand it bigger jobs and check its work less. For the full announcement and per-benchmark methodology, see Anthropic's launch post and system card.

Questions

Frequently Asked Questions

  • Anthropic released Claude Opus 4.8 on May 28, 2026. It is available across Claude products, the Claude API (claude-opus-4-8), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
  • Claude Opus 4.8 pricing is $5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.7. An optional fast mode runs at 2.5x speed for $10 / $50 per million tokens, three times cheaper than fast mode on previous Claude models.
  • Opus 4.8 supports a 1 million token input context window with up to 128K output tokens, matching Opus 4.7.
  • Opus 4.8 improves on most of the comparable suite: 88.6% vs 87.6% on SWE-bench Verified, 69.2% vs 64.3% on SWE-bench Pro, 82.2% vs 77.3% on MCP-Atlas, and 84.3% vs 79.3% on BrowseComp (single-agent). GPQA Diamond is effectively flat (93.6% vs 94.2%), and CharXiv-R dips slightly. The headline additions are operational, not benchmark deltas.
  • Dynamic workflows let Claude Code spawn parallel subagents that each plan, execute, and verify a slice of a task and report back to an orchestrator. They are built for codebase-scale migrations that a single linear agent loop would grind through slowly.
  • Fast mode serves Opus 4.8 at roughly 2.5x the standard speed for double the per-token rate ($10 / $50 per million tokens). It is three times cheaper than fast mode on previous Claude models, aimed at latency-sensitive interactive use.

Continue Reading