
Claude Opus 4.8 Review — Benchmarks, Effort Controls, and Dynamic Workflows

Parvez Ahmed

May 31, 2026

Anthropic shipped Claude Opus 4.8 on May 28, 2026 — API model ID claude-opus-4-8 — and called it “a modest but tangible improvement” over Opus 4.7. That framing is honest, and it is the right lens for this review. Opus 4.8 is not a generational leap. It is a point release that moves the needle on agentic coding, math, and honesty, ships two genuinely useful product features (an effort control and Dynamic Workflows), and does it all without raising the price. This is a review of what actually changed and whether it is worth switching.

TL;DR verdict

	Claude Opus 4.8
Released	May 28, 2026
Price	$5 / 1M input · $25 / 1M output (unchanged from 4.7)
Context window	1,000,000 tokens · 128k max output
Headline gain	Agentic coding — SWE-bench Pro 69.2% vs 4.7’s 64.3%
New features	`effort` parameter (high/extra/max) · Dynamic Workflows in Claude Code
Availability	Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry
Best for	Long-horizon agentic coding, math-heavy work, code review
Skip if	You only use it for chat — the delta there is small

If you do not read past this: upgrade for agentic coding and math, but re-test your scaffolding first. With the price unchanged, there is little downside to making claude-opus-4-8 your default.

What actually changed

Four things matter in this release: the benchmark deltas, the new effort control, Dynamic Workflows, and the reliability gains in code review. Everything else is a rounding error.

Benchmarks — the gains are concentrated in coding and math

Anthropic positioned 4.8 as a coding and agentic upgrade, and the published numbers back that up:

Benchmark	Opus 4.8	Opus 4.7	Notes
SWE-bench Verified (500 problems)	88.6%	87.6%	+1.0 pt; leads Gemini 3.1 Pro (80.6%)
SWE-bench Pro (contamination-resistant)	69.2%	64.3%	+4.9 pt; ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%)
USAMO 2026 (math olympiad)	96.7%	69.3%	+27.4 pt — the largest single-release jump Anthropic published
Humanity’s Last Exam (no tools)	49.8%	46.9%	+2.9 pt; leads GPT-5.5 (41.4%) and Gemini 3.1 Pro (44.4%)

The SWE-bench Pro number is the one to weight most. The original SWE-bench Verified set is increasingly contaminated — OpenAI stopped reporting Verified scores in early 2026 and now points to Pro — so the ~5-point Pro gain is a better signal of real-world coding improvement than the 1-point Verified bump. The USAMO jump is eye-catching but narrow: it tells you 4.8 is much stronger at competition math, not that every task improved by 27 points.
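The deltas in the table are easy to check, and one extra view puts the gains on a common footing: the relative cut in the failure rate, meaning the share of previously unsolved problems that the new model now solves. A minimal sketch using the scores from the table; the failure-rate framing is this sketch's own, not a metric Anthropic reports.

```python
# Scores (%) copied from the benchmark table: (Opus 4.8, Opus 4.7).
SCORES = {
    "SWE-bench Verified": (88.6, 87.6),
    "SWE-bench Pro": (69.2, 64.3),
    "USAMO 2026": (96.7, 69.3),
    "Humanity's Last Exam": (49.8, 46.9),
}


def deltas(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative cut in failure rate in %)."""
    gain = round(new - old, 1)
    old_fail, new_fail = 100.0 - old, 100.0 - new
    fail_cut = round(100.0 * (old_fail - new_fail) / old_fail, 1)
    return gain, fail_cut


if __name__ == "__main__":
    for name, (new, old) in SCORES.items():
        gain, fail_cut = deltas(new, old)
        print(f"{name:22} +{gain:4.1f} pt   failures cut by {fail_cut:4.1f}%")
```

On these scores the Pro gain removes about 13.7% of the remaining failures, against about 8.1% on Verified; the USAMO row (an 89.3% cut) is the narrow competition-math jump discussed above.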

For the full cross-vendor picture — GPQA, MMLU-Pro, MMMU, Aider Polyglot, tau-bench, and pricing side by side — see the AI Models Leaderboard, where Opus 4.8 currently sits at the top of the composite ranking.

The new `effort` parameter

Opus 4.8 introduces an effort control that governs how many tokens the model will spend reasoning before it answers. It defaults to high, which Anthropic describes as its best balance of token spend and output quality. You can raise it to extra (surfaced as xhigh in the Claude Code effort menu) or max for harder problems, or drop it lower to save tokens on routine work.

This is the practical lever most teams will reach for. The interesting second-order effect, flagged by Cursor’s Michael Truell, is that on CursorBench, Opus 4.8 reaches the same result in fewer steps than 4.7 — so even at the same effort setting, the token-per-task cost tends to drop. In other words, the headline price is unchanged but the effective cost per finished task can come down on agentic workloads.
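Effort is easiest to use as a lever rather than a fixed setting: start at the default and escalate only when a task's own check fails. The sketch below assumes a hypothetical `run_task(effort)` that calls the model at the given effort level and a hypothetical `passes(result)` check, such as a test run. The three level names are the ones this review lists; the escalation policy is an illustration, not Anthropic guidance.

```python
from typing import Callable, Optional

# Effort levels named in this review, cheapest first ("high" is the default).
EFFORT_LADDER = ["high", "extra", "max"]


def run_with_escalation(
    run_task: Callable[[str], str],   # hypothetical: calls the model at one effort level
    passes: Callable[[str], bool],    # hypothetical: e.g. "did the test suite pass?"
    ladder: list[str] = EFFORT_LADDER,
) -> tuple[Optional[str], list[str]]:
    """Try each effort level in order and stop at the first passing result.

    Returns (passing_result_or_None, levels_tried) so callers can log spend.
    """
    tried: list[str] = []
    for effort in ladder:
        tried.append(effort)
        result = run_task(effort)
        if passes(result):
            return result, tried
    return None, tried


if __name__ == "__main__":
    # Fake task that only succeeds at "extra" or above, to show the control flow.
    fake = lambda effort: "ok" if effort in ("extra", "max") else "fail"
    result, tried = run_with_escalation(fake, lambda r: r == "ok")
    print(result, tried)  # ok ['high', 'extra']
```

Returning the levels tried makes it easy to log how often tasks actually needed `extra` or `max`, which is the number to watch when tuning spend.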

Dynamic Workflows in Claude Code

The biggest product change ships in Claude Code, not the model card. Dynamic Workflows lets Claude write a JavaScript orchestration script that decomposes a large task and delegates the pieces across up to 1,000 parallel subagents in a background runtime. Intermediate results live in script variables rather than the main context window, which is what makes it viable at scale.

The intended jobs are codebase-scale migrations, repository-wide bug sweeps, and multi-service refactors: the kind of work that previously blew past a single context window or required you to hand-roll a fan-out harness. If you have built multi-agent pipelines by hand, this is Anthropic moving that pattern into the product. It is powerful, and it is also the feature most likely to surprise you on a bill, so try it on a bounded, real task and check the token spend before turning it loose on a monorepo.
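Dynamic Workflows generates JavaScript and runs it in Anthropic's background runtime, so what follows is not its implementation. It is a minimal Python sketch of the same fan-out pattern you would otherwise hand-roll: run one worker per piece of work under a hard concurrency cap, keep per-piece results in a local dict, and hand only a compact tally back to the caller (the stand-in for the main context window). `review_file` is a hypothetical placeholder for a subagent call.

```python
import asyncio
from collections import Counter


async def review_file(path: str) -> str:
    """Hypothetical subagent call; a real harness would send this to a model."""
    await asyncio.sleep(0)  # placeholder for the model round-trip
    return "flagged" if path.endswith("_legacy.py") else "clean"


async def fan_out(paths: list[str], max_parallel: int = 8) -> dict[str, int]:
    """Run one subagent per path, never more than `max_parallel` at once."""
    limit = asyncio.Semaphore(max_parallel)  # hard cap on concurrent subagents
    results: dict[str, str] = {}            # intermediate state lives here, per path

    async def worker(path: str) -> None:
        async with limit:
            results[path] = await review_file(path)

    await asyncio.gather(*(worker(p) for p in paths))
    # Only a compact tally goes back to the orchestrator, not every result.
    return dict(Counter(results.values()))


if __name__ == "__main__":
    files = [f"svc/module_{i}.py" for i in range(20)] + ["svc/auth_legacy.py"]
    print(asyncio.run(fan_out(files, max_parallel=4)))
```

The semaphore is the part worth keeping when experimenting: it is the code-level version of bounding a workflow before pointing it at a monorepo.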

Honesty and code review

Anthropic put real weight on reliability this cycle. The claim worth repeating: Opus 4.8 is four times less likely than Opus 4.7 to let a code flaw pass without flagging it. In practice that means fewer “looks good to me” reviews on code that quietly contains a bug — the failure mode that makes an agentic reviewer untrustworthy. It lines up with the SWE-bench Pro gain: a model that catches more of its own mistakes is a model that closes more issues correctly.

For code-review-heavy workflows this is arguably more valuable than the raw benchmark delta. A reviewer you can trust to flag the subtle stuff changes how much you can delegate.
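"Four times less likely" is a ratio, so what it buys you depends on the baseline miss rate, which this review does not state. The sketch below assumes a hypothetical Opus 4.7 miss rate purely for illustration and works out the expected number of missed flaws, plus the chance that at least one flaw slips through a batch of independent reviews.

```python
def review_risk(miss_rate: float, flawed_prs: int) -> tuple[float, float]:
    """Return (expected missed flaws, probability at least one is missed).

    Assumes every flawed PR is reviewed independently at the same miss rate.
    """
    expected = miss_rate * flawed_prs
    p_any_miss = 1.0 - (1.0 - miss_rate) ** flawed_prs
    return expected, p_any_miss


if __name__ == "__main__":
    MISS_47 = 0.08         # hypothetical Opus 4.7 miss rate, NOT a published figure
    MISS_48 = MISS_47 / 4  # "four times less likely", as the release states
    for label, rate in (("Opus 4.7", MISS_47), ("Opus 4.8", MISS_48)):
        expected, p_any = review_risk(rate, flawed_prs=50)
        print(f"{label}: ~{expected:.1f} of 50 flaws missed, P(any miss) = {p_any:.2f}")
```

With these assumed inputs, expected misses fall from 4.0 to 1.0 per 50 flawed PRs, and the chance of at least one miss falls from about 0.98 to about 0.64.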

Pricing — what it costs

Pricing is unchanged from Opus 4.7, which is the quiet headline:

Input: $5.00 per 1M tokens
Output: $25.00 per 1M tokens
Prompt caching: $6.25 write / $0.50 read per 1M tokens
Context window: 1,000,000 tokens · max output: 128,000 tokens
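Per-task cost is simple arithmetic on the four rates above. A minimal sketch; the agent-loop token counts (a 40k-token cached prefix, plus 5k fresh input and 2k output tokens per step) are hypothetical, chosen only to show how step count feeds through to cost, and the function ignores any billing rules beyond these four rates.

```python
# Opus 4.8 list prices from this review, in USD per 1M tokens.
PRICE_PER_M = {
    "input": 5.00,
    "output": 25.00,
    "cache_write": 6.25,
    "cache_read": 0.50,
}


def task_cost(input_toks: int, output_toks: int,
              cache_write_toks: int = 0, cache_read_toks: int = 0) -> float:
    """USD cost of one task from token counts in each billing category."""
    total = (
        PRICE_PER_M["input"] * input_toks
        + PRICE_PER_M["output"] * output_toks
        + PRICE_PER_M["cache_write"] * cache_write_toks
        + PRICE_PER_M["cache_read"] * cache_read_toks
    )
    return total / 1_000_000


def agent_loop_cost(steps: int, prefix: int = 40_000,
                    fresh_in: int = 5_000, out: int = 2_000) -> float:
    """Hypothetical agent loop: the prefix is cached on step 1 and read after."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return task_cost(
        input_toks=fresh_in * steps,
        output_toks=out * steps,
        cache_write_toks=prefix,               # written once, on the first step
        cache_read_toks=prefix * (steps - 1),  # read back on every later step
    )


if __name__ == "__main__":
    for steps in (10, 8):
        print(f"{steps} steps: ${agent_loop_cost(steps):.2f}")
```

Under these assumptions a 10-step run costs about $1.18 and an 8-step run about $0.99 at identical prices, which is the mechanism behind the lower effective cost per task on agentic work. Because a cache read costs a tenth of fresh input here, a long reused prefix stays cheap per step.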

At the same price as the model it replaces, with measurable gains and a steps-per-task reduction on agentic work, the cost story is straightforwardly positive. If you are price-sensitive and your work is not agentic, a cheaper model on the leaderboard may still be the rational pick — Opus-class pricing only pays for itself when you are using the autonomy.

Who should upgrade

Agentic coding teams: Yes. The SWE-bench Pro gain and the honesty improvement both target the exact failure modes that matter for autonomous, multi-file work. Switch your default to claude-opus-4-8.
Math / research: Yes, if competition-grade reasoning matters to you — the USAMO jump is real.
Code reviewers: Yes. The “4× less likely to miss a flaw” improvement is the most reviewer-relevant change in the release.
Chat-only users: Optional. The general-conversation delta is small; there is no penalty to upgrading, but do not expect a night-and-day difference.
Anyone with hard-coded scaffolding: Re-test first. If your prompts assume 4.7’s verbosity or pin a specific effort behavior, validate before moving production traffic.
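For the "re-test first" advice, a common pattern is a canary: send a small, stable slice of traffic to the new model ID and compare before switching everything. The sketch below is a hypothetical router, not part of any SDK. It hashes a request or user ID so the same caller always lands on the same model, which keeps before/after comparisons clean.

```python
import hashlib

OLD_MODEL = "claude-opus-4-7"
NEW_MODEL = "claude-opus-4-8"


def pick_model(request_id: str, canary_percent: int) -> str:
    """Deterministically route about `canary_percent`% of IDs to the new model."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return NEW_MODEL if bucket < canary_percent else OLD_MODEL


if __name__ == "__main__":
    ids = [f"user-{i}" for i in range(10_000)]
    share = sum(pick_model(i, 5) == NEW_MODEL for i in ids) / len(ids)
    print(f"share on {NEW_MODEL}: {share:.1%}")
```

Because each ID maps to a fixed bucket, raising the percentage later only moves more callers onto the new model; nobody flips back and forth between versions.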

How to switch

Point your client at the new model ID and, if you use the SDK, decide whether to pin an effort level:

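```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-opus-4-8",          # was claude-opus-4-7
    max_tokens=4096,
    # effort defaults to "high"; raise it for harder agentic tasks.
    # extra_body sends fields the installed SDK version may not have typed yet.
    extra_body={"effort": "high"},    # "high" | "extra" | "max"
    messages=[{"role": "user", "content": "Refactor this module and run the tests."}],
)

# A response can contain several content blocks; print only the text ones.
for block in resp.content:
    if block.type == "text":
        print(block.text)
```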

In Claude Code, the effort menu exposes the same levels (with xhigh as the label for extra), and Dynamic Workflows is available on long-running tasks without any config change. See Agent Instructions for how to scope a CLAUDE.md so the upgraded model stays inside your conventions.

FAQ

Is Opus 4.8 more expensive than 4.7? No — same $5 / $25 per million tokens, same 1M context, same 128k output ceiling.

What does the effort parameter do? It sets how many tokens the model spends reasoning before answering. Default is high; extra (xhigh) and max trade more tokens for depth on hard tasks.

What are Dynamic Workflows? A Claude Code feature where Claude writes a JavaScript orchestration script to fan a large task out across many parallel subagents in the background, keeping intermediate state out of the main context window.

Should I upgrade from 4.7? For coding, math, and review: yes. For chat: optional. Re-test any scaffolding that hard-codes effort or depends on 4.7’s output style first.

How does it compare to GPT-5.5 and Gemini 3.1 Pro? On the contamination-resistant SWE-bench Pro, Opus 4.8 (69.2%) leads both GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%). The full table is on the leaderboard.

Continue reading

AI Models Leaderboard — Opus 4.8 versus 50+ models on benchmarks, pricing, and context window.
Claude Code vs Cursor vs Codex — where Opus 4.8 actually runs for most coding work.
LLM Benchmark Comparison 2026 — how to read SWE-bench, GPQA, and the rest without getting fooled.
Claude Sonnet 5 Review — the cheaper mid-tier sibling that sits just below Opus 4.8, for high-volume agent traffic.
Multi-Agent Pipelines — the hand-rolled version of what Dynamic Workflows automates.
Microsoft MAI-Thinking-1 & MAI-Code-1-Flash Review — the Build 2026 in-house models whose reasoning claims are measured against Opus.
All Reviews — index of every head-to-head review on the site.