Claude Opus 4.8, released on May 28, 2026, is Anthropic's flagship model in the Claude 4 family. Built on Opus 4.7, it delivers meaningful gains in agentic coding, multidisciplinary reasoning, and self-awareness: it is approximately 4x less likely than its predecessor to let code flaws pass unremarked.
- Stronger Agentic Coding: Achieves 69.2% on SWE-bench Pro (+4.9 points over Opus 4.7) and 88.6% on SWE-bench Verified, with improved end-to-end task completion on the Super-Agent Benchmark.
- Improved Honesty and Self-Awareness: Around 4x less likely to let code flaws pass unremarked; proactively flags uncertainties, catches its own mistakes, and pushes back on unsound plans before executing.
- Dynamic Workflows (Research Preview): Enables Claude Code to plan and execute large-scale tasks using hundreds of parallel subagents, supporting codebase-scale migrations across hundreds of thousands of lines of code.
- Mid-Conversation System Messages: The Messages API now accepts system messages mid-conversation while preserving prompt cache, reducing costs in agentic loops.
- Better Long-Context Performance: Improved compaction handling for sustained conversations, with fewer derailments after context compaction in long-running agentic tasks.
- Professional Software Engineering: Scores 69.2% on SWE-bench Pro, ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%), making it well-suited for production-grade code generation, complex refactors, and autonomous bug resolution.
- Large-Scale Agentic Workflows: Dynamic Workflows allow orchestrating hundreds of parallel subagents for tasks like codebase-wide migrations, making it effective for enterprise-scale automation.
- High-Stakes Knowledge Work: Scores 1890 Elo on GDPval-AA and 57.9% on Humanity's Last Exam (with tools), making it suitable for financial analysis, legal document processing, and complex research synthesis.
- Long-Running Autonomous Tasks: Improved compaction handling and self-monitoring make it reliable for extended agentic sessions that require sustained focus and consistency.
| Capability | Description |
|---|---|
| Reasoning | GPQA Diamond: 93.6%. Humanity's Last Exam: 57.9% (with tools), 49.8% (without). USAMO 2026: 96.7%. |
| Coding | SWE-bench Verified: 88.6%, SWE-bench Pro: 69.2%, SWE-bench Multilingual: 84.4%, Terminal-Bench 2.1: 74.6%. |
| Agentic | Completes every case end-to-end on the Super-Agent Benchmark. Finance Agent v2: 53.9%. Legal Agent Benchmark: record score. |
| Computer Use | Online-Mind2Web: 84%. Strong browser and desktop interaction capabilities. |
| Multimodal | Text and image input. Up to 16 megapixels on the long edge, up to 600 images or PDF pages per request. |
| Context Window | 1,000,000 tokens. |
| Max Output | 128,000 tokens. |
| Tool Use | Full function calling, code execution, MCP support, adaptive thinking, effort control, dynamic workflows. |
| Multilingual | Strong multilingual performance across major world languages. |
- Terminal-Bench 2.1 trails GPT-5.5 (74.6% vs 78.2%).
- GPQA Diamond score is marginally lower than that of Opus 4.7 (93.6% vs 94.2%).
- New tokenizer (inherited from Opus 4.7) produces up to 35% more tokens for the same input text, meaning actual per-request costs may increase despite unchanged per-token pricing.
- Image input only (no native audio or video input).
- Dynamic Workflows remain in research preview and are limited to Claude Code.
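
The tokenizer limitation above is easy to put into numbers. The following is a minimal sketch, not an official calculator: it assumes the "up to 35%" figure is a worst-case bound, that the per-token rate is flat, and that the input rate is the 5.00 listed in the pricing table (in whatever unit that table uses). The function names and the 1,000-token prompt are hypothetical.

```python
def worst_case_tokens(old_tokens: int, inflation_pct: int = 35) -> int:
    """Upper bound on the token count for the same text under the new tokenizer.

    Uses integer arithmetic and rounds up, so no floating-point error creeps in.
    """
    return (old_tokens * (100 + inflation_pct) + 99) // 100


def request_cost(tokens: int, price_per_token: float) -> float:
    """Cost of a request at a flat per-token price (same unit as the pricing table)."""
    return tokens * price_per_token


# Hypothetical example: a prompt that tokenized to 1,000 tokens under the older tokenizer.
INPUT_PRICE = 5.00  # input rate from the pricing table, in the table's units

old_cost = request_cost(1_000, INPUT_PRICE)
new_cost = request_cost(worst_case_tokens(1_000), INPUT_PRICE)

print(f"worst-case cost ratio: {new_cost / old_cost:.2f}")
```

Because the per-token price is unchanged, the cost ratio equals the token ratio. Budgeting against the worst case is a conservative choice, since actual inflation depends on the text and may be lower.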
| Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token) |
|---|---|---|---|---|
| Claude Opus 4.8 | 5.00 | 6.25 | 0.50 | 25.00 |
- Prompt caching: Cache writes at 1.25x (5-minute TTL) or 2x (1-hour TTL) base input price; cache reads at 0.1x base input price. Minimum 1,024 tokens for caching.
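
The caching multipliers above can be checked against the pricing table and used to estimate when caching pays off. This is a rough sketch under stated assumptions: the prefix is written to the cache once and every later call is a cache hit (that is, each call arrives within the TTL), and only the repeated prefix is costed. The function names and the TTL keys `"5m"` and `"1h"` are illustrative and are not API parameters.

```python
BASE_INPUT = 5.00  # input rate from the pricing table, in the table's units
CACHE_WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}  # 5-minute and 1-hour TTL
CACHE_READ_MULTIPLIER = 0.1
MIN_CACHEABLE_TOKENS = 1024


def cache_write_rate(ttl: str = "5m") -> float:
    """Per-token rate for writing a prefix to the cache."""
    return BASE_INPUT * CACHE_WRITE_MULTIPLIER[ttl]


def cache_read_rate() -> float:
    """Per-token rate for reading a cached prefix."""
    return BASE_INPUT * CACHE_READ_MULTIPLIER


def prefix_cost(prefix_tokens: int, calls: int, ttl: str = "5m") -> tuple[float, float]:
    """Return (uncached, cached) cost of sending the same prefix `calls` times.

    Assumes one cache write on the first call and a cache hit on every later call.
    Prefixes below the minimum cacheable size are billed at the base input rate.
    """
    uncached = prefix_tokens * BASE_INPUT * calls
    if prefix_tokens < MIN_CACHEABLE_TOKENS or calls < 1:
        return uncached, uncached
    cached = prefix_tokens * (cache_write_rate(ttl) + cache_read_rate() * (calls - 1))
    return uncached, cached


if __name__ == "__main__":
    # Hypothetical agentic loop: a 2,000-token prefix reused across 10 calls.
    for ttl in ("5m", "1h"):
        uncached, cached = prefix_cost(2_000, 10, ttl)
        print(f"TTL {ttl}: uncached={uncached:.2f} cached={cached:.2f}")
```

Under these assumptions the 5-minute TTL is cheaper than no caching from the second call onward, while the 1-hour TTL needs at least three calls to break even, because its write costs twice the base input rate.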