Anthropic released Claude Opus 4.8 on Thursday, just six weeks after the previous version. The new model keeps the same pricing while posting further gains on programming, reasoning, and computer-use benchmarks, and it adds finer-grained control over reasoning effort along with a new Claude Code feature.
Programming benchmark scores improve
On SWE-bench Pro, which measures performance on real-world software engineering tasks, Opus 4.8 scored 69.2%, up from Opus 4.7's 64.3%. The report also cited data showing that OpenAI's GPT-5.5 scored 58.6% and Google's Gemini 3.1 Pro scored 54.2%.
On Humanity's Last Exam, Opus 4.8 scored 49.8% without tools and 57.9% with tools. On OSWorld-Verified, which evaluates real-world computer operation tasks, Opus 4.8 scored 83.4%, slightly higher than its predecessor's 82.8%.
Five reasoning effort levels added
Anthropic has also added control over reasoning effort. Users can choose from five levels (Low, Medium, High, Extra, and Max) in Claude.ai and Cowork. The company claims that the default High setting consumes roughly the same number of tokens as the Opus 4.7 default setting but delivers better results.
However, the report notes that the new tokenizer used in the Opus series produces more tokens per task. In practice, this means the total cost to the user depends on more than the list price. Anthropic has also raised the rate limits for Claude Code to accommodate the higher token consumption of the Extra and Max tiers.
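To illustrate why list price alone does not determine the bill, the following is a minimal sketch that scales a task's token counts by a tokenizer inflation factor. The function name, the token counts, and the 1.2 factor are hypothetical and chosen only for illustration; the report does not say how many more tokens the new tokenizer produces. The $5 and $25 per million token prices are the standard-mode prices quoted in the article.

```python
def task_cost(input_tokens, output_tokens,
              input_price_per_m, output_price_per_m,
              token_inflation=1.0):
    """Return the dollar cost of one task.

    token_inflation is a hypothetical multiplier for how many more
    tokens a tokenizer produces for the same text (1.0 = no change).
    """
    scaled_in = input_tokens * token_inflation
    scaled_out = output_tokens * token_inflation
    total = scaled_in * input_price_per_m + scaled_out * output_price_per_m
    return total / 1_000_000


# Illustrative task size (assumed): 100k input tokens, 20k output tokens.
baseline = task_cost(100_000, 20_000, 5.0, 25.0)
# Same task and same list price, with an assumed 20% increase in tokens.
inflated = task_cost(100_000, 20_000, 5.0, 25.0, token_inflation=1.2)

print(f"baseline: ${baseline:.2f}, with inflation: ${inflated:.2f}")
```

Under these assumed numbers the same task goes from about $1.00 to about $1.20 even though the per-token price is unchanged, which is the effect the report describes.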
Safety performance is close to Mythos
Anthropic stated that Opus 4.8's performance in alignment tests is close to that of Claude Mythos Preview. The latter is a restricted frontier model available to a limited number of approved organizations, primarily used in cybersecurity scenarios.
According to the company, Opus 4.8 is about three-quarters less likely than Opus 4.7 to overlook vulnerabilities in the code it generates. The report also mentioned that the UK's AI Security Institute tested Mythos and found it could independently complete a 32-step enterprise cyberattack simulation, which is why models at this level are not yet generally available.
The price gap remains significant
Alongside the new model, Anthropic also released a research preview of dynamic workflows for Claude Code. This feature allows the model to orchestrate scripts, invoke parallel sub-agents, verify results, and summarize outputs within a single session, and it is available to Enterprise, Team, and Max plan users.
Although Anthropic did not raise the price of Opus 4.8, it remains significantly more expensive than some Chinese models. The report notes that DeepSeek V4 Pro is priced at $0.435 per million input tokens and $0.87 per million output tokens. Opus 4.8's standard mode remains at $5 per million input tokens and $25 per million output tokens, and its fast mode costs $10 and $50 per million input and output tokens, respectively.
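The size of that gap is easy to work out from the quoted prices. The sketch below uses only the per-million-token prices given in the article. The helper names and the example workload of 2 million input and 500,000 output tokens are assumptions made for illustration.

```python
# (input price, output price) in USD per million tokens, as quoted above.
PRICES = {
    "Opus 4.8 (standard)": (5.00, 25.00),
    "Opus 4.8 (fast)": (10.00, 50.00),
    "DeepSeek V4 Pro": (0.435, 0.87),
}


def workload_cost(model, input_tokens, output_tokens):
    """Dollar cost of a workload at the model's list price."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def price_ratio(model_a, model_b):
    """How many times more model_a costs than model_b, per token type."""
    a_in, a_out = PRICES[model_a]
    b_in, b_out = PRICES[model_b]
    return a_in / b_in, a_out / b_out


# Hypothetical workload: 2M input tokens and 500k output tokens.
for name in PRICES:
    print(f"{name}: ${workload_cost(name, 2_000_000, 500_000):.2f}")

in_ratio, out_ratio = price_ratio("Opus 4.8 (standard)", "DeepSeek V4 Pro")
print(f"input: {in_ratio:.1f}x, output: {out_ratio:.1f}x")
```

At list price, Opus 4.8's standard mode works out to roughly 11.5 times DeepSeek V4 Pro's input price and roughly 28.7 times its output price. This compares price per token only; it does not account for differences in how many tokens each model uses per task.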
The report suggests that Anthropic continues to compete on model quality and security, particularly in regulated industries, legal services, and high-risk production environments. In a small game generation test, the authors stated that Opus 4.8's final output was superior to that of GPT-5.5 and DeepSeek V4 Pro, but it generated more slowly and offered no significant cost advantage.












