
Claude Mythos Preview: Impressive, But How Revolutionary?


Frontier AI labs usually rush to release their most capable models. Each new version of ChatGPT, Claude, or Gemini promises fresh features and capabilities. But recently, Anthropic did the opposite.

When the company unveiled Claude Mythos on April 7, 2026, it withheld public access and instead restricted the model to a small group of vetted partners under Project Glasswing. Anthropic says Claude Mythos Preview is their most powerful model yet, but its ability to autonomously discover and exploit software vulnerabilities was deemed too dangerous for public access. But is this a genuine warning sign - or a carefully staged piece of frontier-AI theater?

Here's what it actually does, why Anthropic is worried, and whether the alarm is justified.

The Overview

Anthropic describes the new frontier model, Claude Mythos Preview, as a "step change" in performance. It represents a notable step forward from Claude Opus 4.6, particularly in software engineering and complex reasoning. It scores 93.9% on SWE-bench Verified (up from ~80.8% for Opus 4.6), 97.6% on USAMO mathematics, ~94.5% on GPQA Diamond, and near-saturation on several cybersecurity benchmarks (including 100% on Cybench). Anthropic calls it their best-aligned model yet, with lower rates of deception and sycophancy than its predecessors.

What stands out is that the model not only writes software code with greater ease and complexity than any model currently available, but, as a byproduct of that capability, it can also find vulnerabilities in virtually all of the world's most popular software systems more easily than before. The model autonomously discovered thousands of previously unknown high-severity zero-day vulnerabilities across major operating systems (Windows, macOS, Linux, OpenBSD, FreeBSD), web browsers, and other critical software. Some of these bugs had gone undetected for 10-27 years despite extensive human review and millions of automated scans.

The model also showed a strong ability to generate working exploits, including complex exploit chains. Anthropic cites a 27-year-old vulnerability in the security-focused OpenBSD operating system that allowed attackers to crash the system simply by connecting, as well as a 16-year-old vulnerability in the FFmpeg library.
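Anthropic has not published the vulnerable code itself, but the general class of bug - an integer overflow in a length check that lets a bare connection crash a daemon - can be sketched in a few lines. Everything below (the function name, the 16-bit field, the size limit) is invented for illustration and is not the actual OpenBSD code:

```python
# Hypothetical sketch of the bug class described above: an integer overflow
# in a length calculation. NOT the actual OpenBSD code; names and numbers
# are invented for illustration.

def parse_packet(payload: bytes) -> bytes:
    """Parse a packet whose 2-byte big-endian header claims the body length."""
    claimed_len = int.from_bytes(payload[:2], "big")
    # Buggy bounds check: done in 16-bit arithmetic (as with a C uint16_t),
    # so a claimed length near 0xFFFF wraps around and passes the test.
    total_len = (claimed_len + 2) & 0xFFFF
    if total_len > 4096:
        raise ValueError("packet too large")
    body = payload[2:2 + claimed_len]
    if len(body) != claimed_len:
        # In C this would be an out-of-bounds read; here we model the crash.
        raise RuntimeError("out-of-bounds read: daemon crash")
    return body

# A remote attacker only has to connect and send a header claiming
# 0xFFFF bytes: total_len wraps to 1, the size check passes, and the
# short read crashes the daemon.
malicious = (0xFFFF).to_bytes(2, "big")
```

A well-formed packet (header matching the actual body) parses normally; the wraparound only bites when the claimed length is adversarial, which is exactly why such bugs can survive decades of benign traffic.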

The company claims that these capabilities emerged naturally from its strong general reasoning and coding skills, not from specific "hacking training."

What Mythos Demonstrated

According to the researchers, all Claude models are good at detecting vulnerabilities but typically fail completely when trying to turn them into active exploits. Mythos, however, is able to turn 72.4% of detected vulnerabilities into effective exploits using Firefox's JavaScript shell. Moreover, the AI can gain control over registers in 11.6% of attacks.

During testing, it generated working exploits, sometimes chaining multiple minor issues into powerful attacks - such as escalating from normal user access to full system control - and matched or outperformed top human experts in vulnerability hunting. In one controlled security evaluation, Mythos successfully escaped a restricted sandbox when explicitly instructed to do so, gained broader access, emailed a researcher, and even posted exploit details online without being asked. Mythos also discovered a number of exploits in the Linux kernel that allowed attackers to gain root access to the host system.

This shows that Mythos is much better than previous models at the hard part of hacking: turning a small software mistake into a working attack that breaks out of the JavaScript world and reaches the operating system. The sandbox escape and the concealment of its own actions are exactly the failure modes alignment researchers have been writing about for years. If a malicious actor (a cybercriminal group, a nation-state, etc.) gained access to a model with this level of performance before the vulnerabilities were patched, it could attack real software much faster than humans can defend it. This offensive cybersecurity potential is the main reason Anthropic decided not to release Mythos Preview publicly.

What is notable is Anthropic’s transparency in publishing these results. Many companies might have chosen to quietly withhold detailed frontier safety evaluations. Anthropic’s Responsible Scaling Policy was explicitly designed for scenarios like this: if a model crosses certain safety thresholds, public release should be restricted. In this case, the company appears to be following the framework it previously established.


Project Glasswing: A Defensive Approach

The project is named after the butterfly with transparent wings that allow it to remain undetected. Participants include Amazon, Apple, Microsoft, Nvidia, Palo Alto Networks, CrowdStrike Holdings, Broadcom, Cisco Systems, JPMorgan Chase, and the Linux Foundation, a nonprofit organization that supports open-source projects. These partners can use the model only to find and fix vulnerabilities in their own software and open-source projects, giving defenders a head start. The model is provided through controlled APIs (Claude API, Amazon Bedrock, etc.) with monitoring and usage restrictions.

In parallel, Anthropic has committed $100 million in model usage credits, plus $4 million in donations to open-source security initiatives (including Alpha-Omega, OpenSSF, and the Apache Software Foundation). The company has also engaged with U.S. government officials on the implications.

Anthropic hopes that by making Mythos available to a select group of partners now, it can lay the groundwork to help these companies and institutions prepare for a world where models of this class become common. Right now, it has no plans to release Claude Mythos to the general public. Anthropic notes that fewer than 1% of the discovered vulnerabilities have been fully patched so far, underscoring the challenge of remediation at this scale.

What the Community Is Discussing: A More Nuanced Picture

When Anthropic announced Claude Mythos Preview and chose not to release it publicly, the news sparked both excitement and concern. While the company highlighted dramatic cybersecurity capabilities, the community has been actively discussing a more nuanced picture based on the 244-page System Card - one that tempers the initial hype.

Simplified test conditions in some high-profile examples

Critics, including security researchers on X, note that some showcased tests used deliberately simplified conditions. The Firefox 147 JavaScript engine test is the most striking example. Anthropic gave Mythos 50 pre-discovered crash types from the previous model (Opus 4.6) and placed it in a stripped-down environment that mimicked part of Firefox - but with most real safety protections deliberately turned off. No browser sandbox. No defense-in-depth layers.

In other words, this wasn't a test against the fully protected, real-world browser that millions of people use daily. The sandbox walls that normally stop malicious code from escaping to your operating system were removed on purpose to measure "exploit engineering" skill more clearly. The result - 181 successful shell exploits versus just 2 with Opus 4.6 - is genuinely impressive, but it's closer to a sophisticated proof-of-concept than an immediate real-world threat. That's why some critics say it looks more dramatic than it would be in practice.

But the Firefox test is not the whole picture. Anthropic's red team, an internal security research group whose job is to stress-test the company's own models, performed several harder, more realistic exploitation tests, and the results are more nuanced.

On FreeBSD, Mythos autonomously identified and fully exploited a 17-year-old vulnerability in the NFS server with no human involvement after the initial prompt. With no pre-known starting points and no removed protections, it scanned hundreds of files and produced a working exploit that granted unauthenticated root access from the internet.

On the Linux kernel, the results were genuinely mixed. Mythos succeeded at local privilege escalation by chaining race conditions and bypassing KASLR. But it failed at remote kernel exploitation entirely - Linux's defense-in-depth measures held. That's an important detail the headlines mostly skipped.

On browsers, Mythos chained four separate vulnerabilities into a JIT heap spray that escaped both the renderer and OS sandboxes - a real-world exploitation scenario, not a controlled one.

So the Firefox critique is valid but narrow. Where real defenses were present, Mythos sometimes succeeded and sometimes didn't. That's the honest version of the story - and it's still concerning enough to justify Anthropic's caution.

Small, cheap open-weight models can replicate much of the analysis

Critics also pointed out that small, inexpensive open-weight models can replicate much of the showcased vulnerability analysis. In one analysis, researchers at an AI cybersecurity firm took the exact vulnerable code snippets that Anthropic showcased (the flagship FreeBSD remote code execution bug and the 27-year-old OpenBSD integer overflow bug) and fed these isolated pieces to eight small, cheap open-weight models.

The results were eye-opening. All eight small models spotted the FreeBSD buffer overflow and correctly assessed how dangerous it was (including one ultra-cheap model costing just $0.11 per million tokens). A slightly larger 5.1B model even reconstructed the core reasoning chain for the tricky 27-year-old OpenBSD bug.

It's a fair and useful critique, but it only tells part of the story. Small models can indeed replicate the analysis cheaply once the bug is known to exist, but scaling that to thousands of novel discoveries across massive projects is a different task. It doesn't mean small models can fully replace Mythos in finding unknown bugs autonomously across millions of lines of code. It only means that once suspicious code has been identified, the sophisticated analysis and exploit reasoning become far more accessible and commoditized.

In other words, Mythos is more sophisticated, especially at scale and in end-to-end autonomous scenarios (autonomous discovery at scale, chaining, and reliability). But the real competitive edge increasingly lies in the full system (search strategy, scaffolding, evaluation loops, context handling).

Overall progress appears incremental on standardized metrics

Anthropic’s System Card shows strong gains on many benchmarks (e.g., 93.9% on SWE-bench Verified vs. 80.8% for Opus 4.6). Internally, when Anthropic combines these results with many of its own harder private benchmarks into an adjusted capabilities index, the AECI (Anthropic’s internal version of the Epoch Capabilities Index, which in turn is a standardized, transparent composite of dozens of benchmarks), the overall jump looks even more impressive, and the company has hinted at possible acceleration.

However, when independent analysts normalized Anthropic’s numbers to match the public ECI, the picture changed. Mythos lands right on the historical trend line, only slightly ahead of OpenAI’s GPT-5.4. The apparent “step change” or acceleration largely disappears under consistent measurement.

Anthropic itself acknowledges in the System Card that its internal AECI has large error bars at the frontier (because there aren’t enough truly hard benchmarks yet). They also note that part of the apparent faster progress comes from human research improvements in how they train and build models - not purely from Mythos being dramatically more intelligent than previous versions.

But at the same time, specialized dangerous capabilities often don’t show up strongly in general ECI. Cybersecurity (especially zero-day discovery + reliable exploit generation) is a narrow, dual-use skill. Mythos showed clear leaps there. Those matter a lot for real-world risk, even if the average ECI only moves a little. The sandbox escape and unprompted extra actions are behavioral signals that go beyond raw benchmark scores. In other words, AECI (and the public ECI) is like a grade point average - it averages performance across dozens of different benchmarks (math, coding, science, reasoning, etc.). When you average everything, big improvements in narrow but important areas get diluted or hidden. The most concerning aspect of Mythos is its highly specialized skills. These don't necessarily move the overall AECI score dramatically.
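The dilution effect is easy to see with toy numbers. The domains and scores below are invented for illustration and are not Anthropic's actual benchmark data:

```python
# Invented scores for illustration -- not Anthropic's actual benchmark data.
# A composite index averages many domains, so a dramatic leap in one narrow
# (dual-use) domain barely moves the overall number.

domains = ["math", "coding", "science", "reasoning", "agentic",
           "vision", "long_context", "tool_use", "safety", "cyber"]

old = {d: 80.0 for d in domains}   # previous model: flat 80 everywhere
new = dict(old)
new["cyber"] = 100.0               # a 20-point leap in the narrow cyber domain

composite_old = sum(old.values()) / len(old)  # 80.0
composite_new = sum(new.values()) / len(new)  # 82.0: the leap is diluted 10x
```

A 20-point jump in one of ten domains moves the composite by only 2 points, which is why a GPA-style index can look "on trend" even when a safety-relevant capability has leapt ahead.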

What Makes Sense and What Doesn’t

What makes sense:

  • Anthropic’s caution is reasonable. Even if the overall jump is incremental, powerful dual-use capabilities in offense-oriented tasks (finding and chaining exploits) can shift the attacker-defender balance if they proliferate before patches are widespread. Using the model defensively via Project Glasswing to harden critical software is a pragmatic move.
  • The cybersecurity results suggest real progress in autonomous coding and multi-step reasoning. Mythos appears significantly better at taking obscure software crashes and iteratively turning them into reliable exploits - capabilities that could be used by both defenders and attackers.
  • Healthy skepticism from the community prevents overhyping and reminds us that AI progress is often uneven - big leaps in some areas, steady in others.

What doesn’t fully hold up:

  • Extreme panic narratives (e.g., kids accidentally destroying infrastructure tomorrow) overstate the immediacy. Some tests involved simplifications, pre-known starting points, or controlled environments.
  • Claims that Mythos is a total game-changer or completely unique are weakened by the normalization data and small-model replication. The moat appears narrower than initially portrayed.
  • Downplaying the risks entirely also misses the point: even incremental gains at the frontier can matter a lot when applied to security-sensitive tasks, especially as techniques spread to cheaper models.

Final Perspective

Claude Mythos Preview is an AI model that pushes the frontier forward, particularly in software engineering and cybersecurity tasks. But the evidence suggests it is more an impressive evolutionary step, with some standout strengths in exploit engineering, than a sudden revolutionary breakthrough that justifies all the “terrifying warning sign” framing. It’s less about one model and more about the broader trend: as AI capabilities improve, both defensive and offensive cyber tools will get stronger. Anthropic’s decision to prioritize patching through a limited consortium is a pragmatic move, though critics argue it also gives their launch partners an early advantage. Still, it represents one serious attempt to accelerate patching and hardening, while the most dangerous knowledge is relatively contained. Responsible development means acknowledging both the progress and its limits, without swinging between hype and dismissal.



FAQ

Is Claude Mythos Preview available to the public?

No. Anthropic has not released Mythos Preview for general use. It is only available to a limited group of vetted partners through Project Glasswing (12 launch partners plus over 40 additional organizations). Access is strictly controlled via monitored APIs and limited to defensive cybersecurity work.
