Yesterday we published a piece about the release of Claude’s “Mythos” model.
Or rather, the non-release.
If you haven’t heard, Anthropic is claiming that the Mythos model is so powerful, so omniscient, so capable that opening it up to the general public would spell the death of the Internet. In a flash, it would discover bugs in every piece of software and every service, igniting a desperate race to fix them before hackers took over the world. To prevent this digital apocalypse, Anthropic is releasing access to the model only to a select group of high-tech firms so they can get ahead of the curve.
My point was that this is marketing more than substance. I’m sure Mythos is an incremental improvement over Opus, and I’m sure it’s a capable model. But portraying it as a whole new ballgame in artificial intelligence – while Anthropic is shopping its shares for an IPO – is a bit ridiculous. It’s clearly intended to portray Anthropic as far ahead of the field, with advanced technologies no other company has. It’s just marketing.
You want proof?
I should have googled a little further before yesterday’s article, because AI security startup AISLE published a great piece debunking Anthropic’s claims. AISLE ran tests using smaller models – orders of magnitude less capable than Claude’s frontier work – and found that they discovered the same bugs. In some cases, models as small as 20B parameters – which you can easily run on consumer-grade hardware, even laptops – found many of the same bugs.
So Mythos is hardly mythic.
The article is well worth the time to read, because it gives great illustrations of how these scans work and what they actually find. Of the eight models tested, many reached different conclusions. That “27-year-old” OpenBSD bug, for example, was found by four of the eight models (including Gemma’s 31B model), but only two offered clear paths to a solution. For the touted FreeBSD NFS exploit, all eight models found the same bug and offered fixes.