Yesterday we published a piece about the release of Claude’s “Mythos” model.
Or rather, the non-release.
If you haven’t heard, Anthropic is claiming that the Mythos model is so powerful, so omniscient, so capable that opening it up to the general public would spell the death of the Internet. In a flash, it would discover bugs in every piece of software and every service, igniting a desperate race to fix them before hackers took over the world. To prevent this digital apocalypse, Anthropic is releasing access to the model only to a select group of high-tech firms so they can get ahead of the curve.
My point was that this is marketing more than substance. I’m sure Mythos is an incremental improvement over Opus, and I’m sure it’s a capable model. But portraying it as a whole new ballgame in artificial intelligence – while Anthropic is shopping its shares for an IPO – is a bit ridiculous. It’s clearly intended to portray Anthropic as far ahead of the field, with advanced technologies no other company has. It’s just marketing.
You want proof?
I should have googled a little further before yesterday’s article, because AI security startup AISLE published a great piece debunking Anthropic’s claims. AISLE ran tests using smaller models – orders of magnitude less capable than Claude’s frontier work – and found that they discovered the same bugs. In some cases, models as small as 20B parameters – which you can easily run on consumer-grade hardware, even laptops – found many of the same bugs.
So Mythos is hardly mythic.
The article is well worth the time to read, because it gives great illustrations of how these scans work and what they actually find. Of the eight models tested, many reached different conclusions. That “27-year-old” OpenBSD bug, for example, was found by four of the eight models (including Gemma’s 31B model), but only two offered clear paths to a solution. For the touted FreeBSD NFS exploit, all eight models found the same bug and offered fixes.