Apple Intelligence is Free: You Are Not the Product, You're the Currency

Jun 13, 2024 @ 9:37 am

At WWDC this year, Apple announced a team-up with OpenAI. ChatGPT will be embedded in many places in the next iOS version, serving up writing assistance, creating custom memojis, and making Siri smarter.

And it costs you nothing!

However, does it cost Apple nothing?

According to reports, Apple is not paying OpenAI one thin dime for this integration. So why would OpenAI do this? After all, it requires enormous processing power, management time, development work, etc. It’s easy to understand Apple’s motivations: it gets leading-edge AI integrated into its platforms.

But what does OpenAI get?

The Data Wall

Leopold Aschenbrenner’s landmark essay series, Situational Awareness, explains the “data wall”:

There is a potentially important source of variance for all of this: we’re running out of internet data. That could mean that, very soon, the naive approach to pretraining larger language models on more scraped data could start hitting serious bottlenecks.
Frontier models are already trained on much of the internet. Llama 3, for example, was trained on over 15T tokens. Common Crawl, a dump of much of the internet used for LLM training, is >100T tokens raw, though much of that is spam and duplication (e.g., a relatively simple deduplication leads to 30T tokens, implying Llama 3 would already be using basically all the data). Moreover, for more specific domains like code, there are many fewer tokens still, e.g. public github repos are estimated to be in low trillions of tokens.

Training LLMs requires vast amounts of data. In fact, we’re out of data. All the large publicly available datasets have already been trained on to death, and the clever tricks, like sucking in every Reddit post or every Tweet, have already been tried. There are ebooks and YouTube and other things, but the core problem is that there is only so much data.
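To put the quoted figures side by side, here is a minimal back-of-the-envelope sketch. It uses only the three numbers from the Situational Awareness excerpt, all of which are rough bounds ("over 15T", ">100T"), so treat the results as ballpark ratios and not measurements. The constant and function names are made up for illustration.

```python
# Figures quoted in the Situational Awareness excerpt, in trillions of tokens.
# All three are approximate bounds, not exact counts.
LLAMA3_TRAINING_TOKENS_T = 15   # "over 15T tokens"
COMMON_CRAWL_RAW_T = 100        # ">100T tokens raw"
COMMON_CRAWL_DEDUPED_T = 30     # "a relatively simple deduplication leads to 30T tokens"


def share_used(trained_t: float, available_t: float) -> float:
    """Return trained tokens as a fraction of an available pool of tokens."""
    return trained_t / available_t


# How much of each pool a Llama 3-sized training run would account for.
raw_share = share_used(LLAMA3_TRAINING_TOKENS_T, COMMON_CRAWL_RAW_T)
deduped_share = share_used(LLAMA3_TRAINING_TOKENS_T, COMMON_CRAWL_DEDUPED_T)

# How much of the raw crawl survives simple deduplication.
surviving_fraction = COMMON_CRAWL_DEDUPED_T / COMMON_CRAWL_RAW_T

print(f"Llama 3 tokens vs. raw Common Crawl:     {raw_share:.0%}")
print(f"Llama 3 tokens vs. deduped Common Crawl: {deduped_share:.0%}")
print(f"Raw crawl surviving deduplication:       {surviving_fraction:.0%}")
```

On these numbers, a Llama 3-sized run is about 15% of the raw crawl but about half of what survives simple deduplication, and that is before any quality filtering. That is the squeeze the excerpt is describing.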

Now Apple is presenting OpenAI with a vast new storehouse of data: everything its hundreds of millions of customers do on their iPhones, iPads, Watches, and Macs.

Remember, Sam Altman is a Liar

Now, of course, they won’t be training on Apple user data. Of course not.

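Also remember that at every turn, Sam Altman has ignored safety, rules, and limits. When OpenAI’s board canned him in November 2023 (he was back within days), it said he had not been consistently candid with it.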

If you honestly believe that your Apple-originated data is not going to be used to improve ChatGPT’s models, I have a bridge in New York I’d like to sell you.

The official story is that Apple is acting as a brand ambassador for OpenAI…wink, wink.


raindog308

raindog308 is a longtime community LETizen, technical writer, and self-described techno polymath. With deep roots in the *nix world, he has a passion for systems both modern and vintage, ranging from Unix, Perl, Python, and Golang to shell scripting and mainframe-era operating systems like MVS. He’s equally comfortable with relational database systems, having spent years working with Oracle, PostgreSQL, and MySQL.

As an avid user of LowEndBox providers, raindog308 runs an empire of LEBs, from tiny boxes for VPNs, to mid-sized instances for application hosting, and heavyweight servers for data storage and complex databases. He brings both technical rigor and real-world experience to every piece he writes.

Beyond the command line, raindog308 has a life-long love of German Shepherd Dogs, high-quality knives, target shooting, theology, tabletop RPGs, playing guitar, and hiking in deep, quiet forests.

His goal with every article is to help users, from beginners to seasoned sysadmins, get more value, performance, and enjoyment out of their infrastructure.

You can find him daily in the forums at LowEndTalk under the handle @raindog308.