LowEndBox - Cheap VPS, Hosting and Dedicated Server Deals

ACT NOW: How to Prevent Your Repo From Being Used to Train Copilot

Microsoft GithubGitHub has announced that going forward, your data on GitHub will be used to train Microsoft’s Copilot LLM.

From April 24 onward, interaction data—specifically inputs, outputs, code snippets, and associated context—from Copilot Free, Pro, and Pro+ users will be used to train and improve our AI models unless they opt out. Copilot Business and Copilot Enterprise users are not affected by this update.

I wonder if they used Copilot to write this announcement?  I mean, those em-dashes…

Anyway, according to GitHub, everybody else is doing it, so they will, too.

This approach aligns with established industry practices and will improve model performance for all users.

Yeah.

By participating, you’ll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production.

Uh huh.

What a Messy Announcement

So at first read, it sounds like if you don’t use Copilot (and really, who does?) your data won’t be absorbed.

But then it goes on to talk about repos’ “file names, repository structure, and navigation patterns”.  It also notes that “the data used in this program may be shared with GitHub affiliates”.  Ugh!

According to the announcement, not everything is included.  Specifically not included is

Content from your issues, discussions, or private repositories at rest. We use the phrase “at rest” deliberately because Copilot does process code from private repositories when you are actively using Copilot. This interaction data is required to run the service and could be used for model training unless you opt out.

Obviously, public repos’ data is used because, well, it’s public.

How to Opt Out

Head over to your GitHub settings and look for this, under “Privacy”:

GitHub Opt Out

Should You Opt Out?

I guess this is a personal choice.  Personally, the announcement itself made me want to opt out.  It went from “Copilot users” to “metadata in your repo” to “we may share this with third parties” in only a few paragraphs.

I expect that if I used Copilot – and to be honest, I always forget that product still exists when people talk about AI – my inputs would be fed back into the model.  But I expect my private repo to be, well, private, not scanned to learn about “file names, repository structure, and navigation patterns”.

So I opted out.  Whether that is truly honored…

In the mean time, maybe GitHub could take action on the nulled WHMCS repo I reported more than two years ago.

 

No Comments

    Leave a Reply

    Some notes on commenting on LowEndBox:

    • Do not use LowEndBox for support issues. Go to your hosting provider and issue a ticket there. Coming here saying "my VPS is down, what do I do?!" will only have your comments removed.
    • Akismet is used for spam detection. Some comments may be held temporarily for manual approval.
    • Use <pre>...</pre> to quote the output from your terminal/console, or consider using a pastebin service.

    Your email address will not be published. Required fields are marked *