They offer a thing they’re calling an “opt-out.”
The opt-out (a) is only available to companies that are Slack customers, not to end users, and (b) isn’t actually an opt-out.
When a company account holder tries to opt out, Slack says their data will still be used to train LLMs, just that the results won’t be shared with other companies.
LOL no. That’s not an opt-out. The only way to opt out is to stop using Slack.
https://slack.com/intl/en-gb/trust/data-management/privacy-principles
That’s not true at all. If you obfuscate the PII, it stops being PII. This is an extremely common trick companies use to circumvent these laws.
You could say it’s to “circumvent” the law, or you could say it’s to comply with the law. As long as the PII is gone, what’s the problem?
LLMs have shown time and time again that simple, crafted prompts can extract training data verbatim.
It is impossible for them to contain more than random fragments; the models are too small for the training data to be compressed enough to fit. Even the fragments that have been found are not exact; the AI is “lossy” and hallucinates.
The examples that have been found are examples of overfitting, a training flaw where the same data gets fed into the training process hundreds or thousands of times over. This is something that modern AI training goes to great lengths to avoid.
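One of the main ways pipelines avoid that repetition is deduplicating the corpus before training. A minimal sketch of exact-match dedup; the hashing scheme here is illustrative only, not any particular lab’s pipeline (real pipelines also do near-duplicate detection, e.g. MinHash):

```python
import hashlib

def dedup(documents):
    """Drop exact duplicate documents so no single example is seen many times over."""
    seen = set()
    unique = []
    for doc in documents:
        # Normalise lightly, then hash to get a cheap exact-match key.
        key = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = [
    "Same memo pasted into many channels.",
    "same memo pasted into many channels.",
    "A genuinely different message.",
]
print(len(dedup(corpus)))  # -> 2
```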
How do you anonymise without supervision? And obfuscation isn’t anonymisation…
Legally, obfuscation can be anonymization, depending on how it’s done.
Depending on the data structures, there are many methods to anonymize without supervision. None of them are perfect, but they don’t have to be - just legally defensible.
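For illustration, a minimal sketch of unsupervised pseudonymization: regex-based redaction plus salted hashing of whatever matches. The patterns, the salt, and the token format are assumptions for the example, not anything Slack has described doing:

```python
import hashlib
import re

SALT = b"per-tenant-secret-salt"  # hypothetical per-tenant salt

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def pseudonym(value: str) -> str:
    """Replace a matched identifier with a stable, non-reversible token."""
    digest = hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:10]
    return f"<pii:{digest}>"

def scrub(text: str) -> str:
    """Redact obvious identifiers with no human review ('supervision')."""
    text = EMAIL_RE.sub(lambda m: pseudonym(m.group()), text)
    text = PHONE_RE.sub(lambda m: pseudonym(m.group()), text)
    return text

print(scrub("Ping alice@example.com or call +1 415-555-0100 about the Q3 numbers."))
# -> Ping <pii:...> or call <pii:...> about the Q3 numbers.
```

Whether that counts as anonymization rather than pseudonymization is exactly the legal question being argued above; regexes also miss plenty of identifiers, which is why “legally defensible” is the bar rather than “perfect.”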