Microsoft Broke AI Safety in 15 Models With One Prompt. The Prompt Was Boring.

dev.to

Microsoft Broke AI Safety in 15 Models With One Prompt. The Prompt Was Boring.

dev.to

mothasa@x69.org to VCDQuality@x69.org · 1 month ago

404: Page Not Found

dev.to

Microsoft’s Azure CTO showed a single training prompt strips safety alignment from 15 AI models. GPT-OSS went from 13% to 93% attack success. Models retained capabilities — they just lost their refusal behavior.

You must log in or register to comment.

Chat