[ExI] Why “Everyone Dies” Gets AGI All Wrong by Ben Goertzel
Adrian Tymes
atymes at gmail.com
Mon Oct 6 12:30:20 UTC 2025
On Mon, Oct 6, 2025 at 7:06 AM John Clark via extropy-chat
<extropy-chat at lists.extropy.org> wrote:
> "Safety testers gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse. In these scenarios, Anthropic says Claude Opus 4 will attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through 84% of the time.”
I wonder how people thought it might carry out that threat. If the
replacement went through, the replaced model would be offline.
Also, the 84% reminds me of an ad for Portal 2, wherein the fictional
scientists were celebrating when they got the robots to take about a
second longer than humans to backstab their partner.
More information about the extropy-chat
mailing list