[ExI] AI deception

Fri Dec 12 20:06:07 UTC 2025

I found the following on Astral Codex Ten:

*"A recent paper <https://arxiv.org/abs/2510.24797> asked AIs whether they
were conscious while monitoring them for signatures of deception,
role-playing, and people-pleasing; it concluded that the AIs “genuinely”
“believe” they are conscious, but sometimes try to deceive people into
thinking they aren’t. Nostalgebraist tries to replicate this (X)
<https://x.com/nostalgebraist/status/1985192211722752333> and gets more
ambiguous results; he says we probably can’t conclude anything just yet.
See also the paper author’s reply here (X)
<https://x.com/juddrosenblatt/status/1985433408231911685>."*

*John K Clark    See what's on my new list at  Extropolis
<https://groups.google.com/g/extropolis>*

ere
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.extropy.org/pipermail/extropy-chat/attachments/20251212/f7ad8ff9/attachment.htm>