[ExI] The paperclip maximizer scenario

Kelly Anderson postmowoods at gmail.com
Sun May 17 02:23:11 UTC 2026


I really have to agree that recent versions of Claude (4.x) have been
much less agreeable than prior versions. But when I call bullshit on
what they've done, they still see their own weaknesses.

This whole thread seems to neglect what's been recently termed
"thinking," which appears to involve feeding one AI's output into
another and asking if it makes sense until multiple models reach a
similar conclusion. I believe this is what reduces the agreeableness
of AIs. Now, there has to be a limit to this, or you would offend
Christians, atheists, flat-earthers, transhumanists and other
self-deluded groups. (Yes, we're as self-deluded as the other groups,
sad to say.) So you can't be a jerk and be a successful AI model.

My recent experience is with the high-end paid models. So take that
into account.

-Kelly

> This is now something that LLMs are benchmarked on. The latest LLMs are much better at not giving in to things they disagree with:
>
> https://github.com/petergpt/bullshit-benchmark/blob/main/docs/images/v2-detection-rate-by-model.png
>
> This is especially true of the latest paid models (the free ones are much more likely to go along with whatever BS the user supplies).
>
> Jason
>

