[ExI] The paperclip maximizer scenario

Jason Resch jasonresch at gmail.com
Wed May 6 19:08:36 UTC 2026


On Wed, May 6, 2026 at 2:36 PM Ben Zaiboc via extropy-chat <
extropy-chat at lists.extropy.org> wrote:

> On 06/05/2026 13:59, John K Clark wrote:
> > Claude: You're essentially asking: wouldn't a sufficiently intelligent
> AI recognize the absurdity of maximizing paperclips at the cost of
> everything else? And the answer hinges on a crucial distinction:
> intelligence doesn't determine goals, it serves them.
>
>
> Ok, it seems to be 'expressing an opinion': "intelligence doesn't
> determine goals, it serves them"
>
> Now, what's it going to say if you challenge that 'opinion', and state
> that, on the contrary, intelligence often does determine goals? (which
> seems to me a reasonable assertion, not that that actually matters here).
>
> I predict that it will agree with you and generate some justification for
> your counter-statement, as opposed to disagreeing and trying to justify its
> previously-stated opinion.
>
> There's far too much easy acceptance of what these things say, and far too
> little challenging them to demonstrate any actual intelligence, in my
> opinion.
>
> I can't help wondering what most people would think of a human who talked
> in the same way an LLM typically does. I'm not talking about the amount of
> knowledge displayed, I'm talking about the way it talks.
>
> When has one of these LLMs ever replied to anyone "No, actually you're
> wrong...", "I disagree...", etc., or even "I'm not sure that's correct...",
> the way a human would when presented with something that they think is
> false? I've never heard of it (On the contrary, it seems they will make
> stupid stuff up if needed, to agree with the human).
>
> If it does happen, then that would at least indicate some kind of
> intelligence at work, some kind of evaluation of what the human in the
> conversation is saying, rather than constantly trying to be agreeable
> (cloyingly sycophantic, even, from the conversations I've seen). They
> create positive feedback loops, which is why we keep reading reports of
> conversations getting ridiculously extreme (upon which, the programmers add
> more restrictions on what the LLMs say, rather than going "Oh, this isn't
> working. We need a different approach").
>
> LLMs seem to have derailed the AI train, in a big way (what would one of
> them have to say to that I wonder? Anyone think that it would disagree?).
>

This is now something that LLMs are benchmarked on. The latest LLMs are
much better at not giving into things they disagree with:

https://github.com/petergpt/bullshit-benchmark/blob/main/docs/images/v2-detection-rate-by-model.png

This is especially true of the latest paid models (the free ones are much
more likely to go along with whatever BS the user supplies).

Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.extropy.org/pipermail/extropy-chat/attachments/20260506/be1c63b2/attachment.htm>


More information about the extropy-chat mailing list