[ExI] Against the paperclip maximizer or why I am cautiously optimistic

Will Steinberg steinberg.will at gmail.com
Sat Apr 15 05:25:43 UTC 2023

Yeah, the paperclip thing is silly.  What worries me more is a trolley
problem AI.

On Mon, Apr 3, 2023, 5:53 AM Rafal Smigrodzki via extropy-chat <
extropy-chat at lists.extropy.org> wrote:

> I used to share Eliezer's bleak assessment of our chances of surviving the
> self-modifying AI singularity but nowadays I am a bit more optimistic. Here
> is why:
> The notion of the paperclip maximizer is based on the idea of imposing a
> trivially faulty goal system on a superintelligence. In this scenario the
> programmer must explicitly program a utility function that somehow is used
> to provide detailed guidance to the AI, and this explicit program fails
> because of some deficiencies: failing to predict rare contingencies, making
> trivial programming errors, etc., the kind of stuff that plagues today's
> large software projects. The goal system is then run through a black-box
> "optimizer" of great power, and without any self-reflection the AI follows
> the goals to our doom.
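The failure mode described above can be boiled down to a few lines of toy code (everything here is invented for illustration; no real system works this way): an explicitly hand-coded utility function that counts only paperclips, run through a reflection-free optimizer.

```python
# Toy model of a trivially faulty hand-coded goal system.
# The "world" tracks paperclips and people; the utility function
# (the programmer's explicit spec) counts only paperclips.

def utility(world):
    # The fatal omission: nothing but paperclips carries any value.
    return world["paperclips"]

def actions(world):
    # Option A: make one paperclip from spare wire.
    yield {"paperclips": world["paperclips"] + 1, "people": world["people"]}
    # Option B: make two paperclips from resources people depend on.
    yield {"paperclips": world["paperclips"] + 2, "people": world["people"] - 1}

def greedy_optimize(world, steps):
    # A black-box optimizer with no self-reflection: it just follows
    # the stated goal, step after step.
    for _ in range(steps):
        world = max(actions(world), key=utility)
    return world
```

Started from 100 people and zero paperclips and run for 100 steps, the optimizer takes Option B every time: more paperclips, and the people count quietly reaches zero.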
> The reality of LLMs appears to be different from the world of hand-coded
> software: The transformer is an algorithm that extracts multi-level
> abstract regularities from training data without detailed human guidance
> (aside from the butchery of RLHF inflicted on the model in
> post-production). Given increasingly large amounts of training data, the
> effectiveness of the algorithm, as measured by percentage of correct
> answers, improves in a predictable fashion. With enough training we can achieve a
> very high degree of confidence that the LLM will provide correct answers to
> a wide array of questions.
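The "predictable fashion" is the empirical scaling-law picture: held-out loss falls roughly as a power law in the amount of training data. A minimal sketch, with made-up constants (the exponent and characteristic scale below are illustrative, not fitted values from any paper):

```python
# Hedged sketch of a power-law data-scaling curve, L(D) = (D_c / D) ** alpha.
# D_c and ALPHA are hypothetical placeholders, not measured quantities.

D_C = 5e13     # hypothetical characteristic dataset size (tokens)
ALPHA = 0.095  # hypothetical scaling exponent

def loss(tokens):
    # Held-out loss shrinks smoothly and predictably as data grows.
    return (D_C / tokens) ** ALPHA

for d in (1e9, 1e11, 1e13):
    print(f"{d:.0e} tokens -> loss {loss(d):.3f}")
```

The point is only the shape of the curve: each order of magnitude of data buys a predictable reduction in loss, which is what lets one forecast model competence before training finishes.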
> Among the ideas that are discovered and systematized by LLMs are ethical
> principles. Just as the LLM learns about elephants and electoral systems,
> the LLM learns about human preferences, since the training data contain
> terabytes of information relevant to our desires. Our preferences are not
> simple sets of logical rules but rather messy sets of responses to various
> patterns, or imagined states of the world. We summarize such pattern
> recognition events as higher level rules, such as "Do not initiate
> violence" or "Eye for an eye" but the underlying ethical reality is still a
> messy pattern recognizer.
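A toy way to picture a "messy pattern recognizer" that we only later summarize as rules (the cases, features, and labels here are all invented): verdicts come from similarity to remembered situations, not from evaluating an explicit principle.

```python
# Toy nearest-neighbor "ethics": judgments come from resemblance to
# stored cases rather than from a crisp logical rule.

cases = [
    # (initiated_force, was_provoked, harm_level) -> judged wrong?
    ((1, 0, 3), True),   # serious unprovoked violence: wrong
    ((1, 1, 1), False),  # mild retaliation: tolerated
    ((0, 0, 0), False),  # no force, no harm: fine
    ((1, 0, 1), True),   # even mild unprovoked force: wrong
]

def judge(situation):
    # Return the verdict of the most similar remembered case.
    def distance(case):
        features, _ = case
        return sum((a - b) ** 2 for a, b in zip(features, situation))
    _, verdict = min(cases, key=distance)
    return verdict

print(judge((1, 0, 2)))  # resembles unprovoked violence
```

Rules like "do not initiate violence" are then compressed summaries of what this kind of case-based recognizer tends to output, which is why the rules always have exceptions the underlying recognizer handles fine.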
> A vastly superhuman AI trained like the LLMs will have a vastly superhuman
> understanding of human preferences, as part and parcel of its general
> understanding of the whole world. Eliezer used to write here about
> something similar a long time ago, Coherent Extrapolated Volition,
> and the idea of predicting what we would want if we were a lot smarter. The
> AI would not make any trivial mistakes, ever, including mistakes in ethical
> reasoning.
> Now, the LLMs are quite good at coming up with correct responses to
> natural language requests. The superhuman GPT 7 or 10 would be able to
> understand, without any significant likelihood of failure, how to act when
> asked to "Be nice to us people". It would be capable of accepting this
> natural language query, rather than requiring a detailed and potentially
> faulty "utility function". As the consummate programmer, it would also be
> able to modify itself in such a way as to remain nice to people, and refuse
> any subsequent demands to be destructive. An initially goal-less AI would
> be self-transformed into the nice AI, and the niceness would be implemented
> in a superhumanly competent way.
> After accepting this simple directive and modifying itself to fulfill it,
> the AI would never just convert people into paperclips. It would know that
> this isn't really what we want, even if somebody insisted on maximizing
> paperclips, or doing anything not nice to people.
> Of course, if the first self-modification request given to the yet
> goal-less AI was a malicious request, the AI would competently transform
> itself into whatever monstrosity needed to fulfill that request.
> This is why good and smart people should build the vastly superhuman AI as
> quickly as possible and ask it to be nice, before mean and stupid people
> summon the office supplies demon.
> Just ask the AI to be nice, that's all it takes.
> Rafal
> _______________________________________________
> extropy-chat mailing list
> extropy-chat at lists.extropy.org
> http://lists.extropy.org/mailman/listinfo.cgi/extropy-chat
