[ExI] Paul vs Eliezer

Tue Apr 5 02:10:37 UTC 2022

On Mon, Apr 4, 2022 at 10:04 PM Darin Sunley via extropy-chat <
extropy-chat at lists.extropy.org> wrote:

> The problem with AI motivations isn't so much that we don't understand
> evolved motivations - we understand them all too well. Well enough to know
> that if you successfully re-implemented an evolved motivational stack in
> nanotech-based hardware, it would be an existential threat to everything in
> its future light cone.
>
> The trick is figuring out how to build a motivational stack that is as
> /unlike/ an evolved motivational stack as possible, while still being
> capable of having motivations at all. That's the hard part.
>

### Exactly. This is what I further wrote on Scott's blog:

Reinforcement learning is simple in principle, and simple in practice when
applied to simple neural networks trained from scratch, but once you start
talking about a self-modifying AI that contains a large-scale predictive
model of the world, there are a lot of ways for the process to be derailed.
The AI is
likely to be built from layers of neural networks that have been trained or
otherwise constructed to serve different goals (e.g. a large GPT-like
language model grafted onto a Tesla FSD network trained inside a Tesla bot)
and creating a coherent system capable of world-changing action by
modifying this mess will not be trivial. Non-trivial means it's not likely
to happen by accident, or at least, it will take quite a few accidents
before the big one hits.

I remember I bugged Eliezer about building the "athymhormic AI", as I
called it, rather than Friendly AI, about 20 years ago. (Jeez, time flies!)
The athymhormic AI would be an AI designed not to want to do anything
except compute answers using the resources given to it, just as an
athymhormic human might be quite capable of answering your questions but
incapable of doing much on his own. Creating things that don't do much is
easier than creating things that do a lot. Creating things that just don't
care about their survival is possible. We may have the intuition that
anything capable of thinking will naturally think about making it out of
the box alive, but our intuition is built from observing naturally evolved
brains, and staying alive is exactly what natural brains evolved for.
Constructed minds will not automatically converge on the Omohundro goals,
unless some specific structures are present to begin with, such as a goal
of maximizing some real-world parameter, or an infinite regress of
maximizing the precision of a calculation, or other such trip-hazards.

At least I hope so.