[ExI] Safety of human-like motivation systems [WAS Re: Oxford scientists...]

Mike Dougherty msd001 at gmail.com
Fri Feb 4 00:45:19 UTC 2011

On Thu, Feb 3, 2011 at 2:20 PM, Richard Loosemore <rpwl at lightlink.com> wrote:
> The difference is in the stability of the motivation mechanism.  I claim
> that you cannot make a stable system AT ALL if you extrapolate from the
> "goal stack" control mechanisms that most people now assume are the only way
> to drive an AGI.

I started a reply earlier to a different comment, lost track of it,
and gave up.  This is a better opportunity.

The visualization I have from what you say is a marble in a bowl.  The
marble has only limited internal potential to accelerate in any
direction.  That is enough to explore the flatter bottom of the bowl,
but as the marble approaches the steeper sides, its ability to
continue up the slope diminishes relative to the steepness.  Under
normal operation this would prove sufficiently fruitless to "teach"
that the near-optimal energy expenditure is in the approximate center
of the bowl.  One behavioral example is the training of baby elephants
using strong chains/ties while they are testing their limits so that
much lighter ropes are enough to secure adult elephants that could
easily defeat a basic restraint.
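The bowl dynamics above can be sketched as a toy simulation (all
parameters here are hypothetical, chosen only to illustrate the idea):
the marble's internal drive is capped at some maximum, while the bowl's
restoring pull grows with distance from center, so even pushing outward
as hard as it can, the marble stalls where the slope's pull matches its
maximum drive.

```python
def simulate(a_max=1.0, k=4.0, damping=2.0, dt=0.01, steps=5000):
    """Marble in a bowl with bounded internal acceleration.

    a_max   -- marble's maximum internal drive (capped)
    k       -- bowl steepness: restoring pull grows as k * x
    damping -- friction term so the motion settles
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        drive = a_max            # marble pushes outward at full capacity
        restoring = -k * x       # bowl pulls back harder farther out
        friction = -damping * v
        v += (drive + restoring + friction) * dt
        x += v * dt
    return x

# The marble settles near x = a_max / k, well inside the bowl,
# no matter how long it keeps pushing.
print(simulate())
```

The equilibrium is where the bowl's pull equals the capped drive
(k * x = a_max), which is the "near-optimal energy expenditure" region
described above.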

"So it's a slave?"  No, it's not.  There could be circumstances in
which this conditioning is overridden by some higher-order priority,
but under normal circumstances the tendency would be toward
cooperation.  Even the marble in the bowl analogy could develop an
orbit inside an effective gravity well.  The orbit could decay into
something chaotic, yet the tendency toward rest at the bottom of the
bowl would remain.

Is it possible that this principle could fail in a sufficiently
contrived scenario?  Of course.  I have no hubris that I could
guarantee anything about another person-level intelligence under
extreme stress, let alone a humanity+level intelligence.  Hopefully we
will have evolved along with our creation to be capable of predicting
(and preventing) existential threat events.  How is this different
from the potential for astronomic cataclysm?  If we fail to build AI
because it could kill us, only to be obliterated by a giant rock or the
nova of our sun, who is served?

Richard, I know I haven't exactly contributed to cognitive science,
but is the marble analogy similar in intent to something you posted
years ago about a pinball on a table?  (I only vaguely recall the
concept, not the detail)
