[ExI] Safety of human-like motivation systems [WAS Re: Oxford scientists...]
rpwl at lightlink.com
Fri Feb 4 01:38:34 UTC 2011
Mike Dougherty wrote:
> On Thu, Feb 3, 2011 at 2:20 PM, Richard Loosemore <rpwl at lightlink.com> wrote:
>> The difference is in the stability of the motivation mechanism. I claim
>> that you cannot make a stable system AT ALL if you extrapolate from the
>> "goal stack" control mechanisms that most people now assume are the only way
>> to drive an AGI.
> I started a post earlier to a different comment, lost track of it and
> gave up. This is a better opportunity.
> The visualization I have from what you say is a marble in a bowl. The
> marble has only limited internal potential to accelerate in any
> direction. This is enough to explore the flatter/bottom part of the
> bowl. As it approaches the steeper sides of the bowl the ability to
> continue up the side is reduced relative to the steepness. Under
> normal operation this would prove sufficiently fruitless to "teach"
> that the near-optimal energy expenditure is in the approximate center
> of the bowl. One behavioral example is the training of baby elephants
> using strong chains/ties while they are testing their limits so that
> much lighter ropes are enough to secure adult elephants that could
> easily defeat a basic restraint.
> "So it's a slave?" No, it's not. There could be circumstances where
> this programming could be forgotten in light of some higher-order
> priority - but the tendency would be towards cooperation under normal
> circumstances. Even the marble in the bowl analogy could develop an
> orbit inside an effective gravity well. The orbit could decay into
> something chaotic, yet still the tendency would remain for rest at
> the bottom of the bowl.
> Is it possible that this principle could fail in a sufficiently
> contrived scenario? Of course. I have no hubris that I could
> guarantee anything about another person-level intelligence under
> extreme stress, let alone a humanity+level intelligence. Hopefully we
> will have evolved along with our creation to be capable of predicting
> (and preventing) existential threat events. How is this different
> from the potential for astronomic cataclysm? If we fail to build AI
> because it could kill us only to be obliterated by a giant rock or the
> nova of our sun, who is served?
> Richard, I know I haven't exactly contributed to cognitive science,
> but is the marble analogy similar in intent to something you posted
> years ago about a pinball on a table? (i only vaguely recall the
> concept, not the detail)
Yes, the marble analogy works very well for one aspect of what I am
trying to convey (actually two, I believe, but only one is relevant
here).
Strictly speaking, your bowl is a minimum in a 2-D subspace, whereas we
would really be talking about a minimum in a very large N-dimensional
space. The larger the number of dimensions, the more secure the
behavior of the marble.
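As a rough illustration of the point (my own sketch, not anything from the original discussion): a damped particle in an N-dimensional quadratic bowl, given only a bounded "internal" acceleration to push itself around, settles back near the minimum however it is perturbed. All the names and parameter values below are invented for the example.

```python
import numpy as np

def settle_in_bowl(n_dims, steps=2000, dt=0.05, damping=0.5,
                   max_accel=0.2, seed=0):
    """Damped particle in an N-dimensional quadratic bowl.

    The potential is U(x) = 0.5 * ||x||^2, so the restoring force is -x.
    The particle also gets a bounded random 'internal' acceleration each
    step -- the marble's limited ability to climb the sides of the bowl.
    Returns the particle's final distance from the minimum.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_dims)   # start displaced from the bottom
    v = np.zeros(n_dims)
    for _ in range(steps):
        kick = rng.normal(size=n_dims)
        kick *= max_accel / max(np.linalg.norm(kick), 1e-12)  # bounded push
        a = -x - damping * v + kick   # restoring force + damping + push
        v += a * dt
        x += v * dt
    return np.linalg.norm(x)

# The final distance from the minimum stays small relative to the
# initial displacement (which grows like sqrt(N)), whatever N is.
for n in (2, 20, 200):
    print(n, settle_in_bowl(n))
```

The bounded kick is what makes the analogy work: the restoring force grows with displacement while the marble's own push does not, so escape is impossible under normal operation.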
Time limits what I can write at the moment, but I promise I will try to
expand on this soon.