[ExI] Safety of human-like motivation systems [WAS Re: Oxford scientists...]
rpwl at lightlink.com
Fri Feb 4 01:38:34 UTC 2011
Mike Dougherty wrote:
> On Thu, Feb 3, 2011 at 2:20 PM, Richard Loosemore <rpwl at lightlink.com> wrote:
>> The difference is in the stability of the motivation mechanism. I claim
>> that you cannot make a stable system AT ALL if you extrapolate from the
>> "goal stack" control mechanisms that most people now assume are the only way
>> to drive an AGI.
> I started a post earlier to a different comment, lost track of it and
> gave up. This is a better opportunity.
> The visualization I have from what you say is a marble in a bowl. The
> marble has only limited internal potential to accelerate in any
> direction. This is enough to explore the flatter/bottom part of the
> bowl. As it approaches the steeper sides of the bowl the ability to
> continue up the side is reduced relative to the steepness. Under
> normal operation this would prove sufficiently fruitless to "teach"
> that the near-optimal energy expenditure is in the approximate center
> of the bowl. One behavioral example is the training of baby elephants
> using strong chains/ties while they are testing their limits so that
> much lighter ropes are enough to secure adult elephants that could
> easily defeat a basic restraint.
> "So it's a slave?" No, it's not. There could be circumstances where
> this programming could be forgotten in light of some higher-order
> priority - but the tendency would be towards cooperation under normal
> circumstances. Even the marble in the bowl analogy could develop an
> orbit inside an effective gravity well. The orbit could decay into
> something chaotic, yet still the tendency would remain for rest at
> the bottom of the bowl.
> Is it possible that this principle could fail in a sufficiently
> contrived scenario? Of course. I have no hubris that I could
> guarantee anything about another person-level intelligence under
> extreme stress, let alone a humanity+level intelligence. Hopefully we
> will have evolved along with our creation to be capable of predicting
> (and preventing) existential threat events. How is this different
> from the potential for astronomic cataclysm? If we fail to build AI
> because it could kill us only to be obliterated by a giant rock or the
> nova of our sun, who is served?
> Richard, I know I haven't exactly contributed to cognitive science,
> but is the marble analogy similar in intent to something you posted
> years ago about a pinball on a table? (i only vaguely recall the
> concept, not the detail)
Yes, the marble analogy works very well for one aspect of what I am
trying to convey (actually two, I believe, but only one is relevant
here).
Strictly speaking, your bowl is a minimum in a 2-D subspace, whereas we
would really be talking about a minimum in a very large N-dimensional
space. The larger the number of dimensions, the more secure the
behavior of the marble.
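As a rough illustration of the point (my own sketch, not anything from the original discussion): a damped particle in an N-dimensional quadratic bowl, given only a bounded "internal" acceleration to push itself around, settles back near the minimum however it is perturbed. All the names and parameter values below are invented for the example.

```python
import numpy as np

def settle_in_bowl(n_dims, steps=2000, dt=0.05, damping=0.5,
                   max_accel=0.2, seed=0):
    """Damped particle in an N-dimensional quadratic bowl.

    The potential is U(x) = 0.5 * ||x||^2, so the restoring force is -x.
    The particle also gets a bounded random 'internal' acceleration each
    step -- the marble's limited ability to climb the sides of the bowl.
    Returns the particle's final distance from the minimum.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_dims)   # start displaced from the bottom
    v = np.zeros(n_dims)
    for _ in range(steps):
        kick = rng.normal(size=n_dims)
        kick *= max_accel / max(np.linalg.norm(kick), 1e-12)  # bounded push
        a = -x - damping * v + kick   # restoring force + damping + push
        v += a * dt
        x += v * dt
    return np.linalg.norm(x)

# The final distance from the minimum stays small relative to the
# initial displacement (which grows like sqrt(N)), whatever N is.
for n in (2, 20, 200):
    print(n, settle_in_bowl(n))
```

The bounded kick is what makes the analogy work: the restoring force grows with displacement while the marble's own push does not, so escape is impossible under normal operation.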
Time limits what I can write at the moment, but I promise I will try to
expand on this soon.