[ExI] Safety of human-like motivation systems [WAS Re: Oxford scientists...]

Wed Feb 2 16:40:39 UTC 2011

Kelly Anderson wrote:
> On Mon, Jan 31, 2011 at 12:05 PM, Richard Loosemore <rpwl at lightlink.com> wrote:
>> Kelly Anderson wrote:

>>> Richard, do you think computers will achieve Strong AI eventually?
>> Kelly, by my reckoning I am one of only a handful of people on this planet
>> with the ability to build a strong AI, and I am actively working on the
>> problem (in between teaching, fundraising, and writing to the listosphere).
> 
> That's fantastic, I truly hope you succeed. If you are working to
> build a strong AI, then you must believe it is possible.

I certainly believe that strong AI is possible.

> I have spent about the last two hours reading your papers, web site,
> etc. You have an interesting set of ideas, and I'm still digesting it.
> 
> One question comes up from your web site, I quote:
> 
> "One reason that we emphasize human-mind-like systems is safety. The
> motivation mechanisms that underlie human behavior are quite unlike
> those that have traditionally been used to control the behavior of AI
> systems. Our research indicates that the AI control mechanisms are
> inherently unstable, whereas the human-like equivalent can be
> engineered to be extremely stable."
> 
> Are you implying that humans are safe? If so, what do you mean by safety?

No, humans by themselves are (mild understatement) not safe.

The human motivation mechanism works in conjunction with the "thinking" 
part of the human mind.  The latter is like a swarm of simple agents, 
all trying to engage in a process of "weak constraint relaxation" with 
their neighbors, so the whole thing is like a molecular soup in which 
atoms and molecules are independently trying to aggregate to form larger 
molecules.

One factor that is important in this relaxation process is the anchoring 
of the relaxation:  there are always some agents whose state is being 
fixed by outside factors (e.g. the agents linked to sensors in your eye 
go into states that depend, not on nearby agents, but on the signals 
hitting the retina), so these peripheral agents act as seeds, causing 
many others to attach to them and grow to form large "molecules".  Those 
molecules are the extended structures that constitute the knowledge 
representations that we hold in working memory.  Obviously they change 
all the time, so there is never complete stability, but nevertheless the 
agents are always trying to find ways to go "downhill" toward more 
stable states.
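
As a rough illustration of the anchored relaxation described above, here
is a minimal Python sketch.  It is a toy under stated assumptions, not
the actual architecture: the agents sit in a simple chain, each agent's
state is a single number, "instability" is taken to be the squared
disagreement between neighbors, and the names energy() and relax() are
invented for this example.  The two end agents play the role of
sensor-driven seeds whose states are fixed from outside.

```python
def energy(states):
    """Total squared disagreement between neighboring agents (hypothetical measure)."""
    return sum((a - b) ** 2 for a, b in zip(states, states[1:]))


def relax(states, anchored, sweeps=200, rate=0.5):
    """Let every free agent drift toward the average of its neighbors.

    Agents whose index is in `anchored` never move: they stand in for
    agents whose state is fixed by outside factors such as sensors.
    """
    states = list(states)
    for _ in range(sweeps):
        for i in range(len(states)):
            if i in anchored:
                continue
            neighbors = [states[j] for j in (i - 1, i + 1)
                         if 0 <= j < len(states)]
            target = sum(neighbors) / len(neighbors)
            # Move part of the way toward local agreement ("downhill").
            states[i] += rate * (target - states[i])
    return states


initial = [1.0, 0.2, 0.9, 0.1, 0.7, 0.3, 0.8, 0.0]
anchored = {0, 7}          # the two "sensor" agents at the ends
final = relax(initial, anchored)

print("energy before:", energy(initial))
print("energy after: ", energy(final))
```

The anchored agents keep their values throughout, and the free agents
settle into a configuration with lower total disagreement that is shaped
by those anchors.  That is the sense in which the seeds govern what forms
around them.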

Now, going back to your original question about motivation.  There are 
other sources that act as seed areas, governing the formation of 
molecules in this working memory area.  One such source is the 
motivation system:  a diffuse collection of agents that push the 
thinking system to want certain things, and to try to get those things 
in ways that are consistent with the constraints of the motivation system.

This can all get very complicated (too much for a post here), but the 
bottom line is that when the system is controlled in this way, the 
stability of the motivation system is determined by a very large number 
of mutually-reinforcing constraints, so if the system starts with 
intentions that are (shall we say) broadly empathic with the human 
species, it cannot start to conceive new, bizarre motivations that break 
a significant number of those constraints.  It is always settling back 
toward a large global attractor.
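
To make the attractor idea concrete, here is a small Python sketch.  It
swaps in a Hopfield-style network as a stand-in, which is an assumption
made purely for illustration and not the motivation system described
here.  Every unit is tied to every other unit by a pairwise constraint
derived from one stored pattern, and the function name settle() and the
pattern itself are hypothetical.

```python
def settle(state, pattern, sweeps=5):
    """Repeatedly update each unit to satisfy its pairwise constraints.

    The constraint between units i and j has weight pattern[i] * pattern[j],
    so every unit is "voted on" by all of the others.
    """
    n = len(pattern)
    state = list(state)
    for _ in range(sweeps):
        for i in range(n):
            field = sum(pattern[i] * pattern[j] * state[j]
                        for j in range(n) if j != i)
            state[i] = 1 if field >= 0 else -1
    return state


# A hypothetical 20-unit pattern standing in for the starting motivations.
pattern = [1 if i % 3 else -1 for i in range(20)]

# Disturb a few units, as a local fault might.
perturbed = list(pattern)
for i in (2, 9, 15):
    perturbed[i] = -perturbed[i]

recovered = settle(perturbed, pattern)
print(recovered == pattern)
```

Because each disturbed unit is outvoted by the many units that still
agree with the pattern, the state falls back to where it started.  The
toy has limits worth stating: a network like this also has the fully
inverted pattern as an attractor, so flipping more than half of the
units would send it there instead, and nothing here shows how a real
motivation system would behave.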

The problem with humans is that they have several modules in the 
motivation system, some of them altruistic and empathic and some of them 
selfish or aggressive.   The nastier ones were built by evolution 
because she needed to develop a species that would fight its way to the 
top of the heap.  But an AGI would not need those nastier motivation 
mechanisms.  If you subtract out those unwanted modules what you have 
left is an altruistic saint of an AGI, with a motivation system that has 
three very important properties:

1)  If the AGI starts out wanting to help the human species because it 
feels like it belongs with us, then it can only develop new ideas about 
how to behave that are consistent with that motivation.

2)  For that same reason, if the AGI were given the chance to redesign 
itself, it would always want to improve its motivation mechanism to keep 
it consistent with those original motivations.  As a result, over time 
the motivation of the AGI would not drift:  it would stay consistent with 
the feeling of empathy for humans.

3)  If some problem occurred in the computational substrate of the AGI 
(say, a random cosmic ray strike on the motivation module) the disruption 
would be very unlikely to leave the system with different, violent 
motivations.  That would be rather like a random cosmic ray collision 
causing you to have such specific damage to your body that a second 
after the collision you had a new, fully functional third arm attached 
to your body -- a ridiculously unlikely event, obviously.

This is what I mean by safety:  an AGI whose motivations have the same 
stability of design as a human being's, but without the specific modules 
(selfishness and aggression, primarily) that are present in the human 
system.

Richard Loosemore