[ExI] Safety of human-like motivation systems [WAS Re: Oxford scientists...]
rpwl at lightlink.com
Wed Feb 2 16:40:39 UTC 2011
Kelly Anderson wrote:
> On Mon, Jan 31, 2011 at 12:05 PM, Richard Loosemore <rpwl at lightlink.com> wrote:
>> Kelly Anderson wrote:
>>> Richard, do you think computers will achieve Strong AI eventually?
>> Kelly, by my reckoning I am one of only a handful of people on this planet
>> with the ability to build a strong AI, and I am actively working on the
>> problem (in between teaching, fundraising, and writing to the listosphere).
> That's fantastic, I truly hope you succeed. If you are working to
> build a strong AI, then you must believe it is possible.
I certainly believe that strong AI is possible.
> I have spent about the last two hours reading your papers, web site,
> etc. You have an interesting set of ideas, and I'm still digesting it.
> One question comes up from your web site, I quote:
> "One reason that we emphasize human-mind-like systems is safety. The
> motivation mechanisms that underlie human behavior are quite unlike
> those that have traditionally been used to control the behavior of AI
> systems. Our research indicates that the AI control mechanisms are
> inherently unstable, whereas the human-like equivalent can be
> engineered to be extremely stable."
> Are you implying that humans are safe? If so, what do you mean by safety?
No, humans by themselves are (mild understatement) not safe.
The human motivation mechanism works in conjunction with the "thinking"
part of the human mind. The latter is like a swarm of simple agents,
all trying to engage in a process of "weak constraint relaxation" with
their neighbors, so the whole thing is like a molecular soup in which
atoms and molecules are independently trying to aggregate to form larger
molecules.
One factor that is important in this relaxation process is the anchoring
of the relaxation: there are always some agents whose state is being
fixed by outside factors (e.g. the agents linked to sensors in your eye
go into states that depend, not on nearby agents, but on the signals
hitting the retina), so these peripheral agents act as seeds, causing
many others to attach to them and grow to form large "molecules". Those
molecules are the extended structures that constitute the knowledge
representations that we hold in working memory. Obviously they change
all the time, so there is never complete stability, but nevertheless the
agents are always trying to find ways to go "downhill" toward more
Now, going back to your original question about motivation. There are
other sources that act as seed areas, governing the formation of
molecules in this working memory area. One such source is the
motivation system: a diffuse collection of agents that push the
thinking system to want certain things, and to try to get those things
in ways that are consistent with the constraints of the motivation system.
This can all get very complicated (too much for a post here), but the
bottom line is that when the system is controlled in this way, the
stability of the motivation system is determined by a very large number
of mutually-reinforcing constraints, so if the system starts with
intentions that are (shall we say) broadly empathic with the human
species, it cannot start to conceive new, bizarre motivations that break
a significant number of those constraints. It is always settling back
toward a large global attractor.
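
As a rough illustration of the relaxation idea (not the actual
architecture, which is too much for a post here), a toy Python sketch
follows. Everything in it is an assumption made for the example: the
eight binary agents, the +1/-1 pairwise weights, and the names
make_constraints, energy, and relax. Three agents are clamped as anchors,
and each free agent repeatedly adopts whichever state satisfies more of
its pairwise constraints.

```python
# Toy model only: agents, weights, and update rule are illustrative assumptions.

def make_constraints(preferred):
    """Pairwise weak constraints: +1 if two agents agree in the preferred
    configuration, -1 if they disagree, 0 on the diagonal."""
    n = len(preferred)
    return [[0 if i == j else preferred[i] * preferred[j] for j in range(n)]
            for i in range(n)]

def energy(weights, state):
    """Lower energy means more pairwise constraints are satisfied."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def relax(weights, state, anchors, max_sweeps=20):
    """Asynchronous relaxation: anchored agents stay clamped, free agents
    move 'downhill' by following the sign of their local field."""
    state = list(state)
    for i, value in anchors.items():
        state[i] = value
    free = [i for i in range(len(state)) if i not in anchors]
    for _ in range(max_sweeps):
        changed = False
        for i in free:
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            if field > 0 and state[i] != 1:
                state[i], changed = 1, True
            elif field < 0 and state[i] != -1:
                state[i], changed = -1, True
        if not changed:
            break  # nothing moved in a full sweep: the system has settled
    return state

preferred = [1, -1, 1, 1, -1, 1, -1, -1]   # hypothetical "attractor" state
weights = make_constraints(preferred)
anchors = {0: 1, 1: -1, 2: 1}              # agents fixed by outside input
start = [1] * 8                            # uninformed starting state
clamped_start = [anchors.get(i, s) for i, s in enumerate(start)]

settled = relax(weights, start, anchors)
print(settled == preferred)                # True
print(energy(weights, clamped_start))      # 2.0
print(energy(weights, settled))            # -28.0
```

In this toy, the free agents fall into the preferred configuration after
one sweep, and the energy drops from 2.0 to -28.0. It shows only the
general flavor of anchored constraint relaxation, nothing more.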
The problem with humans is that they have several modules in the
motivation system, some of them altruistic and empathic and some of them
selfish or aggressive. The nastier ones were built by evolution
because she needed to develop a species that would fight its way to the
top of the heap. But an AGI would not need those nastier motivation
mechanisms. If you subtract out those unwanted modules, what you have
left is an altruistic saint of an AGI, with a motivation system that has
three very important properties:
1) If the AGI starts out wanting to help the human species because it
feels like it belongs with us, then it can only develop new ideas about
how to behave that are consistent with that motivation.
2) For that same reason, if the AGI were given the chance to redesign
itself, it would always want to improve its motivation mechanism to keep
it consistent with those original motivations. As a result, over time
the motivation of the AGI would not drift, it would stay consistent with
the feeling of empathy for humans.
3) If some problem occurred in the computational substrate of the AGI
(a random cosmic ray strike on the motivation module) the disruption
would be very unlikely to leave the system with different, violent
motivations. That would be rather like a random cosmic ray collision
causing you to have such specific damage to your body that a second
after the collision you had a new, fully functional third arm attached
to your body -- a ridiculously unlikely event, obviously.
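
Point 3 can be illustrated, very loosely, with a toy simulation. This is
a sketch under stated assumptions, not a model of any real motivation
system: the 16 binary agents, the pairwise weights, and the helper names
settle and strike are all invented for the example. A few randomly chosen
agents are flipped to stand in for substrate damage, and the remaining
constraints are then allowed to pull the state back.

```python
import random

# Hypothetical toy model: 16 binary agents tied together by pairwise
# constraints that all favour one stored configuration.
N = 16
original = [1 if i % 3 else -1 for i in range(N)]
weights = [[0 if i == j else original[i] * original[j] for j in range(N)]
           for i in range(N)]

def settle(state, max_sweeps=20):
    """Let every agent repeatedly follow the sign of its local field
    until a full sweep produces no change."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(N):
            field = sum(weights[i][j] * state[j] for j in range(N))
            wanted = 1 if field > 0 else -1 if field < 0 else state[i]
            if wanted != state[i]:
                state[i], changed = wanted, True
        if not changed:
            break
    return state

def strike(state, hits, rng):
    """Flip `hits` randomly chosen agents, a stand-in for substrate damage."""
    damaged = list(state)
    for i in rng.sample(range(N), hits):
        damaged[i] = -damaged[i]
    return damaged

rng = random.Random(2011)
trials = 200
recovered = sum(settle(strike(original, 3, rng)) == original
                for _ in range(trials))
print(f"{recovered}/{trials} damaged states settled back to the original")
```

With three flipped agents out of 16, every trial settles back, because
each agent is still outvoted by the intact majority of its constraints.
The toy only covers damage to fewer than half of the agents, and it says
nothing about how a real system would behave.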
This is what I mean by safety: an AGI whose motivations have the same
stability of design as a human being, but without the specific modules
(selfishness and aggression, primarily) that are present in the human