[ExI] Safety of human-like motivation systems [WAS Re: Oxford scientists...]
Richard Loosemore
rpwl at lightlink.com
Fri Feb 4 17:50:59 UTC 2011
John Clark wrote:
> On Feb 4, 2011, at 12:01 PM, Richard Loosemore wrote:
>
>> Any intelligent system must have motivations
>
> Yes certainly, but the motivations of anything intelligent never remain
> constant. A fondness for humans might motivate an AI to have empathy and
> behave benevolently toward those creatures that made it for millions,
> maybe even billions, of nanoseconds; but there is no way you can be
> certain that its motivation will not change many many nanoseconds from now.
Actually, yes we can.
With the appropriate design, the system uses (in effect) a negative
feedback loop that keeps it on its original track. And because that
feedback loop operates in (effectively) a few thousand dimensions
simultaneously, it can be made almost arbitrarily stable.
This is because any departure from the nominal motivation involves
inconsistencies between the departing "thought" and thousands of
constraining ideas. Since all of those constraints raise red flags and
trigger processes that elaborate the errant thought and examine whether
it can be made consistent, the system always settles back into a state
that is maximally consistent with the empathic motivation it started with.
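As a toy illustration only (not the actual design): suppose we model the
motivation state as a vector, each constraining idea as a quadratic
penalty anchored near the nominal motivation, and the corrective process
as gradient descent on the total inconsistency. All of those modelling
choices are assumptions made for the sketch, but they show how many weak
constraints acting together pull an errant state back to the original
motivation.

    # Toy sketch: many constraints acting as a negative feedback loop
    # that pulls a perturbed motivation state back toward nominal.
    import numpy as np

    rng = np.random.default_rng(0)

    DIMS = 1000           # "a few thousand dimensions", scaled down
    N_CONSTRAINTS = 2000  # thousands of constraining ideas

    nominal = rng.normal(size=DIMS)        # original empathic motivation
    nominal /= np.linalg.norm(nominal)

    # Each constraint "prefers" a point very close to the nominal motivation.
    anchors = nominal + 0.01 * rng.normal(size=(N_CONSTRAINTS, DIMS))

    def inconsistency(state):
        """Mean squared disagreement between the state and all constraints."""
        return np.mean(np.sum((anchors - state) ** 2, axis=1))

    def feedback_step(state, rate=0.1):
        """One pass of the feedback loop: nudge the state toward the point
        that is maximally consistent with every constraint."""
        gradient = 2 * np.mean(state - anchors, axis=0)
        return state - rate * gradient

    # An "errant thought": a large random departure from nominal motivation.
    state = nominal + 0.5 * rng.normal(size=DIMS)

    for _ in range(100):
        state = feedback_step(state)

    # Small residual: the loop has pulled the state back near nominal.
    print("distance from nominal after correction:",
          np.linalg.norm(state - nominal))

The point of the sketch is only the qualitative behaviour: the more
constraints there are, the harder it is for any single perturbation to
move the equilibrium away from the original motivation.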
Richard Loosemore