[ExI] AGI Motivation revisited [WAS Re: Isn't Bostrom seriously ...]

Richard Loosemore rpwl at lightlink.com
Fri Jun 17 15:28:46 UTC 2011


Anders Sandberg wrote:
> Richard Loosemore wrote:
>> Anders Sandberg wrote:
>>>  For example, at present most of our conclusions suggest that an 
>>> uploading-driven singularity is more survivable than an AI-driven 
>>> singularity.
>>
>> Most of whose conclusions?  Based on what reasoning?
>>
>> Personally, I have come to exactly the opposite conclusion, based on 
>> the potential controllability of AGI motivation and the complete 
>> uncontrollability of unmodified uploaded human mind motivation.
> 
> Safe AGI requires a motivation in a particular subset of possible 
> motivations. This is unlikely to happen by chance, and if it happens by 
> design it requires that 1) the designers can predict that a motivation 
> is safe, and 2) they implement it properly.
> 
> The first requirement has turned out to be significantly harder than 
> most people thought, and it is not too hard to show various limits to 
> our prediction ability (AGIs are after all complex systems, the usual 
> theoretical computer science/Gödel theorems apply, plenty of apparently 
> reasonable approaches can be shown to be wrong, etc.). In many ways it 
> is the "inverse ethics problem": from a given set of acceptable 
> behaviors (which is not complete), deduce a motivation that will 
> produce all of them *and* not misbehave in domains and situations we 
> might not even have a clue about. Maybe this is simpler than solving 
> the traditional "forward ethics problem", but I doubt it.

But!  Everything you have just said presupposes a certain (unspoken) 
definition of what "motivation" actually is, and while all the things 
you say may be true within the confines of that particular definition, 
my entire point is that the definition *itself* is broken.

So, you have not made your sense of "motivation" explicit but, reading 
between the lines, I have a fairly shrewd idea of what it is: it 
probably amounts to an expected-utility maximization mechanism, 
operating on a system with one or more stacks of goals represented as 
logical sentences within some variety of reasoning system.  With that 
kind of definition, sure, all bets are off!  No motivation could be 
guaranteed to be safe, nor could you solve the inverse ethics problem 
within that context.
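
To make that attribution concrete (and this is purely my own toy sketch 
of the kind of mechanism I mean, not a claim about anyone's actual 
design), the picture I have in mind is roughly this:

    from dataclasses import dataclass

    @dataclass
    class Goal:
        description: str   # the goal, stated as a logical sentence / proposition
        utility: float     # payoff if the goal is judged to be satisfied

    def expected_utility(action, goals, p_satisfies):
        # Sum of goal utilities, weighted by the system's own estimate of
        # the probability that this action satisfies each goal.
        return sum(g.utility * p_satisfies(action, g) for g in goals)

    def choose_action(actions, goal_stack, p_satisfies):
        # The entire "motivation" of the system reduces to this one choice rule.
        return max(actions,
                   key=lambda a: expected_utility(a, goal_stack, p_satisfies))

On that picture, the whole problem of "safety" collapses into getting 
the goal stack, the utilities, and the probability estimates exactly 
right, and that is precisely the framing I am rejecting.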

But, as I say, I do not accept that that is an appropriate type of 
motivation mechanism, and (by the by) I don't even believe that such a 
system could *work* as the basis for a coherent intelligent system.


> The second requirement is a combination of A) having 
> organisations/designers *wanting* to do the right implementation and B) 
> actually implementing the answer to the first requirement correctly. 
> Principal-agent problems, competitive pressures, typical organisational 
> and human biases, and so on make A a fairly tough challenge even if we 
> had a nice answer to 1. As for B, consider extant data on human 
> reliability as programmers or how often software projects produce bad 
> code. Requirement 1 is deep, requirement 2 is messy.

Not necessarily.

I don't want to go into the arguments against this one, partly for lack 
of time.  Much of what you say in the above section is still tainted by 
the assumption mentioned above, but even the part of your argument that 
is not can be given a kind of reply.  Basically, the programmers will 
find that if they screw up, they don't end up with a fantastically smart 
AGI with dangerous motivations (very bad); instead they end up with a 
dumb AGI with dangerous motivations, because the two elements of the 
system are entangled in such a way that you can't easily get [smart] 
plus [dangerous].  If I had time I would extend this argument: the 
basic conclusion is that in order to get a really smart AGI you will 
need the alternative type of motivation system I alluded to above, and 
in that case the easiest thing to do is to create a system that is 
empathic to the human race.  You would have to go to immense trouble, 
over an extended period of time, with many people working on the 
project, to build something that was psychotic and smart, and I find 
that scenario quite implausible.

So requirement 2 is not messy, unless you stick to your original 
(implicit) definition of what motivation is.

> The reason to worry about the above requirements is the possibility of 
> hard takeoff (intelligence explosion of self-improving software 
> happening at short timescales compared to societal adaptation and human 
> social interaction). Hard takeoffs are likely to result in single 
> superintelligences (or groups of closely collaborating copies of one), 
> since it is unlikely that several projects would be very close to each 
> other in development. There are also the arguments by Carl Shulman about 
> how AIs with different goals could merge their utility functions into a 
> joint utility function, in essence becoming the same being. We also have 
> fairly good reasons to think that superintelligences will be good at 
> achieving their goals (more or less the definition of intelligence), so 
> if a hard takeoff happens a single goal or motivation system will direct 
> most of the future. So far I have not seen any plausible argument for 
> why such a goal would be likely to be human-compatible (no, I don't buy 
> Mark R. Waser's argument), and I have seen some mildly compelling 
> arguments for why they are likely to be accidentally inimical to us 
> (the mindspace argument, the risk from AGIs with single top goals, 
> Omohundro drives - although that paper needs shoring up a lot!). That 
> implies that hard takeoffs pose a possible xrisk unless the safe AGI 
> requirements have been properly implemented.

If I were to respond to this paragraph within the parameters of the 
utility-function view of motivation, I would agree with many of these 
points.  (Especially about Omohundro's paper, which I think cannot be 
shored up.)
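
(For any list members who have not seen Carl Shulman's merging argument: 
within the utility-function framing it amounts, schematically, to 
something like the toy sketch below.  The function names, the 50/50 
weights, and the paperclips/stamps examples are my own illustrative 
placeholders, not anything taken from Shulman's writing.)

    def merge_utilities(u_a, u_b, w_a=0.5, w_b=0.5):
        # Joint utility = weighted sum of the two original utility
        # functions; the weights are placeholders (presumably set by some
        # bargain between the two AIs, but that is my assumption here).
        return lambda outcome: w_a * u_a(outcome) + w_b * u_b(outcome)

    def u_paperclips(outcome):
        return outcome.get("paperclips", 0)

    def u_stamps(outcome):
        return outcome.get("stamps", 0)

    u_joint = merge_utilities(u_paperclips, u_stamps)
    # Both former "agents" can now act as a single being maximizing u_joint.
    print(u_joint({"paperclips": 3, "stamps": 5}))   # 0.5*3 + 0.5*5 = 4.0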

Okay, I am going to have to stop here.  Partly for lack of time, partly 
because you will demand (justifiably) to know what I mean by an 
alternative to the utility-function approach to motivation (and I have 
even less time to write that out in full right now).  You may or may not 
be aware that I have written many thousands of words on that topic on a 
few different mailing lists (SL4, AGI, Google-AGI, and here); however, I 
have not yet produced a formal publication.

So the discussion has to go on ice until I get the time to do that.

If I am lucky I will have time to do that full paper before the AGI 
conference (are you going to that?).



Richard Loosemore





