[ExI] AGI Motivation revisited [WAS Re: Isn't Bostrom seriously ...]
Richard Loosemore
rpwl at lightlink.com
Fri Jun 17 15:28:46 UTC 2011
Anders Sandberg wrote:
> Richard Loosemore wrote:
>> Anders Sandberg wrote:
> For example, at present most of our conclusions suggest that an
>>> uploading-driven singularity is more survivable than an AI-driven
>>> singularity.
>>
>> Most of whose conclusions? Based on what reasoning?
>>
>> Personally, I have come to exactly the opposite conclusion, based on
>> the potential controllability of AGI motivation and the complete
>> uncontrollability of unmodified uploaded human mind motivation.
>
> Safe AGI requires a motivation in a particular subset of possible
> motivations. This is unlikely to happen by chance, and if it happens by
> design it requires that 1) the designers can predict that a motivation
> is safe, and 2) they implement it properly.
>
> The first requirement has turned out to be significantly harder than
> most people thought, and it is not too hard to show various limits to
> our prediction ability (AGIs are after all complex systems, the usual
> theoretical computer science / Gödel theorems apply, plenty of
> apparently reasonable approaches can be shown to be wrong, etc.). In
> many ways it is the
> "inverse ethics problem": from a given set of acceptable behaviors
> (which is not complete) deduce a motivation that will produce all of
> them *and* not misbehave in domains and situations we might not even
> have a clue about. Maybe this is simpler than solving the traditional
> "forward ethics problem", but I doubt it.
But! Everything you have just said presupposes a certain (unspoken)
definition of what "motivation" actually is, and while all the things
you say may be true within the confines of that particular definition,
my entire point is that the definition *itself* is broken.
So, you have not made your sense of "motivation" explicit, but, reading
between the lines, I have a fairly shrewd idea what it is: it probably
amounts to an expected-utility maximization mechanism operating on a
system with one or more stacks of goals represented as logical sentences
within some variety of reasoning system. With that kind of definition,
sure, all bets are off! No motivation could be guaranteed to be safe,
nor could you solve the inverse ethics problem within that context.
But, as I say, I do not accept that that is an appropriate type of
motivation mechanism, and (by the by) I don't even believe that such a
system could *work* as the basis for a coherent intelligent system.
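(For concreteness, here is a minimal sketch, in Python, of the kind of
mechanism I am attributing to you; the goal names, probabilities and
actions are invented placeholders, not anyone's actual design:

# Toy expected-utility maximizer over a goal stack: a caricature of the
# kind of motivation mechanism I am arguing against.  All goals,
# probabilities and actions below are illustrative placeholders.
goal_stack = [
    ("goal_A", 1.0),   # (goal name, weight), e.g. a top-level drive
    ("goal_B", 0.3),   # a subsidiary goal
]
# Hypothetical world model: P(goal satisfied | action), made-up numbers.
prob_satisfies = {
    ("goal_A", "action_1"): 0.9, ("goal_B", "action_1"): 0.1,
    ("goal_A", "action_2"): 0.4, ("goal_B", "action_2"): 0.8,
}
def expected_utility(action):
    # Sum over the goal stack of weight * P(goal satisfied | action).
    return sum(w * prob_satisfies[(g, action)] for g, w in goal_stack)
def choose_action(actions):
    # Pick whichever action maximizes expected utility, with no further
    # check on whether the chosen outcome is sane or safe.
    return max(actions, key=expected_utility)
print(choose_action(["action_1", "action_2"]))   # prints "action_1"

In that picture, "safety" lives entirely in the goal stack and the
weights, which is exactly why the inverse ethics problem looks so
intractable under that definition of motivation.)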
> The second requirement is a combination of A) having
> organisations/designers *wanting* to do the right implementation and B)
> actually implementing the answer to the first requirement correctly.
> Principal-agent problems, competitive pressures, typical organisational
> and human biases, and so on make A a fairly tough challenge even if we
> had a nice answer to 1. As for B, consider extant data on human
> reliability as programmers or how often software projects produce bad
> code. Requirement 1 is deep, requirement 2 is messy.
Not necessarily.
I don't want to go into the arguments against this one, partly for lack
of time. Much of what you say in the above section is still tainted by
the assumption mentioned above, but even the part of your argument that
is not can be given a kind of reply. Basically, the programmers will
find that if they screw up, they don't end up with a fantastically smart
AGI with dangerous motivations (very bad); instead they end up with a
dumb AGI with dangerous motivations, because the two elements of the
system are entangled in such a way that you can't easily get [smart]
plus [dangerous]. If I had time I would extend this argument: the
basic conclusion is that in order to get a really smart AGI you will
need the alternative type of motivation system I alluded to above, and in
that case the easiest thing to do is to create a system that is empathic
to the human race .... you would have to go to immense trouble, over an
extended period of time, with many people working on the project, to
build something that was psychotic and smart, and I find that scenario
quite implausible.
So requirement 2 is not messy, unless you stick to your original
(implicit) definition of what motivation is.
> The reason to worry about the above requirements is the possibility of
> hard takeoff (intelligence explosion of self-improving software
> happening at short timescales compared to societal adaptation and human
> social interaction). Hard takeoffs are likely to result in single
> superintelligences (or groups of closely collaborating copies of one),
> since it is unlikely that several projects would be very close to each
> other in development. There are also the arguments by Carl Shulman about
> how AIs with different goals could merge their utility functions into a
> joint utility function, in essence becoming the same being. We also have
> fairly good reasons to think that superintelligences will be good at
> achieving their goals (more or less the definition of intelligence), so
> if a hard takeoff happens a single goal or motivation system will direct
> most of the future. So far I have not seen any plausible argument for
> why such a goal would be likely to be human compatible (no, I don't buy
> Mark R. Waser's argument) and some mildly compelling arguments for why
> they are likely to be accidentally inimical to us (the mindspace
> argument, the risk from AGIs with single top goals, Omohundro drives -
> although that paper needs shoring up a lot!) That implies that hard
> takeoffs pose a possible xrisk unless the safe AGI requirements have
> been properly implemented.
If I were to respond to this paragraph within the parameters of the
utility-function view of motivation, I would agree with many of these
points (especially about Omohundro's paper, which I think cannot be
shored up).
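(As an aside, for anyone who has not seen the Shulman point you mention:
the merging idea is, roughly, that two agents with different utility
functions can agree to both maximize a weighted combination of them. A
toy rendering, with invented weights and utilities, just to fix ideas:

# Toy rendering of the utility-function merge mentioned above (my
# paraphrase of the idea, not Shulman's own formulation).  Weights and
# utilities are invented for illustration.
def merge_utilities(u1, u2, w1=0.5, w2=0.5):
    # The joint function both agents now maximize; in practice the
    # weights would come out of bargaining, not be fixed at 0.5/0.5.
    return lambda outcome: w1 * u1(outcome) + w2 * u2(outcome)

u_joint = merge_utilities(lambda x: x, lambda x: -abs(x - 10))
print(max(range(20), key=u_joint))   # the compromise outcome, 10

Within the utility-function view, that kind of merge is part of why a
hard takeoff would leave a single goal system directing most of the
future.)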
Okay, I am going to have to stop here, partly for lack of time, partly
because you will demand (justifiably) to know what I mean by an
alternative to the utility-function approach to motivation (and I have
even less time to write that out in full right now). You may or may not
be aware that I have written many thousands of words on that topic on a
few different mailing lists (SL4, AGI, Google-AGI, and here) ....
however I have not produced a formal publication yet.
So the discussion has to go on ice until I get the time to do that.
If I am lucky I will have time to write that full paper before the AGI
conference (are you going to that?).
Richard Loosemore