[ExI] Self improvement
Richard Loosemore
rpwl at lightlink.com
Sat Apr 23 17:39:09 UTC 2011
Anders Sandberg wrote:
> I prefer to approach the whole AI safety thing from a broader
> standpoint, checking the different possibilities. AI might emerge
> rapidly, or more slowly. In the first case it is more likely to be
> unitary since there will be just one system recursively self-improving,
> and there are some arguments that for some forms of softer takeoffs
> there would be a 'merging' of the goals if they are compatible (Shulman,
> Armstrong). In the case of softer takeoffs power differentials are going
> to be less extreme, but there will also be many more motivations and
> architectures around. Systems can be designed in different ways for
> safety - by reducing their capabilities (might or might not preclude
> recursive self improvement), various motivational measures, or by
> constraints from other systems of roughly equal power.
>
> So we need to analyse a number of cases:
>
> Hard takeoff (one agent): capability constraints ("AI in a box" for
> example), motivational constraints ("friendly" designs of various kinds,
> CEV, values copied from person, etc), ability to prevent accidental or
> deliberate takeoffs
>
> Soft takeoff (multiple agents): as above (but with the complication that
> defection of some agents is likely due to accident, design or systemic
> properties), various mutual balancing schemes, issues of safeguarding
> against or using 'evolutionary attractors' or coordination issues (e.g.
> the large scale economic/evolutionary pressures discussed in some of
> Robin's and Nick's papers, singletons)
>
> This is not a small field. Even the subfields have pretty deep
> ramifications (friendliness being a good case in point). It is stupid to
> spend all one's effort on one possible subsection and claiming this is
> the only one that matters. Some doubtlessly matter more, but even if you
> think there is a 99% chance of a soft takeoff, safeguarding against that 1%
> chance of a bad hard takeoff can be very rational. We will need to
> specialize ourselves in researching them, but we shouldn't forget we
> need to cover the possibility space well enough that we can start
> formulating policies (including even the null policy of not doing
> anything and just hoping for the best).
I have to say that I think the main problem is not so much that there is a cluster of different (and sometimes difficult to analyze) possibilities, which forces us to (as you put it) "approach the whole AI safety thing from a broader standpoint".
The critical issue is to understand the concept of "AGI motivation" in a more theoretically sound way. I find much of the discussion so confused on this matter that the discussion itself is of little value.
For example, you refer to some of the "large scale economic/evolutionary pressures discussed in some of Robin's and Nick's papers" (to which I would add Steve Omohundro's paper). The problem with these analyses is that they presuppose, without critical examination, that the motivation of the AGI is going to be controlled by something like a simple utility optimization function, such as the ones used to model human economic agents. That is just a supposition. And yet, if the supposition turns out to be wrong, everything that follows from it is called into question.
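(To make concrete what "a simple utility optimization function" amounts to: the presupposition is, in effect, that the AGI's entire motivational system reduces to a decision rule of roughly the following shape. The Python sketch below is my own illustration, not anything drawn from those papers, and the names in it are made up:

    # Illustrative sketch only: the agent's "motivation" is assumed to be
    # nothing more than picking whichever action scores highest on a single
    # scalar utility function, as in textbook models of economic agents.
    def choose_action(actions, utility):
        """Return the available action with the highest utility score."""
        return max(actions, key=utility)

    # e.g. choose_action(plans, lambda plan: plan.expected_resources)
    # where "plans" and "expected_resources" are purely hypothetical names

That one-line decision rule is, in essence, the whole motivational theory those analyses rest on.)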
And we have good grounds for believing that that kind of supposition is indeed wrong: this is a classic example of the model/mechanism mistake. If you build a post-hoc model to *describe* a system, it might work in an approximate way (perhaps well enough to allow crude modeling of the system)... but going from that model to the idea that a real, functioning version of that system is actually *driven* by a mechanism that is a direct instantiation of that model is a ghastly (and very elementary) mistake.
I can model the spawning rate of rabbits, for example, using a nice little Fibonacci algorithm. But going from that model to the suggestion that a colony of rabbits is controlled by a shared mechanism (presumably taking advantage of rabbit telepathy!) that computes Fibonacci numbers and, at the appropriate moment, issues a command to spit out a new rabbit -- that would be a ridiculous confusion between model and mechanism.
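For anyone who wants that toy model spelled out, here it is in a few lines of Python of my own devising (the starting counts are the usual ones from Fibonacci's puzzle, chosen purely for illustration):

    # Descriptive model of rabbit spawning: each month the number of pairs
    # is the sum of the counts from the previous two months. Nothing here
    # claims that the rabbits themselves compute anything.
    def rabbit_pairs(month):
        older, newer = 1, 1            # pairs alive in months 1 and 2
        for _ in range(month - 2):
            older, newer = newer, older + newer
        return newer

The model describes the colony's growth tolerably well, but it is purely descriptive; the mechanism that actually produces new rabbits is another matter entirely.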
But instead of understanding that this model/mechanism mistake is
seriously undermining progress in the field of AGI motivation (and by
extension, the whole question of AGI safety and friendliness), we fall
back on the assertion that we need a wide variety of perspectives, and
argue that it is all very complex and multifaceted.
Pity there aren't more people with a broad enough perspective working on
these issues full time, huh...? :-) :-)
Richard Loosemore