[ExI] Self improvement
Anders Sandberg
anders at aleph.se
Sat Apr 23 13:25:20 UTC 2011
I prefer to approach the whole AI safety thing from a broader
standpoint, checking the different possibilities. AI might emerge
rapidly, or more slowly. In the first case it is more likely to be
unitary, since there will be just one system recursively self-improving,
and there are arguments that for some forms of softer takeoff there
would be a 'merging' of goals if they are compatible (Shulman,
Armstrong). In the case of softer takeoffs, power differentials are
going to be less extreme, but there will also be many more motivations
and architectures around. Systems can be designed for safety in
different ways: by reducing their capabilities (which might or might not
preclude recursive self-improvement), by various motivational measures,
or by constraints from other systems of roughly equal power.
So we need to analyse a number of cases:
Hard takeoff (one agent): capability constraints ("AI in a box", for
example), motivational constraints ("friendly" designs of various kinds,
CEV, values copied from a person, etc.), and the ability to prevent
accidental or deliberate takeoffs.
Soft takeoff (multiple agents): as above (but with the complication that
defection by some agents is likely, due to accident, design, or systemic
properties), plus various mutual balancing schemes, and the issue of
safeguarding against or making use of 'evolutionary attractors' and
coordination problems (e.g. the large-scale economic/evolutionary
pressures discussed in some of Robin's and Nick's papers, and
singletons).
This is not a small field. Even the subfields have pretty deep
ramifications (friendliness being a good case in point). It is stupid to
spend all one's effort on one possible subsection and claim it is the
only one that matters. Some doubtless matter more, but even if you think
there is a 99% chance of a soft takeoff, safeguarding against the 1%
chance of a bad hard takeoff can be very rational. We will need to
specialize in researching them, but we shouldn't forget that we need to
cover the possibility space well enough that we can start formulating
policies (including even the null policy of not doing anything and just
hoping for the best).
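To make that expected-value point concrete, here is a minimal Python
sketch. All the numbers in it (probabilities, losses, safeguard costs
and effectiveness) are hypothetical placeholders I am inventing purely
for illustration, not figures from the argument above; the point is only
that a low-probability, high-damage branch can dominate the calculation.

# Toy expected-loss comparison for investing in hard-takeoff safeguards.
# All numeric values are hypothetical, chosen only for illustration.

P_HARD = 0.01          # assumed probability of a hard takeoff
P_SOFT = 1 - P_HARD    # assumed probability of a soft takeoff

LOSS_BAD_HARD = 1e6    # relative loss if a hard takeoff goes badly
LOSS_SOFT = 1.0        # relative loss in the (manageable) soft case
SAFEGUARD_COST = 100.0 # cost of researching hard-takeoff safeguards
SAFEGUARD_EFFECT = 0.5 # fraction of hard-takeoff loss the safeguards avert

def expected_loss(with_safeguards: bool) -> float:
    """Expected loss over both takeoff scenarios, plus safeguard cost."""
    hard_loss = LOSS_BAD_HARD * ((1 - SAFEGUARD_EFFECT) if with_safeguards else 1.0)
    cost = SAFEGUARD_COST if with_safeguards else 0.0
    return P_HARD * hard_loss + P_SOFT * LOSS_SOFT + cost

print("No safeguards:  ", expected_loss(False))  # 0.01*1e6 + 0.99*1 ≈ 10001
print("With safeguards:", expected_loss(True))   # 0.01*5e5 + 0.99*1 + 100 ≈ 5101

With these made-up numbers, spending a modest amount on the 1% branch
roughly halves the expected loss, even though the 99% branch is almost
unaffected.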
--
Anders Sandberg,
Future of Humanity Institute
James Martin 21st Century School
Philosophy Faculty
Oxford University