[ExI] Self improvement
Anders Sandberg
anders at aleph.se
Sat Apr 23 13:25:20 UTC 2011
I prefer to approach the whole AI safety thing from a broader
standpoint, checking the different possibilities. AI might emerge
rapidly, or more slowly. In the first case it is more likely to be
unitary, since there will be just one system recursively self-improving,
and there are arguments that for some forms of softer takeoff there
would be a 'merging' of goals if they are compatible (Shulman,
Armstrong). In the case of softer takeoffs, power differentials are
going to be less extreme, but there will also be many more motivations
and architectures around. Systems can be designed for safety in
different ways: by reducing their capabilities (which might or might not
preclude recursive self-improvement), by various motivational measures,
or by constraints from other systems of roughly equal power.
So we need to analyse a number of cases:
Hard takeoff (one agent): capability constraints ("AI in a box", for
example), motivational constraints ("friendly" designs of various kinds,
CEV, values copied from a person, etc.), and the ability to prevent
accidental or deliberate takeoffs.
Soft takeoff (multiple agents): as above (but with the complication that
defection by some agents is likely, due to accident, design, or systemic
properties), plus various mutual balancing schemes, and the issue of
safeguarding against or making use of 'evolutionary attractors' and
coordination problems (e.g. the large-scale economic/evolutionary
pressures discussed in some of Robin's and Nick's papers, and
singletons).
This is not a small field. Even the subfields have pretty deep
ramifications (friendliness being a good case in point). It is stupid to
spend all one's effort on one possible subsection and claim it is the
only one that matters. Some doubtless matter more, but even if you think
there is a 99% chance of a soft takeoff, safeguarding against the 1%
chance of a bad hard takeoff can be very rational. We will need to
specialize in researching them, but we shouldn't forget that we need to
cover the possibility space well enough that we can start formulating
policies (including even the null policy of not doing anything and just
hoping for the best).
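To make that expected-value point concrete, here is a minimal Python
sketch. All the numbers in it (probabilities, losses, safeguard costs
and effectiveness) are hypothetical placeholders I am inventing purely
for illustration, not figures from the argument above; the point is only
that a low-probability, high-damage branch can dominate the calculation.

# Toy expected-loss comparison for investing in hard-takeoff safeguards.
# All numeric values are hypothetical, chosen only for illustration.

P_HARD = 0.01          # assumed probability of a hard takeoff
P_SOFT = 1 - P_HARD    # assumed probability of a soft takeoff

LOSS_BAD_HARD = 1e6    # relative loss if a hard takeoff goes badly
LOSS_SOFT = 1.0        # relative loss in the (manageable) soft case
SAFEGUARD_COST = 100.0 # cost of researching hard-takeoff safeguards
SAFEGUARD_EFFECT = 0.5 # fraction of hard-takeoff loss the safeguards avert

def expected_loss(with_safeguards: bool) -> float:
    """Expected loss over both takeoff scenarios, plus safeguard cost."""
    hard_loss = LOSS_BAD_HARD * ((1 - SAFEGUARD_EFFECT) if with_safeguards else 1.0)
    cost = SAFEGUARD_COST if with_safeguards else 0.0
    return P_HARD * hard_loss + P_SOFT * LOSS_SOFT + cost

print("No safeguards:  ", expected_loss(False))  # 0.01*1e6 + 0.99*1 ≈ 10001
print("With safeguards:", expected_loss(True))   # 0.01*5e5 + 0.99*1 + 100 ≈ 5101

With these made-up numbers, spending a modest amount on the 1% branch
roughly halves the expected loss, even though the 99% branch is almost
unaffected.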
--
Anders Sandberg,
Future of Humanity Institute
James Martin 21st Century School
Philosophy Faculty
Oxford University