[ExI] Isn't Bostrom seriously bordering on the reactionary?
Anders Sandberg
anders at aleph.se
Fri Jun 17 12:27:47 UTC 2011
Richard Loosemore wrote:
> Anders Sandberg wrote:
>> For example, at present most of our conclusions suggest that an
>> uploading-driven singularity is more survivable than an AI-driven
>> singularity.
>
> Most of whose conclusions? Based on what reasoning?
>
> Personally, I have come to exactly the opposite conclusion, based on
> the potential controllability of AGI motivation and the complete
> uncontrollability of unmodified uploaded human mind motivation.
Safe AGI requires a motivation in a particular subset of possible
motivations. This is unlikely to happen by chance, and if it happens by
design it requires that 1) designers can predict that a motivation is
safe, and 2) they implement it properly.
The first requirement has turned out to be significantly harder than
most people thought, and it is not too hard to show various limits to
our prediction ability (AGIs are, after all, complex systems, the usual
theoretical computer science/Gödel theorems apply, plenty of apparently
reasonable approaches can be shown to be wrong, etc.). In many ways it
is the
"inverse ethics problem": from a given set of acceptable behaviors
(which is not complete) deduce a motivation that will produce all of
them *and* not misbehave in domains and situations we might not even
have a clue about. Maybe this is simpler than solving the traditional
"forward ethics problem", but I doubt it.
The second requirement is a combination of A) having
organisations/designers *wanting* to do the right implementation and B)
actually implementing the answer to the first requirement correctly.
Principal-agent problems, competitive pressures, typical organisational
and human biases, and so on make A a fairly tough challenge even if we
had a nice answer to requirement 1. As for B, consider the extant data
on human reliability as programmers, or how often software projects
produce bad
code. Requirement 1 is deep, requirement 2 is messy.
The reason to worry about the above requirements is the possibility of
hard takeoff (intelligence explosion of self-improving software
happening at short timescales compared to societal adaptation and human
social interaction). Hard takeoffs are likely to result in a single
superintelligence (or a group of closely collaborating copies of one),
since it is unlikely that several projects would be very close to each
other in development. There are also Carl Shulman's arguments about how
AIs with different goals could merge their utility functions into a
joint utility function, in essence becoming the same being. We also have
fairly good reasons to think that superintelligences will be good at
achieving their goals (more or less the definition of intelligence), so
if a hard takeoff happens, a single goal or motivation system will
direct most of the future. So far I have not seen any plausible argument
for why such a goal would be likely to be human-compatible (no, I don't
buy Mark R. Waser's argument), and some mildly compelling arguments for
why it is likely to be accidentally inimical to us (the mindspace
argument, the risk from AGIs with single top goals, Omohundro drives -
although that paper needs shoring up a lot!). That implies that hard
takeoffs pose a possible xrisk unless the safe AGI requirements have
been properly implemented.
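[ A toy version of Shulman's merging argument, with numbers, names and
linear utilities invented by me purely for illustration: two agents that
each care about a different resource both expect to do better by
adopting a weighted joint utility function than by fighting, because
conflict destroys part of what they want.

def u_a(paperclips, staples):   # agent A only values paperclips
    return paperclips

def u_b(paperclips, staples):   # agent B only values staples
    return staples

TOTAL = 100.0      # resources at stake
WAR_LOSS = 0.4     # fraction destroyed by open conflict
P_A_WINS = 0.5     # assumed symmetric chance of winning a conflict

# Expected utility of fighting: the winner takes the damaged remainder.
ev_fight_a = P_A_WINS * u_a((1 - WAR_LOSS) * TOTAL, 0)
ev_fight_b = (1 - P_A_WINS) * u_b(0, (1 - WAR_LOSS) * TOTAL)

# A merged agent maximises w*U_A + (1-w)*U_B; with linear utilities it
# simply splits the intact resources according to the bargaining weight.
w = 0.5
merge_a = u_a(w * TOTAL, (1 - w) * TOTAL)
merge_b = u_b(w * TOTAL, (1 - w) * TOTAL)

print(f"fight: A expects {ev_fight_a:.0f}, B expects {ev_fight_b:.0f}")
print(f"merge: A gets {merge_a:.0f}, B gets {merge_b:.0f}")

Here both agents expect 30 from fighting but get 50 from the merge,
which is the sense in which they end up as a single agent pursuing a
single joint goal system. ]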
Human (and, by definition, brain emulation) motivation is messy and
unreliable, but it is also a fairly well-understood factor. Software
intelligences based on brain emulations also come with human-like
motivations and interests by default, which means that human
considerations will be carried into
the future by an emulation-derived civilization. The potential for a
hard takeoff with brain emulations is much more limited than for AGI,
since the architecture is messy and computationally expensive. In
particular, amplifying the intelligence of emulations appears to be a
fairly slow development process - you can get more brainpower by
running more copies or running them faster. In the long run emulations
will no doubt be upgraded substantially (software is easy to experiment
on and update), but this process is slow relative to societal
timescales, at least those of the emulation society itself.
This means that brain emulations force a softer takeoff where single
agents with single motivations are unlikely to become totally dominant
over significant resources or problem-solving capacities. (Some caveats
here about massive numbers of copies of single individuals or the
emergence of copy-clan superorganisms; the first issue depends on the
actual technological constraints, and the second is in any case a
human-derived
soft takeoff.) Emulation-based singularities might not necessarily be
nice, but they do involve human-derived values surviving and allow the
application of the multitude of motivational constraint methods we
already have to keep unreliable humans working together. Emulations can
be policed or influenced by the invisible hand; single super-AGIs
cannot. We have good experience in building communities and complex
societies out of unfriendly humans.
If hard AGI takeoff is not possible, then things are much safer since
there will be time to detect and correct bad motivations, and there will
be space for "social" constraints. Essentially it blends into the above
emulation scenario. Soft takeoffs have their own xrisks (competition
between groups can produce conflict, there can be bad evolutionary
attractors) but there is more room for manouevering too.
Personally I think soft takeoffs are much more likely than hard
takeoffs, but I don't think we have a good theory for actually judging
the likelihood (or even the likelihood of singularities in the first
place!). Hence the sensible thing is to take the risks of hard takeoffs
seriously, investigating how to achieve motivational control of AGI well
and whether there are leverage points that can shift things towards soft
takeoffs. Plus, of course, further investigation of the risks of soft
takeoffs, so that we can weigh our policy options.
[ A lot of this can be argued much more carefully than I have done
here, obviously. This is my loose summary of the past 2-3 years of
discussions and analysis over at FHI and SIAI. Much of it will hopefully
arrive in formal form in Nick's book on intelligence explosion theory. ]
--
Anders Sandberg,
Future of Humanity Institute
James Martin 21st Century School
Philosophy Faculty
Oxford University