[ExI] Isn't Bostrom seriously bordering on the reactionary?

Fri Jun 17 12:27:47 UTC 2011

Richard Loosemore wrote:
> Anders Sandberg wrote:
>>  For example, at present most of our conclusions suggest that a 
>> uploading-driven singularity is more survivable than an AI-driven 
>> singularity.
>
> Most of whose conclusions?  Based on what reasoning?
>
> Personally, I have come to exactly the opposite conclusion, based on 
> the potential controllability of AGI motivation and the complete 
> uncontrollability of unmodified uploaded human mind motivation.

Safe AGI requires a motivation in a particular subset of possible 
motivations. This is unlikely to happen by chance, and if it happens by 
design it requires that designers 1) can predict that a motivation is 
safe, and 2) implement it properly.

The first requirement has turned out to be significantly harder than 
most people thought, and it is not too hard to show various limits to 
our prediction ability (AGIs are after all complex systems, the usual 
theoretical computer science/Gödel theorems apply, plenty of apparently 
reasonable approaches can be shown to be wrong, etc.). In many ways it is the 
"inverse ethics problem": from a given set of acceptable behaviors 
(which is not complete) deduce a motivation that will produce all of 
them *and* not misbehave in domains and situations we might not even 
have a clue about. Maybe this is simpler than solving the traditional 
"forward ethics problem", but I doubt it.

The second requirement is a combination of A) having 
organisations/designers *wanting* to do the right implementation and B) 
actually implementing the answer to the first requirement correctly. 
Principal-agent problems, competitive pressures, typical organisational 
and human biases, and so on make A a fairly tough challenge even if we 
had a nice answer to 1. As for B, consider extant data on human 
reliability as programmers or how often software projects produce bad 
code. Requirement 1 is deep, requirement 2 is messy.

The reason to worry about the above requirements is the possibility of 
hard takeoff (intelligence explosion of self-improving software 
happening at short timescales compared to societal adaptation and human 
social interaction). Hard takeoffs are likely to result in a single 
superintelligence (or a group of closely collaborating copies of it), 
since it is unlikely that several projects would be very close to each 
other in development. There are also the arguments by Carl Shulman about 
how AIs with different goals could merge their utility functions into a 
joint utility function, in essence becoming the same being. We also have 
fairly good reasons to think that superintelligences will be good at 
achieving their goals (more or less the definition of intelligence), so 
if a hard takeoff happens a single goal or motivation system will direct 
most of the future. So far I have not seen any plausible argument for 
why such a goal would be likely to be human compatible (no, I don't buy 
Mark R. Waser's argument), and I have seen some mildly compelling 
arguments for why it is likely to be accidentally inimical to us (the 
mindspace argument, the risk from AGIs with single top goals, Omohundro 
drives - although that paper needs shoring up a lot!). That implies that 
hard takeoffs pose a possible existential risk (xrisk) unless the safe 
AGI requirements have been properly implemented.

Human (and by definition brain emulation) motivation is messy and 
unreliable, but also a fairly well-known factor. Software intelligence 
based on brain emulations also comes with human-like motivations and 
interests as default, which means that human considerations will be 
carried into the future by an emulation-derived civilization. The 
potential for a 
hard takeoff with brain emulations is much more limited than for AGI, 
since the architecture is messy and computationally expensive. In 
particular, amplifying the intelligence of emulations appears to be a 
fairly slow development process - the quick way to get more brainpower 
is to run more copies or to run them faster. In the long run no doubt 
emulations will be 
upgraded far (software is easy to experiment on and update), but this 
process is slow relative to the societal timescales of at least the 
emulation society.

This means that brain emulations force a softer takeoff where single 
agents with single motivations are unlikely to become totally dominant 
over significant resources or problem-solving capacities. (Some caveats 
here about massive amounts of copies of single individuals or the 
emergence of copy-clan superorganisms; the first issue depends on the 
actual technological constraints, the second is anyway a human-derived 
soft takeoff.) Emulation-based singularities might not necessarily be 
nice, but they do involve human-derived values surviving and allow the 
application of the multitude of motivational constraint methods we 
already have to keep unreliable humans working together. Emulations can 
be policed or influenced by the invisible hand; single super-AGIs 
cannot. We have good experience in building communities and complex 
societies out of unfriendly humans.

If hard AGI takeoff is not possible, then things are much safer since 
there will be time to detect and correct bad motivations, and there will 
be space for "social" constraints. Essentially it blends into the above 
emulation scenario. Soft takeoffs have their own xrisks (competition 
between groups can produce conflict, there can be bad evolutionary 
attractors) but there is more room for manoeuvring too.

Personally I think soft takeoffs are much more likely than hard 
takeoffs, but I don't think we have a good theory for actually judging 
the likelihood (or even the likelihood of singularities in the first 
place!). Hence the sensible thing is to take the risks of hard takeoffs 
seriously, investigating how to achieve motivational control of AGI well 
and whether there are leverage points that can shift things towards soft 
takeoffs. Plus of course a further investigation of the risks of soft 
takeoffs, so we can weigh our policy options.

[ A lot of this can be argued much more carefully than I have done 
here, obviously. This is my loose summary of the past 2-3 years of 
discussions and analysis over at FHI and SIAI. Much of it will hopefully 
arrive in formal form in Nick's book on intelligence explosion theory. ]

-- 
Anders Sandberg,
Future of Humanity Institute 
James Martin 21st Century School 
Philosophy Faculty 
Oxford University