[extropy-chat] Two draft papers: AI and existential risk; heuristics and biases

Eliezer S. Yudkowsky sentience at pobox.com
Tue Jun 13 06:34:22 UTC 2006

Robin Hanson wrote:
> At 12:33 PM 6/4/2006, Eliezer S. Yudkowsky wrote:
>>These are drafts of my chapters for Nick Bostrom's forthcoming edited
>>volume _Global Catastrophic Risks_.
>>_Cognitive biases potentially affecting judgment of global risks_
>>   http://singinst.org/Biases.pdf
>>An introduction to the field of heuristics and biases ...
>>_Artificial Intelligence and Global Risk_
>>   http://singinst.org/AIRisk.pdf
>>The new standard introductory material on Friendly AI.


It turns out that I've got more stuff coming up (moving to a new 
apartment within Silicon Valley) so I may not be able to carry on this 
conversation in as much detail as I'd like.  I did want to respond to at 
least what you've said so far.  If you write a long response to this, be 
forewarned - I may not be able to respond back.

> The chapter on cognitive biases was excellent.   Regarding the other 
> chapter, while you seem to have thought lots about many related 
> issues over the years, you don't seem to have worked much on the 
> issue I get stuck on: the idea that a single relatively isolated AI 
> system could suddenly change from negligible to overwhelmingly powerful.

As you may recall, that's where I got started on this "seed AI" business 
in 1998 - talking about recursive self-improvement.

But while writing the chapter, I made a conscious decision to talk more 
about Friendly AI, and less about seed AI.  Because, among other 
reasons, Friendly AI is both harder to explain and more important to 
explain.  People "get" the concept of seed AI relatively easily, though 
they may or may not agree with it.

> You warn repeatedly about how easy it is to fool oneself into 
> thinking one understands AI, and you want readers to apply this to 
> their intuitions about the goals an AI may have.  

The danger is anthropomorphic thinking, in general.  The case of goals 
is an extreme case where we have specific, hardwired, wrong intuitions. 
  But more generally, all your experience is in a human world, and it 
distorts your thinking.  Perception is the perception of differences. 
When something doesn't vary in our experience, we stop even perceiving 
it; it becomes as invisible as the oxygen in the air.  The most 
insidious biases, as we both know, are the ones that people don't see.

You expect surface effects to work like they do in your human 
experience, even when the fundamental causes of those surface effects 
change.  You expect assertions to be justified in terms of their 
perceived departure from what seems normal to you, but your norms are 
human norms.  For example:

> But you seem to be 
> relying almost entirely on unarticulated intuitions when you conclude 
> that very large and rapid improvement of isolated AIs is likely.

Here you measure "rapid" on a human scale.  There is nothing in the laws 
of physics which says that one thought per 10^45 Planck intervals is 
"normal", one thought per 10^55 Planck intervals is "slow", and one 
thought per 10^35 Planck intervals is "fast".

Pretend that a politically correct review committee is going to go over 
all your work looking for signs of humanocentrism.

 > A standard abstraction seems useful to me:  when knowledge
 > accumulates in many small compatible representations, growth is in
 > the largest system that can share such representations.

Presuming that information can be shared more cheaply than it can be 
initially produced; i.e. that the cost of bandwidth is less than the 
cost of local production.

 > Since DNA
 > is sharable mainly within a species, the improvements that any one
 > small family of members can produce are usually small compared to the
 > improvements transferred by sex within the species.

Here you analogize to evolution.  This is something to be wary of 
because evolution is an extremely unusual special case of an 
optimization process.  I use all sorts of evolutionary arguments, but 
only to illustrate *how different* an optimization process can be from 
human intelligence - never to say that something *must* be like evolution.

When knowledge accumulates in small modular representations, growth is 
in the largest system that *does* share such representations - not the 
largest system that *can*.  In principle, species could develop means of 
swapping adaptations among themselves.  Wouldn't you like gills?  But 
that's not how it works with multicellular organisms.  There's a very 
clear evolutionary logic for this - it's not a mystery.  But if a human 
were in charge of the system, if we were running the show, we'd 
plagiarize the heck out of everything and export adaptations wholesale 
between species.

So in fact, ecology contradicts the generalization you brought it to 
support - that growth is within the largest pool where knowledge *can* 
be shared, as a human onlooker thinks of opportunity.  Growth is within 
the pool where knowledge *is* shared.

The ecological world is like one in which every two human cultures that 
became sufficiently different, *completely stopped* communicating with 
each other.

We'd never do that.  Even if we hated their guts, we'd steal their guns.

In spirit, if not in letter, this may seem like an argument in your 
direction.  Evolution is dumber than a brain, and as we moved in the 
direction of increasing intelligence, we seemed to move toward 
perceiving more opportunities for communication.  Or at least more 
opportunities for theft.  Humans plagiarized flight from birds, but I 
haven't seen much capability-transfer going the other way.


There's a wider universe out there;
It doesn't work like you do;
You can't trust your intuitions;
Evolutionary analogies have dangers both subtle and gross;
Just because something *could* happen doesn't mean that it will.

This also struck me about your "Dreams of Autarky"; you said:

> The cells in our bodies are largely-autonomous devices and manufacturing plants, producing most of what they need internally. Our biological bodies are as wholes even more autonomous, requiring only water, air, food, and minimal heat to maintain and reproduce themselves under a wide variety of circumstances. Furthermore, our distant human ancestors acquired tools that made them even more general, i.e., able to survive and thrive in an unusually diverse range of environments. And the minds our ancestors acquired were built to function largely autonomously, with only minor inputs from other minds.

And from this you read:  There is a trend toward greater interdependency 
over recent time (~10 Ky), and you expect this trend to continue.

An alternate reading would be:  Modern human culture is a bizarre 
special case in a universe that doesn't usually work that way.  I 
discuss this in more detail below.

 > Since humans
 > share their knowledge via language and copying practices, the
 > improvements that a small group of people can make are small compared
 > to the improvements transferred from others, and made available by
 > trading with those others.

And this is an example of what I mean by anchoring on human norms.  In 
your everyday experience, an economy is made up of humans trading 
*artifacts* and *knowledge*.  You don't even think to question this, 
because it's so universal.

Humans don't trade brains.  They don't open up their skulls and trade 
visual cortex.  They don't trade adaptations.  They don't even trade 
procedural knowledge.  No matter how much someone offers to pay me, I 
cannot sell them my command of English or my ability to write 
entertaining nonfiction - not that I would ever sell the original.  I'm 
not sure I would sell a copy.  But the point is that I have no choice. 
I *can't* sell, whether I want to or not.

We can trade the products of our minds, but not the means of production. 
  This is an IMPORTANT ASSUMPTION in human affairs.

John K Clark once said:  "It mystifies me why anyone would even try to 
move large quantities of matter around the universe at close to the 
speed of light.  It's as silly as sending ice cubes to the south pole by 
Federal Express.  There's already plenty of matter in the Virgo Galactic 
Cluster 2 billion light years away and it's every bit as good as the 
matter we have here."

As it becomes more economical to ship the factory, it becomes less 
economical to ship the products of the factory.  This is 
double-bonus-true of cognition.  A compact description of the underlying 
rules of arithmetic (e.g. the axioms of addition) can give rise to a 
vast variety of surface facts (e.g. that 953,188 + 12,152 = 965,340). 
Trying to capture the surface behaviors, rather than the underlying 
generator, rapidly runs into the problem of needing to capture an 
infinite number of facts. AI people who run into this problem and don't 
understand where it comes from refer to it as the "common-sense problem" 
or "frame problem", and think that the solution is to build an AI that 
can understand English so it can download all the arithmetical facts it 
needs from the Internet.
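
A rough sketch of the generator-versus-surface-facts distinction, in Python.  The names add, memorized_sums and lookup_sum are hypothetical and chosen only for illustration; the single stored fact is the sum quoted above.

```python
# Illustrative sketch only; every name here is hypothetical.

def add(a: int, b: int) -> int:
    """The 'generator': one compact rule that covers every pair of integers."""
    return a + b

# The 'surface facts' approach: a finite table of memorized sums.
memorized_sums = {
    (2, 2): 4,
    (953_188, 12_152): 965_340,
}

def lookup_sum(a: int, b: int):
    """Return a memorized sum, or None if this particular fact was never stored."""
    return memorized_sums.get((a, b))

if __name__ == "__main__":
    print(add(953_188, 12_152))          # computed from the rule
    print(lookup_sum(953_188, 12_152))   # found only because it was stored
    print(lookup_sum(953_188, 12_153))   # one step outside the table
```

The table answers only the questions someone already thought to store; the rule answers all of them, which is the sense in which shipping the generator beats shipping its products.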

In our modern world, everything focuses around shipping around 
declarative verbal sentences, because this is what human beings evolved 
to trade.  We can't trade procedural knowledge, except by extremely 
laborious, expensive, failure-prone processes - such as multi-year 
apprenticeships in school.  And neural circuitry we cannot trade at all.

When you reach down into the generators, you find more power than when 
you only play with surface phenomena.  You amplify leverage by moving 
closer to the start of the causal chain.  Like moving the pebbles at the 
top of the mountain where they start avalanches.  You cannot build Deep 
Blue (the famous program that beat world chess champion Garry Kasparov) by 
programming in a good chess move for every possible 
chess position.  First of all, it is impossible to build a chess player 
this way, because you don't know exactly which positions it will 
encounter.  And second, even if you did this, the resulting program 
would not play chess any better than you do.  Deep Blue's programmers 
didn't just capture their own chess-move generator.  If they'd captured 
their own chess-move generator, they could have avoided the problem of 
programming an infinite number of chess positions - but they couldn't 
have beat Garry Kasparov; they couldn't have built a program that played 
better chess than any human in the world.  The programmers built a 
*better* move generator.  This is something they couldn't even do on the 
level of organization of trading surface moves.

At Goertzel's recent AGI conference, I said:  "The only thing I know of 
more difficult than building a Friendly AI is creating a child."  And 
someone inevitably said:  "Creating a child is easy, anyone can do it." 
  And I said:  "That is like putting quarters into a Coke machine, and 
saying, 'Look, I made a Coke!'"

Humans who spark the process of embryogenesis possess none of the 
knowledge they would need to design children in their own right; they 
are just pulling the lever that starts an incredibly complex machine 
that they don't understand and couldn't build themselves.

People sometimes try to build AIs from "semantic networks", with data 
like is(cat, animal) or cuts(lawnmower, grass), and then they're 
surprised when the AI doesn't do anything.  This is because verbal 
sentences - the units of knowledge most commonly traded among humans - 
are like levers for starting a machine.  That's all we need to trade 
among ourselves, because we all have the machine.  But people don't 
realize this - the machine is universal, and therefore it's invisible; 
perception is the perception of differences.  So someone who programs 
these tiny, lifeless LISP tokens into an AI is surprised when the AI 
does absolutely nothing interesting, because as far as they can see, the 
AI has everything it needs.  But the levers have no mechanisms to 
trigger, the instruction set has no CPU.  When you see the word "cat" it 
paints a complex picture in your visual cortex - the mere ASCII string 
carries none of that information, it is just a lever that triggers a 
machine you already have.
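
A toy sketch of the same point, in Python rather than LISP.  The triples echo the ones quoted above; the extra fact ("is", "animal", "organism") and the function names stored and is_a are assumptions added for illustration.  Bare storage can only repeat what was typed in, and even the small "machine" added here is nowhere near what the word "cat" triggers in a human.

```python
# Hypothetical toy example; not a description of any real AI system.
facts = {
    ("is", "cat", "animal"),
    ("is", "animal", "organism"),   # assumed extra fact, for illustration
    ("cuts", "lawnmower", "grass"),
}

def stored(relation, subject, obj):
    """Bare lookup: true only for triples that were typed in verbatim."""
    return (relation, subject, obj) in facts

def is_a(subject, obj, seen=None):
    """A tiny 'machine': follows chains of 'is' links between tokens."""
    seen = seen or set()
    if stored("is", subject, obj):
        return True
    seen.add(subject)
    for rel, s, o in facts:
        # Follow each outgoing 'is' link once, skipping nodes already visited.
        if rel == "is" and s == subject and o not in seen:
            if is_a(o, obj, seen):
                return True
    return False

if __name__ == "__main__":
    print(stored("is", "cat", "organism"))  # the tokens alone say nothing new
    print(is_a("cat", "organism"))          # the mechanism derives it
```

The facts are identical in both calls; only the second has something for the levers to pull.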

We are like people who refine gasoline, and trade gasoline, and 
understand the concept of "running out of gas", but who never think 
about cars.  So you don't focus on the question of whether there might 
be more efficient cars.

And yet there are these things called "chimps" that can't use any of the 
knowledge you're so playfully batting about.  You don't even think to 
ask why chimps are excluded from the knowledge economy - though they're 
incredibly close to us evolutionarily.  You don't encounter chimps in 
your everyday life; they don't participate in your economy... and yet 
what separates humans from chimps is the very last layer of icing on a 
brain-cake that's almost entirely shared between us.

A comparative handful of improvements to underlying *generators*, 
underlying *brain circuitry*, are enough to entirely exclude chimps from 
our knowledge economy; they cannot absorb the knowledge we are trading 
around, and can do nothing with it.  Ricardo's Law of Comparative 
Advantage does not extend to chimps.  And chimps are our closest 
cousins!  What about mice?  What about lizards?  *That* is the power of 
between-species intelligence differences - underlying generators that 
differ by the presence of entire complex adaptations.

Humans don't ship around brain circuitry and complex adaptations because 
we can't.  We don't even realize how powerful they are, because 
differences of brain circuitry are so hugely powerful as to drop our 
closest competitors out of the economy and out of sight.  Anything that 
doesn't have *all* your brain circuitry and all your complex adaptations 
is so powerless, compared to you, that it doesn't occur to you to look 
in that direction - even though a chimp has 95% of your genomic complexity.

This is what I mean by saying that humans are an unusual special case of 
non-autarky.  Ordinarily, when an optimization process builds something, 
it builds things that, by comparison to an interdependent human economy, 
look like autarkic monoliths.  Humans are extremely unusual because we 
gained the ability to transfer units of knowledge (lever-pulling 
instructions) between ourselves, but we could not reach down to the 
level on which evolution built us to begin with.  Thus we could *not* 
encapsulate the accumulating complexity into our own system designs.  We 
could *not* give our children the accumulated knowledge of our science, 
we could *not* build into their bodies the accumulated power of our 
technology.  Evolution, in contrast, usually builds into each member of 
a species all the adaptive complexity it manages to accumulate.  Why 
shouldn't it, since it can?

 > The obvious question about a single AI is why its improvements could
 > not with the usual ease be transferred to other AIs or humans, or
 > made available via trades with those others.

Transferring to other AIs is one issue, but that you ask about 
transferring to humans indicates pretty clearly that you're thinking 
about declarative knowledge rather than brain circuitry.

Insert here the usual lecture about the brain being a mess of spaghetti 
code that is not modular, cannot easily be read out or written to, runs 
at slow serial speeds, was never designed to be improved, and is not 
end-user-modifiable.  (It's easier to build a 747 from scratch than to 
inflate an existing bird to the size of a 747, that actually flies, as 
fast as a 747, without killing the bird or making it very uncomfortable. 
  I'm not saying it could never, ever be done; but if it happens at all, 
it will be because the bird built a seed that grew into a 747 that 
upgraded the bird.  (And at this point the metaphor bursts into flames 
and dies.))

You could imagine drawing a circle around all the AIs in the world, and 
suppose that growth is on the level of their knowledge economy - the 
kind of economy that's so powerful that chimps who merely have 95% of 
what you have might as well not exist from your economic viewpoint.

What goes on inside that circle is just as much a hard takeoff from the 
perspective of an outside human.

Not that I think we'll see a knowledge economy among different AIs 
undergo hard takeoff, because...

 > Today a single human can share the ideas within his own
 > head far easier than he can share those ideas with others -
 > communication with other people is far more expensive and
 > error-prone.   Yet the rate at which a single human can innovate is
 > so small relative to the larger economy that most innovation comes
 > from ideas shared across people.

Again, anchoring on the human way of doing things.  You do not have the 
capability to solve a problem by throwing ONE BIG human at it, so you 
think in terms of throwing lots of individual minds.

But which is more effective - one human, six chimps, or a hundred 
squirrels?  All else being equal, it will generally be far more 
efficient to build a coherent individual out of the same amount of 
computing power, rather than divide that individual into pieces. 
Otherwise the human brain would have naturally evolved to consist of a 
hundred compartmentalized communicating squirrels.  (If this reminds you 
of anyone you know, it is pure coincidence.)

Having individual minds is like having economies with separate 
currencies, fortified borders, heavily protectionist trade barriers, and 
wide seas separating their wooden ships.  It's more efficient to take 
down the trade barriers and adopt the same currency, in which case you 
soon end up with a single economy.

Now, maybe France *wants* to preserve its French identity within the 
European Union, as a matter of intrinsic utilities; but that is a 
separate matter from maximizing efficiency.

And even more importantly...

 > If so, this single AI
 > would just be part of our larger system of self-improvement.   The
 > scenario of rapid isolated self-improvement would seem to be where
 > the AI found a new system of self-improvement, where knowledge
 > production was far more effective, *and* where internal sharing of
 > knowledge was vastly easier than external sharing.

You seem to be visualizing a world in which, at the time the *first* AI 
approaches the threshold of recursive self-improvement,

(1) There are already lots of AIs around that fall short of strong 
recursivity.

And these AIs:

(2) Have ability to trade meaningful, important units between themselves.

You think of knowledge of the kind humans evolved to share with each 
other.  I think of underlying brain circuitry of the kind that differs 
between species and is the ultimate generator of all human culture.  The 
latter is harder to trade - though, obviously, far more valuable.  How 
much would you pay for another 20 IQ points?  (And that's not even a 
difference of the interspecies kind, just the froth of individual 
variation.)

Furthermore, the AIs can:

(3) Gain significant economic benefits by reciprocally trading their 
software to each other.

And they must also have:

(4) Compatible motives in the long run.

When I look over the present AGI landscape, and imagine what would 
happen if an AGI reached the threshold of strong recursivity in the next 
  decade, I find myself thinking that:

(1) There are so few AGI projects around at all, let alone projects with 
a clue, that at the time the first AGI reaches the critical threshold, 
there will be no other AGIs in the near vicinity of power.

(2) Current AGI projects use such wildly differing theories that it 
would be a matter of serious difficulty for AGIs of less than superhuman 
ability to trade modules with each other.  (Albeit far less difficult 
than trading with humans.)  Or look at it this way - it takes a lot more 
programming ability to rewrite *another* AI's code than to rewrite your 
*own* code.  Brains predate language; internal bandwidth predates 
external bandwidth.  So the hard takeoff, when it starts, starts inside 
one AI.

(3) Different AGIs, having been produced by different designers on 
different AGI projects, will not be like humans who are all the same 
make and model of car and interact economically as equals.  More like 
different species.  The top AGI will have as little to gain from trading 
with the next runner-up as we have to gain from trading with 
chimpanzees.  Or less; chimpanzees are 95% similar to us.  Even 
Ricardo's Law falls off the edge of the interspecies abyss.  If the AI 
wants twice as much brainpower on the problem, it'll absorb twice as 
much processing power into itself.

(4) I'm not sure whether AIs of different motives would be willing to 
cooperate, even among the very rare Friendly AIs.  If it is *possible* 
to proceed strictly by internal self-improvement, there is a 
*tremendous* expected utility bonus to doing so, if it avoids having to 
share power later.

With respect to (4), I am admittedly not visualizing a large group of 
individuals interacting as rough equals.  *Those* would have a motive to 
form coalitions for fear of being beaten by other coalitions.  (Whether 
humans would be worth including into any coalition, on grounds of pure 
efficiency, is a separate issue.)  But if you *automatically* visualize 
a large group of individuals interacting as rough equals, you need to 
put more effort into questioning your anchoring on human norms.  The 
psychic unity of humankind *mandates* that healthy humans do not differ 
by the presence of entire complex adaptations.

*Of course* the economies you know run on entities who are all 
approximate equals - anyone who's not an approximate equal, like your 
chimp cousins, falls off the edge of vision.  Of course there are lots 
of similar individuals in your economy - evolution doesn't produce 
unique prototypes, and human brains don't agglomerate into unitary 
minds.

 > You say that humans today and natural selection do not self-improve
 > in the "strong sense" because humans "haven't rewritten the human
 > brain," "its limbic core, its cerebral cortex, its prefrontal
 > self-models" and natural selection has not "rearchitected" "the
 > process of mutation and recombination and selection," with "its focus
 > on allele frequencies" while an AI "could rewrite its code from
 > scratch."
 > The code of an AI is
 > just one part of a larger system that would allow an AI to
 > self-improve, just as the genetic code is a self-modifiable part of
 > the larger system of natural selection, and human culture and beliefs
 > are a self-modifiable part of human improvement today.

Not "self-modifiable".  The genome (as Hofstadter emphasized at the 
Singularity Summit, the genetic code means the ATCG coding system) is 
modified by the logic of natural selection.  To discover a case in which 
gene-optimizing logic was embedded in the genome itself would be a 
stunning, Lamarckian revolution in biology.

The genome carries out processes, such as randomized sexual 
recombination, which are not of themselves optimizing, but which 
contribute to the logic of natural selection.  The logic of evolution is 
quite simple.  Sexual recombination is the only major example I can 
think of where the logic of evolution was significantly modified by 
genomic content.  Perhaps the original invention of DNA would count as 
replicators modifying the logic of evolution - though I'm not even sure 
I'd count that.

Neither random mutation nor random recombination actually implements 
the optimizing part of the process - the part that produces information 
in the genome.  That part comes from nonrandom environmental selection. 
  As far as I can think, the only genes which implement organismal-level 
optimization logics are those responsible for sexual selection within a 
species - and even they don't write directly to DNA.

It is a lot easier to understand how evolution works than to understand 
how the brain works.  Evolution is a small handful of tricks - point 
mutation, random recombination, natural selection, sexual selection. 
They play out in very complex ways, but the optimization logic is 
simple.  The human brain is a *much bigger* set of tricks and is 
correspondingly more efficient.  And yet the brain does not write to DNA.

Human culture and human beliefs are not a "self-modifiable" part of 
human improvement.  They are modified by human brains, but cannot freely 
rewrite the optimization logic of human brains.  One might argue that 
writing and science are analogous to the invention of DNA and sex 
respectively, significantly changing the rules of the game.  Even so 
there's an underlayer we can't reach.  If you think that the human brain 
isn't doing the important work of intelligence, only rules handed down 
culturally, then just try and program those cultural rules into a 
computer - if you can share them between humans, surely they're explicit 
enough to program...  What you'll find, after your AI project fails, is 
that your database of cultural knowledge consists of rules for how to 
pull levers on a complex machine you don't understand.  If you don't 
have the complex machine, the lever-pulling rules are useless.  If you 
don't believe me, just try to build a scientist using your declarative 
knowledge of how to be a good scientist.  It's harder than it looks.

We ain't got strong recursivity.

 > This argument seems to me to need a whole lot of elaboration and
 > clarification to be persuasive, if it is to go beyond the mere
 > logical possibility of rapid self-improvement.

The game here is follow-the-work-of-optimization, which is similar to 
follow-the-entropy in thermodynamics or follow-the-evidence in 
probability theory.

I can't do an analytic calculation of the RSI curve.  So why do I expect 
it to be "fast" as humans measure quickness?  Largely, it is an 
(admittedly imprecise) perception of lots of low-hanging fruit (with 
clear, obvious reasons why evolution or human engineering has not 
already plucked those fruit).  The most blatant case is the opportunity 
for fast serial speeds, and the second most blatant is the ability to 
absorb vast amounts of new hardware, but there are software issues too. 
Our "fast and frugal" heuristics are impressive for doing so much with 
so little, but humans not noticing the direction of correlations smaller 
than .6 probably throws away a *lot* of information.

The intuition of fast takeoff comes from realizing just *how much* room 
there is for improvement.  As in, orders and orders of magnitude.  In 
the case of hardware, this is readily visible; software is harder to 
understand and therefore there is a publication bias against it, but 
there is no reason in principle to expect evolved software to be closer 
to optimality than evolved hardware.  What I'm seeing (albeit 
imprecisely) is that human software is, like the hardware, orders and 
orders of magnitude short of optimality.  Think of it as an anthropic 
argument:  The software we use for general intelligence is the smallest 
possible incremental modification of a chimpanzee that lets the chimp 
build a computer, because if there were any way to do it with less, we'd 
be having this conversation about RSI at that level of intelligence instead.

Admittedly, this intuition is hard to convey.  If only there were some 
way of transferring procedural skills and intuitions!  Alas, there isn't. 
Looks like we humans have a lot of room for improvement!

 > So a modest advantage for the AI's
 > internal sharing would not be enough - the advantage would have to be
 > enormous.

I think it will be.

Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
