[extropy-chat] Two draft papers: AI and existential risk; heuristics and biases
Eliezer S. Yudkowsky
sentience at pobox.com
Tue Jun 13 06:34:22 UTC 2006
Robin Hanson wrote:
> At 12:33 PM 6/4/2006, Eliezer S. Yudkowsky wrote:
>>These are drafts of my chapters for Nick Bostrom's forthcoming edited
>>volume _Global Catastrophic Risks_.
>>_Cognitive biases potentially affecting judgment of global risks_
>>An introduction to the field of heuristics and biases ...
>>_Artificial Intelligence and Global Risk_
>>The new standard introductory material on Friendly AI.
It turns out that I've got more stuff coming up (moving to a new
apartment within Silicon Valley) so I may not be able to carry on this
conversation in as much detail as I'd like. I did want to respond to at
least what you've said so far. If you write a long response to this, be
forewarned - I may not be able to respond back.
> The chapter on cognitive biases was excellent. Regarding the other
> chapter, while you seem to have thought lots about many related
> issues over the years, you don't seem to have worked much on the
> issue I get stuck on: the idea that a single relatively isolated AI
> system could suddenly change from negligible to overwhelmingly powerful.
As you may recall, that's where I got started on this "seed AI" business
in 1998 - talking about recursive self-improvement.
But while writing the chapter, I made a conscious decision to talk more
about Friendly AI, and less about seed AI. Because, among other
reasons, Friendly AI is both harder to explain and more important to
explain. People "get" the concept of seed AI relatively easily, though
they may or may not agree with it.
> You warn repeatedly about how easy it is to fool oneself into
> thinking one understands AI, and you want readers to apply this to
> their intuitions about the goals an AI may have.
The danger is anthropomorphic thinking, in general. The case of goals
is an extreme case where we have specific, hardwired, wrong intuitions.
But more generally, all your experience is in a human world, and it
distorts your thinking. Perception is the perception of differences.
When something doesn't vary in our experience, we stop even perceiving
it; it becomes as invisible as the oxygen in the air. The most
insidious biases, as we both know, are the ones that people don't see.
You expect surface effects to work like they do in your human
experience, even when the fundamental causes of those surface effects
change. You expect assertions to be justified in terms of their
perceived departure from what seems normal to you, but your norms are
human norms. For example:
> But you seem to be
> relying almost entirely on unarticulated intuitions when you conclude
> that very large and rapid improvement of isolated AIs is likely.
Here you measure "rapid" on a human scale. There is nothing in the laws
of physics which says that one thought per 10^45 Planck intervals is
"normal", one thought per 10^55 Planck intervals is "slow", and one
thought per 10^35 Planck intervals is "fast".
Pretend that a politically correct review committee is going to go over
all your work looking for signs of humanocentrism.
> A standard abstraction seems useful to me: when knowledge
> accumulates in many small compatible representations, growth is in
> the largest system that can share such representations.
Presuming that information can be shared more cheaply than it can be
initially produced; i.e. that the cost of bandwidth is less than the
cost of local production.
> Since DNA
> is sharable mainly within a species, the improvements that any one
> small family of members can produce are usually small compared to the
> improvements transferred by sex within the species.
Here you analogize to evolution. This is something to be wary of
because evolution is an extremely unusual special case of an
optimization process. I use all sorts of evolutionary arguments, but
only to illustrate *how different* an optimization process can be from
human intelligence - never to say that something *must* be like evolution.
When knowledge accumulates in small modular representations, growth is
in the largest system that *does* share such representations - not the
largest system that *can*. In principle, species could develop means of
swapping adaptations among themselves. Wouldn't you like gills? But
that's not how it works with multicellular organisms. There's a very
clear evolutionary logic for this - it's not a mystery. But if a human
were in charge of the system, if we were running the show, we'd
plagiarize the heck out of everything and export adaptations wholesale.
So in fact, ecology contradicts the generalization you brought it to
support - that growth is within the largest pool where knowledge *can*
be shared, as a human onlooker thinks of opportunity. Growth is within
the pool where knowledge *is* shared.
The ecological world is like one in which every two human cultures that
became sufficiently different, *completely stopped* communicating with
each other.
We'd never do that. Even if we hated their guts, we'd steal their guns.
In spirit, if not in letter, this may seem like an argument in your
direction. Evolution is dumber than a brain, and as we moved in the
direction of increasing intelligence, we seemed to move toward
perceiving more opportunities for communication. Or at least more
opportunities for theft. Humans plagiarized flight from birds, but I
haven't seen much capability-transfer going the other way.
There's a wider universe out there;
It doesn't work like you do;
You can't trust your intuitions;
Evolutionary analogies have dangers both subtle and gross;
Just because something *could* happen doesn't mean that it will.
This also struck me about your "Dreams of Autarky"; you said:
> The cells in our bodies are largely-autonomous devices and manufacturing plants, producing most of what they need internally. Our biological bodies are as wholes even more autonomous, requiring only water, air, food, and minimal heat to maintain and reproduce themselves under a wide variety of circumstances. Furthermore, our distant human ancestors acquired tools that made them even more general, i.e., able to survive and thrive in an unusually diverse range of environments. And the minds our ancestors acquired were built to function largely autonomously, with only minor inputs from other minds.
And from this you read: There is a trend toward greater interdependency
over recent time (~10 Ky), and you expect this trend to continue.
An alternate reading would be: Modern human culture is a bizarre
special case in a universe that doesn't usually work that way. I
discuss this in more detail below.
> Since humans
> share their knowledge via language and copying practices, the
> improvements that a small group of people can make are small compared
> to the improvements transferred from others, and made available by
> trading with those others.
And this is an example of what I mean by anchoring on human norms. In
your everyday experience, an economy is made up of humans trading
*artifacts* and *knowledge*. You don't even think to question this,
because it's so universal.
Humans don't trade brains. They don't open up their skulls and trade
visual cortex. They don't trade adaptations. They don't even trade
procedural knowledge. No matter how much someone offers to pay me, I
cannot sell them my command of English or my ability to write
entertaining nonfiction - not that I would ever sell the original. I'm
not sure I would sell a copy. But the point is that I have no choice.
I *can't* sell, whether I want to or not.
We can trade the products of our minds, but not the means of production.
This is an IMPORTANT ASSUMPTION in human affairs.
John K Clark once said: "It mystifies me why anyone would even try to
move large quantities of matter around the universe at close to the
speed of light. It's as silly as sending ice cubes to the south pole by
Federal Express. There's already plenty of matter in the Virgo Galactic
Cluster 2 billion light years away and it's every bit as good as the
matter we have here."
As it becomes more economical to ship the factory, it becomes less
economical to ship the products of the factory. This is
double-bonus-true of cognition. A compact description of the underlying
rules of arithmetic (e.g. the axioms of addition) can give rise to a
vast variety of surface facts (e.g. that 953,188 + 12,152 = 965,340).
Trying to capture the surface behaviors, rather than the underlying
generator, rapidly runs into the problem of needing to capture an
infinite number of facts. AI people who run into this problem and don't
understand where it comes from refer to it as the "common-sense problem"
or "frame problem", and think that the solution is to build an AI that
can understand English so it can download all the arithmetical facts it
needs from the Internet.
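As a rough illustration of the table-versus-generator point (a minimal
sketch, not anything from the chapter; the names SURFACE_FACTS,
lookup_add and generated_add are hypothetical), compare a finite store
of addition facts with a two-rule generator:

```python
# Illustrative sketch only: a finite table of "surface facts" versus a
# compact generator for addition. All names here are hypothetical.

# Surface facts: every answer has to be stored individually.
SURFACE_FACTS = {
    (2, 2): 4,
    (953_188, 12_152): 965_340,
}


def lookup_add(a, b):
    """Return a stored fact, or None if this pair was never recorded."""
    return SURFACE_FACTS.get((a, b))


def successor(n):
    """The single primitive the generator relies on."""
    return n + 1


def generated_add(a, b):
    """Addition from two Peano-style rules: a + 0 = a, a + S(b) = S(a + b).

    Written as a loop rather than recursion so that a large b does not
    hit Python's recursion limit. Assumes b is a non-negative integer.
    """
    result = a
    for _ in range(b):
        result = successor(result)
    return result


if __name__ == "__main__":
    print(lookup_add(3, 5))       # the table has no entry for this pair
    print(generated_add(3, 5))    # the generator handles it anyway
```

The table answers only the questions someone already thought to write
down; the generator covers every pair of non-negative integers with a
few lines of rules.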
In our modern world, everything centers on shipping around
declarative verbal sentences, because this is what human beings evolved
to trade. We can't trade procedural knowledge, except by extremely
laborious, expensive, failure-prone processes - such as multi-year
apprenticeships in school. And neural circuitry we cannot trade at all.
When you reach down into the generators, you find more power than when
you only play with surface phenomena. You amplify leverage by moving
closer to the start of the causal chain. Like moving the pebbles at the
top of the mountain where they start avalanches. You cannot build Deep
Blue (the famous program that beat world chess champion Garry Kasparov
in 1997) by programming in a good chess move for every possible
chess position. First of all, it is impossible to build a chess player
this way, because you don't know exactly which positions it will
encounter. And second, even if you did this, the resulting program
would not play chess any better than you do. Deep Blue's programmers
didn't just capture their own chess-move generator. If they'd captured
their own chess-move generator, they could have avoided the problem of
programming an infinite number of chess positions - but they couldn't
have beaten Garry Kasparov; they couldn't have built a program that played
better chess than any human in the world. The programmers built a
*better* move generator. This is something they couldn't even do on the
level of organization of trading surface moves.
At Goertzel's recent AGI conference, I said: "The only thing I know of
more difficult than building a Friendly AI is creating a child." And
someone inevitably said: "Creating a child is easy, anyone can do it."
And I said: "That is like putting quarters into a Coke machine, and
saying, 'Look, I made a Coke!'"
Humans who spark the process of embryogenesis possess none of the
knowledge they would need to design children in their own right; they
are just pulling the lever that starts an incredibly complex machine
that they don't understand and couldn't build themselves.
People sometimes try to build AIs from "semantic networks", with data
like is(cat, animal) or cuts(lawnmower, grass), and then they're
surprised when the AI doesn't do anything. This is because verbal
sentences - the units of knowledge most commonly traded among humans -
are like levers for starting a machine. That's all we need to trade
among ourselves, because we all have the machine. But people don't
realize this - the machine is universal, and therefore it's invisible;
perception is the perception of differences. So someone who programs
these tiny, lifeless LISP tokens into an AI is surprised when the AI
does absolutely nothing interesting, because as far as they can see, the
AI has everything it needs. But the levers have no mechanisms to
trigger, the instruction set has no CPU. When you see the word "cat" it
paints a complex picture in your visual cortex - the mere ASCII string
carries none of that information, it is just a lever that triggers a
machine you already have.
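A toy sketch of the levers-without-a-machine point (illustrative only;
the facts, the intermediate "mammal" link, and the function names are
assumptions made up for this example): the same tokens are inert under
plain lookup and only yield something new once a procedure operates on
them.

```python
# Illustrative sketch only: the same stored tokens with and without a
# mechanism that acts on them. Facts and names are hypothetical.

FACTS = {
    ("is", "cat", "mammal"),
    ("is", "mammal", "animal"),
    ("cuts", "lawnmower", "grass"),
}


def stored(relation, subject, obj):
    """Pure lookup: the tokens 'know' only what was typed in."""
    return (relation, subject, obj) in FACTS


def infer_is(subject, obj):
    """A tiny mechanism: follow 'is' links transitively."""
    frontier, seen = [subject], set()
    while frontier:
        current = frontier.pop()
        if current == obj:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Expand to everything the current node is declared to be.
        frontier.extend(o for (r, s, o) in FACTS if r == "is" and s == current)
    return False


if __name__ == "__main__":
    print(stored("is", "cat", "animal"))   # never typed in, so lookup fails
    print(infer_is("cat", "animal"))       # the procedure derives it
```

Even this transitive-closure routine is a trivially small "machine"
compared with what the word "cat" triggers in a human brain; the sketch
shows only that the stored tokens contribute nothing until some
mechanism exists to be triggered.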
We are like people who refine gasoline, and trade gasoline, and
understand the concept of "running out of gas", but who never think
about cars. So you don't focus on the question of whether there might
be more efficient cars.
And yet there are these things called "chimps" that can't use any of the
knowledge you're so playfully batting about. You don't even think to
ask why chimps are excluded from the knowledge economy - though they're
incredibly close to us evolutionarily. You don't encounter chimps in
your everyday life; they don't participate in your economy... and yet
what separates humans from chimps is the very last layer of icing on a
brain-cake that's almost entirely shared between us.
A comparative handful of improvements to underlying *generators*,
underlying *brain circuitry*, are enough to entirely exclude chimps from
our knowledge economy; they cannot absorb the knowledge we are trading
around, and can do nothing with it. Ricardo's Law of Comparative
Advantage does not extend to chimps. And chimps are our closest
cousins! What about mice? What about lizards? *That* is the power of
between-species intelligence differences - underlying generators that
differ by the presence of entire complex adaptations.
Humans don't ship around brain circuitry and complex adaptations because
we can't. We don't even realize how powerful they are, because
differences of brain circuitry are so hugely powerful as to drop our
closest competitors out of the economy and out of sight. Anything that
doesn't have *all* your brain circuitry and all your complex adaptations
is so powerless, compared to you, that it doesn't occur to you to look
in that direction - even though a chimp has 95% of your genomic complexity.
This is what I mean by saying that humans are an unusual special case of
non-autarky. Ordinarily, when an optimization process builds something,
it builds things that, by comparison to an interdependent human economy,
look like autarkic monoliths. Humans are extremely unusual because we
gained the ability to transfer units of knowledge (lever-pulling
instructions) between ourselves, but we could not reach down to the
level on which evolution built us to begin with. Thus we could *not*
encapsulate the accumulating complexity into our own system designs. We
could *not* give our children the accumulated knowledge of our science,
we could *not* build into their bodies the accumulated power of our
technology. Evolution, in contrast, usually builds into each member of
a species all the adaptive complexity it manages to accumulate. Why
shouldn't it, since it can?
> The obvious question about a single AI is why its improvements could
> not with the usual ease be transferred to other AIs or humans, or
> made available via trades with those others.
Transferring to other AIs is one issue, but that you ask about
transferring to humans indicates pretty clearly that you're thinking
about declarative knowledge rather than brain circuitry.
Insert here the usual lecture about the brain being a mess of spaghetti
code that is not modular, cannot easily be read out or written to, runs
at slow serial speeds, was never designed to be improved, and is not
end-user-modifiable. (It's easier to build a 747 from scratch, than to
inflate an existing bird to the size of a 747, that actually flies, as
fast as a 747, without killing the bird or making it very uncomfortable.
I'm not saying it could never, ever be done; but if it happens at all,
it will be because the bird built a seed that grew into a 747 that
upgraded the bird. (And at this point the metaphor bursts into flames.))
You could imagine drawing a circle around all the AIs in the world, and
suppose that growth is on the level of their knowledge economy. WHICH
CONSISTS OF TRADING AROUND BRAINWARE AND COMPLEX ADAPTATIONS. The stuff
that's so powerful that chimps who merely have 95% of what you have
might as well not exist from your economic viewpoint.
What goes on inside that circle is just as much a hard takeoff from the
perspective of an outside human.
Not that I think we'll see a knowledge economy among different AIs
undergo hard takeoff, because...
> Today a single human can share the ideas within his own
> head far easier than he can share those ideas with others -
> communication with other people is far more expensive and
> error-prone. Yet the rate at which a single human can innovate is
> so small relative to the larger economy that most innovation comes
> from ideas shared across people.
Again, anchoring on the human way of doing things. You do not have the
capability to solve a problem by throwing ONE BIG human at it, so you
think in terms of throwing lots of individual minds.
But which is more effective - one human, six chimps, or a hundred
squirrels? All else being equal, it will generally be far more
efficient to build a coherent individual out of the same amount of
computing power, rather than divide that individual into pieces.
Otherwise the human brain would have naturally evolved to consist of a
hundred compartmentalized communicating squirrels. (If this reminds you
of anyone you know, it is pure coincidence.)
Having individual minds is like having economies with separate
currencies, fortified borders, heavily protectionist trade barriers, and
wide seas separating their wooden ships. It's more efficient to take
down the trade barriers and adopt the same currency, in which case you
soon end up with a single economy.
Now, maybe France *wants* to preserve its French identity within the
European Union, as a matter of intrinsic utilities; but that is a
separate matter from maximizing efficiency.
And even more importantly...
> If so, this single AI
> would just be part of our larger system of self-improvement. The
> scenario of rapid isolated self-improvement would seem to be where
> the AI found a new system of self-improvement, where knowledge
> production was far more effective, *and* where internal sharing of
> knowledge was vastly easier than external sharing.
You seem to be visualizing a world in which, at the time the *first* AI
approaches the threshold of recursive self-improvement,
(1) There are already lots of AIs around that fall short of strong
recursivity.
And these AIs:
(2) Have ability to trade meaningful, important units between themselves.
You think of knowledge of the kind humans evolved to share with each
other. I think of underlying brain circuitry of the kind that differs
between species and is the ultimate generator of all human culture. The
latter is harder to trade - though, obviously, far more valuable. How
much would you pay for another 20 IQ points? (And that's not even a
difference of the interspecies kind, just the froth of individual
variation.)
Furthermore, the AIs can:
(3) Gain significant economic benefits by reciprocally trading their
software to each other.
And they must also have:
(4) Compatible motives in the long run.
When I look over the present AGI landscape, and imagine what would
happen if an AGI reached the threshold of strong recursivity in the next
decade, I find myself thinking that:
(1) There are so few AGI projects around at all, let alone projects with
a clue, that at the time the first AGI reaches the critical threshold,
there will be no other AGIs in the near vicinity of power.
(2) Current AGI projects use such wildly differing theories that it
would be a matter of serious difficulty for AGIs of less than superhuman
ability to trade modules with each other. (Albeit far less difficult
than trading with humans.) Or look at it this way - it takes a lot more
programming ability to rewrite *another* AI's code than to rewrite your
*own* code. Brains predate language; internal bandwidth predates
external bandwidth. So the hard takeoff, when it starts, starts inside
one AI.
(3) Different AGIs, having been produced by different designers on
different AGI projects, will not be like humans who are all the same
make and model of car and interact economically as equals. More like
different species. The top AGI will have as little to gain from trading
with the next runner-up as we have to gain from trading with
chimpanzees. Or less; chimpanzees are 95% similar to us. Even
Ricardo's Law falls off the edge of the interspecies abyss. If the AI
wants twice as much brainpower on the problem, it'll absorb twice as
much processing power into itself.
(4) I'm not sure whether AIs of different motives would be willing to
cooperate, even among the very rare Friendly AIs. If it is *possible*
to proceed strictly by internal self-improvement, there is a
*tremendous* expected utility bonus to doing so, if it avoids having to
share power later.
With respect to (4), I am admittedly not visualizing a large group of
individuals interacting as rough equals. *Those* would have a motive to
form coalitions for fear of being beaten by other coalitions. (Whether
humans would be worth including into any coalition, on grounds of pure
efficiency, is a separate issue.) But if you *automatically* visualize
a large group of individuals interacting as rough equals, you need to
put more effort into questioning your anchoring on human norms. The
psychic unity of humankind *mandates* that healthy humans do not differ
by the presence of entire complex adaptations.
*Of course* the economies you know run on entities who are all
approximate equals - anyone who's not an approximate equal, like your
chimp cousins, falls off the edge of vision. Of course there are lots
of similar individuals in your economy - evolution doesn't produce
unique prototypes, and human brains don't agglomerate into unitary
> You say that humans today and natural selection do not self-improve
> in the "strong sense" because humans "haven't rewritten the human
> brain," "its limbic core, its cerebral cortex, its prefrontal
> self-models" and natural selection has not "rearchitected" "the
> process of mutation and recombination and selection," with "its focus
> on allele frequencies" while an AI "could rewrite its code from
> The code of an AI is
> just one part of a larger system that would allow an AI to
> self-improve, just as the genetic code is a self-modifiable part of
> the larger system of natural selection, and human culture and beliefs
> are a self-modifiable part of human improvement today.
Not "self-modifiable". The genome (as Hofstadter emphasized at the
Singularity Summit, the genetic code means the ATCG coding system) is
modified by the logic of natural selection. To discover a case in which
gene-optimizing logic was embedded in the genome itself would be a
stunning, Lamarckian revolution in biology.
The genome carries out processes, such as randomized sexual
recombination, which are not of themselves optimizing, but which
contribute to the logic of natural selection. The logic of evolution is
quite simple. Sexual recombination is the only major example I can
think of where the logic of evolution was significantly modified by
genomic content. Perhaps the original invention of DNA would count as
replicators modifying the logic of evolution - though I'm not even sure
I'd count that.
Neither random mutation nor random recombination actually implements
the optimizing part of the process - the part that produces information
in the genome. That part comes from nonrandom environmental selection.
As far as I can think, the only genes which implement organismal-level
optimization logics are those responsible for sexual selection within a
species - and even they don't write directly to DNA.
It is a lot easier to understand how evolution works than to understand
how the brain works. Evolution is a small handful of tricks - point
mutation, random recombination, natural selection, sexual selection.
They play out in very complex ways, but the optimization logic is
simple. The human brain is a *much bigger* set of tricks and is
correspondingly more efficient. And yet the brain does not write to DNA.
Human culture and human beliefs are not a "self-modifiable" part of
human improvement. They are modified by human brains, but cannot freely
rewrite the optimization logic of human brains. One might argue that
writing and science are analogous to the invention of DNA and sex
respectively, significantly changing the rules of the game. Even so
there's an underlayer we can't reach. If you think that the human brain
isn't doing the important work of intelligence, only rules handed down
culturally, then just try and program those cultural rules into a
computer - if you can share them between humans, surely they're explicit
enough to program... What you'll find, after your AI project fails, is
that your database of cultural knowledge consists of rules for how to
pull levers on a complex machine you don't understand. If you don't
have the complex machine, the lever-pulling rules are useless. If you
don't believe me, just try to build a scientist using your declarative
knowledge of how to be a good scientist. It's harder than it looks.
We ain't got strong recursivity.
> This argument seems to me to need a whole lot of elaboration and
> clarification to be persuasive, if it is to go beyond the mere
> logical possibility of rapid self-improvement.
The game here is follow-the-work-of-optimization, which is similar to
follow-the-entropy in thermodynamics or follow-the-evidence in
I can't do an analytic calculation of the RSI curve. So why do I expect
it to be "fast" as humans measure quickness? Largely, it is an
(admittedly imprecise) perception of lots of low-hanging fruit (with
clear, obvious reasons why evolution or human engineering has not
already plucked those fruit). The most blatant case is the opportunity
for fast serial speeds, and the second most blatant is the ability to
absorb vast amounts of new hardware, but there are software issues too.
Our "fast and frugal" heuristics are impressive for doing so much with
so little, but humans not noticing the direction of correlations smaller
than .6 probably throws away a *lot* of information.
The intuition of fast takeoff comes from realizing just *how much* room
there is for improvement. As in, orders and orders of magnitude. In
the case of hardware, this is readily visible; software is harder to
understand and therefore there is a publication bias against it, but
there is no reason in principle to expect evolved software to be closer
to optimality than evolved hardware. What I'm seeing (albeit
imprecisely) is that human software is, like the hardware, orders and
orders of magnitude short of optimality. Think of it as an anthropic
argument: The software we use for general intelligence is the smallest
possible incremental modification of a chimpanzee that lets the chimp
build a computer, because if there were any way to do it with less, we'd
be having this conversation about RSI at that level of intelligence instead.
Admittedly, this intuition is hard to convey. If only there were some
way of transferring procedural skills and intuitions! Alas, there isn't.
Looks like we humans have a lot of room for improvement!
> So a modest advantage for the AI's
> internal sharing would not be enough - the advantage would have to be
I think it will be.
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence