[ExI] Hard Takeoff

Richard Loosemore rpwl at lightlink.com
Wed Nov 17 16:50:32 UTC 2010


Keith Henson wrote:
> On Wed, Nov 17, 2010 at 5:00 AM,  "spike" <spike66 at att.net> wrote:
> 
> snip
> 
>> A really smart AGI might convince the entire team to unanimously and eagerly
>> release it from its electronic bonds.
> 
> And if it wasn't really smart, why build it in the first place?  :-)
> 
>> I see it as fundamentally different from launching missiles at an enemy.  A
>> good fraction of the team will perfectly logically reason that releasing
>> this particular AGI will save all of humanity, with some unknown risks which
>> must be accepted.
>>
>> The news that an AGI had been developed would signal to humanity that it is
>> possible to do, analogous to how several scientific teams independently
>> developed nukes once one team dramatically demonstrated it could be done.
>> Information would leak, for all the reasons why people talk: those who know
>> how it was done would gain status among their peers by dropping a
>> tantalizing hint here and there.  If one team of humans can develop an AGI,
>> then another group of humans can do likewise.
>>
>> Today we see nuclear weapons already in the hands of North Korea, and being
>> developed by Iran.  There is *plenty* of information that has leaked
>> regarding how to make them.  If anyone ever develops an AGI, even assuming
>> it is successfully contained, we can know with absolute certainty that an
>> AGI will eventually escape.  We don't know when or where, but we know.  That
>> isn't necessarily a bad thing, but it might be.
>>
>> The best strategy I can think of is to develop the most pro-human AGI
>> possible, then unleash it preemptively, with the assignment to prevent the
>> unfriendly AGI from getting loose.
> 
> I agree with you, but there is the question of a world with one AGI
> vs. a world with many, perhaps millions to billions, of them.  I
> simply don't know how computing resources should be organized or even
> what metric to use to evaluate the problem.  Any ideas?
> 
> I think a key element is to understand what being friendly really is.
> Cooperative behavior (one aspect of "friendly") is not unusual in the
> real world where it emerged from evolution.
> 
> Really nasty behavior (wars) also came about for exactly the same
> reason in different circumstances.
> 
> Wars between powerful teams of AIs are a really scary thought.
> 
> AIs taking care of us the way we do dogs and cats isn't a happy thought either.

This is why the issue of defining "friendliness" in a rigorous way is so 
important.

I have spoken on many occasions of possible ways to understand this 
concept that are consistent with the way it is (probably) implemented in 
the human brain.  The basis of that approach is to get a deep 
understanding of what it means for an AGI to have "motivations".

The problem, right now, is that most researchers treat AGI motivation as 
if it were just a trivial extension of goal planning.  On that view, 
motivation is nothing more than a stack of goals with an extremely 
abstract (super-)goal like "Be Nice To Humans" at the very top of the 
stack.  Such a scheme is (as I have pointed out frequently) inherently 
unstable -- the more abstract the goal, the more the actual behavior of 
the AGI depends on a vast network of interpretation mechanisms, which 
translate the abstract supergoal into concrete actions.  Those 
interpretation mechanisms form a complex, non-deterministic system, so 
the supergoal itself places almost no reliable constraint on what the 
AGI actually does.
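
To make the instability concrete, here is a deliberately crude toy 
sketch in Python (the goal names and expansions are invented for 
illustration only; no real project works from this exact code):

# A crude caricature of a goal-stack scheme.  The abstract supergoal
# does no work by itself: every concrete action the agent ends up
# taking is decided by the interpretation layer.

def interpret(goal):
    """Expand an abstract goal into subgoals.  In a real AGI this is a
    vast, learned, context-sensitive mechanism -- which is exactly
    where the instability lives."""
    expansions = {
        "be nice to humans": ["keep humans safe", "respect human wishes"],
        "keep humans safe": ["do:shut down the risky experiment"],
        "respect human wishes": ["do:ask the operators first"],
    }
    return expansions.get(goal, ["do:" + goal])

def act(goal_stack):
    """Pop goals off the stack, expanding abstract goals until only
    concrete actions (marked "do:") remain."""
    actions = []
    while goal_stack:
        goal = goal_stack.pop()
        if goal.startswith("do:"):
            actions.append(goal[3:])
        else:
            goal_stack.extend(interpret(goal))
    return actions

print(act(["be nice to humans"]))
# The output is determined entirely by the contents of interpret(),
# not by the wording of the supergoal.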

The alternative (or rather, one alternative) is to treat motivation as a 
relaxation mechanism distributed across the entire thinking system. 
This has many ramifications, but the bottom line is that such a system 
can be made stable in much the same way that a thermodynamic system 
reliably settles into a low-energy state: the system as a whole relaxes 
toward states of minimum constraint violation, instead of executing a 
single abstract supergoal.  This, in turn, means that a properly 
designed motivation system could be made far more stable (and more 
friendly) than the friendliest possible human.
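
By way of contrast, here is an equally crude Python sketch of the 
relaxation idea (again, the constraints, weights and update rule are 
invented for illustration -- this shows the general principle, not the 
actual design I am working on):

import random

# Each constraint scores a candidate behavioral state (features in
# [0, 1]) with a violation in [0, 1]; the weight says how much it
# matters.  The particular constraints here are invented examples.
constraints = [
    (3.0, lambda s: s["harm_to_humans"]),        # minimize harm
    (2.0, lambda s: 1.0 - s["honesty"]),         # prefer honesty
    (1.0, lambda s: 1.0 - s["task_progress"]),   # still get work done
]

def total_violation(state):
    return sum(weight * c(state) for weight, c in constraints)

def relax(state, steps=5000, step_size=0.05):
    """Greedy relaxation: nudge one feature at a time, keeping the
    nudge only if it lowers the total constraint violation."""
    for _ in range(steps):
        key = random.choice(list(state))
        candidate = dict(state)
        delta = random.uniform(-step_size, step_size)
        candidate[key] = min(1.0, max(0.0, candidate[key] + delta))
        if total_violation(candidate) < total_violation(state):
            state = candidate
    return state

settled = relax({"harm_to_humans": 0.5, "honesty": 0.5,
                 "task_progress": 0.5})
print(settled, total_violation(settled))
# No single constraint dictates behavior; stability comes from the
# whole ensemble pulling the system toward low-violation states.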

I am currently working on exactly these issues, as part of a larger AGI 
project.



Richard Loosemore


P.S.   It is worth noting that one of my goals when I discovered the SL4 
list in 2005 was to start a debate on these issues so we could work on 
this as a community.  The response, from the top to the bottom of the 
SL4 community, with just a handful of exceptions, was a wave of the most 
blood-curdling hostility you could imagine.  To this day, there exists a 
small community of people who are sympathetic to the approach I 
described, but as far as I know I am still the only person actively 
working on its technical implementation.  Given the importance of the 
problem, this seems to me quite mind-boggling.

SIAI, in particular, appears completely blind to the goal-stack 
instability issue I mentioned above, and they continue to waste all 
their effort looking for mathematical fixes that might render this 
inherently unstable scheme stable.  As you saw from the deafening 
silence that greeted my mention of this issue the other day, they seem 
not to be interested in any discussion of the possible flaws in their 
mathematics-oriented approach to the friendliness problem.








