[ExI] Yes, the Singularity is the greatest threat to humanity

Richard Loosemore rpwl at lightlink.com
Mon Jan 17 14:30:47 UTC 2011


Eugen Leitl wrote:
> To be able to build friendly you must first be able to define friendly.
> 

This statement is at least partially true.  An absolute definition of 
friendliness need not be possible, because one could build an AGI that 
empathizes with human aspirations without ever writing down a 
closed-form definition of "friendliness".  However, you would still 
need a good working understanding of the mechanism that implements 
the empathy.

Still, what we can agree on, thus far, is that there is not yet a good 
definition of what friendliness is, nor of how empathy mechanisms work.

Which means that no firm statements about the dynamics of friendliness 
or empathy can be made at this point...



> What is friendly today is not friendly tomorrow. What is friendly to me,
> a god, is not friendly to you, a mere human.

... which means that this statement utterly contradicts the above.

In the absence of a theory of friendliness etc., we cannot say things 
like "What is friendly today is not friendly tomorrow" or "What is 
friendly to me, a god, is not friendly to you, a mere human".

> When you bootstrap de novo by co-evolution in a virtual environment and
> aim for [a] very high fitness target, [the result] is extremely unlikely
> to be a good team player to us meat puppets.

... nor can this conclusion be reached.

And in Anders' earlier post on the same topic, we had:

Anders Sandberg wrote:
> There are far more elegant ways of ensuring friendliness than
> assuming Kantianism to be right or fixed axioms. Basically, you try
> to get the motivation system to not only treat you well from the
> start but also be motivated to evolve towards better forms of
> well-treating (for a more stringent treatment, see Nick's upcoming
> book on intelligence explosions). Unfortunately, as Nick, Randall
> *and* Eliezer all argued today (talks will be put online on the FHI
> web ASAP) getting this friendliness to work is *amazingly* hard.

Getting *which* mechanism of "friendliness" to work?  Which theory of 
friendliness and empathy mechanisms is being assumed?  My 
understanding is that no such theory exists.

You cannot prove that "getting X to work is *amazingly* hard" when X has 
not been characterized in anything more than a superficial manner.


Richard



