[ExI] Self improvement
Richard Loosemore
rpwl at lightlink.com
Fri Apr 22 18:40:28 UTC 2011
Eugen Leitl wrote:
> On Fri, Apr 22, 2011 at 12:17:13PM -0400, Richard Loosemore wrote:
>
>> The question of whether the AIs would be friendly toward each other is a
>> matter of design, very much the same as the question of whether or not
>> they would be good at arithmetic.
>
> The difference is that "friendly" is meaningless until defined.
>
> Please define "friendly" in formal terms, including context.
That is not the difference at all. It is not possible, and will never
be possible, to define "friendly" in formal terms, for the same reason
that it is not possible to define "intelligent" in formal terms.
There are other ways to effect a definition without it being "formal".
If we understand the roots of human motivation, and if we understand the
components of human motivation in such a way that we can differentiate
"empathic" behavior from other kinds, like selfishly aggressive
behavior, then we could discover that (as seems likely) these are
implemented by separable modules in the human brain.
Under those circumstances we would be in a position to investigate the
dynamics of those modules and in the end we could come to be sure that
with that type of design, and with the aggressive modules missing from
the design, a thinking creature would experience all of the empathic
motivations that we so prize in human beings, and not even be aware of
the violent or aggressive motivations.
With that information in hand, we would then be able to predict that the
appropriately designed AI would exhibit a cluster of behaviors that, in
normal human terms, would be summed up by the label "friendly". By that
stage, the term "friendliness" would have achieved the status of having
a functional definition in terms of the behavior, and the stability, of
certain brain modules (and their AGI counterparts). At that point we
would have:
a) A common-speech, DESCRIPTIVE definition of "friendliness",
b) A technical, FUNCTIONAL definition of "friendliness", and
c) A method for designing systems in which the effectiveness and
stability of the friendliness modules could be stated in concrete,
measurable terms. This would give us everything we could ask for in
the way of a guarantee of friendliness, short of a guarantee based on
logically necessary premises and strictly correct logical deduction (a
type of guarantee that is virtually impossible in any real world science).
--
So, if you are going to argue that "friendliness" cannot be given a
formal definition in the mathematical sense, I agree with you.
If you are going to claim that this has any relevance for the question
at hand, you are mistaken, and you need to address the above line of
argument if you want to maintain otherwise.
Richard Loosemore