[ExI] Self improvement
Eugen Leitl
eugen at leitl.org
Sun Apr 24 10:21:50 UTC 2011
On Fri, Apr 22, 2011 at 02:40:28PM -0400, Richard Loosemore wrote:
> Eugen Leitl wrote:
>> On Fri, Apr 22, 2011 at 12:17:13PM -0400, Richard Loosemore wrote:
>>
>>> The question of whether the AIs would be friendly toward each other
>>> is a matter of design, very much the same as the question of whether
>>> or not they would be good at arithmetic.
>>
>> The difference is that "friendly" is meaningless until defined.
>>
>> Please define "friendly" in formal terms, including context.
>
> That is not the difference at all. It is not possible, and will never
> be possible, to define "friendly" in formal terms. For the same reason
We agree.
> that it is not possible to define "intelligent" in formal terms.
Yes; but it's irrelevant to the key assumption of people who believe
in guardian agents. Intelligence is as intelligence does, but in
the case of overcritical friendliness you have zero error margin at each
interaction step of open-ended evolution, on the side of both the
guardian and guarded agent populations. There is no "oh well, we'll
scrap it and redo it again". Because of this you need a brittle
definition, which already clashes with overcriticality, which
by necessity must be robust in order to succeed at all (in order
to overcome the sterility barrier in the absence of enough knowledge
about where fertile regions start).
> There are other ways to effect a definition without it being "formal". If
> we understand the roots of human motivation, and if we understand the
Humans are insufficiently friendly as is. The best you can do is
to distill the ethics into a small, hand-picked "right stuff"
subpopulation. This is pretty thin, but I think it's our best
bet.
> components of human motivation in such a way that we can differentiate
> "empathic" behavior from other kinds, like selfishly aggressive
> behavior, then we could discover that (as seems likely) these are
> implemented by separable modules in the human brain.
I'm not sure the brain is sufficiently modular to allow for such
easy extraction. But this definitely warrants further research.
> Under those circumstances we would be in a position to investigate the
> dynamics of those modules and in the end we could come to be sure that
> with that type of design, and with the aggressive modules missing from
> the design, a thinking creature would experience all of the empathic
> motivations that we so prize in human beings, and not even be aware of
> the violent or aggressive motivations.
You're thinking about de novo AI, but it might work in people as well.
However, the idea of engineering (emulated) people to make them superior
ethical agents is itself on thin ice, even if mutually consensual.
There are definitely lots of dangers involved.
> With that information in hand, we would then be able to predict that the
> appropriately designed AI would exhibit a cluster of behaviors that, in
> normal human terms, would be summed up by the label "friendly". By that
But we're about to leave the domain of "normal human terms". At what
point does the guardian task self-terminate? And whom exactly
are they guarding, and against whom?
> stage, the term "friendliness" would have achieved the status of having
> a functional definition in terms of the behavior, and the stability, of
> certain brain modules (and their AGI counterparts). At that point we
> would have:
What if the human condition has changed? Will your agents track
the change, or remain a constant? Which reference point in the
new species are you using? The meat puppet? It seems a rather awful
idea to imprint a simian paw upon the universe, world without end.
> a) A common-speech, DESCRIPTIVE definition of "friendliness",
>
> b) A technical, FUNCTIONAL definition of "friendliness", and
>
> c) A method for designing systems in which the effectiveness and
> stability of the friendliness modules could be stated in concrete,
> measurable terms. This would give us everything we could ask for in
> the way of a guarantee of friendliness, short of a guarantee based on
> logically necessary premises and strictly correct logical deduction (a
> type of guarantee that is virtually impossible in any real world
> science).
>
> --
>
> So, if you are going to argue that "friendliness" cannot be given a
> formal definition in the mathematical sense, I agree with you.
>
> If you are going to claim that this has any relevance for the question
> at hand, you are mistaken, and you need to address the above line of
> argument if you want to maintain otherwise.
I think what you're trying to build is not terribly meaningful.
But it appears benign enough that it doesn't add to the
problem space.
--
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE