[ExI] Self improvement

Richard Loosemore rpwl at lightlink.com
Sun Apr 24 15:54:59 UTC 2011


Eugen Leitl wrote:
> On Fri, Apr 22, 2011 at 02:40:28PM -0400, Richard Loosemore wrote:
>> Eugen Leitl wrote:
>>> On Fri, Apr 22, 2011 at 12:17:13PM -0400, Richard Loosemore wrote:
>>>
>>>> The question of whether the AIs would be friendly toward each other 
>>>> is a  matter of design, very much the same as the question of whether 
>>>> or not  they would be good at arithmetic.
>>> The difference is that "friendly" is meaningless until defined.
>>>
>>> Please define "friendly" in formal terms, including context.
>> That is not the difference at all.  It is not possible, and will never  
>> be possible, to define "friendly" in formal terms.  For the same reason  
> 
> We agree.
> 
>> that it is not possible to define "intelligent" in formal terms.
> 
> Yes; but it's irrelevant to the key assumption of people believing
> in guardian agents. Intelligence is as intelligence does, but in
> case of overcritical friendly you have zero error margin, at each 
> interaction step of open-ended evolution on the side of both the 
> guardian and guarded agent populations. There is no "oh well, we'll 
> scrap it and redo it again". Because of this you need a brittle
> definition, which already clashes with overcriticality, which
> by necessity must be robust in order to succeed at all (in order
> to overcome the sterility barrier in the absence of enough knowledge
> about where fertile regions start).

This paragraph makes absolutely no sense to me.  Too dense.

>> There are other ways to effect a definition without it being "formal". If 
>> we understand the roots of human motivation, and if we understand the  
> 
> Humans are insufficiently friendly as is. The best you can do is
> to distill the ethics into a small hand-picked "the right stuff"
> tiny subpopulation. This is pretty thin, but I think it's our best
> bet.
> 
>> components of human motivation in such a way that we can differentiate  
>> "empathic" behavior from other kinds, like selfishly aggressive  
>> behavior, then we could discover that (as seems likely) these are  
>> implemented by separable modules in the human brain.
> 
> I'm not sure the brain is sufficiently modular to allow for such
> easy extraction. But this definitely warrants further research.

It is not really a matter of the brain's modularity.

This is something I have been working on for a very long time, so the 
line of my argument is nowhere near as simple as "find the right brain 
modules and we're done".

The actual argument (which I always have to oversimplify for brevity) is 
that:

1)  In order to understand cognition (not motivation, but thought 
processes) it is necessary to take an approach that involves multiple, 
simultaneous, dynamic, weak constraint satisfaction (a mouthful: 
basically, a certain type of unusual neural computation).

2)  For theoretical reasons, it is best to have the motivation of such a 
system *decoupled* from the relaxation network... meaning that the 
motivation process is not just one part of the activity involved in 
thinking; its origin lies outside the flow of thought.

3)  The mechanisms of motivation end up looking like a set of bias 
fields or gradients on the relaxation of the cognition process:  so, 
there is a "slope" inherent in the cognition process, and the motivation 
mechanisms raise or lower that slope.  (In fact, it is not one slope: 
it is a set of them, acting along different "dimensions".)  A toy 
sketch of this bias-field idea follows below, after point 5.

4)  This picture of the way that the human mind "ought" to operate 
appears to be consistent with the neuroscience that we know so far.  In 
particular, there do indeed appear to be some distinct centers that 
mediate at least some motivations.

5)  Whether or not the "centers" for the various types of motivation in 
the human brain can be disentangled (that might not be possible, because 
they could be physically close together or overlapping), if the general 
principle is correct, then it allows us to build a corresponding 
artificial system in which the distinct sources can be added or omitted 
by design.  Thus, even if the aggression and empathy modules are 
overlapping in the human brain's physiology, they can still be 
disentangled functionally, and one of them can be left out of the AGI 
design.
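To make points 1-3 a little more concrete, here is a deliberately 
over-simplified toy sketch in Python.  It is not my actual model -- the 
network, the numbers, and the "empathy"/"aggression" labels are all 
placeholders -- but it shows the shape of the design: a weak constraint 
network relaxes toward a settled state, while motivation enters only as 
externally supplied bias fields that can be included or omitted by design.

# Toy illustration only: a tiny weak-constraint relaxation network whose
# settling is shaped by motivation "bias fields" that live outside the
# network itself.  All names and numbers here are placeholders.
import numpy as np

rng = np.random.default_rng(0)

N = 8                                   # cognitive units
W = rng.normal(scale=0.3, size=(N, N))  # soft (weak) pairwise constraints
W = (W + W.T) / 2                       # symmetric, so relaxation settles
np.fill_diagonal(W, 0.0)

# Motivation sources are decoupled from the constraint network: each one
# is just a bias field (a "slope") imposed on the relaxation, and each
# can be added or omitted by design (point 5).
bias_sources = {
    "empathy":    0.5 * rng.normal(size=N),
    "aggression": 0.5 * rng.normal(size=N),
}

def relax(active_sources, steps=200, rate=0.1):
    """Settle the network under the selected motivation bias fields."""
    bias = sum((bias_sources[s] for s in active_sources), np.zeros(N))
    x = np.zeros(N)
    for _ in range(steps):
        # Each unit drifts toward the state that best satisfies its weak
        # constraints plus the externally imposed bias.
        x = (1 - rate) * x + rate * np.tanh(W @ x + bias)
    return x

# Same cognitive machinery, different motivation profile: the second call
# simply leaves the "aggression" source out of the design.
print(relax({"empathy", "aggression"}))
print(relax({"empathy"}))

The point of the toy is only the division of labor: the constraint 
network does the "thinking", the bias fields do the "motivating", and 
leaving out a bias source changes what the system is inclined to do 
without touching the cognitive machinery at all.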



>> Under those circumstances we would be in a position to investigate the  
>> dynamics of those modules and in the end we could come to be sure that  
>> with that type of design, and with the aggressive modules missing from  
>> the design, a thinking creature would experience all of the empathic  
>> motivations that we so prize in human beings, and not even be aware of  
>> the violent or aggressive motivations.
> 
> You're thinking about de novo, but it might work in people as well. However,
> the idea of engineering (emulated) people to make them superior ethical 
> agents by itself appears to be on thin ice, even if mutually consensual. There
> are definitely lots of dangers involved.

I don't know why you say this.  We can only address the specific context 
of the model I am proposing; we cannot just start making general 
statements that "there are definitely lots of dangers".

If I could find enough time to work full time on these issues, I would be able 
to write several papers explaining all this in luxurious detail and 
giving experimental evidence.  As it is, I have to work more than full 
time teaching for the next 12 months at least.


Richard Loosemore


