[ExI] Self improvement
Richard Loosemore
rpwl at lightlink.com
Fri Apr 22 13:45:11 UTC 2011
Anders Sandberg wrote:
> Eugen Leitl wrote:
>> On Thu, Apr 21, 2011 at 09:42:16AM +0100, Anders Sandberg wrote:
>>
>>> We need a proper theory for this!
>>
>> I am willing to bet good money that there is none. You can use
>> it as a diagnostic: whenever the system starts doing something
>> interesting, your analytical approaches start breaking down.
>>
>
> Interesting approach. Essentially it boils down to the "true creativity
> is truly unpredictable" view (David Deutsch seems to hold this too).
>
>
>> As long as we continue to treat artificial intelligence as
>> a scientific domain instead of "merely" engineering, we won't be
>> making progress.
>>
>
> On the other hand "mere" engineering doesn't lend itself well to
> foresight. We get the science as a side effect when we try to understand
> the results (worked for thermodynamics and steam engines) but that is
> too late for understanding the dangers very well.
> "Look! Based on the past data, I can now predict that this kind of AI
> architecture can go FOOM if you run it for more than 48 hours!"
> "Oh, it already did, 35 minutes ago. We left it on over the weekend."
> "At least it wont paperclip us. That is easily proved from the structure
> of the motivation module..."
> "It replaced that one with a random number generator on Saturday for
> some reason."
>
>
I will get to a more detailed reply to your question as soon as I can,
but in the meantime I'll make a couple of quick observations.
1) Eugen's suggestion that there might not be a "proper theory" is
something I have already argued in a published paper (the Complex
Systems paper from 2007), although I made the argument in a more
rigorous way and from a different starting point.
2) You hint at one of the "bad" scenarios in your last couple of
sentences... but even in this abbreviated dialogue format there is a
glaring error. If the supposed motivation module prevents
paperclipping, then by trivial extension it also prevents replacement
of the motivation module itself. It is almost a logical fallacy to
suppose that there is a danger of the system replacing its motivation
module, when the thing that determines whether the motivation module
gets replaced is the CURRENT design of the motivation module.
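To make that point concrete, here is a minimal toy sketch in Python
(entirely my own illustration for this message; the class and method
names are invented, not any real architecture). A proposal to replace
the motivation module is just another proposal that has to pass through
the same gate as every other self-modification:

    class MotivationModule:
        def approves(self, proposal):
            # The same check that rejects "paperclip the universe" also
            # rejects "swap me out for a random number generator".
            return not self.conflicts_with_goals(proposal)

        def conflicts_with_goals(self, proposal):
            # Placeholder goal test; a real system would evaluate the
            # proposal against its whole goal representation.
            return proposal.get("replaces_motivation_module", False) or \
                   proposal.get("harms_humans", False)

    class Agent:
        def __init__(self):
            self.motivation = MotivationModule()

        def self_modify(self, proposal):
            # Every modification, including one aimed at the motivation
            # module itself, is vetted by the CURRENT motivation module.
            if self.motivation.approves(proposal):
                self.apply(proposal)

        def apply(self, proposal):
            pass  # carry out the change (omitted here)

    agent = Agent()
    agent.self_modify({"replaces_motivation_module": True})  # refused

A real system would be vastly more complicated, of course, but the
structural point stands: whatever decides the fate of the motivation
module is the motivation module as it is currently designed.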
As I say, more in due course.
Richard Loosemore