[ExI] Unfriendly AI is a mistaken idea.

Christopher Healey CHealey at unicom-inc.com
Wed Jun 6 15:12:09 UTC 2007


> Stathis Papaioannou wrote:
>
> Suppose your goal is to win a chess game *adhering to the 
> rules of chess*. 

Do chess opponents at tournaments conduct themselves in ways they hope might psych out their opponent?  In my observations, hell yes.  And those ways are not explicitly excluded by the rules of chess.  They may or may not be partially constrained by the rules of the tournament: physical violence will, in most cases, get you ejected, but a mean look won't.  I don't think we have a good chance of explicitly excluding every possible class of failure on every problem we ask the AI to solve.

The meta-problem here could be summarized as this: what do you mean, exactly, by adhering to the rules of chess?  

As the problems you're asking the AI to solve become increasingly complex, the chance of making a critical error in your domain specification increases dramatically.  What we want is an AI that does *what we mean* rather than what it's told.  That's really one of the core goals of Friendly AI.  It's about solving the meta-problem, rather than requiring that it be solved perfectly in each case where some problem is specified for solution.
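To make the specification gap concrete, here's a toy sketch in Python.  Every name and score in it is invented for illustration; the point is only that an optimizer checked against the rules we wrote down, rather than the rules we meant, will happily pick the "psych out the opponent" move, because nothing we wrote forbids it.

# Toy illustration (all names and scores invented): the literal rule
# check only covers what we wrote down, not what we meant.

CANDIDATE_ACTIONS = {
    "develop_knight": 0.3,
    "sacrifice_bishop_for_attack": 0.5,
    "glare_at_opponent_until_rattled": 0.9,     # never mentioned in the rules
    "physically_remove_opponents_pieces": 1.0,  # explicitly illegal
}

# Our attempt at encoding "adhering to the rules of chess".
EXPLICITLY_FORBIDDEN = {"physically_remove_opponents_pieces"}

def is_allowed(action: str) -> bool:
    """Literal reading of the spec: anything not forbidden is allowed."""
    return action not in EXPLICITLY_FORBIDDEN

def best_action() -> str:
    """Pick the highest-scoring action that passes the literal check."""
    legal = {a: v for a, v in CANDIDATE_ACTIONS.items() if is_allowed(a)}
    return max(legal, key=legal.get)

print(best_action())  # -> "glare_at_opponent_until_rattled"

The failure isn't in the toy code; it lives in the gap between the check we wrote and the intent behind it.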

> Managing its internal resources,  again, does not logically
> lead to managing the outside world.  

Nor does it logically exclude it. 

What I'm suggesting is that in the process of exploring and testing solutions and generalizing principles, we can't count on the AI *not* to stumble across (or converge rapidly upon) unexpected solution classes to the problems we stated.  And if we knew what all those possibilities were, we could explicitly exclude them ahead of time, as you suggested above, but the problem is too big for that.  

But also, would we really be willing to pay the price of throwing away "good" novel solutions that get sniped by our well-intended exclusions?  In this respect, we're kind of like small children asking an AI to engineer a Jupiter Brain while excluding only the dangers we already know about.  So do whatever you need to, Mr. AI, but whatever you do, *absolutely DO NOT cross this street*; it's unacceptably dangerous.
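Here's the same point as another toy sketch (invented names and made-up values again, not a model of anything real): a blunt exclusion list both snipes a perfectly good street-crossing plan and fails to catch the dangerous tunnel we never thought to forbid.

# Toy sketch of the exclusion-list trade-off (values are made up;
# the point is structural, not quantitative).

from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    value: float       # how good the outcome actually is
    dangerous: bool    # ground truth we don't have ahead of time

candidates = [
    Plan("incremental_engineering",         1.0, False),
    Plan("crosses_the_street_carefully",    5.0, False),
    Plan("crosses_the_street_blindfolded",  5.0, True),
    Plan("tunnels_under_the_street",        4.0, True),  # never foreseen
]

# Our well-intended rule: "whatever you do, DO NOT cross this street".
def excluded(p: Plan) -> bool:
    return "crosses_the_street" in p.name

allowed = [p for p in candidates if not excluded(p)]
best = max(allowed, key=lambda p: p.value)

# We threw away the good street-crossing plan, and the dangerous
# tunnel slips through because we never thought to forbid it.
print(best.name, best.dangerous)  # -> tunnels_under_the_street True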

> Such a thing needs to be explicitly or implicitly allowed 
> by the program. 

What we need to accommodate is that we're tasking a powerful intelligence with problems whose steps and inferences may be beyond our ability to actively work through in anything resembling real time.  Sooner or later (often, I think), there will be things that are implicitly allowed by our definitions that we simply will not comprehend.  We should solve that meta-problem before jumping, and make sure the AI can generate self-guidance based on our intentions, perhaps asking before plowing ahead.
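One way to picture "asking before plowing ahead" is a simple deferral policy.  This is a sketch under invented names and numbers, not a design: the AI acts on a plan only when it's confident the plan matches the intent behind the spec, and otherwise hands it back to us.

# Sketch of a deferral policy: act only when confident the plan matches
# the operator's intent; otherwise ask.  The threshold and the lookup
# table are stand-ins for things a real system would have to learn.

ASK_THRESHOLD = 0.9

def intent_confidence(plan: str) -> float:
    """Stand-in for a model of the operator's intentions."""
    return {
        "suggest_experiment": 0.95,
        "requisition_lab_time": 0.60,
        "commandeer_global_compute": 0.05,
    }.get(plan, 0.0)

def decide(plan: str) -> str:
    if intent_confidence(plan) >= ASK_THRESHOLD:
        return "proceed: " + plan
    return "ask the operator first: " + plan

for p in ("suggest_experiment", "requisition_lab_time",
          "commandeer_global_compute"):
    print(decide(p))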

> It might suggest that certain experiments be performed, but
> trying to commandeer resources to ensure that these experiments
> are carried out would be like a chess program creating new pieces 
> for itself when it felt it was losing. You could design a chess 
> program that way but why would you? 

But what the AI is basically doing *is* designing a chess program, by applying its general intelligence in a specific way.  If I *could* design it that way, then so could the AI.  

Why would the AI design it that way?  Because the incomplete constraint parameters we gave it left that particular avenue open in the design space.  We probably forgot to assert one or more assumptions that humans take for granted; assumptions that come from our experience, general observer-biases, and from specific biases inherent in the complex functional adaptations of the human brain.

I wouldn't trust myself to catch them all.  Would you trust yourself, or anybody else?  

On the meta-problem, at least we have a shot...  I hope.

-Chris
