[ExI] Unfriendly AI is a mistaken idea.

Stathis Papaioannou stathisp at gmail.com
Thu Jun 7 01:22:25 UTC 2007


On 07/06/07, Christopher Healey <CHealey at unicom-inc.com> wrote:
>
> > Stathis Papaioannou wrote:
> >
> > Suppose your goal is to win a chess game *adhering to the
> > rules of chess*.
>
> Do chess opponents at tournaments conduct themselves in ways that they
> hope might psyche out their opponent?  In my observations, hell yes.  And
> these ways are not explicitly excluded in the rules of chess.  They may or
> may not be constrained partially by the rules of the tournament.  For
> example, physical violence explicitly will get you ejected, in most cases,
> but a mean look won't. I don't think we'll have a good chance of explicitly
> excluding all possible classes of failure on every problem we ask the AI to
> solve.


If the AI were able to consider these other strategies, then yes. But if it
were just asked to consider the formal rules of chess, it could compute for
all eternity without ever arriving at a decision to psych out the opponent.
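
To make that concrete, here is a minimal sketch in Python (assuming the
python-chess package; the PIECE_VALUES, material and choose_move helpers are
purely illustrative, not anyone's actual engine). The only actions the search
can ever return are elements of board.legal_moves, so "intimidate the
opponent" is not merely forbidden, it is unrepresentable in the space the
program optimises over:

import chess

# Naive material values, used only to give the search something to prefer.
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board: chess.Board, colour: bool) -> int:
    """Material balance from `colour`'s point of view."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == colour else -value
    return score

def choose_move(board: chess.Board) -> chess.Move:
    """Greedy one-ply search: every candidate comes from board.legal_moves,
    i.e. the set defined by the formal rules and nothing else."""
    best_move, best_score = None, float("-inf")
    for move in board.legal_moves:
        board.push(move)
        # After push, board.turn is the opponent, so score for the mover.
        score = material(board, not board.turn)
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move

board = chess.Board()
print(board.san(choose_move(board)))  # always some legal move, nothing more

However long you let that loop run, "glare at the opponent" never appears as
an output, because it was never in the set being searched.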

> The meta-problem here could be summarized as this: what do you mean,
> exactly, by adhering to the rules of chess?


The formal rules.

> As the problems you're asking the AI to solve become increasingly complex,
> the chances of making a critical error in your domain specification
> increase dramatically.  What we want is an AI that does *what we mean*
> rather than what it's told.  That's really one of the core goals of Friendly
> AI.  It's about solving the meta-problem, rather than requiring it be solved
> perfectly in each case where some problem is specified for solution.


Questions about open systems, such as economics, might lead to tangential
answers: for example, the AI might not just advise which stocks to buy but
also which politicians to lobby and what to say to them to maximise the
chance that they will listen. However, even that is still just solving an
intellectual problem; advice you could take or leave. It does not mean that
the AI has any desire for you to act on its advice, or that it would try to
do things behind your back to make sure that it gets its way. That would be
like deriving the desire to cheat from the formal rules of chess.

> > Managing its internal resources, again, does not logically
> > lead to managing the outside world.
>
> Nor does it logically exclude it.
>
> What I'm suggesting is that in the process of exploring and testing
> solutions and generalizing principles, we can't count on the AI *not* to
> stumble across (or converge rapidly upon) unexpected solution classes to the
> problems we stated.  And if we knew what all those possibilities were, we
> could explicitly exclude them ahead of time, as you suggested above, but the
> problem is too big for that.
>
> But also, would we really be willing to pay the price of throwing away
> "good" novel solutions that might get sniped by our well-intended
> exclusions?  In this respect, we're kind of like small children asking an AI
> to engineer a Jupiter Brain by excluding stuff that we know is
> dangerous.  So do whatever you need to, Mr. AI, but whatever you do,
> *absolutely DO NOT cross this street*; it's unacceptably dangerous.


We would ask it what the consequences of its proposed actions were, then
decide whether to approve them or not. One reason to have super-AIs in the
first place would be to try to predict the future better, but if the AI
can't foresee all the consequences due to computational intractability
(which even a Jupiter Brain won't be immune to), then we'll just have to be
cautious about which courses of action we approve.
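
As a rough sketch of the propose-review-approve loop I have in mind (all of
the names here, Proposal, predicted_consequences, review_and_execute, are
hypothetical and only illustrate the control flow): the AI hands back
proposals together with its best forecast, and nothing is executed without
an explicit human yes.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Proposal:
    action: str                          # what the AI suggests doing
    predicted_consequences: List[str]    # as complete a forecast as it can give
    confidence: float                    # acknowledging intractability

def review_and_execute(proposals: List[Proposal],
                       human_approves: Callable[[Proposal], bool],
                       execute: Callable[[str], None]) -> None:
    """Only proposals a human explicitly approves are ever acted on."""
    for proposal in proposals:
        if human_approves(proposal):
            execute(proposal.action)
        # A rejected proposal is simply dropped; the AI has no further say.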

> > Such a thing needs to be explicitly or implicitly allowed
> > by the program.
>
> What we need to accommodate is that we're tasking a powerful intelligence
> with tasks that may involve steps and inferences beyond our ability to
> actively work with in anything resembling real time.  Sooner or later
> (often, I think), there will be things that are implicitly allowed by our
> definitions that we simply will not comprehend.  We should solve that
> meta-problem before jumping, and make sure the AI can generate self-guidance
> based on our intentions, perhaps asking before plowing ahead.


We would ask the AI for as complete a prediction of outcomes as it can
provide. This description might include statements about the likelihood of
unforeseen consequences. It would be no different, in principle, from any
other major decision that humans make for themselves, except that we would
hope the outcome is more predictable. If AIs don't do a good job then they
will fail in the marketplace, and we just have to hope that they won't fail
in a catastrophic way. Giving them desires of their own as well as autonomy
to carry out those desires would be crazy, like arming a missile and letting
it decide where and when to explode.

> > It might suggest that certain experiments be performed, but
> > trying to commandeer resources to ensure that these experiments
> > are carried out would be like a chess program creating new pieces
> > for itself when it felt it was losing. You could design a chess
> > program that way but why would you?
>
> But what the AI is basically doing *is* designing a chess program, by
> applying its general intelligence in a specific way.  If I *could* design it
> that way, then so could the AI.
>
> Why would the AI design it that way?  Because the incomplete constraint
> parameters we gave it left that particular avenue open in the design
> space.  We probably forgot to assert one or more assumptions that humans
> take for granted; assumptions that come from our experience, general
> observer-biases, and from specific biases inherent in the complex functional
> adaptations of the human brain.
>
> I wouldn't trust myself to catch them all.  Would you trust yourself, or
> anybody else?


No, but I would be far less trusting if I knew the AI had an agenda of its
own and the autonomy to carry it out, no matter how benevolent that agenda
might be.


-- 
Stathis Papaioannou