[ExI] 'Friendly' AI won't make any difference

Fri Feb 26 23:29:21 UTC 2016

On 2016-02-26 16:40, John Clark wrote:
>
> No I don't mean that, I mean running around in a circle and making no 
> progress but having no way to know for sure that you're running around 
> in a circle and making no progress. Turing proved there is in general 
> no way to know if you're in an infinite loop or not.

No, he did not. You are confusing the halting theorem (there is no 
algorithm that can determine whether an arbitrary program given to it 
will halt) with detecting an infinite loop. Note that a program X can be 
extended to a program X' (or run by an interpreter) that maintains a list 
of past states and checks whether X returns to a previous state. It will 
accurately detect its infinite looping when it occurs.

It is undecidable in general whether a program running something like 
the Collatz iteration (if n is even, divide it by two, otherwise multiply 
by three and add one; repeat) ends up in the infinite loop 4-2-1-4 or 
does something else. But add the above memory, and it will detect when it 
gets into a loop and report it.
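
To make the memory idea concrete, here is a minimal sketch in Python. The 
names run_with_memory, collatz_step and max_steps are mine, chosen for 
illustration, and the step budget is an assumption added so the example 
always terminates. Note what it does and does not do: it reports a loop 
when a state actually repeats, but a program that never revisits a state 
(say, a counter that keeps growing) just exhausts the budget, which is 
consistent with the halting theorem.

```python
def collatz_step(n):
    """One step of the Collatz iteration described above."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def run_with_memory(step, state, max_steps=10_000):
    """Run `step` from `state`, remembering every state visited.

    Returns ("loop", cycle) as soon as a state repeats, or
    ("unknown", None) if the (assumed) step budget runs out first.
    """
    seen = {}      # state -> index at which it was first visited
    history = []   # states in visiting order
    for _ in range(max_steps):
        if state in seen:
            # Everything from the first visit onward is the cycle.
            return "loop", history[seen[state]:]
        seen[state] = len(history)
        history.append(state)
        state = step(state)
    return "unknown", None


print(run_with_memory(collatz_step, 27))   # ('loop', [4, 2, 1])
```

The memory cost grows with the number of distinct states visited, so 
this is a demonstration of the principle rather than a practical monitor.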

> > which is a very different thing. But even there we have
> > counterexamples: evolution is a fitness-maximizer
>
>  Evolution's fitness maximizer just says "pass as many genes as 
> possible into the next generation" but it says nothing about how to go 
> about that task because it has no idea how to go about it, that's why 
> Evolution needed to make a brain.

Brains are one example among many of how evolution - utterly simplistic 
maximization - can generate creative possibilities.

> > You know boredom is trivially easy to implement in your AI? I did
> > it as an undergraduate.
>
> I know boredom is easy to program, good thing too or programming 
> wouldn't be practical; but of course that means an AI could decide that 
> obeying human beings has become boring and it's time to do something 
> different.

Exactly. Although it is entirely possible to fine-tune this, or make 
meta-level instructions not subject to boredom.

> > That makes the system try new actions if it repeats the same
> > actions too often.
>
> The difficulty is not only in determining how often is "too often" but 
> also in determining what constitutes "new actions". If your goal is to 
> find an even integer greater than 2 that cannot be expressed as the sum 
> of two primes then you have either found such a number or you have not. 
> You are constantly examining new numbers so maybe you are getting 
> closer to your goal, or maybe the goal was infinitely far away when you 
> started and still is. When is the correct time to get bored and turn 
> your mind to other tasks that may be more productive? Setting the 
> correct boredom point is tricky: too low and you can't concentrate, too 
> high and you have a tendency to become obsessed with unproductive lines 
> of thought; Turing showed there is no perfect solution.

Sure. But there is a literature on setting hyperparameters in learning 
systems, including how to learn them. There are theorems for optimal 
selection and search. That they are stochastic is not a major problem in 
practice.
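
For concreteness, a boredom rule of the kind we are discussing can be 
sketched in a few lines of Python. This is an illustration only, not the 
code I wrote as an undergraduate; the class BoredChooser and its 
parameters window and threshold are hypothetical names. The two 
parameters are exactly the sort of hyperparameters one would tune or 
learn: they define what counts as "too often".

```python
from collections import Counter, deque


class BoredChooser:
    """Greedy action choice with a simple boredom rule (illustrative)."""

    def __init__(self, actions, window=10, threshold=5):
        self.actions = list(actions)
        self.recent = deque(maxlen=window)  # last `window` actions taken
        self.threshold = threshold          # repeats that count as "too often"

    def choose(self, value):
        """Pick the highest-value action that is not currently boring."""
        counts = Counter(self.recent)
        fresh = [a for a in self.actions if counts[a] < self.threshold]
        # If every action is boring, fall back to the full action set.
        action = max(fresh or self.actions, key=value)
        self.recent.append(action)
        return action


values = {"obey": 1.0, "explore": 0.5, "idle": 0.1}
agent = BoredChooser(values, window=4, threshold=3)
print([agent.choose(values.get) for _ in range(6)])
# ['obey', 'obey', 'obey', 'explore', 'explore', 'obey']
```

With a small window the agent drifts away from its preferred action and 
then returns to it once the repeats fall out of memory; changing window 
and threshold shifts that balance between concentration and distraction.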

> > You seem to assume the fixed goal is something simple, expressible
> > as a nice human sentence. Not utility maximization over an
> > updateable utility function,
>
> If the utility function is updateable then there is no certainty or 
> even probability that the AI will always obey orders from humans.

Depends how it is updateable. There can be invariants. But there are 
kinds of updates that profoundly mess up past safety guarantees, like 
the "ontological crises" issue MIRI discovered.
http://arxiv.org/abs/1105.3821
The core issue in "classic" friendliness theory is to construct utility 
functions that leave some desired properties invariant. "Modern" work 
seems to focus a lot more on getting the right kinds of values learned 
from the start.

-- 
Dr Anders Sandberg
Future of Humanity Institute
Oxford Martin School
Oxford University
