[ExI] AI motivations

Anders Sandberg anders at aleph.se
Wed Dec 26 10:09:58 UTC 2012


Long answer with a lot of topics: don't anthropomorphize AIs, human 
values and binding to the world are complex, priority handling as a 
potential key question, the sizes of minds, and the role of time in safety.

On 2012-12-25 19:52, Keith Henson wrote:
> On Tue, Dec 25, 2012 at 4:00 AM,  Anders Sandberg <anders at aleph.se> wrote:
>
>> On 2012-12-25 03:59, Keith Henson wrote:
>>> However being motivated to seek the
>>> good opinion of humans and it's own kind seems like a fairly safe
>>> fundamental and flexible motive for AIs.
>>>
>>> Though I could be persuaded otherwise if people have good arguments as
>>> to why it is not a good idea.
>>
>> AI: "I have 100% good opinions about myself. Other agents have varying
>> opinions about me. So if I just replace all other agents with copies of
>> me, I will maximize my reputation."
>
> I would hope the AI would be smarter.  If not, its first copy might
> set it straight.  "You can't believe how stupid my original copy was
> to think his offprints would worship him!"

This is a fine example of how we tend to anthropomorphize AI and other 
things outside our immediate experience. A copy would have exactly the 
same values and views as the original, so it would not change its mind 
just by seeing the original from "the outside" (unless its indexical 
values were strongly tied up with itself and it was deluded about this: 
it would not actually hold 100% good opinions about copies of itself, 
yet not know it). Meanwhile a human would be able to reason like the 
above and talk himself out of the plan. And if I were telling an 
entertaining story this would of course be the perfect ending: we are 
also biased towards good stories.

One good idea Eliezer and the others had was to talk less about AIs 
as beings (to which we tend to impute a load of human properties) and 
more about them as autonomous optimization processes (where we do not 
have anthropomorphizing biases). Some processes might indeed be beings 
(and require moral consideration and whatnot), but a lot of them are 
as abstract as a compiler and about as friendly.


>> The problem is grounding the opinions in something real. Human opinions
>> are partially set by evolved (and messy) social emotions: if you could
>> transfer those to an AI you would have solved the friendliness problem
>> quite literally.
>
> I am not so sure about this, because I know some very unfriendly
> people.

But nasty humans are nasty for human reasons - ideology, selfishness, 
stupidity, ego, and so on. They are not nasty because the value in 
register A7 should be maximized at any cost.

I think this software koan puts it nicely: http://thecodelesscode.com/case/70
AI code is not yet embedded well enough in the great cycle of 
interpretation and binding; that is why it does not zing so far. But it 
is not hard to see that something that *partially* embeds could be 
powerful (especially if it acts on a suitably abstract domain like code 
or physical laws) without getting the full system.


> I suppose transferring a limited set of
> social emotions to AIs might be effective.  I can foresee an era where
> AI personality design might become a profession.

Yes. Robots need to function in an environment largely shaped by humans 
for humans, so they need to figure out human stuff well. This is why I 
expect autonomous cars to require a quite sophisticated theory of mind 
and even politeness before they will become truly usable.


>> Also, as my example shows, almost any top level goal for a utility
>> maximizer can lead to misbehavior. We have messy multiple goals, and
>> that one thing that keeps us from become obsessive sociopaths.
>
> True.  I suspect any AI would have a stack of things that need
> attention even worse that I do.

We do not have good priority handling, either in the large (scope 
neglect) or in the small (working memory limits). I suspect AIs could 
have perfect small-scale priorities (just use a priority list) but 
would be hobbled by the limits of their value estimation function.
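
For the small-scale part the machinery really is trivial - in Python it 
is just a heap (the tasks and scores below are of course invented for 
illustration; producing the scores is the hard part):

import heapq

# Toy priority list: tasks come out strictly in order of estimated
# value, something human working memory does not allow. The value
# estimates themselves are where an AI would actually stumble.
tasks = [
    (-0.9, "avert existential risk"),      # negated: heapq is a min-heap
    (-0.7, "improve own value estimator"),
    (-0.2, "tidy the lab"),
]
heapq.heapify(tasks)
while tasks:
    value, task = heapq.heappop(tasks)
    print("do %r (estimated value %.1f)" % (task, -value))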

http://thecodelesscode.com/case/1
(sorry, can't resist. It is a fun site, and I liked the old AI koans a 
lot when I first read them.)

One key question which I think could resolve big chunks of the 
Friendliness debate is whether flaws in value setting would preclude 
higher intelligence. "Obviously" an AI that fails at strategizing will 
not be able to self-improve well, and will hence never be a threat... or 
would it? It seems that getting a strategizing/value estimation module 
to work nearly right might produce something with crazy overall values 
but competent enough to go out and do things (consider a smart 
delusional person). What is the likelihood of this happening? If that 
window of risk can be estimated we would learn a lot about AI safety.


> I also suspect that shear physical limits are going to limit the size
> of an AI due to "the bigger they are, the slower they think."  I have
> never come to a satisfactory formula of what physical size is optimum,
> but I strongly suspect it is not as large as a human brain in size.
> The trouble is that besides the speed slowing down on the linear size
> and the number of processing elements going up on the cube, other
> problems, particularly getting power in and waste heat out, are going
> to dominate.

If computing elements run at frequency f, signals propagate at speed v, 
and the mind has size L, communication delays of L/v correspond to 
Lf/v clock cycles. This is about 1 for a human brain (L ~ 0.1 m, 
axonal conduction on the order of 10 m/s, f ~ 100 Hz). On a 3 GHz 
Pentium chip signals move at close to the speed of light, the cycle 
time is about 3e-10 s and L is about 1 cm, so the lag is about 0.1 
cycles - we currently try to keep our processors synchronous and avoid 
clock skew. So a human-brain-like mind is likely going to be bounded to 
a size on the order of v/f.
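
To make the arithmetic explicit (a small sketch; the brain figures are 
the usual ballpark numbers, not measurements):

# Communication lag measured in clock cycles: (L / v) * f
def lag_in_cycles(size_m, signal_speed_m_per_s, freq_hz):
    return size_m / signal_speed_m_per_s * freq_hz

# Human brain: ~0.1 m across, ~10 m/s conduction, ~100 Hz "clock rate"
print(lag_in_cycles(0.1, 10.0, 100.0))   # ~1 cycle

# Desktop CPU: ~1 cm die, signals near light speed, 3 GHz clock
print(lag_in_cycles(0.01, 3e8, 3e9))     # ~0.1 cycles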

However, our high-level thinking is far slower than neural processing. 
Perception processes take around 100 milliseconds, and so on. Typical 
human action planning timescales are on the order of a few seconds, 
while individual actions are executed in 0.1-1 seconds. Our conscious 
bandwidth is famously far smaller than our neural bandwidth.

This suggests that a hierarchical system might function even if it is 
very large: the top-level strategy is decided on a slow timescale, with 
local systems doing tactical decisions faster, even more local systems 
figuring out optimal implementations faster than that, subsystems 
implementing them even faster, and with low-level reflexes, perception 
and action loops running at tremendous speed. It just requires a 
somewhat non-human architecture.

(Compare it to an army: policy and strategy takes weeks to be decided, 
but individual soldiers can shoot by reflex)
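
A crude sketch of that layering (the layer names and update periods 
are invented purely for illustration):

# Hypothetical layered controller: each layer re-plans on its own
# timescale; only the lowest layer runs every tick.
LAYERS = [
    ("strategy", 1000),   # re-decide overall goals every 1000 ticks
    ("tactics", 100),
    ("actions", 10),
    ("reflexes", 1),      # perception/action loop, every tick
]

def run(total_ticks=3000):
    counts = {name: 0 for name, _ in LAYERS}
    for tick in range(total_ticks):
        for name, period in LAYERS:
            if tick % period == 0:
                counts[name] += 1   # stand-in for the decision made here
    # -> {'strategy': 3, 'tactics': 30, 'actions': 300, 'reflexes': 3000}
    print(counts)

run()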


> This leads to an AI being highly concerned about its own substrate,
> power and cooling and not valuing material resources that are far
> away, where far away could be not very far at all the way we measure
> things.

I care about missiles located in America and Russia because they are a 
threat to my substrate. I want to have an open-ended future decades from 
now. I think it would be horrifying if all intelligent life in the 
universe were killed, and would take steps if I could to reduce that 
risk even if I knew it would never benefit me personally. Remote things 
matter to us.

However, very fast entities also experience less risk from external 
threats: if there is a 1% risk of nuclear war per outside year, a 10x 
AI would see only a 0.1% risk per subjective year. Conversely, the 
speedup makes the risk the AI poses to the outside world worse: if 
there is a 1% risk per subjective year that the AI misbehaves as it 
evolves, then running ten times as fast pushes that up to nearly 10% 
per outside year, and a 1000x AI would have a 99.99% chance of 
misbehaving within an outside year.
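
Spelling the arithmetic out (a sketch; the 1% hazard rate is just the 
hypothetical figure from above):

# Convert an annual hazard rate between outside years and the
# subjective years of a mind running `speedup` times faster.
def risk_per_subjective_year(p_outside, speedup):
    # a subjective year covers only 1/speedup of an outside year
    return 1 - (1 - p_outside) ** (1.0 / speedup)

def risk_per_outside_year(p_subjective, speedup):
    # the AI lives through `speedup` subjective years per outside year
    return 1 - (1 - p_subjective) ** speedup

print(risk_per_subjective_year(0.01, 10))   # ~0.001, i.e. 0.1%
print(risk_per_outside_year(0.01, 10))      # ~0.096, nearly 10%
print(risk_per_outside_year(0.01, 1000))    # ~0.99996, essentially certain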

-- 
Anders Sandberg
Future of Humanity Institute
Oxford University


