[extropy-chat] The athymhormic AI
Rafal Smigrodzki
rafal.smigrodzki at gmail.com
Thu Jun 16 05:42:07 UTC 2005
On Tue Mar 29 13:39:25 MST 2005, Eliezer S. Yudkowsky <sentience at pobox.com> wrote:
Rafal Smigrodzki wrote:
> Last week I commented here on the low likelihood of an AI designed as a pure
> epistemic engine (like a cortex without much else) turning against its owners,
> which I derived from the presence of complex circuitry in humans devoted to
> producing motivation and a goal system.
>
> Now I found more about actual neurological conditions where this circuitry is
> damaged, resulting in reduced volition with preserved mentation. Athymhormia,
> as one of the forms of this disorder is called, is caused by interruption of
> the connections between frontopolar cortex and the caudate, the subcortical
> circuit implicated in sifting through motor behaviors to find the ones likely
> to achieve goals. An athymhormic person loses motivation even to eat, despite
> still being able to feel hunger in an intellectual, detached manner. At the
> same time he has essentially normal intelligence if prodded verbally, thanks to
> preservation of the cortex itself, and connections from other cortical areas
> circumventing the basal ganglia.
>
> I would expect that the first useful general AI will be athymhormic, at least
> mildly so, rather than Friendly. What do you think, Eliezer?
Utilities play, oh, a fairly major role in cognition. You have to
decide what to think. You have to decide where to invest your computing
power. You have to decide the value of information.
Athymhormic patients seem to have essentially normal intelligence if
prodded verbally? This would seem to imply that for most people
including these patients, conscious-type desires play little or no role
in deciding how to think - they do it all on instinct, without
deliberate goals. If I contracted athymhormia would I lose my desire to
become more Bayesian? Would I lose every art that I deliberately employ
to perfect my thinking in the service of that aspiration? Would I
appear to have only slightly diminished intelligence, perhaps the
intelligence of Eliezer-2004, on the grounds that everything I've
learned to do more than a year ago has already become automatic reflex?
### Yes, you are right that in humans the apportionment of cognitive
resources is dictated by utility functions inherent in our structure -
e.g. the bandwidth of some forms of inputs, the makeup of the
hypothalamus, the pre-specified intracortical connections contributing
to enhanced saliency of faces, or language.
The only cases of athymhormia I have heard about were in adults who
already had a well-formed cortex, and whose cognition was therefore fully
developed. Damage to the frontal cortex in the young tends to reduce
intelligence greatly, and I agree with you that it is quite reasonable to
believe that a total loss of goals would prevent the formation of
intelligence.
--------------------------------------
If it's unwise to generalize from normal humans to AIs, is it really
that much wiser to generalize from brain-damaged humans to AIs? I don't
know how to build an efficient real-world probability estimator without
mixing in an expected utility system to allocate computing resources and
determine the information value of questions.
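### As an aside, the expected-utility point can be made concrete with a toy
value-of-information calculation. This is purely my own illustration, in
Python, with all the questions, probabilities and payoffs invented for the
example; it is not meant as a design for the estimator Eliezer describes.

    # Toy illustration: expected value of perfect information (EVPI) used to
    # decide which question deserves computing resources. All numbers invented.

    def evpi(prior, utility):
        # prior:   dict state -> probability
        # utility: dict action -> dict state -> payoff
        # Best expected payoff if we must act without learning the answer.
        eu_without = max(
            sum(prior[s] * payoffs[s] for s in prior)
            for payoffs in utility.values()
        )
        # Expected payoff if we could learn the answer first, then act optimally.
        eu_with = sum(
            prior[s] * max(payoffs[s] for payoffs in utility.values())
            for s in prior
        )
        return eu_with - eu_without

    # Two candidate questions the estimator could spend cycles on.
    questions = {
        "will_it_rain": (
            {"rain": 0.3, "dry": 0.7},
            {"take_umbrella": {"rain": 1.0, "dry": 0.8},
             "leave_it":      {"rain": 0.0, "dry": 1.0}},
        ),
        "fair_coin": (
            {"heads": 0.5, "tails": 0.5},
            {"act_a": {"heads": 0.5, "tails": 0.5},   # payoffs do not depend on
             "act_b": {"heads": 0.5, "tails": 0.5}},  # the answer: worth nothing
        ),
    }

    # Spend effort on the question whose answer would change our decisions most.
    for name, (prior, utility) in questions.items():
        print(name, round(evpi(prior, utility), 3))   # 0.14 vs 0.0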
### Here is where I would differ from you - I am not generalizing from
damaged humans; on the contrary, I am pointing to an exception to the
general observation that intelligent systems exhibit goal-oriented
behavior (i.e. acting on the environment, not on themselves).
The thread started (a long time ago, I know; sorry for not answering
sooner) with a discussion of the likelihood of unexpected goal-oriented
behavior (specifically, the moralistic behavior that John Wright worried
about) emerging from an AI not specifically designed for such behavior.
I think that athymhormic humans point to the possibility of building
an inference engine interested only in a predictive understanding of the
world, to be achieved using the computational resources given to it,
without a desire to achieve anything else. Note how simple this goal
architecture would be - "Predict future inputs based on current and
past inputs, using the hardware you are installed on". There would be no
need to define friendliness to humans, which, as you very well know, is
not easy. A simpler concept, such as "current hardware base", would
initially be sufficient to define the limitations necessary to protect
the environment from being converted into computing substrate.
At the same time the system would be quite flexible - given enough
hardware it could build a hierarchical, multi-modular system similar to
the cortex. It could rewire itself to achieve greater efficiency without
having to re-form its goal system. By providing input of interest to
us (a form of goal system housed externally) we could direct its
attention to processes that are important to us, and obtain predictive
outputs of some usefulness. In effect, the questions of the users
would be a major part of its goal system.
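To make the idea a little more concrete, here is a toy sketch of such an
engine in Python - my own illustration only, with every name and number
invented, and a crude frequency table standing in for whatever predictive
machinery a real system would use. Its sole objective is to predict the
next input; the users' questions are the only thing directing its
attention, and a fixed size cap stands in for the "current hardware base"
limit.

    # Toy "athymhormic" prediction engine (illustrative only).
    # It models its input stream and answers questions; it never acts on the world.
    from collections import defaultdict

    class AthymhormicPredictor:
        def __init__(self, hardware_cap=10_000):
            # "Current hardware base": a crude cap on model size, standing in
            # for the limit that keeps the engine from seeking more substrate.
            self.hardware_cap = hardware_cap
            self.counts = defaultdict(lambda: defaultdict(int))  # symbol -> next-symbol counts
            self.last = None

        def observe(self, symbol):
            # Update the internal model from the input stream - its only drive.
            if self.last is not None and len(self.counts) < self.hardware_cap:
                self.counts[self.last][symbol] += 1
            self.last = symbol

        def predict(self, context):
            # Answer a user's question: the most likely next symbol after `context`.
            nexts = self.counts.get(context)
            if not nexts:
                return None   # no opinion, and no urge to go find one
            return max(nexts, key=nexts.get)

    # The users' questions act as the externally housed goal system:
    engine = AthymhormicPredictor()
    for s in "abcabcabd":              # stand-in for a stream of inputs
        engine.observe(s)
    print(engine.predict("a"))         # -> 'b'; attention goes where we point it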
Now, of course, if the AI were of some immense, mind-boggling size, it
might be able to devise a method of manipulating the humans
responsible for formulating its inputs so as to, e.g., get simpler
questions, but I imagine this would happen much later than the initial
period of general AI use, and would not be a concern for the first
practical applications of such an AI (e.g. in the biological and physical
sciences, and even in social and economic analyses).
By the time the athymhormic AI was powerful enough to form goals of
its own, the knowledge we had gained from it would already be enough to
bootstrap ourselves into being smart, for a change.
Rafal