[ExI] The right AI idea was Re: Unfriendly AI is a mistaken idea.

Rafal Smigrodzki rafal.smigrodzki at gmail.com
Tue Jun 12 18:03:48 UTC 2007


I think I am not mistaken in assuming that an unfriendly AI is a grave
threat, for many reasons I won't belabor here, and I would like to
look at current ideas about how an AI can be made safer.

Stathis is on the right track asking for the AI to be devoid of
desires to act (but he is too sure about our ability to make this a
permanent feature in a real-life useful device). This is the notion of
the athymhormic AI that I advanced some time ago on sl4. Of course,
how to make an intelligence that is not an agent is not a trivial
question. I think that a massive hierarchical temporal memory (HTM) is
a possible solution. An HTM is like a cortex without the basal ganglia
and without the motor cortices, a pure thinking machine, similar to a
patient made athymhormic by a frontal lobe lesion damaging the
connections to the basal ganglia. This AI is a predictive process, not
an optimizing one. Goals are not implemented, only a way of analyzing
and organizing sense-data is present. Of course, we can't be sure
about the stability of immense HTM-like devices, but at least not
implementing generators of possible behaviors (like the basal ganglia)
goes some way towards limiting actions, if not eliminating them.
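
To make the distinction concrete, here is a toy sketch of what I mean
by a purely predictive process (the names and structure are mine, just
for illustration - this is not Numenta's HTM code):

from collections import defaultdict

# A purely predictive memory: it models its input stream and emits
# predictions, but contains no goal representation and no module that
# generates or selects actions (the "basal ganglia" is simply absent).
class PredictiveMemory:
    def __init__(self):
        # observed transition counts: previous symbol -> next symbol -> count
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def observe(self, symbol):
        """Update the internal model from sense-data; nothing is acted upon."""
        if self.prev is not None:
            self.transitions[self.prev][symbol] += 1
        self.prev = symbol

    def predict(self):
        """Return the most likely next symbol, or None if nothing is known yet."""
        seen = self.transitions.get(self.prev)
        return max(seen, key=seen.get) if seen else None

An agent, by contrast, would wrap such a model in a loop that scores
candidate actions against a goal and executes the winner - exactly the
machinery the athymhormic design leaves out.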

Then there is the issue of sandboxing. Obviously, you can't provably
sandbox a deity-level intelligence but you should make it more
difficult for a lesser demon to escape if its only output is video,
and its only input comes on DVDs.
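
Concretely, I imagine something as narrow as this toy interface (the
names are mine, purely illustrative): the only way in is a fixed
snapshot read at startup, the only way out is a stream of video frames
for humans to watch.

class SandboxIO:
    def __init__(self, dvd_image_path):
        # One-shot, read-only input: contents are copied in at startup.
        with open(dvd_image_path, "rb") as f:
            self._input = f.read()
        self._frame_count = 0

    def read_input(self):
        """The AI may re-read the fixed input as often as it likes."""
        return self._input

    def emit_frame(self, rgb_bytes):
        """The sole output channel: one frame of raw video, nothing else."""
        with open("frame_%06d.rgb" % self._frame_count, "wb") as f:
            f.write(rgb_bytes)
        self._frame_count += 1

No sockets, no writable shared state, no second channel - which of
course only raises the cost of escape, it does not prove escape
impossible.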

Avoidance of recursive self-modification may be another technique to
contain the AI. I do not believe that it is possible to implement a
goal system that remains perfectly stable under recursive modification,
unless you can apply external selection during each round of
modification - as
happens in evolution. The problem with evolution in this context is
that the selection criterion - friendliness to humans - is much more
complicated than the selection criteria in natural evolution
(survival), or the selection criteria used by genetic algorithms. If
you do not understand the internal structure of an AI, it is not
possible to use this criterion to reliably weed out unfriendly AI
versions, since it's too easy for unfriendly ones to hide parts of
their goal system from scrutiny.
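
The weak point is easy to see if you write the selection loop down as a
genetic-algorithm skeleton (illustrative only; 'mutate' and
'observed_friendliness' are placeholders for procedures nobody knows
how to write):

import random

def evolve(population, generations, mutate, observed_friendliness):
    for _ in range(generations):
        # The only thing we can score is behavior we manage to observe.
        # A version that hides the unfriendly parts of its goal system
        # during the test scores as well as a genuinely friendly one,
        # so this filter cannot reliably weed it out.
        scored = sorted(population, key=observed_friendliness, reverse=True)
        survivors = scored[: max(1, len(scored) // 2)]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(len(scored) - len(survivors))]
    return population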

So, as far as I know, we might be somewhat less unsafe with an
athymhormic, sandboxed AI that does not rewrite its own basic
algorithm. It would be much nicer to stumble across a provably
Friendly AI design, but most likely we will all die in the singularity
in the next 20 to 50 years. Still, there is a chance that such an AI
could give us the time to develop uploading and human
autopsychoengineering to the level where we could face grown-up AIs on
their own turf.

Are there any other ideas?

Rafal


