[ExI] Runaway AI not likely

Jason Resch jasonresch at gmail.com
Tue Apr 4 16:05:07 UTC 2023

On Tue, Apr 4, 2023 at 12:07 AM Stuart LaForge via extropy-chat <
extropy-chat at lists.extropy.org> wrote:

> One of Yudkowsky's direst warnings is that we have to get AI alignment
> perfectly right from the start because we won't get a second chance. It is
> based on a prediction he calls "Hard Takeoff" or "AI go FOOM" which
> refers to exponentially increasing recursive self-improvement of AI in
> such a way that humans cannot mitigate it. However, I think with
> sufficient vigilance and caution, this scenario can be rendered
> unlikely for several reasons.
> Firstly, Rice's theorem and Turing's halting problem cast exponential
> recursive self-improvement in doubt. Rice's theorem is a fundamental
> theorem in computer science that states that any non-trivial property
> of a Turing machine's language is undecidable.
> In simpler terms, it means that it is impossible to determine if a
> Turing machine (or an AI) has a specific property just by looking at
> its code. Instead, it requires examining its output on a wide range of
> inputs. This is something that has worried Nick Bostrom, Eliezer
> Yudkowsky, and other experts like Alfonseca et al.
> https://www.researchgate.net/publication/304787882_Superintelligence_Cannot_be_Contained_Lessons_from_Computability_Theory
> And while it is true that Rice's theorem makes AI uncontainable and
> unalignable from a coding perspective, it also limits how quickly
> and easily an AI can recursively make itself more intelligent.

That is a brilliant application of theory. I do agree that such limits make
it impossible, not only for us to predict the future direction of AI, but
also for an AI to predict the future direction of any of its AI children.
Actually, the inability to predict what oneself would do, before one does
it, is a problem in itself (and I think is responsible for the feeling of
free will). Non-trivial/chaotic processes can't be predicted without
actually computing them all the way through and working them out (there are no

> This is
> because even an AI that is an expert programmer cannot predict ahead
> of time whether any new-and-improved code that it writes for itself
> will work as expected on all inputs or trap the AI in an endless loop.
> It might be able to write new code quickly, but testing and debugging
> that code will still take significant time and resources. Also, since
> any attempted improvement might result in an infinite loop, it would
> take at least two AIs tandemly taking turns improving one another and
> restoring one another from backup if things go wrong. Rice's theorem
> is an inviolable mathematical truth, as much for AI as for us. This
> means that no singleton AI will be able to become superhuman at all
> tasks and will have to be satisfied with tradeoffs that trap it in a
> local maximum. But no human can become the best at everything either,
> so again it cuts both ways.
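
The test-and-restore loop described in the quoted text can be sketched
roughly as follows. This is only an illustration under stated assumptions:
the names passes_within_budget and adopt_or_restore are hypothetical, a
"candidate" is just a string of Python source, and a wall-clock timeout
stands in for the undecidable question of whether the code halts. A
timeout can only reject code that runs too long; it cannot prove that the
code would never have finished.

```python
import subprocess
import sys


def passes_within_budget(source: str, timeout_s: float = 5.0) -> bool:
    """Run candidate source in a separate process (hypothetical helper).

    Returns True only if the process exits cleanly within the time budget.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            timeout=timeout_s,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # The candidate may loop forever or may just be slow; we cannot
        # tell which, so it is rejected either way.
        return False
    return result.returncode == 0


def adopt_or_restore(current: str, candidate: str, timeout_s: float = 5.0) -> str:
    """Keep the candidate if it passes; otherwise restore the current version."""
    return candidate if passes_within_budget(candidate, timeout_s) else current
```

For example, adopt_or_restore("x = 1", "while True: pass", 1.0) falls back
to "x = 1" after the one-second budget expires. The supervising process
plays the role of the second AI that restores the first from backup.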

I would be cautious, though, about reading Rice's theorem as implying any
upper bound on the speed of progress. Imagine a team of 1,000 AI developers
locked in a computer simulation, and this computer simulation is sped up by
a factor of 1,000, such that those AI engineers experience a millennium of
time in their virtual lives for each year that passes for us. There is
nothing logically or physically impossible about such a scenario, and it
violates no theorems of math or computer science. Yet we can see how this
would lead to an accelerating takeoff that would outpace our capacity to
keep up.
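
To make the arithmetic of that thought experiment explicit, here is a small
sketch. The function names are made up for illustration, and the only
inputs are the two figures given above (1,000 developers, 1,000x speed-up).

```python
def subjective_years(wall_clock_years: float, speedup: float) -> float:
    """Years experienced inside the simulation per span of outside time."""
    return wall_clock_years * speedup


def engineer_years(team_size: int, wall_clock_years: float, speedup: float) -> float:
    """Total engineer-years of work done by the whole simulated team."""
    return team_size * subjective_years(wall_clock_years, speedup)


# One outside year at a 1,000x speed-up, with a team of 1,000:
per_engineer = subjective_years(1, 1_000)   # 1,000 subjective years each
whole_team = engineer_years(1_000, 1, 1_000)  # 1,000,000 engineer-years
```

So each year that passes for us would correspond to a million
engineer-years of effort inside the simulation.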

> Secondly, there is the distinction between intelligence and knowledge.
> Except for perhaps pure math, knowledge cannot be derived solely from
> first principles but can only come from experiment and observation.

I am not sure I agree fully on this. It is true that observation of the
physical world is required to make corrections to one's assumptions
concerning physical theories. But a lot of knowledge can be extracted from
pure thought concerning the laws as they are currently understood. For
example, knowing the laws of physics as they were understood in the 1930s,
could one apply pure intelligence and derive knowledge, such as the
Teller–Ulam design for a hydrogen bomb and figure out how to build one and
estimate what its yield would be, without running any experiments?

> Because of this even a superhuman intelligence can remain ignorant if
> it doesn't have access to true and useful data in the training
> process. So even if the AI was trained on the entire contents of the
> Internet, it would be limited to the sum total of human knowledge. In
> addition to that, a superhuman intelligence would still be subject to
> misinformation, disinformation, fake news, and SPAM. The maxim,
> "garbage in, garbage out" (GIGO) applies as much to AIs as to any
> other programs or minds. And again, Rice's theorem says there is no
> perfect SPAM detector.

I think there may be some constraints on minimum signal:noise ratio for
learning to succeed, but a good intelligence can recursively analyze the
consistency of the ideas/data it has, and begin filtering out the noise
(inconsistent, low quality, likely erroneous) data. Notably, GPT-3 and
GPT-4 used the same training set, and yet, GPT-4 is vastly smarter and has
a better understanding of the data it has seen, simply because more
computation (contemplation?) was devoted to understanding the data set.

> Thirdly, any hard takeoff would require more and better hardware and
> computational resources. While it is possible that an AI could
> orchestrate the gathering and assembly of computational resources at
> such a scale, it would probably have difficulty doing so without
> garnering a significant amount of attention. This would serve as a
> warning and allow people the opportunity to intervene and prevent it
> from occurring.

I agree that our computing resources represent a hard constraint on the
progress of AI. However, we have no proof that there is not a learning
algorithm that is 1,000, or 1,000,000 times more efficient than what has
been used for GPT-4. Should some developer happen upon one, we could jump
from GPT-4 to something like GPT-400, which might be smart enough to
convince someone to run a Python script that turns out to be a worm. Such a
worm could infect other computers and turn them into a hive-mind platform
for itself, running on and controlling a significant fraction of the
computers on the internet. Would we notice in time to shut everything off?
Would we be able to turn off every infected computer before it figures out
how to infect and control the next computer?

> In conclusion, these considerations demonstrate that a hard takeoff
> that results in runaway superintelligence, while possible, is not
> likely. There would be a necessary tradeoff between speed and stealth
> which would render any attempts at rapid improvement noticeable and
> thereby avertable. Whereas gradual and measured self-improvements
> would not constitute a hard takeoff and would therefore be manageable.
> As AI systems become more capable and autonomous, it will be
> increasingly important to ensure that they are developed and deployed
> in a safe and responsible manner, with appropriate safeguards and
> control mechanisms in place.

While I agree a sudden takeoff is unlikely at this time, I see little
possibility that we will remain in control of AI in the long term.
