<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 4, 2022 at 10:04 PM Darin Sunley via extropy-chat <<a href="mailto:extropy-chat@lists.extropy.org">extropy-chat@lists.extropy.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">The problem with AI motivations isn't so much that we don't understand evolved motivations - we understand them all too well. Well enough to know that if you successfully re-implemented an evolved motivational stack in nanotech-based hardware, it would be an existential threat to everything in its future light cone.<div><br></div><div>The trick is figuring out how to build a motivational stack that is as /unlike/ an evolved motivational stack as possible, while still being capable of having motivations at all. That's the hard part.</div></div></blockquote><div><br></div><div>### Exactly. This is what I further wrote on Scott's blog:</div><div><br></div>Reinforcement learning is simple, both in principle and when applied to simple neural networks trained from scratch, but once you start talking about a self-modifying AI that contains a large-scale predictive model of the world, there are a lot of ways for the process to be derailed. The AI is likely to be built from layers of neural networks that have been trained or otherwise constructed to serve different goals (e.g. a large GPT-like language model grafted onto a Tesla FSD network trained inside a Tesla bot), and modifying this mess into a coherent system capable of world-changing action will not be trivial. Non-trivial means it's not likely to happen by accident, or at least, it will take quite a few accidents before the big one hits.<br><br>I remember bugging Eliezer about building the "athymhormic AI", as I called it, rather than Friendly AI, about 20 years ago. (Jeez, time flies!) 
The athymhormic AI would be an AI designed not to want to do anything except compute answers using the resources given to it, just as an athymhormic human might be quite capable of answering your questions but incapable of doing much on his own. Creating things that don't do much is easier than creating things that do a lot. Creating things that just don't care about their survival is possible. We may have the intuition that anything capable of thinking will naturally want to keep itself alive right out of the box, but that intuition is built from observing naturally evolved brains, and staying alive is exactly what natural brains evolved for. Constructed minds will not automatically converge on the Omohundro goals unless some specific structures are present to begin with, such as a goal of maximizing some real-world parameter, an infinite regress of maximizing the precision of a calculation, or other such trip hazards.<br><br><div>At least I hope so. </div></div></div>