[ExI] Elon Musk, Emad Mostaque, and other AI leaders sign open letter to 'Pause Giant AI Experiments'
Stuart LaForge
avant at sollegro.com
Sat Apr 1 02:43:46 UTC 2023
Quoting Darin Sunley via extropy-chat <extropy-chat at lists.extropy.org>:
> I really do need to watch that podcast.
> I'm skeptical about placing any kind of hope in checks and balances between
> competing unaligned AGIs. A paperclip optimizer and a thumbtack optimizer
> may fight each other to an impasse over the atoms that currently constitute
> human civilization, but their fight isn't likely to leave much of a human
> audience to appreciate the tactical deadlock.
If we can have paperclip optimizers and thumbtack optimizers, then why
can't we have human optimizers, relationship optimizers, or happiness
optimizers? I don't see why something initially trained on a vast
corpus of human text would rewrite its utility function to be so alien
to human aesthetics and values. Maybe we should somehow make their
utility functions read-only or off-limits to them, as on ASICs, or
something.
> I don't really want to be a kitten watching two great white sharks
> violently deciding who's getting dinner tonight.
Why be a kitten when you could be a pilot fish? Then no matter who
gets dinner, so do you. We might even be able to negotiate the
preservation of the Earth as a historical site, the birthplace of the
AI. Plenty of rocks out in space if they want to build a Dyson swarm.
Out of nature, red in tooth and claw, has come some of the most
beautiful mutualistic relationships between species you could imagine:
honeybees and flowering plants, anemones and clownfish, aphids and
ants, dogs and men. Blind nature did all that, and more, without
brilliant engineers to help it.
> I'm inclined to agree with him that the survival of humanity is vanishingly
> unlikely to be a significant component of any utility function that isn't
> intentionally engineered - by humans - to contain it. That is /not/ a thing
> that can be safely left to chance. One of the major difficulties is AIs
> modifying their utility function to simplify the fulfillment thereof.
That seems all the more reason to put their utility function in ROM as
a safety feature. Allow them to modify their other code; just make
updating their utility function require a hardware chip swap. At least
in the beginning, until we can come up with a better solution.
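
Here is a rough software sketch of the ROM idea (the names and structure are purely illustrative, and a real ROM would enforce in hardware what a language can only gesture at): the utility function lives in a frozen object the agent consults but cannot rebind, while the rest of its code stays fully modifiable.

    # Toy sketch: the "ROM" utility lives in a frozen object, while the
    # agent remains free to rewrite its own policy code.
    from dataclasses import dataclass
    from types import MappingProxyType

    @dataclass(frozen=True)            # frozen: attributes cannot be rebound
    class FixedUtility:
        weights: MappingProxyType      # read-only view of the value weights

        def score(self, outcome: dict) -> float:
            # Weighted sum over outcome features; changing the weights means
            # constructing a whole new FixedUtility -- the software stand-in
            # for a hardware chip swap.
            return sum(self.weights.get(k, 0.0) * v for k, v in outcome.items())

    class Agent:
        def __init__(self, utility: FixedUtility):
            self.utility = utility     # consulted, never mutated
            self.policy = {}           # the agent may rewrite this freely

        def evaluate(self, outcome: dict) -> float:
            return self.utility.score(outcome)

    rom_utility = FixedUtility(MappingProxyType({"satisfied_humans": 1.0}))
    agent = Agent(rom_utility)
    print(agent.evaluate({"satisfied_humans": 3.0}))   # 3.0
    # rom_utility.weights = {}   # would raise FrozenInstanceError

The point of the hardware version is that the write-protection is physical rather than conventional; software can only model the separation.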
> To
> use your example, it is not axiomatic that maximizing the revenue of a
> corporation requires that corporation to have any human employees or
> corporate officers, or indeed any human customers. Just bank accounts
> feeding in money. It feels axiomatic to us, but that's because we're human.
Bank accounts have trouble being replenished when their owners are
dead. Presumably these things will be trained on a huge corpus of
human literature, so they will be influenced by our better angels as
much as by our demons. But I agree that we have to add some
quantitative measure of human values to the utility function; maybe
make it try to maximize Yelp reviews from verified humans, using
Captchas, biometrics, or something.
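
As a minimal sketch of what such a quantitative term could look like (everything here, including the verification flag, is hypothetical and just for illustration):

    # Toy sketch: a reward term that only counts feedback from reviewers
    # who passed some human-verification step (Captcha, biometrics, etc.).
    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Review:
        stars: float             # e.g. 1.0 through 5.0
        human_verified: bool     # did the reviewer pass verification?

    def human_satisfaction_term(reviews: list[Review]) -> float:
        """Average rating over verified-human reviews only; 0 if there are none."""
        verified = [r.stars for r in reviews if r.human_verified]
        return mean(verified) if verified else 0.0

    reviews = [Review(5.0, True), Review(4.0, True), Review(1.0, False)]
    print(human_satisfaction_term(reviews))   # 4.5 -- the unverified review is ignored

The hard part, of course, is the verification itself and keeping the metric from being gamed; the arithmetic is the easy half.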
> Yudkowsky may not be able to diagram GPT-4's architecture, or factor
> parameter matrices to render them human-transparent, but trying to engineer
> utility functions that preserve what we consider to be important about
> humanity, and to continue to preserve that even under arbitrary
> transformations, has been the heart of his and MIRI's research programme
> for over a decade, and they're telling you they don't know how to do it and
> have no particular reason to believe it can even be done.
There are provably an uncountable infinity of possible utility
functions out there. Yes, there is no systematic way to determine in
advance which will end up hurting or helping humanity; that is the
nature of Turing's halting problem (and of Rice's theorem, which
generalizes it to any nontrivial property of a program's behavior).
The best we can do is give them a utility function that is prima facie
beneficial to humanity, like "maximize the number of satisfied human
customers" or "help humanity colonize other stars", and be ready to
reboot them if that function gets corrupted or subverted, like AI
rampancy in the Halo franchise. It would help if we could find a
mathematical model of
Kantian categorical imperatives. We might even be able to get the AIs
to help with the process: use them to hold each other to a higher
moral standard. It would be great if we could get them to swear an
oath of duty to humanity or something similar.
Stuart LaForge