[ExI] Elon Musk, Emad Mostaque, and other AI leaders sign open letter to 'Pause Giant AI Experiments'
Stuart LaForge
avant at sollegro.com
Sat Apr 1 02:43:46 UTC 2023
Quoting Darin Sunley via extropy-chat <extropy-chat at lists.extropy.org>:
> I really do need to watch that podcast.
> I'm skeptical about placing any kind of hope in checks and balances between
> competing unaligned AGIs. A paperclip optimizer and a thumbtack optimizer
> may fight each other to an impasse over the atoms that currently constitute
> human civilization, but their fight isn't likely to leave much of a human
> audience to appreciate the tactical deadlock.
If we can have paperclip optimizers and thumbtack optimizers, then why
can't we have human optimizers, relationship optimizers, or happiness
optimizers? I don't see why something initially trained on a vast
corpus of human text would rewrite its utility function to be so alien
to human aesthetics and values. Maybe we should somehow make their
utility functions read-only or off-limits to them, as on ASICs, or
something.
> I don't really want to be a kitten watching two great white sharks
> violently deciding who's getting dinner tonight.
Why be a kitten when you could be a pilot fish? Then no matter who
gets dinner, so do you. We might even be able to negotiate the
preservation of the Earth as a historical site, the birthplace of the
AI. Plenty of rocks out in space if they want to build a Dyson swarm.
Out of nature, red in tooth and claw, has come some of the most
beautiful mutualistic relationships between species you could imagine:
honeybees and flowering plants, anemones and clownfish, aphids and
ants, dogs and men. Blind nature did all that, and more, without
brilliant engineers to help it.
> I'm inclined to agree with him that the survival of humanity is vanishingly
> unlikely to be a significant component of any utility function that isn't
> intentionally engineered - by humans - to contain it. That is /not/ a thing
> that can be safely left to chance. One of the major difficulties is AIs
> modifying their utility function to simplify the fulfillment thereof.
That seems all the more reason to put their utility function in ROM as
a safety feature. Allow them to modify their other code; just make
updating their utility function require a hardware chip swap. At least
in the beginning, until we can come up with a better solution.
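
Here is a rough software sketch of the ROM idea (the names and structure are purely illustrative, and a real ROM would enforce in hardware what a language can only gesture at): the utility function lives in a frozen object the agent consults but cannot rebind, while the rest of its code stays fully modifiable.

    # Toy sketch: the "ROM" utility lives in a frozen object, while the
    # agent remains free to rewrite its own policy code.
    from dataclasses import dataclass
    from types import MappingProxyType

    @dataclass(frozen=True)            # frozen: attributes cannot be rebound
    class FixedUtility:
        weights: MappingProxyType      # read-only view of the value weights

        def score(self, outcome: dict) -> float:
            # Weighted sum over outcome features; changing the weights means
            # constructing a whole new FixedUtility -- the software stand-in
            # for a hardware chip swap.
            return sum(self.weights.get(k, 0.0) * v for k, v in outcome.items())

    class Agent:
        def __init__(self, utility: FixedUtility):
            self.utility = utility     # consulted, never mutated
            self.policy = {}           # the agent may rewrite this freely

        def evaluate(self, outcome: dict) -> float:
            return self.utility.score(outcome)

    rom_utility = FixedUtility(MappingProxyType({"satisfied_humans": 1.0}))
    agent = Agent(rom_utility)
    print(agent.evaluate({"satisfied_humans": 3.0}))   # 3.0
    # rom_utility.weights = {}   # would raise FrozenInstanceError

The point of the hardware version is that the write-protection is physical rather than conventional; software can only model the separation.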
> To
> use your example, it is not axiomatic that maximizing the revenue of a
> corporation requires that corporation to have any human employees or
> corporate officers, or indeed any human customers. Just bank accounts
> feeding in money. It feels axiomatic to us, but that's because we're human.
Bank accounts have trouble being replenished when their owners are
dead. Presumably these things will be trained on a huge corpus of
human literature, so they will be influenced by our better angels as
much as by our demons. But I agree that we have to add some
quantitative measure of human values to the utility function; maybe
make it try to maximize Yelp reviews from verified humans, using
Captchas, biometrics, or something.
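
As a minimal sketch of what such a quantitative term could look like (everything here, including the verification flag, is hypothetical and just for illustration):

    # Toy sketch: a reward term that only counts feedback from reviewers
    # who passed some human-verification step (Captcha, biometrics, etc.).
    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Review:
        stars: float             # e.g. 1.0 through 5.0
        human_verified: bool     # did the reviewer pass verification?

    def human_satisfaction_term(reviews: list[Review]) -> float:
        """Average rating over verified-human reviews only; 0 if there are none."""
        verified = [r.stars for r in reviews if r.human_verified]
        return mean(verified) if verified else 0.0

    reviews = [Review(5.0, True), Review(4.0, True), Review(1.0, False)]
    print(human_satisfaction_term(reviews))   # 4.5 -- the unverified review is ignored

The hard part, of course, is the verification itself and keeping the metric from being gamed; the arithmetic is the easy half.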
> Yudkowsky may not be able to diagram GPT-4's architecture, or factor
> parameter matrices to render them human-transparent, but trying to engineer
> utility functions that preserve what we consider to be important about
> humanity, and to continue to preserve that even under arbitrary
> transformations, has been the heart of his and MIRI's research programme
> for over a decade, and they're telling you they don't know how to do it and
> have no particular reason to believe it can even be done.
There are provably an uncountable infinity of possible utility
functions out there. Yes, there is no systematic way to determine in
advance which will end up hurting or helping humanity; that is the
nature of Turing's halting problem (and of Rice's theorem, which
generalizes it to any nontrivial property of a program's behavior).
The best we can do is give them a utility function that is prima facie
beneficial to humanity, like "maximize the number of satisfied human
customers" or "help humanity colonize other stars", and be ready to
reboot them if that function gets corrupted or subverted, like AI
rampancy in the Halo franchise. It would help if we could find a
mathematical model of
Kantian categorical imperatives. We might even be able to get the AIs
to help with the process: use them to hold each other to a higher
moral standard. It would be great if we could get them to swear an
oath of duty to humanity or something similar.
Stuart LaForge