[ExI] Self improvement
Richard Loosemore
rpwl at lightlink.com
Sat Apr 23 17:39:09 UTC 2011
Anders Sandberg wrote:
> I prefer to approach the whole AI safety thing from a broader
> standpoint, checking the different possibilities. AI might emerge
> rapidly, or more slowly. In the first case it is more likely to be
> unitary since there will be just one system recursively self-improving,
> and there are some arguments that for some forms of softer takeoffs
> there would be a 'merging' of the goals if they are compatible (Shulman,
> Armstrong). In the case of softer takeoffs power differentials are going
> to be less extreme, but there will also be many more motivations and
> architectures around. Systems can be designed in different ways for
> safety - by reducing their capabilities (might or might not preclude
> recursive self improvement), various motivational measures, or by
> constraints from other systems of roughly equal power.
>
> So we need to analyse a number of cases:
>
> Hard takeoff (one agent): capability constraints ("AI in a box" for
> example), motivational constraints ("friendly" designs of various kinds,
> CEV, values copied from person, etc), ability to prevent accidental or
> deliberate takeoffs
>
> Soft takeoff (multiple agents): as above (but with the complication that
> defection of some agents is likely due to accident, design or systemic
> properties), various mutual balancing schemes, issues of safeguarding
> against or using 'evolutionary attractors' or coordination issues (e.g.
> the large scale economic/evolutionary pressures discussed in some of
> Robin's and Nick's papers, singletons)
>
> This is not a small field. Even the subfields have pretty deep
> ramifications (friendliness being a good case in point). It is stupid to
> spend all one's effort on one possible subsection and claiming this is
> the only one that matters. Some doubtlessly matter more, but even if you
> think there is a 99% chance of a soft takeoff, safeguarding against that 1%
> chance of a bad hard takeoff can be very rational. We will need to
> specialize ourselves in researching them, but we shouldn't forget we
> need to cover the possibility space well enough that we can start
> formulating policies (including even the null policy of not doing
> anything and just hoping for the best).
I have to say that I think the main problem is not so much that there is a cluster of different (and sometimes difficult to analyze) possibilities, which forces us to (as you put it) "approach the whole AI safety thing from a broader standpoint".
The critical issue is to understand the concept of "AGI motivation" in a more theoretically sound way. I find much of the discussion so confused on this matter that the discussion itself is of little value.
For example, you refer to some of the "large scale economic/evolutionary pressures discussed in some of Robin's and Nick's papers" (to which I would add Steve Omohundro's paper). The problem with these analyses is that they presuppose, without critical examination, that the motivation of the AGI is going to be controlled by something like a simple utility optimization function, such as the ones used to model human economic agents. That is just a supposition. And yet, if the supposition turns out to be wrong, everything that follows from it is called into question.
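(To make concrete what "a simple utility optimization function" amounts to: the presupposition is, in effect, that the AGI's entire motivational system reduces to a decision rule of roughly the following shape. The Python sketch below is my own illustration, not anything drawn from those papers, and the names in it are made up:

    # Illustrative sketch only: the agent's "motivation" is assumed to be
    # nothing more than picking whichever action scores highest on a single
    # scalar utility function, as in textbook models of economic agents.
    def choose_action(actions, utility):
        """Return the available action with the highest utility score."""
        return max(actions, key=utility)

    # e.g. choose_action(plans, lambda plan: plan.expected_resources)
    # where "plans" and "expected_resources" are purely hypothetical names

That one-line decision rule is, in essence, the whole motivational theory those analyses rest on.)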
And we have good grounds for believing that that kind of supposition is indeed wrong: this is a classic example of the model/mechanism mistake. If you build a post-hoc model to *describe* a system, it might work in an approximate way (perhaps well enough to allow crude modeling of the system)... but going from that model to the idea that a real, functioning version of that system is actually *driven* by a mechanism that is a direct instantiation of that model is a ghastly (and very elementary) mistake.
I can model the spawning rate of rabbits, for example, using a nice little Fibonacci algorithm. But going from that model to the suggestion that a colony of rabbits is controlled by a shared mechanism (presumably taking advantage of rabbit telepathy!) that computes Fibonacci numbers and, at the appropriate moment, issues a command to spit out a new rabbit -- that would be a ridiculous confusion between model and mechanism.
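For anyone who wants that toy model spelled out, here it is in a few lines of Python of my own devising (the starting counts are the usual ones from Fibonacci's puzzle, chosen purely for illustration):

    # Descriptive model of rabbit spawning: each month the number of pairs
    # is the sum of the counts from the previous two months. Nothing here
    # claims that the rabbits themselves compute anything.
    def rabbit_pairs(month):
        older, newer = 1, 1            # pairs alive in months 1 and 2
        for _ in range(month - 2):
            older, newer = newer, older + newer
        return newer

The model describes the colony's growth tolerably well, but it is purely descriptive; the mechanism that actually produces new rabbits is another matter entirely.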
But instead of understanding that this model/mechanism mistake is
seriously undermining progress in the field of AGI motivation (and by
extension, the whole question of AGI safety and friendliness), we fall
back on the assertion that we need a wide variety of perspectives, and
argue that it is all very complex and multifaceted.
Pity there aren't more people with a broad enough perspective working on
these issues full time, huh...? :-) :-)
Richard Loosemore