[extropy-chat] Re: Overconfidence and meta-rationality

Eliezer S. Yudkowsky sentience at pobox.com
Wed Mar 9 23:40:30 UTC 2005


Robin Hanson wrote:
> 
> You don't seem very interested in the formal analysis here.  You know, 
> math, theorems and all that.

You did not ask.

Helpful references:

Robin's paper "Are Disagreements Honest?"
http://www.gmu.edu/jbc/Tyler/deceive.pdf

which builds on Aumann's Agreement Theorem:
http://www.princeton.edu/~bayesway/Dick.tex.pdf
(not a good intro for the bewildered, maybe someone can find a better intro)

> The whole point of such analysis is to 
> identify which assumptions matter for what conclusions.  And as far as I 
> can tell your only argument which gets at the heart of the relevant 
> assumptions is your claim that those who make relatively more errors 
> can't see this fact while those who make relatively fewer errors can see 
> this fact.

I don't think this argument (which you do concede as a factual premise? 
or was our agreement only that people who make relatively fewer errors 
do so in part because they are relatively better at estimating their 
probability of error on specific problems?) is what touches on the 
assumptions.

Anyway, let's talk math.

First, a couple of general principles that apply to discussions in which 
someone invokes math:

1)  An argument from pure math, if it turns out to be wrong, must have 
an error in one or more premises or purportedly deductive steps.  If the 
deductive steps are all correct, this is a special kind of rigor which 
Ben Goertzel gave as his definition of the word "technical"; personally 
I would label this class of argument "logical", reserving "technical" 
for hypotheses that sharply concentrate their probability mass.  (A la 
"A Technical Explanation of Technical Explanation".)

Pure math is a fragile thing.  An argument that is pure math except for 
one nonmathematical step is not pure math.  The chain of reasoning in 
"Are Disagreements Honest?" is not pure math.  The modesty argument uses 
Aumann's Agreement Theorem and AAT's extensions as plugins, but the 
modesty argument itself is not formal from start to finish.  I know of 
no *formal* extension of Aumann's Agreement Theorem such that its 
premises are plausibly applicable to humans.  I also expect that I know 
less than a hundredth as much about AAT's extensions as you do.  But if 
I am correct that there is no formal human extension of AAT, you cannot 
tell me: "If you claim the theorem is wrong, then it is your 
responsibility to identify which of the deductive steps or empirical 
premises is wrong."  The modesty argument has not yet been formalized to 
that level.  It's still a modesty *argument* not a modesty *theorem*.

Might the modesty argument readily formalize to a modesty theorem with a 
bit more work?  Later I will argue that this seems unlikely because the 
modesty argument has a different character from Aumann's Agreement Theorem.

2)  Logical argument has no ability to coerce physics.  There's a 
variety of parables I tell to illustrate this point.  Here's one parable:

     Socrates raised the glass of hemlock to his lips.  "Do you 
suppose," asked one of the onlookers, "that even hemlock will not be 
enough to kill so wise and good a man?"
     "No," replied another bystander, a student of philosophy; "all men 
are mortal, and Socrates is a man; and if a mortal drink hemlock, surely 
he dies."
     "Well," said the onlooker, "what if it happens that Socrates 
*isn't* mortal?"
     "Nonsense," replied the student, a little sharply; "all men are 
mortal *by definition*; it is part of what we mean by the word 'man'. 
All men are mortal, Socrates is a man, therefore Socrates is mortal.  It 
is not merely a guess, but a *logical certainty*."
     "I suppose that's right..." said the onlooker.  "Oh, look, Socrates 
already drank the hemlock while we were talking."
     "Yes, he should keel over any minute now," said the student.
     And they waited, and they waited, and they waited...
     "Socrates appears not to be mortal," said the onlooker.
     "Then Socrates must not be a man," replied the student.  "All men 
are mortal, Socrates is not mortal, therefore Socrates is not a man. 
And that is not merely a guess, but a *logical certainty*."

The moral of this parable is that if all "humans" are mortal by 
definition, then I cannot know that Socrates is a "human" until after I 
have observed that Socrates is mortal.  If "humans" are defined as 
mortal language-users with ten fingers, then it does no good at all - 
under Aristotle's logic - to observe merely that Socrates speaks 
excellent Greek and count five of his fingers on each hand.  I cannot 
state that Socrates is a member of the class "human" until I observe all 
three properties of Socrates - language use, ten fingers, and mortality. 
  Whatever information I put into an Aristotelian definition, I get 
exactly the same information back out - nothing more.  If you want 
actual cognitive categories instead of mere Aristotelian classes, 
categories that permit your mind to classify objects into empirical 
clusters and thereby guess observations you have not yet made, you have 
to resort to induction, not deduction.  Whatever is said to be true "by 
definition" usually isn't; writing in dictionaries has no ability to 
coerce physics.  You cannot change the writing in a dictionary and get a 
different outcome.

Another parable:

     Once upon a time there was a court jester who dabbled in logic. 
The jester gave the king two boxes:  The first box inscribed "Either 
this box contains an angry frog, or the box with a false inscription 
contains gold, but not both."  And the second box inscribed "Either this 
box contains gold and the box with a false inscription contains an angry 
frog, or this box contains an angry frog and the box with a true 
inscription contains gold."  And the jester said:  "One box contains an 
angry frog, the other box gold, and one and only one of the inscriptions 
is true."
     The king opened the wrong box, and was savaged by an angry frog.
     "You see," the jester said, "let us hypothesize that the first 
inscription is the true one.  Then suppose the first box contains an 
angry frog.  Then the other box would contain gold and this would 
contradict the first inscription which we hypothesized to be true.  Now 
suppose the first box contains gold.  The other box would contain an 
angry frog, which again contradicts the first inscription -"
     The king ordered the jester thrown in the dungeons.
     A day later, the jester was brought before the king in chains, and 
shown two boxes.  "One box contains a key," said the king, "to unlock 
your chains, and if you find the key you are free.  But the other box 
contains a dagger for your heart if you fail."  And the first box was 
inscribed:  "Either both inscriptions are true or both inscriptions are 
false."  And the second box was inscribed:  "This box contains the key."
     The jester reasoned thusly:  "Suppose the first inscription is 
true.  Then the second inscription must also be true.  Now suppose the 
first inscription is false.  Then again the second inscription must be 
true.  Therefore the second box contains the key, whether the first 
inscription is true or false."
     The jester opened the second box and found a dagger.
     "How?!" cried the jester in horror, as he was dragged away.  "It 
isn't possible!"
     "It is quite possible," replied the king.  "I merely wrote those 
inscriptions on two boxes, and then I put the dagger in the second one."

In "Are Disagreements Honest?" you say that people should not have one 
standard in public and another standard in private; you say:  "If people 
mostly disagree because they systematically violate the rationality 
standards that they profess, and hold up for others, then we will say 
that their disagreements are dishonest."  (I would disagree with your 
terminology; they might be dishonest *or* they might be self-deceived. 
Whether you think self-deception is a better excuse than dishonesty is 
between yourself and your morality.)  In any case, there is a moral and 
social dimension to the words you use in "Are Disagreements Honest?" 
You did in fact invoke moral forces to help justify some steps in your 
chain of reasoning, even if you come back later and say that the steps 
can stand on their own.

Now suppose that I am looking at two boxes, one with gold, and one with 
an angry frog.  I have pondered these two boxes as best I may, and those 
signs and portents that are attached to boxes; and I believe that the 
first box contains the gold, with 67% probability.  And another person 
comes before me and says:  "I believe that the first box contains an 
angry frog, with 99.9% probability."  Now you may say to me that I 
should not presume a priori that I am more rational than others; you may 
say that most people are self-deceived about their relative immunity to 
self-deception; you may say it would be logically inconsistent with my 
publicly professed tenets if we agree to disagree; you may say that it 
wouldn't be fair for me to insist that the other person change his 
opinion if I'm not willing to change mine.  So suppose that the two of 
us agree to compromise on a 99% probability that the first box contains 
an angry frog.  But this is not just a social compromise; it is an 
attempted statement about physical reality, determined by the modesty 
argument.  What if the first box, in defiance of our logic and 
reasonableness, turns out to contain gold instead?  Which premises of 
the modesty argument would turn out to be the flawed ones?  Which 
premises would have failed to reflect underlying, physical, empirical 
reality?

The heart of your argument in "Are Disagreements Honest?" is Aumann's 
Agreement Theorem and the dozens of extensions that have been found for 
it.  But if Aumann's Agreement Theorem is wrong (goes wrong reliably in 
the long run, not just failing 1 time out of 100 when the consensus 
belief is 99% probability) then we can readily compare the premises of 
AAT against the dynamics of the agents, their updating, their prior 
knowledge, etc., and track down the mistaken assumption that caused AAT 
(or the extension of AAT) to fail to match physical reality.  In 
contrast, it seems harder to identify what would have gone wrong, 
probability-theoretically speaking, if I dutifully follow the modesty 
argument, humbly update my beliefs until there is no longer any 
disagreement between myself and the person standing next to me, and the 
other person is also fair and tries to do the same, and lo and behold 
our consensus beliefs turn out to be more poorly calibrated than my 
original guesses.

Is this scenario a physical impossibility?  Not obviously, though I'm 
willing to hear you out if you think it is.  Let's suppose that the 
scenario is physically possible and that it occurs; then which of the 
premises of the modesty argument do you think would have been 
empirically wrong?  Is my sense of fairness factually incorrect?  Is the 
other person's humility factually incorrect?  Does the factually 
mistaken premise lie in our dutiful attempt to avoid agreeing to 
disagree because we know this implies a logical inconsistency?  To me 
this suggests that the modesty argument is not just *presently* 
informal, but that it would be harder to formalize than one might wish.

There's another important difference between the modesty argument and 
Aumann's Agreement Theorem.  AAT has been excessively generalized; it's 
easy to generalize and a new generalization is always worth a published 
paper.  You attribute the great number of extensions of AAT to the 
following underlying reason:  "His [Aumann's] results are robust because 
they are based on the simple idea that when seeking to estimate the 
truth, you should realize you might be wrong; others may well know 
things that you do not."

I disagree; this is *not* what Aumann's results are based on.

Aumann's results are based on the underlying idea that if other entities 
behave in a way understandable to you, then their observable behaviors 
are relevant Bayesian evidence to you.  This includes the behavior of 
assigning probabilities according to understandable Bayesian cognition.

Suppose that A and B have a common prior probability for proposition X 
of 10%.  A sees a piece of evidence E1 and updates X's probability to 
90%; B sees a piece of evidence E2 and updates X's probability to 1%. 
Then A and B compare notes, exchanging no information except their 
probability assignments.  Aumann's Agreement Theorem easily permits us 
to construct scenarios in which A and B's consensus probability goes to 
0, 1, or any real number between.  (Or rather, simple extensions of AAT 
permit this; the version of AAT I saw is static, allowing only a single 
question and answer.)  Why?  Because it may be that A's posterior 
announcement, "90%", is sufficient to uniquely identify E1 as A's 
observation, in that no other observed evidence would produce A's 
statement "90%"; likewise with B and E2.  The joint probability for 
E1&E2 given X (or ~X) does not need to be the product of the 
probabilities E1|X and E2|X (E1|~X, E2|~X).  It might be that E1 and E2 
are only ever seen together when X, or only ever seen together when ~X. 
  So A and B are *not* compromising between their previous positions; 
their consensus probability assignment is *not* a linear weighting of 
their previous assignments.
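
Here is a minimal numerical sketch of that last point - toy numbers of 
my own, assuming that A's "90%" and B's "1%" each uniquely identify 
the evidence behind them, so that exchanging posteriors is equivalent 
to conditioning on both pieces of evidence:

    from fractions import Fraction as F

    # Toy joint distribution over (X, E1, E2), chosen so that the
    # common prior P(X) is 10%, A's evidence E1 alone gives
    # P(X|E1) = 90%, B's evidence E2 alone gives P(X|E2) = 1%, and
    # yet E1 and E2 never co-occur unless X is true.
    joint = {
        # (X, E1, E2): probability
        (1, 1, 1): F(1, 1000),
        (1, 1, 0): F(89, 1000),
        (1, 0, 1): F(4, 1000),
        (1, 0, 0): F(6, 1000),
        (0, 1, 1): F(0, 1000),    # E1 and E2 never co-occur when ~X
        (0, 1, 0): F(10, 1000),
        (0, 0, 1): F(495, 1000),
        (0, 0, 0): F(395, 1000),
    }

    def p(pred):
        return sum(q for k, q in joint.items() if pred(*k))

    def cond(num, den):
        return p(lambda x, e1, e2: num(x, e1, e2) and den(x, e1, e2)) / p(den)

    print(cond(lambda x, e1, e2: x, lambda x, e1, e2: True))       # 1/10
    print(cond(lambda x, e1, e2: x, lambda x, e1, e2: e1))         # 9/10
    print(cond(lambda x, e1, e2: x, lambda x, e1, e2: e2))         # 1/100
    print(cond(lambda x, e1, e2: x, lambda x, e1, e2: e1 and e2))  # 1

A announces 90%, B announces 1%, and once both announcements are 
digested their common posterior is 100% - not any weighted average of 
90% and 1%.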

If you tried to devise an extension of Aumann's Agreement Theorem in 
which A and B, e.g., deduce each other's likelihoods given their stated 
posteriors and then combine likelihoods, you would be assuming that A 
and B always see unrelated evidence - an assumption rather difficult to 
extend to human domains of argument; no two minds could ever take the 
same arguments into account.  Our individual attempts to cut through to 
the correct answer do not have the Markov property relative to one 
another; different rationalists make correlated errors.
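
For contrast, the combination rule you would get by assuming unrelated 
evidence - multiply the prior odds by each agent's implied likelihood 
ratio - gives a very different answer on the same toy numbers (again 
my own illustration, not a construction from the paper):

    from fractions import Fraction as F

    def odds(p):  return p / (1 - p)
    def prob(o):  return o / (1 + o)

    prior = F(1, 10)
    post_A, post_B = F(9, 10), F(1, 100)

    # Likelihood ratios deduced from each announced posterior and the
    # common prior.
    lr_A = odds(post_A) / odds(prior)    # 81
    lr_B = odds(post_B) / odds(prior)    # 1/11

    # Valid only if E1 and E2 are conditionally independent given X
    # and given ~X.
    pooled = prob(odds(prior) * lr_A * lr_B)
    print(lr_A, lr_B, pooled)            # 81, 1/11, 9/20 (i.e. 45%)

Under the joint distribution above, the right answer after both pieces 
of evidence is 100%, not 45%; the 45% is an artifact of assuming the 
two observations are unrelated.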

Under AAT, as A and B exchange information and become mutually aware of 
knowledge, they concentrate their models into an ever-smaller set of 
possible worlds.  (I dislike possible-worlds semantics for various 
reasons, but set that aside; the formalizations I've found of AAT are 
based on possible-worlds semantics.  Besides, I rather liked the way 
that possible-worlds semantics avoids the infinite recursion problem in 
"common knowledge".)  If A and B's models are concentrating their 
probability densities into ever-smaller volumes, why, they must be 
learning something - they're reducing entropy, one might say, though 
only metaphorically.
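
For reference, the theorem itself is short enough to state in that 
possible-worlds form (this is just the standard statement, paraphrased 
from Aumann 1976):

    % Aumann (1976).  \Pi_i(\omega) is the cell of agent i's
    % information partition containing the true state \omega.
    \[
    \begin{aligned}
    &\text{Common prior } P \text{ on } \Omega;\ \text{partitions }
      \Pi_1, \Pi_2;\ \text{event } E \subseteq \Omega.\\
    &\text{If it is common knowledge at } \omega \text{ that }
      P(E \mid \Pi_1(\omega)) = q_1 \text{ and }
      P(E \mid \Pi_2(\omega)) = q_2,\\
    &\text{then } q_1 = q_2.
    \end{aligned}
    \]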

Now *contrast* this with the modesty argument, as its terms of human 
intercourse are usually presented.  I believe that the moon is made of 
green cheese with 80% probability.  Fred believes that the moon is made 
of blueberries with 90% probability.  This is all the information that 
we have of each other; we can exchange naked probability assignments but 
no other arguments.  By the math of AAT, *or* the intuitive terms of the 
modesty argument, this ought to force agreement.  In human terms, 
presumably I should take into account that I might be wrong and that 
Fred has also done some thinking about the subject, and compromise my 
beliefs with Fred's, so that we'll say, oh, hm, that the moon is made of 
green cheese with 40% probability and blueberries with 45% probability, 
that sounds about right.  Fred chews this over, decides I'm being fair, 
and nods agreement; Fred updates his verbally stated probability 
assignments accordingly.  Yay!  We agreed!  It is now theoretically 
possible that we are being verbally consistent with our professed 
beliefs about what is rational!
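
The arithmetic of that compromise is just an equal-weight linear 
opinion pool over the hypotheses (a sketch of my own to make the 
40%/45% explicit; the leftover mass assigned to "other" is my 
assumption for illustration):

    # Equal-weight linear opinion pool over {cheese, blueberries, other}.
    mine = {"cheese": 0.80, "blueberries": 0.00, "other": 0.20}
    fred = {"cheese": 0.00, "blueberries": 0.90, "other": 0.10}

    pooled = {h: 0.5 * mine[h] + 0.5 * fred[h] for h in mine}
    print(pooled)  # {'cheese': 0.4, 'blueberries': 0.45, 'other': 0.15}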

But wait!  What do Fred and I know about the moon that we didn't know 
before?  If this were AAT, rather than a human conversation, then as 
Fred and I exchanged probability assignments our actual knowledge of the 
moon would steadily increase; our models would concentrate into an 
ever-smaller set of possible worlds.  So in this sense the dynamics of 
the modesty argument are most unlike the dynamics of Aumann's Agreement 
Theorem, from which the modesty argument seeks to derive its force.  AAT 
drives down entropy (sorta); the modesty argument doesn't.  This is a 
BIG difference.

Furthermore, Fred and I can achieve the same mutual triumph of possible 
consistency - hence, public defensibility if someone tries to criticize 
us - by agreeing that the moon is equally likely to be made of green 
cheese or blueberries.  (Fred is willing to agree that I shouldn't be 
penalized for having been more modest about my discrimination 
capability.  Modesty is a virtue and shouldn't be penalized.)

As far as any outside observer can tell according to the rules you have 
laid down for 'modesty', two disputants can publicly satisfy the moral 
demand of the modesty argument by any number of possible compromises. 
From _Are Disagreements Honest?_:  "It is perhaps unsurprising that most 
people do not always spend the effort required to completely overcome 
known biases.  What may be more surprising is that people do not simply 
stop disagreeing, as this would seem to take relatively little 
effort..."  I haven't heard of an extension to AAT which (a) proves that 
'rational' agents will agree (b) explicitly permits multiple possible 
compromises to be equally 'rational' as the agent dynamics were defined.

From _Are Disagreements Honest?_:

> One approach would be to try to never assume that you are more meta-rational than anyone else. But this cannot mean that you should agree with everyone, because you simply cannot do so when other people disagree among themselves. Alternatively, you could adopt a "middle" opinion. There are, however, many ways to define middle, and people can disagree about which middle is best (Barns 1998).  Not only are there disagreements on many topics, but there are also disagreements on how to best correct for one’s limited meta-rationality.

The AATs I know are constructive; they don't just prove that agents will 
agree as they acquire common knowledge, they describe *exactly how* 
agents arrive at agreement.  (Including multiple agents.)  So that's 
another sense in which the modesty argument seems unlike a formalizable 
extension of AAT - the modesty argument doesn't tell us *how* to go 
about being modest.  Again, this is a BIG difference.
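
To make "constructive" concrete, here is a minimal sketch of the kind 
of announcement dynamic I mean, in the style of Geanakoplos and 
Polemarchakis's "We Can't Disagree Forever" - a toy example of my own 
devising, not code from any paper:

    from fractions import Fraction as F

    # Four equally likely possible worlds; the proposition at issue is
    # the event {0, 1}.  Each agent knows only which cell of its own
    # partition contains the true world.
    worlds = [0, 1, 2, 3]
    prior  = {w: F(1, 4) for w in worlds}
    event  = {0, 1}
    part_A = [{0, 1}, {2, 3}]
    part_B = [{0, 2}, {1, 3}]

    def cell(partition, w):
        return next(c for c in partition if w in c)

    def post(info):
        return sum(prior[w] for w in info & event) / sum(prior[w] for w in info)

    # info_A[w], info_B[w]: what each agent considers possible if the
    # true world is w.  Announcing a posterior is informative because
    # the other agent can compute which worlds would have produced it.
    info_A = {w: cell(part_A, w) for w in worlds}
    info_B = {w: cell(part_B, w) for w in worlds}

    true_world = 0
    print("B alone:", post(info_B[true_world]))   # 1/2 before hearing A
    for step in range(5):
        ann_A  = {w: post(info_A[w]) for w in worlds}
        info_B = {w: info_B[w] & {v for v in worlds if ann_A[v] == ann_A[w]}
                  for w in worlds}
        ann_B  = {w: post(info_B[w]) for w in worlds}
        info_A = {w: info_A[w] & {v for v in worlds if ann_B[v] == ann_B[w]}
                  for w in worlds}
        print(step, ann_A[true_world], ann_B[true_world])
        if ann_A[true_world] == ann_B[true_world]:
            break

At the true world, A opens with 1, B would have said 1/2 on its own, 
and after hearing A's announcement B moves all the way to 1.  The 
protocol specifies exactly how each announcement refines the other's 
information, and the consensus is whatever those refinements deliver - 
not a negotiated midpoint.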

From _Are Disagreements Honest?_:

> For example, people who feel free to criticize consistently complain when they notice someone making a sequence of statements that is inconsistent or incoherent. [...] These patterns of criticism suggest that people uphold rationality standards that prefer logical consistency...

As I wrote in an unpublished work of mine:

"Is the Way to have beliefs that are consistent among themselves?  This 
is not the Way, though it is often mistaken for the Way by logicians and 
philosophers.  The object of the Way is to achieve a map that reflects 
the territory.  If I survey a city block five times and draw five 
accurate maps, the maps, being consistent with the same territory, will 
be consistent with each other.  Yet I must still walk through the city 
block and draw lines on paper that correspond to what I see.  If I sit 
in my living room and draw five maps that are mutually consistent, the 
maps will bear no relation whatsoever to the territory.  Accuracy of 
belief implies consistency of belief, but consistency does not imply 
accuracy.  Consistency of belief is only a sign of truth, and does not 
constitute truth in itself."

From _ADH?_:

> In this paper we consider only truth-seeking at the individual level, and do not attempt a formal definition, in the hope of avoiding the murky philosophical waters of “justified belief.”

I define the "truth" of a probabilistic belief system as its score 
according to the strictly proper Bayesian scoring criterion I laid down 
in "Technical Explanation" - a definition of truth which I should 
probably be attributing to someone else, but I have no idea who.

(Incidentally, it seems to me that the notion of the Bayesian score cuts 
through a lot of gibberish about freedom of priors; the external 
goodness of a prior is its Bayesian score.  A lot of philosophers seem 
to think that, because there's disagreement where priors come from, they 
can pick any damn prior they please and none of those darned 
rationalists will be able to criticize them.  But there's actually a 
very clearly defined criterion for the external goodness of priors, the 
question is just how to maximize it using internally accessible 
decisions.  That aside...)
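
Concretely, the logarithmic score is the usual example of a strictly 
proper rule; here is a minimal sketch (my own illustration) of the 
property that makes it "strictly proper" - reporting your true 
probability maximizes your expected score:

    import math

    def log_score(report, outcome):
        """Log score for a probabilistic forecast of a binary event."""
        return math.log(report) if outcome else math.log(1.0 - report)

    def expected_score(report, true_p):
        """Expected score if the event occurs with probability true_p."""
        return (true_p * log_score(report, True)
                + (1 - true_p) * log_score(report, False))

    true_p = 0.7
    for report in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(report, round(expected_score(report, true_p), 4))
    # The expectation peaks at report == true_p == 0.7, which is what
    # "strictly proper" means.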

According to one who follows the way of Bayesianity - a Bayesianitarian, 
one might say - it is better to have inconsistent beliefs with a high 
Bayesian score than to have consistent beliefs with a low Bayesian 
score.  Accuracy is prized above consistency.  I guess that this 
situation can never arise given logical omniscience or infinite 
computing power; but I guess it can legitimately arise under bounded 
rationality.  Maybe you could even detect an *explicit* inconsistency in 
your beliefs, while simultaneously having no way to reconcile it in a 
way that you expect to raise your Bayesian score.  I'm not sure about 
that, though.  It seems like the scenario would be hard to construct, no 
matter what bounds you put on the rationalist.  I would not be taken 
aback to see a proof of impossibility - though I would hope for the 
impossibility proof to take the form of a simple constructive algorithm 
that can be followed by most plausible bounded rationalists in case they 
discover inconsistency.

Even the simplest inconsistency resolution algorithm may take more 
time/computation than the simpler algorithm "discard one belief at 
random".  And the simplest good resolution algorithm for resolving a 
human disagreement may take more time than one of the parties discarding 
their beliefs at random.  Would it be more rational to ignore this 
matter of the Bayesian score, which is to say, ignore the truth, and 
just agree as swiftly as possible with the other person?  No.  Would 
that behavior be more 'consistent' with Aumann's result and extensions? 
  No, because the AATs I know, when applied to any specific 
conversation, constructively specify a precise, score-maximizing change 
of beliefs - which a random compromise is not.  All you'd be maximizing 
through rapid compromise is your immunity to social criticism for 
'irrationality' in the event of a public disagreement.

Aumann's Agreement Theorem and its extensions do not say that 
rationalists *should* agree.  AATs prove that various rational agents 
*will* agree, not because they *want* to agree, but because that's how 
the dynamics work out.  But that mathematical result doesn't mean that 
you can become more rational by pursuing agreement.  It doesn't mean you 
can find your Way by trying to imitate this surface quality of AAT 
agents, that they agree with one another; because that cognitive 
behavior is itself quite unlike what AAT agents do.  You cannot tack an 
imperative toward agreement onto the Way.  The Way is only the Way of 
cutting through to the correct answer, not the Way of cutting through to 
the correct answer + not disagreeing with others.  If agreement arises 
from that, fine; if not, it doesn't mean that you can patch the Way by 
tacking a requirement for agreement onto the Way.

The essence of the modesty argument is that we can become more rational 
by *trying* to agree with one another; but that is not how AAT agents 
work in their internals.  Though my reply doesn't rule out the 
possibility that the modesty rule might prove pragmatically useful when 
real human beings try to use it.

The modesty argument is important in one respect.  I agree that when two 
humans disagree and have common knowledge of each other's opinion (or a 
human approximation of common knowledge which does not require logical 
omniscience), *at least one* human must be doing something wrong.  The 
modesty argument doesn't tell us immediately what is wrong or how to fix 
it.  I have argued that the *behavior* of modesty is not a solution 
theorem, though it might *pragmatically* help.  But the modesty 
*argument* does tell us that something is wrong.  We shouldn't ignore 
things when they are visibly wrong - even if modesty is not a solution.

One possible underlying fact of the matter might be that one person is 
right and the other person is wrong and that is all there ever was to 
it.  This is not an uncommon state of human affairs.  It happens every 
time a scientific illiterate argues with a scientific literate about 
natural selection.  From my perspective, the scientific literate is 
doing just fine and doesn't need to change anything.  The scientific 
illiterate, if he ever becomes capable of facing the truth, will end up 
needing to sacrifice some of his most deeply held beliefs while not 
receiving any compromise or sacrifice-of-belief in return, not even the 
smallest consolation prize.  That's just the Way things are sometimes. 
And in AAT also, sometimes when you learn the other's answer you will 
simply discard your own, while the other changes his probability 
assignment not a jot.  Aumann agents aren't always humble and compromising.

But then we come to the part of the problem that pits meta-rationality 
against self-deception.  How does the scientific literate guess that he 
is in the right, when he, being scientifically literate, is also aware 
of studies of human overconfidence and of consistent biases toward 
self-overestimation of relative competence?

As far as I know, neither meta-rationality nor self-deception has been 
*formalized* in a way plausibly applicable to humans even as an 
approximation.  (Or maybe it would be better to say that I have not yet 
encountered a satisfactory formalism.  For who among us has read the 
entire Literature?)

Trying to estimate your own rationality or meta-rationality involves 
severe theoretical problems because of the invocation of reflectivity, a 
puzzle that I'm still trying to solve in my own FAI work.  My puzzle 
appears not as a puzzle of estimating *self*-rationality as such, but 
as the puzzle of why a Bayesian attaches confidence to a purely abstract 
system that performs Bayesian reasoning, without knowing the specifics 
of the domain.  "Beliefs" and "likelihoods" and "Bayesian justification" 
and even "subjective probability" are not ontological parts of our 
universe, which contains only a mist of probability amplitudes.  The 
probability theory I know can only apply to "beliefs" by translating 
them into ordinary causal signals about the domain, not treating them 
sympathetically *as beliefs*.

Suppose I assign a subjective probability of 40% to some one-time event, 
and someone else says he assigns a subjective probability of 80% to the 
same one-time event.  This is all I know of him; I don't know the other 
person's priors, nor what evidence he has seen, nor the likelihood 
ratio.  There is no fundamental mathematical contradiction between two 
well-calibrated individuals with different evidence assigning different 
subjective probabilities to the same one-time event.  We can still 
suppose both individuals are calibrated in the long run - when one says 
"40%" it happens 40% of the time, and when one says "80%" it happens 80% 
of the time.  In this specific case, either the one-time event will 
happen or it won't.  How are two well-calibrated systems to update when 
they know the other's estimate, assuming they each believe the other to 
be well-calibrated, but know nothing else about one another? 
Specifically, they don't know the other's priors, just that those priors 
are well-calibrated - they can't deduce likelihood of evidence seen by 
examining the posterior probability.  (If they could deduce likelihoods, 
they could translate beliefs to causal signals by translating:  "His 
prior odds in P were 1:4, and his posterior odds in P are 4:1, so he 
must have seen evidence about P of likelihood 16:1" to "The fact of his 
saying aloud '80%' has a likelihood ratio of 16:1 with respect to P/~P, 
even though I don't know the conditional probabilities.")
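
Spelled out, that deduction is just a ratio of odds (nothing here 
beyond the numbers already in the parenthetical):

    from fractions import Fraction as F

    prior_odds       = F(1, 4)   # "prior odds in P were 1:4"
    posterior_odds   = F(4, 1)   # "posterior odds are 4:1", i.e. 80%
    likelihood_ratio = posterior_odds / prior_odds
    print(likelihood_ratio)      # 16 -- evidence of likelihood 16:1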

How are these two minds to integrate the other's subjective probability 
into their calculations, if they can't convert the other's spoken words 
into some kind of witnessable causal signal that bears a known 
evidential relationship to the actual phenomenon?  How can Bayesian 
reasoning take into account other agents' beliefs *as beliefs*, not just 
as causal phenomena?

Maybe if you know the purely abstract fact that the other entity is a 
Bayesian reasoner (implements a causal process with a certain Bayesian 
structure), this causes some type of Bayesian evidence to be inferrable 
from the pure abstract report "70%"?  Well, first of all, how do you 
integrate it?  If there's a mathematical solution it ought to be 
constructive.  Second, attaching this kind of *abstract* confidence to 
the output of a cognitive system runs into formal problems.  Consider 
Löb's Theorem in mathematical logic.  Löb's Theorem says that if you can 
prove that a proof of T implies T, you can prove T; |- ([]T => T) 
implies |- T.  Now the idea of attaching confidence to a Bayesian system 
seems to me to translate into the idea that if a Bayesian system says 
'X', that implies X.  I'm still trying to sort out this confused issue 
to the point where I will run over it in my mind one day and find out 
that Löb is not actually a problem.
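
For reference, the textbook form of the theorem, in provability-logic 
notation where the box abbreviates "provable":

    % Löb's Theorem.
    \[
    \text{If } \vdash \Box P \rightarrow P, \text{ then } \vdash P;
    \qquad \text{equivalently,} \qquad
    \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P .
    \]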

Is there an AAT extension that doesn't involve converting the other's 
beliefs into causal signals with known evidentiary relationships to the 
specific data?  Is there a formal AAT extension that works on the 
*abstract* knowledge of the other person's probable rationality, without 
being able to relate specific beliefs to specific states of the world? 
Suppose that I say 30%, and my friend says 70%, and we know of each 
other only the pure abstract fact that we are calibrated in the long 
run; in fact, we don't even know what our argument is about 
specifically.  Should we be able to reach an agreement on our 
probability assignments even though we have no idea what we're arguing 
about?  How?  What's the exact number?
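
To see why "what's the exact number?" has no model-free answer, here 
is a small sketch of my own: two toy setups in which both forecasters 
are perfectly calibrated and announce 30% and 70%, yet the correct 
combined posterior differs, because it depends on how their evidence 
overlaps - exactly the thing that the pure abstract fact of 
calibration does not supply.

    from fractions import Fraction as F

    def odds(p):  return p / (1 - p)
    def prob(o):  return o / (1 + o)

    prior = F(1, 2)
    report_A, report_B = F(3, 10), F(7, 10)

    # Setup 1: the forecasters saw conditionally independent evidence,
    # so their implied likelihood ratios multiply.
    lr_A = odds(report_A) / odds(prior)           # 3/7
    lr_B = odds(report_B) / odds(prior)           # 7/3
    combined_1 = prob(odds(prior) * lr_A * lr_B)  # 1/2

    # Setup 2: B saw everything A saw, plus more.  A's report is then a
    # function of what B already knows, so conditioning on it cannot
    # move the posterior away from B's calibrated 7/10.
    combined_2 = report_B                         # 7/10

    print(combined_1, combined_2)                 # 1/2 versus 7/10

In both setups each report is a correct Bayesian posterior, hence 
calibrated; the "exact number" is fixed only by the evidence 
structure, which is precisely what we stipulated the two parties do 
not know.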

That's the problem I run into when I try to formalize a pure abstract 
belief about another person's 'rationality'.  (If this has already been 
formalized, do please let me know.)  Now obviously human beings do make 
intuitive estimates of each other's rationality.  I'm just saying that I 
don't know how to formalize this in a way free from paradox - humans do 
a lot of thinking that is useful and powerful but also sloppy and 
subject to paradox.  I think that if this human thinking is reliably 
useful, then there must be some structure to it that explains the 
usefulness, a structure that can be extracted and used in an FAI 
architecture while leaving all the sloppiness and paradox behind.  But I 
have not yet figured out how to build a reflective cognitive system that 
attaches equal evidential force to (a) its own estimates as they are 
produced in the system and (b) a mental model of an abstract process that 
is an accurate copy of itself, plus the abstract knowledge (without 
knowing the specific evidence) that this Bayesian process arrived at the 
same specific probability output.  I want this condition so the 
cognitive system is consistent under reflection; it attaches the same 
force to its own thoughts whether they are processed as thoughts or as 
causal signals.  But how do I prevent a system like that from falling 
prey to Löb's Theorem when it tries the same thing in mathematical 
logic?  That's something I'm presently pondering.  I think there's 
probably a straightforward solution, I just don't have it yet.

Then we come to self-deception.  If it were not for self-deception, 
meta-rationality would be much more straightforward.  Grant some kind of 
cognitive framework for estimating self-rationality and 
other-rationality.  There would be some set of signals standing in a 
Bayesian relation to the quantities of "rationality", some signals 
publicly accessible and some privately accessible.  Each party would 
honestly report their self-estimate of rationality (the public signals 
being privately accessible as well), and this estimate would have no 
privileged bias.  Instead, though, we have self-deceptive phenomena such 
as biased retrieval of signals favorable to self-rationality, and biased 
non-retrieval of signals prejudicial to self-rationality.

It seems to me that you have sometimes argued that I should foreshorten 
my chain of reasoning, saying, "But why argue and defend yourself, and 
give yourself a chance to deceive yourself?  Why not just accept the 
modesty argument?  Just stop fighting, dammit!"  I am a human, and a 
human is a system with known biases like selective retrieval of 
favorable evidence.  Each additional step in an inferential chain 
introduces a new opportunity for the biases to enter.  Therefore I 
should grant greater credence to shorter chains of inference.

This again has a certain human plausibility, and it even seems as if it 
might be formalizable.

*But*, trying to foreshorten our chains of inference contradicts the 
character of ordinary probability theory.

E. T. Jaynes (who is dead but not forgotten), in _Probability Theory: 
The Logic of Science_, Chapter 1, page 1.14, verse 1-23, speaking of a 
'robot' programmed to carry out Bayesian reasoning:

1-23b:  "The robot always takes into account all of the evidence it has 
relevant to a question.  It does not arbitrarily ignore some of the 
information, basing its conclusions only on what remains.  In other 
words, the robot is completely non-ideological."

Jaynes quoted this dictum when he railed against ad-hoc devices of 
orthodox statistics that would throw away relevant information.  The 
modesty argument argues that I should foreshorten my chain of reasoning, 
*not* take into account everything I can retrieve as evidence, and stick 
to modesty - without using my biased retrieval mechanisms to try and 
recall evidence regarding my relative competence.  Now this has a 
pragmatic human plausibility, but it's very un-Jaynesian.  According to 
the religion of Bayesianity, what might perhaps be called 
Bayesianitarianism, I should be trying to kiss the truth, pressing my 
map as close to the territory as possible, maximizing my Bayesian score 
by every inch and fraction I can muster, using every bit of evidence I 
can find.

I think that's the point which, from my perspective, cuts closest to the 
heart of the matter.  Biases can be overcome.  You can fight bias, and 
win.  You can't do that if you cut short the chain of reasoning at its 
beginning.  I don't spend as much time as I once did thinking about my 
relative rationality, mostly because I estimate myself as being so way 
the hell ahead that *relative* rationality is no longer interesting. 
The problems that worry me are whether I'm rational enough to deal with 
a given challenge from Nature.  But, yes, I try to estimate my 
rationality in detail, instead of using unchanged my mean estimate for 
the rationality of an average human.  And maybe an average person who 
tries to do that will fail pathetically.  Doesn't mean *I'll* fail, cuz, 
let's face it, I'm a better-than-average rationalist.  There will be 
costs, if I dare to estimate my own rationality.  There will be errors. 
  But I think I can do better by thinking.

While you might think that I'm not as good as I think, you probably do 
think that I'm a more skilled rationalist than an average early 
21st-century human, right?  According to the foreshortening version of 
the modesty argument, would I be forbidden to notice even that?  Where 
do I draw the line?  If you, Robin Hanson, go about saying that you have 
no way of knowing that you know more about rationality than a typical 
undergraduate philosophy student because you *might* be deceiving 
yourself, then you have argued yourself into believing the patently 
ridiculous, making your estimate correct.

The indexical argument about how you could counterfactually have been 
born as someone else gets into deep anthropic issues, but I don't think 
that's really relevant given the arguments I already stated.

And now I'd better terminate this letter before it goes over 40K and 
mailing lists start rejecting it.  I think that was most of what I had 
to say about the math, leaving out the anthropic stuff for lack of space.

-- 
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence


