[ExI] Statistical significance

spike spike66 at att.net
Fri Apr 2 06:01:31 UTC 2010


 

> ...On Behalf Of Max More
> Subject: [ExI] Statistical significance
> 
> spike: Have you read Taleb's Black Swan? He has interesting 
> things to say about statistics and mathematical models...

Thanks, no, I had never heard of him, but I will check it out.

> Coincidentally, the following piece just appeared in my 
> in-box from forecasting expert Scott Armstrong: "Does 
> statistical significance help you make better forecasts?"

Hadn't heard of him either, but Scott Armstrong definitely said it better
than I did, and reinforced what I have seen way too often: the shortcut of 2
sigma has done far more harm than good, and has resulted in far too many
scientists and engineers who can't extract a subtle signal from the noise,
because they have not been properly taught how to recognize noise.  I am not
pointing fingers: it took me years to learn this, with many mistakes along
the way, and the textbooks were of little help.

> Research findings should help to improve your decision-making 
> and to simplify your life: Abolish tests of statistical 
> significance from your decision making...

Or rather: in every case, 1) calculate how many sigma each data set lies from
the null hypothesis, 2) think about what it means if that falls short of the
usual 95% threshold, and 3) very important please, think of all the different
possible ways to frame the null hypothesis.  THINK!  There really are a number
of different ways to do it, and some make more sense than others.  How you ask
the question has everything to do with the answer you get.
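Here is a toy sketch of step 1 in Python, with made-up numbers, just to show
what I mean by looking at the sigma distance itself instead of thresholding it:

import math

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 10.0]   # made-up measurements
null_mean = 10.0                                          # one possible framing of the null

n = len(data)
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / (n - 1)        # sample variance
sem = math.sqrt(var / n)                                  # standard error of the mean

sigma = (mean - null_mean) / sem                          # how many sigma from the null
print("sample mean = %.3f, sigma from null = %.2f" % (mean, sigma))
# This prints roughly 1.7 sigma: short of the magic 1.96, but still a signal
# worth thinking about rather than discarding.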

> Armstrong and Green 
> have not found any evidence that such tests improve decision 
> making. Indeed, they seem to create confusion and harm 
> decision making...

Oh my, how well I can verify that comment.  I have seen it a hundred times.
The data can't talk to us unless we recognize its voice.  

>... As a result, they have recently changed one 
> of their guidelines for forecasters to read:
> 
> 13.29    Do not use measures of statistical significance to assess a 
> forecasting method or model...

Max you are a university professor, ja?  This is a real problem, and it is
directly traceable to statistics teaching methods, and traditional teaching
constraints in general.  We have worked our way into a situation where all
the classroom material must be broken down into testable skills, so it
encourages memorization of algorithms, while functionally discouraging
actual thought.  The harm is clear in the field of statistics.

I really started to understand this five years ago when I was taking a
spacecraft feedback controls class at the U.  The final was a take-home, all
materials open, all resources open, one week to do it.  My wife and I each
spent over 50 hours on that exam, taking a couple of days off work.  Four
problems: choose any one and solve it.  My test writeup was over 80 pages,
Shelly's was a little longer.  It occurred to me then that this is actually
the right way to teach engineering and to test competence, because it
encourages creative problem solving rather than memorization of shortcuts
and misleading oversimplifications such as the traditional 95th-percentile
criterion for statistical significance.

> Description:  Even when correctly applied, significance tests 
> are dangerous...

Armstrong says it better than I did, in fewer words.

> Statistical significance tests calculate the 
> probability, assuming the analyst's null hypothesis is true...

Ja, critical point: there is more than one logical way to frame the null
hypothesis.  We are taught how docs test new medications.  The null there is
that this new stuff does nothing to treat this disease.  But seldom do
engineering problems reduce so neatly to a single logical null.
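To make that concrete, here is a hypothetical sketch (invented numbers, Python
only for illustration) of how two equally logical nulls give opposite-looking
answers from the same measurement:

effect_estimate = 0.8   # measured improvement, made-up units
std_error = 0.5         # standard error of that estimate, also made up

# Framing 1: null = "the change does nothing" (true effect is 0)
z_vs_zero = (effect_estimate - 0.0) / std_error

# Framing 2: null = "the change delivers the 1.5 units we need to justify its cost"
z_vs_needed = (effect_estimate - 1.5) / std_error

print("sigma vs. 'does nothing'      : %.2f" % z_vs_zero)    # about  1.6
print("sigma vs. 'big enough to ship': %.2f" % z_vs_needed)  # about -1.4
# Neither clears 2 sigma.  Which null you picked has everything to do with
# whether you call the result "no effect" or "not good enough".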

> ...The proper approach to analyzing and 
> communicating findings from empirical studies is to (1) 
> calculate and report effect sizes; (2) estimate the range 
> within which the actual effect size is likely to lie by 
> taking account of prior knowledge and all potential sources 
> of error in measuring the effect; and (3) conduct 
> replications, extensions, and meta-analyses...

Oooh this gets me turned on.  Thanks for the reference.
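For anyone who wants to see points (1) and (2) in miniature, here is a rough
sketch with invented numbers (Python, and the plain two-standard-error range is
my simplification, not Armstrong's method):

import math

group_a = [5.1, 5.4, 4.9, 5.3, 5.2, 5.0]   # hypothetical control measurements
group_b = [5.6, 5.9, 5.5, 5.8, 5.4, 5.7]   # hypothetical treatment measurements

def mean(xs):
    return sum(xs) / len(xs)

def sem(xs):
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(var / len(xs))

effect = mean(group_b) - mean(group_a)                    # (1) the effect size itself
combined_se = math.sqrt(sem(group_a) ** 2 + sem(group_b) ** 2)

# (2) a crude range for the effect; prior knowledge and the other error
# sources Armstrong lists would widen or shift this in a real analysis.
low, high = effect - 2 * combined_se, effect + 2 * combined_se
print("effect = %.2f, rough range %.2f to %.2f" % (effect, low, high))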

> Purpose:  To avoid the selection of invalid models or 
> methods, and the rejection of valid ones...

Excellent!

> 
> ...
> 
> J. Scott Armstrong
> Dept. of Marketing
> The Wharton School
> U. of Pennsylvania
> Phila., PA 19104
> armstrong at wharton.upenn.edu

Department of Marketing?  Indeed?  Isn't that where they get people to buy
stuff?  Shame on the engineering departments and math departments
everywhere, SHAME!  We get a guy from the department where they teach how to
make funny Super Bowl commercials, and he is the one pointing the way to
doing science better?  Where are the rigorous bigshots from the math
department?  Where are the tough-as-nails engineering profs?  Why aren't
they taking the lead?  Am I misunderstanding what the department of
marketing is?  Suddenly I have new respect for that discipline.

I have half a mind to contact J. Scott Armstrong to congratulate him for his
breakthrough insights, forward him this whole discussion, and wish him well.
Max, is Prof. Armstrong a buddy of yours?

spike




 



