>>>>>>>>>>>>>>>>>>>>>>>>>>DOE FAQ Alert<<<<<<<<<<<<<<<<<<<<<<<<<<
Issue: Volume 1, Number 4
Date: June 2001
From: Mark J. Anderson, Stat-Ease, Inc. (http://www.statease.com)
"Statistics Made Easy" (tm)
TO UNSUBSCRIBE FOLLOW THE INSTRUCTIONS AT THE END OF THIS E-MAIL.
Dear Experimenter,
Here's our fourth issue in an ongoing series of e-mails with
answers to frequently asked questions (FAQs) about doing design
of experiments (DOE), plus alerts to timely information and free
software updates. If you missed the prior DOE FAQ Alert (or
earlier ones), go to http://www.statease.com/doealert.html .
Feel free to forward this newsletter to your colleagues. They can
subscribe by going to: http://www.statease.com/doealertreg.html.
Before I get into the meat of this message, I offer this link
to an educational website as an appetizer:
http://www.brainpop.com/specials/scientificmethod/index.weml .
(site requires subscription to view)
Upon arrival the site loads a cartoon movie (be patient!). Press
play to gain insights into the scientific method. This fun-
looking site is aimed at 5th to 8th graders, so if you like it,
pass it along to any youngsters you think might benefit. (Can you
suggest any links that experimenters might find fun and
interesting? If so, send me an e-mail with the link embedded.)
Here's what I cover in this DOE FAQ Alert:
1. FAQ: Why choose a probability of 0.05 (p-value) as the
criterion for statistical significance?
2. X-FAQ: Criteria for p-values when doing multiple pairwise
significance testing
3. Software Alert: Upgrade patch available for V6.04 of
Design-Expert(R) version 6 (DX6) software. (If you own DX6,
click on the link for a free upgrade. Otherwise, follow the
link to the free trial version.)
4. Info Alert: New "Stat-Teaser" newsletter features a bread
DOE and other helpful articles (click on link to see it!)
5. Events Alert: A heads-up on DOE talks and demos.
6. Workshop Alert: Coming soon to Philadelphia and Seattle
PS. Statistics quote for the month from Nero Wolfe, fictional
detective.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 - FAQ: Why choose a probability of 0.05 (p-value) as the
criterion for statistical significance?
-----Original Question-----
From: Indiana
"I have a question for you: Do you have a reference to back up
the following rule of thumb for p values*?:
p <= 0.05 ==> significant
0.1 > p > 0.05 ==> may be significant
p >= 0.1 ==> not significant"
*(If you need background on the term "p-value" to understand this
question, see page 24 in Chapter 2 of "DOE Simplified, Practical
Tools for Effective Experimentation." Details on the book can be
found at http://www.statease.com/doe_simp.html along with an
excerpt from Chapter 2 that contains the referenced material on p-
values. Also, if you own Design-Expert or Design-Ease software,
you will find a definition of "p-value" and many other statistical
terms in the Glossary under Help. The "DOE Simplified" book also
offers a glossary of terms.)
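The rule of thumb quoted above maps directly onto a small helper
function. Here is a minimal sketch in Python; the function name and
return strings are illustrative, not taken from any Stat-Ease code:

```python
def significance(p):
    """Classify a p-value using the workshop rule of thumb:
    p <= 0.05 significant; 0.05 < p < 0.1 may be significant;
    p >= 0.1 not significant."""
    if p <= 0.05:
        return "significant"
    if p < 0.1:
        return "may be significant"
    return "not significant"
```

As the discussion below makes clear, these cutoffs are conventions,
not laws; adjust them to the cost of a false positive in your own work.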
Answer:
>The p-value represents the risk of falsely rejecting the null
hypothesis. The "p >= 0.1 ==> not significant" rule is pretty
universal. Less universal may be the additional rules we use in
our workshops that "p <= 0.05 ==> significant" and "0.1 > p
> 0.05 ==> may be significant." How much risk you can tolerate
depends on the cost of falsely rejecting the null hypothesis.
You have to quantify this cost and decide on the risk you are
willing to accept.<
(Answered by Pat Whitcomb, Principal, Stat-Ease, Inc.)
I completely agree with Pat's assessment of generally acceptable
p-values. How these specific values evolved is somewhat murky.
In "The History of Statistics" (published by Belknap Press of
Harvard University) Stephen Stigler relates a story about the
statistician Laplace, who used a p-value of 0.01 as a measure of
significance for a study on how the moon affected barometric
pressure. This is the earliest reference I can find to p-values.
Sir Ronald Fisher, who introduced the concept of significance
testing in the early part of the 20th century, said this: "If P
is between 0.1 and 0.9 there is certainly no reason to suspect
the hypothesis tested. If it is below 0.02 it is strongly
indicated that the hypothesis fails to account for the whole
of the facts. We shall not often be astray if we draw a
conventional line at 0.05...." [from: "Statistical Methods for
Research Workers," London: Oliver and Boyd, 1950, p. 80.] Fisher
argued that interpretation of the p-value was ultimately up to
the researcher. For example, a p-value of around 0.05 might
provide incentive to perform another experiment rather than
provide immediate resolution as to whether to accept or reject
the null hypothesis.
In "Statistics for Experimenters," the authors (Box, Hunter,
Hunter) go a bit higher in p-value than Pat (and most other
statisticians) at the upper end. They say "one begins to be
slightly suspicious of a discrepancy at the 0.20 level, somewhat
convinced of its reality at the 0.05 level, and fairly confident
of it at the 0.01 level." But they then go on to say:
"Significance testing in general has been a greatly overworked
procedure...[It's] better to provide an interval within which the
value of the parameter would be expected to lie." For this
reason we now print confidence intervals on the model
coefficients in version 6 of our Design-Ease and Design-Expert
software. You should also make use of the Point Prediction
feature in our software, which provides confidence and prediction
intervals on the predicted response(s). In the end you must
decide what to do based on the statistics and your subject
matter knowledge. Box, Hunter and Hunter acknowledge that "In
practice, an experimenter's prior belief in the possibility of
a particular type of discrepancy must affect his attitude."
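The interval Box, Hunter and Hunter recommend is easy to compute by
hand. Here is a hedged sketch of a 95% confidence interval for a mean
using only the Python standard library; the critical t value is passed
in rather than computed, since looking it up in a table is the
traditional approach (e.g., about 2.262 for n = 10):

```python
import statistics

def mean_ci95(data, t_crit):
    """95% confidence interval for the mean: xbar +/- t * s / sqrt(n).

    t_crit is the two-sided 5% critical t value for n - 1 degrees
    of freedom, taken from a t table."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)          # sample standard deviation
    half_width = t_crit * s / n ** 0.5  # margin of error
    return xbar - half_width, xbar + half_width
```

The resulting interval conveys both the estimate and its uncertainty,
which is the point of the quotation above: an interval says more than
a bare verdict of "significant" or "not significant."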
(Learn more about significance testing and basic DOE by attending
the 3-day computer-intensive workshop "Experiment Design Made
Easy." Go to http://www.statease.com/clas_edme.html for a
description and links to the course outline and schedule.)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 - X-FAQ: Criteria for p-values when doing multiple pairwise
significance testing
-----Original Question-----
From: New Jersey
"I have a question which has come up with regard to the F-test as
in the "bowling" tutorial.* It was indicated that we should
never try to use the t-test to compare averages between any two
treatments UNLESS the F-test showed overall significance.
However, I would like to know exactly WHY that was said?? Even
if the overall F-test does not show significance, isn't it
possible that say one of four treatments might be significantly
better or worse than the other three, even though there is no
statistically significant difference among those other three??
I know that for some sets of data, the F test could indicate a
"prob>F" that exceeds 0.10, yet one or more of the t-test
comparisons will indicate a "prob>t" that's less than 0.05.
So I do not believe it to be a true statement that the F test
will ALWAYS indicate significance if even one of the t-test
comparisons indicates significance. However, it appears to me
that the converse is true: If the F test shows significance,
then at least one of the t-tests will always indicate
significance. I would like to be able to conclude with 95%
confidence that the t-test could still indicate significance
even if the F test is close but not quite into the significance
range. Do you get my dilemma? Can you answer this? Thanks a
lot!"
*(Refer to Design-Expert User's Guide Section 2: "One Factor
Tutorial," or http://www.statease.com/x6ug/DX02-Factor-One.pdf .)
Answer:
>The problem with doing all the individual t-tests is that of
multiple comparisons. If you do multiple comparisons, each
with a 5% risk of a type I error, the overall risk of a type
I error is much greater than 5%. The maximum risk is roughly
k (the number of comparisons) times the per-test risk. If you
have 5 means there are 10 pairwise t-tests, so the maximum
risk of falsely rejecting the null hypothesis is 10 times 5%,
or 50%. The actual risk in this case is about 29%, not 50%,
because the individual risks overlap rather than simply adding.
By first performing the F-test we provide protection from
type I error creep by not looking for differences unless we
know the null hypothesis has been rejected with our overall
risk set at 5%. There are also schemes that correct the
t-values to control the overall type I error. If you want to
learn more about this, look in statistics textbooks under
"multiple comparisons."<
(Answered by Pat Whitcomb, Principal, Stat-Ease, Inc.)
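The type I error creep Pat describes is easy to demonstrate with a
quick simulation. The sketch below, using only the Python standard
library, draws 5 groups of 10 observations from the SAME normal
population and counts how often at least one of the 10 pairwise
t-tests crosses the 5% line anyway. The group count, sample size, and
critical value (2.101 for 18 degrees of freedom) are illustrative
assumptions, not part of the original question:

```python
import random
import statistics

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal group sizes)."""
    n = len(a)
    sp2 = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.fmean(a) - statistics.fmean(b)) / (sp2 * 2 / n) ** 0.5

def familywise_error_rate(groups=5, n=10, trials=2000, t_crit=2.101, seed=1):
    """Fraction of trials in which at least one of the pairwise t-tests
    is 'significant' even though every group comes from the same
    population.  t_crit = 2.101 is the two-sided 5% value for 18 df."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(groups)]
        if any(abs(pooled_t(data[i], data[j])) > t_crit
               for i in range(groups) for j in range(i + 1, groups)):
            hits += 1
    return hits / trials
```

Running familywise_error_rate() should give a rate in the neighborhood
of the roughly 29% cited above, far beyond the 5% risk of any single
test, which is exactly why the overall F-test (or a correction such as
Bonferroni's) is needed first.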
PS from MJA: See what the online "Engineering Statistics
Handbook" by NIST/SEMATECH says about this issue by linking to:
http://www-09.nist.gov/div898/handbook/prc/section4/prc47.htm .
(Dear advanced readers: What can you add to clarify this FAQ and
the related one above? Mark)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 - Software Alert: Upgrade patch available for V6.04 of
Design-Expert(R) version 6 (DX6) software
If you own a permanently licensed copy of Design-Expert version 6
go to http://www.statease.com/soft_ftp.html#dx6updt for a patch
that will update your software (individual or networked) with the
latest enhancements. If you do not currently use Stat-Ease
software, download a fully-functional free trial of DX6 at
http://www.statease.com/dx6descr.html , which you can use at no
cost for 30 days.
The latest version of DX6 offers a new design option for response
surface methods called "Historical Data". This feature makes it
easy to create a blank layout to enter happenstance data for up to
10 numeric factors and 10 categorical factors and the associated
responses. Typically these will be copied into DX from a Windows-
based spreadsheet such as Excel. Then you can apply Design-
Expert's powerful tools for regression modeling, 3D graphics and
multiple response optimization.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 - Info Alert: New "Stat-Teaser" newsletter features a
bread DOE and other helpful articles
Get a free download of the June 2001 Stat-Teaser in PDF by
clicking on http://www.statease.com/newsltr.html . Find out how I
applied DOE methods to improve the performance of a bread-making
machine. The analysis revealed an unexpected interaction between
two key ingredients.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 - Events Alert: A heads-up on DOE talks and demos
Click on http://www.statease.com/events.html for a listing of
where Stat-Ease consultants will be giving talks and doing DOE
demos. The next event of international interest will be the
Joint Statistical Meetings in Atlanta this August. We hope to see
some of you there!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6 - Workshop Alert: Coming soon to Philadelphia and Seattle
On July 12th Stat-Ease travels to Philadelphia for a presentation
of "DOE Simplified" (DOES), a one-day overview of DOE based on the
book of the same name. See http://www.statease.com/does.html for
class content.
Although "DOE Simplified" is fun and informative, it's only
intended to get people started on the path to more effective
experimentation. We hope that participants will then be motivated
to take the next step by attending our "Experiment Design Made
Easy" (EDME) workshop, which will be presented next in Seattle on
July 10-12. We will return to Seattle on Sept. 13 with the
one-day "DOE Simplified" presentation.
See http://www.statease.com/clas_pub.html for a schedule and sites
for all Stat-Ease workshops open to the public. To enroll,
call Stat-Ease at 612-378-9449. Don't delay; seats sometimes fill
up fast. If spots remain available, bring along several
colleagues and take advantage of quantity discounts in tuition, or
consider bringing in an expert from Stat-Ease to teach a private
class at your site. Call us to get a quote.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I hope you learned something from this issue. Address your
questions and comments to me at:
Mark@StatEase.com
PLEASE DO NOT SEND ME REQUESTS TO SUBSCRIBE OR UNSUBSCRIBE -
FOLLOW THE INSTRUCTIONS AT THE END OF THIS MESSAGE.
Sincerely,
Mark
Mark J. Anderson, PE, CQE
Principal, Stat-Ease, Inc. (http://www.statease.com)
Minneapolis, Minnesota USA
PS. Statistics quote for the month:
"In a world that operates largely at random, coincidences are to
be expected, but each one of them must always be mistrusted."
- Line spoken by actor playing Detective Nero Wolfe, from A&E
television show, originally aired on 4/28/01.
Trademarks: Design-Ease, Design-Expert and Stat-Ease are
registered trademarks of Stat-Ease, Inc.
Acknowledgements to contributors:
- Students of Stat-Ease training and users of Stat-Ease software
- Fellow Stat-Ease consultants Pat Whitcomb and Shari Kraber
(see http://www.statease.com/consult.html for resumes)
- Statistical advisor to Stat-Ease: Dr. Gary Oehlert
( http://www.statease.com/garyoehl.html )
- Stat-Ease programmers, especially Tryg Helseth
( http://www.statease.com/pgmstaff.html )
- Heidi Hansel, Stat-Ease communications specialist, and all the
remaining staff
DOE FAQ Alert - Copyright 2001
Stat-Ease, Inc.
All rights reserved.