DOE FAQ Alert Electronic Newsletter

Issue: Volume 2, Number 12
Date: December 2002
From: Mark J. Anderson, Stat-Ease, Inc.

Here's another set of frequently asked questions (FAQs) about doing design of experiments (DOE), plus alerts to timely information and free software updates. If you missed previous DOE FAQ Alerts, click on the links below. Feel free to forward this newsletter to your colleagues. They can subscribe by going to http://www.statease.com/doealertreg.html.

Here's an appetizer on how "transmitters could give fans and pundits quick stats fixes" according to "Nature" magazine at http://www.nature.com/nsu/021104/021104-6.html. Who wouldn't want more stats? However, the complexion of soccer (or "football" as it's called outside of the USA) could change completely, perhaps not for the better. The American version of football already makes use of many high-tech gadgets, such as telecommunications equipment allowing coaches to instruct their quarterback (via a helmet receiver) on what play to run. It may be just a matter of time before these sports get taken over completely by electronics. I wonder if this device could be modified so that referees could give badly behaving players a good jolt. Imagine if the fans could take control of this feature!

Here's what I cover in the body text of this DOE FAQ Alert (topics that delve into statistical detail are designated "Expert"):
1. FAQ: Explaining differences between predictive models in coded versus actual units
2. Expert-FAQ: Why predicted R-squared is not computed for two-level factorial design (Challenge to readers - analyze this yourself!)
3. Info alert: New Six Sigma article on DOE
4. Workshop alert: Upcoming classes, tuition increase
5. Reader contribution: DOE for Design for Six Sigma (DFSS), more wanted on this subject
PS. Quote for the month - Einstein's supposed experiment (a joke?) to explain relativity in lay person's terms (link provided to details from Scientific American - check it out!)

Best wishes for a happy holiday season! Mark

1 - FAQ: Explaining differences between predictive models in coded versus actual units

-----Original Question-----
From: California
"I have looked at the coded and uncoded equations but I have never understood how to use them to determine the level of importance of the factors. The coded equation seems to say that one factor is most important, but the actual equation seems to say other factors are most important. For example, in the analysis detailed by your two-level factorial tutorial (posted at http://www.statease.info/dx6files/manual/DX03-Factorial-Levels-Two.pdf) the equations look quite different:

Final Equation in Terms of Coded Factors:
Filtration Rate =
+70.06
+10.81 * A
+4.94 * C
+7.31 * D
-9.06 * A * C
+8.31 * A * D

Final Equation in Terms of Actual Factors:
Filtration Rate =
-36.75000
+2.37500 * Temperature
+53.54545 * Concentration
-4.96970 * Stir Rate
-1.64773 * Temperature * Concentration
+0.20152 * Temperature * Stir Rate

Can you give me a simple way to understand this?"

Answer (from Stat-Ease consultant Pat Whitcomb):
"For process understanding use coded values, because:
1. Regression coefficients tell us how the response changes relative to the intercept. The intercept in coded values is in the center of our design. In actual values the intercept can be, and usually is, far from the design space.
2. Units of measure are normalized (removed) by coding. Coefficients measure half the change from -1 to +1 for all factors."
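
For readers who like to check such things numerically, here is a minimal Python sketch comparing the two equations quoted in the question. The factor ranges in RANGES (Temperature 24 to 35, Concentration 2 to 4, Stir Rate 15 to 30) are an assumption on my part: they are not stated in this newsletter, although they do reproduce the actual-unit coefficients to within rounding. The function and variable names are purely illustrative.

```python
# Coded model quoted above: A, C and D each run from -1 (low) to +1 (high).
def coded_model(a, c, d):
    return 70.06 + 10.81 * a + 4.94 * c + 7.31 * d - 9.06 * a * c + 8.31 * a * d

# Actual-unit model quoted above.
def actual_model(temp, conc, stir):
    return (-36.75000 + 2.37500 * temp + 53.54545 * conc - 4.96970 * stir
            - 1.64773 * temp * conc + 0.20152 * temp * stir)

# ASSUMED (low, high) factor ranges; they are not given in the newsletter.
RANGES = {
    "Temperature": (24.0, 35.0),
    "Concentration": (2.0, 4.0),
    "Stir Rate": (15.0, 30.0),
}

def to_actual(coded, low, high):
    """Map a coded level (-1..+1) to actual units: center + coded * half-range."""
    return (high + low) / 2 + coded * (high - low) / 2

def compare_corners():
    """Predict at all eight corners of the design with both equations."""
    rows = []
    for a in (-1, 1):
        for c in (-1, 1):
            for d in (-1, 1):
                temp = to_actual(a, *RANGES["Temperature"])
                conc = to_actual(c, *RANGES["Concentration"])
                stir = to_actual(d, *RANGES["Stir Rate"])
                rows.append(((a, c, d), coded_model(a, c, d),
                             actual_model(temp, conc, stir)))
    return rows

for levels, coded_pred, actual_pred in compare_corners():
    print(levels, round(coded_pred, 2), round(actual_pred, 2))

# The actual-unit intercept is the prediction at zero temperature, zero
# concentration and zero stir rate, which is far outside the design space.
print("Prediction at all-zero actual settings:", actual_model(0, 0, 0))
```

Under these assumed ranges the two equations give the same predictions apart from rounding, so neither is "more right." Only the coded coefficients share a common scale, which is why they are the ones to compare when judging relative importance.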

Note to readers: FYI, Pat answered a very similar FAQ last April. You probably will find this informative as a reminder about how the models get coded and uncoded. See FAQ #1 at http://www.statease.com/news/faqalert2-4.html.

(Learn more about analyzing data from two-level factorial designs by attending the 3-day computer-intensive workshop "Experiment Design Made Easy." For a complete description see http://www.statease.com/clasedme.html. Link from this page to the course outline and schedule. Then, if you like, enroll online.)

2 - Expert-FAQ: Why predicted R-squared is not computed for two-level factorial design (Challenge to readers - analyze this yourself!)

-----Original Message-----
From: Georgia
"Why does your software list the predicted R-squared as being not available (N/A)? It also reports significant curvature in my models. What can I do about this? I am sending you my data with disguised names and coded levels for the factors. Thanks for your help, and for your good software."

Note to readers: Here's the data in standard order for the two responses that came from this questioner's 2^3 factorial design with 3 center points.

Y1:
100
12.5
40
10
45.5
24.5
16
17
18
17
21.5

Y2:
100
12
78.5
5.5
67
0
0
0
6.5
3.5
3.5

Try analyzing this for yourself. If you have access to Design-Ease® or Design-Expert® software (obtain a free trial at http://www.statease.com/soft_ftp.html), set up the design (specify 2 responses), sort it in standard order and then copy and paste the data into the empty response columns. You can then use the software to do the statistical analysis.

Answer:
You've got two remarkable responses. However, you could not see this because for both responses you picked every possible effect. That's why you got the message in the analysis of variance (ANOVA): "Case(s) with leverage of 1.0000: Pred R-Squared and PRESS statistic not defined." PRESS, on which predicted (Pred) R-squared is based, stands for "predicted residual sum of squares." It refits your model with each point in turn (one at a time) taken out, thus providing a more acid test of how well the model predicts your response. Unfortunately, PRESS cannot work when you've used up every bit of information, which is what you did by picking all the estimable effects. You performed eight unique factorial runs (centerpoints don't help estimate the factorial effects) from which you estimated the mean (overall average of response data) plus three main effects (A, B, C), three two-factor interactions (AB, AC, BC) and one three-factor interaction (ABC). That makes eight coefficients from eight runs, so the model passes exactly through every factorial point and nothing is left over to test its predictions.
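
To see why a leverage of 1 leaves PRESS undefined, here is a minimal Python sketch that looks only at the eight factorial runs. It is not how Design-Expert does the calculation. It relies on the shortcut that, when the model columns are mutually orthogonal (as they are for a full two-level factorial in coded units), each leverage is the sum over columns of x squared divided by that column's sum of squares. The main-effects-only model is purely illustrative and is not the model I ended up with. All names here are hypothetical.

```python
from itertools import product
from math import prod

def standard_order(k):
    """Runs of a 2^k factorial in standard order (first factor changes fastest)."""
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

def model_matrix(runs, terms):
    """One column per term; a term is a tuple of factor indices, () is the mean."""
    return [[prod(run[i] for i in term) for term in terms] for run in runs]

def column_sums_of_squares(X):
    return [sum(row[j] ** 2 for row in X) for j in range(len(X[0]))]

def leverages(X):
    """Hat-matrix diagonal; valid ONLY when the columns of X are orthogonal."""
    ss = column_sums_of_squares(X)
    return [sum(x * x / s for x, s in zip(row, ss)) for row in X]

def press(X, y):
    """PRESS = sum of (residual / (1 - leverage))^2, or None if any leverage is 1."""
    h = leverages(X)
    if any(abs(1.0 - hi) < 1e-12 for hi in h):
        return None  # deleted residual would require dividing by zero
    ss = column_sums_of_squares(X)
    coefs = [sum(row[j] * yi for row, yi in zip(X, y)) / ss[j]
             for j in range(len(ss))]
    residuals = [yi - sum(x * b for x, b in zip(row, coefs))
                 for row, yi in zip(X, y)]
    return sum((e / (1.0 - hi)) ** 2 for e, hi in zip(residuals, h))

# Y1 from the eight factorial runs listed above (centerpoints left out).
Y1_FACTORIAL = [100, 12.5, 40, 10, 45.5, 24.5, 16, 17]
RUNS = standard_order(3)

# Every estimable effect: mean, A, B, C, AB, AC, BC, ABC.
SATURATED = [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
# Illustrative reduced model: mean plus the three main effects.
MAIN_EFFECTS = [(), (0,), (1,), (2,)]

saturated_leverages = leverages(model_matrix(RUNS, SATURATED))
reduced_leverages = leverages(model_matrix(RUNS, MAIN_EFFECTS))
press_saturated = press(model_matrix(RUNS, SATURATED), Y1_FACTORIAL)
press_reduced = press(model_matrix(RUNS, MAIN_EFFECTS), Y1_FACTORIAL)

print("Saturated leverages:", saturated_leverages)
print("Reduced leverages:  ", reduced_leverages)
print("PRESS, saturated:", press_saturated)
print("PRESS, reduced:  ", press_reduced)
```

With all eight terms selected, every factorial run has leverage 1, so the "leave one out" residual has nothing to divide by. Dropping terms frees up information and PRESS becomes computable again.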

Starting over again with the analyses, I noticed that both your responses exhibited very wide ranges - from 10 to 100 for one, and 0 to 100 for the other. In such cases, it almost always helps to apply response transformations.

For the first response I applied a log transformation, which clarified the picture on the half-normal plot of effects considerably. Some main effects and two-factor interactions now stand out at the high end, while the three-factor interaction and other effects now line up with the pure error estimated from your centerpoints. Also, the ANOVA statistics come out much-improved. The residual diagnostics look good in this transformed scale. Finally, the Box-Cox plot supports the use of log.
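
If you want to try the log transformation by hand, the following Python sketch computes the seven factorial effects for Y1 on the log10 scale, using the run order given above (eight factorial runs in standard order, then three centerpoints). The helper names are hypothetical, and the centerpoint standard deviation is printed only as a rough yardstick for noise. This is not the half-normal plot or ANOVA that the software produces.

```python
from itertools import combinations, product
from math import log10, prod
from statistics import stdev

FACTORS = "ABC"
# Standard order for a 2^3 design: A changes fastest, then B, then C.
RUNS = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=3)]

Y1_FACTORIAL = [100, 12.5, 40, 10, 45.5, 24.5, 16, 17]
Y1_CENTER = [18, 17, 21.5]

def effects(y):
    """Effect = (average response at +1) minus (average response at -1)."""
    result = {}
    for size in range(1, len(FACTORS) + 1):
        for term in combinations(range(len(FACTORS)), size):
            contrast = sum(prod(run[i] for i in term) * yi
                           for run, yi in zip(RUNS, y))
            name = "".join(FACTORS[i] for i in term)
            result[name] = contrast / (len(y) / 2)
    return result

log_effects = effects([log10(v) for v in Y1_FACTORIAL])

# Rank by absolute size, which is the ordering a half-normal plot displays.
ranked = sorted(log_effects.items(), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranked:
    print(f"{name:>3}: {value:+.4f}")

# Spread of the replicated centerpoints on the same log10 scale.
center_sd = stdev(log10(v) for v in Y1_CENTER)
print(f"Centerpoint standard deviation (log10 scale): {center_sd:.4f}")
```

Running the same function on the untransformed values, effects(Y1_FACTORIAL), lets you compare how the ranking of effects changes between the original and the log scale.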

For the second response I applied a somewhat less severe transformation, the square root, thus avoiding the problem of logging zeroes (not possible!). This did not appear to help much, but I picked the biggest effect (A) on the half-normal plot and pressed ahead with ANOVA and the diagnostics. At this stage it became immediately obvious that you've got a statistical outlier (more than 6 standard deviations off according to the Outlier t plot). After ignoring this deviant run (find out what happened!), things cleared up amazingly on the half-normal plot - two main effects emerged (A and C). The ANOVA shows no more significant curvature and the residuals look great. The Box-Cox plot recommends the square root, so this proved to be a good choice for transformation.

This worked out so well that I suspect you set me up by making up the data. Is this for real? Do my revised analyses make sense to you?

I got this heart-warming response back:

"Mark - Thanks for your help and for your prompt response. No, this is real data (unless someone is setting ME up) and you have saved me and my company real time, money, and head-scratching. As always, when you do good work, your reward may be more work to do. But I hope this will teach me a lesson and I will know a few more things to try. Actually, you taught me the lesson about transforms when I took your course, I was just prejudiced toward using them only when the physical situation implied them. And believe me, this one did not."

3 - Info alert: Six Sigma article on DOE

Here's a link to a short, but informative article on "How To Compare Data Sets" using analysis of variance: http://www.isixsigma.com/tools-templates/analysis-of-variance-anova/how-compare-data-sets-anova/. It shows how to do the calculations via Microsoft Excel, but it refers to output from our Design-Expert software, which provides a clear picture of the effects. I like the article because it proves how great I am at bowling. My fellow consultants at Stat-Ease don't fare so well, especially Shari, whose name got misspelled. :( The names may not be correct, but the data is reliable, at least according to this biased bowler. :)

4 - Workshop alert: Upcoming classes, tuition increase

See http://www.statease.com/clas_pub.html for schedule and site information on all Stat-Ease workshops open to the public. To enroll, click the "register online" link at our web site or call Stat-Ease at 1.612.378.9449. If spots remain available, bring along several colleagues and take advantage of quantity discounts in tuition,* or consider bringing in an expert from Stat-Ease to teach a private class at your site. Call us to get a quote.

*(Prices will be going up for individuals attending public Stat-Ease workshops in 2003. However, if you prepay by December 31, we will hold the tuition to the current level.)

5 - Reader contribution: Article posted on DOE for Design for Six Sigma (DFSS), more wanted on this subject

Peter Peterka submitted his thoughts on DOE for Six Sigma and Design for Six Sigma (DFSS) which are posted at http://www.statease.com/pubs/sixsigma&DOE.pdf. Peter formerly worked at 3M Company as a product development and improvement specialist, where he routinely used Stat-Ease software for DOE. He is now an independent consultant (see http://www.6sigma.us). I've asked Peter to follow up on his submission by detailing an actual application of DOE for DFSS. Thanks, Peter, for your contributions on this vital subject.

Stat-Ease would appreciate more submissions on DOE for Six Sigma, not only for its DOE FAQ Alert, but also on behalf of the new International Journal of Six Sigma, which we offered to help via their Editorial Advisory Board. The editors of this double-blind refereed journal (still in the works) are asking for articles on Six Sigma, preferably ones that document DOE projects with measurable results. Please send anything you've got on this subject to me at my e-mail address shown below.

I hope you learned something from this issue. Address your questions and comments to me at:

[email protected]

Mark J. Anderson, PE, CQE
Principal, Stat-Ease, Inc. (http://www.statease.com)
Minneapolis, Minnesota USA

PS. Quote for the month - Einstein's supposed experiment (a joke?) to explain relativity in lay person's terms:
"When a man sits with a pretty girl for an hour, it seems like a minute. But let him sit on a hot stove for a minute and it's longer than any hour. That's relativity."

- Albert Einstein (for the rest of the story, see Scientific American at http://makeashorterlink.com/?L24A12DF1. Be patient with this link, which got shortened from the original path via an intermediary site. It takes a moment to process.)

Trademarks: Design-Ease, Design-Expert and Stat-Ease are registered trademarks of Stat-Ease, Inc.

Acknowledgements to contributors:

- Students of Stat-Ease training and users of Stat-Ease software
- Fellow Stat-Ease consultants Pat Whitcomb and Shari Kraber (see http://www.statease.com/consult.html for resumes)
- Statistical advisor to Stat-Ease: Dr. Gary Oehlert (http://www.statease.com/garyoehl.html)
- Stat-Ease programmers, especially Tryg Helseth (http://www.statease.com/pgmstaff.html)
- Heidi Hansel, Stat-Ease marketing director, and all the remaining staff.

Interested in previous DOE FAQ Alert e-mail newsletters? To view a past issue, choose it below.

#1 - Mar 01, #2 - Apr 01, #3 - May 01, #4 - Jun 01, #5 - Jul 01, #6 - Aug 01, #7 - Sep 01, #8 - Oct 01, #9 - Nov 01, #10 - Dec 01, #2-1 Jan 02, #2-2 Feb 02, #2-3 Mar 02, #2-4 Apr 02, #2-5 May 02, #2-6 Jun 02, #2-7 Jul 02, #2-8 Aug 02, #2-9 Sep 02, #2-10 Oct 02, #2-11 Nov 02, #2-12 Dec 02 (see above)