Stat-Ease | The DOE FAQ Alert Vol. 21 No. 2

Dear Experimenter,

Here’s a fresh set of answers to frequently asked questions (FAQs) about design of experiments (DOE), plus timely alerts for events, publications, and software updates. Check it out!

Please let me know what you learned from this issue: I’d really appreciate hearing from you! Address your questions and comments to me at [email protected].

Please do not send me requests to subscribe or unsubscribe. Instead, follow the instructions at the end of this message.

Sincerely,
Mark J. Anderson, PE, CQE
Engineering Consultant, Stat-Ease, Inc.

PS Quote for the month:
Beware of “quantipulation”.

(Page down to the end of this e-zine to enjoy the actual quote.)

IN THIS ISSUE
Vol. 21, No. 2 - Mar/Apr 2021

FAQ
Lack of Fit Significant:
Now What?

Events Alert
Talks on Ruggedness Testing and Deploying DOE

Webinar Alert
Free Webinars - Sign Up Now to Take Advantage

Info Alert
DX13 Poisson Regression Maximizes Popcorn Output

Workshop Alert
Enroll Before Spring Classes Fill

BLOGS
StatsMadeEasy Blog
My wry look at all things statistical and/or scientific with an engineering perspective.

Also, see the Stat-Ease blog for tips on making DOE easy. For example, a recent posting provides “Cutting-Edge Tools in Design-Expert Version 13”. Take a look!

FAQ
Lack of Fit is Significant: Now What?

Original question from a Research Scientist:
“I generally understand the lack of fit test but I’m struggling to understand how to think through a model that has great statistical properties (very significant model, R^2 predicted >0.9, diagnostics look OK, high adequate precision, etc.) but has a significant lack of fit (p < 0.05). I’ve gone through the different model selection algorithms and most of the models I generate look great except all have a significant lack of fit. I don’t know if this is the correct way to think about it, but this tells me that a significant model can be built that explains most of the data, but the ‘residual’ data unexplained by the model is due to the model not fitting all of the data versus experimental error (pure error used for lack of fit calculation). Am I thinking about that correctly? Or are there other ‘checks’ to do?”

Answer from Stat-Ease Consultant Shari Kraber:
“When I look at client data that exhibits strong lack of fit (LOF) but all other stats are good, the first thing I check is the replicates. If the replicates are very, very similar, then the lack of fit test can easily become significant, even if there are no other modeling problems. The denominator of the test’s F-ratio is the pure-error variance, which is estimated from the differences between the replicates; when those differences are very small, the denominator shrinks and the LOF statistic is artificially inflated. Maybe the replicates were all done at the same time (like when center points are all run together), causing low variation, or perhaps the center point is a standard setting and therefore the operators are very capable of keeping the process in tight control at that point. In any case, very little variation between the replicates is probably the number one cause of significant lack of fit in the presence of all other good statistics. In this case I would likely determine that the test is not valid.

“If the replicates are good, then I think of significant lack of fit as telling me that there is more curvature to be explained somewhere in the design space. The next question is: do you care? Where is the model not fitting well? Looking at the 3D plots with the design points showing can help you find the area of the space that has more variation (design points are far away from the surface). Is this a region that you want to predict well? If so, then you may need to add some additional runs in that area to gather more data. If it is in an undesirable area, then don’t worry about LOF. It may be that, if the design was a central composite design, the alpha points were set so far out that a cubic model would be required to model the entire space. Do you need that whole space? Or is quadratic sufficient within the area of interest?

“All in all, when any particular statistic is not up to expectations, then using point prediction and the confirmation node becomes extra important. Decide where the optimal settings are and run a few runs at those settings. Make sure they provide the answer you are looking for. Models are approximating the real world, which doesn't follow a polynomial model anyway.”

PS I agree with Shari that if lack of fit stands out as the only problematic statistic, it need not be a ‘show stopper’. If the R^2s adjusted and predicted are positive and adequate precision exceeds the signal-to-noise ratio of 4 (as recommended in the Fit Statistics annotation), press ahead to the diagnostics. Assuming they appear normal, check out the actual lack of fit presented by the Predicted vs Actual plot. It may not be nearly so bad as you might think.

- Mark
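
To put numbers on Shari's point about tight replicates, here is a minimal Python sketch of the lack-of-fit F-ratio for a one-factor straight-line fit. It is an illustration only, not Design-Expert's calculation: the function name lack_of_fit and both data sets are made up for this example. The two data sets have the same mean response at every factor setting, so the fitted line and the lack-of-fit sum of squares are identical; only the spread among the four center-point replicates differs. The sketch stops at the F-ratio; turning it into a p-value requires an F distribution, which is not shown here.

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Lack-of-fit F-ratio for a straight-line fit y = b0 + b1*x.

    Hypothetical helper for illustration. Replicates are runs that
    share exactly the same x setting.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Group responses by factor setting to identify the replicates.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    ss_pure_error = 0.0  # variation of replicates about their own mean
    ss_lof = 0.0         # distance of each group mean from the fitted line
    for xi, ys in groups.items():
        mean_i = sum(ys) / len(ys)
        ss_pure_error += sum((yi - mean_i) ** 2 for yi in ys)
        ss_lof += len(ys) * (mean_i - (b0 + b1 * xi)) ** 2

    df_pure_error = n - len(groups)
    df_lof = len(groups) - 2  # distinct settings minus two coefficients
    f_ratio = (ss_lof / df_lof) / (ss_pure_error / df_pure_error)
    return {
        "ss_lof": ss_lof,
        "ss_pure_error": ss_pure_error,
        "df_lof": df_lof,
        "df_pure_error": df_pure_error,
        "f_ratio": f_ratio,
    }


# Made-up data: one run at each end, four replicated center points.
x = [-1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
y_loose = [10.0, 12.1, 13.1, 12.1, 13.1, 14.0]      # center points vary by +/- 0.5
y_tight = [10.0, 12.59, 12.61, 12.59, 12.61, 14.0]  # center points vary by +/- 0.01

loose = lack_of_fit(x, y_loose)
tight = lack_of_fit(x, y_tight)
print(f"Loose replicates: F = {loose['f_ratio']:.2f}")
print(f"Tight replicates: F = {tight['f_ratio']:.2f}")
```

With the loose replicates the F-ratio comes out near 1.4, which is unremarkable. With the tight replicates it runs into the thousands, even though the model misses the data by exactly the same amount in both cases. That is why a significant LOF result deserves a look at how the replicates were run before the model is rejected.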

(Learn more about model fit statistics by attending the next distance-learning presentation of Modern DOE for Process Optimization.)

Events Alert
Talks on Ruggedness Testing and Deploying DOE

At the invitation of the International Society of Six Sigma Professionals (ISSSP) I will present a webinar on “DOE for Ruggedness Testing”, Wednesday, May 12 at 2:00 pm CT (US). See more details and register here.

Shari Kraber will provide practical advice on “Deploying DOE to Predict Process Performance” to the World Conference on Quality & Improvement on May 24-28. Register here.

PS Do you need a speaker on DOE for a learning session within your company or professional society at regional, national, or international levels? If so, please get back to me. – Mark

WEBINAR ALERT
Free Webinars - Sign up Now to Take Advantage:

March 31—“New-User Intro to Design-Expert Software” by Richard Williams
April 7—“Split-Plot Pros and Cons: Dealing with a Hard-to-Change Factor” by Pat Whitcomb
May 26—"Leading-Edge Experiment Design for Aerospace" by me

Click here to view the times, descriptions and registration links for these upcoming live webinars. Sign up now to advance your DOE know-how!

INFO ALERT
DX13 Poisson Regression Maximizes Popcorn Output

Energized by new tools in version 13 of Design-Expert (DX13) for modeling counts, I tested a cellphone app against the built-in timing on my microwave for minimizing unpopped kernels (UPK). DX13 paved the way to nearly perfect popcorn via its precise Poisson-regression fit. See the amazing results in my case study “Experiment reveals secret to maximizing microwave popcorn”.

WORKSHOP ALERT
Sharpen up on DOE—Enroll before spring classes fill

You can do no better for quickly advancing your DOE skills than attending a Stat-Ease workshop. Our expert instructors provide you with a lively and extremely informative series of lectures interspersed by valuable hands-on exercises. Enroll early to ensure your spot!

Modern DOE for Process Optimization
- April 12-16, 5 half-days online, $895
A Crash Course in Mixture Design of Experiments
- April 17, 10am US Central Time, FREE
Mixture Design for Optimal Formulations
- May 17-20, 4 half-days online, $895
Boot Camp for Experimenters
- May 19 - June 9, 3 weeks online and offline, $495

See this web page for the complete schedule of upcoming Stat-Ease distance-learning courses. To enroll in the workshop that suits you best, click Register on that web page, or click here to contact us.

PS If you lead a group of 6 or more colleagues, save money and customize content via a private workshop. For a quote, please contact us.

“Quantipulation: The art of using unverifiable math and statistics to convince people of what you want them to believe.”

—Ron Shevlin, “Don’t Trust Those Numbers”, Forbes, 5/21/20. He also warns about “samplification” of results from non-representative data, advising that you always establish the provenance of statistics.

Stat-Ease, Design-Expert and Statistics Made Easy are registered trademarks of Stat-Ease, Inc.

Circulation: Over 4000 worldwide