FAQ
R-Squareds too low for useful predictive model?
Original question from a Senior Application Development Engineer / Lean Sigma Blackbelt:
“My Fit Statistics only give me a 0.25 raw R-Squared and 0.25 Adjusted R-Squared. Why would I trust anything with R-Squareds that low? Yet my Stat-Ease software says ‘this model can be used to navigate the design space.’ Really?”
Answer:
First off, shift your focus slightly on R-Squareds (which are a bit low for a designed experiment): Disregard the raw one, pay some attention to the adjusted one, and use the Predicted R-Squared as your bottom line on adequacy of fit. For a short-and-sweet breakdown of these three ‘flavors’ of R-Squared, see this Stat-Ease blog.
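To make the three flavors concrete, here is a minimal sketch in Python using the textbook definitions for a straight-line fit. It is an illustration only, not Stat-Ease output: the function name `r_squared_flavors` and the single-factor model are assumptions made to keep the example short. Predicted R-Squared is computed from PRESS, the sum of squared leave-one-out prediction errors.

```python
def r_squared_flavors(x, y):
    """Raw, adjusted, and predicted R-squared for the fit y = b0 + b1*x.

    Hypothetical helper for illustration; assumes one factor and no missing data.
    """
    n = len(x)
    p = 2  # model coefficients, counting the intercept
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)

    # Leverage of each run; e / (1 - h) is the error made when that run
    # is left out of the fit and then predicted.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    press = sum((e / (1 - h)) ** 2 for e, h in zip(residuals, leverage))

    raw = 1 - ss_res / ss_tot
    adjusted = 1 - (ss_res / (n - p)) / (ss_tot / (n - 1))
    predicted = 1 - press / ss_tot
    return raw, adjusted, predicted


raw, adjusted, predicted = r_squared_flavors([0, 1, 2, 3], [0, 1, 1, 3])
print(f"raw={raw:.3f} adjusted={adjusted:.3f} predicted={predicted:.3f}")
```

The raw statistic can only go up as terms are added, the adjusted one penalizes extra terms, and the predicted one asks how well the model forecasts runs it has not seen, which is why it serves as the bottom line.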
Based on the screenshots of the ANOVA and fit statistics you provided, I agree with the software’s take that this model might be useful: the p-value is very significant and the adequate precision is high, well above our guideline of 4, which triggers the positive review. I’ve often seen R-squareds as low as yours from models that did prove to be useful. This happens with highly variable results, such as one might encounter with crude measurements, poorly controlled processes, people-dependent systems and so forth. Send me your file for a more thorough look. Otherwise, if the conclusions seem sensible, be sure to confirm them with follow-up runs.
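For readers who want a feel for the adequate precision guideline, the sketch below computes a signal-to-noise ratio for a straight-line fit. The formula used here is an assumption on my part: the range of the predicted values divided by the square root of the average prediction variance, taken as p times the residual mean square divided by n. Check the software's help for the exact definition it uses. The function name `adequate_precision` is hypothetical.

```python
def adequate_precision(x, y):
    """Signal-to-noise ratio for the fit y = b0 + b1*x.

    Assumed form: (max fitted - min fitted) / sqrt(p * MSE / n).
    Hypothetical helper for illustration only.
    """
    n = len(x)
    p = 2  # model coefficients, counting the intercept
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = ss_res / (n - p)            # residual mean square
    avg_pred_variance = p * mse / n   # average variance of the predictions
    return (max(fitted) - min(fitted)) / avg_pred_variance ** 0.5


ratio = adequate_precision([0, 1, 2, 3], [0, 1, 1, 3])
print(f"signal-to-noise ratio: {ratio:.2f} (guideline: greater than 4)")
```

A ratio above 4 says the spread in the predictions stands out from their uncertainty, so a model can pass this check even when noisy responses hold the R-squareds down.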
Further details (provided along with the data file):
“For what it’s worth, this is not a formal DOE—I imported existing data. There's a lot of missing responses (which doesn't help) due to the data coming from various sources throughout the plant. Even so, I hope to get something meaningful out of it.”
Follow-up answer after getting the rest of the story:
OK, I see what you are up to now. That explains the low R-squareds. Looking at Evaluation, Results for Model Terms, the quality of this existing data is not too bad based on the VIFs (variance inflation factors). It’s far better than the VIFs for Stat-Ease software’s tutorial case on historical data (a worst-case example). My feeling is that there is a good chance that you will find the results useful, despite it being an unplanned “experiment.” Time will tell.
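As a rough illustration of what a VIF measures, the sketch below uses a standard identity: the VIFs are the diagonal entries of the inverse of the correlation matrix of the model-term columns. This is a generic calculation, not the software's own routine, and the helper names (`correlation_matrix`, `invert`, `vifs`) and the small data sets are made up for the example.

```python
def correlation_matrix(columns):
    """Pearson correlations between model-term columns (lists of equal length)."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("terms are perfectly collinear; VIF is infinite")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(columns):
    """VIF for each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# A planned two-level factorial: the factors are orthogonal.
print(vifs([[-1, 1, -1, 1], [-1, -1, 1, 1]]))
# Made-up "happenstance" data: the two factors drift together.
print(vifs([[1, 2, 3, 4], [1, 2, 3, 5]]))
```

A VIF of 1 is ideal, as in an orthogonal designed experiment. Historical data gathered without a plan tends to have factors that move together, which inflates the VIFs and makes the individual effects harder to separate.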
(Learn more about model-fit statistics by enrolling in the next Mixture Design for Optimal Formulations and/or Modern DOE for Process Optimization workshop.)