MODEL FLEXIBILITY COMES FROM ADDING TERMS

Dave Doehlert

[---------------------------------------------------------------------]

The purpose of a model in DOE (design of experiments) is to provide a way to interpolate between your data points. To save money and time, you want to run as few trials as possible, yet you still want to know what would happen between the runs that you made.

So you choose a model, fit it to the data, and use it to predict the response at combinations of your factors that lie between the runs you made.

To get better predictions, you need a better model. I will start out here with the simplest model and work up to the models that work well for many DOE experimenters. Then I will show how to go beyond the usual models if you have the budget and the time to do better than your competitors.

Let's keep things simple by acting as if only one response is measured on your product, and that is the response you wish to improve. In this example, call it Y.

The simplest possible model is Y = (b0), in which (b0) is a constant no matter how the factors involved are varied. It would be absurd to believe that in your process the response doesn't change when you change the factors. But this conceptually ultrasimple model helps in understanding the more sensible and realistic models.

To fit the model Y = (b0), one would make a run at any setting of the factors and measure Y. The resulting number for Y is then (b0). And then, all predictions for all settings of the factors are: Y = (b0). (Your estimate of (b0) could be improved by averaging several replicates of that setting.)
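As a minimal sketch of the constant model, assuming three made-up replicate measurements at one setting (the numbers and the name predict_constant are illustrative only):

```python
import numpy as np

# Hypothetical replicate measurements of Y at one fixed factor setting.
y_replicates = np.array([10.2, 9.8, 10.0])

# For the model Y = b0, the least-squares estimate of b0 is the average.
b0 = y_replicates.mean()

def predict_constant(x1):
    """Predict Y at any setting of x1; the setting is ignored by this model."""
    return b0

# The prediction is the same number wherever you ask for it.
print(predict_constant(-1.0), predict_constant(0.3), predict_constant(1.0))
```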

But you are experimenting because there is at least one factor that you can change that will change Y; call it (x1). The simplest effect for (x1) would be that increasing (x1) increases or decreases Y. The model for this possibility is Y = (b0) + (b1)(x1). Here (b1) can be a positive or negative number. To measure (b0) and (b1), you would make two runs, one each at a low and a high value of (x1), usually coded to -1 and +1. Then the difference in the responses obtained (high minus low), divided by 2, is (b1), the slope; and the average, sum/2, is (b0). This is a two-term model in one factor.
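The two-run arithmetic can be sketched as follows, assuming (x1) is coded to -1 and +1. The function name fit_two_run_line and the responses 8.0 and 12.0 are made up for illustration.

```python
def fit_two_run_line(y_low, y_high):
    """Fit Y = b0 + b1*x1 from one run at x1 = -1 and one run at x1 = +1."""
    b0 = (y_high + y_low) / 2.0  # average of the two responses
    b1 = (y_high - y_low) / 2.0  # half the difference: slope per coded unit
    return b0, b1

# Hypothetical responses at the low and high settings.
b0, b1 = fit_two_run_line(y_low=8.0, y_high=12.0)

def predict(x1):
    """Interpolate Y at a coded setting x1 between -1 and +1."""
    return b0 + b1 * x1

print(b0, b1, predict(0.5))
```

With these numbers (b0) is 10 and (b1) is 2, and the fitted line passes exactly through both runs, as it must when two coefficients are fitted to two points.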

If there are two factors, then each can have a slope: Y = (b0) + (b1)(x1) + (b2)(x2). Fitting these 3 b's requires at least 3 observations.
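Here is a hedged sketch of fitting the three-term model from exactly three runs. The choice of runs and the responses are assumptions made for illustration; any three runs that do not lie on one line in the (x1), (x2) plane would do.

```python
import numpy as np

# Three hypothetical runs in coded units; columns are x1 and x2.
runs = np.array([[-1.0, -1.0],
                 [ 1.0, -1.0],
                 [-1.0,  1.0]])
y = np.array([5.0, 9.0, 7.0])  # made-up responses, one per run

# Model matrix: a column of ones for b0, then x1 and x2.
X = np.column_stack([np.ones(len(runs)), runs])

# Three equations in three unknowns: solve exactly for b0, b1, b2.
b0, b1, b2 = np.linalg.solve(X, y)
print(b0, b1, b2)
```

With as many runs as coefficients, the model reproduces the data exactly and leaves nothing over for judging how well it fits; extra runs would be needed for that.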

There is a possibility that the slope (b1) changes when changes are made in the level of (x2) at which (b1) is measured. This can be modeled by Y = (b0) + (b1)(x1) + (b2)(x2) + (b12)(x1)(x2). At least four data points are needed because you have four b coefficients in the model. Now the model is more flexible. It can come to meet your data and provide better predictions (which are interpolations).
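The interaction model can be sketched with the four corner runs of a two-factor, two-level design. The responses below are invented, and slope_in_x1 is a hypothetical helper showing how the slope in (x1) depends on the level of (x2).

```python
import numpy as np

# Four hypothetical corner runs in coded units; columns are x1 and x2.
runs = np.array([[-1.0, -1.0],
                 [ 1.0, -1.0],
                 [-1.0,  1.0],
                 [ 1.0,  1.0]])
y = np.array([5.0, 9.0, 7.0, 15.0])  # made-up responses

x1, x2 = runs[:, 0], runs[:, 1]

# Model matrix columns: constant, x1, x2, and the product x1*x2.
X = np.column_stack([np.ones(len(runs)), x1, x2, x1 * x2])

# Four equations, four unknowns.
b0, b1, b2, b12 = np.linalg.solve(X, y)

def slope_in_x1(x2_level):
    """Slope of Y with respect to x1 at a given coded level of x2."""
    return b1 + b12 * x2_level

print(b0, b1, b2, b12)
print(slope_in_x1(-1.0), slope_in_x1(1.0))
```

In this example the slope in (x1) is 2 at the low level of (x2) and 4 at the high level, which is exactly the behavior the (b12) term allows for.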

You might expect that a simple linear slope, (b1)(x1), is too simple for your process. You are quite likely right. Although the model above predicts well in some processes, it does not fit data in others. To make it more flexible, you can allow for curvilinear interpolation by adding squared terms: (b11)(x1)(x1) + (b22)(x2)(x2). For each term added, the minimum number of runs increases by one. (To estimate a squared term, that factor must also be run at three or more levels, since two points cannot reveal curvature.)
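A minimal sketch of fitting the full two-factor model with squared terms by least squares follows. It assumes a three-level grid of nine runs, which is only one of several possible designs, and it generates the responses from made-up coefficients (b_true) purely so that the fit can be checked.

```python
import itertools
import numpy as np

def quadratic_model_matrix(points):
    """Columns: 1, x1, x2, x1*x2, x1*x1, x2*x2 for each run."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones(len(points)), x1, x2,
                            x1 * x2, x1 * x1, x2 * x2])

# Assumed design: every combination of three coded levels for two factors.
levels = [-1.0, 0.0, 1.0]
points = np.array(list(itertools.product(levels, levels)))

# Made-up coefficients (b0, b1, b2, b12, b11, b22) used only to
# generate noise-free example responses.
b_true = np.array([10.0, 2.0, 1.0, 0.5, -3.0, -2.0])
X = quadratic_model_matrix(points)
y = X @ b_true

# Least-squares fit of six coefficients from nine runs.
b_fit, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b_fit, 6))
```

Six runs is the minimum for six coefficients; the nine used here leave three extra runs, which with real (noisy) data would give some indication of how well the model fits.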

Now the interpolating response surface can be more than just a plane; it can be a dome or a basin or a saddle-shaped surface. Then, whatever the physical processes might be in your industrial application, the interpolating model has a decent chance of flexing to meet your data.
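Which of these shapes a fitted two-factor model has can be read from its second-order coefficients. The sketch below uses a standard linear-algebra check (the signs of the eigenvalues of the matrix of second-order coefficients); the function name classify_surface and the example coefficients are assumptions for illustration.

```python
import numpy as np

def classify_surface(b11, b22, b12, tol=1e-12):
    """Name the shape of Y = ... + b12*x1*x2 + b11*x1*x1 + b22*x2*x2."""
    # Symmetric matrix of second-order coefficients; the interaction
    # coefficient is split evenly between the two off-diagonal entries.
    second_order = np.array([[b11, b12 / 2.0],
                             [b12 / 2.0, b22]])
    eig = np.linalg.eigvalsh(second_order)  # eigenvalues, ascending order
    if np.all(eig < -tol):
        return "dome"    # curves downward in every direction: a maximum
    if np.all(eig > tol):
        return "basin"   # curves upward in every direction: a minimum
    if eig[0] < -tol and eig[1] > tol:
        return "saddle"  # up in one direction, down in another
    return "ridge or plane"  # no curvature in at least one direction

print(classify_surface(-3.0, -2.0, 0.5))
print(classify_surface(3.0, 2.0, 0.5))
print(classify_surface(1.0, -1.0, 0.0))
```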

Models with (xi), (xi)(xj) and (xi)(xi) terms are "quadratic models"; they are very often adequate for finding sweet spots. Run a few combinations, fit the model and predict the rest. Pick from the predictions the best performing combination of the factors: that's the sweet spot.
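"Predict the rest and pick the best" can be sketched as a grid search over the experimental region. The coefficients below are made up and stand in for a fitted quadratic model; the sketch also assumes that a larger Y is better and that the region is the coded square from -1 to +1.

```python
import numpy as np

# Hypothetical fitted coefficients: b0, b1, b2, b12, b11, b22.
b0, b1, b2, b12, b11, b22 = 10.0, 2.0, 1.0, 0.5, -3.0, -2.0

def predict(x1, x2):
    """Quadratic model prediction at coded settings x1, x2."""
    return (b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
            + b11 * x1 * x1 + b22 * x2 * x2)

# Predict on a fine grid covering the region that was actually run.
grid = np.linspace(-1.0, 1.0, 41)
g1, g2 = np.meshgrid(grid, grid, indexing="ij")
y_hat = predict(g1, g2)

# Locate the grid point with the highest predicted response.
i, j = np.unravel_index(np.argmax(y_hat), y_hat.shape)
best_x1, best_x2, best_y = g1[i, j], g2[i, j], y_hat[i, j]
print(best_x1, best_x2, best_y)
```

Keeping the grid inside the range of the runs keeps the search an interpolation; a predicted sweet spot is worth confirming with an actual run at that setting.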

If you fit quadratic models to your processes, you will almost certainly find better sweet spots than those of your competitors who don't use DOE. However, some processes are more complex than that.
