Here is an example of an experiment to determine the effects of
column temperature (temp),
gas/liquid ratio (gas), and
packing height (pack)
on reducing the unpleasant odor (odor) of a chemical product that was being sold for household use.
> odor <- read.table("odor.data", header=T) # read in the data
> odor # take a look
The three predictors have been transformed from their original scales of measurement. For example, temp is coded as (Fahrenheit - 80)/40, so the coded values -1, 0, 1 correspond to original temperatures of 40, 80, and 120.
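The coding follows the usual pattern (original - center)/half-range; a quick sketch for the temperature variable, with the center 80 and half-range 40 read off the rescaling command used later in this session:

```r
# coded value = (original - center) / half-range
# for temperature: center 80, half-range 40
(c(40, 80, 120) - 80) / 40   # -1 0 1
```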
The data are presented in John (1971), Statistical Design and Analysis of Experiments, and give an example of a central composite design.
Suppose that we are interested in fitting the model:
odor = β0 + β1(temp) + β2(gas) + β3(pack) + ε
The X-matrix is:
> x <- as.matrix(cbind(rep(1,15),odor[,-1])) # remember to include the constant term
> x # take a look
Check whether the inner product of any two distinct columns is zero.
Here is the XTX matrix:
> t(x)%*%x # calculate XTX
The XTX is diagonal because of orthogonality.
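A compact way to check this all at once (a sketch continuing the session, with x as built above):

```r
xtx <- crossprod(x)            # same as t(x) %*% x
all(xtx[upper.tri(xtx)] == 0)  # TRUE exactly when all columns are mutually orthogonal
```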
Now, let's check what would happen if the temp term is analyzed on its original Fahrenheit scale.
> x[, 2] <- odor[, 2]*40+80 # change temp to its original scale
> x # take a look at the new model matrix
> t(x)%*%x # calculate XTX
Q: Why are the constant and temp terms no longer orthogonal?
Q: Why are the temp, gas, and pack terms still orthogonal to one another?
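A hint for the first question: the coded temp column summed to zero, but the Fahrenheit column does not. Centering it restores orthogonality with the constant (a sketch continuing the session, with x holding the rescaled temp column):

```r
xc <- x
xc[, 2] <- xc[, 2] - mean(xc[, 2])  # subtract the mean (80) from the Fahrenheit column
t(xc) %*% xc                        # the constant/temp entry is zero again
```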
Now fit the model.
> g <- lm(odor~temp+gas+pack, data=odor) # fit the model
> summary(g,cor=T) # take a look at the fitted model
Check out the correlation of the coefficients --- why did that happen?
Also, note that the standard errors for the three coefficients are equal, due to the balanced design.
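To see where the equal standard errors come from, recall that se(β-hat) is sigma times the square root of the diagonal of (X'X)^-1; a sketch continuing the session, using model.matrix() to recover the coded design matrix from the fit:

```r
x0 <- model.matrix(g)                                # the coded design matrix used by g
sqrt(diag(solve(crossprod(x0)))) * summary(g)$sigma  # reproduces the summary's std. errors
```

Because the design is balanced, the last three diagonal entries of (X'X)^-1 are identical, so the three slope standard errors match.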
Now, let us examine the effect of dropping variables when orthogonality exists.
> g1 <- lm(odor~gas+pack,data=odor) # drop temp and fit a reduced model
> summary(g1) # take a look at the fitted model
Compare the summary of the model g1 and that of model g.
Q: Which things have changed and which stayed the same? Explain why.
The estimated values of the β's do not change. The residual standard error does change slightly, which causes small changes in the standard errors of the β-hats, the t-values, and the p-values. But, in this case, these changes are not large enough to affect our qualitative conclusions.
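You can confirm the unchanged estimates directly (continuing the session; all.equal allows for floating-point rounding):

```r
all.equal(coef(g)[c("gas", "pack")], coef(g1)[c("gas", "pack")])  # TRUE: same estimates
```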
Can orthogonality still hold if we fit a more complicated model, such as:
model 1: odor = β0 + β1(temp) + β2(gas) + β3(pack) + β4(temp*gas) + β5(temp*pack) + β6(gas*pack) + ε, or
model 2: odor = β0 + β1(temp) + β2(gas) + β3(pack) + β4(temp*gas) + β5(temp*pack) + β6(gas*pack) + β7(temp^2) + β8(gas^2) + β9(pack^2) + ε
> summary(lm(odor~temp+gas+pack+I(temp*gas)+I(temp*pack)+I(gas*pack), data=odor), cor=T) # fit model 1
> summary(lm(odor~temp+gas+pack+I(temp*gas)+I(temp*pack)+I(gas*pack)+I(temp^2)+I(gas^2)+I(pack^2), data=odor), cor=T) # fit model 2
Take a guess before you read the results in R. A good design should keep the terms in the model as orthogonal as possible, even as the fitted model becomes more and more complicated.
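One way to answer for yourself: build the design matrix for model 2 and inspect X'X; any zero off-diagonal entry marks a pair of terms that stayed orthogonal (a sketch continuing the session):

```r
x2 <- model.matrix(~ temp + gas + pack + I(temp*gas) + I(temp*pack) +
                     I(gas*pack) + I(temp^2) + I(gas^2) + I(pack^2), data=odor)
round(crossprod(x2), 10)   # zero entries = pairs of terms still orthogonal
```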