Orthogonality (Reading: Faraway(2005, 1st edition), 3.6)

Here is an example of an experiment to determine the effects of

• columns temperature (temp),

• gas/liquid ratio (gas), and

• packing height (pack)

in reducing unpleasant odor (odor) of chemical product that was being sold for household use.

> odor # take a look

odor temp gas pack

1    66   -1  -1    0

2    39    1  -1    0

3    43   -1   1    0

4    49    1   1    0

5    58   -1   0   -1

6    17    1   0   -1

7    -5   -1   0    1

8   -40    1   0    1

9    65    0  -1   -1

10    7    0   1   -1

11   43    0  -1    1

12  -22    0   1    1

13  -31    0   0    0

14  -35    0   0    0

15  -26    0   0    0

Notice that

• the three predictors have been transformed from their original scale of measurement, for example

temp=(Fahrenheit-80)/40,

so the original values of the predictor were 40, 80, 120.

• the data is presented in John (1971), Statistical Design and Analysis of Experiments, and give an example of a central composite design.

odor=β0+β1(temp)+β2(gas)+β3(pack)+ε.

The X-matrix is:

> x <- as.matrix(cbind(rep(1,15),odor[,-1])) # remember to include the constant term
> x # take a look

rep(1, 15) temp gas pack

1           1   -1  -1    0

2           1    1  -1    0

3           1   -1   1    0

4           1    1   1    0

5           1   -1   0   -1

6           1    1   0   -1

7           1   -1   0    1

8           1    1   0    1

9           1    0  -1   -1

10          1    0   1   -1

11          1    0  -1    1

12          1    0   1    1

13          1    0   0    0

14          1    0   0    0

15          1    0   0    0

Check whether the inner product of any two columns is zero.

Here is the XTX matrix:

> t(x)%*%x # calculate XTX

rep(1, 15) temp gas pack

rep(1, 15)         15    0   0    0

temp                0    8   0    0

gas                 0    0   8    0

pack                0    0   0    8

Note that

• The XTX is diagonal because of orthogonality.

• Now, let's check what would happen if the temp term is analyzed using it original Fahrenheit scale.

> x[, 2] <- odor[, 2]*40+80 # change temp to its original scale

> x # take a look of the new model matrix

rep(1, 15) temp gas pack

1           1   40  -1    0

2           1  120  -1    0

3           1   40   1    0

4           1  120   1    0

5           1   40   0   -1

6           1  120   0   -1

7           1   40   0    1

8           1  120   0    1

9           1   80  -1   -1

10          1   80   1   -1

11          1   80  -1    1

12          1   80   1    1

13          1   80   0    0

14          1   80   0    0

15          1   80   0    0

> t(x)%*%x # calculate XTX

rep(1, 15)   temp gas pack

rep(1, 15)         15   1200   0    0

temp             1200 108800   0    0

gas                 0      0   8    0

pack                0      0   0    8

• Q: Why is the constant and temp terms not orthogonal any more?

• Q: Why are temp, gas, and pack terms still orthogonal?

Now fit the model.

> g <- lm(odor~temp+gas+pack, data=odor) # fit the model
> summary(g,cor=T) # take a look of the fitted model

Call:

lm(formula = odor ~ temp + gas + pack, data = odor)

Residuals:

Min      1Q  Median      3Q     Max

-50.200 -17.138   1.175  20.300  62.925

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)   15.200      9.298   1.635    0.130

temp         -12.125     12.732  -0.952    0.361

gas          -17.000     12.732  -1.335    0.209

pack         -21.375     12.732  -1.679    0.121

Residual standard error: 36.01 on 11 degrees of freedom

Multiple R-Squared: 0.3337,     Adjusted R-squared: 0.1519

F-statistic: 1.836 on 3 and 11 DF,  p-value: 0.1989

Correlation of Coefficients:

(Intercept) temp gas

temp 0.00

gas  0.00        0.00

pack 0.00        0.00 0.00

• Check out the correlation of the coefficients --- why did that happen?

• Also, note that the standard error for the three coefficients are equal due to the balanced design.

Now, let us examine the effect of dropping variables when orthogonality exists.

> g1 <- lm(odor~gas+pack,data=odor) # drop temp and fit a model
> summary(g1) # take a look of the fitted model

Call:

lm(formula = odor ~ gas + pack, data = odor)

Residuals:

Min      1Q  Median      3Q     Max

-50.200 -26.700   1.175  26.800  50.800

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)   15.200      9.262   1.641    0.127

gas          -17.000     12.683  -1.340    0.205

pack         -21.375     12.683  -1.685    0.118

Residual standard error: 35.87 on 12 degrees of freedom

Multiple R-Squared: 0.2787,     Adjusted R-squared: 0.1585

F-statistic: 2.319 on 2 and 12 DF,  p-value: 0.1408

Compare the summary of the model g1 and that of model g.

• Q: Which things have changed and which stayed the same? Explain why.

• The estimated values of β's do not change.

• The residual standard error does change slightly, which causes small changes in the std. error of β-hat, t-values, and p-values.

• But, in this case, these changes are not large enough to affect our qualitative conclusions.

• Compare it with the analysis of saving data in the previous lab and examine the effect of dropping variables when orthogonality does not exist.

Can orthogonality still hold if we fit a more complicate model such as:

 model 1: odor = β0 + β1(temp) + β2(gas) + β3(pack) + β4(temp*gas) + β5(temp*pack) + β6(gas*pack) + ε, or model 2: odor = β0 + β1(temp) + β2(gas) + β3(pack) + β4(temp*gas) + β5(temp*pack) + β6(gas*pack) + β7(temp2) + β8(gas2) + β9(pack2) + ε

> summary(lm(odor~temp+gas+pack+I(temp*gas)+I(temp*pack)+I(gas*pack), data=odor), cor=T) # fit model 1
> summary(lm(odor~temp+gas+pack+I(temp*gas)+I(temp*pack)+I(gas*pack)+I(temp^2)+I(gas^2)+I(pack^2), data=odor), cor=T) # fit model 2
Take a guess before you read the results in R. A good design should have the ability to keep terms in the model as orthogonal as possible when the fitted model becomes more and more complicate.