Orthogonality (Reading: Faraway(2005, 1st edition), 3.6)

 

Here is an example of an experiment to determine the effects of

in reducing unpleasant odor (odor) of chemical product that was being sold for household use.

 

> odor <- read.table("odor.data", header=T) # read in the data
> odor # take a look

   odor temp gas pack

1    66   -1  -1    0

2    39    1  -1    0

3    43   -1   1    0

4    49    1   1    0

5    58   -1   0   -1

6    17    1   0   -1

7    -5   -1   0    1

8   -40    1   0    1

9    65    0  -1   -1

10    7    0   1   -1

11   43    0  -1    1

12  -22    0   1    1

13  -31    0   0    0

14  -35    0   0    0

15  -26    0   0    0

Notice that

 

Suppose that we are interested in fitting the model:

odor=β0+β1(temp)+β2(gas)+β3(pack)+ε.

The X-matrix is:

> x <- as.matrix(cbind(rep(1,15),odor[,-1])) # remember to include the constant term
> x # take a look

   rep(1, 15) temp gas pack

1           1   -1  -1    0

2           1    1  -1    0

3           1   -1   1    0

4           1    1   1    0

5           1   -1   0   -1

6           1    1   0   -1

7           1   -1   0    1

8           1    1   0    1

9           1    0  -1   -1

10          1    0   1   -1

11          1    0  -1    1

12          1    0   1    1

13          1    0   0    0

14          1    0   0    0

15          1    0   0    0

Check whether the inner product of any two columns is zero.

 

Here is the XTX matrix:

> t(x)%*%x # calculate XTX

           rep(1, 15) temp gas pack

rep(1, 15)         15    0   0    0

temp                0    8   0    0

gas                 0    0   8    0

pack                0    0   0    8

Note that

Now fit the model.

> g <- lm(odor~temp+gas+pack, data=odor) # fit the model
> summary(g,cor=T) # take a look of the fitted model

Call:

lm(formula = odor ~ temp + gas + pack, data = odor)

 

Residuals:

    Min      1Q  Median      3Q     Max

-50.200 -17.138   1.175  20.300  62.925

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)

(Intercept)   15.200      9.298   1.635    0.130

temp         -12.125     12.732  -0.952    0.361

gas          -17.000     12.732  -1.335    0.209

pack         -21.375     12.732  -1.679    0.121

 

Residual standard error: 36.01 on 11 degrees of freedom

Multiple R-Squared: 0.3337,     Adjusted R-squared: 0.1519

F-statistic: 1.836 on 3 and 11 DF,  p-value: 0.1989

 

Correlation of Coefficients:

     (Intercept) temp gas

temp 0.00                

gas  0.00        0.00    

pack 0.00        0.00 0.00

Now, let us examine the effect of dropping variables when orthogonality exists.

> g1 <- lm(odor~gas+pack,data=odor) # drop temp and fit a model 
> summary(g1) # take a look of the fitted model

Call:

lm(formula = odor ~ gas + pack, data = odor)

 

Residuals:

    Min      1Q  Median      3Q     Max

-50.200 -26.700   1.175  26.800  50.800

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)

(Intercept)   15.200      9.262   1.641    0.127

gas          -17.000     12.683  -1.340    0.205

pack         -21.375     12.683  -1.685    0.118

 

Residual standard error: 35.87 on 12 degrees of freedom

Multiple R-Squared: 0.2787,     Adjusted R-squared: 0.1585

F-statistic: 2.319 on 2 and 12 DF,  p-value: 0.1408

Compare the summary of the model g1 and that of model g.

Can orthogonality still hold if we fit a more complicate model such as:

model 1: odor =  β0 + β1(temp) + β2(gas) + β3(pack) + β4(temp*gas) + β5(temp*pack) + β6(gas*pack) + ε, or
model 2: odor =  β0 + β1(temp) + β2(gas) + β3(pack) + β4(temp*gas) + β5(temp*pack) + β6(gas*pack) + β7(temp2) + β8(gas2) + β9(pack2) + ε

> summary(lm(odor~temp+gas+pack+I(temp*gas)+I(temp*pack)+I(gas*pack), data=odor), cor=T) # fit model 1
> summary(lm(odor~temp+gas+pack+I(temp*gas)+I(temp*pack)+I(gas*pack)+I(temp^2)+I(gas^2)+I(pack^2), data=odor), cor=T) # fit model 2
Take a guess before you read the results in R. A good design should have the ability to keep terms in the model as orthogonal as possible when the fitted model becomes more and more complicate.