Testing for lack of fit (Reading: Faraway(2005, 1st edition), 6.3)

 


Variance known

 

Let's first fit the weighted least square model.

> p <- read.table("phy.data", header=T) # read the data into R

> p # take a look of the data

   momentum energy crossx sd

1         4  0.345    367 17

2         6  0.287    311  9

3         8  0.251    295  9

...deleted...

10      150  0.060    192  5

> x <- p[,2]; y <- p[,3]; w <- 1/p[,4]^2 # extract variables and calculate weights

> g <- lm(y~x,weights=w) # fit a simple regression model with w as weights
> summary(g) # take a look of the fitted model

Call:

lm(formula = y ~ x, weights = w)

 

Residuals:

       Min         1Q     Median         3Q        Max

-2.323e+00 -8.842e-01  1.266e-06  1.390e+00  2.335e+00

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  148.473      8.079   18.38 7.91e-08 ***

x            530.835     47.550   11.16 3.71e-06 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

Residual standard error: 1.657 on 8 degrees of freedom

Multiple R-Squared: 0.9397,     Adjusted R-squared: 0.9321

F-statistic: 124.6 on 1 and 8 DF,  p-value: 3.710e-06

 

Let us try to add a quadratic term, as suggested by the residual plot above, to the model and test again:

> g1 <- lm(y ~ x + I(x^2), weights=w) # fit a 2nd-order polynomial model
> summary(g1, cor=T) # take a look of the new fitted model

Call:

lm(formula = y ~ x + I(x^2), weights = w)

 

Residuals:

     Min       1Q   Median       3Q      Max

-0.89928 -0.43508  0.01374  0.37999  1.14238

 

Coefficients:

             Estimate Std. Error t value Pr(>|t|)   

(Intercept)  183.8305     6.4591  28.461  1.7e-08 ***

x              0.9709    85.3688   0.011 0.991243   

I(x^2)      1597.5047   250.5869   6.375 0.000376 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

Residual standard error: 0.6788 on 7 degrees of freedom

Multiple R-Squared: 0.9911,     Adjusted R-squared: 0.9886

F-statistic: 391.4 on 2 and 7 DF,  p-value: 6.554e-08

 

Correlation of Coefficients:

       (Intercept) x   

x      -0.94           

I(x^2)  0.86       -0.97

 


Variance unknown

> corrosion <- read.table("corrosion.data") # read the data into R

> corrosion # take a look of the data

     Fe  loss

1  0.01 127.6

2  0.48 124.0

3  0.71 110.8

4  0.95 103.9

5  1.19 101.5

6  0.01 130.1

7  0.48 122.0

8  1.44  92.3

9  0.71 113.1

10 1.96  83.7

11 0.01 128.0

12 1.44  91.4

13 1.96  86.2

> g <- lm(loss ~ Fe, data=corrosion) # Fit a simple linear model with loss as response and Fe as predictor
> summary(g) # take a look of the fitted model

Call:

lm(formula = loss ~ Fe, data = corrosion)

 

Residuals:

    Min      1Q  Median      3Q     Max

-3.7980 -1.9464  0.2971  0.9924  5.7429

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  129.787      1.403   92.52  < 2e-16 ***

Fe           -24.020      1.280  -18.77 1.06e-09 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

Residual standard error: 3.058 on 11 degrees of freedom

Multiple R-Squared: 0.9697,     Adjusted R-squared: 0.967

F-statistic: 352.3 on 1 and 11 DF,  p-value: 1.055e-09

 

To perform the test for lack of fit,

> ga <- lm(loss ~ factor(Fe), data=corrosion) # fit the saturated model, the "factor" command treats Fe as a qualitative predictor

> summary(ga) # take a look of the fitted saturated model

Call:

lm(formula = loss ~ factor(Fe), data = corrosion)

 

Residuals:

       Min         1Q     Median         3Q        Max

-1.250e+00 -9.667e-01  9.991e-17  1.000e+00  1.533e+00

 

Coefficients:

               Estimate Std. Error t value Pr(>|t|)   

(Intercept)     128.567      0.809 158.914 4.19e-12 ***

factor(Fe)0.48   -5.567      1.279  -4.352  0.00481 **

factor(Fe)0.71  -16.617      1.279 -12.990 1.28e-05 ***

factor(Fe)0.95  -24.667      1.618 -15.245 5.03e-06 ***

factor(Fe)1.19  -27.067      1.618 -16.728 2.91e-06 ***

factor(Fe)1.44  -36.717      1.279 -28.703 1.18e-07 ***

factor(Fe)1.96  -43.617      1.279 -34.097 4.24e-08 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

Residual standard error: 1.401 on 6 degrees of freedom

Multiple R-Squared: 0.9965,     Adjusted R-squared: 0.9931

F-statistic: 287.3 on 6 and 6 DF,  p-value: 4.152e-07

Notice that

 

Let's try to add more higher order polynomial terms one-by-one until the model pass the test for lack of fit:

> g1 <- lm(loss~Fe+I(Fe^2), data=corrosion) # fit a 2nd-order polynomial model
> anova(g1, ga)
# compute the ANOVA table for models g1 and ga

Analysis of Variance Table

 

Model 1: loss ~ Fe + I(Fe^2)

Model 2: loss ~ factor(Fe)

  Res.Df     RSS Df Sum of Sq      F   Pr(>F)  

1     10 100.086                               

2      6  11.782  4    88.305 11.243 0.005949 **

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> g2 <- lm(loss~Fe+I(Fe^2)+I(Fe^3), data=corrosion) # fit a 3rd-order polynomial model
> anova(g2, ga)
# compute the ANOVA table for models g2 and ga

Analysis of Variance Table

 

Model 1: loss ~ Fe + I(Fe^2) + I(Fe^3)

Model 2: loss ~ factor(Fe)

  Res.Df    RSS Df Sum of Sq      F  Pr(>F) 

1      9 45.051                             

2      6 11.782  3    33.269 5.6476 0.03506 *

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> g3 <- lm(loss~Fe+I(Fe^2)+I(Fe^3)+I(Fe^4), data=corrosion) # fit a 4th-order polynomial model
> anova(g3, ga)
# compute the ANOVA table for models g3 and ga

Analysis of Variance Table

 

Model 1: loss ~ Fe + I(Fe^2) + I(Fe^3) + I(Fe^4)

Model 2: loss ~ factor(Fe)

  Res.Df    RSS Df Sum of Sq      F  Pr(>F) 

1      8 34.975                             

2      6 11.782  2    23.194 5.9059 0.03822 *

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> g4 <- lm(loss~Fe+I(Fe^2)+I(Fe^3)+I(Fe^4)+I(Fe^5), data=corrosion) # fit a 5th-order polynomial model
> anova(g4, ga)
# compute the ANOVA table for models g4 and ga

Analysis of Variance Table

 

Model 1: loss ~ Fe + I(Fe^2) + I(Fe^3) + I(Fe^4) + I(Fe^5)

Model 2: loss ~ factor(Fe)

  Res.Df    RSS Df Sum of Sq      F Pr(>F)

1      7 14.465                          

2      6 11.782  1     2.683 1.3663 0.2868

Finally, the 5th-order polynomial model passes the test for lack of fit.