Identifiability (Reading: Faraway(2005, 1st edition), 2.9)

 

Now, consider the savings data we analyzed in previous lab:

> saving.x <- read.table("saving.x",header=T) # read the data into R

> p15 <- saving.x[,1]; p75 <- saving.x[,2]; inc <- saving.x[,3]; gro <- saving.x[,4]; sav <- saving.x[,5] # assign columns
 

We now create a new variable

pa=100-p15-p75.

Q: What does this variable represent in reality?

> pa <- 100-p15-p75 # calculate pa
> g <- lm(sav~pa+p15+p75+inc+gro) # fit a model with pa and all other predictors
> summary(g) # take a look of the fitted model

Call:

lm(formula = sav ~ pa + p15 + p75 + inc + gro)

 

Residuals:

    Min      1Q  Median      3Q     Max

-8.2423 -2.6859 -0.2491  2.4278  9.7509

 

Coefficients: (1 not defined because of singularities)

              Estimate Std. Error t value Pr(>|t|) 

(Intercept) -1.406e+02  1.025e+02  -1.372   0.1770 

pa           1.692e+00  1.084e+00   1.561   0.1255 

p15          1.230e+00  9.773e-01   1.259   0.2146 

p75                 NA         NA      NA       NA 

inc         -3.368e-04  9.311e-04  -0.362   0.7193 

gro          4.097e-01  1.962e-01   2.088   0.0425 *

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

Residual standard error: 3.803 on 45 degrees of freedom

Multiple R-Squared: 0.3385,     Adjusted R-squared: 0.2797

F-statistic: 5.756 on 4 and 45 DF,  p-value: 0.0007902

Note that

 

If we didn't know which linear combination was causing the trouble, how would we find out? An eigen-decomposition of XTX can help:

> e <- eigen(xtx) # compute the eigen-decomposition, the "eigen" command returns eigenvalues and eigenvectors of a matrix. Use "help(eigen)" to get more information.
> e$values # the eigenvalues of XTX

[1] 1.095112e+08 1.096867e+05 3.190823e+03 3.739541e+02 1.368720e+01

[6] 7.288675e-13

> signif(e$vectors,3) # the eigenvectors of XTX, the "signif" command rounds the values to the specified number of significant digits

          [,1]    [,2]     [,3]      [,4]     [,5]      [,6]

[1,] -0.000506 -0.0141  0.00125 -0.000603  0.00989  1.00e+00

[2,] -0.034300 -0.7940 -0.59700 -0.098100 -0.05630 -1.00e-02

[3,] -0.014700 -0.6040  0.79500  0.031000  0.04800 -1.00e-02

[4,] -0.001610 -0.0164 -0.07310  0.006840  0.99700 -1.00e-02

[5,] -0.999000  0.0363  0.00906  0.001170 -0.00036 -2.56e-17

[6,] -0.001740 -0.0594 -0.08310  0.995000 -0.01390 -1.22e-14

 


Notice that

To demonstrate this, suppose we add a small random perturbation from U[-0.005,0.005] to the 3rd decimal place of variable pa, where U denotes the uniform distribution, i.e.,

pa=100-p15-p75+U:

> pae <- pa+0.001*(runif(50)-0.5) # add the small perturbation to pa, the "runif" command generate Uniform[0, 1] sudo random numbers

> p15+p75+pae # take a look of the sum of p15, p75, and pae.

 [1] 100.00002 100.00008  99.99992 100.00038 100.00006  99.99967 100.00046

 [8]  99.99957  99.99972 100.00021 100.00036 100.00009  99.99993 100.00027

[15] 100.00021 100.00005 100.00022 100.00042  99.99995  99.99995 100.00041

[22]  99.99982 100.00033 100.00019  99.99973 100.00005  99.99997 100.00005

[29] 100.00043  99.99978  99.99965  99.99952 100.00004  99.99969  99.99956

[36] 100.00036  99.99972 100.00036  99.99970  99.99959  99.99987 100.00043

[43]  99.99993  99.99993  99.99951  99.99972 100.00021 100.00017  99.99953

[50] 100.00045

> ge <- lm(sav~pae+p15+p75+inc+gro) # now, fit the same model again
> summary(ge, cor=T) # take a look of the fitted model

Call:

lm(formula = sav ~ pae + p15 + p75 + inc + gro)

 

Residuals:

    Min      1Q  Median      3Q     Max

-8.0361 -2.7051 -0.2360  2.3631  9.6476

 

Coefficients:

              Estimate Std. Error t value Pr(>|t|) 

(Intercept)  4.463e+04  1.934e+05   0.231   0.8186 

pae         -4.460e+02  1.934e+03  -0.231   0.8187 

p15         -4.465e+02  1.934e+03  -0.231   0.8185 

p75         -4.477e+02  1.934e+03  -0.231   0.8180 

inc         -3.817e-04  9.610e-04  -0.397   0.6932 

gro          4.028e-01  2.005e-01   2.009   0.0507 .

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

Residual standard error: 3.843 on 44 degrees of freedom

Multiple R-Squared: 0.3393,     Adjusted R-squared: 0.2642

F-statistic: 4.518 on 5 and 44 DF,  p-value: 0.002071

 

Correlation of Coefficients:

    (Intercept) pae   p15   p75   inc 

pae -1.00                             

p15 -1.00        1.00                 

p75 -1.00        1.00  1.00           

inc -0.20        0.20  0.20  0.20     

gro -0.15        0.15  0.15  0.15  0.28

Note that (compared with model g)