# Identifiability (Reading: Faraway(2005, 1st edition), 2.9)

Now, consider the savings data we analyzed in previous lab:

> p15 <- saving.x[,1]; p75 <- saving.x[,2]; inc <- saving.x[,3]; gro <- saving.x[,4]; sav <- saving.x[,5] # assign columns

We now create a new variable

pa=100-p15-p75.

Q: What does this variable represent in reality?

> pa <- 100-p15-p75 # calculate pa
> g <- lm(sav~pa+p15+p75+inc+gro) # fit a model with pa and all other predictors
> summary(g) # take a look of the fitted model

Call:

lm(formula = sav ~ pa + p15 + p75 + inc + gro)

Residuals:

Min      1Q  Median      3Q     Max

-8.2423 -2.6859 -0.2491  2.4278  9.7509

Coefficients: (1 not defined because of singularities)

Estimate Std. Error t value Pr(>|t|)

(Intercept) -1.406e+02  1.025e+02  -1.372   0.1770

pa           1.692e+00  1.084e+00   1.561   0.1255

p15          1.230e+00  9.773e-01   1.259   0.2146

p75                 NA         NA      NA       NA

inc         -3.368e-04  9.311e-04  -0.362   0.7193

gro          4.097e-01  1.962e-01   2.088   0.0425 *

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.803 on 45 degrees of freedom

Multiple R-Squared: 0.3385,     Adjusted R-squared: 0.2797

F-statistic: 5.756 on 4 and 45 DF,  p-value: 0.0007902

Note that

• we get a message about one undefined coefficient, the term p75,

• It is because the rank of the X-matrix is only 5, which is less than its 6 columns.

• R automatically fits a reduced model when XTX is singular.

• Let's take a look of the X-matrix.

> x <- model.matrix(g) # form the X matrix

> xtx <- t(x) %*% x # compute XTX
> solve(xtx) # XTX cannot be inverted

錯誤在solve.default(xtx) : 系統計算上是獨特的: 互反條件數=3.88081e-023

• Check whether the β-hats and t-tests for inc and gro are affected (compare with the result in the previous lab)

If we didn't know which linear combination was causing the trouble, how would we find out? An eigen-decomposition of XTX can help:

> e <- eigen(xtx) # compute the eigen-decomposition, the "eigen" command returns eigenvalues and eigenvectors of a matrix. Use "help(eigen)" to get more information.
> e\$values # the eigenvalues of XTX

 1.095112e+08 1.096867e+05 3.190823e+03 3.739541e+02 1.368720e+01

 7.288675e-13

• Note that only the last eigenvalue is zero (within calculation error), indicating only one linear combination existing in the columns of X.

> signif(e\$vectors,3) # the eigenvectors of XTX, the "signif" command rounds the values to the specified number of significant digits

[,1]    [,2]     [,3]      [,4]     [,5]      [,6]

[1,] -0.000506 -0.0141  0.00125 -0.000603  0.00989  1.00e+00

[2,] -0.034300 -0.7940 -0.59700 -0.098100 -0.05630 -1.00e-02

[3,] -0.014700 -0.6040  0.79500  0.031000  0.04800 -1.00e-02

[4,] -0.001610 -0.0164 -0.07310  0.006840  0.99700 -1.00e-02

[5,] -0.999000  0.0363  0.00906  0.001170 -0.00036 -2.56e-17

[6,] -0.001740 -0.0594 -0.08310  0.995000 -0.01390 -1.22e-14

• The last column, which corresponds to the zero eigenvalue, shows that

1-(0.01)pa-(0.01)p15-(0.01)p75=0,

which is equivalent to

100-pa-p15-p75=0,

is the offending combination.

Notice that

• lack of identifiablity is obviously a problem but it's easy to identify and work around

• more problematic are cases where we are "close to unidentifiability".

To demonstrate this, suppose we add a small random perturbation from U[-0.005,0.005] to the 3rd decimal place of variable pa, where U denotes the uniform distribution, i.e.,

pa=100-p15-p75+U:

> pae <- pa+0.001*(runif(50)-0.5) # add the small perturbation to pa, the "runif" command generate Uniform[0, 1] sudo random numbers

> p15+p75+pae # take a look of the sum of p15, p75, and pae.

 100.00002 100.00008  99.99992 100.00038 100.00006  99.99967 100.00046

  99.99957  99.99972 100.00021 100.00036 100.00009  99.99993 100.00027

 100.00021 100.00005 100.00022 100.00042  99.99995  99.99995 100.00041

  99.99982 100.00033 100.00019  99.99973 100.00005  99.99997 100.00005

 100.00043  99.99978  99.99965  99.99952 100.00004  99.99969  99.99956

 100.00036  99.99972 100.00036  99.99970  99.99959  99.99987 100.00043

  99.99993  99.99993  99.99951  99.99972 100.00021 100.00017  99.99953

 100.00045

• Notice that they are not 100 any more, but very very close.

> ge <- lm(sav~pae+p15+p75+inc+gro) # now, fit the same model again
> summary(ge, cor=T) # take a look of the fitted model

Call:

lm(formula = sav ~ pae + p15 + p75 + inc + gro)

Residuals:

Min      1Q  Median      3Q     Max

-8.0361 -2.7051 -0.2360  2.3631  9.6476

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)  4.463e+04  1.934e+05   0.231   0.8186

pae         -4.460e+02  1.934e+03  -0.231   0.8187

p15         -4.465e+02  1.934e+03  -0.231   0.8185

p75         -4.477e+02  1.934e+03  -0.231   0.8180

inc         -3.817e-04  9.610e-04  -0.397   0.6932

gro          4.028e-01  2.005e-01   2.009   0.0507 .

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.843 on 44 degrees of freedom

Multiple R-Squared: 0.3393,     Adjusted R-squared: 0.2642

F-statistic: 4.518 on 5 and 44 DF,  p-value: 0.002071

Correlation of Coefficients:

(Intercept) pae   p15   p75   inc

pae -1.00

p15 -1.00        1.00

p75 -1.00        1.00  1.00

inc -0.20        0.20  0.20  0.20

gro -0.15        0.15  0.15  0.15  0.28

Note that (compared with model g)

• all parameters can be estimated

• the standard errors of pae, p15, and p75, become very large because we cannot estimate them in a stable way

• also, pay attention to the correlation of β-hat, which show the sign of strong collinearity

• Please do an eigen-decomposition of XTX for the case by yourself and interpret your result.