Now, consider the savings data we analyzed in previous lab:
> saving.x <- read.table("saving.x",header=T) # read the data into R
> p15 <- saving.x[,1]; p75 <-
saving.x[,2]; inc <- saving.x[,3]; gro <- saving.x[,4];
sav <- saving.x[,5] # assign columns
We now create a new variable
Q: What does this variable represent in reality?
> pa <- 100-p15-p75 # calculate pa
> g <- lm(sav~pa+p15+p75+inc+gro) # fit a model with pa and all other predictors
> summary(g) # take a look of the fitted model
we get a message about one undefined coefficient, the term p75,
It is because the rank of the X-matrix is only 5, which is less than its 6 columns.
R automatically fits a reduced model when XTX is singular.
Let's take a look of the X-matrix.
> x <- model.matrix(g) # form the X matrix
> xtx <- t(x) %*% x # compute
> solve(xtx) # XTX cannot be inverted
Check whether the β-hats and t-tests for inc and gro are affected (compare with the result in the previous lab)
If we didn't know which linear combination was causing the trouble, how would we find out? An eigen-decomposition of XTX can help:
> e <- eigen(xtx) # compute the eigen-decomposition, the
"eigen" command returns eigenvalues and eigenvectors of a matrix. Use "help(eigen)"
to get more information.
> e$values # the eigenvalues of XTX
Note that only the last eigenvalue is zero (within calculation error), indicating only one linear combination existing in the columns of X.
> signif(e$vectors,3) # the eigenvectors of XTX, the "signif" command rounds the values to the specified number of significant digits
The last column, which corresponds to the zero eigenvalue, shows that
which is equivalent to
is the offending combination.
lack of identifiablity is obviously a problem but it's easy to identify and work around
more problematic are cases where we are "close to unidentifiability".
To demonstrate this, suppose we add a small random perturbation from U[-0.005,0.005] to the 3rd decimal place of variable pa, where U denotes the uniform distribution, i.e.,
> pae <- pa+0.001*(runif(50)-0.5) # add the small perturbation to pa, the "runif" command generate Uniform[0, 1] sudo random numbers
> p15+p75+pae # take a look of the sum of p15, p75, and pae.
Notice that they are not 100 any more, but very very close.
> ge <-
lm(sav~pae+p15+p75+inc+gro) # now, fit the same model again
> summary(ge, cor=T) # take a look of the fitted model
Note that (compared with model g)
all parameters can be estimated
the standard errors of pae, p15, and p75, become very large because we cannot estimate them in a stable way
also, pay attention to the correlation of β-hat, which show the sign of strong collinearity
Please do an eigen-decomposition of XTX for the case by yourself and interpret your result.