Parameter Estimation (Reading: Faraway (2005, 1st edition), section 2.8)

Here is a data set concerning the number of species of tortoise on the various Galapagos Islands. There are 30 cases (Islands) and 7 variables in the data set. We start by reading the data into R.

> gala <- read.table("gala.data") # read the data into R
> gala # take a look

Species Endemics Area Elevation Nearest Scruz Adjacent

Baltra 58 23 25.09 346 0.6 0.6 1.84

Bartolome 31 21 1.24 109 0.6 26.3 572.33

Caldwell 3 3 0.21 114 2.8 58.7 0.78

... much deleted ...

Wolf 21 12 2.85 253 34.1 254.7 2.33

The variables are

Species	the number of species of tortoise found on the island
Endemics	the number of endemic species
Area	the area of the island (km²)
Elevation	the highest elevation of the island (m)
Nearest	the distance from the nearest island (km)
Scruz	the distance from Santa Cruz island (km)
Adjacent	the area of the adjacent island (km²)

In this dataset,

Some missing values are filled in for simplicity.
Q: Are these variable quantitative or qualitative? If quantitative, continuous or discrete?

Now fit a model. Let us start with a simple model, in which

the variable "Species" is treated as the response (Q: Is it reasonable to regard Species as a continuous variable?) and
the rest variables except "Endemics" as the predictors.
The model can be expressed as

Species_i	=	β₀ + β₁Area_i + β₂Elevation_i + β₃Nearest_i + β₄Scruz_i + β₅*Adjacent_i + ε_i,
		i=1, 2, ..., 30.

The command to fit a linear model in R is:

> gfit <- lm(Species~Area+Elevation+Nearest+Scruz+Adjacent, data=gala) # "lm" is the command in R to fit a linear model. Use "help(lm)" to understand the details of the command. The option "data=" specifies the dataset that will be used. Also, use "help(formula)" to learn how to express a model in R.
> summary(gfit) # all the usual regression stuff

Call:
lm(formula = Species ~ Area + Elevation + Nearest + Scruz + Adjacent,

data = gala)

Residuals:

Min 1Q Median 3Q Max

-111.679 -34.898 -7.862 33.460 182.584

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 7.068221 19.154198 0.369 0.715351

Area -0.023938 0.022422 -1.068 0.296318

Elevation 0.319465 0.053663 5.953 3.82e-06 ***

Nearest 0.009144 1.054136 0.009 0.993151

Scruz -0.240524 0.215402 -1.117 0.275208

Adjacent -0.074805 0.017700 -4.226 0.000297 ***

---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 60.98 on 24 degrees of freedom

Multiple R-Squared: 0.7658, Adjusted R-squared: 0.7171

F-statistic: 15.7 on 5 and 24 DF, p-value: 6.838e-07

Notice that

There are many quantities (statistics) and many information in the output.
Q: What quantities have caught your attention?
For different datasets, you might find their most important/useful information existing in different statistics.
Q: For this dataset, which statistics/quantities are particularly important?

The default of "summay(lm.object)" does not print the correlation matrix.

> summary(gfit, cor=T) # with the option "cor=T", the output will contain the correlation matrix of the estimated parameters

> options(digits=3) # the option "digits" controls the number of digits to print

> summary(gfit, cor=T)$cor # take a look of the correlation matrix of β's estimators

(Intercept) Area Elevation Nearest Scruz Adjacent

(Intercept) 1.0000 0.3271 -0.565 -0.0431 -0.3389 0.253

Area 0.3271 1.0000 -0.801 0.2035 -0.0378 0.433

Elevation -0.5650 -0.8014 1.000 -0.2333 0.0991 -0.642

Nearest -0.0431 0.2035 -0.233 1.0000 -0.6257 0.284

Scruz -0.3389 -0.0378 0.099 -0.6257 1.0000 -0.191

Adjacent 0.2533 0.4328 -0.642 0.2839 -0.1910 1.000

Notice that

the correlation matrix is a symmetric matrix
its diagonals are 1's
Q: How to read these numbers?
- For example, cor(β₁-hat,β₂-hat)=-0.8014 shows that the estimators of Area and Elevation effects are highly negatively correlated. They have strong collinearity.
- On the other hand, cor(β₂-hat,β₄-hat)=0.099 tells us the estimators of Elevation and Scruz effects have much weaker correlation.

> summary(gfit)$cov * (summary(gfit)$sigma^2) # the covariance matrix of the β's estimators, the command "summary(gfit)$cov" returns (X^TX)^-1

(Intercept) Area Elevation Nearest Scruz Adjacent

(Intercept) 366.8833 0.140474 -0.580739 -0.86964 -1.398067 0.085879

Area 0.1405 0.000503 -0.000964 0.00481 -0.000183 0.000172

Elevation -0.5807 -0.000964 0.002880 -0.01320 0.001145 -0.000610

Nearest -0.8696 0.004811 -0.013196 1.11120 -0.142067 0.005297

Scruz -1.3981 -0.000183 0.001145 -0.14207 0.046398 -0.000728

Adjacent 0.0859 0.000172 -0.000610 0.00530 -0.000728 0.000313

Notice that in the covariance matrix

The values in the diagonal are the estimated variances of β-hat. Compare it with the Std. Error in the lm output. Q: What is the relationship between them?
Q: How are the covariance matrix related to the correlation matrix? In other words, how to obtain the correlation matrix from the covariance matrix given above?

Let us now examine the correlation between the columns of X.

> cor(model.matrix(gfit)) # the command "model.matrix" returns the model matrix X, which in this case is exactly the matrix formed by the data of the five predictor variables themselves plus the constant vector. This output shows the correlation between the columns of X.

(Intercept) Area Elevation Nearest Scruz Adjacent

(Intercept) 1 NA NA NA NA NA

Area NA 1.000 0.7537 -0.1111 -0.1008 0.1800

Elevation NA 0.754 1.0000 -0.0111 -0.0154 0.5365

Nearest NA -0.111 -0.0111 1.0000 0.6154 -0.1162

Scruz NA -0.101 -0.0154 0.6154 1.0000 0.0517

Adjacent NA 0.180 0.5365 -0.1162 0.0517 1.0000

Notice that

this correlation matrix is related to X^TX, rather than the (X^TX)^-1 in the correlation matrix of β-hat.
Check how different it is from the correlation matrix of β-hat. Q: Can you identify some relationship between the two correlation matrices?
Q: Why do we get the NA's in the output?

> cov(model.matrix(gfit)) # the covariances between the five variables.

(Intercept) Area Elevation Nearest Scruz Adjacent

(Intercept) 0 0 0.0 0.0 0 0

Area 0 746687 274595.6 -1370.4 -5925 134495

Elevation 0 274596 177750.7 -66.7 -443 195531

Nearest 0 -1370 -66.7 203.8 598 -1435

Scruz 0 -5925 -442.8 597.6 4628 3038

Adjacent 0 134495 195531.1 -1434.6 3038 747393

Check how different it is from the covariance matrix of the estimated parameters.

For practice purpose, let us also construct these quantities directly --- we define Y, X, (X^TX)^-1, %*% denotes matrix multiplication, "t()" makes the transpose, and "solve()" computes the inverse.
> y <- gala$Species # the response
> rep(1,30) # make a vector of ones

[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

> x <- cbind(rep(1,30), gala[,-c(1,2)]) # bind a column of ones to all except the first two columns of the data from gala
> x <- as.matrix(x) # force x from being a data frame to a matrix x
> xtxi <- solve(t(x) %*% x) # compute (X^TX)^-1
> xtxi # it is the same as the output obtained from the command "summary(gfit)$cov"

rep(1, 30) Area Elevation Nearest Scruz Adjacent

rep(1, 30) 9.87e-02 3.78e-05 -1.56e-04 -2.34e-04 -3.76e-04 2.31e-05

Area 3.78e-05 1.35e-07 -2.59e-07 1.29e-06 -4.91e-08 4.62e-08

Elevation -1.56e-04 -2.59e-07 7.75e-07 -3.55e-06 3.08e-07 -1.64e-07

Nearest -2.34e-04 1.29e-06 -3.55e-06 2.99e-04 -3.82e-05 1.42e-06

Scruz -3.76e-04 -4.91e-08 3.08e-07 -3.82e-05 1.25e-05 -1.96e-07

Adjacent 2.31e-05 4.62e-08 -1.64e-07 1.42e-06 -1.96e-07 8.43e-08

β-hat, y-hat, ε-hat may all be computed directly or extracted from the linear model object gfit. We do it both ways here --- check whether the results match.

> beta <- xtxi %*% t(x) %*% y # the β-hat
> beta

[,1]

rep(1, 30) 7.06822

Area -0.02394

Elevation 0.31946

Nearest 0.00914

Scruz -0.24052

Adjacent -0.07480

However,

it is a bad way to compute β-hat
it is inefficient and can be very inaccurate when the predictors are strongly correlated
An alternative is to directly solve the normal equation X^TXb=X^TY.

> solve(t(x)%*%x, t(x)%*%y) # solve the normal equation

               [,1]

rep(1, 30) 7.06822

Area       -0.02394

Elevation   0.31946

Nearest     0.00914

Scruz      -0.24052

Adjacent   -0.07480

We can extract the regression quantities we need from the lm object.

> names(gfit) # the "names()" command is the way to see the components of an R object

[1] "coefficients" "residuals" "effects" "rank"

[5] "fitted.values" "assign" "qr" "df.residual"

[9] "xlevels" "call" "terms" "model"

> gfits <- summary(gfit)
> names(gfits) # more regression quantities in summary of lm object.

[1] "call" "terms" "residuals" "coefficients"

[5] "aliased" "sigma" "df" "r.squared"

[9] "adj.r.squared" "fstatistic" "cov.unscaled"

> deviance(gfit) # the "deviance" command returns RSS for a regression fitted model.

[1] 89231

Compare the quantity and the Residual standard error in the lm output. Q: how to obtain the residual standard error from the RSS? Notice that σ-hat=sqrt(RSS/(n-p)).

> gfit$coef # get β-hat from the fitted model gfit

(Intercept) Area Elevation Nearest Scruz Adjacent

7.06822 -0.02394 0.31946 0.00914 -0.24052 -0.07480

> yhat <- x %*% beta # y-hat=X(β-hat)
> cbind(yhat, gfit$fitted) # compare if they are identical

[,1] [,2]

Baltra 116.726 116.726

Bartolome -7.273 -7.273

Caldwell 29.331 29.331

... deleted ...

Wolf 26.701 26.701

> res <- y - yhat # the residuals
> cbind(res, gfit$resid) # compare if they are identical

[,1] [,2]

Baltra -58.73 -58.73

Bartolome 38.27 38.27

Caldwell -26.33 -26.33

... deleted ...

Wolf -5.70 -5.70

Notice that

the residuals and the fitted values are orthogonal.

> sum(res*yhat) # their inner product = 0

[1] 2.73e-11

actually, the residual vector is orthogonal to any columns of X (therefore, orthogonal to any linear combinations of the columns of X

> t(x) %*% res # X^T(ε-hat)

                [,1]

rep(1, 30) 4.76e-12

Area        1.54e-09

Elevation   2.91e-11

Nearest    -6.96e-13

Scruz       6.98e-11

Adjacent   -5.07e-10

We now compute the residual sum of squares (RSS), σ-hat, standard error of β-hat, R², and the correlation matrix for β-hat. Verify that the results agree with the output from before.

> sum(res*res) # the RSS

[1] 89231

> gfit$df # the degrees of freedom, it = 30-6, why?

[1] 24

> sighat <- sqrt(sum(res*res)/gfit$df) # calculate σ-hat
> sighat # take a look

[1] 61

> diag(xtxi) # the "diag" command extract the diagonal of a matrix. The output is the diagonal entries of (X^TX)^-1

rep(1, 30) Area Elevation Nearest Scruz Adjacent

9.87e-02 1.35e-07 7.75e-07 2.99e-04 1.25e-05 8.43e-08

> sqrt(diag(xtxi))*sighat # Q: what's this?

rep(1, 30) Area Elevation Nearest Scruz Adjacent

19.1542 0.0224 0.0537 1.0541 0.2154 0.0177

> summary(gfit)$coef[,2] # compare with the previous outputs

(Intercept) Area Elevation Nearest Scruz Adjacent

19.1542 0.0224 0.0537 1.0541 0.2154 0.0177

> mean(y) # calculate the mean of y

[1] 85.2

> sum((y-mean(y))^2) # the total sum of squares

[1] 381081

> 1 - sum(res*res) / sum((y-mean(y))^2) # the R²

[1] 0.766

> var(yhat)/var(y) # Q: why does it equal R²?

[,1]

[1,] 0.766

> cor(y,yhat)^2 # Q: why does it equal R²?

[,1]

[1,] 0.766

Let's calculate the correlation matrix of β-hat.

> z <- sqrt(diag(xtxi))
> xtxi / z %o% z # the correlation matrix, where %o% means outerproduct zz^T

rep(1, 30) Area Elevation Nearest Scruz Adjacent

rep(1, 30) 1.0000 0.3271 -0.565 -0.0431 -0.3389 0.253

Area 0.3271 1.0000 -0.801 0.2035 -0.0378 0.433

Elevation -0.5650 -0.8014 1.000 -0.2333 0.0991 -0.642

Nearest -0.0431 0.2035 -0.233 1.0000 -0.6257 0.284

Scruz -0.3389 -0.0378 0.099 -0.6257 1.0000 -0.191

Adjacent 0.2533 0.4328 -0.642 0.2839 -0.1910 1.000

Now we make a prediction for the number of species on an island with predictor values: Area=0.08, Elevation=93, Nearest=6.0, Scruz=12.0, and Adjacent=0.34.

> xp <- c(1, 0.08, 93, 6.0, 12.0, 0.34) # don't forget the initial one
> sum(xp*beta) # the predicted number of species

[1] 33.9