# Parameter Estimation (Reading: Faraway (2005, 1st edition), section 2.8)

Here is a data set concerning the number of species of tortoise on the various Galapagos Islands. There are 30 cases (Islands) and 7 variables in the data set. We start by reading the data into R.

> gala # take a look

Species Endemics    Area Elevation Nearest Scruz Adjacent

Baltra            58       23   25.09       346     0.6   0.6     1.84

Bartolome         31       21    1.24       109     0.6  26.3   572.33

Caldwell           3        3    0.21       114     2.8  58.7     0.78

... much deleted ...

Wolf              21       12    2.85       253    34.1 254.7     2.33

The variables are

 Species the number of species of tortoise found on the island Endemics the number of endemic species Area the area of the island (km2) Elevation the highest elevation of the island (m) Nearest the distance from the nearest island (km) Scruz the distance from Santa Cruz island (km) Adjacent the area of the adjacent island (km2)

In this dataset,

• Some missing values are filled in for simplicity.

• Q: Are these variable quantitative or qualitative? If quantitative, continuous or discrete?

Now fit a model. Let us start with a simple model, in which

• the variable "Species" is treated as the response (Q: Is it reasonable to regard Species as a continuous variable?) and

• the rest variables except "Endemics" as the predictors.

• The model can be expressed as

 Speciesi = β0 + β1*Areai + β2*Elevationi + β3*Nearesti + β4*Scruzi + β5*Adjacenti + εi, i=1, 2, ..., 30.

The command to fit a linear model in R is:

> gfit <- lm(Species~Area+Elevation+Nearest+Scruz+Adjacent, data=gala) # "lm" is the command in R to fit a linear model. Use "help(lm)" to understand the details of the command. The option "data=" specifies the dataset that will be used. Also, use "help(formula)" to learn how to express a model in R.
> summary(gfit) # all the usual regression stuff

Call:
lm(formula = Species ~ Area + Elevation + Nearest + Scruz + Adjacent,

data = gala)

Residuals:

Min      1Q      Median 3Q     Max

-111.679 -34.898 -7.862 33.460 182.584

Coefficients:

Estimate  Std. Error  t value  Pr(>|t|)

(Intercept)     7.068221   19.154198    0.369  0.715351

Area           -0.023938    0.022422   -1.068  0.296318

Elevation       0.319465    0.053663    5.953  3.82e-06 ***

Nearest         0.009144    1.054136    0.009  0.993151

Scruz          -0.240524    0.215402   -1.117  0.275208

Adjacent       -0.074805    0.017700   -4.226  0.000297 ***

---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 60.98 on 24 degrees of freedom

Multiple R-Squared: 0.7658, Adjusted R-squared: 0.7171

F-statistic: 15.7 on 5 and 24 DF, p-value: 6.838e-07

Notice that

• There are many quantities (statistics) and many information in the output.

• Q: What quantities have caught your attention?

• For different datasets, you might find their most important/useful information existing in different statistics.

• Q: For this dataset, which statistics/quantities are particularly important?

The default of "summay(lm.object)" does not print the correlation matrix.

> summary(gfit, cor=T) # with the option "cor=T", the output will contain the correlation matrix of the estimated parameters

> options(digits=3) # the option "digits" controls the number of digits to print

> summary(gfit, cor=T)\$cor # take a look of the correlation matrix of β's estimators

(Intercept)    Area Elevation Nearest   Scruz Adjacent

(Intercept)      1.0000  0.3271    -0.565 -0.0431 -0.3389    0.253

Area             0.3271  1.0000    -0.801  0.2035 -0.0378    0.433

Elevation       -0.5650 -0.8014     1.000 -0.2333  0.0991   -0.642

Nearest         -0.0431  0.2035    -0.233  1.0000 -0.6257    0.284

Scruz           -0.3389 -0.0378     0.099 -0.6257  1.0000   -0.191

Adjacent         0.2533  0.4328    -0.642  0.2839 -0.1910    1.000

Notice that

• the correlation matrix is a symmetric matrix

• its diagonals are 1's

• Q: How to read these numbers?

• For example, cor(β1-hat,β2-hat)=-0.8014 shows that the estimators of Area and Elevation effects are highly negatively correlated. They have strong collinearity.

• On the other hand, cor(β2-hat,β4-hat)=0.099 tells us the estimators of Elevation and Scruz effects have much weaker correlation.

> summary(gfit)\$cov * (summary(gfit)\$sigma^2) # the covariance matrix of the β's estimators, the command "summary(gfit)\$cov" returns (XTX)-1

(Intercept)      Area Elevation  Nearest     Scruz  Adjacent

(Intercept)    366.8833  0.140474 -0.580739 -0.86964 -1.398067  0.085879

Area             0.1405  0.000503 -0.000964  0.00481 -0.000183  0.000172

Elevation       -0.5807 -0.000964  0.002880 -0.01320  0.001145 -0.000610

Nearest         -0.8696  0.004811 -0.013196  1.11120 -0.142067  0.005297

Scruz           -1.3981 -0.000183  0.001145 -0.14207  0.046398 -0.000728

Adjacent         0.0859  0.000172 -0.000610  0.00530 -0.000728  0.000313

Notice that in the covariance matrix

• The values in the diagonal are the estimated variances of β-hat. Compare it with the Std. Error in the lm output. Q: What is the relationship between them?

• Q: How are the covariance matrix related to the correlation matrix? In other words, how to obtain the correlation matrix from the covariance matrix given above?

Let us now examine the correlation between the columns of X.

> cor(model.matrix(gfit)) # the command "model.matrix" returns the model matrix X, which in this case is exactly the matrix formed by the data of the five predictor variables themselves plus the constant vector. This output shows the correlation between the columns of X.

(Intercept)   Area Elevation Nearest   Scruz Adjacent

(Intercept)           1     NA        NA      NA      NA       NA

Area                 NA  1.000    0.7537 -0.1111 -0.1008   0.1800

Elevation            NA  0.754    1.0000 -0.0111 -0.0154   0.5365

Nearest              NA -0.111   -0.0111  1.0000  0.6154  -0.1162

Scruz                NA -0.101   -0.0154  0.6154  1.0000   0.0517

Adjacent             NA  0.180    0.5365 -0.1162  0.0517   1.0000

Notice that

• this correlation matrix is related to XTX, rather than the (XTX)-1 in the correlation matrix of β-hat.

• Check how different it is from the β-hat. Q: Can you identify some relationship between the two correlation matrices?

• Q: Why do we get the NA's in the output?

> cov(model.matrix(gfit)) # the covariances between the five variables.

(Intercept)   Area Elevation Nearest Scruz Adjacent

(Intercept)           0      0       0.0     0.0     0        0

Area                  0 746687  274595.6 -1370.4 -5925   134495

Elevation             0 274596  177750.7   -66.7  -443   195531

Nearest               0  -1370     -66.7   203.8   598    -1435

Scruz                 0  -5925    -442.8   597.6  4628     3038

Adjacent              0 134495  195531.1 -1434.6  3038   747393

Check how different it is from the covariance matrix of the estimated parameters.

For practice purpose, let us also construct these quantities directly --- we define Y, X, (XTX)-1, %*% denotes matrix multiplication, "t()" makes the transpose, and "solve()" computes the inverse.
> y <- gala\$Species # the response
> rep(1,30) # make a vector of ones

 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

> x <- cbind(rep(1,30), gala[,-c(1,2)]) # bind a column of ones to all except the first two columns of the data from gala
> x <- as.matrix(x) # force x from being a data frame to a matrix x
> xtxi <- solve(t(x) %*% x) # compute (XTX)-1
> xtxi # it is the same as the output obtained from the command "summary(gfit)\$cov"

rep(1, 30)      Area Elevation   Nearest     Scruz  Adjacent

rep(1, 30)   9.87e-02  3.78e-05 -1.56e-04 -2.34e-04 -3.76e-04  2.31e-05

Area         3.78e-05  1.35e-07 -2.59e-07  1.29e-06 -4.91e-08  4.62e-08

Elevation   -1.56e-04 -2.59e-07  7.75e-07 -3.55e-06  3.08e-07 -1.64e-07

Nearest     -2.34e-04  1.29e-06 -3.55e-06  2.99e-04 -3.82e-05  1.42e-06

Scruz       -3.76e-04 -4.91e-08  3.08e-07 -3.82e-05  1.25e-05 -1.96e-07

Adjacent     2.31e-05  4.62e-08 -1.64e-07  1.42e-06 -1.96e-07  8.43e-08

β-hat, y-hat, ε-hat may all be computed directly or extracted from the linear model object gfit. We do it both ways here --- check whether the results match.

> beta <- xtxi %*% t(x) %*% y # the β-hat
> beta

[,1]

rep(1, 30)  7.06822

Area       -0.02394

Elevation   0.31946

Nearest     0.00914

Scruz      -0.24052

However,

• it is a bad way to compute β-hat

• it is inefficient and can be very inaccurate when the predictors are strongly correlated

• An alternative is to directly solve the normal equation XTXb=XTY.

> solve(t(x)%*%x, t(x)%*%y) # solve the normal equation

[,1]

rep(1, 30)  7.06822

Area       -0.02394

Elevation   0.31946

Nearest     0.00914

Scruz      -0.24052

We can extract the regression quantities we need from the lm object.

> names(gfit) # the "names()" command is the way to see the components of an R object

 "coefficients"  "residuals"     "effects"       "rank"

 "fitted.values" "assign"        "qr"            "df.residual"

 "xlevels"       "call"          "terms"         "model"

> gfits <- summary(gfit)
> names(gfits)
# more regression quantities in summary of lm object.

 "call"          "terms"         "residuals"     "coefficients"

 "aliased"       "sigma"         "df"            "r.squared"

> deviance(gfit) # the "deviance" command returns RSS for a regression fitted model.

 89231

Compare the quantity and the Residual standard error in the lm output. Q: how to obtain the residual standard error from the RSS? Notice that σ-hat=sqrt(RSS/(n-p)).

> gfit\$coef # get β-hat from the fitted model gfit

(Intercept)        Area   Elevation     Nearest       Scruz    Adjacent

7.06822    -0.02394     0.31946     0.00914    -0.24052    -0.07480

> yhat <- x %*% beta # y-hat=X(β-hat)
> cbind(yhat, gfit\$fitted) # compare if they are identical

[,1]    [,2]

Baltra       116.726 116.726

Bartolome     -7.273  -7.273

Caldwell      29.331  29.331

... deleted ...

Wolf          26.701  26.701

> res <- y - yhat # the residuals
> cbind(res, gfit\$resid) # compare if they are identical

[,1]    [,2]

Baltra        -58.73  -58.73

Bartolome      38.27   38.27

Caldwell      -26.33  -26.33

... deleted ...

Wolf           -5.70   -5.70

Notice that

• the residuals and the fitted values are orthogonal.

• > sum(res*yhat)  # their inner product = 0

 2.73e-11

• actually, the residual vector is orthogonal to any columns of X (therefore, orthogonal to any linear combinations of the columns of X

> t(x) %*% res # XT(ε-hat)

[,1]

rep(1, 30)  4.76e-12

Area        1.54e-09

Elevation   2.91e-11

Nearest    -6.96e-13

Scruz       6.98e-11

We now compute the residual sum of squares (RSS), σ-hat, standard error of β-hat, R2, and the correlation matrix for β-hat. Verify that the results agree with the output from before.

 89231

> gfit\$df # the degrees of freedom, it = 30-6, why?

 24

> sighat <- sqrt(sum(res*res)/gfit\$df) # calculate σ-hat
> sighat # take a look

 61

> diag(xtxi) # the "diag" command extract the diagonal of a matrix. The output is the diagonal entries of (XTX)-1

rep(1, 30)       Area  Elevation    Nearest      Scruz   Adjacent

9.87e-02   1.35e-07   7.75e-07   2.99e-04   1.25e-05   8.43e-08

> sqrt(diag(xtxi))*sighat  # Q: what's this?

rep(1, 30)       Area  Elevation    Nearest      Scruz   Adjacent

19.1542     0.0224     0.0537     1.0541     0.2154     0.0177

> summary(gfit)\$coef[,2]  # compare with the previous outputs

(Intercept)        Area   Elevation     Nearest       Scruz    Adjacent

19.1542      0.0224      0.0537      1.0541      0.2154      0.0177

> mean(y) # calculate the mean of y

 85.2

> sum((y-mean(y))^2) # the total sum of squares

 381081

> 1 - sum(res*res) / sum((y-mean(y))^2) # the R2

 0.766

> var(yhat)/var(y) # Q: why does it equal R2?

[,1]

[1,] 0.766

> cor(y,yhat)^2 # Q: why does it equal R2?

[,1]

[1,] 0.766

Let's calculate the correlation matrix of β-hat.

> z <- sqrt(diag(xtxi))
> xtxi / z %o% z # the correlation matrix, where %o% means outerproduct zzT

rep(1, 30)    Area Elevation Nearest   Scruz Adjacent

rep(1, 30)     1.0000  0.3271    -0.565 -0.0431 -0.3389    0.253

Area           0.3271  1.0000    -0.801  0.2035 -0.0378    0.433

Elevation     -0.5650 -0.8014     1.000 -0.2333  0.0991   -0.642

Nearest       -0.0431  0.2035    -0.233  1.0000 -0.6257    0.284

Scruz         -0.3389 -0.0378     0.099 -0.6257  1.0000   -0.191

Adjacent       0.2533  0.4328    -0.642  0.2839 -0.1910    1.000

Now we make a prediction for the number of species on an island with predictor values: Area=0.08, Elevation=93, Nearest=6.0, Scruz=12.0, and Adjacent=0.34.

> xp <- c(1, 0.08, 93, 6.0, 12.0, 0.34) # don't forget the initial one
> sum(xp*beta) # the predicted number of species

 33.9