# Weighted least squares (Reading: Faraway (2005, 1st edition), Section 6.2)

See the accompanying description of the data that we will use in this section.

• Q: Can you figure out from the description what weights should be used for the data and why?

> p <- read.table("phy.data", header=T) # header=T: the first row of the file contains the variable names
> p # take a look at the data

   momentum energy crossx sd
1         4  0.345    367 17
2         6  0.287    311  9
3         8  0.251    295  9
...deleted...
10      150  0.060    192  5

Notice that

• The sd decreases as the energy decreases.

• Since the weights should be inversely proportional to the variance, this suggests setting the weights to 1/sd^2.
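In symbols (a standard fact, stated here for reference): with wi = 1/sdi^2, weighted least squares chooses the coefficients b0, b1 to minimize

sum_i wi*(yi - b0 - b1*xi)^2,

so the precisely measured observations (small sd) carry more weight in the fit.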

> x <- p[, 2] # extract the energy variable, the predictor
> y <- p[, 3] # extract the crossx variable, the response
> w <- 1/p[, 4]^2 # the weights
> g <- lm(y~x, weights=w) # fit the model, supplying w through the "weights" argument
> summary(g) # take a look at the results

Call:
lm(formula = y ~ x, weights = w)

Residuals:
       Min         1Q     Median         3Q        Max
-2.323e+00 -8.842e-01  1.266e-06  1.390e+00  2.335e+00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  148.473      8.079   18.38 7.91e-08 ***
x            530.835     47.550   11.16 3.71e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.657 on 8 degrees of freedom
Multiple R-Squared: 0.9397,     Adjusted R-squared: 0.9321
F-statistic: 124.6 on 1 and 8 DF,  p-value: 3.710e-06

• We can test the significance of x by looking at the p-value in the coefficients table or at the overall F-statistic; the two tests are equivalent (11.16^2 = 124.6, up to rounding) because this is a simple regression.
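This equivalence can be checked directly from the rounded values printed in the summary (a quick sketch; the small discrepancy is due to rounding in the printed output):

```r
# t-value of x and the overall F-statistic, as printed in summary(g)
t_x <- 11.16
F_overall <- 124.6
t_x^2            # 124.5456, equal to F_overall up to rounding
```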

• But, if we want to do it the hard way, we must construct the analysis of variance table by hand as follows:

> sum(w*g$res*g$res) # the residual sum of squares

[1] 21.95265

> wm <- sum(w*y)/sum(w) # the weighted mean of y
> sum(w*(y-wm)*(y-wm)) # the weighted total SS

[1] 363.944

> 363.944 - 21.95265 # the regression SS

[1] 341.9914

> (341.9914/1)/(21.95265/8) # the overall F statistic

[1] 124.6287
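The p-value reported in the summary can be recovered from this F statistic by taking the upper-tail probability of the F(1, 8) distribution:

```r
# upper-tail probability of F(1, 8) at the computed statistic
pf(124.6287, df1 = 1, df2 = 8, lower.tail = FALSE)  # 3.71e-06, matching the summary
```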

Let's try fitting the regression without weights and see what the difference is:

> g1 <- lm(y~x) # fit the model without weights
> summary(g1) # take a look at the fitted model

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-14.773  -9.319  -2.829   5.571  19.818

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   135.00      10.08   13.40 9.21e-07 ***
x             619.71      47.68   13.00 1.16e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.69 on 8 degrees of freedom
Multiple R-Squared: 0.9548,     Adjusted R-squared: 0.9491
F-statistic: 168.9 on 1 and 8 DF,  p-value: 1.165e-06

• Compare the output to the summary of model g. Explain their differences.

• R-square: why does g1 have a higher value than g?

• σ-hat: Why are they so different? Do they estimate the same parameter?

• residuals: Why are their ranges so different?

• The two fits can be compared:

> plot(x,y) # scatter plot of x and y
> abline(g) # the fitted line of model g, solid line
> abline(g1, lty=2) # the fitted line of model g1, dashed line

• The un-weighted fit (dashed line) appears to fit the data better overall.

• But remember that for lower values of energy, the variance in the response is less.

• The weighted fit (solid line) therefore tries to fit these lower-energy (lower-variance) points more closely than the others.

Weighted least squares can be regarded as regressing sqrt(wi)yi on sqrt(wi) and sqrt(wi)xi using ordinary least squares:

> ym <- sqrt(w)*y # calculate sqrt(wi)yi
> xm <- sqrt(w)*x # calculate sqrt(wi)xi
> g2 <- lm(ym~sqrt(w)+xm-1) # regress sqrt(wi)yi on sqrt(wi) and sqrt(wi)xi; "-1" drops the intercept

• Note that the column of 1's in the X-matrix is replaced by sqrt(wi).
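To see why the transformation works (a standard derivation, added here for reference), multiply the model yi = b0 + b1*xi + ei through by sqrt(wi):

sqrt(wi)*yi = b0*sqrt(wi) + b1*sqrt(wi)*xi + sqrt(wi)*ei.

Since Var(ei) is proportional to 1/wi, the transformed errors sqrt(wi)*ei have constant variance, so ordinary least squares applies to the transformed variables; b0 is now the coefficient of the sqrt(wi) column rather than of an intercept.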

> summary(g2) # take a look at the results

Call:
lm(formula = ym ~ sqrt(w) + xm - 1)

Residuals:
       Min         1Q     Median         3Q        Max
-2.323e+00 -8.842e-01  1.266e-06  1.390e+00  2.335e+00

Coefficients:
        Estimate Std. Error t value Pr(>|t|)
sqrt(w)  148.473      8.079   18.38 7.91e-08 ***
xm       530.835     47.550   11.16 3.71e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.657 on 8 degrees of freedom
Multiple R-Squared: 0.9983,     Adjusted R-squared: 0.9978
F-statistic:  2303 on 2 and 8 DF,  p-value: 9.034e-12

• Compare the output to the summary of model g. What quantities are different? Explain why.
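The equality of the two sets of coefficient estimates can also be verified on its own with simulated data (a self-contained sketch; the data below are made up for illustration, not the phy.data set):

```r
set.seed(1)
x <- 1:10
y <- 2 + 3*x + rnorm(10, sd = x)   # error sd grows with x
w <- 1/x^2                         # weights inversely proportional to the variance
g_w <- lm(y ~ x, weights = w)                          # weighted least squares
g_t <- lm(I(sqrt(w)*y) ~ sqrt(w) + I(sqrt(w)*x) - 1)   # OLS on transformed data
all.equal(unname(coef(g_w)), unname(coef(g_t)))        # TRUE
```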

• To get the correct overall F-test, we can use:

> anova(lm(ym~sqrt(w)-1), g2) # the correct F-statistic

Analysis of Variance Table

Model 1: ym ~ sqrt(w) - 1
Model 2: ym ~ sqrt(w) + xm - 1
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1      9 363.94
2      8  21.95  1    341.99 124.63 3.710e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

• You can use the two RSSs in the correct F-statistic to obtain the right R-square (How?)
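One way to do it, using the two RSSs printed in the ANOVA table (a quick sketch with the printed values; this is the answer to the "How?" above, so you may want to try it yourself first):

```r
RSS <- 21.95265   # residual SS of the model with x
TSS <- 363.944    # RSS of the intercept-only (weighted-mean) model
1 - RSS/TSS       # 0.9396812, matching the reported Multiple R-Squared 0.9397 of model g
```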

Let's also check their residuals.

> print(g$res, digits=2) # the residuals of model g
      1       2       3       4       5       6       7       8       9      10
 35.389  10.177  13.287   0.089  -5.356  -8.209 -13.938  -5.544  -0.063  11.677

> print(g2$res, digits=2) # the residuals of model g2
     1      2      3      4      5      6      7      8      9     10
 2.082  1.131  1.476  0.013 -0.765 -1.368 -2.323 -0.924 -0.013  2.335

• Why are the two groups of residuals so different?

• Which residuals are yi minus yi-hat? Which are sqrt(wi)yi minus sqrt(wi)yi-hat?

• Which residuals can be used to perform diagnostics (i.e., residual analysis) and to calculate σ-hat directly from the sum of their squares?

• Let's find out the relationship between the two groups of residuals:

> print(sqrt(w)*g$res, digits=2) # check that this equals the residuals of model g2
     1      2      3      4      5      6      7      8      9     10
 2.082  1.131  1.476  0.013 -0.765 -1.368 -2.323 -0.924 -0.013  2.335
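As a final check, the weighted residuals printed above also recover σ-hat of models g and g2: the residual standard error is the square root of their sum of squares divided by the residual degrees of freedom (a sketch using the rounded printed values):

```r
# weighted residuals as printed above (rounded)
r2 <- c(2.082, 1.131, 1.476, 0.013, -0.765, -1.368, -2.323, -0.924, -0.013, 2.335)
sqrt(sum(r2^2) / 8)   # 1.6565, matching the reported residual standard error 1.657 up to rounding
```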