Errors in predictors (Reading: Faraway (2005, 1st edition), 5.1)

With real data you never know the magnitude of the effect of errors in the predictors, since the true predictor values are not observed. In this section we simulate some data (so we know the true values of the parameters) and then add errors to the predictor to see the effect. First we generate some (xi, yi) pairs, then add increasing amounts of error to x. Remember the error is random, so you won't get exactly the same results if you repeat this experiment.
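If you do want a repeatable run, you can fix the random seed before generating the data (a small addition, not part of the original transcript; the seed value 123 is arbitrary):

```r
set.seed(123)  # fixes the RNG state, so runif() and rnorm() below give the same draws every run
```
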

> x <- 10*runif(50) # x ~ U(0,10), 50 uniform random numbers
> y <- x + rnorm(50) # true model: b0 = 0, b1 = 1, e ~ N(0,1)
> m1 <- lm(y~x) # fit a simple linear regression model

> summary(m1) # should see estimates close to true values
Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-2.39959 -0.64664  0.01070  0.77001  2.20880

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.13929    0.28620   0.487    0.629
x            1.00315    0.05738  17.484   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.036 on 48 degrees of freedom
Multiple R-Squared: 0.8643,     Adjusted R-squared: 0.8615
F-statistic: 305.7 on 1 and 48 DF,  p-value: < 2.2e-16

> z <- x + rnorm(50) # now add some standard normal error to x
> m2 <- lm(y~z)
> summary(m2) # what happened to the estimates and the other quantities?
Call:
lm(formula = y ~ z)

Residuals:
     Min       1Q   Median       3Q      Max
-3.68212 -0.84355  0.08807  0.78287  2.70036

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.46071    0.34165   1.348    0.184
z            0.91511    0.06687  13.685   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.27 on 48 degrees of freedom
Multiple R-Squared: 0.796,      Adjusted R-squared: 0.7917
F-statistic: 187.3 on 1 and 48 DF,  p-value: < 2.2e-16

> z1 <- x + 5*rnorm(50) # add even more error (sd 5)
> m3 <- lm(y~z1)
> summary(m3) # relationship between x and y is almost washed out
Call:
lm(formula = y ~ z1)

Residuals:
     Min       1Q   Median       3Q      Max
-5.34926 -1.54357 -0.01277  1.32366  5.75679

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.3447     0.3924   8.524 3.60e-11 ***
z1            0.2443     0.0495   4.935 1.01e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.291 on 48 degrees of freedom
Multiple R-Squared: 0.3366,     Adjusted R-squared: 0.3228
F-statistic: 24.35 on 1 and 48 DF,  p-value: 1.006e-05

We can plot all this information:

> matplot(cbind(x,z,z1),y,xlab="x",ylab="y")
> abline(m1,lty=1)
> abline(m2,lty=2)
> abline(m3,lty=3)
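To tell the three fitted lines apart on the plot, a legend can be added (a small addition, not in the original transcript; the lty codes and labels match the abline calls above, and it must be run after those plotting commands):

```r
# run after matplot() and the three abline() calls above
legend("topleft",
       legend = c("m1: no error", "m2: error sd 1", "m3: error sd 5"),
       lty = 1:3)
```
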

Note that the fitted slope shrinks toward zero as the variance of the error in x increases; this shrinkage is the attenuation (regression dilution) caused by errors in the predictor.
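The shrinkage can be predicted: under classical measurement error, the expected slope is the true slope times sx2/(sx2 + sd2), where sx2 is the variance of x and sd2 the variance of the measurement error. A quick sketch of the expected slopes for m2 and m3 (assuming, as in the simulation, x ~ U(0,10) so sx2 = 100/12, and a true slope of 1):

```r
sx2 <- 100/12       # var(x) for x ~ U(0,10): (10 - 0)^2 / 12
sx2/(sx2 + 1)       # expected slope for m2 (error sd 1): about 0.89
sx2/(sx2 + 25)      # expected slope for m3 (error sd 5): 0.25
```

These match the fitted slopes 0.915 and 0.244 seen above, up to simulation noise.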


To get an idea of the average behavior we need to repeat the experiment (1000 times here). The slope from each replicate is saved in the vector bc:

> bc <- rep(0, 1000)
> for (i in 1:1000) {
+   y <- x + rnorm(50)
+   z <- x + 5*rnorm(50)
+   g <- lm(y~z)
+   bc[i] <- g$coef[2]
+ }
> summary(bc)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.02461 0.17470 0.21470 0.21220 0.25110 0.38110
> hist(bc)

Since a standard uniform random variable has variance 1/12, x ~ U(0,10) has variance sx2 = 100/12, while the added error 5*rnorm(50) has variance sd2 = 25. Classical measurement error attenuates the slope by the factor sx2/(sx2 + sd2) = (100/12)/(100/12 + 25) = 0.25, so with a true slope of 1 we'd expect the mean of bc to be about 0.25.
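When the error variance is known, as it is in this simulation, the attenuation can be undone by dividing the naive slope by that same factor, sx2/(sx2 + sd2) (often called the reliability ratio). A hedged method-of-moments sketch, using the values from the simulation above:

```r
sx2 <- 100/12          # var(x) for x ~ U(0,10): (10 - 0)^2 / 12
sd2 <- 25              # var of the added error 5*rnorm(50): 5^2 * 1
reliability <- sx2/(sx2 + sd2)
reliability            # 0.25, matching the expected mean of bc

naive <- 0.2122        # mean slope over the 1000 replicates above
naive/reliability      # 0.8488, much closer to the true slope of 1
```

In practice sd2 is rarely known exactly and must be estimated, e.g. from replicate measurements, so treat this as an illustration of the idea rather than a recipe.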