Assignment 5

Q1. A very old problem in regression is that of predicting the height of the son from the height of the father. The available data summarizes the raw information. The father's height has been rounded to the nearest inch and the average height of the son for fathers of that height is given. The number of fathers in each category is given.

Construct a linear regression model for predicting the height of the son from the height of the father, making the best use of the information available.
Can the model be simplified to
```
      height of son = height of father + error?
```
Carry out the appropriate test.
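
One possible way to set this up is sketched below. It assumes a data frame with hypothetical column names father, son (the average son height in each category) and n (the number of fathers in that category); the values are synthetic stand-ins, not the assignment data. Because each response is an average of n observations, its variance is proportional to 1/n, which suggests n as the weight. The simplified model fixes the intercept at 0 and the slope at 1, which can be written with an offset and compared to the full model by an F-test.

```r
# Synthetic stand-in for the summarized data (made-up values; the column
# names father, son and n are assumptions, not the assignment's names).
set.seed(1)
heights <- data.frame(
  father = 62:74,
  n      = c(3, 5, 12, 20, 33, 40, 45, 38, 27, 15, 9, 4, 2)
)
heights$son <- 34 + 0.5 * heights$father +
  rnorm(nrow(heights), sd = 2.5 / sqrt(heights$n))

# Each son value is a mean of n heights, so var(son) = sigma^2 / n
# and the natural weight is n.
full <- lm(son ~ father, data = heights, weights = n)

# Reduced model: intercept fixed at 0 and slope fixed at 1 via an offset,
# so no coefficients are estimated.
reduced <- lm(son ~ 0 + offset(father), data = heights, weights = n)

# F-test of the reduced model against the full model (2 numerator df).
cmp <- anova(reduced, full)
print(cmp)
```

A small p-value in the second row of the table would be evidence against the simplification.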

Q2. Researchers at NIST collected data on ultrasonic measurements of the depths of defects in the Alaska pipeline in the field. The depths of the defects were then re-measured in the laboratory. These measurements were performed in six different batches. It turns out that this batch effect is not significant and so can be ignored in the analysis that follows. The laboratory measurements are more accurate than the in-field measurements, but more time-consuming and expensive. We want to develop a regression equation for correcting the in-field measurements.

Fit a regression model Lab ~ Field. Check for non-constant variance.
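
A common informal check is a plot of residuals against fitted values, looking for a fan shape. The sketch below uses a synthetic data frame named pipe with columns Field and Lab (simulated values, not the NIST measurements) and adds a rough numeric check by regressing the absolute residuals on the fitted values.

```r
# Synthetic stand-in for the pipeline data: the error spread grows with Field.
set.seed(2)
Field <- runif(107, 5, 85)
pipe <- data.frame(Field = Field,
                   Lab = 1.2 * Field + rnorm(107, sd = 0.3 * Field^0.75))

fit <- lm(Lab ~ Field, data = pipe)

# Residuals vs. fitted values: a widening band suggests non-constant variance.
plot(fitted(fit), residuals(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0)

# Rough numeric check: a clearly positive slope here points the same way.
absfit <- lm(abs(residuals(fit)) ~ fitted(fit))
print(summary(absfit))
```
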
We wish to use weights to account for the non-constant variance. Here we split the range of Field into 12 groups of size 9 (except for the last group, which has only 8 values). Within each group, we compute the variance of Lab as varlab and the mean of Field as meanfield. Supposing pipe is the name of your data frame, the following R code will make the needed computations:
```
> i <- order(pipe$Field)
> npipe <- pipe[i,]
> ff <- gl(12,9)[-108]
> meanfield <- unlist(lapply(split(npipe$Field,ff),mean))
> varlab <- unlist(lapply(split(npipe$Lab,ff),var))
```
Suppose we guess that the variance in the response is linked to the predictor in the following way:
```
   var(Lab) = a0 Field^a1
```
Regress log(varlab) on log(meanfield) to estimate a0 and a1. (You might choose to remove the last point.) Use this to determine appropriate weights in a WLS fit of Lab on Field. Show the regression summary.
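
Taking logs of the assumed relation gives log(var(Lab)) = log(a0) + a1 log(Field), so the intercept of the log-log regression estimates log(a0) and the slope estimates a1. WLS weights are inversely proportional to the variance, so 1/Field^a1 is one reasonable choice; the constant a0 only rescales the weights and does not change the fitted coefficients. The sketch below runs the whole sequence on a synthetic pipe data frame (simulated values, not the NIST measurements).

```r
# Synthetic stand-in for the pipeline data (107 rows, as in the grouping above).
set.seed(3)
Field <- runif(107, 5, 85)
pipe <- data.frame(Field = Field,
                   Lab = 1.2 * Field + rnorm(107, sd = 0.3 * Field^0.75))

# Group computations, as given in the assignment.
i <- order(pipe$Field)
npipe <- pipe[i, ]
ff <- gl(12, 9)[-108]
meanfield <- unlist(lapply(split(npipe$Field, ff), mean))
varlab <- unlist(lapply(split(npipe$Lab, ff), var))

# Log-log regression: intercept estimates log(a0), slope estimates a1.
vfit <- lm(log(varlab) ~ log(meanfield))
a0 <- unname(exp(coef(vfit)[1]))
a1 <- unname(coef(vfit)[2])

# WLS fit with weights inversely proportional to the estimated variance.
wfit <- lm(Lab ~ Field, data = pipe, weights = 1 / Field^a1)
print(summary(wfit))
```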

Q3. Data on the outside diameter of crankpins produced by an industrial process over several days are given. All of the crankpins should be between 0.7425 and 0.7430 inches. The numbers given in the table are in units of 0.00001 inches deviation from 0.742 inches.

When the manufacturing process is "under control", the average size of the crankpins produced should (1) fall near the middle of the specified range and (2) not depend on time. Fit an appropriate model to see if the process is under control and test for lack of fit in the model.
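
In the table's units, the specified range 0.7425 to 0.7430 inches corresponds to 50 to 100, so the middle of the range is 75. One way to approach the question, sketched below on synthetic data with hypothetical column names day and diameter, is to fit a straight line in time and use the replicate measurements on each day to test for lack of fit: the linear model is compared with a saturated model that gives each day its own mean.

```r
# Synthetic stand-in for the crankpin data (made-up values; the column
# names day and diameter are assumptions): 5 pins on each of 8 days.
set.seed(4)
crank <- data.frame(day = rep(c(1, 4, 7, 10, 13, 16, 19, 22), each = 5))
crank$diameter <- 75 + rnorm(nrow(crank), sd = 8)

# Linear trend in time; under control, the slope should be near 0 and the
# fitted mean near 75.
lin <- lm(diameter ~ day, data = crank)
print(summary(lin))

# Saturated model: one mean per day, which gives a pure-error estimate.
sat <- lm(diameter ~ factor(day), data = crank)

# Lack-of-fit F-test: a small p-value means the straight line is inadequate.
lof <- anova(lin, sat)
print(lof)
```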