¡@
(a) Consider
the model y_{i
}=
βx_{i} +
ε_{i
}, i=1,
..., n, where ε_{i}'s
are independent and distributed as N_{
}(0, σ^{2}x_{i}^{2}).
Find the weighted least squares estimator for
β
and its variance. Give reasons why you would not wish to use ordinary least
squares in this case.
(b) Suppose y_{1},¡K, y_{n }are independently distributed and y_{i}_{ }= βx_{i} + ε_{i} with x_{i }>_{ }0, E(ε_{i})_{ }=_{ }0, Var(ε_{i})=σ^{2}x_{i}, i = 1, ¡K, n. Find the best linear unbiased estimator (BLUE) of β and its variance.
Each case in the data set represents a pair of zones in the city of Chicago. The variable x gives travel times which were computed from bus timetables augmented by walk times from zone centroids to bus-stops (assuming a walking speed of 3 m.p.h.) and expected waiting times for the bus (which were set at half the headway, i.e., the time between successive buses). The variable y was the average of travel times as reported to the U.S. Census Bureau by n travelers.
(a) Plot y against x. What do you notice? In order to obtain a linear expression for perceived travel time in terms of computed travel times, what weights would you use? Carry out the appropriate regression exercise.
(b) Draw scatter plots of the residuals against n for the fitted model in (a) and the fitted model without using weights. Ignoring the observation with the most extreme residual, do you think you have adequately taken care of the issue of unequal variance?
(c) Is it possible to make a lack of fit test to the straight line model for this data? If so, do so. If not, explain why not.
A classical problem in regressions is that of relating heights of sons to heights of fathers. Using the data, obtain an appropriate relationship. Obviously, one needs to weight. If the number of sons for each height category were available, that variable would have provided the appropriate weights (explain why). What would you do with the data given and why?