Prospective and Retrospective Sampling (Reading: Faraway (2006), section 2.6)


Q: For the infant disease data discussed in the previous lab, can we use, say, 77/458 = 0.168 to estimate the probability that a bottle-fed boy has a respiratory disease?


Not without knowing how the data were collected. If the data come from retrospective sampling and the disease is rare, diseased infants are typically sampled at a much higher rate than non-diseased ones, so 0.168 is likely to be an over-estimate.
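To see why the naive proportion can be too large, here is a minimal sketch with made-up numbers. The population size, the true risk of 0.02, and the two sampling fractions are all hypothetical and are not taken from the babyfood data; the point is only that keeping every case but a fraction of the non-cases inflates the observed proportion.

```r
# Hypothetical population of bottle-fed boys (all numbers are assumptions).
n_pop     <- 100000
true_risk <- 0.02
cases     <- n_pop * true_risk   # diseased boys in the population
controls  <- n_pop - cases       # non-diseased boys in the population

# Retrospective sampling: keep every case but only 10% of the non-cases.
frac_cases    <- 1.0
frac_controls <- 0.1
sampled_cases    <- cases * frac_cases
sampled_controls <- controls * frac_controls

# Naive estimate of P(disease | bottle) from the retrospective sample.
naive_risk <- sampled_cases / (sampled_cases + sampled_controls)
round(c(true = true_risk, naive = naive_risk), 3)
```

Under these assumed sampling fractions the naive proportion is far above the true risk, even though nothing about the population has changed.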


What information is valid in a retrospective study? Let us focus on just the boys who are breast or bottle fed:

> babyfood[c(1,3),c(1,2,4)]
  disease nondisease   food
1      77        381 Bottle
3      47        447 Breast

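Suppose that this had been a prospective study. The predictor food is fixed first, and then the numbers of diseased and non-diseased boys are observed. The log-odds of disease are log(77/381) = -1.60 for bottle-fed boys and log(47/447) = -2.25 for breast-fed boys, so the difference in log-odds (the log-odds ratio) is D = -1.60 - (-2.25) = 0.65.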
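The prospective calculation can be checked in R with a short sketch. The data frame `boys` and the names `logodds_disease` and `D_prospective` are introduced here for illustration; the counts are copied from the table above so that the block runs without the original babyfood data frame.

```r
# Counts copied from the table above (boys only).
boys <- data.frame(
  disease    = c(77, 47),
  nondisease = c(381, 447),
  food       = c("Bottle", "Breast")
)

# Prospective view: condition on food, compute log-odds of disease per row.
logodds_disease <- log(boys$disease / boys$nondisease)
names(logodds_disease) <- boys$food

# Difference in log-odds, bottle minus breast (the log-odds ratio).
D_prospective <- unname(logodds_disease["Bottle"] - logodds_disease["Breast"])

round(logodds_disease, 2)
round(D_prospective, 2)
```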

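Suppose instead that this had been a retrospective study. The total numbers of diseased and non-diseased boys are fixed first, and then the feeding types of these boys are observed. In this case we condition on disease status: the log-odds of bottle feeding are log(77/47) = 0.49 among diseased boys and log(381/447) = -0.16 among non-diseased boys, a difference in log-odds of 0.49 - (-0.16) = 0.65.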
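The retrospective calculation can be sketched the same way, this time conditioning on disease status and comparing the odds of bottle feeding. The vector names `bottle`, `breast`, `logodds_bottle` and `D_retrospective` are illustrative; the counts are again copied from the table above.

```r
# Counts copied from the table above, arranged by disease status.
bottle <- c(disease = 77, nondisease = 381)
breast <- c(disease = 47, nondisease = 447)

# Retrospective view: condition on disease status, compute log-odds of
# bottle feeding (versus breast feeding) within each status group.
logodds_bottle <- log(bottle / breast)

# Difference in log-odds, diseased minus non-diseased.
D_retrospective <- unname(logodds_bottle["disease"] - logodds_bottle["nondisease"])

# The same quantity written as a cross-product ratio of the 2 x 2 table.
log_cross_product <- log((77 * 447) / (381 * 47))

round(logodds_bottle, 2)
round(c(D_retrospective, log_cross_product), 2)
```

Writing the difference as the log of the cross-product ratio makes it clear that the value does not depend on whether rows or columns are conditioned on.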

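The two differences in log-odds coincide, since both equal log[(77 x 447)/(381 x 47)]. This shows that a retrospective design is as effective as a prospective design for estimating the log-odds ratio D.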