Prospective and Retrospective Sampling (Reading: Faraway (2006), section 2.6)


Q: For the infant disease data discussed in the previous lab, can we use, say, 77/458 = 0.168 to estimate the probability of respiratory disease for bottle-fed boys?


Not without knowing how the data were collected. If the data come from retrospective sampling and the disease is rare, 0.168 is likely to be an overestimate, because diseased infants are then typically sampled at a higher rate than they occur in the population.
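The size of the possible over-estimate can be sketched with a small calculation. The sampling scheme below is purely hypothetical (the notes do not say how the babyfood data were sampled): it assumes every diseased bottle-fed boy was included but only 10% of the non-diseased ones.

```r
# Observed counts for bottle-fed boys
disease    <- 77
nondisease <- 381

# Naive estimate: treats the sample as if it were prospective
naive <- disease / (disease + nondisease)

# Hypothetical retrospective scheme (an assumption for illustration only):
# all diseased boys are sampled, but only 10% of non-diseased boys are
control_rate <- 0.10
implied_nondisease <- nondisease / control_rate

# Proportion diseased among bottle-fed boys in the implied population
implied <- disease / (disease + implied_nondisease)

print(round(c(naive = naive, implied = implied), 3))
```

Under this assumed scheme the naive proportion is several times larger than the proportion in the implied population, which is the sense in which 0.168 may over-estimate the risk.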


What information is valid in a retrospective study? Let us focus on just boys who are breast or bottle fed:

> babyfood[c(1,3),c(1,2,4)]
  disease nondisease   food
1      77        381 Bottle
3      47        447 Breast

Suppose that this had been a prospective study. The predictor food is fixed first, and then the numbers of diseased and non-diseased boys are observed. We can find that:

• given the infant is bottle fed, the log-odds of having a respiratory disease is log(77/381)=-1.60

• given the infant is breast fed, the log-odds of having a respiratory disease is log(47/447)=-2.25

• the difference between these two log-odds, D = -1.60 - (-2.25) = 0.65, measures the increased risk of respiratory disease incurred by bottle feeding relative to breast feeding. This is the log odds ratio.
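The prospective calculation can be reproduced in R as a minimal sketch. The counts are typed in by hand from the table above rather than read from the babyfood data frame, and the names tab, logodds and D_prospective are illustrative.

```r
# Counts for boys, copied from the table above
tab <- data.frame(disease    = c(77, 47),
                  nondisease = c(381, 447),
                  food       = c("Bottle", "Breast"))

# Log-odds of disease within each feeding group (one value per row)
logodds <- log(tab$disease / tab$nondisease)
names(logodds) <- tab$food

# Difference in log-odds, bottle minus breast: the log odds ratio
D_prospective <- unname(logodds["Bottle"] - logodds["Breast"])

print(round(logodds, 2))
print(round(D_prospective, 2))
```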

Suppose that this had been a retrospective study. The total numbers of diseased and non-diseased boys are fixed first, and then the feeding types of these boys are observed. In this case, we could compute:

• given the infant has the disease, the log-odds of being bottle fed rather than breast fed is log(77/47)=0.49

• given the infant does not have the disease, the log-odds of being bottle fed rather than breast fed is log(381/447)=-0.16

• the difference between the two log-odds is D = 0.49 - (-0.16) = 0.65, which is the same value as in a prospective design
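The retrospective calculation can be checked the same way. This is a sketch with the counts entered by hand; the variable names are illustrative. The last line compares the result with the cross-product form of the log odds ratio, log((77*447)/(47*381)), which is the same quantity whichever margin is conditioned on.

```r
# Counts for boys, indexed by feeding type
disease    <- c(Bottle = 77,  Breast = 47)
nondisease <- c(Bottle = 381, Breast = 447)

# Log-odds of bottle versus breast feeding within each disease group
lo_disease    <- log(disease["Bottle"] / disease["Breast"])
lo_nondisease <- log(nondisease["Bottle"] / nondisease["Breast"])

# Difference in log-odds, diseased minus non-diseased
D_retrospective <- unname(lo_disease - lo_nondisease)

# Cross-product form of the log odds ratio
D_crossproduct <- log((77 * 447) / (47 * 381))

print(round(c(retrospective = D_retrospective, crossproduct = D_crossproduct), 2))
```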

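This shows that a retrospective design is as effective as a prospective design for estimating D, the log odds ratio. It does not, however, justify estimating the disease probabilities themselves, which is why an estimate such as 0.168 must be treated with caution.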
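One way to see why D survives a retrospective design is to rescale one disease-status column, which mimics sampling that group at a different rate. In the sketch below the factor of 10 and the helper log_odds_ratio are hypothetical, chosen only for illustration: the log odds ratio is unchanged, while the row-wise log-odds of disease shift.

```r
# Log odds ratio of a 2x2 table (rows = feeding type, columns = disease status)
log_odds_ratio <- function(tab) {
  log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
}

observed <- matrix(c(77, 47, 381, 447), nrow = 2,
                   dimnames = list(food   = c("Bottle", "Breast"),
                                   status = c("disease", "nondisease")))

# Hypothetical rescaling (assumption for illustration): the non-diseased
# group is sampled 10 times more heavily than in the observed table
rescaled <- observed
rescaled[, "nondisease"] <- 10 * observed[, "nondisease"]

D_observed <- log_odds_ratio(observed)
D_rescaled <- log_odds_ratio(rescaled)

# Row-wise log-odds of disease depend on the sampling rate
logodds_observed <- log(observed[, "disease"] / observed[, "nondisease"])
logodds_rescaled <- log(rescaled[, "disease"] / rescaled[, "nondisease"])

print(round(c(D_observed = D_observed, D_rescaled = D_rescaled), 2))
print(round(rbind(observed = logodds_observed, rescaled = logodds_rescaled), 2))
```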