NTHU STAT 5230 Lab - Binomial Data: Prospective and Retrospective Sampling

Prospective and Retrospective Sampling (Reading: Faraway (2006, 1st ed.), section 2.6)

Q: For the infant disease data discussed in the previous lab, can we use, say, 77/458=0.168 to estimate the probability that a bottle-fed boy has a respiratory disease?

Not without knowing how the data were collected. If the data came from retrospective sampling and the disease is rare, diseased infants are typically over-represented in the sample relative to the population, so 0.168 might be an overestimate.

What information is valid in a retrospective study? Let us focus on just boys who are breast or bottle fed:

> babyfood[c(1,3),c(1,2,4)]
  disease nondisease   food
1      77        381 Bottle
3      47        447 Breast

Suppose that this had been a prospective study. The predictor food is fixed first, and then the numbers of diseased and non-diseased boys are observed. We then find that:

given the infant is bottle fed, the log-odds of having a respiratory disease is log(77/381)=-1.60
given the infant is breast fed, the log-odds of having a respiratory disease is log(47/447)=-2.25
the difference between these two log-odds, D=-1.60-(-2.25)=0.65, measures the increased risk of respiratory disease associated with bottle feeding relative to breast feeding, on the log-odds scale. This is the log odds ratio.
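
The prospective calculation above can be reproduced in R. This is a minimal sketch that types the two rows in by hand instead of loading the babyfood data; the object names (boys, logodds, D) are illustrative assumptions and do not come from the lab.

```r
# Counts for boys only, copied from the table above
boys <- data.frame(
  disease    = c(77, 47),
  nondisease = c(381, 447),
  food       = c("Bottle", "Breast")
)

# Log-odds of disease within each feeding group (feeding type is conditioned on)
logodds <- log(boys$disease / boys$nondisease)
names(logodds) <- boys$food
round(logodds, 2)

# Log odds ratio: Bottle relative to Breast
D <- unname(logodds["Bottle"] - logodds["Breast"])
round(D, 2)
```

Rounding only at the end avoids carrying the two-decimal rounding of the individual log-odds into D.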

Suppose that this had been a retrospective study. The total numbers of diseased and non-diseased boys are fixed first, and then the feeding types of these boys are observed. In this case, we could compute:

given the infant has the disease, the log-odds of being bottle fed (rather than breast fed) is log(77/47)=0.49
given the infant does not have the disease, the log-odds of being bottle fed is log(381/447)=-0.16
the difference between the two log-odds is D=0.49-(-0.16)=0.65, which is the same result as in a prospective design
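
The retrospective calculation can be sketched the same way, now reading the table by column. The object names (bottle, breast, logodds_food, D_retro, D_cross) are illustrative assumptions. The last lines check that the difference equals the log of the cross-product ratio (77*447)/(381*47), which is why conditioning on rows or on columns gives the same D.

```r
# Same four counts, read column-wise (disease status is conditioned on)
bottle <- c(disease = 77, nondisease = 381)
breast <- c(disease = 47, nondisease = 447)

# Log-odds of bottle feeding within each disease group
logodds_food <- log(bottle / breast)
round(logodds_food, 2)

# Difference of the two log-odds
D_retro <- unname(logodds_food["disease"] - logodds_food["nondisease"])
round(D_retro, 2)

# Log of the cross-product ratio, which does not depend on the conditioning
D_cross <- log((77 * 447) / (381 * 47))
all.equal(D_retro, D_cross)
```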

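This shows that a retrospective design is as effective as a prospective design for estimating D, the log odds ratio. The same is not true of the disease probabilities themselves, such as 0.168, which depend on how many diseased and non-diseased boys were sampled.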
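
One way to check this with a fitted model is to run a logistic regression in each direction on the four counts. The sketch below uses glm from base R; the data frames and object names (boys, by_status, fit_pro, fit_retro) are assumptions made for illustration, and the factor levels are set so that Breast and nondisease are the reference categories.

```r
# Prospective view: disease status is the response, food is the predictor
boys <- data.frame(
  disease    = c(77, 47),
  nondisease = c(381, 447),
  food       = factor(c("Bottle", "Breast"), levels = c("Breast", "Bottle"))
)
fit_pro <- glm(cbind(disease, nondisease) ~ food, family = binomial, data = boys)

# Retrospective view: feeding type is the response, disease status is the predictor
by_status <- data.frame(
  bottle = c(77, 381),
  breast = c(47, 447),
  status = factor(c("disease", "nondisease"), levels = c("nondisease", "disease"))
)
fit_retro <- glm(cbind(bottle, breast) ~ status, family = binomial, data = by_status)

# The slopes agree (both are the log odds ratio); the intercepts do not
coef(fit_pro)
coef(fit_retro)
```

The slope is the quantity shared by the two designs. The intercepts are log(47/447) and log(381/447) respectively, and each is tied to the direction of conditioning.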