The question concerns data from a case-control study of esophageal cancer in Ille-et-Vilaine, France. The data are distributed with R and may be obtained, along with a description of the variables, by:
> data(esoph)
> esoph
   agegp     alcgp    tobgp ncases ncontrols
1  25-34 0-39g/day 0-9g/day      0        40
2  25-34 0-39g/day    10-19      0        10
3  25-34 0-39g/day    20-29      0         6
...deleted...
88   75+      120+    10-19      1         1
> help(esoph)
Fit a binomial GLM with interaction effects between all three predictors. Start from the saturated model and use backward elimination to simplify the model as far as is reasonable.
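As a starting point, the fit and the first elimination step might look like the sketch below. The object names `full` and `m2` are illustrative, and the use of `drop1()` with a chi-squared test is one reasonable choice among several (`step()` or `anova()` comparisons would also work); it is not a prescribed solution.

```r
data(esoph)

# Model with all main effects and all two- and three-way interactions.
# glm() may warn about fitted probabilities of 0 or 1 because some cells
# contain no cases; some coefficients are NA because not all 96 cells occur.
full <- glm(cbind(ncases, ncontrols) ~ agegp * alcgp * tobgp,
            family = binomial, data = esoph)

# Test whether the highest-order term can be dropped
drop1(full, test = "Chisq")

# Assumed next step (illustrative): remove the three-way interaction,
# then test the two-way interactions in the reduced model
m2 <- update(full, . ~ . - agegp:alcgp:tobgp)
drop1(m2, test = "Chisq")
```

Repeating the `drop1()` and `update()` cycle until no remaining term can be removed gives the backward elimination the question asks for.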
All three factors are ordered, so R has used the polynomial contrasts appropriate for ordered factors, which involve linear, quadratic, cubic and higher-order terms. Further simplification of the model is possible by eliminating some of these terms. Use the unclass() command in R to convert some or all of the factors to a numerical representation and show how the model may be simplified.
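The idea can be sketched as follows, assuming purely for illustration that backward elimination has left a main-effects model. The particular numeric terms kept here (linear and quadratic in age, linear in alcohol and tobacco) are an assumption, not the answer; the names `ord` and `num` are hypothetical.

```r
data(esoph)

# Main-effects model with ordered factors: R uses polynomial contrasts,
# giving coefficients labelled .L, .Q, .C, ... for each factor
ord <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
           family = binomial, data = esoph)

# unclass() turns each factor into its integer codes 1, 2, 3, ...
# Assumed simplification: quadratic in age, linear in alcohol and tobacco
num <- glm(cbind(ncases, ncontrols) ~ unclass(agegp) + I(unclass(agegp)^2) +
             unclass(alcgp) + unclass(tobgp),
           family = binomial, data = esoph)

# The numeric model is nested in the factor model, so the two can be
# compared with a chi-squared test on the difference in deviance
anova(num, ord, test = "Chisq")
```

A large p-value in the comparison would suggest that the higher-order polynomial terms are not needed.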
Does your final model fit the data? Is the test you make accurate for this data?
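One common way to approach both questions is a deviance goodness-of-fit test, followed by a look at the group sizes on which the chi-squared approximation depends. The model formula below is an assumed final model used only for illustration; substitute whatever model your own elimination produces.

```r
data(esoph)

# Assumed final model (illustrative only)
fit <- glm(cbind(ncases, ncontrols) ~ unclass(agegp) + I(unclass(agegp)^2) +
             unclass(alcgp) + unclass(tobgp),
           family = binomial, data = esoph)

# Deviance goodness-of-fit test: compare the residual deviance with a
# chi-squared distribution on the residual degrees of freedom
gof_p <- pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)
gof_p

# The chi-squared approximation is unreliable when the binomial group
# sizes are small, so inspect the distribution of cases plus controls
group_size <- esoph$ncases + esoph$ncontrols
summary(group_size)
```

If many groups are small, the p-value from the deviance test should be treated with caution.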
What is the predicted effect of moving to a category one higher in alcohol consumption? Compute a 95% confidence interval for this predicted effect.
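If alcohol enters the model as a single numeric term, the effect of moving up one category is a multiplicative change in the odds, and a Wald interval can be sketched as below. The model formula is again an assumed final model, and the normal-approximation interval is one option; a profile-likelihood interval from `confint()` is an alternative.

```r
data(esoph)

# Assumed final model (illustrative only), with alcohol as a numeric score
fit <- glm(cbind(ncases, ncontrols) ~ unclass(agegp) + I(unclass(agegp)^2) +
             unclass(alcgp) + unclass(tobgp),
           family = binomial, data = esoph)

# Estimated change in log-odds per one-category increase in alcohol
b  <- unname(coef(fit)["unclass(alcgp)"])
se <- unname(sqrt(diag(vcov(fit)))["unclass(alcgp)"])

# 95% Wald interval on the log-odds scale
ci <- b + c(-1, 1) * qnorm(0.975) * se

# Exponentiate to obtain the odds ratio and its interval
exp(b)
exp(ci)
```

Because the interval is built on the log-odds scale and then exponentiated, it is not symmetric about the estimated odds ratio.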
Bearing in mind that this is a case-control study, what can be said about the predicted probability that a 25-year-old who does not smoke or drink will get esophageal cancer?