Assignment 3

   

  1. The question concerns data from a case-control study of esophageal cancer in Ile-et-Vilaine, France. The data are distributed with R and may be obtained, along with a description of the variables, by:

    > data(esoph)

    > esoph

       agegp     alcgp    tobgp ncases ncontrols

    1  25-34 0-39g/day 0-9g/day      0        40

    2  25-34 0-39g/day    10-19      0        10

    3  25-34 0-39g/day    20-29      0         6

       ... deleted ...

    88   75+      120+    10-19      1         1

    > help(esoph)

    1. Fit a binomial GLM with interaction effects between all three predictors. Start from the saturated model and use backward elimination to simplify the model as far as is reasonable.

    2. All three factors are ordered, so R has used the polynomial contrasts appropriate for ordered factors, with linear, quadratic, cubic, and higher-order terms. Further simplification of the model is possible by eliminating some of these terms. Use the unclass() command in R to convert some or all factors to a numerical representation and show how the model may be simplified.

    3. Does your final model fit the data? Is the test you make accurate for this data?

    4. What is the predicted effect of moving to a category one higher in alcohol consumption? Compute a 95% confidence interval for this predicted effect.

    5. Bearing in mind that this is a case-control study, what can be said about the predicted probability that a 25-year-old who does not smoke or drink will get esophageal cancer?
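
The steps in parts 1 to 4 can be sketched in R roughly as follows. This is a minimal outline under stated assumptions, not a model answer: AIC-based `step()` stands in for backward elimination, the column names `age`, `alc` and `tob` and the all-linear score model are hypothetical choices for illustration, and the interval is a Wald interval on the log-odds scale. Which terms actually survive simplification has to be judged from the output.

```r
# Sketch only: one possible route through parts 1-4, not a definitive solution.
data(esoph)

# Part 1: saturated binomial GLM, then backward elimination (here by AIC).
# Warnings about fitted probabilities of 0 or 1 are expected for this fit.
full <- suppressWarnings(
  glm(cbind(ncases, ncontrols) ~ agegp * alcgp * tobgp,
      family = binomial, data = esoph)
)
reduced <- suppressWarnings(step(full, direction = "backward", trace = 0))

# Part 2: replace the ordered factors by integer scores 1, 2, 3, ...
# (age, alc, tob are hypothetical names introduced for this sketch).
esoph_num <- transform(esoph,
                       age = as.numeric(unclass(agegp)),
                       alc = as.numeric(unclass(alcgp)),
                       tob = as.numeric(unclass(tobgp)))

# Assumed simplification: every factor enters as a linear score only.
lin <- glm(cbind(ncases, ncontrols) ~ age + alc + tob,
           family = binomial, data = esoph_num)

# Part 3: deviance-based goodness-of-fit test (chi-squared approximation).
gof_p <- pchisq(deviance(lin), df.residual(lin), lower.tail = FALSE)

# Part 4: effect of moving up one alcohol category, as an odds ratio
# with a 95% Wald confidence interval.
est <- coef(summary(lin))["alc", c("Estimate", "Std. Error")]
log_ci <- est["Estimate"] + c(-1, 1) * qnorm(0.975) * est["Std. Error"]
or_ci <- exp(log_ci)

print(formula(reduced))
print(gof_p)
print(or_ci)
```

The chi-squared approximation behind the deviance test relies on reasonably large group counts, which is the point part 3 asks you to examine for these data.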

  2. The data come from a study of breast cancer in Wisconsin (Bennett and Mangasarian, 1992). There are 681 cases of potentially cancerous tumors, of which 238 are actually malignant. Whether a tumor is really malignant is traditionally determined by an invasive surgical procedure. The purpose of this study was to determine whether a new procedure called fine needle aspiration, which draws only a small sample of tissue, could be effective in determining tumor status. The data have the following response and nine predictors:

    The predictor values are determined by a doctor observing the cells and rating them on a scale from 1 (normal) to 10 (most abnormal) with respect to the particular characteristic.
