Assignment 2

The data come from a cross-sectional survey of US cotton workers, conducted to determine the incidence of byssinosis and its relation to dust in the workplace and to other variables. The following variables were recorded:

yes: Number of workers suffering from the disease,
no: Number of workers not suffering from the disease,
dust: Dustiness of the workplace (high, medium, or low),
race: White or non-white,
sex: Male or female,
smoke: Smoker or non-smoker,
emp: Length of employment (short: < 10 years, medium: 10-20 years, long: > 20 years).

The objective is to determine which factors are important in explaining the incidence of byssinosis.

Use differences in deviance to find a good model for the data, remembering to consider possible interaction effects. Since residual diagnostics have not yet been covered, do not attempt them; just find a parsimonious model.
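
The difference-in-deviance (likelihood-ratio) test compares two nested binomial GLMs: if the smaller model is adequate, the drop in residual deviance is approximately chi-squared, with degrees of freedom equal to the drop in residual df. The following is a minimal, stdlib-only Python sketch of that comparison. `chi2_sf` and `deviance_difference_test` are hypothetical helper names, and the deviances in the usage lines are invented for illustration, not fitted to the byssinosis data. In practice the deviances would come from your model-fitting software (in R, for example, `anova(fit_small, fit_big, test = "Chisq")` reports the same test).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the 1-df tail, erfc(sqrt(x/2)), then add the terms
    # sqrt(2/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j - 1/2) / (1*3*...*(2j-1))
    term, corr = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        corr += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * corr


def deviance_difference_test(dev_small, df_small, dev_big, df_big):
    """Test a smaller model nested inside a bigger one, from residual deviances and df."""
    stat = dev_small - dev_big   # drop in residual deviance
    ddf = df_small - df_big      # number of extra parameters in the bigger model
    return stat, ddf, chi2_sf(stat, ddf)


# Hypothetical numbers: a main-effects model versus the same model plus one interaction
stat, ddf, p = deviance_difference_test(dev_small=80.0, df_small=65, dev_big=70.0, df_big=61)
print(f"deviance drop = {stat:.2f} on {ddf} df, p = {p:.4f}")
```

A small p-value favours keeping the extra terms; a large one suggests the simpler model is adequate.
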
Does your final model fit the data? In your final model, are all the Wald tests significant? Based on your answers to these two questions, justify your choice of final model.
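
A model is usually judged to fit when its residual deviance is not large relative to a chi-squared distribution on the residual degrees of freedom (an approximation that is only reliable when the binomial counts in each covariate group are not too small). Each Wald test instead divides one estimated coefficient by its standard error and compares the ratio with a standard normal distribution. Below is a minimal Python sketch of the Wald calculation, assuming the estimate and standard error have already been read off a fitted-model summary; `wald_test` is a hypothetical helper, and the numbers in the usage lines are made up.

```python
import math


def wald_test(estimate: float, std_error: float):
    """Two-sided Wald z-test of H0: coefficient = 0."""
    z = estimate / std_error
    # Two-sided standard-normal tail: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficient and standard error, not taken from the byssinosis fit
z, p = wald_test(0.65, 0.20)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For a factor with several levels, such as dust or emp, each Wald test concerns a single contrast with the baseline level. A non-significant Wald statistic for one level therefore does not by itself mean the whole factor can be dropped; the difference-in-deviance test is the usual way to judge the factor as a whole.
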
Use your final model to interpret the relationship between the occurrence of byssinosis disease and the predictors.
Based on your final fitted model, how much does smoking increase the odds of getting byssinosis? Find a 95% confidence interval for this increase.
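
With a logit link, a coefficient β for smoking is a log odds ratio. exp(β) is therefore the factor by which smoking multiplies the odds of byssinosis, with the other predictors held fixed. A Wald 95% interval is formed on the log-odds scale, β ± 1.96·SE, and then exponentiated. The following Python sketch assumes a logit link, non-smokers as the baseline level, and that smoke enters the final model as a main effect only (if it interacts with another predictor, the odds ratio depends on that predictor's level). `odds_ratio_ci` is a hypothetical helper, and the coefficient values in the usage lines are invented.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta: float, se: float, level: float = 0.95):
    """Odds ratio exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 when level = 0.95
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical smoking coefficient and standard error, not from the actual fit
beta_smoke, se_smoke = 0.6, 0.2
ratio, lower, upper = odds_ratio_ci(beta_smoke, se_smoke)
print(f"odds ratio = {ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is exponentiated from a symmetric interval on the log-odds scale, it is not symmetric around the odds ratio itself.
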
Suppose cotton workers' lung capacity to tolerate cotton and dust follows a latent continuous distribution. What distribution does this lung capacity follow under your final model? Explain your answer.
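
In the latent-variable view of a binary-response GLM, each worker has an unobserved continuous variable Z_i and develops byssinosis when Z_i falls at or below the linear predictor. The link function then determines the distribution of Z_i. The derivation below assumes the final model is a binomial GLM with the logit link; Z_i and F are notation introduced here for illustration.

```latex
Y_i = 1 \iff Z_i \le \eta_i, \qquad \eta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}

p_i = P(Y_i = 1) = P(Z_i \le \eta_i) = F(\eta_i)

\log\frac{p_i}{1 - p_i} = \eta_i
\iff p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}
\iff F(z) = \frac{e^{z}}{1 + e^{z}}
```

Under these assumptions Z_i follows the standard logistic distribution. For comparison, a probit link corresponds to a standard normal latent variable, and a complementary log-log link corresponds to an extreme-value (Gumbel-type) distribution.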