The National Institute of Diabetes and Digestive and Kidney Diseases conducted a study on 768 adult female Pima Indians living near Phoenix. The following variables were recorded:
pregnant: Number of times pregnant,
glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test,
diastolic: Diastolic blood pressure (mm Hg),
triceps: Triceps skin fold thickness (mm),
insulin: 2-Hour serum insulin (mu U/ml),
bmi: Body mass index (weight in kg/(height in m)^2),
diabetes: Diabetes pedigree function,
age: Age (years),
test: Whether the patient shows signs of diabetes (coded 0 if negative, 1 if positive).
The purpose of the study was to investigate factors related to diabetes. The data can be found in the dataset pima.
Perform simple graphical and numerical summaries of the data. Can you find any obvious irregularities in the data? If you do, take appropriate steps to correct the problems.
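One irregularity worth checking for is a recorded value of 0 in a variable where zero is not physiologically plausible, which usually signals a missing value rather than a measurement. The sketch below counts such zeros and recodes them as missing. It is written in plain Python on a list of dictionaries; the choice of columns, the helper names count_zeros and zeros_to_missing, and the toy rows are assumptions made for illustration, not part of the dataset or the exercise.

```python
# Columns where a recorded 0 is treated as implausible (an assumption of this
# sketch). "pregnant" is left out because 0 is a valid count there.
ZERO_MEANS_MISSING = ("glucose", "diastolic", "triceps", "insulin", "bmi")


def count_zeros(rows, columns=ZERO_MEANS_MISSING):
    """Return, for each column, how many rows record exactly 0."""
    return {c: sum(1 for r in rows if r[c] == 0) for c in columns}


def zeros_to_missing(rows, columns=ZERO_MEANS_MISSING):
    """Return copies of the rows with implausible zeros replaced by None."""
    cleaned = []
    for r in rows:
        new_row = dict(r)  # copy so the original data is left untouched
        for c in columns:
            if new_row[c] == 0:
                new_row[c] = None
        cleaned.append(new_row)
    return cleaned


# Made-up rows for illustration only; these are not values from pima.
toy_rows = [
    {"pregnant": 0, "glucose": 120, "diastolic": 70,
     "triceps": 30, "insulin": 0, "bmi": 32.5},
    {"pregnant": 2, "glucose": 95, "diastolic": 0,
     "triceps": 0, "insulin": 0, "bmi": 0},
]

zero_counts = count_zeros(toy_rows)
cleaned_rows = zeros_to_missing(toy_rows)
```

Recoding to a missing value rather than dropping rows outright keeps the decision about how to handle incomplete cases separate from the cleaning step.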
Fit a model with the result of the diabetes test as the response and all the other variables as predictors. Can you tell whether this model fits the data?
What is the difference in the odds of testing positive for diabetes for a woman with a BMI at the first quartile compared with a woman at the third quartile, assuming that all other factors are held constant? Give a confidence interval for this difference.
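Because the fitted log-odds are linear in bmi, moving from the first to the third quartile changes the log-odds by the bmi coefficient times the interquartile range, so the comparison is naturally expressed as a ratio of odds. The sketch below turns a coefficient and its standard error into that odds ratio with a Wald-type interval. The function name odds_ratio_ci and the 1.96 multiplier (an approximate 95% normal interval) are assumptions of this illustration; a profile-likelihood interval would be a reasonable alternative.

```python
import math


def odds_ratio_ci(beta, se, q1, q3, z=1.96):
    """Odds ratio for moving a predictor from q1 to q3, with a Wald interval.

    beta: fitted coefficient of the predictor on the log-odds scale
    se:   standard error of beta
    q1, q3: first and third quartiles of the predictor (q3 > q1)
    z:    normal quantile; 1.96 gives an approximate 95% interval
    """
    iqr = q3 - q1
    estimate = math.exp(beta * iqr)
    # Build the interval on the log-odds scale, then exponentiate.
    lower = math.exp((beta - z * se) * iqr)
    upper = math.exp((beta + z * se) * iqr)
    return estimate, lower, upper
```

The inputs would come from the fitted model summary and from the quartiles of bmi in the cleaned data; an odds ratio whose interval excludes 1 indicates a detectable change in odds across the interquartile range.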
Do women who test positive have higher diastolic blood pressures? Is the diastolic blood pressure significant in the model? Explain the distinction between the two questions and discuss why the answers are only apparently contradictory.