Interpreting Odds (Reading: Faraway (2006, 1st ed.), section 2.5)

¡@

Here is a data from a study on infant respiratory disease, namely the proportions of children developing bronchitis or pneumonia in their first year of life by type of feeding and sex, which may be found in Payne (1987). Let us first read into R the data and take a look of it:

> babyfood <- read.table("babyfood.txt")
> babyfood

  disease nondisease  sex   food

1      77        381  Boy Bottle

2      19        128  Boy  Suppl

3      47        447  Boy Breast

4      48        336 Girl Bottle

5      16        111 Girl  Suppl

6      31        433 Girl Breast

The predictors "sex" and "food" are cross-classifying categorical variables so that we can express the data in a format of contingency table with yx/nx in each cell as follows:

¡@ Bottle Only Some Breast with Supplyment Breast Only
Boys 77/458 19/147 47/494
Girls 48/384 16/127 31/464

The layout above with the proportion in each cell can be obtained by:

> xtabs(disease/(disease+nondisease)~sex+food,babyfood)

      food

sex        Bottle     Breast      Suppl

  Boy  0.16812227 0.09514170 0.12925170

  Girl 0.12500000 0.06681034 0.12598425

The table seems indicating that

¡@

    To draw more concrete and accurate conclusions, let us fit and examine a binomial GLM of the data:

> mdl <- glm(cbind(disease,nondisease) ~ sex + food, family=binomial,babyfood) # 0-1 dummy variables are used for this case. You can use the command "model.matrix(mdl)" to understand how sex and food are coded
> summary(mdl)

Call:

glm(formula = cbind(disease, nondisease) ~ sex + food, family = binomial, data = babyfood)

 

Deviance Residuals:

      1        2        3        4        5        6 

 0.1096  -0.5052   0.1922  -0.1342   0.5896  -0.2284 

 

Coefficients:

            Estimate Std. Error z value Pr(>|z|)   

(Intercept)  -1.6127     0.1124 -14.347  < 2e-16 ***

sexGirl      -0.3126     0.1410  -2.216   0.0267 * 

foodBreast   -0.6693     0.1530  -4.374 1.22e-05 ***

foodSuppl    -0.1725     0.2056  -0.839   0.4013   

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for binomial family taken to be 1)

 

    Null deviance: 26.37529  on 5  degrees of freedom

Residual deviance:  0.72192  on 2  degrees of freedom

AIC: 40.24

 

Number of Fisher Scoring iterations: 4

Notice that

In R, you can fit a model with interaction effects by:

> mdlfull <- glm(cbind(disease,nondisease) ~ sex * food, family=binomial,babyfood) # an alternative is to replayce "sex * food" by "sex + food + sex:food". You can use the command "help(formula)" to get more information.
> summary(mdlfull)

Call:

glm(formula = cbind(disease, nondisease) ~ sex * food, family = binomial, data = babyfood)

 

Deviance Residuals:

[1]  0  0  0  0  0  0

 

Coefficients:

                   Estimate Std. Error z value Pr(>|z|)    

(Intercept)        -1.59899    0.12495 -12.797  < 2e-16 ***

sexGirl            -0.34692    0.19855  -1.747 0.080591 . 

foodBreast         -0.65342    0.19780  -3.303 0.000955 ***

foodSuppl          -0.30860    0.27578  -1.119 0.263145   

sexGirl:foodBreast -0.03742    0.31225  -0.120 0.904603   

sexGirl:foodSuppl   0.31757    0.41397   0.767 0.443012   

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for binomial family taken to be 1)

 

    Null deviance:  2.6375e+01  on 5  degrees of freedom

Residual deviance: -7.9270e-14  on 0  degrees of freedom

AIC: 43.518

 

Number of Fisher Scoring iterations: 3

Notice that both interaction effects are insignificant, which supports our previous arguments.

¡@

    Let us now go back to the model with only main effects, i.e, mdl. We can test for the significance of the predictors by:

> drop1(mdl,test="Chi")

Single term deletions

 

Model:

cbind(disease, nondisease) ~ sex + food

       Df Deviance    AIC    LRT   Pr(Chi)   

<none>       0.722 40.240                    

sex     1    5.699 43.217  4.977   0.02569 * 

food    2   20.899 56.417 20.177 4.155e-05 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The drop1 function test each predictor relative to the full, i.e., mdl. Notice that

¡@

    Now, consider the interpretation of the coefficients. Let us start with the effect of breast feeding:

> exp(-0.6693)

[1] 0.5120669

We see that

We could compute a confidence interval for it by figuring the standard error on the odds scale; however, we get better coverage properties by computing the interval on the log-odds scale and then transforming the endpoints as follows:

> c(-0.6693-1.96*0.153,-0.6693+1.96*0.153)

[1] -0.96918 -0.36942

> exp(c(-0.96918,-0.36942))

[1] 0.3793940 0.6911351

Notice that the CI is asymmetric about the estimated effect of 0.5120669.

Confidence intervals can also be computed using profile likelihood methods:
> library(MASS)
> exp(confint(mdl))

Waiting for profiling to be done...

                2.5 %    97.5 %

(Intercept) 0.1591976 0.2474327

sexGirl     0.5536196 0.9629212

foodBreast  0.3781890 0.6895168

foodSuppl   0.5555197 1.2464284

Notice that

¡@

    As an aside, note that for small value of e, we have:

        log(x(1+e)) = log(x) + log(1+e) » log(x) + e

This approximation is reasonable for -0.25 < e < 0.25. So, for example, given the estimated foodSuppl effect=-0.1725, we can approximate the reduction in odds as about 17% relative to bottle feeding. The exact figure is:

> 1-exp(-0.173)

[1] 0.1588624

So, the approximation is good for a quick sense of the effect, but an exact calculation is necessary for results that will be presented to others.