Interpreting Odds (Reading: Faraway (2006), section 2.5)

¡@

Here is a data from a study on infant respiratory disease, namely the proportions of children developing bronchitis or pneumonia in their first year of life by type of feeding and sex, which may be found in Payne (1987). Let us first read into R the data and take a look of it:

> babyfood

disease nondisease  sex   food

1      77        381  Boy Bottle

2      19        128  Boy  Suppl

3      47        447  Boy Breast

4      48        336 Girl Bottle

5      16        111 Girl  Suppl

6      31        433 Girl Breast

The predictors "sex" and "food" are cross-classifying categorical variables so that we can express the data in a format of contingency table with yx/nx in each cell as follows:

 ¡@ Bottle Only Some Breast with Supplyment Breast Only Boys 77/458 19/147 47/494 Girls 48/384 16/127 31/464

The layout above with the proportion in each cell can be obtained by:

> xtabs(disease/(disease+nondisease)~sex+food,babyfood)

food

sex        Bottle     Breast      Suppl

Boy  0.16812227 0.09514170 0.12925170

Girl 0.12500000 0.06681034 0.12598425

The table seems indicating that

• breast feeding can reduce the chance of getting respiratory disease

• boys have higher probability of getting the disease than girls

¡@

To draw more concrete and accurate conclusions, let us fit and examine a binomial GLM of the data:

> mdl <- glm(cbind(disease,nondisease) ~ sex + food, family=binomial,babyfood) # 0-1 dummy variables are used for this case. You can use the command "model.matrix(mdl)" to understand how sex and food are coded
> summary(mdl)

Call:

glm(formula = cbind(disease, nondisease) ~ sex + food, family = binomial, data = babyfood)

Deviance Residuals:

1        2        3        4        5        6

0.1096  -0.5052   0.1922  -0.1342   0.5896  -0.2284

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept)  -1.6127     0.1124 -14.347  < 2e-16 ***

sexGirl      -0.3126     0.1410  -2.216   0.0267 *

foodBreast   -0.6693     0.1530  -4.374 1.22e-05 ***

foodSuppl    -0.1725     0.2056  -0.839   0.4013

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 26.37529  on 5  degrees of freedom

Residual deviance:  0.72192  on 2  degrees of freedom

AIC: 40.24

Number of Fisher Scoring iterations: 4

Notice that

• this is a model containing only main effects. The predictor sex has two categories (Þ one main effect) and the predictor food has three categories (Þ two main effects) so that the model has four parameters (including an intercept term).

• the chi-square approximation can be expected to be accurate here due to the large sample sizes in each of the covariate classes

• the significance of the effect sexGirl indicates that there is a clear difference between boys and girls on the probabilities of getting the disease

• the significance of the effect foodBreast indicates that there is a clear difference between "breast only" and "bottle only"

• because the effect foodSuppl is insignificant, there is no strong evidence that supports "some breast with supplement" and "bottle only" are different

• the small Residual deviance shows that this model is a good fit

• Q: Is it required to add the sex-by-food interaction effects?

• To answer the question, one way is to fit a model with interaction effects (as will be shown shortly) and examine whether the interaction effects are significant.

• However, there is a simpler method for this case. Notice that a model with all main effects and interaction effects (there are 1´2=2 interaction effects) would require six parameters, which equals the number of covariate classes in this case. In other words, the model with main effects and interaction effects would be saturated with zero deviance whose degrees of freedom is also zero. Suppose that S=the model with main effects and L=the model with main effect and interaction effects. We can soon get DS-DL=0.72192 and dfS-dfL=2. Because 0.72192 is not at all large for two degrees of freedom, we would accept H0: S and may conclude that there is no evidence of significant interactions.

In R, you can fit a model with interaction effects by:

> mdlfull <- glm(cbind(disease,nondisease) ~ sex * food, family=binomial,babyfood) # an alternative is to replayce "sex * food" by "sex + food + sex:food". You can use the command "help(formula)" to get more information.
> summary(mdlfull)

Call:

glm(formula = cbind(disease, nondisease) ~ sex * food, family = binomial, data = babyfood)

Deviance Residuals:

[1]  0  0  0  0  0  0

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept)        -1.59899    0.12495 -12.797  < 2e-16 ***

sexGirl            -0.34692    0.19855  -1.747 0.080591 .

foodBreast         -0.65342    0.19780  -3.303 0.000955 ***

foodSuppl          -0.30860    0.27578  -1.119 0.263145

sexGirl:foodBreast -0.03742    0.31225  -0.120 0.904603

sexGirl:foodSuppl   0.31757    0.41397   0.767 0.443012

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance:  2.6375e+01  on 5  degrees of freedom

Residual deviance: -7.9270e-14  on 0  degrees of freedom

AIC: 43.518

Number of Fisher Scoring iterations: 3

Notice that both interaction effects are insignificant, which supports our previous arguments.

¡@

Let us now go back to the model with only main effects, i.e, mdl. We can test for the significance of the predictors by:

> drop1(mdl,test="Chi")

Single term deletions

Model:

cbind(disease, nondisease) ~ sex + food

Df Deviance    AIC    LRT   Pr(Chi)

<none>       0.722 40.240

sex     1    5.699 43.217  4.977   0.02569 *

food    2   20.899 56.417 20.177 4.155e-05 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The drop1 function test each predictor relative to the full, i.e., mdl. Notice that

• 0.722 is the deviance of the model mdl, i.e., cbind(disease, nondisease) ~ sex + food

• 5.699 is the deviance of the model cbind(disease, nondisease) ~ food

• 20.899 is the deviance of the model cbind(disease, nondisease) ~ sex

• the p-value, say 4.155e-05, can be obtained by the command "pchisq(20.899-0.722,2,lower=FALSE)"

• We see that both predictors are significant in this sense.

¡@

Now, consider the interpretation of the coefficients. Let us start with the effect of breast feeding:

> exp(-0.6693)

[1] 0.5120669

We see that

• breast feeding reduces the odds of respiratory disease to 51.2% of that for bottle feeding, or

• breast feeding reduces the log-odds of respiratory disease by 0.6693 compared to that for bottle feeding

We could compute a confidence interval for it by figuring the standard error on the odds scale; however, we get better coverage properties by computing the interval on the log-odds scale and then transforming the endpoints as follows:

> c(-0.6693-1.96*0.153,-0.6693+1.96*0.153)

[1] -0.96918 -0.36942

> exp(c(-0.96918,-0.36942))

[1] 0.3793940 0.6911351

Notice that the CI is asymmetric about the estimated effect of 0.5120669.

Confidence intervals can also be computed using profile likelihood methods:
> library(MASS)
> exp(confint(mdl))

Waiting for profiling to be done...

2.5 %    97.5 %

(Intercept) 0.1591976 0.2474327

sexGirl     0.5536196 0.9629212

foodBreast  0.3781890 0.6895168

foodSuppl   0.5555197 1.2464284

Notice that

• the CI [0.3781890, 0.6895168] is slightly closer to zero than the previous one although it makes little difference for this data

• the latter result is usually more reliable for the Hauck-Donner reason discussed before

¡@

As an aside, note that for small value of e, we have:

log(x(1+e)) = log(x) + log(1+e) » log(x) + e

This approximation is reasonable for -0.25 < e < 0.25. So, for example, given the estimated foodSuppl effect=-0.1725, we can approximate the reduction in odds as about 17% relative to bottle feeding. The exact figure is:

> 1-exp(-0.173)

[1] 0.1588624

So, the approximation is good for a quick sense of the effect, but an exact calculation is necessary for results that will be presented to others.