Hierarchical or Nested Multinomial Responses (Reading: Faraway (2006), section 5.2)

 

Consider the data collected by Lowe et al. (1971) concerning live births with deformations of the central nervous system in south Wales. Let us first read the data into R and take a look:

> cns <- read.table("cns.txt")
> cns

            Area NoCNS An Sp Other Water      Work

1        Cardiff  4091  5  9     5   110 NonManual

2        Newport  1515  1  7     0   100 NonManual

3        Swansea  2394  9  5     0    95 NonManual

4     GlamorganE  3163  9 14     3    42 NonManual

5     GlamorganW  1979  5 10     1    39 NonManual

6     GlamorganC  4838 11 12     2   161 NonManual

7      MonmouthV  2362  6  8     4    83 NonManual

8  MonmouthOther  1604  3  6     0   122 NonManual

9        Cardiff  9424 31 33    14   110    Manual

10       Newport  4610  3 15     6   100    Manual

11       Swansea  5526 19 30     4    95    Manual

12    GlamorganE 13217 55 71    19    42    Manual

13    GlamorganW  8195 30 44    10    39    Manual

14    GlamorganC  7803 25 28    12   161    Manual

15     MonmouthV  9962 36 37    13    83    Manual

16 MonmouthOther  3172  8 13     3   122    Manual

In the dataset,

¡@

We might consider a multinomial response with the four categories: NoCNS, An, Sp, and Other. However,

We now separately develop a binomial model for whether malformation occurs and a multinomial model for the type of malformation.

¡@

We start with the binomial model. First, we accumulate the number of CNS births and plot the data with the response on the logit scale:

> cns$CNS <- cns$An+cns$Sp+cns$Other
> plot(log(CNS/NoCNS) ~ Water, cns, pch=as.character(Work))

We observe that:

¡@

Because Area is confounded with Water, we cannot put both these predictors in one model. Let us try them both in different models and compare:

> binmodw <- glm(cbind(CNS,NoCNS) ~ Water + Work, cns, family=binomial)

> binmoda <- glm(cbind(CNS,NoCNS) ~ Area + Work, cns, family=binomial)
> anova(binmodw,binmoda,test="Chi")

Analysis of Deviance Table

 

Model 1: cbind(CNS, NoCNS) ~ Water + Work

Model 2: cbind(CNS, NoCNS) ~ Area + Work

  Resid. Df Resid. Dev Df Deviance P(>|Chi|)

1        13    12.3628                     

2         7     3.0771  6   9.2857    0.1581

Notice that:

Let us take a look at the chosen model:

> summary(binmodw)

Call:

glm(formula = cbind(CNS, NoCNS) ~ Water + Work, family = binomial,

    data = cns)

 

Deviance Residuals:

     Min        1Q    Median        3Q       Max 

-2.65570  -0.30179  -0.03131   0.57213   1.32998 

 

Coefficients:

                Estimate Std. Error z value Pr(>|z|)   

(Intercept)   -4.4325803  0.0897889 -49.367  < 2e-16 ***

Water         -0.0032644  0.0009684  -3.371 0.000749 ***

WorkNonManual -0.3390577  0.0970943  -3.492 0.000479 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for binomial family taken to be 1)

 

    Null deviance: 41.047  on 15  degrees of freedom

Residual deviance: 12.363  on 13  degrees of freedom

AIC: 102.49

 

Number of Fisher Scoring iterations: 4

We see that:


Now, consider a multinomial model for the three malformation types conditional on a malformation having occurred. As this data is grouped, it is most convenient to present the response as a matrix formed by An, Sp, and Other:

> library(nnet)

> cmmod <- multinom(cbind(An,Sp,Other) ~ Water + Work, cns)

# weights:  12 (6 variable)

initial  value 762.436928

iter  10 value 685.762336

final  value 685.762238

converged

¡@

Let us use the AIC criterion to select important effects:

> nmod <- step(cmmod)

Start:  AIC= 1383.52

 cbind(An, Sp, Other) ~ Water + Work

 

trying - Water

# weights:  9 (4 variable)

initial  value 762.436928

iter  10 value 686.562074

final  value 686.562063

converged

trying - Work

# weights:  9 (4 variable)

initial  value 762.436928

final  value 686.580556

converged

        Df      AIC

- Water  4 1381.124

- Work   4 1381.161

<none>   6 1383.524

# weights:  9 (4 variable)

initial  value 762.436928

iter  10 value 686.562074

final  value 686.562063

converged

 

Step:  AIC= 1381.12

 cbind(An, Sp, Other) ~ Work

 

trying - Work

# weights:  6 (2 variable)

initial  value 762.436928

final  value 687.227416

converged

       Df      AIC

- Work  2 1378.455

<none>  4 1381.124

# weights:  6 (2 variable)

initial  value 762.436928

final  value 687.227416

converged

 

Step:  AIC= 1378.45

 cbind(An, Sp, Other) ~ 1

We see that:

¡@

So, we find that

¡@

¡@

¡@

¡@

¡@