Ordinal Variables (Reading: Faraway (2006), section 4.5)

ˇ@

Consider a data from the 1996 American National Election Study (Rosenstone et al., 1997). Let us first read the data into R and take a look:

> nes96 <- read.table("nes96.txt")

> nes96

    popul TVnews selfLR ClinLR DoleLR     PID age  educ    income    vote

1       0      7 extCon extLib    Con  strRep  36    HS  $3Kminus    Dole

2     190      1 sliLib sliLib sliCon weakDem  20  Coll  $3Kminus Clinton

3      31      7    Lib    Lib    Con weakDem  24 BAdeg  $3Kminus Clinton

...deleted...

944    18      7    Mod    Lib    Con  indind  61 MAdeg $105Kplus    Dole

ˇ@

Consider just the data on party affiliation (PID) and level of education (educ). The default of R sorts the levels in alphabetical order:

> levels(nes96$PID)

[1] "indDem"  "indind"  "indRep"  "strDem"  "strRep"  "weakDem" "weakRep"

> levels(nes96$educ)

[1] "BAdeg"  "CCdeg"  "Coll"   "HS"     "HSdrop" "MAdeg"  "MS" 

Because both variables are ordinal in this example, we need to arrange the levels according to their natural order:

> nes96$PID <- factor(nes96$PID, levels=c("strDem", "weakDem", "indDem", "indind", "indRep", "weakRep", "strRep"))

> levels(nes96$PID)

[1] "strDem"  "weakDem" "indDem"  "indind"  "indRep"  "weakRep" "strRep"

> nes96$educ <- factor(nes96$educ, levels=c("MS", "HSdrop", "HS", "Coll", "CCdeg", "BAdeg", "MAdeg"))

> levels(nes96$educ)

[1] "MS"     "HSdrop" "HS"     "Coll"   "CCdeg"  "BAdeg"  "MAdeg"

Now, the orders of the levels are correctly reflected.

ˇ@

W can construct a two-way table for the two variables:

> xtabs( ~ PID + educ, nes96)

         educ

PID       MS HSdrop HS Coll CCdeg BAdeg MAdeg

  strDem   5     19 59   38    17    40    22

  weakDem  4     10 49   36    17    41    23

  indDem   1      4 28   15    13    27    20

  indind   0      3 12    9     3     6     4

  indRep   2      7 23   16     8    22    16

  weakRep  0      5 35   40    15    38    17

  strRep   1      4 42   33    17    53    25

(Q: what information have you found from the contingency table?) But, we need to convert the table to a dataframe of R with one count per line to enable model fitting:

> (partyed <- as.data.frame.table(xtabs( ~ PID + educ, nes96)))

       PID   educ Freq

1   strDem     MS    5

2  weakDem     MS    4

3   indDem     MS    1

ˇKdeletedˇK

49  strRep  MAdeg   25

ˇ@

If we treat PID and educ as nominal variables and fit a nominal-by-nominal Poisson model containing only main effect terms:

> nomod <- glm(Freq ~ PID + educ, partyed, family= poisson)

> summary(nomod)

Call:

glm(formula = Freq ~ PID + educ, family = poisson, data = partyed)

 

Deviance Residuals:

    Min       1Q   Median       3Q      Max 

-2.0598  -0.6225  -0.1276   0.5956   2.1781 

 

Coefficients:

            Estimate Std. Error z value Pr(>|z|)   

(Intercept)   1.0131     0.2844   3.563 0.000367 ***

PIDweakDem   -0.1054     0.1027  -1.026 0.305125   

PIDindDem    -0.6162     0.1194  -5.160 2.47e-07 ***

PIDindind    -1.6874     0.1790  -9.429  < 2e-16 ***

PIDindRep    -0.7550     0.1251  -6.038 1.56e-09 ***

PIDweakRep   -0.2877     0.1080  -2.663 0.007735 **

PIDstrRep    -0.1335     0.1035  -1.290 0.197038   

educHSdrop    1.3863     0.3101   4.471 7.80e-06 ***

educHS        2.9485     0.2845  10.363  < 2e-16 ***

educColl      2.6662     0.2868   9.295  < 2e-16 ***

educCCdeg     1.9349     0.2967   6.521 6.98e-11 ***

educBAdeg     2.8600     0.2852  10.029  < 2e-16 ***

educMAdeg     2.2792     0.2912   7.827 4.99e-15 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for poisson family taken to be 1)

 

    Null deviance: 626.798  on 48  degrees of freedom

Residual deviance:  40.743  on 36  degrees of freedom

AIC: 276.44

 

Number of Fisher Scoring iterations: 5

We see that the model has a deviance of 40.742 on 36 degrees of freedom, which gives a p-value:

> pchisq(deviance(nomod),df.residual(nomod),lower=F)

[1] 0.2696086

Therefore, the Poisson GLM with only main effect terms is a good fit and there is no evidence against independence.

ˇ@

However, we can take advantage of the ordinal structure of both variables and define some scores. As there seems to be no strong reason to the contrary, we assign evenly spaced scores, i.e., 1 to 7, for the levels of both PID and educ:

> partyed$oPID <- unclass(partyed$PID)
> partyed$oeduc <- unclass(partyed$educ)
Now, let us fit the linear-by-linear association model:

> ormod <- glm(Freq ~ PID + educ + I(oPID*oeduc), partyed, family= poisson)
> summary(ormod)

Call:

glm(formula = Freq ~ PID + educ + I(oPID * oeduc), family = poisson,

    data = partyed)

 

Deviance Residuals:

     Min        1Q    Median        3Q       Max 

-1.79607  -0.49952  -0.09936   0.58966   1.99271 

 

Coefficients:

                 Estimate Std. Error z value Pr(>|z|)   

(Intercept)      1.245874   0.290541   4.288 1.80e-05 ***

PIDweakDem      -0.231690   0.109657  -2.113 0.034613 * 

PIDindDem       -0.870935   0.142609  -6.107 1.01e-09 ***

PIDindind       -2.072665   0.214863  -9.646  < 2e-16 ***

PIDindRep       -1.272907   0.203997  -6.240 4.38e-10 ***

PIDweakRep      -0.940284   0.231496  -4.062 4.87e-05 ***

PIDstrRep       -0.922944   0.270247  -3.415 0.000637 ***

educHSdrop       1.288644   0.311257   4.140 3.47e-05 ***

educHS           2.749103   0.290065   9.478  < 2e-16 ***

educColl         2.360892   0.300152   7.866 3.67e-15 ***

educCCdeg        1.519473   0.321228   4.730 2.24e-06 ***

educBAdeg        2.330228   0.327217   7.121 1.07e-12 ***

educMAdeg        1.630806   0.353532   4.613 3.97e-06 ***

I(oPID * oeduc)  0.028745   0.009062   3.172 0.001513 **

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for poisson family taken to be 1)

 

    Null deviance: 626.798  on 48  degrees of freedom

Residual deviance:  30.568  on 35  degrees of freedom

AIC: 268.26

 

Number of Fisher Scoring iterations: 5

Notice that:

ˇ@

Just to check the robustness of the assignment of the scores, it is worth trying some different choices. Suppose that we choose scores for PID so that there is more of a distinction between Democrats and Independents as well as Independents and Republican. Our assignment of scores for PID below achieves this:

> apid <- c(1,2,5,6,7,10,11)
Another idea for educ might be that

Our assignment of scores for educ below achieves this:

> aedu <- c(1,1,1,2,2,3,3)

Let us use the new scores to fit a linear-by-linear association model:

> ormoda <- glm(Freq ~ PID + educ + I(apid[oPID]*aedu[oeduc]), partyed, family= poisson)
> summary(ormoda)

Call:

glm(formula = Freq ~ PID + educ + I(apid[oPID] * aedu[oeduc]),

    family = poisson, data = partyed)

 

Deviance Residuals:

    Min       1Q   Median       3Q      Max 

-1.8889  -0.4680  -0.1993   0.5842   1.8295 

 

Coefficients:

                            Estimate Std. Error z value Pr(>|z|)   

(Intercept)                  1.12405    0.28568   3.935 8.33e-05 ***

PIDweakDem                  -0.16556    0.10438  -1.586 0.112707   

PIDindDem                   -0.86101    0.14170  -6.076 1.23e-09 ***

PIDindind                   -1.99511    0.20329  -9.814  < 2e-16 ***

PIDindRep                   -1.12628    0.17126  -6.577 4.81e-11 ***

PIDweakRep                  -0.85357    0.21100  -4.045 5.22e-05 ***

PIDstrRep                   -0.76562    0.22831  -3.353 0.000798 ***

educHSdrop                   1.38629    0.31009   4.471 7.80e-06 ***

educHS                       2.94848    0.28453  10.363  < 2e-16 ***

educColl                     2.49768    0.29143   8.570  < 2e-16 ***

educCCdeg                    1.76638    0.30116   5.865 4.48e-09 ***

educBAdeg                    2.50821    0.30662   8.180 2.83e-16 ***

educMAdeg                    1.92745    0.31222   6.173 6.69e-10 ***

I(apid[oPID] * aedu[oeduc])  0.03084    0.00989   3.118 0.001820 **

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for poisson family taken to be 1)

 

    Null deviance: 626.798  on 48  degrees of freedom

Residual deviance:  30.929  on 35  degrees of freedom

AIC: 268.63

 

Number of Fisher Scoring iterations: 5

Notice that:

ˇ@

To investigate the remaining structure not explained by the linear-by-linear association term (as mentioned above), we might consider an ordinal-by-nominal model (i.e. column-effect model) where for the interaction terms we now treat educ as a nominal variable. The model is more complex than the linear-by-linear association model because the former assigns more interaction effects to characterize the association than the latter:

> cmod <- glm(Freq ~ PID + educ + educ:oPID, partyed, family= poisson)
> summary(cmod)

Call:

glm(formula = Freq ~ PID + educ + educ:oPID, family = poisson,

    data = partyed)

 

Deviance Residuals:

     Min        1Q    Median        3Q       Max 

-1.40202  -0.47936  -0.08035   0.32315   1.48657 

 

Coefficients: (1 not defined because of singularities)

                 Estimate Std. Error z value Pr(>|z|)   

(Intercept)      1.945679   0.471322   4.128 3.66e-05 ***

PIDweakDem      -0.075007   0.109326  -0.686 0.492661   

PIDindDem       -0.560399   0.140521  -3.988 6.66e-05 ***

PIDindind       -1.610437   0.210250  -7.660 1.86e-14 ***

PIDindRep       -0.660584   0.192522  -3.431 0.000601 ***

PIDweakRep      -0.178992   0.211862  -0.845 0.398193   

PIDstrRep       -0.013421   0.241404  -0.056 0.955663   

educHSdrop       1.061194   0.527977   2.010 0.044439 * 

educHS           2.161235   0.484489   4.461 8.16e-06 ***

educColl         1.650803   0.491805   3.357 0.000789 ***

educCCdeg        0.971275   0.513874   1.890 0.058744 . 

educBAdeg        1.722897   0.489151   3.522 0.000428 ***

educMAdeg        1.281529   0.501813   2.554 0.010655 * 

educMS:oPID     -0.312217   0.154051  -2.027 0.042692 * 

educHSdrop:oPID -0.194451   0.077228  -2.518 0.011806 * 

educHS:oPID     -0.055347   0.048196  -1.148 0.250810   

educColl:oPID    0.004460   0.050603   0.088 0.929760   

educCCdeg:oPID  -0.008699   0.060667  -0.143 0.885978   

educBAdeg:oPID   0.034554   0.048782   0.708 0.478740   

educMAdeg:oPID         NA         NA      NA       NA   

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for poisson family taken to be 1)

 

    Null deviance: 626.798  on 48  degrees of freedom

Residual deviance:  22.761  on 30  degrees of freedom

AIC: 270.46

 

Number of Fisher Scoring iterations: 5

Notice that:

The above finding suggests an alternative assignment of scores for educ (Q: how to interpret the scores?):

> aedu <- c(1,1,2,2,2,2,2)
> ormodb <- glm(Freq ~ PID + educ + I(oPID*aedu[oeduc]), partyed, family= poisson)
> summary(ormodb)

Call:

glm(formula = Freq ~ PID + educ + I(oPID * aedu[oeduc]), family = poisson,

    data = partyed)

 

Deviance Residuals:

     Min        1Q    Median        3Q       Max 

-1.57015  -0.51363  -0.07621   0.40611   1.62183 

 

Coefficients:

                      Estimate Std. Error z value Pr(>|z|)   

(Intercept)            1.26072    0.28643   4.402 1.07e-05 ***

PIDweakDem            -0.50311    0.15572  -3.231 0.001234 **

PIDindDem             -1.41529    0.26481  -5.344 9.07e-08 ***

PIDindind             -2.89086    0.39979  -7.231 4.79e-13 ***

PIDindRep             -2.36537    0.49601  -4.769 1.85e-06 ***

PIDweakRep            -2.30701    0.61308  -3.763 0.000168 ***

PIDstrRep             -2.56356    0.73506  -3.488 0.000487 ***

educHSdrop             1.38629    0.31008   4.471 7.79e-06 ***

educHS                 2.23864    0.33986   6.587 4.49e-11 ***

educColl               1.95632    0.34179   5.724 1.04e-08 ***

educCCdeg              1.22502    0.35012   3.499 0.000467 ***

educBAdeg              2.15016    0.34041   6.316 2.68e-10 ***

educMAdeg              1.56940    0.34546   4.543 5.55e-06 ***

I(oPID * aedu[oeduc])  0.20925    0.06246   3.350 0.000807 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for poisson family taken to be 1)

 

    Null deviance: 626.798  on 48  degrees of freedom

Residual deviance:  28.451  on 35  degrees of freedom

AIC: 266.15

 

Number of Fisher Scoring iterations: 4

We see that:

ˇ@