Ordinal Variables (Reading: Faraway (2006), section 4.5)
Consider data from the 1996 American National Election Study (Rosenstone et al., 1997). Let us first read the data into R and take a look:
> nes96 <- read.table("nes96.txt")
> nes96
popul TVnews selfLR ClinLR DoleLR PID age educ income vote
1 0 7 extCon extLib Con strRep 36 HS $3Kminus Dole
2 190 1 sliLib sliLib sliCon weakDem 20 Coll $3Kminus Clinton
3 31 7 Lib Lib Con weakDem 24 BAdeg $3Kminus Clinton
...deleted...
944 18 7 Mod Lib Con indind 61 MAdeg $105Kplus Dole
Consider just the data on party affiliation (PID) and level of education (educ). By default, R sorts the levels in alphabetical order:
> levels(nes96$PID)
[1] "indDem" "indind" "indRep" "strDem" "strRep" "weakDem" "weakRep"
> levels(nes96$educ)
[1] "BAdeg" "CCdeg" "Coll" "HS" "HSdrop" "MAdeg" "MS"
Because both variables are ordinal in this example, we need to arrange the levels according to their natural order:
> nes96$PID <- factor(nes96$PID, levels=c("strDem", "weakDem", "indDem", "indind", "indRep", "weakRep", "strRep"))
> levels(nes96$PID)
[1] "strDem" "weakDem" "indDem" "indind" "indRep" "weakRep" "strRep"
> nes96$educ <- factor(nes96$educ, levels=c("MS", "HSdrop", "HS", "Coll", "CCdeg", "BAdeg", "MAdeg"))
> levels(nes96$educ)
[1] "MS" "HSdrop" "HS" "Coll" "CCdeg" "BAdeg" "MAdeg"
Now, the orders of the levels are correctly reflected.
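As an aside, R also offers ordered factors; a minimal toy sketch (not using the nes96 data) shows how they encode the ordering. Note that ordered=TRUE changes the default contrasts used by glm() to polynomial contrasts, which is why these notes simply reorder the levels of plain factors instead.

```r
# Toy example (not the nes96 data): an ordered factor remembers its ordering
x <- c("HS", "BAdeg", "MS", "HS")
f <- factor(x, levels = c("MS", "HS", "BAdeg"), ordered = TRUE)
levels(f)    # "MS" "HS" "BAdeg"
f[1] < f[2]  # TRUE: comparisons respect the specified order
```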
We can construct a two-way table for the two variables:
> xtabs( ~ PID + educ, nes96)
educ
PID MS HSdrop HS Coll CCdeg BAdeg MAdeg
strDem 5 19 59 38 17 40 22
weakDem 4 10 49 36 17 41 23
indDem 1 4 28 15 13 27 20
indind 0 3 12 9 3 6 4
indRep 2 7 23 16 8 22 16
weakRep 0 5 35 40 15 38 17
strRep 1 4 42 33 17 53 25
(Q: what information have you found from the contingency table?) But, we need to convert the table to an R data frame with one count per line to enable model fitting:
> (partyed <- as.data.frame.table(xtabs( ~ PID + educ, nes96)))
PID educ Freq
1 strDem MS 5
2 weakDem MS 4
3 indDem MS 1
...deleted...
49 strRep MAdeg 25
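The conversion step can be illustrated on a tiny made-up table: as.data.frame.table() turns an R table into a data frame with one row per cell and the count in a Freq column.

```r
# Toy illustration (not the nes96 data) of the table-to-data-frame step
d   <- data.frame(a = c("x", "x", "y"), b = c("u", "v", "u"))
tab <- xtabs(~ a + b, d)        # 2-by-2 contingency table
as.data.frame.table(tab)        # 4 rows: one per (a, b) cell, counts in Freq
```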
If we treat PID and educ as nominal variables and fit a nominal-by-nominal Poisson model containing only main effect terms:
> nomod <- glm(Freq ~ PID + educ, partyed, family= poisson)
> summary(nomod)
Call:
glm(formula = Freq ~ PID + educ, family = poisson, data = partyed)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0598 -0.6225 -0.1276 0.5956 2.1781
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.0131 0.2844 3.563 0.000367 ***
PIDweakDem -0.1054 0.1027 -1.026 0.305125
PIDindDem -0.6162 0.1194 -5.160 2.47e-07 ***
PIDindind -1.6874 0.1790 -9.429 < 2e-16 ***
PIDindRep -0.7550 0.1251 -6.038 1.56e-09 ***
PIDweakRep -0.2877 0.1080 -2.663 0.007735 **
PIDstrRep -0.1335 0.1035 -1.290 0.197038
educHSdrop 1.3863 0.3101 4.471 7.80e-06 ***
educHS 2.9485 0.2845 10.363 < 2e-16 ***
educColl 2.6662 0.2868 9.295 < 2e-16 ***
educCCdeg 1.9349 0.2967 6.521 6.98e-11 ***
educBAdeg 2.8600 0.2852 10.029 < 2e-16 ***
educMAdeg 2.2792 0.2912 7.827 4.99e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 626.798 on 48 degrees of freedom
Residual deviance: 40.743 on 36 degrees of freedom
AIC: 276.44
Number of Fisher Scoring iterations: 5
We see that the model has a deviance of 40.743 on 36 degrees of freedom, which gives a p-value:
> pchisq(deviance(nomod),df.residual(nomod),lower=F)
[1] 0.2696086
Therefore, the Poisson GLM with only main effect terms is a good fit and there is no evidence against independence.
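As a cross-check, the classical Pearson chi-squared test of independence can be applied directly to the contingency table (counts copied from the xtabs output above); it is asymptotically equivalent to the deviance test, so its p-value should be of similar size. (Expect a warning here, because some expected cell counts are small.)

```r
# Counts copied from the PID-by-educ table printed above
tab <- matrix(c(5, 19, 59, 38, 17, 40, 22,
                4, 10, 49, 36, 17, 41, 23,
                1,  4, 28, 15, 13, 27, 20,
                0,  3, 12,  9,  3,  6,  4,
                2,  7, 23, 16,  8, 22, 16,
                0,  5, 35, 40, 15, 38, 17,
                1,  4, 42, 33, 17, 53, 25),
              nrow = 7, byrow = TRUE)
chisq.test(tab)  # large p-value, in line with the deviance-based 0.27
```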
However, we can take advantage of the ordinal structure of both variables and define some scores. As there seems to be no strong reason to the contrary, we assign evenly spaced scores, i.e., 1 to 7, for the levels of both PID and educ:
> partyed$oPID <- unclass(partyed$PID)
> partyed$oeduc <- unclass(partyed$educ)
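The unclass() calls work because a factor stores its levels internally as integer codes 1, 2, ...; stripping the class exposes those codes, which here serve directly as evenly spaced scores. A toy illustration (not the nes96 data):

```r
# The integer codes follow the order of the levels, not the data order
f <- factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))
unclass(f)  # integer codes 1 3 2 (plus a levels attribute)
```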
Now, let us fit the linear-by-linear association model:
> ormod <- glm(Freq ~ PID + educ + I(oPID*oeduc), partyed, family= poisson)
> summary(ormod)
Call:
glm(formula = Freq ~ PID + educ + I(oPID * oeduc), family = poisson,
data = partyed)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.79607 -0.49952 -0.09936 0.58966 1.99271
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.245874 0.290541 4.288 1.80e-05 ***
PIDweakDem -0.231690 0.109657 -2.113 0.034613 *
PIDindDem -0.870935 0.142609 -6.107 1.01e-09 ***
PIDindind -2.072665 0.214863 -9.646 < 2e-16 ***
PIDindRep -1.272907 0.203997 -6.240 4.38e-10 ***
PIDweakRep -0.940284 0.231496 -4.062 4.87e-05 ***
PIDstrRep -0.922944 0.270247 -3.415 0.000637 ***
educHSdrop 1.288644 0.311257 4.140 3.47e-05 ***
educHS 2.749103 0.290065 9.478 < 2e-16 ***
educColl 2.360892 0.300152 7.866 3.67e-15 ***
educCCdeg 1.519473 0.321228 4.730 2.24e-06 ***
educBAdeg 2.330228 0.327217 7.121 1.07e-12 ***
educMAdeg 1.630806 0.353532 4.613 3.97e-06 ***
I(oPID * oeduc) 0.028745 0.009062 3.172 0.001513 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 626.798 on 48 degrees of freedom
Residual deviance: 30.568 on 35 degrees of freedom
AIC: 268.26
Number of Fisher Scoring iterations: 5
Notice that:
the model uses only one degree of freedom (i.e., one interaction effect) to characterize the association between party affiliation and education
if the linear-by-linear interaction effect (I(oPID * oeduc) in this case) is insignificant, the model reduces to the independence model
we can use the difference-in-deviance test to compare this model to the independence model:
> anova(nomod,ormod,test="Chi")
Analysis of Deviance Table
Model 1: Freq ~ PID + educ
Model 2: Freq ~ PID + educ + I(oPID * oeduc)
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 36 40.743
2 35 30.568 1 10.175 0.001
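The difference-in-deviance line in this table can be reproduced by hand from the two residual deviances reported earlier:

```r
# 40.743 and 30.568 are the residual deviances of nomod and ormod above;
# the difference is chi-squared on 1 degree of freedom
pchisq(40.743 - 30.568, df = 1, lower.tail = FALSE)  # ~0.0014, shown as 0.001 after rounding
```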
We see that:
the significance of the test suggests that there is some evidence of an association, which is a different conclusion from that obtained under the nominal-by-nominal approach
Q: what causes the difference? Note that the 36 interaction effects in the nominal-by-nominal approach together account for a deviance of 40.743. Under the ordinal-by-ordinal approach, this deviance can be split into two parts: a deviance of 10.175 on one degree of freedom for the linear-by-linear association effect (which is clearly significant) and a deviance of 30.568 for the remaining 35 interaction effects. When we only look at the overall deviance (i.e., 40.743), the few significant effects (such as I(oPID * oeduc) in this case) can be masked by the insignificance of the other interaction effects.
the use of the ordinal information gives us more power to detect an association
the estimate of the parameter for the linear-by-linear interaction (denoted by g) is 0.028745, with a p-value 0.001513.
the p-value here can also be used to test the significance of the association although, as a Wald test, it is less reliable than the difference-in-deviance test given above.
here is a qualitative conclusion for the estimate of g: Because the estimate is positive, it means that, given the way we have assigned the scores, a higher level of education is associated with a greater tendency toward the Republican end of the spectrum
here is a quantitative conclusion for the estimate of g: Note that the estimate of association parameter g can be interpreted in terms of log-odds. For our example, where the scores are spaced one apart, the fitted log-odds ratio is g (not sample log-odds ratios). To illustrate this point, consider the fitted values under this model:
> round(ormodfit <- xtabs(predict(ormod,type="response") ~ PID + educ, partyed),2)
educ
PID MS HSdrop HS Coll CCdeg BAdeg MAdeg
strDem 3.58 13.36 59.22 41.34 18.34 42.46 21.71
weakDem 2.92 11.22 51.20 36.78 16.80 40.02 21.06
indDem 1.59 6.27 29.45 21.78 10.23 25.09 13.59
indind 0.49 2.00 9.65 7.34 3.55 8.96 5.00
indRep 1.12 4.71 23.41 18.33 9.13 23.70 13.60
weakRep 1.61 6.95 35.59 28.68 14.69 39.28 23.19
strRep 1.69 7.49 39.48 32.74 17.26 47.49 28.85
Now, compute the log-odds ratio for, say, the right bottom two-by-two table:
> log(ormodfit[6,6]*ormodfit[7,7]/(ormodfit[6,7]*ormodfit[7,6]))
[1] 0.02874461
We see this is, after rounding, equal to the estimate of g. (exercise: for other adjacent 2-by-2 tables, check whether they are still equal)
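The equality holds for every adjacent 2-by-2 table: it is an algebraic property of the linear-by-linear model, since the main effects cancel in any adjacent log-odds ratio. A self-contained check, with the value of g taken from the fit above but made-up (arbitrary) main effects:

```r
set.seed(2025)
g <- 0.028745                  # association parameter from the fit above
a <- rnorm(7); b <- rnorm(7)   # arbitrary made-up main effects
# Fitted values of a linear-by-linear model: mu_ij = exp(a_i + b_j + g*i*j)
mu <- exp(outer(a, b, "+") + g * outer(1:7, 1:7))
# Log-odds ratio of the adjacent 2-by-2 table with top-left corner (i, j)
lor <- function(i, j)
  log(mu[i, j] * mu[i + 1, j + 1] / (mu[i, j + 1] * mu[i + 1, j]))
lors <- outer(1:6, 1:6, Vectorize(lor))
range(lors)                    # all 36 values equal g up to rounding error
```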
it is always worth examining the residuals to check if there is more structure than the linear-by-linear association model suggests. We use the raw response residuals (the unscaled difference between observed and expected) because we would like to see effects which are large in an absolute sense:
> round(xtabs(residuals(ormod,type="response") ~ PID + educ, partyed),2)
educ
PID MS HSdrop HS Coll CCdeg BAdeg MAdeg
strDem 1.42 5.64 -0.22 -3.34 -1.34 -2.46 0.29
weakDem 1.08 -1.22 -2.20 -0.78 0.20 0.98 1.94
indDem -0.59 -2.27 -1.45 -6.78 2.77 1.91 6.41
indind -0.49 1.00 2.35 1.66 -0.55 -2.96 -1.00
indRep 0.88 2.29 -0.41 -2.33 -1.13 -1.70 2.40
weakRep -1.61 -1.95 -0.59 11.32 0.31 -1.28 -6.19
strRep -0.69 -3.49 2.52 0.26 -0.26 5.51 -3.85
We do see some indications of remaining structure, for example:
many more weak Republicans with some college than expected (with a positive residual 11.32)
fewer Republicans with master's degree or higher (with negative residuals -6.19 and -3.85)
The relationship between party affiliation and education may not be purely monotone. We will investigate this later by considering an ordinal-by-nominal model.
Just to check the robustness of the assignment of the scores, it is worth trying some different choices. Suppose that we choose scores for PID so that there is more of a distinction between Democrats and Independents, as well as between Independents and Republicans. Our assignment of scores for PID below achieves this:
> apid <- c(1,2,5,6,7,10,11)
Another idea for educ might be that:
people who complete high school or less are not different;
those who attend college but do not get a BA degree are not different;
those who get a BA or higher are not different.
Our assignment of scores for educ below achieves this:
> aedu <- c(1,1,1,2,2,3,3)
Let us use the new scores to fit a linear-by-linear association model:
> ormoda <- glm(Freq ~ PID + educ + I(apid[oPID]*aedu[oeduc]), partyed, family= poisson)
> summary(ormoda)
Call:
glm(formula = Freq ~ PID + educ + I(apid[oPID] * aedu[oeduc]),
family = poisson, data = partyed)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8889 -0.4680 -0.1993 0.5842 1.8295
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.12405 0.28568 3.935 8.33e-05 ***
PIDweakDem -0.16556 0.10438 -1.586 0.112707
PIDindDem -0.86101 0.14170 -6.076 1.23e-09 ***
PIDindind -1.99511 0.20329 -9.814 < 2e-16 ***
PIDindRep -1.12628 0.17126 -6.577 4.81e-11 ***
PIDweakRep -0.85357 0.21100 -4.045 5.22e-05 ***
PIDstrRep -0.76562 0.22831 -3.353 0.000798 ***
educHSdrop 1.38629 0.31009 4.471 7.80e-06 ***
educHS 2.94848 0.28453 10.363 < 2e-16 ***
educColl 2.49768 0.29143 8.570 < 2e-16 ***
educCCdeg 1.76638 0.30116 5.865 4.48e-09 ***
educBAdeg 2.50821 0.30662 8.180 2.83e-16 ***
educMAdeg 1.92745 0.31222 6.173 6.69e-10 ***
I(apid[oPID] * aedu[oeduc]) 0.03084 0.00989 3.118 0.001820 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 626.798 on 48 degrees of freedom
Residual deviance: 30.929 on 35 degrees of freedom
AIC: 268.63
Number of Fisher Scoring iterations: 5
Notice that:
the numerical outcomes are slightly different from those obtained under the evenly-spaced scores
but, the qualitative conclusions are quite similar, for example,
the difference-in-deviance test is still significant:
> anova(nomod,ormoda,test="Chi")
Analysis of Deviance Table
Model 1: Freq ~ PID + educ
Model 2: Freq ~ PID + educ + I(apid[oPID] * aedu[oeduc])
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 36 40.743
2 35 30.929 1 9.814 0.002
and, the sign of the linear-by-linear association term is still positive
some experimentation with other plausible choices of scores (exercise: what other scores would you want to try? check whether the qualitative conclusions are still similar under your scores) indicates that we can be fairly confident about the association here
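One useful fact when experimenting with scores: replacing a score vector s by c1 + c2*s (an affine transformation) leaves the fit unchanged and only rescales the association parameter, because the main-effect terms absorb the shift. So only genuinely different spacings, such as apid and aedu above, are worth trying. A sketch of this invariance on synthetic counts (not the nes96 data; the 4-by-4 layout, the score choice, and the generating parameters here are all made up):

```r
set.seed(1)
# Synthetic 4-by-4 table with a mild linear-by-linear association
d <- expand.grid(ri = factor(1:4), ci = factor(1:4))
d$y <- rpois(16, exp(1 + 0.1 * as.numeric(d$ri) * as.numeric(d$ci)))
# Scores 1:4 versus the affine-transformed scores 2*(1:4) - 1
m1 <- glm(y ~ ri + ci + I(as.numeric(ri) * as.numeric(ci)),
          family = poisson, data = d)
m2 <- glm(y ~ ri + ci + I((2 * as.numeric(ri) - 1) * as.numeric(ci)),
          family = poisson, data = d)
deviance(m1) - deviance(m2)  # essentially 0: identical fit
```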
To investigate the remaining structure not explained by the linear-by-linear association term (as mentioned above), we might consider an ordinal-by-nominal model (i.e. column-effect model) where for the interaction terms we now treat educ as a nominal variable. The model is more complex than the linear-by-linear association model because the former assigns more interaction effects to characterize the association than the latter:
> cmod <- glm(Freq ~ PID + educ + educ:oPID, partyed, family= poisson)
> summary(cmod)
Call:
glm(formula = Freq ~ PID + educ + educ:oPID, family = poisson,
data = partyed)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.40202 -0.47936 -0.08035 0.32315 1.48657
Coefficients: (1 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.945679 0.471322 4.128 3.66e-05 ***
PIDweakDem -0.075007 0.109326 -0.686 0.492661
PIDindDem -0.560399 0.140521 -3.988 6.66e-05 ***
PIDindind -1.610437 0.210250 -7.660 1.86e-14 ***
PIDindRep -0.660584 0.192522 -3.431 0.000601 ***
PIDweakRep -0.178992 0.211862 -0.845 0.398193
PIDstrRep -0.013421 0.241404 -0.056 0.955663
educHSdrop 1.061194 0.527977 2.010 0.044439 *
educHS 2.161235 0.484489 4.461 8.16e-06 ***
educColl 1.650803 0.491805 3.357 0.000789 ***
educCCdeg 0.971275 0.513874 1.890 0.058744 .
educBAdeg 1.722897 0.489151 3.522 0.000428 ***
educMAdeg 1.281529 0.501813 2.554 0.010655 *
educMS:oPID -0.312217 0.154051 -2.027 0.042692 *
educHSdrop:oPID -0.194451 0.077228 -2.518 0.011806 *
educHS:oPID -0.055347 0.048196 -1.148 0.250810
educColl:oPID 0.004460 0.050603 0.088 0.929760
educCCdeg:oPID -0.008699 0.060667 -0.143 0.885978
educBAdeg:oPID 0.034554 0.048782 0.708 0.478740
educMAdeg:oPID NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 626.798 on 48 degrees of freedom
Residual deviance: 22.761 on 30 degrees of freedom
AIC: 270.46
Number of Fisher Scoring iterations: 5
Notice that:
the last coefficient, educMAdeg:oPID, is not identifiable because the interaction terms are over-parameterized (educ has 7 levels, so 6 interaction effects are enough). In this circumstance, R automatically removes some unidentifiable effects to make the model identifiable; in other words, the coefficients of these unidentifiable effects are set to zero. Because educMAdeg:oPID is removed, we can regard the level MAdeg as a reference level in the interpretation of the interaction terms. For example, the difference between the "slopes" of oPID within the two educational groups MS and MAdeg is about -0.312217.
now, examining the fitted coefficients, let us look at just the interaction terms, as the main effects are of no particular interest. We see that:
if the linear-by-linear association model using evenly-spaced scores were a good fit (i.e., if there was really a monotone trend in the effect of educational level on party affiliation), we would expect the observed column-effect coefficients to be roughly evenly spaced (i.e., these coefficients have a linear trend). However, we can see that they are not.
when we look at these coefficients, we observe that for high school and above, the coefficients are not significantly different from zero (i.e., for the five educational groups HS, Coll, CCdeg, BAdeg, and MAdeg, the "slopes" of oPID are quite similar), while for the lowest two categories there is some difference (i.e., the "slopes" of oPID in the two groups MS and HSdrop are quite different from those in the other five groups).
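To see this lack of a linear trend numerically, we can line up the interaction coefficients copied from the summary above (with educMAdeg:oPID fixed at zero, as the reference) and look at their successive differences:

```r
# "Slopes" of oPID by educational group, copied from the cmod summary;
# MAdeg is the reference level, so its coefficient is 0
slopes <- c(MS = -0.312217, HSdrop = -0.194451, HS = -0.055347,
            Coll = 0.004460, CCdeg = -0.008699, BAdeg = 0.034554,
            MAdeg = 0)
round(diff(slopes), 3)  # far from constant: no evenly spaced (linear) trend
```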
we can compare this ordinal-by-nominal model to the independence model (nomod):
> anova(nomod,cmod,test="Chi")
Analysis of Deviance Table
Model 1: Freq ~ PID + educ
Model 2: Freq ~ PID + educ + educ:oPID
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 36 40.743
2 30 22.761 6 17.982 0.006
We find that the column-effects model is preferred.
if we compare the column-effects model to the linear-by-linear association model using evenly-spaced scores (ormod)
> anova(ormod,cmod,test="Chi")
Analysis of Deviance Table
Model 1: Freq ~ PID + educ + I(oPID * oeduc)
Model 2: Freq ~ PID + educ + educ:oPID
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 35 30.5683
2 30 22.7615 5 7.8068 0.1672
We see that the simpler linear-by-linear association model is preferred to the more complex column-effects model.
The above finding suggests an alternative assignment of scores for educ (Q: how to interpret the scores?):
> aedu <- c(1,1,2,2,2,2,2)
> ormodb <- glm(Freq ~ PID + educ + I(oPID*aedu[oeduc]), partyed, family= poisson)
> summary(ormodb)
Call:
glm(formula = Freq ~ PID + educ + I(oPID * aedu[oeduc]), family = poisson,
data = partyed)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.57015 -0.51363 -0.07621 0.40611 1.62183
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.26072 0.28643 4.402 1.07e-05 ***
PIDweakDem -0.50311 0.15572 -3.231 0.001234 **
PIDindDem -1.41529 0.26481 -5.344 9.07e-08 ***
PIDindind -2.89086 0.39979 -7.231 4.79e-13 ***
PIDindRep -2.36537 0.49601 -4.769 1.85e-06 ***
PIDweakRep -2.30701 0.61308 -3.763 0.000168 ***
PIDstrRep -2.56356 0.73506 -3.488 0.000487 ***
educHSdrop 1.38629 0.31008 4.471 7.79e-06 ***
educHS 2.23864 0.33986 6.587 4.49e-11 ***
educColl 1.95632 0.34179 5.724 1.04e-08 ***
educCCdeg 1.22502 0.35012 3.499 0.000467 ***
educBAdeg 2.15016 0.34041 6.316 2.68e-10 ***
educMAdeg 1.56940 0.34546 4.543 5.55e-06 ***
I(oPID * aedu[oeduc]) 0.20925 0.06246 3.350 0.000807 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 626.798 on 48 degrees of freedom
Residual deviance: 28.451 on 35 degrees of freedom
AIC: 266.15
Number of Fisher Scoring iterations: 4
We see that:
the deviance of this model is even lower than that of the linear-by-linear association model using evenly-spaced scores:
> deviance(ormod)
[1] 30.56833
and this model also offers similar qualitative conclusions as the previous linear-by-linear association models.
the model gives credence to the view that whether a person finished high school or not is the determining factor in party affiliation. However, since we used what we found from the data itself to assign the scores and came up with this hypothesis, we would be tempting fate to then use the data again to test this hypothesis.