Matched Pairs (Reading: Faraway (2006), section 4.3)
In Stuart (1955), data on the vision of a sample of women are presented. The performance of the left and right eyes is graded into four categories. Let us read the data into R and take a look at it:
> eyegrade <- read.table("eyegrade.txt")
> eyegrade
y right left
1 1520 best best
2 266 best second
3 124 best third
4 66 best worst
5 234 second best
6 1512 second second
7 432 second third
8 78 second worst
9 117 third best
10 362 third second
11 1772 third third
12 205 third worst
13 36 worst best
14 82 worst second
15 179 worst third
16 492 worst worst
> (ct <- xtabs(y ~ right+left, eyegrade))
left
right best second third worst
best 1520 266 124 66
second 234 1512 432 78
third 117 362 1772 205
worst 36 82 179 492
In the table,
we see that the row and column variables have the same levels. This is because in matched pairs, the same measure is taken on two matched objects (here, the two eyes of each woman).
we observe larger numbers on the diagonal than off the diagonal, which indicates that many people do have symmetric vision (a quick check follows below)
we can also see that the entries adjacent to the diagonal are larger than those further away
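As a quick numerical check of the diagonal dominance (a sketch, not part of the original transcript), we can compute the proportion of the total count lying on the diagonal:
> sum(diag(ct))/sum(ct)   # (1520+1512+1772+492)/7477, about 0.71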
Let us check for independence:
> summary(ct)
Call: xtabs(formula = y ~ right + left, data = eyegrade)
Number of cases in table: 7477
Number of factors: 2
Test for independence of all factors:
Chisq = 8097, df = 9, p-value = 0
We are not surprised to find strong evidence of dependence in matched-pair data.
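The same Pearson statistic can also be obtained directly from the table with chisq.test (a sketch; it should reproduce the Chisq = 8097 on 9 degrees of freedom reported above):
> chisq.test(ct)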
Let us check whether the matched-pair data have the symmetry property. We can fit the symmetry model by first defining a factor whose levels represent the symmetric pairs of cells. There is only one observation for each level down the diagonal, while there are two observations for each level off the diagonal:
> (symfac <- factor(apply(eyegrade[,2:3],1,function(x) paste(sort(x),collapse="-"))))
[1] best-best best-second best-third best-worst
[5] best-second second-second second-third second-worst
[9] best-third second-third third-third third-worst
[13] best-worst second-worst third-worst worst-worst
10 Levels: best-best best-second best-third ... worst-worst
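Tabulating the new factor confirms the pairing (a quick sketch, not in the original transcript); each diagonal level appears once and each off-diagonal level twice:
> table(symfac)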
We can see that the new factor has I(I+1)/2=4(4+1)/2=10 levels. We now fit a model for the new factor:
> mods <- glm(y ~ symfac, eyegrade, family=poisson)
> summary(mods)
Call:
glm(formula = y ~ symfac, family = poisson, data = eyegrade)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.219e+00 -4.776e-01 -1.475e-07 4.700e-01 2.008e+00
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.326466 0.025649 285.638 < 2e-16 ***
symfacbest-second -1.805005 0.051555 -35.011 < 2e-16 ***
symfacbest-third -2.534816 0.069334 -36.559 < 2e-16 ***
symfacbest-worst -3.394640 0.102283 -33.189 < 2e-16 ***
symfacsecond-second -0.005277 0.036322 -0.145 0.884
symfacsecond-third -1.342529 0.043787 -30.660 < 2e-16 ***
symfacsecond-worst -2.944439 0.083114 -35.427 < 2e-16 ***
symfacthird-third 0.153399 0.034960 4.388 1.15e-05 ***
symfacthird-worst -2.068970 0.057114 -36.225 < 2e-16 ***
symfacworst-worst -1.127987 0.051869 -21.747 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 8692.334 on 15 degrees of freedom
Residual deviance: 19.249 on 6 degrees of freedom
AIC: 156.63
Number of Fisher Scoring iterations: 4
> pchisq(deviance(mods),df.residual(mods),lower=F)
[1] 0.003762852
Here, we see evidence of a lack of symmetry because a deviance of 19.249 on 6 degrees of freedom is too large. It is worth checking the residuals:
> round(xtabs(residuals(mods) ~ right+left, eyegrade),3)
left
right best second third worst
best 0.000 1.001 0.317 2.008
second -1.023 0.000 1.732 -0.225
third -0.320 -1.783 0.000 0.928
worst -2.219 0.223 -0.949 0.000
We see that:
the residuals down the diagonal are zero because there is only one observation for each level
the matrix of absolute residuals is very close to a symmetric matrix (Q: can you see why this happens? Hint: there are two observations for each level off the diagonal; see also the sketch after this list)
the residuals above the diagonal are mostly positive, while those below the diagonal are mostly negative
so, there are generally more poor-left, good-right eye combinations than good-left, poor-right combinations
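To see why the absolute residuals are nearly symmetric (a sketch reusing the mods object fitted above): under the symmetry model each off-diagonal pair of cells shares the fitted value (y_ij + y_ji)/2, so the Pearson residuals of a pair are exact negatives of each other, while the deviance residuals printed above agree only approximately in magnitude:
> presid <- xtabs(residuals(mods, type="pearson") ~ right+left, eyegrade)
> round(presid + t(presid), 3)   # essentially a matrix of zeros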
Furthermore, let us compute the marginal totals:
> margin.table(ct,1)
right
best second third worst
1976 2256 2456 789
> margin.table(ct,2)
left
best second third worst
1907 2222 2507 841
We see that:
there are somewhat more poor left eyes and good right eyes (a quick check follows after this list)
so, perhaps marginal homogeneity does not hold here
notice that symmetry implies marginal homogeneity: if p_ij = p_ji for all i and j, then each row total equals the corresponding column total
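The imbalance in the margins can be seen by differencing them (a quick sketch based on the totals shown above):
> margin.table(ct,1) - margin.table(ct,2)   # best 69, second 34, third -51, worst -52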
Let us examine whether the data have the quasi-symmetry property. To do so, we fit a model with main-effect terms for the row and column variables together with the new factor symfac:
> modq <- glm(y ~ right+left+symfac, eyegrade, family=poisson)
> summary(modq)
Call:
glm(formula = y ~ right + left + symfac, family = poisson, data = eyegrade)
Deviance Residuals:
1 2 3 4 5 6 7 8
0.000e+00 1.612e-01 -8.394e-01 8.894e-01 -1.706e-01 -1.460e-07 6.325e-01 -1.128e+00
9 10 11 12 13 14 15 16
9.114e-01 -6.760e-01 -3.501e-07 2.409e-01 -1.093e+00 1.200e+00 -2.548e-01 -1.490e-07
Coefficients: (3 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.32647 0.02565 285.638 < 2e-16 ***
rightsecond -2.43955 0.09055 -26.942 < 2e-16 ***
rightthird -1.61523 0.06955 -23.223 < 2e-16 ***
rightworst -0.72288 0.05641 -12.816 < 2e-16 ***
leftsecond -2.33241 0.09149 -25.493 < 2e-16 ***
leftthird -1.39721 0.07012 -19.927 < 2e-16 ***
leftworst -0.40510 0.05641 -7.182 6.88e-13 ***
symfacbest-second 0.57954 0.09462 6.125 9.07e-10 ***
symfacbest-third -1.03453 0.08633 -11.983 < 2e-16 ***
symfacbest-worst -2.84322 0.10266 -27.696 < 2e-16 ***
symfacsecond-second 4.76669 0.16668 28.598 < 2e-16 ***
symfacsecond-third 2.54814 0.11038 23.085 < 2e-16 ***
symfacsecond-worst NA NA NA NA
symfacthird-third 3.16584 0.11415 27.734 < 2e-16 ***
symfacthird-worst NA NA NA NA
symfacworst-worst NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 8692.3336 on 15 degrees of freedom
Residual deviance: 7.2708 on 3 degrees of freedom
AIC: 150.65
Number of Fisher Scoring iterations: 4
We can see that:
some effects got "NA" in the output. This happened because the model is over-parameterized, which makes XᵀX singular (X is the model matrix). When XᵀX is singular, R automatically drops some effects (they are aliased) to obtain a non-singular model matrix. (Q: the number of effects in the model, i.e., 16, equals the number of observations. Why is the model not saturated? A check related to this appears after the goodness-of-fit test below.)
let us check whether a deviance of 7.2708 on 3 degrees of freedom passes the goodness-of-fit test:
> pchisq(deviance(modq),df.residual(modq),lower=F)
[1] 0.06375054
We see that this model does fit, which suggests that the table has the property of quasi-symmetry.
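Note that only 13 of the 16 listed coefficients were actually estimated, which is consistent with the residual deviance being reported on 3 degrees of freedom. A quick check (a sketch):
> sum(!is.na(coef(modq)))                    # 13 estimable (non-aliased) parameters
> nrow(eyegrade) - sum(!is.na(coef(modq)))   # 16 cells - 13 parameters = 3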
Because marginal homogeneity together with quasi-symmetry implies symmetry, one can test for marginal homogeneity by comparing the symmetry and quasi-symmetry models with a difference-in-deviance test:
> anova(mods,modq,test="Chi")
Analysis of Deviance Table
Model 1: y ~ symfac
Model 2: y ~ right + left + symfac
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 6 19.2492
2 3 7.2708 3 11.9784 0.0075
We can see that:
the difference in deviances is 11.9784 on 3 degrees of freedom (reproduced by hand in the sketch after this list).
because the p-value is small, we find evidence of a lack of marginal homogeneity
note that the test is appropriate only when quasi-symmetry already holds
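The difference-in-deviance test can also be reproduced directly from the two fitted models (a quick sketch):
> (dd <- deviance(mods) - deviance(modq))                     # 19.2492 - 7.2708 = 11.9784
> pchisq(dd, df.residual(mods) - df.residual(modq), lower=F)  # about 0.0075, matching the table above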
When we examine the table here, we do see that the numbers on the diagonal are much larger than those off the diagonal. A table with this property will almost never pass the test for independence. We might instead ask whether there is independence within the off-diagonal cells, i.e., whether left and right eye performance are independent among those people whose vision is not symmetric. This is called the quasi-independence hypothesis, and we can test it by removing the data on the diagonal:
> modqi <- glm(y ~ right+left, eyegrade, family=poisson, subset=-c(1,6,11,16))
> summary(modqi)
Call:
glm(formula = y ~ right + left, family = poisson, data = eyegrade, subset = -c(1, 6, 11, 16))
Deviance Residuals:
2 3 4 5 7 8 9 10 12 13 14
4.7352 -5.2834 -0.1783 4.1642 0.5516 -6.2875 -3.5775 -1.4254 5.6736 -2.2159 -4.0338
15
4.8330
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.38841 0.07454 58.870 < 2e-16 ***
rightsecond 0.78176 0.06409 12.197 < 2e-16 ***
rightthird 0.68723 0.06470 10.622 < 2e-16 ***
rightworst -0.45698 0.07552 -6.051 1.44e-09 ***
leftsecond 0.88999 0.06789 13.110 < 2e-16 ***
leftthird 0.87160 0.06689 13.030 < 2e-16 ***
leftworst -0.17689 0.07481 -2.364 0.0181 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 900.99 on 11 degrees of freedom
Residual deviance: 199.11 on 5 degrees of freedom
AIC: 294.81
Number of Fisher Scoring iterations: 4
Notice that:
there are only 12 observations because the 4 observations on the diagonal have been removed
the deviance of 199.11 on 5 degrees of freedom is clearly too large, so the model does not fit and the table is not quasi-independent (an equivalent fit using all 16 cells is sketched after this list)
this is not surprising because in the table the entries adjacent to the diagonal are larger than those further away
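An equivalent way to fit quasi-independence (a sketch, not taken from the text) is to keep all 16 cells and give each diagonal cell its own level of an extra factor; the diagonal cells are then fitted exactly, so the residual deviance should match the 199.11 on 5 degrees of freedom obtained above:
> eyegrade$diagfac <- factor(ifelse(eyegrade$right == eyegrade$left,
+                                   as.character(eyegrade$right), "offdiag"))
> modqi2 <- glm(y ~ right + left + diagfac, eyegrade, family=poisson)
> c(deviance(modqi2), df.residual(modqi2))   # should give 199.11 and 5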