Matched Pairs (Reading: Faraway (2006), section 4.3)


In Stuart (1955), data on the vision of a sample of women are presented. The performance of the left and right eyes is graded into four categories. Let us read the data into R and take a look at it:
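The data can be loaded as follows (a sketch, assuming the eyegrade data frame ships with the faraway package that accompanies the text):

```r
## Load the eyegrade data (assumed to be supplied by the faraway package)
library(faraway)
data(eyegrade)
eyegrade   # print the full data frame
```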

      y  right   left
1  1520   best   best
2   266   best second
3   124   best  third
4    66   best  worst
5   234 second   best
6  1512 second second
7   432 second  third
8    78 second  worst
9   117  third   best
10  362  third second
11 1772  third  third
12  205  third  worst
13   36  worst   best
14   82  worst second
15  179  worst  third
16  492  worst  worst

> (ct <- xtabs(y ~ right+left, eyegrade))
        left
right    best second third worst
  best   1520    266   124    66
  second  234   1512   432    78
  third   117    362  1772   205
  worst    36     82   179   492

In the table,

• we see that the row and column variables have the same levels; this is because in matched pairs we observe the same measure on two matched objects.

• we observe larger numbers on the diagonal than off the diagonal, which indicates that many people have symmetric vision.

• we can also see that the entries adjacent to the diagonal are larger than those further away.


Let us check for independence:

> summary(ct)
Call: xtabs(formula = y ~ right + left, data = eyegrade)
Number of cases in table: 7477
Number of factors: 2
Test for independence of all factors:
        Chisq = 8097, df = 9, p-value = 0

We are not surprised to find strong evidence of dependence in matched-pair data.
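The printed p-value of 0 is not literally zero in theory; the upper tail of a chi-squared distribution on 9 degrees of freedom evaluated at 8097 is simply so small that it underflows double precision. A quick check:

```r
## Tail probability of the independence test statistic from the table
## above; the result underflows to 0 in double precision.
pchisq(8097, df = 9, lower.tail = FALSE)
```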


Let us check whether the matched-pair data has the symmetry property. We can fit such a model by first defining a factor where the levels represent the symmetric pairs for the off-diagonal elements. There is only one observation for each level down the diagonal while there are two observations for each level off the diagonal:
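One way to construct such a factor (a sketch; the column names right and left follow the data frame above) is to sort the two grades within each row and paste them together, so that, e.g., (second, best) and (best, second) both map to the same level best-second:

```r
## Build a symmetric-pair factor: sorting the two grades within each row
## makes the level label the same regardless of which eye has which grade.
library(faraway)               # assumed source of the eyegrade data
data(eyegrade)
eyegrade$symfac <- factor(apply(eyegrade[, c("right", "left")], 1,
                                function(x) paste(sort(x), collapse = "-")))
eyegrade$symfac
```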

 [1] best-best     best-second   best-third    best-worst
 [5] best-second   second-second second-third  second-worst
 [9] best-third    second-third  third-third   third-worst
[13] best-worst    second-worst  third-worst   worst-worst
10 Levels: best-best best-second best-third ... worst-worst

We can see that the new factor has I(I+1)/2 = 4(4+1)/2 = 10 levels, where I = 4 is the number of categories. We now fit a model using the new factor:

> mods <- glm(y ~ symfac, eyegrade, family=poisson)
> summary(mods)

Call:
glm(formula = y ~ symfac, family = poisson, data = eyegrade)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-2.219e+00  -4.776e-01  -1.475e-07   4.700e-01   2.008e+00

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)          7.326466   0.025649 285.638  < 2e-16 ***
symfacbest-second   -1.805005   0.051555 -35.011  < 2e-16 ***
symfacbest-third    -2.534816   0.069334 -36.559  < 2e-16 ***
symfacbest-worst    -3.394640   0.102283 -33.189  < 2e-16 ***
symfacsecond-second -0.005277   0.036322  -0.145    0.884
symfacsecond-third  -1.342529   0.043787 -30.660  < 2e-16 ***
symfacsecond-worst  -2.944439   0.083114 -35.427  < 2e-16 ***
symfacthird-third    0.153399   0.034960   4.388 1.15e-05 ***
symfacthird-worst   -2.068970   0.057114 -36.225  < 2e-16 ***
symfacworst-worst   -1.127987   0.051869 -21.747  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 8692.334  on 15  degrees of freedom
Residual deviance:   19.249  on  6  degrees of freedom
AIC: 156.63

Number of Fisher Scoring iterations: 4

> pchisq(deviance(mods),df.residual(mods),lower=F)
[1] 0.003762852

Here, we see evidence of a lack of symmetry: a deviance of 19.249 on 6 degrees of freedom is too large (p ≈ 0.0038). It is worth checking the residuals:
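The residual table below can be produced with a call such as the following (a sketch that rebuilds the symmetry fit from scratch; note that residuals() on a glm object returns deviance residuals by default):

```r
## Rebuild the symmetry fit and arrange its deviance residuals in a
## 4x4 table matching the right-by-left cross-classification.
library(faraway)               # assumed source of the eyegrade data
data(eyegrade)
eyegrade$symfac <- factor(apply(eyegrade[, c("right", "left")], 1,
                                function(x) paste(sort(x), collapse = "-")))
mods <- glm(y ~ symfac, eyegrade, family = poisson)
round(xtabs(residuals(mods) ~ right + left, eyegrade), 3)
```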

        left
right      best second  third  worst
  best    0.000  1.001  0.317  2.008
  second -1.023  0.000  1.732 -0.225
  third  -0.320 -1.783  0.000  0.928
  worst  -2.219  0.223 -0.949  0.000

We see that:

• the residuals down the diagonal are zero because there is only one observation for each level.

• the matrix of absolute residuals is very close to symmetric (Q: can you see why this happens? Hint: there are two observations for each level off the diagonal).

• the residuals above the diagonal are mostly positive, while those below are mostly negative.

• so there are generally more (poor left, good right) eye combinations than (good left, poor right) combinations.

• furthermore, let us compute the marginal totals:

> margin.table(ct,1)
right
  best second  third  worst
  1976   2256   2456    789
> margin.table(ct,2)
left
  best second  third  worst
  1907   2222   2507    841

We see that

• there are somewhat more poor left eyes and good right eyes

• so, perhaps marginal homogeneity does not hold here

• notice that symmetry implies marginal homogeneity


Let us examine whether the data have the quasi-symmetry property. To do this, we fit a model with main-effect terms for the row and column variables together with the new factor, symfac:

> modq <- glm(y ~ right+left+symfac, eyegrade, family=poisson)
> summary(modq)

Call:
glm(formula = y ~ right + left + symfac, family = poisson, data = eyegrade)

Deviance Residuals:
          1            2            3            4            5            6            7            8
  0.000e+00    1.612e-01   -8.394e-01    8.894e-01   -1.706e-01   -1.460e-07    6.325e-01   -1.128e+00
          9           10           11           12           13           14           15           16
  9.114e-01   -6.760e-01   -3.501e-07    2.409e-01   -1.093e+00    1.200e+00   -2.548e-01   -1.490e-07

Coefficients: (3 not defined because of singularities)
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          7.32647    0.02565 285.638  < 2e-16 ***
rightsecond         -2.43955    0.09055 -26.942  < 2e-16 ***
rightthird          -1.61523    0.06955 -23.223  < 2e-16 ***
rightworst          -0.72288    0.05641 -12.816  < 2e-16 ***
leftsecond          -2.33241    0.09149 -25.493  < 2e-16 ***
leftthird           -1.39721    0.07012 -19.927  < 2e-16 ***
leftworst           -0.40510    0.05641  -7.182 6.88e-13 ***
symfacbest-second    0.57954    0.09462   6.125 9.07e-10 ***
symfacbest-third    -1.03453    0.08633 -11.983  < 2e-16 ***
symfacbest-worst    -2.84322    0.10266 -27.696  < 2e-16 ***
symfacsecond-second  4.76669    0.16668  28.598  < 2e-16 ***
symfacsecond-third   2.54814    0.11038  23.085  < 2e-16 ***
symfacsecond-worst        NA         NA      NA       NA
symfacthird-third    3.16584    0.11415  27.734  < 2e-16 ***
symfacthird-worst         NA         NA      NA       NA
symfacworst-worst         NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 8692.3336  on 15  degrees of freedom
Residual deviance:    7.2708  on  3  degrees of freedom
AIC: 150.65

Number of Fisher Scoring iterations: 4

We can see that:

• some effects are "NA" in the output. This happens because the model is over-parameterized, which makes XᵀX singular (X is the model matrix). When XᵀX is singular, R automatically drops some effects to obtain a non-singular model matrix. (Q: the number of effects in the model, i.e., 16, equals the number of observations. Why is the model not saturated?)

• let us check whether a deviance of 7.2708 on 3 degrees of freedom passes the goodness-of-fit test:

> pchisq(deviance(modq),df.residual(modq),lower=F)
[1] 0.06375054

We see that this model fits adequately, which suggests that the table has the quasi-symmetry property.


Because marginal homogeneity together with quasi-symmetry implies symmetry, one can test for marginal homogeneity by comparing the symmetry and quasi-symmetry models using the difference-in-deviance test:

> anova(mods,modq,test="Chi")
Analysis of Deviance Table

Model 1: y ~ symfac
Model 2: y ~ right + left + symfac
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1         6    19.2492
2         3     7.2708  3  11.9784    0.0075

We can see that:

• the difference in deviances is 11.9784 on 3 degrees of freedom.

• because the p-value is small, we find evidence of a lack of marginal homogeneity

• note that the test is appropriate only when quasi-symmetry already holds
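The comparison above can be verified by hand: the drop in deviance between the two fits is referred to a chi-squared distribution on the difference in residual degrees of freedom (a sketch using the deviances printed above):

```r
## Difference-in-deviance test for marginal homogeneity, given
## quasi-symmetry: compare the symmetry fit (19.2492 on 6 df)
## with the quasi-symmetry fit (7.2708 on 3 df).
dev.diff <- 19.2492 - 7.2708   # = 11.9784
df.diff  <- 6 - 3              # = 3
pchisq(dev.diff, df.diff, lower.tail = FALSE)   # about 0.0075, as in the anova output
```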


When we examine the table, we see that the numbers on the diagonal are much larger than those off the diagonal. A table with this property can hardly pass the test for independence. We might instead ask whether there is independence within the off-diagonal cells, i.e., whether left- and right-eye performance are independent among those people whose vision is not symmetric. This is called the quasi-independence hypothesis, and we can test it by removing the data on the diagonal:

> modqi <- glm(y ~ right+left, eyegrade, family=poisson, subset=-c(1,6,11,16))
> summary(modqi)

Call:
glm(formula = y ~ right + left, family = poisson, data = eyegrade, subset = -c(1, 6, 11, 16))

Deviance Residuals:
      2        3        4        5        7        8        9       10       12       13       14       15
 4.7352  -5.2834  -0.1783   4.1642   0.5516  -6.2875  -3.5775  -1.4254   5.6736  -2.2159  -4.0338   4.8330

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.38841    0.07454  58.870  < 2e-16 ***
rightsecond  0.78176    0.06409  12.197  < 2e-16 ***
rightthird   0.68723    0.06470  10.622  < 2e-16 ***
rightworst  -0.45698    0.07552  -6.051 1.44e-09 ***
leftsecond   0.88999    0.06789  13.110  < 2e-16 ***
leftthird    0.87160    0.06689  13.030  < 2e-16 ***
leftworst   -0.17689    0.07481  -2.364   0.0181 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 900.99  on 11  degrees of freedom
Residual deviance: 199.11  on  5  degrees of freedom
AIC: 294.81

Number of Fisher Scoring iterations: 4

Notice that:

• there are only 12 observations because the 4 observations on the diagonal have been removed.

• the deviance of 199.11 on 5 degrees of freedom is clearly too large, so the model does not fit ⇒ the table is not quasi-independent.

• this is not surprising because, in the table, the entries adjacent to the diagonal are larger than those further away.

• the difference in vision between the two eyes is likely to be smaller than expected under independence.
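As with the earlier fits, the lack of fit can be confirmed with a formal goodness-of-fit computation (a sketch using the deviance printed above):

```r
## Goodness-of-fit test for the quasi-independence model:
## deviance 199.11 on 5 degrees of freedom.
pchisq(199.11, df = 5, lower.tail = FALSE)   # far below any conventional level
```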