Matched Case-Control Studies (Reading: Faraway (2006), section 2.12)

ˇ@

Le (1998) presented a data from a matched case-control study concerning the association between x-rays and childhood acute myeloid leukemia. The sets are matched on age, race, and county of residence. The subjects with the same ID value are in same matched set. We start by reading the data into R and take a look of it:

> amlxray <- read.table("amlxray.txt")
> amlxray

      ID disease Sex downs age Mray MupRay MlowRay Fray Cray CnRay

1   7004       1   F    no   0   no     no      no   no   no     1

2   7004       0   F    no   0   no     no      no   no   no     1

3   7006       1   M    no   6   no     no      no   no  yes     3

4   7006       0   M    no   6   no     no      no   no  yes     2

5   7009       1   F    no   8   no     no      no   no   no     1

6   7009       0   F    no   8   no     no      no   no   no     1

ˇKdeletedˇK

238 7211       0   F    no   6   no     no      no   no   no     1

Notice that:

ˇ@

    Let us first pay some attention to the data itself before fitting a model. In the first stage of data analysis, it is often required to perform some initial analyses to gain some basic information of data and evaluate the quality of data. The variable downs indicates whether the subject has Down's syndrome. Let us take a look of the data of the subjects with a Down's syndrome:
> amlxray[amlxray$downs=="yes",1:4]

      ID disease Sex downs

7   7010       1   M   yes

17  7018       1   F   yes

78  7066       1   F   yes

88  7077       1   M   yes

173 7146       1   F   yes

196 7176       1   F   yes

210 7189       1   F   yes

We find that:

> (ii <- which(amlxray$downs=="yes"))

[1]   7  17  78  88 173 196 210

> ramlxray <- amlxray[-c(ii,ii+1),]
ˇ@

For the other risk factors,

> ramlxray$CnRay <- as.ordered(ramlxray$CnRay)

> contrasts(ramlxray$CnRay)

          .L   .Q         .C

1 -0.6708204  0.5 -0.2236068

2 -0.2236068 -0.5  0.6708204

3  0.2236068 -0.5 -0.6708204

4  0.6708204  0.5  0.2236068

Notice that:

ˇ@

    Now, let us fit a model to the data. Since the conditional likelihood of a logit model (see lecture notes p.3-37) for matched case-control data takes the same form as that used for the proportional hazards model (PHM) in survival analysis, we can use software developed for PHM to fit the logit model. In R, the clogit function in the survival library can do the work. Note that the matched sets must be designated by the strata function:

> library(survival)
> cmod <- clogit(disease ~ Sex+Mray+Fray+CnRay+strata(ID),ramlxray)
> summary(cmod)

Call:

coxph(formula = Surv(rep(1, 224), disease) ~ Sex + Mray + Fray +

    CnRay + strata(ID), data = ramlxray, method = "exact")

 

  n= 224

          coef exp(coef) se(coef)      z      p

SexM     0.156      1.17    0.386  0.405 0.6900

Mrayyes  0.228      1.26    0.582  0.391 0.7000

Frayyes  0.693      2.00    0.351  1.974 0.0480

CnRay.L  1.941      6.96    0.621  3.127 0.0018

CnRay.Q -0.248      0.78    0.582 -0.426 0.6700

CnRay.C -0.580      0.56    0.591 -0.982 0.3300

 

        exp(coef) exp(-coef) lower .95 upper .95

SexM         1.17      0.855     0.549      2.49

Mrayyes      1.26      0.796     0.401      3.93

Frayyes      2.00      0.500     1.005      3.98

CnRay.L      6.96      0.144     2.063     23.51

CnRay.Q      0.78      1.281     0.249      2.44

CnRay.C      0.56      1.786     0.176      1.78

 

Rsquare= 0.089   (max possible= 0.499 )

Likelihood ratio test= 20.9  on 6 df,   p=0.00192

Wald test            = 14.5  on 6 df,   p=0.0246

Score (logrank) test = 18.6  on 6 df,   p=0.0049

Notice that:

ˇ@

    Let us recursively eliminate predictors/effects which are not significant to simplify the model (intermediate steps skipped). Since we have found only a significant linear effect for CnRay, we can convert levels 1-4 in CnRay to the numerical values 1-4, respectively, using the unclass function (note: the former and the latter 1-4 have different meanings). The final model is:
> cmodr <- clogit(disease ~ Fray+unclass(CnRay)+strata(ID),ramlxray)
> summary(cmodr)

Call:

coxph(formula = Surv(rep(1, 224), disease) ~ Fray + unclass(CnRay) + strata(ID), data = ramlxray, method = "exact")

 

  n= 224

                coef exp(coef) se(coef)    z       p

Frayyes        0.670      1.96    0.344 1.95 0.05100

unclass(CnRay) 0.814      2.26    0.237 3.44 0.00058

 

               exp(coef) exp(-coef) lower .95 upper .95

Frayyes             1.96      0.512     0.996      3.84

unclass(CnRay)      2.26      0.443     1.419      3.59

 

Rsquare= 0.084   (max possible= 0.499 )

Likelihood ratio test= 19.6  on 2 df,   p=5.69e-05

Wald test            = 14.1  on 2 df,   p=0.000859

Score (logrank) test = 17.6  on 2 df,   p=0.000154

Notice that:

ˇ@

    An incorrect analysis of this data ignores the matching structure and simply uses a binomial GLM:

> gmod <- glm(disease ~ Fray+unclass(CnRay),family=binomial,ramlxray)
> summary(gmod)

Call:

glm(formula = disease ~ Fray + unclass(CnRay), family = binomial, data = ramlxray)

 

Deviance Residuals:

   Min      1Q  Median      3Q     Max 

-1.950  -0.950  -0.950   1.204   1.423 

 

Coefficients:

               Estimate Std. Error z value Pr(>|z|)   

(Intercept)     -1.1623     0.3011  -3.861 0.000113 ***

Frayyes          0.5004     0.3078   1.626 0.104046   

unclass(CnRay)   0.6005     0.1774   3.385 0.000711 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

(Dispersion parameter for binomial family taken to be 1)

 

    Null deviance: 309.39  on 223  degrees of freedom

Residual deviance: 293.26  on 221  degrees of freedom

AIC: 299.26

 

Number of Fisher Scoring iterations: 4

We can see that the results are somewhat different.