Correspondence Analysis (Reading: Faraway (2006), section 4.2)


When the test of independence for a two-way contingency table rejects the null hypothesis, we know that the two variables are dependent. It is then useful to know where the dependence comes from, that is, how the categories of the two variables are associated. To study this, we can use a kind of residual analysis for two-way contingency tables called correspondence analysis. It performs a singular value decomposition (SVD) on the table of Pearson residuals obtained from the model of independence. Let us demonstrate by using the hair and eye color data discussed in the previous lab:

> haireye <- read.table("haireye.txt")
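If haireye.txt is not at hand, a data frame of the same shape can be built from R's built-in HairEyeColor table. This is a sketch under an assumption that is not part of the lab: that haireye.txt holds the same counts as HairEyeColor collapsed over Sex, with columns named y, hair and eye. Running chisq.test() on the collapsed table also checks the premise of this section, namely that independence is rejected.

```r
# Assumption (not part of the lab): haireye.txt holds the same counts as
# base R's HairEyeColor table collapsed over Sex.
tab <- margin.table(HairEyeColor, c(1, 2))   # 4 x 4 table: Hair x Eye
haireye <- as.data.frame(tab, responseName = "y")
names(haireye) <- c("hair", "eye", "y")      # column names used in this lab
print(head(haireye))

# Pearson chi-squared test of independence on the two-way table
ct <- chisq.test(tab)
print(ct)
```

If the p-value is small, the two factors are dependent and correspondence analysis is worth doing.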

Now, let us fit a Poisson GLM containing only main effect terms (corresponding to independence) and extract its Pearson residuals, arranged as a hair-by-eye table:

> modc <- glm(y ~ hair + eye, family=poisson, haireye)
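Because modc contains only main effects, its fitted counts are the usual independence estimates, (row total x column total) / grand total, and the Pearson residual of each cell is (observed - fitted) / sqrt(fitted). The sketch below computes these residuals by hand and checks them against residuals(modc, type = "pearson") arranged by xtabs(). It again assumes, as a stand-in for haireye.txt, the HairEyeColor table collapsed over Sex.

```r
# Assumption: HairEyeColor collapsed over Sex stands in for haireye.txt
tab <- margin.table(HairEyeColor, c(1, 2))
haireye <- as.data.frame(tab, responseName = "y")
names(haireye) <- c("hair", "eye", "y")

modc <- glm(y ~ hair + eye, family = poisson, haireye)

# Under independence the fitted count is (row total * column total) / n
n  <- sum(tab)
mu <- outer(rowSums(tab), colSums(tab)) / n

# Pearson residual: (observed - fitted) / sqrt(fitted)
r_hand <- (tab - mu) / sqrt(mu)

# The GLM's Pearson residuals arranged as a hair x eye table
r_glm <- xtabs(residuals(modc, type = "pearson") ~ hair + eye, haireye)

# Agreement up to the convergence tolerance of the IRLS fit
print(all.equal(as.vector(r_hand), as.vector(r_glm), tolerance = 1e-6))
```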

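> z <- xtabs(residuals(modc,type="pearson")~hair+eye,haireye)  # Pearson residuals as a hair x eye table
> svdz <- svd(z,2,2)  # SVD of z, keeping the first two left (u) and right (v) singular vectors
> leftsv <- svdz$u %*% diag(sqrt(svdz$d[1:2]))  # 2-D coordinates of the hair colors (rows)
> rightsv <- svdz$v %*% diag(sqrt(svdz$d[1:2]))  # 2-D coordinates of the eye colors (columns)
> ll <- 1.1*max(abs(rightsv), abs(leftsv))  # common axis limit, with a 10% margin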
# Because distances between points in the correspondence plot are of interest, the plot must be scaled so that visual distances are proportionately correct. The next command therefore sets asp=1 (equal units on both axes) and uses the same range, from -ll to ll, for both the x-axis and the y-axis.
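> plot(rbind(leftsv,rightsv),asp=1,xlim=c(-ll,ll),ylim=c(-ll,ll),xlab="SV1",ylab="SV2",type="n")  # empty frame; points are drawn as labels below
> abline(h=0,v=0)  # axes through the origin
> text(leftsv,dimnames(z)[[1]])  # hair color labels
> text(rightsv,dimnames(z)[[2]])  # eye color labels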
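Two properties of the SVD explain why this plot is informative. First, the squared singular values add up to Pearson's X^2 statistic (the sum of the squared residuals), so (d[1]^2 + d[2]^2) / X^2 is the share of the departure from independence that the two-dimensional plot displays. Second, leftsv %*% t(rightsv) is the rank-2 truncated SVD of z, so the inner product of a hair point and an eye point approximates their Pearson residual: a pair lying in the same direction from the origin tends to occur together more often than independence predicts. The sketch below checks both claims numerically, again assuming HairEyeColor collapsed over Sex in place of haireye.txt.

```r
# Assumption: HairEyeColor collapsed over Sex stands in for haireye.txt
tab <- margin.table(HairEyeColor, c(1, 2))
haireye <- as.data.frame(tab, responseName = "y")
names(haireye) <- c("hair", "eye", "y")

# Same steps as in the lab
modc <- glm(y ~ hair + eye, family = poisson, haireye)
z <- xtabs(residuals(modc, type = "pearson") ~ hair + eye, haireye)
svdz <- svd(z, 2, 2)
leftsv  <- svdz$u %*% diag(sqrt(svdz$d[1:2]))
rightsv <- svdz$v %*% diag(sqrt(svdz$d[1:2]))

# Property 1: squared singular values sum to Pearson's X^2 = sum(z^2)
X2 <- sum(z^2)
print(c(X2 = X2, sum_d2 = sum(svdz$d^2)))
print(round(svdz$d^2 / X2, 3))        # share of X^2 per dimension
print(sum(svdz$d[1:2]^2) / X2)        # share shown in the 2-D plot

# Property 2: row-point / column-point inner products give the rank-2
# approximation of z; its squared error is the discarded d[3]^2 + d[4]^2
approx2 <- leftsv %*% t(rightsv)
print(round(unclass(z) - approx2, 2)) # what the 2-D plot leaves out
print(c(err = sum((unclass(z) - approx2)^2),
        discarded = sum(svdz$d[3:4]^2)))
```

A large share for the first two dimensions means the plot can be read with little loss.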

In the plot, we should pay particular attention to: