Choice of Link Function (Reading: Faraway (2006), section 2.7)
Bliss (1935) analyzed data on the number of insects dying at different levels of insecticide concentration. Let us first read the data into R and take a look at it:
> bliss <- read.table("bliss.txt")
> bliss
We now fit the binomial GLM to the data under each of three link functions: logit, probit, and complementary log-log.
> modl <- glm(cbind(dead,alive) ~ conc, family=binomial, data=bliss)   # logit link (the default)
> modp <- glm(cbind(dead,alive) ~ conc, family=binomial(link=probit), data=bliss)
> modc <- glm(cbind(dead,alive) ~ conc, family=binomial(link=cloglog), data=bliss)
We start with the fitted probabilities p_x under the logit link:
> fitted(modl)   # equivalently: predict(modl, type="response")
An alternative way to obtain these values is through the linear predictor η_x = β_0 + β_1 x, where x is the concentration:
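To check the definition, η_x can be computed by hand from the estimated coefficients and compared with what `glm()` stores. This is a minimal sketch that assumes the data match faraway's `bliss` data frame (as loaded by `data(bliss, package = "faraway")`); the five rows are typed in so the block runs on its own.

```r
# Bliss data, typed in from faraway's `bliss` data frame (assumed to match bliss.txt)
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
modl <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)

# Linear predictor eta_x = beta_0 + beta_1 * x at each observed concentration
beta <- coef(modl)
eta <- beta[1] + beta[2] * bliss$conc
print(eta)

# Same values as those stored in the model; unname() drops the row labels
all.equal(unname(eta), unname(modl$linear.predictors))
```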
The values of the linear predictor are stored in the fitted model object:
> modl$linear.predictors   # equivalently: predict(modl)
The fitted probabilities are then obtained by applying the inverse logit, p_x = exp(η_x)/(1 + exp(η_x)), to η_x:
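A short sketch of this step, assuming the same typed-in copy of faraway's `bliss` data. It writes the inverse logit out explicitly and also uses base R's `plogis()` (the logistic distribution function), which computes the same function as faraway's `ilogit()`; both should reproduce `fitted(modl)`.

```r
# Bliss data, typed in from faraway's `bliss` data frame (assumed to match bliss.txt)
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
modl <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)

eta <- modl$linear.predictors
# Inverse logit written out by hand
p_manual <- exp(eta) / (1 + exp(eta))
# Base R's logistic CDF: the same function as faraway's ilogit()
p <- plogis(eta)
cbind(p_manual, p, fitted = fitted(modl))
```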
Notice the need to distinguish between predictions on the scale of the response (i.e., p_x) and on the scale of the link (i.e., η_x).
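The same distinction applies when predicting at new concentrations, where `predict()` needs the `type` argument to choose the scale. In this sketch the two new concentrations are hypothetical values chosen for illustration, and the data are again assumed to match faraway's `bliss`.

```r
# Bliss data, typed in from faraway's `bliss` data frame (assumed to match bliss.txt)
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
modl <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)

# Hypothetical new concentrations: one inside, one outside the observed range [0, 4]
new <- data.frame(conc = c(2.5, 6))
eta_new <- predict(modl, newdata = new, type = "link")      # eta_x scale (the default)
p_new   <- predict(modl, newdata = new, type = "response")  # p_x scale
rbind(eta = eta_new, p = p_new)

# The two scales are connected by the inverse link
all.equal(unname(plogis(eta_new)), unname(p_new))
```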
Now, let us compare the logit, probit, and complementary log-log fits:
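One way to line the three fits up is to bind the fitted probabilities into a table next to the observed proportions. This is a sketch assuming the data match faraway's `bliss` data frame (typed in below); the column names are chosen here for readability.

```r
# Bliss data, typed in from faraway's `bliss` data frame (assumed to match bliss.txt)
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
modl <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)
modp <- glm(cbind(dead, alive) ~ conc, family = binomial(link = probit), data = bliss)
modc <- glm(cbind(dead, alive) ~ conc, family = binomial(link = cloglog), data = bliss)

# Observed proportions next to the fitted p_x under each link
fits <- cbind(conc     = bliss$conc,
              observed = bliss$dead / (bliss$dead + bliss$alive),
              logit    = fitted(modl),
              probit   = fitted(modp),
              cloglog  = fitted(modc))
round(fits, 3)
```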
These are not very different. Now look at the fits over a wider range, -2 < concentration < 8:
> library(faraway)   # provides ilogit()
> x <- seq(-2,8,0.2)
> pl <- ilogit(modl$coef[1]+modl$coef[2]*x)              # logit
> pp <- pnorm(modp$coef[1]+modp$coef[2]*x)               # probit
> pc <- 1-exp(-exp(modc$coef[1]+modc$coef[2]*x))         # complementary log-log
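The commands that draw the figure described next are not shown; one way to produce it is sketched below. It assumes the data match faraway's `bliss` data frame (typed in so the block runs alone), uses base R's `plogis()` in place of faraway's `ilogit()` (they compute the same inverse logit), and uses R line types 1, 3 and 2 for the solid, dotted and dashed lines.

```r
# Bliss data, typed in from faraway's `bliss` data frame (assumed to match bliss.txt)
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
modl <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)
modp <- glm(cbind(dead, alive) ~ conc, family = binomial(link = probit), data = bliss)
modc <- glm(cbind(dead, alive) ~ conc, family = binomial(link = cloglog), data = bliss)

x <- seq(-2, 8, 0.2)
b_l <- coef(modl); b_p <- coef(modp); b_c <- coef(modc)
pl <- plogis(b_l[1] + b_l[2] * x)           # logit (same as faraway's ilogit)
pp <- pnorm(b_p[1] + b_p[2] * x)            # probit
pc <- 1 - exp(-exp(b_c[1] + b_c[2] * x))    # complementary log-log

# lty = 1 solid (logit), 3 dotted (probit), 2 dashed (complementary log-log)
plot(x, pl, type = "l", lty = 1, ylim = c(0, 1),
     xlab = "Concentration", ylab = "Probability")
lines(x, pp, lty = 3)
lines(x, pc, lty = 2)
legend("topleft", legend = c("logit", "probit", "cloglog"), lty = c(1, 3, 2))
```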
In the figure, the logit fit is shown as a solid line, the probit fit as a dotted line, and the complementary log-log fit as a dashed line. We can see that:
- when 0.2 < p_x < 0.8, the three curves are close, both in their differences and in their ratios;
- when p_x is close to 0 or 1, the differences are still small, but the ratios differ substantially, as shown in the following two figures.
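This is problematic because the concentrations in the dataset lie only in the range [0, 4], so the data carry little information about the tails where the links disagree. It would therefore be difficult to distinguish between these link functions using these data, even though they give quite different predictions for probabilities near 0 or 1.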
In the first figure, the lower-tail ratio of probit to logit probabilities (i.e., p_{x,probit}/p_{x,logit}) is shown as a solid line, and the upper-tail ratio (i.e., (1 - p_{x,probit})/(1 - p_{x,logit})) as a dashed line.
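A sketch of how such a ratio plot can be drawn, assuming the data match faraway's `bliss` data frame (typed in below). The upper-tail ratio uses `lower.tail = FALSE` in `pnorm()` and `plogis()`, which returns 1 - p directly and avoids the loss of precision from subtracting a probability very close to 1 from 1.

```r
# Bliss data, typed in from faraway's `bliss` data frame (assumed to match bliss.txt)
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
modl <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)
modp <- glm(cbind(dead, alive) ~ conc, family = binomial(link = probit), data = bliss)

x <- seq(-2, 8, 0.2)
eta_l <- coef(modl)[1] + coef(modl)[2] * x   # logit linear predictor
eta_p <- coef(modp)[1] + coef(modp)[2] * x   # probit linear predictor

# Lower-tail ratio p_probit / p_logit
lower <- pnorm(eta_p) / plogis(eta_l)
# Upper-tail ratio (1 - p_probit) / (1 - p_logit), computed on the upper tail directly
upper <- pnorm(eta_p, lower.tail = FALSE) / plogis(eta_l, lower.tail = FALSE)

# Solid line: lower-tail ratio; dashed line: upper-tail ratio
matplot(x, cbind(lower, upper), type = "l", lty = c(1, 2), col = 1,
        xlab = "Concentration", ylab = "Probit / logit ratio")
```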
In the second figure, the lower-tail ratio of complementary log-log to logit probabilities (i.e., p_{x,cloglog}/p_{x,logit}) is shown as a solid line, and the upper-tail ratio (i.e., (1 - p_{x,cloglog})/(1 - p_{x,logit})) as a dashed line.
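The complementary log-log version can be sketched the same way, again assuming the data match faraway's `bliss` data frame. Since p_x = 1 - exp(-exp(η_x)) under this link, the upper tail 1 - p_x is simply exp(-exp(η_x)), and base R's `expm1()` gives an accurate value of p_x itself when it is tiny.

```r
# Bliss data, typed in from faraway's `bliss` data frame (assumed to match bliss.txt)
bliss <- data.frame(dead  = c(2, 8, 15, 23, 27),
                    alive = c(28, 22, 15, 7, 3),
                    conc  = 0:4)
modl <- glm(cbind(dead, alive) ~ conc, family = binomial, data = bliss)
modc <- glm(cbind(dead, alive) ~ conc, family = binomial(link = cloglog), data = bliss)

x <- seq(-2, 8, 0.2)
eta_l <- coef(modl)[1] + coef(modl)[2] * x   # logit linear predictor
eta_c <- coef(modc)[1] + coef(modc)[2] * x   # cloglog linear predictor

# Lower-tail ratio p_cloglog / p_logit, with p_cloglog = 1 - exp(-exp(eta)) = -expm1(-exp(eta))
lower <- -expm1(-exp(eta_c)) / plogis(eta_l)
# Upper-tail ratio (1 - p_cloglog) / (1 - p_logit) = exp(-exp(eta_c)) / (1 - p_logit)
upper <- exp(-exp(eta_c)) / plogis(eta_l, lower.tail = FALSE)

# Solid line: lower-tail ratio; dashed line: upper-tail ratio
matplot(x, cbind(lower, upper), type = "l", lty = c(1, 2), col = 1,
        xlab = "Concentration", ylab = "Complementary log-log / logit ratio")
```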