Estimation Problem (Reading: Faraway (2006, 1st ed.), section 2.8)

¡@

Margolese (1970) reported a data that recorded urinary androsterone (androgen) and etiocholanolone (estrogen) values from 26 healthy males. Among them, 15 men are homosexual (labeled as "g" in the orientation variable) and 11 heterosexual (labeled as "s"). We start by reading the data into R and drawing a plot of it:

> hormone <- read.table("hormone.txt")
> hormone

   androgen estrogen orientation

1       3.9      1.8           s

2       4.0      2.3           s

3       3.8      2.3           s

 ¡Kdeleted¡K

26      1.3      4.0           g

> plot(estrogen ~ androgen,data=hormone,pch=as.character(orientation))


Notice that:

¡@

    We now fit a binomial GLM to see if the orientation can be predicted from the two hormone values:

> modl <- glm(orientation ~ estrogen + androgen, hormone, family=binomial)

Warning messages:

1: algorithm did not converge in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, 

2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, 

Notice that:

A look at the summary reveals further evidence:

> summary(modl)

Call:

glm(formula = orientation ~ estrogen + androgen, family = binomial, data = hormone)

 

Deviance Residuals:

       Min          1Q      Median          3Q         Max 

-2.759e-05  -2.107e-08  -2.107e-08   2.107e-08   3.380e-05 

 

Coefficients:

             Estimate Std. Error z value Pr(>|z|)

(Intercept)    -84.49  136095.03  -0.001        1

estrogen       -90.22   75910.98  -0.001        1

androgen       100.91   92755.62   0.001        1

 

(Dispersion parameter for binomial family taken to be 1)

 

    Null deviance: 3.5426e+01  on 25  degrees of freedom

Residual deviance: 2.3229e-09  on 23  degrees of freedom

AIC: 6

 

Number of Fisher Scoring iterations: 25

Notice that

Let us now perform deviance-based method to test the significance of individual parameters:

> drop1(modl,test="Chi")

Single term deletions

 

Model:

orientation ~ estrogen + androgen

         Df  Deviance    AIC    LRT   Pr(Chi)   

<none>      2.323e-09  6.000                    

estrogen  1    27.437 31.437 27.437 1.623e-07 ***

androgen  1    28.257 32.257 28.257 1.062e-07 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that:

¡@

    We now go back to the problem of unstable parameter estimation. Let us take a look at a figure, which computes the line corresponding to px=1/2, where px is predicted using the fitted model modl. The figure reveals the reason for the instability in parameter estimation:

> abline(-84.49/90.22,100.91/90.22)

Notice that:

¡@

    Let us try to estimate the parameters by using a bias-reduction method proposed by Firth (1993). The estimates have the advantage of always being finite:

> library(brglm)
> modb <- brglm(orientation ~ estrogen + androgen, hormone, family=binomial)
> summary(modb)

Call:
brglm(formula = orientation ~ estrogen + androgen, family = binomial, 
    data = hormone)


Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -3.6502     2.9104 -1.2542  0.20976  
estrogen     -3.5853     1.4985 -2.3926  0.01673 *
androgen      4.0732     1.6206  2.5134  0.01196 *
---
Signif. codes:  0 ¡¥***¡¦ 0.001 ¡¥**¡¦ 0.01 ¡¥*¡¦ 0.05 ¡¥.¡¦ 0.1 ¡¥ ¡¦ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 28.92309  on 25  degrees of freedom
Residual deviance:  3.70078  on 23  degrees of freedom
Penalized deviance: 4.18374 
AIC:  9.70078 

Notice that:

    By using the fitted model in the brglm result, we add the line corresponding to px=1/2:

> abline(-3.6502/3.5853,4.0732/3.5853,lty=2)

Notice that: