Estimation Problem (Reading: Faraway (2006), section 2.8)

ˇ@

Margolese (1970) reported a data that recorded urinary androsterone (androgen) and etiocholanolone (estrogen) values from 26 healthy males. Among them, 15 men are homosexual (labeled as "g" in the orientation variable) and 11 heterosexual (labeled as "s"). We start by reading the data into R and drawing a plot of it:

> hormone <- read.table("hormone.txt")
> hormone

   androgen estrogen orientation

1       3.9      1.8           s

2       4.0      2.3           s

3       3.8      2.3           s

 ˇKdeletedˇK

26      1.3      4.0           g

> plot(estrogen ~ androgen,data=hormone,pch=as.character(orientation))


Notice that:

ˇ@

    We now fit a binomial GLM to see if the orientation can be predicted from the two hormone values:

> modl <- glm(orientation ~ estrogen + androgen, hormone, family=binomial)

Warning messages:

1: algorithm did not converge in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, 

2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, 

Notice that:

A look at the summary reveals further evidence:

> summary(modl)

Call:

glm(formula = orientation ~ estrogen + androgen, family = binomial, data = hormone)

 

Deviance Residuals:

       Min          1Q      Median          3Q         Max 

-2.759e-05  -2.107e-08  -2.107e-08   2.107e-08   3.380e-05 

 

Coefficients:

             Estimate Std. Error z value Pr(>|z|)

(Intercept)    -84.49  136095.03  -0.001        1

estrogen       -90.22   75910.98  -0.001        1

androgen       100.91   92755.62   0.001        1

 

(Dispersion parameter for binomial family taken to be 1)

 

    Null deviance: 3.5426e+01  on 25  degrees of freedom

Residual deviance: 2.3229e-09  on 23  degrees of freedom

AIC: 6

 

Number of Fisher Scoring iterations: 25

Notice that

Let us now perform deviance-based method to test the significance of individual parameters:

> drop1(modl,test="Chi")

Single term deletions

 

Model:

orientation ~ estrogen + androgen

         Df  Deviance    AIC    LRT   Pr(Chi)   

<none>      2.323e-09  6.000                    

estrogen  1    27.437 31.437 27.437 1.623e-07 ***

androgen  1    28.257 32.257 28.257 1.062e-07 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that:

ˇ@

    We now go back to the problem of unstable parameter estimation. Let us take a look at a figure, which computes the line corresponding to px=1/2, where px is predicted using the fitted model modl. The figure reveals the reason for the instability in parameter estimation:

> abline(-84.49/90.22,100.91/90.22)

Notice that:

ˇ@

    Let us try to estimate the parameters by using a bias-reduction method proposed by Firth (1993). The estimates have the advantage of always being finite:

> library(brlr)
> modb <- brlr(orientation ~ estrogen + androgen, hormone, family=binomial)
> summary(modb)

Call:

brlr(formula = orientation ~ estrogen + androgen, data = hormone, family = binomial)

 

Coefficients:

            Value   Std. Error t value

(Intercept) -3.6512  2.9110    -1.2543

estrogen    -3.5862  1.4989    -2.3925

androgen     4.0743  1.6212     2.5132

 

Deviance: 3.6993

Penalized deviance: 4.1837

Residual df: 23

Notice that:

    By using the fitted model in the brlr result, we add the line corresponding to px=1/2:

> abline(-3.6512/3.5862,4.0743/3.5862,lty=2)

Notice that: