A user asked a reasonable question about score contributions summing to zero for a logistic-Normal model.
Here we analyze 500 clusters of 2 units each, generated from:
\[ \Pr(Y_{ij} = 1 | X, b) = \mbox{logit}^{-1}(\beta_0 + \beta_1 x_{ij} + b_i) = h(\beta, x_{ij}, b_i) \]
where \(X_{ij} \sim N(0, 1)\) and \(b_i \sim f(b_i) = N(0, \sigma^2)\). Let \(\theta = \{\beta_0, \beta_1, \sigma^2 \} = \{2, 2, 1\}\).
The loglihood for this model is:
\[ l(\theta; Y, X) = \sum_{i = 1}^m \log \left\{ \int \prod_{j = 1}^{n_i} h(\beta, x_{ij}, b_i)^{y_{ij}}[1 - h(\beta, x_{ij}, b_i)]^{1 - y_{ij}} f(b_i) \, db_i \right\} \]
Take the derivative of \(l(\theta; Y, X)\) with respect to \(\theta\), set it equal to zero, and solve for \(\theta\) to find \(\hat{\theta}\).
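Equivalently, \(\hat{\theta}\) solves the stacked score (estimating) equations, where cluster \(i\) contributes the gradient of its own term in the sum:

\[ \sum_{i = 1}^m \psi(Y_i, X_i; \hat{\theta}) = 0, \qquad \psi(Y_i, X_i; \theta) = \frac{\partial l_i(\theta; Y_i, X_i)}{\partial \theta} \]

where \(l_i\) denotes the \(i\)th summand of \(l(\theta; Y, X)\). At \(\hat{\theta}\), then, the per-cluster score contributions should sum to zero componentwise; that is the property examined below.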
set.seed(1)
#generate data; 500 pairs
n <- 500
m <- 2
beta <- c(2, 2)
id <- rep(1:n, each=m)
x <- rnorm(m*n)
b <- rep(rnorm(n), each=m)
y <- rbinom(m*n, 1, plogis(cbind(1, x) %*% beta + b))
d <- data.frame(y,x,id)
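As a quick sanity check on the simulated data (this chunk regenerates it from the same seed so it stands alone; the checks themselves are ours, not part of the original analysis):

```r
#regenerate the data from the same seed and inspect it
set.seed(1)
n <- 500
m <- 2
beta <- c(2, 2)
id <- rep(1:n, each = m)
x <- rnorm(m * n)
b <- rep(rnorm(n), each = m)
y <- rbinom(m * n, 1, plogis(cbind(1, x) %*% beta + b))
all(table(id) == 2)  # every cluster is a pair
mean(y)              # marginal success rate; well above 1/2 since beta_0 = 2
```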
\(\theta\) can be estimated with the lme4::glmer function.
#fit logistic-normal random effects model
fit <- lme4::glmer(y~x+(1|id), family=binomial, data=d)
print(summary(fit))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: y ~ x + (1 | id)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 800.1 814.8 -397.0 794.1 997
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.1700 0.0493 0.2134 0.4260 2.2126
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.5483 0.7404
## Number of obs: 1000, groups: id, 500
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8842 0.1707 11.04 <2e-16 ***
## x 1.7950 0.1704 10.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## x 0.760
Note that geex's grab_psiFUN does most of this work internally.
library(geex)

#wrap geex::grab_psiFUN to get one cluster's score as a function of theta
eefun <- function(data, lme4model){
  f <- grab_psiFUN(lme4model, data = data)
  function(theta) f(theta)
}
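To make concrete what eefun wraps: each cluster contributes a score vector, the gradient of its integrated loglihood term with respect to \(\theta\). Below is a rough, hand-rolled sketch using a central-difference gradient; the helper name cluster_score and the toy data are ours, not part of geex, and the final parameter is treated as the standard deviation of \(b_i\), matching lme4's parameterization.

```r
#hand-rolled per-cluster score: central-difference gradient of the
#cluster's integrated loglihood (cluster_score is a hypothetical helper)
cluster_score <- function(dt, theta, eps = 1e-4){
  cluster_loglik <- function(parms){
    X <- cbind(1, dt$x)
    log(integrate(function(b){
      sapply(b, function(bi){
        pr <- plogis(drop(X %*% parms[1:2]) + bi)
        prod(dbinom(dt$y, 1, pr)) * dnorm(bi, 0, parms[3])
      })
    }, lower = -Inf, upper = Inf, rel.tol = 1e-10)$value)
  }
  sapply(seq_along(theta), function(k){
    up <- dn <- theta
    up[k] <- up[k] + eps
    dn[k] <- dn[k] - eps
    (cluster_loglik(up) - cluster_loglik(dn)) / (2 * eps)
  })
}

#e.g. for a toy pair of observations:
cluster_score(data.frame(y = c(1, 0), x = c(0.5, -0.2)), theta = c(2, 2, 1))
```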
splitdt <- split(d, f = d$id)
theta_hat <- unlist(lme4::getME(fit, c('beta', 'theta')))
psi <- lapply(splitdt, function(dt){
  eefun(dt, fit)
})

# Evaluate psi functions
psi_eval <- lapply(psi, function(f){
  f(theta_hat)
})
Do these evaluated score equations sum to zero?

colSums(matrix(unlist(psi_eval), ncol = 3, byrow = TRUE))
## [1] -0.3425674 0.0140744 2.6353120
No.
As another check, how does the observed loglihood compare to lme4::glmer's?
# geex internal functions objFun_glmerMod_binomial and binomial_integrand copied
# here since they are not exported
objFun_glmerMod_binomial <- function(parms, response, xmatrix, linkinv){
  log(stats::integrate(binomial_integrand, lower = -Inf, upper = Inf,
                       parms    = parms,
                       response = response,
                       xmatrix  = xmatrix,
                       linkinv  = linkinv)$value)
}

binomial_integrand <- function(b, response, xmatrix, parms, linkinv){
  if(!is.matrix(xmatrix)){
    xmatrix <- as.matrix(xmatrix)
  }
  pr  <- linkinv( drop(outer(xmatrix %*% parms[-length(parms)], b, '+')) )
  hh  <- stats::dbinom(response, 1, prob = pr)
  hha <- apply(hh, 2, prod)
  hha * stats::dnorm(b, mean = 0, sd = parms[length(parms)])
}
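As a quick sanity check on this integrand: for any cluster, the integral over \(b\) is the cluster's marginal likelihood, so it must lie strictly between 0 and 1. The chunk below recomputes the same integral inline so it stands alone; the toy data and \(\theta = (2, 2, 1)\) are illustrative choices of ours.

```r
#sanity check: the integral defining each loglihood contribution is a
#marginal probability, so it lies strictly between 0 and 1
toy_y <- c(1, 0)
toy_X <- cbind(1, c(0.5, -0.2))
toy_parms <- c(2, 2, 1)  # (beta_0, beta_1, sd of b)
lik <- integrate(function(b){
  sapply(b, function(bi){
    pr <- plogis(drop(toy_X %*% toy_parms[1:2]) + bi)
    prod(dbinom(toy_y, 1, pr)) * dnorm(bi, 0, toy_parms[3])
  })
}, lower = -Inf, upper = Inf)$value
lik
```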
# Compute each cluster's contribution to the loglihood
objFun <- lapply(splitdt, function(dt){
  X <- cbind(1, dt$x)
  objFun_glmerMod_binomial(parms = theta_hat, response = dt$y, xmatrix = X, linkinv = plogis)
})
sum(unlist(objFun))
## [1] -396.7983
logLik(fit)
## 'log Lik.' -397.0352 (df=3)
Not bad.
What happens if, instead of using \(\hat{\theta}_{glmer}\) to compute the scores, we use parameter estimates obtained by finding the roots of the score functions as they are defined within the grab_psiFUN function?
NOTE: the evaluation of m_estimate
is slow!
# NOT run to speed creation of documentation
attempt <- m_estimate(
  estFUN = eefun,
  data = d,
  units = 'id',
  root_control = setup_root_control(start = theta_hat), # start root finding at lme4's estimates
  compute_vcov = FALSE,
  outer_args = list(lme4model = fit)
)
## parameter estimates obtained from geex
attempt
psi_eval2 <- lapply(psi, function(f){
  f(roots(attempt))
})
colSums(matrix(unlist(psi_eval2), ncol = 3, byrow = TRUE))
Ah! That looks much better.