Troubleshooting common errors and warnings in (G)L(M)Ms

Quentin D. Read

Who is this talk for?

  • Scientists who use linear models in their work
  • The general discussion of statistical issues here is independent of your software platform of choice
    • I will mention the common error and warning messages in both R and SAS

Informal poll

Who has seen any of these messages before?

Talk outline

  • Review: what are GLMMs and how do we fit them?
  • How to check and prepare your data for model fitting
  • Common notes, errors, and warnings you’ll run into, and how to deal with them
    • Failure to converge
    • Singular fit
    • Quasi-complete separation
  • A note on the Bayesian approach

Review: what is a GLMM?

  • Generalized Linear Mixed Model
  • Linear model because we estimate regression coefficients for a linear combination of predictor variables that best explains variation in the response variable
  • Mixed model because it has a mix of fixed effects and random effects
  • Fixed effects: effects that are the same for all experimental units in the study and that we expect to be the same if we were to do the experiment again in a different context. Example: Effect of caffeine on heart rate
  • Random effects: effects that are unique for the different experimental units in the study and that would be different if we did the experiment again in a different context. Example: Effect of different teachers on students’ test scores
  • Generalized because we use a link function to convert the predicted response onto an unbounded scale (one that can theoretically take any value, positive or negative), where a linear model makes sense.
    • For example, we use the log-odds function to convert from the probability scale (bounded between 0 and 1) to the log-odds scale (no bounds)
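As a concrete illustration, here is a pure-Python sketch of the log-odds (logit) link and its inverse (not tied to any modeling package; real GLMM software applies these internally):

```python
import math

def logit(p):
    """Log-odds link: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse link: maps any real number back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

print(logit(0.5))      # 0.0 -- even odds sit at the center of the log-odds scale
print(inv_logit(2.0))  # about 0.88 -- any real input yields a valid probability
```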

Implementations of GLMM

  • Frequentist statistics
    • SAS: proc glimmix
    • R: lme4 and glmmTMB packages
  • Bayesian statistics
    • SAS: proc bglimm
    • Stan (interfaces with R through the brms, rstan, and cmdstanr packages, and with Python through PyStan)
  • And many more …

GLMM fitting algorithms

  • Frequentist GLMMs are fit using maximum likelihood algorithms
    • Optimization problem that finds the set of parameter values with the highest likelihood (the probability of the observed data, given those parameter values)
    • Often deterministic (gets exactly the same answer every time)
  • Bayesian GLMMs are often fit using Monte Carlo algorithms
    • Sampling a distribution of likelihoods
    • Stochastic (includes randomness so you don’t always get exactly the same answer)
  • In this talk, we will mostly focus on frequentist maximum likelihood methods

Check and prepare your data before model fitting

  • Plot, plot, plot!
  • Check data for sufficient variability
  • Rescale your data

Plot, plot, plot!

  • Look at your data before fitting the statistical model!
  • A cautionary tale: researchers gave students a dataset of steps taken per day and body mass index
    • Some were told to test a specific hypothesis that more steps per day causes reduced BMI
    • Some were given the dataset without a specific hypothesis
  • The students given the hypothesis were 5 times more likely to skip plotting the data and go straight to the statistical model

  • This is what the data looked like!
  • Statistical models are abstractions of reality and only provide valid inference under strict assumptions
  • Always “use your noggin” and look at the data!
    • Yanai & Lercher 2020, bioRxiv preprint

Check data for sufficient variability: predictors

  • If a predictor variable has little to no variation, it cannot possibly explain any variation in the response
  • This is especially an issue with binary and other discrete variables
  • Remove predictor variables that have ~0 variation from your model
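One way to screen for this is to compute each predictor's variance before fitting. A minimal sketch with hypothetical data (in practice you would do this in your own stats software; the variable names and tolerance here are illustrative assumptions):

```python
from statistics import pvariance

def near_constant(values, tol=1e-8):
    """Flag a predictor whose variance is essentially zero."""
    return pvariance(values) < tol

# Hypothetical data: 'treated' never varies, so it cannot explain anything
predictors = {
    "temperature": [15.2, 18.1, 22.4, 24.9],
    "treated":     [1, 1, 1, 1],
}
to_drop = [name for name, vals in predictors.items() if near_constant(vals)]
print(to_drop)  # ['treated']
```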

Check data for sufficient variability: response

  • If the response variable has little to no variation, there is no variation for your model to explain no matter what predictors you include!
  • You cannot learn anything from a statistical model in this case, so don’t bother trying


Example: what is the probability of me beating MJ in one-on-one?

Rescale your data

  • Your predictor variables may be on different scales. Model fitting algorithms have trouble if the scales are very different
    • Example: temperature ranging from 15–25 °C, solar radiation ranging from 400–1600 W m⁻², concentration of a trace soil element ranging from 0.0001 to 0.0005
  • Convert units (divide or multiply by constants) until they are roughly on the same scale
  • Or consider standardizing (z-transforming) your predictor variables:

\[z = \frac{x - \mu}{\sigma}\]

  • Pay attention to how this changes interpretation of the coefficients (you can always back-transform the coefficients)
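The z-transform above is a one-liner in any language. A minimal sketch, using hypothetical solar radiation values:

```python
from statistics import mean, pstdev

def z_transform(x):
    """Standardize a predictor: subtract its mean, divide by its SD."""
    mu, sigma = mean(x), pstdev(x)
    return [(xi - mu) / sigma for xi in x]

solar = [400.0, 800.0, 1200.0, 1600.0]   # W m^-2: a much larger scale than temperature
z = z_transform(solar)
# z now has mean 0 and SD 1, regardless of the original units;
# a slope of b on the z scale back-transforms to b / sigma per original unit
```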

Failure to converge

  • Under the hood, some serious numerical computation is going on
  • Algorithms must converge, meaning they are run for many iterations until the answer stops changing (we no longer get any meaningful increase in likelihood)
  • This depends on a number of parameters, for instance the tolerance
    • tolerance = the smallest difference in likelihood we consider to be “meaningful”
  • If you set the tolerance to a very tiny number, the algorithm will take longer to converge; if you increase the tolerance, it will converge faster, but at the risk of missing a solution with higher likelihood
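A toy illustration of the roles of tolerance and the iteration limit. This deliberately simplified hill-climber (not the actual algorithm any GLMM software uses) maximizes the log-likelihood of 7 successes in 10 trials, and stops either when likelihood gains fall below the tolerance (converged) or when it runs out of iterations (failure to converge):

```python
import math

def loglik(p, k=7, n=10):
    """Log-likelihood of k successes in n trials, success probability p."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def maximize(f, x=0.5, step=0.25, tol=1e-8, maxiter=10_000):
    """Toy optimizer: step toward higher likelihood, halving the step
    size whenever no 'meaningful' (> tol) improvement is available."""
    fx = f(x)
    for it in range(maxiter):
        candidates = [c for c in (x - step, x + step) if 0 < c < 1]
        best = max(candidates, key=f, default=x)
        if f(best) > fx + tol:
            x, fx = best, f(best)     # meaningful improvement: keep going
        elif step > tol:
            step /= 2                 # no gain at this step size: refine
        else:
            return x, it, True        # converged
    return x, maxiter, False          # hit maxiter: failure to converge

p_hat, n_iter, converged = maximize(loglik)
# p_hat lands close to the analytic MLE, 7/10 = 0.7
```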

How to deal with convergence failures

  • Convergence failures often happen when a too-complex model specification is used for a too-small dataset
  • The best advice is to reduce the complexity of the model
  • You may also try different optimization algorithms, or tweak their parameters
    • Increase the tolerance
    • Increase the number of iterations
  • Checking the iteration history may give you some clue as to what is happening (can you fix it by increasing the number of iterations, or is that unlikely to help?)

Singular fit warning

  • lme4 in R: boundary (singular) fit: see help('isSingular')
  • proc glimmix or proc mixed in SAS: NOTE: Estimated G matrix is not positive definite.
  • The estimated variance of one or more of your random effects, or a linear combination of them, is zero (or at least so small that the algorithm cannot tell it apart from zero)
  • The estimated correlation between two of your random effects, or a linear combination of them, is exactly ±1
  • Again, this often occurs when a too-complex model specification is used for a too-small dataset
  • Or maybe one of the random effects is truly near zero
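The zero-variance boundary is easy to see in the simplest possible case. This sketch uses the classical method-of-moments estimate for a balanced one-way random-effects layout, (MS_between − MS_within) / n; real mixed-model software uses ML or REML instead, but the boundary behavior is analogous (the example data are hypothetical):

```python
from statistics import mean, variance

def between_group_variance(groups):
    """Method-of-moments estimate of the between-group variance component
    for a balanced one-way layout: (MS_between - MS_within) / n.
    A negative raw estimate is clamped to zero -- the boundary estimate
    that triggers a singular-fit warning."""
    n = len(groups[0])                       # observations per group
    ms_between = n * variance([mean(g) for g in groups])
    ms_within = mean(variance(g) for g in groups)
    return max((ms_between - ms_within) / n, 0.0)

# Group means vary less than the within-group noise predicts,
# so the variance component lands exactly on the zero boundary
noisy_groups = [[1.0, 5.0], [2.0, 6.0], [3.0, 4.0]]
print(between_group_variance(noisy_groups))  # 0.0
```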

How to deal with singular fit warning

Singular fit means there are “too many random effects” in your model. There are competing philosophies about random effects.

  • “Keep it maximal”: Follow the design and include all possible random intercepts and slopes. Ignore singular fit warnings. Any random effect the data cannot support will be estimated at zero anyway, so it will not affect your answer.
  • “Data-driven”: fit the model, see if any random effects are zero or have a correlation of ±1 to each other, then “prune” the model to get rid of any of those. Repeat until all random effects are OK.
  • Enlightened GLMM practitioners prefer “The Middle Way”
  • Start with a fairly complex model, but use your judgment and don’t go crazy with random slopes
  • You may judiciously remove some random effects if it helps the algorithm converge, but keep the experimental design in mind

Complete and quasi-complete separation

  • Binomial with many factors and/or little variation in response
  • One or more x variables “separate” the y variable perfectly, creating a “perfect fit”
  • Example: if you are studying how age and sex predict presence of a disease, some subgroups (e.g. 1-year-old females) may have 0% or 100% incidence
  • Within that subgroup the response has zero variance, so the estimated coefficient for that group goes to (plus or minus) infinity
  • This often occurs if sample size is small
  • Quasi-complete separation is when only some of the subgroups are perfectly separated and not others
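Separated subgroups are easy to spot before fitting: look for levels of a categorical predictor whose outcomes are all 0 or all 1. A minimal sketch with hypothetical disease data (variable names are illustrative):

```python
from collections import defaultdict

def separated_levels(x, y):
    """Return levels of a categorical predictor whose binary outcomes
    are all 0 or all 1 -- the subgroups driving (quasi-)separation."""
    outcomes = defaultdict(set)
    for xi, yi in zip(x, y):
        outcomes[xi].add(yi)
    return sorted(level for level, ys in outcomes.items() if len(ys) == 1)

# Hypothetical data: every 'young' subject is disease-free
age_group = ["young", "young", "young", "old", "old", "old"]
disease   = [0, 0, 0, 0, 1, 1]
print(separated_levels(age_group, disease))  # ['young']
```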

Dealing with complete or quasi-complete separation

  • If separation is only partial (quasi-complete) you can still interpret the estimates for other subgroups
  • Reduce the number of variables (not ideal)
  • Categorize continuous variables or merge many fine categories into fewer broader categories
  • Try a penalized regression approach, such as the glmnet package in R

An alternative: use the Bayesian approach!

  • The Bayesian approach may use (weakly) informative priors to stabilize the computational algorithms
  • This is my preferred solution to all the problems we’ve covered today!
  • It can be as simple as putting a \(\text{Normal}(0, 5)\) prior on your fixed effects or a \(\text{Gamma}(1, 1)\) prior on the random effect variance parameter
    • This doesn’t really assume anything specific about your effects, it just keeps the algorithm from trying out implausible or impossible values to help it converge

A final note

  • All we’ve talked about just now is very general. Every dataset is unique; your mileage may vary.
  • In closing, remember the words of George Box!

Resources

  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3881361/ paper by Barr et al. 2013; advice to “keep it maximal” with random effects
  • https://support.sas.com/resources/papers/proceedings18/2179-2018.pdf SAS paper including how to troubleshoot PROC GLIMMIX issues
  • https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/ FAQ on complete separation, with SAS and R examples
  • https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html overall FAQ on GLMMs in R by Ben Bolker
  • https://rpubs.com/bbolker/lme4trouble1 documentation by Ben Bolker on troubleshooting mixed models in R’s lme4 package