A crash course in Bayesian mixed models with brms (Lesson 5)
Introduction and review
What is this class?
Part 5 of a practical introduction to fitting Bayesian multilevel models in R and Stan
Uses the brms package (Bayesian Regression Models using Stan)
How to follow the course
Slides and text version of lessons are online
Fill in code in the worksheet (replace ... with code)
You can always copy and paste code from text version of lesson if you fall behind
What we’ve learned in Lessons 1-4
Relationship between prior, likelihood, and posterior
How to sample from the posterior with Hamiltonian Monte Carlo
How to use posterior distributions from our models to make predictions and test hypotheses
GLMMs with non-normal response distributions and random intercepts and slopes
Models with residuals correlated in space and time
Spatial generalized additive mixed models
Beta regression for proportion data
Zero-inflated and hurdle models for data with zeros
What will we learn in this lesson?
Cumulative logistic mixed models for ordered categorical data
Nonlinear mixed models
Big-picture concepts
Explore in more detail how link functions work in mixed models
Discover how every parameter in a mixed model can have fixed and random effects
Part 1. Cumulative logistic mixed models for ordinal data
Load packages and set options
library(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)

theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(
  values = unname(palette.colors(5)[-1]),
  aesthetics = c('colour', 'fill')
)

# IF YOU ARE RUNNING THIS CODE ON YOUR OWN MACHINE
options(mc.cores = 4, brms.file_refit = 'on_change')

# IF YOU ARE RUNNING THIS CODE ON THE POSIT CLOUD SERVER (or you have configured CmdStan on your own machine)
options(brms.backend = 'cmdstanr', mc.cores = 4, brms.file_refit = 'on_change')
Categorical data
Image (c) Allison Horst
We’ve seen binary response data in lesson 2
There are also statistical models for nominal data (for example: blood group), but we won’t cover them today
Ordered categorical data
Lots of data in ag science are on an ordinal rating scale
Common in phenotyping: capture a complex phenotype, or a quantitative one that is too time-consuming to measure directly, with a simple score
Rate disease severity on a scale from 1 to 5
Instead of counting hairs on a stem, rate low/medium/high
Image (c) Univ. Groningen
Shortcut at your own risk
It is faster to score phenotypes on a rating scale than to physically measure the underlying variable directly
But this speed comes at a price: you need more data points to get estimates as precise as you would with a continuous variable
Some people try to have it both ways and treat the numerical ratings as continuous variables with normally distributed error
This “quick and dirty” shortcut often works OK
But why do that when you have statistical models for ordinal response data?!?
Review: modeling binary data
For binary data we use a logistic model: binomial response distribution with logit link function
We’re trying to predict a probability, which has to be between 0 and 1
But our linear predictor is a combination of fixed and random terms, multiplied by coefficients and added together, which can be any value, positive or negative
Logit, or log-odds, link function takes a probability between 0 and 1 and maps it to a scale that can have any value, positive or negative
\(\text{logit } p = \log \frac{p}{1-p}\)
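In R, the logit link and its inverse are built in as qlogis() and plogis(); a quick sketch (probability values chosen arbitrarily for illustration):

```r
# Logit (log-odds) link: maps probabilities in (0, 1) to the whole real line
p <- c(0.1, 0.5, 0.9)
log_odds <- qlogis(p)   # same as log(p / (1 - p)); 0.5 maps to 0

# Inverse link (the logistic function) recovers the original probabilities
stopifnot(isTRUE(all.equal(plogis(log_odds), p)))
```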
Binomial distribution for binary data
Image (c) ICMA Photos
A logit model predicts a binary outcome, so we only need to estimate a single probability
Probability of success + probability of failure = 1
If we know one, we know the other
Model this process as following a binomial distribution which only has one parameter
Example: model coin flips as following a binomial distribution with one parameter: \(\theta\), the probability of flipping heads
Multinomial distribution, from coins to dice
Image (c) Diacritica
If we have \(n\) ordered categories, we have to estimate \(n - 1\) probabilities
They all have to be between 0 and 1
Instead of a binomial distribution, use a multinomial distribution.
\(n - 1\) parameters, the probability of each outcome except for the last
You can model the roll of a fair die as following a multinomial distribution with five parameters, each equal to 1/6: the probabilities of the first five outcomes
If we know probability of rolling 1, 2, 3, 4, and 5, we know the probability of rolling a 6
Cumulative logit link
We still use a logit link function but it’s a bit more complicated
Example: we have four categories, 1, 2, 3, 4
Estimate probability of category 1, probability of 2 or less, and probability of 3 or less
(Probability of 4 or less = 1)
These are cumulative probabilities so we’ll fit a cumulative logit model
Link functions in a cumulative logit model
\(n - 1\) link functions, one for each category except the last
Map our cumulative probabilities for each category to the linear predictor scale
If the cumulative probabilities are \(p_{y \le 1}\), \(p_{y \le 2}\), and \(p_{y \le 3}\), the link functions are:
\(\text{logit } p_{y \le 1} = \log \frac{p_{y \le 1}}{1-p_{y \le 1}}\)
\(\text{logit } p_{y \le 2} = \log \frac{p_{y \le 2}}{1-p_{y \le 2}}\)
\(\text{logit } p_{y \le 3} = \log \frac{p_{y \le 3}}{1-p_{y \le 3}}\)
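Going the other way, the model's thresholds on the link scale are converted back to individual category probabilities by applying the inverse link and differencing. A small sketch (threshold values are arbitrary, for illustration only):

```r
# Hypothetical thresholds on the link (log-odds) scale for 4 ordered categories
thresholds <- c(-1, 0.5, 2)

# Inverse link gives cumulative probabilities P(y <= 1), P(y <= 2), P(y <= 3)
cum_p <- plogis(thresholds)

# Individual category probabilities come from differencing; P(y <= 4) = 1
cat_p <- diff(c(0, cum_p, 1))
stopifnot(isTRUE(all.equal(sum(cat_p), 1)))
```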
Presentation ideas: How to report this analysis in a manuscript?
Description of statistical model for methods section (including model comparison results)
Verbal description of results
Figure
Methods section
We fit a generalized linear mixed model to the turfgrass color rating data, pooling Good and Excellent to a single category for a total of three categories. The model assumed a multinomial response distribution with cumulative logit link function. We allowed the intercept of each linear predictor to vary by rating category but estimated a single set of fixed effects (nitrogen treatment, management treatment, week as a discrete variable, and all their interactions) and random effects (random intercept for replicate nested within treatment combination) for each category threshold. We put weakly informative Normal(0, 2) priors on all fixed effects. Using Hamiltonian Monte Carlo, we sampled 1000 warmup iterations (discarded) and 1000 sampling iterations for each of four chains.
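A brms call matching this description might look roughly like the following sketch (the data frame name `karcher` and its column names are assumptions for illustration, not the lesson's actual code):

```r
# Sketch of the cumulative logit mixed model described above.
# Assumes a data frame `karcher` with columns rating (an ordered factor with
# three levels), nitro, manage, week (factors), and rep (replicate ID).
fit_karcher <- brm(
  rating ~ nitro * manage * week + (1 | nitro:manage:rep),
  data = karcher,
  family = cumulative(link = 'logit'),
  prior = prior(normal(0, 2), class = b),
  chains = 4, warmup = 1000, iter = 2000,
  seed = 1
)
```

The cumulative() family estimates a separate intercept (threshold) for each category boundary while sharing one set of fixed and random effect coefficients across thresholds, as the methods describe.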
Methods section, continued
We generated posterior distributions of the estimated marginal probability of each rating category, and the mean category prediction, for each combination of treatment and week and averaged over weeks. We compared the mean category prediction for each pair of treatments averaged over weeks, testing against a null hypothesis of no difference using the Bayesian probability of direction \(p_d\) and maximum a posteriori p-value \(p_{MAP}\). We characterized the posterior distributions using the medians and equal-tailed credible intervals.
Results section
The probability of higher color ratings tended to be higher in the M3 and M4 management treatments relative to M1 and M2, and higher in the high N addition treatment relative to the low N addition treatment. There were only minimal interactions between management treatment and nitrogen addition level, and the relative differences between the treatments varied very little over time.
(At this point I would refer to specific tables and figures because it would become cumbersome to repeat many probabilities in the text).
Manuscript-quality figure
Only present means averaged over weeks (as we ignored interactions with time anyway)
Present the probabilities instead of the (sketchy) mean class predictions
Separate into a grid of panels by nitrogen treatment and rating class
ggplot(karcher_epred %>% filter(week == 'overall'), aes(x = manage, y = .epred)) +
  stat_interval(aes(color = rating, color_ramp = after_stat(level)), .width = c(.5, .8, .95)) +
  stat_summary(fun = 'median', geom = 'point') +
  facet_grid(rating ~ nitro,
             labeller = labeller(nitro = c('N1' = 'low N addition', 'N2' = 'high N addition'))) +
  colorblind_friendly +
  labs(x = 'Management treatment', y = 'Modeled class probability', color_ramp = 'CI width')
Part 2. Nonlinear mixed models
Nonlinear model versus GAM
In Lesson 3 we used GAMs (semiparametric models) to model spatial variation
GAMs use a bunch of polynomials to approximate a nonlinear relationship with linear models
But what if we have a strong theoretical expectation about how two variables are related?
We can fit a fully parametric nonlinear mixed model with brms
We can use any error distribution we want and put fixed and/or random effects on any of the parameters
Microbial growth curves
A population of microbial cells initially grows at an increasing, roughly exponential rate, until the cells start to crowd each other and growth levels off
This looks like an S-curve
Different mathematical equations exist to approximate this relationship
Based on existing knowledge and theory we can hypothesize which functional form will fit the data best
Image (c) Garnhami
Gompertz equation
\[f(t) = a e^{-b e^{-ct}}\]
One of many equations that looks kind of like an S
Has been shown to model microbial biomass growth well
Population size as a function of time \(t\), with three parameters:
\(a\): asymptotic population size, or carrying capacity of environment
\(b\): parameter to offset the curve to left or right along time axis (length of lag phase)
\(c\): determines growth rate (how fast population grows toward carrying capacity)
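These parameter roles can be checked numerically with a small sketch (parameter values are arbitrary, for illustration only):

```r
# Gompertz curve: f(t) = a * exp(-b * exp(-c * t))
gompertz <- function(t, a, b, c) a * exp(-b * exp(-c * t))

a <- 3; b <- 5; c <- 0.1   # arbitrary illustration values

# At t = 0 the curve starts at a * exp(-b), well below the asymptote
stopifnot(isTRUE(all.equal(gompertz(0, a, b, c), a * exp(-b))))

# As t grows large, the curve approaches the asymptote a
stopifnot(abs(gompertz(500, a, b, c) - a) < 1e-6)
```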
Example Gompertz curves
Example data
Image (c) Bioscreen
Small subset of a real ARS dataset, with noise added
How much do pyrrocidines inhibit the growth of Fusarium mold?
Inference from posterior distributions of parameter means
emm_a_post <- describe_posterior(emm_a$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
emm_b_post <- describe_posterior(emm_b$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
emm_c_post <- describe_posterior(emm_c$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)

contr_a_post <- describe_posterior(emm_a$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))
contr_b_post <- describe_posterior(emm_b$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))
contr_c_post <- describe_posterior(emm_c$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))
Equal-tailed quantile credible intervals of means in $emmeans objects
Credible intervals and Bayesian hypothesis tests in $contrasts objects
How do parameters differ between compounds? (means)
Presentation ideas: How to report this analysis in a manuscript?
Description of statistical model for methods section (including model comparison results)
Verbal description of results
Figure
Methods section
We fit a nonlinear mixed model to the OD data. The response variable was adjusted by taking the logarithm and subtracting the minimum value of each well so that the lowest value in all wells was zero. We assumed OD followed a Gompertz relationship with time, \(f(t) = a e^{-b e^{-ct}}\). A fixed effect for compound was put on each of the three nonlinear parameters: \(a\), the asymptote; \(b\), the offset; and \(c\), the growth rate. In addition, we put a random intercept for each well nested within replicate on the asymptote parameter. We put weakly informative Normal(0, 1) priors on the fixed effects, and priors on the intercepts of each nonlinear parameter that were normally distributed with means derived from the point estimates of a least-squares nonlinear fit.
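In brms, a nonlinear model like this is specified with bf(..., nl = TRUE), putting a formula on each nonlinear parameter. A rough sketch of the model described above (column names `adjusted_log_OD`, `HOUR`, `compound`, `rep`, and `well` are assumptions based on the figure code, and the priors shown are placeholders rather than the lesson's least-squares-derived values):

```r
# Sketch of the nonlinear (Gompertz) mixed model described above
gompertz_formula <- bf(
  adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)),
  a ~ compound + (1 | rep/well),   # random intercept on the asymptote only
  b ~ compound,
  c ~ compound,
  nl = TRUE
)

fit_bioscreen <- brm(
  gompertz_formula,
  data = bioscreen,
  prior = c(
    # Placeholder priors for illustration; the lesson centers the intercept
    # priors on least-squares point estimates
    prior(normal(0, 1), class = b, nlpar = 'a'),
    prior(normal(0, 1), class = b, nlpar = 'b'),
    prior(normal(0, 1), class = b, nlpar = 'c')
  ),
  chains = 4, warmup = 1000, iter = 3500,
  seed = 1
)
```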
Methods section, continued
We sampled the posterior distribution of model parameters using the Hamiltonian Monte Carlo algorithm with four chains, each run for 1000 warmup iterations (discarded) and 2500 sampling iterations (retained). We compared this model with a similar model that did not allow the \(b\) and \(c\) parameters to vary by compound, and found the more complex model to be a better fit; we present predictions from only that model in what follows.
We obtained posterior estimates of the marginal mean for each nonlinear parameter for each compound and compared the estimates between each pair of compounds. We characterized the posterior distributions using the medians and equal-tailed credible intervals, and tested the contrasts against a null value of zero using the Bayesian probability of direction (\(p_d\)) and maximum a posteriori p-value (\(p_{MAP}\)).
Results section
The model predicted a similar asymptote value for adjusted log OD for each of the three compounds (A: 3.06, 95% CI [2.97, 3.15]; B: 3.19 [3.10, 3.28]; AB: 3.15 [3.06, 3.24]). The model predicted a slightly higher growth rate for B (0.063 [0.062, 0.065]) compared to A (0.061 [0.059, 0.063]) and AB (0.059 [0.058, 0.061]). This indicates that the A compound may have been slightly more effective at initially slowing the growth rate but that the long-term inhibition differs very little between the compounds.
Manuscript-quality figure
Previous figure recreated with more publication-quality options
ggplot(bioscreen_pred, aes(x = HOUR, y = .epred, fill = compound, color = compound)) +
  geom_line(aes(y = adjusted_log_OD, group = interaction(compound, rep, well)),
            data = bioscreen, alpha = 0.25, linewidth = 0.5) +
  stat_lineribbon(aes(fill_ramp = after_stat(level)), alpha = 0.5) +
  colorblind_friendly +
  labs(fill_ramp = 'CI width', x = 'time (h)', y = 'adjusted log(OD)') +
  theme(
    panel.grid = element_blank(),
    legend.position = 'inside',
    legend.position.inside = c(0, 1),
    legend.justification = c(0, 1),
    legend.background = element_blank()
  )
Conclusion
You now have a lot of techniques in your Bayesian toolkit! Today you learned how to:
Fit a statistical model when you have ordered categorical response data
Make different kinds of predictions and comparisons using a Bayesian cumulative logit-link mixed model
Fit a nonlinear model with fixed and random effects
Compare parameters of a nonlinear model between treatment groups in a Bayesian framework
Well done!
See text version of lesson for further reading and useful resources
Please provide feedback to help me improve this course!