A crash course in Bayesian mixed models with brms (Lesson 2)

Introduction and review

What is this class?

  • Part 2 of a practical introduction to fitting Bayesian multilevel models in R and Stan
  • Uses the brms package (Bayesian Regression Models using Stan)

How to follow the course

  • Slides and text version of lessons are online
  • Fill in code in the worksheet (replace ... with code)
  • You can always copy and paste code from text version of lesson if you fall behind

Concepts we covered in Lesson 1

  • What is Bayesian inference
  • Prior, likelihood, and posterior
  • Credible intervals

Techniques we covered in Lesson 1

  • Use Hamiltonian Monte Carlo method, implemented in Stan/brms, to fit Bayesian models
  • Fit a multilevel (mixed) model with random intercepts and random slopes
  • Specify different priors
  • Diagnose and deal with convergence problems
  • Plot model parameters and predictions
  • Compare models with information criteria
  • Do hypothesis testing with Bayesian analogues of p-values

What will we learn in this lesson?

  • Specify strongly and weakly informative priors and see how they may influence your results
  • Fit a Bayesian generalized linear model (GLM), for situations where you can’t assume normally distributed error in your model
  • Fit a Bayesian generalized linear mixed model (GLMM), to accommodate non-normal data and a mixture of fixed and random effects

Packages for a “Bayesian data-to-doc pipeline”

  • emmeans: make predictions from Bayesian models
  • bayestestR: Bayesian hypothesis testing
  • tidybayes, ggdist, and ggplot2: make nice plots of Bayesian model predictions
  • gt: make nice tables of model predictions

Back to Bayes-ics: Bayes’ Theorem

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

The probability of A being true given that B is true (\(P(A|B)\)) is equal to the probability that B is true given that A is true (\(P(B|A)\)) times the ratio of probabilities that A and B are true (\(\frac{P(A)}{P(B)}\))

Bayes’ theorem in a statistical context

\[P(model|data) = \frac {P(data|model)P(model)}{P(data)}\]

or

\[posterior \propto likelihood \times prior\]

Bayes’ theorem in statistics, explained

  • We have some prior knowledge about the world, formalized with a statistical distribution of what we think the parameters are without having collected any data yet
    • prior, represented by \(P(model)\), or the probability of the model not dependent on the data
  • We go out and collect some data and calculate \(P(data|model)\)
    • likelihood, a function of the model parameters computed from the data alone (no prior involved)
    • How likely it was to observe the data we did, given all possible combinations of parameters we could have in our model
  • Divide the likelihood, \(P(data|model)\), by the marginal probability, \(P(data)\), to make the likelihood function have an area under the curve of 1 and become a probability distribution
  • Multiply the prior by this rescaled likelihood to get \(P(model|data)\)
    • posterior, our new knowledge of the world, after collecting the data, represented by a new statistical distribution for our parameter estimates

The posterior distribution

  • The posterior is the product of the prior and the likelihood
  • It represents a combination of what we knew before we got the data (the prior), updated by the new information we got from the data (the likelihood)
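  • As a minimal sketch of this product (made-up numbers, not part of the lesson's code), we can compute an unnormalized posterior for a single parameter on a grid of candidate values and rescale it:
# Sketch: posterior on a grid of candidate parameter values (illustrative numbers only)
theta <- seq(-10, 10, by = 0.01)               # candidate values of a parameter
prior <- dnorm(theta, mean = 0, sd = 5)        # prior: probably near zero, but could be large
likelihood <- dnorm(3, mean = theta, sd = 1)   # likelihood of an observed estimate of 3
posterior <- prior * likelihood                # posterior is proportional to prior times likelihood
posterior <- posterior / sum(posterior * 0.01) # rescale so the area under the curve is 1
plot(theta, posterior, type = 'l')             # concentrated near 3, pulled slightly toward 0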

Priors have a bad reputation

  • Both classical (frequentist) approach and Bayesian approach use likelihood
  • Only Bayesian approach uses priors
  • Priors scare some people because they think they are subjective and you can use them to get any result you want
  • It’s understandable to feel that way. What arguments can we give to defend the use of priors?

Argument 1: Assumptions are not a bad thing

  • Prior distributions are not the only assumption; likelihood always involves assumptions too!
    • Example: a general linear model, Bayesian or not, assumes x-y relationship is linear, and the noise around that is normally distributed
    • That’s a strong assumption that only roughly approximates the real world
    • If you won’t make assumptions, you can’t fit any statistical model at all!

Argument 2: Priors are (often) more practical than philosophical

  • If you have a large dataset, the prior distribution usually has relatively little influence on the results
  • The prior also makes the computations needed to fit the statistical model easier: it can regularize complex models that frequentist maximum likelihood sometimes fails to fit at all
  • In those cases, only the Bayesian model can give us an answer in the first place

Argument 3: We should always check sensitivity of results to assumptions anyway!

  • If you make a major assumption in constructing your model, you should check whether the results are sensitive to it, priors or no priors
  • If the results don’t change as a result of changing the prior, then say so
  • If the results do change as a result of changing the prior, then present that and allow the reader to decide
  • If you know that your results are sensitive to choice of prior, it is wrong to present results from a single model with a single prior distribution as if they are the only “true” results

How the posterior is influenced by the prior and the data: example

Image (c) Washington State University
  • Example: We want to know the probability of a new vaccine successfully granting immunity to a disease.
  • Baseline immunity rate is 50%
  • Pessimistic/conservative prior assumption that the vaccine is likely to be ineffective at changing immunity rate, but it might have an effect
  • Assume a prior distribution with the most probable outcome being 50%, but allowing for outcomes both above and below that

Collect different amounts of data

  • Vaccinate animals and then challenge them with a virus and measure their immunity
  • Three different hypothetical scenarios
             Number of animals tested   Number immune   Number not immune   Observed immunity rate
Scenario 1   10                         7               3                   70%
Scenario 2   50                         35              15                  70%
Scenario 3   100                        70              30                  70%

Posterior distributions get closer to likelihood with increasing sample size

plots showing influence of likelihood on posterior
  • As the amount of data goes up, evidence that the true immunity rate is 70% becomes stronger
  • The influence of our prior distribution centered at 50% keeps getting weaker and weaker
  • The influence of the likelihood (informed by the data) keeps getting stronger
  • The more data you have, the more evidence you have, and the more the likelihood pulls the posterior towards itself and away from the prior
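  • A small numerical sketch of this updating (assuming a Beta(5, 5) prior, which is centered at 50% but allows other values; this exact prior is an assumption for illustration):
# Sketch: conjugate beta-binomial updating of a Beta(5, 5) prior with the three scenarios
prior_a <- 5; prior_b <- 5
scenarios <- data.frame(n = c(10, 50, 100), immune = c(7, 35, 70))
post_a <- prior_a + scenarios$immune
post_b <- prior_b + scenarios$n - scenarios$immune
data.frame(scenarios, posterior_mean = post_a / (post_a + post_b))
# The posterior mean moves from the prior's 50% toward the observed 70% as sample size grows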

Hands-on example of prior distribution influence

  • If you have lots of data, you don’t have to worry that much about the influence of the prior
  • But what if you have a very small dataset?
  • Let’s do a hands-on example using data on a controversial topic where people may have strong prior beliefs

Setup

Load packages.

library(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)
  • Set plotting options (including colorblind-friendly palette)
  • Create directory for fitted model objects if it does not exist
theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))

if (!dir.exists('fits')) dir.create('fits')
  • Set brms options
  • Cloud server has pre-fit model objects loaded to save time
options(mc.cores = 4, brms.file_refit = 'on_change', brms.backend = 'cmdstanr')
  • If you are on a local machine without CmdStanR, do not set brms.backend
options(mc.cores = 4, brms.file_refit = 'on_change')

Meet the example data

  • gathmann.bt example dataset from agridat package
  • Abundance of non-target arthropod species measured on transgenic Bt maize and control (isogenic line).
  • Type ?gathmann.bt in your console to pull up documentation for the dataset
  • Does Bt corn have bigger impacts on non-target species than conventional insecticide, or is it safer?

Image (c) Science 2.0
  • Small sample size: \(n=8\) Thysanoptera abundance measurements on each line
data(gathmann.bt)

ggplot(gathmann.bt, aes(x = gen, y = thysan)) + geom_boxplot()
  • Set factor order so that the control line (ISO) is treated as the reference level in the model
  • A positive coefficient in the model will mean that Bt has higher Thysanoptera abundance than control
gathmann.bt <- mutate(gathmann.bt, gen = relevel(gen, ref = 'ISO'))

Frequentist analysis: two-sided t-test

t.test(formula = thysan ~ gen, data = gathmann.bt)
  • We ask: is there enough evidence to reject the null hypothesis that the two means are equal?
  • Only considers the likelihood, which is calculated from the data
  • Does not formally incorporate any prior knowledge

Bayesian analyses with different priors

  • The response variable ranges from ~4 to 18, so an effect of +5 or -5 is “big”
  • We’re going to fit the model with 7 different priors
  • Default completely uninformative “flat” prior assigns equal prior probability to any effect size
  • Not realistic for most situations
  • We usually at least know the order of magnitude of expected effect size, even if we don’t know if it will be positive or negative
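  • You can check which priors brms would use by default before fitting (a quick sketch, assuming the setup chunk above has been run):
# Show the default priors brms would assign to this model formula and data
get_prior(thysan ~ gen, data = gathmann.bt)
# The class 'b' rows correspond to the flat default prior on the genBt coefficient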

Six other possible prior distributions

  • normal(0,5): we don't know if Bt has a positive or negative effect and we think there's a moderate probability of a large effect in either direction
  • normal(0,1): we don't know if Bt has a positive or negative effect and we think large effects in either direction are very improbable
  • normal(5,5): we think Bt has a positive effect but we're very uncertain
  • normal(5,1): we think Bt has a positive effect and we're very certain of that
  • normal(-5,5): we think Bt has a negative effect but we're very uncertain
  • normal(-5,1): we think Bt has a negative effect and we're very certain of that

Fit the models

  • Defaults to 4 Markov chains, 1000 warmup iterations, 2000 total iterations per chain
  • No need to increase sampling iterations to fully characterize posterior
# Flat prior
m1gathmann <- brm(thysan ~ gen, data = gathmann.bt, 
                  seed = 838, file = 'fits/m1gathmann')
# Prior saying that extreme differences between genotypes are unlikely
m2gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(0, 5), class = b),
                  seed = 838, file = 'fits/m2gathmann')
# Prior with even stricter regularization
m3gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(0, 1), class = b),
                  seed = 838, file = 'fits/m3gathmann')
# Prior if we think that Bt is more likely to have a positive impact on non-target species but we are uncertain
m4gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(5, 5), class = b),
                  seed = 838, file = 'fits/m4gathmann')
# Prior if we think that Bt is more likely to have a positive impact on non-target species and we have strong prior evidence for that
m5gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(5, 1), class = b),
                  seed = 838, file = 'fits/m5gathmann')
# Prior if we think that Bt is more likely to have a negative impact on non-target species but we are uncertain
m6gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(-5, 5), class = b),
                  seed = 838, file = 'fits/m6gathmann')
# Prior if we think that Bt is more likely to have a negative impact on non-target species and we have strong prior evidence for that
m7gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(-5, 1), class = b),
                  seed = 838, file = 'fits/m7gathmann')

Comparison of posterior estimates

  • Pull out the posterior estimates for the difference between the means
  • Plot the median of the posterior with 95% credible interval
get_post <- function(fit) {
  # robust = TRUE returns the posterior median (rather than the mean) as the point estimate
  fixedeff <- summary(fit, robust = TRUE)$fixed
  setNames(fixedeff['genBt', c('Estimate','l-95% CI', 'u-95% CI')], c('median', 'lower95', 'upper95'))
}

gathmann_post_estimates <- cbind(
  prior = c('flat', 'normal(0,5)', 'normal(0,1)', 'normal(5,5)', 'normal(5,1)', 'normal(-5,5)', 'normal(-5,1)'),
  map_dfr(list(m1gathmann, m2gathmann, m3gathmann, m4gathmann, m5gathmann, m6gathmann, m7gathmann), get_post)
) %>%
  mutate(prior = factor(prior, levels = unique(prior)))

ggplot(gathmann_post_estimates, aes(x = prior, y = median, ymin = lower95, ymax = upper95)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = 'dashed', linewidth = 1) +
  labs(y = 'effect of Bt on Thysanoptera abundance')

Comparison of posterior estimates

  • \(\text{Normal}(0,5)\), \(\text{Normal}(5,5)\), and \(\text{Normal}(-5,5)\): all similar inference to the flat prior
  • \(\text{Normal}(5, 1)\): similar point estimate to flat prior, but narrower uncertainty interval (informative prior)
  • \(\text{Normal}(0, 1)\): informative prior that assigns a very low prior probability to strong positive and strong negative effects
  • \(\text{Normal}(-5, 1)\): informative prior that assigns low prior probability to everything except strong negative effects
  • The informative priors with low standard deviations influence our conclusions

Comparison of posterior estimates: Bayesian-style p-values

  • “maximum a posteriori” p-value, or pMAP (see Lesson 1)
  • Ratio of posterior probability of null value to probability of posterior point estimate
  • The lower pMAP is, the stronger the evidence in favor of the posterior estimate relative to null
gathmann_pvalues <- data.frame(
  prior = c('flat', 'normal(0,5)', 'normal(0,1)', 'normal(5,5)', 'normal(5,1)', 'normal(-5,5)', 'normal(-5,1)'),
  pMAP = c(p_map(m1gathmann)$p_MAP[2], p_map(m2gathmann)$p_MAP[2], p_map(m3gathmann)$p_MAP[2], p_map(m4gathmann)$p_MAP[2], p_map(m5gathmann)$p_MAP[2], p_map(m6gathmann)$p_MAP[2], p_map(m7gathmann)$p_MAP[2])
)
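  • An equivalent, more compact way to collect the same values (a sketch using purrr, which is loaded with tidyverse):
# Loop over the fitted models with purrr::map_dbl instead of typing out each p_map() call
gathmann_models <- list(m1gathmann, m2gathmann, m3gathmann, m4gathmann, m5gathmann, m6gathmann, m7gathmann)
gathmann_pvalues <- data.frame(
  prior = c('flat', 'normal(0,5)', 'normal(0,1)', 'normal(5,5)', 'normal(5,1)', 'normal(-5,5)', 'normal(-5,1)'),
  pMAP = map_dbl(gathmann_models, ~ p_map(.x)$p_MAP[2])
)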

Influence of priors: recap

  • Prior can be influential if it has a narrow distribution
  • Even though \(\text{Normal}(0, 1)\) assigns equal prior probability to positive and negative outcomes, it’s informative because it discounts large effects
  • But just because Bayesian result does not agree with frequentist maximum likelihood result, that doesn’t mean it’s wrong!
  • Maybe the skeptical prior with low probability of extreme differences is actually a good thing!
  • This is an extreme example: it’s never good to draw big conclusions from a tiny dataset anyway
  • Ideal to document the influence of the prior and let the reader decide!

Generalized linear mixed models

  • So far we have fit simple models where Bayesian and frequentist methods both work well
  • Let’s explore some territory where Bayesian methods really shine!

Review: general linear mixed models

\[y \sim \text{Normal}(Xb + Zu, \sigma), u \sim \text{Normal}(0, \sigma_u)\]

  • This means response \(y\) is a function of fixed effects and random effects
    • \(X\): the predictor variables
    • \(b\): the fixed effect parameters
    • \(Z\): model matrix that says which group (block) each observation belongs to
    • \(u\): the random effect parameters
    • \(\sigma\): standard deviation of the normally distributed noise (residuals)

Review: general linear mixed models

\[y \sim \text{Normal}(Xb + Zu, \sigma), u \sim \text{Normal}(0, \sigma_u)\]

  • Random effects \(u\) are normally distributed too
  • Some random effects are positive, others are negative, a few are far from zero, most are close to zero
  • Same goes for the model residuals

Non-normal data: why is it a problem?

  • Not all datasets fit neatly into the general linear model box
  • Many response variables can’t be “forced” to have normally distributed residuals
    • For example, what if y is a binary variable with only 0 and 1 values?
    • Or what if it is a highly skewed concentration variable with many values at or near zero, and a few high positive values?
  • In those cases, you cannot get residuals that are normally distributed, and centered at zero, with about half of them positive and half of them negative

Non-normal data: making predictions

  • Predictions made from a general linear mixed model fit to non-normal data often don’t make sense
    • Model fit to binary data may give you predicted probabilities greater than 1 or less than 0
    • Model fit to the skewed data may give you negative predictions
  • Even if you don’t care about prediction but only about inference, this is still a problem
  • If your model doesn’t make good predictions, it doesn’t fit the data well, and you need a good fit to make inference!
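  • A small simulated sketch (made-up data, not from the lesson) of a linear model fit to binary outcomes predicting impossible probabilities:
# Simulate binary outcomes whose probability rises with x, then fit an ordinary linear model
set.seed(1)
x <- runif(100, 0, 10)
y <- rbinom(100, size = 1, prob = plogis(-6 + 1.2 * x))
fit_lm <- lm(y ~ x)
predict(fit_lm, newdata = data.frame(x = c(0, 10)))
# The straight-line fit typically predicts a value below 0 at x = 0 and above 1 at x = 10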

Dealing with non-normal data: transforming the data

  • Old school solution: transform the data!
  • Not ideal because it introduces bias (possible exception: log transformation)
  • We should find a model that fits the data, instead of torturing the data to make it fit the model
  • Better alternative: generalized linear models!

Generalized linear mixed models: an introduction

\[g(y) \sim \text{Distribution}(Xb + Zu | \theta), u \sim \text{Normal}(0, \sigma_u)\]

  • The distribution can be any of a wide variety of distributions, with parameters \(\theta\)
  • The left side now says \(g(y)\) instead of just \(y\): link function
    • Gives us the relationship between the scale the model works on and the scale of the data
    • The link function converts the mean of the response from the data scale to the model's scale; its inverse converts model predictions back to the data scale

Generalized linear model example: Orobanche seed germination

What is a generalized linear mixed model?

  • generalized: it allows for response distributions that aren’t normal
  • linear: Y is modeled as a linear combination of effects (multiply each variable by something, then add them together)
  • mixed: it includes both fixed and random effects
  • This example will be generalized and linear, but not mixed (fixed effects only)

Meet the example data

Image (c) Wikimedia Commons, user Eitan_f
  • crowder.seeds example dataset from agridat package
  • Orobanche aegyptiaca: parasitic plant with no chlorophyll, gets its energy from other plants’ roots
  • It needs a host to survive so the seeds do not germinate until they detect a nearby host by sensing chemicals in soil
  • 2×2 factorial experiment
    • Two Orobanche genotypes ('O73' and 'O75')
    • Treated with two different extracts (bean and cucumber)

Experimental design

  • Many plates, each with a variable number of seeds from one treatment combination
  • Completely randomized design: no blocks, so no random effects
  • Two columns describe each plate:
    • n: total number of seeds on the plate
    • germ: the number of seeds that germinated
  • Compute and plot p (proportion of germinated seeds)
data(crowder.seeds)

crowder.seeds <- mutate(crowder.seeds, p = germ/n)

ggplot(crowder.seeds, aes(x = gen, y = p, color = extract)) +
  geom_boxplot() + colorblind_friendly

Fit a linear model to the proportions

  • This is OK in many cases if proportions are between ~0.05 and ~0.95
  • Assumption of normally distributed error can still work out
  • Because proportions are between 0 and 1, a \(\text{Normal}(0, 1)\) prior on the categorical fixed effect is effectively uninformative (it encompasses any possible effect size)
crowder_lm <- brm(p ~ gen * extract, 
                  prior = c(
                    prior(normal(0, 1), class = b)
                  ),
                  data = crowder.seeds, 
                  seed = 111, file = 'fits/crowder_lm')

Fit a generalized linear model to the binary outcomes

  • Outcome is binary (0 = seed did not germinate, 1 = seed germinated)
  • Response variable is specified with two parts
    • Number of successes (germinated seeds in each dish)
    • Total number of trials (all seeds in each dish including the ones that didn’t germinate)
  • germ | trials(n)

Response family

  • family = <distribution function>(link = '<link function>')
  • What is the response distribution?
  • And what is the link function that determines the relationship between the observed data scale and the model prediction scale?
  • Here we are going to use family = binomial(link = 'logit')
    • Binomial is a statistical distribution for the number of “successes” out of a fixed number of trials, where each trial has two possible outcomes (0 or 1)
    • But what is that “logit” thing?

Logit = log odds

\[\text{logit}~p = \log \frac {p}{1-p}\]

  • Maps values ranging from 0 to 1 to values ranging from \(-\infty\) to \(+\infty\)
  • More or less a straight line when p is between about 0.25 and 0.75; gets steep close to 0 and 1
  • If you have an event that has an extremely low (or high) chance of happening, a little increase (or decrease) to the probability will multiply the odds by a lot

graph of logit function
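  • A quick sketch of the logit and its inverse in R (qlogis() and plogis() are the built-in logit and inverse-logit functions):
# logit maps probabilities in (0, 1) to the whole real line; plogis() maps back
p <- c(0.01, 0.25, 0.5, 0.75, 0.99)
qlogis(p)          # log odds: about -4.6, -1.1, 0, 1.1, 4.6
plogis(qlogis(p))  # the inverse link recovers the original probabilities
curve(qlogis(x), from = 0.001, to = 0.999, xlab = 'p', ylab = 'logit(p)')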

GLM code

  • normal(0, 5) prior is sensible for fixed effects on the log odds scale
  • Again we will use all default model fitting options
crowder_glm <- brm(germ | trials(n) ~ gen * extract, 
                   family = binomial(link = 'logit'),
                   prior = c(
                     prior(normal(0, 5), class = b)
                   ),
                   data = crowder.seeds, 
                   seed = 112, file = 'fits/crowder_glm')

Comparing LM and GLM

What do you notice about the posterior predictive check plots?

  • thick black line: distribution of the observed data
  • each thin blue line: distribution of the predicted values from one random draw from the posterior
pp_check(crowder_lm)
pp_check(crowder_glm)

LM and GLM: Posterior predictive check plots

  • Neither is too bad: the shape of the fitted values is similar to the shape of the observed data
  • Scale of x-axis is different
  • But notice some predictions from LM are <0 and >1
  • Why? Why is this a problem?

LM: Estimating marginal means

  • pairwise ~ extract | gen: within each level of genotype (gen), estimate the predicted mean for each level of extract and take the pairwise contrast between them
  • LM contrasts are taken by subtracting one modeled proportion from the other
emm_crowder_lm <- emmeans(crowder_lm, pairwise ~ extract | gen)

emm_crowder_lm$emmeans
emm_crowder_lm$contrasts

GLM: Estimating marginal means

  • Estimated marginal means on the logit scale
  • Contrasts on the log odds ratio scale
emm_crowder_glm <- emmeans(crowder_glm, pairwise ~ extract | gen)

emm_crowder_glm$emmeans
emm_crowder_glm$contrasts

Estimated marginal means on the response scale

  • The argument type = 'response' transforms the results back to the data scale
  • The logit link function in R is qlogis() and its inverse is plogis()
emmresponse_crowder_glm <- emmeans(crowder_glm, pairwise ~ extract | gen, type = 'response')

emmresponse_crowder_glm$emmeans
emmresponse_crowder_glm$contrasts

Odds ratios

  • Now estimated marginal means and credible intervals are probabilities
  • Contrasts are odds ratios
    • Ratio of the odds of germination between the two levels of extract
    • An odds ratio of 1 indicates no difference between the odds
    • The ratio is bean / cucumber so we have evidence that bean extract is less effective at stimulating germination than cucumber extract
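  • A quick worked example of the odds scale (round numbers for illustration, not the fitted values):
# Convert probabilities to odds and take their ratio
odds <- function(p) p / (1 - p)
odds(0.4)              # a germination probability of 0.4 is odds of about 0.67
odds(0.6)              # a probability of 0.6 is odds of 1.5
odds(0.4) / odds(0.6)  # odds ratio of about 0.44: lower odds in the first group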

Hypothesis testing

  • What if we want to do a test relative to a null hypothesis?
  • Use describe_posterior() from the bayestestR package
  • Describe the posterior distributions of the differences between the means of the extract treatments within each genotype for the general linear model
  • Describe the posterior distributions of the odds ratios of the means for the generalized linear model

describe_posterior() explained

  • We have 1000 sampling iterations from each of 4 chains: total of 4000 posterior draws for the distribution of each contrast
  • By default describe_posterior() gives us the median of the 4000 values as the point estimate
  • Specify equal-tailed credible interval using ci_method = 'eti', default width 95%
    • We get 2.5% and 97.5% quantiles of the 4000 values

Bayesian-style p-values revisited

  • Request two different null hypothesis tests (see Lesson 1)
    • p_map: the ratio of the probability of the null value to the most probable posterior value (point estimate)
    • pd (probability of direction): the percent of the posterior distribution that has the same direction (sign) as the point estimate; the closer to 100%, the more evidence in favor of a non-null difference
    • We have to specify null = 1 for the odds ratio contrasts (ratio of 1 is no difference)
describe_posterior(emm_crowder_lm$contrasts, ci_method = 'eti', test = c('pd', 'p_map'))
describe_posterior(emmresponse_crowder_glm$contrasts, ci_method = 'eti', test = c('pd', 'p_map'), null = 1)

Comparing inference from LM and GLM

  • The generalized linear model provides stronger evidence in favor of differences
  • It is taking into account the greater precision we can get from having many partially independent binary outcomes (seeds) in each dish
  • 200/1000 is more evidence than 2/10 (both are treated the same in the linear model fit to proportions)
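  • A one-line illustration (standard error of an observed proportion, assuming independent trials):
# The standard error of an observed proportion of 0.2 shrinks as the number of seeds grows
sqrt(0.2 * 0.8 / 10)    # about 0.13 with 10 seeds
sqrt(0.2 * 0.8 / 1000)  # about 0.013 with 1000 seeds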

Assessing evidence for interaction between treatments

  • Is the difference between the response to bean and cucumber extract different between the two genotypes?
  • Call the contrast() function and specify two levels of interaction contrast
  • First contrast the extracts within each genotype, then contrast those two contrasts with each other
    • 'revpairwise': reverse pairwise
emmeans(crowder_glm, ~ extract + gen, type = 'response') %>%
  contrast(interaction = c('revpairwise', 'pairwise')) %>%
  describe_posterior(ci_method = 'eti', test = c('pd', 'p_map'))

Presentation ideas: How to report this analysis in a manuscript?

  • Description of the statistical model for the methods section
  • Verbal description of the results
  • Figure
  • Table

Methods section

We fit a generalized linear model that modeled the germination probability for each dish as a function of extract type, genotype, and their interaction. The model assumed a binomial response distribution and logit link function. The model was fit with Hamiltonian Monte Carlo. Four chains were run, each with 1000 warmup iterations and 1000 sampling iterations. We verified model convergence by ensuring the R-hat statistics for all parameters were below 1.01. We assessed model fit by examining the posterior predictive check plot. We obtained posterior estimates of the marginal mean for each combination of extract and genotype, and of the odds ratio between each extract type within each genotype. We calculated the median and 95% equal-tailed credible interval for each estimated mean and odds ratio. To assess evidence for differences between extracts, we calculated the maximum a posteriori p-value (pMAP) and probability of direction (pd).

Results section

The modeled probability of germination was greater when seeds were treated with cucumber extract versus bean extract for both the O73 genotype (bean extract: posterior median 0.396, 95% CI [0.316, 0.478]; cucumber extract: 0.531 [0.458, 0.614]) and the O75 genotype (bean extract: 0.364 [0.305, 0.421]; cucumber extract: 0.681 [0.629, 0.734]). The odds ratio between cucumber and bean extract was smaller for the O73 genotype than the O75 genotype (ratio 0.46 [0.25, 0.83], pMAP < 0.001), indicating a greater relative effect of cucumber extract on germination probability in the O75 genotype.

Manuscript-quality figure

  • Show the median and several different credible intervals of the distributions of posterior means for each treatment combination
  • Use gather_emmeans_draws() from the tidybayes package to pull out the individual posterior draws from the emmeans object
  • Use stat_interval() to simultaneously compute our desired credible intervals and plot them
  • We need to transform the means from the logit scale to the probability scale with plogis()
    • gather_emmeans_draws() does not preserve the transformation information
gather_emmeans_draws(emmresponse_crowder_glm$emmeans) %>%
  mutate(p = plogis(.value)) %>%
  ggplot(aes(x = gen, y = p, group = interaction(gen, extract))) +
    stat_interval(aes(color = extract, color_ramp = after_stat(level)), .width = c(0.5, 0.8, 0.95), position = position_dodge(width = .3)) +
    stat_summary(fun = 'median', geom = 'point', size = 2, position = position_dodge(width = .3)) +
    colorblind_friendly +
    labs(x = 'Genotype', y = 'Modeled germination probability', color_ramp = 'CI width', color = 'Extract')

Manuscript-quality table

  • Present the medians (95% CI) of the means and the odds ratios for the two extracts within each genotype
  • Some rearranging needed to get a compact wide format
  • median_qi() from tidybayes summarizes posterior draws with a median point estimate and a quantile credible interval of a given width
  • Again we need to transform the means with plogis() to go from logit scale to probability scale
  • Transform the contrasts with exp() to go from log odds ratio scale to odds ratio scale
means_summary <- gather_emmeans_draws(emmresponse_crowder_glm$emmeans) %>%
  mutate(.value = plogis(.value)) %>%
  median_qi(.width = 0.95) %>%
  pivot_wider(id_cols = gen, names_from = extract, values_from = c(.value, .lower, .upper))

contrasts_summary <- gather_emmeans_draws(emmresponse_crowder_glm$contrasts) %>%
  mutate(.value = exp(.value)) %>%
  median_qi(.width = 0.95)

left_join(means_summary, contrasts_summary) %>%
  select(-c(contrast, .width, .point, .interval)) %>%
  gt(caption = 'Germination probability') %>%
  fmt_number(decimals = 3) %>%
  cols_merge(c(.value_bean, .lower_bean, .upper_bean), pattern = '{1} [{2}, {3}]') %>%
  cols_merge(c(.value_cucumber, .lower_cucumber, .upper_cucumber), pattern = '{1} [{2}, {3}]') %>%
  cols_merge(c(.value, .lower, .upper), pattern = '{1} [{2}, {3}]') %>%
  cols_label(gen = 'Genotype', .value_bean = 'Bean extract', .value_cucumber = 'Cucumber extract', .value = 'Bean:cucumber odds ratio')

Generalized linear mixed model example: turnip yield

Generalized linear mixed models

  • GLMM: non-normal data and a mixture of fixed and random effects
  • Bayesian GLMMs really outperform their frequentist counterparts
  • Model syntax requires random terms on the right-hand side of the formula, plus a response family and link function

Meet the example data

  • mcconway.turnip from agridat
  • Continuous outcome (yield)
  • Asymmetrical distribution; yield must be > 0
  • Positive, right-skewed distributions are super common in ag science
data(mcconway.turnip)
ggplot(mcconway.turnip, aes(x = yield)) + geom_histogram()

Image (c) thebittenword.com

Turnip experimental design

  • 2×2×4 factorial design experiment, with three factors
    • genotype (gen, either "Barkant" or "Marco")
    • planting date (date, either "21Aug1990" or "28Aug1990")
    • planting density (density, 1, 2, 4, or 8 kg/ha)
  • Randomized complete block design with four blocks
  • Each block has 16 plots, one for each treatment combination
  • A single measurement of yield was taken from each plot
  • Treat density as a discrete variable with four levels
  • This is easier because we don’t need to worry about whether the effect of planting density on yield is linear
mcconway.turnip <- mutate(mcconway.turnip, density = factor(density))
  • Boxplot of the yield values in each treatment combination
  • Split the genotypes into separate panels using facet_wrap()
  • Arrange the boxplots in increasing order of density on the x-axis
  • Show the planting dates for each density with a different color box
ggplot(mcconway.turnip, aes(x = density, y = yield, group = interaction(density, date), color = date)) +
  facet_wrap(~ gen) +
  geom_boxplot() +
  colorblind_friendly +
  labs(x = 'Planting density (kg/ha)', y = 'Yield', color = 'Planting date')

Yield distribution

  • Interaction between planting date and density
  • Possible interaction with genotype
  • Relationship between mean and variance
  • Linear model assumes one single standard deviation for all the residuals, so unequal variance can cause bad predictions

Fit a linear mixed model

  • Pretend we don’t care about the unequal variance
  • Include all main effects, two-way interactions, and three-way interactions: gen * date * density
  • + (1 | block): random intercept for each block
  • Yield varies between 0 and 25 so we will use normal(0, 5) as a prior on fixed effects
  • All default MC sampling options
mcconway_lmm <- brm(yield ~ gen * date * density + (1 | block), 
                    prior = c(
                      prior(normal(0, 5), class = b)
                    ),
                    data = mcconway.turnip, 
                    seed = 113, file = 'fits/mcconway_lmm')

Side note: divergent transitions

  • You may see a warning about divergent transitions
  • In this case they are few enough that we can ignore them
  • They often occur when fitting complex models to small datasets
  • May indicate numerical problems with the sampler; it might be “missing” parts of the posterior distribution
  • Fix this by setting narrower priors or tweaking the sampler arguments (see the sketch below)
  • Making the sampler take smaller steps with each iteration reduces the chance of divergent transitions but is slower
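  • For example, raising adapt_delta above its default of 0.8 makes the sampler take smaller steps (a sketch refitting the same model; the file name is new, for illustration):
# Refit with a higher adapt_delta to reduce the chance of divergent transitions (slower sampling)
mcconway_lmm_ad <- brm(yield ~ gen * date * density + (1 | block), 
                       prior = c(
                         prior(normal(0, 5), class = b)
                       ),
                       data = mcconway.turnip, 
                       control = list(adapt_delta = 0.95),
                       seed = 113, file = 'fits/mcconway_lmm_ad')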

Posterior predictive check plot on linear model

pp_check(mcconway_lmm)
  • What is our model getting wrong?
  • Also look at estimated marginal means by density
emmeans(mcconway_lmm, ~ density)

Fit a linear mixed model to the log-transformed data

  • Old-school way: log-transform the response variable then fit the model
    • Makes positive right-skewed data more symmetrical
    • Ensures that all model predictions are positive: \(e^x > 0\)
    • Stabilizes variance: flattens out relationship between mean and variance
  • Same syntax, just put log() around the response variable
  • normal(0, 2) prior because yield only varies by 2 to 3 units on the log scale
mcconway_loglmm <- brm(log(yield) ~ gen * date * density + (1 | block), 
                       prior = c(
                         prior(normal(0, 2), class = b)
                       ),
                       data = mcconway.turnip, 
                       seed = 114, file = 'fits/mcconway_loglmm')

Posterior predictive check plot: LMM with log-transformation

pp_check(mcconway_loglmm)

Gamma distribution

  • The log transformation fixed the problems with skew and unequal variance … right?
  • Yes, but transformation and back-transformation introduces bias
  • Better to keep data on original scale and pick a response distribution that actually fits the data
  • Gamma is often a good go-to option for positive right-skewed data
  • Its variance increases with its mean (for a fixed shape parameter, the variance is proportional to the square of the mean), so it can deal with the unequal variance issue
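  • A small sketch of that mean-variance relationship (fixing the shape parameter at 2 for illustration):
# For a gamma distribution with shape a and mean mu, the variance is mu^2 / a,
# so groups with larger means also have larger variances (constant coefficient of variation)
shape <- 2
mu <- c(2, 5, 15)
data.frame(mean = mu, variance = mu^2 / shape, sd = mu / sqrt(shape))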

Fit a GLMM with gamma response distribution

Fit a GLMM with gamma response distribution and log link function using family = Gamma(link = 'log').

mcconway_glmm <- brm(yield ~ gen * date * density + (1 | block), 
                     family = Gamma(link = 'log'), 
                     prior = c(
                       prior(normal(0, 2), class = b)
                     ),
                     data = mcconway.turnip, 
                     seed = 115, file = 'fits/mcconway_glmm')

Posterior predictive check: Gamma GLMM

pp_check(mcconway_glmm)

Estimating marginal means: Gamma GLMM

  • Marginal means for each combination of fixed effects
  • The means and CIs do not include random effect of block
  • Expected value for a block that’s “smack-dab in the middle”
  • type = 'response' gives predictions on data scale, not log link scale
emm_mcconway <- emmeans(mcconway_glmm, ~ density + date + gen, type = 'response')

emm_mcconway

Contrasts: Gamma GLMM

  • For example, pairwise contrasts between density levels within each combination of planting date and genotype
  • Contrasts are ratios
  • 'revpairwise' so that the higher density is in the numerator
contrast(emm_mcconway, 'revpairwise', by = c('date', 'gen'))

Assessing evidence for interactions between treatments

  • Effect sizes look bigger for Marco than Barkant
  • Interaction contrasts (contrasts of contrasts) to test this
contrast(emm_mcconway, 'revpairwise', by = 'date', interaction = 'revpairwise')

Presentation ideas: How to report this analysis in a manuscript?

  • Description of the statistical model for the methods section
  • Text write-up for the results section
  • A figure

Methods section

We fit a generalized linear mixed model that modeled turnip yield as a function of genotype, planting date, planting density (discrete variable with four levels), and their interactions. The model had a gamma response distribution with log link function. A random intercept was fit to each block. The model was fit with Hamiltonian Monte Carlo. Four chains were run, each with 1000 warmup iterations and 1000 sampling iterations. We verified model convergence by ensuring the R-hat statistics for all parameters were below 1.01. We assessed model fit by examining the posterior predictive check plot. We obtained posterior estimates of the marginal mean for each combination of genotype, planting date, and planting density. We also estimated the yield ratio between all pairs of density treatments within each combination of planting date and genotype. We calculated the median and 95% equal-tailed credible interval for each estimated mean and ratio.

Results section

We found evidence that turnip yield was highest at the highest levels of planting density. The effect of density was highest for the later planting date. There was not strong evidence for a different effect of planting density between the genotypes; it was positive for both. Mean yield for density level 8 kg/ha at the later planting date was 14.66 (95% CI [4.28, 31.27]) for the Barkant genotype, versus 3.82 [1.14, 8.05] at 1 kg/ha; the ratio of yields between highest and lowest density levels was 3.84 [1.32, 7.60]. For the Marco genotype, the mean yield at later planting date was 9.80 [3.32, 21.01] for 8 kg/ha density and 1.31 [0.40, 2.82] for 1 kg/ha density; the yield ratio between the two density levels was 7.45 [2.63, 14.81].

Manuscript-quality figure

  • exp() is used to back-transform the posterior estimates
gather_emmeans_draws(emm_mcconway) %>%
  mutate(.value = exp(.value)) %>%
  ggplot(aes(x = density, y = .value, group = interaction(density, date))) +
    stat_interval(aes(color = date, color_ramp = after_stat(level)), position = position_dodge(width = .5)) +
    stat_summary(fun = 'median', geom = 'point', size = 2, position = position_dodge(width = .5)) +
    facet_wrap(~ gen) +
    colorblind_friendly +
    labs(x = 'Planting density (kg/ha)', y = 'Modeled yield', color_ramp = 'CI width', color = 'Planting date')

Conclusion

We’ve made it to the end of Lesson 2! Let’s revisit the learning objectives!

We’ve learned . . .

  • How to explore different priors and how they may influence (or not influence) your conclusions
  • How to fit a Bayesian generalized linear model (GLM) for non-normal response data
  • How to fit a Bayesian generalized linear mixed model (GLMM) for non-normal response data with random effects
  • Techniques from packages brms, emmeans, tidybayes, and bayestestR

Give yourself a well-deserved round of applause!

  • See text version of lesson for further reading and useful resources
  • Please provide feedback to help me improve this course!
    • Send feedback to quentin.read@usda.gov