Introduction and setup

This lesson picks up where Lesson 1 left off, continuing along with a practical introduction to fitting Bayesian multilevel models in R and Stan using the brms package (Bayesian Regression Models using Stan).

Download the worksheet for this lesson here.

What did we learn in Lesson 1?

We started out learning the basic ideas behind Bayes.

What is Bayesian inference
Prior, likelihood, and posterior
Credible intervals

We learned how to harness the power of the Hamiltonian Monte Carlo method to fit Bayesian models, using the Stan language via the R package brms. Specifically, we learned how to:

Fit a multilevel (mixed) model with random intercepts and random slopes
Specify different priors
Diagnose and deal with convergence problems
Plot model parameters and predictions
Compare models with information criteria
Do hypothesis testing with Bayesian analogues of p-values

Of course, we only scratched the surface of those complex topics. But if you don’t remember any of that, now would be a good time to review Lesson 1.

What will we learn in this lesson?

After teaching Lesson 1, I got feedback from the participants. People asked me to teach them how to fit mixed models that correspond to common study designs in agricultural science. I also was asked to take a little deeper dive into prior distributions and show how specifying different priors can impact the conclusions we come to. I designed Lessons 2 and 3 based on that feedback. In Lesson 2, you are going to learn how to:

Specify strongly and weakly informative priors and see how they may influence your results
Fit a Bayesian generalized linear model (GLM), for situations where you can’t assume normally distributed error in your model
Fit a Bayesian generalized linear mixed model (GLMM), to accommodate non-normal data and a mixture of fixed and random effects

You will learn some more practical skills for fitting Bayesian models and talking about the results in papers or presentations. We’ve seen these packages already in Lesson 1 but we will keep practicing them. In this lesson we will:

Use the emmeans package to make predictions from Bayesian models
Use the bayestestR package for Bayesian hypothesis testing
Use the tidybayes, ggdist, and ggplot2 packages to make nice plots of Bayesian model predictions
Use the gt package to make nice tables of model predictions

Load the packages

Load all the packages and set a plotting theme. Also define a color scale that is colorblind-friendly. Create a directory called fits in your current working directory to store fitted model objects, if it does not already exist.

library(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)

theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))

if (!dir.exists('fits')) dir.create('fits')

Set brms options. The option mc.cores = 4 means we will use four processor cores in parallel to fit the models, meaning we can allow four Markov chains to sample simultaneously. The options brms.file_refit = 'on_change' means that if a model has already been fit and saved to a file, and the code and data are unchanged, the model will be reloaded from the file instead of having to fit it again. If you are running this lesson on SCINet or the cloud server, there are some pre-fit model objects to save time during the workshop.

options(mc.cores = 4, brms.file_refit = 'on_change')

If you are running this code on SCINet or the cloud server, set these options instead of the ones above. This will use the faster and less memory-intensive cmdstanr package to fit the models behind the scenes. Unfortunately, the CmdStan software that cmdstanr uses is difficult to install on USDA laptops, but it can be done. Check out instructions for installing CmdStan here, and if those don’t work, instructions for a workaround to install CmdStan using Anaconda.

options(mc.cores = 4, brms.file_refit = 'on_change', brms.backend = 'cmdstanr')

Review: influence of priors

Before we dive into more complex models, let’s review Bayesian statistics a little bit. Don’t feel bad if it takes you a few times to really grasp the concepts. Most people need a lot of repetition before these ideas become second nature.

Back to Bayes-ics

Let’s start by going over Bayes’ theorem again.

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

This means the probability of A being true given that B is true (\(P(A|B)\)) is equal to the probability that B is true given that A is true (\(P(B|A)\)) times the ratio of probabilities that A and B are true (\(\frac{P(A)}{P(B)}\)). Using Bayes’ theorem in the context of fitting a statistical model to data, we can state this as:

\[P(model|data) = \frac {P(data|model)P(model)}{P(data)}\]

\[posterior \propto likelihood \times prior\]

What does this mean? It means that we have some prior knowledge about the world. We formally state that knowledge by giving a statistical distribution of what we think the parameters of the statistical model describing the world are without having collected any data yet. This is called the prior, which in Bayes’ theorem is represented by \(P(model)\) because it’s the probability of the model not dependent on the data. Then, we go out and collect some data. We use the data to calculate \(P(data|model)\) which we call the likelihood. That’s a function that just depends on the data. It represents how likely it was to observe the data we did, given all possible combinations of parameters we could have in our model. We have to divide the likelihood by the marginal probability or \(P(data)\) which is basically just some math to make the likelihood function have an area under the curve of 1. (In a probability distribution, adding up the probabilities of all possible mutually exclusive outcomes together has to result in a sum of 1, or 100% probability.) Then, when we multiply the prior and likelihood together, we get \(P(model|data)\) which we call the posterior. That is our new knowledge of the world, after collecting the data, represented by a new statistical distribution for our parameter estimates. Because it is the product of the prior and the likelihood, it represents a combination of what we knew before we got the data (the prior), updated by the new information we got from the data (the likelihood).

Priors are not a bad thing

Likelihood is the basis of both the classical statistical models that you are familiar with, and of Bayesian models. What makes Bayesian models unique is the use of prior information. But that’s also what scares a lot of people. They are worried that using Bayesian models is like putting your thumb on the scale. The priors are subjective assumptions, they say. You can get whatever results you want because you can use any priors that you want, they say. It’s understandable to feel that way. Here is how I respond to those arguments:

Prior distributions are not the only assumption made when fitting a statistical model. The likelihood part of the model also involves assumptions. For example, a general linear model, Bayesian or not, assumes that the relationship between x and y is linear, and the deviations from that relationship have a normal distribution, which is at best a rough approximation of what is going on in the real world. So, if you refuse to make any assumptions, you won’t be able to fit any statistical model at all, because they all involve subjective assumptions.
If you have a large dataset, the prior distribution has little influence on the results. In those cases, the practical purpose of the prior distribution is to make the computations needed to fit the statistical model a little bit easier.
You should always check how sensitive the results of your analysis are to the assumptions you make. This is especially true for prior distributions. If the results don’t change as a result of changing the prior, then say so. If the results do change as a result of changing the prior, then present that and allow the reader to decide. The worst thing would be to know that your results are sensitive to choice of prior, but present results from a single model with a single prior distribution as if they are the only “true” results.

How the posterior is influenced by the prior and the data

Here is an illustration of how having more data increases the influence of the likelihood relative to the prior. We want to know the probability of a new vaccine successfully granting immunity to a disease.

Image (c) Washington State University

Let’s say the baseline immunity rate is 50%. We will pessimistically assume that the vaccine is ineffective and so we will assume a prior distribution with the most probable outcome being 50%, but allowing for outcomes both above and below that. Here are three scenarios:

We collect data on 10 vaccinated animals that are challenged with a virus, and find that 7 are immune and 3 are not.
We collect data on 50 vaccinated and challenged animals, and find that 35 are immune and 15 are not.
We collect data on 100 vaccinated and challenged animals, and find that 70 are immune and 30 are not.

In all three of these scenarios, the observed immunity rate is 70%. Our prior distribution is the same in each case. Here’s what the posterior distributions look like:

plots showing influence of likelihood on posterior

As the amount of data goes up, the evidence that the true immunity rate of vaccinated animals is 70% becomes stronger. Thus, the influence of our prior distribution centered at 50% keeps getting weaker and weaker, and the influence of the likelihood (informed by the data) keeps getting stronger. The more data you have, the more evidence you have, and the more the likelihood pulls the posterior towards itself and away from the prior. This feels intuitive, at least to me.

Hands-on example of prior distribution influence

I just said above that if you have lots of data, you don’t have to worry that much about the influence of the prior. But what if you have a very small dataset? Let’s look at an extreme case where we only have a very small amount of data on a controversial topic where people may have strong opinions about what the true effect “should” be.

Meet the example data

In the agridat package, there is an example dataset called gathmann.bt. It includes abundance metrics of non-target arthropod species on a line of transgenic Bt maize and on a control line. You can type ?gathmann.bt in your console to pull up documentation for the dataset, including a reference to the original source. When Bt corn was introduced, some people were concerned that it would have bigger effects on non-target species than conventional insecticides. But others argued that it actually has smaller impacts.

Image (c) Science 2.0

Well, let’s get some data and resolve the debate! Unfortunately this dataset only has a very small sample size. We have \(n=8\) measurements of Thysanoptera abundance on Bt maize, and \(n=8\) measurements on the control line. Looking at the distribution of the raw data, it looks like the Bt line tends to have higher abundance.

data(gathmann.bt)

ggplot(gathmann.bt, aes(x = gen, y = thysan)) + geom_boxplot()

boxplot of Bt corn raw data

Edit the dataset so that the control line, ISO, is treated as the reference level in the model. Now if we get a positive coefficient it will mean that Bt has higher Thysanoptera abundance than control.

gathmann.bt <- mutate(gathmann.bt, gen = relevel(gen, ref = 'ISO'))

Frequentist analysis

We want to compare the means of the two groups to each other. Let’s start by doing a frequentist analysis: a t-test.

t.test(formula = thysan ~ gen, data = gathmann.bt)

## 
##  Welch Two Sample t-test
## 
## data:  thysan by gen
## t = -3.2182, df = 13.834, p-value = 0.006274
## alternative hypothesis: true difference in means between group ISO and group Bt is not equal to 0
## 95 percent confidence interval:
##  -9.25298 -1.84702
## sample estimates:
## mean in group ISO  mean in group Bt 
##             8.725            14.275

We get a p-value of 0.006 for a two-sided t-test, which we would usually say is enough evidence to reject the null hypothesis that the means are equal. The mean in the Bt group is higher, giving us significant evidence that Bt maize has higher abundance of at least one species of non-target arthropod than control maize. As always, the frequentist analysis only considers the likelihood, which is calculated from the data. It does not formally incorporate any prior knowledge we may have.

Bayesian analyses with different priors

How would we do this with a Bayesian analysis? Let’s fit the model with a few different priors. I don’t really know what the units of the data are here, but they range from about 4 to 18. So we can say that anything that changes mean Thysanoptera abundance by +5 or -5 is a big effect.

I fit this model with no less than seven different priors. The default “flat” prior is completely uninformative. It assigns equal prior probability to any difference, including 0, +1, -1000, +1000000, you name it. This is not a realistic prior for most situations because we usually have at least some idea of the order of magnitude of the effect size, even if we have no idea if it will be positive or negative. Here are the other six prior distributions and what belief they might represent:

prior distribution	belief
normal(0,5)	We don't know if Bt has a positive or negative effect and we think there's a moderate probability of a large effect in either direction
normal(0,1)	We don't know if Bt has a positive or negative effect and we think large effects in either direction are very improbable
normal(5,5)	We think Bt has a positive effect but we're very uncertain
normal(5,1)	We think Bt has a positive effect and we're very certain of that
normal(-5,5)	We think Bt has a negative effect but we're very uncertain
normal(-5,1)	We think Bt has a negative effect and we're very certain of that

Here’s the code to fit the models.

Note: In this code we do not specify number of Markov chains, warmup iterations per chain, or total iterations including warmup and sampling. This means it will default to 4 chains, 1000 warmup iterations, and 2000 total iterations (1000 post-warmup sampling iterations) per chain. In practice for more complex models you may need to specify higher numbers of warmup iterations to aid convergence, and higher numbers of sampling iterations to accurately describe the shape of the posterior, especially its tails.

# Flat prior
m1gathmann <- brm(thysan ~ gen, data = gathmann.bt, 
                  seed = 838, file = 'fits/m1gathmann')
# Prior saying that extreme differences between genotypes are unlikely
m2gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(0, 5), class = b),
                  seed = 838, file = 'fits/m2gathmann')
# Prior with even stricter regularization
m3gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(0, 1), class = b),
                  seed = 838, file = 'fits/m3gathmann')
# Prior if we think that Bt is more likely to have a positive impact on non-target species but we are uncertain
m4gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(5, 5), class = b),
                  seed = 838, file = 'fits/m4gathmann')
# Prior if we think that Bt is more likely to have a positive impact on non-target species and we have strong prior evidence for that
m5gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(5, 1), class = b),
                  seed = 838, file = 'fits/m5gathmann')
# Prior if we think that Bt is more likely to have a negative impact on non-target species but we are uncertain
m6gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(-5, 5), class = b),
                  seed = 838, file = 'fits/m6gathmann')
# Prior if we think that Bt is more likely to have a negative impact on non-target species and we have strong prior evidence for that
m7gathmann <- brm(thysan ~ gen, data = gathmann.bt,
                  prior = prior(normal(-5, 1), class = b),
                  seed = 838, file = 'fits/m7gathmann')

Comparison of posterior estimates

We can pull out the posterior estimates for the difference between the means, and plot the median of the posterior with 95% credible interval.

get_post <- function(fit) {
  fixedeff <- summary(fit)$fixed
  setNames(fixedeff['genBt', c('Estimate','l-95% CI', 'u-95% CI')], c('median', 'lower95', 'upper95'))
}

gathmann_post_estimates <- cbind(
  prior = c('flat', 'normal(0,5)', 'normal(0,1)', 'normal(5,5)', 'normal(5,1)', 'normal(-5,5)', 'normal(-5,1)'),
  map_dfr(list(m1gathmann, m2gathmann, m3gathmann, m4gathmann, m5gathmann, m6gathmann, m7gathmann), get_post)
) %>%
  mutate(prior = factor(prior, levels = unique(prior)))

ggplot(gathmann_post_estimates, aes(x = prior, y = median, ymin = lower95, ymax = upper95)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = 'dashed', linewidth = 1) +
  labs(y = 'effect of Bt on Thysanoptera abundance')

plot showing posterior estimates from Bt corn models

These are interesting results. We see that assuming a normal distribution on the prior with a standard deviation of 5 leads to very similar inference as a completely flat prior. This is true whether we set the mean of the normal to 0, +5, or -5. The informative prior \(\text{Normal}(5, 1)\) is similar but with a narrower uncertainty interval. However, if we set an informative prior with mean 0, \(\text{Normal}(0, 1)\), we are effectively saying that we believe the probability of Bt corn having either a strong positive or a strong negative effect is low a priori. This is reflected in our posterior. The point estimate is much closer to 0, and the 95% credible interval covers a lot of territory on either side of 0. This is even more of an issue with \(\text{Normal}(-5, 1)\) where the point estimate is negative and the 95% credible interval is completely negative.

So that we can roughly compare our results to the frequentist analysis, let’s calculate a Bayesian analogue of a p-value for each of these models. This is the “maximum a posteriori” or MAP p-value that we discussed in Lesson 1.

gathmann_pvalues <- data.frame(
  prior = c('flat', 'normal(0,5)', 'normal(0,1)', 'normal(5,5)', 'normal(5,1)', 'normal(-5,5)', 'normal(-5,1)'),
  pMAP = c(p_map(m1gathmann)$p_MAP[2], p_map(m2gathmann)$p_MAP[2], p_map(m3gathmann)$p_MAP[2], p_map(m4gathmann)$p_MAP[2], p_map(m5gathmann)$p_MAP[2], p_map(m6gathmann)$p_MAP[2], p_map(m7gathmann)$p_MAP[2])
)

prior	p_MAP
flat	0.027
normal(0,5)	0.029
normal(0,1)	0.565
normal(5,5)	0.004
normal(5,1)	< 0.001
normal(-5,5)	0.095
normal(-5,1)	< 0.001

To me, the conclusion here is that the prior distribution can influence our inference if it is a very narrow distribution, especially in this fairly extreme case where we are trying to draw big conclusions from a tiny dataset. Many people would say that \(\text{Normal}(0, 1)\) is an uninformative prior because it is agnostic about whether Bt will have a positive or negative effect. It is saying there’s a fifty-fifty chance that Bt will have a positive impact. However, because it is such a narrow distribution, it is not giving very much probability to either extreme negative or extreme positive effects of Bt. So in fact, it is actually fairly informative! Now, this isn’t to say that the inference we get from the model with the narrow prior is wrong. Maybe \(n=8\) is not enough to make much of a conclusion, and a skeptical prior with low probability of extreme effects is actually a good thing. Just because a Bayesian result does not match a frequentist result does not mean it is wrong. But to revisit another point made above, we should be wary of presenting one of the above results as if it is the only correct one. The ideal thing would be to document the influence of the prior and let the reader decide.

Generalized linear mixed models

I hope the exercise on priors has gotten you back into the swing of things, thinking like a Bayesian. So far, in all of Lesson 1 and for the beginning of Lesson 2, we have only fit fairly simple models. For those situations, Bayesian and frequentist methods work equally well. Now we are going to start to explore some models where Bayesian methods really shine in comparison to frequentist methods.

Review: general linear mixed models

So far, we’ve covered general linear mixed models. Whether we’re in Bayesian-land or not, those models have the following basic form.

\[y \sim \text{Normal}(Xb + Zu, \sigma), u \sim \text{Normal}(0, \sigma_u)\]

Hey, stop talking like a statistician! What does that actually mean? Basically, the response (y) is a function of fixed effects (X is the predictors and b are the fixed effect parameters) + random effects (Z is a model matrix and u are the random effect parameters) with normally distributed noise around it with standard deviation \(\sigma\). We also model the random effects as being normally distributed, with a mean of zero and a standard deviation \(\sigma_u\). In other words some of the random effects are positive, and some are negative, a few are far from zero but most are close to zero, and the same goes for the residuals. This model is an amazingly powerful tool and we should all be thankful to the statisticians that developed it.

Non-normal data: what to do about it?

But not all datasets fit neatly into this little box. A lot of y variables that we might meet out in the wild can’t be “forced” to have normally distributed residuals, no matter what parameters you estimate for the fixed and random effects. For example, what if y is a binary variable with only 0 and 1 values? Or what if it is a highly skewed concentration variable with many values at or near zero, and a few high positive values? In that case, you are not going to be able to get residuals that are normally distributed, and centered at zero, with about half of them positive and half of them negative.

For example, if you have positive skewed concentration data, the residuals are going to be positive skewed as well. The high positive values in the “tail” of the data are going to have high positive residuals too. Or for the binary data, the residuals are going to tend to have two peaks, clustering around 0 and 1. That doesn’t seem very normal to me.

Another issue is that the predictions made from a general linear mixed model fit to data that don’t meet the normally distributed residuals assumption often don’t make sense. For example if you predict from the model fit to the binary data, you will get predicted probabilities greater than 1 or less than 0. You will get negative values if you predict from the model fit to the skewed data. If your goal is prediction, and not just inference, this is a problem. Even if you only care about inference, it’s still a bad sign if your model makes bad predictions … because it probably doesn’t fit the data well … which you need to make good inferences.

By the way you may notice I am only considering the assumption of normally distributed residuals, not questioning the assumption of normally distributed random effects. That’s because it’s a lot more challenging to relax the assumption of normally distributed random effects and it is rarely done in practice.

Transforming the data before model fitting

The “old school” way of dealing with this issue was to transform the data. Sometimes people would try dozens of transformations, like alchemists trying all sorts of mixtures to see if they could make lead into gold. This is still something that people do routinely. For the most part, I avoid doing this. The one data transformation that I still commonly use is the log transformation, because many processes and phenomena in agricultural science make sense to compare on a ratio scale, and the difference between the logs of two variables is the same as the ratio of those variables on their original scale. So we don’t have to throw log transformations out just yet. But transformations like arcsine-square-root definitely belong in the past! Why is that? Because of something called generalized linear models!

Generalized linear mixed models: an introduction

Generalized linear mixed models look like this:

\[g(y) \sim \text{Distribution}(Xb + Zu | \theta), u \sim \text{Normal}(0, \sigma_u)\]

Looks just like the general linear mixed model above, with two differences. First is that the distribution doesn’t have to be normal. It can be any of a wide variety of distributions which may have different parameters (represented by \(\theta\)). The second is that instead of modeling y as a function of fixed effects, random effects, and residuals, we are now modeling g(y). What’s g()? It is a link function. That’s a function that gives us the relationship between the distribution function and the predicted value. Basically, it converts variables from the statistical distribution back into the scale of the data.

I won’t go that much more into detail here because this is more of an applied lesson to show you how to fit the models. But I would strongly encourage you to read about the theory behind GLMMs. There are quite a few non-technical and accessible texts on the subject. See my statistical learning resources page.

Generalized linear model example: Orobanche seed germination

A generalized linear mixed model is just what it sounds like: a model that’s generalized (allows for response distributions that aren’t normal), linear (Y is modeled as a bunch of effects added together) and mixed (it includes both fixed and random effects). Let’s build it up little by little. First we’ll fit a model with two of the three: a generalized linear model. It has a non-normal distribution but for now we will only deal with fixed effects and not random effects.

Meet the example data

In this lesson, we are going to use example data from the R package agridat, just as we did in Lesson 1.

Image (c) Wikimedia Commons, user Eitan_f

It is a very simple dataset published by Crowder in 1978. There’s a parasitic plant called Orobanche aegyptiaca that can’t photosynthesize on its own. It leeches energy from the roots of other plants. Because it can’t survive without a host plant, it needs to wait until a suitable host is nearby to germinate. The seed can detect compounds given off by the host plant in the soil, which stimulate it to germinate.
You can type ?crowder.seeds in the console to pull up the documentation for the dataset.

The dataset comes from a 2×2 factorial experiment. There were two Orobanche genotypes (called 'O73' and 'O75' in the dataset) that were each treated with two different extracts: bean extract and cucumber extract. The experimenter set up a lot of plates; each plate had a variable number of seeds from one of the treatment combinations. The plates were laid out in a completely randomized design, so there is no blocking factor to account for. So no random effects are needed (at least for now).

For each plate we have a value for n, the total number of seeds on the plate, and germ, the number of seeds that germinated. Let’s compute a column for the proportion of seeds that germinated and call it p. Then let’s plot the distribution of p across the genotype by extract combinations.

data(crowder.seeds)

crowder.seeds <- mutate(crowder.seeds, p = germ/n)

ggplot(crowder.seeds, aes(x = gen, y = p, color = extract)) +
  geom_boxplot() + colorblind_friendly

boxplot of Orobanche germination raw data

It seems like cucumber extract promotes germination more than bean extract. It looks like there is an interaction because the difference between bean and cucumber is greater for one genotype than the other.

Fit a linear model to the proportions

What many people do in this case is fit a linear model directly to the proportion of germinated seeds (I see ARS scientists doing this all the time). This is often okay because if you have proportions anywhere between 0.05 and 0.95, the assumption of normally distributed error works out pretty well. Let’s try to do that here. We will put a sensible normal prior on the fixed effects. Because the response variable varies between 0 and 1, a normal prior on a categorical predictor’s coefficient with standard deviation 1 will encompass any possible effect size. Again we’re using all defaults for Monte Carlo sampling options.

crowder_lm <- brm(p ~ gen * extract, 
                  prior = c(
                    prior(normal(0, 1), class = b)
                  ),
                  data = crowder.seeds, 
                  seed = 111, file = 'fits/crowder_lm')

We’ll look at this model’s predictions in a moment.

Fit a generalized linear model to the binary outcomes

Let’s also take a look at the generalized linear model approach. Here, our outcome is binary (a seed either germinates or it doesn’t). So, instead of the response variable being the proportion of germination, it needs to be specified with two parts: the number of successes (germinated seeds in each dish) and the total number of trials (all seeds in each dish including the ones that didn’t germinate). In brms we write this germ | trials(n).

The other new thing about the generalized linear model is the family argument, which takes the form family = <distribution function>(link = '<link function>'). This tells brm() what the response distribution is, and what the link function will be that determines the relationship between the observed data scale and the model prediction scale. Here we are going to use family = binomial(link = 'logit'). Binomial is a statistical distribution that has two possible values, 0 or 1. But what is that “logit” thing?

The logit link function

Logit is the link function. Why do we need it? Well, the linear predictor in a linear model is the sum of the fixed effects (and random effects if present). Because it’s created by adding and multiplying positive and negative numbers together, it can take any value, positive or negative. So far, that has not been a problem for us because we’ve been assuming that the error distribution is normal. Normal distributions can also take any value, positive or negative. So, the linear predictor and the modeled response have matching scales. But now we are trying to model the probability of germination, which can only be between 0 and 1, using a linear predictor that can take any value, positive or negative. Those two things do not match. The link function is used to map a probability value between 0 and 1 to a scale that includes all positive and negative values. Logit is another word for the log-odds function, defined as

\[\text{logit}~p = \log \frac {p}{1-p}\]

Here is a graph of that function:

graph of logit function

You can see that it maps the values ranging from 0 to 1 to values ranging from negative to positive infinity. It is more or less a straight line when the probability is between about 0.25 and 0.75, but starts to get really steep as you get close to the boundaries. This makes sense because if you have an event that has an extremely low (or high) chance of happening, a little increase (or decrease) to the probability will be a relatively bigger difference.

GLM code

Putting it all together, here’s the code for the generalized linear model. As mentioned above, the differences are that we specify the number of successes out of the total number of trials as the response variable (germ | trials(n)), and the response distribution and link function as family = binomial(link = 'logit'). A normal(0, 5) prior is sensible for fixed effects on the log odds scale. Again we will use all default model fitting options.

crowder_glm <- brm(germ | trials(n) ~ gen * extract, 
                   family = binomial(link = 'logit'),
                   prior = c(
                     prior(normal(0, 5), class = b)
                   ),
                   data = crowder.seeds, 
                   seed = 112, file = 'fits/crowder_glm')

Comparing LM and GLM

Here are the posterior predictive check plots for each of these models. What do you notice about them?

pp_check(crowder_lm)

LM and GLM posterior predictive check plots

pp_check(crowder_glm)

LM and GLM posterior predictive check plots

Both posterior predictive check plots look reasonably good. The posterior draws from the model (thin blue lines) have a similar shape to the distribution of the observed data (thick black line). The two plots are different because the first model is modeling a proportion, so it should have values between 0 and 1, while the second model is modeling the number of germinated seeds, which is a positive whole number.

One thing to note is that the model predictions from the first model go below 0 and above 1. That’s because the general linear model assumes normally distributed error, so there is nothing stopping it from making nonsensical predictions of a probability of germination that is either negative or greater than 100%. The generalized linear model doesn’t have this issue.

Estimating marginal means

Now calculate the estimated marginal means for the linear model. Here the syntax pairwise ~ extract | gen means within each level of genotype (gen), estimate the predicted mean for each level of extract and take the pairwise contrast between them.

emm_crowder_lm <- emmeans(crowder_lm, pairwise ~ extract | gen)

emm_crowder_lm$emmeans

## gen = O73:
##  extract  emmean lower.HPD upper.HPD
##  bean      0.324     0.196     0.463
##  cucumber  0.466     0.321     0.601
## 
## gen = O75:
##  extract  emmean lower.HPD upper.HPD
##  bean      0.372     0.243     0.524
##  cucumber  0.715     0.602     0.840
## 
## Point estimate displayed: median 
## HPD interval probability: 0.95

emm_crowder_lm$contrasts

## gen = O73:
##  contrast        estimate lower.HPD upper.HPD
##  bean - cucumber    -0.14    -0.330    0.0526
## 
## gen = O75:
##  contrast        estimate lower.HPD upper.HPD
##  bean - cucumber    -0.34    -0.513   -0.1467
## 
## Point estimate displayed: median 
## HPD interval probability: 0.95

You can see that the pairwise contrasts are taken by subtracting one modeled proportion from the other. What do we get when we estimate the marginal means in the same way for the generalized linear model?

emm_crowder_glm <- emmeans(crowder_glm, pairwise ~ extract | gen)

emm_crowder_glm$emmeans

## gen = O73:
##  extract  emmean lower.HPD upper.HPD
##  bean     -0.418    -0.769    -0.056
##  cucumber  0.132    -0.189     0.454
## 
## gen = O75:
##  extract  emmean lower.HPD upper.HPD
##  bean     -0.560    -0.808    -0.323
##  cucumber  0.760     0.510     1.003
## 
## Point estimate displayed: median 
## Results are given on the logit (not the response) scale. 
## HPD interval probability: 0.95

emm_crowder_glm$contrasts

## gen = O73:
##  contrast        estimate lower.HPD upper.HPD
##  bean - cucumber   -0.553     -1.03   -0.0764
## 
## gen = O75:
##  contrast        estimate lower.HPD upper.HPD
##  bean - cucumber   -1.319     -1.68   -0.9878
## 
## Point estimate displayed: median 
## Results are given on the log odds ratio (not the response) scale. 
## HPD interval probability: 0.95

You can see that the estimated marginal means are given on the logit scale, and the contrasts on the log odds ratio scale. This is sort of hard to interpret. So you can use the argument type = 'response' which transforms the results back to the data scale.

The logit link function in R is qlogis() and its inverse is plogis().

emmresponse_crowder_glm <- emmeans(crowder_glm, pairwise ~ extract | gen, type = 'response')

emmresponse_crowder_glm$emmeans

## gen = O73:
##  extract   prob lower.HPD upper.HPD
##  bean     0.397     0.317     0.486
##  cucumber 0.533     0.453     0.612
## 
## gen = O75:
##  extract   prob lower.HPD upper.HPD
##  bean     0.363     0.308     0.419
##  cucumber 0.681     0.625     0.732
## 
## Point estimate displayed: median 
## Results are back-transformed from the logit scale 
## HPD interval probability: 0.95

emmresponse_crowder_glm$contrasts

## gen = O73:
##  contrast        odds.ratio lower.HPD upper.HPD
##  bean / cucumber      0.575     0.327     0.878
## 
## gen = O75:
##  contrast        odds.ratio lower.HPD upper.HPD
##  bean / cucumber      0.267     0.180     0.365
## 
## Point estimate displayed: median 
## Results are back-transformed from the log odds ratio scale 
## HPD interval probability: 0.95

Now, the estimated marginal means and their credible intervals are given as probabilities. The contrasts are now odds ratios. This is the ratio of the odds of germination between the two levels of extract. An odds ratio of 1 indicates no difference between the odds. Because the ratio is bean / cucumber and the values are <1, we have evidence that bean extract is less effective at stimulating germination than cucumber extract in both genotypes.

Comparing inference between LM and GLM

Let’s try to make an inference about the difference between the means. The function describe_posterior() is a workhorse from the bayestestR package. The posteriors we want to describe here are the posterior distributions of the differences between the means of the extract treatments within each genotype for the general linear model, and the odds ratios of the means for the generalized linear model. We have 1000 sampling iterations from each of 4 chains, thus for each difference we have 4000 draws from its posterior distribution.

By default describe_posterior() gives us the median of the 4000 values as the point estimate to characterize the distribution, and we specify that we want an equal-tailed credible interval using ci_method = 'eti'. The default width is 95% so this will give us the 2.5% and 97.5% quantiles of the 4000 values. We also specify that we want two different null hypothesis tests (described in more detail back in Lesson 1). The p_map is the ratio of the probability of the null value to the most probable posterior value (point estimate); the lower, the more evidence in favor of a non-null difference. The pd is the percent of the posterior distribution that has the same direction (sign) as the point estimate; the closer to 100%, the more evidence in favor of a non-null difference.

Notice that we have to specify that the null value for the odds ratio contrasts is 1, because a ratio of 1 indicates no difference. The default, null = 0, is nonsensical because you cannot have a ratio of 0.

describe_posterior(emm_crowder_lm$contrasts, ci_method = 'eti', test = c('pd', 'p_map'))

## Summary of Posterior Distribution
## 
## contrast        | gen | Median |         95% CI | p (MAP) |     pd
## ------------------------------------------------------------------
## bean - cucumber | O73 |  -0.14 | [-0.33,  0.05] |  0.334  | 93.17%
## bean - cucumber | O75 |  -0.34 | [-0.53, -0.16] |  < .001 |   100%

describe_posterior(emmresponse_crowder_glm$contrasts, ci_method = 'eti', test = c('pd', 'p_map'), null = 1)

## Summary of Posterior Distribution
## 
## contrast        | gen | Median |       95% CI | p (MAP) |     pd
## ----------------------------------------------------------------
## bean / cucumber | O73 |   0.58 | [0.36, 0.93] |  0.052  | 98.88%
## bean / cucumber | O75 |   0.27 | [0.19, 0.38] |  < .001 |   100%

You can see that the generalized linear model provides stronger evidence in favor of differences. You could say it has “higher power.” This is because it is taking into account the greater precision we can get from having many partially independent binary outcomes (seeds) in each dish. In contrast, the plain old linear model is ignoring that. If your response variable is a proportion, then a dish with 1000 seeds where 200 germinated (20%) will provide as much evidence as a dish with 10 seeds where 2 germinated. But in the generalized linear model, the dish with more seeds will, understandably, contribute more evidence.

Assessing evidence for interaction between treatments

What if we wanted to make a statistical inference about the interaction between extract and genotype? Is the difference between the response to bean and cucumber extract different between the two genotypes? We can call the contrast() function and specify two levels of interaction contrast. This will first contrast the extracts within each genotype, then it will contrast those two contrasts with each other.

emmeans(crowder_glm, ~ extract + gen, type = 'response') %>%
  contrast(interaction = c('revpairwise', 'pairwise')) %>%
  describe_posterior(ci_method = 'eti', test = c('pd', 'p_map'))

## Summary of Posterior Distribution
## 
## extract_revpairwise | gen_pairwise | Median |       95% CI | p (MAP) |   pd
## ---------------------------------------------------------------------------
## cucumber / bean     |    O73 / O75 |   0.46 | [0.26, 0.83] |  < .001 | 100%

The ratio of the odds ratios is 0.46, indicating that our point estimate of the difference between extract responses in O73 is about half as big as for O75. There’s a 95% chance that the true ratio is between 0.25 and 0.85, and the evidence is in favor of a non-null ratio.

Presentation ideas

Now, because this is an applied stats lesson, in addition to showing you what statistical model is appropriate and how to fit it, I also want to give you the tools to go through all the steps of the “data to doc pipeline.” How would you report the results of this analysis in a manuscript? Let’s go ahead and create:

A description of the statistical model for the methods section
A verbal description of the results
A figure
A table

Methods section

In your methods section, a description of the statistical model (excluding citations for methods and software) might be:

We fit a generalized linear model that modeled the germination probability for each dish as a function of extract type, genotype, and their interaction. The model assumed a binomial response distribution and logit link function. The model was fit with Hamiltonian Monte Carlo. Four chains were run, each with 1000 warmup iterations and 1000 sampling iterations. We verified model convergence by ensuring the R-hat statistics for all parameters were below 1.01. We assessed model fit by examining the posterior predictive check plot. We obtained posterior estimates of the marginal mean for each combination of extract and genotype, and of the odds ratio between each extract type within each genotype. We calculated the median and 95% equal-tailed credible interval for each estimated mean and odds ratio. To assess evidence for differences between extracts, we calculated the maximum a posteriori p-value (p_MAP) and probability of direction (p_d).

Results section

In the text part of your results section, you might say:

The modeled probability of germination was greater when seeds were treated with cucumber extract versus bean extract for both the O73 genotype (bean extract: posterior median 0.396, 95% CI [0.316, 0.478]; cucumber extract: 0.531 [0.458, 0.614]) and the O75 genotype (bean extract: 0.364 [0.305, 0.421]; cucumber extract: 0.681 [0.629, 0.734]). The odds ratio between cucumber and bean extract was smaller for the O73 genotype than the O75 genotype (ratio 0.46 [0.25, 0.83], p_MAP < 0.001), indicating a greater relative effect of cucumber extract on germination probability in the O75 genotype.

Figure

A figure might show the median and several different credible intervals of the distributions of posterior means for each treatment combination, to give a fuller picture of the distribution. This requires using the tidybayes package to pull out the individual posterior draws from the emmeans object with gather_emmeans_draws() and then use stat_interval() to simultaneously compute our desired credible intervals and plot them. Notice we also need to transform the means from the logit scale to the probability scale using mutate(.value = plogis(.value)) because when we pull out the values from the emmeans object, the back-transformation information is lost.

gather_emmeans_draws(emmresponse_crowder_glm$emmeans) %>%
  mutate(p = plogis(.value)) %>%
  ggplot(aes(x = gen, y = p, group = interaction(gen, extract))) +
    stat_interval(aes(color = extract, color_ramp = after_stat(level)), .width = c(0.5, 0.8, 0.95), position = position_dodge(width = .3)) +
    stat_summary(fun = 'median', geom = 'point', size = 2, position = position_dodge(width = .3)) +
    colorblind_friendly +
    labs(x = 'Genotype', y = 'Modeled germination probability', color_ramp = 'CI width', color = 'Extract')

Manuscript-quality figure with germination model results

Table

Finally, a table could present the medians (95% CI) of the means and the odds ratios for the two extracts within each genotype, which requires a little rearranging to get in a compact wide format. I won’t go into too much detail on the code. Focus on the median_qi() function, also from tidybayes. It summarizes posterior draws with a median point estimate and a quantile credible interval of a given width. Also notice that we need to back-transform the means with plogis() to go from logit scale to probability scale, and back-transform the contrasts with exp() to go from log odds ratio scale to odds ratio scale.

means_summary <- gather_emmeans_draws(emmresponse_crowder_glm$emmeans) %>%
  mutate(.value = plogis(.value)) %>%
  median_qi(.width = 0.95) %>%
  pivot_wider(id_cols = gen, names_from = extract, values_from = c(.value, .lower, .upper))

contrasts_summary <- gather_emmeans_draws(emmresponse_crowder_glm$contrasts) %>%
  mutate(.value = exp(.value)) %>%
  median_qi(.width = 0.95)

left_join(means_summary, contrasts_summary) %>%
  select(-c(contrast, .width, .point, .interval)) %>%
  gt(caption = 'Germination probability') %>%
  fmt_number(decimals = 3) %>%
  cols_merge(c(.value_bean, .lower_bean, .upper_bean), pattern = '{1} [{2}, {3}]') %>%
  cols_merge(c(.value_cucumber, .lower_cucumber, .upper_cucumber), pattern = '{1} [{2}, {3}]') %>%
  cols_merge(c(.value, .lower, .upper), pattern = '{1} [{2}, {3}]') %>%
  cols_label(gen = 'Genotype', .value_bean = 'Bean extract', .value_cucumber = 'Cucumber extract', .value = 'Bean:cucumber odds ratio')

Germination probability
Genotype	Bean extract	Cucumber extract	Bean:cucumber odds ratio
O73	0.397 [0.313, 0.484]	0.533 [0.453, 0.612]	0.575 [0.357, 0.930]
O75	0.363 [0.309, 0.421]	0.681 [0.626, 0.733]	0.267 [0.188, 0.381]

Generalized linear mixed model example: turnip yield

In Lesson 1 we fit a linear mixed model (LMM), which allowed for a mix of fixed treatment effects and random block effects, but assumed a normal response distribution. And just now we fit a generalized linear model (GLM) which allowed for a non-normal response distribution but did not include random effects. Now let’s take things to the next level with a GLMM. This powerful class of models can accommodate a wide variety of distributions as well as random effects. When it comes to estimating the parameters of GLMMs, Bayesian models really can’t be beat. Classical frequentist estimation algorithms often fail on datasets where the Bayesian approach can effortlessly get you an answer.

The model syntax for fitting GLMMs in brms won’t provide any big surprises for you. You have already seen how to specify a response family and link function, and how to include random terms on the right-hand side of the model formula. You’ll need to do both those things here.

Meet the example data

Image (c) thebittenword.com

We are going to use another example dataset from the agridat package, mcconway.turnip, the results of an experiment on turnips originally published by McConway et al. This dataset does not have a binary outcome, it has a continuous outcome (yield). This variable has a distribution that’s extremely common in agricultural datasets. Let’s take a look.

data(mcconway.turnip)
ggplot(mcconway.turnip, aes(x = yield)) + geom_histogram()

histogram of turnip yield raw data

The distribution is far from symmetrical. We observe that yield cannot be a negative value; it has to be zero or greater. We see that there are many low values of yield, peaking close to zero, with a long tail of high values stretching up almost to 25. This kind of “positive right-skewed” data is all over the place in ag science. Here we will look at what kind of response family is appropriate in a GLMM fit to this kind of data.

Let’s also look at what kind of an experimental design was used here. It is a 2×2×4 factorial design experiment, with three factors: genotype (gen, either "Barkant" or "Marco"), planting date (date, either "21Aug1990" or "28Aug1990"), and planting density (density, 1, 2, 4, or 8 kg/ha). All 16 combinations are represented in the data. It is a randomized complete block design with four blocks. Because the blocks are complete, each block has 16 plots, one for each treatment combination. A single measurement of yield was taken from each plot.

We will treat density as a discrete variable with four levels, instead of worrying about whether the effect of planting density on yield is linear.

mcconway.turnip <- mutate(mcconway.turnip, density = factor(density))

Make a boxplot of the yield values in each treatment combination. Split the genotypes into separate panels using facet_wrap(), arrange the boxplots in increasing order of density on the x-axis, and show the planting dates for each density with a different color box.

ggplot(mcconway.turnip, aes(x = density, y = yield, group = interaction(density, date), color = date)) +
  facet_wrap(~ gen) +
  geom_boxplot() +
  colorblind_friendly +
  labs(x = 'Planting density (kg/ha)', y = 'Yield', color = 'Planting date')

boxplots of turnip yield raw data

Looking at the data, we can see that there is a clear interaction between planting date and density. At low density, yield is low regardless of planting date. At higher densities, yield is still low for the early planting date but high for the later one. There are not huge differences between the genotypes, but it does seem that Barkant has a bigger difference between the planting dates than Marco.

Another thing to notice is that there’s a relationship between the mean and the variance. The treatment combinations with lower values also have a narrower distribution. This is a problem for a general linear model, because a general linear model only has one standard deviation for all the residuals. If there is greater variance in some groups than others, that can lead to biased inference and inaccurate predictions.

Fit a linear mixed model

Let’s start by pretending we don’t care about any of that mean-variance business. Just fit a simple general linear mixed model to the data. We will include all main effects, two-way interactions, and three-way interactions between genotype, planting date, and density using the syntax gen * date * density on the right-hand side of the model formula. To that, we will add (1 | block) which means that we want a random intercept for each block. This means the model will fit a separate intercept for each block and a variance parameter to model how much the blocks systematically vary from one another. As yield varies between 0 and 25 we will use normal(0, 5) as a prior. Once again, we’ll use all default MC sampling options.

mcconway_lmm <- brm(yield ~ gen * date * density + (1 | block), 
                    prior = c(
                      prior(normal(0, 5), class = b)
                    ),
                    data = mcconway.turnip, 
                    seed = 113, file = 'fits/mcconway_lmm')

Note: In this model and the following, we get a warning about a few divergent transitions. They are few enough that we can ignore the warning for the purpose of this lesson. Divergent transitions can occur when you are fitting models to small datasets. They may indicate that the posterior is being inadequately sampled. To deal with this you can either set narrower priors or tweak the sampler arguments so that the sampler makes smaller steps with each iteration; this is “safer” because it reduces the chance of a divergent transitions, but it can cause the sampling to be slower.

What does the posterior predictive check plot show?

pp_check(mcconway_lmm)

posterior predictive check plot for turnip yield linear model

Clearly, our model is missing something. First of all, the draws from the posterior predictive (thin blue lines) clearly have a very different shape from the distribution of the observed data (thick black line). They are basically symmetrical and are not capturing the skewed shape of the data at all. Second, they are producing a lot of nonsensical values. A good portion of the distributions are < 0, meaning the fitted yield values for a lot of the data points are negative. As we discussed, this is not possible. You can also see the problem if you calculate estimated marginal means and look at the credible intervals, which we do here for density averaged over the other treatments:

emmeans(mcconway_lmm, ~ density)

##  density emmean lower.HPD upper.HPD
##  1         2.07    -0.983      4.60
##  2         3.14     0.360      5.87
##  4         7.03     4.282      9.75
##  8         8.39     5.608     11.06
## 
## Results are averaged over the levels of: gen, date 
## Point estimate displayed: median 
## HPD interval probability: 0.95

Fit a linear mixed model to the log-transformed data

The 95% credible interval for planting density 1 includes negative values. We are saying that there is a substantial probability that the yield at planting density 1 is negative, which is nonsense. So we definitely have to do something to make the model fit the data better. What can we do? Well, the “old school” way, that lots of people, including me, still do, is to log-transform the response variable. Usually, this is helpful for positive, right-skewed data in three ways. First, it makes the distribution more symmetrical and “normal-ish.” Second, it ensures that all back-transformed estimates are positive because you invert the log by exponentiating, and \(e^x > 0\). Finally, it stabilizes the variance, meaning that it flattens out the relationship between the mean and the variance. To fit a model with a log-transformed response variable, we just need to put log() around the response variable on the left-hand side of the model formula. We also use a normal(0, 2) prior because yield only varies by about 2 to 3 units on a log scale.

mcconway_loglmm <- brm(log(yield) ~ gen * date * density + (1 | block), 
                       prior = c(
                         prior(normal(0, 2), class = b)
                       ),
                       data = mcconway.turnip, 
                       seed = 114, file = 'fits/mcconway_loglmm')

Take a look at the posterior predictive check plot.

pp_check(mcconway_loglmm)

posterior predictive check plot for turnip yield log linear model

The model assuming normal error is doing a better job capturing the shape of the log-transformed data than it did with the untransformed data.

Fit a GLMM with gamma response distribution

The model fit to the log-transformed response variable would probably be satisfactory in most cases. The transformation makes the data less skewed and deals with the unequal variance. However, when you transform data to a different scale, fit a model, and back-transform the model estimates back to the original scale, that introduces bias. It’s better to keep the data on the original scale and pick a distribution that really matches the data. Well, we already saw that the normal distribution doesn’t do the trick. For positive, right-skewed data a good go-to option is the Gamma distribution. It accommodates positive right-skewed data and has the added bonus of having a variance that’s proportional to the mean. So it will be able to deal with the unequal variance we see among groups in our data.

Here are some shapes that the gamma distribution can take. You can see that they are asymmetrical and always positive (the x-axis does not go below 0). Some of them look a lot like the skewed shape of our yield distribution.

different shapes of gamma distribution

We can fit a GLMM with gamma response distribution and log link function using family = Gamma(link = 'log').

mcconway_glmm <- brm(yield ~ gen * date * density + (1 | block), 
                     family = Gamma(link = 'log'), 
                     prior = c(
                       prior(normal(0, 2), class = b)
                     ),
                     data = mcconway.turnip, 
                     seed = 115, file = 'fits/mcconway_glmm')

The posterior predictive check shows that the model is faithful to the positive, right-skewed distribution of the data!

pp_check(mcconway_glmm)

posterior predictive check plot of turnip yield gamma GLMM

Estimating marginal means

With our gamma GLMM, we can do fun things like estimate the marginal means for each combination of planting density, planting date, and genotype. These means and their credible intervals don’t include the random effect of block. They are the expected value for a block that’s “right in the middle” of the distribution of variation across blocks. Note that type = 'response' gives the predictions on the data scale, not the log link scale.

emm_mcconway <- emmeans(mcconway_glmm, ~ density + date + gen, type = 'response')

emm_mcconway

##  density date      gen     response lower.HPD upper.HPD
##  1       21Aug1990 Barkant     2.62     0.809      5.42
##  2       21Aug1990 Barkant     3.59     1.079      7.27
##  4       21Aug1990 Barkant     3.86     1.129      7.91
##  8       21Aug1990 Barkant     5.06     1.515     10.81
##  1       28Aug1990 Barkant     3.78     1.239      7.63
##  2       28Aug1990 Barkant     5.16     1.383     11.12
##  4       28Aug1990 Barkant    12.41     2.626     26.10
##  8       28Aug1990 Barkant    14.44     3.737     30.29
##  1       21Aug1990 Marco       1.35     0.280      2.79
##  2       21Aug1990 Marco       2.41     0.656      5.11
##  4       21Aug1990 Marco       3.43     0.884      6.97
##  8       21Aug1990 Marco       5.15     1.577     10.37
##  1       28Aug1990 Marco       1.32     0.351      2.70
##  2       28Aug1990 Marco       2.24     0.531      4.65
##  4       28Aug1990 Marco       7.39     1.975     15.12
##  8       28Aug1990 Marco       9.78     2.880     20.77
## 
## Point estimate displayed: median 
## Results are back-transformed from the log scale 
## HPD interval probability: 0.95

Then we can take contrasts between any permutation of those that we want. For example, we can take pairwise contrasts between each density level within each combination of planting date and genotype.

contrast(emm_mcconway, 'revpairwise', by = c('date', 'gen'))

## date = 21Aug1990, gen = Barkant:
##  contrast            ratio lower.HPD upper.HPD
##  density2 / density1  1.35     0.506      2.59
##  density4 / density1  1.46     0.639      2.86
##  density4 / density2  1.09     0.395      2.20
##  density8 / density1  1.91     0.717      3.70
##  density8 / density2  1.42     0.511      2.88
##  density8 / density4  1.31     0.485      2.63
## 
## date = 28Aug1990, gen = Barkant:
##  contrast            ratio lower.HPD upper.HPD
##  density2 / density1  1.37     0.473      2.72
##  density4 / density1  3.29     1.156      6.57
##  density4 / density2  2.40     0.738      4.77
##  density8 / density1  3.86     1.366      7.60
##  density8 / density2  2.81     0.875      5.41
##  density8 / density4  1.16     0.432      2.36
## 
## date = 21Aug1990, gen = Marco:
##  contrast            ratio lower.HPD upper.HPD
##  density2 / density1  1.77     0.698      3.48
##  density4 / density1  2.50     0.929      4.96
##  density4 / density2  1.41     0.493      2.81
##  density8 / density1  3.75     1.343      7.33
##  density8 / density2  2.10     0.782      4.25
##  density8 / density4  1.49     0.502      2.94
## 
## date = 28Aug1990, gen = Marco:
##  contrast            ratio lower.HPD upper.HPD
##  density2 / density1  1.68     0.668      3.46
##  density4 / density1  5.66     1.942     11.28
##  density4 / density2  3.32     1.058      6.61
##  density8 / density1  7.42     2.711     14.85
##  density8 / density2  4.42     1.524      8.78
##  density8 / density4  1.32     0.493      2.77
## 
## Point estimate displayed: median 
## Results are back-transformed from the log scale 
## HPD interval probability: 0.95

Assessing evidence for interactions between treatments

The above results show strong statistical evidence for a positive relationship between planting density and yield for the later planting date, but much weaker evidence for the earlier planting date. The effect sizes tend to be greater for the Marco genotype than for the Barkant genotype. You could do interaction contrasts (contrasts of contrasts, as we did above for the germination example) to test whether the density effects are greater in Marco than Barkant:

contrast(emm_mcconway, 'revpairwise', by = 'date', interaction = 'revpairwise')

## date = 21Aug1990:
##  density_revpairwise gen_revpairwise ratio lower.HPD upper.HPD
##  2 / 1               Marco / Barkant  1.33     0.307      3.20
##  4 / 1               Marco / Barkant  1.71     0.385      3.90
##  4 / 2               Marco / Barkant  1.31     0.254      3.36
##  8 / 1               Marco / Barkant  1.96     0.464      4.59
##  8 / 2               Marco / Barkant  1.49     0.296      3.76
##  8 / 4               Marco / Barkant  1.13     0.215      2.87
## 
## date = 28Aug1990:
##  density_revpairwise gen_revpairwise ratio lower.HPD upper.HPD
##  2 / 1               Marco / Barkant  1.24     0.285      3.14
##  4 / 1               Marco / Barkant  1.71     0.276      4.13
##  4 / 2               Marco / Barkant  1.37     0.255      3.68
##  8 / 1               Marco / Barkant  1.95     0.418      4.80
##  8 / 2               Marco / Barkant  1.58     0.323      4.25
##  8 / 4               Marco / Barkant  1.13     0.246      3.10
## 
## Point estimate displayed: median 
## Results are back-transformed from the log scale 
## HPD interval probability: 0.95

We can see from the above that the model predicts Marco to have somewhat higher yield differences between planting density levels than Barkant, but the evidence is relatively weak (the 95% credible intervals indicates that there’s substantial probability of no difference or even for the Marco differences to be lower).

Presentation ideas

As we did above, let’s present this model and interpretation of the results, as we might in a manuscript, including:

Description of the statistical model for the methods section
Text write-up for the results section
A figure

Methods section

A description of the statistical model in the methods section might look like this:

We fit a generalized linear mixed model that modeled turnip yield as a function of genotype, planting date, planting density (discrete variable with four levels), and their interactions. The model had a gamma response distribution with log link function. A random intercept was fit to each block. The model was fit with Hamiltonian Monte Carlo. Four chains were run, each with 1000 warmup iterations and 1000 sampling iterations. We verified model convergence by ensuring the R-hat statistics for all parameters were below 1.01. We assessed model fit by examining the posterior predictive check plot. We obtained posterior estimates of the marginal mean for each combination of genotype, planting date, and planting density. We also estimated the yield ratio between all pairs of density treatments within each combination of planting date and genotype. We calculated the median and 95% equal-tailed credible interval for each estimated mean and ratio.

Results section

A partial write-up in the results section, mainly focusing on the density findings, might look like this:

We found evidence that turnip yield was highest at the highest levels of planting density. The effect of density was highest for the later planting date. There was not strong evidence for a different effect of planting density between the genotypes; it was positive for both. Mean yield for density level 8 kg/ha at the later planting date was 14.66 (95% CI [4.28, 31.27]) for the Barkant genotype, versus 3.82 [1.14, 8.05] at 1 kg/ha; the ratio of yields between highest and lowest density levels was 3.84 [1.32, 7.60]. For the Marco genotype, the mean yield at later planting date was 9.80 [3.32, 21.01] for 8 kg/ha density and 1.31 [0.40, 2.82] for 1 kg/ha density; the yield ratio between the two density levels was 7.45 [2.63, 14.81].

Figure

Here is some code to make a nice plot of the estimated marginal means, along with 95% credible intervals. Note that exp() is used to back-transform the posterior estimates.

gather_emmeans_draws(emm_mcconway) %>%
  mutate(.value = exp(.value)) %>%
  ggplot(aes(x = density, y = .value, group = interaction(density, date))) +
    stat_interval(aes(color = date, color_ramp = after_stat(level)), position = position_dodge(width = .5)) +
    stat_summary(fun = 'median', geom = 'point', size = 2, position = position_dodge(width = .5)) +
    facet_wrap(~ gen) +
    colorblind_friendly +
    labs(x = 'Planting density (kg/ha)', y = 'Modeled yield', color_ramp = 'CI width', color = 'Planting date')

Manuscript quality figure of turnip yield model results

Creating a table is left as an exercise.

Conclusion

We’ve made it to the end of Lesson 2! As always, let’s revisit the learning objectives to see what we’ve learned. You now have experience:

Exploring different priors and how they may influence (or not influence) your conclusions
Fit a Bayesian generalized linear model (GLM) for non-normal response data
Fit a Bayesian generalized linear mixed model (GLMM) for non-normal response data with random effects

Along the way, you learned about a lot of R packages, first and foremost brms, but also emmeans, tidybayes, and bayestestR, among others. Give yourself a well-deserved round of applause!

Exercises

These exercises should give you some more practice with the concepts we learned in this lesson. They get progressively more difficult as you go.

Exercise 1

After we fit the Gamma GLMM to the mcconway.turnip dataset, we calculated posterior estimates of the yield ratios between different planting density levels within each combination of planting date and genotype. Calculate “Bayesian p-values” associated with those ratios.

Exercise 2

Refit the GLM model that we fit to the crowder.seeds dataset, but this time use a normal(0, 0.5) prior on the fixed effects. Does this influence the conclusions we draw? Why or why not?

Exercise 3

For this exercise, we will use another example dataset from agridat. Load the johnson.blight dataset. Make a boxplot showing the distribution of number of rainy days in April and May (rain.am, a continuous variable) for years that did and did not have potato blight (blight is a binary variable, 0 if no blight occurred and 1 if blight occurred). Fit an appropriate GLM to test whether the number of rainy days in April and May influences the probability of potato blight. Ignore all other columns for the purpose of this example. Inspect the model summary. What can you conclude?

Hint: Use binomial family with logit link. Each row is only a single outcome so the formula should begin with blight | trials(1).

Click here for answers

A crash course in Bayesian mixed models with brms (Lesson 2)

Quentin D. Read

June 25, 2025

Introduction and setup

What did we learn in Lesson 1?

What will we learn in this lesson?

Load the packages

Review: influence of priors

Back to Bayes-ics

Priors are not a bad thing

How the posterior is influenced by the prior and the data

Hands-on example of prior distribution influence

Meet the example data

Frequentist analysis

Bayesian analyses with different priors

Comparison of posterior estimates

Generalized linear mixed models

Review: general linear mixed models

Non-normal data: what to do about it?

Transforming the data before model fitting

Generalized linear mixed models: an introduction

Generalized linear model example: Orobanche seed germination

Meet the example data

Fit a linear model to the proportions

Fit a generalized linear model to the binary outcomes

The logit link function

GLM code

Comparing LM and GLM

Estimating marginal means

Comparing inference between LM and GLM

Assessing evidence for interaction between treatments

Presentation ideas

Methods section

Results section

Figure

Table

Generalized linear mixed model example: turnip yield

Meet the example data

Fit a linear mixed model

Fit a linear mixed model to the log-transformed data

Fit a GLMM with gamma response distribution

Estimating marginal means

Assessing evidence for interactions between treatments

Presentation ideas

Methods section

Results section

Figure

Conclusion

Further reading

Exercises

Exercise 1

Exercise 2

Exercise 3