Introduction and setup

This lesson picks up where Lesson 1 left off, continuing a practical introduction to fitting Bayesian multilevel models in R and Stan using the brms package (Bayesian Regression Models using Stan).

What did we learn in Lesson 1?

We started out learning the basic ideas behind Bayes.

• What is Bayesian inference
• Prior, likelihood, and posterior
• Credible intervals

We learned how to harness the power of the Hamiltonian Monte Carlo method to fit Bayesian models, using the Stan language via the R package brms. Specifically, we learned how to:

• Fit a multilevel (mixed) model with random intercepts and random slopes
• Specify different priors
• Diagnose and deal with convergence problems
• Plot model parameters and predictions
• Compare models with information criteria
• Do hypothesis testing with Bayesian analogues of p-values

Of course, we only scratched the surface of those complex topics. But if you don't remember any of that, now would be a good time to review Lesson 1.

What will we learn in this lesson?

After teaching Lesson 1, I got feedback from the participants. People asked me to teach them how to fit mixed models that correspond to common study designs in agricultural science. I was also asked to take a deeper dive into prior distributions and show how specifying different priors can affect the conclusions we come to. I designed Lessons 2 and 3 based on that feedback. In Lesson 2, you are going to learn how to:

• Specify strongly and weakly informative priors and see how they may influence your results
• Fit a Bayesian generalized linear model (GLM), for situations where you can't assume normally distributed error in your model
• Fit a Bayesian generalized linear mixed model (GLMM), to accommodate non-normal data and a mixture of fixed and random effects

You will learn some more practical skills for fitting Bayesian models and talking about the results in papers or presentations. We've seen these packages already in Lesson 1 but we will keep practicing them. In this lesson we will:

• Use the emmeans package to make predictions from Bayesian models
• Use the bayestestR package for Bayesian hypothesis testing
• Use the tidybayes, ggdist, and ggplot2 packages to make nice plots of Bayesian model predictions
• Use the gt package to make nice tables of model predictions

Load all the packages and set a plotting theme. Also define a color scale that is colorblind-friendly. Create a directory called fits in your current working directory to store fitted model objects, if it does not already exist.

library(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)

theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))

if (!dir.exists('fits')) dir.create('fits')

Set brms options. The option mc.cores = 4 means we will use four processor cores in parallel to fit the models, allowing four Markov chains to sample simultaneously. The option brms.file_refit = 'on_change' means that if a model has already been fit and saved to a file, and the code and data are unchanged, the model will be reloaded from the file instead of being fit again. If you are running this lesson on the cloud server, there are some pre-fit model objects to save time during the workshop.

options(mc.cores = 4, brms.file_refit = 'on_change')

If you are running this code on the Posit Cloud server, set these options instead of the ones above. This will use the faster and less memory-intensive cmdstanr package to fit the models behind the scenes. Unfortunately, the CmdStan software that cmdstanr uses is difficult to install on USDA laptops, but it can be done. Check out instructions for installing CmdStan here, and if those don't work, instructions for a workaround to install CmdStan using Anaconda. If you end up wanting to run Bayesian models on your own data after this workshop, another option is to run brms code on the SciNet cluster, where the cmdstanr package can easily be configured.

options(mc.cores = 4, brms.file_refit = 'on_change', brms.backend = 'cmdstanr')

Review: influence of priors

Before we dive into more complex models, let's review Bayesian statistics a little bit. Don't feel bad if it takes you a few times to really grasp the concepts. Most people need a lot of repetition before these ideas become second nature.

Back to Bayes-ics

Let's start by going over Bayes' theorem again.

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

This means the probability of A being true given that B is true ($$P(A|B)$$) is equal to the probability that B is true given that A is true ($$P(B|A)$$) times the ratio of the probabilities that A and B are true ($$\frac{P(A)}{P(B)}$$). Using Bayes' theorem in the context of fitting a statistical model to data, we can state this as:

$P(model|data) = \frac {P(data|model)P(model)}{P(data)}$

or

$posterior \propto likelihood \times prior$

What does this mean? It means that we have some prior knowledge about the world. We formally state that knowledge by giving a statistical distribution for the parameters of the statistical model describing the world, before having collected any data. This is called the prior, which in Bayes' theorem is represented by $$P(model)$$ because it's the probability of the model not dependent on the data.

Then, we go out and collect some data. We use the data to calculate $$P(data|model)$$, which we call the likelihood. That's a function that just depends on the data. It represents how likely it was to observe the data we did, given all possible combinations of parameters we could have in our model. We divide the likelihood by the marginal probability, $$P(data)$$, which is just a normalizing constant that makes the resulting posterior a proper probability distribution with an area under the curve of 1. (In a probability distribution, the probabilities of all possible mutually exclusive outcomes have to add up to 1, or 100% probability.)

Finally, when we multiply the prior and likelihood together, we get $$P(model|data)$$, which we call the posterior. That is our new knowledge of the world after collecting the data, represented by a new statistical distribution for our parameter estimates. Because it is the product of the prior and the likelihood, it represents a combination of what we knew before we got the data (the prior), updated by the new information we got from the data (the likelihood).
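To make the proportionality concrete, here is a toy grid approximation in base R. The data (7 germinating seeds out of 10) and the Beta(2, 2) prior are made-up illustrative assumptions, not part of this lesson's datasets:

```r
# Grid approximation of posterior ~ likelihood * prior for a
# hypothetical germination probability p, after observing
# 7 successes in 10 seeds (made-up numbers for illustration).
p_grid <- seq(0, 1, length.out = 101)
prior <- dbeta(p_grid, 2, 2)                      # weakly informative prior centered on 0.5
likelihood <- dbinom(7, size = 10, prob = p_grid) # probability of the observed data at each p
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)           # normalize so probabilities sum to 1

# The posterior peaks between the prior mean (0.5) and the
# sample proportion (0.7): the prior, updated by the data.
p_grid[which.max(posterior)]
```

With more data, the likelihood would dominate and the posterior peak would move closer to the sample proportion; with a stronger prior, it would stay closer to 0.5.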