This lesson picks up where Lesson 1 left off, continuing along with a
practical introduction to fitting Bayesian multilevel models in R and
Stan using the **brms**
package (**B**ayesian **R**egression
**M**odels using **S**tan).

Download the worksheet for this lesson here.

We started out learning the basic ideas behind Bayes.

- What is Bayesian inference
- Prior, likelihood, and posterior
- Credible intervals

We learned how to harness the power of the Hamiltonian Monte Carlo
method to fit Bayesian models, using the Stan language via the R package
**brms**. Specifically, we learned how to:

- Fit a multilevel (mixed) model with random intercepts and random slopes
- Specify different priors
- Diagnose and deal with convergence problems
- Plot model parameters and predictions
- Compare models with information criteria
- Do hypothesis testing with Bayesian analogues of p-values

Of course, we only scratched the surface of those complex topics. But if you don't remember any of that, now would be a good time to review Lesson 1.

After teaching Lesson 1, I got feedback from the participants. People asked me to teach them how to fit mixed models that correspond to common study designs in agricultural science. I was also asked to take a deeper dive into prior distributions and show how specifying different priors can affect the conclusions we come to. I designed Lessons 2 and 3 based on that feedback. In Lesson 2, you are going to learn how to:

- Specify strongly and weakly informative priors and see how they may influence your results
- Fit a Bayesian generalized linear model (GLM), for situations where you can't assume normally distributed error in your model
- Fit a Bayesian generalized linear mixed model (GLMM), to accommodate non-normal data and a mixture of fixed and random effects

You will learn some more practical skills for fitting Bayesian models and talking about the results in papers or presentations. We've seen these packages already in Lesson 1, but we will keep practicing with them. In this lesson we will:

- Use the **emmeans** package to make predictions from Bayesian models
- Use the **bayestestR** package for Bayesian hypothesis testing
- Use the **tidybayes**, **ggdist**, and **ggplot2** packages to make nice plots of Bayesian model predictions
- Use the **gt** package to make nice tables of model predictions

Load all the packages and set a plotting theme. Also define a colorblind-friendly color scale. Create a directory called `fits` in your current working directory to store fitted model objects, if it does not already exist.

```
# Model fitting and example data
library(brms)
library(agridat)
library(tidyverse)

# Post-processing: predictions, hypothesis tests, plots, and tables
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)

theme_set(theme_bw())

# Okabe-Ito palette minus black, applied to both colour and fill aesthetics
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]),
                                          aesthetics = c('colour', 'fill'))

# Directory to store fitted model objects
if (!dir.exists('fits')) dir.create('fits')
```

Set **brms** options. The option `mc.cores = 4` means we will use four processor cores in parallel to fit the models, allowing four Markov chains to sample simultaneously. The option `brms.file_refit = 'on_change'` means that if a model has already been fit and saved to a file, and the code and data are unchanged, the model will be reloaded from the file instead of being fit again. If you are running this lesson on the cloud server, there are some pre-fit model objects to save time during the workshop.

`options(mc.cores = 4, brms.file_refit = 'on_change')`

If you are running this code on the Posit Cloud server, set these options instead of the ones above. This will use the faster and less memory-intensive **cmdstanr** package to fit the models behind the scenes. Unfortunately, the CmdStan software that **cmdstanr** uses is difficult to install on USDA laptops, but it can be done. Check out instructions for installing CmdStan here, and if those don't work, instructions for a workaround to install CmdStan using Anaconda. If you end up wanting to run Bayesian models on your own data after this workshop, another option is to run **brms** code on the SciNet cluster, where the **cmdstanr** package can easily be configured.

`options(mc.cores = 4, brms.file_refit = 'on_change', brms.backend = 'cmdstanr')`

Before we dive into more complex models, let's review Bayesian statistics a little bit. Don't feel bad if it takes you a few times to really grasp the concepts. Most people need a lot of repetition before these ideas become second nature.

Let's start by going over Bayes' theorem again.

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

This means the **probability of A being true given that B is true** (\(P(A|B)\)) is equal to the **probability that B is true given that A is true** (\(P(B|A)\)) times the **ratio of the probabilities that A and B are true** (\(\frac{P(A)}{P(B)}\)). Using Bayes' theorem in the context of fitting a statistical model to data, we can state this as:

\[P(model|data) = \frac {P(data|model)P(model)}{P(data)}\]

or

\[posterior \propto likelihood \times prior\]

What does this mean? It means that we have some prior knowledge about the world. We formally state that knowledge by giving a statistical distribution of what we think the parameters of the statistical model describing the world are *without having collected any data yet*. This is called the **prior**, which in Bayes' theorem is represented by \(P(model)\) because it's the probability of the model not dependent on the data. Then, we go out and collect some data. We use the data to calculate \(P(data|model)\), which we call the **likelihood**. That's a function that depends on the data. It represents how likely it was to observe the data we did, given all possible combinations of parameters we could have in our model. We divide by the **marginal probability**, \(P(data)\), which is a normalizing constant that makes the posterior a proper probability distribution with an area under the curve of 1. (In a probability distribution, the probabilities of all possible mutually exclusive outcomes have to sum to 1, or 100% probability.) Then, when we multiply the prior and likelihood together, we get \(P(model|data)\), which we call the **posterior**. That is our new knowledge of the world, after collecting the data, represented by a new statistical distribution for our parameter estimates. Because it is the product of the prior and the likelihood, it represents a combination of what we knew before we got the data (the prior), updated by the new information we got from the data (the likelihood).
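You can see "posterior ∝ likelihood × prior" in action with a few lines of base R, no Stan required. This is a minimal grid-approximation sketch using a hypothetical example that is not from the lesson data: estimating a germination probability after observing 14 successes out of 20 seeds, with a mild Beta(2, 2) prior centered on 0.5.

```
# Grid of candidate parameter values for the germination probability p
p_grid <- seq(0, 1, length.out = 101)

# Prior: Beta(2, 2), a mild belief that p is somewhere near 0.5
prior <- dbeta(p_grid, 2, 2)

# Likelihood: probability of observing 14 successes in 20 trials at each candidate p
likelihood <- dbinom(14, size = 20, prob = p_grid)

# Posterior: prior times likelihood, normalized so it sums to 1
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)

# Posterior mean; with this conjugate prior it is approximately
# (14 + 2) / (20 + 4), about 0.667
sum(p_grid * posterior)
```

Notice that the normalization step plays the role of dividing by \(P(data)\): it doesn't change the shape of the curve, only its scale. Fitting a model with **brms** does the same updating, just with MCMC sampling instead of a grid.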