A crash course in Bayesian mixed models with brms (Lesson 1)
What is this class?
- A brief and practical introduction to fitting Bayesian multilevel models in R and Stan
- Using brms (Bayesian Regression Models using Stan)
- Quick intro to Bayesian inference
- Mostly practical skills
Minimal prerequisites
- Know what a mixed-effects or multilevel model is
- A little experience with stats and/or data science in R
Advanced prerequisites
- Knowing about the lme4 package will help
- Knowing about tidyverse and ggplot2 will help
How to follow the course
- Slides and text version of lessons are online
- Fill in code in the worksheet (replace ... with code)
- You can always copy and paste code from text version of lesson if you fall behind
Conceptual learning objectives
At the end of this course, you will understand …
- The basics of Bayesian inference
- What a prior, likelihood, and posterior are
- The basics of how Markov Chain Monte Carlo works
- What a credible interval is
Practical learning objectives
At the end of this course, you will be able to …
- Write brms code to fit a multilevel model with random intercepts and random slopes
- Diagnose and deal with convergence problems
- Interpret brms output
- Compare models with leave-one-out (LOO) information criteria
- Use Bayes factors and “Bayesian p-values” to assess strength of evidence for effects
- Make plots of model parameters and predictions with credible intervals
What is Bayesian inference?
What is Bayesian inference?
A method of statistical inference that allows you to use information you already know to assign a prior probability to a hypothesis, then update the probability of that hypothesis as you get more information
- Used in many disciplines and fields
- We’re going to look at how to use it to estimate parameters of statistical models to analyze scientific data
- Powerful, user-friendly, open-source software is making it easier for everyone to go Bayesian
Bayes’ Theorem
![photo of a neon sign of Bayes’ Theorem]()
- Thomas Bayes, 1763
- Pierre-Simon Laplace, 1774
Bayes’ Theorem
\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]
- How likely an event is to happen based on our prior knowledge about conditions related to that event
- The conditional probability of an event A occurring, given that another event B has occurred
Bayes’ Theorem
\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]
The probability of A being true given that B is true (\(P(A|B)\))
is equal to the probability that B is true given that A is true (\(P(B|A)\))
times the ratio of the probability that A is true to the probability that B is true (\(\frac{P(A)}{P(B)}\))
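As a quick illustration, the theorem can be evaluated directly in R. The numbers below are made up for this sketch; only the arithmetic matters:

```r
# Illustrative probabilities (assumed values, not from any real problem)
p_A         <- 0.5   # prior P(A)
p_B_given_A <- 0.8   # likelihood P(B|A)
p_B         <- 0.65  # marginal probability P(B)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B <- p_B_given_A * p_A / p_B
p_A_given_B  # posterior P(A|B), approximately 0.615
```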
Bayes’ theorem and statistical inference
- Let’s say \(A\) is a statistical model (a hypothesis about the world)
- How probable is it that our hypothesis is true?
- \(P(A)\): prior probability that we assign based on our subjective knowledge before we get any data
Bayes’ theorem and statistical inference
- We go out and get some data \(B\)
- \(P(B|A)\): likelihood is the probability of observing that data if our model \(A\) is true
- Use the likelihood to update our estimate of probability of our model
- \(P(A|B)\): posterior probability that model \(A\) is true, given that we observed \(B\).
Bayes’ theorem and statistical inference
\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]
- What about \(P(B)\)?
- marginal probability, the probability of the data
- Basically just a normalizing constant
- If we are comparing two models with the same data, the two \(P(B)\)s cancel out
Restating Bayes’ theorem
\[P(model|data) \propto P(data|model)P(model)\]
\[posterior = likelihood \times prior\]
what we believed before about the world (prior) × how much our new data changes our beliefs (likelihood) = what we believe now about the world (posterior)
Example
- Find a coin on the street. What is our prior estimate of the probability of flipping heads?
- Now we flip 10 times and get 8 heads. What is our belief now?
- Our belief probably doesn’t change much, because we have a strong prior and the likelihood of observing 8/10 heads is still reasonably high if the true probability is 0.5
- Shady character on the street shows us a coin and offers to flip it. He will pay $1 for each tails if we pay $1 for each heads
- What is our prior estimate of the probability?
- He flips 10 times and gets 8 heads. What’s our belief now?
![photo of a magician who can control coin flips]()
- In classical “frequentist” analysis we cannot incorporate prior information into the analysis
- In each case our point estimate of the probability would be 0.8
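The coin example can be sketched with a Beta-Binomial conjugate update in R. The Beta prior parameters below are assumptions chosen to stand in for a "strong" prior (the found coin) versus a flat prior (the shady character's coin):

```r
# Observed data: 8 heads in 10 flips
heads <- 8
flips <- 10

# Found coin: strong prior that p(heads) is near 0.5, encoded as Beta(50, 50)
# (these prior parameters are an assumption for this sketch)
post_mean_found <- (50 + heads) / (50 + 50 + flips)
post_mean_found   # about 0.53 -- barely moved from 0.5

# Shady coin: flat Beta(1, 1) prior, so the data dominate
post_mean_shady <- (1 + heads) / (1 + 1 + flips)
post_mean_shady   # 0.75 -- pulled strongly toward the observed 8/10
```

Same data, different priors, different posteriors: that is the whole point of the example.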
Bayesian vs. frequentist probability
- Probability, in the Bayesian interpretation, includes how uncertain our knowledge of an event is
- Example: Before the 2016 Olympics I said “The probability that Usain Bolt will win the gold medal in the men’s 100 meter dash is 75%.”
- In frequentist analysis, one single event does not have a probability. Either Bolt wins or Bolt loses
- In frequentist analysis, probability is a long-run frequency. We could predict if the 2016 men’s 100m final was repeated many times Bolt would win 75% of them
- But Bayesian probability sees the single event as an uncertain outcome, given our imperfect knowledge
- Calculating Bayesian probability = giving a number to a belief that best reflects the state of your knowledge
Bayes is computationally intensive
- We need to calculate an integral to find \(P(data)\), which we need to get \(P(model|data)\), the posterior
- But the “model” is not just one parameter, it might be 100s or 1000s of parameters
- Need to calculate an integral with 100s or 1000s of dimensions
- For many years, this was computationally not possible
Markov Chain Monte Carlo (MCMC)
- Class of algorithms for sampling from probability distributions
- The longer the chain runs, the closer it gets to the true distribution
- In Bayesian inference, we run multiple Markov chains with different initial values for a preset number of samples
- Discard the initial samples (warmup)
- What remains is our estimate of the posterior distribution
- Multiple chains confirm you reach the same answer regardless of where each chain started
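The steps above can be sketched with a toy random-walk Metropolis sampler (note: this is the simplest MCMC variant, not the HMC algorithm Stan uses). It samples the posterior of a coin's heads probability after 8 heads in 10 flips, with a flat prior:

```r
# Toy random-walk Metropolis sampler (a sketch, not Stan's HMC)
set.seed(1)

log_post <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)   # outside the support of p
  dbinom(8, 10, p, log = TRUE)         # flat prior only adds a constant
}

n_iter <- 5000
chain  <- numeric(n_iter)
p_curr <- 0.5                          # initial value of the chain

for (i in seq_len(n_iter)) {
  p_prop <- p_curr + rnorm(1, 0, 0.1)  # propose a nearby value
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < log_post(p_prop) - log_post(p_curr)) p_curr <- p_prop
  chain[i] <- p_curr
}

mean(chain[-(1:1000)])  # discard warmup; posterior mean is near 9/12 = 0.75
```

Running several such chains from different starting values, and checking that they agree after warmup, is exactly the convergence check described above.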
Hamiltonian Monte Carlo (HMC) and Stan
![Stan software logo]()
- HMC is among the fastest and most efficient MCMC algorithms developed to date
- It’s implemented in software called Stan
What is brms?
![brms software logo]()
- An easy way to fit Bayesian mixed models using Stan in R
- The syntax of brms models is just like lme4’s
- Runs a Stan model behind the scenes
- Automatically assigns sensible priors and does lots of tricks to speed up HMC convergence
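A sketch of what a brms call looks like. The variable names (`y`, `x`, `group`) and the data frame `mydata` are placeholders, not from any real dataset; the formula syntax is the lme4 style mentioned above:

```r
library(brms)

# Random intercepts and random slopes of x, varying by group
# (y, x, group, and mydata are hypothetical placeholders)
fit <- brm(
  y ~ x + (1 + x | group),
  data   = mydata,
  family = gaussian(),
  chains = 4,      # four Markov chains from different starting values
  iter   = 2000    # per chain; the first half is discarded as warmup by default
)

summary(fit)
```

Behind the scenes, brms translates this formula into Stan code, compiles it, and runs HMC.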
Bayes Myths: Busted!
![Myth Busted]()
- Myth 1. Bayes is confusing
- Myth 2. Bayes is subjective
- Myth 3. Bayes takes too long
- Myth 4. You can’t be both Bayesian and frequentist
Myth 1: Bayes is confusing — BUSTED!