A crash course in Bayesian mixed models with brms (Lesson 1)

What is this class?

  • A brief and practical introduction to fitting Bayesian multilevel models in R and Stan
  • Using brms (Bayesian Regression Models using Stan)
  • Quick intro to Bayesian inference
  • Mostly practical skills

Minimal prerequisites

  • Know what a mixed-effects or multilevel model is
  • A little experience with stats and/or data science in R

Advanced prerequisites

  • Knowing about the lme4 package will help
  • Knowing about tidyverse and ggplot2 will help

How to follow the course

  • Slides and text version of lessons are online
  • Fill in code in the worksheet (replace ... with code)
  • You can always copy and paste code from text version of lesson if you fall behind

Conceptual learning objectives

At the end of this course, you will understand …

  • The basics of Bayesian inference
  • What a prior, likelihood, and posterior are
  • The basics of how Markov Chain Monte Carlo works
  • What a credible interval is

Practical learning objectives

At the end of this course, you will be able to …

  • Write brms code to fit a multilevel model with random intercepts and random slopes
  • Diagnose and deal with convergence problems
  • Interpret brms output
  • Compare models with LOO information criteria
  • Use Bayes factors and “Bayesian p-values” to assess strength of evidence for effects
  • Make plots of model parameters and predictions with credible intervals

What is Bayesian inference?

portrait presumed to be Thomas Bayes 

What is Bayesian inference?

A method of statistical inference that allows you to use information you already know to assign a prior probability to a hypothesis, then update that probability as you get more information

  • Used in many disciplines and fields
  • We’re going to look at how to use it to estimate parameters of statistical models to analyze scientific data
  • Powerful, user-friendly, open-source software is making it easier for everyone to go Bayesian

Bayes’ Theorem

photo of a neon sign of Bayes’ Theorem

  • Thomas Bayes, 1763
  • Pierre-Simon Laplace, 1774

Bayes’ Theorem

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

  • How likely an event is to happen, based on our prior knowledge of conditions related to that event
  • The conditional probability of an event A occurring, given that another event B has occurred

Bayes’ Theorem

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

The probability of A being true given that B is true (\(P(A|B)\))
is equal to the probability that B is true given that A is true (\(P(B|A)\))
times the ratio of probabilities that A and B are true (\(\frac{P(A)}{P(B)}\))
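As a sketch, plugging made-up numbers into the theorem is one line of R (the values of \(P(A)\), \(P(B|A)\), and \(P(B)\) below are arbitrary, chosen only for illustration):

```r
# Hypothetical probabilities for illustration only
p_A <- 0.3          # prior probability that A is true
p_B_given_A <- 0.8  # probability of observing B if A is true
p_B <- 0.5          # marginal probability of B

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B <- p_B_given_A * p_A / p_B
p_A_given_B  # 0.48
```

Here observing B raised the probability of A from 0.3 to 0.48, because B is more likely under A than it is overall.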

Bayes’ theorem and statistical inference

  • Let’s say \(A\) is a statistical model (a hypothesis about the world)
  • How probable is it that our hypothesis is true?
  • \(P(A)\): prior probability that we assign based on our subjective knowledge before we get any data

Bayes’ theorem and statistical inference

  • We go out and get some data \(B\)
  • \(P(B|A)\): likelihood is the probability of observing that data if our model \(A\) is true
  • Use the likelihood to update our estimate of probability of our model
  • \(P(A|B)\): posterior probability that model \(A\) is true, given that we observed \(B\).

Bayes’ theorem and statistical inference

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

  • What about \(P(B)\)?
  • The marginal probability: the overall probability of observing the data
  • Basically just a normalizing constant
  • If we are comparing two models with the same data, the two \(P(B)\)s cancel out

Restating Bayes’ theorem

\[P(model|data) \propto P(data|model)P(model)\]

\[posterior = likelihood \times prior\]

what we believed before about the world (prior) × how much our new data changes our beliefs (likelihood) = what we believe now about the world (posterior)

Example

  • Find a coin on the street. What is our prior estimate of the probability of flipping heads?
  • Now we flip 10 times and get 8 heads. What is our belief now?
  • Probably doesn’t change much because we have a strong prior and the likelihood of probability = 0.5 is still high enough even if we see 8/10 heads
  • Shady character on the street shows us a coin and offers to flip it. He will pay $1 for each tails if we pay $1 for each heads
  • What is our prior estimate of the probability?
  • He flips 10 times and gets 8 heads. What’s our belief now?

photo of a magician who can control coin flips

  • In classical “frequentist” analysis, we cannot incorporate prior information
  • In each case our point estimate of the probability would be 0.8
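The coin example can be sketched numerically with a simple grid approximation of posterior ∝ likelihood × prior. The specific priors below are assumptions for illustration: a strong Beta(50, 50) prior for the coin we found on the street, and a flat Beta(1, 1) prior for the shady character's coin.

```r
# Grid approximation of the posterior for P(heads), given 8 heads in 10 flips
p <- seq(0, 1, by = 0.001)                    # candidate values of P(heads)
likelihood <- dbinom(8, size = 10, prob = p)  # probability of the data at each p

# Strong prior centered on 0.5 (a coin found on the street)
prior_strong <- dbeta(p, 50, 50)
post_strong <- likelihood * prior_strong
post_strong <- post_strong / sum(post_strong)  # normalize

# Flat prior (the shady character's coin)
prior_flat <- dbeta(p, 1, 1)
post_flat <- likelihood * prior_flat
post_flat <- post_flat / sum(post_flat)

# Posterior means: barely moves off 0.5 under the strong prior,
# but is pulled close to the frequentist estimate under the flat prior
sum(p * post_strong)  # about 0.53
sum(p * post_flat)    # about 0.75
```

The same 8/10 data moves our belief very little when the prior is strong, and a lot when it is flat.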

Bayesian vs. frequentist probability

Usain Bolt, the GOAT

  • Probability, in the Bayesian interpretation, includes how uncertain our knowledge of an event is
  • Example: Before the 2016 Olympics I said “The probability that Usain Bolt will win the gold medal in the men’s 100 meter dash is 75%.”
  • In frequentist analysis, one single event does not have a probability. Either Bolt wins or Bolt loses
  • In frequentist analysis, probability is a long-run frequency. We could predict that if the 2016 men’s 100 m final were repeated many times, Bolt would win 75% of them
  • But Bayesian probability sees the single event as an uncertain outcome, given our imperfect knowledge
  • Calculating Bayesian probability = giving a number to a belief that best reflects the state of your knowledge

Bayes is computationally intensive

  • We need to calculate an integral to find \(P(data)\), which we need to get \(P(model|data)\), the posterior
  • But the “model” is not just one parameter, it might be 100s or 1000s of parameters
  • Need to calculate an integral with 100s or 1000s of dimensions
  • For many years, this was computationally not possible

Markov Chain Monte Carlo (MCMC)

  • Class of algorithms for sampling from probability distributions
  • The longer the chain runs, the closer its samples get to the target distribution
  • In Bayesian inference, we run multiple Markov chains with different initial values for a preset number of samples
  • Discard the initial samples (warmup)
  • What remains is our estimate of the posterior distribution
  • Multiple chains confirm you reach the same answer regardless of where each chain started
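The steps above can be sketched with the simplest MCMC algorithm, random-walk Metropolis (not the HMC that Stan uses). This toy sampler draws from the Beta(9, 3) posterior of the 8-heads-in-10-flips coin example; the proposal step size and warmup length are arbitrary choices for illustration.

```r
# Toy random-walk Metropolis sampler for P(heads) ~ Beta(9, 3)
set.seed(1)
log_post <- function(p) dbeta(p, 9, 3, log = TRUE)  # log posterior density

n_iter <- 10000
chain <- numeric(n_iter)
chain[1] <- 0.5                                     # initial value
for (i in 2:n_iter) {
  proposal <- chain[i - 1] + rnorm(1, sd = 0.1)     # propose a nearby value
  # Accept with probability min(1, posterior ratio); otherwise stay put
  if (proposal > 0 && proposal < 1 &&
      log(runif(1)) < log_post(proposal) - log_post(chain[i - 1])) {
    chain[i] <- proposal
  } else {
    chain[i] <- chain[i - 1]
  }
}
samples <- chain[-(1:2000)]  # discard warmup samples
mean(samples)                # close to the true posterior mean, 9/12 = 0.75
```

Running several chains from different starting values and getting the same answer is the multiple-chain check described above.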

Hamiltonian Monte Carlo (HMC) and Stan

Stan software logo

  • HMC is among the fastest and most efficient MCMC algorithms developed to date
  • It’s implemented in software called Stan

What is brms?

brms software logo

  • An easy way to fit Bayesian mixed models using Stan in R
  • Syntax of brms models is just like lme4
  • Runs a Stan model behind the scenes
  • Automatically assigns sensible priors and does lots of tricks to speed up HMC convergence
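As a preview, a brms model call looks just like an lme4 call. The data frame `dat`, response `y`, predictor `x`, and grouping variable `group` below are hypothetical placeholders, and the sampler settings shown are the brms defaults:

```r
library(brms)

# Random intercepts and random slopes of x by group.
# brm() writes, compiles, and runs a Stan model behind the scenes.
fit <- brm(
  y ~ x + (1 + x | group),
  data = dat,
  chains = 4, iter = 2000, warmup = 1000  # default MCMC settings, for clarity
)

summary(fit)  # posterior summaries and convergence diagnostics
```

Compare `lme4::lmer(y ~ x + (1 + x | group), data = dat)`: the formula syntax is identical; only the fitting engine changes.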

Bayes Myths: Busted!

Myth Busted

  • Myth 1. Bayes is confusing
  • Myth 2. Bayes is subjective
  • Myth 3. Bayes takes too long
  • Myth 4. You can’t be both Bayesian and frequentist

Myth 1: Bayes is confusing — BUSTED!