A crash course in Bayesian mixed models with brms (Lesson 1)

What is this class?

  • A brief and practical introduction to fitting Bayesian multilevel models in R and Stan
  • Using brms (Bayesian Regression Models using Stan)
  • Quick intro to Bayesian inference
  • Mostly practical skills

Minimal prerequisites

  • Know what a mixed-effects or multilevel model is
  • A little experience with stats and/or data science in R
  • Vague knowledge of what Bayesian stats are

Advanced prerequisites

  • Knowing about the lme4 package will help
  • Knowing about tidyverse and ggplot2 will help

How to follow the course

  • Slides and text version of lessons are online
  • Fill in code in the worksheet (replace ... with code)
  • You can always copy and paste code from text version of lesson if you fall behind

Conceptual learning objectives

At the end of this course, you will understand …

  • The basics of Bayesian inference
  • What a prior, likelihood, and posterior are
  • The basics of how Markov Chain Monte Carlo works
  • What a credible interval is

Practical learning objectives

At the end of this course, you will be able to …

  • Write brms code to fit a multilevel model with random intercepts and random slopes
  • Diagnose and deal with convergence problems
  • Interpret brms output
  • Compare models with LOO information criteria
  • Use Bayes factors and “Bayesian p-values” to assess strength of evidence for effects
  • Make plots of model parameters and predictions with credible intervals

What is Bayesian inference?

portrait presumed to be Thomas Bayes 

What is Bayesian inference?

A method of statistical inference that allows you to use information you already know to assign a prior probability to a hypothesis, then update the probability of that hypothesis as you get more information

  • Used in many disciplines and fields
  • We’re going to look at how to use it to estimate parameters of statistical models to analyze scientific data
  • Powerful, user-friendly, open-source software is making it easier for everyone to go Bayesian

Bayes’ Theorem

photo of a neon sign of Bayes’ Theorem

  • Thomas Bayes, 1763
  • Pierre-Simon Laplace, 1774

Bayes’ Theorem

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

  • How likely an event is to happen based on our prior knowledge about conditions related to that event
  • The conditional probability of an event A occurring, given that another event B has occurred

Bayes’ Theorem

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

The probability of A being true given that B is true (\(P(A|B)\))
is equal to the probability that B is true given that A is true (\(P(B|A)\))
times the ratio of probabilities that A and B are true (\(\frac{P(A)}{P(B)}\))

Bayes’ theorem and statistical inference

  • Let’s say \(A\) is a statistical model (a hypothesis about the world)
  • How probable is it that our hypothesis is true?
  • \(P(A)\): prior probability that we assign based on our subjective knowledge before we get any data

Bayes’ theorem and statistical inference

  • We go out and get some data \(B\)
  • \(P(B|A)\): the likelihood, which is the probability of observing those data if our model \(A\) is true
  • Use the likelihood to update our estimate of probability of our model
  • \(P(A|B)\): posterior probability that model \(A\) is true, given that we observed \(B\).

Bayes’ theorem and statistical inference

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

  • What about \(P(B)\)?
  • marginal probability, the probability of the data
  • Basically just a normalizing constant
  • If we are comparing two models with the same data, the two \(P(B)\)s cancel out

Restating Bayes’ theorem

\[P(model|data) \propto P(data|model)P(model)\]

\[posterior \propto likelihood \times prior\]

what we believed before about the world (prior) × how much our new data changes our beliefs (likelihood) = what we believe now about the world (posterior)

Example

  • Find a coin on the street. What is our prior estimate of the probability of flipping heads?
  • Now we flip 10 times and get 8 heads. What is our belief now?
  • Probably doesn’t change much, because we have a strong prior that coins found on the street are fair, and 8 heads in 10 flips is still reasonably likely for a fair coin
  • Shady character on the street shows us a coin and offers to flip it. He will pay $1 for each tails if we pay $1 for each heads
  • What is our prior estimate of the probability?
  • He flips 10 times and gets 8 heads. What’s our belief now?

photo of a magician who can control coin flips

  • In classical “frequentist” analysis we cannot incorporate prior information into the analysis
  • In each case our point estimate of the probability would be 0.8
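
To make the updating concrete, here is a minimal sketch in R. It assumes we encode our belief about P(heads) as a Beta distribution, the conjugate prior for coin flips: after observing h heads in n flips, a Beta(a, b) prior updates to a Beta(a + h, b + n - h) posterior.

# Strong prior (coin found on the street): Beta(50, 50) is tightly centered on 0.5
h <- 8; n <- 10
a <- 50; b <- 50
(a + h) / (a + b + n)   # posterior mean of P(heads) is ~0.53, barely moved

# Weak prior (shady character's coin): Beta(1, 1) is flat
a <- 1; b <- 1
(a + h) / (a + b + n)   # posterior mean of P(heads) is 0.75, moved a lot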

Bayesian vs. frequentist probability

Usain Bolt, the GOAT

  • Probability, in the Bayesian interpretation, includes how uncertain our knowledge of an event is
  • Example: Before the 2016 Olympics I said “The probability that Usain Bolt will win the gold medal in the men’s 100 meter dash is 75%.”
  • In frequentist analysis, one single event does not have a probability. Either Bolt wins or Bolt loses
  • In frequentist analysis, probability is a long-run frequency. We could predict if the 2016 men’s 100m final was repeated many times Bolt would win 75% of them
  • But Bayesian probability sees the single event as an uncertain outcome, given our imperfect knowledge
  • Calculating Bayesian probability = giving a number to a belief that best reflects the state of your knowledge

Bayes is computationally intensive

  • We need to calculate an integral to find \(P(data)\), which we need to get \(P(model|data)\), the posterior
  • But the “model” is not just one parameter, it might be 100s or 1000s of parameters
  • Need to calculate an integral with 100s or 1000s of dimensions
  • For many years, this was computationally not possible

Markov Chain Monte Carlo (MCMC)

  • Class of algorithms for sampling from probability distributions
  • The longer the chain runs, the closer it gets to the true distribution
  • In Bayesian inference, we run multiple Markov chains for a preset number of samples
  • Discard the initial samples (warmup)
  • What remains is our estimate of the posterior distribution
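
To make this concrete, here is a toy Metropolis sampler in R (a much simpler algorithm than the one Stan uses) that draws samples from a standard normal distribution:

# Toy Metropolis sampler targeting a standard normal distribution.
# Each step proposes a random move and accepts it with probability
# equal to the ratio of target densities (capped at 1).
set.seed(1)
n_iter <- 5000
samples <- numeric(n_iter)
x <- 0
for (i in 1:n_iter) {
  proposal <- x + rnorm(1, mean = 0, sd = 0.5)
  if (runif(1) < dnorm(proposal) / dnorm(x)) x <- proposal
  samples[i] <- x
}
# Discard warmup; the remaining samples approximate the target distribution
hist(samples[-(1:1000)])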

Hamiltonian Monte Carlo (HMC) and Stan

Stan software logo

  • HMC is among the fastest and most efficient MCMC algorithms developed to date
  • It’s implemented in software called Stan

What is brms?

brms software logo

  • An easy way to fit Bayesian mixed models using Stan in R
  • Syntax of brms models is just like lme4
  • Runs a Stan model behind the scenes
  • Automatically assigns sensible priors and does lots of tricks to speed up HMC convergence

Bayes Myths: Busted!

Myth Busted

  • Myth 1. Bayes is confusing
  • Myth 2. Bayes is subjective
  • Myth 3. Bayes takes too long
  • Myth 4. You can’t be both Bayesian and frequentist

Myth 1: Bayes is confusing — BUSTED!

man confused by math

  • Yes, it can be confusing at first
  • Largely because frequentist methods were, and still often are, the only ones taught, so Bayesian methods are unfamiliar
  • Frequentist methods can be just as confusing (they focus on rejecting null hypotheses and require you to imagine a hypothetical population, which may not exist, to calculate long-run frequencies of events)

Myth 2: Bayes is subjective — BUSTED!

Mr. Spock the Vulcan, played by Leonard Nimoy

  • A very old misconception that dates back at least to R. A. Fisher
  • Bayesian analysis is no more subjective than frequentist
  • All statistical approaches make assumptions and require prior knowledge
  • Incorporating prior knowledge is not a bias to be avoided, but a benefit we should embrace
  • Selectively ignoring prior information = selectively cherry-picking data to include in an analysis

Myth 3: Bayes takes too long — BUSTED! (sort of)

slow sampling

  • There’s no denying that these models take a lot of computing time and memory
  • Two ways around it
    • Fast algorithms for Monte Carlo sampling that quickly converge on the correct solution
    • Algorithms that approximate the solution of the integral without doing full MCMC sampling
  • brms uses the first approach, check out INLA for the second approach

Myth 4: You can’t be both Bayesian and frequentist — BUSTED!

Why don’t we have both meme

  • Each statistical approach is a tool in your toolkit, not a dogma to blindly follow
  • The differences between Bayesian and frequentist methods are often exaggerated
  • Model selection, cross-validation, and penalized regression, used in frequentist analysis, have similar effects to use of priors in Bayesian analysis
  • With the rise of machine learning the boundaries between these traditional approaches are blurring

Why use Bayes?

  • Some models just can’t be fit with frequentist maximum-likelihood methods
  • Adding priors helps the computational algorithms converge on the correct answer
  • Estimate how big effects are instead of yes-or-no framework of rejecting a null hypothesis
  • We can say “the probability of something being between a and b is 95%” instead of “if we ran this experiment many times, the estimate would be between a and b 95% of the time.”

Let’s finally fit some Bayesian models!

Setup

Load packages.

library(brms)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(easystats)
library(logspline)

Set plotting theme. Create directory for model fits.

theme_set(theme_minimal())
if (!dir.exists('fits')) dir.create('fits')

Set brms options

options(brms.backend = 'cmdstanr', mc.cores = 4, brms.file_refit = 'on_change')

If not on the cloud server and you don’t have cmdstanr installed, don’t set 'cmdstanr' as the backend

options(mc.cores = 4, brms.file_refit = 'on_change')

Example data: Caribbean maize experiment

Bayes on the Maize

The data

Read the simulated yield dataset from CSV

stvincent <- read_csv('data/stvincent.csv')

If not on cloud server, read from GitHub URL

stvincent <- read_csv('https://usda-ree-ars.github.io/SEAStats/brms_crash_course/stvincent.csv')

Examine the data

St. Vincent island 

  • Experiment testing N, P, and K fertilization effects on maize yield
  • Done at 9 sites on the Caribbean island of St. Vincent
  • Incomplete block design, 4 blocks (reps) per site
  • Modified from example dataset included in agridat package

Column descriptions

  • site: character ID for each site (environment)
  • block: character ID for each block (rep), 'B1' through 'B4'. Nested within site
  • plot: integer ID for each plot
  • N: character variable with four levels of nitrogen addition treatment from 0N to 3N.
  • P: character variable with four levels of phosphorus addition treatment from 0P to 3P.
  • K: character variable with four levels of potassium addition treatment from 0K to 3K.
  • yield: continuous outcome variable: maize yield of each plot
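
For example, take a quick look at the data structure:

glimpse(stvincent)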

Exploratory plots

  • Look at relationship between yield and N using boxplots
  • I will not explain ggplot2 code for now
(yield_vs_N <- ggplot(data = stvincent, aes(x = N, y = yield)) +
  geom_boxplot() +
  ggtitle('Yield by N fertilization treatment'))

Take multilevel structure of data into account (separate by site).

yield_vs_N +
  facet_wrap(~ site) +
  ggtitle('Yield by N fertilization treatment', 'separately by site')

Fitting models

For reference this is mixed model syntax from lme4 package:

lmer(yield ~ 1 + (1 | site) + (1 | block:site), data = stvincent)
  • Dependent or response variable (yield) on left side
  • Tilde ~ separates dependent from independent variables
  • Here the only fixed effect is the global intercept (1)
  • Random effects specification ((1 | site)) has a design side (left of the |) and a group side (right of the |)
  • In this case, the 1 on the design side means only fit random intercepts and no random slopes
  • site on the group side means each site will get its own random intercept
  • block:site means each unique combination of block and site will get its own random intercept

Our first Bayesian multilevel model!

fit_interceptonly <- brm(yield ~ 1 + (1 | site) + (1 | block:site),
                         data = stvincent,
                         chains = 2,
                         iter = 200,
                         warmup = 100,
                         init = 'random',
                         seed = 1)
  • Same formula as lme4 but with some extra instructions for the HMC sampler
    • Number of Markov chains
    • Iterations for each chain
    • How many iterations to discard as warmup
    • Random initial values
    • No priors specified, so defaults are used

Model output

summary(fit_interceptonly)
  • Look at Rhat value for each parameter
  • Rhat indicates convergence of MCMC chains, approaching 1 at convergence
  • Rhat < 1.01 is ideal

Check convergence: R-hat

max(rhat(fit_interceptonly))
  • Some Rhat values are > 1.05

Check convergence: trace plots

plot(fit_interceptonly)

Posterior distributions and trace plots for

  • the fixed effect intercept (b_Intercept)
  • the standard deviation of the random block intercepts (sd_block:site__Intercept)
  • the standard deviation of the random site intercepts (sd_site__Intercept)
  • the standard deviation of the model’s residuals (sigma)

Dealing with convergence problems

  • Either increase iterations or set stronger priors
  • Increase iterations to 1000 warmup, 1000 sampling per chain (2000 total)
  • update() lets us draw more samples without recompiling code
fit_interceptonly_moresamples <- update(fit_interceptonly, chains = 2, iter = 2000, warmup = 1000)

plot(fit_interceptonly_moresamples)

summary(fit_interceptonly_moresamples)

Note on terminology

  • brms says “group-level effects” and “population-level effects”
  • More commonly called “random effects” and “fixed effects”
  • Everything is a random variable in a Bayesian analysis so some people try to avoid the fixed/random terminology
  • I will still call them fixed and random

Credible intervals

  • What is the “95% CI” thing on the parameter summaries?
  • credible interval, not confidence interval
  • more direct interpretation than confidence interval:
    • We are 95% sure the parameter’s value is in the 95% credible interval
  • based on quantiles of the posterior distribution

Calculating credible intervals

Median and 90% quantile-based credible interval (QI) of the intercept

post_samples <- as_draws_df(fit_interceptonly_moresamples)
post_samples_intercept <- post_samples$b_Intercept

median(post_samples_intercept)
quantile(post_samples_intercept, c(0.05, 0.95))

as_draws_df() gets all posterior samples for all parameters and puts them into a data frame

  • Literally anything in a Bayesian model has a posterior distribution, so anything can have a credible interval!
  • In frequentist models, you have to do bootstrapping to get that kind of interval on most quantities
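
For example, the residual standard deviation sigma gets a credible interval the same way:

# 95% quantile-based credible interval for the residual SD
quantile(post_samples$sigma, c(0.025, 0.975))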

Variance decomposition

  • Proportion of variation at different nested levels
  • ICC (intraclass correlation coefficient) for each random effect level
icc(fit_interceptonly_moresamples, by_group = TRUE)
  • Proportion of variance for site is very high, so a multilevel model is warranted
  • I would recommend including both block and site, if that’s the way your study was designed
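
As a rough check, here is a sketch of the site-level ICC computed by hand from the posterior draws of the variance components (site variance divided by total variance), which also gives it a credible interval:

# Site-level ICC for each posterior draw
post <- as_draws_df(fit_interceptonly_moresamples)
icc_site <- post$sd_site__Intercept^2 /
  (post$sd_site__Intercept^2 + post$`sd_block:site__Intercept`^2 + post$sigma^2)
median_qi(icc_site)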

Mixed-effects model with a single fixed effect

  • So far we have only calculated mean of yield and random variation by site and block
  • But what factors influence yield?
  • Add fixed effect of N fertilization treatment
  • Fixed-effect part of model formula is 1 + N
  • No random slope (we assume the effect of N fertilization on yield is the same at each site)
  • Still using default priors
fit_fixedN <- brm(yield ~ 1 + N + (1 | site) + (1 | block:site),
                  data = stvincent,
                  chains = 4,
                  iter = 2000,
                  warmup = 1000,
                  seed = 2,
                  file = 'fits/fit_fixedN')
  • seed sets a random seed for reproducibility
  • file creates a .rds file in the working directory so you can reload the model later without rerunning
  • Good practice to use 4 Markov chains — we can be more confident of our answer if all 4 converge
summary(fit_fixedN)
  • Low Rhat (the model converged)
  • Posterior distribution mass for fixed effects is well above zero
  • sigma (SD of residuals) is smaller than before because we’re explaining more variation

Modifying priors

  • prior_summary() shows what priors were used to fit the model
prior_summary(fit_fixedN)
  • t-distributions on intercept, random effect SD, and residual SD (sigma)
  • the intercept prior is centered on the median of the response data
  • the SD parameters’ priors are centered at 0 but bounded below at 0 (half bell curves)

Priors on fixed effect slopes

  • By default they are flat
  • Assigns equal prior probability to any possible value
  • 0 is as probable as 100000 which is as probable as -55555, etc.
  • Not very plausible
  • It is OK in this case because the model converged, but often it helps convergence to use priors that only allow “reasonable” values

Refitting with reasonable fixed-effect priors

  • normal(0, 10) is a reasonable choice on the scale of these data
  • Mean of 0 means we think that positive effects are just as likely as negative
  • SD of 10 means we are still assigning pretty high probability to large effect sizes
  • Use prior() to assign a prior to each class of parameters
fit_fixedN_priors <- brm(yield ~ 1 + N + (1 | site) + (1 | block:site),
                         data = stvincent,
                         prior = c(prior(normal(0, 10), class = b)),
                         chains = 4,
                         iter = 2000,
                         warmup = 1000,
                         seed = 2,
                         file = 'fits/fit_fixedN_priors')
summary(fit_fixedN_priors)
  • Very modest but noticeable effect on the fixed effect estimates
  • Always report what priors you used!

Posterior predictive check

  • pp_check() is a useful diagnostic for how well the model fits the data
pp_check(fit_fixedN_priors)
  • black line: density plot of observed data
  • blue lines: density plot of predicted data from 10 random draws from the posterior distribution
  • Fairly good but the data have two humps that the model doesn’t capture
  • That may be OK because we don’t want to overfit
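
pp_check() overlays 10 posterior draws by default; asking for more gives a better picture of the predictive distribution (ndraws is the argument name in recent brms versions):

pp_check(fit_fixedN_priors, ndraws = 100)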

Plotting posterior estimates

  • summary() only gives us a point estimate and 95% credible interval
  • We can work with the full uncertainty distribution (nothing special about 95%)
  • Functions from tidybayes used to make tables and plots
  • gather_draws() makes a data frame from the posterior samples of parameters that we choose
posterior_slopes <- gather_draws(fit_fixedN_priors, b_Intercept, b_N1N, b_N2N, b_N3N)
  • median_qi() gives us median and quantiles of the parameters
posterior_slopes %>%
  median_qi(.width = c(.66, .95, .99))

Estimated marginal means

  • We have intercept (mean of control group) and three slopes (difference between control and each other level)
  • We can use them to get predictions of mean yield in each of the three other treatment levels
  • Marginal mean is estimated for each draw from the posterior distribution

Estimated marginal means: example “by hand”

  • Add posterior samples of intercept (mean of 0 N treatment) and slope for 1 N (difference between 0 N and 1 N) to get mean of 1 N treatment
  • Calculate median and 95% quantile credible interval
posterior_mean <- pivot_wider(posterior_slopes, names_from = .variable, values_from = .value) %>%
  mutate(mean_1N = b_Intercept + b_N1N)

median_qi(posterior_mean$mean_1N)

Estimated marginal means: use emmeans()

  • First argument is the model fit
  • Second argument ~ N tells it to get means for each N level
N_emmeans <- emmeans(fit_fixedN_priors, ~ N)

Differences between each pair of means

  • Subtract the posterior samples of one mean from another
  • Done with contrast() function, specifying method = 'pairwise' to get all pairwise comparisons
contrast(N_emmeans, method = 'pairwise')
  • Special extensions to ggplot2 for plotting quantiles of posterior distribution
  • gather_emmeans_draws() is used to put posterior samples from emmeans object into a dataframe
post_emm_draws <- gather_emmeans_draws(N_emmeans)

ggplot(post_emm_draws, aes(y = N, x = .value)) +
  stat_halfeye(.width = c(.8, .95)) +
  labs(x = 'estimated marginal mean yield')
ggplot(post_emm_draws, aes(y = N, x = .value)) +
  stat_interval() +
  stat_summary(fun = median, geom = 'point', size = 2) +
  scale_color_brewer(palette = 'Blues') +
  labs(x = 'estimated marginal mean yield')

Model with multiple fixed effects

  • Fixed-effect part is now 1 + N + P + K
fit_fixedNPK <- brm(yield ~ 1 + N + P + K + (1 | site) + (1 | block:site),
                    data = stvincent,
                    prior = c(prior(normal(0, 10), class = b)),
                    chains = 4,
                    iter = 2000,
                    warmup = 1000,
                    seed = 3,
                    file = 'fits/fit_fixedNPK')

Look at trace plots, posterior predictive check, and model summary.

  • What do they show?
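
For example:

plot(fit_fixedNPK)
pp_check(fit_fixedNPK)
summary(fit_fixedNPK)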

Model with random slopes

  • So far we’ve assumed any predictor’s effect is the same at every site (only intercept varies, not slope)
  • Add random slope term to allow both intercept and slope to vary
  • Specify a random slope by adding the appropriate slope to the design side of the random effect specification
    • random intercept only (1 | site)
    • random intercept and random slope with respect to N (1 + N | site)
    • random intercept and random slope with respect to N, P, and K (1 + N + P + K | site)
  • Not enough data for block-level random slopes
fit_randomNPKslopes <- brm(yield ~ 1 + N + P + K + (1 + N + P + K | site) + (1 | block:site),
                           data = stvincent,
                           prior = c(prior(normal(0, 10), class = b)),
                           chains = 4,
                           iter = 2000,
                           warmup = 1000,
                           seed = 4,
                           file = 'fits/fit_randomNPKslopes')
  • Look at trace plots, pp_check, and model summary
  • Lots of parameters, because the random intercept and the three dummy-coded slope contrasts for each of N, P, and K all have variances and covariances

Predictions for N response from random slope model

  • We’re now averaging over random intercepts by site, random slopes by site, block effects within site, and the other fixed effects
N_emmeans <- emmeans(fit_randomNPKslopes, ~ N)

N_contrasts <- contrast(N_emmeans, 'pairwise')
  • Smaller differences between point estimates, and wider credible intervals
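
Print the objects to see the estimates and their credible intervals:

N_emmeans
N_contrasts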

Plot of site-level variation in N responses

  • add_epred_draws() from tidybayes generates predictions for any combination of fixed and random effects, so we can plot their posterior distributions
  • epred: expectation of the posterior predictive
  • Predictions for each level of N addition averaged over block effects, incorporating site random effects, and at control levels of P and K
N_epred <- expand_grid(N = unique(stvincent$N),
                       P = '0P',
                       K = '0K',
                       site = unique(stvincent$site)) %>%
  add_epred_draws(fit_randomNPKslopes, re_formula = ~ (1 + N | site), value = 'yield')

ggplot(N_epred, aes(x = N, y = yield)) +
  stat_interval() +
  stat_summary(fun = median, geom = 'point', size = 2) +
  stat_summary(aes(x = as.numeric(N)), fun = median, geom = 'line') +
  scale_color_brewer(palette = 'Greens') +
  facet_wrap(~ site)

Shrinkage

  • The trends within site predicted by the model are less extreme than the raw data would indicate
  • The mixed model pulls estimates toward the mean, “shrinking” them
  • This means it predicts better for new data, at the cost of fitting the original data a little less closely
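
One way to see the shrinkage (a sketch, assuming the N_epred object from the previous plot is still in the workspace) is to overlay the raw site-by-treatment means on the model’s predicted medians. Note the raw means average over all P and K levels, so this is only a rough comparison.

# Raw mean yield for each site and N level, ignoring the model
raw_site_means <- stvincent %>%
  group_by(site, N) %>%
  summarize(yield = mean(yield), .groups = 'drop')

# Model medians (black) are pulled toward the overall mean relative to raw means (red)
ggplot(N_epred, aes(x = N, y = yield)) +
  stat_summary(fun = median, geom = 'point', size = 2) +
  geom_point(data = raw_site_means, color = 'red', size = 2) +
  facet_wrap(~ site)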

Interactions between fixed effects

  • You can add interaction terms separated by :
  • example: 1 + N + P + K + N:P + N:K + P:K
    • or shorthand: (N + P + K)^2 (note that N*P*K would also add the three-way interaction N:P:K)
  • Left as an exercise
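
A sketch of what the exercise’s model call could look like (the fit object name, seed, and file are hypothetical):

fit_interactions <- brm(yield ~ 1 + (N + P + K)^2 + (1 | site) + (1 | block:site),
                        data = stvincent,
                        prior = c(prior(normal(0, 10), class = b)),
                        chains = 4,
                        iter = 2000,
                        warmup = 1000,
                        seed = 5,
                        file = 'fits/fit_interactions')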

Comparing models with information criteria

  • Leave-one-out (LOO) cross-validation compares models
  • How well does a model fit to all data points but one predict the one remaining data point?
  • First use add_criterion() to compute the LOO criterion for each model
  • Then use loo_compare() to rank the models
fit_interceptonly_moresamples <- add_criterion(fit_interceptonly_moresamples, 'loo')
fit_fixedN_priors <- add_criterion(fit_fixedN_priors, 'loo')
fit_fixedNPK <- add_criterion(fit_fixedNPK, 'loo')
fit_randomNPKslopes <- add_criterion(fit_randomNPKslopes, 'loo')
loo_compare(fit_interceptonly_moresamples, fit_fixedN_priors, fit_fixedNPK, fit_randomNPKslopes)
  • Models ranked by ELPD (expected log pointwise predictive density)
  • The best model is listed first with an ELPD difference of 0, and the others are ranked relative to it
  • The random slope model is a much better fit than models that ignore site-level variation in response
  • The model with no fixed effects does very poorly

Assessing evidence: Bayes Factors

  • One Bayesian analogue of a p-value
  • Ratio of evidence between two models: \(\frac{P(data|model_1)}{P(data|model_2)}\)
  • Ranges from 0 to infinity
  • BF = 1 means equal evidence, BF > 1 means more evidence for model 1
  • No “significance” threshold but BF > 10 is usually called strong evidence

Bayes factors for pairwise means comparisons

  • R package bayestestR lets us compute BF for individual parameters or estimates in the model
  • For a single parameter, the BF compares the posterior density to the prior density at the null value (the Savage-Dickey density ratio)
  • Our prior distributions were centered at 0 so they act like “null hypotheses”
  • BF = 1 means the data did not change our belief about the parameter at all relative to the prior
  • BF > 1 means the data shifted our belief away from the null value (evidence for an effect)
  • BF < 1 means the data made the null value even more plausible than it was under the prior
  • We want to test the hypotheses that there are differences between yield means for each of the pairs of N treatments
  • Get prior samples using unupdate() and compute the prior distributions of the differences
  • Compare these to the posterior differences we calculated earlier
fit_randomNPKslopes_prior <- unupdate(fit_randomNPKslopes)
N_emmeans_prior <- emmeans(fit_randomNPKslopes_prior, ~ N)
N_contrasts_prior <- contrast(N_emmeans_prior, 'pairwise')

bayesfactor_parameters(posterior = N_contrasts, prior = N_contrasts_prior)
  • BF shows moderately strong evidence for a difference between 0 N and 2 N
  • BF < 1 for all other pairs
  • WARNING: BFs are very sensitive to your choice of prior and require many samples to be precise

Assessing evidence: MAP-based “p-value”

  • Odds against a “point value” representing a null hypothesis
  • Compare posterior density at this null value with posterior density at the mode of the posterior distribution
  • “Maximum a posteriori” = MAP
  • Use p_pointnull(), specifying null value (default is 0)
  • Not as sensitive to prior but sensitive to how the density is estimated
p_pointnull(N_contrasts, null = 0)

Assessing evidence: ROPE-based “p-value”

  • We don’t have to test the hypothesis that a value is exactly zero
  • We can define a range that is scientifically or practically relevant
  • Example: We will only invest in N fertilizer if we can get a yield increase of at least 1 unit
  • (-1, 1) is a two-sided “region of practical equivalence” = ROPE
  • Use p_rope(), specifying lower and upper bounds of range
p_rope(N_contrasts, range = c(-1, 1))
  • What if we decided 0.5 unit was an acceptably large increase in yield?
p_rope(N_contrasts, range = c(-0.5, 0.5))
  • Not as sensitive to priors or methodological assumptions but obviously completely depends on what the ROPE is
  • It is good to be explicit about what effect size you think is relevant
  • Probably useful to report these values for a range of “ROPEs”
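
For example, a quick sketch tabulating p_rope() across a few candidate ROPE half-widths (the values here are arbitrary):

# p_rope() for several candidate ROPE half-widths
for (half_width in c(0.25, 0.5, 1)) {
  print(p_rope(N_contrasts, range = c(-half_width, half_width)))
}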

Conclusion

What did we learn? Let’s revisit the learning objectives!

Conceptual learning objectives

You now understand…

  • The basics of Bayesian inference
  • Definition of prior, likelihood, and posterior
  • How Markov Chain Monte Carlo works
  • What a credible interval is

Practical skills

You now can…

  • Write brms code to fit a multilevel model with random intercepts and random slopes
  • Diagnose and deal with convergence problems
  • Interpret brms output
  • Compare models with LOO information criteria
  • Use Bayes factors and “Bayesian p-values” to assess strength of evidence for effects
  • Make plots of model parameters and predictions with credible intervals

Congratulations, you are now Bayesians!

Meme by Tristan Mahr

  • See text version of lesson for further reading and useful resources
  • Please provide feedback to help me improve this course!
    • Send feedback to quentin.read@usda.gov