This lesson picks up where Lesson 2 left off, continuing our practical introduction to fitting Bayesian multilevel models in R and Stan using the brms package (Bayesian Regression Models using Stan).
Download the worksheet for this lesson here.
So far, we have covered the basics of Bayesian inference and learned about the basic components of a Bayesian model (prior, likelihood, and posterior). We’ve used the R package brms to fit Bayesian linear mixed models using the Hamiltonian Monte Carlo method, with the Stan language doing the sampling behind the scenes. We’ve learned some tricks to deal with convergence issues when sampling. We’ve used the posterior distributions from our models to make inferences: we’ve calculated credible intervals, plotted model parameters and predictions, compared models with information criteria, and done hypothesis testing with Bayesian-style p-values. That was all Lesson 1!
In Lesson 2, we explored prior distributions in more depth and learned how the inference you make can be sensitive to your choice of prior, especially if you’re working with small data. Then we learned how to specify and fit Bayesian GLMs (for non-normal data) and GLMMs (for non-normal data with random effects). We now know what response distributions and link functions we can use if we have binary response data or positive right-skewed data.
Another piece of feedback I got when I taught the first Bayesian workshop is that people are interested in learning how to specify more complex model structures that go along with typical experimental designs in agricultural science. I think this lesson helps address that need. In ag science, we often have data collected over time and space. Usually, we can assume that the closer in time or space two data points are, the more they will be correlated with one another. In this lesson, we are going to learn how to account for that non-independence in time or space when we are trying to estimate an effect. In particular, we will learn two ways to do this: modeling the covariance between residuals directly, and fitting an explicit spatial or temporal trend term in the model.
The basic insight that you need to have to understand why specialized model structures are needed to deal with data collected over time and space is this: the closer two things are in time or space, the more closely related they are. In a spatial context, this is known as the First Law of Geography.
This has an important consequence if we’re collecting data and trying to use that data to make conclusions about the world. Let’s say we are doing an observational study of farms to look at how their management practices influence soil carbon sequestration. If you have farms that are right next to each other, they might have similar environmental conditions. So even though each farm may make independent management decisions, they are not fully independent because they share a spatial environment. This is not only an issue with observational studies, but also with designed experiments. Even if you randomly assign plots to treatments, plots that are near each other in space or time will share environmental influences.
The statistical models that you’re familiar with, including all the models we’ve covered so far in this course, assume that the residuals of every data point in the model are independent of one another. Even the mixed models we’ve seen so far assume that after you account for the random effects, the residuals are independent. Often, patterns of variation in space and time cause this assumption of independent residuals to be a bad one. What can we do about that?
There are two main ways that we can model data that are correlated in space and time. The first is to directly model the covariance between residuals, specifying a structure for how that covariance depends on closeness in time or space. The second is to include an explicit term for the spatial or temporal trend in the model itself, such as a spline or smoother.
Both these approaches can work for spatial and temporal data. In this lesson, we will work through two examples. The first is a GLMM with a residual covariance structure to model autocorrelation in time (first approach). The second is a GAMM with a spatial additive term (second approach).
Note: it is sometimes necessary to combine the two approaches and fit a model with an additive component that also models the residual covariance. We will not open that can of worms today!
Load the same packages and plotting themes as we have done before. We are also going to use the ape package to calculate spatial autocorrelation statistics.
library(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)
library(ape)
theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))
# Use 4 cores, the cmdstanr backend, and only refit saved models when the code or data change
options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'on_change')
Our first foray into the space-time continuum will be to deal with datasets where the same subjects (e.g., plots or animals) are measured at different time points. We want to account for measurements from the same subject at different times not being independent.
We are going to continue to use example data from the agridat package in this lesson. For our first example, we are going to use data from an experiment on weight gain in cows, originally published by Lepper & Lewis in 1989 and then reanalyzed by Diggle et al. in 1994. There were 26 cows in the study, divided roughly evenly among four treatment combinations. The study was designed to examine how iron supplementation and tuberculosis infection interact to affect body weight gain in cows. The two treatments in the 2×2 factorial design were whether the animals got supplemental iron (NoIron and Iron) and whether the animals were inoculated with tuberculosis at six weeks of age (NonInfected and Infected). Animal ID is in the animal column.
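If you want a quick look at the structure of the data yourself before plotting, a minimal check (nothing here is model-specific):

data(diggle.cow)
str(diggle.cow)   # columns described above: animal, iron, infect, weight, day
head(diggle.cow)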
The cows were weighed 23 times. The number of days between weighings was slightly uneven, but the intervals are close enough to equal that we’ll ignore that minor inconsistency for the purposes of this analysis. We’re interested in how much the rate of weight gain over time is influenced by iron supplementation and the presence of tuberculosis, and whether the effect of iron depends on tuberculosis (i.e., do iron and tuberculosis interact).
Let’s load the example data and then relevel the factors so that the control groups (NoIron and NonInfected) are first in the factor level ordering. To me this makes the results more interpretable, because if the control group comes first it will be the reference, or intercept, level.
data(diggle.cow)
diggle.cow <- diggle.cow %>%
mutate(iron = relevel(iron, ref = 'NoIron'),
infect = relevel(infect, ref = 'NonInfected'))
Make a plot of the data, showing each animal’s weight gain trend as an individual thin line, with the median trend for each treatment combination as a thick line. Color indicates iron supplementation and line type indicates infection treatment.
ggplot(diggle.cow, aes(x = day, y = weight, color = iron, linetype = infect, group = animal)) +
geom_line() +
stat_summary(aes(group = interaction(iron, infect)), fun = 'median', geom = 'line', linewidth = 2, show.legend = FALSE) +
colorblind_friendly +
labs(y = 'weight (kg)') +
theme(legend.title = element_blank())
We can see from the above plot that the weight gain trend over time is pretty linear. For simplicity here, we’ll assume a linear trend of weight with respect to day. The slopes seem different between the treatments. The rate of weight gain looks faster in the no iron supplement groups and in the uninfected groups. But the effect of iron supplementation does not appear to depend very strongly on infection; no iron looks higher than iron in both infected and uninfected. So I would not expect to see strong evidence for a statistical interaction.
When doing Monte Carlo sampling, it is a good idea not to have extremely large or small values in the predictors or response, because they lead to numerical problems. So, because the lowest number in the day column is 122, we will make a new column, dayafterstart, that begins at 0. In addition, we will standardize (z-transform, or subtract the mean and then divide by the standard deviation) the weight variable. Neither of those things will affect our conclusions, because we are just adding and multiplying by constants. But they will make the numerical computations done by brm() easier.
Another benefit of standardizing the variables is that if we know our response variable is in standard deviation units, it makes it easier for us to decide on a prior distribution for the fixed effects. If there is a categorical fixed effect with a coefficient of 5 in standard deviation units, we know that represents a huge effect. But if the variable is body weight, 5 would be a big effect if weight is measured in kg but a tiny effect if it is measured in mg. No need to guess, if we standardize the variable first.
diggle.cow$dayafterstart <- diggle.cow$day - min(diggle.cow$day)
diggle.cow$weight_scaled <- scale(diggle.cow$weight)
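A side note: scale() returns a one-column matrix and stores the constants it used as attributes, which you can use later to back-transform predictions. A minimal sketch using base R’s documented 'scaled:center' and 'scaled:scale' attributes:

# The mean and standard deviation used by scale(), for back-transforming later
attr(diggle.cow$weight_scaled, 'scaled:center')
attr(diggle.cow$weight_scaled, 'scaled:scale')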
We know that we are going to have to include weight_scaled ~ dayafterstart * iron * infect as the fixed part of our model formula so that we can estimate the slope of body weight versus time, and how that slope is affected by iron supplementation, infection, and their interaction. But we have 26 cows in the study that each have their own time trend. We need to make sure our model accounts for body weight measurements from the same cow on different days not being independent of each other.
Up to now, we’ve assumed that residuals in our statistical models are all conditionally independent. In other words, after accounting for any fixed and random effects, one residual has no relationship to any other. Those models still had a variance-covariance matrix for the residuals; it just had zero for any of the covariances between different residuals. If we call that matrix \(\textbf{R}\) then our models with independent residuals had this covariance structure:
\(\textbf{R} = \sigma^2\textbf{I}\)
That’s a \(n \times n\) matrix, where \(n\) is the number of time points. It has \(\sigma^2\) on every element of the diagonal, meaning that all residuals have the same variance. It has 0 everywhere else, meaning there is 0 covariance between any pair of residuals.
The big step we are taking here is to put some non-zero values into the \(\textbf{R}\) matrix to model correlation between the residuals. But how do we do that? There are lots of different ways we could think about modeling that so-called “residual autocorrelation.” We’ll go through three of the simplest ones here, illustrated below for four time points.

Unstructured covariance estimates a separate variance for each time point and a separate covariance for every pair of time points:

\[\textbf{R}=\begin{pmatrix}\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2 \end{pmatrix}\]

Compound symmetry assumes a single correlation \(\rho\) shared by all pairs of time points, no matter how far apart in time they are:

\[\textbf{R}=\sigma^2\begin{pmatrix}1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix}\]

First-order autoregressive, or AR(1), covariance assumes the correlation decays with increasing time lag: adjacent time points have correlation \(\rho\), time points two steps apart have \(\rho^2\), and so on:

\[\textbf{R}=\sigma^2\begin{pmatrix}1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}\]
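To make the structures concrete, here is a minimal base R sketch of the compound symmetry and AR(1) matrices (plus the independent case for reference). The values sigma² = 1 and rho = 0.8 are arbitrary illustrations, not estimates from any model; the unstructured matrix has a free parameter for every pair, so there is nothing compact to construct for it.

# Build covariance structures for n = 4 equally spaced time points
sigma2 <- 1
rho <- 0.8
n <- 4
lags <- abs(outer(1:n, 1:n, `-`))               # matrix of time lags |i - j|
R_indep <- sigma2 * diag(n)                     # independent: zero covariance off the diagonal
R_cosy <- sigma2 * ((1 - rho) * diag(n) + rho)  # compound symmetry: constant correlation rho
R_ar1 <- sigma2 * rho^lags                      # AR(1): correlation decays as rho^|i-j|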
We are going to try to fit all three of the above covariance structures to the body weight gain data. It’s likely that the unstructured covariance model will be extremely overparameterized and cause a lot of model fitting issues. That’s because there are 23 time points, so we would have to estimate a parameter for all pairs of days, 253 in total! Normally I would probably not try to fit that model, but I did it here as proof of concept.
The model syntax for fitting the different covariance structures is not too complex in brms. You need to specify (1) the type of covariance structure as the function name, (2) the column in the dataset that is the time variable as the time argument, and (3) the column in the dataset that is the grouping or subject variable as the gr argument. We want to model observations from the same subject as being correlated, and potentially having that correlation vary as a function of closeness in time. The three structures are specified as:

unstr(time = dayafterstart, gr = animal)
cosy(time = dayafterstart, gr = animal)
ar(time = dayafterstart, gr = animal, p = 1)
Note that p = 1 specifies what order of autoregressive structure we want. Increasing p would yield more complex autoregressive structures with deeper time lags.

Now let’s fit the models. The unstructured covariance model takes quite a bit of time to run, and I had to do quite a bit of tweaking to the priors and initial values, and increase the number of iterations, so that the model would converge. In practice these warnings are often red flags that the model is overparameterized. The biggest red flag is that I had to initialize all the chains to starting values of zero just to get it to sample, which is not a good habit to get into. But as I said above, we are trying to fit over 250 unique covariance parameters to a pretty small dataset with only ~600 rows. We know that’s not a great idea, but we are fitting it anyway to compare with the other two models.
Note: a normal prior is put on the fixed effects, with a fairly large standard deviation. An additional prior is placed on the covariance matrix for the unstructured covariance model. This is because we have so many parameters to estimate within that matrix. A prior that sets a low probability on high correlations is necessary to get the model to converge. By the way, I tried to fit identical models using the frequentist modeling package glmmTMB and could not successfully get the models to fit — we are now in the territory where Bayesian models really outperform their frequentist cousins in terms of actually getting you an answer!
diggle_unstructured <- brm(weight_scaled ~ iron * infect * dayafterstart + unstr(time = dayafterstart, gr = animal),
prior = c(
prior(normal(0, 5), class = b),
prior(lkj_corr_cholesky(3), class = Lcortime)
),
iter = 4000, warmup = 3000, init = 0,
data = diggle.cow,
seed = 1001, file = 'fits/diggle_unstructured')
diggle_compoundsymm <- brm(weight_scaled ~ iron * infect * dayafterstart + cosy(time = dayafterstart, gr = animal),
prior = c(
prior(normal(0, 5), class = b)
),
data = diggle.cow,
seed = 1002, file = 'fits/diggle_compoundsymm')
diggle_ar1 <- brm(weight_scaled ~ iron * infect * dayafterstart + ar(time = dayafterstart, gr = animal, p = 1),
prior = c(
prior(normal(0, 5), class = b)
),
data = diggle.cow,
seed = 1003, file = 'fits/diggle_ar1')
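Before looking at fit, it’s worth a quick convergence check on each model. A minimal sketch (summary() prints R-hat and effective sample sizes, and rhat() returns the R-hat values directly):

# Check convergence diagnostics for the AR(1) model; repeat for the other fits
summary(diggle_ar1)
max(rhat(diggle_ar1), na.rm = TRUE)  # we want this to be < 1.01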
Let’s look at the posterior predictive check plots for each model.
pp_check(diggle_unstructured)
pp_check(diggle_compoundsymm)
pp_check(diggle_ar1)
They all look reasonably good; possibly the AR1 is a slightly better fit, judging from the check plot.
Unfortunately, the leave-one-out cross-validation that we’ve been using to compare models in Lessons 1 and 2 does not work well with models like the unstructured covariance model, where each observation basically has its own parameter (technical explanation can be found here). It will give misleading results. Instead, we have to use k-fold cross-validation to compare the unstructured model to the other two models. Because this requires resampling each model \(k\) times, it is very time-consuming. Thus I present the results here instead of having you run them.
kfold(diggle_unstructured, diggle_compoundsymm, diggle_ar1, folds = 'grouped', group = 'animal', K = 5, compare = TRUE)
Model comparisons:
elpd_diff se_diff
diggle_ar1 0.0 0.0
diggle_unstructured -41.3 19.0
diggle_compoundsymm -295.1 24.5
The k-fold cross-validation shows that the AR1 model is superior, somewhat better than the unstructured covariance model. But despite having tons of parameters, the unstructured model still outperforms the compound symmetry model. It isn’t surprising to me that the AR1 model performs well in this case, because we have an ideal dataset to illustrate a first-order autoregressive process. A cow’s weight at time \(t\) is directly dependent on its weight at the previous timepoint \(t-1\). Other processes might be better modeled with compound symmetry or unstructured covariance, such as annual grain yield from a field which might jump up and down with “random” environmental fluctuations from year to year.
In classical frequentist analysis, unstructured covariance models tend to perform very poorly because they are so overparameterized. However in the Bayesian world, models with high numbers of parameters, like unstructured covariance models, actually tend to perform relatively well. This is because of the regularizing effect of priors. Priors can “shrink” the parameter values toward zero so that even a model with lots and lots of parameters isn’t overfit.
Now that we’ve established which model is best, we can go ahead and look at the posterior estimates. As always, the emmeans package has a trick for this. The function emtrends() (estimated marginal trends) will get the posterior estimate of the slope for each treatment combination, averaged over the random effects. The syntax is like emmeans() but with an additional argument, var, to which we pass the name of the time variable that we want the trend over. We provide the specification revpairwise ~ iron | infect to get a reverse pairwise contrast between each level of iron within each level of infect. (It is reverse pairwise so that we subtract the control from the treatment, to see if the treatment has a positive or negative effect relative to control.) You can swap iron and infect to get infection contrasts within each iron supplementation level.
Note that the slopes are still in standardized units. We will convert them back into the original kg/day units for later presentation.
emt_diggle <- emtrends(diggle_ar1, revpairwise ~ iron | infect, var = 'dayafterstart')
emt_diggle
## $emtrends
## infect = NonInfected:
## iron dayafterstart.trend lower.HPD upper.HPD
## NoIron 0.00535 0.00458 0.00607
## Iron 0.00477 0.00385 0.00560
##
## infect = Infected:
## iron dayafterstart.trend lower.HPD upper.HPD
## NoIron 0.00444 0.00391 0.00495
## Iron 0.00396 0.00346 0.00446
##
## Point estimate displayed: median
## HPD interval probability: 0.95
##
## $contrasts
## infect = NonInfected:
## contrast estimate lower.HPD upper.HPD
## Iron - NoIron -0.000576 -0.00171 0.000572
##
## infect = Infected:
## contrast estimate lower.HPD upper.HPD
## Iron - NoIron -0.000488 -0.00119 0.000179
##
## Point estimate displayed: median
## HPD interval probability: 0.95
We see more or less what we expected after looking at the raw data. The slope is slightly larger in the NoIron group, by a similar margin in each infection group. However, the 95% credible interval is consistent with the true difference being zero, or even with the Iron group having a higher slope. Thus, for both infected and uninfected animals, the evidence that iron supplementation affects weight gain rate is weak.
If you wanted to see whether the slope difference between iron levels is greater or smaller in infected cattle, you could contrast the slope differences with contrast(rbind(emt_diggle$contrasts), 'pairwise') (not shown).
We can use describe_posterior() from bayestestR to calculate two flavors of Bayesian p-value. The Bayesian p-values also don’t indicate strong evidence for a difference. Based on this, I would not conclude that iron supplementation affects weight gain rate, though we have to consider that the sample size is fairly small (n = 6 or 7 cows per group).
describe_posterior(emt_diggle$contrasts, ci = NULL, test = c('pd', 'p_map'))
## Summary of Posterior Distribution
##
## Parameter | Median | p (MAP) | pd
## --------------------------------------------------------
## Iron - NoIron NonInfected | -5.76e-04 | 0.635 | 83.85%
## Iron - NoIron Infected | -4.88e-04 | 0.404 | 92.00%
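As an aside, if you want a familiar frequentist-style number, bayestestR provides pd_to_p() to convert the probability of direction into an approximate two-sided p-value (roughly 2 × (1 − pd)). This is a convenience conversion, not a replacement for examining the full posterior:

pd_to_p(0.92)  # ~0.16, corresponding to the pd of 92.00% above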
How might we write this up in the methods and results section of a manuscript?
A methods write-up might look like this. Note I include the model comparison results in the methods. I consider that to be more of a technical aspect of the model fitting, which would distract from the story of the results.
We fit a linear mixed model to the trend in standardized body weight over time with residual covariance structure to account for repeated measures within animals. The model had fixed effects for time, iron supplementation, tuberculosis infection, and their interactions. We fit models with unstructured, compound symmetry, and AR(1) residual covariance. A Normal(0, 5) prior distribution was assumed for the fixed effects. An LKJ prior with η = 3 was given to the unstructured residual covariance matrix. Models were fit with Hamiltonian Monte Carlo. The joint posterior distribution was sampled with four Markov chains, run for a variable number of warmup iterations and 1000 sampling iterations. We verified the R-hat statistic was < 1.01 for all parameters, indicating model convergence. In k-fold cross-validation, the first-order autoregressive model performed better than the unstructured (ΔELPD = 41) and compound symmetry models (ΔELPD = 295). Thus we present predictions from the autoregressive model.
If the focus of your inquiry is how much iron supplementation influences the rate of weight gain, and whether iron’s effect is modified by tuberculosis, you could write something like this in the results.
There was little evidence for an effect of iron supplementation on the rate of weight gain. In uninfected animals the posterior estimate of the difference in mean weight gain rate between animals receiving supplemental iron and those that did not was -0.046 kg/day (95% CI [-0.135, 0.046]; pMAP = 0.64), while in infected animals it was -0.039 kg/day (95% CI [-0.094, 0.015]; pMAP = 0.40).
Here is how you could make a figure that includes the raw data plotted over the modeled trends, with a 95% credible interval (plotting multiple credible intervals on this kind of plot makes it too busy). We have to calculate the expected value of the posterior predictive at a bunch of time points for each of the treatment combinations, then summarize the distribution across that grid using its median and 95% quantile CI.
Note that incl_autocor = FALSE is included in the prediction function so that we get an overall prediction that averages over the autocorrelation within each animal. Also note that when creating the plotting data, I rescaled the day and weight variables to their original scales.
cow_weight_preds <- expand.grid(dayafterstart = 0:max(diggle.cow$dayafterstart), iron = c('NoIron', 'Iron'), infect = c('NonInfected', 'Infected')) %>%
add_epred_draws(diggle_ar1, incl_autocor = FALSE) %>%
mutate(day = dayafterstart + min(diggle.cow$day),
weight = .epred * sd(diggle.cow$weight, na.rm = TRUE) + mean(diggle.cow$weight, na.rm = TRUE))
ggplot(diggle.cow, aes(x = day, y = weight, color = iron, fill = iron, linetype = infect)) +
geom_line(aes(group = animal)) +
stat_summary(geom = 'ribbon', data = cow_weight_preds, fun.min = ~ quantile(., 0.025), fun.max = ~ quantile(., 0.975), alpha = 0.25, show.legend = FALSE) +
stat_summary(geom = 'line', data = cow_weight_preds, fun = 'median', linewidth = 2, show.legend = FALSE) +
colorblind_friendly +
labs(y = 'weight (kg)') +
theme(legend.title = element_blank())
The figure shows that the model is a reasonably good fit to the data, in addition to illustrating the fairly weak evidence for any impact of iron supplementation on weight gain rate. In practice, the one improvement I might make is to allow for a nonlinear trend over time, though this would make the simple interpretation of slope differences a little bit trickier.
For the table, I rescaled the slopes from standardized units (standard deviations of weight/day) to more interpretable units (kg/day) by multiplying them by the standard deviation of the weight column in the data. The table shows the posterior median estimates of the weight gain slopes and some credible intervals.
I like to use 66% and 95% because if the posterior distribution is roughly normal, those are approximately 1 and 2 standard deviations above and below the mean.
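You can verify that approximation in a couple of lines (for a standard normal, ±1 SD covers about 68% of the distribution and ±2 SD about 95%):

diff(pnorm(c(-1, 1)))  # ~0.683, close to a 66% interval
diff(pnorm(c(-2, 2)))  # ~0.954, close to a 95% interval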
weight_slope_summary <- gather_emmeans_draws(emt_diggle$emtrends) %>%
mutate(.value = .value * sd(diggle.cow$weight, na.rm = TRUE)) %>%
median_qi(.width = c(0.66, 0.95)) %>%
pivot_wider(names_from = .width, values_from = c(.lower, .upper)) %>%
select(-c(.point, .interval))
gt(weight_slope_summary) %>%
fmt_number(decimals = 3) %>%
cols_merge(c(.lower_0.66, .upper_0.66), pattern = '[{1}, {2}]') %>%
cols_merge(c(.lower_0.95, .upper_0.95), pattern = '[{1}, {2}]') %>%
cols_label(infect = 'tuberculosis infection', .value = 'weight gain rate (kg/day)', .lower_0.66 = '66% CI', .lower_0.95 = '95% CI') %>%
tab_options(column_labels.font.weight = 'bold')
| iron | tuberculosis infection | weight gain rate (kg/day) | 66% CI | 95% CI |
|---|---|---|---|---|
| NoIron | NonInfected | 0.424 | [0.394, 0.452] | [0.360, 0.479] |
| NoIron | Infected | 0.352 | [0.331, 0.371] | [0.308, 0.391] |
| Iron | NonInfected | 0.378 | [0.344, 0.412] | [0.306, 0.445] |
| Iron | Infected | 0.313 | [0.294, 0.332] | [0.271, 0.350] |
That was fun, but how about something really cool?!?
An alternative to modeling the residual covariance is to explicitly fit a spatial or temporal term in your model. Instead of just treating space and time variation as a nuisance term to be accounted for by a random effect, you can use this kind of model to make inference about the space and time trend in addition to your other effects of interest.
Okay, so let’s just slap a covariate for time or spatial coordinates into our linear model … problem solved, right? Keep it simple, right? Well, the effect of space and time is often not linear. Nonlinear effects are especially likely with geospatial data: unless there is a very regular environmental gradient running from north to south or east to west over your study region, you will have nonlinear effects over space. To deal with this, we will take a step beyond GLMMs and enter the world of GAMMs (generalized additive mixed models). An additive term models the response variable as a function of some combination of unknown smooth functions of the covariates. In other words, we are fitting a spline or smoother over space or time.
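If the idea of a smoother is new to you, here is a minimal hypothetical sketch on simulated one-dimensional data (everything here is made up for illustration; s() is the additive term in the formula):

# Simulate a nonlinear trend and fit a spline to it with brms
set.seed(1)
toy <- data.frame(x = seq(0, 10, length.out = 100))
toy$y <- sin(toy$x) + rnorm(100, sd = 0.3)
toy_gam <- brm(y ~ s(x), data = toy)
conditional_effects(toy_gam)  # plot the fitted smooth with its credible ribbon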
Today, we are just barely going to be introduced to the concepts and go over a quick example, but I highly recommend checking out the further reading section below to find resources to learn more.
This dataset originally appeared in a paper by Durban et al. in 2003. It is a classic agricultural field trial of barley varieties in Scotland (important for you Scotch lovers out there). It is ideal as an example for us because I was asked to cover typical agricultural experimental designs. Here we are looking at an experiment comparing barley yield between two fungicide treatments and 70 genotypes, in a full factorial design. The experiment is a split-plot randomized complete block design. There are four blocks, each with two main plots, with fungicide treatment randomly assigned to each main plot. Each main plot is divided into 70 subplots, one for each genotype, randomly assigned. Importantly, the main plots were laid out adjacent to each other in one long line, with a total area of 10 rows of 56 beds each.
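Before plotting, you can confirm the factorial layout numerically; each block × fungicide cell should contain the 70 genotype subplots described above (a quick sanity check, not required for the analysis):

data(durban.splitplot)
with(durban.splitplot, table(block, fung))  # expect 70 rows per main plot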
Here are some plots that will help you visualize how the study is laid out.
data(durban.splitplot)
ggplot(durban.splitplot, aes(x = bed, y = row, fill = block)) +
geom_tile() +
theme(aspect.ratio = 10/56, legend.position = 'bottom') +
ggtitle('Blocks')
ggplot(durban.splitplot, aes(x = bed, y = row, fill = fung)) +
geom_tile() +
theme(aspect.ratio = 10/56, legend.position = 'bottom') +
ggtitle('Main plots (fungicide treatment)')
ggplot(durban.splitplot, aes(x = bed, y = row, fill = gen)) +
geom_tile() +
theme(aspect.ratio = 10/56, legend.position = 'none') +
ggtitle('Subplots (genotypes)')
So far so good. Why don’t we just go ahead and fit a linear mixed model with random effects for block and main plot? That is supposed to take care of the spatial variation, right? Because blocks and main plots are spatially contiguous regions, including random effects for them should account for any random variation in yield over space that isn’t a result of fungicide and genotype, right? Well, sometimes it’s not that easy. Let’s take a look at the spatial pattern of yield.
ggplot(durban.splitplot, aes(x = bed, y = row, fill = yield)) +
geom_tile() +
scale_fill_distiller(palette = 'Oranges', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
Looking at the above plot we can see there is a clear spatial pattern in the yield. You can make out different regions of high and low yield. Some of it is related to the treatments, it seems, because you can sort of see lines where one fungicide treatment ends and the other begins. But there are some spatial patterns not related to that. For instance there is a region of low yield in the north center that spans across multiple treatment plots. Also the upper left corner has much higher yield than anywhere else.
It makes sense that there might be a spatial trend that is not accounted for by the experimental design, because the blocks and main plots are all arranged in a single line from east to west. So, they will account for any environmental factors that change from east to west, but if anything is varying from north to south or otherwise, the blocks and main plots will not account for that.
Side note: I usually prefer to use a single-color scale for this kind of plot. It is more colorblind-friendly, and is less prone to misleading interpretation. If you use a two-color scale it introduces a false dichotomy that will stand out to people.
But people are good at seeing patterns where there aren’t any. Out of curiosity, what would it look like if we randomly shuffled the spatial locations of the subplots within each main plot?
set.seed(125)
durban.splitplot_shuffle <- durban.splitplot %>%
group_by(block, fung) %>%
mutate(yield_shuffled = sample(yield))
ggplot(durban.splitplot_shuffle, aes(x = bed, y = row, fill = yield_shuffled)) +
geom_tile() +
scale_fill_distiller(palette = 'Oranges', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
To me this is much more “random” looking, which is a sign that there really is a spatial pattern in the residuals beyond what would be accounted for with block and main plot random effects. But let’s fit the model and see.
Here is a model fit to the yield response variable. Our fixed effects are genotype, fungicide, and their interaction (gen + fung + gen:fung). We fit random intercepts for each block ((1 | block)) and each main plot ((1 | fung:block), because each combination of fungicide treatment and block uniquely identifies a main plot).
Note the response variable is scale(yield). That scales (z-transforms) the yield so that we can more easily define a prior on our fixed effects. A z-transformed variable is always in standard deviation units. As a result, a prior like normal(0, 3) is always going to be weakly informative, because it encompasses both very large positive and very large negative effect sizes. This helps remove some of the guesswork in defining a prior distribution.
durban_lmm <- brm(scale(yield) ~ gen + fung + gen:fung + (1 | block) + (1 | fung:block),
prior = c(
prior(normal(0, 3), class = b)
),
data = durban.splitplot,
chains = 4, iter = 4000, warmup = 2000,
seed = 111, file = 'fits/durban_lmm')
Now the above model may be good enough for government work (ignoring for the moment any warnings about divergent transitions), but let’s take a look at the residuals. If we can detect any spatial pattern in them, that means that the block and main plot random effects did not fully account for it.
The residuals() function will give us the mean residual over all posterior draws for each data point, and its quantile credible interval. The mean is fine for a quick inspection; we don’t need to worry about the full posterior distribution of each residual. Extract the mean column from the matrix output by residuals(), append it to the data, and plot a map of the residuals.
resid_durban_lmm <- residuals(durban_lmm)
durban.splitplot <- mutate(durban.splitplot, resid_nospatial = resid_durban_lmm[,1])
ggplot(durban.splitplot, aes(x = bed, y = row, fill = resid_nospatial)) +
geom_tile() +
scale_fill_distiller(palette = 'Greens', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
The spatial pattern of the residuals is obvious (at least it is to me). We can still clearly see patches of low and high values here and there. Low residuals mean the model is overpredicting yield, and high residuals mean the model is underpredicting.
Although our residuals already fail the “eyeball test,” there is also a test statistic that we can calculate to measure how strong the spatial autocorrelation is: Moran’s I. Moran’s I takes a dataset of values that have spatial coordinates and measures how strongly the similarity between pairs of data points is associated with their proximity in space. It weights pairs by inverse distance, so the higher the statistic, the more likely two close-together data points are to have a similar value. Because Moran’s I is a correlation coefficient, it ranges from -1 to 1. A value of 0 means no spatial pattern and 1 means perfect spatial correlation. We don’t want to see high positive values of Moran’s I for our residuals.
To calculate Moran’s I, we need to calculate an inverse distance matrix for all pairs of spatial points in our dataset and then feed that, along with the residuals, to the Moran.I() function in the ape package.
durban_inv_dist <- 1 / as.matrix(dist(cbind(durban.splitplot$bed, durban.splitplot$row)))
diag(durban_inv_dist) <- 0
Moran.I(durban.splitplot$resid_nospatial, durban_inv_dist)
## $observed
## [1] 0.04335386
##
## $expected
## [1] -0.001788909
##
## $sd
## [1] 0.003126117
##
## $p.value
## [1] 0
The Moran’s I statistic is ~0.04. Honestly in practice that might be weak enough to hand-wave away and avoid having to deal with fitting a spatial model. But this is a lesson about how to address spatial autocorrelation in your data, so I am not going to let you get away without fitting a model that accounts for the spatial pattern!
Here’s where our linear mixed model becomes an additive mixed model. This model has the same categorical fixed effects and random block and main plot intercepts as the original model. But it also includes a spatial smoother, which is the s() part of the model formula. It sometimes takes a little tinkering to come up with an appropriate smoother, because there are so many different ways to do it. Unfortunately, we don’t have time today to really delve into that (in the future I may write a dedicated lesson about it, or you can check the further reading section below for more details).
The function s() ultimately comes from the additive model fitting package mgcv. The developer of brms reworked this function to work within brms formulas.
Regardless of the other smoother details, the first two arguments are the columns with the x and y coordinates of each data point (bed and row, respectively). The bs = 'gp' argument means this is a Gaussian process smoother. The Gaussian process smoother is a very flexible model that doesn’t make strong assumptions about the functional shape of the relationship, which is good for our needs. It is computationally intensive, so it doesn’t work well with very big datasets, but since this dataset is very modest in size we can get away with using it. There are a lot of other options, but we will not get into them right now.
The other argument, k, is called the basis dimension. That is a really important argument. It is the upper limit of the degrees of freedom of the smoother. If k is low, your smoother will be simpler, with fewer parameters, but less flexible to capture every last “wiggle” in the data. If k is high, your smoother can fit the data more closely. The model fitting algorithm already penalizes overfitting, so even if you set k to a pretty high value, you are still protected against overfitting. So you don’t have to stress too much over choosing an exactly perfect k value. If it is too low, the model won’t be a good fit to the data. If it is too high, the problem isn’t overfitting so much as computation: it is very computationally intensive to fit additive models with a very high k.
We will fit the same model with three different k values: 10, 25, and 50. The additive term is always s(bed, row, k = _, bs = 'gp'), with only the value of k differing between the three models.
durban_gamm_k10 <- brm(scale(yield) ~ gen + fung + gen:fung + (1 | block) + (1 | fung:block) + s(bed, row, k = 10, bs = 'gp'),
prior = c(
prior(normal(0, 3), class = b)
),
data = durban.splitplot, chains = 4, iter = 4000, warmup = 2000,
seed = 222, file = 'fits/durban_gamm_k10')
durban_gamm_k25 <- brm(scale(yield) ~ gen + fung + gen:fung + (1 | block) + (1 | fung:block) + s(bed, row, k = 25, bs = 'gp'),
prior = c(
prior(normal(0, 3), class = b)
),
data = durban.splitplot, chains = 4, iter = 4000, warmup = 2000,
seed = 222, file = 'fits/durban_gamm_k25')
durban_gamm_k50 <- brm(scale(yield) ~ gen + fung + gen:fung + (1 | block) + (1 | fung:block) + s(bed, row, k = 50, bs = 'gp'),
prior = c(
prior(normal(0, 3), class = b)
),
data = durban.splitplot, chains = 4, iter = 4000, warmup = 2000,
seed = 222, file = 'fits/durban_gamm_k50')
We get warnings about divergent transitions; in practice I would use more informative priors or increase the number of warmup iterations to reduce or eliminate those.
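A hedged sketch of that remedy: raising the sampler’s target acceptance rate via Stan’s control argument is a standard first step for divergent transitions, at the cost of slower sampling (adapt_delta defaults to 0.8 in brms):

# Hypothetical refit with stricter sampler settings to reduce divergences
durban_gamm_k50_strict <- update(durban_gamm_k50, control = list(adapt_delta = 0.99))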
Let’s compare the four models: the model with no spatial additive term, and the models with three different k values. We get a few warnings about problematic observations, but there are few enough that we can probably overlook them.
durban_lmm <- add_criterion(durban_lmm, 'loo')
durban_gamm_k10 <- add_criterion(durban_gamm_k10, 'loo')
durban_gamm_k25 <- add_criterion(durban_gamm_k25, 'loo')
durban_gamm_k50 <- add_criterion(durban_gamm_k50, 'loo')
loo_compare(durban_lmm, durban_gamm_k10, durban_gamm_k25, durban_gamm_k50)
## elpd_diff se_diff
## durban_gamm_k50 0.0 0.0
## durban_gamm_k25 -58.8 8.9
## durban_gamm_k10 -142.7 15.1
## durban_lmm -170.0 17.2
The verdict is clear. The model without the spatial additive term performs much more poorly than the others. In fact, we see a steady increase in model performance as k increases, indicating that we might be able to increase it still further and get a better fit without overfitting. But let’s stick with the k = 50 model and use it to make predictions.
How do the residuals look when we plot them?
durban.splitplot <- mutate(durban.splitplot, resid_spatial = residuals(durban_gamm_k50)[,1])
ggplot(durban.splitplot, aes(x = bed, y = row, fill = resid_spatial)) +
geom_tile() +
scale_fill_distiller(palette = 'Greens', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
At least to me, that looks way better! We don’t really see big patches of underpredicted or overpredicted yields. It looks random, as all good residuals should!
Calculate the Moran’s I autocorrelation statistic for the GAMM residuals.
Moran.I(durban.splitplot$resid_spatial, durban_inv_dist)
## $observed
## [1] -0.005510493
##
## $expected
## [1] -0.001788909
##
## $sd
## [1] 0.003126966
##
## $p.value
## [1] 0.2339842
It is very close to zero. In fact the point estimate is negative. Excellent!
But was there any point in doing that complicated exercise? Does it affect any of the conclusions we draw about the effects of genotype and fungicide on barley yield? If it doesn’t, we could have saved ourselves all that trouble.
Let’s calculate the estimated marginal means for each combination of genotype and fungicide treatment for the non-spatial and spatial models.
emm_durban_nonspatial <- emmeans(durban_lmm, ~ gen + fung)
emm_durban_spatial <- emmeans(durban_gamm_k50, ~ gen + fung)
Here’s some code to combine both sets of means into one dataset and plot them side-by-side to see whether any of our inferences were affected by adding the spatial additive term to the model. The error bars are the 95% highest posterior density credible intervals.
The fct_reorder() line sorts the genotypes in increasing order of their mean, averaged over the different fungicide treatments and models. This is a nice trick for making plots.
means_durban_combined <- bind_rows(
tibble(model = 'nonspatial', as_tibble(emm_durban_nonspatial)),
tibble(model = 'spatial', as_tibble(emm_durban_spatial))
)
means_durban_combined %>%
mutate(gen = fct_reorder(gen, emmean)) %>%
ggplot(aes(x = gen, y = emmean, ymin = lower.HPD, ymax = upper.HPD, group = interaction(gen, model), color = model)) +
facet_wrap(~ fung, ncol = 1) +
geom_errorbar(position = position_dodge(width = .5)) +
geom_point(position = position_dodge(width = .5), size = 2) +
colorblind_friendly +
theme(legend.position = 'bottom', axis.text.x = element_text(angle = 90))
Well, the differences aren’t super dramatic between the two models. But you can see that the yield for fungicide F2 is systematically overestimated by the non-spatial model compared to the spatial model. (The point estimate is higher for almost all genotypes.) So that might be a meaningful consequence.
Let’s also plot the spatial additive effect just to see exactly what the spatial model is estimating. We have to pull it out of the model by getting the model’s expected value of the posterior predictive for all combinations of bed, row, genotype, and fungicide. We will only take a few draws (10) from the posterior to make this computationally quicker, because we only want to plot one value per coordinate anyway. Then average the predictions over the genotype and fungicide levels and over the posterior draws, so we are left with one value per coordinate. Make a map!
pred_grid_durban <- expand_grid(
bed = 1:56,
row = 1:10,
gen = unique(durban.splitplot$gen),
fung = unique(durban.splitplot$fung))
spatial_field_durban <- add_epred_draws(pred_grid_durban, durban_gamm_k50, re_formula = ~ 0, ndraws = 10) %>%
group_by(bed, row) %>%
summarize(.epred = mean(.epred))
ggplot(spatial_field_durban, aes(x = bed, y = row, fill = .epred)) +
geom_tile() +
scale_fill_distiller(palette = 'Reds', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
We can see that the smoother captured the region of low yield in the middle top and high yield in the bottom right. If you squint, this map looks a lot like the map of residuals from the non-spatial model. That’s exactly what we would want!
A key result you might be interested in is the effect of fungicide, within each genotype and averaged over the different genotypes. The ever-handy emmeans package can help us out with this, producing the model predictions averaged over the random effects of block and main plot and over the spatial smoother. Here is code for making both those comparisons. I will not show the genotype-level predictions in this notebook in the interest of space, just the overall predictions.
emms_bygenotype <- emmeans(durban_gamm_k50, pairwise ~ fung | gen)
emms_overall <- emmeans(durban_gamm_k50, pairwise ~ fung)
emms_overall
## $emmeans
## fung emmean lower.HPD upper.HPD
## F1 0.341 -0.863 1.606
## F2 -1.089 -2.254 0.178
##
## Results are averaged over the levels of: gen
## Point estimate displayed: median
## HPD interval probability: 0.95
##
## $contrasts
## contrast estimate lower.HPD upper.HPD
## F1 - F2 1.42 0.252 2.38
##
## Results are averaged over the levels of: gen
## Point estimate displayed: median
## HPD interval probability: 0.95
We see that F1 has a higher mean yield than F2, averaged over the effect of genotype. However, these results are still given on the z-transformed scale. Below when presenting the results I will back-transform the means and contrast.
Here are some ideas for how you might describe this in the methods and results sections of your manuscript. I’ve included a methods section and a table; the results write-up and figure are left to you as an exercise!
A methods description might look like this:
We fit a generalized additive mixed model to the z-transformed yield variable. The fixed effects were genotype, fungicide, and their interaction, with random intercepts for block and for main plot nested within block. To account for spatial variation, we included a Gaussian process spatial smoothing term. We fit models with basis dimension (k) values 10, 25, and 50. We put a Normal(0, 3) prior on all fixed effects. Models were fit with Hamiltonian Monte Carlo, with 4 Markov chains, each run for 2000 warmup iterations before sampling 2000 draws for each chain. Model convergence was assessed by checking that R-hat < 1.01 for all parameters, and model fit was assessed by examining posterior predictive check plots. We compared models with leave-one-out cross-validation. The model with basis dimension k=50 performed best (ΔELPD 58.8 or greater); thus we present predictions from that model.
In the results section, I would focus on showing the predictions for the different genotypes under each fungicide level. A table might show the estimated marginal means for fungicides averaged over genotypes, and the difference between them, with a few different credible intervals. This is transformed back from the scale of the standardized variable to the original scale.
means_data_wide <- gather_emmeans_draws(emms_overall) %>%
mutate(.value = .value * sd(durban.splitplot$yield),
.value = if_else(.grid == 'emmeans', .value + mean(durban.splitplot$yield), .value)) %>%
median_qi(.width = c(0.66, 0.95, 0.99)) %>%
mutate(fung = coalesce(fung, contrast)) %>%
pivot_wider(id_cols = c(fung, .value), names_from = .width, values_from = c(.lower, .upper))
gt(means_data_wide) %>%
fmt_number(decimals = 3) %>%
cols_merge(c(.lower_0.66, .upper_0.66), pattern = '[{1}, {2}]') %>%
cols_merge(c(.lower_0.95, .upper_0.95), pattern = '[{1}, {2}]') %>%
cols_merge(c(.lower_0.99, .upper_0.99), pattern = '[{1}, {2}]') %>%
cols_label(fung = 'fungicide', .value = 'median', .lower_0.66 = '66% CI', .lower_0.95 = '95% CI', .lower_0.99 = '99% CI') %>%
tab_options(column_labels.font.weight = 'bold')
| fungicide | median | 66% CI | 95% CI | 99% CI |
|---|---|---|---|---|
| F1 | 5.409 | [5.164, 5.658] | [4.796, 6.022] | [4.499, 6.244] |
| F2 | 4.700 | [4.469, 4.985] | [4.137, 5.347] | [3.841, 5.560] |
| F1 - F2 | 0.706 | [0.488, 0.908] | [0.113, 1.174] | [−0.241, 1.463] |
Congratulations, you’ve now made it through three lessons of the Bayesian crash course! As we always do, let’s look back over the learning objectives to see what we’ve learned.
Now you have experience fitting models with residual covariance structures to account for temporal autocorrelation, fitting GAMMs with spatial smoothing terms, testing residuals for spatial autocorrelation with Moran’s I, comparing these models with cross-validation, and presenting their predictions in figures and tables.
You’re really starting to get the hang of this Bayes thing!
Here are some additional resources on (Bayesian) GAMMs to complement the lists I put at the ends of Lessons 1 and 2.
Here are some exercises to help you practice with the concepts we learned in this lesson.
When we were working with the diggle.cow data, we noticed that the trend of weight gain over time was somewhat nonlinear. To address this, we can fit a temporal additive term instead of modeling the residual covariance, and include a random intercept for each animal. This turns the model into a GAMM.
Here is a GAMM fit to the diggle.cow data, with a separate time spline fit to each fixed effect combination and a random intercept for each animal.
diggle_GAMM <- brm(weight_scaled ~ iron * infect + s(dayafterstart, by = interaction(iron, infect)) + (1 | animal),
prior = c(
prior(normal(0, 5), class = b)
),
data = diggle.cow,
seed = 1003, file = 'fits/diggle_GAMM')
Here is a quick way to plot the fitted values (not shown).
conditional_effects(diggle_GAMM, effects = c('dayafterstart:iron'), conditions = data.frame(infect = c('Infected','NonInfected')))
In this exercise, we will take a look at yet another example dataset from the agridat package, broadbalk.wheat. These data were collected at the famous Rothamsted Experimental Station, documenting wheat yields annually from 1852-1925. There is a time series for 17 different plots (the plots were subject to different treatments, but we will not consider that for this exercise). We are interested in whether there is a trend in wheat yield over time.
data(broadbalk.wheat)
ggplot(broadbalk.wheat, aes(x = year, y = grain, group = plot)) + geom_line()
Fit two models to these data, one with compound symmetry residual covariance and one with first-order autoregressive residual covariance, using year as the time variable and plot as the grouping variable. The response variable is grain and there should be a fixed effect for year. Here is some code to get you started. Fill in the blanks. Don’t worry about changing default priors and sampling options for the purpose of this exercise.

broadbalk_cosy <- brm(grain ~ ... + ...(time = ..., gr = ...), data = ...)
broadbalk_ar1 <- brm(grain ~ ... + ...(time = ..., gr = ..., p = ...), data = ...)
Here’s yet another example dataset from agridat. It is a variety trial with two replicates each of 64 genotypes. Yield was measured. The spatial coordinates of each plot are given in the columns col (x coordinate) and row (y coordinate).
data(burgueno.rowcol)
Fit two models to these data. The first includes only a fixed effect for genotype, gen. The second includes a spatial additive term (Gaussian process with basis dimension 10). Here is some code to get you started. Fill in the blanks. Don’t worry about changing default priors and sampling options for the purpose of this exercise.

burgueno_nonspatial <- brm(yield ~ ..., data = ...)
burgueno_spatial <- brm(yield ~ ... + s(..., ..., k = ..., bs = ...), data = ...)
Plot maps of the residuals from both models. You can use residuals(modelname)[,1] to get the residuals for each model, and append them as new columns to the burgueno.rowcol data frame to plot them.