This lesson picks up where Lesson 2 left off, continuing our practical introduction to fitting Bayesian multilevel models in R and Stan using the brms package (Bayesian Regression Models using Stan).
Download the worksheet for this lesson here.
So far, we have covered the basics of Bayesian inference and learned about the basic components of a Bayesian model (prior, likelihood, and posterior). We’ve used the R package brms to fit Bayesian linear mixed models using the Hamiltonian Monte Carlo method, with the Stan language doing the sampling behind the scenes. We’ve learned some tricks to deal with convergence issues when sampling. We’ve used the posterior distributions from our models to make inferences: we’ve calculated credible intervals, plotted model parameters and predictions, compared models with information criteria, and done hypothesis testing with Bayesian-style p-values. That was all Lesson 1!
In Lesson 2, we explored prior distributions in more depth and learned how the inference you make can be sensitive to your choice of prior, especially if you’re working with small data. Then we learned how to specify and fit Bayesian GLMs (for non-normal data) and GLMMs (for non-normal data with random effects). We now know what response distributions and link functions we can use if we have binary response data or positive right-skewed data.
Another piece of feedback I got when I taught the first Bayesian workshop is that people are interested in learning how to specify more complex model structures that go along with typical experimental designs in agricultural science. I think this lesson helps address that need. In ag science, we often have data collected over time and space. Usually, we can assume that the closer in time or space two data points are, the more they will be correlated with one another. In this lesson, we are going to learn how to account for that non-independence in time or space when we are trying to estimate an effect. In particular, we will learn two ways to do this: modeling the covariance between residuals directly, and fitting an explicit spatial or temporal trend term in the model.
The basic insight that you need to have to understand why specialized model structures are needed to deal with data collected over time and space is this: the closer two things are in time or space, the more closely related they are. In a spatial context, this is known as the First Law of Geography.
This has an important consequence if we’re collecting data and trying to use that data to make conclusions about the world. Let’s say we are doing an observational study of farms to look at how their management practices influence soil carbon sequestration. If you have farms that are right next to each other, they might have similar environmental conditions. So even though each farm may make independent management decisions, they are not fully independent because they share a spatial environment. This is not only an issue with observational studies, but also with designed experiments. Even if you randomly assign plots to treatments, plots that are near each other in space or time will share environmental influences.
The statistical models that you’re familiar with, including all the models we’ve covered so far in this course, assume that the residuals of every data point in the model are independent of one another. Even the mixed models we’ve seen so far assume that after you account for the random effects, the residuals are independent. Often, patterns of variation in space and time cause this assumption of independent residuals to be a bad one. What can we do about that?
There are two main ways that we can model data that are correlated in space and time. The first is to directly model the covariance between residuals, specifying a structure for how that covariance depends on closeness in time or space. The second is to include an explicit term for the spatial or temporal trend in the model itself, such as a spline or smoother.
Both these approaches can work for spatial and temporal data. In this lesson, we will work through two examples. The first is a GLMM with a residual covariance structure to model autocorrelation in time (first approach). The second is a GAMM with a spatial additive term (second approach).
Note: it is sometimes necessary to combine the two approaches and fit a model with an additive component that also models the residual covariance. We will not open that can of worms today!
Load the same packages and plotting themes as we have done before. We are also going to use the ape package to calculate spatial autocorrelation statistics.
library(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)
library(ape)
theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))
# Use 4 cores, the cmdstanr backend, and only refit saved models when the code or data change
options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'on_change')
Our first foray into the space-time continuum will be to deal with datasets where the same subjects (e.g., plots or animals) are measured at different time points. We want to account for measurements from the same subject at different times not being independent.
We are going to continue to use example data from the agridat package in this lesson. For our first example, we are going to use data from an experiment on weight gain in cows, originally published by Lepper & Lewis in 1989 and then reanalyzed by Diggle et al. in 1994. There were 26 cows in the study, divided roughly evenly among four treatment combinations. The study was designed to examine how iron supplementation and tuberculosis infection interact to affect body weight gain in cows. The two treatments in the 2×2 factorial design were whether the animals got supplemental iron (NoIron and Iron) and whether the animals were inoculated with tuberculosis at six weeks of age (NonInfected and Infected). Animal ID is in the animal column.
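If you want a quick look at the structure of the data yourself before plotting, a minimal check (nothing here is model-specific):

data(diggle.cow)
str(diggle.cow)   # columns described above: animal, iron, infect, weight, day
head(diggle.cow)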
The cows were weighed 23 times. The number of days between weighings was slightly uneven, but the intervals are close enough to equal that we’ll ignore that minor inconsistency for the purposes of this analysis. We’re interested in how much the rate of weight gain over time is influenced by iron supplementation and the presence of tuberculosis, and whether the effect of iron depends on tuberculosis (i.e., do iron and tuberculosis interact).
Let’s load the example data and then relevel the factors so that the control groups (NoIron and NonInfected) are first in the factor level ordering. To me this makes the results more interpretable, because if the control group comes first it will be the reference, or intercept, level.
data(diggle.cow)
diggle.cow <- diggle.cow %>%
mutate(iron = relevel(iron, ref = 'NoIron'),
infect = relevel(infect, ref = 'NonInfected'))
Make a plot of the data, showing each animal’s weight gain trend as an individual thin line, with the median trend for each treatment combination as a thick line. Color indicates iron supplementation and line type indicates infection treatment.
ggplot(diggle.cow, aes(x = day, y = weight, color = iron, linetype = infect, group = animal)) +
geom_line() +
stat_summary(aes(group = interaction(iron, infect)), fun = 'median', geom = 'line', linewidth = 2, show.legend = FALSE) +
colorblind_friendly +
labs(y = 'weight (kg)') +
theme(legend.title = element_blank())
We can see from the above plot that the weight gain trend over time is pretty linear. For simplicity here, we’ll assume a linear trend of weight with respect to day. The slopes seem different between the treatments. The rate of weight gain looks faster in the no iron supplement groups and in the uninfected groups. But the effect of iron supplementation does not appear to depend very strongly on infection; no iron looks higher than iron in both infected and uninfected. So I would not expect to see strong evidence for a statistical interaction.
When doing Monte Carlo sampling, it is a good idea not to have extremely large or small values in the predictors or response, because they lead to numerical problems. So, because the lowest number in the day column is 122, we will make a new column, dayafterstart, that begins at 0. In addition, we will standardize (z-transform, or subtract the mean and then divide by the standard deviation) the weight variable. Neither of those things will affect our conclusions, because we are just adding and multiplying by constants. But they will make the numerical computations done by brm() easier.
Another benefit of standardizing the variables is that if we know our response variable is in standard deviation units, it makes it easier for us to decide on a prior distribution for the fixed effects. If there is a categorical fixed effect with a coefficient of 5 in standard deviation units, we know that represents a huge effect. But if the variable is body weight, 5 would be a big effect if weight is measured in kg but a tiny effect if it is measured in mg. No need to guess, if we standardize the variable first.
diggle.cow$dayafterstart <- diggle.cow$day - min(diggle.cow$day)
diggle.cow$weight_scaled <- scale(diggle.cow$weight)
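A side note: scale() returns a one-column matrix and stores the constants it used as attributes, which you can use later to back-transform predictions. A minimal sketch using base R’s documented 'scaled:center' and 'scaled:scale' attributes:

# The mean and standard deviation used by scale(), for back-transforming later
attr(diggle.cow$weight_scaled, 'scaled:center')
attr(diggle.cow$weight_scaled, 'scaled:scale')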
We know that we are going to have to include weight_scaled ~ dayafterstart * iron * infect as the fixed part of our model formula so that we can estimate the slope of body weight versus time, and how that slope is affected by iron supplementation, infection, and their interaction. But we have 26 cows in the study that each have their own time trend. We need to make sure our model accounts for body weight measurements from the same cow on different days not being independent of each other.
Up to now, we’ve assumed that residuals in our statistical models are all conditionally independent. In other words, after accounting for any fixed and random effects, one residual has no relationship to any other. Those models still had a variance-covariance matrix for the residuals; it just had zero for any of the covariances between different residuals. If we call that matrix \(\textbf{R}\) then our models with independent residuals had this covariance structure:
\(\textbf{R} = \sigma^2\textbf{I}\)
That’s a \(n \times n\) matrix, where \(n\) is the number of time points. It has \(\sigma^2\) on every element of the diagonal, meaning that all residuals have the same variance. It has 0 everywhere else, meaning there is 0 covariance between any pair of residuals.
The big step we are taking here is to put some non-zero values into the \(\textbf{R}\) matrix to model correlation between the residuals. But how do we do that? There are lots of different ways we could think about modeling that so-called “residual autocorrelation.” We’ll go through three of the simplest ones here, illustrated below for four time points.

Unstructured covariance estimates a separate variance for each time point and a separate covariance for every pair of time points:

\[\textbf{R}=\begin{pmatrix}\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2 \end{pmatrix}\]

Compound symmetry assumes a single correlation \(\rho\) shared by all pairs of time points, no matter how far apart in time they are:

\[\textbf{R}=\sigma^2\begin{pmatrix}1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix}\]

First-order autoregressive, or AR(1), covariance assumes the correlation decays with increasing time lag: adjacent time points have correlation \(\rho\), time points two steps apart have \(\rho^2\), and so on:

\[\textbf{R}=\sigma^2\begin{pmatrix}1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}\]
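To make the structures concrete, here is a minimal base R sketch of the compound symmetry and AR(1) matrices (plus the independent case for reference). The values sigma² = 1 and rho = 0.8 are arbitrary illustrations, not estimates from any model; the unstructured matrix has a free parameter for every pair, so there is nothing compact to construct for it.

# Build covariance structures for n = 4 equally spaced time points
sigma2 <- 1
rho <- 0.8
n <- 4
lags <- abs(outer(1:n, 1:n, `-`))               # matrix of time lags |i - j|
R_indep <- sigma2 * diag(n)                     # independent: zero covariance off the diagonal
R_cosy <- sigma2 * ((1 - rho) * diag(n) + rho)  # compound symmetry: constant correlation rho
R_ar1 <- sigma2 * rho^lags                      # AR(1): correlation decays as rho^|i-j|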
We are going to try to fit all three of the above covariance structures to the body weight gain data. It’s likely that the unstructured covariance model will be extremely overparameterized and cause a lot of model fitting issues. That’s because there are 23 time points, so we would have to estimate a parameter for all pairs of days, 253 in total! Normally I would probably not try to fit that model, but I did it here as proof of concept.
The model syntax for fitting the different covariance structures is not too complex in brms. You need to specify (1) the type of covariance structure as the function name, (2) the column in the dataset that is the time variable as the time argument, and (3) the column in the dataset that is the grouping or subject variable as the gr argument. We want to model observations from the same subject as being correlated, and potentially having that correlation vary as a function of closeness in time. The three structures are specified as:

unstr(time = dayafterstart, gr = animal)
cosy(time = dayafterstart, gr = animal)
ar(time = dayafterstart, gr = animal, p = 1)
Note that p = 1 specifies what order of autoregressive structure we want. Increasing p would yield more complex autoregressive structures with deeper time lags.

Now let’s fit the models. The unstructured covariance model takes quite a bit of time to run, and I had to do quite a bit of tweaking to the priors and initial values, and increase the number of iterations, so that the model would converge. In practice these warnings are often red flags that the model is overparameterized. The biggest red flag is that I had to initialize all the chains to starting values of zero just to get it to sample, which is not a good habit to get into. But as I said above, we are trying to fit over 250 unique covariance parameters to a pretty small dataset with only ~600 rows. We know that’s not a great idea, but we are fitting it anyway to compare with the other two models.
Note: a normal prior is put on the fixed effects, with a fairly large standard deviation. An additional prior is placed on the covariance matrix for the unstructured covariance model. This is because we have so many parameters to estimate within that matrix. A prior that sets a low probability on high correlations is necessary to get the model to converge. By the way, I tried to fit identical models using the frequentist modeling package glmmTMB and could not successfully get the models to fit — we are now in the territory where Bayesian models really outperform their frequentist cousins in terms of actually getting you an answer!
diggle_unstructured <- brm(weight_scaled ~ iron * infect * dayafterstart + unstr(time = dayafterstart, gr = animal),
prior = c(
prior(normal(0, 5), class = b),
prior(lkj_corr_cholesky(3), class = Lcortime)
),
iter = 4000, warmup = 3000, init = 0,
data = diggle.cow,
seed = 1001, file = 'fits/diggle_unstructured')
diggle_compoundsymm <- brm(weight_scaled ~ iron * infect * dayafterstart + cosy(time = dayafterstart, gr = animal),
prior = c(
prior(normal(0, 5), class = b)
),
data = diggle.cow,
seed = 1002, file = 'fits/diggle_compoundsymm')
diggle_ar1 <- brm(weight_scaled ~ iron * infect * dayafterstart + ar(time = dayafterstart, gr = animal, p = 1),
prior = c(
prior(normal(0, 5), class = b)
),
data = diggle.cow,
seed = 1003, file = 'fits/diggle_ar1')
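Before looking at fit, it’s worth a quick convergence check on each model. A minimal sketch (summary() prints R-hat and effective sample sizes, and rhat() returns the R-hat values directly):

# Check convergence diagnostics for the AR(1) model; repeat for the other fits
summary(diggle_ar1)
max(rhat(diggle_ar1), na.rm = TRUE)  # we want this to be < 1.01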
Let’s look at the posterior predictive check plots for each model.
pp_check(diggle_unstructured)
pp_check(diggle_compoundsymm)
pp_check(diggle_ar1)
They all look reasonably good; possibly the AR1 is a slightly better fit, judging from the check plot.
Unfortunately, the leave-one-out cross-validation that we’ve been using to compare models in Lessons 1 and 2 does not work well with models like the unstructured covariance model, where each observation basically has its own parameter (technical explanation can be found here). It will give misleading results. Instead, we have to use k-fold cross-validation to compare the unstructured model to the other two models. Because this requires resampling each model \(k\) times, it is very time-consuming. Thus I present the results here instead of having you run them.
kfold(diggle_unstructured, diggle_compoundsymm, diggle_ar1, folds = 'grouped', group = 'animal', K = 5, compare = TRUE)
Model comparisons:
elpd_diff se_diff
diggle_ar1 0.0 0.0
diggle_unstructured -41.3 19.0
diggle_compoundsymm -295.1 24.5
The k-fold cross-validation shows that the AR1 model is superior, somewhat better than the unstructured covariance model. But despite having tons of parameters, the unstructured model still outperforms the compound symmetry model. It isn’t surprising to me that the AR1 model performs well in this case, because we have an ideal dataset to illustrate a first-order autoregressive process. A cow’s weight at time \(t\) is directly dependent on its weight at the previous timepoint \(t-1\). Other processes might be better modeled with compound symmetry or unstructured covariance, such as annual grain yield from a field which might jump up and down with “random” environmental fluctuations from year to year.
In classical frequentist analysis, unstructured covariance models tend to perform very poorly because they are so overparameterized. However in the Bayesian world, models with high numbers of parameters, like unstructured covariance models, actually tend to perform relatively well. This is because of the regularizing effect of priors. Priors can “shrink” the parameter values toward zero so that even a model with lots and lots of parameters isn’t overfit.
Now that we’ve established which model is best, we can go ahead and look at the posterior estimates. As always, the emmeans package has a trick for this. The function emtrends() (estimated marginal trends) will get the posterior estimate of the slope for each treatment combination, averaged over the random effects. The syntax is like emmeans() but with an additional argument, var, to which we pass the name of the time variable that we want the trend over. We provide the specification revpairwise ~ iron | infect to get a reverse pairwise contrast between each level of iron within each level of infect. (It is reverse pairwise so that we subtract the control from the treatment, to see if the treatment has a positive or negative effect relative to control.) You can swap iron and infect to get infection contrasts within each iron supplementation level.
Note that the slopes are still in standardized units. We will convert them back into the original kg/day units for later presentation.
emt_diggle <- emtrends(diggle_ar1, revpairwise ~ iron | infect, var = 'dayafterstart')
emt_diggle
## $emtrends
## infect = NonInfected:
## iron dayafterstart.trend lower.HPD upper.HPD
## NoIron 0.00535 0.00458 0.00607
## Iron 0.00477 0.00385 0.00560
##
## infect = Infected:
## iron dayafterstart.trend lower.HPD upper.HPD
## NoIron 0.00444 0.00391 0.00495
## Iron 0.00396 0.00346 0.00446
##
## Point estimate displayed: median
## HPD interval probability: 0.95
##
## $contrasts
## infect = NonInfected:
## contrast estimate lower.HPD upper.HPD
## Iron - NoIron -0.000576 -0.00171 0.000572
##
## infect = Infected:
## contrast estimate lower.HPD upper.HPD
## Iron - NoIron -0.000488 -0.00119 0.000179
##
## Point estimate displayed: median
## HPD interval probability: 0.95
We see more or less what we expected after looking at the raw data. The slope is slightly larger in the NoIron group, by a similar margin in each infection group. However, the 95% credible interval is consistent with the true difference being zero, or even with the Iron group having a higher slope. Thus, for both infected and uninfected animals, the evidence that iron supplementation affects weight gain rate is weak.
If you wanted to see whether the slope difference between iron levels is greater or smaller in infected cattle, you could contrast the slope differences with contrast(rbind(emt_diggle$contrasts), 'pairwise') (not shown).
We can use describe_posterior() from bayestestR to calculate two flavors of Bayesian p-value. The Bayesian p-values also don’t indicate strong evidence for a difference. Based on this, I would not conclude that iron supplementation affects weight gain rate, though we have to consider that the sample size is fairly small (n = 6 or 7 cows per group).
describe_posterior(emt_diggle$contrasts, ci = NULL, test = c('pd', 'p_map'))
## Summary of Posterior Distribution
##
## Parameter | Median | p (MAP) | pd
## --------------------------------------------------------
## Iron - NoIron NonInfected | -5.76e-04 | 0.635 | 83.85%
## Iron - NoIron Infected | -4.88e-04 | 0.404 | 92.00%
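As an aside, if you want a familiar frequentist-style number, bayestestR provides pd_to_p() to convert the probability of direction into an approximate two-sided p-value (roughly 2 × (1 − pd)). This is a convenience conversion, not a replacement for examining the full posterior:

pd_to_p(0.92)  # ~0.16, corresponding to the pd of 92.00% above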
How might we write this up in the methods and results section of a manuscript?
A methods write-up might look like this. Note I include the model comparison results in the methods. I consider that to be more of a technical aspect of the model fitting, which would distract from the story of the results.
We fit a linear mixed model to the trend in standardized body weight over time with residual covariance structure to account for repeated measures within animals. The model had fixed effects for time, iron supplementation, tuberculosis infection, and their interactions. We fit models with unstructured, compound symmetry, and AR(1) residual covariance. A Normal(0, 5) prior distribution was assumed for the fixed effects. An LKJ prior with η = 3 was given to the unstructured residual covariance matrix. Models were fit with Hamiltonian Monte Carlo. The joint posterior distribution was sampled with four Markov chains, run for a variable number of warmup iterations and 1000 sampling iterations. We verified the R-hat statistic was < 1.01 for all parameters, indicating model convergence. In k-fold cross-validation, the first-order autoregressive model performed better than the unstructured (ΔELPD = 41) and compound symmetry models (ΔELPD = 295). Thus we present predictions from the autoregressive model.
If the focus of your inquiry is how much iron supplementation influences the rate of weight gain, and whether iron’s effect is modified by tuberculosis, you could write something like this in the results.
There was little evidence for an effect of iron supplementation on the rate of weight gain. In uninfected animals the posterior estimate of the difference in mean weight gain rate between animals receiving supplemental iron and those that did not was -0.046 kg/day (95% CI [-0.135, 0.046]; pMAP = 0.64), while in infected animals it was -0.039 kg/day (95% CI [-0.094, 0.015]; pMAP = 0.40).
Here is how you could make a figure that includes the raw data plotted over the modeled trends, with a 95% credible interval (plotting multiple credible intervals on this kind of plot makes it too busy). We have to calculate the expected value of the posterior predictive at a bunch of time points for each of the treatment combinations, then summarize the distribution across that grid using its median and 95% quantile CI.
Note that incl_autocor = FALSE is included in the prediction function so that we get an overall prediction that averages over the autocorrelation within each animal. Also note that when creating the plotting data, I rescaled the day and weight variables to their original scales.
cow_weight_preds <- expand.grid(dayafterstart = 0:max(diggle.cow$dayafterstart), iron = c('NoIron', 'Iron'), infect = c('NonInfected', 'Infected')) %>%
add_epred_draws(diggle_ar1, incl_autocor = FALSE) %>%
mutate(day = dayafterstart + min(diggle.cow$day),
weight = .epred * sd(diggle.cow$weight, na.rm = TRUE) + mean(diggle.cow$weight, na.rm = TRUE))
ggplot(diggle.cow, aes(x = day, y = weight, color = iron, fill = iron, linetype = infect)) +
geom_line(aes(group = animal)) +
stat_summary(geom = 'ribbon', data = cow_weight_preds, fun.min = ~ quantile(., 0.025), fun.max = ~ quantile(., 0.975), alpha = 0.25, show.legend = FALSE) +
stat_summary(geom = 'line', data = cow_weight_preds, fun = 'median', linewidth = 2, show.legend = FALSE) +
colorblind_friendly +
labs(y = 'weight (kg)') +
theme(legend.title = element_blank())
The figure shows that the model is a reasonably good fit to the data, in addition to illustrating the fairly weak evidence for any impact of iron supplementation on weight gain rate. In practice, the one improvement I might make is to allow for a nonlinear trend over time, though this would make the simple interpretation of slope differences a little bit trickier.
For the table, I rescaled the slopes from standardized units (standard deviations of weight/day) to more interpretable units (kg/day) by multiplying them by the standard deviation of the weight column in the data. The table shows the posterior median estimates of the weight gain slopes and some credible intervals.
I like to use 66% and 95% because if the posterior distribution is roughly normal, those are approximately 1 and 2 standard deviations above and below the mean.
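You can verify that approximation in a couple of lines (for a standard normal, ±1 SD covers about 68% of the distribution and ±2 SD about 95%):

diff(pnorm(c(-1, 1)))  # ~0.683, close to a 66% interval
diff(pnorm(c(-2, 2)))  # ~0.954, close to a 95% interval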
weight_slope_summary <- gather_emmeans_draws(emt_diggle$emtrends) %>%
mutate(.value = .value * sd(diggle.cow$weight, na.rm = TRUE)) %>%
median_qi(.width = c(0.66, 0.95)) %>%
pivot_wider(names_from = .width, values_from = c(.lower, .upper)) %>%
select(-c(.point, .interval))
gt(weight_slope_summary) %>%
fmt_number(decimals = 3) %>%
cols_merge(c(.lower_0.66, .upper_0.66), pattern = '[{1}, {2}]') %>%
cols_merge(c(.lower_0.95, .upper_0.95), pattern = '[{1}, {2}]') %>%
cols_label(infect = 'tuberculosis infection', .value = 'weight gain rate (kg/day)', .lower_0.66 = '66% CI', .lower_0.95 = '95% CI') %>%
tab_options(column_labels.font.weight = 'bold')
| iron | tuberculosis infection | weight gain rate (kg/day) | 66% CI | 95% CI |
|---|---|---|---|---|
| NoIron | NonInfected | 0.424 | [0.394, 0.452] | [0.360, 0.479] |
| NoIron | Infected | 0.352 | [0.331, 0.371] | [0.308, 0.391] |
| Iron | NonInfected | 0.378 | [0.344, 0.412] | [0.306, 0.445] |
| Iron | Infected | 0.313 | [0.294, 0.332] | [0.271, 0.350] |
That was fun, but how about something really cool?!?
An alternative to modeling the residual covariance is to explicitly fit a spatial or temporal term in your model. Instead of just treating space and time variation as a nuisance term to be accounted for by a random effect, you can use this kind of model to make inference about the space and time trend in addition to your other effects of interest.
Okay, so let’s just slap a covariate for time or spatial coordinates into our linear model … problem solved, right? Keep it simple, right? Well, the effect of space and time is often not linear. Nonlinear effects are especially likely with geospatial data: unless there is a very regular environmental gradient running from north to south or east to west over your study region, you will have nonlinear effects over space. To deal with this, we will take a step beyond GLMMs and enter the world of GAMMs (generalized additive mixed models). An additive term models the response variable as a function of some combination of unknown smooth functions of the covariates. In other words, we are fitting a spline or smoother over space or time.
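If the idea of a smoother is new to you, here is a minimal hypothetical sketch on simulated one-dimensional data (everything here is made up for illustration; s() is the additive term in the formula):

# Simulate a nonlinear trend and fit a spline to it with brms
set.seed(1)
toy <- data.frame(x = seq(0, 10, length.out = 100))
toy$y <- sin(toy$x) + rnorm(100, sd = 0.3)
toy_gam <- brm(y ~ s(x), data = toy)
conditional_effects(toy_gam)  # plot the fitted smooth with its credible ribbon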
Today, we are just barely going to be introduced to the concepts and go over a quick example, but I highly recommend checking out the further reading section below to find resources to learn more.
This dataset originally appeared in a paper by Durban et al. in 2003. It is a classic agricultural field trial of barley varieties in Scotland (important for you Scotch lovers out there). It is ideal as an example for us because I was asked to cover typical agricultural experimental designs. Here we are looking at an experiment comparing barley yield between two fungicide treatments and 70 genotypes, in a full factorial design. The experiment is a split-plot randomized complete block design. There are four blocks, each with two main plots, with fungicide treatment randomly assigned to each main plot. Each main plot is divided into 70 subplots, one for each genotype, randomly assigned. Importantly, the main plots were laid out adjacent to each other in one long line, with a total area of 10 rows of 56 beds each.
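Before plotting, you can confirm the factorial layout numerically; each block × fungicide cell should contain the 70 genotype subplots described above (a quick sanity check, not required for the analysis):

data(durban.splitplot)
with(durban.splitplot, table(block, fung))  # expect 70 rows per main plot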
Here are some plots that will help you visualize how the study is laid out.
data(durban.splitplot)
ggplot(durban.splitplot, aes(x = bed, y = row, fill = block)) +
geom_tile() +
theme(aspect.ratio = 10/56, legend.position = 'bottom') +
ggtitle('Blocks')
ggplot(durban.splitplot, aes(x = bed, y = row, fill = fung)) +
geom_tile() +
theme(aspect.ratio = 10/56, legend.position = 'bottom') +
ggtitle('Main plots (fungicide treatment)')
ggplot(durban.splitplot, aes(x = bed, y = row, fill = gen)) +
geom_tile() +
theme(aspect.ratio = 10/56, legend.position = 'none') +
ggtitle('Subplots (genotypes)')
So far so good. Why don’t we just go ahead and fit a linear mixed model with random effects for block and main plot? That is supposed to take care of the spatial variation, right? Because blocks and main plots are spatially contiguous regions, including random effects for them should account for any random variation in yield over space that isn’t a result of fungicide and genotype, right? Well, sometimes it’s not that easy. Let’s take a look at the spatial pattern of yield.
ggplot(durban.splitplot, aes(x = bed, y = row, fill = yield)) +
geom_tile() +
scale_fill_distiller(palette = 'Oranges', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
Looking at the above plot we can see there is a clear spatial pattern in the yield. You can make out different regions of high and low yield. Some of it is related to the treatments, it seems, because you can sort of see lines where one fungicide treatment ends and the other begins. But there are some spatial patterns not related to that. For instance there is a region of low yield in the north center that spans across multiple treatment plots. Also the upper left corner has much higher yield than anywhere else.
It makes sense that there might be a spatial trend that is not accounted for by the experimental design, because the blocks and main plots are all arranged in a single line from east to west. So, they will account for any environmental factors that change from east to west, but if anything is varying from north to south or otherwise, the blocks and main plots will not account for that.
Side note: I usually prefer to use a single-color scale for this kind of plot. It is more colorblind-friendly, and is less prone to misleading interpretation. If you use a two-color scale it introduces a false dichotomy that will stand out to people.
But people are good at seeing patterns where there aren’t any. Out of curiosity, what would it look like if we randomly shuffled the spatial locations of the subplots within each main plot?
set.seed(125)
durban.splitplot_shuffle <- durban.splitplot %>%
group_by(block, fung) %>%
mutate(yield_shuffled = sample(yield))
ggplot(durban.splitplot_shuffle, aes(x = bed, y = row, fill = yield_shuffled)) +
geom_tile() +
scale_fill_distiller(palette = 'Oranges', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
To me this is much more “random” looking, which is a sign that there really is a spatial pattern in the residuals beyond what would be accounted for with block and main plot random effects. But let’s fit the model and see.
Here is a model fit to the yield response variable. Our fixed effects are genotype, fungicide, and their interaction (gen + fung + gen:fung). We fit random intercepts for each block ((1 | block)) and each main plot ((1 | fung:block), because each combination of fungicide treatment and block uniquely identifies a main plot).
Note the response variable is scale(yield). That scales (z-transforms) the yield so that we can more easily define a prior on our fixed effects. A z-transformed variable is always in standard deviation units. As a result, a prior like normal(0, 3) is always going to be weakly informative, because it encompasses both very large positive and very large negative effect sizes. This helps remove some of the guesswork in defining a prior distribution.
durban_lmm <- brm(scale(yield) ~ gen + fung + gen:fung + (1 | block) + (1 | fung:block),
prior = c(
prior(normal(0, 3), class = b)
),
data = durban.splitplot,
chains = 4, iter = 4000, warmup = 2000,
seed = 111, file = 'fits/durban_lmm')
Now the above model may be good enough for government work (ignoring for the moment any warnings about divergent transitions), but let’s take a look at the residuals. If we can detect any spatial pattern in them, that means that the block and main plot random effects did not fully account for it.
The residuals() function will give us the mean residual over all posterior draws for each data point, and its quantile credible interval. The mean is fine for a quick inspection; we don’t need to worry about the full posterior distribution of each residual. Extract the mean column from the matrix output by residuals(), append it to the data, and plot a map of the residuals.
resid_durban_lmm <- residuals(durban_lmm)
durban.splitplot <- mutate(durban.splitplot, resid_nospatial = resid_durban_lmm[,1])
ggplot(durban.splitplot, aes(x = bed, y = row, fill = resid_nospatial)) +
geom_tile() +
scale_fill_distiller(palette = 'Greens', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
The spatial pattern of the residuals is obvious (at least it is to me). We can still clearly see patches of low and high values here and there. Low residuals mean the model is overpredicting yield, and high residuals mean the model is underpredicting.
Although our residuals already fail the “eyeball test,” there is also a test statistic that we can calculate to measure how strong the spatial autocorrelation is: Moran’s I. Moran’s I takes a dataset of values that have spatial coordinates and measures how strongly the similarity between pairs of data points is associated with their proximity in space. It weights pairs by inverse distance, so the higher the statistic, the more likely two close-together data points are to have a similar value. Because Moran’s I is a correlation coefficient, it ranges from -1 to 1. A value of 0 means no spatial pattern and 1 means perfect spatial correlation. We don’t want to see high positive values of Moran’s I for our residuals.
To calculate Moran’s I, we need to calculate an inverse distance matrix for all pairs of spatial points in our dataset and then feed that, along with the residuals, to the Moran.I() function in the ape package.
durban_inv_dist <- 1 / as.matrix(dist(cbind(durban.splitplot$bed, durban.splitplot$row)))
diag(durban_inv_dist) <- 0
Moran.I(durban.splitplot$resid_nospatial, durban_inv_dist)
## $observed
## [1] 0.04335386
##
## $expected
## [1] -0.001788909
##
## $sd
## [1] 0.003126117
##
## $p.value
## [1] 0
The Moran’s I statistic is ~0.04. Honestly in practice that might be weak enough to hand-wave away and avoid having to deal with fitting a spatial model. But this is a lesson about how to address spatial autocorrelation in your data, so I am not going to let you get away without fitting a model that accounts for the spatial pattern!
Here’s where our linear mixed model becomes an additive mixed model. This model has the same categorical fixed effects and random block and main plot intercepts as the original model. But it also includes a spatial smoother, which is the s() part of the model formula. It sometimes takes a little tinkering to come up with an appropriate smoother, because there are so many different ways to do it. Unfortunately, we don’t have time today to really delve into that (in the future I may write a dedicated lesson about it, or you can check the further reading section below for more details).
The function s() ultimately comes from the additive model fitting package mgcv. The developer of brms reworked this function to work within brms formulas.
Regardless of the other smoother details, the first two arguments are the columns with the x and y coordinates of each data point (bed and row, respectively). The bs = 'gp' argument means this is a Gaussian process smoother. The Gaussian process smoother is a very flexible model that doesn’t make strong assumptions about the functional shape of the relationship, which is good for our needs. It is computationally intensive, so it doesn’t work well with very big datasets, but since this dataset is very modest in size we can get away with using it. There are a lot of other options, but we will not get into them right now.
The other argument, k, is called the basis dimension. That is a really important argument. It is the upper limit of the degrees of freedom of the smoother. If k is low, your smoother will be simpler, with fewer parameters, but less flexible to capture every last “wiggle” in the data. If k is high, your smoother can fit the data more closely. The model fitting algorithm already penalizes overfitting, so even if you set k to a pretty high value, you are still protected against overfitting. So you don’t have to stress too much over choosing an exactly perfect k value. If it is too low, the model won’t be a good fit to the data. If it is too high, the problem isn’t overfitting so much as computation: it is very computationally intensive to fit additive models with a very high k.
We will fit the same model with three different k values: 10, 25, and 50. The additive term is always s(bed, row, k = _, bs = 'gp'), with only the value of k differing between the three models.
durban_gamm_k10 <- brm(scale(yield) ~ gen + fung + gen:fung + (1 | block) + (1 | fung:block) + s(bed, row, k = 10, bs = 'gp'),
prior = c(
prior(normal(0, 3), class = b)
),
data = durban.splitplot, chains = 4, iter = 4000, warmup = 2000,
seed = 222, file = 'fits/durban_gamm_k10')
durban_gamm_k25 <- brm(scale(yield) ~ gen + fung + gen:fung + (1 | block) + (1 | fung:block) + s(bed, row, k = 25, bs = 'gp'),
prior = c(
prior(normal(0, 3), class = b)
),
data = durban.splitplot, chains = 4, iter = 4000, warmup = 2000,
seed = 222, file = 'fits/durban_gamm_k25')
durban_gamm_k50 <- brm(scale(yield) ~ gen + fung + gen:fung + (1 | block) + (1 | fung:block) + s(bed, row, k = 50, bs = 'gp'),
prior = c(
prior(normal(0, 3), class = b)
),
data = durban.splitplot, chains = 4, iter = 4000, warmup = 2000,
seed = 222, file = 'fits/durban_gamm_k50')
We get warnings about divergent transitions; in practice I would use more informative priors or increase the number of warmup iterations to reduce or eliminate those.
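A hedged sketch of that remedy: raising the sampler’s target acceptance rate via Stan’s control argument is a standard first step for divergent transitions, at the cost of slower sampling (adapt_delta defaults to 0.8 in brms):

# Hypothetical refit with stricter sampler settings to reduce divergences
durban_gamm_k50_strict <- update(durban_gamm_k50, control = list(adapt_delta = 0.99))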
Let’s compare the four models: the model with no spatial additive term, and the models with three different k values. We get a few warnings about problematic observations, but there are few enough that we can probably overlook them.
durban_lmm <- add_criterion(durban_lmm, 'loo')
durban_gamm_k10 <- add_criterion(durban_gamm_k10, 'loo')
durban_gamm_k25 <- add_criterion(durban_gamm_k25, 'loo')
durban_gamm_k50 <- add_criterion(durban_gamm_k50, 'loo')
loo_compare(durban_lmm, durban_gamm_k10, durban_gamm_k25, durban_gamm_k50)
## elpd_diff se_diff
## durban_gamm_k50 0.0 0.0
## durban_gamm_k25 -58.8 8.9
## durban_gamm_k10 -142.7 15.1
## durban_lmm -170.0 17.2
The verdict is clear. The model without the spatial additive term performs much more poorly than the others. In fact, we see a steady increase in model performance as k increases, indicating that we might be able to increase it still further and get a better fit without overfitting. But let’s stick with the k = 50 model and use it to make predictions.
How do the residuals look when we plot them?
durban.splitplot <- mutate(durban.splitplot, resid_spatial = residuals(durban_gamm_k50)[,1])
ggplot(durban.splitplot, aes(x = bed, y = row, fill = resid_spatial)) +
geom_tile() +
scale_fill_distiller(palette = 'Greens', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
At least to me, that looks way better! We don’t really see big patches of underpredicted or overpredicted yields. It looks random, as all good residuals should!
Calculate the Moran’s I autocorrelation statistic for the GAMM residuals.
Moran.I(durban.splitplot$resid_spatial, durban_inv_dist)
## $observed
## [1] -0.005510493
##
## $expected
## [1] -0.001788909
##
## $sd
## [1] 0.003126966
##
## $p.value
## [1] 0.2339842
It is very close to zero. In fact the point estimate is negative. Excellent!
But was there any point in doing that complicated exercise? Does it affect any of the conclusions we draw about the effects of genotype and fungicide on barley yield? If it doesn’t, we could have saved ourselves all that trouble.
Let’s calculate the estimated marginal means for each combination of genotype and fungicide treatment for the non-spatial and spatial models.
emm_durban_nonspatial <- emmeans(durban_lmm, ~ gen + fung)
emm_durban_spatial <- emmeans(durban_gamm_k50, ~ gen + fung)
Here’s some code to combine both sets of means into one dataset and plot them side-by-side to see whether any of our inferences were affected by adding the spatial additive term to the model. The error bars are the 95% highest posterior density credible intervals.
The fct_reorder() line sorts the genotypes in increasing order of their mean, averaged over the different fungicide treatments and models. This is a nice trick for making plots.
means_durban_combined <- bind_rows(
tibble(model = 'nonspatial', as_tibble(emm_durban_nonspatial)),
tibble(model = 'spatial', as_tibble(emm_durban_spatial))
)
means_durban_combined %>%
mutate(gen = fct_reorder(gen, emmean)) %>%
ggplot(aes(x = gen, y = emmean, ymin = lower.HPD, ymax = upper.HPD, group = interaction(gen, model), color = model)) +
facet_wrap(~ fung, ncol = 1) +
geom_errorbar(position = position_dodge(width = .5)) +
geom_point(position = position_dodge(width = .5), size = 2) +
colorblind_friendly +
theme(legend.position = 'bottom', axis.text.x = element_text(angle = 90))
Well, the differences aren’t super dramatic between the two models. But you can see that the yield for fungicide F2 is systematically overestimated by the non-spatial model compared to the spatial model. (The point estimate is higher for almost all genotypes.) So that might be a meaningful consequence.
Let’s also plot the spatial additive effect just to see exactly what the spatial model is estimating. We have to pull it out of the model by getting the model’s expected value of the posterior predictive for all combinations of bed, row, genotype, and fungicide. We will only take a few draws (10) from the posterior to make this computationally quicker, because we only want to plot one value per coordinate anyway. Then average the predictions over the genotype and fungicide levels and over the posterior draws, so we are left with one value per coordinate. Make a map!
pred_grid_durban <- expand_grid(
bed = 1:56,
row = 1:10,
gen = unique(durban.splitplot$gen),
fung = unique(durban.splitplot$fung))
spatial_field_durban <- add_epred_draws(pred_grid_durban, durban_gamm_k50, re_formula = ~ 0, ndraws = 10) %>%
group_by(bed, row) %>%
summarize(.epred = mean(.epred))
ggplot(spatial_field_durban, aes(x = bed, y = row, fill = .epred)) +
geom_tile() +
scale_fill_distiller(palette = 'Reds', direction = 1) +
theme(aspect.ratio = 10/56, legend.position = 'bottom')
We can see that the smoother captured the region of low yield in the middle top and high yield in the bottom right. If you squint, this map looks a lot like the map of residuals from the non-spatial model. That’s exactly what we would want!
A key result you might be interested in is the effect of fungicide, within each genotype and averaged over the different genotypes. The ever-handy emmeans package can help us out with this, producing the model predictions averaged over the random effects of block and main plot and over the spatial smoother. Here is code for making both those comparisons. I will not show the genotype-level predictions in this notebook in the interest of space, just the overall predictions.
emms_bygenotype <- emmeans(durban_gamm_k50, pairwise ~ fung | gen)
emms_overall <- emmeans(durban_gamm_k50, pairwise ~ fung)
emms_overall
## $emmeans
## fung emmean lower.HPD upper.HPD
## F1 0.341 -0.863 1.606
## F2 -1.089 -2.254 0.178
##
## Results are averaged over the levels of: gen
## Point estimate displayed: median
## HPD interval probability: 0.95
##
## $contrasts
## contrast estimate lower.HPD upper.HPD
## F1 - F2 1.42 0.252 2.38
##
## Results are averaged over the levels of: gen
## Point estimate displayed: median
## HPD interval probability: 0.95
We see that F1 has a higher mean yield than F2, averaged over the effect of genotype. However, these results are still given on the z-transformed scale. Below when presenting the results I will back-transform the means and contrast.
Here are some ideas for how you might describe this in the methods and results sections of your manuscript. I’ve included a methods section and a table; the results write-up and figure are left to you as an exercise!
A methods description might look like this:
We fit a generalized additive mixed model to the z-transformed yield variable. The fixed effects were genotype, fungicide, and their interaction, with random intercepts for block and for main plot nested within block. To account for spatial variation, we included a Gaussian process spatial smoothing term. We fit models with basis dimension (k) values 10, 25, and 50. We put a Normal(0, 3) prior on all fixed effects. Models were fit with Hamiltonian Monte Carlo, with 4 Markov chains, each run for 2000 warmup iterations before sampling 2000 draws for each chain. Model convergence was assessed by checking that R-hat < 1.01 for all parameters, and model fit was assessed by examining posterior predictive check plots. We compared models with leave-one-out cross-validation. The model with basis dimension k=50 performed best (ΔELPD 58.8 or greater); thus we present predictions from that model.
In the results section, I would focus on showing the predictions for the different genotypes under each fungicide level. A table might show the estimated marginal means for fungicides averaged over genotypes, and the difference between them, with a few different credible intervals. This is transformed back from the scale of the standardized variable to the original scale.
means_data_wide <- gather_emmeans_draws(emms_overall) %>%
mutate(.value = .value * sd(durban.splitplot$yield),
.value = if_else(.grid == 'emmeans', .value + mean(durban.splitplot$yield), .value)) %>%
median_qi(.width = c(0.66, 0.95, 0.99)) %>%
mutate(fung = coalesce(fung, contrast)) %>%
pivot_wider(id_cols = c(fung, .value), names_from = .width, values_from = c(.lower, .upper))
gt(means_data_wide) %>%
fmt_number(decimals = 3) %>%
cols_merge(c(.lower_0.66, .upper_0.66), pattern = '[{1}, {2}]') %>%
cols_merge(c(.lower_0.95, .upper_0.95), pattern = '[{1}, {2}]') %>%
cols_merge(c(.lower_0.99, .upper_0.99), pattern = '[{1}, {2}]') %>%
cols_label(fung = 'fungicide', .value = 'median', .lower_0.66 = '66% CI', .lower_0.95 = '95% CI', .lower_0.99 = '99% CI') %>%
tab_options(column_labels.font.weight = 'bold')
| fungicide | median | 66% CI | 95% CI | 99% CI |
|---|---|---|---|---|
| F1 | 5.409 | [5.164, 5.658] | [4.796, 6.022] | [4.499, 6.244] |
| F2 | 4.700 | [4.469, 4.985] | [4.137, 5.347] | [3.841, 5.560] |
| F1 - F2 | 0.706 | [0.488, 0.908] | [0.113, 1.174] | [−0.241, 1.463] |
Congratulations, you’ve now made it through three lessons of the Bayesian crash course! As we always do, let’s look back over the learning objectives to see what we’ve learned.
Now you have experience fitting models with residual covariance structures to account for temporal autocorrelation, fitting GAMMs with spatial smoothing terms, testing residuals for spatial autocorrelation with Moran’s I, comparing these models with cross-validation, and presenting their predictions in figures and tables.
You’re really starting to get the hang of this Bayes thing!
Here are some additional resources on (Bayesian) GAMMs to complement the lists I put at the ends of Lessons 1 and 2.
Here are some exercises to help you practice with the concepts we learned in this lesson.
When we were working with the diggle.cow data, we noticed that the trend of weight gain over time was somewhat nonlinear. To address this, we can fit a temporal additive term instead of modeling the residual covariance, and include a random intercept for each animal. This turns the model into a GAMM.
Here is a GAMM fit to the diggle.cow data, with a separate time spline fit to each fixed effect combination and a random intercept for each animal.
diggle_GAMM <- brm(weight_scaled ~ iron * infect + s(dayafterstart, by = interaction(iron, infect)) + (1 | animal),
prior = c(
prior(normal(0, 5), class = b)
),
data = diggle.cow,
seed = 1003, file = 'fits/diggle_GAMM')
Here is a quick way to plot the fitted values (not shown).
conditional_effects(diggle_GAMM, effects = c('dayafterstart:iron'), conditions = data.frame(infect = c('Infected','NonInfected')))
In this exercise, we will take a look at yet another example dataset from the agridat package, broadbalk.wheat. These data were collected at the famous Rothamsted Experimental Station, documenting wheat yields annually from 1852-1925. There is a time series for 17 different plots (the plots were subject to different treatments, but we will not consider that for this exercise). We are interested in whether there is a trend in wheat yield over time.
data(broadbalk.wheat)
ggplot(broadbalk.wheat, aes(x = year, y = grain, group = plot)) + geom_line()
Fit two models to these data, one with compound symmetry residual covariance and one with first-order autoregressive residual covariance, using year as the time variable and plot as the grouping variable. The response variable is grain and there should be a fixed effect for year. Here is some code to get you started. Fill in the blanks. Don’t worry about changing default priors and sampling options for the purpose of this exercise.

broadbalk_cosy <- brm(grain ~ ... + ...(time = ..., gr = ...), data = ...)
broadbalk_ar1 <- brm(grain ~ ... + ...(time = ..., gr = ..., p = ...), data = ...)
Here’s yet another example dataset from agridat. It is a variety trial with two replicates each of 64 genotypes. Yield was measured. The spatial coordinates of each plot are given in the columns col (x coordinate) and row (y coordinate).
data(burgueno.rowcol)
Fit two models to these data. The first includes only a fixed effect for genotype, gen. The second includes a spatial additive term (Gaussian process with basis dimension 10). Here is some code to get you started. Fill in the blanks. Don’t worry about changing default priors and sampling options for the purpose of this exercise.

burgueno_nonspatial <- brm(yield ~ ..., data = ...)
burgueno_spatial <- brm(yield ~ ... + s(..., ..., k = ..., bs = ...), data = ...)
Plot maps of the residuals from both models. You can use residuals(modelname)[,1] to get the residuals for each model, and append them as new columns to the burgueno.rowcol data frame to plot them.