This is the fifth in a series of lessons that are a practical introduction to fitting Bayesian multilevel models in R and Stan using the brms package (Bayesian Regression Models using Stan).
Download the worksheet for this lesson here.
We’ve covered the big-picture concepts of Bayesian inference and we’ve been taking a whirlwind tour of many different classes of statistical model. In the past two lessons we’ve covered mixed models with residual covariance structures, generalized additive mixed models, GLMMs with binomial, gamma, and beta distributions, and zero-inflated and hurdle models. That’s a lot so far, but let’s keep on going!
We will just keep on adding tools to your Bayesian toolbox … and your statistical toolbox as a whole because these are useful tools even if you don’t do the Bayesian thing.
As always, I’ve chosen these techniques because they are useful for datasets that are common in ag science and that many ARS scientists work with.
library(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)
theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))
options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'on_change')
In Lesson 2 in this series, we fit a logistic GLMM together. We use that kind of model when we have categorical data with two outcomes. In Lesson 2 the response variable was seed germination. Either a seed germinated (1) or it didn’t (0). But lots of categorical data have more than two outcomes.
There are two kinds of categorical data with multiple outcomes: nominal and ordinal. Nominal categories cannot be sorted into an increasing order, but ordinal categories can. Nominal categories might be something like blood type. There are statistical models for nominal response data, but we won’t focus on that today. In ag science it’s very common to have a response variable on an ordinal rating scale. For example, you might capture a complex disease phenotype by rating level of disease on a scale from 1 to 5. Or there might be a quantitative trait that is too cumbersome to measure directly, like the number of trichomes on a leaf or stem. Do you really want to sit there and count hairs when you can just say low, medium, or high?
When you are phenotyping individuals, it’s much faster to score their phenotypes on a rating scale than it is to physically measure the underlying trait(s) directly. But this speed has a price. It is not always appropriate to assume that the numerical ratings actually represent a continuous variable and analyze them as such. Luckily, there are statistical models for ordinal response data. They require a larger sample size to be as statistically powerful as a model for continuous data. We’ll fit one in this lesson.
I mentioned just now that if there are two categories, we can use a logistic, or logit, model. If you recall, the logit or log-odds link function lets us map the “linear predictor,” a term that combines fixed and random effects, and which can be positive or negative, to the probability scale, which has to be between 0 and 1. Because the logit model is modeling a binary outcome (0 and 1), there are two probabilities that have to sum to 1. If you know the probability of getting a 1, you know the probability of getting a 0. The probability distribution we used for that model was the binomial distribution. A common example is that you can model a coin flip as following a binomial distribution with a single trial (also called a Bernoulli distribution). It has one parameter, the probability of getting heads.
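To see what the logit link does in R, here is a quick sketch with an arbitrary probability, not tied to any dataset in this lesson:

p <- 0.75           # an arbitrary probability
qlogis(p)           # the logit link: log(0.75 / 0.25), about 1.1, on the linear predictor scale
plogis(qlogis(p))   # the inverse logit maps back to the probability scale: 0.75
plogis(c(-5, 0, 5)) # any real-valued linear predictor lands strictly between 0 and 1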
If you have three or more ordered categories, we have to estimate \(n - 1\) probabilities. Once we know them, we know the nth probability because they all have to sum to 1. Just like in the binary case, these probabilities all have to be between 0 and 1. Now, instead of a binomial distribution, we are going to use a multinomial distribution. The multinomial distribution has \(n - 1\) parameters, the probability of each outcome except for the last. Sounds complicated, but it is nothing more than going from a coin flip to a dice roll. You can model a dice roll as following a multinomial distribution with equal probability, 1/6, of the six different outcomes. (It only has five parameters because once you know the probabilities of rolling a 1, 2, 3, 4, and 5, you know the probability of rolling a 6.)
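Here is that dice roll in R (a toy illustration, unrelated to the example data we are about to meet):

rmultinom(1, size = 1, prob = rep(1/6, 6)) # one fair roll: a 6 x 1 matrix with a single 1 marking which face came up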
With a multinomial distribution, we can still use a logit link function. But now there is a slight difference. Let’s say we have four categories, 1-4. We now need to estimate the probability of category 1, the probability of category 2 or less, and the probability of category 3 or less. (The probability of category 4 or less is 1 because 4 is the maximum). These are cumulative probabilities, so we are going to fit a cumulative logit model. We will now have \(n - 1\) link functions, one for each category except for the last. They will map our linear predictors for each category to the cumulative probability scale.
If the probabilities of the first three categories are \(p_1\), \(p_2\), and \(p_3\), the link functions are:
\(\log \frac {p_1}{1-p_1}\), \(\log \frac {p_1 + p_2}{1-(p_1 + p_2)}\), and \(\log \frac {p_1 + p_2 + p_3}{1-(p_1 + p_2 + p_3)}\).
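To make those link functions concrete in R, here is a toy sketch with made-up probabilities for the four categories, not output from any model:

p <- c(0.2, 0.3, 0.4, 0.1) # made-up probabilities of categories 1-4; they sum to 1
cum_p <- cumsum(p)[1:3]    # cumulative probabilities P(y <= 1), P(y <= 2), P(y <= 3)
qlogis(cum_p)              # the three cumulative logits written out above
plogis(qlogis(cum_p))      # the inverse link recovers the cumulative probabilities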
That’s probably enough equations for the moment. The main point is that we now have to estimate fixed and random effects not just for one outcome, but for multiple outcomes: the cumulative probability of each class.
It’s easier to understand these complex models if we see one in action. So let’s go ahead and meet our first example dataset for this lesson, once again from the ever-useful agridat package: karcher.turfgrass. The dataset comes from a study of the effect of management style and nitrogen level on turfgrass color. Apparently, in the world of turfgrass, color is king. It is a completely randomized design with four levels of management and two levels of nitrogen. The four management levels refer to different ways of applying N (at the surface or injected at different depths), and the two nitrogen levels are low and high amounts. The experimental plots were rated for four weeks. Each week, each combination within each replicate was rated a variable number of times. The rating scale was Poor < Average < Good < Excellent.
Load the example dataset, look at the first few rows, and take a look at the rating distribution. Apparently, the color was rarely exceptional for these samples: only one of the 128 rows carries an Excellent rating. Also note that the ratings are sorted in alphabetical order. We’ll need to reorder them into ascending order.
data(karcher.turfgrass)
head(karcher.turfgrass)
## week rep manage nitro rating count
## 1 1 R1 M1 N1 Poor 1
## 2 1 R1 M1 N2 Average 2
## 3 1 R1 M2 N1 Poor 1
## 4 1 R1 M2 N2 Poor 1
## 5 1 R1 M3 N1 Average 2
## 6 1 R1 M3 N2 Good 3
table(karcher.turfgrass$rating)
##
## Average Excellent Good Poor
## 49 1 35 43
Data for a multinomial model in brms need to be in a long format, with a single outcome per row. This data frame is currently in a more compact format with a count of outcomes in each rating in each row. The following code converts the data to long form. In addition, I decided to lump together the Good and Excellent ratings into a single class, “Good+”. This is because having only a single observation of Excellent rated turfgrass doesn’t give us enough information to make a good estimate of the probability of Excellent. Lumping the classes together is not strictly necessary because in a Bayesian context we could use prior distributions to fill in the blanks, but I think that would be relying too heavily on priors. We also convert week to a discrete variable.
karcher_long <- karcher.turfgrass %>%
mutate(rating = if_else(rating %in% c('Good', 'Excellent'), 'Good+', rating) %>% ordered(levels = c('Poor', 'Average', 'Good+'))) %>%
group_by(week, rep, manage, nitro) %>%
reframe(rating = rep(rating, count)) %>%
mutate(week = factor(week))
Take a look at the data. This is a stacked bar plot showing the number of ratings in each of the three classes, by treatment combination, with a separate panel for each week.
We observe that management treatments M3 and M4 have much higher proportions of Good+ ratings. There is also the suggestion of an interaction with nitrogen application level. Especially in the later two weeks, we see that the high application level (N2) has more Good+ ratings than the low level (N1), mainly in the M3 and M4 management treatments.
ggplot(karcher_long, aes(x = interaction(nitro, manage))) +
geom_bar(aes(fill = rating), stat = 'count', position = 'stack') +
facet_wrap(~ week) +
labs(x = 'Treatment combination')
We want to model the interactive effect of nitrogen treatment nitro and management treatment manage on the cumulative probability of each of the turfgrass color rating classes. To account for the effect of time, we will include week as a categorical fixed effect. This being a relatively small dataset, we won’t try to include interactions between week and treatments, although you are free to explore those models as an exercise. So the fixed effect portion of the model formula is rating ~ nitro * manage + week.
We also want to account for the repeated measurements over time within plots. Measurements of the same plot at different times are not independent of one another. We don’t have the right data structure to include a residual covariance structure as we did in Lesson 3. Instead, we will fit a random intercept for each plot. The combination of replicate, nitrogen treatment, and management treatment uniquely identifies each plot. So the random effect portion of the model formula is + (1 | rep:nitro:manage).
For the response family, we specify family = cumulative(link = 'logit', threshold = 'flexible'). This tells brm() to fit a cumulative logit model. The flexible threshold means that each threshold between consecutive categories gets its own freely estimated intercept. You may try threshold = 'equidistant', which forces the thresholds to be evenly spaced, for a simpler but less flexible model.
Lastly, we set a weakly informative normal(0, 2) prior on the b or fixed effect parameters. This is reasonable because the model is on a log odds scale, where a coefficient of +/- 2 represents a fairly large effect.
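As a quick arithmetic check on why +/- 2 is large on the log odds scale:

exp(2)  # a coefficient of +2 multiplies the odds by about 7.4
exp(-2) # a coefficient of -2 divides the odds by the same factor, about 7.4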
karcher_clmm <- brm(rating ~ nitro * manage + week + (1 | rep:nitro:manage),
family = cumulative(link = 'logit', threshold = 'flexible'),
prior = c(
prior(normal(0, 2), class = b)
),
data = karcher_long,
seed = 928, file = 'fits/karcher_clmm')
Make two quick posterior predictive check plots, one grouped by management and one grouped by nitrogen. They show that the model (dots with error bars) predicts back the data (blue bars) well. It looks like a pretty good fit.
pp_check(karcher_clmm, type = 'bars_grouped', group = 'manage')
pp_check(karcher_clmm, type = 'bars_grouped', group = 'nitro')
Before we make more formal output, we can use the conditional_effects() function to produce some more quick default plots. These show the model predictions. We have to add some extra arguments to specify that we want to see the effect of management within each level of nitrogen.
conditional_effects(karcher_clmm, categorical = TRUE, effects = 'manage', conditions = data.frame(nitro = c('N1', 'N2')))
At this point, for other models we’ve fit in this course, we would usually proceed to using emmeans() to estimate the marginal means for different treatment combinations and compare them. Unfortunately, the emmeans package does not produce the estimates we want from a brm() model fit with family = cumulative. So we have to get creative and write our own code to get the estimates we need.
I will walk you through the process. First, we create a grid of the combinations of nitro, manage, and week from our data; there are 32 such combinations. Then we generate the expected value of the posterior predictive distribution for each posterior draw for each of those combinations. We do this using add_epred_draws() from tidybayes. Because we have 4000 posterior draws, and our expected value is a vector of three probabilities (one for each of the rating classes), we now have 32 * 4000 * 3 = 384,000 rows. The column with the expected posterior estimate of probability is called .epred by default.
pred_grid <- with(karcher_long, expand_grid(nitro = unique(nitro), manage = unique(manage), week = unique(week)))
karcher_epred <- pred_grid %>%
add_epred_draws(karcher_clmm, re_formula = ~ 0, category = 'rating')
We also might want to get predictions averaged over all the weeks for each combination of nitro and manage. Calculate that average, call it “overall”, and combine it with the other predictions.
karcher_epred_avgtime <- karcher_epred %>%
group_by(nitro, manage, .draw, rating) %>%
summarize(.epred = mean(.epred))
karcher_epred <- bind_rows(karcher_epred, karcher_epred_avgtime) %>%
mutate(week = forcats::fct_na_value_to_level(week, level = 'overall'))
This already gives us enough information to make a plot of the model predictions.
ggplot(karcher_epred, aes(x = interaction(nitro, manage), y = .epred, group = rating)) +
stat_interval(aes(color = rating, color_ramp = after_stat(level)), .width = c(.5, .8, .95), position = position_dodge(width = .5)) +
stat_summary(fun = 'median', geom = 'point', position = position_dodge(width = .5)) +
facet_wrap(~ week, ncol = 2) +
colorblind_friendly +
labs(x = 'Treatment combination', color_ramp = 'CI width')
The plot above is nice because it shows the times and treatment combinations with a higher probability of “Good” and those with a higher probability of “Poor.” But it is a little bit busy. How can we produce a simpler summary?
We can make a slightly sketchy assumption. I know I just told you all that it may not always be appropriate to pretend that ordered categorical scores are continuous variables. But we can use the modeled probabilities of each category to estimate a mean predicted class. This does assume to some extent that it is meaningful to treat the score values as actual numbers. But at least the underlying model was fit treating them as ordered categories.
We can get the mean predicted class by taking a weighted average of the class values, weighted by the probabilities. For example, if we assume Poor = 1, Average = 2, and Good+ = 3, and one posterior draw for one treatment combination estimates P(Poor) = 0.05, P(Average) = 0.20, and P(Good+) = 0.75, we take 0.05 * 1 + 0.20 * 2 + 0.75 * 3 to get an estimated mean class of 2.70.
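That arithmetic is exactly what R’s weighted.mean() computes; here is the worked example above as a one-liner:

weighted.mean(x = 1:3, w = c(0.05, 0.20, 0.75)) # returns 2.7, the estimated mean class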
Estimate the mean predicted class for every posterior draw for each treatment combination.
karcher_epred_meanclass <- karcher_epred %>%
group_by(nitro, manage, week, .draw) %>%
summarize(mean_class = weighted.mean(x = as.numeric(rating), w = .epred))
Make a plot of the mean predicted classes, showing the median and quantile uncertainty intervals of the posterior distributions.
ggplot(karcher_epred_meanclass, aes(x = interaction(nitro, manage), y = mean_class)) +
stat_interval(.width = c(.5, .8, .95)) +
stat_summary(fun = 'median', geom = 'point') +
facet_wrap(~ week, ncol = 2) +
scale_color_brewer(palette = 'Blues') +
labs(x = 'Treatment combination', y = 'Predicted mean class', color = 'CI width')
What if we wanted to test the hypothesis that these mean class predictions differ between the treatment combinations? Once again, unfortunately, we can’t use the handy contrast() function from emmeans. We have to do it manually. I’ve written some code to do this. First, I defined a function to go through a set of values and take all pairwise differences by group. Then, for simplicity, I only take the contrasts between the mean predicted classes averaged over weeks. I do this both for comparisons of nitrogen level within each management level, and comparisons of management level within each nitrogen level. This is done for each posterior draw.
We could also do contrasts of the individual class probabilities, or contrasts within each week . . . anything we want. You just need to figure out how to code that up!
pairwise_diffs <- function(values, group_names, reverse = TRUE) {
# Optionally reverse the inputs so the contrasts read, e.g., M4-M1 instead of M1-M4
if (reverse) {
values <- rev(values)
group_names <- rev(group_names)
}
# Take the difference between every pair of values, and build matching contrast labels
diffs <- apply(combn(values, 2), 2, function(x) x[1] - x[2])
contrast_names <- apply(combn(as.character(group_names), 2), 2, paste, collapse = '-')
data.frame(contrast = contrast_names, value = diffs)
}
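Before applying it to the posterior draws, here is a quick demonstration on made-up values so you can see the output format:

pairwise_diffs(values = c(1.2, 1.5, 2.0), group_names = c('M1', 'M2', 'M3'))
## contrast value
## 1 M3-M2 0.5
## 2 M3-M1 0.8
## 3 M2-M1 0.3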
karcher_mgmt_contrasts <- karcher_epred_meanclass %>%
filter(week == 'overall') %>%
group_by(nitro, .draw) %>%
reframe(pairwise_diffs(mean_class, manage, reverse = TRUE))
karcher_nitro_contrasts <- karcher_epred_meanclass %>%
filter(week == 'overall') %>%
group_by(manage, .draw) %>%
reframe(pairwise_diffs(mean_class, nitro, reverse = TRUE))
Now we can generate a table with medians and credible intervals for the contrasts, as well as two Bayesian p-value analogs, \(p_d\) and \(p_{MAP}\) (see previous lessons), using describe_posterior() from bayestestR. Note that because \(p_d\) is based on a finite number of posterior samples, which by default is 4000 because we have 4 chains each with 1000 post-warmup iterations, we cannot make finer distinctions than 1/4000. So pd = 1 should really be interpreted as pd > 3999/4000. We would have to increase the number of posterior samples to improve on that result. Because \(p_{MAP}\) is based on a smoothed kernel density estimate of the posterior distribution, it doesn’t necessarily have this strict limitation. But if you really want to characterize the tails of a distribution well, as we do for these kinds of p-value analogs, you need a lot of samples. The default of 4000 is often inadequate.
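Under the hood, \(p_d\) is simply the proportion of posterior draws falling on the dominant side of zero, which is why its resolution is limited to steps of 1/4000. A minimal sketch with a hypothetical vector of draws:

draws <- rnorm(4000, mean = 1, sd = 0.5) # hypothetical posterior draws of a contrast
max(mean(draws > 0), mean(draws < 0))    # probability of direction, in increments of 1/4000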
The tables below might be something you would want to include in a manuscript.
(karcher_mgmt_contrast_post <- karcher_mgmt_contrasts %>%
group_by(nitro, contrast) %>%
summarize(describe_posterior(value, ci_method = 'eti', ci_width = c(.66, .95), test = c('pd', 'p_map'))))
## # A tibble: 12 × 9
## # Groups: nitro [2]
## nitro contrast Parameter Median CI CI_low CI_high p_MAP pd
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 N1 M2-M1 Posterior -0.243 0.95 -0.475 -0.0241 0.0977 0.985
## 2 N1 M3-M1 Posterior 0.951 0.95 0.595 1.22 0 1
## 3 N1 M3-M2 Posterior 1.19 0.95 0.871 1.44 0 1
## 4 N1 M4-M1 Posterior 0.720 0.95 0.390 0.965 0.00163 1.00
## 5 N1 M4-M2 Posterior 0.964 0.95 0.655 1.20 0 1
## 6 N1 M4-M3 Posterior -0.231 0.95 -0.535 0.0729 0.254 0.940
## 7 N2 M2-M1 Posterior -0.240 0.95 -0.598 0.0996 0.382 0.920
## 8 N2 M3-M1 Posterior 1.08 0.95 0.755 1.34 0 1
## 9 N2 M3-M2 Posterior 1.32 0.95 0.977 1.61 0 1
## 10 N2 M4-M1 Posterior 0.944 0.95 0.564 1.25 0 1
## 11 N2 M4-M2 Posterior 1.18 0.95 0.822 1.50 0 1
## 12 N2 M4-M3 Posterior -0.128 0.95 -0.405 0.126 0.553 0.860
(karcher_nitro_contrast_post <- karcher_nitro_contrasts %>%
group_by(manage, contrast) %>%
summarize(describe_posterior(value, ci_method = 'eti', ci_width = c(.66, .95), test = c('pd', 'p_map'))))
## # A tibble: 4 × 9
## # Groups: manage [4]
## manage contrast Parameter Median CI CI_low CI_high p_MAP pd
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 M1 N2-N1 Posterior 0.431 0.95 0.159 0.683 0.0118 0.998
## 2 M2 N2-N1 Posterior 0.430 0.95 0.125 0.726 0.0295 0.996
## 3 M3 N2-N1 Posterior 0.554 0.95 0.306 0.803 0.00268 1.00
## 4 M4 N2-N1 Posterior 0.654 0.95 0.357 0.924 0 1
Here is a version of the estimated probabilities figure that is slightly modified to be a little more presentation-friendly (remember I told you that the mean class prediction figure relies on somewhat sketchy assumptions to turn these individual probabilities into one number per treatment combination). Because our model didn’t try to capture interactions between treatment and week, this figure only presents the means averaged over weeks to make the figure easier to digest. Instead, I have separated it out by nitrogen treatment and rating class into a grid of panels.
I think this conveys fairly clearly that the combination of low N addition and M1 or M2 treatment has the highest probability of Poor rating and the lowest probability of Good rating. In contrast, high N addition and M3 or M4 treatment has almost zero probability of a Poor rating and a reasonably high probability of Good. This tells the story without having to “cheat” and treat the rating classes as if they were continuous variables.
ggplot(karcher_epred %>% filter(week == 'overall'), aes(x = manage, y = .epred)) +
stat_interval(aes(color = rating, color_ramp = after_stat(level)), .width = c(.5, .8, .95)) +
stat_summary(fun = 'median', geom = 'point') +
facet_grid(rating ~ nitro,
labeller = labeller(
nitro = c('N1' = 'low N addition', 'N2' = 'high N addition')
)) +
colorblind_friendly +
labs(x = 'Management treatment', y = 'Modeled class probability', color_ramp = 'CI width')
As in previous lessons, here is a description of the methods and results that you could include in a manuscript. The results section in particular could include more detailed results but this is a general idea of how they should be described.
We fit a generalized linear mixed model to the turfgrass color rating data, pooling Good and Excellent into a single category for a total of three categories. The model assumed a multinomial response distribution with a cumulative logit link function. We allowed the intercept to vary by rating category threshold, but estimated a single set of fixed effects (nitrogen treatment, management treatment, their interaction, and week as a discrete variable) and random effects (random intercept for replicate nested within treatment combination) shared across the category thresholds. We put weakly informative Normal(0, 2) priors on all fixed effects. Using Hamiltonian Monte Carlo, we sampled 1000 warmup iterations (discarded) and 1000 sampling iterations for each of four chains.
We generated posterior distributions of the estimated marginal probability of each rating category, and the mean category prediction, for each combination of treatment and week and averaged over weeks. We compared the mean category prediction for each pair of treatments averaged over weeks, testing against a null hypothesis of no difference using the Bayesian probability of direction \(p_d\) and maximum a posteriori p-value \(p_{MAP}\). We characterized the posterior distributions using the medians and equal-tailed credible intervals.
The probability of higher color ratings tended to be higher in the M3 and M4 management treatments relative to M1 and M2, and higher in the high N addition treatment relative to the low N addition treatment. There were only minimal interactions between management treatment and nitrogen addition level, and the relative differences between the treatments varied very little over time.
(At this point I would refer to specific tables and figures because it would become cumbersome to repeat many probabilities in the text).
In Lesson 3, we learned how to use generalized additive models to model spatial variation. GAMs are called “semiparametric,” meaning that we don’t have a strong idea about the functional form of the relationship and we are willing to be flexible. But what if we have a strong theoretical expectation for how two variables should be related? We might have a nonlinear equation we want to fit. We can easily use brms to estimate the parameters of any nonlinear equation we want, with any error distribution we want. We also have full flexibility in terms of specifying fixed and random effects on any of the parameters.
The situation where this comes up most in the ag science research we do in ARS is when fitting growth curves, for example to microbial biomass data. You might have a hypothesis, based on existing knowledge and theory, of what functional form that growth curve should have. For example, theory predicts that a population of cells will grow exponentially, at an increasing rate over time, until they start crowding each other and bumping into the limits imposed by their environment. The growth will level off. Roughly speaking, you will see an S-shaped pattern. Although we could come up with more than one equation that looks like an S-curve, one that has been shown to model this kind of growth fairly well is the Gompertz equation:
\[f(t) = a e^{-b e^{-ct}}\]
This curve, which models population size as a function of time \(t\), has three parameters, \(a\), \(b\), and \(c\). We can think of \(a\) as the carrying capacity of the environment, or the asymptote of the curve. As \(t\) gets larger, the \(e^{-b e^{-ct}}\) part of the equation approaches 1, so we’re modeling the population size as approaching \(a\). The \(b\) parameter offsets the curve to the left or right along the time axis. The \(c\) parameter determines the growth rate. As it increases, the population grows faster toward the carrying capacity.
Here are some example Gompertz curves with different parameters.
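Here is a sketch of code that draws curves like these; the parameter values are illustrative, loosely chosen to resemble the fits later in this lesson:

gompertz <- function(t, a, b, c) a * exp(-b * exp(-c * t))
gompertz_curves <- expand_grid(t = seq(0, 100, 0.5), growth_rate = c(0.03, 0.06, 0.12)) %>%
mutate(y = gompertz(t, a = 3, b = 20, c = growth_rate))
ggplot(gompertz_curves, aes(x = t, y = y, color = factor(growth_rate))) +
geom_line(linewidth = 1) +
colorblind_friendly +
labs(y = 'f(t)', color = 'c parameter')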
This example dataset is a small subset of a real ARS dataset. In this study, one of the research questions was how much a class of compounds called pyrrocidines inhibits the growth of Fusarium mold. The experiment was run using a technology called Bioscreen. Inside the machine is a growth plate with many wells. There were three replicate plates. I have only provided you with a subset of the treatments that were on each plate. The researchers measured Fusarium growth in the presence of three different compounds: A, B, and an equal mixture of A and B.
The size of the population in each well was measured at regular time intervals (the example dataset I’ve provided includes the measurement from every fifth hour). It isn’t possible to directly measure the biomass in the wells. Instead, the reader shines a light through each well and measures how much of the light was absorbed by the stuff in the well. That is called the optical density or OD. This turns out to correlate very closely with the amount of biomass in the well, so that is what we are going to use as a proxy for biomass.
Let’s take a look at the data. The dataset is hosted on the GitHub repository for this course. Alternatively, if you are doing this course on the cloud server, you may read the data from the local data directory.
bioscreen <- read_csv('https://usda-ree-ars.github.io/SEAStats/brms_crash_course/bioscreen.csv')
# Or, if you are working on the cloud server, read from the local data directory:
bioscreen <- read_csv('data/bioscreen.csv')
Convert replicate, well, and treatment columns to factor and look at the first few rows of the dataset.
bioscreen <- bioscreen %>%
mutate(across(c(rep, well, compound), factor))
head(bioscreen)
## # A tibble: 6 × 10
## strain rep date well trt HOUR Time OD compound concentration
## <chr> <fct> <dbl> <fct> <chr> <dbl> <chr> <dbl> <fct> <dbl>
## 1 WT R1 81322 51 A 5 0 0:01:21 0.066 A 5
## 2 WT R1 81322 51 A 5 5 5:00:06 0.064 A 5
## 3 WT R1 81322 51 A 5 10 10:00:06 0.064 A 5
## 4 WT R1 81322 51 A 5 15 15:00:06 0.063 A 5
## 5 WT R1 81322 51 A 5 20 20:00:06 0.064 A 5
## 6 WT R1 81322 51 A 5 25 25:00:06 0.062 A 5
Plot the data with a thin semi-transparent line for each well and a thick line showing the median of each treatment. We see that when we put OD on a logarithmic scale, it follows an S-shaped trend that closely resembles the Gompertz curves we plotted earlier.
ggplot(bioscreen, aes(x = HOUR, y = OD, color = compound, group = interaction(compound, rep, well))) +
geom_line(alpha = 0.25, linewidth = 0.5) +
stat_summary(aes(group = compound), fun = 'median', geom = 'line', linewidth = 1.5) +
scale_y_log10() +
colorblind_friendly
As it turns out, because the Gompertz equation requires positive response values, and many of our log(OD) values are negative, we need to transform the data by taking the logarithm and then subtracting the minimum within each well so that the lowest value is zero.
bioscreen <- bioscreen %>%
group_by(well, rep) %>%
mutate(adjusted_log_OD = log(OD) - min(log(OD)))
Here’s what that looks like:
ggplot(bioscreen, aes(x = HOUR, y = adjusted_log_OD, color = compound, group = interaction(compound, rep, well))) +
geom_line(alpha = 0.25, linewidth = 0.5) +
stat_summary(aes(group = compound), fun = 'median', geom = 'line', linewidth = 1.5) +
colorblind_friendly
The first thing we will do is quickly fit a frequentist nonlinear least squares model to the data. We’re using nls() to fit a Gompertz curve to the whole dataset, ignoring any fixed and random effects, just to get an idea of what prior distributions to put on the intercepts for the three parameters of the curve.
nls(adjusted_log_OD ~ A * exp(-B * exp(-C * HOUR)), data = bioscreen, start = list(A = 3, B = 1, C = 0.1))
## Nonlinear regression model
## model: adjusted_log_OD ~ A * exp(-B * exp(-C * HOUR))
## data: bioscreen
## A B C
## 3.14552 23.26036 0.06075
## residual sum-of-squares: 118.1
##
## Number of iterations to convergence: 12
## Achieved convergence tolerance: 2.355e-06
Now it’s time to fit our full model. Just as slope and intercept terms in a linear mixed model can each have any number of fixed and random effects, we can put fixed and random effects in any combination on any of the three parameters in our model, a, b, and c. To do this, we will use the bf() function for creating multi-part formulas. We will pass a nonlinear formula to bf() as its first argument, the Gompertz equation: adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)). Then we’ll define a formula for each of the parameters of the nonlinear equation. That’s where we include the fixed and random effects.
- a ~ compound + (1 | well:rep) means we are going to allow the asymptote parameter a to vary as a function of compound (discrete fixed effect) and well nested within rep (random intercept).
- b ~ 1 and c ~ 1 mean we are going to fit a single b and a single c parameter across all compounds and wells.
- As a last argument to bf(), we pass nl = TRUE to tell bf() this is a nonlinear model.
We are assuming normally distributed error, so we will not have a family argument. That might not be ideal because variance increases with time, but it is OK for a first try. You can try other options as an exercise.
The priors include normal distributions on the intercept for each of the three parameters, with means that we determined from nls() above. We also include a normal(0, 1) prior on the fixed effects of compound on the asymptote a. This is reasonable because the maximum value of adjusted log OD only ranges between about 2 and 3. A fixed effect coefficient of 1 would represent a huge effect, so normal(0, 1) is a conservative prior that gives relatively even prior probability to both large and small effects.
Finally, notice that I have specified 3500 total iterations of which 1000 are warmups per chain, for a total of 2500 * 4 = 10000 sampling iterations. Because this is a fairly complex model, more samples than the default 4000 are needed to characterize the distributions well.
bioscreen_gompertz_fit <- brm(
bf(
adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)),
a ~ compound + (1 | well:rep),
b ~ 1,
c ~ 1,
nl = TRUE
),
prior = c(
prior(normal(0, 1), class = b, nlpar = a),
prior(normal(3, 1), class = b, nlpar = a, coef = Intercept),
prior(normal(20, 5), class = b, nlpar = b, coef = Intercept),
prior(normal(0, 1), class = b, nlpar = c, coef = Intercept)
),
data = bioscreen,
iter = 3500, warmup = 1000, chains = 4,
seed = 950, file = 'fits/bioscreen_gompertz_fit'
)
Look at the posterior predictive check plot. The data have a bimodal distribution, which you would expect from the S-curve shape. The populations spend a long time at a small size, quickly increase, then spend a long time near the carrying capacity. The model does a decent job of capturing this distribution, but it slightly underpredicts the lowest values around 0 and overpredicts intermediate values. As mentioned above, this may be because we are assuming normally distributed error and not accounting for the variance increasing with the mean.
pp_check(bioscreen_gompertz_fit)
Another way we might consider improving the model is to allow the b and c parameters to vary with compound as well. The previous model only fit a single intercept for each of those two parameters. The following model adds the fixed effect ~ compound to the right-hand side of the formulas for b and c. We now also specify normal(0, 1) priors for those fixed effects. It is probably excessive to also include random effects of well on those parameters, but you could try that if you are interested in seeing how that affects things.
bioscreen_gompertz_fit_morepars <- brm(
bf(
adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)),
a ~ compound + (1 | well:rep),
b ~ compound,
c ~ compound,
nl = TRUE
),
prior = c(
prior(normal(0, 1), class = b, nlpar = a),
prior(normal(3, 1), class = b, nlpar = a, coef = Intercept),
prior(normal(0, 1), class = b, nlpar = b),
prior(normal(20, 5), class = b, nlpar = b, coef = Intercept),
prior(normal(0, 1), class = b, nlpar = c),
prior(normal(0, 1), class = b, nlpar = c, coef = Intercept)
),
data = bioscreen,
iter = 3500, warmup = 1000, chains = 4,
seed = 951, file = 'fits/bioscreen_gompertz_fit_morepars'
)
The posterior predictive check plot is similar to the previous one. Again, we have the issue of underpredicting values near zero and overpredicting in the “saddle” portion of the distribution.
pp_check(bioscreen_gompertz_fit_morepars)
Compare the two models using leave-one-out cross-validation. We see that the model with additional fixed effects performs much better. Let’s go ahead and use that model for predictions.
bioscreen_gompertz_fit <- add_criterion(bioscreen_gompertz_fit, 'loo')
bioscreen_gompertz_fit_morepars <- add_criterion(bioscreen_gompertz_fit_morepars, 'loo')
loo_compare(bioscreen_gompertz_fit, bioscreen_gompertz_fit_morepars)
## elpd_diff se_diff
## bioscreen_gompertz_fit_morepars 0.0 0.0
## bioscreen_gompertz_fit -77.5 18.4
Create a prediction grid with all combinations of time (evenly spaced) and compound, and use add_epred_draws() with re_formula set to ~ 0 to get the expected value of the posterior predictive for each combination with random effects set to zero. I also set ndraws = 1000 to only get 1000 of the 10000 posterior draws, which is plenty for a quick figure. For a publication you may want to use all the posterior draws. The resulting data frame has 363,000 rows (121 time points * 3 compounds * 1000 posterior draws).
bioscreen_pred <- expand_grid(
HOUR = 0:120,
compound = unique(bioscreen$compound)
) %>%
add_epred_draws(bioscreen_gompertz_fit_morepars, re_formula = ~ 0, ndraws = 1000, seed = 1)
Here are two plots. The first one just shows the model predictions. Nice and clean! The second superimposes the model predictions on the raw data. I must admit that I prefer the second one. Not only does it show the model predictions, it also shows how well the model fits the data. It also emphasizes that the model predictions, valuable though they are, are an artificially clean and neat way to convey information about the underlying messy reality of the data.
ggplot(bioscreen_pred, aes(x = HOUR, y = .epred)) +
stat_lineribbon(aes(fill = compound, color = compound, fill_ramp = after_stat(level)), alpha = 0.5) +
colorblind_friendly
ggplot(bioscreen_pred, aes(x = HOUR, y = .epred, fill = compound, color = compound)) +
geom_line(aes(y = adjusted_log_OD, group = interaction(compound, rep, well)), data = bioscreen, alpha = 0.25, linewidth = 0.5) +
stat_lineribbon(aes(fill_ramp = after_stat(level)), alpha = 0.5) +
colorblind_friendly
Next, let’s estimate some means. Because we fit a parametric nonlinear model, we can do hypothesis tests of how much any of the estimated parameters differs between the compounds. We can actually use emmeans() for this because it supports nonlinear brm() models. Just add the argument nlpar, for example nlpar = 'a', to compare the asymptote parameter between compounds. Do this for all three parameters, specifying pairwise ~ compound to simultaneously estimate the means and take the differences between each pair of compounds.
emm_a <- emmeans(bioscreen_gompertz_fit_morepars, pairwise ~ compound, nlpar = 'a')
emm_b <- emmeans(bioscreen_gompertz_fit_morepars, pairwise ~ compound, nlpar = 'b')
emm_c <- emmeans(bioscreen_gompertz_fit_morepars, pairwise ~ compound, nlpar = 'c')
Now use describe_posterior() again to get equal-tailed quantile credible intervals around the means, which are in an element called emmeans inside each object. The differences are in an element called contrasts, and we can also use describe_posterior() to get their credible intervals. Do Bayesian hypothesis tests for good measure.
emm_a_post <- describe_posterior(emm_a$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
emm_b_post <- describe_posterior(emm_b$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
emm_c_post <- describe_posterior(emm_c$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
contr_a_post <- describe_posterior(emm_a$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))
contr_b_post <- describe_posterior(emm_b$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))
contr_c_post <- describe_posterior(emm_c$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))
Take a look at the posterior means for the parameters, showing some extra significant figures for the c (growth rate) parameter. They seem fairly similar for the three compounds.
emm_a_post
## Summary of Posterior Distribution
##
## compound | Median | 50% CI | 80% CI | 95% CI
## --------------------------------------------------------------
## A | 3.06 | [3.03, 3.10] | [3.00, 3.12] | [2.97, 3.15]
## AB | 3.15 | [3.12, 3.18] | [3.09, 3.21] | [3.06, 3.24]
## B | 3.19 | [3.16, 3.22] | [3.13, 3.25] | [3.10, 3.28]
emm_b_post
## Summary of Posterior Distribution
##
## compound | Median | 50% CI | 80% CI | 95% CI
## --------------------------------------------------------------------
## A | 22.72 | [22.10, 23.35] | [21.57, 23.94] | [20.98, 24.60]
## AB | 23.89 | [23.23, 24.59] | [22.65, 25.21] | [22.03, 25.93]
## B | 23.65 | [23.00, 24.29] | [22.43, 24.89] | [21.78, 25.62]
print(emm_c_post, digits = 4)
## Summary of Posterior Distribution
##
## compound | Median | 50% CI | 80% CI | 95% CI
## --------------------------------------------------------------------------
## A | 0.0610 | [0.0604, 0.0616] | [0.0599, 0.0621] | [0.0593, 0.0626]
## AB | 0.0593 | [0.0588, 0.0599] | [0.0583, 0.0604] | [0.0577, 0.0610]
## B | 0.0634 | [0.0628, 0.0640] | [0.0623, 0.0645] | [0.0617, 0.0651]
Pairwise contrasts by compound show some modest evidence that the growth rate parameter c is higher for the B compound than the other two compounds.
contr_a_post
## Summary of Posterior Distribution
##
## contrast | Median | 50% CI | 80% CI | 95% CI | p (MAP) | pd
## ---------------------------------------------------------------------------------------
## A - AB | -0.09 | [-0.13, -0.04] | [-0.17, -0.01] | [-0.21, 0.04] | 0.373 | 91.83%
## A - B | -0.13 | [-0.17, -0.09] | [-0.21, -0.05] | [-0.26, 0.00] | 0.145 | 97.55%
## AB - B | -0.04 | [-0.09, 0.00] | [-0.13, 0.04] | [-0.17, 0.09] | 0.843 | 73.37%
contr_b_post
## Summary of Posterior Distribution
##
## contrast | Median | 50% CI | 80% CI | 95% CI | p (MAP) | pd
## --------------------------------------------------------------------------------------
## A - AB | -1.17 | [-1.75, -0.59] | [-2.30, -0.05] | [-2.91, 0.55] | 0.398 | 91.03%
## A - B | -0.92 | [-1.49, -0.34] | [-2.01, 0.18] | [-2.56, 0.78] | 0.574 | 85.92%
## AB - B | 0.27 | [-0.51, 1.06] | [-1.19, 1.71] | [-1.96, 2.48] | 0.988 | 58.92%
print(contr_c_post, digits = 4)
## Summary of Posterior Distribution
##
## contrast | Median | 50% CI | 80% CI
## ------------------------------------------------------------
## A - AB | 0.0017 | [ 0.0011, 0.0022] | [ 0.0006, 0.0027]
## A - B | -0.0024 | [-0.0030, -0.0018] | [-0.0034, -0.0014]
## AB - B | -0.0041 | [-0.0047, -0.0034] | [-0.0054, -0.0028]
##
## contrast | 95% CI | p (MAP) | pd
## ------------------------------------------------
## A - AB | [ 0.0000, 0.0032] | 0.130 | 97.72%
## A - B | [-0.0040, -0.0008] | 0.021 | 99.75%
## AB - B | [-0.0060, -0.0021] | < .001 | 100%
The tables above would be ideal for presenting in a manuscript, with a bit of tweaking.
I think the figure I created above that shows the model predictions superimposed on the data would be good for displaying in a manuscript. I’ve recreated that figure below with a few extra options to make it more publication-quality: changing default axis and legend labels, removing the background grid which some people like but others find distracting, and moving legends inside the plot panel.
ggplot(bioscreen_pred, aes(x = HOUR, y = .epred, fill = compound, color = compound)) +
geom_line(aes(y = adjusted_log_OD, group = interaction(compound, rep, well)), data = bioscreen, alpha = 0.25, linewidth = 0.5) +
stat_lineribbon(aes(fill_ramp = after_stat(level)), alpha = 0.5) +
colorblind_friendly +
labs(fill_ramp = 'CI width', x = 'time (h)', y = 'adjusted log(OD)') +
theme(
panel.grid = element_blank(),
legend.position = 'inside',
legend.position.inside = c(0, 1),
legend.justification = c(0, 1),
legend.background = element_blank()
)
As usual, this is some text about the statistical model and its results that you could put in a manuscript. More detail could be added but this includes everything essential.
We fit a nonlinear mixed model to the OD data. The response variable was adjusted by taking the logarithm and subtracting the minimum value within each well so that the lowest value in every well was zero. We assumed the growth of OD over time followed a Gompertz relationship \(f(t) = a e^{-b e^{-ct}}\). A fixed effect for compound was put on each of the three nonlinear parameters: \(a\), the asymptote; \(b\), the offset; and \(c\), the growth rate. In addition, we put a random intercept for each well nested within replicate on the asymptote parameter. We put weakly informative Normal(0, 1) priors on the fixed effects, and normally distributed priors on the intercepts of each nonlinear parameter with means derived from point estimates from a least squares nonlinear fit.
We sampled the posterior distribution of model parameters using the Hamiltonian Monte Carlo algorithm with four chains, each run for 1000 warmup iterations (discarded) and 2500 sampling iterations (retained). We compared this model with a similar model that did not allow the \(b\) and \(c\) parameters to vary by compound, and found the more complex model to be a better fit; we present predictions from only that model in what follows.
We obtained posterior estimates of the marginal mean for each nonlinear parameter for each compound and compared the estimates between each pair of compounds. We characterized the posterior distributions using the medians and equal-tailed credible intervals, and tested the contrasts against a null value of zero using the Bayesian probability of direction (\(p_d\)) and maximum a posteriori p-value (\(p_{MAP}\)).
The model predicted a similar asymptote value for adjusted log OD for each of the three compounds (A: 3.06, 95% CI [2.97, 3.15]; B: 3.19 [3.10, 3.28]; AB: 3.15 [3.06, 3.24]). The model predicted a slightly higher growth rate for B (0.063 [0.062, 0.065]) compared to A (0.061 [0.059, 0.063]) and AB (0.059 [0.058, 0.061]). This indicates that the treatments containing compound A (alone or mixed with B) may have been slightly more effective at initially slowing the growth rate, but that the long-term inhibition differs very little between the compounds.
Congratulations on making it through yet another Bayesian lesson! If you’ve made it this far you have now had the opportunity to practice a really wide variety of approaches in Bayesian statistical modeling. Today, you’ve learned how to fit a cumulative logit mixed model to ordered categorical data, summarize and compare its posterior predictions by hand, and fit a nonlinear mixed model based on the Gompertz growth equation, with contrasts on its parameters. Well done!
Refit the turfgrass model including interaction terms between week and both treatments. Use LOO to compare this model with the model we fit previously. What can you conclude? How might this change the way we present the model results?
The Gompertz equation is not the only functional form that we can use to model growth of a microbial population over time. You could also use the von Bertalanffy function:
\[f(t) = L(1 - e^{-k(t - t_0)})\]
In the above equation, \(L\) is the asymptotic population size, \(k\) is the growth rate coefficient, and \(t_0\) acts to offset the position of the curve.
Write the bf() portion of the model statement of a model fit to the Bioscreen data, with adjusted_log_OD as the response variable. Allow the asymptote parameter L to vary with a fixed effect for compound and a random intercept for well nested within rep. Do not allow the other parameters to vary.