Introduction and setup

This is the fifth in a series of lessons that are a practical introduction to fitting Bayesian multilevel models in R and Stan using the brms package (Bayesian Regression Models using Stan).

Download the worksheet for this lesson here.

What did we learn in Lessons 1 through 4?

We’ve covered the big-picture concepts of Bayesian inference and we’ve been taking a whirlwind tour of many different classes of statistical model. In the past two lessons we’ve covered mixed models with residual covariance structures, generalized additive mixed models, GLMMs with binomial, gamma, and beta distributions, and zero-inflated and hurdle models. That’s a lot so far, but let’s keep on going!

What will we learn in Lesson 5?

We will just keep on adding tools to your Bayesian toolbox … and your statistical toolbox as a whole because these are useful tools even if you don’t do the Bayesian thing.

  • Cumulative logistic mixed models for ordinal data
  • Nonlinear mixed models

As always, I’ve chosen these techniques because they are useful for datasets that are common in ag science and that many ARS scientists work with.

Load packages

library(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)

theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))

options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'on_change')

Cumulative logistic mixed models for ordinal data

In Lesson 2 in this series, we fit a logistic GLMM together. We use that kind of model when we have categorical data with two outcomes. In Lesson 2 the response variable was seed germination. Either a seed germinated (1) or it didn’t (0). But lots of categorical data have more than two outcomes.

Image (c) Allison Horst
Image (c) Allison Horst

There are two kinds of categorical data with multiple outcomes: nominal and ordinal. Nominal categories cannot be sorted into an increasing order, but ordinal categories can. Nominal categories might be something like blood type. There are statistical models for nominal response data, but we won’t focus on that today. In ag science it’s very common to have a response variable on an ordinal rating scale. For example, you might capture a complex disease phenotype by rating level of disease on a scale from 1 to 5. Or there might be a quantitative trait that is too cumbersome to measure directly, like the number of trichomes on a leaf or stem. Do you really want to sit there and count hairs when you can just say low, medium, or high?

Image (c) Univ. Groningen
Image (c) Univ. Groningen

When you are phenotyping individuals, it’s much faster to score their phenotypes on a rating scale than it is to physically measure the underlying trait(s) directly. But this speed has a price. It is not always appropriate to assume that the numerical ratings actually represent a continuous variable and analyze them as such. Luckily, there are statistical models for ordinal response data. They require a larger sample size to be as statistically powerful as a model for continuous data. We’ll fit one in this lesson.

Cumulative logit model: from coins to dice

I mentioned just now that if there are two categories, we can use a logistic, or logit, model. If you recall, the logit or log-odds link function lets us map the “linear predictor,” a term that combines fixed and random effects, and which can be positive or negative, to the probability scale which has to be between 0 and 1. Because the logit model is modeling a binary outcome (0 and 1), there are two probabilities that have to sum to 1. If you know the probability of getting a 1, you know the probability of getting a 0. The probability distribution we used for that model was the binomial distribution. A common example is that you can model a coin flip as following a binomial distribution. It has one parameter, the probability of getting heads.

If you have three or more ordered categories, we have to estimate \(n - 1\) probabilities. Once we know them, we know the nth probability because they all have to sum to 1. Just like in the binary case, these probabilities all have to be between 0 and 1. Now, instead of a binomial distribution, we are going to use a multinomial distribution. The multinomial distribution has \(n - 1\) parameters, the probability of each outcome except for the last. Sounds complicated, but it is nothing more than going from a coin flip to a dice roll. You can model a dice roll as following a multinomial distribution with equal probability, 1/6, of the six different outcomes. (It only has five parameters because once you know the probabilities of rolling a 1, 2, 3, 4, and 5, you know the probability of rolling a 6.)

With a multinomial distribution, we can still use a logit link function. But now there is a slight difference. Let’s say we have four categories, 1-4. We now need to estimate the probability of category 1, the probability of category 2 or less, and the probability of category 3 or less. (The probability of category 4 or less is 1 because 4 is the maximum). These are cumulative probabilities, so we are going to fit a cumulative logit model. We will now have \(n - 1\) link functions, one for each category except for the last. They will map our linear predictors for each category to the cumulative probability scale.

If our probabilities are \(p_{y \le 1}\), \(p_{y \le 2}\), and \(p_{y \le 3}\), the link functions are:

\(\log \frac {p_{y \le 1}}{1-p_{y \le 1}}\), \(\log \frac {p_{y \le 1} + p_{y \le 2}}{1-(p_{y \le 1} + p_{y \le 2})}\), and \(\log \frac {p_{y \le 1} + p_{y \le 2} + p_{y \le 3}}{1-(p_{y \le 1} + p_{y \le 2} + p_{y \le 3})}\).

That’s probably enough equations for the moment. The main point is that we now have to estimate fixed and random effects not just for one outcome, but for multiple outcomes: the cumulative probability of each class.

Image (c) Todd Johnson
Image (c) Todd Johnson

Example dataset: turfgrass color ratings

It’s easier to understand these complex models if we see one in action. So let’s go ahead and meet our first example dataset for this lesson, once again from the ever-useful agridat package: karcher.turfgrass. The dataset comes from a study of the effect of management style and nitrogen level on turfgrass color. Apparently, in the world of turfgrass, color is king. It is a completely randomized design with four levels of management and two levels of nitrogen. The four management levels refer to different ways of applying N (at the surface or injected at different depths), and the two nitrogen levels are low and high amounts. The experimental plots were rated for four weeks. Each week each combination within each replicate was rated a variable number of times. The rating scale was Poor < Average < Good < Excellent.

Load the example dataset, look at the first few rows, and take a look at the rating distribution. Apparently, the color wasn’t too exceptionally great for these samples, because only one of the 128 rows carries an Excellent rating. Also note that the ratings are ordered in alphabetical order. We’ll need to reorder them to ascending order.

data(karcher.turfgrass)
head(karcher.turfgrass)
##   week rep manage nitro  rating count
## 1    1  R1     M1    N1    Poor     1
## 2    1  R1     M1    N2 Average     2
## 3    1  R1     M2    N1    Poor     1
## 4    1  R1     M2    N2    Poor     1
## 5    1  R1     M3    N1 Average     2
## 6    1  R1     M3    N2    Good     3
table(karcher.turfgrass$rating)
## 
##   Average Excellent      Good      Poor 
##        49         1        35        43

Data for a multinomial model in brms need to be in a long format, with a single outcome per row. This data frame is currently in a more compact format with a count of outcomes in each rating in each row. The following code converts the data to long form. In addition, I decided to lump together the Good and Excellent ratings into a single class, “Good+”. This is because having only a single observation of Excellent rated turfgrass doesn’t give us enough information to make a good estimate of the probability of Excellent. Lumping the classes together is not strictly necessary because in a Bayesian context we could use prior distributions to fill in the blanks, but I think that would be relying too heavily on priors. We also convert week to a discrete variable.

karcher_long <- karcher.turfgrass %>%
  mutate(rating = if_else(rating %in% c('Good', 'Excellent'), 'Good+', rating) %>% ordered(levels = c('Poor', 'Average', 'Good+'))) %>%
  group_by(week, rep, manage, nitro) %>%
  reframe(rating = rep(rating, count)) %>%
  mutate(week = factor(week))

Take a look at the data. This is a stacked bar plot showing the number of ratings in each of the three classes, by treatment combination, with a separate panel for each week.

We observe that management treatments M3 and M4 have much higher proportions of Good+ ratings. There is also the suggestion of an interaction with nitrogen application level. Especially in the later two weeks, we see that the high application level (N2) has more Good+ ratings than the low level (N1), mainly in the M3 and M4 management treatments.

ggplot(karcher_long, aes(x = interaction(nitro, manage))) +
  geom_bar(aes(fill = rating), stat = 'count', position = 'stack') +
  facet_wrap(~ week) +
  labs(x = 'Treatment combination')

Fit the model

We want to model the interactive effect of nitrogen treatment nitro and management treatment manage on the cumulative probability of each of the turfgrass color rating classes. To account for the effect of time, we will include week as a categorical fixed effect. This being a relatively small dataset, we won’t try to include interactions between week and treatments, although you are free to explore those models as an exercise. So the fixed effect portion of the model formula is rating ~ nitro * manage + week.

We also want to account for the repeated measurements over time within plots. Measurements of the same plot at different times are not independent of one another. We don’t have the right structure of the data to include a residual covariance structure as we did in Lesson 3. Instead, we will fit a random intercept to each plot. Combination of replicate, nitrogen treatment, and management treatment uniquely identifies plot. So the random effect portion of the model formula is + (1 | rep:nitro:manage).

For the response family, we specify family = cumulative(link = 'logit', threshold = 'flexible'). This tells brm() to fit a cumulative logit model. The flexible threshold means that we are allowing the threshold between each consecutive category to have a different intercept. You may try threshold = 'equidistant', which forces all the intercepts to be the same, for a simpler but less flexible model.

Lastly, we set a weakly informative normal(0, 2) prior on the b or fixed effect parameters. This is reasonable if the model is on a log odds scale because a coefficient of +/- 2 represents a fairly large effect.

karcher_clmm <- brm(rating ~ nitro * manage + week + (1 | rep:nitro:manage),
                    family = cumulative(link = 'logit', threshold = 'flexible'),
                    prior = c(
                      prior(normal(0, 2), class = b)
                    ),
                    data = karcher_long,
                    seed = 928, file = 'fits/karcher_clmm')

Make two quick posterior predictive check plots, one grouped by management and one grouped by nitrogen. They show that the model (dots with error bars) predicts back the data (blue bars) well. It looks like a pretty good fit.

pp_check(karcher_clmm, type = 'bars_grouped', group = 'manage')

pp_check(karcher_clmm, type = 'bars_grouped', group = 'nitro')

Before we make more formal output, we can use the conditional_effects() function to produce some more quick default plots. These show the model predictions. We have to add some extra arguments to specify that we want to see the effect of management within each level of nitrogen.

conditional_effects(karcher_clmm, categorical = TRUE, effects = 'manage', conditions = data.frame(nitro = c('N1', 'N2')))

Model predictions

At this point, for other models we’ve fit in this course, we would usually proceed to using emmeans() to estimate the marginal means for different treatment combinations and compare them. Unfortunately, the emmeans package does not produce the estimates we want from a brm() model fit with family = cumulative. So we have to get creative and write our own code to get the estimates we need.

I will walk you through the process. First, we create a grid of the combinations of nitro, manage, and week from our data; there are 32 such combinations. Then we generate the the expected value of the posterior predictive distribution for each posterior draw for each of those combinations. We do this using add_epred_draws() from tidybayes. Because we have 4000 posterior draws, and our expected value is a vector of three probabilities (one for each of the rating classes), we now have 32 * 4000 * 3 = 384,000 rows. The column with the expected posterior estimate of probability is called .epred by default.

pred_grid <- with(karcher_long, expand_grid(nitro = unique(nitro), manage = unique(manage), week = unique(week)))

karcher_epred <- pred_grid %>%
  add_epred_draws(karcher_clmm, re_formula = ~ 0, category = 'rating')

We also might want to get predictions averaged over all the weeks for each combination of nitro and manage. Calculate that average, call it “overall”, and combine it with the other predictions.

karcher_epred_avgtime <- karcher_epred %>%
  group_by(nitro, manage, .draw, rating) %>%
  summarize(.epred = mean(.epred)) 

karcher_epred <- bind_rows(karcher_epred, karcher_epred_avgtime) %>%
  mutate(week = forcats::fct_na_value_to_level(week, level = 'overall'))

This already gives us enough information to make a plot of the model predictions.

ggplot(karcher_epred, aes(x = interaction(nitro, manage), y = .epred, group = rating)) +
  stat_interval(aes(color = rating, color_ramp = after_stat(level)), .width = c(.5, .8, .95), position = position_dodge(width = .5)) + 
  stat_summary(fun = 'median', geom = 'point', position = position_dodge(width = .5)) +
  facet_wrap(~ week, ncol = 2) +
  colorblind_friendly +
  labs(x = 'Treatment combination', color_ramp = 'CI width')

The plot above is nice because it shows the times and treatment combinations with a higher probability of “Good” and those with a higher probability of “Poor.” But it is a little bit busy. How can we produce a simpler summary?

We can make a slightly sketchy assumption. I know I just told you all that it may not always be appropriate to pretend that ordered categorical scores are continuous variables. But we can use the modeled probabilities of each category to estimate a mean predicted class. This does assume to some extent that it is meaningful to treat the score values as actual numbers. But at least the underlying model was fit treating them as ordered categories.

We can get the mean predicted class by taking a weighted average of the class values, weighted by the probabilities. For example, if we assume Poor = 1, Average = 2, and Good+ = 3, and one posterior draw for one treatment combination estimates P(Poor) = 0.05, P(Average) = 0.20, and P(Good+) = 0.75, we take 0.05 * 1 + 0.20 * 2 + 0.75 * 3 to get an estimated mean class of 2.70.

Estimate the mean predicted class for every posterior draw for each treatment combination.

karcher_epred_meanclass <- karcher_epred %>%
  group_by(nitro, manage, week, .draw) %>%
  summarize(mean_class = weighted.mean(x = as.numeric(rating), w = .epred))

Make a plot of the mean predicted classes, showing the median and quantile uncertainty intervals of the posterior distributions.

ggplot(karcher_epred_meanclass, aes(x = interaction(nitro, manage), y = mean_class)) +
  stat_interval(.width = c(.5, .8, .95)) + 
  stat_summary(fun = 'median', geom = 'point') +
  facet_wrap(~ week, ncol = 2) +
  scale_color_brewer(palette = 'Blues') +
  labs(x = 'Treatment combination', y = 'Predicted mean class', color = 'CI width')

What if we wanted to test the hypothesis that these mean class predictions differ between the treatment combinations? Once again unfortunately we can’t use the handy contrast() function from emmeans. We have to do it manually. I’ve written some code to do this. First, I defined a function to go through a set of values and take all pairwise differences by group. Then, for simplicity, I only take the contrasts between the mean predicted classes averaged over weeks. I do this both for comparisons of nitrogen level within each management level, and comparisons of management level within each nitrogen level. This is done for each posterior draw.

We could also do contrasts of the individual class probabilities, or contrasts within each week . . . anything we want. You just need to figure out how to code that up!

pairwise_diffs <- function(values, group_names, reverse = TRUE) {
  if (reverse) {
    values <- rev(values)
    group_names <- rev(group_names)
  }
  diffs <- apply(combn(values, 2), 2, function(x) x[1] - x[2])
  contrast_names <- apply(combn(as.character(group_names), 2), 2, paste, collapse = '-')
  data.frame(contrast = contrast_names, value = diffs)
}

karcher_mgmt_contrasts <- karcher_epred_meanclass %>% 
  filter(week == 'overall') %>%
  group_by(nitro, .draw) %>%
  reframe(pairwise_diffs(mean_class, manage, reverse = TRUE))

karcher_nitro_contrasts <- karcher_epred_meanclass %>% 
  filter(week == 'overall') %>%
  group_by(manage, .draw) %>%
  reframe(pairwise_diffs(mean_class, nitro, reverse = TRUE))

Now we can generate a table with medians and credible intervals for the contrasts, as well as two Bayesian p-value analogs, \(p_d\) and \(p_{MAP}\) (see previous lessons), using describe_posterior() from bayestestR. Note that because \(p_d\) is based on a finite number of posterior samples, which by default is 4000 because we have 4 chains each with 1000 post-warmup iterations, we cannot make finer distinctions than 1/4000. So, the pd = 1 should really be interpreted as pd > 3999/4000. We would have to increase the number of posterior samples to improve on that result. Because \(p_{MAP}\) is based on a smoothed kernel density estimate of the posterior distribution, it doesn’t necessarily have this strict limitation. But if you really want to characterize the tails of a distribution well, as we do for these kind of p-value analogs, you need a lot of samples. The default of 4000 is often inadequate.

The tables below might be something you would want to include in a manuscript.

(karcher_mgmt_contrast_post <- karcher_mgmt_contrasts %>%
  group_by(nitro, contrast) %>%
  summarize(describe_posterior(value, ci_method = 'eti', ci_width = c(.66, .95), test = c('pd', 'p_map'))))
## # A tibble: 12 × 9
## # Groups:   nitro [2]
##    nitro contrast Parameter Median    CI CI_low CI_high   p_MAP    pd
##    <fct> <chr>    <chr>      <dbl> <dbl>  <dbl>   <dbl>   <dbl> <dbl>
##  1 N1    M2-M1    Posterior -0.243  0.95 -0.475 -0.0241 0.0977  0.985
##  2 N1    M3-M1    Posterior  0.951  0.95  0.595  1.22   0       1    
##  3 N1    M3-M2    Posterior  1.19   0.95  0.871  1.44   0       1    
##  4 N1    M4-M1    Posterior  0.720  0.95  0.390  0.965  0.00163 1.00 
##  5 N1    M4-M2    Posterior  0.964  0.95  0.655  1.20   0       1    
##  6 N1    M4-M3    Posterior -0.231  0.95 -0.535  0.0729 0.254   0.940
##  7 N2    M2-M1    Posterior -0.240  0.95 -0.598  0.0996 0.382   0.920
##  8 N2    M3-M1    Posterior  1.08   0.95  0.755  1.34   0       1    
##  9 N2    M3-M2    Posterior  1.32   0.95  0.977  1.61   0       1    
## 10 N2    M4-M1    Posterior  0.944  0.95  0.564  1.25   0       1    
## 11 N2    M4-M2    Posterior  1.18   0.95  0.822  1.50   0       1    
## 12 N2    M4-M3    Posterior -0.128  0.95 -0.405  0.126  0.553   0.860
(karcher_nitro_contrast_post <- karcher_nitro_contrasts %>%
  group_by(manage, contrast) %>%
  summarize(describe_posterior(value, ci_method = 'eti', ci_width = c(.66, .95), test = c('pd', 'p_map'))))
## # A tibble: 4 × 9
## # Groups:   manage [4]
##   manage contrast Parameter Median    CI CI_low CI_high   p_MAP    pd
##   <fct>  <chr>    <chr>      <dbl> <dbl>  <dbl>   <dbl>   <dbl> <dbl>
## 1 M1     N2-N1    Posterior  0.431  0.95  0.159   0.683 0.0118  0.998
## 2 M2     N2-N1    Posterior  0.430  0.95  0.125   0.726 0.0295  0.996
## 3 M3     N2-N1    Posterior  0.554  0.95  0.306   0.803 0.00268 1.00 
## 4 M4     N2-N1    Posterior  0.654  0.95  0.357   0.924 0       1

Presentation ideas

Figure

Here is a version of the estimated probabilities figure that is slightly modified to be a little more presentation-friendly (remember I told you that the mean class prediction figure relies on somewhat sketchy assumptions to turn these individual probabilities into one number per treatment combination). Because our model didn’t try to capture interactions between treatment and week, this figure only presents the means averaged over weeks to make the figure easier to digest. Instead, I have separated it out by nitrogen treatment and rating class into a grid of panels.

I think this conveys fairly clearly that the combination of low N addition and M1 or M2 treatment has the highest probability of Poor rating and the lowest probability of Good rating. In contrast, high N addition and M3 or M4 treatment has almost zero probability of a Poor rating and a reasonably high probability of Good. This tells the story without having to “cheat” and treat the rating classes as if they were continuous variables.

ggplot(karcher_epred %>% filter(week == 'overall'), aes(x = manage, y = .epred)) +
  stat_interval(aes(color = rating, color_ramp = after_stat(level)), .width = c(.5, .8, .95)) + 
  stat_summary(fun = 'median', geom = 'point') +
  facet_grid(rating ~ nitro,
             labeller = labeller(
               nitro = c('N1' = 'low N addition', 'N2' = 'high N addition')
             )) +
  colorblind_friendly +
  labs(x = 'Management treatment', y = 'Modeled class probability', color_ramp = 'CI width')

Methods and results

As in previous lessons, here is a description of the methods and results that you could include in a manuscript. The results section in particular could include more detailed results but this is a general idea of how they should be described.

Methods

We fit a generalized linear mixed model to the turfgrass color rating data, pooling Good and Excellent to a single category for a total of three categories. The model assumed a multinomial response distribution with cumulative logit link function. We allowed the intercept of each linear predictor to vary by rating category but estimated a single set of fixed effects (nitrogen treatment, management treatment, week as a discrete variable, and all their interactions) and random effects (random intercept for replicate nested within treatment combination) for each category threshold. We put weakly informative Normal(0, 2) priors on all fixed effects. Using Hamiltonian Monte Carlo, we sampled 1000 warmup iterations (discarded) and 1000 sampling iterations for each of four chains.

We generated posterior distributions of the estimated marginal probability of each rating category, and the mean category prediction, for each combination of treatment and week and averaged over weeks. We compared the mean category prediction for each pair of treatments averaged over weeks, testing against a null hypothesis of no difference using the Bayesian probability of direction \(p_d\) and maximum a posteriori p-value \(p_{MAP}\). We characterized the posterior distributions using the medians and equal-tailed credible intervals.

Results

The probability of higher color ratings tended to be higher in the M3 and M4 management treatments relative to M1 and M2, and higher in the high N addition treatment relative to the low N addition treatment. There were only minimal interactions between management treatment and nitrogen addition level, and the relative differences between the treatments varied very little over time.

(At this point I would refer to specific tables and figures because it would become cumbersome to repeat many probabilities in the text).

Nonlinear mixed models

In Lesson 3, we learned how to use generalized additive models to model spatial variation. GAMs are called “semiparametric,” meaning that we don’t have a strong idea about the functional form of the relationship and we are willing to be flexible. But what if we have a strong theoretical expectation for how two variables should be related? We might have a nonlinear equation we want to fit. We can easily use brms to estimate the parameters of any nonlinear equation we want, with any error distribution we want. We also have full flexibility in terms of specifying fixed and random effects on any of the parameters.

Microbial growth curves

The situation where this comes up most in the ag science research we do in ARS is when fitting growth curves, for example to microbial biomass data. You might have a hypothesis, based on existing knowledge and theory, of what functional form that growth curve should have. For example, theory predicts that a population of cells will grow exponentially, at an increasing rate over time, until they start crowding each other and bumping into the limits imposed by their environment. The growth will level off. Roughly speaking, you will see an S-shaped pattern. Although we could come up with more than one equation that looks like an S-curve, one that has been shown to model this kind of growth fairly well is the Gompertz equation:

\[f(t) = a e^{-b e^{-ct}}\]

This curve, which models population size as a function of time \(t\), has three parameters, a, b, and c. We can think of a as the carrying capacity of the environment, or the asymptote of the curve. As \(t\) gets larger, the \(e^{-b e^{-ct}}\) part of the equation approaches 1, so we’re modeling the population size as approaching \(a\). The \(b\) parameter offsets the curve to the left or right along the time axis. The \(c\) parameter determines the growth rate. As it increases, the population grows faster toward the carrying capacity.

Here are some example Gompertz curves with different parameters.

Example data

This example dataset is a small subset of a real ARS dataset. In this study, one of the research questions was how much a class of compounds called pyrrocidines inhibits the growth of Fusarium mold. The experiment was run using a technology called Bioscreen. Inside the machine is a growth plate with many wells. There were three replicate plates. I have only provided you with a subset of the treatments that were on each plate. The researchers measured Fusarium growth in the presence of three different compounds: A, B, and an equal mixture of A and B.

The size of the population in each well was measured at regular time intervals (in the example dataset I’ve provided you with the measurement from every fifth hour). It isn’t possible to directly measure the biomass in the wells. Instead, the reader shines a light through each well and measures how much of the light was absorbed by the stuff in the well. That is called the optical density or OD. This turns out to correlate very closely with the amount of biomass in the well, so that is what we are going to use as a proxy for biomass.

Let’s take a look at the data. The dataset is hosted on the GitHub repository for this course. Alternatively, if you are doing this course on the cloud server, you may read the data from the local data directory.

bioscreen <- read_csv('https://usda-ree-ars.github.io/SEAStats/brms_crash_course/bioscreen.csv')
bioscreen <- read_csv('data/bioscreen.csv')

Convert replicate, well, and treatment columns to factor and look at the first few rows of the dataset.

bioscreen <- bioscreen %>%
  mutate(across(c(rep, well, compound), factor))

head(bioscreen)
## # A tibble: 6 × 10
##   strain rep    date well  trt    HOUR Time        OD compound concentration
##   <chr>  <fct> <dbl> <fct> <chr> <dbl> <chr>    <dbl> <fct>            <dbl>
## 1 WT     R1    81322 51    A 5       0 0:01:21  0.066 A                    5
## 2 WT     R1    81322 51    A 5       5 5:00:06  0.064 A                    5
## 3 WT     R1    81322 51    A 5      10 10:00:06 0.064 A                    5
## 4 WT     R1    81322 51    A 5      15 15:00:06 0.063 A                    5
## 5 WT     R1    81322 51    A 5      20 20:00:06 0.064 A                    5
## 6 WT     R1    81322 51    A 5      25 25:00:06 0.062 A                    5

Plot the data with a thin semi-transparent line for each well and a thick line showing the median of each treatment. We see that when we put OD on a logarithmic scale, it follows an S-shaped trend that closely resembles the Gompertz curves we plotted earlier.

ggplot(bioscreen, aes(x = HOUR, y = OD, color = compound, group = interaction(compound, rep, well))) +
  geom_line(alpha = 0.25, linewidth = 0.5) + 
  stat_summary(aes(group = compound), fun = 'median', geom = 'line', linewidth = 1.5) +
  scale_y_log10() +
  colorblind_friendly

As it turns out, because the Gompertz equation requires positive response values, and many of our log(OD) values are negative, we need to transform the data by taking the logarithm and then subtracting the minimum within each well so that the lowest value is zero.

bioscreen <- bioscreen %>%
  group_by(well, rep) %>%
  mutate(adjusted_log_OD = log(OD) - min(log(OD)))

Here’s what that looks like:

ggplot(bioscreen, aes(x = HOUR, y = adjusted_log_OD, color = compound, group = interaction(compound, rep, well))) +
  geom_line(alpha = 0.25, linewidth = 0.5) + 
  stat_summary(aes(group = compound), fun = 'median', geom = 'line', linewidth = 1.5) +
  colorblind_friendly

Fit the model

The first thing we will do is quickly fit a frequentist nonlinear least squares model to the data. We’re using nls() to fit a Gompertz curve to the whole dataset ignoring any fixed and random effects, just to get an idea of what prior distributions to put on the intercepts for the three parameters of the curve.

nls(adjusted_log_OD ~ A * exp(-B * exp(-C * HOUR)), data = bioscreen, start = list(A = 3, B = 1, C = 0.1))
## Nonlinear regression model
##   model: adjusted_log_OD ~ A * exp(-B * exp(-C * HOUR))
##    data: bioscreen
##        A        B        C 
##  3.14552 23.26036  0.06075 
##  residual sum-of-squares: 118.1
## 
## Number of iterations to convergence: 12 
## Achieved convergence tolerance: 2.355e-06

Now it’s time to fit our full model. Just like how slopes and intercept terms in a linear mixed model can each have any number of fixed and random effects, we can put fixed and random effects in any combination on any of the three parameters in our model, a, b, and c. To do this, we will use the bf() function for creating multi-part formulas. We will pass a nonlinear formula to bf() as its first argument, the Gompertz equation: adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)). Then we’ll define a formula for each of the parameters of the nonlinear equation. That’s where we include the fixed and random effects.

  • a ~ compound + (1 | well:rep) means we are going to allow the asymptote parameter a to vary as a function of compound (discrete fixed effect) and well nested within rep (random intercept).
  • b ~ 1 and c ~ 1 means we are going to fit a single b and c parameter across all compounds and wells.

As a last argument to bf() we pass nl = TRUE to tell bf() this is a nonlinear model.

We are assuming normally distributed error so we will not have a family argument. That might not be ideal because variance increases with time, but it is OK for a first try. You can try other options as an exercise.

The priors include normal distributions on the intercept for each of the three parameters, with means that we determined from nls() above. We also include a normal(0, 1) prior on the fixed effects of compound on the asymptote a. This is reasonable because the maximum value of adjusted log OD only ranges between about 2 and 3. A fixed effect coefficient of 1 would represent a huge effect, so normal(0, 1) is a conservative prior that gives relatively even prior probability to both large and small effects.

Finally, notice that I have specified 3500 total iterations of which 1000 are warmups per chain, for a total of 2500 * 4 = 10000 sampling iterations. Because this is a fairly complex model, more samples than the default 4000 are needed to characterize the distributions well.

bioscreen_gompertz_fit <- brm(
  bf(
    adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)),
    a ~ compound + (1 | well:rep),
    b ~ 1,
    c ~ 1,
    nl = TRUE
  ),
  prior = c(
    prior(normal(0, 1), class = b, nlpar = a),
    prior(normal(3, 1), class = b, nlpar = a, coef = Intercept),
    prior(normal(20, 5), class = b, nlpar = b, coef = Intercept),
    prior(normal(0, 1), class = b, nlpar = c, coef = Intercept)
  ),
  data = bioscreen,
  iter = 3500, warmup = 1000, chains = 4,
  seed = 950, file = 'fits/bioscreen_gompertz_fit'
)

Look at the posterior predictive check plot. The data have a bimodal distribution, which you would expect from the S-curve shape. The populations spend a long time at a small size, quickly increase, then spend a long time near the carrying capacity. The model does a decent job of capturing this distribution, but it slightly underpredicts the lowest values around 0 and overpredicts intermediate values. As mentioned above, this may be because we are assuming normally distributed error and not accounting for the variance increasing with the mean.

pp_check(bioscreen_gompertz_fit)

Another way we might consider improving the model is to allow the b and c parameters to vary with compound as well. The previous model only fit a single intercept for each of those two parameters. The following model adds the fixed effect ~ compound to the right-hand side of the formulas for b and c. We now also specify normal(0, 1) priors for those fixed effects. It is probably excessive to also include random effects of well on those parameters, but you could try that if you are interested in seeing how that affects things.

bioscreen_gompertz_fit_morepars <- brm(
  bf(
    adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)),
    a ~ compound + (1 | well:rep),
    b ~ compound,
    c ~ compound,
    nl = TRUE
  ),
  prior = c(
    prior(normal(0, 1), class = b, nlpar = a),
    prior(normal(3, 1), class = b, nlpar = a, coef = Intercept),
    prior(normal(0, 1), class = b, nlpar = b),
    prior(normal(20, 5), class = b, nlpar = b, coef = Intercept),
    prior(normal(0, 1), class = b, nlpar = c),
    prior(normal(0, 1), class = b, nlpar = c, coef = Intercept)
  ),
  data = bioscreen,
  iter = 3500, warmup = 1000, chains = 4,
  seed = 951, file = 'fits/bioscreen_gompertz_fit_morepars'
)

The posterior predictive check plot is similar to the previous one. Again, we have the issue of underpredicting values near zero and overpredicting in the “saddle” portion of the distribution.

pp_check(bioscreen_gompertz_fit_morepars)

Compare the two models using leave-one-out cross-validation. We see that the model with additional fixed effects performs much better. Let’s go ahead and use that model for predictions.

bioscreen_gompertz_fit <- add_criterion(bioscreen_gompertz_fit, 'loo')
bioscreen_gompertz_fit_morepars <- add_criterion(bioscreen_gompertz_fit_morepars, 'loo')

loo_compare(bioscreen_gompertz_fit, bioscreen_gompertz_fit_morepars)
##                                 elpd_diff se_diff
## bioscreen_gompertz_fit_morepars   0.0       0.0  
## bioscreen_gompertz_fit          -77.5      18.4

Model predictions

Create a prediction grid with all combinations of time (evenly spaced) and compound, and use add_epred_draws() with re_formula set to ~ 0 to get the expected value of the posterior predictive for each combination with random effects set to zero. I also set ndraws = 1000 to only get 1000 of the 10000 posterior draws, which is plenty for a quick figure. For a publication you may want to use all the posterior draws. The resulting data frame has 363000 rows (121 time points * 3 compounds * 1000 posterior draws).

bioscreen_pred <- expand_grid(
  HOUR = 0:120,
  compound = unique(bioscreen$compound)
) %>%
  add_epred_draws(bioscreen_gompertz_fit_morepars, re_formula = ~ 0, ndraws = 1000, seed = 1) 

Here are two plots. The first one just shows the model predictions. Nice and clean! The second superimposes the model predictions on the raw data. I must admit that I prefer the second one. Not only does it show the model predictions, it also shows how well the model fits the data. It also emphasizes that the model predictions, valuable though they are, are an artificially clean and neat way to convey information about the underlying messy reality of the data.

ggplot(bioscreen_pred, aes(x = HOUR, y = .epred)) +
  stat_lineribbon(aes(fill = compound, color = compound, fill_ramp = after_stat(level)), alpha = 0.5) +
  colorblind_friendly

ggplot(bioscreen_pred, aes(x = HOUR, y = .epred, fill = compound, color = compound)) +
  geom_line(aes(y = adjusted_log_OD, group = interaction(compound, rep, well)), data = bioscreen, alpha = 0.25, linewidth = 0.5) +
  stat_lineribbon(aes(fill_ramp = after_stat(level)), alpha = 0.5) +
  colorblind_friendly

Next, let’s estimate some means. Because we fit a parametric nonlinear model, we can do hypothesis tests of how much any of the estimated parameters differs between the compounds. We can actually use emmeans() for this because it supports nonlinear brm() models. Just add the argument nlpar, for example nlpar = 'a' to compare the asymptote parameter between compounds. Do this for all three parameters, specifying pairwise ~ compound to simultaneously estimate the means and take the differences between each pair of compounds.

emm_a <- emmeans(bioscreen_gompertz_fit_morepars, pairwise ~ compound, nlpar = 'a')
emm_b <- emmeans(bioscreen_gompertz_fit_morepars, pairwise ~ compound, nlpar = 'b')
emm_c <- emmeans(bioscreen_gompertz_fit_morepars, pairwise ~ compound, nlpar = 'c')

Now use describe_posterior() again to get equal-tailed quantile credible intervals around the means, which are in an element called emmeans inside each object. The differences are in an element called contrasts and we can also use describe_posterior() to get their credible intervals. Do Bayesian hypothesis tests for good measure.

emm_a_post <- describe_posterior(emm_a$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
emm_b_post <- describe_posterior(emm_b$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
emm_c_post <- describe_posterior(emm_c$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)

contr_a_post <- describe_posterior(emm_a$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))
contr_b_post <- describe_posterior(emm_b$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))
contr_c_post <- describe_posterior(emm_c$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'))

Take a look at the posterior means for the parameters, showing some extra significant figures for the c (growth rate) parameter. They seem fairly similar for the three compounds.

emm_a_post
## Summary of Posterior Distribution
## 
## compound | Median |       50% CI |       80% CI |       95% CI
## --------------------------------------------------------------
## A        |   3.06 | [3.03, 3.10] | [3.00, 3.12] | [2.97, 3.15]
## AB       |   3.15 | [3.12, 3.18] | [3.09, 3.21] | [3.06, 3.24]
## B        |   3.19 | [3.16, 3.22] | [3.13, 3.25] | [3.10, 3.28]
emm_b_post
## Summary of Posterior Distribution
## 
## compound | Median |         50% CI |         80% CI |         95% CI
## --------------------------------------------------------------------
## A        |  22.72 | [22.10, 23.35] | [21.57, 23.94] | [20.98, 24.60]
## AB       |  23.89 | [23.23, 24.59] | [22.65, 25.21] | [22.03, 25.93]
## B        |  23.65 | [23.00, 24.29] | [22.43, 24.89] | [21.78, 25.62]
print(emm_c_post, digits = 4)
## Summary of Posterior Distribution
## 
## compound | Median |           50% CI |           80% CI |           95% CI
## --------------------------------------------------------------------------
## A        | 0.0610 | [0.0604, 0.0616] | [0.0599, 0.0621] | [0.0593, 0.0626]
## AB       | 0.0593 | [0.0588, 0.0599] | [0.0583, 0.0604] | [0.0577, 0.0610]
## B        | 0.0634 | [0.0628, 0.0640] | [0.0623, 0.0645] | [0.0617, 0.0651]

Pairwise contrasts by compound show some modest evidence that the growth rate parameter c is higher for the B compound than the other two compounds.

contr_a_post
## Summary of Posterior Distribution
## 
## contrast | Median |         50% CI |         80% CI |         95% CI | p (MAP) |     pd
## ---------------------------------------------------------------------------------------
## A - AB   |  -0.09 | [-0.13, -0.04] | [-0.17, -0.01] | [-0.21,  0.04] |   0.373 | 91.83%
## A - B    |  -0.13 | [-0.17, -0.09] | [-0.21, -0.05] | [-0.26,  0.00] |   0.145 | 97.55%
## AB - B   |  -0.04 | [-0.09,  0.00] | [-0.13,  0.04] | [-0.17,  0.09] |   0.843 | 73.37%
contr_b_post
## Summary of Posterior Distribution
## 
## contrast | Median |         50% CI |         80% CI |        95% CI | p (MAP) |     pd
## --------------------------------------------------------------------------------------
## A - AB   |  -1.17 | [-1.75, -0.59] | [-2.30, -0.05] | [-2.91, 0.55] |   0.398 | 91.03%
## A - B    |  -0.92 | [-1.49, -0.34] | [-2.01,  0.18] | [-2.56, 0.78] |   0.574 | 85.92%
## AB - B   |   0.27 | [-0.51,  1.06] | [-1.19,  1.71] | [-1.96, 2.48] |   0.988 | 58.92%
print(contr_c_post, digits = 4)
## Summary of Posterior Distribution
## 
## contrast |  Median |             50% CI |             80% CI
## ------------------------------------------------------------
## A - AB   |  0.0017 | [ 0.0011,  0.0022] | [ 0.0006,  0.0027]
## A - B    | -0.0024 | [-0.0030, -0.0018] | [-0.0034, -0.0014]
## AB - B   | -0.0041 | [-0.0047, -0.0034] | [-0.0054, -0.0028]
## 
## contrast |             95% CI | p (MAP) |     pd
## ------------------------------------------------
## A - AB   | [ 0.0000,  0.0032] |  0.130  | 97.72%
## A - B    | [-0.0040, -0.0008] |  0.021  | 99.75%
## AB - B   | [-0.0060, -0.0021] |  < .001 |   100%

The tables above would be ideal for presenting in a manuscript, with a bit of tweaking.

Presentation ideas

Figure

I think the figure I created above that shows the model predictions superimposed on the data would be good for displaying in a manuscript. I’ve recreated that figure below with a few extra options to make it more publication-quality: changing default axis and legend labels, removing the background grid which some people like but others find distracting, and moving legends inside the plot panel.

ggplot(bioscreen_pred, aes(x = HOUR, y = .epred, fill = compound, color = compound)) +
  geom_line(aes(y = adjusted_log_OD, group = interaction(compound, rep, well)), data = bioscreen, alpha = 0.25, linewidth = 0.5) +
  stat_lineribbon(aes(fill_ramp = after_stat(level)), alpha = 0.5) +
  colorblind_friendly +
  labs(fill_ramp = 'CI width', x = 'time (h)', y = 'adjusted log(OD)') +
  theme(
    panel.grid = element_blank(),
    legend.position = 'inside',
    legend.position.inside = c(0, 1),
    legend.justification = c(0, 1),
    legend.background = element_blank()
  )

Methods and results

As usual, this is some text about the statistical model and its results that you could put in a manuscript. More detail could be added but this includes everything essential.

Methods

We fit a nonlinear mixed model to the OD data. The response variable was adjusted by taking the logarithm and subtracting the minimum value of each well so that the lowest value in all wells was zero. We assumed the growth rate of OD with time followed a Gompertz relationship \(f(t) = a e^{-b e^{-ct}}\). A fixed effect for compound was put on each of the three nonlinear parameters: \(a\), the asymptote; \(b\), the offset, and \(c\), the growth rate. In addition, we put a random intercept for each well nested within replicate on the asymptote parameter. We put weakly informative Normal(0, 1) priors on the fixed effects and priors for the intercepts of each nonlinear parameter which were normally distributed with means derived from the point estimates from a least squares nonlinear fit.

We sampled the posterior distribution of model parameters using the Hamiltonian Monte Carlo algorithm with four chains, each run for 1000 warmup iterations (discarded) and 2500 sampling iterations (retained). We compared this model with a similar model that did not allow the \(b\) and \(c\) parameters to vary by compound, and found the more complex model to be a better fit; we present predictions from only that model in what follows.

We obtained posterior estimates of the marginal mean for each nonlinear parameter for each compound and compared the estimates between each pair of compounds. We characterized the posterior distributions using the medians and equal-tailed credible intervals, and tested the contrasts against a null value of zero using the Bayesian probability of direction (\(p_d\)) and maximum a posteriori p-value (\(p_{MAP}\)).

Results

The model predicted a similar asymptote value for adjusted log OD for each of the three compounds (A: 3.06, 95% CI [2.97, 3.15]; B: 3.19 [3.10, 3.28]; AB: 3.15 [3.06, 3.24]). The model predicted a slightly higher growth rate for B (0.063 [0.062, 0.065]) compared to A (0.061 [0.059, 0.063]) and AB (0.059 [0.058, 0.061]). This indicates that the A compound may have been slightly more effective at initially slowing the growth rate but that the long-term inhibition differs very little between the compounds.

Conclusion

Congratulations on making it through yet another Bayesian lesson! If you’ve made it this far you have now had the opportunity to practice a really wide variety of approaches in Bayesian statistical modeling. Today, you’ve learned how to

  • Fit a statistical model when you have ordered categorical response data
  • Make different kinds of predictions and comparisons using a Bayesian cumulative logit-link mixed model
  • Fit a nonlinear model with fixed and random effects
  • Compare parameters of a nonlinear model between treatment groups in a Bayesian framework

Well done!

Exercises

Exercise 1

Refit the turfgrass model including interaction terms between week and both the treatments. Use LOO to compare this model with the model we fit previously. What can you conclude? How might this change the way we present the model results?

Exercise 2

The Gompertz equation is not the only functional form that we can use to model growth of a microbial population over time. You could also use the von Bertalanffy function:

\[f(t) = L(1 - e^{-k(t - t_0)})\]

In the above equation, \(L\) is the asymptotic population size, \(k\) is the growth rate coefficient, and \(t_0\) acts to offset the position of the curve.

Write the bf() portion of the model statement of a model fit to the Bioscreen data, with adjusted_log_OD as the response variable. Allow the asymptote parameter L to vary with a fixed effect for compound and a random intercept for well nested within rep. Do not allow the other parameters to vary.

Click here for answers