...
with code)Image (c) Allison Horst
karcher.turfgrass
from agridat packagelibrary(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)
theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))
options(mc.cores = 4, brms.file_refit = 'on_change') # If you don't have CmdStan
options(mc.cores = 4, brms.file_refit = 'on_change', brms.backend = 'cmdstanr') # If you do have CmdStan
ordered()
to set ascending orderrating ~ nitro * manage + week
nitro
and manage
on cumulative probability of each rating classweek
as categorical fixed effect(1 | rep:nitro:manage)
family = cumulative(link = 'logit', threshold = 'flexible')
flexible
means allow the threshold between each consecutive category to have a different interceptnormal(0, 2)
prior on fixed effects (reasonable for log-odds scale)Posterior predictive check
Quick way to look at model predictions (effect of management within each level of nitrogen)
emmeans()
to estimate and compare treatment marginal meansfamily = cumulative
in brmsre_formula = ~ 0
, just as emmeans()
does'rating
, posterior estimate column called '.epred'
by default'overall'
in the week
column and combine with the other predictionsggplot(karcher_epred, aes(x = interaction(nitro, manage), y = .epred, group = rating)) +
stat_interval(aes(color = rating, color_ramp = after_stat(level)), .width = c(.5, .8, .95), position = position_dodge(width = .5)) +
stat_summary(fun = 'median', geom = 'point', position = position_dodge(width = .5)) +
facet_wrap(~ week, ncol = 2) +
colorblind_friendly +
labs(x = 'Treatment combination', color_ramp = 'CI width')
0.05 * 1 + 0.20 * 2 + 0.75 * 3
or 2.70.ggplot(karcher_epred_meanclass, aes(x = interaction(nitro, manage), y = mean_class)) +
stat_interval(.width = c(.5, .8, .95)) +
stat_summary(fun = 'median', geom = 'point') +
facet_wrap(~ week, ncol = 2) +
scale_color_brewer(palette = 'Blues') +
labs(x = 'Treatment combination', y = 'Predicted mean class', color = 'CI width')
contrast()
from emmeans to do this but now we use compare_levels()
from tidybayesdescribe_posterior()
generates table of medians and credible intervals for contrastsp = 0
is really p < 1/4000
(karcher_mgmt_contrast_post <- karcher_mgmt_contrasts %>%
group_by(nitro, manage) %>%
summarize(describe_posterior(mean_class, ci_method = 'eti', ci_width = c(.66, .95), test = c('pd', 'p_map'), as_p = TRUE)))
(karcher_nitro_contrast_post <- karcher_nitro_contrasts %>%
group_by(manage, nitro) %>%
summarize(describe_posterior(mean_class, ci_method = 'eti', ci_width = c(.66, .95), test = c('pd', 'p_map'), as_p = TRUE)))
We fit a generalized linear mixed model to the turfgrass color rating data, pooling Good and Excellent to a single category for a total of three categories. The model assumed a multinomial response distribution with cumulative logit link function. We allowed the intercept of each linear predictor to vary by rating category but estimated a single set of fixed effects (nitrogen treatment, management treatment, week as a discrete variable, and all their interactions) and random effects (random intercept for replicate nested within treatment combination) for each category threshold. We put weakly informative Normal(0, 2) priors on all fixed effects. Using Hamiltonian Monte Carlo, we sampled 1000 warmup iterations (discarded) and 1000 sampling iterations for each of four chains.
We generated posterior distributions of the estimated marginal probability of each rating category, and the mean category prediction, for each combination of treatment and week and averaged over weeks. We compared the mean category prediction for each pair of treatments averaged over weeks, testing against a null hypothesis of no difference using the Bayesian probability of direction \(p_d\) and maximum a posteriori p-value \(p_{MAP}\). We characterized the posterior distributions using the medians and equal-tailed credible intervals.
The probability of higher color ratings tended to be higher in the M3 and M4 management treatments relative to M1 and M2, and higher in the high N addition treatment relative to the low N addition treatment. There were only minimal interactions between management treatment and nitrogen addition level, and the relative differences between the treatments varied very little over time.
(At this point I would refer to specific tables and figures because it would become cumbersome to repeat many probabilities in the text).
ggplot(karcher_epred %>% filter(week == 'overall'), aes(x = manage, y = .epred)) +
stat_interval(aes(color = rating, color_ramp = after_stat(level)), .width = c(.5, .8, .95)) +
stat_summary(fun = 'median', geom = 'point') +
facet_grid(rating ~ nitro,
labeller = labeller(
nitro = c('N1' = 'low N addition', 'N2' = 'high N addition')
)) +
colorblind_friendly +
labs(x = 'Management treatment', y = 'Modeled class probability', color_ramp = 'CI width')
\[f(t) = a e^{-b e^{-ct}}\]
Read from CSV in the data
folder on cloud server/SCINet
If not on cloud server/SCINet, read from GitHub URL
bioscreen <- bioscreen %>%
mutate(across(c(rep, well), factor),
compound = factor(compound, levels = c('A','B','AB'), labels = c('A', 'B', 'A + B')))
head(bioscreen)
ggplot(bioscreen, aes(x = HOUR, y = OD, color = compound, group = interaction(compound, rep, well))) +
geom_line(alpha = 0.25, linewidth = 0.5) +
stat_summary(aes(group = compound), fun = 'median', geom = 'line', linewidth = 1.5) +
scale_y_log10() +
colorblind_friendly
Note logarithmic scale on y-axis
log(OD)
is negative for many values in our raw databf()
function with nl = TRUE
to create multi-line formulabf()
adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR))
a
to vary as a function of compound (fixed) and well nested within rep (random)
a ~ compound + (1 | well:rep)
b
and c
parameter across all compounds and wells
b ~ 1
and c ~ 1
bioscreen_gompertz_fit <- brm(
bf(
adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)),
a ~ compound + (1 | well:rep),
b ~ 1,
c ~ 1,
nl = TRUE
),
prior = c(
prior(normal(0, 1), class = b, nlpar = a),
prior(normal(3, 1), class = b, nlpar = a, coef = Intercept),
prior(normal(20, 5), class = b, nlpar = b, coef = Intercept),
prior(normal(0, 1), class = b, nlpar = c, coef = Intercept)
),
data = bioscreen,
iter = 3500, warmup = 1000, chains = 4,
seed = 950, file = 'fits/bioscreen_gompertz_fit'
)
family
)normal(0, 1)
prior on fixed effects on a
(conservative given range of log OD)nlpar
argumentbioscreen_gompertz_fit_morepars <- brm(
bf(
adjusted_log_OD ~ a * exp(-b * exp(-c * HOUR)),
a ~ compound + (1 | well:rep),
b ~ compound,
c ~ compound,
nl = TRUE
),
prior = c(
prior(normal(0, 1), class = b, nlpar = a),
prior(normal(3, 1), class = b, nlpar = a, coef = Intercept),
prior(normal(0, 1), class = b, nlpar = b),
prior(normal(20, 5), class = b, nlpar = b, coef = Intercept),
prior(normal(0, 1), class = b, nlpar = c),
prior(normal(0, 1), class = b, nlpar = c, coef = Intercept)
),
data = bioscreen,
iter = 3500, warmup = 1000, chains = 4,
seed = 951, file = 'fits/bioscreen_gompertz_fit_morepars'
)
b
and c
to vary with compound alsob
and c
normal(0, 1)
priors on fixed effectsPosterior predictive check plot for more complex models
Compare the two models using leave-one-out cross-validation
add_epred_draws()
to get expected value of posterior predictive
re_formula = ~ 0
sets random effects to zerondraws = 1000
only gets 1000 of the 10000 posterior draws (plenty for a quick plot)ggplot(bioscreen_pred, aes(x = HOUR, y = .epred)) +
stat_lineribbon(aes(fill = compound, color = compound, fill_ramp = after_stat(level)), alpha = 0.5) +
colorblind_friendly
ggplot(bioscreen_pred, aes(x = HOUR, y = .epred, fill = compound, color = compound)) +
geom_line(aes(y = adjusted_log_OD, group = interaction(compound, rep, well)), data = bioscreen, alpha = 0.25, linewidth = 0.5) +
stat_lineribbon(aes(fill_ramp = after_stat(level)), alpha = 0.5) +
colorblind_friendly
emmeans()
supports nonlinear brms models using nlpar
argumentpairwise ~ compound
estimates means and takes pairwise differences between compoundsemm_a_post <- describe_posterior(emm_a$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
emm_b_post <- describe_posterior(emm_b$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
emm_c_post <- describe_posterior(emm_c$emmeans, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = NULL)
contr_a_post <- describe_posterior(emm_a$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'), as_p = TRUE)
contr_b_post <- describe_posterior(emm_b$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'), as_p = TRUE)
contr_c_post <- describe_posterior(emm_c$contrasts, ci_method = 'eti', ci = c(0.5, 0.8, 0.95), test = c('pd', 'p_map'), as_p = TRUE)
$emmeans
objects$contrasts
objectsWe fit a nonlinear mixed model to the OD data. The response variable was adjusted by taking the logarithm and subtracting the minimum value of each well so that the lowest value in all wells was zero. We assumed the growth rate of OD with time followed a Gompertz relationship \(f(t) = a e^{-b e^{-ct}}\). A fixed effect for compound was put on each of the three nonlinear parameters: \(a\), the asymptote; \(b\), the offset, and \(c\), the growth rate. In addition, we put a random intercept for each well nested within replicate on the asymptote parameter. We put weakly informative Normal(0, 1) priors on the fixed effects and priors for the intercepts of each nonlinear parameter which were normally distributed with means derived from the point estimates from a least squares nonlinear fit.
We sampled the posterior distribution of model parameters using the Hamiltonian Monte Carlo algorithm with four chains, each run for 1000 warmup iterations (discarded) and 2500 sampling iterations (retained). We compared this model with a similar model that did not allow the \(b\) and \(c\) parameters to vary by compound, and found the more complex model to be a better fit; we present predictions from only that model in what follows.
We obtained posterior estimates of the marginal mean for each nonlinear parameter for each compound and compared the estimates between each pair of compounds. We characterized the posterior distributions using the medians and equal-tailed credible intervals, and tested the contrasts against a null value of zero using the Bayesian probability of direction (\(p_d\)) and maximum a posteriori p-value (\(p_{MAP}\)).
The model predicted a similar asymptote value for adjusted log OD for each of the three compounds (A: 3.07, 95% CI [2.98, 3.16]; B: 3.19 [3.10, 3.28]; A + B: 3.15 [3.06, 3.24]). The model predicted a slightly higher growth rate for B (0.063 [0.062, 0.065]) compared to A (0.061 [0.059, 0.063]) and A + B (0.059 [0.058, 0.061]). This indicates that the A compound may have been slightly more effective at initially slowing the growth rate but that the long-term inhibition differs very little between the compounds.
Previous figure recreated with more publication-quality options
ggplot(bioscreen_pred, aes(x = HOUR, y = .epred, fill = compound, color = compound)) +
geom_line(aes(y = adjusted_log_OD, group = interaction(compound, rep, well)), data = bioscreen, alpha = 0.25, linewidth = 0.5) +
stat_lineribbon(aes(fill_ramp = after_stat(level)), alpha = 0.5) +
colorblind_friendly +
labs(fill_ramp = 'CI width', x = 'time (h)', y = 'adjusted log(OD)') +
theme(
panel.grid = element_blank(),
legend.position = 'inside',
legend.position.inside = c(0, 1),
legend.justification = c(0, 1),
legend.background = element_blank()
)
You now have a lot of techniques in your Bayesian toolkit! Today you learned how to:
Well done!