...
with code)dbeta()
uses those parameters but calls them shape1
and shape2
mu
is the mean parameterphi
is the precision parametersnijders.fusarium
from agridat packagelibrary(brms)
library(agridat)
library(tidyverse)
library(emmeans)
library(tidybayes)
library(ggdist)
library(bayestestR)
library(gt)
library(fitdistrplus)
theme_set(theme_bw())
colorblind_friendly <- scale_color_manual(values = unname(palette.colors(5)[-1]), aesthetics = c('colour', 'fill'))
options(mc.cores = 4, brms.file_refit = 'on_change') # If you don't have CmdStan
options(mc.cores = 4, brms.file_refit = 'on_change', brms.backend = 'cmdstanr') # If you do have CmdStan
gen
, strain
, and year
to factor data typeggplot(snijders.fusarium, aes(x = year, y = p, color = strain)) +
geom_point(position = position_jitterdodge(jitter.width = 0.1, jitter.height = 0, dodge.width = 0.5), alpha = 0.5) +
stat_summary(fun = 'median', geom = 'point', size = 15, shape = 95, position = position_dodge(width = 0.5)) +
colorblind_friendly
p ~ strain * year + (1 + year | gen)
p
on the leftfamily = Beta(link = 'logit', link_phi = 'log')
phi
mu
, but only fits one global precision parameter phi
phi
to vary with fixed and/or random effects give us a better fit?brm()
we specify a separate formula for each model parameter, wrapped in bf()
phi ~ strain * year
means the precision parameter has fixed effects on it toophi
, using the argument dpar = phi
dpar
stands for “distributional parameter”fusarium_beta_fit_varyphi <- brm(
bf(
p ~ strain * year + (1 + year | gen),
phi ~ strain * year
),
family = Beta(link = 'logit', link_phi = 'log'),
data = snijders.fusarium,
prior = c(
prior(normal(0, 5), class = b),
prior(normal(0, 2), class = b, dpar = phi)
),
chains = 4, iter = 2000, warmup = 1000, seed = 777,
file = 'fits/fusarium_beta_fit_varyphi'
)
bf()
syntaxbf()
syntax in a typical general linear model with normally distributed error
sigma
can vary with fixed or random effects, if you want to model unequal variance among groupspp_check()
and model comparisonWhat can we conclude?
phi
type = 'response'
means use inverse link scale (the same scale as the original data)regrid(fusarium_emms, transform = 'log')
to get the contrasts in the form of ratios, not odds ratiosdescribe_posterior()
from the bayestestR package to get credible intervals for the means and pairwise differences
We fit a GLMM with beta response distribution to the proportion Fusarium infection data. A logit link function was used for the mean parameter, and a log link for the precision parameter \(\phi\). Strain and year were treated as fixed effects, and a random intercept and random slope with respect to year were fit for each genotype. We assigned Normal(0, 5) prior distributions to the fixed effects. Hamiltonian Monte Carlo was used to sample from the posterior distribution, with four Markov chains each run for 1000 warmup iterations and 1000 sampling iterations. We verified model convergence with the R-hat statistic and graphically assessed model fit with posterior predictive check plots. We compared this model with a model that had additional fixed effects on the precision parameter \(\phi\). LOO cross-validation showed this model was no better than the model without fixed effects on \(\phi\) (\(\Delta_{ELPD}\) = 9.0 with standard error = 9.7), therefore we present the simpler model.
We obtained posterior estimates of the marginal means for each combination of strain and year, back-transformed from the logit scale, and took pairwise ratios between strain means within each year. We characterized the posterior distributions of means and ratios using the median and equal-tailed credible intervals. We teested the null hypothesis that each ratio was equal to 1 using two different Bayesian p-value analogues: probability of direction \(p_d\) and maximum a posteriori p-value \(p_{MAP}\).
Incidence of Fusarium infection was overall low except for strain F39 in 1986 and 1987, with estimated mean infection proportions 0.24 with 95% CI [0.14, 0.34] and 0.28 [0.18, 0.40] respectively. The evidence is consistent with a greater proportion infection for F39 relative to other strains, with approximately a tenfold difference in 1987 and smaller differences in the twofold to fivefold range in 1986 and 1988 (see Table).
gather_emmeans_draws(fusarium_emms) %>%
mutate(.value = plogis(.value)) %>%
ggplot(aes(x = year, y = .value, group = interaction(strain, year))) +
stat_interval(aes(color = strain, color_ramp = after_stat(level)), .width = c(.66, .95), position = ggpp::position_dodgenudge(width = .5, x = .03)) +
stat_summary(fun = 'median', geom = 'point', size = 2, position = ggpp::position_dodgenudge(width = .5, x = .03)) +
geom_point(aes(y = p, color = strain), data = snijders.fusarium, position = ggpp::position_dodgenudge(width = .5, x = -.03), alpha = .5) +
colorblind_friendly +
labs(y = 'proportion infected', color_ramp = 'CI level')
fusarium_contrasts_CI %>%
filter(CI == 0.95) %>%
dplyr::select(-c(pd, CI)) %>%
gt(groupname_col = 'year') %>%
fmt_number(n_sigfig = 3) %>%
cols_label(Median = 'ratio', CI_low = '95% CI lower bound', CI_high = '95% CI upper bound', p_MAP = md('p~MAP~')) %>%
sub_zero(columns = p_MAP, zero_text = '< 0.001') %>%
tab_options(row_group.background.color = 'slateblue3',
column_labels.font.weight = 'bold')
Images (c) Philippa Willitts and Kirby Lee/USA Today Sports
LOD/2
log(x + 1)
? log(x + 0.0000001)
?
Images from Li livres dou santé (13th c.) and Miracles de Notre Dame (15th c.)
beall.webworms
from agridat: classic dataset from 1940spray
and lead
, applied in a 2 × 2 factorial designy
) at 25 locations in each plotspray * lead
(1 | block) + (1 | spray:lead:block)
type = 'bars'
is good for discrete databf()
for the multi-line formulazi
is the zero-inflation component
zi
because the random effects on the mean are adequatedpar = zi
or default flat priors will be assignedwebworm_zip_fit <- brm(
bf(
y ~ spray * lead + (1 | block) + (1 | spray:lead:block),
zi ~ spray * lead
),
data = beall.webworms,
family = zero_inflated_poisson(link = 'log', link_zi = 'logit'),
prior = c(
prior(normal(0, 5), class = b),
prior(normal(0, 5), class = b, dpar = zi)
),
chains = 4, iter = 2000, warmup = 1000, seed = 222,
file = 'fits/webworm_zip_fit'
)
pp_check()
the ZIP modelmu
): predicted value of the egg mass count for each treatment conditional on there being a non-zero number of egg masseszi
): predicted probability of observing zero egg masses for each treatmentemmeans()
reports conditional mean by default
type = 'response'
inverts from the link function scale (log counts) to the original data scale (counts)dpar = 'zi'
gives us the probability of zero
type = 'response'
inverts from the link function scale (log odds) to the original data scale (probabilities)epred = TRUE
gives us the overall expected value
epred
stands for “expected value of the posterior predictive distribution.”emmeans()
to treat the expected values as if they were fit on the log scale
regrid(..., transform = 'log')
type = 'response'
inverts from the log count scale to the count scale'pairwise'
contrasts of no-spray versus spray within each level of lead, and no-lead versus lead within each level of sprayWe fit a zero-inflated Poisson GLMM to the webworm count data, with a log link function for the \(\lambda\) parameter and a logit link function for the zero-inflation parameter. The model included fixed effects for spray treatment, lead treatment, and their interactions, and random intercepts for block nested within treatment combination. We assigned Normal(0, 5) prior distributions to the fixed effects on both \(\lambda\) and the zero-inflation parameter. Hamiltonian Monte Carlo was used to sample from the posterior distribution, with four Markov chains each run for 1000 warmup iterations and 1000 sampling iterations. We verified model convergence with the R-hat statistic and graphically assessed model fit with posterior predictive check plots.
We obtained posterior estimates of the marginal means for each treatment combination of strain and year, combining the conditional mean and zero-inflation probability. We back-transformed these means from the log scale, and took pairwise ratios between strain means within each year. We characterized the posterior distributions of means and ratios using the median and equal-tailed credible intervals. We teested the null hypothesis that each ratio was equal to 1 using two different Bayesian p-value analogues: probability of direction \(p_d\) and maximum a posteriori p-value \(p_{MAP}\).
There was statistical evidence that webworms were more abundant in the unsprayed condition versus sprayed. This was true both in the presence of lead where spraying reduced mean webworm abundance by a factor of 2.07 (95% CI [1.56, 2.73]), and in the absence of lead where spraying reduced abundance 2.72 times (95% CI [2.10, 3.51]). There was more modest evidence that lead treatment reduced webworm abundance, with a greater difference in the unsprayed condition (lead reduced abundance by a factor of 1.66, 95% CI [1.31, 2.12]) than sprayed (1.26, 95% CI [0.93, 1.68]).
gather_emmeans_draws(webworm_emms_overall) %>%
ggplot(aes(x = interaction(spray, lead, sep = ' + '), y = .value)) +
stat_histinterval(aes(y = y), data = beall.webworms, adjust = 5, point_interval = NULL) +
stat_interval(aes(color = after_stat(level)), .width = c(.66, .95), position = position_nudge(x = -0.1)) +
stat_summary(fun = 'median', geom = 'point', size = 2, position = position_nudge(x = -0.1)) +
scale_color_brewer(name = 'CI level', palette = 'Greens') +
scale_x_discrete(name = 'treatment', labels = c('-spray -lead', '+spray -lead', '-spray +lead', '+spray +lead')) +
scale_y_continuous(name = 'webworm count', breaks = 0:6, limits = c(0, 6)) +
coord_flip() +
theme(panel.grid = element_blank(), legend.position = 'inside', legend.position.inside = c(0.9, 0.85))
log(x + fudge_factor)
transformation and fit a model with normal errorRead from CSV in the data
folder on cloud server
If not on cloud server/SCINet, read from GitHub URL
ggplot(immunoassay, aes(x = antigen, y = chemiluminescence, color = vaccination, group = vaccination)) +
geom_point(position = position_jitterdodge(jitter.width = 0.1, jitter.height = 0, dodge.width = 0.5), size = 2, alpha = 0.5) +
stat_summary(geom = 'point', fun = 'median', shape = 95, size = 10, position = position_dodge(width = 0.5)) +
colorblind_friendly
fitdist()
from fitdistrplus package (maximum likelihood)nonzero_values <- immunoassay$chemiluminescence[immunoassay$chemiluminescence > 0]
gamma_fit <- fitdist(nonzero_values, 'gamma')
lnorm_fit <- fitdist(nonzero_values, 'lnorm')
ggplot(immunoassay %>% filter(chemiluminescence > 0), aes(x = chemiluminescence)) +
geom_histogram(aes(y = after_stat(density)), bins = 20) +
stat_function(fun = dgamma, args = list(shape = gamma_fit$estimate[1], rate = gamma_fit$estimate[2]), linewidth = 2, color = 'orange') +
stat_function(fun = dlnorm, args = list(meanlog = lnorm_fit$estimate[1], sdlog = lnorm_fit$estimate[2]), linewidth = 2, color = 'slateblue2')
vaccination * antigen
hu
(proportion of zeros)(1 | chickID)
only on the mean responsesigma
(standard deviation of the lognormal) has no fixed or random effectsimmuno_hurdlelognorm_fit <- brm(
bf(chemiluminescence ~ vaccination * antigen + (1 | chickID),
hu ~ vaccination * antigen
),
data = immunoassay,
family = hurdle_lognormal(link = 'identity', link_sigma = 'log', link_hu = 'logit'),
prior = c(
prior(normal(0, 3), class = 'b'),
prior(normal(0, 3), class = 'b', dpar = 'hu')
),
chains = 4, iter = 2000, warmup = 1000, seed = 333,
file = 'fits/immuno_hurdlelognorm_fit'
)
trans = 'pseudo_log'
allows us to show zeros on a log scaleepred
(expected value of the posterior predictive): mean of non-zeros × probability of non-zero'revpairwise'
(reverse pairwise) means subtract vaccinated - nonvaccinatedregrid(..., transform = 'log'
and type = 'response'
give us ratios instead of differences(immuno_emmeans <- emmeans(immuno_hurdlelognorm_fit, ~ vaccination | antigen, epred = TRUE))
(immuno_contrasts <- contrast(regrid(immuno_emmeans, transform = 'log'), 'revpairwise', type = 'response'))
describe_posterior(immuno_contrasts, ci_method = 'eti', ci = 0.95, test = c('p_map', 'pd'), null = 1)
normal(0, 3)
but what if we used normal(0, 1)
? normal(0, 10)
? Etc.We fit a Bayesian lognormal GLMM to the chemiluminescence data from the immunoassay. The model included a hurdle component to accommodate the zeros in the response. There was an identity link function for the mean parameter, a log link function for the standard deviation, and a logit link function for the hurdle parameter. The model included fixed effects for vaccination status, antigen type, and their interaction, and random intercepts for each individual bird. The hurdle component was fit with fixed effects as well. The lognormal distribution was chosen by visually comparing the fit of the lognormal and Gamma distributions to the nonzero values in the data. We assigned Normal(0, 3) prior distributions to the fixed effects on both the mean and the zero-inflation parameter. Hamiltonian Monte Carlo was used to sample from the posterior distribution, with four Markov chains each run for 1000 warmup iterations and 1000 sampling iterations. We verified model convergence with the R-hat statistic and graphically assessed model fit with posterior predictive check plots.
We obtained posterior estimates of the marginal means for each combination of vaccination status and antigen, combining the conditional mean and hurdle components. We took pairwise ratios between the vaccinated and unvaccinated means within each antigen. We characterized the posterior distributions of means and ratios using the median and equal-tailed credible intervals. We tested the null hypothesis that each ratio was equal to 1, indicating no response to vaccination, using two different Bayesian p-value analogues: probability of direction \(p_d\) and maximum a posteriori p-value \(p_{MAP}\).
There was evidence that each antigen provoked a stronger immune response in vaccinated birds, but the uncertainty in the strength of this effect varied by antigen. The largest ratio in response between vaccinated and unvaccinated birds was observed in response to the antigen FlgK (88.8, 95% credible interval [0.9, 9519.4]) and the smallest in response to the antigen FimW (1.8, [1.0, 3.4]).
gather_emmeans_draws(immuno_emmeans, value = 'chemiluminescence') %>%
ggplot(aes(x = vaccination, y = chemiluminescence, color = vaccination)) +
facet_wrap(~ antigen) +
stat_interval(aes(color_ramp = after_stat(level)), .width = c(.66, .95), position = position_nudge(x = 0.15)) +
stat_summary(fun = 'median', geom = 'point', color = 'black', size = 2, position = position_nudge(x = 0.15)) +
geom_point(data = immunoassay, position = position_jitter(width = 0.05, height = 0), size = 1, alpha = 0.5) +
colorblind_friendly +
labs(color_ramp = 'CI level') +
guides(color = 'none') +
theme(strip.background = element_blank(),
legend.position = c(0.07, 0.87),
legend.background = element_rect(color = 'black'))
zero_inflated_beta
and zero_one_inflated_beta
mu
phi
zoi
coi
carlson.germination
from agridat packagenacl
) from 0% to 2%gen
)germ
) for each combination of nacl
and gen
germ
from a percentage to a proportion for use with the Beta distributiongerm ~ nacl + (1 | gen)
zoi
and coi
to vary by salt concentrationphi
zero_one_inflated_beta()
carlson_zoibeta <- brm(
bf(
germ ~ nacl + (1 | gen),
zoi ~ nacl,
coi ~ nacl
),
family = zero_one_inflated_beta(link = 'logit', link_phi = 'log', link_zoi = 'logit', link_coi = 'logit'),
prior = c(
prior(normal(0, 2), class = b),
prior(normal(0, 2), class = b, dpar = zoi),
prior(normal(0, 2), class = b, dpar = coi)
),
data = carlson.germination,
file = 'fits/carlson_zoibeta'
)
zoi
or one-inflation coi
What can we conclude?
summary()
method shows the slope parameter estimatesdescribe_posterior()
nacl
: how much does log-odds of germination change for every 1-unit increase in salt?
We fit a zero-one-inflated beta hurdle model to the percent germination data. NaCl concentration was treated as a continuous fixed effect, and genotype as a categorical random effect. We used a logit link for the response, a log link for the dispersion parameter, and logit links for the zero-inflation and one-inflation hurdle parameters. We fit a single dispersion parameter, but the zero-inflation parameter and one-inflation parameter were modeled as varying by NaCl concentration. Weakly informative Normal(0, 2) priors were put on the slope parameters. Hamiltonian Monte Carlo was used to sample from the posterior distribution, with four Markov chains each run for 1000 warmup iterations and 1000 sampling iterations. We verified model convergence with the R-hat statistic and graphically assessed model fit with posterior predictive check plots. We generated posterior distributions of fitted values from the model at a grid of evenly spaced concentration values, accounting for the conditional mean, zero-inflation probability, and one-inflation probability.
The percent germination decreased strongly with increasing NaCl concentration; the posterior estimate for 0% NaCl was 99.3% (95% CI [94.5%, 99.9%]), for 1% NaCl 78.6% [69.1%, 85.1%], and for 2% NaCl 4.1% [2.8%, 6.2%].
epred
value at an evenly spaced set of concentration values to get overall trendgerm_pred <- data.frame(nacl = seq(0, 2, by = 0.1)) %>%
add_epred_draws(carlson_zoibeta, re_formula = ~ 0)
ggplot(carlson.germination, aes(x = nacl, y = germ)) +
stat_lineribbon(aes(y = .epred), data = germ_pred) +
geom_line(aes(group = gen), alpha = 0.3) +
scale_fill_brewer(palette = 'Greens', name = 'CI level') +
scale_x_continuous(name = 'NaCl concentration') +
scale_y_continuous(name = 'Germination', labels = scales::percent) +
theme(legend.position = 'bottom')
What did we learn to do today?
Enjoy the feeling of empowerment!!!