...
with code)\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]
The probability of A being true given that B is true (\(P(A|B)\)) is equal to the probability that B is true given that A is true (\(P(B|A)\)) times the ratio of probabilities that A and B are true (\(\frac{P(A)}{P(B)}\))
\[P(model|data) = \frac {P(data|model)P(model)}{P(data)}\]
or
\[posterior \propto likelihood \times prior\]
Number of animals tested | Number immune | Number not immune | Observed immunity rate | |
---|---|---|---|---|
Scenario 1 | 10 | 7 | 3 | 70% |
Scenario 2 | 50 | 35 | 15 | 70% |
Scenario 3 | 100 | 70 | 30 | 70% |
Load packages.
brms.backend
gathmann.bt
example dataset from agridat package?gathmann.bt
in your console to pull up documentation for the datasetISO
) is treated as the reference level in the modelprior distribution | belief |
---|---|
normal(0,5) | We don't know if Bt has a positive or negative effect and we think there's a moderate probability of a large effect in either direction |
normal(0,1) | We don't know if Bt has a positive or negative effect and we think large effects in either direction are very improbable |
normal(5,5) | We think Bt has a positive effect but we're very uncertain |
normal(5,1) | We think Bt has a positive effect and we're very certain of that |
normal(-5,5) | We think Bt has a negative effect but we're very uncertain |
normal(-5,1) | We think Bt has a negative effect and we're very certain of that |
# Flat prior
m1gathmann <- brm(thysan ~ gen, data = gathmann.bt,
seed = 838, file = 'fits/m1gathmann')
# Prior saying that extreme differences between genotypes are unlikely
m2gathmann <- brm(thysan ~ gen, data = gathmann.bt,
prior = prior(normal(0, 5), class = b),
seed = 838, file = 'fits/m2gathmann')
# Prior with even stricter regularization
m3gathmann <- brm(thysan ~ gen, data = gathmann.bt,
prior = prior(normal(0, 1), class = b),
seed = 838, file = 'fits/m3gathmann')
# Prior if we think that Bt is more likely to have a positive impact on non-target species but we are uncertain
m4gathmann <- brm(thysan ~ gen, data = gathmann.bt,
prior = prior(normal(5, 5), class = b),
seed = 838, file = 'fits/m4gathmann')
# Prior if we think that Bt is more likely to have a positive impact on non-target species and we have strong prior evidence for that
m5gathmann <- brm(thysan ~ gen, data = gathmann.bt,
prior = prior(normal(5, 1), class = b),
seed = 838, file = 'fits/m5gathmann')
# Prior if we think that Bt is more likely to have a negative impact on non-target species but we are uncertain
m6gathmann <- brm(thysan ~ gen, data = gathmann.bt,
prior = prior(normal(-5, 5), class = b),
seed = 838, file = 'fits/m6gathmann')
# Prior if we think that Bt is more likely to have a negative impact on non-target species and we have strong prior evidence for that
m7gathmann <- brm(thysan ~ gen, data = gathmann.bt,
prior = prior(normal(-5, 1), class = b),
seed = 838, file = 'fits/m7gathmann')
get_post <- function(fit) {
fixedeff <- summary(fit)$fixed
setNames(fixedeff['genBt', c('Estimate','l-95% CI', 'u-95% CI')], c('median', 'lower95', 'upper95'))
}
gathmann_post_estimates <- cbind(
prior = c('flat', 'normal(0,5)', 'normal(0,1)', 'normal(5,5)', 'normal(5,1)', 'normal(-5,5)', 'normal(-5,1)'),
map_dfr(list(m1gathmann, m2gathmann, m3gathmann, m4gathmann, m5gathmann, m6gathmann, m7gathmann), get_post)
) %>%
mutate(prior = factor(prior, levels = unique(prior)))
ggplot(gathmann_post_estimates, aes(x = prior, y = median, ymin = lower95, ymax = upper95)) +
geom_pointrange() +
geom_hline(yintercept = 0, linetype = 'dashed', linewidth = 1) +
labs(y = 'effect of Bt on Thysanoptera abundance')
gathmann_pvalues <- data.frame(
prior = c('flat', 'normal(0,5)', 'normal(0,1)', 'normal(5,5)', 'normal(5,1)', 'normal(-5,5)', 'normal(-5,1)'),
pMAP = c(p_map(m1gathmann)$p_MAP[2], p_map(m2gathmann)$p_MAP[2], p_map(m3gathmann)$p_MAP[2], p_map(m4gathmann)$p_MAP[2], p_map(m5gathmann)$p_MAP[2], p_map(m6gathmann)$p_MAP[2], p_map(m7gathmann)$p_MAP[2])
)
\[y \sim \text{Normal}(Xb + Zu, \sigma), u \sim \text{Normal}(0, \sigma_u)\]
\[y \sim \text{Normal}(Xb + Zu, \sigma), u \sim \text{Normal}(0, \sigma_u)\]
\[g(y) \sim \text{Distribution}(Xb + Zu | \theta), u \sim \text{Normal}(0, \sigma_u)\]
crowder.seeds
example dataset from agridat package'O73'
and 'O75'
)n
: total number of seeds on the plategerm
: the number of seeds that germinatedp
(proportion of germinated seeds)germ | trials(n)
family = <distribution function>(link = '<link function>')
family = binomial(link = 'logit')
\[\text{logit}~p = \log \frac {p}{1-p}\]
normal(0, 5)
prior is sensible for fixed effects on the log odds scaleWhat do you notice about the posterior predictive check plots?
pairwise ~ extract | gen
: within each level of genotype (gen
), estimate the predicted mean for each level of extract and take the pairwise contrast between themtype = 'response'
transforms the results back to the data scaleqlogis()
and its inverse is plogis()
bean / cucumber
so we have evidence that bean extract is less effective at stimulating germination than cucumber extractdescribe_posterior()
from the bayestestR packagedescribe_posterior()
explaineddescribe_posterior()
gives us the median of the 4000 values as the point estimateci_method = 'eti'
, default width 95%
p_map
: the ratio of the probability of the null value to the most probable posterior value (point estimate)pd
(probability of direction): the percent of the posterior distribution that has the same direction (sign) as the point estimate; the closer to 100%, the more evidence in favor of a non-null differencenull = 1
for the odds ratio contrasts (ratio of 1 is no difference)contrast()
function and specify two levels of interaction contrast'revpairwise'
: reverse pairwiseWe fit a generalized linear model that modeled the germination probability for each dish as a function of extract type, genotype, and their interaction. The model assumed a binomial response distribution and logit link function. The model was fit with Hamiltonian Monte Carlo. Four chains were run, each with 1000 warmup iterations and 1000 sampling iterations. We verified model convergence by ensuring the R-hat statistics for all parameters were below 1.01. We assessed model fit by examining the posterior predictive check plot. We obtained posterior estimates of the marginal mean for each combination of extract and genotype, and of the odds ratio between each extract type within each genotype. We calculated the median and 95% equal-tailed credible interval for each estimated mean and odds ratio. To assess evidence for differences between extracts, we calculated the maximum a posteriori p-value (pMAP) and probability of direction (pd).
The modeled probability of germination was greater when seeds were treated with cucumber extract versus bean extract for both the O73 genotype (bean extract: posterior median 0.396, 95% CI [0.316, 0.478]; cucumber extract: 0.531 [0.458, 0.614]) and the O75 genotype (bean extract: 0.364 [0.305, 0.421]; cucumber extract: 0.681 [0.629, 0.734]). The odds ratio between cucumber and bean extract was smaller for the O73 genotype than the O75 genotype (ratio 0.46 [0.25, 0.83], pMAP < 0.001), indicating a greater relative effect of cucumber extract on germination probability in the O75 genotype.
gather_emmeans_draws()
from the tidybayes package to pull out the individual posterior draws from the emmeans
objectstat_interval()
to simultaneously compute our desired credible intervals and plot themmutate(.value = plogis(.value))
gather_emmeans_draws()
does not preserve the transformation informationgather_emmeans_draws(emmresponse_crowder_glm$emmeans) %>%
mutate(p = plogis(.value)) %>%
ggplot(aes(x = gen, y = p, group = interaction(gen, extract))) +
stat_interval(aes(color = extract, color_ramp = after_stat(level)), .width = c(0.5, 0.8, 0.95), position = position_dodge(width = .3)) +
stat_summary(fun = 'median', geom = 'point', size = 2, position = position_dodge(width = .3)) +
colorblind_friendly +
labs(x = 'Genotype', y = 'Modeled germination probability', color_ramp = 'CI width', color = 'Extract')
median_qi()
from tidybayes summarizes posterior draws with a median point estimate and a quantile credible interval of a given widthplogis()
to go from logit scale to probability scaleexp()
to go from log odds ratio scale to odds ratio scalemeans_summary <- gather_emmeans_draws(emmresponse_crowder_glm$emmeans) %>%
mutate(.value = plogis(.value)) %>%
median_qi(.width = 0.95) %>%
pivot_wider(id_cols = gen, names_from = extract, values_from = c(.value, .lower, .upper))
contrasts_summary <- gather_emmeans_draws(emmresponse_crowder_glm$contrasts) %>%
mutate(.value = exp(.value)) %>%
median_qi(.width = 0.95)
left_join(means_summary, contrasts_summary) %>%
select(-c(contrast, .width, .point, .interval)) %>%
gt(caption = 'Germination probability') %>%
fmt_number(decimals = 3) %>%
cols_merge(c(.value_bean, .lower_bean, .upper_bean), pattern = '{1} [{2}, {3}]') %>%
cols_merge(c(.value_cucumber, .lower_cucumber, .upper_cucumber), pattern = '{1} [{2}, {3}]') %>%
cols_merge(c(.value, .lower, .upper), pattern = '{1} [{2}, {3}]') %>%
cols_label(gen = 'Genotype', .value_bean = 'Bean extract', .value_cucumber = 'Cucumber extract', .value = 'Bean:cucumber odds ratio')
gen
, either "Barkant"
or "Marco"
)date
, either "21Aug1990"
or "28Aug1990"
)density
, 1, 2, 4, or 8 kg/ha)facet_wrap()
gen * date * density
+ (1 | block)
: random intercept for each blocknormal(0, 5)
as a prior on fixed effectslog()
around the response variablenormal(0, 2)
prior because yield only varies by 2 to 3 units on the log scaleFit a GLMM with gamma response distribution and log link function using family = Gamma(link = 'log')
.
type = 'response'
gives predictions on data scale, not log link scale'revpairwise'
so that the higher density is in the numeratorWe fit a generalized linear mixed model that modeled turnip yield as a function of genotype, planting date, planting density (discrete variable with four levels), and their interactions. The model had a gamma response distribution with log link function. A random intercept was fit to each block. The model was fit with Hamiltonian Monte Carlo. Four chains were run, each with 1000 warmup iterations and 1000 sampling iterations. We verified model convergence by ensuring the R-hat statistics for all parameters were below 1.01. We assessed model fit by examining the posterior predictive check plot. We obtained posterior estimates of the marginal mean for each combination of genotype, planting date, and planting density. We also estimated the yield ratio between all pairs of density treatments within each combination of planting date and genotype. We calculated the median and 95% equal-tailed credible interval for each estimated mean and ratio.
We found evidence that turnip yield was highest at the highest levels of planting density. The effect of density was highest for the later planting date. There was not strong evidence for a different effect of planting density between the genotypes; it was positive for both. Mean yield for density level 8 kg/ha at the later planting date was 14.66 (95% CI [4.28, 31.27]) for the Barkant genotype, versus 3.82 [1.14, 8.05] at 1 kg/ha; the ratio of yields between highest and lowest density levels was 3.84 [1.32, 7.60]. For the Marco genotype, the mean yield at later planting date was 9.80 [3.32, 21.01] for 8 kg/ha density and 1.31 [0.40, 2.82] for 1 kg/ha density; the yield ratio between the two density levels was 7.45 [2.63, 14.81].
exp()
is used to back-transform the posterior estimatesgather_emmeans_draws(emm_mcconway) %>%
mutate(.value = exp(.value)) %>%
ggplot(aes(x = density, y = .value, group = interaction(density, date))) +
stat_interval(aes(color = date, color_ramp = after_stat(level)), position = position_dodge(width = .5)) +
stat_summary(fun = 'median', geom = 'point', size = 2, position = position_dodge(width = .5)) +
facet_wrap(~ gen) +
colorblind_friendly +
labs(x = 'Planting density (kg/ha)', y = 'Modeled yield', color_ramp = 'CI width', color = 'Planting date')
We’ve made it to the end of Lesson 2! Let’s revisit the learning objectives!
Give yourself a well-deserved round of applause!