Here is one way to do it. We get the posterior estimates of the ratios using contrast() and then pipe that result into describe_posterior(). Note that it is very important to specify null = 1: the default is null = 0, which will give incorrect results because a ratio of 0 is not an appropriate null value.
contrast(emm_mcconway, 'revpairwise', by = c('date', 'gen')) %>%
describe_posterior(test = c('pd', 'p_map'), null = 1)
## Summary of Posterior Distribution
##
## Parameter | Median | 95% CI | p (MAP) | pd
## -------------------------------------------------------------------------------
## density2/density1 21Aug1990 Barkant | 1.33 | [0.64, 2.83] | 0.973 | 77.30%
## density4/density1 21Aug1990 Barkant | 1.45 | [0.70, 3.02] | 0.746 | 84.23%
## density4/density2 21Aug1990 Barkant | 1.10 | [0.49, 2.39] | 0.935 | 58.53%
## density8/density1 21Aug1990 Barkant | 1.89 | [0.94, 4.00] | 0.381 | 96.38%
## density8/density2 21Aug1990 Barkant | 1.43 | [0.67, 3.15] | 0.858 | 82.05%
## density8/density4 21Aug1990 Barkant | 1.31 | [0.60, 2.91] | > .999 | 75.48%
## density2/density1 28Aug1990 Barkant | 1.37 | [0.63, 3.01] | 0.902 | 79.30%
## density4/density1 28Aug1990 Barkant | 3.30 | [1.55, 7.09] | 0.042 | 99.85%
## density4/density2 28Aug1990 Barkant | 2.43 | [1.08, 5.50] | 0.243 | 98.30%
## density8/density1 28Aug1990 Barkant | 3.86 | [1.78, 8.36] | 0.026 | 99.98%
## density8/density2 28Aug1990 Barkant | 2.84 | [1.24, 6.24] | 0.144 | 99.12%
## density8/density4 28Aug1990 Barkant | 1.17 | [0.53, 2.51] | 0.994 | 65.53%
## density2/density1 21Aug1990 Marco | 1.81 | [0.84, 3.86] | 0.505 | 93.88%
## density4/density1 21Aug1990 Marco | 2.52 | [1.15, 5.68] | 0.194 | 98.90%
## density4/density2 21Aug1990 Marco | 1.39 | [0.61, 3.12] | 0.927 | 78.65%
## density8/density1 21Aug1990 Marco | 3.74 | [1.69, 8.13] | 0.030 | 99.98%
## density8/density2 21Aug1990 Marco | 2.07 | [0.95, 4.81] | 0.389 | 96.45%
## density8/density4 21Aug1990 Marco | 1.50 | [0.67, 3.35] | 0.886 | 83.30%
## density2/density1 28Aug1990 Marco | 1.69 | [0.77, 3.77] | 0.645 | 90.10%
## density4/density1 28Aug1990 Marco | 5.56 | [2.53, 12.32] | < .001 | 100%
## density4/density2 28Aug1990 Marco | 3.29 | [1.42, 7.56] | 0.077 | 99.88%
## density8/density1 28Aug1990 Marco | 7.39 | [3.49, 16.22] | < .001 | 100%
## density8/density2 28Aug1990 Marco | 4.40 | [1.98, 9.78] | < .001 | 100%
## density8/density4 28Aug1990 Marco | 1.33 | [0.61, 2.91] | 0.910 | 77.22%
Here we fit the model with a tighter prior.
crowder_glm_tightprior <- brm(germ | trials(n) ~ gen * extract,
                              family = binomial(link = 'logit'),
                              prior = c(
                                prior(normal(0, 0.5), class = b)
                              ),
                              data = crowder.seeds,
                              seed = 333, file = 'fits/crowder_glm_tightprior')
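If you want to confirm which priors were actually applied to this fit, one quick check (just a sketch) is to print them with brms's prior_summary():
# Show the priors brms used for this model; the normal(0, 0.5) prior
# should appear for the class 'b' (fixed-effect) coefficients.
prior_summary(crowder_glm_tightprior)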
There are various ways to compare the inference from this model to the inference from the original model, which had a wider prior on the fixed effects. One way is to estimate the marginal means and contrasts for each model and compare the output of describe_posterior() between the two.
emmresponse_crowder_glm_tightprior <- emmeans(crowder_glm_tightprior, pairwise ~ extract | gen, type = 'response')
Here are the posterior estimates of the contrasts from the original model we fit with a wider prior:
describe_posterior(emmresponse_crowder_glm$contrasts, ci_method = 'eti', test = c('pd', 'p_map'), null = 1)
## Summary of Posterior Distribution
##
## Parameter | Median | 95% CI | p (MAP) | pd
## ------------------------------------------------------------
## bean/cucumber O73 | 0.58 | [0.36, 0.95] | 0.062 | 98.42%
## bean/cucumber O75 | 0.26 | [0.19, 0.38] | < .001 | 100%
And here are those estimates from the model with the tighter prior:
describe_posterior(emmresponse_crowder_glm_tightprior$contrasts, ci_method = 'eti', test = c('pd', 'p_map'), null = 1)
## Summary of Posterior Distribution
##
## Parameter | Median | 95% CI | p (MAP) | pd
## ------------------------------------------------------------
## bean/cucumber O73 | 0.57 | [0.38, 0.83] | 0.011 | 99.78%
## bean/cucumber O75 | 0.29 | [0.21, 0.40] | < .001 | 100%
You can see that the medians and credible intervals are very similar to the ones above. It looks like our inference is not affected much by reducing the standard deviation of the fixed-effect prior by a factor of 10. I included this exercise to demonstrate that even for small to mid-sized datasets, the posterior is often very insensitive to your choice of prior. I consider that a good thing. In the example with the Bt corn dataset, the choice of prior was exceptionally influential because it was such a tiny dataset.
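If you want a more direct look at the prior sensitivity, one option is to overlay the fixed-effect posteriors from the two fits. The sketch below assumes the wider-prior model object is named crowder_glm with a normal(0, 5) prior on the fixed effects, and that dplyr and ggplot2 are loaded as elsewhere in this document.
library(tidyr)  # for pivot_longer()
# Pull the posterior draws of the fixed effects from each model and label them by prior.
draws_wide  <- as_draws_df(crowder_glm) %>% as_tibble() %>% mutate(prior = 'normal(0, 5)')
draws_tight <- as_draws_df(crowder_glm_tightprior) %>% as_tibble() %>% mutate(prior = 'normal(0, 0.5)')
# Overlay the posterior densities; near-identical curves indicate the prior had little influence.
bind_rows(draws_wide, draws_tight) %>%
  pivot_longer(starts_with('b_'), names_to = 'coefficient') %>%
  ggplot(aes(x = value, color = prior)) +
  geom_density() +
  facet_wrap(~ coefficient, scales = 'free')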
The boxplot indicates that years with blight tended to have more rainy days in April and May.
data(johnson.blight)
ggplot(johnson.blight, aes(x = factor(blight), y = rain.am)) + geom_boxplot()
Here is one way to fit the model. FYI, there is a family called bernoulli for the special case of a binary outcome with a single trial per row of the dataset. An equivalent model to this one would be brm(blight ~ rain.am, family = bernoulli(link = 'logit'), ...), sketched in full just below.
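Spelled out, that bernoulli version might look like the sketch below (the object and file names here are made up; the prior, data, and seed mirror the binomial fit that follows):
# Equivalent fit using the bernoulli family: one 0/1 outcome per row,
# so no trials() term is needed.
blight_glm_bern <- brm(blight ~ rain.am,
                       family = bernoulli(link = 'logit'),
                       prior = c(
                         prior(normal(0, 5), class = b)
                       ),
                       data = johnson.blight,
                       seed = 123,
                       file = 'fits/blight_glm_bern')  # hypothetical cache file
The answer below uses the binomial family with trials(1), which specifies the same model.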
blight_glm <- brm(blight | trials(1) ~ rain.am,
                  family = binomial(link = 'logit'),
                  prior = c(
                    prior(normal(0, 5), class = b)
                  ),
                  data = johnson.blight,
                  seed = 123,
                  file = 'fits/blight_glm')
## Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
## Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
summary(blight_glm)
## Family: binomial
## Links: mu = logit
## Formula: blight | trials(1) ~ rain.am
## Data: johnson.blight (Number of observations: 25)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -5.82 2.31 -10.89 -2.06 1.00 2581 2138
## rain.am 0.53 0.21 0.17 0.99 1.00 2456 1960
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The model summary indicates that there is evidence that as the number of rainy days in April and May increases, the probability of potato blight increases. The point estimate of the coefficient is 0.53 and the 95% CI ranges from 0.17 to 0.99. Because the model is on the log-odds scale, the point estimate can be interpreted as: for every additional rainy day in April or May, the log-odds of potato blight increase by 0.53. Differences on the log-odds scale are log odds ratios, so this means the odds of blight increase by a factor of exp(0.53), or about 1.70, for each extra day of rain.
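If you want that odds-ratio summary straight from the posterior, one quick way (a sketch; b_rain.am is the usual brms name for the fixed-effect draws of rain.am) is to exponentiate the draws of the slope and summarize them:
# Odds ratio per additional rainy day: exponentiate the slope draws,
# then report the median and a 95% quantile interval.
draws_blight <- as_draws_df(blight_glm)
quantile(exp(draws_blight$b_rain.am), probs = c(0.5, 0.025, 0.975))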