What is this course? It is a crash course in brms: a brief and practical introduction to fitting Bayesian multilevel models in R and Stan using the brms package (Bayesian Regression Models using Stan). Because this is intended as a practical skills course, I will only give a quick and surface-level description of Bayesian analysis and how it can be used for multilevel regression models.
Download the worksheet for this lesson here.
This course is designed for practicing researchers who have some experience with statistical analysis and coding in R.
If you don’t have prior experience with the R packages we’ll be using, don’t sweat it, because you can just follow along by copying the code.
At the end of this course, you will understand …
At the end of this course, you will be able to …
This may or may not actually be a portrait of Thomas Bayes
In its simplest form, Bayesian inference is a method of statistical inference that allows you to use information you already know to assign a prior probability to a hypothesis, then update the probability of that hypothesis as you get more information. If that seems like common sense, it is!
It is a very powerful tool that has been applied across many fields of human endeavor. Today we are only going to look at its application in estimating the parameters of multilevel statistical models to analyze scientific data, but that’s only one of the many things we can use Bayesian inference for. Bayesian methods are only growing in popularity, thanks in large part to the rapid growth of user-friendly, open-source, computationally powerful software – like Stan and its companion R package brms that we are going to learn about today!
Bayesian inference is named after Reverend Thomas Bayes, an English clergyman, philosopher, and, yes, statistician, who wrote a scholarly work on probability that was published posthumously in 1763. His essay included an algorithm that uses evidence to find the distribution of the probability parameter of a binomial distribution … using what we now call Bayes’ Theorem! However, much of the foundation of Bayesian inference was really developed by Pierre-Simon Laplace, independently of Bayes and at roughly the same time. Give credit where it’s due!
Bayes’ Theorem is the basis of Bayesian inference. It is a rule that describes how likely an event is to happen (its probability) based on our prior knowledge of conditions related to that event. In other words, it describes the conditional probability of an event A occurring, given that another event B has occurred.
The mathematical statement of Bayes’ theorem is:
\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]
This means the probability of A being true given that B is true (\(P(A|B)\)) is equal to the probability that B is true given that A is true (\(P(B|A)\)) times the ratio of probabilities that A and B are true (\(\frac{P(A)}{P(B)}\)).
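To make that arithmetic concrete, here is a minimal R sketch that just plugs made-up numbers into the formula (all of the values below are purely illustrative):

```r
# Made-up probabilities, just to plug into Bayes' theorem
p_A <- 0.3          # P(A): prior probability that A is true
p_B_given_A <- 0.8  # P(B|A): probability of observing B if A is true
p_B <- 0.5          # P(B): marginal probability of observing B

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B <- p_B_given_A * p_A / p_B
p_A_given_B  # 0.48
```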
This may not make sense yet, but some more concrete examples might help.
Let’s say \(A\) is a statistical model. A statistical model is a hypothesis about the world. We want to know the probability that our hypothesis is true. We start out with some prior knowledge about the probability that our model \(A\) is true. So \(P(A)\) is called the prior probability. We go out there and get some data \(B\) to test our hypothesis. We can calculate the probability of observing that data \(B\), given that \(A\) is true: \(P(B|A)\). This is also known as the likelihood of the model \(A\), in light of our observations \(B\). We use this to update our estimate of the probability that \(A\) is true. We now have \(P(A|B)\), the probability our model, or hypothesis, is true, given the data we just observed.
Notice we ignored the denominator, \(P(B)\), the probability of observing the data. This is called the marginal probability. We can usually ignore it in the context of statistical inference. That’s because the best we can do is to compare the relative probability of different models, given the data. We can’t really say (nor should we say) what the absolute probability of a model is. (“All models are wrong.”) If we are comparing two models \(A_1\) and \(A_2\), both fit using the same data \(B\), and we take the ratio \(\frac{P(A_1|B)}{P(A_2|B)} = \frac{P(B|A_1)P(A_1)/P(B)}{P(B|A_2)P(A_2)/P(B)}\), the two \(P(B)\)s cancel out. Thus we don’t really need to care about \(P(B)\) in practice.
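To see that cancellation numerically, here is a tiny R sketch with made-up priors and likelihoods for two competing models; none of these numbers come from the lesson, they just show that the posterior odds never require \(P(B)\):

```r
# Made-up prior probabilities P(A1), P(A2) and likelihoods P(B|A1), P(B|A2)
prior_A1 <- 0.5
lik_A1   <- 0.02
prior_A2 <- 0.5
lik_A2   <- 0.01

# Ratio of posterior probabilities of the two models:
# the marginal probability P(B) cancels, so we never compute it
(lik_A1 * prior_A1) / (lik_A2 * prior_A2)  # 2: A1 is twice as probable as A2, given the data
```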
So we can get rid of the denominator and write Bayes’ theorem as
\[P(\text{model}|\text{data}) \propto P(\text{data}|\text{model})\,P(\text{model})\]
or
\[\text{posterior} \propto \text{likelihood} \times \text{prior}\]
or, flipping the left and right sides around,
what we believed before about the world (prior) × how much our new data changes our beliefs (likelihood) = what we believe now about the world (posterior)
That’s pretty cool!
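To make that proportionality tangible, here is a short R sketch (my own illustration, not part of the lesson) that computes likelihood × prior over a grid of candidate values for a coin’s probability of heads, using made-up data of 3 heads in 5 flips, and renormalizes it into a posterior:

```r
# Grid approximation of posterior ~ likelihood * prior for a coin's
# probability of heads, with made-up data: 3 heads in 5 flips
p_grid <- seq(0, 1, length.out = 101)              # candidate values of P(heads)
prior <- rep(1, length(p_grid))                    # flat prior, purely illustrative
likelihood <- dbinom(3, size = 5, prob = p_grid)   # P(data | each candidate value)

unnormalized_posterior <- likelihood * prior
posterior <- unnormalized_posterior / sum(unnormalized_posterior)

p_grid[which.max(posterior)]  # candidate value with the highest posterior: 0.6
```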
Let’s say we pick up a coin off the street. We have a very strong prior belief that the true probability of flipping heads on any given flip is 0.5. Now suppose we flip the coin 10 times and get 8 heads. We probably wouldn’t change our belief very much: 10 flips simply aren’t enough evidence to sway us. The likelihood of getting 8 heads, given that the true probability is 0.5, is high enough that our posterior estimate of the probability stays right around 0.5.
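The text doesn’t specify a particular prior distribution; a Beta distribution is a common choice for a probability, so here is a sketch assuming a strong Beta(50, 50) prior centered on 0.5 (the specific parameter values are my own illustration):

```r
# Coin picked up off the street: strong Beta(50, 50) prior centered at 0.5
# (the Beta parameters are illustrative, not from the text)
heads <- 8
tails <- 2
prior_a <- 50
prior_b <- 50

# With a Beta prior and binomial data, the posterior is
# Beta(prior_a + heads, prior_b + tails)
post_a <- prior_a + heads
post_b <- prior_b + tails
post_a / (post_a + post_b)  # posterior mean ~ 0.53: barely moved from 0.5
```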
But let’s say a shady character on the street corner approaches us, shows us a coin, and offers to flip it 10 times. He’ll pay us $1 for every tails if we pay him $1 for every heads. Before we even see any new evidence, we’d probably suspect that the true probability of heads is not exactly 0.5. Or at least we’d hold a more skeptical prior belief and be willing to entertain the possibility that the true probability of heads is higher than 0.5. So now if we observe 8 heads, we might update our estimate of the true probability of heads to something higher, maybe 0.7.
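Continuing the same sketch, a much weaker Beta(2, 2) prior (again an illustrative choice, not from the text) is easily swayed by the same 10 flips, and the posterior mean lands right around 0.7:

```r
# Shady character's coin: weak Beta(2, 2) prior, same data (8 heads, 2 tails)
heads <- 8
tails <- 2
prior_a <- 2
prior_b <- 2

post_a <- prior_a + heads
post_b <- prior_b + tails
post_a / (post_a + post_b)  # posterior mean ~ 0.71: pulled well above 0.5 by the data
```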