The data is the observation from your werewolf detector that the mailman is a werewolf. That is information you went out and collected, presumably to test the hypothesis that the mailman is a werewolf.
The likelihood is the probability of the data given the model. In this case, it’s the probability that the data point (the werewolf detector observing that the mailman is a werewolf) would be observed, given that the model is true (the model being the hypothesis that the mailman is a werewolf). As we know the detector correctly identifies werewolves 99% of the time, it’s 0.99.
The prior is the information from your field guide that 1 in every 1000 people is a werewolf. This is knowledge that you have before you set out to test whether anyone is a werewolf.
The posterior is the final result: if we observe a positive werewolf test on an individual, given our prior knowledge that 1/1000 people are werewolves and the fact that our test is 99% accurate for both positive and negative results, the posterior probability that the mailman is a werewolf is 9%.
Incidentally, this seems like a very low probability but it can be demonstrated using Bayes’ Theorem. See the worked example on the Wikipedia page on Bayes’ Theorem. It uses drug testing as an example, which is not as fun as werewolf testing, but the math is the same.
Trace plot A resembles a “hairy caterpillar,” representing good model convergence. All Markov chains have arrived at the correct solution and are moving back and forth around it. The amount they go back and forth is proportional to the uncertainty we have about the value of the parameter.
In trace plot B, some of the Markov chains show trends in one direction or another represents poor model convergence. The chains are still moving around trying to find the correct solution. They are not mixing well, meaning they are not moving back and forth with the same stationary distribution.
In the first case, we can tell the researcher they can use the posterior distribution to make some inferences. They can do things like calculate the median and quantiles of the distribution, and make predictions.
In the second case, we would recommend the researcher to either run the MCMC for a greater number of iterations, or change the prior distributions to constrain the chains to explore parameter space more narrowly so they are more likely to converge on the correct solution.
Here is how you would proceed if you wanted to add both interaction terms to the model. This adds the interaction terms to the last model we fit that included a random slope with respect to soil N for each field.
fit_withinteractions <- brm(yield ~ 1 + variety + soilN + rainfall + variety:soilN + soilN:rainfall + (1 + soilN | field), data = yield_data, prior = c(prior(normal(0,5), class = b)), chains = 4, iter = 4000, warmup = 3000, seed = 999, file = 'fit_withinteractions')
Then, to examine the model diagnostics, look at the trace plots and the posterior predictive check plot (output not shown here). We should see “hairy caterpillar” traceplots and a posterior predictive check plot where the fitted values taken from the posterior have a similar distribution to the data.
Next, look at the model summary. This will show you that the R-hat statistics are close to 1 to further confirm the model converged. That means we can make inferences based on the parameter estimates.
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: yield ~ 1 + variety + soilN + rainfall + variety:soilN + soilN:rainfall + (1 + soilN | field) ## Data: yield_data (Number of observations: 1000) ## Draws: 4 chains, each with iter = 4000; warmup = 3000; thin = 1; ## total post-warmup draws = 4000 ## ## Group-Level Effects: ## ~field (Number of levels: 25) ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS ## sd(Intercept) 0.41 0.15 0.09 0.71 1.00 1216 ## sd(soilN) 0.05 0.03 0.00 0.11 1.01 375 ## cor(Intercept,soilN) 0.03 0.50 -0.79 0.93 1.00 1064 ## Tail_ESS ## sd(Intercept) 848 ## sd(soilN) 899 ## cor(Intercept,soilN) 2000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept -1.19 0.41 -2.02 -0.38 1.00 2393 2667 ## varietytall 1.00 0.20 0.61 1.38 1.00 2803 1836 ## soilN 0.80 0.07 0.67 0.94 1.00 2278 2231 ## rainfall 0.25 0.03 0.20 0.30 1.00 2085 2030 ## varietytall:soilN 0.05 0.04 -0.02 0.12 1.00 2697 1805 ## soilN:rainfall -0.02 0.00 -0.03 -0.01 1.00 2207 1751 ## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## sigma 0.72 0.02 0.69 0.75 1.00 5365 2513 ## ## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1).
Based on this summary, we can see, for example, that there is some evidence for a negative interaction between soil N and rainfall. This indicates that as soil N increases, the effect of rainfall on yield decreases, and conversely, that as rainfall increases, the effect of soil N on yield decreases.
Finally, we can use LOO cross-validation to compare the model to others. In this case I will just compare the model with both interaction terms to the random-slope model with only the three main effects and no interaction terms that we fit during the main part of the lesson.
fit_withinteractions <- add_criterion(fit_withinteractions, 'loo') loo_compare(fit_withinteractions, fit_soilNrandomslope)
## elpd_diff se_diff ## fit_withinteractions 0.0 0.0 ## fit_soilNrandomslope -4.9 3.0
This shows us that the fit that includes the two interaction terms performs better despite the addition of two fixed effect parameters.