At the end of this lesson, students will …
Welcome to the final lesson of the mixed models in R workshop! Pat yourselves on the back for making it this far.
We are going to talk about estimated marginal means. You might know the term “least square means” or “lsmeans” if you are a SAS user. This statistical technique was pioneered by statisticians working for SAS. Estimated marginal means are the same thing as lsmeans in most cases. It is probably better to use the term estimated marginal means because the least squares method is not always used to calculate them.
The name estimated marginal means makes sense because they are estimates from a model, not data. It’s best to keep that in mind – when we calculate these estimated means, they are functions of model parameters and sensitive to all the assumptions we’ve made in constructing the model. And remember, all models are wrong and only some of them are useful. They’re called marginal because they can be calculated averaged across the margins of all other fixed and random effects in the model.
These estimated means are population-level predictions. It’s important to note that the confidence interval or uncertainty around an estimated marginal mean is not the range of values that any individual in the population might have.
For instance, a dataset gathered between 2015 and 2018 shows that the population level mean of the height of an adult female in the United States is 161.3 cm (5’ 3.5”) and the heights are roughly normally distributed. The 90% confidence interval around that estimate goes from 160.9 cm to 161.6 cm (5’ 3.38” to 5’ 3.63”). Because this mean was estimated from a large sample of 5,510 women, we are really sure we have that number pinned down to within a tiny range of uncertainty.
But if we took a random adult female from the United States and measured her height, does that mean we are 90% confident it would be between 160.9 cm and 161.6 cm? No! In fact, the 5th and 95th percentile are 149.8 cm (4’11”) and 172.5 cm (5’8”) – so 90% of the time if we selected a random female we would expect her height to be between those much more widely separated values. So estimated means tell us what the most likely expected value of a random individual from the population would be, but not about the variation in the population itself.
This is even more important to keep in mind when we are averaging the estimated means across random effects, where the subpopulations we are averaging across may be very different from each other. The estimated mean is a “construct” that may not correspond to any individual, like the proverbial household with 2.2 children.
The emmeans package is the workhorse of this section. It was developed by Russ Lenth, an emeritus statistics professor from the University of Iowa. I am very indebted to Russ for all his help – even though he is retired, he somehow always manages to answer any question I post on the emmeans package’s GitHub page within a few hours, and almost always fixes the problem right away. He is like the “man behind the curtain” in the Wizard of Oz – whenever I appear to be helping ARS scientists with their stats, I am actually going to him to get the answer!
mild-mannered professor or wizard?
Today we’re only going to have time for a brief introduction to the capabilities of emmeans. The emmeans package page has a long list of vignettes that you might find helpful if you want to get deeper into it. For example, emmeans also works with some of the Bayesian modeling packages in R, so you can use similar code to get the estimated means regardless of what method you used to fit the model.
Let’s revisit the fake biomass versus fertilization dataset that we worked with earlier. Reload the data and refit the model. Load the emmeans and multcomp packages along with the other ones we need.
library(tidyverse) library(lme4) library(emmeans) library(multcomp)
fert_biomass <- read_csv('datasets/fertilizer_biomass.csv') %>% mutate(treatment = factor(treatment, levels = c('control', 'low', 'high')))
## Rows: 150 Columns: 3 ## ── Column specification ──────────────────────────────────────────────────────── ## Delimiter: "," ## chr (2): field, treatment ## dbl (1): biomass ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
fert_mm <- lmer(biomass ~ treatment + (1 | field), data = fert_biomass)
By the way I am also going to set the global ggplot2 theme to black on white because the default gray on white theme burns my eyeballs.
If you recall, we could use
summary(fert_mm) to show the
coefficients for the intercept and the two non-control treatment levels,
and t-tests associated with each one. But the inference that you’re
often interested in is whether the means of each treatment differ from
We can use the
emmeans() function to make that
comparison. The function can take many arguments, but it needs at least
two. The first argument is a fitted model object. The second argument is
a one-sided formula beginning with
~ that gives the
variable or combination of variables for which we want to estimate the
marginal means. In this model there is only one such variable,
treatment. The result is an object which contains the
estimated marginal means, which are averaged across all other fixed
effects and all the random effects as well.
fert_emm <- emmeans(fert_mm, ~ treatment) fert_emm
## treatment emmean SE df lower.CL upper.CL ## control 17.9 3.43 4.01 8.43 27.4 ## low 20.0 3.43 4.01 10.52 29.5 ## high 21.9 3.43 4.01 12.38 31.4 ## ## Degrees-of-freedom method: kenward-roger ## Confidence level used: 0.95
Note that we now have estimated marginal means and a confidence interval (default 95%) for each one. The degrees of freedom are approximated, as must be the case with mixed models. (There are different methods we can choose from to approximate the degrees of freedom but we will not get into that right now.)
You can use the
plot() function because
emmeans objects have a built-in plotting method which will
show the means and 95% confidence intervals.
We can see from the plot that the 95% confidence intervals of the means overlap; however this does not mean they are not significantly different from one another. To test that, we would need to take the difference between each pair of means and test whether it is significantly different from zero.
contrast() function allows us to do comparisons
between means. The first argument is the
emmeans object we
created just now, and the second argument,
method, is the
type of contrast. Using
method = 'pairwise' means compare
all pairs of treatments. It takes the difference between each pair of
means and calculates the associated t-statistic and
p-value. It automatically adjusts the p-value for
multiple comparisons using the Tukey adjustment.
contrast(fert_emm, method = 'pairwise')
## contrast estimate SE df t.ratio p.value ## control - low -2.09 0.203 143 -10.273 <.0001 ## control - high -3.95 0.203 143 -19.419 <.0001 ## low - high -1.86 0.203 143 -9.146 <.0001 ## ## Degrees-of-freedom method: kenward-roger ## P value adjustment: tukey method for comparing a family of 3 estimates
We can choose other methods of p-value adjustment using the
adjust argument. I usually prefer the Sidak adjustment. In
this case it makes very little difference.
contr_sidak <- contrast(fert_emm, method = 'pairwise', adjust = 'sidak') contr_sidak
## contrast estimate SE df t.ratio p.value ## control - low -2.09 0.203 143 -10.273 <.0001 ## control - high -3.95 0.203 143 -19.419 <.0001 ## low - high -1.86 0.203 143 -9.146 <.0001 ## ## Degrees-of-freedom method: kenward-roger ## P value adjustment: sidak method for 3 tests
contrast objects also have a built-in plotting
method using the
I like to add a dashed line at zero which helps identify the contrasts for which the 95% confidence interval does not contain zero.
plot(contr_sidak) + geom_vline(xintercept = 0, linetype = 'dashed', linewidth = 1)
In this case, all of them are very far from 0.
If you have many pairwise contrasts, it’s pretty common to summarize
them with “multiple comparison letters.” You can get the letters for a
emmeans using the
from the multcomp package. (CLD stands for Compact
cld(fert_emm, adjust = 'sidak')
## treatment emmean SE df lower.CL upper.CL .group ## control 17.9 3.43 4.01 4.45 31.4 1 ## low 20.0 3.43 4.01 6.54 33.5 2 ## high 21.9 3.43 4.01 8.40 35.4 3 ## ## Degrees-of-freedom method: kenward-roger ## Confidence level used: 0.95 ## Conf-level adjustment: sidak method for 3 estimates ## P value adjustment: sidak method for 3 tests ## significance level used: alpha = 0.05 ## NOTE: If two or more means share the same grouping symbol, ## then we cannot show them to be different. ## But we also did not show them to be the same.
By default, you get labels
3 instead of
c, but here is how to get the more familiar letters:
cld(fert_emm, adjust = 'sidak', Letters = letters)
## treatment emmean SE df lower.CL upper.CL .group ## control 17.9 3.43 4.01 4.45 31.4 a ## low 20.0 3.43 4.01 6.54 33.5 b ## high 21.9 3.43 4.01 8.40 35.4 c ## ## Degrees-of-freedom method: kenward-roger ## Confidence level used: 0.95 ## Conf-level adjustment: sidak method for 3 estimates ## P value adjustment: sidak method for 3 tests ## significance level used: alpha = 0.05 ## NOTE: If two or more means share the same grouping symbol, ## then we cannot show them to be different. ## But we also did not show them to be the same.
We can use other contrast methods. For instance we might only be
concerned with the difference between the control and each other
method = 'trt.vs.ctrl' means to only
do contrasts between each other level and the reference level (the first
level in the factor ordering). This will apply Dunnett’s adjustment to
the p-values which has higher power to detect differences from the
control as we are ignoring differences between the non-control
contrast(fert_emm, method = 'trt.vs.ctrl')
## contrast estimate SE df t.ratio p.value ## low - control 2.09 0.203 143 10.273 <.0001 ## high - control 3.95 0.203 143 19.419 <.0001 ## ## Degrees-of-freedom method: kenward-roger ## P value adjustment: dunnettx method for 2 tests
In the example above, we weren’t dealing with a model with a
transformed response, or a GLMM with a link function. However if either
of those things is true, the estimated marginal means will be output on
the transformed scale and not the original data scale by default. But
this is easy to fix. You can add an extra argument,
type = 'response', to
emmeans(). This will
tell it to present the estimated marginal means transformed back to the
original data scale (known as the response scale).
Let’s revisit one of the examples from lesson 5 to illustrate this, the Stirret corn borers dataset.
data('stirret.borers', package = 'agridat') stirret.borers <- stirret.borers %>% mutate(trt = factor(trt, levels = c('None', 'Early', 'Late', 'Both'))) glmm_borers <- glmer(count2 ~ trt + (1|block), data = stirret.borers, family = poisson)
Here are the estimated marginal means on the log scale.
emmeans(glmm_borers, ~ trt)
## trt emmean SE df asymp.LCL asymp.UCL ## None 3.42 0.0966 Inf 3.23 3.61 ## Early 3.18 0.0994 Inf 2.99 3.38 ## Late 2.42 0.1134 Inf 2.20 2.65 ## Both 2.44 0.1130 Inf 2.22 2.66 ## ## Results are given on the log (not the response) scale. ## Confidence level used: 0.95
And here they are on the response scale.
emm_borers <- emmeans(glmm_borers, ~ trt, type = 'response') emm_borers
## trt rate SE df asymp.LCL asymp.UCL ## None 30.7 2.96 Inf 25.39 37.1 ## Early 24.1 2.40 Inf 19.82 29.3 ## Late 11.3 1.28 Inf 9.04 14.1 ## Both 11.5 1.30 Inf 9.20 14.3 ## ## Confidence level used: 0.95 ## Intervals are back-transformed from the log scale
Let’s take the contrasts between each pair of treatments.
contrast(emm_borers, method = 'pairwise', adjust = 'sidak')
## contrast ratio SE df null z.ratio p.value ## None / Early 1.274 0.0871 Inf 1 3.538 0.0024 ## None / Late 2.719 0.2378 Inf 1 11.439 <.0001 ## None / Both 2.674 0.2324 Inf 1 11.317 <.0001 ## Early / Late 2.135 0.1934 Inf 1 8.370 <.0001 ## Early / Both 2.099 0.1891 Inf 1 8.232 <.0001 ## Late / Both 0.983 0.1036 Inf 1 -0.159 1.0000 ## ## P value adjustment: sidak method for 6 tests ## Tests are performed on the log scale
Notice that the second column is now called
is because when you subtract two values on a log scale, that is
equivalent to taking the ratio of those values when you back-transform
to the original data scale. Because our count data model uses a log link
function, any comparisons we make are on a ratio scale. For example, the
None / Late is 2.72, meaning that the model estimates
about 2.72 times as many corn borers would be expected on a plot with no
fungal spores applied, versus one that received the late fungal spore
So far we’ve only looked at estimated marginal means for a single predictor variable. Let’s look at a model where we had multiple main effects and an interaction effect. Remember back to the fish fillets model from the Lesson 3 exercises. Let’s load that dataset again and fit the interaction model. The main effects are species (channel catfish and hybrid catfish) and preparation method (fresh, frozen, and raw), and the random intercept is for individual fish where there are multiple thickness measurements per fish.
fish_fillets <- read_csv('datasets/fish_fillets.csv')
## Rows: 1168 Columns: 6 ## ── Column specification ──────────────────────────────────────────────────────── ## Delimiter: "," ## chr (3): fishID, prep, species ## dbl (3): position, hardness, thickness ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
fit_fillets <- lmer(thickness ~ species + prep + species:prep + (1|fishID), data = fish_fillets)
We can estimate the marginal means for species, which will average across all the preparation method types, and do the same for preparation method averaged across species.
emm_species <- emmeans(fit_fillets, ~ species)
## NOTE: Results may be misleading due to involvement in interactions
emm_prep <- emmeans(fit_fillets, ~ prep)
## NOTE: Results may be misleading due to involvement in interactions
## species emmean SE df lower.CL upper.CL ## Channel 14.7 0.153 138 14.4 15.0 ## Hybrid 14.1 0.155 140 13.8 14.4 ## ## Results are averaged over the levels of: prep ## Degrees-of-freedom method: kenward-roger ## Confidence level used: 0.95
## prep emmean SE df lower.CL upper.CL ## fresh 16.0 0.166 141 15.7 16.3 ## frozen 14.4 0.180 137 14.1 14.8 ## raw 12.8 0.215 140 12.4 13.3 ## ## Results are averaged over the levels of: species ## Degrees-of-freedom method: kenward-roger ## Confidence level used: 0.95
Notice we get a message that the
prep variables are involved in an interaction. So we need
to be careful about how we interpret these estimates.
What if we wanted to get the estimated marginal mean for combinations
of species and preparation method? We can do that by separating terms
emm_species_x_prep <- emmeans(fit_fillets, ~ species + prep) emm_species_x_prep
## species prep emmean SE df lower.CL upper.CL ## Channel fresh 16.8 0.233 138 16.3 17.2 ## Hybrid fresh 15.3 0.236 143 14.8 15.7 ## Channel frozen 14.7 0.256 138 14.2 15.2 ## Hybrid frozen 14.2 0.254 135 13.7 14.7 ## Channel raw 12.8 0.301 138 12.2 13.3 ## Hybrid raw 12.9 0.308 142 12.3 13.5 ## ## Degrees-of-freedom method: kenward-roger ## Confidence level used: 0.95
Then if we use the
cld() function to get the multiple
comparisons letters, the comparison will be done for all six
cld(emm_species_x_prep, adjust = 'sidak', Letters = letters)
## species prep emmean SE df lower.CL upper.CL .group ## Channel raw 12.8 0.301 138 12.0 13.6 a ## Hybrid raw 12.9 0.308 142 12.1 13.7 a ## Hybrid frozen 14.2 0.254 135 13.5 14.8 b ## Channel frozen 14.7 0.256 138 14.0 15.4 bc ## Hybrid fresh 15.3 0.236 143 14.6 15.9 c ## Channel fresh 16.8 0.233 138 16.1 17.4 d ## ## Degrees-of-freedom method: kenward-roger ## Confidence level used: 0.95 ## Conf-level adjustment: sidak method for 6 estimates ## P value adjustment: sidak method for 15 tests ## significance level used: alpha = 0.05 ## NOTE: If two or more means share the same grouping symbol, ## then we cannot show them to be different. ## But we also did not show them to be the same.
If we want to do multiple comparisons of one variable grouped by
another variable, we use the
| symbol. For example, this
code will allow us to compare preparation methods and do a multiple
comparisons test, separately by species.
prep | species
means get estimated marginal means for
prep grouped by
species. Then the
cld() comparison will be
done within each species separately (correcting for a fewer number of
comparisons in each case).
emm_prep_within_species <- emmeans(fit_fillets, ~ prep | species) cld(emm_prep_within_species, adjust = 'sidak', Letters = letters)
## species = Channel: ## prep emmean SE df lower.CL upper.CL .group ## raw 12.8 0.301 138 12.0 13.5 a ## frozen 14.7 0.256 138 14.1 15.3 b ## fresh 16.8 0.233 138 16.2 17.3 c ## ## species = Hybrid: ## prep emmean SE df lower.CL upper.CL .group ## raw 12.9 0.308 142 12.2 13.7 a ## frozen 14.2 0.254 135 13.5 14.8 b ## fresh 15.3 0.236 143 14.7 15.8 c ## ## Degrees-of-freedom method: kenward-roger ## Confidence level used: 0.95 ## Conf-level adjustment: sidak method for 3 estimates ## P value adjustment: sidak method for 3 tests ## significance level used: alpha = 0.05 ## NOTE: If two or more means share the same grouping symbol, ## then we cannot show them to be different. ## But we also did not show them to be the same.
The relative ordering of the prep methods is the same for both species, so whatever interaction there may be is not too strong.
Some people have learned a “two-step” process where the first step is to examine the ANOVA table. If the p-value associated with the F-test is less than 0.05, only then do you proceed to the step of means comparisons; otherwise, you declare that the means are not different, end of story. However, if you properly account for multiple comparisons you do not need to worry about the F-test. I typically recommend against considering the F-test because it does not really give you a useful inference in most cases. The best it can do is say that there is some difference between the groups, but it does not tell you which are different or by how much.
Another important component of hypothesis testing is model comparison. Different models that we can build represent different hypotheses about how the world works, and we might want to evaluate the relative amount of evidence for each one. You may have heard of likelihood ratio tests and information criteria. Those are probably the main ways of selecting or comparing mixed models, to decide whether particular fixed effect terms should be included. I would have loved to cover that in this workshop, but there is so much to learn and so little time. Baby steps … I hope to cover that topic in a future workshop.
This concludes the mixed models in R workshop. We have learned a lot in the past two days!
This only scratches the surface of mixed models with R. Feel free to contact your friendly neighborhood area statistician (me) if you ever need help! I’m looking forward to seeing folks at the next workshop!
For these exercises, we will get practice with estimated marginal means in the same way that we did in the rest of this lesson: refitting a model from a previous lesson, estimating the marginal means, and comparing them.
We will refit the Jansen apple model from the lesson 5 exercises:
data(jansen.apple, package = 'agridat') jansen.apple <- jansen.apple %>% mutate(inoculum = factor(inoculum)) fit_apples <- glmer(cbind(y, n - y) ~ gen + inoculum + gen:inoculum + (1|block), data = jansen.apple, family = binomial(link = 'logit'))
type = 'response'.
method = 'pairwise'and
adjust = 'sidak'.
Letters = lettersto get lowercase letters instead of numbers for the groups.
Do the same as in Exercise 1, but for genotype.
Find the estimated marginal means by inoculum within each genotype and output the compact letter display, again using the Sidak adjustment.
~ inoculum | genotype.