Lesson 4: Going further with linear mixed models
Lesson 4 learning objectives
At the end of this lesson, students will …
- Fit a linear mixed model with crossed and nested random effects.
- Fit a linear mixed model with a transformed response variable.
Crossed and nested random effects
- We are finally going to use some real data!
- Dataset of results from a study of Precision Zonal Management (PZM) conservation agriculture practices
- Two experimental treatments
- two different tillage systems
- crossed with presence or absence of rye cover crop
- Measured maize yield
Load needed packages and read the data.
library(tidyverse)
library(easystats)
library(lme4)
library(lmerTest)
maize <- read_csv('https://usda-ree-ars.github.io/SEAStats/mixed_models_in_R/datasets/pzm_maize.csv')
Examine the data
- Multiple locations (US states) and multiple years
plot
and block
columns have integers identifying the plots and blocks
tillage
column has two different values, "cp"
(chisel plow) and "rt"
(ridge tillage)
cover.trt
column also has two different values, "none"
and "rye"
Create factor columns
- Convert the
yr
, plot
, and block
columns to factor type
- They are integers and we want to treat them as categorical for the model
- Use
mutate()
with the pipe %>%
maize <- maize %>%
mutate(
yr = factor(yr),
plot = factor(plot),
block = factor(block)
)
Boxplot of the data
Use ggplot
(don’t worry about this code for now)
ggplot(maize, aes(y = yield, x = tillage, fill = cover.trt)) +
facet_grid(location ~ yr) +
geom_boxplot(position = position_dodge()) +
theme_bw()
- In some location-year combinations, the tillage treatments seem to result in different yields
- Mostly minimal differences between the cover treatments
- The majority of the variation is between locations and years.
Setting up the model
- Fixed effect for tillage treatment
- Fixed effect for cover treatment
- Interaction between the two treatments
yield ~ tillage + cover.trt + tillage:cover.trt
Adding random effects
- Should location and year be random or fixed?
- Depends on the statistical inference we want to draw
- If we are interested in change over time, year should be a fixed effect
- If are not interested in comparing means of the different years, year should be a random effect
- Treating years as a random population of environments
- The same goes for location
- Give each location-year combination its own intercept
- This would mean location is nested within year as well
- Three levels of nesting for block
- Use this if the locations are far enough apart that the random variation of years isn’t consistent across locations
yield ~ tillage + cover.trt + tillage:cover.trt + (1 | location:yr) + (1 | block:location:yr)
- Random slopes
- Must specify which effects are random in the design component
- Specify which grouping factors get separate slopes fitted to them
- For example, fit random slopes only to each main effect and only for the year-location grouping factor:
yield ~ tillage + cover.trt + tillage:cover.trt + (tillage + cover.trt | location:yr) + (1 | block:location:yr)
Examine the data
ggplot(barley_don, aes(x = Variety, y = DON_ppm, fill = Fungicide)) +
geom_boxplot(position = position_dodge()) +
theme_bw()
Fit a model
- Variety, fungicide, and interaction are the fixed effects
- Each
Rep
gets a random intercept
fit_don <- lmer(DON_ppm ~ Variety + Fungicide + Variety:Fungicide + (1|Rep), data = barley_don)
Improvement to residuals
- Transformations should be determined by the science, but it’s still good to look at the residuals
- Residuals in untransformed model have non-homogeneous variance and bad match to normal quantile line
- Both diagnostics better for the log-transformed response
- Note:
check
argument to check_model()
lets you select specific diagnostic plots to make
- See all options by looking at help documentation:
?check_model
check_model(fit_don, check = c('homogeneity', 'qq'))
check_model(fit_don_log, check = c('homogeneity', 'qq'))