Lesson 4: Going further with linear mixed models

Lesson 4 learning objectives

At the end of this lesson, students will …

  • Fit a linear mixed model with crossed and nested random effects.
  • Fit a linear mixed model with a transformed response variable.

Crossed and nested random effects

  • We are finally going to use some real data!
  • Dataset of results from a study of Precision Zonal Management (PZM) conservation agriculture practices
  • Two experimental treatments
    • two different tillage systems
    • crossed with presence or absence of rye cover crop
  • Measured maize yield

Load needed packages and read the data.

library(tidyverse)
library(easystats)
library(lme4)
library(lmerTest)
maize <- read_csv('https://usda-ree-ars.github.io/SEAStats/mixed_models_in_R/datasets/pzm_maize.csv')

Examine the data

glimpse(maize)
  • Multiple locations (US states) and multiple years
  • plot and block columns have integers identifying the plots and blocks
  • tillage column has two different values, "cp" (chisel plow) and "rt" (ridge tillage)
  • cover.trt column also has two different values, "none" and "rye"

Create factor columns

  • Convert the yr, plot, and block columns to factor type
  • They are integers and we want to treat them as categorical for the model
  • Use mutate() with the pipe %>%
maize <- maize %>%
  mutate(
    yr = factor(yr),
    plot = factor(plot),
    block = factor(block)
  )

Boxplot of the data

Use ggplot (don’t worry about this code for now)

ggplot(maize, aes(y = yield, x = tillage, fill = cover.trt)) +
  facet_grid(location ~ yr) +
  geom_boxplot(position = position_dodge()) +
  theme_bw()
  • In some location-year combinations, the tillage treatments seem to result in different yields
  • Mostly minimal differences between the cover treatments
  • The majority of the variation is between locations and years.

Setting up the model

  • Fixed effect for tillage treatment
  • Fixed effect for cover treatment
  • Interaction between the two treatments

yield ~ tillage + cover.trt + tillage:cover.trt

Adding random effects

  • Should location and year be random or fixed?
  • Depends on the statistical inference we want to draw
  • If we are interested in change over time, year should be a fixed effect
  • If are not interested in comparing means of the different years, year should be a random effect
  • Treating years as a random population of environments
  • The same goes for location

Building model formula

  • Let’s say year and location will both be random effects
  • Fit them as crossed random effects
  • An intercept will be fit for each year and each location
  • This means we assume the effect of year is constant across locations, and vice versa

yield ~ tillage + cover.trt + tillage:cover.trt + (1 | yr) + (1 | location)

  • No random slopes (baseline yield differs across environments but not the treatment effect)

Building model formula some more

  • There are many blocks at each location, each of which has many plots
  • Block treated as a random effect nested within location
  • Block 1 in Illinois is not the same as Block 1 in Minnesota
  • Separate intercept fitted for each block in each location
  • We do not need to include plot explicitly, because the plots are individual data points
  • Use the : operator to separate nesting terms.

yield ~ tillage + cover.trt + tillage:cover.trt + (1 | yr) + (1 | location) + (1 | block:location)

Full model formula

fit_maize <- lmer(yield ~ tillage + cover.trt + tillage:cover.trt + (1 | yr) + (1 | location) + (1 | block:location), data = maize)

Alternative formula 1

  • Give each location-year combination its own intercept
  • This would mean location is nested within year as well
  • Three levels of nesting for block
  • Use this if the locations are far enough apart that the random variation of years isn’t consistent across locations

yield ~ tillage + cover.trt + tillage:cover.trt + (1 | location:yr) + (1 | block:location:yr)

Alternative formula 2

  • Random slopes
  • Must specify which effects are random in the design component
  • Specify which grouping factors get separate slopes fitted to them
  • For example, fit random slopes only to each main effect and only for the year-location grouping factor:

yield ~ tillage + cover.trt + tillage:cover.trt + (tillage + cover.trt | location:yr) + (1 | block:location:yr)

Data transformations

  • Not just to “make the data normal” (actually the residuals)
  • Transformations should be guided by theory and science
  • For example, response variables that are ratios or concentrations
  • Difference between two values on a logarithmic scale is multiplicative difference (ratio) on an arithmetic scale
  • Example: Difference between a ratio of 0.01 and a ratio of 1 is 100 times
  • Difference etween a ratio of 1 and 100 is 100 times
  • On a log scale those two differences are treated the same (which is good)

Log transformation example

  • Small subset of an ARS scientist’s data (noise added)
  • Different varieties of barley crossed with different types of fungicide
  • Multiple plots of each combination of variety and fungicide in each experimental block (column Rep)
  • Amount of DON (a mycotoxin) produced measured, units of parts per million (column DON_ppm)
barley_don <- read_csv('https://usda-ree-ars.github.io/SEAStats/mixed_models_in_R/datasets/barley_don.csv')

Examine the data

glimpse(barley_don)
ggplot(barley_don, aes(x = Variety, y = DON_ppm, fill = Fungicide)) +
  geom_boxplot(position = position_dodge()) +
  theme_bw()

Fit a model

  • Variety, fungicide, and interaction are the fixed effects
  • Each Rep gets a random intercept
fit_don <- lmer(DON_ppm ~ Variety + Fungicide + Variety:Fungicide + (1|Rep), data = barley_don)

Log transformed response

  • Same as before but the response variable is log-transformed
  • This can be done directly in the model formula
  • Concentration is on a ratio scale so we are looking at multiplicative changes
fit_don_log <- lmer(log(DON_ppm) ~ Variety + Fungicide + Variety:Fungicide + (1|Rep), data = barley_don)

Improvement to residuals

  • Transformations should be determined by the science, but it’s still good to look at the residuals
  • Residuals in untransformed model have non-homogeneous variance and bad match to normal quantile line
  • Both diagnostics better for the log-transformed response
  • Note: check argument to check_model() lets you select specific diagnostic plots to make
    • See all options by looking at help documentation: ?check_model
check_model(fit_don, check = c('homogeneity', 'qq'))
check_model(fit_don_log, check = c('homogeneity', 'qq'))