R for SAS Users Lesson 2 Exercise Answers

Exercise 1

1. The fixed effects are the main effects of calcium, fertilizer, and soil, and all two-way and three-way interactions between them. In lme4 syntax, calcium * fert * soil.
1. There are four possible random effect groups that can be included. Here I am providing the random intercept syntax for each one.
- Measurements from the same rep, or block, are not independent: (1 | rep)
- Measurements from the same vertical strip within each rep are not independent. This is defined by each unique combination of fert and rep: (1 | fert:rep).
- Measurements from the same horizontal strip within each rep are not independent. This is defined by each unique combination of soil and rep: (1 | soil:rep).
- Measurements from the same plot within each combination of strip and rep are not independent. This is defined by each unique combination of fert, soil, and rep: (1 | fert:soil:rep).
- By the time we get down to the subplot level, there is only one measurement per subplot so no explicit random effect term is needed. The model residuals are the subplot errors.

Exercise 2

fit1 <- lmer(yield ~ calcium * fert * soil + (1 | rep) + (1 | fert:rep) + (1 | soil:rep) + (1|fert:soil:rep), data = cox.stripsplit)

Here we have all main effects and their two-way and three-way interactions, and random intercepts by four different groups: rep, vertical strip, horizontal strip, and plot.

Exercise 3

We get the message boundary (singular) fit: see help('isSingular'). We can inspect the random effects and their variance-covariance matrix using ranef(fit1) and VarCorr(fit1). When we do this, we see that the random intercepts for plot (fert:soil:rep) are all near zero. The fitting algorithm is unable to estimate random intercepts for the plots. This is probably because of two things: first, there are only two measurements per plot so we are dealing with limited data. Also, there may be little additional random variation at the plot level once we have accounted for variation by rep, vertical strip, and horizontal strip.

As discussed above, some practitioners would say this message can be ignored. Others would recommend simplifying the model by removing the plot random intercept term. This gets rid of the singular fit message because all random intercepts can be successfully estimated. The simpler model can be fit like this:

fit2 <- lmer(yield ~ calcium * fert * soil + (1 | rep) + (1 | fert:rep) + (1 | soil:rep), data = cox.stripsplit)

Exercise 4

fit3 <- lmer(yield ~ calcium * fert * soil + (1 + calcium | rep) + (1 | fert:rep) + (1 | soil:rep), data = cox.stripsplit)

The random effect term grouped by rep now is (1 + calcium | rep) which means we are allowing both the mean yield (random intercept) and the effect of calcium (random slope) to vary by rep.

We see yet another singular fit message. We can inspect ranef(fit3) and VarCorr(fit3). This shows us that the random slopes are perfectly correlated with the random intercepts for rep. This should indicate to us that we have insufficient data to estimate random slope variation between reps for the calcium treatment.

Exercise 5

Note this requires the emmeans and multcomp packages to be loaded. We are using the simplified model without the plot-level random intercepts.

emm <- emmeans(fit2, ~ fert | calcium)
contrast(emm, 'pairwise', adjust = 'tukey')
cld(emm, adjust = 'tukey', Letters = letters)

We see that in calcium level C0, F3 has a higher mean yield than F1, whereas in calcium level C1, F2 has a higher mean yield than F0. Based on this, we cannot make a general statement about which fertilizer increases yield the most, that is valid regardless of calcium level. We have to say that the effect of fertilizer on yield depends on the level of calcium. This is the textbook definition of a statistical interaction.