Exercise 1

library(tidyverse)

oats <- read_csv('https://usda-ree-ars.github.io/SEAStats/R_for_SAS_users/datasets/Edwards_oats.csv')

# Alternative if working on the cloud server: run this instead of the line above
oats <- read_csv('data/Edwards_oats.csv')

Exercise 2

oats_subset <- oats %>%
  filter(
    year == 2001, 
    gen %in% c('Belle', 'Blaze', 'Brawn', 'Chaps')
  )

You can separate multiple conditions inside filter() with either , or &. Both keep only the rows where every condition is true.
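
As a quick check of that equivalence, the sketch below applies both forms to R's built-in mtcars data and compares the results. mtcars is only a stand-in here and is not part of the exercise.

```r
library(dplyr)

# mtcars is a built-in dataset used purely for illustration
with_comma <- mtcars %>% filter(cyl == 4, gear == 4)
with_and   <- mtcars %>% filter(cyl == 4 & gear == 4)

# Both forms should keep exactly the same rows
identical(with_comma, with_and)
```

The same comparison can be run on oats with the year and gen conditions from this exercise.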

Exercise 3

oats_subset %>%
  group_by(gen) %>%
  summarize(
    mean_yield = mean(yield),
    stdev_yield = sd(yield)
  )

Exercise 4

ggplot(oats_subset, aes(x = loc, y = yield)) +
  geom_boxplot()

Exercise 5

library(performance) # Provides check_model(); skip if easystats is already loaded

oats_fit <- lm(yield ~ gen, data = oats_subset)
check_model(oats_fit) # Regression diagnostics
summary(oats_fit) # Displays model coefficients
anova(oats_fit) # ANOVA table

Unlike in SAS, where we write class gen;, there is no need to declare gen as a categorical variable here. Because gen is a character column, lm() treats it as a factor automatically. As in the lesson, model fitting, regression diagnostics, display of coefficients, and the ANOVA table are each called with a separate line of code instead of all being folded into the same proc as we do in SAS.
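
One caveat worth illustrating: the automatic handling applies to character and factor columns. A categorical variable stored as numbers is treated as continuous unless it is wrapped in factor(). The sketch below shows the difference using the built-in ToothGrowth data, which is a stand-in and not part of the exercise. The column supp_chr is created here only for illustration.

```r
# ToothGrowth is a built-in dataset used purely for illustration
tg <- ToothGrowth
tg$supp_chr <- as.character(tg$supp) # Character column, like gen in the oats data

# Character predictor: lm() converts it to a factor automatically
fit_chr <- lm(len ~ supp_chr, data = tg)
names(coef(fit_chr))

# Numeric predictor: treated as continuous, so it gets a single slope
fit_num <- lm(len ~ dose, data = tg)
names(coef(fit_num))

# Wrapping it in factor() gives one coefficient per non-reference level
fit_fac <- lm(len ~ factor(dose), data = tg)
names(coef(fit_fac))
```

In the oats data this would matter for a variable such as year, which is stored as a number.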

Exercise 6

oats_fit_GxE <- lm(yield ~ gen + loc + gen:loc, data = oats_subset)
check_model(oats_fit_GxE) # Regression diagnostics
summary(oats_fit_GxE) # Displays model coefficients
anova(oats_fit_GxE) # ANOVA table

This is a much better model than the genotype-only model in Exercise 5. In the model formula, multiple predictors are separated with +, and the interaction between two predictors is specified by putting a : between them.
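
Two related points can be sketched in code. First, a * b in a formula is shorthand for a + b + a:b. Second, one common way to compare a simpler model with a larger one that contains it is an F-test via anova() with both fits as arguments. The example below uses the built-in ToothGrowth data as a stand-in. The column dose_f and the model names are made up for illustration.

```r
# ToothGrowth is a built-in dataset used purely for illustration
tg <- ToothGrowth
tg$dose_f <- factor(tg$dose) # Treat dose as categorical

fit_main <- lm(len ~ supp, data = tg)                        # One predictor only
fit_long <- lm(len ~ supp + dose_f + supp:dose_f, data = tg) # Main effects plus interaction
fit_star <- lm(len ~ supp * dose_f, data = tg)               # Shorthand for the same model

# The long form and the * shorthand give the same coefficients
all.equal(coef(fit_long), coef(fit_star))

# F-test comparing the nested models
anova(fit_main, fit_long)
```

For the oats models, the analogous calls would be lm(yield ~ gen * loc, data = oats_subset) and anova(oats_fit, oats_fit_GxE).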