library(tidyverse)
oats <- read_csv('https://usda-ree-ars.github.io/SEAStats/R_for_SAS_users/datasets/Edwards_oats.csv')
# Alternative if on cloud server
oats <- read_csv('data/Edwards_oats.csv')
oats_subset <- oats %>%
filter(
year == 2001,
gen %in% c('Belle', 'Blaze', 'Brawn', 'Chaps')
)
You can separate multiple conditions inside filter()
with either ,
or &
.
oats_subset %>%
group_by(gen) %>%
summarize(
mean_yield = mean(yield),
stdev_yield = sd(yield)
)
ggplot(oats_subset, aes(x = loc, y = yield)) +
geom_boxplot()
oats_fit <- lm(yield ~ gen, data = oats_subset)
check_model(oats_fit) # Regression diagnostics
summary(oats_fit) # Displays model coefficients
anova(oats_fit) # ANOVA table
Here there is no need to explicitly specify that gen
is
a categorical variable as we do in SAS with class gen;
. It
is detected automatically. As in the lesson, you can see that the model
fitting, regression diagnostics, display of coefficients, and ANOVA
table must be called up with individual lines of code instead of all
being folded into the same proc
as we do in SAS.
oats_fit_GxE <- lm(yield ~ gen + loc + gen:loc, data = oats_subset)
check_model(oats_fit_GxE) # Regression diagnostics
summary(oats_fit_GxE) # Displays model coefficients
anova(oats_fit_GxE) # ANOVA table
This is actually a much better model than above. You can see here
that the model formula with multiple predictors has the predictors
separated with +
. The interaction between two predictors is
specified by putting a :
between two predictors.