IMPORTANT NOTE: In this lesson, the numerical results of the R and SAS code may no longer be identical. This is because of details in model specifications and differences in the fitting algorithms between R and SAS.
During this workshop, you will …
We will …
...
with code)
Zebu, photo by Scott Bauer, USDA-ARS
cbpp
(contagious bovine pleuropneumonia): dataset pre-loaded with the lme4 package.period
is a factor)filename csvFile url "https://usda-ree-ars.github.io/SEAStats/R_for_SAS_users/datasets/cbpp.csv";
proc import datafile = csvFile out = cbpp replace dbms = csv; run;
data cbpp; set cbpp;
rate = incidence/size;
run;
incidence/size
for each herd at each time periodincidence/size
,period
,(1 | herd)
Formula for predicting the probability of disease for individual \(i\) at time period \(j\) in herd \(k\)
\[\text{logit}~\hat{y}_{i,j,k} = \beta_0 + P_j + u_k\]
\[\text{logit}~p = \log \frac {p}{1-p}\]
glmer()
instead of lmer()
, same formula as beforefamily
argument, for the “family” of response distributions we are usingbinomial(link = 'logit')
weights = size
accounts for the fact that the herds are different sizes; bigger herd = bigger contribution to the estimatecheck_model()
are not relevant here but we still need to check for issuestype = 'response'
ilink
(inverse link) in lsmeans
and slice
statements
Diseased cotton seedlings, photo by University of Missouri Extension
DACD
, RT
, and TSX
) applied to cotton seedlings, and untreated controlfilename csvFile2 url "https://usda-ree-ars.github.io/SEAStats/R_for_SAS_users/datasets/fungicide.csv";
proc import datafile = csvFile2 out = fungicide replace dbms = csv; run;
If you are using cloud server:
If you are running the code locally:
fungicide
treatment columnggplot(fungicide, aes(x = time, color = fungicide, group = fungicide, y = n_healthy)) +
geom_point(size = 2, position = position_dodge(width = .5)) +
stat_summary(fun = 'median', geom = 'point', shape = 95, size = 10, position = position_dodge(width = .5), alpha = .5) +
theme_bw() +
scale_color_okabeito(order = c(9, 1, 2, 3))
facet_wrap()
to split treatments into different panelsglmmTMB package allows us to fit more complex covariance structures to the data
cbind(n_healthy, n_plants - n_healthy) ~ fungicide + time + fungicide:time + (1 | block)
Response variable is two column vectors side-by-side: cbind(n_healthy, n_plants - n_healthy)
.
n_healthy
: number of “successes” or healthy plantsn_plants - n_healthy
: number of “failures” or the total number of plants minus the number of healthy plantsFixed effects are fungicide, time, and their interaction: fungicide + time + fungicide:time
Random intercept term for each block: (1 | block)
us(0 + time | fungicide:block)
cs(0 + time | fungicide:block)
ar1(0 + time | fungicide:block)
0 + time
because we are only fitting the residual covariance for the time slope, not the intercept.
fit_us <- glmmTMB(
cbind(n_healthy, n_plants - n_healthy) ~ fungicide + time + fungicide:time + (1 | block) + us(0 + time | fungicide:block),
data = fungicide, family = binomial
)
fit_cs <- glmmTMB(
cbind(n_healthy, n_plants - n_healthy) ~ fungicide + time + fungicide:time + (1 | block) + cs(0 + time | fungicide:block),
data = fungicide, family = binomial
)
fit_ar1 <- glmmTMB(
cbind(n_healthy, n_plants - n_healthy) ~ fungicide + time + fungicide:time + (1 | block) + ar1(0 + time | fungicide:block),
data = fungicide, family = binomial
)
cld()
summarizes multiple comparisons with letterscld_fung <- cld(emm_fung$emmeans, adjust = 'tukey', Letters = letters)
cld_fung_by_time <- cld(emm_fung_by_time$emmeans, adjust = 'tukey', Letters = letters)
plot_data <- bind_rows(as_tibble(cld_fung), as_tibble(cld_fung_by_time)) %>%
mutate(time = fct_na_value_to_level(time, 'overall') %>%
factor(labels = c(paste(c(12, 20, 42), 'days after planting'), 'overall')))
ggplot(plot_data, aes(x = fungicide, y = prob, ymin = asymp.LCL, ymax = asymp.UCL, label = trimws(.group))) +
geom_errorbar(linewidth = 0.9, width = 0.2) +
geom_point(size = 3) +
geom_text(aes(y = asymp.UCL), vjust = -0.5, size = 5) +
facet_wrap(~ time) +
theme_bw() +
scale_y_continuous(name = 'probability of healthy plant', limits = c(0.6, 0.9))
method = laplace
, a maximum-likelihood method, specified to obtain AIC values/* Unstructured covariance */
proc glimmix data = fungicide method = laplace;
class block fungicide time;
model n_healthy/n_plants = fungicide time fungicide*time;
random intercept / subject = block;
random time / subject = fungicide(block) type = un;
lsmeans fungicide / ilink cl diff lines adjust = tukey;
slice fungicide*time / sliceby = time ilink cl diff lines adjust = tukey;
run;
/* Compound symmetry */
proc glimmix data = fungicide method = laplace;
class block fungicide time;
model n_healthy/n_plants = fungicide time fungicide*time;
random intercept / subject = block;
random time / subject = fungicide(block) type = cs;
lsmeans fungicide / ilink cl diff lines adjust = tukey;
slice fungicide*time / sliceby = time ilink cl diff lines adjust = tukey;
run;
/* AR1 */
proc glimmix data = fungicide method = laplace;
class block fungicide time;
model n_healthy/n_plants = fungicide time fungicide*time;
random intercept / subject = block;
random time / subject = fungicide(block) type = ar(1);
lsmeans fungicide / ilink cl diff lines adjust = tukey;
slice fungicide*time / sliceby = time ilink cl diff lines adjust = tukey;
run;
What have you learned in this lesson?
Nice job! Pat yourselves on the back once again!