Welcome back to the R for SAS users workshop! This workshop is intended for SAS users who want to learn R. The people who will get the most out of this course are practicing researchers who have a decent working knowledge of SAS, and of basic statistical analysis (descriptive stats and regression models) as it applies to their field.
This is lesson 3 of 3 in a series. Lesson 1 covered the basics: importing data, cleaning and reshaping data, summary statistics, simple graphs and tables, and a few simple statistical models. Lesson 2 got a little more advanced, covering linear mixed models for more sophisticated experimental designs and how to produce and compare estimated marginal means.
Download the worksheet for this lesson here.
IMPORTANT NOTE: In this lesson, the numerical results of the R and SAS code may no longer be identical, as they were in previous lessons. This is because different fitting algorithms are used by SAS PROC GLIMMIX and by the R model fitting packages that we are demonstrating. A full discussion of these differences is outside the scope of this lesson!
During this workshop, you will …
As in Lessons 1 and 2, we will work through a “data to doc” pipeline in R, comparing R code to SAS code each step of the way. We will use yet another different dataset.
We will …
As in the previous lessons, we will start with raw data and work our way to a finished product. Hopefully this is becoming second nature to you by now!
Here we’ll load the R packages we are going to work with today. These are mostly the same as in the previous lessons: the tidyverse package for reading, manipulating, and plotting data, the lme4 package for fitting linear mixed models, and the easystats package, which has some good model diagnostic plots. New in this lesson are glmmTMB, a more advanced mixed model fitting package, and DHARMa, which produces residual diagnostic plots for GLMMs.
library(tidyverse)
library(lme4)
library(easystats)
library(lmerTest)
library(emmeans)
library(multcomp)
library(glmmTMB)
library(DHARMa)
We will use a couple of different datasets for this lesson. One of them, the cbpp (contagious bovine pleuropneumonia) dataset, is pre-loaded with the lme4 package. For each of four time periods, the number of Ethiopian zebu cattle that developed the disease in each herd and the total number of cattle in the herd are recorded. The herds (1-15) are identified with numerical IDs, and the time periods are identified by the integers 1-4. Note that the period column is already a factor variable when you examine the pre-loaded dataset.
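As a quick check of the description above, here is a minimal sketch of how you might load and inspect the dataset. It assumes the lme4 package is installed. The column names incidence and size come from the lme4 documentation for cbpp rather than from the text above, and prop_diseased is a hypothetical column name added only for illustration.

```r
# Load only the cbpp dataset from lme4, without attaching the whole package
data(cbpp, package = "lme4")

# Show the structure of the data: column names, types, and first few values
str(cbpp)

# Confirm that period is stored as a factor, and list its levels
is.factor(cbpp$period)
levels(cbpp$period)

# Proportion of cattle in each herd and period that developed the disease
# (prop_diseased is a hypothetical name used only in this sketch)
cbpp$prop_diseased <- cbpp$incidence / cbpp$size
head(cbpp)
```

If you have already run library(lme4), a plain data(cbpp) does the same thing. Because period is already a factor, it can go straight into a model formula as a categorical predictor without any conversion.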