Welcome back to the R for SAS users workshop! This workshop is intended for SAS users who want to learn R. The people who will get the most out of this course are practicing researchers who have a decent working knowledge of SAS, and of basic statistical analysis (descriptive stats and regression models) as it applies to their field.

This is lesson 3 of 3 in a series. Lesson 1 covered the basics: importing data, cleaning and reshaping data, summary statistics, simple graphs and tables, and a few simple statistical models. Lesson 2 got a little more advanced, covering linear mixed models for more sophisticated experimental designs and how to produce and compare estimated marginal means.

Download the worksheet for this lesson here.

IMPORTANT NOTE: In this lesson, the numerical results of the R and SAS code may no longer be identical, as they were in previous lessons. This is because different fitting algorithms are used by SAS PROC GLIMMIX and by the R model fitting packages that we are demonstrating. A full discussion of these differences is outside the scope of this lesson!

During this workshop, you will …

- Learn what generalized linear mixed models (GLMMs) are
- Learn more about different predictions you can make from models
- Learn about how to deal with more complex covariance structures

As in Lessons 1 and 2, we will work through a “data to doc” pipeline in R, comparing R code to SAS code each step of the way. This time we will use a different dataset.

We will …

- Import the data from a CSV file
- Clean and reshape the data
- Calculate some summary statistics and make some exploratory plots
- Fit a generalized linear mixed-effects model with repeated measures error structure
- Make plots and tables of results

- Slides and text version of lessons are online
- Fill in R code in the worksheet (replace `...` with code)
- This lesson also includes a template notebook that you can fill in
- You can always copy and paste code from text version of lesson if you fall behind
- Notes on best practices will be marked with **PROTIP** as we go along!

As in the previous lessons, we will start with raw data and work our way to a finished product. Hopefully this is becoming second nature to you by now!

Here we’ll load the R packages we are going to work with today. These
are mostly the same as in the previous lessons: the **tidyverse** package for reading,
manipulating, and plotting data, the **lme4**
package for fitting linear mixed models, the **easystats** package, which
has some good model diagnostic plots, **emmeans** for estimated marginal
means, and **multcomp** for multiple comparisons. New in this lesson are **glmmTMB**,
a more advanced mixed model fitting package, and **DHARMa**,
for GLMM residual diagnostic plots. We also set a default plotting theme.

```
library(tidyverse)
library(lme4)
library(easystats)
library(emmeans)
library(multcomp)
library(glmmTMB)
library(DHARMa)
theme_set(theme_bw())
```

The first dataset we will use for this lesson is the `cbpp`, or
contagious bovine pleuropneumonia, dataset. It is
pre-loaded with the **lme4** package. For each of four time periods, it
records the number of Ethiopian zebu cattle that developed the disease in
each herd and the total number of cattle in the herd. The herds (1-15) are
identified with numerical IDs, and the time periods are identified by the
integers 1-4. Note that the `period` column is already a `factor` variable
when you examine the pre-loaded dataset.
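
If you want to check the structure of the dataset yourself before going further, something like the following should work. This is only a minimal sketch, assuming **lme4** is installed; it uses base R inspection functions and is not part of the worksheet.

```
library(lme4)            # cbpp ships with lme4

data(cbpp)               # make the dataset available in the workspace
str(cbpp)                # column names and types
head(cbpp)               # first few rows

is.factor(cbpp$period)   # TRUE if period is already a factor
levels(cbpp$period)      # the time period labels
nlevels(cbpp$herd)       # number of distinct herds
```

Checking column types up front is worthwhile because model fitting functions treat a factor as a categorical predictor, while an integer column would be treated as continuous.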