Introduction

What is this workshop?

Welcome back to the R for SAS users workshop! This workshop is intended for SAS users who want to learn R. The people who will get the most out of this course are practicing researchers who have a decent working knowledge of SAS and of basic statistical analysis (descriptive stats and regression models) as it applies to their field.

This is lesson 3 of 3 in a series. Lesson 1 covered the basics: importing data, cleaning and reshaping data, summary statistics, simple graphs and tables, and a few simple statistical models. Lesson 2 got a little more advanced, covering linear mixed models for more sophisticated experimental designs and how to produce and compare estimated marginal means.

Download the worksheet for this lesson here.

IMPORTANT NOTE: In this lesson, the numerical results of the R and SAS code may no longer be identical, as they were in previous lessons. This is because different fitting algorithms are used by SAS PROC GLIMMIX and by the R model fitting packages that we are demonstrating. A full discussion of these differences is outside the scope of this lesson!

What will you learn from this workshop?

Conceptual learning objectives

During this workshop, you will …

  • Learn what generalized linear mixed models (GLMMs) are
  • Learn more about different predictions you can make from models
  • Learn how to handle more complex covariance structures

Practical skills

As in Lessons 1 and 2, we will work through a “data to doc” pipeline in R, comparing R code to SAS code each step of the way. We will use a new dataset this time.

We will …

  • Import the data from a CSV file
  • Clean and reshape the data
  • Calculate some summary statistics and make some exploratory plots
  • Fit a generalized linear mixed-effects model with a repeated-measures error structure
  • Make plots and tables of results

How to follow along with this workshop

  • Slides and a text version of each lesson are online
  • Fill in R code in the worksheet (replace ... with code)
  • This lesson also includes a template notebook that you can fill in
  • You can always copy and paste code from the text version of the lesson if you fall behind
  • Notes on best practices will be marked with PROTIP as we go along!

Cattle pneumonia example data analysis

As in the previous lessons, we will start with raw data and work our way to a finished product. Hopefully this is becoming second nature to you by now!

Load R packages

Here we’ll load the R packages we are going to work with today. These are mostly the same as in the previous lessons: the tidyverse package for reading, manipulating, and plotting data, the lme4 package for fitting linear mixed models, the easystats package, which has some good model diagnostic plots, and the emmeans and multcomp packages for estimated marginal means and multiple comparisons. New in this lesson are glmmTMB, a more advanced mixed model fitting package, and DHARMa, for GLMM residual diagnostic plots. We also set a default plotting theme.

library(tidyverse)
library(lme4)
library(easystats)
library(emmeans)
library(multcomp)
library(glmmTMB)
library(DHARMa)

theme_set(theme_bw())
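
If any of the library() calls above fails, the package is probably not installed yet (glmmTMB and DHARMa are the most likely candidates, since they are new in this lesson). The following is a minimal sketch of one way to check, not part of the original worksheet. The object names pkgs and missing_pkgs are made up for illustration.

```r
# Packages loaded in this lesson
pkgs <- c("tidyverse", "lme4", "easystats", "emmeans",
          "multcomp", "glmmTMB", "DHARMa")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package is not installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment the next line to install them from CRAN
  # install.packages(missing_pkgs)
} else {
  message("All packages for this lesson are installed.")
}
```

The install.packages() call is left commented out so that nothing is installed unless you choose to.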

The dataset

The first dataset we will use for this lesson is the cbpp, or contagious bovine pleuropneumonia, dataset. It comes bundled with the lme4 package. For each of four time periods, it records the number of Ethiopian zebu cattle in each herd that developed the disease and the total number of cattle in that herd. The herds (1-15) are identified with numerical IDs, and the time periods are identified by the integers 1-4. Note that the period column is already a factor variable when you examine the pre-loaded dataset.
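
As a quick check of the description above, here is a minimal sketch of how you might inspect the dataset before doing anything else. It assumes only that lme4 is installed. The summary by period at the end is an illustration added here, not a step from the worksheet.

```r
# Load the cbpp data frame that ships with lme4
data(cbpp, package = "lme4")

# Structure: column names, types, and the first few values
str(cbpp)
head(cbpp)

# Confirm that herd and period are stored as factors
is.factor(cbpp$herd)
is.factor(cbpp$period)
nlevels(cbpp$herd)    # number of herds
levels(cbpp$period)   # labels of the time periods

# Total cases and total herd size in each period
aggregate(cbind(incidence, size) ~ period, data = cbpp, FUN = sum)
```

Because period is already a factor, R models will treat it as a categorical predictor, much like a variable listed in a CLASS statement in SAS.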