What is this workshop?

This workshop is intended for SAS users who want to learn R. It will be most useful to practicing researchers who have a decent working knowledge of SAS and of basic statistical analysis (descriptive statistics and regression models) as it applies to their field.

This is Lesson 2 in a series. (I am currently working on Lesson 3 and may develop more lessons in the future.) Lesson 1 covered the basics: importing data, cleaning and reshaping data, summary statistics, simple graphs and tables, and a few simple statistical models.

Download the worksheet for this lesson here.

What will you learn from this workshop?

Conceptual learning objectives

During this workshop, you will …

  • Review the steps of the “data to doc” pipeline we covered in Lesson 1
  • Learn how to fit linear mixed models to different experimental designs
  • Learn about estimated marginal means, otherwise known as least-squares means

Practical skills

As in Lesson 1, we will work through a “data to doc” pipeline in R, comparing R code to SAS code each step of the way. We will use a different dataset this time.

We will …

  • Import the data from a CSV file
  • Clean and reshape the data
  • Calculate some summary statistics and make some exploratory plots
  • Fit a linear mixed-effects model with categorical fixed effects
  • Fit and compare linear mixed-effects models with random intercepts and random slopes
  • Make plots and tables of results

How to follow along with this workshop

  • Slides and text versions of the lessons are online
  • Fill in the R code in the worksheet (replace ... with code)
  • You can always copy and paste code from the text version of the lesson if you fall behind
  • Notes on best practices will be marked with PROTIP as we go along!

Data to doc pipeline, take two

As in the previous lesson, we will start with raw data and work our way to a finished product. The first few steps of the pipeline will not be completely new to you if you did Lesson 1 … but it is good to get some extra practice!

Load R packages

Here we’ll load the R packages we are going to work with today. They are mostly the same as in Lesson 1: the tidyverse package for reading, manipulating, and plotting data, the lme4 package for fitting linear mixed models (this is a different package from the one we used in Lesson 1), and the easystats package, which has some good model diagnostic plots.
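
A minimal sketch of the loading step, assuming all three packages are already installed (if they are not, `install.packages()` can install them from CRAN):

```r
# Load the packages used in this lesson.
# Assumes each one has already been installed.
library(tidyverse)  # reading, manipulating, and plotting data
library(lme4)       # fitting linear mixed models
library(easystats)  # model diagnostic plots, among other things
```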

Note about packages: In Lesson 1 we used the nlme package to fit the linear mixed models, but now we’re using lme4. This fits in with the idea that the best thing about R is “there are many ways to do something,” and the worst thing about R is also “there are many ways to do something.” I would say that nowadays the lme4 package is probably the most widely used package for fitting linear mixed models, so it’s important to be familiar with it if you are doing stats with R. The nlme package also has some useful capabilities, so I would recommend familiarizing yourself with it as well. Even more complex models can be fit with glmmTMB and other packages. Ultimately, you can fit multilevel models of any level of sophistication using Bayesian methods with packages like brms.
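
To make the difference between the two packages a little more concrete, here is a rough sketch that fits the same random-intercept model with each one. The `Orthodont` data that ships with nlme is only a stand-in chosen for illustration; it is not this lesson's dataset, and the object names `fit_nlme` and `fit_lme4` are made up for the example. Functions are called with the `::` prefix so that neither package masks the other's functions.

```r
# Illustration only: Orthodont is a stand-in dataset, not the lesson's data.
data("Orthodont", package = "nlme")

# nlme: random effects are given in a separate 'random' argument
fit_nlme <- nlme::lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# lme4: random effects are written inside the model formula
fit_lme4 <- lme4::lmer(distance ~ age + (1 | Subject), data = Orthodont)

# Both fits use REML by default, so the fixed-effect estimates
# should agree up to numerical tolerance
nlme::fixef(fit_nlme)
lme4::fixef(fit_lme4)
```

The main syntactic difference is where the random effects go: nlme takes them in a separate `random` argument, while lme4 puts them in parentheses inside the formula.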


Background on dataset

In this lesson, we’re going to use a dataset kindly provided by Andrea Onofri, a regular contributor to r-bloggers. This tutorial is loosely based on this blog post.

Image: fava bean plant with flowers