Introduction

What is this workshop?

This workshop is intended for SAS users who want to learn R. It will be most useful to practicing researchers who have a decent working knowledge of SAS and of basic statistical analysis (descriptive statistics and regression models) as it applies to their field.

This is lesson 1 in a series. Lessons 2 and 3 are also available on the SEAStats page.

Download the worksheet for this lesson here.

A word of warning: I have much more experience with R than with SAS. So if you notice any issues or glaring flaws in the SAS code, chalk that up to my poor SAS skills! The SAS code is provided mainly for comparison, so that you can better understand what the R code is doing by setting it beside a familiar bit of SAS code.

What will you learn from this workshop?

Conceptual learning objectives

During this workshop, you will …

  • Learn the similarities and differences between SAS and R, two different tools for the same job
  • Get introduced to R packages, in particular the “tidyverse”
  • Learn which common SAS statistical procedures correspond to which R packages/functions

Practical skills

In this workshop, you will go through a “data to doc” pipeline in R, comparing R code to SAS code each step of the way. As you go through the pipeline, you will …

  • Import the data from a CSV file
  • Clean and reshape the data
  • Calculate some summary statistics and make some exploratory plots
  • Fit a linear model and a linear mixed-effects model
  • Make plots and tables of results

How to follow along with this workshop

  • Slides and text versions of the lessons are online
  • Fill in R code in the worksheet (replace ... with code)
  • You can always copy and paste code from the text version of the lesson if you fall behind
  • Notes on best practices will be marked with PROTIP as we go along!

Background

R versus SAS

Before we get into the code, let’s talk a little bit about R and SAS. SAS has been around quite a bit longer than R. It was developed in the late 1960s as a “statistical analysis system” for agricultural researchers at North Carolina State University, and was spun off as an independent business in 1976. R, in contrast, was created by statisticians Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand and first released in 1993. All that is to say that SAS has been in use longer, especially in agricultural research in government and academia in the United States. So, many ARS long-timers cut their teeth on SAS.