Statistical interactions: what are they and what do they mean, anyway?

Quentin D. Read

Who is this talk for?

Scientists who do experiments or collect observational data

People who want to learn how different causal factors interact to explain the world

Quiz

Which of these represents an interaction between treatment and sex? Why or why not?

A: NO INTERACTION. Treatment has the same effect on both sexes.

B: INTERACTION. Effect of treatment depends on sex: no effect in females, positive effect in males.

C: NO INTERACTION. Males differ from females, but treatment has no effect regardless of sex.

D: INTERACTION. Effect of treatment depends on sex: small effect in females, large effect in males.

What is a statistical interaction?

A statistical interaction occurs when the effect of one predictor variable on the response depends on the value of another predictor variable.

Basic recap of a linear model

Predict the mean of a response variable y

Linear combination of predictor variables x
- We’ll ignore fixed vs. random effects for now. Both are kinds of variables that help us predict and explain variation in y.

Error term (model residuals)
- Simplest linear model assumes error is normally distributed
- We’ll ignore generalized linear models with link functions and non-normal error distributions for now. The basic logic of interactions carries over, although they are then interpreted on the scale of the link function.

\[y = \beta_0 + \beta_1 x + \epsilon\]

\[\hat{y} = \beta_0 + \beta_1 x\]

Model with more than one predictor

In the linear model framework, when we include multiple explanatory variables in a model, their effects are added

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\]

This does not allow for interactions

Model with more than one predictor: Example

Effect of blood pressure \(x_1\) on heart rate \(y\) in coffee drinkers and non-coffee drinkers \(x_2\)

Continuous outcome variable, one continuous predictor and one categorical predictor with two possible values
The categorical predictor takes values of either 0 or 1. In this case no coffee = 0 and coffee = 1

In this case it’s still a good model but only because the effect of \(x_1\) does not depend on \(x_2\) and vice versa!

Here we show marginal trends for the relationship of \(y\) and \(x_1\) for each value of \(x_2\)
Slopes for coffee drinkers and non-coffee drinkers are the same
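
A minimal sketch of why the additive model gives parallel lines. The coefficient values are made up for illustration and are not estimates from the data in the example; `predict_additive` and `slope_in_group` are hypothetical helper names.

```python
# Hypothetical coefficients for heart rate (illustrative only, not fitted values).
B0 = 40.0   # intercept: predicted heart rate at x1 = 0 for non-coffee drinkers
B1 = 0.25   # slope of heart rate on blood pressure (x1)
B2 = 8.0    # shift for coffee drinkers (x2 = 1)


def predict_additive(x1, x2):
    """Predicted mean of y under y = b0 + b1*x1 + b2*x2 (no interaction)."""
    return B0 + B1 * x1 + B2 * x2


def slope_in_group(x2, lo=100.0, hi=140.0):
    """Change in predicted y per unit of x1, holding x2 fixed."""
    return (predict_additive(hi, x2) - predict_additive(lo, x2)) / (hi - lo)


slope_no_coffee = slope_in_group(0)
slope_coffee = slope_in_group(1)
gap_low = predict_additive(100.0, 1) - predict_additive(100.0, 0)
gap_high = predict_additive(140.0, 1) - predict_additive(140.0, 0)

print(slope_no_coffee, slope_coffee)  # the two slopes are identical
print(gap_low, gap_high)              # the coffee effect is the same at any x1
```

Whatever values are chosen for the coefficients, the slope never depends on \(x_2\) and the coffee gap never depends on \(x_1\). That is exactly what "no interaction" means.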

Model with an interaction between a continuous and categorical predictor

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon\]

This is probably the easiest type of interaction to conceptualize and interpret

\(x_1\) is a continuous variable, \(x_2\) is a binary variable that can take values of 0 or 1

The effect of \(x_1\) on \(y\) depends on \(x_2\) and vice versa

Still a linear combination of variables; we just have a new variable defined by taking the product of \(x_1\) and \(x_2\)
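
Because \(x_2\) only takes the values 0 and 1, plugging each value into the equation above gives one line per group, which shows what each coefficient means:

\[x_2 = 0: \quad \hat{y} = \beta_0 + \beta_1 x_1\]

\[x_2 = 1: \quad \hat{y} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1\]

So \(\beta_2\) is the difference in intercepts and \(\beta_3\) is the difference in slopes between the two groups. If \(\beta_3 = 0\), the lines are parallel and we are back to the additive model.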

Model with an interaction between a continuous and categorical predictor: Example

Income (\(y\)) depends on years of education (\(x_1\)) in males and females (\(x_2\))

Males have higher average income and steeper slope of income vs. education
Thus the effect of education on income depends on sex: an interaction!
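
A rough sketch of how the four coefficients relate to the two group-specific lines. With a binary \(x_2\), fitting one least-squares line per group gives the same fitted lines as the full interaction model, so the coefficients can be read off as differences. The data are made up and noise-free, the coding female = 0 and male = 1 is an assumption, and `ols_line` is a hypothetical helper.

```python
from statistics import mean


def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up, noise-free data (income in thousands); NOT real survey data.
education = [10, 12, 14, 16, 18, 20]
income_female = [20 + 2 * x for x in education]   # assumed coding: x2 = 0
income_male = [25 + 3 * x for x in education]     # assumed coding: x2 = 1

a_f, b_f = ols_line(education, income_female)
a_m, b_m = ols_line(education, income_male)

beta0 = a_f          # intercept for the reference group (x2 = 0)
beta1 = b_f          # slope for the reference group
beta2 = a_m - a_f    # difference in intercepts
beta3 = b_m - b_f    # difference in slopes = interaction coefficient

print(beta0, beta1, beta2, beta3)
```

In practice the single interaction model is preferable to two separate fits because it gives a direct estimate and standard error for \(\beta_3\).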

Model with an interaction between two continuous predictors

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon\]

Slightly tougher to conceptualize, but the effect of \(x_1\) on \(y\) still depends on \(x_2\) and vice versa

Same equation as before (does not matter if predictors are continuous or categorical)

Model with an interaction between two continuous predictors: Example

How do calories consumed per day (\(x_1\)) and minutes of exercise per day (\(x_2\)) affect body mass index (\(y\))?

Visualizing continuous interactions often requires:
- picking a few values for one of the predictor variables
- plotting predicted trendlines between \(y\) and the other predictor at each of those levels
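
The two steps above can be sketched as follows. The coefficients are invented for illustration (they are not estimates from any data), and `predict` and `calorie_slope` are hypothetical helper names.

```python
# Hypothetical coefficients for BMI (illustrative only, not fitted values).
B0, B1, B2, B3 = 18.0, 0.004, -0.02, -0.00005
# B1: BMI per calorie at zero exercise; B2: BMI per minute of exercise at zero
# calories; B3: change in the calorie slope per extra minute of exercise.


def predict(calories, exercise):
    """Predicted BMI under the two-predictor interaction model."""
    return B0 + B1 * calories + B2 * exercise + B3 * calories * exercise


def calorie_slope(exercise):
    """Effect of one extra calorie on predicted BMI at a given exercise level."""
    return B1 + B3 * exercise


# Pick a few exercise levels and trace the BMI-vs-calories trend at each one.
for minutes in (0, 30, 60):
    line = [round(predict(c, minutes), 2) for c in (1500, 2500, 3500)]
    print(minutes, calorie_slope(minutes), line)
```

Each printed row is one trendline. With a negative \(\beta_3\), as assumed here, the lines get flatter as exercise increases.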

Model with an interaction between categorical predictors

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon\]

Same equation as before; now both \(x\) variables are binary with values of 0 or 1

“Dummy variables” can be used if the categorical variables have more than two levels, but the math is the same

Easy to plot and visualize interactions between categorical variables
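
With two binary predictors, the interaction model has four coefficients and four cell means, so each coefficient is a simple contrast of cell means. A minimal sketch with made-up numbers (the cell labels are hypothetical):

```python
from statistics import mean

# Made-up responses for a 2 x 2 design; keys are (x1, x2) with 0/1 coding.
cells = {
    (0, 0): [10, 12, 11],   # control, group 0
    (1, 0): [13, 15, 14],   # treated, group 0
    (0, 1): [20, 19, 21],   # control, group 1
    (1, 1): [29, 31, 30],   # treated, group 1
}
m = {key: mean(values) for key, values in cells.items()}

beta0 = m[(0, 0)]                   # reference cell mean
beta1 = m[(1, 0)] - m[(0, 0)]       # effect of x1 when x2 = 0
beta2 = m[(0, 1)] - m[(0, 0)]       # effect of x2 when x1 = 0
# Interaction: how much the x1 effect differs between x2 = 1 and x2 = 0.
beta3 = (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])

print(beta0, beta1, beta2, beta3)
```

The interaction coefficient is a "difference of differences": the treatment effect in one group minus the treatment effect in the other.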

Three-way interactions

Time to blow your minds!

Quiz number two

Which of these represents a three-way interaction between treatment, sex, and age? Why or why not?

A: NO 3-WAY INTERACTION. Two-way treatment by age interaction and main effect of sex, but sex does not interact with either treatment or age

B: 3-WAY INTERACTION. The treatment is most effective in young males (treatment by sex by age). There is also a two-way treatment by age interaction

Three-way interactions and more

The same logic applies for three-way interactions, and higher, as for two-way

The effect of \(x_3\) depends on the combination of \(x_2\) and \(x_1\)

Or equivalently, the effect of the combination of \(x_3\) and \(x_2\) depends on \(x_1\), etc.
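
Written out for three predictors, a model with all interactions has one term for each predictor, each pair, and the triple. The coefficient numbering below is just one possible labelling and differs from the two-predictor equations earlier in the talk:

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1 x_2 x_3 + \epsilon\]

Collecting the terms that contain \(x_3\) shows how its effect depends on the other two predictors:

\[\text{effect of } x_3 = \beta_3 + \beta_5 x_1 + \beta_6 x_2 + \beta_7 x_1 x_2\]

If \(\beta_7 = 0\) there is no three-way interaction: \(x_1\) and \(x_2\) can each modify the effect of \(x_3\), but they do so separately rather than in combination.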

Issues with interactions

Discussing main effects in the presence of an interaction
Incorrectly diagnosing interactions
Nonlinear interactions
The data scale matters

Discussing main effects if there’s an interaction

Drawing conclusions about a main effect in the presence of an interaction is not “always wrong” but can be dangerous.

The same goes for drawing conclusions about two-way interactions when a three-way or higher-order interaction exists.

It is tempting to try to oversimplify but sometimes it isn’t possible.
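
One way to see why main effects are slippery: in the two-predictor interaction model \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon\), the coefficient \(\beta_1\) is the effect of \(x_1\) only when \(x_2 = 0\). Suppose we shift \(x_2\) by a constant \(c\), for example by centering it at its mean, so that \(x_2 = x_2' + c\). Substituting gives the same model in a different form:

\[y = (\beta_0 + \beta_2 c) + (\beta_1 + \beta_3 c) x_1 + \beta_2 x_2' + \beta_3 x_1 x_2' + \epsilon\]

The predictions are identical and \(\beta_3\) is unchanged, but the “main effect” of \(x_1\) has moved from \(\beta_1\) to \(\beta_1 + \beta_3 c\). A main effect coefficient in an interaction model therefore describes the effect at the reference value of the other predictor, not an overall effect.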

How would you discuss each of these scenarios?

Can you say “there is evidence for a positive treatment effect” in any of these cases? How do interactions, or lack thereof, affect how you would describe them?

Incorrectly diagnosing interactions

We can incorrectly say there’s an interaction if the explanatory variables are correlated with each other

This occurs more in observational data and is not as much of an issue if the treatments are experimentally randomized

Incorrectly diagnosing interactions: Example

Candidate gene-by-environment (G×E) research tests the hypothesis that an environmental exposure in childhood interacts with genetic factors to determine an outcome (e.g., depression)

Even if you include covariates like age, gender, and ethnicity in the model, that will not fully control for their confounding influence on the G×E term

You also need to account for the interactions of covariate with environment and covariate with gene

Many studies find G×E effects when it is really something like a covariate × environment effect (for example, certain ethnic groups may be less likely to report depression)
- Keller 2014, Biological Psychiatry

Nonlinear relationships and nonlinear interactions

Some relationships are nonlinear, and interactions can be nonlinear too

For instance, the interactive effect of age and flu exposure on mortality

Babies and elderly are vulnerable; young adults are not as vulnerable
If age is a continuous variable, the effect of flu on mortality does not increase linearly with age

Just throwing a linear age × exposure interaction term into your model will miss this U-shaped pattern
You might need to add a quadratic interaction term (age² × exposure)
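
A sketch of what a quadratic interaction buys you. If the model includes exposure, age × exposure, and age² × exposure terms, the effect of exposure becomes a quadratic function of age, which can be high at both ends and low in the middle. The coefficients and names below are invented purely for illustration and the response scale is left abstract.

```python
# Hypothetical coefficients (illustrative only): effect of flu exposure on the
# response as a quadratic function of age.
B_EXPOSURE = 0.10            # effect of exposure at age 0
B_AGE_X_EXPOSURE = -0.004    # linear age x exposure interaction
B_AGE2_X_EXPOSURE = 0.00005  # quadratic age^2 x exposure interaction


def exposure_effect(age):
    """Effect of flu exposure on the response at a given age."""
    return B_EXPOSURE + B_AGE_X_EXPOSURE * age + B_AGE2_X_EXPOSURE * age ** 2


# Age at which the exposure effect is smallest (vertex of the parabola).
safest_age = -B_AGE_X_EXPOSURE / (2 * B_AGE2_X_EXPOSURE)

for age in (0, 20, 40, 60, 80):
    print(age, round(exposure_effect(age), 3))
```

With only the linear interaction term, the exposure effect could go up or down with age but could never do both, so it could not show babies and the elderly as more vulnerable than young adults.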

The scale of the data matters

\[\log y = \beta_0 + \beta_1 x_1 + \beta_2 x_2\]

No interaction on log scale = interaction on linear scale

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2\]

Interaction on log scale = no interaction on linear scale

If \(x_1\) and \(x_2\) both affect \(y\) and don’t interact on the linear scale, they do interact on the log scale, and vice versa
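
A small numerical check of this claim, assuming binary predictors and made-up coefficients. The model is purely additive on the log scale, yet back-transforming to the original scale produces a nonzero interaction contrast. `diff_of_diffs` is a hypothetical helper.

```python
import math

# Hypothetical additive model on the log scale: log(y) = b0 + b1*x1 + b2*x2.
B0, B1, B2 = 0.0, math.log(2), math.log(3)


def log_y(x1, x2):
    """Predicted log(y); note there is no x1*x2 term."""
    return B0 + B1 * x1 + B2 * x2


def diff_of_diffs(f):
    """Interaction contrast for binary x1, x2: zero means no interaction."""
    return (f(1, 1) - f(0, 1)) - (f(1, 0) - f(0, 0))


on_log_scale = diff_of_diffs(log_y)
on_raw_scale = diff_of_diffs(lambda x1, x2: math.exp(log_y(x1, x2)))

print(on_log_scale)  # approximately 0: no interaction on the log scale
print(on_raw_scale)  # approximately 2: x1 adds 1 unit when x2 = 0 but 3 when x2 = 1
```

On the log scale \(x_1\) always adds the same amount. On the original scale it always doubles \(y\), so its additive effect depends on where \(x_2\) has already put \(y\).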

It’s always important to have a good reason to transform your data (not just to “make it normal”), especially when you have interactions in your model
- Duncan & Kefford 2021, Methods in Ecology & Evolution

Questions?