Experimental design matters!

ARS MMRU Little Rock, July 14, 2026

Quentin D. Read, SEA Statistician

Key points

Experiments are imperfect models of the real world that help us understand cause and effect
The question you are trying to answer determines how you should set up your experiment — be aware of tradeoffs!
Designing an experiment right helps you maximize the ratio of signal to noise and get the best answer to the questions you care about

What is an experiment?

An experiment is a controlled procedure to compare hypotheses

Gather evidence to support or refute a hypothesis, or compare between competing ones
Give us information about causality: when x changes, what happens to y?
(Oversimplified) scientific method: make observations of nature, use observations + prior knowledge to make a hypothesis, test the hypothesis with an experiment

An experiment is a conceptual model of reality

“All models are wrong, but some are useful” - George Box
Experiments sacrifice realism to achieve greater control — Tradeoffs!

A super brief history of experiments

The first known experiments

1000s: Ibn al-Haytham experiments with lenses and mirrors to test the hypothesis that light goes into your eyes, not out of your eyes

1600s: Galileo experiments with steel balls to test hypotheses about how forces act on physical objects

The first controlled clinical trial

1747: James Lind split 12 sailors into 6 groups of 2 men
Each group got a different treatment for scurvy: cider, sulfuric acid, seawater, etc.
The ones who ate oranges and lemons (Vitamin C) got better!

Experiments become more systematic

Late 1800s: the principles of experimental design we still use today were formalized
1880s: Charles S. Peirce’s randomized controlled trial in psychology, testing sensitivity to pressure with a repeated measures design
1920s: R. A. Fisher’s work on experimental design in agricultural science
- That’s why people across all fields use words like “plots” and “blocks”

Experiments enter the modern era

1948: First modern randomized controlled trial in medicine, testing effectiveness of an antibiotic at treating tuberculosis
- Before this, anecdotal studies (which are still useful and important today) were all there was
1950s: Japanese industry pioneers experimentation for statistical quality control of products
2026+: What is the role of small-scale controlled experiments in the world of big data and artificial intelligence?

Basic concepts in experimental design

Terms to know

Treatment
Factor
Control
Experimental unit
Randomization
Replication
Local control
Optimal design

What is a treatment?

Treatment: Something that we manipulate or impose on subjects in an experiment
Factor: A variable whose levels are set by the experimenter; different treatments are different levels of a factor
- Example: different doses of a medication, different management methods for an agricultural field

What is a control?

Control: Observations that show the effect of no treatment or a mock/placebo treatment; used to establish a baseline
What constitutes a control depends on what exactly you are controlling for!
Ideally, all variables we manipulate in an experiment have a non-manipulated counterpart in the control group(s)
Positive and negative controls
- Negative controls provide a baseline to compare the treatment effect to
- Positive controls ensure the procedure is giving expected results using standard/usual methods
In some designs, pre-treatment measurements serve as controls

What is an experimental unit?

The thing or object to which treatments are independently randomly assigned
- Randomly assign a genotype to be planted on a plot of land
- Randomly assign a vaccination treatment to be given to an individual plant or animal
- Randomly assign a management treatment to an entire experimental watershed
Not necessarily the same as the observational unit we make measurements on

Experimental unit: population of E. coli cells, 100 mL

Experimental unit: drainage basin, ~1 km²

Ford et al. 2021, *Hydrological Processes*

Components of experimental design

Treatment structure (What the treatments are and how they relate to each other)
Design structure (How treatments are assigned to experimental units, how units are structured in space and time)
Response structure (What responses will you measure, and when/where/how in relation to the experimental units?)
Observational studies also have structure even though they’re not randomized, so these concepts also apply

Treatment structure

Examples: one-way, factorial, regression, response surface
Factorial design: multiple factors with different levels, experiment includes combinations of levels across factors
- Common garden experiment to estimate G × E interaction
More interactions = more realistic, but you need more samples and it’s harder to interpret — Tradeoffs!
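A minimal sketch of a factorial treatment structure in Python. The factor names and levels here are hypothetical, chosen only to illustrate how the number of treatment combinations grows.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only
factors = {
    "genotype": ["G1", "G2", "G3"],
    "irrigation": ["low", "high"],
}

# A full factorial includes every combination of levels across factors
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for i, trt in enumerate(treatments, start=1):
    print(i, trt)

# The number of combinations is the product of the number of levels per factor
n_treatments = len(treatments)
```

Adding a third factor with 4 levels would multiply the number of combinations by 4, which is one way to see why more interactions demand more samples.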

Principles of experimental design

Randomization
- Which units get which treatments should be random; otherwise your estimates of treatment effects may be biased
Replication
- We need many units per treatment so that we can estimate variation within and between treatments
Local control
- Conditions should be as consistent as possible across treatment groups
- Minimize the influence of extraneous variables with good protocols
- Use statistical models to account for whatever you can’t eliminate

Optimal design

Optimal design: the “best” possible experimental design with respect to some criterion from a statistical model
Find the design, given your resources, that maximizes your statistical power to answer a certain question
You may see the term efficient also
Maximize ratio of signal to noise
We want estimates without bias and with minimal variance
Which design is optimal? Depends on the specific statistical (and biological) hypothesis you’re testing

A tour of experimental designs

Completely randomized design (CRD)

No grouping of experimental units; each one is independently randomized
No local control: error term includes variation among experimental units but also variation due to the environment
Good design for highly controlled and homogeneous lab/greenhouse environments, not the most efficient for field environments
“Workaround”: control for environmental variation through spatial random effects
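One way a CRD randomization might be sketched in Python, using only the standard library. The function name `randomize_crd` and the treatment labels are made up for this example.

```python
import random

def randomize_crd(treatments, n_reps, seed=None):
    """Assign each treatment to n_reps units with no grouping of units."""
    rng = random.Random(seed)
    # One entry per experimental unit
    assignments = [trt for trt in treatments for _ in range(n_reps)]
    # No blocking: a single shuffle across all units
    rng.shuffle(assignments)
    # Unit IDs 1..N paired with the shuffled treatment labels
    return {unit: trt for unit, trt in enumerate(assignments, start=1)}

layout = randomize_crd(["A", "B", "C", "D"], n_reps=4, seed=1)
for unit, trt in layout.items():
    print(unit, trt)
```

Setting a seed makes the randomization reproducible, which helps when the layout has to be documented later.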

What about blocks?

Block: a group of experimental units that are somehow related to each other
Units within the same block may be close to each other in space and/or time
Used to account for other influences in the environment, besides whatever we’re experimentally manipulating
A blocked design can have more statistical power than a CRD
Examples
- Field divided up into groups each with 6 plots (each group of plots is a block)
- Study done at multiple schools where multiple classrooms within a school each get a different intervention (each school is a block)
- Study done at multiple hospitals where some patients at each hospital get a drug and some get a placebo (each hospital is a block)

Randomized complete block design (RCBD)

Each block is a complete replication: it contains every treatment combination once
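A minimal sketch of RCBD randomization, assuming one plot per treatment per block. The function name `randomize_rcbd` and the labels are hypothetical. The key point is that each block gets its own independent shuffle.

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return {block: treatment order}, randomized separately in each block."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)   # every treatment appears once per block
        rng.shuffle(order)         # new randomization for each block
        layout[block] = order
    return layout

rcbd = randomize_rcbd(["A", "B", "C", "D"], n_blocks=3, seed=42)
for block, order in rcbd.items():
    print(f"Block {block}: {order}")
```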

Split-plot design

Two (or more) levels of randomization
One treatment is randomized at main plot level and another at subplot level
Still a complete block design: all treatment combinations appear in each block
May be used if it is logistically difficult to completely randomize
More power than pure RCBD to detect subplot treatment effect and interaction, but less power to detect main plot treatment effect
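The two levels of randomization could be sketched like this. The function name and the example treatments (tillage at main plot level, cultivar at subplot level) are assumptions for illustration only.

```python
import random

def randomize_split_plot(main_trts, sub_trts, n_blocks, seed=None):
    """Return {block: [(main treatment, [subplot treatments in order]), ...]}."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        # First level: randomize main plot treatments within the block
        main_order = list(main_trts)
        rng.shuffle(main_order)
        block_layout = []
        for main in main_order:
            # Second level: randomize subplot treatments within each main plot
            sub_order = list(sub_trts)
            rng.shuffle(sub_order)
            block_layout.append((main, sub_order))
        layout[block] = block_layout
    return layout

split = randomize_split_plot(["till", "no-till"], ["cv1", "cv2", "cv3"],
                             n_blocks=2, seed=7)
for block, main_plots in split.items():
    for main, subs in main_plots:
        print(block, main, subs)
```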

Trouble in RCBD paradise

ARS scientists love randomized complete block designs, but that’s not all there is
Other designs may be more efficient and give you more power for the same number of experimental units

Incomplete block design

Block design where only a subset of treatments is found in each block
There’s no law that says all treatments have to be represented in every block
Size of the maximally large environmentally homogeneous area \(\neq\) size of the area that fits one plot of each treatment
If you have 100s of genotypes, complete blocks may be too big to be homogeneous
Also, incomplete designs may be more efficient/powerful than complete-block designs!

Balanced incomplete block design

Each treatment appears equally often in the same block with every other treatment
If perfectly balanced is not possible, there are algorithms that can get you close

Block 1	2	4	6
Block 2	5	6	7
Block 3	3	4	5
Block 4	1	4	7
Block 5	2	3	7
Block 6	1	3	6
Block 7	1	2	5

Each treatment in a block with each other one exactly once
Each treatment replicated 3 times
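The balance of the layout above can be checked with a few lines of Python. This sketch simply counts how often each treatment appears and how often each pair of treatments shares a block; the variable names are arbitrary.

```python
from collections import Counter
from itertools import combinations

# The seven blocks from the table above
blocks = [
    (2, 4, 6), (5, 6, 7), (3, 4, 5), (1, 4, 7),
    (2, 3, 7), (1, 3, 6), (1, 2, 5),
]

# How many times each treatment appears in the whole design
replication = Counter(trt for block in blocks for trt in block)

# How many blocks each pair of treatments shares
pair_counts = Counter(
    pair for block in blocks for pair in combinations(sorted(block), 2)
)

treatments = sorted(replication)
all_pairs = list(combinations(treatments, 2))

# Balanced: equal replication, and every pair co-occurs equally often
is_balanced = (
    len(set(replication.values())) == 1
    and len({pair_counts[p] for p in all_pairs}) == 1
)
print(dict(replication))
print(is_balanced)
```

The same check can be run on any candidate layout before planting; a pair that never shares a block shows up with a count of 0.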

Disconnected incomplete block design

Each block is assigned one of several sets of treatments
Usually worse than a balanced incomplete design: harder to estimate the difference between treatments that don’t ever appear in the same block together

Block 1	1	3	2
Block 2	5	6	4
Block 3	3	2	1
Block 4	5	6	4
Block 5	3	1	2
Block 6	5	4	6
Block 7	3	1	2
Block 8	6	5	4
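A rough way to test whether a design is disconnected: treat two treatments as linked if they ever share a block, then see whether all treatments end up in one linked group. The helper `connected_sets` is a made-up name, sketched here with a simple union-find.

```python
def connected_sets(blocks):
    """Group treatments that are linked by appearing in a block together."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we go
            x = parent[x]
        return x

    for block in blocks:
        first = find(block[0])
        for trt in block[1:]:
            parent[find(trt)] = first  # merge into the first treatment's group

    groups = {}
    for trt in list(parent):
        groups.setdefault(find(trt), set()).add(trt)
    return sorted(sorted(group) for group in groups.values())

# The eight blocks from the table above
disconnected = [
    (1, 3, 2), (5, 6, 4), (3, 2, 1), (5, 6, 4),
    (3, 1, 2), (5, 4, 6), (3, 1, 2), (6, 5, 4),
]
print(connected_sets(disconnected))
```

More than one group means there is no within-block comparison linking treatments in different groups.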

Augmented design

Controls or checks replicated in a standard experimental design (often RCBD)
Most treatments unreplicated or have fewer replicates
Error from checks used to evaluate the unreplicated treatments
Useful for breeding trials where many genotypes are being evaluated
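A sketch of how an augmented layout might be generated, assuming the checks go in every block and each new entry appears in exactly one block. The function name and the check and entry labels are hypothetical.

```python
import random

def randomize_augmented(checks, entries, n_blocks, seed=None):
    """Replicate checks in every block; place each entry in exactly one block."""
    rng = random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)  # random allocation of entries to blocks
    layout = {}
    for b in range(n_blocks):
        # Checks appear in every block; entries are dealt out round-robin
        plots = list(checks) + shuffled[b::n_blocks]
        rng.shuffle(plots)  # randomize plot order within the block
        layout[b + 1] = plots
    return layout

checks = ["Check1", "Check2"]
entries = [f"Line{i}" for i in range(1, 11)]
augmented = randomize_augmented(checks, entries, n_blocks=3, seed=3)
for block, plots in augmented.items():
    print(block, plots)
```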

Other designs

May be chosen because they are optimal to detect certain effects, or because of logistical constraints

Strip-plot
- Two treatments applied crosswise to each other
Row-column designs
- Blocking in row and column direction
- Special case: Latin square (an underrated and efficient design)
\(\alpha\)-lattice (type of incomplete block design)
- Blocks are grouped so that each group has one complete set of treatments
- As balanced as possible
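For the Latin square mentioned above, one simple construction starts from a cyclic square and then shuffles rows, columns, and treatment labels. This is only a sketch: it does not sample uniformly from all possible Latin squares, and the function name is made up.

```python
import random

def random_latin_square(treatments, seed=None):
    """Return an n x n grid with each treatment once per row and column."""
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic square: row i is the sequence 0..n-1 shifted by i
    square = [[(i + j) % n for j in range(n)] for i in range(n)]
    # Permuting rows, columns, and labels preserves the Latin property
    rng.shuffle(square)
    col_order = list(range(n))
    rng.shuffle(col_order)
    labels = list(treatments)
    rng.shuffle(labels)
    return [[labels[row[c]] for c in col_order] for row in square]

latin = random_latin_square(["A", "B", "C", "D"], seed=11)
for row in latin:
    print(" ".join(row))
```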

Blocking in space versus time

The designs above are not just for spatial arrangement of plots
Experimental “runs” or “batches” may also be considered blocks
Growth chamber studies may be blocked in time

Experiments repeated in space and/or time

Multi-environment trials often replicate a blocked design in multiple environments (locations, years, or both)
For experiments repeated in time, a new randomization may be done each time, or not, depending on the study system
Re-randomizing plot locations each year is ideal for annual crops

Treatment-by-block interaction

The blocked designs I’ve shown you allow you to account for differences between block means
But what if the relative differences between the treatments vary by environment too?
Most blocked experimental designs do not let you estimate treatment-by-block interaction very easily
Some statisticians recommend replicating one or more treatments within each block to enable estimating this interaction

Common experimental design mistakes

Heterogeneity inside blocks
Pseudoreplication
No randomization within blocks

Blocks should be spatially compact

Units within the same block should be closer to each other than they are to units in other blocks

Pseudoreplication

Pseudoreplication: Incorrectly treating observations as independent replicates when they are correlated
Confusion between experimental unit and observational unit
May lead to confounding where we can’t distinguish different influences on the outcome
This may be OK, but you must consider what population you can make inference about
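A small illustration of the difference between experimental and observational units, using made-up data in which a treatment is applied to whole plots but three plants are measured per plot. All values and names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (plot, treatment, plant height)
records = [
    ("plot1", "control", 4.1), ("plot1", "control", 4.3), ("plot1", "control", 3.9),
    ("plot2", "control", 5.0), ("plot2", "control", 5.2), ("plot2", "control", 4.8),
    ("plot3", "fertilized", 6.1), ("plot3", "fertilized", 5.9), ("plot3", "fertilized", 6.3),
    ("plot4", "fertilized", 5.4), ("plot4", "fertilized", 5.6), ("plot4", "fertilized", 5.2),
]

# Collapse observational units (plants) to one value per experimental unit (plot)
by_plot = defaultdict(list)
for plot, trt, value in records:
    by_plot[(plot, trt)].append(value)
plot_means = {key: mean(values) for key, values in by_plot.items()}

# Count "replicates" both ways
n_obs = defaultdict(int)    # plants per treatment
n_units = defaultdict(int)  # plots per treatment
for _, trt, _ in records:
    n_obs[trt] += 1
for _, trt in plot_means:
    n_units[trt] += 1

print(dict(n_obs))
print(dict(n_units))
```

Each treatment was applied to two plots, so there are 2 replicates per treatment here, not 6. Averaging to plot level is one simple option; a mixed model with a random plot effect is another.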

Faulty randomization

What’s wrong with this experimental design? (Set up as RCBD)

This design would be OK if the whole thing were replicated multiple times
Or try a spatial random effect, but variation in the x direction is always going to be confounded with treatment

Statistics for designed experiments

Statistical power

Power: The probability that you will be able to statistically detect the phenomenon you care about, if it is real
Power depends on the sample size, the design, and what effect you are trying to estimate
Usually we design an experiment to get a desired level of statistical power
We have to balance between false positives and false negatives — again, tradeoffs!
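To make the dependence on sample size concrete, here is a rough power calculation for the simplest case: comparing the means of two groups. It is a normal approximation that assumes equal group sizes and a known, equal standard deviation. The function name and the example numbers are assumptions; real designs with blocks or random effects usually need simulation or dedicated software.

```python
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)    # critical value (controls false positives)
    shift = abs(effect) / se             # signal-to-noise ratio
    # Probability of landing in either rejection region if the effect is real
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical effect size and SD; power rises as replication increases
for n in (5, 10, 20, 40):
    print(n, round(approx_power(effect=1.0, sd=1.5, n_per_group=n), 2))
```

Lowering `alpha` reduces false positives but also lowers power, which is the tradeoff described above.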

Statistical models for experimental designs

Statistical model should represent the data-generating process … including the experimental design
Mixed models include fixed and random effects, ideal for analyzing data from designed experiments
- Fixed effects: Effects whose levels are “fixed” from experiment to experiment
- Random effects: Effects that we model as being drawn at random from a statistical distribution, usually normal

Fixed or random?

Typically, treatments are treated as fixed and blocks as random
If someone else repeated your experiment, they would use the same treatments (those levels would be fixed)
But they would/could not use the same blocks (they’re randomly drawn from a population of possible blocks that could have been used)
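As an illustration, a common way to write the mixed model for an RCBD with fixed treatments and random blocks is shown below. The notation is an assumption made for this example, not taken from a specific reference: \(y_{ij}\) is the response for treatment \(i\) in block \(j\), \(\mu\) is the overall mean, \(\tau_i\) is the fixed effect of treatment \(i\), \(b_j\) is the random effect of block \(j\), and \(\varepsilon_{ij}\) is the residual error.

\[
y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij}, \qquad b_j \sim \mathrm{N}(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)
\]

The treatment effects \(\tau_i\) are estimated directly, while the blocks contribute a single variance parameter \(\sigma_b^2\).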

Fixed or random? (continued)

Whether an effect is treated as fixed or random depends on your question (for example, time in an experiment repeated over multiple years)
We can never reproduce the exact combination of environmental variables from a past year in a future year
The question of “fixed or random” is less important than making sure all relevant influences on your response variable are accounted for in your model
If unsure, consult a statistician!

Software

R

CRAN Task View on experimental design
AlgDesign
FielDHub
ibd
pwr4exp

SAS

PROC FACTEX
PROC PLAN
PROC OPTEX

Putting it all together

Caution: experiments aren’t everything

R. A. Fisher (again) took the idea that correlation does not imply causation too far
Exploited pro-experimental bias to discredit anyone trying to link tobacco use to disease
Randomized trials are the best way to conclusively establish causality, but that does not mean they are the only way or the best way to obtain knowledge
We now have statistical methods to robustly infer causality from observational data
Even if experiments establish that one thing causes another in controlled conditions, those are artificial and don’t “prove” that something will work in the real world

Recommendations for effective experimental design

Break design down into parts: treatment structure, design structure, and response structure
Be explicit about what question you’re trying to answer
Think about what balance you want to strike between realism and control
But don’t overdo it with interactions unless you are prepared to deal with the complexity
Be creative (think outside the RCBD box)!
Do power analysis before setting up your experiment, if you can
Know how your data will be structured, and how you will analyze your data, before you do the experiment
Consult your local statistician!

Thank you!

Resources

Classic and semi-classic works

The design of experiments, Mead
Statistical design, Casella

Online slide decks and tutorials

Penn State course notes STAT 503: Design of experiments
Plant Breeding & Genomics community has lots of presentations on experimental design
UNH Lecture notes on experimental design