Experimental design matters!

Food & Feed Safety RU Science Coffee Hour, May 28, 2025

Quentin D. Read, SEA Statistician

Key points

  • Experiments are imperfect models of the real world that help us understand cause and effect
  • The question you are trying to answer determines how you should set up your experiment — be aware of tradeoffs!
  • Designing an experiment right helps you maximize the ratio of signal to noise and get the best answer to the questions you care about

What is an experiment?

An experiment is a controlled procedure to compare hypotheses

  • Gather evidence to support or refute a hypothesis, or compare between competing ones
  • Give us information about causality: when x changes, what happens to y?
  • (Oversimplified) scientific method: make observations of nature, use observations + prior knowledge to make a hypothesis, test the hypothesis with an experiment

An experiment is a conceptual model of reality

  • “All models are wrong, but some are useful” - George Box
  • Experiments sacrifice realism to achieve greater control — Tradeoffs!

A super brief history of experiments

The first known experiments

Image (c) History of Islam

1000s: ibn al-Haytham experiments with lenses and mirrors to test the hypothesis that light goes into your eyes, not out of your eyes

Image (c) Franca Principe/IMSS Florence

1600s: Galileo experiments with steel balls to test hypotheses about how forces act on physical objects

The first controlled clinical trial

  • 1747: James Lind split 12 sailors into 6 groups of 2 men
  • Each group got a different treatment for scurvy: cider, sulfuric acid, seawater, etc.
  • The ones who ate oranges and lemons (Vitamin C) got better!

Experiments become more systematic

  • Late 1800s: the principles of experimental design we still use today were formalized
  • 1880s: Charles S. Peirce’s randomized controlled trial in psychology, testing sensitivity to pressure with a repeated measures design
  • 1920s: R. A. Fisher’s work on experimental design in agricultural science
    • That’s why people across all fields use words like “plots” and “blocks”

Peirce & Jastrow 1885

Experiments enter the modern era

  • 1948: First modern randomized controlled trial in medicine, testing effectiveness of an antibiotic at treating tuberculosis
    • Before this, anecdotal studies (which are still useful and important today) were all there was
  • 1950s: Japanese industry pioneers experimentation for statistical quality control of products
  • 2025+: What is the role of small-scale controlled experiments in the world of big data and artificial intelligence?

Image (c) Mazda

Basic concepts in experimental design

Terms to know

  • Treatment
  • Factor
  • Control
  • Experimental unit
  • Randomization
  • Replication
  • Local control
  • Optimal design

What is a treatment?

  • Treatment: Something that we manipulate or impose on subjects in an experiment
  • Factor: A variable whose levels are set by the experimenter; different treatments are different levels of a factor
    • Example: different doses of a medication, different management methods for an agricultural field

What is a control?

  • Control: Observations that show the effect of no treatment or a mock/placebo treatment; used to establish a baseline
  • What constitutes a control depends on what exactly you are controlling for!
  • Ideally, all variables we manipulate in an experiment have a non-manipulated counterpart in the control group(s)
  • Positive and negative controls
    • Negative controls provide a baseline to compare the treatment effect to
    • Positive controls ensure the procedure is giving expected results using standard/usual methods
  • In some designs, pre-treatment measurements serve as controls

What is an experimental unit?

  • The thing or object to which treatments are independently randomly assigned
    • Randomly assign a genotype to be planted on a plot of land
    • Randomly assign a vaccination treatment to be given to an individual plant or animal
    • Randomly assign a management treatment to an entire experimental watershed
  • Not necessarily the same as the observational unit we make measurements on

Experimental unit: population of E. coli cells, 100 mL

Image (c) Brian Baer & Neerja Hajela

Experimental unit: drainage basin, ~1 km²

Ford et al. 2021, Hydrological Processes

Components of experimental design

  • Treatment structure (What the treatments are and how they relate to each other)
  • Design structure (How treatments are assigned to experimental units, how units are structured in space and time)
  • Response structure (What responses will you measure, and when/where/how in relation to the experimental units?)
  • Observational studies also have structure even though they’re not randomized, so these concepts also apply

Treatment structure

Image (c) Robert Junker
  • Examples: one-way, factorial, regression, response surface
  • Factorial design: multiple factors with different levels, experiment includes combinations of levels across factors
    • Common garden experiment to estimate G × E interaction
  • More interactions = more realistic, but you need more samples and it’s harder to interpret — Tradeoffs!
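
To make the factorial idea concrete, here is a minimal sketch that enumerates the treatment combinations of a full factorial design. The factor names and levels are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration
factors = {
    "genotype": ["G1", "G2", "G3"],
    "environment": ["dry", "wet"],
}

# Full factorial: every combination of one level from each factor
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)

# The number of combinations multiplies with every factor you add
n_combinations = len(treatments)
```

Adding a third factor with four levels would raise the count from 6 to 24 combinations, which is one way to see the sample-size cost of extra interactions.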

Principles of experimental design

  • Randomization
    • Which units get which treatments should be random, or your estimates of treatment effects may be biased (often overestimated)
  • Replication
    • We need many units per treatment so that we can estimate variation within and between treatments
  • Local control
    • Conditions should be as consistent as possible across treatment groups
    • Minimize the influence of extraneous variables with good protocols
    • Use statistical models to account for whatever you can’t eliminate

Optimal design

  • Optimal design: the “best” possible experimental design with respect to some criterion from a statistical model
  • Find the design, given your resources, that maximizes your statistical power to answer a certain question
  • You may also see the term “efficient”
  • Maximize ratio of signal to noise
  • We want estimates without bias and with minimal variance
  • Which design is optimal? Depends on the specific statistical (and biological) hypothesis you’re testing

A tour of experimental designs

Completely randomized design (CRD)

  • No grouping of experimental units; each one is independently randomized
  • No local control: error term includes unit-to-unit variation and also variation due to the environment
  • Good design for highly controlled and homogeneous lab/greenhouse environments, not the most efficient for field environments
  • “Workaround”: control for environmental variation through spatial random effects
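
A completely randomized layout can be sketched in a few lines. The function name `randomize_crd` and the treatment labels are assumptions made for this example, not part of any package.

```python
import random

def randomize_crd(treatments, n_reps, seed=None):
    """Assign each treatment to n_reps units, in random order over all units."""
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    assignments = [t for t in treatments for _ in range(n_reps)]
    rng.shuffle(assignments)   # one shuffle across every unit, no grouping
    # Unit i (1-based) receives assignments[i - 1]
    return {unit: trt for unit, trt in enumerate(assignments, start=1)}

layout = randomize_crd(["A", "B", "C"], n_reps=4, seed=1)
for unit, trt in layout.items():
    print(unit, trt)
```

Recording the seed along with the layout is a simple way to document how the randomization was done.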

What about blocks?

  • Block: a group of experimental units that are somehow related to each other
  • Units within the same block may be close to each other in space and/or time
  • Used to account for other influences in the environment, besides whatever we’re experimentally manipulating
  • A blocked design can have more statistical power than a CRD
  • Examples
    • Field divided up into groups each with 6 plots (each group of plots is a block)
    • Study done at multiple schools where multiple classrooms within a school each get a different intervention (each school is a block)
    • Study done at multiple hospitals where some patients at each hospital get a drug and some get a placebo (each hospital is a block)
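
The claim that blocking can increase power can be illustrated with a small simulation. This is a sketch under stated assumptions: two treatments, six blocks of two units, normally distributed block and noise effects, and made-up parameter values. The estimate in both designs is a simple difference of treatment means.

```python
import random
from statistics import mean, pvariance

def simulate_difference(design, rng, n_blocks=6, block_sd=2.0,
                        noise_sd=1.0, effect=1.0):
    """Simulate one experiment with treatments A and B; return mean(B) - mean(A)."""
    # Each block has its own environmental offset shared by its two units
    block_effects = [rng.gauss(0, block_sd) for _ in range(n_blocks)]
    if design == "rcbd":
        # Blocked: every block gets exactly one A and one B
        # (order within a block does not affect the estimate, so it is not shuffled)
        labels = [("A", "B")] * n_blocks
    else:
        # CRD: shuffle labels over all units, ignoring block membership
        flat = ["A", "B"] * n_blocks
        rng.shuffle(flat)
        labels = [(flat[2 * b], flat[2 * b + 1]) for b in range(n_blocks)]
    values = {"A": [], "B": []}
    for b, pair in enumerate(labels):
        for trt in pair:
            trt_effect = effect if trt == "B" else 0.0
            values[trt].append(block_effects[b] + trt_effect + rng.gauss(0, noise_sd))
    return mean(values["B"]) - mean(values["A"])

rng = random.Random(42)
n_sims = 2000
var_crd = pvariance([simulate_difference("crd", rng) for _ in range(n_sims)])
var_rcbd = pvariance([simulate_difference("rcbd", rng) for _ in range(n_sims)])
print(f"variance of estimate, CRD:  {var_crd:.2f}")
print(f"variance of estimate, RCBD: {var_rcbd:.2f}")
```

In this setup the block offsets cancel out of the blocked estimate, so its variance is smaller. When blocks do not differ much, the advantage shrinks.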

Randomized complete block design (RCBD)

Each block is a complete replication: it contains every treatment combination once
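
A minimal sketch of RCBD randomization follows. The helper name `randomize_rcbd` is hypothetical; the key point is that the treatment order is shuffled separately inside each block.

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return {block: randomized treatment order}; each block holds every treatment once."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # independent randomization within each block
        layout[block] = order
    return layout

rcbd = randomize_rcbd(["A", "B", "C", "D"], n_blocks=3, seed=7)
for block, order in rcbd.items():
    print(f"Block {block}: {order}")
```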

Split-plot design

  • Two (or more) levels of randomization
  • One treatment is randomized at main plot level and another at subplot level
  • Still a complete block design: all treatment combinations appear in each block
  • May be used if it is logistically difficult to completely randomize
  • Compared to RCBD, increased power to detect subplot treatment effect and interaction, but less power to detect main plot treatment effect
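
The two levels of randomization in a split-plot design can be sketched as nested shuffles. The function name and the example treatments (irrigation as the main plot factor, variety as the subplot factor) are assumptions for illustration.

```python
import random

def randomize_split_plot(main_trts, sub_trts, n_blocks, seed=None):
    """Return {block: [(main treatment, [subplot treatment order]), ...]}."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        main_order = list(main_trts)
        rng.shuffle(main_order)      # level 1: main plots within the block
        plots = []
        for main in main_order:
            sub_order = list(sub_trts)
            rng.shuffle(sub_order)   # level 2: subplots within each main plot
            plots.append((main, sub_order))
        layout[block] = plots
    return layout

split = randomize_split_plot(["irrigated", "rainfed"], ["V1", "V2", "V3"],
                             n_blocks=2, seed=3)
for block, plots in split.items():
    print(f"Block {block}: {plots}")
```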

Trouble in RCBD paradise

  • ARS scientists love randomized complete block designs, but that’s not all there is
  • Other designs may be more efficient and give you more power for the same number of experimental units

Incomplete block design

  • Block design where only a subset of treatments are found in each block
  • There’s no law that says all treatments have to be represented in every block
  • Size of the maximally large environmentally homogeneous area \(\neq\) size of the area that fits one plot of each treatment
  • If you have 100s of genotypes, complete blocks may be too big to be homogeneous
  • Also, incomplete designs may be more efficient/powerful than complete-block designs!

Balanced incomplete block design

  • Each treatment appears equally often in the same block with every other treatment
  • If perfectly balanced is not possible, there are algorithms that can get you close
Block 1: treatments 2, 4, 6
Block 2: treatments 5, 6, 7
Block 3: treatments 3, 4, 5
Block 4: treatments 1, 4, 7
Block 5: treatments 2, 3, 7
Block 6: treatments 1, 3, 6
Block 7: treatments 1, 2, 5
  • Each treatment in a block with each other one exactly once
  • Each treatment replicated 3 times
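
The balance of the seven-block layout listed above can be checked directly by counting how often each treatment, and each pair of treatments, appears. This is a small verification sketch, not a design-generation algorithm.

```python
from collections import Counter
from itertools import combinations

# The seven blocks from the example (treatment numbers per block)
blocks = [
    (2, 4, 6), (5, 6, 7), (3, 4, 5), (1, 4, 7),
    (2, 3, 7), (1, 3, 6), (1, 2, 5),
]

# How many times each treatment is replicated
replication = Counter(t for block in blocks for t in block)

# How many times each pair of treatments shares a block
pair_counts = Counter(
    pair for block in blocks for pair in combinations(sorted(block), 2)
)

treatments = sorted(replication)
all_pairs = list(combinations(treatments, 2))

# Balanced: equal replication, and every possible pair co-occurs equally often
is_balanced = (
    len(set(replication.values())) == 1
    and len({pair_counts[p] for p in all_pairs}) == 1
)
print(dict(replication))
print(is_balanced)
```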

Disconnected incomplete block design

  • Each block is assigned one of several sets of treatments
  • Usually worse than a balanced incomplete design: harder to estimate the difference between treatments that don’t ever appear in the same block together
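Block 1: treatments 1, 3, 2
Block 2: treatments 5, 6, 4
Block 3: treatments 3, 2, 1
Block 4: treatments 5, 6, 4
Block 5: treatments 3, 1, 2
Block 6: treatments 5, 4, 6
Block 7: treatments 3, 1, 2
Block 8: treatments 6, 5, 4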
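
One way to see whether a block design is disconnected is to link treatments that share a block and look at the resulting groups. The sketch below uses a small union-find helper (the name `connected_groups` is hypothetical) on the eight blocks listed above.

```python
def connected_groups(blocks):
    """Group treatments that are linked, directly or indirectly, through shared blocks."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for block in blocks:
        root = find(block[0])
        for t in block[1:]:
            parent[find(t)] = root  # merge each treatment's group into the first one's

    groups = {}
    for t in list(parent):
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())

disconnected = [
    (1, 3, 2), (5, 6, 4), (3, 2, 1), (5, 6, 4),
    (3, 1, 2), (5, 4, 6), (3, 1, 2), (6, 5, 4),
]
groups = connected_groups(disconnected)
print(groups)
```

More than one group means some treatment comparisons can only be made across blocks, which is why those differences are harder to estimate.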

Augmented design

  • Controls or checks replicated in a standard experimental design (often RCBD)
  • Most treatments unreplicated or have fewer replicates
  • Error from checks used to evaluate the unreplicated treatments
  • Useful for breeding trials where many genotypes are being evaluated

Other designs

May be chosen because they are optimal to detect certain effects, or because of logistical constraints

  • Strip-plot
    • Two treatments applied crosswise to each other
  • Row-column designs
    • Blocking in row and column direction
    • Special case: Latin square (an underrated and efficient design)
  • \(\alpha\)-lattice (type of incomplete block design)
    • Blocks are grouped so that each group has one complete set of treatments
    • As balanced as possible
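
For the Latin square special case, a minimal sketch is to start from a cyclic square and shuffle its rows, columns, and treatment labels. The function name is an assumption, and this approach does not sample uniformly from all possible Latin squares.

```python
import random

def random_latin_square(treatments, seed=None):
    """Return an n x n grid where each treatment appears once per row and column."""
    rng = random.Random(seed)
    n = len(treatments)
    labels = list(treatments)
    rng.shuffle(labels)   # random assignment of treatments to symbols
    rows = list(range(n))
    cols = list(range(n))
    rng.shuffle(rows)     # permute rows of the cyclic square
    rng.shuffle(cols)     # permute columns of the cyclic square
    # Cell value in the cyclic square is (row + column) mod n
    return [[labels[(r + c) % n] for c in cols] for r in rows]

square = random_latin_square(["A", "B", "C", "D"], seed=11)
for row in square:
    print(" ".join(row))
```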

Blocking in space versus time

Image (c) Conviron
  • The above designs are not just for spatial arrangement of plots
  • Experimental “runs” or “batches” may also be considered blocks
  • Growth chamber studies may be blocked in time

Experiments repeated in space and/or time

  • Multi-environment trials often replicate a blocked design in multiple environments (locations, years, or both)
  • For experiments repeated in time, a new randomization may be done each time, or not, depending on the study system
  • Re-randomizing plot locations each year is ideal for annual crops

Treatment-by-block interaction

  • The blocked designs I’ve shown you allow you to account for differences between block means
  • But what if the relative differences between the treatments vary by environment too?
  • Most blocked experimental designs do not let you estimate treatment-by-block interaction very easily
  • Some statisticians recommend replicating one or more treatments within each block to enable estimating this interaction

Common experimental design mistakes

  • Heterogeneity inside blocks
  • Pseudoreplication
  • No randomization within blocks

Blocks should be spatially compact

Units within the same block should be closer to each other than they are to units in other blocks

Pseudoreplication

  • Pseudoreplication: Incorrectly treating observations as independent replicates when they are correlated
  • Confusion between experimental unit and observational unit
  • May lead to confounding where we can’t distinguish different influences on the outcome
  • This may be OK, but you must consider what population you can make inference about
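
The difference between experimental and observational units can be shown with a toy data set. In this sketch (all values and names are made up), three measurements are taken per pot, but the pot is the experimental unit. Averaging to one value per pot is one simple way to avoid counting subsamples as replicates; a random effect for pot is another.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (treatment, experimental unit, measurement)
records = [
    ("control", "pot1", 4.1), ("control", "pot1", 4.3), ("control", "pot1", 3.9),
    ("control", "pot2", 5.0), ("control", "pot2", 5.2), ("control", "pot2", 4.8),
    ("treated", "pot3", 6.1), ("treated", "pot3", 6.0), ("treated", "pot3", 6.2),
    ("treated", "pot4", 7.1), ("treated", "pot4", 6.9), ("treated", "pot4", 7.0),
]

# Collect the subsamples belonging to each experimental unit
by_unit = defaultdict(list)
for trt, unit, y in records:
    by_unit[(trt, unit)].append(y)

# One value per experimental unit
unit_means = {key: mean(vals) for key, vals in by_unit.items()}

# True replication is the number of units per treatment, not the number of rows
n_per_treatment = Counter(trt for trt, _ in unit_means)
print(len(records), "measurements, but replication is", dict(n_per_treatment))
```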

Faulty randomization

What’s wrong with this experimental design? (Set up as RCBD)

  • This design would be OK if the whole thing were replicated multiple times
  • Or try a spatial random effect, but variation in the x direction is always going to be confounded with treatment

Statistics for designed experiments

Statistical power

  • Power: The probability that you will be able to statistically detect the phenomenon you care about, if it is real
  • Power depends on the sample size, the design, and what effect you are trying to estimate
  • Usually we design an experiment to get a desired level of statistical power
  • We have to balance between false positives and false negatives — again, tradeoffs!
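
As a rough illustration of how power depends on sample size and effect size, here is a normal-approximation sketch for comparing two group means with a two-sided test. It assumes a known common standard deviation and equal group sizes; the function name and the example numbers are made up, and a real power analysis should reflect the actual design and model.

```python
from statistics import NormalDist

def power_two_groups(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in two means."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)    # critical value controls false positives
    shift = effect / se                  # true effect in standard-error units
    # Probability the test statistic lands in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (4, 8, 16, 32):
    print(n, round(power_two_groups(effect=1.0, sd=1.0, n_per_group=n), 3))
```

Lowering `alpha` reduces false positives but also lowers power for the same sample size, which is the tradeoff mentioned above.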

Statistical models for experimental designs

  • Statistical model should represent the data-generating process … including the experimental design
  • Mixed models include fixed and random effects, ideal for analyzing data from designed experiments
    • Fixed effects: Effects whose levels are “fixed” from experiment to experiment
    • Random effects: Effects that we model as being drawn at random from a statistical distribution, usually normal

Fixed or random?

  • Typically, treatments are treated as fixed and blocks as random
  • If someone else repeated your experiment, they would use the same treatments (those levels would be fixed)
  • But they would/could not use the same blocks (they’re randomly drawn from a population of possible blocks that could have been used)
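
As one common way to write this down (the notation is introduced here for illustration and is not taken from the slides), a mixed model for an RCBD with fixed treatment effects and random block effects is:

\[
y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij}, \qquad b_j \sim N(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\]

Here \(y_{ij}\) is the response for treatment \(i\) in block \(j\), \(\mu\) is the overall mean, \(\tau_i\) is the fixed effect of treatment \(i\), \(b_j\) is the random effect of block \(j\), and \(\varepsilon_{ij}\) is the residual error.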

Fixed or random? (continued)

  • Whether an effect is treated as fixed or random depends on your question (for example, time in an experiment repeated over multiple years)
  • We can never reproduce the exact combination of environmental variables from a past year in a future year
  • The question of “fixed or random” is less important than making sure all relevant influences on your response variable are accounted for in your model
  • If unsure, consult a statistician!

Software

R

SAS

  • PROC FACTEX
  • PROC PLAN
  • PROC OPTEX

Putting it all together

Caution: experiments aren’t everything

National Enquirer, 1968
  • R. A. Fisher (again) took the idea that correlation does not imply causation too far
  • Exploited pro-experimental bias to discredit anyone trying to link tobacco use to disease
  • Randomized trials are the best way to conclusively establish causality, but that does not mean they are the only way or the best way to obtain knowledge
  • We now have statistical methods to robustly infer causality from observational data
  • Even if experiments establish that one thing causes another in controlled conditions, those conditions are artificial and don’t “prove” that something will work in the real world

Recommendations for effective experimental design

  • Break design down into parts: treatment structure, design structure, and response structure
  • Be explicit about what question you’re trying to answer
  • Think about what balance you want to strike between realism and control
  • But don’t overdo it with interactions unless you are prepared to deal with the complexity
  • Be creative (think outside the RCBD box)!
  • Do power analysis before setting up your experiment, if you can
  • Know how your data will be structured, and how you will analyze your data, before you do the experiment
  • Consult your local statistician!

Thank you!

Resources

Classic and semi-classic works

  • The design of experiments, Mead
  • Statistical design, Casella

Online slide decks and tutorials