When done properly, power analysis:
If that doesn’t motivate you to learn about power analysis, I don’t know what will!
You will get the most out of this course if you …
... with code)
POWER: If an effect exists, the chance that our study will find it
Tradition!
t-test with \(n = 50\) per group and a medium effect size
Some are known and some need to be estimated:
| Effect | Effect size metric | Description | Example |
|---|---|---|---|
| Difference of the mean of two groups | Cohen’s d | The difference between the two means divided by their combined standard deviation. How many standard deviations greater is one mean than the other? | The effect of a management intervention on biomass yield |
| Difference in a binary outcome between two groups | Odds ratio | The ratio of the odds of the outcome happening in one group, to the odds of it happening in another group | The effect of a seed treatment on seed germination rate |
| Strength of relationship between two continuous variables | Pearson correlation coefficient r | You should know this one! | The effect of temperature on soil respiration |
| Variance explained by treatment group when comparing three or more groups | Cohen’s f | (Square root of) the ratio of variance explained by group, to the residual variance not explained by group | The effect of multiple genotypes and environments on a trait |
R packages:
SAS procedures:
proc powerproc glmpowerproc glimmixbesag.beans dataset from agridat packageanova() function calculates sums of squares for one-way ANOVA of yield by genotypecohens_f() from effectsize package
Formula for Cohen’s d effect size:
\[d = \frac{\bar{x}_1 - \bar{x}_2}{s}\]
Formula for \(s\), the pooled standard deviation:
\[s = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2} {n_1 + n_2 - 2}}\]
Examine the first few rows of the data
abs() of effect sizes because we are interested in detecting either direction of effect with a two-sided testBt_d_quantiles <- quantile(abs(Bt_Cohen_d), probs = c(.25, .5, .75))
ggplot(data.frame(d = abs(Bt_Cohen_d)), aes(x = d)) +
geom_histogram(bins = 10) +
geom_vline(xintercept = Bt_d_quantiles[1], color = 'forestgreen', linewidth = 1) +
geom_vline(xintercept = Bt_d_quantiles[2], color = 'darkorange', linewidth = 2) +
geom_vline(xintercept = Bt_d_quantiles[3], color = 'forestgreen', linewidth = 1) +
theme_bw()
round(Bt_d_quantiles, 3)| d | f | Effect size |
|---|---|---|
| 0.20 | 0.10 | Small |
| 0.50 | 0.25 | Moderate |
| 0.80 | 0.40 | Large |
help(package = 'WebPower')wp.anova() to calculate power for a one-way ANOVA designwp.anova()In all cases assume k = 4 groups being compared
k = 4: number of groupsf = 0.25: moderate Cohen’s f effect sizealpha = 0.05: typical significance levelpower = 0.8: typical power to targettype = 'overall': power for the overall F-statistic in the ANOVA
type arguments are possible, for example the power to detect a difference in two specific group meansf this time, but specify n = 60f = 0.4n = 60 samplespowernswp.anova() passing the vector of many n and f values to the function$power column from the output of wp.anova() and add it to our original data frameggplot(n_f_grid, aes(x = n, y = power)) +
geom_line(aes(group = f, color = factor(f)), linewidth = 0.8) +
geom_point(aes(fill = factor(f)), size = 3, shape = 21) +
geom_hline(yintercept = 0.8, linetype = 'dashed', linewidth = 1) +
scale_color_brewer(name = "Cohen's f", palette = 'Reds', aesthetics = c('color', 'fill')) +
scale_x_continuous(breaks = ns) +
theme_bw()anova_MDES <- expand.grid(n = ns)
anova_MDES$MDES <- map_dbl(anova_MDES$n, ~ wp.anova(k = 4, n = ., alpha = 0.05, power = 0.8, type = 'overall')$f)
ggplot(anova_MDES, aes(x = n, y = MDES)) +
geom_line(linewidth = 0.8) + geom_point(size = 3) +
geom_hline(yintercept = 0.25, linetype = 'dashed', linewidth = 1) +
scale_x_continuous(breaks = ns) +
theme_bw()rnorm()t.test() with all default options to compare means of the two random samplesfor loopp_vals, a vector of zerosfor loop executes the same code 10000 times, increasing the value of i by 1 each time
rnorm()ith element of p_valsp < 0.05 = power!wp.t()linder.wheat from agridat package: wheat yield trial in Switzerland
gen, environment env, G by E interaction gen:env(1 | block:env)joint_tests(): ANOVA table with F-testsVarCorr(): variances for block and for individual plots (residuals)lmer()extend()extend() creates a new model object to simulate from
along = 'block' means add new levels to block (n = 8)powerCurve()along = 'block' to simulate datasets with different numbers of blocks, up to the extended value 8test specifies we’re testing the fixed effect of genotype using the F-testn = 10 means simulate 10 datasets for each number of blockscbpp dataset from lme4 package
Zebu, photo by Scott Bauer, USDA-ARS
disease = 0 if uninfected and disease = 1 if infected)extend() in two different waysalong = 'herd'within = 'period+herd'breaks, specific numbers of cattle to test, otherwise default is usedcbpp_pcurve_n_herds <- powerCurve(cbpp_glmm_n_herds,
along = 'herd',
test = fixed('period', method = 'chisq'),
nsim = 10,
seed = 4)
cbpp_pcurve_n_within_herd <- powerCurve(cbpp_glmm_n_within_herd,
within = 'period+herd',
test = fixed('period', method = 'chisq'),
nsim = 10,
seed = 5,
breaks = c(3, 5, 10, 15, 25, 50))No power = no new knowledge = no science
quentin.read@usda.gov!