+++ UPDATE 2018-10-07: Please also try the new ShinyApp! It has never been so easy to calculate statistical power in meta-analyses... +++

You want to calculate the statistical power of a meta-analysis in order to better interpret its results? Or maybe you are wondering whether further research is still needed to underpin the results of an existing quantitative review? In this article I explain how you can calculate the statistical power of fixed- and random-effects model meta-analyses and what practical benefits this has.

Along the way we'll see that many (and possibly most!) conclusions drawn from biomedical research are probably false, because the majority of studies are systematically underpowered and their significant results often do not reflect true effects!

**Fig 1.** Meta-analysis: a statistical method developed and used to summarize and contrast a body of research that addresses a specific research question. It has the potential to reduce complexity in a dedicated research field, but it does not always increase statistical power to a sufficient level.

### What exactly is statistical power?

First, a short primer on statistical power so that we are all on the same page: power is the probability that a study detects an effect that is really there. Suppose that a particular intervention has an effect, i.e. the alternative hypothesis is true; statistical power then describes the proportion of experiments that lead to statistically significant results. In other words: if studies in a given field are designed with a statistical power of 50 % and there are 100 genuine non-null effects to be discovered in that field, these studies are expected to discover only half of them!
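If you want to see this definition in action, here is a small R sketch (my own illustration, not part of the calculator) that simulates many two-group experiments in which a true effect exists and counts how often a two-sided t-test is significant. The effect size (d = 0.5), the group size of 20 and the number of simulations are arbitrary assumptions; the simulated share of significant results should come out close to the analytic power from base R's `power.t.test()`.

```r
# Monte Carlo sketch: power = share of experiments with a significant result
# Illustrative assumptions: true effect d = 0.5, 20 participants per group,
# two-sided two-sample t-test at alpha = .05
set.seed(2018)
n_per_group <- 20
true_d      <- 0.5
n_sims      <- 5000

significant <- replicate(n_sims, {
  treatment <- rnorm(n_per_group, mean = true_d, sd = 1)
  control   <- rnorm(n_per_group, mean = 0,      sd = 1)
  t.test(treatment, control, var.equal = TRUE)$p.value < 0.05
})

simulated_power <- mean(significant)
analytic_power  <- power.t.test(n = n_per_group, delta = true_d, sd = 1,
                                sig.level = 0.05)$power
c(simulated = simulated_power, analytic = analytic_power)
```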

Consequently, a low-powered study design always has a lower chance of detecting a true effect. At the same time, and this is far less well known, low statistical power also reduces the probability that a statistically significant result reflects a true effect! This probability is called the "positive predictive value" (PPV) and is given by the following formula:

$$\mathrm{PPV} = \frac{(1-\beta)\,R}{(1-\beta)\,R + \alpha}$$

**Eq 1.** Calculation of the PPV given the power (1 − β) with type II error rate β, the type I error rate α, and the pre-study odds R.
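Eq 1 is easy to play with in R. The function `ppv()` below is a hypothetical helper written for this illustration, and the pre-study odds of R = 0.25 (one true effect for every four null effects tested) are an assumed value:

```r
# Positive predictive value (Eq 1): share of significant results that are true
# power = 1 - beta, R = pre-study odds, alpha = type I error rate
ppv <- function(power, R, alpha = 0.05) {
  (power * R) / (power * R + alpha)
}

# Illustrative pre-study odds of R = 0.25 (1 true effect per 4 null effects)
ppv(power = 0.8, R = 0.25)  # 0.8
ppv(power = 0.2, R = 0.25)  # 0.5
```

Under these assumptions, dropping power from 80 % to 20 % lowers the PPV from 0.8 to 0.5, so half of the significant findings would be false positives.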

For studies with given pre-study odds R and a given type I error rate, for example the widespread α = .05 threshold, lower statistical power inevitably leads to a lower PPV! So we should always keep in mind that high post-hoc power strengthens confidence in a significant test result, whereas low post-hoc power should make us more cautious in interpreting it.

### Power failure: small sample sizes undermine the reliability of neuroscience!

Looking at neuroscience with these insights in mind, we must accept that the majority of available studies in our research field are systematically underpowered.

Empirical estimates put the median statistical power of studies somewhere between ∼10 % and ∼30 %! Consequences of such low power include overestimated effect sizes and low reproducibility of results. This also has an ethical dimension, because unreliable research is inefficient and wasteful!
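The effect-size inflation can be illustrated with a small simulation. The following R sketch assumes a true effect of d = 0.3 and 20 participants per group (both illustrative choices that give a clearly underpowered design) and compares the true effect with the average estimate among the studies that happened to reach significance in either direction:

```r
# Sketch: significant results from underpowered studies overestimate the effect
# Illustrative assumptions: true d = 0.3, 20 participants per group
set.seed(42)
true_d      <- 0.3
n_per_group <- 20
n_sims      <- 5000

results <- t(replicate(n_sims, {
  treatment <- rnorm(n_per_group, mean = true_d, sd = 1)
  control   <- rnorm(n_per_group, mean = 0,      sd = 1)
  pooled_sd <- sqrt((var(treatment) + var(control)) / 2)  # equal group sizes
  c(d   = (mean(treatment) - mean(control)) / pooled_sd,   # observed Cohen's d
    sig = t.test(treatment, control, var.equal = TRUE)$p.value < 0.05)
}))

power_estimate     <- mean(results[, "sig"])
mean_d_significant <- mean(results[results[, "sig"] == 1, "d"])
c(power = power_estimate, mean_d_if_significant = mean_d_significant)
```

Because only large observed effects pass the significance threshold in such small studies, the average significant estimate lands well above the true d = 0.3.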

### Meta-analyses: a panacea to compensate for methodological weaknesses of primary research?

Meta-analysis is a statistical method developed and used to summarize and contrast a body of research that addresses a specific research question. The basic idea is that, by pooling a larger total number of participants, a true effect can be estimated more reliably.
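To see why pooling helps, here is a minimal sketch of fixed-effect (inverse-variance) pooling in R. The five studies are made-up numbers, and the variance formula for a standardized mean difference is the same one used as `eq1` in the R script further down:

```r
# Sketch of fixed-effect (inverse-variance) pooling
# Sampling variance of a standardized mean difference, as in the R script:
# (nt + nc)/(nt * nc) + d^2 / (2 * (nt + nc))
smd_variance <- function(d, nt, nc) {
  (nt + nc) / (nt * nc) + d^2 / (2 * (nt + nc))
}

# Hypothetical effect sizes and group sizes of five small studies
d  <- c(0.10, 0.45, 0.30, -0.05, 0.25)
nt <- c(12, 15, 10, 20, 18)
nc <- c(12, 14, 11, 20, 17)

v <- smd_variance(d, nt, nc)
w <- 1 / v                         # inverse-variance weights
pooled_d  <- sum(w * d) / sum(w)   # weighted mean effect
pooled_se <- sqrt(1 / sum(w))      # standard error of the pooled effect

round(c(pooled_d = pooled_d, pooled_se = pooled_se,
        smallest_single_se = min(sqrt(v))), 3)
```

The pooled standard error is smaller than the standard error of any single study, which is why a true effect can be estimated more reliably from the pooled sample.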

Power analyses therefore play an important role in this context. Their results can provide valuable information about the strength of a meta-analysis, for example by helping to determine the number of studies needed to underpin the present findings.

We should use such power calculations far more often to evaluate the robustness of effects in meta-analyses. They can also be useful when we consider updating an existing quantitative review, or when we want to know whether further research may deepen our knowledge of a specific research question. In the end, power analyses of meta-analyses should inform future primary research!

In practice, however, statistical power is usually not considered when planning and interpreting meta-analyses. A key reason could be that there are no user-friendly ways to calculate the statistical power of meta-analyses. Standard statistics software such as SPSS does not provide such features, and tools like G*Power or power packages such as the “pwr” R package are designed exclusively for power analyses of primary research.

### The Meta-Power Calculator: a freeware Excel-based tool to calculate the statistical power of meta-analyses

For this reason, I wrote my own Excel-based power calculator some time ago. It calculates the statistical power of both fixed- and random-effects model meta-analyses based on a standardized effect size.

**Fig 3.** Meta-Power Calculator screenshot.

You can download the calculator here for free! All calculations are based on the formulas available in this paper!

The analysis requires only three pieces of information:

- the standardized overall effect size of the meta-analysis (Cohen's d or Hedges' g),
- the total number of studies included in the meta-analysis (k),
- and the average number of participants per group.

The current version does not offer the option to change the significance level or the test direction. Following widely used conventions, it always assumes a two-sided test at a significance level of α = .05. For more detailed analyses, I recommend the R scripts, which are also available on the project page. They can easily be adapted to specific needs:

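```r
# Power analysis for a random-effects model meta-analysis

es <- 0.15  # Overall effect size (Cohen's d or Hedges' g)
nt <- 10    # Number of participants in the treatment group
nc <- 10    # Number of participants in the control group
k  <- 40    # Number of included studies
hg <- c(0, .33, 1, 3)  # Heterogeneity (0 no, .33 small, 1 moderate, 3 high)

eq1 <- ((nt + nc) / (nt * nc)) + ((es^2) / (2 * (nt + nc)))  # within-study variance of the effect size
eq2 <- hg * eq1        # between-study variance, as a multiple of eq1
eq3 <- eq2 + eq1       # total variance per study
eq4 <- eq3 / k         # variance of the pooled effect
eq5 <- es / sqrt(eq4)  # expected z-value of the pooled effect

# When alpha = .05: use 1.64 for a one-tailed and 1.96 for a two-tailed test.
# Only the upper tail is counted; the opposite-tail term pnorm(-1.96 - eq5)
# of the exact two-sided power is small (below .025) for positive effects.
power <- 1 - pnorm(1.96 - eq5)

power
```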

**Eq 3.** R-script example for a power calculation of a random-effects model meta-analysis.

After you enter all required information, the Excel sheet automatically calculates the power values and draws X-Y plots of power curves for several numbers of included studies. In total, four power values are reported; they reflect the statistical power at four different levels of heterogeneity among the included studies.
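Power curves like these can be approximated in R by wrapping the random-effects formula from the script above in a function. This is my own sketch, not the code behind the Excel sheet: `meta_power()` is a hypothetical helper, it assumes equal group sizes `n`, and it derives the critical value from `alpha` with `qnorm()` instead of hard-coding 1.96:

```r
# Sketch: the script's random-effects power formula as a reusable function
meta_power <- function(es, n, k, hg, alpha = 0.05) {
  v      <- (n + n) / (n * n) + es^2 / (2 * (n + n))  # within-study variance (equal groups)
  v_tot  <- v * (1 + hg)                              # add between-study variance hg * v
  lambda <- es / sqrt(v_tot / k)                      # expected z of the pooled effect
  crit   <- qnorm(1 - alpha / 2)                      # about 1.96 for two-sided alpha = .05
  1 - pnorm(crit - lambda)                            # upper tail, as in the original script
}

k_values <- c(5, 10, 20, 40, 80)
hg <- c(none = 0, small = 0.33, moderate = 1, high = 3)

# One column per heterogeneity level, one row per number of included studies
curves <- sapply(hg, function(h) meta_power(es = 0.15, n = 10, k = k_values, hg = h))
rownames(curves) <- paste0("k=", k_values)
round(curves, 2)
```

Reading down a column traces one power curve; comparing columns shows how heterogeneity pulls the curve down.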

The higher the heterogeneity between the included studies, the lower the power, and the more studies are needed to reliably detect an effect. Many meta-analyses report the percentage of between-study heterogeneity as the I² statistic. It is a fairly intuitive measure of inconsistency: I² values of 0, 25, 50 and 75 % reflect zero, low, moderate and high heterogeneity. This may help you decide which of the four power values is most relevant when interpreting the results.
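The four heterogeneity values in the R script (0, .33, 1 and 3) line up with these I² benchmarks. Assuming I² = τ²/(τ² + v), where τ² is the between-study variance and v the typical within-study variance, the script's ratio is hg = τ²/v = I²/(1 − I²). A one-line helper (hypothetical, for illustration) does the conversion:

```r
# Sketch: translate a reported I^2 into the heterogeneity ratio hg = tau^2 / v
# used by the power script, assuming I^2 = tau^2 / (tau^2 + v)
i2_to_hg <- function(i2) i2 / (1 - i2)

round(i2_to_hg(c(0, 0.25, 0.50, 0.75)), 2)  # 0.00 0.33 1.00 3.00
```

So if a meta-analysis reports an I² of about 50 %, the power value for hg = 1 is the one to look at.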

### When is a meta-analysis sufficiently powered?

Last but not least, we have to clarify when a meta-analysis is sufficiently powered. As so often in statistics, this is a compromise and in most cases depends on the particular research question. However, a power of 80 % (β = 0.2) is conventionally regarded as high power and should be the targeted minimum!
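Turning this around, you can ask how many studies would be needed to reach 80 % power. Solving the script's power formula for k gives k ≥ v · (1 + hg) · (z_crit + z_target)² / d², where z_crit is the two-sided critical value and z_target the normal quantile of the target power. The R sketch below implements that rearrangement; `studies_needed()` is a hypothetical helper, and it assumes equal group sizes and the same upper-tail approximation as the original script:

```r
# Sketch: smallest number of studies k for which the upper-tail power formula
# reaches a target power (default 80 %), assuming equal group sizes n
studies_needed <- function(es, n, hg = 0, target = 0.8, alpha = 0.05) {
  v     <- 2 / n + es^2 / (4 * n)               # within-study variance, nt = nc = n
  v_tot <- v * (1 + hg)                         # add between-study variance
  z     <- qnorm(1 - alpha / 2) + qnorm(target) # z_crit + z_target
  ceiling(v_tot * z^2 / es^2)
}

hg <- c(none = 0, small = 0.33, moderate = 1, high = 3)
sapply(hg, function(h) studies_needed(es = 0.15, n = 10, hg = h))
```

Because k scales with (1 + hg), a meta-analysis with high heterogeneity (hg = 3) needs roughly four times as many studies as one without heterogeneity.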

### Factors influencing power

Statistical power in meta-analyses depends on a number of factors. Some factors may be particular to a specific testing situation, but at a minimum, power nearly always depends on the overall effect size, the average group size, the number of included studies and their heterogeneity. The matrix figure below makes these relationships easy to trace.

**Fig 4.** Matrix illustration of the relationship between effect size, average group size, number of studies and heterogeneity. The blue, green, purple and red curves reflect 5, 10, 15 and 20 effect sizes.

Meta-analyses with small average group sizes and a low number of included studies are only sufficiently powered to detect large overall effects (top row), whereas small effects are nearly impossible to detect, even when many studies with larger samples are included (bottom row)! For medium effect sizes, it is mainly the heterogeneity between the included studies that determines whether and when sufficient power is achieved (middle row).

### Meta-analyses are not a panacea to overcome the methodological weaknesses of primary research!

We have to accept that meta-analyses cannot be a panacea for basic methodological weaknesses of primary research! Improving the reproducibility of studies has to be a key priority in neuroscience and requires attention to well-established, but often ignored, methodological principles!
