I was a bit shocked when I saw that the researchers report a highly significant result in their paper and reject the null hypothesis. What I found was a non-significant result and a slightly higher probability for the null hypothesis. But let’s start from the beginning...
Today I took the time to analyse a study on cyclic movement therapy with the MOTOmed, published in 2005 by Kamps & Schule.
I was most interested in the results of the 6-Minute Walk Test (6-MWT), because I've found some evidence that cyclic movement therapy may have a beneficial effect on mobility in general, and especially on walking capacity after stroke. My aim was a detailed investigation of the reported effect sizes and the statistical power of the study.
Step 1: Power analysis
First, I ran a post hoc power calculation with G*Power to estimate the 1-beta error probability for the 6-MWT post-intervention outcome between the intervention and control group. I wasn't really surprised that the study is highly underpowered, given the small effect size (d = 0.42), the small sample size (n = 31), and the high variability of the data.
Given the fact (and this is often neglected!) that low power reduces the likelihood that a statistically significant result reflects a true effect in the population, I planned to run a summary-statistics Bayesian independent-samples t-test. This Bayesian alternative to a conventional Student's t-test provides much richer information about the samples and the difference in means than a simple p-value and its more or less subjective interpretation of probability.
Fig 1. Post hoc power calculation of the reported 6-Minute-Walk-Test results of the study.
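For readers without G*Power at hand, the same post hoc power calculation can be sketched in Python (my own cross-check, not part of the original analysis; the function name posthoc_power_t is mine). It uses the noncentral t distribution, which is also what G*Power's post hoc procedure is based on:

```python
import math
from scipy import stats

def posthoc_power_t(d, n1, n2, alpha=0.05):
    """Post hoc power of a two-sided independent-samples t-test,
    computed from the noncentral t distribution."""
    df = n1 + n2 - 2
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))   # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)    # two-sided critical value
    # Power = P(|T| > t_crit) under the alternative (noncentral t)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

print(round(posthoc_power_t(0.42, 16, 15), 2))
```

With d = 0.42, n1 = 16, n2 = 15 and alpha = .05 this comes out at roughly 0.2, far below the conventional 0.8 target: about a one-in-five chance of detecting a true effect of that size.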
Step 2: Computation of the t-value
For the Bayesian approach I needed the t-value of the test statistic, which was not reported in the paper. So I ran a summary-statistics t-test in R to compute the t-value from the reported means, SDs, and sample sizes of the two groups:
# Function for a two-sample t-test from summary statistics
t.test2 <- function(m1, m2, s1, s2, n1, n2, m0 = 0, equal.variance = TRUE) {
  if (equal.variance == FALSE) {
    se <- sqrt((s1^2 / n1) + (s2^2 / n2))
    # Welch-Satterthwaite df
    df <- ((s1^2 / n1 + s2^2 / n2)^2) /
          ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
  } else {
    # pooled standard deviation, scaled by the sample sizes
    se <- sqrt((1 / n1 + 1 / n2) * ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
    df <- n1 + n2 - 2
  }
  t <- (m1 - m2 - m0) / se
  dat <- c(m1 - m2, se, t, 2 * pt(-abs(t), df))
  names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
  dat
}

# Calculate t-statistic and p-value
t.test2(m1 = 237.84, m2 = 195.29, s1 = 115.66, s2 = 85.94, n1 = 16, n2 = 15,
        equal.variance = TRUE)
Fig 2. R script for the summary-statistics t-test.
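As a quick cross-check of the R function, the same pooled-variance calculation can be done in Python (my own sketch; the name t_from_summary is mine):

```python
import math
from scipy import stats

def t_from_summary(m1, m2, s1, s2, n1, n2):
    """Pooled-variance two-sample t-test from summary statistics,
    assuming equal variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))            # standard error of the difference
    t = (m1 - m2) / se
    p = 2 * stats.t.sf(abs(t), df)                     # two-sided p-value
    return t, p

t, p = t_from_summary(237.84, 195.29, 115.66, 85.94, 16, 15)
print(f"t = {t:.2f}, p = {p:.2f}")
```

Both routes land on the same t and p values from the reported means, SDs, and group sizes.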
Step 3: A shock!
I was a bit shocked by the result you see below:
Fig 3. Result of the summary-statistics t-test.
The summary-statistics t-test returned a t-value of 1.15 and a p-value of .26, which means that the result of the t-test is NOT(!) significant. However, Kamps & Schule reported a p-value of .003, which reflects a highly significant result of their group-interaction calculation.
Even though it's clear that they performed an ANOVA and not a t-test, I'm wondering how they got this highly significant result. To my understanding, that seems to be impossible! How did they get this result? I don't know! Did the researchers report wrong results, or is something wrong with my computation?
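One rough way to see the size of the discrepancy (my own back-of-the-envelope check, treating the reported p as if it came from a t-test with the same degrees of freedom, which an ANOVA group comparison does not exactly match):

```python
from scipy import stats

# df for a two-sample comparison with n1 = 16, n2 = 15
df = 29
# two-sided critical t that would be needed to reach p = .003
t_needed = stats.t.ppf(1 - 0.003 / 2, df)
print(round(t_needed, 2))
```

A p of .003 would require a test statistic somewhere around t ≈ 3.3, almost three times the t ≈ 1.15 that follows from the reported summary statistics.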
I immediately contacted Schule via ResearchGate and sent him my calculations. Furthermore, I asked him to send me a protocol of the statistical analysis they performed and/or to give me access to the raw data.
Now I'm curious how the story ends and for sure I'll keep you updated!
By the way... if I run the Bayesian t-test with my computed values in JASP, I find a higher probability for the null hypothesis!
Fig 4. Result of the Bayesian independent-samples t-test with a Cauchy prior of 0.707 and my computed value of t = 1.15.
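JASP's summary-statistics Bayesian t-test computes a JZS Bayes factor from the t value alone (Rouder et al., 2009). The sketch below is my own Python re-implementation of that integral (jzs_bf10 is my name for it), evaluated for my computed t with the default Cauchy prior scale of 0.707:

```python
import math
from scipy import integrate

def jzs_bf10(t, n1, n2, r=0.707):
    """JZS Bayes factor BF10 for an independent-samples t-test,
    computed from the t value and group sizes (Rouder et al., 2009)."""
    nu = n1 + n2 - 2
    N = n1 * n2 / (n1 + n2)   # effective sample size
    def integrand(g):
        # marginal likelihood under H1, integrating over the g-prior
        a = 1 + N * g * r**2
        return (a ** -0.5
                * (1 + t**2 / (nu * a)) ** (-(nu + 1) / 2)
                * (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g)))
    num, _ = integrate.quad(integrand, 0, math.inf)
    den = (1 + t**2 / nu) ** (-(nu + 1) / 2)   # likelihood under H0
    return num / den

bf10 = jzs_bf10(1.156, 16, 15)
print(round(1 / bf10, 2))  # BF01: relative evidence for the null
```

A BF01 above 1 means the data favor the null over the alternative, which is exactly the direction the JASP output in Fig 4 points to.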
UPDATE - 2019 Sep. 17th: He faded away when it was clear that I wanted to see the data... He started following me on ResearchGate, but I never received an answer!