Did I find a p-hacked Motomed study?

I was a bit shocked to see that the researchers report a highly significant result in their paper and reject the null hypothesis. What I found was a non-significant result and a slightly higher probability for the null hypothesis. But let's start from the beginning...


Today I took the time to analyse a study about cyclic movement therapy with the Motomed, published in 2005 by Kamps & Schule.


I was most interested in the results of the 6-Minute-Walk-Test (6-MWT), because I've found some evidence that cyclic movement therapy may have a beneficial effect on mobility in general and especially on walking capacity after stroke. My aim was a detailed investigation of the reported effect sizes and the statistical power of the study.


Step 1: Power analysis

First I ran a post hoc power calculation in G*Power to estimate the 1-beta error probability (the statistical power) for the 6-MWT post-intervention outcome, comparing the intervention and control group. I wasn't really surprised that the study is highly underpowered, given the small effect size (d = 0.42), the low sample size (n = 31) and the high variability of the data.
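G*Power does this calculation with the noncentral t-distribution; for readers without G*Power, a rough cross-check is possible with a simple normal approximation in a few lines of Python. The helper function and the approximation below are my own sketch, not the study's method or G*Power's algorithm:

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_sample(d, n1, n2, alpha=0.05):
    """Normal approximation to the power of a two-sided,
    two-sample t-test -- good enough for a ballpark estimate."""
    z = NormalDist().inv_cdf(1 - alpha / 2)       # critical value
    ncp = abs(d) * sqrt(n1 * n2 / (n1 + n2))      # noncentrality parameter
    # probability of landing beyond the critical value under H1
    return (1 - NormalDist().cdf(z - ncp)) + NormalDist().cdf(-z - ncp)

print(round(approx_power_two_sample(d=0.42, n1=16, n2=15), 2))
```

With d = 0.42 and 16 vs. 15 participants this lands at roughly 0.2, far below the conventional target of 0.8, which is exactly what "highly underpowered" means.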


Given the fact (and this is often neglected!) that low power reduces the likelihood that a statistically significant result reflects a true effect in the population, I planned to run a summary-statistics Bayesian independent-samples t-test. This Bayesian alternative to a conventional Student's t-test provides much richer information about the samples and the difference in means than a simple p-value and its more or less subjective interpretation of probability.
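That first point can be quantified with the standard positive-predictive-value argument: the probability that a significant finding reflects a true effect depends on the power, the alpha level and the prior odds of a real effect. The function and the 50% prior below are my own illustrative assumptions:

```python
def ppv(power, alpha=0.05, prior=0.5):
    """Probability that a statistically significant result is a true
    positive, given power, alpha and the prior odds of a real effect."""
    return power * prior / (power * prior + alpha * (1 - prior))

print(round(ppv(0.80), 2))  # well-powered study -> 0.94
print(round(ppv(0.20), 2))  # power in the range estimated here -> 0.8
```

Even with a neutral 50% prior, dropping the power from 0.8 to 0.2 lowers the chance that a significant result is real, and a more sceptical prior makes that drop far steeper.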

Fig 1. Post hoc power calculation of the reported 6-Minute-Walk-Test results of the study.

Step 2: Computation of the t-value


For the Bayesian approach I needed the t-value of the test statistic, which was not reported in the paper. So I ran a summary-statistics t-test in R to compute the t-value from the given means, SDs and sample sizes of the groups:


# Write function
t.test2 <- function(m1, m2, s1, s2, n1, n2, m0 = 0, equal.variance = TRUE) {
  if (equal.variance == FALSE) {
    se <- sqrt((s1^2/n1) + (s2^2/n2))
    # Welch-Satterthwaite df
    df <- ((s1^2/n1 + s2^2/n2)^2) / ((s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1))
  } else {
    # pooled standard deviation, scaled by the sample sizes
    se <- sqrt((1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2))
    df <- n1 + n2 - 2
  }
  t <- (m1 - m2 - m0)/se
  dat <- c(m1 - m2, se, t, 2*pt(-abs(t), df))
  names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
  return(dat)
}

# Calculate t-statistic and p-value
t.test2(m1 = 237.84, m2 = 195.29, s1 = 115.66, s2 = 85.94, n1 = 16, n2 = 15,
        equal.variance = TRUE)


Fig 2. R script for the summary-statistics t-test.

Step 3: A shock!

I was a bit shocked about the result you see below:



Fig 3. Result of the sum.t.test


The summary-statistics t-test returned a t-value of 1.15 and a p-value of .26, which means that the result of the t-test is NOT(!) significant. However, Kamps & Schule reported a p-value of .003, which reflects a highly significant result of their group-interaction calculation.


Even though it's clear that they performed an ANOVA and not a t-test, I'm wondering how they got this highly significant result. To my understanding that seems to be impossible!


How did they get this result? I don't know! Did the researchers report wrong results, or is something wrong with my computation?
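To rule out an error on my side, the same summary-statistics arithmetic can be repeated independently in Python (the variable names are mine; the numbers are the 6-MWT means, SDs and group sizes reported in the paper):

```python
from math import sqrt

# Reported 6-MWT post-intervention summary statistics
m1, s1, n1 = 237.84, 115.66, 16   # intervention group
m2, s2, n2 = 195.29, 85.94, 15    # control group

# pooled variance and standard error of the mean difference
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))

t = (m1 - m2) / se
print(round(t, 2))  # just over 1.15 -- nowhere near the roughly 3.2
                    # that p = .003 with 29 df would require
```

Two independent implementations agreeing on a t-value around 1.15 makes a computational error on my side rather unlikely.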


I immediately contacted Schule via ResearchGate and sent him my calculations. Furthermore, I asked him to send me a protocol of the statistical analysis they performed and/or to allow me access to the raw data.


Now I'm curious how the story ends and for sure I'll keep you updated!


By the way... if I run the Bayesian t-test with my computed values in JASP, I find a higher probability for the null hypothesis!


Fig 4. Result of the Bayesian independent-samples t-test with a Cauchy prior of 0.707 and my computed t-value of 1.15.
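For anyone who wants to reproduce the JASP number without JASP: JASP's default Bayes factor for an independent-samples t-test is the JZS Bayes factor of Rouder et al. (2009), which can be computed from the t-value alone by a one-dimensional numerical integration. The sketch below is my own plain-Python implementation (midpoint rule, Cauchy scale r = 0.707); it should land close to, though not necessarily exactly on, JASP's value:

```python
from math import sqrt, exp, pi

def jzs_bf10(t, n1, n2, r=0.707, steps=200_000):
    """JZS Bayes factor BF10 for a two-sample t-test, computed from
    summary statistics by integrating over the mixing parameter g."""
    n_eff = n1 * n2 / (n1 + n2)   # effective sample size
    v = n1 + n2 - 2               # degrees of freedom

    def integrand(g):
        a = 1 + n_eff * r * r * g
        likelihood = a ** -0.5 * (1 + t * t / (a * v)) ** (-(v + 1) / 2)
        # inverse-gamma(1/2, 1/2) mixing density (Cauchy prior on effect size)
        prior = (2 * pi) ** -0.5 * g ** -1.5 * exp(-1 / (2 * g))
        return likelihood * prior

    # midpoint rule with the substitution g = u/(1-u), mapping (0, inf) to (0, 1)
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        g = u / (1 - u)
        total += integrand(g) / (1 - u) ** 2
    marginal_h1 = total / steps

    marginal_h0 = (1 + t * t / v) ** (-(v + 1) / 2)
    return marginal_h1 / marginal_h0

bf10 = jzs_bf10(t=1.15, n1=16, n2=15)
print(round(bf10, 2))  # BF10 below 1: the data lean toward the null
```

A BF10 around 0.5 means BF01 of roughly 2: the data are about twice as likely under the null hypothesis as under the alternative. Modest evidence, but clearly not the "highly significant" effect the paper reports.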


UPDATE - 2019 Sep. 17th: He faded away when it was clear that I wanted to see the data... I got him as a new follower on ResearchGate but didn't receive any answer!