Reproducibility: releasing code is just part of the solution

This week in Retraction Watch: Hypertension retracts paper over data glitch.

The retraction notice describes the “data glitch” in question (bold emphasis added by me):

…the authors discovered an error in the code for analyzing the data. The National Health and Nutrition
Examination Survey (NHANES) medication data file had multiple observations per participant and
was merged incorrectly with the demographic and other data files. Consequently, the sample size was
twice as large as it should have been (24989 instead of 10198). Therefore, the corrected estimates of
the total number of US adults with hypertension, uncontrolled hypertension, and so on, are significantly
different and the percentages are slightly different.
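The merge error the notice describes is easy to reproduce, and also easy to guard against. Here is a minimal sketch using pandas; the column names and values are hypothetical stand-ins, not the actual NHANES files:

```python
import pandas as pd

# Hypothetical stand-ins for the NHANES files: the medication file has
# multiple rows per participant (one per medication); the demographic
# file has one row per participant.
meds = pd.DataFrame({
    "SEQN": [1, 1, 2, 3, 3, 3],   # participant ID repeats per medication
    "drug": ["a", "b", "c", "d", "e", "f"],
})
demo = pd.DataFrame({
    "SEQN": [1, 2, 3],
    "age": [40, 55, 62],
})

# A naive merge silently duplicates each participant's demographic row,
# inflating the apparent sample size.
merged = demo.merge(meds, on="SEQN")
print(len(demo), len(merged))  # 3 participants become 6 rows

# Asking pandas to validate the expected merge cardinality catches the
# problem up front instead of letting it propagate into the estimates.
try:
    demo.merge(meds, on="SEQN", validate="one_to_one")
except pd.errors.MergeError as e:
    print("merge check failed:", e)
```

The `validate` argument costs one line and turns a silent doubling of the sample into an immediate, loud failure.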

Let’s leave aside the observation that 24989 is not 2 x 10198. I tweeted:

Not that simple though, is it? Read on for the Twitter discussion.

Bosco replied:

Good point. The best response I could manage at the time was that errors may not be detected during review, but stand a better chance of eventual detection if code is available. Or as Bosco says, this is not really what review is for:

Chris made a similar point:

Summary
It’s easy to make the mistake of thinking that making code available is the solution to reproducibility and error detection. In fact, it’s just a small part of the solution, since someone (or something) then has to (1) look at the code and (2) notice the errors.

While we wait for that change to sweep aside traditional scientific publishing, I suggest that if you want to publish research that uses code to generate the results, at least get your colleagues to take a look at that code before submission.

2 thoughts on “Reproducibility: releasing code is just part of the solution”

  1. Eric Talevich (@etalevich)

    Required code submission could have made a meaningful difference. Since authors do worry at least a bit about how their own code would stand up under the harsh light of review, if code submission were required, then authors would probably give their code another quality check or two before submitting the whole batch to a journal. That could have turned up the error. We already check the article text for typos before submission, right?

    1. nsaunders Post author

      Absolutely agree; code checking should be like typo checking. One “problem”, I guess, is that journals fear submissions will drop if they enforce standards. Personally, I don’t see how “less crap” is a problem.
