I imagine that most people, when asked “do you think that independent confirmation of research findings is important?” would answer “yes”. I also imagine, when told that in most cases this is not possible, that those people might be concerned or perhaps incredulous. However, this really is the case, which is why I spend much of my working life in a state of concern and incredulity.
Over the years, many articles have been written on how to improve this state of affairs by adopting best practices, collectively termed reproducible research. One of the latest is an editorial in Nature. I’ve pulled out a few quotes for discussion.
To ensure their results are reproducible, analysts should show their workings
These words form the article summary. Absolutely. No arguments about that one.
…how should researchers document and report their use of software?
Properly, is the short and simple answer to that one.
…many in the field merely shrug their shoulders and insist that is how things are done
Here, the field is genomics and “how things are done” is “in a non-reproducible way.” I don’t think this is entirely accurate. In my experience, most researchers express profound regret that this is “how things are done” but make excuses for it: lack of time, lack of incentives, and a focus on results at all costs are frequently cited factors.
Nature does not require authors to make code available…
Two thoughts: (1) it should, and (2) why not? Let’s be honest: work in which the major focus is bioinformatics does not generally appear in Nature. However, Nature publishes other types of research with a strong computational component (physics, climate science). Is their concern that imposing standards might dissuade authors from submission? I have heard that public databases are reluctant to enforce strict standards for this reason, but perhaps that is cynical speculation.
…but we do expect a description detailed enough to allow others to write their own code to do a similar analysis
Without analysing the methods section of many Nature articles, I have no evidence that this is not the case – but I seriously doubt that it is. Anyone who has ever tried to reconstruct an analysis from a description in any journal article, never mind Nature, knows that it is rarely easy and often impossible. There’s no substitute for the code.
Some in the field say that it should be enough to publish only the original data and final results…
The article contains a lot of anonymous “some”, “others” and “many”. Presumably, they’re too ashamed to go on record. I can’t imagine anyone in bioinformatics who would say this.
…given the complexity of the analyses, is it [transparency] realistic?
It sure is. That’s what computers do: perform the same calculations, over and over. The article implies, in fact, that the 1000 Genomes project does little else but mechanised number crunching.
…it is important that the community consider such solutions [workflows]
This closing sentence seems rather wishy-washy and at odds with the strong article summary. And to digress: “the community” is one of those old-fashioned, clichéd science terms that I personally can’t stand – along with “the field”.
However, putting that aside: I hope that Nature and other journals count themselves as part of a community – the community of modern, forward-looking twenty-first century science. This community acknowledges that when it comes to reproducible research, the traditional journal article and publication process is a large part of the problem. Journals can take a lead role by (1) enforcing standards; (2) insisting on good reproducible research practices; (3) providing or recommending repositories for code/data; and (4) more generally, going beyond the “designed for printing on dead trees” mentality that still persists, over 20 years after the birth of the World Wide Web.
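As a footnote to point (2): “good reproducible research practices” can start very small. Here is a minimal sketch – the data and the “analysis” steps are hypothetical placeholders, not any real pipeline – of what “showing your workings” looks like in practice: the entire analysis, plus a record of the environment and inputs, lives in one rerunnable script.

```shell
#!/usr/bin/env bash
# Minimal reproducible-analysis sketch. data.csv and the "analysis"
# steps below are hypothetical placeholders, not a real pipeline.
set -euo pipefail

# 1. Record the environment alongside the results
bash --version | head -n 1 > versions.txt

# 2. Create the (placeholder) input data
printf 'sample,value\nB,2\nA,1\n' > data.csv

# 3. Record a checksum of the input, so others can verify it
sha256sum data.csv >> versions.txt

# 4. The "analysis": sort samples by value
sort -t, -k2 -n data.csv > sorted.csv
```

Anyone with this script can regenerate sorted.csv exactly, with no need to reconstruct the method from a prose description. Real pipelines need real workflow tools, of course, but the principle is the same: the code is the methods section.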