Credit for code: enough with the half-measures already

May as well begin 2014 where we left off: complaining about the attitude of scientific publishers regarding reproducible computational research.

I had a “Twitter blurt”. That’s when you read, react and tweet. Happens to the best of us. With hindsight, it was perhaps a little harsh:

The link is to an editorial in Nature Genetics, “Credit for code.” It points out, quite rightly, that “review, replication, reuse and recognition are all incentives to provide code” in research publications. After that promising start though, things get a little strange.

The article is written in a rather awkward, unconvincing style which suggests the editor(s) are neither familiar nor comfortable with the subject. Phrases like “instantiated in software written for computers and other laboratory machines” sound, well, just weird. As for “it is also useful to offer the code actually used nonexclusively to the journal in a supplementary text or archived file” – first, that barely makes sense and second, it’s the legalese of people more accustomed to coaxing authors into giving up copyright. It’s unlikely to sit well with many scientific programmers.

The article uses CRAN and Bioconductor as examples of good practice in scientific software development, but again the tone is a little odd.

The journal has sufficient experience with these resources to endorse their use by authors. We do not yet provide any endorsement for the suitability or usefulness of other solutions but will work with our authors and readers, as well as with other journals, to arrive at a set of principles and recommendations.

What are they trying to say? “Our authors seem to use R a lot, so we’re guessing it’s good and besides, we don’t know about anything else”? There’s a substantial and active online community which has already developed principles and recommendations for publishing computational research. I’d suggest the editors get started by visiting Software Carpentry, searching Titus’s blog and reading this Ten Simple Rules article.

The last paragraph is the reason for my “Twitter blurt”. It begins:

If these best practices are not possible, there are ways not to make the current situation worse.

I’d rather we – especially the journals – strive for best practice, rather than adopt an air of resignation. It gets worse, though:

If none of these solutions are feasible, please do declare when there is code involved in the work, even if it is proprietary or unavailable, and provide equations or algorithms that enable a reader to understand and replicate analytical decisions made with the research software.

Few things are more frustrating, or more likely to result in irreproducibility and error, than trying to reconstruct a computational analysis based on a prosaic description of an algorithm in a research article. Yet this is a very typical part of the working day in my field (bioinformatics) and I imagine, in many others.

I may have blurted, but 12 retweets and 10 favourites suggest I struck a chord with a few people. As I suggested in a reply, I’d rather see journals lead the way by mandating standards for publishing computational research than make weak suggestions.

8 thoughts on “Credit for code: enough with the half-measures already”

  1. I have to say that Nature Genetics does face at least two big challenges here, though:

    1) Convincing bioinformaticians that they should give out their code for the sake of reproducibility and reuse, while at the same time accepting that the raw GWAS data it was used to analyze is not given out for privacy reasons. Which of course means the results cannot be reproduced in any case.

    2) Dealing with ownership of the software. At many universities – including the one I am at – the code written by researchers on their payroll is considered an invention that belongs to the university. Negotiating deals with the intellectual property rights offices will be decades of fun.

    • 1. I have no good answer to 1. The conflict between privacy in human GWAS data and the need for research transparency & reproducibility is something I have not yet seen a good solution to. However, many publications do require data deposition (like Titus said), so I am not sure how privacy is maintained. In the US, at least, patients sign off to have their information used for research. This is a rather broad term: i.e., if research requires deposition in a public database, and patients were well-informed of the potential (if rare) privacy consequences, they may not agree. As Yaniv Erlich and others have shown, de-anonymizing DNA data is not that hard. I’m not sure how long that situation will continue, especially with privacy of all sorts being at the forefront of public discourse.

      2. One thing I found useful in getting agreement to license my software as open source is “argument from fiat”. That is, I argue (correctly) that writing open-source distribution into a grant proposal data management plan greatly enhances the attractiveness of the proposal. Here this is especially true with NSF-funded grants, which require a strong outreach component, but also, to some extent, with NIH grants. Not sure what the situation is with European and Danish grants, though. I also argue that it is easier to get published in some journals (e.g. Bioinformatics, PLoS-CB) if you offer code freely, and under an OSS or Free license. Indeed, these journals strongly support this practice, although no journal mandates any kind of license.

  2. Lars, is the raw GWAS data not publishable (due to privacy concerns)? I thought it had to be deposited somewhere.

    Re #2, it can be useful to draw a distinction on license here. The researchers could be required to release the source code under a license that prohibits reuse for any purpose other than replication. I don’t know of anyone arguing that, for basic requirements of replication, code needs to be released under an OSS license (although I would love to argue this on principle :); but the code needs to be available to reviewers and readers. Think of it like a patent: to file a patent, you need to release all of the information on how to build the patented device. That doesn’t mean that you relinquish any rights.

  3. A retired Army officer friend calls them Brain Droppings. We all have them. But this particular tweet-of-the-moment is on target. The article’s tone of resignation is understandable. The obstacles are real but this is hardly a Helm’s Deep moment. They were wrong to publish in despair.

    • At first glance, it looks quite promising. It’s always good to see people making use of Github.

      In general though, no journal is doing anywhere near as much as people like me would like to see. I just wish a few publications would insist on standards, rather than politely suggesting them. Do they fear that it would reduce submissions?

