Trust no-one: errors and irreproducibility in public data

Just when I was beginning to despair at the state of publicly-available microarray data, someone sent me an article which…increased my despair.

The article is:

Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology (2009)
Keith A. Baggerly and Kevin R. Coombes
Ann. Appl. Stat. 3(4): 1309-1334

It escaped my attention last year, in part because “Annals of Applied Statistics” is not high on my journal radar. However, other bloggers did pick it up: see posts at Reproducible Research Ideas and The Endeavour.

In this article, the authors examine several papers, in their words, “purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response.” They find that not only are the results difficult to reproduce but, in several cases, they simply cannot be reproduced because of simple, avoidable errors. In the introduction, they note that:

…a recent survey [Ioannidis et al. (2009)] of 18 quantitative papers published in Nature Genetics in the past two years found reproducibility was not achievable even in principle for 10.

You can get an idea of how bad things are by skimming through the sub-headings in the article. Here’s a selection of them:

  • Training data sensitive/resistant labels are reversed
  • Heatmaps show sample duplication in the test data
  • Only 84/122 test samples are distinct; some samples are labeled both sensitive and resistant
  • At least 3/8 of the test data is incorrectly labeled resistant
  • Two of the “outlier” genes are not on the arrays used
  • Genes are offset, and sensitive/resistant labels are reversed for pemetrexed
  • Treatment is confounded with run date
  • The gene list doesn’t match the heatmap
  • Sensitive/resistant label reversal is common

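To see how innocuous these errors look in practice, here is a small illustrative sketch (the gene names and values are hypothetical, not taken from the paper) of two of the failure modes above: a one-row offset in a gene list, and a sensitive/resistant label reversal.

```python
# Illustrative sketch (hypothetical data): how a one-row offset
# corrupts a gene-to-expression mapping, and how a label reversal
# silently flips every prediction.

genes = ["TP53", "EGFR", "BRCA1", "MYC"]   # hypothetical gene list
values = [2.1, 0.4, 1.7, 3.0]              # matching expression values

correct = dict(zip(genes, values))

# An off-by-one offset (e.g. a miscounted header row) shifts everything:
offset = dict(zip(genes, values[1:]))       # the last gene silently drops out
assert offset["TP53"] == correct["EGFR"]    # each gene gets its neighbour's value

# A sensitive/resistant label reversal inverts every call,
# yet the downstream analysis runs without a single warning:
labels = ["sensitive", "resistant", "sensitive"]
flipped = ["resistant" if l == "sensitive" else "sensitive" for l in labels]
assert all(a != b for a, b in zip(labels, flipped))
```

Nothing in either case raises an error; the code runs happily and produces plausible-looking output, which is exactly why such mistakes survive review.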
Following a detailed analysis of several case studies, they conclude that:

…the most common errors are simple…conversely, it is our experience that the most simple errors are common.

Finally, the authors offer suggestions as to how research reproducibility might be improved. Not surprisingly, since they are statisticians who use R, they offer Sweave as part of the solution.
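For readers who haven’t used it: Sweave embeds R code chunks in a LaTeX document, and the code is re-executed every time the document is compiled, so reported numbers cannot drift away from the analysis that produced them. A minimal sketch (the file name and chunk contents are illustrative, not from the paper):

```latex
\documentclass{article}
\begin{document}

The table below is regenerated on every build, so the
reported counts always match the code that produced them.

<<label-check, echo=TRUE>>=
labels <- read.csv("labels.csv")  # hypothetical input file
table(labels$status)              # counts of sensitive/resistant
@

\end{document}
```

Running `Sweave()` on this file executes the chunk and weaves its output into the final LaTeX, leaving no opportunity to paste in stale or hand-edited results.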

This is a great article, which deserves to be more widely read. It strikes me that most bioinformatics is “forensic bioinformatics”; picking through messy data in the hope of figuring out what’s going on – or perhaps, who committed the crime.

3 thoughts on “Trust no-one: errors and irreproducibility in public data”

  1. Fundamentally, the problem is the low cost of making mistakes. There seem to be few, if any, repercussions, so it does not make “economic” sense to be very careful and double- and triple-check the results.

    The scientific publishing system that we have in place rewards lots of publications and cares a lot less about mistakes.

    Sadly I don’t think there is an easy solution in sight.

  2. Pingback: Tidbits, 29 September 2010 | Book of Trogool

  3. At the risk of deepening your despair (sorry!) I would point out how rare it is that a team like Baggerly and Coombes would go to the effort of establishing just how bad a paper is. They only put in the effort they did because physicians asked them to do so as part of their service work within a cancer center. Statisticians typically have no incentive to take hundreds of hours away from their own research to reverse engineer someone else’s work.
