The boys over at Biocurious have posted an interesting story, termed the great pentaretraction by some unkind observers. In summary: a structural biology group recently discovered a nasty bug in their in-house software which changed the sign of data values in two columns. Unfortunately, they made the discovery after publishing their results. The outcome: the retraction of three high-profile Science papers, two further retractions and serious problems for anyone else who has cited their work.
The post set off a flurry of thoughts. Initially sympathy of course – everyone makes mistakes and this is just a nightmare for any researcher. Admiration too – the authors have done the right thing by quickly announcing the error and withdrawing their findings. However, I soon realised that this story neatly summarises everything that’s wrong with the way research is performed and published. Let’s imagine how differently events could have unfolded if the principles of “Open Science” had been applied.
The ultimate culprit in the affair is a bug in a piece of in-house software. It’s all too easy to imagine the scenario. A graduate student or postdoc with no formal training in computer science, asked to whip up a script as quickly as possible, working alone in a department where no one else has the computer literacy or the inclination to check the code.
Now, imagine that the code was open-source and publicly available. It could be kept in a web-accessible repository, either on a local machine or a community resource similar to Sourceforge. Imagine too that when articles are submitted for peer review, a link to the code is included. There’d be a far higher chance that someone else would spot the error.
Several observers have noted that they had difficulty reconciling the published structures with other biochemical data. So where were these people when it mattered – i.e. at the pre-publication stage? The answer is that they were unable to have any useful input until the paper was written or even published. In an open climate, by contrast, with data, software and preliminary conclusions all available for scrutiny and discussion, someone might have noticed that something wasn’t quite right, so avoiding the post-publication inconvenience and embarrassment. This ties in with the next problem – the peer review system.
Above all else, this story illustrates that the current system of peer review doesn’t work and that anyone who has faith in it is deluded. Here’s a shocking fact for you: most reviewers don’t review papers properly. They will read the abstract, glance at the tables and figures, check for obvious, glaring errors and typos, check that the conclusions make sense and are not overly speculative, then write a few words of criticism just for the sake of sounding critical. The whole process probably takes less than an hour. I’ve received quite a few reviewers’ comments in my time and I honestly can’t recall a single occasion when a comment has made me think “Hey, they’re right! I should address that.” On the other hand, I’ve had any number of useless one-liners such as “some figures could be clearer”. Um, right. Which ones and how?
Really though – how can a reviewer say anything useful when they are presented with only a fraction of the story? Which is what a journal article is. It’s a neatly-packaged soundbite, dressed up to sound as convincing as possible. A reviewer only gets to read the author’s interpretation – they don’t see the raw data, the software used for analysis, the ambiguous results.
Now imagine a community approach. Imagine a pre-print server for biology, similar to arXiv but with links to data and software as well as written articles. Imagine discussion forums, comments, open peer review. In short, imagine everything that the Open Access movement and many Open Access journals are implementing.
You might argue that much of this relies on user participation. True enough – but in an open climate at least if an error slips through, the author can say “Well, none of you guys noticed either! It’s as much your fault as mine!”
Note also that these erroneous articles are now archived in print for all time. Yes, systems such as PubMed will point users to the retractions, but not everyone will immediately notice the link. And what of the numerous articles which cite the erroneous papers and so draw erroneous conclusions? In theory, it should be possible to flag online articles, group them together and even withdraw a whole bunch of them – perhaps to a “rejects” folder, if we have agreed standards for markup and electronic publication. Not so easy to erase articles in print from the collective memory.
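To make the flagging idea a little more concrete, here is a rough sketch of how it might work if citation data were available in a machine-readable form. Everything in it is an assumption for illustration: the article identifiers are made up, the `citations` mapping stands in for whatever an agreed markup standard would provide, and `flag_affected` is a hypothetical helper, not part of PubMed or any real system. It simply walks the citation graph outwards from the retracted papers and records how many citation steps away each affected article is.

```python
from collections import deque


def flag_affected(citations, retracted):
    """Return {article_id: distance} for every article that cites a
    retracted paper, directly or through a chain of citations.

    citations: dict mapping an article id to the list of ids it cites
               (hypothetical data; no real database is queried).
    retracted: iterable of retracted article ids (distance 0).
    """
    # Invert the mapping so we can look up "who cites this paper?"
    cited_by = {}
    for article, refs in citations.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(article)

    # Breadth-first search outwards from the retracted papers.
    flagged = {paper: 0 for paper in retracted}
    queue = deque(flagged)
    while queue:
        current = queue.popleft()
        for citing in cited_by.get(current, ()):
            if citing not in flagged:  # also guards against citation loops
                flagged[citing] = flagged[current] + 1
                queue.append(citing)
    return flagged


# Made-up example: B cites the retracted paper, C cites B, D is unrelated.
example = {
    "paper_B": ["retracted_A"],
    "paper_C": ["paper_B"],
    "paper_D": ["unrelated_X"],
}
print(flag_affected(example, ["retracted_A"]))
```

An article at distance 1 cites a retracted paper directly; larger distances are weaker signals and would need a human to judge whether the conclusions actually depend on the retracted result.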
Perhaps the aspect of this story that least impressed me was a letter in Science entitled Pretty Structures, But What About the Data? The author tries to point out that it’s important to understand how software works, distinguish between data and models and take note of data from all sources, including what he terms “good old-fashioned biochemistry”. All good points and fair enough, but phrases such as “lessons that we aging baby boomer professors ram down the throats of our proteomically aroused graduate students” don’t elicit waves of empathy from me.
He should try to understand why young scientists are excited about techniques such as proteomics. It’s because in contrast to the old-fashioned approach of learning each and every arcane aspect of some niche model system, we can now combine computing and high-throughput techniques to learn a lot more a lot faster. And you know, if our senior colleagues showed some interest in the importance of computer literacy, perhaps there would be less chance that young researchers, struggling alone in their department to develop software, would make mistakes.
Now go and read Bill Hooker’s essay: The Future of Science is Open, Part 3: An Open Science World. I’m done banging my head against this brick wall.