Data standards

Via Nodalpoint: Nature Biotechnology are seeking comments on a set of prospective manuscripts aimed at improving life science data standards. This is important so go there, read, leave comments.

I think the usability of the consultation web pages could be greatly improved. How about a discussion forum and some RSS feeds, rather than posting PDFs of papers and comments on a weekly basis? “Check back when my feed reader says so” is so much easier than remembering to “check back from time to time”.

Lessons learned the hard way

The boys over at Biocurious have posted an interesting story, termed the great pentaretraction by some unkind observers. In summary: a structural biology group recently discovered a nasty bug in their in-house software which changed the sign of data values in two columns. Unfortunately, they made the discovery after publishing their results, which has led to the retraction of three high-profile Science papers, two further retractions and serious problems for anyone else who has cited their work.

The post set off a flurry of thoughts. Initially sympathy of course – everyone makes mistakes and this is just a nightmare for any researcher. Admiration too – the authors have done the right thing by quickly announcing the error and withdrawing their findings. However, I soon realised that this story neatly summarises everything that’s wrong about the way research is performed and published. Let’s imagine how differently events could have unfolded if the principles of “Open Science” had been applied.

Open Software
The ultimate culprit in the affair is a bug in a piece of in-house software. It’s all too easy to imagine the scenario. A graduate student or postdoc with no formal training in computer science, asked to whip up a script as quickly as possible, working alone in a department where no one else has the computer literacy or the inclination to check the code.

Now, imagine that the code was open-source and publicly available. It could be kept in a web-accessible repository, either on a local machine or a community resource similar to SourceForge. Imagine too that when articles are submitted for peer review, a link to the code is included. There’d be a far higher chance that someone else would spot the error.
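To make that concrete, here is a minimal sketch in Python of the kind of sanity check an outside reader could add to a public repository. Everything in it is hypothetical: the function names, the column layout and the "converter" are invented for illustration and say nothing about how the group's actual software worked.

```python
def convert_rows(rows, flip_columns=()):
    """Hypothetical converter: copies each row, negating the columns
    listed in flip_columns (standing in for an accidental sign change)."""
    return [
        tuple(-v if i in flip_columns else v for i, v in enumerate(row))
        for row in rows
    ]


def columns_with_changed_sign(before, after):
    """Return the sorted indices of columns in which at least one
    value has the opposite sign in `after` compared with `before`."""
    changed = set()
    for row_b, row_a in zip(before, after):
        for i, (b, a) in enumerate(zip(row_b, row_a)):
            if b * a < 0:  # opposite signs give a negative product
                changed.add(i)
    return sorted(changed)


# Made-up input data, purely for illustration.
rows = [(1.0, -2.0, 3.5), (4.0, 5.0, -6.0)]

# Simulate a converter that silently flips columns 1 and 2.
buggy = convert_rows(rows, flip_columns=(1, 2))
print(columns_with_changed_sign(rows, buggy))  # prints [1, 2]
```

A check like this is trivial to write, which is rather the point: it only helps if someone other than the original author can see the code and the data.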

Open Data
Several observers have noted that they had difficulty reconciling the published structures with other biochemical data. So where were these people when it mattered – i.e. at the pre-publication stage? The answer is that they were unable to have any useful input until the paper was written or even published. In an open climate, with data, software and preliminary conclusions all available for scrutiny and discussion, someone might have noticed that something wasn’t quite right, so avoiding the post-publication inconvenience and embarrassment. This ties in with the next problem – the peer review system.

Peer Review
Above all else, this story illustrates that the current system of peer review doesn’t work and that anyone who has faith in it is deluded. Here’s a shocking fact for you: most reviewers don’t review papers properly. They will read the abstract, glance at the tables and figures, check for obvious, glaring errors and typos, check that the conclusions make sense and are not overly speculative, then write a few words of criticism just for the sake of sounding critical. The whole process probably takes less than an hour. I’ve received quite a few reviewers’ comments in my time and I honestly can’t recall a single occasion when a comment has made me think “Hey, they’re right! I should address that.” On the other hand, I’ve had any number of useless one-liners such as “some figures could be clearer”. Um, right. Which ones and how?

Really though – how can a reviewer say anything useful when they are presented with only a fraction of the story? Which is what a journal article is. It’s a neatly-packaged soundbite, dressed up to sound as convincing as possible. A reviewer only gets to read the author’s interpretation – they don’t see the raw data, the software used for analysis, the ambiguous results.

Now imagine a community approach. Imagine a pre-print server for biology, similar to arXiv but with links to data and software as well as written articles. Imagine discussion forums, comments, open peer review. In short, imagine everything that the Open Access movement and many Open Access journals are implementing.

You might argue that much of this relies on user participation. True enough – but in an open climate at least if an error slips through, the author can say “Well, none of you guys noticed either! It’s as much your fault as mine!”

Note also that these erroneous articles are now archived in print for all time. Yes, systems such as PubMed will point users to the retractions, but not everyone will immediately notice the link. And what of the numerous articles which cite the erroneous papers and so draw erroneous conclusions? In theory, it should be possible to flag online articles, group them together and even withdraw a whole bunch of them – perhaps to a “rejects” folder, if we have agreed standards for markup and electronic publication. Not so easy to erase articles in print from the collective memory.

Crusty professors
Perhaps the aspect of this story that least impressed me was a letter in Science entitled Pretty Structures, But What About the Data? The author tries to point out that it’s important to understand how software works, distinguish between data and models and take note of data from all sources, including what he terms “good old-fashioned biochemistry”. All good points and fair enough, but phrases such as “lessons that we aging baby boomer professors ram down the throats of our proteomically aroused graduate students” don’t elicit waves of empathy from me.

He should try to understand why young scientists are excited about techniques such as proteomics. It’s because in contrast to the old-fashioned approach of learning each and every arcane aspect of some niche model system, we can now combine computing and high-throughput techniques to learn a lot more a lot faster. And you know, if our senior colleagues showed some interest in the importance of computer literacy, perhaps there would be less chance that young researchers, struggling alone in their department to develop software, would make mistakes.

Further reading
Now go and read Bill Hooker’s essay: The Future of Science is Open, Part 3: An Open Science World. I’m done banging my head against this brick wall.

Sensing a trend

In general, I don’t find the Careers section in Nature very helpful. It often features rather general, obvious advice such as “acquiring new skills is good”, then fails to expand further. Once in a while though, there is a longer, more helpful article and these are more popular, judging by the hits. The top two articles for last December are:

Should I stay or should I go?
Gut check time: should you stay in academia, on the bench or even quit science?

How to ask yourself questions about major career decisions.

Which says rather a lot, I think.

It’s good to know you’re not alone

My admiration for BMC Bioinformatics knows no bounds:

Publishing perishing? Towards tomorrow’s information architecture

From the abstract:

Although the Internet has revolutionized the way our society thinks about information, the traditional text-based framework of the scientific article remains largely unchanged.

To truly integrate scientific information we must modernize academic publishing to exploit the power of the Internet.

And later in the article:

Scientific contribution should not be measured solely by journal publications.

They also mention the role of funding agencies in promoting publication avenues other than the traditional journal article. I’ve come to think that a lot of the fault in academia lies with funding agencies. In Australia for instance, the journal article is the sole currency of any worth when it comes to applying for grants or seeking permanent positions. The agencies set the rules; what can the researchers do but follow them if they want to succeed?

I spent much of the past week feeling depressed about how slowly internet innovation trickles into the academic consciousness. This article cheered me considerably.


My new favourite Firefox add-on is Fotofox. From a sidebar in your browser you can select images from disk, add titles, descriptions and tags and upload to a Flickr set. Works nicely for me.

I’m beginning to enjoy the Flickr experience. If you’re used to running your own server and rolling your own solutions, there’s a process of mental adjustment before you’re comfortable with hosted community websites. Often all it takes is the right toolkit at your end. For images I find Picasa is excellent for organisation, viewing, simple manipulation and export, though the Linux version has some minor issues to iron out. You can’t go past GIMP for more complex manipulation. When you’re happy with your collection, select using Fotofox and off they go to Flickr where you can mess with organisation, sharing, geotagging and the rest.

What’s an expert?

Here’s a fun story via Improbable Research. Harry Collins is a social scientist who studies physicists – in particular, physicists who are looking for gravitational waves. He’s been doing this for 30 years and as you might imagine, has learned a lot about the field of gravitational waves along the way.

So much so in fact, that in a blind test, gravitational wave physicists were convinced that he was one of their own. Which has sparked some debate as to the question “what’s an expert?” What’s the difference between simulated understanding and real understanding? The Guardian article is not very good but makes the point that in general, real understanding allows you to make a genuine contribution to a field of study. A commenter defines an expert as the winner of a popularity contest, which I like.

What we’re seeing here is another step away from old ideas about academic fields, access to ideas, information and knowledge, where “laypeople” look to “experts” for guidance. I have no problem with self-educated amateurs being able to contribute to debate. It’s all part of the democratisation of information, for which we have the WWW to thank.

Comet McNaught update

Still bright last night, with a clearly visible tail. I had to get a better view and image than the last attempt, so I wandered up the road to Dutton Park, which has clear views west and is reasonably dark.
Image at left links to Flickr, which I’ve resolved to make better use of this year (with thanks to Duncan for spurring me on).

Comet McNaught

Each night this week around sunset I’ve left the house, wandered up the street, squinted in disgust at the clouds on the western horizon and gone back inside. Last night, we finally got lucky with the weather.

The visual experience is a good deal more impressive than these pictures suggest. I managed to choose a location with the highest concentration of suspended power lines in Brisbane, balanced the camera on a wheelie bin in the middle of the road and hoped for the best.

The first shot includes a passing car for dramatic effect. The second shows the tail quite nicely, if you can ignore the cables and the glaring street lamps. The third is a cropped version of the first.

All in all, not a great photographic experience, which only enhanced my desire to live in the country! Still, I can say that I witnessed the great comet of 2007 and captured it for posterity.

Wikipedia has put together a very nice Comet McNaught entry with plenty of useful links and a great image gallery. Also try Comet McNaught as a Flickr tag.