Two more signs

  1. The blogosphere is alight with the announcement from Science Commons of its Protocol for Implementing Open Access Data. This Technorati search with keywords “science commons open data” throws up 528 posts, many of which are relevant. I suggest that you also follow developments via Deepak’s blog and the links therein.

    Now, this is great news for those of us who care about all things open (open access, “open science”, open data) and for those who follow developments in web technology. However, we need to make it relevant and accessible to the people who matter: interested research scientists asking “how can I make my data more accessible?” Those people are unlikely to subscribe to the W3C mailing list. So rise up, blogging community: start writing short, clear, informative posts in non-specialist language that explain to a bench scientist why they should care about this protocol and what they need to know about it.

  2. John Hawks has had a busy week following his major publication on accelerated evolution. He writes:

What I most want to point out is that the discussion on blogs is at a very high level — people are reading the paper with much more precision than I have ever experienced in the peer review process

If I could have one wish come true in 2008, it would be for more scientists in academia to realise that so-called “non-traditional” modes of publishing, debate and communication are of equal value (for me personally, higher value) to the old ways that they insist on defending, regardless of the evident flaws in those ways.

The Web as science communication platform: two more signs

  1. People are finding many outlets for their work. Pierre maintains a repository of tools where you can find IBDStatus, his latest software for genetic analysis.
  2. Spotted in Nature this week:


Makes perfect sense, doesn’t it? If you publish an article on a structure, include a link to the PDB resource. Yet as far as I can tell this is a new feature; at least, it is the first time one has jumped out at me. Given that hyperlinks connecting articles to their data make the WWW such a rich publishing platform, how long before paper copies of all journals are considered quaint and obsolete?
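Linking an article to its structure really is this cheap. Here is a minimal sketch of turning a PDB identifier into a hyperlink. It assumes the classic four-character ID (a digit from 1 to 9 followed by three letters or digits) and the RCSB PDB "structure" page URL pattern; both are assumptions worth checking against current PDB documentation, and the subroutine name pdb_link is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build an HTML link for a PDB entry.
# Assumes four-character IDs and the RCSB "/structure/<ID>" URL pattern.
sub pdb_link {
    my ($id) = @_;
    die "Not a four-character PDB ID: $id\n"
        unless $id =~ /^[1-9][A-Za-z0-9]{3}$/;
    my $uc = uc $id;    # IDs are case-insensitive; display them upper-case
    return qq{<a href="https://www.rcsb.org/structure/$uc">PDB $uc</a>};
}

print pdb_link('4hhb'), "\n";    # 4HHB is haemoglobin
```

Validating the ID before building the URL means a typo in the manuscript fails loudly rather than producing a dead link.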


Bioinformaticians like tabular data: plain ASCII text delimited by tabs, commas or whatever.

open my $in, '<', $file or die "Cannot open $file: $!";
  while( my $row = <$in> ) {
    chomp $row;
    my @line = split("\t", $row);
## expect 7 fields per line ($#line is the last index) and skip the header
    if( $#line == 6 && $line[0] ne "ID" ) {
## do something with fields
    }
  }
close $in;

Bad programmer! Why not use one of the many Perl modules for handling CSV files, such as Tie::Handle::CSV?
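To see why a bare split is fragile, consider a comma-separated file whose fields contain commas inside quotes. The sketch below is a swapped-in alternative, not Tie::Handle::CSV itself: it uses Text::ParseWords, which ships with Perl, to respect double-quoted fields and map each row onto the header. The subroutine name read_table and the sample data are hypothetical. Note that parse_line also treats single quotes and backslashes as quoting characters, so for messy real-world files a dedicated CPAN module such as Tie::Handle::CSV or Text::CSV remains the safer choice.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: read delimited text with a header row and return an
# array ref of hash refs keyed by column name. parse_line() keeps a quoted
# delimiter inside its field instead of splitting on it.
sub read_table {
    my ($fh, $sep) = @_;
    my $header = <$fh>;
    return [] unless defined $header;
    chomp $header;
    my @cols = parse_line($sep, 0, $header);
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my @fields = parse_line($sep, 0, $line);
        warn "line $.: expected ", scalar @cols, " fields, got ",
            scalar @fields, "\n" unless @fields == @cols;
        my %row;
        @row{@cols} = @fields;       # hash slice: column name => value
        push @rows, \%row;
    }
    return \@rows;
}

# Sample data (made up); the quoted gene field contains a comma.
my $data = qq{ID,gene,score\n1,"BRCA1, isoform 2",0.9\n2,TP53,0.4\n};
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my $rows = read_table($fh, ',');
close $fh;
print "$_->{ID}\t$_->{gene}\t$_->{score}\n" for @$rows;
```

Addressing fields by column name rather than position also removes the magic `$#line == 6` check: if a column is added, the code keeps working.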
OpenOffice Google Docs extension

Another reason to use both Google Docs and OpenOffice: the OpenOffice.org2GoogleDocs extension, which lets you upload documents from OO to Google Docs or download them from Google Docs to OO. It requires a Java 6 runtime (JRE) that is correctly set up in OO.

Without realising it, I’ve become a big user of Google Docs, mostly by importing GMail attachments. On rare occasions I find the 500 KB maximum file size a problem (I’d suggest 1 MB is a more sensible maximum), but it does encourage people to keep embedded content out of their text documents.

On the topic of word processor alternatives, AbiWord has come a long way and now boasts a seriously useful list of plugins.

Open (notebook) science gathers momentum

Pedro has started an open science project to study domain family expansion. He’s trialling Google Code as his project repository. I think this is a great idea and a very exciting approach. If you have anything to contribute, go and check it out. While you’re there, click the bioinformatics tag to see another 54 projects at Google Code. Quite a resource, although a few are not very active.

And in open science synchronicity, David Ng publicises Rosie Redfield’s lab on BoingBoing and links to his blog post, in which she discusses the benefits of open science and blogging. The few comments so far focus on the old “but won’t we get scooped?” argument, so head over there and say something positive.

We must keep pushing the agenda – open research will be the norm one day, I’m sure of it.

Two related stories

  1. Bosco wonders whether “read the code and you’ll get it” is really an adequate description of a file format
  2. In the much-neglected journal Source Code for Biology and Medicine, a description of Bioinformatics Computational Journal (BCJ), a framework for conducting and managing computational experiments

I like the concept of workflows – really, I do – and I understand that they are used widely in industry: biotech, pharma, drug design and so on. But I predict that they will never find wide application in academic biological sciences research. Why? Because in my experience it’s essentially impossible to convince biologists that things like standards, file formats, appropriate software tools, clean code and logical organisation of computational data are important. Let me give you a typical example of a “bioinformatics problem” in academia:

Dear Neil,
Here are the sequences that you asked for. They are in fasta format, except that I’ve marked the acetylation sites with a “*” and after that, a score in square brackets.

Gee, thanks. Oh, and it’s a Word file too; better and better. Taking my cue from Rosie, I give you the Saunders principle:

The first step in any collaboration is to reformat the data sent by your collaborators.
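In that spirit, here is what the first step might look like for the sequences above, once they have been saved out of Word as plain text. This is a minimal sketch under an assumed reading of the email: each `*` sits immediately after an acetylated residue and is followed by a numeric score in square brackets (for example `MKAK*[0.92]LLSK`). The subroutine names read_fasta and parse_annotated and the example sequence are hypothetical. The script writes clean FASTA plus a tab-delimited table of site, residue and score.

```perl
use strict;
use warnings;

# Minimal FASTA reader: returns record IDs in file order and a hash of
# ID => concatenated sequence lines (annotations still embedded).
sub read_fasta {
    my ($fh) = @_;
    my (@ids, %seqs, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            push @ids, $id;
            $seqs{$id} = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;    # ignore whitespace inside sequence lines
            $seqs{$id} .= $line;
        }
    }
    return (\@ids, \%seqs);
}

# Assumed annotation format: residue letters, with "*[score]" placed right
# after each modified residue. Returns the clean sequence and a list of
# [1-based position, score] pairs; dies on anything it cannot parse.
sub parse_annotated {
    my ($raw) = @_;
    my $seq = '';
    my @sites;
    while ($raw =~ /\G(?:([A-Za-z]+)|\*\[(\d+(?:\.\d+)?)\])/gc) {
        if (defined $1) {
            $seq .= $1;
        }
        else {
            die "Site marker before any residue\n" unless length $seq;
            push @sites, [length $seq, $2 + 0];
        }
    }
    my $consumed = pos($raw) // 0;
    die "Cannot parse annotation at offset $consumed\n"
        if $consumed < length $raw;
    return ($seq, \@sites);
}

# Made-up input in the assumed format; a real file would come from disk.
my $input = ">prot1\nMKAK*[0.92]LLSK\n*[0.41]AG\n";
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
my ($ids, $seqs) = read_fasta($fh);
close $fh;

for my $id (@$ids) {
    my ($clean, $sites) = parse_annotated($seqs->{$id});
    print ">$id\n$clean\n";    # plain FASTA, usable by standard tools
    for my $site (@$sites) {
        printf "%s\t%d\t%s\t%.2f\n", $id, $site->[0],
            substr($clean, $site->[0] - 1, 1), $site->[1];
    }
}
```

The parser dies on any character it does not expect, rather than silently dropping it; with ad hoc formats, a loud failure is the only way to learn that the convention was not what you assumed.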

Bio::Blogs #17

Glancing through my November archive, I note (1) few if any bioinformatics posts and (2) few posts of any kind. Well, sometimes blogging has to take a back seat. It seems many of us are snowed under in the run-up to the holidays.

Despite this, Paulo has compiled an excellent edition of Bio::Blogs, number 17. Go there for a summary of news from the bioinformatics blogosphere in the past month. Remember, volunteers to host future editions are always welcome.