Two related stories

  1. Bosco wonders whether “read the code and you’ll get it” is really an adequate description of a file format
  2. In the much-neglected Source Code for Biology and Medicine: Bioinformatics Computational Journal (BCJ) – a framework for conducting and managing computational experiments

I like the concept of workflows – really, I do – and I understand that they are used widely in industry: biotech, pharma, drug design and so on. But I predict that they will never find wide application in academic biological sciences research. Why? Because in my experience it’s essentially impossible to convince biologists that things like standards, file formats, appropriate software tools, clean code and logical organisation of computational data are important. Let me give you a typical example of a “bioinformatics problem” in academia:

Dear Neil,
Here are the sequences that you asked for. They are in fasta format, except that I’ve marked the acetylation sites with a “*” and after that, a score in square brackets.

Gee thanks – oh, it’s a Word file too, better and better. Taking my cue from Rosie, I give you the Saunders principle:

The first step in any collaboration is to reformat the data sent by your collaborators.
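For the record, here's roughly what applying the Saunders principle to that email might look like. This is a minimal Python sketch, and it rests on assumptions the email doesn't settle. It assumes that each "*" immediately follows the acetylated residue, that the bracketed score is a plain number, and that the sequences have already been rescued from the Word file as plain text. The function names are hypothetical. It splits each record into an ordinary FASTA sequence and a separate list of (position, score) sites. That split matters because "*" already means something in protein FASTA: it marks a translation stop.

```python
import re
from typing import Iterator, List, Tuple

# Assumption: a site is written as residue + "*" + "[score]", e.g. "K*[0.91]".
SITE = re.compile(r"\*\[([^\]]*)\]")

Record = Tuple[str, str, List[Tuple[int, float]]]


def _clean(header: str, raw: str) -> Record:
    """Strip "*[score]" marks from a sequence; return 1-based site positions."""
    parts: List[str] = []
    sites: List[Tuple[int, float]] = []
    length = 0  # residues emitted into the clean sequence so far
    last = 0
    for m in SITE.finditer(raw):
        segment = raw[last:m.start()]
        parts.append(segment)
        length += len(segment)
        # The marked residue is the one just before "*", i.e. position `length`.
        # float() raises ValueError if the bracketed score is not numeric.
        sites.append((length, float(m.group(1))))
        last = m.end()
    parts.append(raw[last:])
    return header, "".join(parts), sites


def parse_annotated_fasta(text: str) -> Iterator[Record]:
    """Yield (header, clean_sequence, sites) for each record in the text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _clean(header, "".join(chunks))
            header, chunks = line[1:], []
        else:
            # Join wrapped lines first, so a mark split across lines still matches.
            chunks.append(line)
    if header is not None:
        yield _clean(header, "".join(chunks))


if __name__ == "__main__":
    example = ">prot1 hypothetical\nMSK*[0.91]LLEA\nGK*[0.45]T\n"
    for hdr, seq, marks in parse_annotated_fasta(example):
        print(f">{hdr}\n{seq}")  # clean FASTA
        for pos, score in marks:
            print(f"{hdr.split()[0]}\t{pos}\t{score}")  # tab-separated site table
```

On the example record this gives the sequence `MSKLLEAGKT` with sites at positions 3 and 9, both lysines. A bare "*" with no bracketed score is left in the sequence untouched, where any downstream tool will read it as a stop.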