Real bioinformaticians write code

A lot of questions at BioStar begin along these lines:

Where can I find…?
I am looking for a resource…?
Is there some database…?

I tweeted some concerns about this:

Many #biostar questions begin “I am looking for a resource..”. The answer is often that you need to code a solution using the data you have.

Chris tweeted back:

@neilfws Lit. or Google search is first step, asking around is the next logical step. (Re-)inventing wheels is last. Worth asking, IMHO.

We had a little chat and I realised that 140 characters or less was not getting my point across (not for the first time). What I was trying to say was something like this.

Chris is quite right: searching, then asking for existing solutions, is the correct first approach. However, the tone of certain questions makes it sound as though some people believe there must be a ready-made resource for any given situation, or for their exact circumstances. For example, a moment's thought would make it clear that you are unlikely to find, just lying around on the Web:

  • A list of DNA sequence accessions for a gene from your list of organisms
  • A set of secondary structure predictions for your list of proteins

What you will find on the Web are larger datasets from which you can extract your subset of interest – and the tools to do it. In the examples above, this entails:

  • Retrieving identifiers for your organisms from a taxonomy database, linking them to identifiers of DNA sequences and filtering for your gene
  • Retrieving the sequence of your proteins, performing secondary structure prediction either locally or remotely and parsing the results
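As a rough illustration of the first workflow, here is a minimal Python sketch against NCBI's E-utilities, using only the standard library. It is one way to do it, not the only one, and it makes some assumptions: that NCBI Taxonomy and Nucleotide (nuccore) are the right sources for your question, that the first Taxonomy hit for each organism name is the one you want, and that the `[Organism:exp]` and `[Gene]` search field tags are precise enough for your gene. The function names are made up for this example.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL for NCBI E-utilities
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 100) -> str:
    """Build an esearch URL that asks for a JSON response."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_idlist(json_text: str) -> list[str]:
    """Pull the list of UIDs out of an esearch JSON response."""
    return json.loads(json_text)["esearchresult"]["idlist"]


def esearch(db: str, term: str) -> list[str]:
    """Run one esearch query over the network and return its UIDs."""
    with urlopen(esearch_url(db, term)) as response:
        ids = parse_idlist(response.read().decode("utf-8"))
    time.sleep(0.4)  # stay under NCBI's ~3 requests/second limit without an API key
    return ids


def nucleotide_term(taxid: str, gene: str) -> str:
    """Restrict a nucleotide search to one taxon (plus descendants) and one gene name."""
    return f"txid{taxid}[Organism:exp] AND {gene}[Gene]"


def nucleotide_uids_for(organisms: list[str], gene: str) -> dict[str, list[str]]:
    """Organism name -> Taxonomy ID, then Taxonomy ID + gene -> nuccore UIDs."""
    results = {}
    for name in organisms:
        taxids = esearch("taxonomy", f'"{name}"')
        if not taxids:
            results[name] = []  # name not found in Taxonomy; check spelling or synonyms
            continue
        # Assumption: the first Taxonomy hit is the intended organism
        results[name] = esearch("nuccore", nucleotide_term(taxids[0], gene))
    return results
```

Calling `nucleotide_uids_for(["Homo sapiens", "Mus musculus"], "BRCA1")` needs network access and returns numeric UIDs rather than accessions; converting those to accession.version strings is one more E-utilities call (efetch with `rettype=acc`). Deciding which of the hits you actually want is where your own judgement, and more code, comes in.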

In other words: know the data sources, know the right tools and you can always sculpt a solution for your own situation.

Good web search skills are an essential part of the bioinformatics toolkit, but they don’t define the job. Real bioinformaticians write code.

13 thoughts on “Real bioinformaticians write code”

  1. I have to say that whilst I find the Q&A sites an extremely useful resource, the huge number of questions posed that could be answered easily with a Google search and a little common sense is becoming distressing.

    I have found I am less willing to answer questions because of the recurrence of the same types of problems (“how do I get coordinates of my SNPs from a BAM file, etc. etc.”)

    “Teach a man to fish” etc.

    • This can be frustrating. I think the BioStar community has recently become a lot better at dealing with the trivial or irrelevant questions – basically, by closing them if questioners fail to respond to requests for improving the question. It takes a lot of effort though.

  2. It’s like anything else – half the battle is in knowing the right questions to ask.

    The other half is knowing what to do with the answer.

    I think Q&A sites, especially the friendly folks on BioStar, are a good way to get started.

  3. And questions about software errors or code usually get relegated to second place. It's a good thing I left BioStar some time ago; now I only check the main page sporadically.

    • I hope that BioStar has helped to promote BioMart; a lot of answers point users to that resource. It’s one of the best tools for those “given a list of X, return a list of Y” queries that are so common.

  6. I have to say I’m in total agreement with pretty much everything you wrote, though I might have a couple of caveats :).

    Yes, real bioinformaticians write code (thank goodness), but the bulk of research biologists probably have to rely on existing tools and databases (thanks to bioinformaticians).

    Though I think Google is an excellent place to start, in my personal and professional experience it answers the question only a minority of the time, and finding the answer can be quite frustrating.

    So, I agree with this statement wholeheartedly: “know the data sources, know the right tools and you can always sculpt a solution for your own situation.”

    BioMart, UCSC Genome Browser, Galaxy, etc., etc. are excellent tools and data sources and could probably answer about 80% of the questions posed :). But my caveat would be that getting to know the data sources and the right tools can be a daunting task.

  7. Not like you’re being provocative or anything :)

    I guess you could also say that real scientists do experiments; this is really what is at issue here: the unwillingness to explore a problem deeply. Everything already has an answer, right?

    As for reinventing the wheel: in general it is a bad idea, and asking around first is smart. But I would argue that you can’t really contribute until you understand an existing solution, and to do that you often have to re-implement it. It is a learning/discovery process.

