All kinds of bioinformatics

If you haven’t already, go and read Mike’s amusing and pertinent post, World of Bioinformatics Quest: Character generation. Which one are you?

I often think that in academic research at least, there are 3 types of bioinformatics:

  1. Bioinformatics that provides insight into biological systems
    The ideal case being that you make a computational prediction which is then confirmed experimentally. Requires close collaboration between you and wet lab colleagues. By far the rarest category.
  2. Bioinformatics that provides insight into biological data
    An example might be a statistical analysis of the PDB to identify factors common to protein chains that interact. Often useful and may overlap with type (1) in the best cases.
  3. Bioinformatics that develops an algorithm or statistical procedure, but provides no insight into biology whatsoever
    By far the commonest category and the most prevalent in the bioinformatics literature. Normally takes the form: (a) amass some variables, (b) build an SVM, (c) run 10-fold cross-validation, (d) report sensitivity, specificity, accuracy etc. Leading to the imminent death of bioinformatics as a respected research discipline. Largely responsible for the divide between bioinformaticians and bench scientists.
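For anyone who has not met the type (3) recipe, here is a minimal sketch of steps (c) and (d) in plain Python. It is only an illustration under stated assumptions: the function names are made up for this example, and `fit_threshold` is a hypothetical one-feature stand-in for the SVM in step (b), not a real SVM. It also assumes every training split contains both classes.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }

def fit_threshold(xs, ys):
    """Hypothetical stand-in for the SVM: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_validate(xs, ys, k=10, seed=0):
    """Pool the held-out predictions over k folds, then score them once."""
    y_pred = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        cut = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            y_pred[i] = 1 if xs[i] > cut else 0
    return confusion_metrics(ys, y_pred)

# Toy data: one made-up feature, ten negatives followed by ten positives.
xs = list(range(20))
ys = [0] * 10 + [1] * 10
print(cross_validate(xs, ys, k=10))
```

The three numbers that come out say how well the classifier separates the two labels on this data set, and nothing about the biology behind the labels, which is rather the point of category (3).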

15 thoughts on “All kinds of bioinformatics”

  1. 4. Bioinformatics that doesn’t even create a new algorithm/procedure as in 3., but just reports the (not very interesting) results of existing programs. Lately, I’ve been asked to review a lot of these. Sometimes I feel sort of bad rejecting them because they often come out of third-world countries, but really, I have to wonder why the authors bothered to write them.

  2. Jonathan, don’t feel bad about it; just make sure to write up in your review report what they are doing wrong, so that they learn how to do it properly.

    Regarding the three/four categories… these are not categories of bioinformatics; they are degrees of scientific soundness. The first case shows a situation where a hypothesis is made, and tested. BTW, nothing wrong with just proposing a hypothesis! Einstein went ahead, and it took years and years before his work could be tested experimentally!

    Even worse, I do not feel you chose your examples well. Model building (e.g. with SVM) commonly has two purposes: data exploration/visualization on the one hand, and pure prediction on the other, to allow virtual screenings of various kinds.

    We reviewed the ENCODE-What’s-a-Gene article yesterday, and here a similar thing happened, though not even close to using SVMs: they propose a filter (‘gene’) and make up some story on how that allows filtering out non-functional genetic code.

    And that’s what the real problem is, and unfortunately applies to all categories: biologists make up stories, matching what they think is happening. New ‘insights’… but rarely backed up by clear experimental evidence.

    Surely, many people do not understand how to use modeling methods… but please do not mix that up with the ability to use it to provide arguments for your hypothesis.

  3. Mmm, debate :)

    I’ll let this one run for a while and see if anyone else has some thoughts.

    Egon – all good points and I don’t disagree with any of them. However, the merits of modelling and the art of hypothesis building are not the reasons for this post. I’m more concerned with whether the way academia is structured is forcing bioinformatics research in a particular (and undesirable) direction. I believe that it is and hopefully will articulate why I think so at some point!

  4. Egon: SVM: I think SVMs are very bad models. You (usually) cannot interpret the results. I like e.g. decision trees followed by a discussion WHY the tree looks like the one that was found.

  5. Oh, I certainly agree that comparative benchmarks are worthwhile; it’s valuable to know if a program you use is really the best or if you are just using it out of inertia because it’s what you learned to use in grad school. But in these cases the papers have test data with the correct answers known.

    The papers I was talking about simply run a bunch of programs on data where the answers aren’t known and then report the results uncritically, or claim that program X is better because “it found more genes” without any analysis of whether the additional genes are false positives or not.

  6. You’ve forgotten the essential thing that item 3 may actually be used by further researchers to do 1 and 2.

  7. Anonymous – I have not forgotten that at all. The question is – are the results of (3) trickling down to (1) and (2) as well as they could? And in academia, is there too much emphasis on (3)? I contend no and yes, respectively.

  8. I am curious to hear more. Surely it is wrong to categorise bioinformatics so narrowly and is it necessary that an algorithm provides a direct insight into biology? Surely no one algorithm can provide such an insight and ultimately the hope is that algorithms are inspired by a problem and will eventually be used within a general solution to some biological question, thereby providing insight? Also, is there some reason why a distinction is made between academic and industrial bioinformatics? Again, just curious….

  9. Well, I just saw an announcement for a celebration of GenBank’s 25th anniversary on the NCBI home page. I did my first database search circa 1983 when there were only 2400 sequences in GenBank.

  10. Martino – glad you’re curious! I will have to post on this topic soon, I can see. My bioinformatics categories are a little “tongue in cheek”, in case you (or anyone else) were taking them too seriously ;)

    And yes, there is a reason why I singled out academic bioinformatics research. Stay tuned…

