I’m a biologist Jim, not a programmer

An interview with the head of a bioinformatics software company got Tiago thinking about how much time biologists should devote to computing. Deepak also has a few ideas on the topic. This one is always a favourite in the “biologists versus bioinformaticians” debate and here’s my $0.02.

First, I can’t answer the question “how much programming should a biologist know?” It depends on the individual and the nature of their work. What I will say is that since we live in an age where the ability to acquire, process and interpret large amounts of data is an important research skill, I’d assume that any biologist with any common sense is thinking about improving their computer skills.

Second, my own experience as a biologist moving to bioinformatics has been entirely positive. I’ve learned new skills, been able to work on a wide variety of problems and improved my job prospects (if not in academic research, then elsewhere). I’ve also found that thinking about problems computationally has vastly improved my decision making, problem solving and data analysis skills. My generation of biologists were taught that biology was different to other sciences; it was fuzzy, messy, illogical, a collection of facts with no common threads. That attitude was rubbish then, it most certainly is now, and we can thank bioinformatics for helping to change it.

Third, I cannot understand the attitude of some biologists that learning computational skills “gets in the way” of their research. “I’m a biologist Jim, not a programmer”, as Bones McCoy might have said. These same people think nothing of expending time and effort to learn a new laboratory technique, because they know that it will benefit their research. Yet they see learning a computational technique as tangential, or a potential waste of time, or something that they shouldn’t have to think about because it’s just a tool, a means to an end. I just cannot see the difference between learning how to write a Perl script and learning how to purify a protein, or sequence a plasmid insert. If you need it and it benefits you, you do it.

Incidentally, my personal suspicion is that many PIs discourage their young researchers from getting into computing because they know that they will find it far more enjoyable and satisfying than lab work and stop doing experiments ;)

My fourth point is very dear to me – I just cannot stand to see my biologist colleagues performing a computational task badly. It almost causes me pain. I look at it like this: why are you spending 6 hours on a manual task (copying and pasting between a web page, Word and Excel, for instance) when you know that a Perl script of a few lines would do the same job more or less instantaneously? “Because I don’t have time to learn Perl.” Well, if it’s a task that you perform regularly on similar datasets, those hours soon add up. Essentially, you’re wasting 6 hours of your life every time you do that task inefficiently. Do it five times, you’ve wasted 30 hours. And you know, most intelligent people can learn a lot of basic scripting in 30 hours.
Time saved now is time saved later.

So I’m in full agreement with Tiago when he objects to the sentence “when biologists start asking about where they can learn to program a computer, just so they can do their job you know something is wrong”. I’d say it’s a sign that something is very much right.

10 thoughts on “I’m a biologist Jim, not a programmer”

  1. Yeah, I think Drummond is on the wrong path when he says things like “It seemed to me that these kind of basic productivity problems had been solved in other workplaces like the office — so why had they not been solved in the research laboratory?”. The difference is that there are only a few things needed in an office and they don’t change much — but the very nature of science is that it doesn’t stand still.

    And yeah, I agree with you that it is painful to see people do things manually that ought to be scripted, and you’d be surprised at how much manual work goes on even at a sequencing center where most work is technically “bioinformatics” of one form or another (few of us actually run the sequencing machines).

  2. Great post. Actually you’d be amazed at how many spreadsheets still fly around and how much time is spent just re-formatting data. Perhaps it’s the cynical side in me, but I feel that bench scientists, while being more comfortable using computers than before, are still not comfortable getting under the hood. I wonder if that will ever change. I’ve actually seen it get worse over the years.

  3. “many PIs discourage their young researchers from getting into computing because they know that they will find it far more enjoyable and satisfying than lab work”

    One huge difference which fuels this enjoyment is immediacy of satisfaction. You don’t have to wait a week for your Perl script to grow into colonies large enough to pick, or overnight for the bacteria to make enough PHP to move on to the next step.

    “Time saved now is time saved later.”

    The rub here is that you often don’t have time now. I may be able to squeeze in five twitching, screaming hours to pound that fucking Word form into shape, especially since I can get up and do the occasional bit of benchwork in between, but at no time am I permitted 30 hours away from the bench. The urgency of competition is such that I am supposed to be frantic for data at all times. It’s the difference between important (learning to script) and urgent (NEED MORE DATA NOW NOW NOW!!!!!)

    I am not disputing the long term advantages of, say, learning some scripting. I’m saying that if there’s always something URGENT on your plate, you never get around to merely important things. If I were a bit further up the foodchain I might be able to follow your logic, but as a postdoc I am simply not given the choice.

  4. Great post! I must admit that I spent the first couple of years of my graduate career thinking that good programming was essential to becoming a good biologist. Now that I have that in my back pocket, I can focus on what is important…the biology!

  5. I strongly support the idea that biologists should learn computer programming, which should include not only scripting languages but also languages such as C, C#, Java, Visual Basic and database SQL, to name a few. Computers are an inherent part of laboratory instrument control and data analysis, which biologists use every day. By learning programming they will be more productive directly, since it will enable them to understand the tool better (no longer will it be magic) and to perform simple analysis, as was mentioned, in a shorter time frame. For more complex analysis they will be able to describe their request using the language constructs of the IT/Software engineer; that is, communication will be greatly improved in both directions. They will also be able to organize and/or structure data better for computer analysis. Biologists that I have taught Relational Database (RDB) architecture to were excited because they learned a new way to view, organize and convey their data and results without actually building a physical database. It also forced them to always think about data linkages and what data to gather and in what form, so that analysis routines would provide them with results from a hierarchy of questions based on results from previous levels. Today’s biologists should be thinking with a mindset of datamining their electronic notebooks and integrating them with other streams of information and data. Journal articles are littered with results that cannot be compared due to ambiguities in nomenclature and incomplete experimental/analysis data parameters; therefore, knowing programming may help in this area as well.

  6. And this is why, if ‘biologists’ are not to have to waste all their time and the time of bioinformaticians, we need SHARABLE, RE-USABLE, HIGH-LEVEL workflows that even a non-programming biologist can run and understand. Yes, it needs a programmer to make the first one, but after that, if the tasks ARE repetitive as is suggested, the tasks amount to little more than changing a file name, and if you can manage to submit a PAPER to a journal nowadays you ought to be able to repeat rather complex kinds of workflow given a template (e.g. in Taverna) and share them in myExperiment. THEN the biologist can BOTH do the necessary bioinformatics AND concentrate on the biology with a minimum of disruption. Once hooked they’ll be able, with a little help, to edit and do more complex workflow tasks.

  7. Hey guys,

    I am a biologist and also a lecturer in a computer science department – so it should be obvious that I support the idea that ‘biologists’ learn ‘computing skills’. In fact I have just finished a roadshow visiting ten high schools here in New Zealand telling young biologists that they *need* to keep up their computing, maths and statistics skills as well because biology is fast becoming a computational and statistical science! Maybe I wasn’t clear in my interview. What I want is better software in the biological sciences. The current software is crappy. While I very much support biologists learning computational concepts and skills, I don’t want to live in a world where every biologist has to be able to program PERL and Java and C++ just to do their job. That is stupid. Yes, biologists that wish to should upskill in computer science (and maths and statistics!), but no, we shouldn’t use that as an excuse to keep writing crappy software :-)


  8. Bill has it right. You either learn this stuff when you’re taking classes as a student, or you’ll never have the chance. The pressure to generate data NOW is too great. Maybe if there were scheduled classes or seminars that were offered at conferences people could justify taking the time, but most of the time it just isn’t seen as a good investment, and those of us whose careers depend on getting publications out and meeting grant milestones have to deal with that reality.

    I’ve wanted our group to bring in a bioinformatician, but so far there’s no enthusiasm.

  9. I think you only need to be good enough at programming to get the analysis done. Knowing object orientated design patterns is useful if you’re creating commercial applications that need to be maintained, but is overkill for day-to-day bioinformatics, which is essentially data manipulation. But on the other hand, if you don’t know enough programming, this will be a hindrance, as it takes longer to do what you need to.

    I was lucky enough to do a strongly computational masters degree, and so have never really had too many problems with programming, MySQL and Linux. I would disagree with Bill though about having 30 hours to learn a programming language. I recently took three weeks off to go on a data analysis course, which I think was worth it. One of the reasons I’m in academia is because it allows a more relaxed life style compared with business; it’s certainly not for the money. If my job ever became as pressured as business, I would seriously consider going to work for an investment bank: better pay and treatment by your employer.

    As an example, spending a couple of days to learn SQL and databases will really pay off in the amount of time saved from manipulating crappy datasets.
