Boring, monotonous day-to-day tasks? That’s synonymous with bioinformatics.

In response to this question, I can only point out that J.C.R. Licklider figured it out over 50 years ago:

Despite the fact that there is a voluminous literature on thinking and problem solving, including intensive case-history studies of the process of invention, I could find nothing comparable to a time-and-motion-study analysis of the mental work of a person engaged in a scientific or technical enterprise. In the spring and summer of 1957, therefore, I tried to keep track of what one moderately technical person actually did during the hours he regarded as devoted to work. Although I was aware of the inadequacy of the sampling, I served as my own subject.

It soon became apparent that the main thing I did was to keep records, and the project would have become an infinite regress if the keeping of records had been carried through in the detail envisaged in the initial plan. It was not. Nevertheless, I obtained a picture of my activities that gave me pause. Perhaps my spectrum is not typical–I hope it is not, but I fear it is.

About 85 per cent of my “thinking” time was spent getting into a position to think, to make a decision, to learn something I needed to know. Much more time went into finding or obtaining information than into digesting it. Hours went into the plotting of graphs, and other hours into instructing an assistant how to plot. When the graphs were finished, the relations were obvious at once, but the plotting had to be done in order to make them so. At one point, it was necessary to compare six experimental determinations of a function relating speech-intelligibility to speech-to-noise ratio. No two experimenters had used the same definition or measure of speech-to-noise ratio. Several hours of calculating were required to get the data into comparable form. When they were in comparable form, it took only a few seconds to determine what I needed to know.

Throughout the period I examined, in short, my “thinking” time was devoted mainly to activities that were essentially clerical or mechanical: searching, calculating, plotting, transforming, determining the logical or dynamic consequences of a set of assumptions or hypotheses, preparing the way for a decision or an insight. Moreover, my choices of what to attempt and what not to attempt were determined to an embarrassingly great extent by considerations of clerical feasibility, not intellectual capability.

I give up

It’s what – 10 years or more? – since we began to wonder when web technologies such as RSS, wikis and social bookmarking sites would be widely adopted by most working scientists, to further their productivity.

The email I received today, which began “I’ve read 3 interesting papers” and had 1 .doc, 3 .docx and 4 .pdf files attached, suggests the answer to this question: “not any time soon.”

I’ve given up trying to educate colleagues in best practices. Clearly, I’m the one with the problem, since this is completely normal, acceptable behaviour for practically everyone that I’ve ever worked with. Instead, I’m just waiting for them to retire (or die). I reckon most senior scientists (and they’re the ones running the show) are currently aged 45-55. So it’s going to be 10-20 years before things improve.

Until then, I’ll just have to keep deleting your emails. Sorry.

What I learned this week about: productivity, MongoDB and nginx

It was an excellent week. It was also a week in which I paid much less attention than usual to Twitter and FriendFeed. Something to think about…

I’m now using MongoDB so frequently that I’d like the database server to start up on boot and stay running. There are 2 init scripts for Debian/Ubuntu in this Google Groups thread. I followed the instructions in the second post, with minor modifications to the script:

# I like to keep mongodb in /opt
# you don't need 'run' in these options
DAEMON_OPTS='--dbpath /data/db'
# I prefer lower-case name for the PID file
# I think /data/db should be owned by the mongodb user
sudo chown -R mongodb /data/db
# don't forget to make the script executable, then register it with update-rc.d
sudo chmod +x /etc/init.d/mongodb
sudo /usr/sbin/update-rc.d mongodb defaults
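If the mongodb user does not own /data/db, a mongod started under that user won't be able to write there. An init script could catch this early with a small preflight check before starting the daemon. Here is a minimal sketch, assuming a Linux system with GNU coreutils (`stat -c %U`). The function name `dbpath_owned_by` is hypothetical and does not appear in either script from the thread:

```shell
# Hypothetical preflight helper (not part of the scripts in the thread).
# Succeeds only if directory $1 exists and is owned by user $2.
# Assumes GNU coreutils: `stat -c %U` prints the owner's user name.
dbpath_owned_by() {
    dir=$1
    user=$2
    [ -d "$dir" ] || return 1
    [ "$(stat -c %U "$dir")" = "$user" ]
}

# Example use inside an init script, before starting mongod:
# if ! dbpath_owned_by /data/db mongodb; then
#     echo "/data/db must be owned by the mongodb user" >&2
#     exit 1
# fi
```

This checks ownership rather than writability to match the chown step above. `[ -w dir ]` would only test write access for whoever runs the check, usually root in an init script.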

nginx (and Rails and mod_rails)
This week, I deployed my first working Rails application to a remote server. It’s nothing fancy – just a database front-end that some colleagues need to access. I went from no knowledge to deployment in half a day, thanks largely to this article, describing how to make nginx work with Phusion Passenger.

Along the way, I encountered a common problem: how do you serve multiple applications, each living in a subdirectory?

# You have:
# You want these URLs to work:

Here’s how to do that:

# Make sure the Rails public/ directories are owned by the web server user:
sudo chown -R www-data:www-data /var/www/myapp1/public /var/www/myapp2/public
# Make symbolic links from public/ to the location which will be the web server document root
sudo ln -s /var/www/myapp1/public /var/www/rails/app1
sudo ln -s /var/www/myapp2/public /var/www/rails/app2
# These are the key lines in the server section of nginx.conf:
server {
        listen       8080;
        root /var/www/rails;
        passenger_enabled on;
        passenger_base_uri /app1;
        passenger_base_uri /app2;
}
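With more than two applications, the symlink and passenger_base_uri steps repeat for each one. Below is a minimal sh sketch that creates the links and prints the matching directives, ready to paste into the server block. The function name `link_rails_apps` and its APPDIR:URI argument format are my own invention, and it assumes each application directory contains a public/ subdirectory:

```shell
# Hypothetical helper: link each app's public/ directory into a shared
# document root and print the matching passenger_base_uri directive.
# Usage: link_rails_apps DOCROOT APPDIR:URI [APPDIR:URI ...]
link_rails_apps() {
    docroot=$1
    shift
    mkdir -p "$docroot"
    for spec in "$@"; do
        appdir=${spec%%:*}   # text before the first colon
        uri=${spec#*:}       # text after the first colon
        # -s: symbolic link; -f: replace an existing link;
        # -n: don't descend into an existing link that points to a directory
        ln -sfn "$appdir/public" "$docroot/$uri"
        printf '        passenger_base_uri /%s;\n' "$uri"
    done
}

# Example (run as a user who can write to /var/www):
# link_rails_apps /var/www/rails /var/www/myapp1:app1 /var/www/myapp2:app2
```

Because of `ln -sfn`, re-running the function replaces existing links rather than nesting new ones inside them. It does not change ownership, so the chown step is still needed.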

That’s me for the week.

Data capture versus data archiving

The commonest complaint that I hear whenever electronic lab notebooks (ELNs) or laboratory information management systems (LIMS) are discussed is that they double the workload. People who work in labs enjoy the convenience of their paper notebooks. They perform an action or a process occurs – they write a note. A machine generates a photo – they tear it off and paste it in. Transferring that information to a digital archive is a pain: they have to sit down at a computer with their lab book, scan and upload images, enter text into form fields and so on.

I sympathise, absolutely. At present, data capture and data archiving are, for most people, disconnected processes. Their only comfort is that smart people are working on these problems. One day, laboratory equipment will emit data in machine-readable format directly to digital archives, lab members will carry PDA-like devices, and note-taking as we know it will become a relic of the past. That day is some way off, but it will come.

As to why they should invest time in archiving data – just answer these questions:

  • Is your paper notebook searchable?
  • Can other people use old records from your paper notebook to do anything practical?
  • For that matter, can you?
  • Imagine that you have just moved to a new lab and none of your predecessors, now moved on and not contactable, left any record of their activity – how would you feel?

These are questions for individuals, but I feel they are also questions for a training system (academia) that encourages individual prowess over community spirit.

Bioinformaticians in the service of bench biologists

Stumbled out of bed to the feed reader and came close to spraying cereal over the screen when I read this exchange on a Nature Network blog:

Original post:

Like them or loathe them, it’s not really possible to analyze a genome-wide screen without a large number of [Excel spreadsheets]

Comment #1 from our Pierre:

Oh please, please, please, no, don’t [do] that with excel, please

He’s quite right, of course. Unfortunately, the ensuing debate is heading down a familiar track: “that’s all very well for you hardcore computer types, but we’re just simple bench biologists”.

Well look – a lot of us “computer types” were, or are, bench biologists too. We weren’t born with magical computer skills, nor did we learn them overnight. We know what we know and recommend it to others not out of geekiness or snobbery, but because we believe that if there’s a better way to perform a task, we owe it to ourselves to learn it. If others can’t make that commitment, we’re more than happy to help out and share what we’ve learned.

Just be prepared to meet us half-way, OK?


In yet another moment of BBGM synchronicity, I started thinking about lifestreaming and its applications just as Deepak was writing about it. My inspiration was the recent article “35 ways to stream your life”.

I’ve tried (and you can find me at):

  • Mugshot – aggregates a limited number of sources, doesn’t seem to update properly from, has conversation features (quips, comments)
  • FriendFeed – nice look and feel, a limited number of sources, has conversation features (comments, ratings)
  • Profilactic – by far my favourite in terms of look/feel and sources (you can add anything that has a feed) but no conversations as yet

Lifestreams are fun. I don’t expect anyone to care about what I just played on (and likewise), but these are all ways of broadcasting yourself and making connections. Read Deepak’s post for some thoughts on how this might apply to science.

Here’s a crazy idea – the workstream:

  • Neil parsed SwissProt entry Q38897 using parser script
  • Bob calculated all intersubunit contacts in PDB entry 2jdq using CCP4 package contact
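A workstream could start as nothing more than an append-only log of who did what, which a feed generator could later turn into RSS. Below is a minimal sketch in sh, assuming a tab-separated file with a UTC timestamp, a name and a free-text action on each line. The function name `log_work` and the default file name workstream.tsv are hypothetical:

```shell
# Hypothetical workstream logger: one tab-separated line per action
# (UTC timestamp, who, what). Override the file by setting WORKSTREAM_LOG.
WORKSTREAM_LOG=${WORKSTREAM_LOG:-workstream.tsv}

log_work() {
    who=$1
    shift
    # "$*" joins the remaining arguments with single spaces
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$who" "$*" >> "$WORKSTREAM_LOG"
}

# Entries matching the examples above:
# log_work Neil "parsed SwissProt entry Q38897 using parser script"
# log_work Bob "calculated all intersubunit contacts in PDB entry 2jdq using CCP4 package contact"
```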


Can every workflow be automated?

Some random thoughts for a Friday afternoon.

Many excellent posts by Deepak on the topic of workflows have got me thinking about the subject. I very much like the notion that all analysis in computational biology should be automated and repeatable, so far as is practicable. However, I’ve not yet experienced a “workflow epiphany”. There are some impressive and interesting projects around, notably Taverna and myExperiment, but I see these as prototypes and testbeds for how the future might look, rather than polished solutions usable by the “average researcher”.

I also can never quite escape the feeling that this type of workflow doesn’t describe how many researchers go about their business, at least in academia. Wrong directions, dead ends, trial and error, bad decisions. To me a workflow is rather like a scientific paper: an artificial summary of your work that you put together at the end, describing an imaginary path from starting point to destination that you couldn’t know you were going to follow when you set out. Useful for others who want to follow the same path, less so for the person blazing the trail. Is this in fact the primary purpose of a workflow? To allow others to follow the same path, rather than to plan your own?

I wonder in particular about operations where manual intervention and decision-making are required. In structural biology, for instance, I often see my coworkers doing something like this:

  • Open experimental data (e.g. electron density) in a GUI-based application
  • “Fiddle” with it until it “looks right”
  • Save output

How do you automate that middle step? It may be that the operation is described using parameters which can be saved and run again later, but a lot of science seems to rely on a human decision as to whether something is “sensible”.
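For the scripted parts of an analysis, the “saved parameters” idea can be as simple as writing each command line to a file before running it, so that the file itself becomes a replayable record. Below is a minimal sh sketch. The function `run_logged` and the log name commands.log are hypothetical. It records only what is passed on the command line, so the interactive “fiddling” step stays uncaptured:

```shell
# Hypothetical wrapper: append the exact command to $RUNLOG, then run it.
RUNLOG=${RUNLOG:-commands.log}

run_logged() {
    line=
    for arg in "$@"; do
        # Single-quote each argument, escaping embedded quotes as '\''
        quoted=$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")
        line="$line '$quoted'"
    done
    printf '%s\n' "${line# }" >> "$RUNLOG"   # drop the leading space
    "$@"
}

# Replay every recorded step later with:
# sh "$RUNLOG"
```

Quoting every argument means file names with spaces or apostrophes replay exactly as they were typed.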

I don’t know if we can capture everything that we do in a form that a machine can run. Perhaps workflows highlight for us the difference between research and analysis: a creative thought process versus a set of algorithms.

Think Free – then think again

ThinkFree is an online office productivity application in the same style as Google Docs and Zoho. I read a favourable review and was all set to try it out…

…when I discovered that for those of us in Australia or New Zealand, ThinkFree is distributed exclusively via BigPond – the internet service provided by Telstra, our largest telco. Which means: not a BigPond customer, no access to ThinkFree.

We have a saying in Australia: all telcos are bastards. Now you know why.

When it’s obvious to you but not to…anyone else

Do you know this feeling? You’ve been trialling a software package or online service for years. You think it’s great, and so do your online community friends, so you finally decide to share the love with your work colleagues. As soon as you do, they discover a usage issue that you’ve never even thought about. It completely ruins the experience for them and makes your beloved application look like a piece of crap.

This keeps happening to me with Google and a large part of the problem concerns email addresses and Google accounts.

OpenOffice Google Docs extension

More reasons to use both Google Docs and OpenOffice: the OpenOffice.org2GoogleDocs extension. It lets you upload from OO to Google Docs or download from Google Docs to OO. It requires a Java 6 JRE, correctly set up in OO.

Without realising it, I’ve become a big user of Google Docs, mostly by importing GMail attachments. On rare occasions I find the 500 KB maximum file size to be a problem – I’d suggest 1 MB is a more sensible maximum – but it does encourage people to keep embedded content out of their text documents.

On the topic of word processor alternatives, AbiWord has come a long way and now boasts a seriously useful plugin list.