Monthly Archives: September 2007

Random conference thoughts

When my mind wanders during a conference talk, I often find that short sentences summarising how I feel about work come into my head. Here’s what I scribbled in my notepad during the ComBio meeting this week:

Information relevant to me is communal, not owned by individuals
I wrote that down while thinking about how biologists interact (or not) at meetings. As a computational biologist, I find that most of my day-to-day problems are programming and software issues. If I need information, I go straight to the Web. Wet-lab biologists, however, seem to get much more of their information by talking to other biologists. If you’re interested in an organism, a model system or a laboratory technique, or if you just want to get your hands on a plasmid, you talk to someone who works on it. It strikes me that many wet-lab group leaders claim some sort of ownership over the information that their lab generates, resulting in the “so-and-so is the world expert in system X, you should talk to him” mentality. On the other hand, the idea of schmoozing with “the Perl expert” is a tad silly.
That, at least, is my excuse for not networking much at biological conferences ;)

Bioinformaticians need to be free
We (or at least I) are happiest when working on a range of problems. A main project plus a bunch of fun side projects with plenty of variety is the key to a happy bioinformatician. Conversely, getting bogged down for months or years on a single project, particularly one on which you work largely alone with little external input, makes for a sad bioinformatician.
Much has been written about Google’s 20% time, where employees are encouraged to spend 20% of their time on projects that they think are fun, cool and interesting. I think this would be a great policy to implement for bioinformaticians, computational biologists and other researchers in academia.

ComBio: day 4 and wrap-up

Notes and hyperlinks from day 4, the last day of ComBio 2007.

On the whole, I found ComBio rather disappointing this year. It seemed much smaller than last year, with far less of interest to me. There were perhaps four really great plenary talks, and many of the sessions bore little relation to their titles. ComBio is supposed to be wide-ranging, but I felt that the balance between range and depth was wrong. Hopefully I can attend a more relevant bioinformatics/computational biology meeting in the near future.
On the plus side, I got to spend a week in Sydney.
Talk notes

On the road: ComBio 2007

Heading off to Sydney today for ComBio 2007. It’s the annual meeting of the Australian Society for Biochemistry and Molecular Biology and usually features a wide-ranging program including some bioinformatics, “-omics” and structural biology. I’m looking forward to a couple of talks from overseas guest Peer Bork this year. On a personal note, Sydney was home for 6 years, so I’m keen to spend some time there.

Previous experience has shown that freely available internet access of any kind is unlikely at ComBio (despite its being held in a convention centre with great wireless facilities), so blogging may be sporadic. Furthermore, as the meeting is in the heart of the Sydney CBD, we are informed that “accordingly, lunches do not need to be provided”. No wonder tourists complain that Sydney is an expensive place. I’m beginning to wonder exactly what’s covered by the registration fee!

Where N is an arbitrarily large fraction approaching one

What’s N? It’s the fraction of time that bioinformaticians spend obtaining, formatting and getting raw data ready to use, as opposed to analysing it.

There’ll be a longer post on this topic soon. Suffice it to say, I’ve spent the last month evaluating the performance of 5 predictive tools that are available on the web. To do this, a test dataset of 200 or so sequences had to be submitted to each one. Each tool generates a score for particular residues in the sequence. The final output, which is what I need for some statistical analysis, looks something like this:

P08153  114     method     61.74   0
P08153  522     method     82.10   1

where the columns are a UniProt accession number, a sequence position, the name of the tool used (method), a score and either 1 (a positive instance) or 0 (a negative instance).
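For anyone reproducing this kind of analysis, here is a minimal Python sketch of reading and writing the five-column format above (the work described in this post was done in Perl; Python is a substitute here). It assumes the columns are whitespace-separated on input and writes them back tab-delimited. The `Prediction` record and the `parse_line`/`format_line` helpers are hypothetical names, not part of any of the tools being evaluated.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One scored residue (hypothetical record type for the five columns)."""
    accession: str  # UniProt accession number, e.g. "P08153"
    position: int   # residue position in the sequence
    method: str     # name of the predictive tool
    score: float    # score reported by the tool
    label: int      # 1 = positive instance, 0 = negative instance


def parse_line(line: str) -> Prediction:
    """Parse one whitespace-delimited line into a Prediction."""
    accession, position, method, score, label = line.split()
    if label not in ("0", "1"):
        raise ValueError(f"label must be 0 or 1, got {label!r}")
    return Prediction(accession, int(position), method, float(score), int(label))


def format_line(p: Prediction) -> str:
    """Write a Prediction back out as one tab-delimited line (score to 2 d.p.)."""
    return "\t".join([p.accession, str(p.position), p.method,
                      f"{p.score:.2f}", str(p.label)])


if __name__ == "__main__":
    sample = "P08153  114     method     61.74   0\nP08153  522     method     82.10   1\n"
    for line in sample.splitlines():
        print(format_line(parse_line(line)))
```

Validating the label column on the way in is deliberate: with hand-scraped data, a stray value is much cheaper to catch at parse time than during the statistics.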

Doesn’t look too hard, does it? Except that:

  • None of the web servers provide web services or APIs
  • None of them provide standalone software for download
  • Most of them don’t generate easily parsed output (delimited plain text)
  • Most of them have limited batch upload and processing capabilities

The solution, as always, is to hack together several hundred lines of poorly written Perl (using HTML::Form in my case) to send each sequence to the server, grab the HTML that comes back, parse it and write out text files in the format shown above.
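The HTML-parsing step might look something like the following minimal sketch. It swaps in Python’s standard-library `html.parser` for the Perl/HTML::Form code described above, and it assumes a hypothetical results page in which each residue is a table row with the position in the first cell and the score in the second; every real server would need its own variant. Submitting the sequences is left out, and `ScoreTableParser` and `html_to_lines` are illustrative names only.

```python
from html.parser import HTMLParser


class ScoreTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Hypothetical layout: one residue per row, position in the first cell,
    score in the second. Assumes explicit </td> and </tr> closing tags.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # completed rows; each is a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built from <th> leave no <td> cells
                self.rows.append(self._row)
            self._row = None


def html_to_lines(html, accession, method, positives):
    """Turn one server response into tab-delimited five-column lines.

    `positives` is the set of positions known to be positive instances.
    """
    parser = ScoreTableParser()
    parser.feed(html)
    parser.close()
    lines = []
    for row in parser.rows:
        position, score = int(row[0]), float(row[1])
        label = 1 if position in positives else 0
        lines.append(f"{accession}\t{position}\t{method}\t{score:.2f}\t{label}")
    return lines


if __name__ == "__main__":
    page = """<table>
<tr><th>Position</th><th>Score</th></tr>
<tr><td>114</td><td>61.74</td></tr>
<tr><td>522</td><td>82.10</td></tr>
</table>"""
    for line in html_to_lines(page, "P08153", "method", positives={522}):
        print(line)
```

Keeping the server-specific parsing separate from the shared output writer means only the parser has to change when a server changes its page layout.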

That’s 3-4 weeks and 500 lines of throwaway code just to get the raw data into the right state for analysis.

When I started out in bioinformatics, I used to joke that at least 50% of my time was spent just obtaining raw data and formatting files. Over the years, I’ve revised my estimate. It’s currently at around 80-90% and I’m not sure that it’s still a joke.

Why is this trend heading in the wrong direction? When does it become untenable? I’m starting to think that my job title should be “data munger”, not “research officer”. I wouldn’t mind if data munging were perceived as a skill in academia, but when funding is results-based, it will only ever be seen as the means to an end. Which it is, of course.

Google Docs now with Presentations

The title of the post says it all, really. The news is slowly making its way through the tech blogs.

I just tried uploading a file saved as PowerPoint (ugh) from OpenOffice. First attempt – server error. Second attempt – success. On the whole, the import preserved formatting pretty well, except for some text formatting (spacing changes, tab stops vanish). Two attempts to display the online slideshow have resulted in a black window with no slides. Oh, and you can only save as zipped HTML.

Early days – let’s hope they iron out the bugs and introduce OpenOffice import/export soon.

The OA debate reaches my workplace

It’s good to know that your workplace is (relatively) progressive:

“Open Access and the Changing Nature of Scholarly Communication”

Forum to be held on Monday 17 September 2007 12.30pm to 5pm
Seminar Room, Level 4, Sustainable Minerals Institute, Sir James Foots Building
The University of Queensland St Lucia campus

Here’s the program, if you happen to be in Brisbane that day.