Briefings in Bioinformatics: computational proteomics issue

Every so often, a new issue of the under-rated Briefings in Bioinformatics appears in my feed reader.

The latest is a special issue on computational proteomics. High-throughput proteomics is all the rage in academic, clinical and industrial settings just now, so this is well worth a read.

Bioinformaticians looking for ways to help out with the management and analysis of proteomic data should look in particular at:

ScienceRoll Search

Unhappy with PubMed or the other biomedical search engines?

Bertalan has created ScienceRoll Search, described in his blog post.

I just gave it a quick run and it looks rather impressive. Give it a go and let him know what you think.

Bioinformaticians in the service of bench biologists

Stumbled out of bed to the feed reader and came close to spraying cereal over the screen when I read this exchange on a Nature Network blog:

Original post:

Like them or loathe them, it’s not really possible to analyze a genome-wide screen without a large number of [Excel spreadsheets]

Comment #1 from our Pierre:

Oh please, please, please, no, don’t [do] that with excel, please

He’s quite right, of course. Unfortunately, the ensuing debate is heading down a familiar track: “that’s all very well for you hardcore computer types, but we’re just simple bench biologists”.

Well, look: a lot of us “computer types” were, or are, bench biologists too. We weren’t born with magical computer skills, nor did we learn them overnight. We know what we know, and we recommend it to others not out of geekiness or snobbery, but because we believe that if there’s a better way to perform a task, we owe it to ourselves to learn it. If others can’t make that commitment, we’re more than happy to help out and share what we’ve learned.

Just be prepared to meet us half-way, OK?

Experiments and structured data

I’m going to be lazy and point you to some interesting discussion over at Cameron’s blog on the use of structured data to describe experiments: part 1; part 2; part 3.

My experience of discussing electronic lab notebooks, which comes mostly from biochemistry/molecular biology labs, is that many biologists are quite resistant to the idea of structured data. I think one reason the paper notebook persists is that people like free-form notes. You may believe that a lab notebook is a highly ordered record of experiments, but trust me, it’s not uncommon to see notes such as “Bollocks! Failed again! I’m so sick of this purification…” scrawled in the margins.

My take on the problem is that biologists spend a lot of time generating, analysing and presenting data, but not much time thinking about the nature of their data. When people bring me data for analysis, I ask questions such as: what kind of data is this? ASCII text? Binary images? Is it delimited? Can we use primary keys? Not surprisingly, this is usually met with blank stares, followed by “well…I ran a gel…”.
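For anyone wondering what those questions look like in practice, here’s a minimal Python sketch that answers three of them from a file’s raw bytes. It rests on some assumptions of my own: a NUL byte is taken to mean “binary” (a common heuristic, not a rule), the standard library’s `csv.Sniffer` guesses whether the text is comma-, tab- or semicolon-delimited, and any column whose values are all present and unique counts as a candidate primary key. The function name `describe_data` and the plate-reader style example are hypothetical.

```python
import csv
import io


def describe_data(raw: bytes, sample_size: int = 4096) -> dict:
    """Hypothetical helper: text or binary? delimited? candidate primary keys?"""
    sample = raw[:sample_size]
    # Assumed heuristic: NUL bytes almost never appear in plain text files.
    if b"\x00" in sample:
        return {"kind": "binary"}
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return {"kind": "binary or non-ASCII text"}

    result = {"kind": "ASCII text", "delimiter": None, "candidate_keys": []}
    try:
        # Only consider a few common delimiters; Sniffer raises csv.Error if none fits.
        dialect = csv.Sniffer().sniff(text[:sample_size], delimiters=",\t;")
    except csv.Error:
        return result  # free text: no consistent delimiter found
    result["delimiter"] = dialect.delimiter

    rows = list(csv.reader(io.StringIO(text), dialect))
    header, body = rows[0], rows[1:]
    # A column is a candidate primary key if every row has a non-empty, unique value.
    for i, name in enumerate(header):
        values = [row[i] for row in body if i < len(row)]
        if len(values) == len(body) and all(values) and len(set(values)) == len(values):
            result["candidate_keys"].append(name)
    return result


# Made-up example: one row per well, with a repeated gene and a repeated reading.
sample = b"well,gene,intensity\nA1,TP53,0.91\nA2,BRCA1,0.42\nA3,TP53,0.91\n"
print(describe_data(sample))
```

Uniqueness in one file is only evidence for a key, of course; whether “well” really identifies a row is a question about the experiment, not the bytes.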

I do believe that any experiment can be described in a structured fashion, if researchers can be convinced to think generically about their work rather than about the specifics of their own experiments. All experiments share common features such as: (1) a date/time when they were performed; (2) an aim (“generate PCR product”, “run crystal screen for protein X”); (3) the use of protocols and instruments; (4) a result (correct size band on a gel, crystals in well plate A2). The only free-form part is the interpretation. Is the result good, bad, expected? What to do next? My simplistic view is that an XML element named “notes”, of data type “string”, covers anything free-form that somebody might want to say about their experiment. Now we just have to design the schema, build a nice forms-based web interface and force everyone in the lab to use it :)
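To make that a little more concrete, here’s a minimal Python sketch of the four shared features plus a free-form `notes` element, serialised to XML with the standard library’s `xml.etree.ElementTree`. The `Experiment` class, the element names and the example values are illustrative assumptions, not a proposed standard; a real lab would pin the structure down in an XML Schema (where `notes` would be an `xs:string`) before building the web forms.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Experiment:
    performed: datetime                                   # (1) when it was performed
    aim: str                                              # (2) e.g. "generate PCR product"
    protocol: str                                         # (3) protocol used
    instruments: list[str] = field(default_factory=list)  # (3) instruments used
    result: str = ""                                      # (4) e.g. "correct size band on a gel"
    notes: str = ""                                       # free-form interpretation

    def to_xml(self) -> ET.Element:
        """Serialise the record as a hypothetical <experiment> element."""
        root = ET.Element("experiment")
        ET.SubElement(root, "performed").text = self.performed.isoformat()
        ET.SubElement(root, "aim").text = self.aim
        ET.SubElement(root, "protocol").text = self.protocol
        instruments = ET.SubElement(root, "instruments")
        for name in self.instruments:
            ET.SubElement(instruments, "instrument").text = name
        ET.SubElement(root, "result").text = self.result
        ET.SubElement(root, "notes").text = self.notes
        return root


# Illustrative values only.
exp = Experiment(
    performed=datetime(2008, 3, 14, 10, 30),
    aim="generate PCR product",
    protocol="standard Taq PCR, 30 cycles",
    instruments=["thermocycler"],
    result="correct size band on a gel",
    notes="Failed twice before; fresh primers fixed it.",
)
print(ET.tostring(exp.to_xml(), encoding="unicode"))
```

Everything except `notes` is something a form could validate; the margin scrawl still has a home, it just lives in one clearly labelled element.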

One more point: we need to teach students that every activity leading to a result is an experiment. From my time as a Ph.D. student in the wet lab, I remember feeling that my day-to-day activities (PCR reactions, purifications, cloning) weren’t really experiments; they were just means to an end. Experiments were clever, one-shot procedures performed by brilliant postdocs to answer big questions. When I started to view each step (obtaining the right size PCR product, sequencing it, ligation, transformation, plasmid purification etc.) as an experiment in its own right, with a defined goal, I felt a lot better about myself. Break your activities into steps, and ways to describe them as structured data should suggest themselves.

My first greasemonkey script

It seems almost compulsory for Web 2.0 enthusiasts to write a brief Greasemonkey article these days!

Here’s my attempt. It has nothing whatsoever to do with bioinformatics; instead, it resizes Flickr images on Profilactic mashup pages, such as this one. My aim is just to convince you that Greasemonkey development is quite easy, even for JavaScript novices like myself.

More social web snippets

Busy. No time for real posts. Brief updates:

  • Attila is set to resume the great live thesis online experiment
  • I have succumbed to Twitter, woe is me
  • On a related note, Firefox extension Shareaholic is a nice idea, if a bit rough round the edges just now

Evolution of an idea

It’s great to sit back and watch ideas and software unfold.

Just over a year ago, Euan asked whether anyone was employing AJAX in graphical genome browsers. The old-style “reload on refresh” browsers (UCSC, GBrowse, Ensembl) were starting to look a bit Web 1.0.

This sparked plenty of discussion, including a pointer to X:Map, a very nice alternative view of Ensembl data using the Google Maps API (update: and, of course, the AJAX-ification of GBrowse).

Jump forward to today: thanks to Euan’s del.icio.us feed via FriendFeed, I discover Genome Projector, which takes the zoomable Google Maps idea to a new level.

And that’s how social networks let you discover stuff. Brilliant.

Published

It’s online, so I guess I can tell you about:

Lonic, A., Barry, E.F., Quach, C., Kobe, B., Saunders, N.F.W. and Guthridge, M.A. (2008)
FGFR2 phosphorylation on Serine 779 couples to 14-3-3 and regulates cell survival and proliferation.
Mol. Cell. Biol. (ahead of print); DOI:10.1128/MCB.01837-07 [Abstract] | [Manuscript]

A minor contribution from me: they asked which kinases might phosphorylate S779; I gave them a list (using a tool that may see the light of day eventually); they showed that activation of a candidate kinase leads to increased phosphorylation. Some people would rate that an acknowledgement, but these guys were kind enough to add our names to the paper.

Just another scene from the life of the “go-to” bioinformatician.

Rewards, output and academia

Academia takes a very narrow view of what constitutes “output”. Rewards (such as funding or tenure) are given out on the basis of (1) publications, preferably first-author and preferably in so-called high-impact journals; (2) citations in the same journals; and (3) previous rewards, i.e. “demonstrated ability in securing funding”. I always find that last catch-22 clause particularly amusing.

I started to think about this when I read What is principal component analysis? [DOI 10.1038/nbt0308-303], in the current issue of Nature Biotechnology (subscription only). Now, I’m not criticising the article or its publication: it’s well-written, educational and a good basic overview of PCA for biologists who have not previously encountered the method. However, my first reaction was to recall a number of excellent blog posts on the same topic that I’ve read recently. For example:

The Nature Biotechnology article is recognised by academia and qualifies for academic rewards. The blog posts, which are longer, more detailed, written by enthusiastic communicators and, in theory, accessible to a much wider audience (not just people with a subscription to Nature Biotechnology), are not.

It doesn’t seem right to me. How does your institution evaluate and reward “non-traditional” output?

Lifestreaming

In yet another moment of BBGM synchronicity, I started thinking about lifestreaming and its applications just as Deepak was writing about it. My inspiration was the recent article 35 ways to stream your life.

I’ve tried (and you can find me at):

  • Mugshot - aggregates a limited number of sources, doesn’t seem to update properly from del.icio.us, has conversation features (quips, comments)
  • FriendFeed - nice look and feel, a limited number of sources, has conversation features (comments, ratings)
  • Profilactic - by far my favourite in terms of look/feel and sources (you can add anything that has a feed) but no conversations as yet

Lifestreams are fun. I don’t expect anyone to care about what I just played on last.fm (and vice versa), but these are all ways of broadcasting yourself and making connections. Read Deepak’s post for some thoughts on how this might apply to science.
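I have no idea how Profilactic or FriendFeed work internally, but the core of any lifestream is simple enough: pull the items out of each feed and interleave them by date. Here’s a minimal Python sketch of that merge step, assuming RSS 2.0 feeds with RFC 822 style `pubDate` values. Fetching over HTTP, Atom feeds and error handling are left out, and the two feeds (and the function names `feed_items` and `lifestream`) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime


def feed_items(rss: str, source: str) -> list[tuple[datetime, str, str]]:
    """Return (timestamp, source, title) for every <item> in an RSS 2.0 document."""
    channel = ET.fromstring(rss).find("channel")
    return [
        (parsedate_to_datetime(item.findtext("pubDate")), source, item.findtext("title"))
        for item in channel.findall("item")
    ]


def lifestream(feeds: dict[str, str]) -> list[tuple[datetime, str, str]]:
    """Merge the items from several feeds into a single stream, newest first."""
    items = [entry for source, rss in feeds.items() for entry in feed_items(rss, source)]
    # Timezone-aware datetimes compare correctly across different UTC offsets.
    return sorted(items, key=lambda entry: entry[0], reverse=True)


# Two made-up feeds, for illustration only.
BOOKMARKS = (
    "<rss version='2.0'><channel><title>bookmarks</title>"
    "<item><title>Genome Projector</title><pubDate>Fri, 14 Mar 2008 09:00:00 +1000</pubDate></item>"
    "<item><title>X:Map</title><pubDate>Fri, 14 Mar 2008 08:00:00 +1000</pubDate></item>"
    "</channel></rss>"
)
MUSIC = (
    "<rss version='2.0'><channel><title>music</title>"
    "<item><title>some track</title><pubDate>Thu, 13 Mar 2008 22:30:00 +0000</pubDate></item>"
    "</channel></rss>"
)

for when, source, title in lifestream({"del.icio.us": BOOKMARKS, "last.fm": MUSIC}):
    print(when.isoformat(), source, title)
```

Note that the last.fm item lands between the two bookmarks once the +1000 offsets are taken into account, which is exactly the kind of detail that makes “anything that has a feed” harder than it sounds.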

Here’s a crazy idea, the workstream:

  • Neil parsed SwissProt entry Q38897 using parser script swiss2features.pl
  • Bob calculated all intersubunit contacts in PDB entry 2jdq using CCP4 package contact

No?
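Half-seriously, here’s what a workstream entry might look like as structured data rather than prose: a minimal Python sketch in which each event records who did what, to which database entry, with which tool. The `WorkEvent` class and its field names are hypothetical. The point is that one record can render as the sentences above and also as a JSON line that a machine can filter, for example to find everything anyone has done to Q38897.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WorkEvent:
    actor: str      # who did the work
    action: str     # what they did
    database: str   # which database the entry came from
    accession: str  # the entry itself
    tool: str       # the script or package used

    def sentence(self) -> str:
        """Render the event as a human-readable workstream line."""
        return f"{self.actor} {self.action} {self.database} entry {self.accession} using {self.tool}"


events = [
    WorkEvent("Neil", "parsed", "SwissProt", "Q38897", "parser script swiss2features.pl"),
    WorkEvent("Bob", "calculated all intersubunit contacts in", "PDB", "2jdq", "CCP4 package contact"),
]

# The same records as JSON lines, which a machine can filter and aggregate.
jsonl = "\n".join(json.dumps(asdict(event)) for event in events)

for event in events:
    print(event.sentence())

# Example query: who has done anything to SwissProt entry Q38897?
print([e.actor for e in events if e.accession == "Q38897"])
```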