Experiments and structured data
March 26, 2008 by nsaunders
I’m going to be lazy and point you to some interesting discussion over at Cameron’s blog on the use of structured data to describe experiments: part 1; part 2; part 3.
My experience of discussing electronic lab notebooks, which is mostly from biochemistry/molecular biology labs, is that many biologists are quite resistant to the idea of structured data. I think one reason that the paper notebook persists is that people like free-form notes. You may believe that a lab notebook is a highly-ordered record of experiments but trust me, it’s not uncommon to see notes such as “Bollocks! Failed again! I’m so sick of this purification…” scrawled in the margins.
My take on the problem is that biologists spend a lot of time generating, analysing and presenting data, but they don’t spend much time thinking about the nature of their data. When people bring me data for analysis I ask questions such as: what kind of data is this? ASCII text? Binary images? Is it delimited? Can we use primary keys? Not surprisingly this is usually met with blank stares, followed by “well…I ran a gel…”.
I do believe that any experiment can be described in a structured fashion, if researchers can be convinced to think generically about their work, rather than about the specifics of their own experiments. All experiments share common features such as: (1) a date/time when they were performed; (2) an aim (“generate PCR product”, “run crystal screen for protein X”); (3) the use of protocols and instruments; (4) a result (correct size band on a gel, crystals in well plate A2). The only free-form part is the interpretation. Is the result good, bad, expected? What to do next? My simplistic view is that an XML element named “notes” of data type “string” covers anything free-form that somebody might want to say about their experiment. Now we just have to design the schema, build a nice forms-based web interface and force everyone in the lab to use it.
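To make the four common features a little more concrete, here is a minimal sketch in Python of what one such record might look like when written out as XML. It is not a proposed schema: the class name, the element names other than “notes”, and all of the example values (the time of day, the protocol and instrument names, the note text) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Experiment:
    """One experiment record. Field and element names are assumptions."""
    performed: datetime     # (1) date/time the experiment was performed
    aim: str                # (2) what the experiment was meant to achieve
    protocols: List[str]    # (3) protocols used
    instruments: List[str]  # (3) instruments used
    result: str             # (4) what was observed
    notes: str = ""         # free-form interpretation, a plain string

    def to_xml(self) -> ET.Element:
        """Build an <experiment> element with one child per feature."""
        root = ET.Element("experiment", performed=self.performed.isoformat())
        ET.SubElement(root, "aim").text = self.aim
        for name in self.protocols:
            ET.SubElement(root, "protocol").text = name
        for name in self.instruments:
            ET.SubElement(root, "instrument").text = name
        ET.SubElement(root, "result").text = self.result
        ET.SubElement(root, "notes").text = self.notes
        return root


# Hypothetical example values, made up for this sketch.
pcr = Experiment(
    performed=datetime(2008, 3, 26, 10, 30),
    aim="generate PCR product",
    protocols=["standard PCR"],
    instruments=["thermal cycler"],
    result="correct size band on a gel",
    notes="Good result, as expected. Next: sequence the product.",
)
xml_text = ET.tostring(pcr.to_xml(), encoding="unicode")
print(xml_text)
```

Everything structured sits in its own element, and whatever the researcher wants to scrawl in the margin goes into “notes” untouched.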
One more point: we need to teach students that every activity leading to a result is an experiment. From my time as a Ph.D. student in the wet lab, I remember feeling as though my day-to-day activities (PCR reactions, purifications, cloning) weren’t really experiments; they were just means to an end. Experiments were clever, one-shot procedures performed by brilliant postdocs to answer big questions. When I started to view each step (obtaining the right size PCR product, sequencing it, ligation, transformation, plasmid purification etc.) as an experiment in its own right, with a defined goal, I felt a lot better about myself. Break your activities into steps, and ways to describe them as structured data should suggest themselves.
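As a rough illustration of that last point, here is a small Python sketch that treats a cloning workflow as a list of steps, each with its own goal. The `Step` class, the `first_failure` helper and the pass/fail outcomes are all hypothetical, made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    aim: str        # the defined goal of this step
    achieved: bool  # whether that goal was met


def first_failure(steps: List[Step]) -> Optional[Step]:
    """Return the earliest step whose goal was not met, or None."""
    for step in steps:
        if not step.achieved:
            return step
    return None


# Invented outcomes: the first two steps worked, ligation did not.
cloning = [
    Step("obtain the right size PCR product", True),
    Step("sequence the PCR product", True),
    Step("ligation", False),
    Step("transformation", False),
    Step("plasmid purification", False),
]
failed = first_failure(cloning)
print(failed.aim if failed else "all steps succeeded")
```

Once each step carries its own goal and outcome, a question like “where did this go wrong?” becomes a simple query rather than a hunt through margin notes.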


March 26, 2008 at 4:57 pm
I thought you were just pointing to Cameron’s posts :).
But seriously, this one and Cameron’s posts are very very good. Can’t wait to spend some time taking them in.
In my day job we face these challenges all the time, with data of all types being generated by the gigabyte. By and large things are structured and usually process driven, but there is a lot of variation, leading to all kinds of challenges in software design.
March 26, 2008 at 6:01 pm
A bit of a sidenote, but the New Yorker recently had a fascinating article titled The Checklist which describes how hospitals have slowly structured their thousands of steps for the hundreds of routine procedures that must occur for even the simplest medical condition into pragmatic checklists of tasks. By codifying this seemingly trivial activity, enormous operational improvements were obtained. Draw lessons about structured steps in experiments if you will, but be glad that someone will not be dying if you make a mistake.
March 27, 2008 at 2:49 am
I don’t see the point of electronic lab notebooks either and I am a computer scientist. My bench is miles away from my computer and while pipetting I need to look at my notes. It’s easier to work with a pipette in one hand and a pen in the other hand than with a whole big keyboard / screen around my bench, where there really isn’t enough space for all of this anyways.
March 27, 2008 at 5:24 am
Just FYI, the suggestions made are similar to work being done by NIAID’s http://www.immport.org and by the MIBBI community (http://mibbi.sourceforge.net/) with respect to the checklists.
March 27, 2008 at 11:48 am
@max - I thought you were a computer scientist; what are you doing with a pipette?
I take your point - practically, it’s much easier to record at the bench in a paper notebook. Have a look at Cameron’s blog for some ideas about how in the future, a lot of this recording could be done by the machines in the lab.
A lot of this discussion is based on open notebook science, or at least shared notebook science. If you’re interested in sharing raw experimental data, whether with the group, colleagues or the outside world, electronic is the only option.
March 27, 2008 at 11:42 pm
[...] A data model for life-science experiments; FuGE Posted on March 27, 2008 by peanutbutter This post may be one in a series of responses to Cameron’s post on “Proposing a data model for Open Notebooks”. When I originally read this post I commented on the fact that a data model for experiments actually exists and that he may get some mileage out of it rather than starting from scratch and re-creating the wheel. Several discussions have followed on from this original post and Neil has picked up on it as well, with sentiments that I agree with. [...]
March 30, 2008 at 10:16 pm
[...] the discussion that kicked off here and has continued here [1, 2, 3, 4] and in other places [1, 2] along the way. Frank’s exposition on using FuGE as a data model is very clear in what it [...]
March 31, 2008 at 7:57 pm
True, but it’s much easier to define a model after you have all the data/have finished the experiment (which is the point at which it’s getting passed to the bioinformatician, or you’re repeating the experiment several times). To define all the fields beforehand is really difficult.