Some random thoughts for a Friday afternoon.
Many excellent posts by Deepak on the topic of workflows have got me thinking about the subject. I very much like the notion that all analysis in computational biology should be automated and repeatable, so far as is practicable. However, I’ve not yet experienced a “workflow epiphany”. There are some impressive and interesting projects around, notably Taverna and myExperiment, but I see these as prototypes and testbeds for how the future might look, rather than polished solutions usable by the “average researcher”.
I can also never quite escape the feeling that this type of workflow doesn’t describe how many researchers go about their business, at least in academia. Real research is full of wrong directions, dead ends, trial and error and bad decisions. To me a workflow is rather like a scientific paper: an artificial summary of your work that you put together at the end, describing an imaginary path from starting point to destination that you couldn’t know you were going to follow when you set out. Useful for others who want to follow the same path, less so for the person blazing the trail. Is this in fact the primary purpose of a workflow? To allow others to follow the same path, rather than to plan your own?
I wonder in particular about operations where manual intervention and decision making are required. In structural biology, for instance, I often see my coworkers doing something like this:
- Open experimental data (e.g. electron density) in a GUI-based application
- “Fiddle” with it until it “looks right”
- Save output
How do you automate that middle step? It may be that the operation is described using parameters which can be saved and run again later, but a lot of science seems to rely on a human decision as to whether something is “sensible”.
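To make the “saved parameters” idea a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the parameter names (`contour_level`, `smoothing_radius`), the function names and the JSON layout are my own assumptions for illustration, not the behaviour of any real structural biology application. It simply records the values a person settled on, with a free-text note about why, so the step can be rerun later.

```python
import json
import tempfile
from pathlib import Path


def save_manual_step(path, step_name, params, rationale):
    """Record the parameters a human settled on, plus a note on why."""
    record = {"step": step_name, "params": params, "rationale": rationale}
    Path(path).write_text(json.dumps(record, indent=2))
    return record


def replay_manual_step(path, operation):
    """Reload a saved record and rerun the operation with the same parameters."""
    record = json.loads(Path(path).read_text())
    return operation(**record["params"])


def threshold_density(contour_level, smoothing_radius):
    """Hypothetical stand-in for whatever the GUI actually does to the data."""
    return f"contoured at {contour_level} sigma, smoothed over {smoothing_radius} A"


# Usage: save the choices once, then replay them without opening the GUI.
record_path = Path(tempfile.gettempdir()) / "manual_step.json"
save_manual_step(
    record_path,
    "threshold_density",
    {"contour_level": 1.5, "smoothing_radius": 2.0},
    "Map looked right at this level",
)
result = replay_manual_step(record_path, threshold_density)
print(result)
```

Note what this does and does not capture: the outcome of the fiddling is replayable, but the judgement that produced it survives only as a comment in the `rationale` field.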
I don’t know if we can capture everything that we do in a form that a machine can run. Perhaps workflows highlight the difference between research and analysis: a creative thought process versus a set of algorithms.