Visionaries, followers and fools

Others have written about PRISM; my personal favourite is Bora’s post (which inspired my title). Go to their site, have a laugh, then add your voice to the derision.

I still recall the excitement of publishing my first journal article. I also recall the confusion that I felt when I was sent an agreement by email, telling me that I was signing away all rights to my article. “This can’t be right”, I thought, “I generated the data, I did the work, I wrote the words and it’s not mine?” Ultimately, your peers convince you to accept this as quite normal practice and move on. It’s not normal though, is it?

Data modification and lawyers

Science in the Open is running an interesting series of posts on how to implement an open e-notebook. The latest, blogs vs wikis and third party timestamps, raises some interesting issues regarding the modification of e-notebook posts and legal validation. For instance:

I do wonder whether from a legal perspective at least that an in house time stamp in a well regulated and transparent system might be as good (or no worse than) a signed and dated paper notebook.

I’ve never been able to understand why ink on paper is held in such high regard as a measure of validity and honesty. Back when I was writing up my Ph.D. thesis, people were just beginning to incorporate scanned images and there was much debate in universities as to whether this was acceptable. I would point out to anyone who would listen that it doesn’t take a computer to fake data – it takes malicious intent. There was nothing to stop me making up a bunch of values for an enzyme assay and plotting them out with a pencil on squared graph paper (remember that?) if I so desired. Typing the numbers into a spreadsheet just makes the fabrication less arduous. So where’s the difference between my signed and dated piece of graph paper and my Excel chart?

Bioinformaticians are used to the idea that data changes all the time. Genome sequences are reannotated, software is continually revised, docking parameters are optimised. As the original blog post points out, modifications to data records shouldn’t be a problem. What’s required is transparent, third-party tracking of modifications. This is surely easy enough to achieve by hard-coding it into the ELN application – then if questions are raised, the person in the lab with any coding skills will be the only natural “suspect” ;)
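To make the idea of tracked modifications a little more concrete, here is a minimal sketch of a tamper-evident revision log, where each revision stores a hash of the one before it. Everything in it is hypothetical: the RevisionLog class, its field names and the choice of SHA-256 are my assumptions, not features of any real ELN. The third-party part is not implemented either; I'm assuming you would periodically lodge the latest hash with someone outside the lab.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(record: dict) -> str:
    """Return a SHA-256 hex digest of a record, serialised with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class RevisionLog:
    """Hypothetical append-only log: each revision is chained to the previous one."""

    def __init__(self):
        self.revisions = []

    def append(self, author, content, timestamp=None):
        """Add a revision that records the hash of the revision before it."""
        record = {
            "author": author,
            "content": content,
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            "previous_hash": self.revisions[-1]["hash"] if self.revisions else None,
        }
        # The hash covers author, content, timestamp and the previous hash.
        record["hash"] = _digest(record)
        self.revisions.append(record)
        return record

    def verify(self):
        """Return True only if every stored hash and every chain link still matches."""
        previous_hash = None
        for record in self.revisions:
            body = {key: value for key, value in record.items() if key != "hash"}
            if body["previous_hash"] != previous_hash:
                return False
            if _digest(body) != record["hash"]:
                return False
            previous_hash = record["hash"]
        return True


if __name__ == "__main__":
    log = RevisionLog()
    log.append("neil", "first version of the notebook entry")
    log.append("neil", "second version, with a correction")
    print(log.verify())

    # Quietly rewriting an old revision breaks the chain.
    log.revisions[0]["content"] = "something else entirely"
    print(log.verify())
```

Modifying a record is still allowed; it just shows up as a new revision. What you can't do is rewrite an old revision without verify() noticing, unless you also recompute every later hash, which is where the copy held by a third party would come in.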

Admittedly I have never worked on research that required lawyers, so I’m ignorant of how people reach decisions about what constitutes a legally-binding document (though my suspicion is that they just make it up). Just another incentive to adopt completely open research. When everyone has their raw data and methods hanging out there for all to see, we can dispense with the legal fees!

Add this to your “to read” list

Deepak’s post on using the web to harness scientific information struck a chord and also directed me to the August edition of CTWatch Quarterly; the CT stands for Cyberinfrastructure Technology. It contains an excellent set of articles under the heading “The Coming Revolution in Scholarly Communications & Cyberinfrastructure” by authors including Timo Hannay from NPG and Philip Bourne of UCSD and PLoS.

I especially like the opening quote from Cyberinfrastructure For Knowledge Sharing by John Wilbanks from Science Commons:

Infrastructure never gets adequately funded because it cuts across disciplinary boundaries, it doesn’t benefit particular groups. Infrastructure is a prerequisite to great leaps forward and is thus never captured within disciplinary funding, or normal governmental operations. We need to revise radically our conception of cyberinfrastructure. It isn’t just a set of tubes through which bytes flow, it is a set of structures that network different areas of knowledge…and that is software and social engineering, not fiber optic cable. The superhighways of the biological information age should not be understood as simply physical data roads, long ropes of fiber and glass. They need to be structures of knowledge. The Eisenhower Freeways of Biological Knowledge are yet to be built. But that doesn’t mean the task isn’t worth starting.

There’s a PDF (7.6 MB) available too.

SciVee: first impressions

I’ve been busy the past few days and not paying full attention to the information stream from the web, but I kept seeing this word SciVee.

I can’t summarise it any better than Deepak: “SciVee provides scientists, especially authors with a platform to essentially set up video podcasts, or as they call them, PubCasts, around a publication.” It’s built in partnership with PLoS, amongst others, and provides channels for several PLoS journals.

I just got around to watching the featured pubcast, Structural Evolution of the Protein Kinase–Like Superfamily, and I’m really impressed with the concept. SciVee is also discussed by Bora, John and Frank and is tagged at Technorati.

Of course, some scientists might be a bit shy about appearing in a video. This conjures up amusing images of young, photogenic Ph.D. students hiding from the PI with a newfound enthusiasm for videocasts.

Zotero word processor integration

Zotero have quietly released plugins for integration with OpenOffice, NeoOffice and the other commonly-used word processor.

There’s not much documentation yet. Here’s what I did for OpenOffice, Ubuntu/Feisty:

  1. Download and unzip the extension
  2. In OpenOffice go to Tools->Extension Manager->Add
  3. Locate the Zotero.oxt file and install
  4. Restart OpenOffice

You should now see a new menu like this:

Clicking on “insert citation” brings up your Zotero database. It’s a bit rough and ready just now with limited options, but definitely an exciting development.

University-hosted blogs

This is what I love about the web – the way you start off “here” and end up “there”.

Up pops an interesting headline in my Google Reader – A Great Example of Fun, from Science in the Open over at OpenWetWare. I follow the links to this post on a fun GFP experiment, at Life of a Lab Rat.

Then I look at the URL. Sydney University? Follow it back and I find myself at Blogs dot Usyd. “Blogs dot Usyd is a showcase for blogs that support the research and other projects of University staff.”

Good on them. There I was thinking that Australian academia was way too backward for this science 2.0 stuff :)

Bioinformaticians, take heart

It’s difficult for bioinformaticians to publish in so-called “high impact” journals at all, never mind as first author. Many of us are not group leaders with tightly-defined research programs; we are the “go to” guys, happy to apply our skills to any dataset that comes our way. In academia at least, we’re caught somewhere between research scientist and IT support. It can be a frustrating life.

So it’s good to see that Nature, a journal not renowned for publishing articles with a strong computational biology component, has seen fit to publish this:

Structure-based activity prediction for an enzyme of unknown function
Hermann, JC et al. (2007)
Nature 448: 775-779.
Abstract | Full text | N & V

It’s a fascinating piece of work. The authors started with an enzyme of unknown activity from Thermotoga maritima, a thermophilic bacterium. Initial structure/sequence analysis suggested a superfamily for the enzyme. They then compiled a list of ~4,000 potential substrates, modelled tetrahedral intermediates that could resemble the transition state for each one, and performed docking simulations of each model with a model of the enzyme active site. This generated four candidate substrates, three of which were confirmed by biochemical assay. For a finale, they determined a crystal structure for the enzyme + bound product, which agreed closely with the model, and identified a new metabolic pathway for orthologues of the enzyme in other genomes.
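The screening step boils down to "score every candidate, keep the best few". Here is a toy sketch of that shape only. The shortlist function, the placeholder names and the scores are all made up for illustration; I'm assuming that a lower docking score means a better fit, and taking the top few by score is my simplification, not necessarily how the authors picked their candidates.

```python
from typing import Callable, Iterable, List, Tuple


def shortlist(
    candidates: Iterable[str],
    score: Callable[[str], float],
    keep: int,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the `keep` best, lowest score first.

    `score` is a stand-in for a real docking run; it is assumed to return
    an energy-like number where lower means a better fit.
    """
    scored = [(name, score(name)) for name in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:keep]


if __name__ == "__main__":
    # Made-up scores for placeholder names, purely to show the call.
    toy_scores = {
        "candidate_a": -1.0,
        "candidate_b": -3.0,
        "candidate_c": -2.0,
    }
    for name, value in shortlist(toy_scores, toy_scores.__getitem__, keep=2):
        print(name, value)
```

In the real thing, all the expense is hidden inside the scoring function, which is why the approach is not exactly high-throughput yet.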

There are a few criticisms of the approach. This is a case where modelling was a good approximation to reality, but there are plenty of cases where that isn’t true. It also relies on quite a lot of prior knowledge concerning sequence/structure and metabolism, which isn’t available for many uncharacterised proteins. And it’s currently quite computationally expensive, so not exactly high-throughput, although that will change of course. Still, if you enjoy genome-scale bioinformatics and structural biology, it’s almost enough to make you drool.
