I spent some of the weekend thinking about tags and how they should be used in bioinformatics. I see that Deepak did the same.
I’m an avid Flickr user, and a Flickr feature that I really like is geotagging: adding tags that record the geographic coordinates of an image. Why is this useful? Well, the point of tagging is to make it easier for people to find what they are looking for. The problem is that tags can be arbitrary. For instance, if I tag an image of Uluru I might use tags such as “uluru”, “ayers rock”, “central australia”, “red centre” and so on. Other people will only find images of the same region if they search with one or more of those arbitrary tags.
On the other hand, the latitude and longitude of Uluru are fixed physical properties (disregarding continental drift). I can use them to locate Uluru on a map, then say “show me all the images in this location”. Tagging with a non-arbitrary physical property on which everyone agrees gets me straight to the relevant information.
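To make the contrast concrete, here is a minimal sketch in Python. The photo records, the `in_box` helper and the bounding box are all made up for illustration (this is not how Flickr stores or queries geotags), and the coordinates are approximate.

```python
# Hypothetical photo records: arbitrary text tags plus a (latitude, longitude) geotag.
photos = [
    {"id": 1, "tags": {"uluru"}, "geo": (-25.345, 131.036)},
    {"id": 2, "tags": {"ayers rock", "sunset"}, "geo": (-25.350, 131.030)},
    {"id": 3, "tags": {"red centre"}, "geo": (-23.698, 133.881)},  # near Alice Springs
]

def in_box(geo, south, west, north, east):
    """True if a (lat, lon) pair lies inside the bounding box."""
    lat, lon = geo
    return south <= lat <= north and west <= lon <= east

# Searching by an arbitrary tag only finds photos whose owner chose that word.
by_tag = [p["id"] for p in photos if "uluru" in p["tags"]]

# Searching by location finds every photo taken around Uluru, whatever it was called.
by_location = [p["id"] for p in photos
               if in_box(p["geo"], -25.40, 130.95, -25.30, 131.10)]

print(by_tag)       # [1]
print(by_location)  # [1, 2]
```

The second photo never mentions “uluru”, but its coordinates put it in the right place anyway.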
Unfortunately, at present most tagging, scientific and otherwise, falls into the first category rather than the second. What we really need are agreed standards for tagging scientific data. There’s a lot of discussion about controlled vocabularies and the semantic web, but I wonder if physical properties could be used effectively as bioinformatic tags. For instance, if I’m interested in a particular set of proteins, I might search tags for properties such as length, pI, molecular weight and hydrophobicity, as well as functional descriptions such as “DNA-binding protein”.
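As a rough sketch of what that might look like, the Python below derives two property tags directly from a protein sequence (length and mean Kyte-Doolittle hydropathy, often called GRAVY) and then searches by numeric range. The function names, the tag names and the toy peptides are my own inventions for illustration; pI and molecular weight could presumably be added in the same way.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def property_tags(sequence):
    """Compute non-arbitrary tags from the sequence itself (hypothetical tag names)."""
    seq = sequence.upper()
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
    return {"length": len(seq), "gravy": round(gravy, 3)}

def search(records, **ranges):
    """Return ids of records whose tags fall inside every (low, high) range given."""
    return [rec_id for rec_id, tags in records.items()
            if all(low <= tags[name] <= high
                   for name, (low, high) in ranges.items())]

# Made-up toy peptides, not real proteins.
proteins = {"pepA": "AILVAILV", "pepB": "KKDDEEKR", "pepC": "GASTGAST"}
records = {pid: property_tags(seq) for pid, seq in proteins.items()}

# "Show me everything that is 8 residues long and strongly hydrophobic."
hydrophobic = search(records, length=(8, 8), gravy=(1.0, 5.0))
print(hydrophobic)
```

Anyone running the same calculation on the same sequence gets the same tags, which is the whole point: nobody has to guess which words I happened to choose.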