A new twist on the identifier mapping problem

Yesterday, Deepak wrote about BridgeDB, a software package to deal with the “identifier mapping problem”. Put simply, biologists can name a biological entity in any way that they like, leading to multiple names for the same object. Easily solved, you might think, by choosing one identifier and sticking to it, but that’s apparently way too much of a challenge.

However, there are times when this situation is forced upon us. Consider this code snippet, which uses the Bioconductor package GEOquery via the RSRuby library to retrieve a sample from the GEO database:

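require "rubygems"
require "rsruby"

# Tell RSRuby where R lives if the environment does not already say so
if ENV['R_HOME'].nil?
  ENV['R_HOME'] = "/usr/lib/R"
end

r = RSRuby.instance
r.library("GEOquery")   # load the Bioconductor package that provides getGEO

# Fetch one GEO sample and list the column names of its data table
sample = r.getGEO("GSM434143")
table  = r.Table(sample)
keys   = table.keys
puts keys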

All good so far. What if I try to save the data table, which contains entries such as { "DETECTION.P.VALUE" => "0.000146581" }, to my new favourite database, MongoDB?

key must not contain '.'

So what am I to do, other than modify the key using something like:

newkey = key.gsub(/\./, "_")
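
Applied to a whole row rather than a single key, the same substitution might look like the sketch below. The helper name sanitize_keys and the "VALUE" entry are made up for illustration; only the "DETECTION.P.VALUE" entry comes from the table above.

```ruby
# Hypothetical helper (not part of RSRuby or any MongoDB driver):
# returns a copy of the hash with every "." in a key replaced.
def sanitize_keys(hash, replacement = "_")
  hash.each_with_object({}) do |(key, value), clean|
    clean[key.to_s.gsub(".", replacement)] = value
  end
end

# "VALUE" => "placeholder" is an invented entry, for illustration only
row = { "VALUE" => "placeholder", "DETECTION.P.VALUE" => "0.000146581" }
clean_row = sanitize_keys(row)
puts clean_row.keys
```

One caveat: two distinct keys such as "A.B" and "A_B" would collapse into the same name, so this is only safe when the original keys are known not to clash.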

Voilà, my own personal contribution to the identifier mapping problem.

What’s the solution? Here are some options – rank them in order of silliness if you like:

  • Biological databases should avoid potentially “troublesome” keys
  • Database designers should allow any symbols in keys
  • Database driver writers should include methods to check keys and alter them if necessary
  • End users should create their own maps by storing the original key with the modified version
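
For what it's worth, the last option could be sketched roughly as follows. Everything here is an assumption for illustration: the helper names sanitize_with_map and restore_keys, and the "data" / "key_map" layout of the resulting document, are invented and not taken from any driver. The original names are kept as values, where a dot causes no complaint, rather than as keys.

```ruby
# Hypothetical helper: replaces "." in keys and remembers the original names.
def sanitize_with_map(hash, replacement = "_")
  data    = {}
  key_map = {}
  hash.each do |key, value|
    original = key.to_s
    new_key  = original.gsub(".", replacement)
    # Refuse to silently overwrite when two keys collapse to the same name
    raise ArgumentError, "key collision on #{new_key}" if data.key?(new_key)
    data[new_key]    = value
    key_map[new_key] = original unless new_key == original
  end
  { "data" => data, "key_map" => key_map }
end

# Hypothetical inverse: rebuilds the original keys from a stored document.
def restore_keys(doc)
  doc["data"].each_with_object({}) do |(key, value), original|
    original[doc["key_map"].fetch(key, key)] = value
  end
end

row      = { "DETECTION.P.VALUE" => "0.000146581" }
doc      = sanitize_with_map(row)
restored = restore_keys(doc)
```

The document in doc has no dots in any key, so it should pass the check that produced the error above, and restore_keys recovers the original table after reading it back.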

2 thoughts on “A new twist on the identifier mapping problem”

  1. “Biological databases should avoid potentially “troublesome” keys”
    Good point – but hard to enforce.
    I was parsing some output from a BLAST search against Swiss-Prot a while ago… some gene descriptions contained characters including “\t” and “#”, as well as quotes. Yipeee!

  2. Hi there,

    I couldn’t find a “contact us” page (well, there is one, but it is not working), so I am writing this message to you here:

    I run the service R-bloggers:
    An aggregator of R related articles, from blogs.

    And wanted to encourage you to join R-bloggers.com at:

    The idea behind the project is to share readers in order to gain readers: R-bloggers already has over 800 RSS subscribers (a number that grows every day).

    I built it in order to find all the R bloggers out there. So far I have found over 45 bloggers, who also agreed to add their feeds (and some to give a link back and post about it).

    I would love it if you agreed to join as well.

    Feel free to erase this comment if it clutters the blog too much.

    All the best,
