A recent question at BioStar asked “Is the NAR database list available in a computer readable format?” The short answer is “no” and Pierre has done some excellent preliminary work to address the issue.
I’ve been working on a database and web application to check the associated URLs but quite frankly, this is tedious, a waste of everyone’s time and could be entirely avoided if the publishing industry did a better job. All that’s required is that either NAR or PubMed provide structured data – XML, Medline format, I don’t care what – containing a field that looks something like this:
That way, we could all avoid writing regular expressions to detect URLs in abstracts. No wait – to detect broken URLs in abstracts. You would not believe how many of them look like this:
URL http://www.amaze.ulb. ac.be/ ^
Someone helpfully informed me via Twitter that this is “often a result of typesetting.” Thanks for that.