Tag Archives: pubmed

Monitoring PubMed retractions: updates

[Chart: PubMed cumulative retractions, 1977-present]

There’s been a recent flurry of interest in retractions. See for example: Scientific Retractions: A Growth Industry?; summarised also by GenomeWeb in Take That Back; articles in the WSJ and the Pharmalot blog; and academic articles in the Journal of Medical Ethics and Infection & Immunity.

Several of these sources cite data from my humble web application, PMRetract. So now seems like a good time to mention that:

  • The application is still going strong and is updated regularly
  • I’ve added a few enhancements to the UI; you can follow development at GitHub
  • I’ve also added a long-overdue about page with some extra information, including the fact that I wrote it :)

Now I just need to fix up my Git repositories. Currently there are two: one which pushes to GitHub and a second, holding a copy of the Sinatra code, which pushes to Heroku. Keeping two copies of the same code isn’t too smart.

Monitoring PubMed retractions: a Heroku-hosted Sinatra application

In a previous post analysing retractions from PubMed, I wrote:

It strikes me that it would be relatively easy to build a web application (Rails, Heroku), which constantly monitors retraction data at PubMed and generates a variety of statistics and charts.

“Relatively easy” it was. Let me introduce you to PMRetract, my first publicly-available web application.

Analysis of retractions in PubMed

As so often happens these days, a brief post at FriendFeed got me thinking about data analysis. Entitled “So how many retractions are there every year, anyway?”, the post links to this article at Retraction Watch, which discusses ways to estimate the number of retractions and, in particular, a recent article in the Journal of Medical Ethics (subscription only, sorry) that addresses the issue.

As Christina pointed out in a comment at Retraction Watch, there are thousands of scientific journals, of which PubMed indexes only a fraction. However, PubMed is relatively easy to analyse using a little Ruby and R. So, here we go…

Findings increasingly novel, scientists say…

…was the tongue-in-cheek title of an image that I posted to Twitpic this week. It shows the usage of the word “novel” in PubMed article titles over time. As someone correctly pointed out at FriendFeed, it needs to be corrected for total publications per year.

It was inspired by a couple of items that caught my attention. First, a question at BioStar with the self-explanatory title Locations of plots of quantities of publicly available biological data. Second, an item at FriendFeed musing on the (over?) use of the word “insight” in scientific publications.

I’m sure that quite recently, I’ve read a letter to a journal which analysed the use of phrases such as “novel insights” in articles over time, but it’s currently eluding my search skills. So here’s my simple roll-your-own approach, using a little Ruby and R.

DokuWiki, PubMed and Ruby

I recently built a wiki for a research group using DokuWiki, one of my favourite wiki packages. As with many other wikis, developers have extended its functionality by writing plugins. Some of these are excellent, allowing users to generate lots of content with a minimum of syntax. For example, using the PubMed plugin, you type this:

{{pubmed>long:15595725}}

and the result is this:
[Image: PubMed plugin output]

Which got me thinking. Assuming that you’ve searched PubMed and retrieved a bunch of references in XML format, how might you generate text in DokuWiki syntax, to paste into your wiki? Here’s the small parser that I wrote in Ruby:

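#!/usr/bin/ruby
require 'rubygems'
require 'hpricot'

# Maps a DokuWiki year headline to an array of PubMed plugin tags
h = {}
# Parse the XML file saved from a PubMed search
d = Hpricot.XML(open('pubmed_result.xml'))

# File each article's PMID under the headline for its creation year
(d/:PubmedArticle).each do |a|
  (h["=== #{a.at('DateCreated/Year').inner_html} ==="] ||= []) << "{{pubmed>long:#{a.at('PMID').inner_html}}}"
end

# Print headlines and their tags, newest year first
puts h.sort {|a,b| b<=>a}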

Nine lines – how cool is that? It uses Hpricot to parse the XML and creates a hash of arrays. Each hash key is the year, formatted as a level 4 headline in DokuWiki; each hash value is an array of PMIDs, formatted with the PubMed plugin syntax. At the end we just print it all out, sorted by year from newest to oldest.
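If you want to see the hash-of-arrays step on its own, without Hpricot or an XML file to hand, here is a rough sketch of the same grouping and sorting. The RECORDS constant and the two method names are my own inventions for illustration. Only PMID 15595725 comes from the example above; the other PMIDs and all of the years are made-up placeholders.

```ruby
# Illustrative [year, PMID] pairs. Only 15595725 appears in the post;
# the other PMIDs and all of the years are made-up placeholders.
RECORDS = [
  ['2004', '15595725'],
  ['2009', '00000001'],
  ['2004', '00000002'],
  ['2009', '00000003']
].freeze

# Build a hash whose keys are DokuWiki level 4 headlines (one per year)
# and whose values are arrays of PubMed plugin tags.
def group_by_year(records)
  records.each_with_object({}) do |(year, pmid), h|
    (h["=== #{year} ==="] ||= []) << "{{pubmed>long:#{pmid}}}"
  end
end

# Sort the headlines in descending order (newest year first),
# then flatten the [headline, tags] pairs into a list of lines.
def dokuwiki_lines(records)
  group_by_year(records).sort { |a, b| b <=> a }.flatten
end

puts dokuwiki_lines(RECORDS)
```

Sorting the headline strings works here only because every year has four digits, so string order and numeric order agree.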

As Pierre would say – that’s it.

GoPubMed

Sometimes it takes a while for information to sink in. Having read posts by Bertalan, Deepak and now Frank on the topic of GoPubMed, I finally got around to looking at the site.

If all interfaces to biomedical databases were as good as this, we’d all be happier and more productive. Go and try it out if you haven’t yet done so; it’s a really impressive piece of work.