Analysis of retractions in PubMed

As so often happens these days, a brief post at FriendFeed got me thinking about data analysis. Entitled “So how many retractions are there every year, anyway?”, the post links to this article at Retraction Watch. The article discusses ways to estimate the number of retractions and, in particular, a recent paper in the Journal of Medical Ethics (subscription only, sorry) that addresses the issue.

As Christina pointed out in a comment at Retraction Watch, there are thousands of scientific journals, of which PubMed indexes only a fraction. However, PubMed is relatively easy to analyse using a little Ruby and R. So, here we go…

Dropbox tip continued: convert a file tree to HTML

A couple of posts ago, I outlined a small bash script to generate an index.html file, containing links to other files in a directory. This was for generating links to files in a Dropbox public directory.

I had completely forgotten about the very useful UNIX/Linux command named tree. If not installed, it should be in your distribution repository (e.g. sudo apt-get install tree for Ubuntu/Debian). Then simply:

cd Dropbox/Public/mydirectory
tree -H . > index.html
Next, navigate to index.html at the Dropbox website and you should see something like the tree on the right. It’s a little ugly and, obviously, not as convenient as something like Github, but it can be a good quick and dirty fix if you need to share a hierarchy of directories and files.

A quick Bash tip: add an index.html file to a Dropbox public folder

You know that Dropbox is terrific, of course. No? Go and check it out now.

One issue: files in your Public folder have a public URL that you can send to other people. Unfortunately, directories do not. So how do you share a public directory full of files?

Answer: create an index.html file and share that. Let’s say that your files end in “.txt” and reside in ~/Dropbox/Public/entrez. Do this:

cd ~/Dropbox/Public/entrez
echo "<ol>" > index.html
# glob directly instead of parsing ls output, so file names containing spaces survive
for i in *.txt; do echo "<li><a href='$i'>$i</a></li>" >> index.html; done
echo "</ol>" >> index.html

Now you can share the link to the index.html, which when clicked will display a list of links to all the other files in the directory.
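The shell loop writes file names into the HTML as-is, which is fine for simple names but can break on names containing characters such as "&" or spaces. Here is a rough sketch of the same idea in Ruby, with escaping added. The helper names index_html and write_index are made up for illustration, and the "*.txt" default simply mirrors the example above.

```ruby
require 'cgi'
require 'erb'

# Hypothetical helper: build an ordered HTML list with one link per file name.
# The link target is percent-encoded and the visible label is HTML-escaped.
def index_html(filenames)
  items = filenames.sort.map do |name|
    href  = ERB::Util.url_encode(name)
    label = CGI.escapeHTML(name)
    "<li><a href='#{href}'>#{label}</a></li>"
  end
  (["<ol>"] + items + ["</ol>"]).join("\n") + "\n"
end

# Hypothetical helper: write index.html into dir, listing files matching pattern.
def write_index(dir, pattern = "*.txt")
  names = Dir.glob(File.join(dir, pattern)).map { |path| File.basename(path) }
  File.write(File.join(dir, "index.html"), index_html(names))
end
```

Calling write_index(File.expand_path("~/Dropbox/Public/entrez")) should then produce the same kind of index.html as the shell version.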

First publication for a while: a review of nuclear localization

Thanks to my former colleagues at UQ for involving me with:

Marfori, M., Mynott, A., Ellis, J.J., Mehdi, A.M., Saunders, N.F.W., Curmi, P.M., Forwood, J.K., Bodén, M. and Kobe, B. (2010)
Molecular basis for specificity of nuclear import and prediction of nuclear localization.
Biochimica et Biophysica Acta (BBA) – Molecular Cell Research: in press – uncorrected proof.

A real group effort this one; I made a small contribution to Section 3 – Nuclear localization databases and computational tools to predict nuclear localization.

Note that this version is an uncorrected proof and may change slightly before full publication. If nuclear localization is your thing, I think you’ll enjoy it; it’s a pretty comprehensive review.

What the world needs is: lists of Entrez database fields

You know the problem. You want to qualify your NCBI/Entrez database search term using a field. For example: “autism[TIAB]”, to search PubMed for the word autism in either Title or Abstract. Problem – you can’t find a list of fields specific to that database.

Now you can. Follow the links in this public Dropbox file to see CSV files containing the name, full name and description of the fields for each Entrez database.
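For anyone who hasn't used field qualifiers: the field name is simply appended to the search term in square brackets. A minimal sketch of how that looks when building a query URL for the E-utilities esearch endpoint is below. The helper names qualify and esearch_url are hypothetical, and the sketch only builds and prints the URL; it does not contact NCBI.

```ruby
require 'uri'

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical helper: append a field tag such as TIAB to a search term.
def qualify(term, field)
  "#{term}[#{field}]"
end

# Hypothetical helper: build the esearch URL for a database and search term.
# URI.encode_www_form percent-encodes the square brackets.
def esearch_url(db, term)
  "#{ESEARCH}?" + URI.encode_www_form(db: db, term: term)
end

puts esearch_url("pubmed", qualify("autism", "TIAB"))
```

The field names to use in place of TIAB are the ones in the "name" column of the CSV file for the relevant database.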

Code to generate the files is listed below. This may or may not be the first in an occasional, irregular “what the world needs” series.

require 'rubygems'
require 'bio'
require 'hpricot'
require 'open-uri'

Bio::NCBI.default_email = ""
ncbi = Bio::NCBI::REST.new

ncbi.einfo.each do |db|
  puts "Processing #{db}..."
  File.open("#{db}.txt", "w") do |f|
    doc = Hpricot(open("{db}"))
    (doc/'//fieldlist/field').each do |field|
      name = (field/'/name').inner_html
      fullname = (field/'/fullname').inner_html
      description = (field/'description').inner_html