Andrew asks:
…anyone know of a tool that will take a pubmed query and plot the number of articles by year?
I figured that this was a good excuse to improve my lowly Javascript skills by building a toy web application.
First, video proof that I did get something to work, up to a point:
Next, some code. I created a Sinatra application, with the directory structure shown. It’s fairly simple: one main file, app.rb, a “spinner” graphic to indicate loading operations, the jQuery and Highcharts Javascript libraries and one view, wrapped in a layout.
Next, the code in app.rb. It’s about as simple as it gets:
require "rubygems"
require "sinatra"
require "haml"

get "/" do
  haml :index
end
Layout, controlled by layout.haml, simply loads the javascripts and creates a DIV element for content:
!!! XML
!!!
%html
  %head
    %title PubMed terms by year
    %script{:type => "text/javascript", :src => "/javascripts/jquery.js"}
    %script{:type => "text/javascript", :src => "/javascripts/highcharts.js"}
    %script{:type => "text/javascript", :src => "/javascripts/exporting.js"}
  %body
    %div
      = yield
The action happens in index.haml. First, the elements for content:
%div
  %span{:id => "myform"}
    %input{:type => "text", :name => "terms", :id => "terms"}
    %input{:type => "submit", :value => "Search", :id => "button"}
  %span{:id => "loader"}
    %img{:src => "images/spinner.gif"}
%div{:id => "container"}
Since there is no server-side (Ruby) processing, I didn’t want to mess around with form submission, so I simply included input and button elements without wrapping them in a form. This is probably very bad practice with regard to valid HTML but this is just a toy application, so I don’t care very much.
Next, the Javascripts. Ideally, these should be saved in public/javascripts and loaded by the layout but for testing purposes, I wrote them inline. Please bear with me, I’m not a great Javascript programmer.
The first one simply hides the content (the animated spinner GIF) of the element with ID loader, except when an AJAX process is running. Thanks to nickf at StackOverflow for that tip.
:javascript
  $('#loader')
    .hide() // hide it initially
    .ajaxStart(function() {
      $(this).show();
    })
    .ajaxStop(function() {
      $(this).hide();
    });
The second runs the PubMed query, parses the results and plots a chart of publications by year.
:javascript
  $("#button").click(function() {
    var terms = $("#terms").val();
    var dates = [];
    var d = [];
    var args = {'apikey' : 'YOUR ENTREZ-AJAX API KEY',
                'db'     : 'pubmed',
                'term'   : terms,
                'retmax' : 5000, // maximum number of results from Esearch
                'max'    : 5000, // maximum number of results passed to Esummary
                'start'  : 0
               };
    $.getJSON('http://entrezajax.appspot.com/esearch+esummary?callback=?', args, function(data) {
      if(data.entrezajax.error == true) {
        $("#container").html('<p>' + 'Sorry - EntrezAjax failed with error ' + data.entrezajax.error_message + '</p>');
        return;
      }
      $.each(data.result, function(i, item) {
        var year = /^\d{4}/.exec(item.PubDate); // null unless PubDate starts with a 4-digit year
        if(year) dates.push(year[0]); // records without a leading year are skipped
      });
      // count by year
      var count = {};
      for(var i in dates)
        if(count[dates[i]]) {
          count[dates[i]]++;
        }
        else {
          count[dates[i]] = 1;
        }
      // create data array
      for(var i in count)
        d.push([Date.UTC(i, 0, 1), count[i]]);
      // build chart
      var options = {
        chart: {
          renderTo: 'container',
          defaultSeriesType: 'column',
          width : 900
        },
        title : {
          text : terms + ' - ' + dates.length + ' total'
        },
        legend : {
          enabled : false
        },
        credits : {
          enabled : false
        },
        tooltip : {
          formatter: function() {
            return Highcharts.dateFormat('%Y', this.x) + ' : ' + this.y + ' entries';
          }
        },
        xAxis : {
          type : 'datetime',
          dateTimeLabelFormats : {
            year : '%Y'
          }
        },
        yAxis : {
          title : { text : 'Entries' }
        },
        series: [{
          data: d
        }]
      };
      var chart = new Highcharts.Chart(options);
    });
  });
Let’s work through that. Searching happens in lines 2-17. When the button is clicked, the search terms are passed to Entrez-AJAX and a result is returned. If there was an error, the error message is displayed in the DIV with ID = ‘container’ and the callback returns without drawing a chart. See the Entrez-AJAX API documentation for the details.
Results are parsed in lines 18-33. The value of PubDate is extracted for each record. If it begins with 4 digits (the year), the year is pushed onto an array (dates); otherwise, the record is not counted. Next, we step through the array and build an object, count, where the keys are years and the values are the number of publications for that year. Finally, we step through the object and build an array (d) with elements of the form “[Date.UTC(YYYY, 0, 1), N]”. Here, YYYY is the year and N the count of publications for that year. The month (0) and day (1) are arbitrary; they could also (and perhaps should) be 11 (Dec) and 31.
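To make the parsing step easier to test away from the browser, here is a minimal sketch of the same logic as three small functions. The names extractYear, countByYear and toSeries are my own for illustration and do not appear in the application; the sketch assumes only that each PubDate is a string that may or may not start with a four-digit year.

```javascript
// Extract the leading four-digit year from a PubDate string such as "2010 Mar 15".
// Returns null when the value does not start with four digits.
function extractYear(pubDate) {
  var match = /^\d{4}/.exec(String(pubDate));
  return match ? match[0] : null;
}

// Tally publications per year; records without a leading year are skipped.
function countByYear(pubDates) {
  var counts = {};
  pubDates.forEach(function (pubDate) {
    var year = extractYear(pubDate);
    if (year !== null) {
      counts[year] = (counts[year] || 0) + 1;
    }
  });
  return counts;
}

// Convert the tally into [timestamp, count] pairs, sorted by year,
// which is the shape passed to the chart as series data.
function toSeries(counts) {
  return Object.keys(counts).sort().map(function (year) {
    return [Date.UTC(Number(year), 0, 1), counts[year]];
  });
}
```

Something like toSeries(countByYear(["2010 Mar 15", "2009 Dec", "2010"])) should then give one pair for 2009 and one for 2010; those date strings are made up for the example rather than taken from real PubMed records.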
The chart is built in lines 35-68. First we create the options, then add them to a new Highcharts.Chart object, which is rendered in DIV ID = ‘container’. Refer to the Highcharts documentation for the details.
In the video clip, you’ll note that the final query fails with “ApplicationError : 1”. This seems to happen when the query returns too many results. In fact, only queries that return fewer than a couple of hundred or so results seem to work with Entrez-AJAX. This prompted me to tweet:
yes, these fancy-pants ajaxified web apps are all very well but to do real work, you need to download full datasets
Don’t get me wrong: I’m not criticising Entrez-AJAX, which is a great piece of work. It’s just not designed to retrieve a large number of results and neither is any other web application; it would simply take too long. I could, for example, write a function in app.rb to run Esearch and Esummary, then fetch and parse the XML results. However, for queries returning a large number of results, a user would be staring at a blank web page for a very long time. Life would be easier if NCBI improved their API so that only specified fields (such as dates) were returned, but that seems unlikely at present.
The take-home message: if you want publications by year for a particular topic, don’t expect a web application to fetch the data on the fly. You’ll need either to build a local database or just to write some R code that you can leave running in the background as you do something else.



