You know the problem. You want to qualify your NCBI/Entrez database search term using a field: for example, “autism[TIAB]” searches PubMed for the word autism in either the title or the abstract. The problem: you can’t find a list of the fields specific to that database.
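Because a field qualifier is simply part of the search term, it can be built programmatically. The following is a minimal Ruby sketch that only constructs an E-utilities `esearch.fcgi` request URL and does not send it. The helper names `qualify` and `esearch_url` are hypothetical and are not part of BioRuby.

```ruby
require 'uri'

# Hypothetical helper: append an Entrez field tag to a term, e.g. "autism" + "TIAB"
def qualify(term, field)
  "#{term}[#{field}]"
end

# Hypothetical helper: build an esearch URL for a database and a (qualified) term.
# URI.encode_www_form percent-encodes the square brackets (and spaces) in the term.
def esearch_url(db, term)
  base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  "#{base}?#{URI.encode_www_form(db: db, term: term)}"
end

term = qualify("autism", "TIAB")
puts term
puts esearch_url("pubmed", term)
```

The brackets must be encoded (`%5B`, `%5D`) when the term goes into a URL. Letting `URI.encode_www_form` handle this avoids hand-escaping.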
Now you can. Follow the links in this public Dropbox file to see a CSV file containing the name, full name and description of the fields for each Entrez database.
Code to generate the files is listed below. This may or may not be the first in an occasional, irregular “what the world needs” series.
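#!/usr/bin/ruby
require 'rubygems'
require 'bio'
require 'hpricot'
require 'open-uri'
require 'csv'

# NCBI asks E-utilities users to identify themselves with an email address
Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

# einfo with no arguments returns the list of Entrez database names
ncbi.einfo.each do |db|
  puts "Processing #{db}..."
  # BioRuby does not parse the field list, so fetch the raw einfo XML for this database
  doc = Hpricot(URI.open("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=#{db}").read)
  # CSV quotes values correctly when a description itself contains commas
  CSV.open("#{db}.txt", "w") do |csv|
    (doc/'//fieldlist/field').each do |field|
      name        = (field/'/name').inner_html
      fullname    = (field/'/fullname').inner_html
      description = (field/'/description').inner_html
      csv << [name, fullname, description]
    end
  end
end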
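Hpricot is no longer maintained. If it won't install, the same field extraction can be sketched with Ruby's REXML and CSV libraries instead. This is a swapped-in parser, not the code used to generate the Dropbox files. The `fields_to_csv` helper is a hypothetical name, and the XML below is an abbreviated, illustrative fragment shaped like einfo output (`eInfoResult/DbInfo/FieldList/Field`). Its description text is invented for the example.

```ruby
require 'rexml/document'
require 'csv'

# Hypothetical helper: turn einfo XML for one database into CSV text,
# one row per field: name, full name, description.
def fields_to_csv(xml)
  doc = REXML::Document.new(xml)
  CSV.generate do |csv|
    # Only Field elements under FieldList; LinkList entries are skipped
    REXML::XPath.each(doc, "//FieldList/Field") do |field|
      row = %w[Name FullName Description].map do |tag|
        el = field.elements[tag]
        el ? el.text.to_s : ""
      end
      csv << row
    end
  end
end

# Illustrative sample only; real einfo output has many more elements per field
sample = <<~XML
  <eInfoResult>
    <DbInfo>
      <DbName>pubmed</DbName>
      <FieldList>
        <Field>
          <Name>TIAB</Name>
          <FullName>Title/Abstract</FullName>
          <Description>Illustrative description, with a comma</Description>
        </Field>
        <Field>
          <Name>ALL</Name>
          <FullName>All Fields</FullName>
          <Description>Illustrative description</Description>
        </Field>
      </FieldList>
    </DbInfo>
  </eInfoResult>
XML

puts fields_to_csv(sample)
```

Given einfo XML fetched as in the script above, `File.write("#{db}.txt", fields_to_csv(xml))` would produce one file per database.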
You talked about this before (May 2009):
https://nsaunders.wordpress.com/2009/05/27/querying-ncbi-entrez-database-fields-using-ruby/
I was inspired to write up a Biopython solution (June 2009),
http://news.open-bio.org/news/2009/06/ncbi-einfo-biopython/
It looks like at some point since then the NCBI made this easier to do via their web interface, by adding a drop-down list to the advanced search pages:
http://www.ncbi.nlm.nih.gov/pubmed/advanced
http://www.ncbi.nlm.nih.gov/nuccore/advanced
Yup, this is the extended “any database” version.
Thanks very much for a useful post. Do you know if there is a way to get the index list for the properties field? In some of the databases it runs to the thousands, and scrolling 200 at a time through the web interface (e.g. http://www.ncbi.nlm.nih.gov/protein/advanced) is not practical.
I’ve not seen EUtils used in this way but if those data are exposed through the web interface, it should be possible. I’ll look into it.
I actually have something like this now within BioPerl’s eutil tools, all wrapped up in a script (in the core distribution). One can call:
bp_einfo.pl -e my@foo.bar -d pubmed
to get both the links and the fields (along with descriptions).
Nice. Yes, the einfo functionality in BioPerl is currently better than BioRuby’s, which only lists databases; hence the need for the second step using the raw URL to fetch XML.
One thing of note that NCBI quietly added recently (it’s somewhat buried in their updated documentation):
If you have a few thousand UIDs, you can now get around the long-URL issue by using a POST request (instead of a GET request) with pretty much all the eutils besides einfo. Apparently this also works for esearch.
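Here is a minimal sketch of the POST approach using Ruby's standard `Net::HTTP`. It assumes the `esummary.fcgi` endpoint with the standard `db` and comma-separated `id` parameters. The helper name `build_esummary_post` and the UIDs are hypothetical, for illustration only. The request is built here but not sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a POST request to esummary.
# The UID list travels in the request body rather than in an over-long GET URL.
def build_esummary_post(db, uids)
  uri = URI("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi")
  req = Net::HTTP::Post.new(uri)
  # set_form_data URL-encodes the parameters into the body and sets the
  # Content-Type to application/x-www-form-urlencoded
  req.set_form_data("db" => db, "id" => uids.join(","))
  [uri, req]
end

uids = (1..3000).to_a # hypothetical UIDs, for illustration only
uri, req = build_esummary_post("pubmed", uids)
puts req.method
puts req.body.bytesize

# To actually send it:
# res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```

The same pattern should apply to the other eutils that accept POST. Only the endpoint path and parameters change.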