What the world needs is: lists of Entrez database fields

You know the problem. You want to qualify your NCBI/Entrez database search term using a field. For example: “autism[TIAB]”, to search PubMed for the word autism in either Title or Abstract. Problem – you can’t find a list of fields specific to that database.
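To make the idea concrete, here is a minimal sketch of how a field-qualified term ends up in an EUtils ESearch request. The helper name `esearch_url` is my own invention for illustration; the only real things here are the ESearch endpoint and its `db` and `term` parameters. Note that the square brackets have to be URL-encoded.

```ruby
require 'uri'

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical helper: build an ESearch URL for a term restricted to one field,
# e.g. autism[TIAB]. It only builds the URL; nothing is sent to NCBI.
def esearch_url(db, term, field)
  query = URI.encode_www_form(db: db, term: "#{term}[#{field}]")
  "#{ESEARCH_URL}?#{query}"
end

url = esearch_url("pubmed", "autism", "TIAB")
puts url
```

Swapping "TIAB" for any other field name works the same way, which is exactly why a per-database list of field names is handy.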

Now you can. Follow the links in this public Dropbox file to see, for each Entrez database, a CSV file containing the name, full name and description of its fields.

Code to generate the files is listed below. This may or may not be the first in an occasional, irregular “what the world needs” series.

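#!/usr/bin/ruby
require 'rubygems'
require 'bio'
require 'hpricot'
require 'open-uri'
require 'csv'

Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

# BioRuby's einfo returns only the database names,
# so fetch the raw EInfo XML for each database to get its fields
ncbi.einfo.each do |db|
  puts "Processing #{db}..."
  # CSV quotes any value that contains a comma, so each row keeps three columns
  CSV.open("#{db}.txt", "w") do |csv|
    # URI.open is the open-uri entry point on Ruby 2.5+; EUtils is served over HTTPS
    doc = Hpricot(URI.open("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=#{db}"))
    (doc/'//fieldlist/field').each do |field|
      name = (field/'/name').inner_html
      fullname = (field/'/fullname').inner_html
      description = (field/'/description').inner_html
      csv << [name, fullname, description]
    end
  end
end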
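
Hpricot is no longer maintained, so here is a rough sketch of the same field extraction using only the REXML and CSV libraries that ship with Ruby. It parses a small hand-written sample shaped like an EInfo response rather than calling NCBI; the sample values, the made-up DEMO field and the method name `fields_to_csv` are all assumptions for illustration. REXML is case-sensitive, so the element names are written as EInfo returns them.

```ruby
require 'rexml/document'
require 'csv'

# Hand-written sample in the shape of an EInfo response (illustrative values only).
SAMPLE_XML = <<~XML
  <eInfoResult>
    <DbInfo>
      <DbName>pubmed</DbName>
      <FieldList>
        <Field>
          <Name>TIAB</Name>
          <FullName>Title/Abstract</FullName>
          <Description>Free text associated with Abstract/Title</Description>
        </Field>
        <Field>
          <Name>DEMO</Name>
          <FullName>Demo Field</FullName>
          <Description>A made-up description, with a comma</Description>
        </Field>
      </FieldList>
    </DbInfo>
  </eInfoResult>
XML

# Hypothetical helper: turn every Field element into one CSV row.
def fields_to_csv(xml)
  doc = REXML::Document.new(xml)
  CSV.generate do |csv|
    doc.elements.each("//FieldList/Field") do |field|
      csv << %w[Name FullName Description].map { |tag| field.elements[tag].text }
    end
  end
end

csv_text = fields_to_csv(SAMPLE_XML)
puts csv_text
```

The second sample row is there to show why a CSV writer is worth the extra line: a description containing a comma is quoted, so every row still parses back into three columns.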

7 thoughts on “What the world needs is: lists of Entrez database fields”

  1. Peter

    You talked about this before (May 2009),

    https://nsaunders.wordpress.com/2009/05/27/querying-ncbi-entrez-database-fields-using-ruby/

    I was inspired to write up a Biopython solution (June 2009),

    http://news.open-bio.org/news/2009/06/ncbi-einfo-biopython/

    It looks like at some point since then the NCBI made this easier to do via their web interface, by adding a drop-down list to the advanced search pages:

    http://www.ncbi.nlm.nih.gov/pubmed/advanced
    http://www.ncbi.nlm.nih.gov/nuccore/advanced

    1. nsaunders Post author

      I’ve not seen EUtils used in this way but if those data are exposed through the web interface, it should be possible. I’ll look into it.

  2. Chris Fields

    I actually have something like this now within BioPerl’s eutil tools, all wrapped up in a script (in the core distribution). One can call:

    bp_einfo.pl -e my@foo.bar -d pubmed

    to get both the links and the fields (along with descriptions).

    1. nsaunders Post author

      Nice. Yes, the einfo functionality in BioPerl is currently better than in BioRuby, which only lists databases; hence the need for the second step using the raw URL to fetch XML.

  3. Chris Fields

    One thing of note that NCBI quietly added recently (it’s somewhat buried in their updated documentation).

    If you have a few thousand UIDs, to get around the long URL issue one can now use a POST request (instead of a GET request) on pretty much all the eutils besides einfo. Apparently this also works for esearch.
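
A minimal sketch of what such a POST might look like from Ruby, using only the standard Net::HTTP library. The helper name `build_efetch_post` and the UID values are made up for illustration, and the request is only built here, not sent.

```ruby
require 'net/http'
require 'uri'

EFETCH_URI = URI("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi")

# Hypothetical helper: build (but do not send) a POST request that carries
# the UIDs in the request body instead of the URL.
def build_efetch_post(db, ids)
  request = Net::HTTP::Post.new(EFETCH_URI)
  request.set_form_data("db" => db, "id" => ids.join(","))
  request
end

ids = (1..3000).to_a   # placeholder UIDs, for illustration only
request = build_efetch_post("pubmed", ids)
puts request.method
puts request.body.bytesize

# To actually send it (not done here):
# response = Net::HTTP.start(EFETCH_URI.host, EFETCH_URI.port, use_ssl: true) do |http|
#   http.request(request)
# end
```

Because the ID list travels in the body, the URL stays short no matter how many UIDs are included.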
