Querying NCBI Entrez database fields using Ruby

Here’s a problem. You’d like to construct a complex query at NCBI Entrez using various fields. Example:

"9606"[Taxonomy ID]

to limit your search to Homo sapiens. Except – you don’t know which fields are available for the database that you want to query.
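For context, here is a minimal sketch of how a field-qualified term like the one above might be assembled and turned into an ESearch URL. The helper names `entrez_term` and `esearch_url` are hypothetical, and the `gene` database is only an assumed example of a database that accepts a Taxonomy ID field. Nothing is sent over the network; the code only builds strings.

```ruby
require 'uri'

# ESearch endpoint of the NCBI E-utilities.
ESEARCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

# Hypothetical helper: join [value, field] pairs into a single Entrez term,
# e.g. [['9606', 'Taxonomy ID']] becomes "9606"[Taxonomy ID].
def entrez_term(pairs, operator = 'AND')
  pairs.map { |value, field| %("#{value}"[#{field}]) }.join(" #{operator} ")
end

# Hypothetical helper: build (but do not send) an ESearch URL.
# URI.encode_www_form takes care of escaping quotes, brackets and spaces.
def esearch_url(db, term)
  "#{ESEARCH_BASE}?#{URI.encode_www_form(db: db, term: term)}"
end

term = entrez_term([['9606', 'Taxonomy ID']])
puts term
# 'gene' is an assumption here; substitute the database you actually query.
puts esearch_url('gene', term)
```

Building the term is the easy part. That still leaves the question of which field names a given database accepts.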

EInfo can return an XML file with this information. Ruby + Hpricot eats XML for breakfast. Here’s an example using the GEO Datasets (gds) database.

#!/usr/bin/ruby

require 'rubygems'
require 'hpricot'
require 'open-uri'

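# Fetch the EInfo XML for the gds database and parse it
doc = Hpricot(open("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=gds"))

# Print name, full name and description for each search field, comma-separated
(doc/'//fieldlist/field').each do |f|
  puts "#{(f/'/name').inner_html},#{(f/'/fullname').inner_html},#{(f/'/description').inner_html}"
end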

And the first few lines of output:
ALL,All Fields,All terms from all searchable fields
UID,UID,Unique number assigned to publication
FILT,Filter,Limits the records
ORGN,Organism,exploded organism names
....24 more lines....
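If Hpricot is not available, the same extraction can be sketched with REXML, which ships with most Ruby installations. This is an alternative to the script above, not the original method. The function name `einfo_fields` is hypothetical, and `SAMPLE_XML` is a hand-written, trimmed stand-in for an EInfo response (assuming the `FieldList/Field` layout with `Name`, `FullName` and `Description` children), so the example runs without touching the network.

```ruby
require 'rexml/document'

# Hypothetical helper: parse an EInfo XML string and return one hash per
# search field. REXML is case-sensitive, so element names are capitalised
# here, unlike the lowercase selectors used with Hpricot.
def einfo_fields(xml)
  doc = REXML::Document.new(xml)
  fields = []
  doc.elements.each('//FieldList/Field') do |f|
    fields << {
      name:        f.elements['Name']&.text,
      full_name:   f.elements['FullName']&.text,
      description: f.elements['Description']&.text
    }
  end
  fields
end

# Trimmed, hand-written sample standing in for a real EInfo response.
SAMPLE_XML = <<~XML
  <eInfoResult>
    <DbInfo>
      <DbName>gds</DbName>
      <FieldList>
        <Field>
          <Name>ALL</Name>
          <FullName>All Fields</FullName>
          <Description>All terms from all searchable fields</Description>
        </Field>
        <Field>
          <Name>ORGN</Name>
          <FullName>Organism</FullName>
          <Description>exploded organism names</Description>
        </Field>
      </FieldList>
    </DbInfo>
  </eInfoResult>
XML

einfo_fields(SAMPLE_XML).each do |f|
  puts "#{f[:name]},#{f[:full_name]},#{f[:description]}"
end
```

To run it against live data, the XML string could instead come from an HTTP request to the EInfo URL; returning hashes rather than printing directly makes it easy to look up a field name before building a query.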

3 thoughts on “Querying NCBI Entrez database fields using Ruby”

Going the Ruby route I see! I wrote up the BioPerl EUtilities tools (Bio::DB::EUtilities, Bio::Tools::EUtilities) to run and parse this stuff (it doesn’t dive into Seq objects, you get raw data for now).

Got a small HOWTO on it as well!

nsaunders

May 27, 2009 at 23:50

Very nice write-up. BioRuby doesn’t handle EInfo too well yet, but does a good job with the rest of the EUtils.

Nice tip – thanks!

I’ve written up an example using EInfo in Biopython,
http://news.open-bio.org/news/2009/06/ncbi-einfo-biopython/


What You're Doing Is Rather Desperate

Notes from the life of a [data] scientist
