Exploring the NCBI taxonomy database using Entrez Direct

I’ve been meaning to write about Entrez Direct, henceforth called edirect, for some time. This tweet provided me with an excuse:

This post is not strictly the answer to that question. Instead we’ll ask: which parent IDs of records for insects in the NCBI Taxonomy database have the most species IDs?
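To make the question concrete, here is a minimal sketch of the counting step on its own, assuming the species and parent IDs have already been pulled out of the Taxonomy records. The IDs below are placeholders I made up for illustration, and `species_per_parent` is a hypothetical helper, not part of edirect.

```ruby
# Hypothetical (species_taxid, parent_taxid) pairs. These IDs are placeholders,
# not real NCBI Taxonomy identifiers.
RECORDS = [
  [9001, 500],
  [9002, 500],
  [9003, 501],
  [9004, 500],
  [9005, 502],
  [9006, 501]
].freeze

# Count how many species IDs point at each parent ID.
def species_per_parent(records)
  counts = Hash.new(0)
  records.each { |_species_id, parent_id| counts[parent_id] += 1 }
  # Sort by descending count, breaking ties by parent ID for a stable order.
  counts.sort_by { |parent_id, n| [-n, parent_id] }
end

species_per_parent(RECORDS).each do |parent_id, n|
  puts "#{parent_id}\t#{n}"
end
```

The parent with the most species IDs ends up on the first line of the output.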

From PMID to BibTeX via BioRuby

Chris writes:

The blog post in question concerns conversion of PubMed PMIDs to BibTeX citations. However, a few things have changed since 2010.

Here’s what currently works.


# pmid2bibtex.rb
# convert a PubMed PMID to BibTeX citation format
# updated version of http://chrisamiller.com/science/2010/12/13/using-bioruby-to-fetch-citations-from-pubmed/
# works as of 2015-03-18
require 'bio'
Bio::NCBI.default_email = "me@me.com" # required for EUtils
id = "18265351"
pm = Bio::PubMed::efetch(id) # array of MEDLINE-formatted strings, one per record
med = Bio::MEDLINE.new(pm[0]) # MEDLINE object
bib = med.reference.format("bibtex") # format is a method of Reference object
# "@article{PMID:18265351,\n author = {Brown, T. and Mackey, K. and Du, T.},\n title = {Analysis of RNA by northern
# and slot blot hybridization.},\n journal = {Curr Protoc Mol Biol},\n year = {2004},\n volume = {Chapter
# 4},\n pages = {Unit 4.9},\n url = {http://www.ncbi.nlm.nih.gov/pubmed/18265351},\n}\n"


PubMed Publication Date: what is it, exactly?

File this one under “has troubled me (and others) for some years now, let’s try to resolve it.”

Let’s use the excellent R/rentrez package to search PubMed for articles that were retracted in 2013.

library(rentrez)

es <- entrez_search("pubmed", "\"Retracted Publication\"[PTYP] 2013[PDAT]", usehistory = "y")
es$count
# [1] 117

117 articles. Now let’s fetch the records in XML format.

xml <- entrez_fetch("pubmed", WebEnv = es$WebEnv, query_key = es$QueryKey, 
                    rettype = "xml", retmax = es$count)

Next question: which XML element specifies the “Date of publication” (PDAT)?
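Part of the difficulty is that a PubMed XML record carries several date-bearing elements. As a rough sketch, and not an answer to which one PDAT maps to, the following Ruby lists the candidates found in a record. The XML fragment is hand-written and heavily trimmed, with a placeholder PMID and placeholder dates; `date_elements` is a hypothetical helper built on the REXML library.

```ruby
require 'rexml/document'

# Hand-written, trimmed fragment in the shape of a PubMed XML record.
# The PMID and all dates are placeholders, not a real article.
PUBMED_XML = <<~XML
  <PubmedArticleSet>
    <PubmedArticle>
      <MedlineCitation>
        <PMID>00000000</PMID>
        <Article>
          <Journal>
            <JournalIssue>
              <PubDate><Year>2013</Year><Month>Mar</Month></PubDate>
            </JournalIssue>
          </Journal>
          <ArticleDate DateType="Electronic">
            <Year>2012</Year><Month>12</Month><Day>20</Day>
          </ArticleDate>
        </Article>
      </MedlineCitation>
      <PubmedData>
        <History>
          <PubMedPubDate PubStatus="entrez">
            <Year>2012</Year><Month>12</Month><Day>22</Day>
          </PubMedPubDate>
        </History>
      </PubmedData>
    </PubmedArticle>
  </PubmedArticleSet>
XML

# Return [label, date] pairs for every element that has a <Year> child.
def date_elements(xml)
  doc = REXML::Document.new(xml)
  found = []
  REXML::XPath.each(doc, '//Year') do |year|
    parent = year.parent
    # Join whichever of Year/Month/Day are present.
    parts = %w[Year Month Day].map { |tag| parent.elements[tag]&.text }.compact
    # Some date elements are distinguished only by an attribute.
    qualifier = parent.attributes['PubStatus'] || parent.attributes['DateType']
    label = qualifier ? "#{parent.name} (#{qualifier})" : parent.name
    found << [label, parts.join('-')]
  end
  found
end

date_elements(PUBMED_XML).each { |label, date| puts "#{label}: #{date}" }
```

Even in this toy record the three elements disagree, which is the crux of the problem.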

What the world needs is: lists of Entrez database fields

You know the problem. You want to qualify your NCBI/Entrez database search term using a field. For example: “autism[TIAB]”, to search PubMed for the word autism in either Title or Abstract. Problem – you can’t find a list of fields specific to that database.

Now you can. Follow the links in this public Dropbox file to see, for each Entrez database, a CSV file containing the name, full name and description of every field.

Code to generate the files is listed below. This may or may not be the first in an occasional, irregular “what the world needs” series.

#!/usr/bin/ruby
require 'rubygems'
require 'bio'
require 'hpricot'
require 'open-uri'

Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

ncbi.einfo.each do |db|
  puts "Processing #{db}..."
  File.open("#{db}.txt", "w") do |f|
    # fetch the einfo XML for this database (NCBI requires HTTPS; URI.open needs Ruby 2.5+)
    doc = Hpricot(URI.open("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=#{db}"))
    (doc/'//fieldlist/field').each do |field|
      name = (field/'/name').inner_html
      fullname = (field/'/fullname').inner_html
      description = (field/'description').inner_html
      f.write("#{name},#{fullname},#{description}\n")
    end
  end
end
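
One caveat with writing the fields as plain comma-joined strings: a description that itself contains a comma will break the columns. Here is a small sketch of one way around that, using Ruby's standard CSV library for quoting and REXML for parsing. The XML is a hand-written sample in the shape of an einfo response, the `DEMO` field and both descriptions are made up for illustration, and `fields_to_csv` is a hypothetical helper rather than anything from BioRuby.

```ruby
require 'csv'
require 'rexml/document'

# Trimmed, hand-written sample in the shape of an einfo response.
# The DEMO field and the descriptions are illustrative, not copied from NCBI.
EINFO_XML = <<~XML
  <eInfoResult>
    <DbInfo>
      <DbName>pubmed</DbName>
      <FieldList>
        <Field>
          <Name>TIAB</Name>
          <FullName>Title/Abstract</FullName>
          <Description>Words from the title or abstract</Description>
        </Field>
        <Field>
          <Name>DEMO</Name>
          <FullName>Demo Field</FullName>
          <Description>A description, with a comma</Description>
        </Field>
      </FieldList>
    </DbInfo>
  </eInfoResult>
XML

# Build a CSV string with a header row and one row per <Field>.
def fields_to_csv(xml)
  doc = REXML::Document.new(xml)
  CSV.generate do |csv|
    csv << %w[name fullname description]
    REXML::XPath.each(doc, '//FieldList/Field') do |field|
      # A missing child element becomes an empty string.
      csv << %w[Name FullName Description].map { |tag| field.elements[tag]&.text.to_s }
    end
  end
end

puts fields_to_csv(EINFO_XML)
```

The CSV library wraps the comma-containing description in double quotes, so it reads back as a single column.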