From PMID to BibTeX via BioRuby

Chris writes:

The blog post in question concerns conversion of PubMed PMIDs to BibTeX citations. However, a few things have changed since 2010.

Here’s what currently works.


# pmid2bibtex.rb
# convert a PubMed PMID to BibTeX citation format
# updated version of http://chrisamiller.com/science/2010/12/13/using-bioruby-to-fetch-citations-from-pubmed/
# works as of 2015-03-18
require 'bio'
Bio::NCBI.default_email = "me@me.com" # required for EUtils
id = "18265351"
pm = Bio::PubMed.efetch(id) # array of MEDLINE-formatted strings
med = Bio::MEDLINE.new(pm[0]) # MEDLINE object
bib = med.reference.format("bibtex") # format is a method of Reference object
# "@article{PMID:18265351,\n author = {Brown, T. and Mackey, K. and Du, T.},\n title = {Analysis of RNA by northern
# and slot blot hybridization.},\n journal = {Curr Protoc Mol Biol},\n year = {2004},\n volume = {Chapter
# 4},\n pages = {Unit 4.9},\n url = {http://www.ncbi.nlm.nih.gov/pubmed/18265351},\n}\n"


Problematic cell lines: now in a real database

Back in July, I was complaining about the latest abuse of the word “database” by biologists: the “PDF as database.”

This led to some very productive discussion on PubMed Commons, and I’m happy to report that misidentified and contaminated cell lines are now included in the NCBI BioSample database.

As the news release notes, rather alarmingly:

This problem is so common it is thought that thousands of misleading and potentially erroneous papers have been published using cell lines that are incorrectly identified

So it would be useful if there were a direct link between the BioSample record for a cell line and PubMed records in which it was used…

BioRuby development: feedback on using Git

Everyone likes constructive feedback. I received a couple of great comments on my previous post, which warrant a brief discussion.

@vlandham points out that when the main BioRuby repository updates, you’ll want to update your local repository. Using git, you do that by adding a remote which points to the original repository, from which you can fetch updates and merge with your local version:

git remote add upstream https://github.com/bioruby/bioruby.git
# fetch/merge only when main repo updates
git fetch upstream
git merge upstream/master

This is described at the GitHub help page Fork A Repo.

Michael points to an article titled A successful Git branching model. It suggests that when developing new features you create a feature branch (also called a topic branch). This helps with the management of new features and creates a more complete commit history if/when the new feature is merged back into your development repository. The article also suggests a main branch for development named develop, rather than the default master.
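
As a rough illustration of that feature-branch idea, here is a minimal sketch that runs in a throwaway repository. The branch name my-feature, the file names and the commit messages are all hypothetical, and this is only my reading of the workflow, not a definitive recipe. The --no-ff option to git merge forces a merge commit even when a fast-forward would be possible, which is what keeps the feature visible as a unit in the history.

```shell
# Scratch repository so the sketch can be run without touching real work.
# All names below (my-feature, feature.rb, the identity) are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Initial commit, then a develop branch as the article suggests.
echo "initial" > README
git add README
git commit -q -m "Initial commit"
git checkout -q -b develop

# Start a feature (topic) branch from develop and commit work there.
git checkout -q -b my-feature
echo "new code" > feature.rb
git add feature.rb
git commit -q -m "Add new feature"

# Merge back into develop; --no-ff forces a merge commit, so the history
# records that these commits belonged to one feature.
git checkout -q develop
git merge -q --no-ff -m "Merge branch 'my-feature' into develop" my-feature

# Delete the merged feature branch and inspect the history.
git branch -q -d my-feature
git log --oneline --graph
```

In a real fork you would skip the scratch-repository setup and start from the git checkout -b step inside your existing clone.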

I haven’t quite got my head around all the ins-and-outs of the article yet, but it’s well worth a read.

A beginner’s guide to BioRuby development

I’m the “biologist-turned-programmer” type of bioinformatician, which makes me a hacker, not a developer. Most of the day-to-day coding that I do goes something like this:

Colleague: Hey Neil, can you write me a script to read data from file X, do Y to it and output a table in file Z?
Me: Sure… (clickety-click, hackety-hack…) …there you go.
Colleague: Great! Thanks.

I’m a big fan of the Bio* projects and have used them for many years, beginning with BioPerl and, more recently, BioRuby. I’ve always wanted to contribute some code to them, but have never got around to doing so. This week, two thoughts popped into my head:

  • How hard can it be?
  • There isn’t much introductory documentation for would-be Bio* developers

The answer to the first question is: given some programming experience, not very hard at all. This blog post is my attempt to address the second thought, by writing a step-by-step guide to developing a simple class for the BioRuby library. When I say “beginner’s guide”, I’m referring to myself as much as anyone else.