Some basics of biomaRt

One of the most common bioinformatics questions, at Biostars and elsewhere, takes the form: “I have a list of identifiers (X); I want to relate them to a second set of identifiers (Y)”. Mapping HGNC gene symbols to Ensembl Gene IDs is a typical example.
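As a rough illustration of that X-to-Y lookup, here is a minimal sketch using getBM() from biomaRt. The two gene symbols are arbitrary examples chosen for illustration, and the code assumes biomaRt is installed and the Ensembl BioMart server is reachable.

```r
library(biomaRt)

# Connect to the human gene dataset in the Ensembl mart
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# Identifiers we have (X): HGNC symbols, picked only as examples
symbols <- c("TP53", "BRCA1")

# Ask for the symbol and the Ensembl Gene ID (Y) for each input symbol
mapping <- getBM(
  attributes = c("hgnc_symbol", "ensembl_gene_id"),
  filters    = "hgnc_symbol",
  values     = symbols,
  mart       = mart
)

print(mapping)
```

The result is a data frame with one row per symbol/ID pair, so a symbol can appear more than once if it maps to several Ensembl genes.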

When this occurs I have been known to tweet “the answer is BioMart” (there are often other solutions too) and I’ve written a couple of blog posts about the R package biomaRt in the past. However, I’ve realised that we need to take a step back and ask some basic questions that new users might have. How do I find what marts and datasets are available? How do I know what attributes and filters to use? How do I specify different genome build versions?
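The discovery functions in biomaRt answer most of those questions directly. The sketch below is not taken from the full post; it assumes a working connection to Ensembl, and the last step assumes a biomaRt release recent enough to provide useEnsembl() with its GRCh argument.

```r
library(biomaRt)

# Which marts are available on the default host?
marts <- listMarts()
print(marts)

# Which datasets does the Ensembl mart contain?
ensembl  <- useMart(biomart = "ensembl")
datasets <- listDatasets(ensembl)
print(head(datasets))

# Which attributes (output columns) and filters (query fields) can be used?
mart    <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
attrs   <- listAttributes(mart)
filters <- listFilters(mart)
print(head(attrs))
print(head(filters))

# A different genome build: GRCh37 instead of the current human assembly
# (assumes useEnsembl() is available in the installed biomaRt)
mart.grch37 <- useEnsembl(
  biomart = "ensembl",
  dataset = "hsapiens_gene_ensembl",
  GRCh    = 37
)
```

Each list function returns a data frame, so grep() or subset() on the name and description columns is a quick way to search for the attribute or filter you need.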

R gotcha for the week

I use the biomaRt package from Bioconductor in almost every R session. So I thought I’d load the library and set up a mart instance in my ~/.Rprofile:

library(biomaRt)
mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

On starting R, I was somewhat perplexed to see this error message:

Error in bmVersion(mart, verbose = verbose) : 
  could not find function "read.table"

Twitter to the rescue. @hadleywickham told me to load utils first and @vsbuffalo explained that normally, .Rprofile is read before the utils package is loaded: only the base package is available while the profile runs, and R attaches its other default packages afterwards. That seems rather odd to me; I’d have thought that biomaRt should load utils if required, but there you go.

So this works in ~/.Rprofile:

library(utils)
library(biomaRt)
mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
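One possible variation, sketched here rather than taken from the Twitter thread, is to put a small helper in ~/.Rprofile that creates the mart only when it is first needed. By the time you call it from the prompt, utils has been attached, and R does not have to contact Ensembl on every startup. The name hs_mart is hypothetical.

```r
# ~/.Rprofile
# hs_mart() is a hypothetical helper: it connects to Ensembl on first use
# and returns the cached mart object on later calls.
hs_mart <- local({
  cached <- NULL
  function() {
    if (is.null(cached)) {
      # Runs at call time, after R has attached its default packages
      cached <<- biomaRt::useMart(
        biomart = "ensembl",
        dataset = "hsapiens_gene_ensembl"
      )
    }
    cached
  }
})
```

In a session you would then pass mart = hs_mart() wherever a mart object is expected. Calls such as getBM() still need library(biomaRt) or the biomaRt:: prefix, because the helper does not attach the package.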