Beware of rogue header files (Bioconductor installation)

Just a short note concerning a “gotcha”.

As I have many times before, I opened an R console on my Ubuntu machine, newly upgraded to 10.04 (Lucid Lynx), typed source("http://bioconductor.org/biocLite.R") and began a Bioconductor install with biocLite(). Only this time, I saw this:

Error in dyn.load(file, DLLpath = DLLpath, ...) : unable to load shared library 
 '/home/sau103/R/i486-pc-linux-gnu-library/2.11/affyio/libs/affyio.so':
  /home/sau103/R/i486-pc-linux-gnu-library/2.11/affyio/libs/affyio.so: undefined symbol: egzread
ERROR: loading failed
* removing ‘/home/sau103/R/i486-pc-linux-gnu-library/2.11/affyio’

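A quick email to the Bioconductor mailing list put me in touch with the very helpful Martin Morgan, who suggested that I check my zlib libraries. Sure enough, the rogue “egzread” turned up in /usr/local/include/zlibemboss.h, which sat next to a second copy of zlib.h in /usr/local/include, quite separate from the system header at /usr/include/zlib.h. Since the compiler normally searches /usr/local/include before /usr/include, affyio had presumably been built against the EMBOSS copy, whose macro renames gzread to a symbol that the system zlib does not provide.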

grep egz /usr/local/include/zlibemboss.h
> #define gzread egzread

I moved the rogue zlib.h out of /usr/local/include and order was restored.
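
For anyone who wants to run the same check, here is a minimal sketch of the search as a shell function. The function name find_rogue_zlib is made up for this post, and the idea that a "#define gzread ..." line marks a suspect header is an assumption based only on what I found in zlibemboss.h. Treat a match as a hint rather than proof, and note that simply seeing more than one zlib.h in the listing is a clue in itself.

```shell
# find_rogue_zlib is a hypothetical helper, not part of R, zlib or EMBOSS.
# It lists every zlib*.h directly inside the given include directories and
# marks any header that redefines gzread with a preprocessor macro.
find_rogue_zlib() {
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -name 'zlib*.h' | while read -r header; do
            if grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+gzread[[:space:]]' "$header"; then
                echo "ROGUE $header"
            else
                echo "ok    $header"
            fi
        done
    done
}
```

Calling it as find_rogue_zlib /usr/local/include /usr/include covers the two directories that mattered in my case. Any header it flags can then be moved aside by hand once you are sure nothing else needs it.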

So in summary, watch out when installing EMBOSS on Ubuntu – it can apparently leave its own zlib headers in /usr/local/include, where they shadow the system ones.

But just before I go…

…I have to mention Carl Zimmer’s post on the quest to find English words in human protein sequences.

This game has been around as long as sequence databases have existed. I have a vague memory of a letter from the early 1990s (possibly in Trends in Biochemical Sciences or Nature) in which the authors reported the results of comparing SwissProt with the Oxford English Dictionary. As I recall, the longest word that they found was ENSILISTS – meaning people who practice the art of making silage.

Anyway – here’s a quick and easy way to tackle the problem using EMBOSS and some Linux command line trickery.