But just before I go…
February 2, 2008 - nsaunders
…I have to mention Carl Zimmer’s post on the quest to find English words in human protein sequences.
This game has been around as long as sequence databases have existed. I have a vague memory of a letter from the early 1990s (possibly in Trends in Biochemical Sciences or Nature) in which the authors reported the results of comparing SwissProt with the Oxford English Dictionary. As I recall, the longest word that they found was ENSILISTS - meaning people who practice the art of making silage.
Anyway - here’s a quick and easy way to tackle the problem using EMBOSS and some Linux command line trickery.
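For anyone who wants to see the basic idea first, here is a minimal Python sketch. To be clear, this is not the EMBOSS and command-line approach, which is not shown in this excerpt; it is a plain substring search, and the function names, the toy FASTA records and the tiny word list are all made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def words_in_sequence(seq, words, min_len=4):
    """Return the dictionary words that occur as substrings of seq."""
    # Group the words by length so each window is a single set lookup.
    by_len = {}
    for word in words:
        word = word.upper()
        if len(word) >= min_len:
            by_len.setdefault(len(word), set()).add(word)
    found = set()
    for n, group in by_len.items():
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            if window in group:
                found.add(window)
    return found


def longest_words(fasta_lines, words, min_len=4):
    """Collect hits across all sequences, longest first, then alphabetical."""
    hits = set()
    for _header, seq in read_fasta(fasta_lines):
        hits |= words_in_sequence(seq, words, min_len)
    return sorted(hits, key=lambda w: (-len(w), w))


# Hypothetical toy data: not real proteins, not a real dictionary.
toy_fasta = """>toy1
MKENSILISTSAG
>toy2
QQLATERPP
""".splitlines()
toy_words = ["ensilists", "later", "silage", "list"]

print(longest_words(toy_fasta, toy_words))
```

With a real protein database and a full word list you would read both from files instead; the `min_len` cutoff is there only because very short words turn up almost everywhere.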
Read the rest…

