But just before I go…

…I have to mention Carl Zimmer’s post on the quest to find English words in human protein sequences.

This game has been around as long as sequence databases have existed. I have a vague memory of a letter from the early 1990s (possibly in Trends in Biochemical Sciences or Nature) in which the authors reported the results of comparing SwissProt with the Oxford English Dictionary. As I recall, the longest word that they found was ENSILISTS – meaning people who practice the art of making silage.
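
For anyone who hasn't played the game, here is a minimal Python sketch of the general idea. It is not the method used in that letter, and it does not use any particular tool. The function names, the toy sequences and the short word list are all made up for illustration; a real run would read sequences from a database such as SwissProt and words from a proper dictionary file. The only fact it leans on is that protein sequences are written with the 20 standard one-letter amino acid codes, so any word containing B, J, O, U, X or Z can be thrown out before searching.

```python
# Standard one-letter codes for the 20 common amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def spellable(word):
    """Return True if every letter of the word is a standard amino acid code."""
    return set(word.upper()) <= AMINO_ACIDS


def find_words(sequences, words, min_length=4):
    """Return {word: [sequence ids]} for each word found as a substring.

    sequences: dict mapping a sequence id to its amino acid string.
    words: iterable of candidate dictionary words.
    Words are tried longest first, so the longest hits appear first.
    """
    candidates = sorted(
        {w.upper() for w in words if len(w) >= min_length and spellable(w)},
        key=lambda w: (-len(w), w),
    )
    hits = {}
    for word in candidates:
        ids = [seq_id for seq_id, seq in sequences.items() if word in seq.upper()]
        if ids:
            hits[word] = ids
    return hits


# Made-up toy data for illustration only, not real SwissProt entries.
toy_sequences = {
    "toy1": "MKTENSILISTSGA",
    "toy2": "MSLEEPWALKVR",
}
toy_words = ["ensilists", "walk", "sleep", "quiz", "list", "box"]

for word, ids in find_words(toy_sequences, toy_words).items():
    print(word, ids)
```

A plain substring test per word is fine for a toy example like this, but it would be slow against a full database and a full dictionary, which is one reason to reach for dedicated pattern-matching tools.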

Anyway – here’s a quick and easy way to tackle the problem using EMBOSS and some Linux command line trickery.