Price’s Protein Puzzle: 2019 update

Proteins are chains of amino acids, and since each amino acid has a 1-letter abbreviation, we can find words (English and otherwise) in protein sequences. I imagine this pursuit began as soon as proteins were first sequenced, but the first reference to protein word-finding as a sport is, to my knowledge, “Price’s Protein Puzzle”, a letter to Trends in Biochemical Sciences in September 1987 [1].

Price wrote:

It occurred to me that TIBS could organise a competition to find the longest word […] contained within any known protein sequence.

The journal took up the challenge and published the winning entries in February 1988 [2]. The 7-letter winner was RERATED, with two 6-letter runners-up: LEADER and LIVELY. The sub-genre “biological words in protein sequences” was introduced almost one year later [3] with the discovery of ALLELE, then no more was heard until 1993 with Gonnet and Benner’s Nature correspondence “A Word in Your Protein” [4].

After noting that “none of the extensive literature devoted to this problem has taken a truly systematic approach” (it’s in Nature so one must declare superiority), the authors present work that is notable for two reasons. First, they discovered two 9-letter words: HIDALGISM and ENSILISTS. Second, they mention the technique, a Patricia tree data structure, and report that the search took 23 minutes.
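To give a feel for the tree-based approach, here is a minimal sketch in Python. It is not Gonnet and Benner’s code: it uses a plain, uncompressed trie rather than a true Patricia tree, and the function names, the tiny word list and the toy sequence are all made up for illustration.

```python
def build_trie(words):
    """Build a nested-dict trie; the multi-character key "word" marks a word end."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["word"] = word  # cannot collide with single-letter keys
    return root


def longest_words(sequence, trie):
    """Return the longest dictionary word(s) found anywhere in the sequence."""
    best = []
    for start in range(len(sequence)):
        node = trie
        # Walk the trie from this start position until no branch matches.
        for letter in sequence[start:]:
            node = node.get(letter)
            if node is None:
                break
            word = node.get("word")
            if word is None:
                continue
            if not best or len(word) > len(best[0]):
                best = [word]
            elif len(word) == len(best[0]) and word not in best:
                best.append(word)
    return best


# Toy data: words taken from the text above, sequence invented for the example.
trie = build_trie(["LEADER", "LIVELY", "RERATED", "END"])
print(longest_words("MKLEADERQQRERATEDEND", trie))  # ['RERATED']
```

A real Patricia tree merges chains of single-child nodes into one edge, which saves memory on a large dictionary but does not change the idea of the search.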

Comments on this letter noted one protein sequence that ends with END [5] and reported the discovery of three 10-letter but non-English words: ANNIDAVATE, WALLAWALLA and TARIEFKLAS [6].

I last visited this topic at my blog in 2008 and at someone else’s blog in 2015. So why am I here again? Because the Aho-Corasick algorithm in R, that’s why!
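For readers who have not met Aho-Corasick: it builds one automaton from the whole dictionary, then finds every word in a sequence in a single left-to-right pass. The sketch below is a small pure-Python version written for illustration only. It is not the R implementation mentioned above, and the function names, word list and toy sequence are assumptions of mine.

```python
from collections import deque


def build_automaton(words):
    """Build Aho-Corasick tables: goto edges, failure links, output lists."""
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every word into a trie of numbered states.
    for word in words:
        state = 0
        for letter in word:
            if letter not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][letter] = len(goto) - 1
            state = goto[state][letter]
        out[state].append(word)
    # Phase 2: breadth-first pass to set failure links (depth-1 states fail to root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for letter, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and letter not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(letter, 0)
            # Inherit words that end at the failure state (suffix matches).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_words(sequence, automaton):
    """Return (start_index, word) for every dictionary word in the sequence."""
    goto, fail, out = automaton
    state, hits = 0, []
    for end, letter in enumerate(sequence):
        while state and letter not in goto[state]:
            state = fail[state]
        state = goto[state].get(letter, 0)
        for word in out[state]:
            hits.append((end - len(word) + 1, word))
    return hits


# Toy data: words from the text above, sequence invented for the example.
automaton = build_automaton(["RERATED", "LEADER", "LIVELY", "ALLELE", "END"])
print(find_words("MALLELEADEREND", automaton))
# [(1, 'ALLELE'), (5, 'LEADER'), (11, 'END')]
```

Note that ALLELE and LEADER overlap in the toy sequence; the failure links are what let the automaton pick up LEADER without backtracking.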

Create your own Google Scholar RSS feed

Google Scholar is a useful tool and now has a dedicated blog. The first post covers email alerts.

It’s unimaginable, in 2010, that an alert service would not provide an RSS feed, so I can only assume that this feature will appear “in due course”. In the meantime, a quick Google search for “create rss feed from website” led me to 7 Tools To Make An RSS Feed Of Any Website. I quickly tested them all and I agree with the author of the article: Feed43 is the winner.

The process for creating a Google Scholar feed is a little complex. Here’s my first attempt.

Update: interesting FriendFeed thread, where people point out that (a) scraping Google Scholar is quite likely to fail and (b) this is not the same as an alert, since results are not ordered by date.
