…at its very best.
PhosphoGRID
I no longer work on protein kinases but when I did, PhosphoGRID is the kind of database that I would have wanted to see. It features:
- A nice clean interface, with good use of Javascript
- Useful information returned from a simple search form
- Data for download in plain text format with no restrictions or requirements for registration
All it lacks is a RESTful API, but nothing is perfect :-)
Published in the little-known but often-useful journal Database:
PhosphoGRID: a database of experimentally verified in vivo protein phosphorylation sites from the budding yeast Saccharomyces cerevisiae.
doi:10.1093/database/bap026.
How to: archive data via an API using Ruby and MongoDB
I was going to title this post “How to: archive a FriendFeed feed in MongoDB”. The example code does just that but (a) I fear that this blog suggests a near-obsession with FriendFeed (see tag cloud, right sidebar) and (b) the principles apply to any API that returns JSON. There are rare examples of biological data with JSON output in the wild, e.g. the ArrayExpress Gene Expression Atlas. So I’m still writing a bioinformatics blog ;-)
Let’s go straight to the code:
```ruby
#!/usr/bin/ruby

require "rubygems"
require "mongo"
require "json/pure"
require "open-uri"

# db config
db  = Mongo::Connection.new.db('friendfeed')
col = db.collection('lifesci')

# fetch json
0.step(9900, 100) {|n|
  f = open("http://friendfeed-api.com/v2/feed/the-life-scientists?start=#{n}&num=100").read
  j = JSON.parse(f)
  break if j['entries'].count == 0
  j['entries'].each do |entry|
    if col.find({:_id => entry['id']}).count == 0
      entry[:_id] = entry['id']
      entry.delete('id')
      col.save(entry)
    end
  end
  puts "Processed entries #{n} - #{n + 99}", "Database contains #{col.count} documents."
}

puts "No more entries to process. Database contains #{col.count} documents."
```
Also available as a gist. Fork away.
A quick run-through. Lines 4-6 load the required libraries: mongo (the MongoDB Ruby driver), json and open-uri. If you don’t have the first two, simply run “gem install mongo json_pure”. Of course, you’ll also need to download MongoDB and have the mongod server daemon running on your system.
Lines 9-10 connect to the database (assuming a standard database installation). Rename the database and collection as you see fit. Both will be created if they don’t exist.
The guts are lines 12-25. A loop fetches JSON from the FriendFeed API, 100 entries at a time (0-99, 100-199…) up to 9999. That’s an arbitrarily high ceiling, chosen so that all entries are retrieved. Change “the-life-scientists” in line 14 to the feed of your choice. The JSON is then parsed into a hash structure. In lines 17-23 we loop through each entry and extract the “id” key, a unique identifier for the entry. This is used to create the “_id” field, a unique identifier for the MongoDB document. If no document with that _id exists yet, we create an _id key in the hash, delete the (now superfluous) ‘id’ key and save the document. Otherwise, the entry is skipped.
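To make the id-to-_id step concrete, here is a minimal sketch of the same skip-if-present logic, run against an in-memory Hash rather than a real collection. The helper names `to_document` and `archive` are hypothetical, the sample entries are made up, and I use the standard library json rather than json/pure so that it runs without mongod or any extra gems.

```ruby
require "json"

# Hypothetical helper: copy a parsed entry and move the value of its
# 'id' key to '_id', as the script does before saving.
def to_document(entry)
  doc = entry.dup
  doc["_id"] = doc.delete("id")
  doc
end

# Stand-in for the collection: a plain Hash keyed by _id. This is an
# assumption made only so the sketch runs without a database.
def archive(entries, store)
  saved = 0
  entries.each do |entry|
    next if store.key?(entry["id"])  # already archived, so skip it
    doc = to_document(entry)
    store[doc["_id"]] = doc
    saved += 1
  end
  saved
end

# Made-up sample data in the same shape as the API response.
page  = JSON.parse('{"entries":[{"id":"e/1","body":"first"},{"id":"e/2","body":"second"}]}')
store = {}
puts archive(page["entries"], store)  # prints 2
puts archive(page["entries"], store)  # prints 0, nothing new to save
```

Running the archive step twice over the same page saves nothing the second time, which is what makes the real script safe to re-run against a partly archived feed.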
At some point the API will return no more entries: { “entries” : [] }. When this happens, we exit the block (line 16) and print a summary.
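The stop-on-empty-page behaviour can be tried without touching the network. The sketch below is not the original script: `each_page` is a hypothetical method, and the fetcher is a lambda serving 250 made-up entries in place of the HTTP request.

```ruby
require "json"

# Walk a paged API until it returns an empty "entries" array.
# `fetch` is any callable taking (start, num) and returning a JSON string;
# here it stands in for the open-uri call in the script.
def each_page(fetch, num: 100, max_start: 9900)
  total = 0
  0.step(max_start, num) do |start|
    entries = JSON.parse(fetch.call(start, num))["entries"]
    break if entries.empty?          # { "entries" : [] } means we are done
    yield entries, start
    total += entries.size
  end
  total
end

# Fake API holding 250 made-up entries.
fake_entries = (0...250).map { |i| { "id" => "e/#{i}" } }
fake_fetch = lambda do |start, num|
  JSON.generate({ "entries" => (fake_entries[start, num] || []) })
end

total = each_page(fake_fetch) do |entries, start|
  puts "start=#{start}: #{entries.size} entries"
end
puts "Total: #{total}"
```

With 250 fake entries the block runs for start=0, 100 and 200; the request at start=300 comes back empty, so the loop exits long before the 9900 ceiling.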
That’s it, more or less. Obviously, the script would benefit from some error checking and more options (such as supplying a feed URL as a command line option). For entries with attached files, the file URL but not the attachment will be saved. A nice improvement would be to fetch the attachment and save it to the database, using GridFS.
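As a rough idea of what the error checking and a command line option might look like, here is a sketch using only the standard library. The `--feed` flag and the helpers `feed_url`, `entries_from` and `parse_options` are my own inventions for illustration; the URL pattern is simply copied from the script above.

```ruby
require "json"
require "optparse"
require "uri"

# Build the request URL for one page of a feed (pattern taken from the script).
def feed_url(feed, start, num = 100)
  query = URI.encode_www_form(start: start, num: num)
  "http://friendfeed-api.com/v2/feed/#{feed}?#{query}"
end

# Parse a response body, returning [] rather than raising when the body
# is not valid JSON or has no "entries" array.
def entries_from(body)
  parsed  = JSON.parse(body)
  entries = parsed.is_a?(Hash) ? parsed["entries"] : nil
  entries.is_a?(Array) ? entries : []
rescue JSON::ParserError => e
  warn "Skipping page, could not parse JSON: #{e.message}"
  []
end

# Read the feed name from an argument list; --feed is a hypothetical option.
def parse_options(argv)
  options = { feed: "the-life-scientists" }
  OptionParser.new do |opts|
    opts.banner = "Usage: archive.rb [--feed NAME]"
    opts.on("-f", "--feed NAME", "Feed to archive") { |v| options[:feed] = v }
  end.parse!(argv.dup)
  options
end

# In a real script this would be parse_options(ARGV).
options = parse_options(["--feed", "the-life-scientists"])
puts feed_url(options[:feed], 0)
```

Returning an empty array from `entries_from` on a bad response means a paging loop treats a broken page like the end of the feed; whether to stop or retry at that point is a design choice I have left open.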
Possible uses: a simple archive, a backend for a web application to analyse the feed.
The Life Scientists at FriendFeed: 2009 summary
It’s Christmas Eve tomorrow and so I declare the year over. My Christmas gift to you is a summary of activity in 2009 at the FriendFeed Life Scientists group. It’s crafted using R + Ruby, with raw data and some code snippets available. If you want to see the most popular items from the group this year, head down to the bottom of this post.
(Note: this post is a work in progress)
Read the rest…
APIs: I wish the life sciences would learn from social networks
I was prompted by a thread on the apparent decline of FriendFeed to look for evidence of declining participation in my networks.
Read the rest…
A brief survey of R web interfaces
I’m looking at ways to provide access to R via a web application. First rule: see what’s available first, before you reinvent the wheel. It’s not pretty.
From the R Web Interfaces FAQ:
| Software | Brief notes |
|---|---|
| Rweb | Page last updated 1999. Of the 3 example links on the page one ran very slowly, the second not at all and the third is broken. |
| R-Online | Or rather, not online. Unless this CGI form is the same thing. I tried Example 1, it returned a server error. |
| Rcgi | Links to several CGI forms, none of which worked for me. |
| CGI-based R access | Link did not load. |
| CGIwithR | Package now maintained at Omegahat. Did not attempt installation. Last updated 2005. |
| Rpad | I could not connect to this URL. |
| RApache | The pick of the bunch. Provides server-side access to R through an Apache module. I was able to install RApache on 32-bit (but not 64-bit) Ubuntu 9.10 and get it running. Could use more documentation. |
| Rserve | Serves R via TCP/IP. Last updated 2006. |
| OpenStatServer | Broken link. No longer exists, so far as I can tell. |
| R PHP Online | Link out of date (but you can follow it to the newer page). Last updated 2003, so unlikely to be much use. |
| R-php | Last updated 2006; the example that I tried gave a server error. |
| webbioc | A Bioconductor package. Did not investigate further. |
| Rwui | An application to create R web interfaces. My browser hung at “waiting for cache”. I gave up. |
So, aside from RApache and some very old-fashioned and/or broken CGI scripts, I conclude that there is little interest in writing beautiful, modern statistical web applications (notable exception). Not so much a case of “reinventing” as “inventing”.
Turn Emacs into an IDE
Update: I should have said Rails IDE – but I’m sure similar plugins are available for other languages
I fired up NetBeans at work today, tried to open a Rails project and, inexplicably, it crashed. All is well at home, so I’m blaming as-yet-unknown setup issues on the work machine (which, I suspect, involve the letters “ATI”).
It got me thinking that, as much as I like NetBeans, it is still just a memory-eating, CPU-hogging, bloated Java-based GUI. For some time I’ve wanted to convert my favourite editor, Emacs, to something more like an IDE.
The WyeWorks Blog to the rescue. Install emacs-23 and a couple of Ruby gems, clone their github repository of Emacs plugins, copy to your ~/.emacs.d/ and voilà – marvel at your new, shiny editing environment. I also replaced my ~/.emacs with their init.el file.
The key plugins include ECB, textmate.el, Rinari and yasnippet, plus a bunch of modes for syntax highlighting. If you’ve only tried cursory Emacs customisation in the past the results are a little alarming at first, but you’ll be back to coding (and saying “Ooh! Aah!”) in no time at all.
FriendFeed Life Scientists: 14-day summary
Since I haven’t posted for 14 days, what better (and lazier) way to post something than to surf over to a 14-day summary from the Life Scientists Group and link to the top ten items!
- Review process files in the EMBO Journal – but why only for “the majority of papers”?
- How XML threatens Big Data. Or not. How JSON might be an alternative – or not.
- Solve any computer problem – with this classic XKCD flowchart.
- Science reviews the revolution in ‘strategic scientific reading’ – are they way behind the curve, or providing a useful summary for the uninitiated?
- Best practice in microbial genome annotation – spirited discussion on the nature of best bioinformatics practice.
- FriendFeed Life Scientists user survey – no further word on whether this will happen.
- 50 Years of Structure – link to a JMB review on the early days of structural biology.
- Reflections on Science Online London 2009
- Workflow tools that speak SOAP?
- Advice on cleaning up a protein sample – a nice example of useful discussion from the group.
Who knows, this could become a semi-regular feature.
Improvements to the reference management workflow
I use Google Reader to subscribe to the RSS feeds from journals that interest me (see my public page). I’m also a big fan of CiteULike as a reference management system.
For a long time I’ve thought: it would be great if GReader handled journal articles more efficiently. Rather than going from link in GReader -> article at journal -> CiteULike bookmark -> back to GReader, how about “post directly from GReader?”
With Google Reader’s new send-to feature, you can do just that. See this forum post for the details. Also, take a look at this how-to for a quick way to post to CiteULike by entering a PubMed PMID, DOI or ISBN identifier in the address bar.




