FriendFeed Life Scientists: 14-day summary

Since I haven’t posted for 14 days, what better (and lazier) way to post something than to surf over to a 14-day summary from the Life Scientists Group and link to the top ten items!

  1. Review process files in the EMBO Journal – but why only for “the majority of papers”?
  2. How XML threatens Big Data. Or not. How JSON might be an alternative – or not.
  3. Solve any computer problem – with this classic XKCD flowchart.
  4. Science reviews the revolution in ‘strategic scientific reading’ – are they way behind the curve, or providing a useful summary for the uninitiated?
  5. Best practice in microbial genome annotation – spirited discussion on the nature of best bioinformatics practice.
  6. FriendFeed Life Scientists user survey – no further word on whether this will happen.
  7. 50 Years of Structure – link to a JMB review on the early days of structural biology.
  8. Reflections on Science Online London 2009
  9. Workflow tools that speak SOAP?
  10. Advice on cleaning up a protein sample – a nice example of useful discussion from the group.

Who knows, this could become a semi-regular feature.

Improvements to the reference management workflow

I use Google Reader to subscribe to the RSS feeds from journals that interest me (see my public page). I’m also a big fan of CiteULike as a reference management system.

For a long time I’ve thought: it would be great if GReader handled journal articles more efficiently. Rather than going from link in GReader -> article at journal -> CiteULike bookmark -> back to GReader, how about posting directly from GReader?

With Google Reader’s new send-to feature, you can do just that. See this forum post for the details. Also, take a look at this how-to for a quick way to post to CiteULike by entering a PubMed PMID, DOI or ISBN identifier in the address bar.

RSRuby in the IRB console

R is terrific, of course, for all your statistical needs. But those data structures! “Everything is a list.” Leading to such wondrous ways to access variables as “p <- Meta(gds)$platform” or “last <- mylist[[1]][length(mylist[[1]])]”.

Sometimes, you want something more familiar. An array, a hash, a hash of arrays. Or, you may need to access R data in the language of your choice – e.g. as part of a Rails project.
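To make “a hash of arrays” concrete, here is a minimal sketch in plain Ruby. The keys and sample values are made up for illustration (they are not taken from any real GEO dataset), and no R or RSRuby is involved; it only shows the kind of access that feels familiar.

```ruby
# Hypothetical covariates for a three-sample dataset (made-up values).
columns = {
  "sample"        => ["GSM1", "GSM2", "GSM3"],
  "disease.state" => ["control", "control", "tumor"]
}

# Plain key and index access, no double brackets required.
first_sample = columns["sample"][0]
last_state   = columns["disease.state"].last

# Count how many samples fall into each disease state.
state_counts = columns["disease.state"].each_with_object(Hash.new(0)) do |state, counts|
  counts[state] += 1
end

puts first_sample
puts last_state
puts state_counts.inspect
```

Compare `columns["disease.state"].last` with the R expression for “last element of the first list item” above.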

In Ruby, IRB is your friend. On the right, an IRB session in which we invoke RSRuby, load the GEOquery library from Bioconductor, fetch a dataset from the GEO database and examine the metadata that describes the experiment. Result: a Ruby hash of arrays, where the keys are covariate types (“sample”, “disease.state”, “description”) and the values are the covariate names for each column (sample) in the dataset. This is now easy to iterate over using:

columns.each_pair do |key,val|
  # do something with keys
  val.each do |element|
    # do something with values