Tag Archives: api

When is db=all not db=all? When you use Entrez ELink.

Just a brief technical note.

I figured that for a given compound in PubChem, it would be interesting to know whether that compound had been used in a high-throughput experiment, which you might find in GEO. Very easy using the E-utilities, as implemented in the R package rentrez:

library(rentrez)
links <- entrez_link(dbfrom = "pccompound", db = "gds", id = "62857")
length(links$pccompound_gds)
# [1] 741

Browsing the rentrez documentation, I note that db can take the value "all". Sounds useful!

links <- entrez_link(dbfrom = "pccompound", db = "all", id = "62857")
length(links$pccompound_gds)
# [1] 0

That's odd. In fact, this query does not even link pccompound to gds:

length(names(links))
# [1] 39
which(names(links) == "pccompound_gds")
# integer(0)

It's not a rentrez issue, since the same result occurs using the E-utilities URL.
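
To illustrate, here's a quick way to check the raw responses outside of R; a minimal sketch using Ruby and open-uri, with the ELink URL parameters mirroring the rentrez calls above:

#!/usr/bin/ruby
require "open-uri"

base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

# db=gds: the XML contains a LinkSetDb with LinkName pccompound_gds
xml = open("#{base}?dbfrom=pccompound&db=gds&id=62857").read
puts xml.include?("pccompound_gds")

# db=all: consistent with the rentrez result above, pccompound_gds is absent
xml = open("#{base}?dbfrom=pccompound&db=all&id=62857").read
puts xml.include?("pccompound_gds")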

The good people at rOpenSci have opened an issue and are contacting NCBI for clarification. We'll keep you posted.

Farewell FriendFeed. It’s been fun.

I’ve been a strong proponent of FriendFeed since its launch. Its technology, clean interface and “data first, then conversations” approach have made it a highly successful experiment in social networking for scientists (and other groups). So you may be surprised to hear that from today, I will no longer be importing items into FriendFeed, or participating in the conversations at other feeds.

Here’s a brief explanation and some thoughts on my online activity in the coming months.
Read the rest…

APIs have let me down part 2/2: FriendFeed

In part 1, I described some frustrations arising out of a work project using the ArrayExpress API. I find that one way to deal mentally with these situations is to spend some time on a fun project, using similar programming techniques. A potential downside of this approach is that if your fun project goes bad, you’re really frustrated. That’s when it’s time to abandon the digital world, go outside and enjoy nature.

Here then, is why I decided to build another small project around FriendFeed, how its failure has led me to question the value of FriendFeed for the first time and why my time as a FriendFeed user might be up.
Read the rest…

APIs have let me down part 1/2: ArrayExpress

The API – Application Programming Interface – is, in principle, a wonderful thing. You make a request to a server using a URL and back come lovely, structured data, ready to parse and analyse. We’ve begun to demand that all online data sources offer an API and lament the fact that so few online biological databases do so.

Is it better, though, to have no API at all than one which is poorly implemented and leads to frustration? I’m beginning to think so, after recent experiences on both a work project and one of my “fun side projects”. Let’s start with the work project, an attempt to mine a subset of the ArrayExpress microarray database.
Read the rest…

Backup your CiteULike library using MongoDB and Ruby

Well, that was easy.

#!/usr/bin/ruby
require "rubygems"
require "mongo"
require "json/pure"
require "open-uri"

# connect to a local MongoDB server; database and collection are created if absent
db  = Mongo::Connection.new.db('citeulike')
col = db.collection('articles')

# fetch the CiteULike library for user "neils" as JSON and parse it
j   = JSON.parse(open("http://www.citeulike.org/json/user/neils").read)

# use the CiteULike article_id as the document _id, so re-running the
# script updates existing articles instead of duplicating them
j.each do |article|
  article[:_id] = article['article_id']
  col.save(article)
end
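
As a quick sanity check that the backup worked, you can count and query the documents; a minimal sketch, where the article_id passed to find_one is just a hypothetical example:

#!/usr/bin/ruby
require "rubygems"
require "mongo"

col = Mongo::Connection.new.db('citeulike').collection('articles')

# how many articles did we archive?
puts "#{col.count} articles archived."

# fetch one back by its CiteULike article_id (stored as _id above)
article = col.find_one(:_id => "1234567")   # hypothetical article_id
puts article['title'] if article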

Lazy post: a Life Scientists best-of

When stuck for ideas or pressed for time, a blogger can always fall back on a round-up of activity from elsewhere on the web. Yes, it’s time for a “best of the past 14 days” from the FriendFeed Life Scientists group.

Just a slight twist to make it more exciting (?) – we’ll automate the process using the API and a little Ruby.
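
The general idea looks something like this; a minimal sketch only, which assumes the v2 feed format used elsewhere on this blog (likes and comments arrays per entry) and skips the 14-day date filtering:

#!/usr/bin/ruby
require "rubygems"
require "json/pure"
require "open-uri"

# fetch the most recent 100 entries from the group feed
feed = JSON.parse(open("http://friendfeed-api.com/v2/feed/the-life-scientists?num=100").read)

# rank entries by total likes + comments and print the top ten
best = feed['entries'].sort_by { |e| -(e['likes'].to_a.size + e['comments'].to_a.size) }
best[0, 10].each do |e|
  score = e['likes'].to_a.size + e['comments'].to_a.size
  puts "#{score}\t#{e['body']}"
end
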
Read the rest…

How to: archive data via an API using Ruby and MongoDB

I was going to title this post “How to: archive a FriendFeed feed in MongoDB”. The example code does just that but (a) I fear that this blog suggests a near-obsession with FriendFeed (see tag cloud, right sidebar) and (b) the principles apply to any API that returns JSON. There are rare examples of biological data with JSON output in the wild, e.g. the ArrayExpress Gene Expression Atlas. So I’m still writing a bioinformatics blog ;-)

Let’s go straight to the code:

#!/usr/bin/ruby

require "rubygems"
require "mongo"
require "json/pure"
require "open-uri"

# db config
db  = Mongo::Connection.new.db('friendfeed')
col = db.collection('lifesci')

# fetch json
0.step(9900, 100) {|n|
  f = open("http://friendfeed-api.com/v2/feed/the-life-scientists?start=#{n}&amp;num=100").read
  j = JSON.parse(f)
  break if j['entries'].count == 0
  j['entries'].each do |entry|
    if col.find({:_id => entry['id']}).count == 0
      entry[:_id] = entry['id']
      entry.delete('id')
      col.save(entry)
    end
  end
  puts "Processed entries #{n} - #{n + 99}", "Database contains #{col.count} documents."
}

puts "No more entries to process. Database contains #{col.count} documents."

Also available as a gist. Fork away.

A quick run-through. Lines 4-6 load the required libraries: mongo (the MongoDB Ruby driver), json and open-uri. If you don't have the first two, simply "gem install mongo json_pure". Of course, you'll need to download MongoDB and have the mongod server daemon running on your system.

Lines 9-10 connect to the database (assuming a standard database installation). Rename the database and collection as you see fit. Both will be created if they don't exist.

The guts are lines 12-25. A loop fetches JSON from the FriendFeed API, 100 entries at a time (0-99, 100-199...) up to 9999. That's an arbitrarily high number, chosen to ensure that all entries are retrieved. Change "the-life-scientists" in line 14 to the feed of your choice. The JSON is then parsed into a hash structure. In lines 17-23 we loop through each entry and extract the "id" key, a unique identifier for the entry. This is used to create the "_id" field, a unique identifier for the MongoDB document. If a document with _id == id does not exist, we create an _id key in the hash, delete the (now superfluous) 'id' key and save the document. Otherwise, the entry is skipped.
At some point the API will return no more entries: { "entries" : [] }. When this happens, we exit the block (line 16) and print a summary.

That's it, more or less. Obviously, the script would benefit from some error checking and more options (such as supplying a feed URL as a command line option). For entries with attached files, the file URL but not the attachment will be saved. A nice improvement would be to fetch the attachment and save it to the database, using GridFS.
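
Here's a rough sketch of that last idea using the Ruby driver's Grid class; the 'files' structure (an array of hashes with 'url' and 'name' keys) is an assumption about the v2 entry format, so check your own archived documents first:

#!/usr/bin/ruby
require "rubygems"
require "mongo"
require "open-uri"

db   = Mongo::Connection.new.db('friendfeed')
col  = db.collection('lifesci')
grid = Mongo::Grid.new(db)

# for each archived entry with attached files, fetch each file and store it in GridFS
col.find({'files' => {'$exists' => true}}).each do |entry|
  entry['files'].each do |file|
    data = open(file['url']).read
    grid.put(data, :filename => file['name'], :metadata => {:entry_id => entry['_id']})
  end
end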

Possible uses: a simple archive, a backend for a web application to analyse the feed.
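
As a small example of the analysis idea, here's a sketch that tallies archived entries by author; it assumes the v2 entry format, where 'from' holds the author's name:

#!/usr/bin/ruby
require "rubygems"
require "mongo"

col = Mongo::Connection.new.db('friendfeed').collection('lifesci')

# tally archived entries by author and print the ten most prolific posters
counts = Hash.new(0)
col.find.each { |entry| counts[entry['from']['name']] += 1 }

counts.sort_by { |name, n| -n }[0, 10].each do |name, n|
  puts "#{n}\t#{name}"
end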

Brief notes on export from FriendFeed

During discussion of the ISMB 2008 room, Thomas asks: “Does FF really provide long-term archival?” Lars points out that it’s as permanent as anything else on the Web, Dorothea points out that FriendFeed offer no guarantees and Deepak discusses the FriendFeed API.

Question: how useful is the FriendFeed API as a tool to, for example, archive a FriendFeed room?
Read the rest…