Tag Archives: json

APIs have let me down part 2/2: FriendFeed

In part 1, I described some frustrations arising out of a work project using the ArrayExpress API. I find that one way to deal mentally with these situations is to spend some time on a fun project, using similar programming techniques. A potential downside of this approach is that if your fun project goes bad, you’re really frustrated. That’s when it’s time to abandon the digital world, go outside and enjoy nature.

Here, then, is why I decided to build another small project around FriendFeed, how its failure has led me to question the value of FriendFeed for the first time, and why my time as a FriendFeed user might be up.
Read the rest…

APIs have let me down part 1/2: ArrayExpress

The API – Application Programming Interface – is, in principle, a wonderful thing. You make a request to a server using a URL and back come lovely, structured data, ready to parse and analyse. We’ve begun to demand that all online data sources offer an API and lament the fact that so few online biological databases do so.
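As a minimal sketch of that request-and-parse cycle in Ruby – the response body and accession below are invented for illustration, and a real script would obtain the string with something like open(url).read:

```ruby
require "json"

# An invented response body, standing in for what an API might return;
# in real use this string would be fetched with open-uri
body = '{"experiments":[{"accession":"E-MEXP-1234","species":"Homo sapiens"}]}'

# parse the JSON into a Ruby hash and pull out a field
data = JSON.parse(body)
puts data["experiments"].first["species"]  # prints "Homo sapiens"
```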

Is it better, though, to have no API at all than one which is poorly implemented and leads to frustration? I’m beginning to think so, after recent experiences on both a work project and one of my “fun side projects”. Let’s start with the work project: an attempt to mine a subset of the ArrayExpress microarray database.
Read the rest…

Backup your CiteULike library using MongoDB and Ruby

Well, that was easy.

#!/usr/bin/ruby
require "rubygems"
require "mongo"
require "json/pure"
require "open-uri"

# connect to the 'citeulike' database and its 'articles' collection;
# both are created automatically if they don't exist
db  = Mongo::Connection.new.db('citeulike')
col = db.collection('articles')

# fetch the complete library for CiteULike user 'neils' as JSON
j = JSON.parse(open("http://www.citeulike.org/json/user/neils").read)

# use the CiteULike article_id as the MongoDB _id, so re-running
# the script updates documents rather than duplicating them
j.each do |article|
  article[:_id] = article['article_id']
  col.save(article)
end

How to: archive data via an API using Ruby and MongoDB

I was going to title this post “How to: archive a FriendFeed feed in MongoDB”. The example code does just that but (a) I fear that this blog suggests a near-obsession with FriendFeed (see tag cloud, right sidebar) and (b) the principles apply to any API that returns JSON. There are rare examples of biological data with JSON output in the wild, e.g. the ArrayExpress Gene Expression Atlas. So I’m still writing a bioinformatics blog ;-)

Let’s go straight to the code:

#!/usr/bin/ruby

require "rubygems"
require "mongo"
require "json/pure"
require "open-uri"

# db config
db  = Mongo::Connection.new.db('friendfeed')
col = db.collection('lifesci')

# fetch json
0.step(9900, 100) {|n|
  f = open("http://friendfeed-api.com/v2/feed/the-life-scientists?start=#{n}&num=100").read
  j = JSON.parse(f)
  break if j['entries'].count == 0
  j['entries'].each do |entry|
    if col.find({:_id => entry['id']}).count == 0
      entry[:_id] = entry['id']
      entry.delete('id')
      col.save(entry)
    end
  end
  puts "Processed entries #{n} - #{n + 99}", "Database contains #{col.count} documents."
}

puts "No more entries to process. Database contains #{col.count} documents."

Also available as a gist. Fork away.

A quick run-through. Lines 4-6 load the required libraries: mongo (the MongoDB Ruby driver), json and open-uri. If you don’t have the first two, simply “gem install mongo json_pure”. Of course, you’ll need to download MongoDB and have the mongod server daemon running on your system.

Lines 9-10 connect to the database (assuming a standard database installation). Rename the database and collection as you see fit. Both will be created if they don’t exist.

The guts are lines 12-25. A loop fetches JSON from the FriendFeed API, 100 entries at a time (0-99, 100-199…), up to 9999. That’s an arbitrarily high number, chosen to ensure that all entries are retrieved. Change “the-life-scientists” in line 14 to the feed of your choice. The JSON is then parsed into a hash structure. In lines 17-23 we loop through each entry and extract the “id” key, a unique identifier for the entry. This is used to create the “_id” field, a unique identifier for the MongoDB document. If a document with _id == id does not exist, we create an _id key in the hash, delete the (now superfluous) “id” key and save the document. Otherwise, the entry is skipped.
At some point the API will return no more entries: { "entries" : [] }. When this happens, we exit the block (line 16) and print a summary.
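The fetch-until-empty pattern can be sketched without a network connection or a database. Below, three invented pages stand in for the API responses and a plain Hash stands in for the MongoDB collection; the loop and the id-to-_id remapping are the same as in the script above.

```ruby
require "json"

# Three invented "pages" of API output; the last is empty, just as the
# real API eventually returns { "entries" : [] }
PAGES = [
  '{"entries":[{"id":"e1"},{"id":"e2"}]}',
  '{"entries":[{"id":"e3"}]}',
  '{"entries":[]}'
]

saved = {}  # a plain Hash standing in for the MongoDB collection
0.step(9900, 100) do |n|
  j = JSON.parse(PAGES[n / 100] || '{"entries":[]}')
  break if j['entries'].count == 0
  j['entries'].each do |entry|
    next if saved.key?(entry['id'])     # skip entries we already have
    entry['_id'] = entry.delete('id')   # remap id -> _id
    saved[entry['_id']] = entry
  end
end

puts saved.size  # prints 3
```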

That’s it, more or less. Obviously, the script would benefit from some error checking and more options (such as supplying a feed URL as a command line option). For entries with attached files, the file URL, but not the attachment itself, will be saved. A nice improvement would be to fetch the attachment and save it to the database, using GridFS.

Possible uses: a simple archive, a backend for a web application to analyse the feed.


R has a JSON package

Named rjson, appropriately. It’s quite basic just now, but contains methods for interconversion between R objects and JSON. Something like this:

library(rjson)
data <- list(a = 1, b = 2, c = 3)
json <- toJSON(data)
json
# [1] "{\"a\":1,\"b\":2,\"c\":3}"
fromJSON(json)  # and back again: returns an R list
cat(json, file = "data.json")

Use cases? I wonder if RApache could be used to build an API that serves R data in JSON format?

Add FriendFeed comments and likes to WordPress.com posts using Ruby

The problem
FriendFeed aggregates your blog posts from WordPress.com. Naturally, people prefer to comment on your post at FriendFeed – it’s quicker, easier and more fun. However, you would like to see an indication of this activity back at the original blog post.

The solutions
You could self-host your blog using software from WordPress.org. This allows you to install plugins such as FriendFeed comments. But you’re at WordPress.com because you don’t want to self-host, right? So you just have to live with the absence of useful plugins. My advice: don’t try discussing issues like this one in the WordPress.com forums unless you’re the kind of person who enjoys comment threads at YouTube.

For intelligent, mature and constructive discussion go to FriendFeed of course, where Lars writes:

How hard would it be to make a web service that reads the RSS feed from you blog, accesses FriendFeed via the API, identifies comments on FriendFeed related to your blog posts, and reposts them on your blog? If you want to, you could also keep track of “likes”…
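As a rough sketch of the matching step Lars describes – the post URL and FriendFeed entry below are invented, and a real version would pull them from the blog’s RSS feed and the FriendFeed API:

```ruby
# Invented stand-ins for the RSS feed and the FriendFeed API response
posts = ["http://example.wordpress.com/2009/10/14/post-one"]
entries = [
  { "link"     => "http://example.wordpress.com/2009/10/14/post-one",
    "comments" => [{ "body" => "Nice post" }],
    "likes"    => [{ "from" => "lars" }] }
]

# match each blog post to its FriendFeed entry by URL and report activity
posts.each do |post|
  entry = entries.find { |e| e["link"] == post }
  next unless entry
  puts "#{post}: #{entry['comments'].size} comment(s), #{entry['likes'].size} like(s)"
end
```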

Let’s find out – using Ruby!
Read the rest…