Brief notes on export from FriendFeed

During discussion of the ISMB 2008 room, Thomas asks: “Does FF really provide long-term archival?” Lars points out that it’s as permanent as anything else on the Web, Dorothea points out that FriendFeed offers no guarantees, and Deepak discusses the FriendFeed API.

Question: how useful is the FriendFeed API as a tool to, for example, archive a FriendFeed room?

We can access the ISMB 2008 room via the API using a URL like this:

curl "http://friendfeed.com/api/feed/room/ismb-2008?format=xml" > ismb.xml

We can also retrieve items in other formats by replacing “xml” in the URL with one of: json, atom, rss. Note that where a FriendFeed post contains an “N more comments” link, those comments are already present in the page and are merely revealed by JavaScript when the user clicks, so curl will retrieve the complete discussion.
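The format substitution described above can be sketched as a short loop. This is only an illustration: feed_url is a hypothetical helper name, not part of the FriendFeed API, and the commands are printed rather than run.

```shell
# Hypothetical helper: build a room feed URL for a given room and format.
feed_url() {
    # $1 = room nickname, $2 = format (xml, json, atom or rss)
    printf 'http://friendfeed.com/api/feed/room/%s?format=%s\n' "$1" "$2"
}

# Print one curl command per format named in the text.
for fmt in xml json atom rss; do
    echo "curl \"$(feed_url ismb-2008 "$fmt")\" > ismb.$fmt"
done
```

Printing the commands first makes it easy to check the URLs before downloading anything.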

By default this will retrieve “the most recent entries”. The first problem is that entries returned in this way do not correspond to “one page” of the room as viewed on the web. Looking at the XML output, the last entry is:

<body>Summary: largest metazoan gene network dataset to date.</body>

On the web, however, page 1 of the room shows two further posts after this one.

So, how do we go about fetching a range of items? The API documentation tells us that we can specify the index of the first item using start and the number of items to return from that index using num. With a little trial and error, I discovered that:

  • the index of the first item = 0
  • the maximum number of items that can be returned = 100
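These two findings can be folded into a small shell helper. This is a minimal sketch: page_url and MAX_NUM are hypothetical names, and the cap of 100 is simply the limit observed by trial and error, not a documented value.

```shell
# Limit of 100 items per request, as found by trial and error (an assumption).
MAX_NUM=100

# Hypothetical helper: build a room feed URL for a start index and item count.
page_url() {
    # $1 = zero-based index of the first item, $2 = number of items requested
    start=$1
    num=$2
    # Asking for more than MAX_NUM items gains nothing, so clamp the request.
    if [ "$num" -gt "$MAX_NUM" ]; then
        num=$MAX_NUM
    fi
    printf 'http://friendfeed.com/api/feed/room/ismb-2008?start=%s&num=%s&format=json\n' "$start" "$num"
}

# Print the URL for the first 100 items (indices 0 to 99).
page_url 0 100
```

The output can be passed straight to curl, for example: curl "$(page_url 0 100)" > ismb.json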

You can try this for yourself as follows. Run this curl command and check the size of the resulting file:

curl "http://friendfeed.com/api/feed/room/ismb-2008?start=0&num=99&format=json" > ismb.json

Now run it again but use “&num=100”, then again using “&num=101”, checking the file size each time. The three file sizes for num=99, 100 and 101 should be 489234, 490390 and 490390 bytes, respectively. In other words, requesting more than 100 items returns no additional entries, even though the room contains more than 100 items.
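The size comparison can be scripted rather than checked by eye. The sketch below assumes the three downloads were saved as ismb-99.json, ismb-100.json and ismb-101.json; those file names, and the helpers file_size and same_size, are hypothetical.

```shell
# Hypothetical helper: report the size of a file in bytes.
file_size() {
    # Reading from stdin keeps the file name out of the output;
    # tr strips the padding that some wc implementations add.
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: succeed if two files have the same size in bytes.
same_size() {
    [ "$(file_size "$1")" -eq "$(file_size "$2")" ]
}

# Report the size of each download that exists in the current directory.
for n in 99 100 101; do
    f="ismb-$n.json"
    if [ -f "$f" ]; then
        echo "num=$n: $(file_size "$f") bytes"
    fi
done
```

Equal sizes only suggest that two responses are identical; running cmp on the two files would be a stricter check.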

Can we count the total number of posts in a room? To my knowledge, not easily. We can try to retrieve the complete set of posts for the ISMB 2008 room using:

curl "http://friendfeed.com/api/feed/room/ismb-2008?start=0&num=100&format=xml" > ismb0-99.xml
curl "http://friendfeed.com/api/feed/room/ismb-2008?start=100&num=100&format=xml" > ismb100-199.xml

However, getting the values of start and num correct (so as not to miss or duplicate posts) can be tricky.
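One way to keep consecutive requests from overlapping or leaving gaps is to compute each start value as a multiple of the chunk size. The sketch below does that; chunk_plan is a hypothetical helper, and the choice of 3 chunks is arbitrary because the total number of posts is unknown. Since the feed returns the most recent entries first, posts added between requests would presumably shift the indices, so duplicates may still appear in a busy room.

```shell
# Hypothetical helper: print one "start end" pair per chunk, with no gaps or overlaps.
# $1 = number of chunks to plan, $2 = chunk size
chunk_plan() {
    chunks=$1
    size=$2
    i=0
    while [ "$i" -lt "$chunks" ]; do
        start=$((i * size))
        end=$((start + size - 1))
        echo "$start $end"
        i=$((i + 1))
    done
}

# Chunk size of 100, the largest request that returned extra items in testing.
SIZE=100

# Turn each planned chunk into a curl command. The commands are printed rather
# than run, so they can be inspected before anything is downloaded.
chunk_plan 3 "$SIZE" | while read -r start end; do
    echo "curl \"http://friendfeed.com/api/feed/room/ismb-2008?start=$start&num=$SIZE&format=xml\" > ismb$start-$end.xml"
done
```

Each output file name records the index range it is meant to hold, which makes a missing or repeated chunk easier to spot.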

In summary

  • Posts can be retrieved from a FriendFeed room in XML, RSS, Atom or JSON format
  • Archiving all room content is not easy because it’s difficult to determine:
    • The total number of posts in a room
    • The index of each post
    • The best way to retrieve chunks without overlap or omission

However, it’s a start. Next steps: store posts in a file format of choice and write parsers (e.g. XML to wiki syntax).
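As a first rough idea of what an XML-to-wiki parser might look like, the sketch below turns each <body> element into a wiki-style bullet. It rests on loud assumptions: that each <body> element sits on a single line, as in the sample shown earlier, that "* " is the desired bullet syntax, and that XML entities can be left undecoded. The function name bodies_to_wiki is hypothetical, and a real XML parser would be more robust than sed.

```shell
# Hypothetical helper: read feed XML on stdin and print one wiki bullet per <body>.
# Assumes each <body>...</body> pair is on one line; other lines are ignored.
bodies_to_wiki() {
    sed -n 's:.*<body>\(.*\)</body>.*:* \1:p'
}

# Demonstrate with the entry quoted earlier in the post.
printf '<body>Summary: largest metazoan gene network dataset to date.</body>\n' | bodies_to_wiki
```

With a downloaded file, the same function could be used as: bodies_to_wiki < ismb.xml > ismb.wiki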

8 thoughts on “Brief notes on export from FriendFeed”

  1. Deepak

    Neil

    Given how we’ve been using FF and the concerns (legitimate) expressed, should probably drop a note to the FF crowd. Would be interesting to see what they have to say

  2. nsaunders Post author

    Agreed. Ideally, export/archiving would be a feature provided by knowledgeable FriendFeed engineers, rather than amateur hacks like myself.

  3. Deepak

    Well your hack is actually a good way to do this, but perhaps they could add features to the FriendFeed API designed for archival as opposed to retrieval

  4. bill

    When it comes to amateur hacks, you got nothing on me. I pull my “comments and likes” feed into Google Reader and call that a searchable archive! It works, in so far as GR never deletes or truncates (afaik), but it’s not what you’d call convenient.
