In a discussion about the ISMB 2008 room, Thomas asks: “Does FF really provide long-term archival?” Lars points out that it’s as permanent as anything else on the Web, Dorothea notes that FriendFeed offers no guarantees, and Deepak discusses the FriendFeed API.
Question: how useful is the FriendFeed API as a tool for, say, archiving a FriendFeed room?
We can access the ISMB 2008 room via the API using a URL like this:
curl "http://friendfeed.com/api/feed/room/ismb-2008?format=xml" > ismb.xml
By default this retrieves “the most recent entries”. The first problem is that the entries returned this way do not correspond to “one page” of the room as viewed on the web. Looking at the XML output, the last entry is:
<body>Summary: largest metazoan gene network dataset to date.</body>
However, page 1 of the room as seen on the web shows two further posts after this one.
So how do we fetch a range of items? The API documentation tells us that start sets the index of the first item to return and num sets how many items to return from that index. With a little trial and error, I discovered that:
- the index of the first item = 0
- the maximum number of items that can be returned = 100
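With those two observations, building the URL for any window of a room is mechanical. A minimal shell sketch, assuming the endpoint and the start/num/format parameters shown above; the function name room_url is hypothetical, and the clamp at 100 mirrors the limit found by trial and error, not a documented rule:

```shell
#!/bin/sh
# Hypothetical helper (not part of any FriendFeed tool): build the API URL
# for one window of a room from the start/num/format parameters used above.
room_url() {
  room=$1
  start=$2            # index of the first item; the first item has index 0
  num=$3              # how many items to return from that index
  format=${4:-xml}    # e.g. xml or json, as in the commands above
  # Clamp num to 100, the maximum observed by trial and error (an assumption).
  if [ "$num" -gt 100 ]; then
    num=100
  fi
  echo "http://friendfeed.com/api/feed/room/${room}?start=${start}&num=${num}&format=${format}"
}
```

For example, room_url ismb-2008 100 100 prints the URL for the second block of 100 items, which can then be passed to curl.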
You can try this for yourself as follows. Run this curl command and check the size of the resulting file:
curl "http://friendfeed.com/api/feed/room/ismb-2008?start=0&num=99&format=json" > ismb.json
Now run it again with “&num=100”, then again with “&num=101”, checking the file size each time. When I tried this, the three file sizes for num=99, 100 and 101 were 489234, 490390 and 490390 bytes, respectively. In other words, asking for more than 100 items returns nothing extra, even though we know the room holds more than 100 items.
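File size is only an indirect signal; counting the posts in each saved response is more direct. A minimal sketch, assuming (hypothetically) that every post in the XML output is wrapped in its own <entry> element; the function name count_entries is made up for illustration:

```shell
#!/bin/sh
# Count the posts in a saved API response instead of comparing file sizes.
# ASSUMPTION: each post in the XML output starts with its own <entry> tag;
# check this against a real file before relying on it.
count_entries() {
  # grep -o prints every match on its own line, so wc -l counts matches even
  # when several <entry> tags share one line; tr strips BSD wc's padding.
  grep -o '<entry[ >]' "$1" | wc -l | tr -d ' '
}
```

After switching the three requests to format=xml, running count_entries ismb.xml after each one should give the same count for num=100 and num=101 if the cap holds.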
Can we count the total number of posts in a room? Not easily, as far as I know. We can try to retrieve the complete set of posts for the ISMB 2008 room, 100 at a time:
curl "http://friendfeed.com/api/feed/room/ismb-2008?start=0&num=100&format=xml" > ismb0-99.xml
curl "http://friendfeed.com/api/feed/room/ismb-2008?start=100&num=100&format=xml" > ismb100-199.xml
However, getting the values of start and num correct (so as not to miss or duplicate posts) can be tricky.
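One systematic way to choose start and num is to step start by exactly the window size (0, 100, 200, ...) and stop at the first window that comes back with fewer than 100 posts. A minimal sketch, assuming the 100-item cap observed above and (hypothetically) one <entry> element per post; archive_room and the output file names are illustrative only:

```shell
#!/bin/sh
# Sketch: fetch a room in consecutive, non-overlapping windows of 100 items
# (start = 0, 100, 200, ...) and stop at the first page holding fewer than
# 100 entries. ASSUMPTIONS: 100 is the per-request cap observed above, and
# each post starts with its own <entry> tag in the XML output.
archive_room() {
  room=$1
  page=100
  start=0
  while :; do
    out="${room}-${start}.xml"
    curl -s "http://friendfeed.com/api/feed/room/${room}?start=${start}&num=${page}&format=xml" > "$out" || return 1
    # Count the posts in this window (same idea as counting <entry> tags).
    n=$(grep -o '<entry[ >]' "$out" | wc -l | tr -d ' ')
    echo "$out: $n entries"
    # A short (or empty) page means we have reached the end of the room.
    if [ "$n" -lt "$page" ]; then
      break
    fi
    start=$((start + page))
  done
}
```

This only avoids overlap if the room is quiet while it runs: if start=0 is the newest post, anything posted mid-crawl shifts every later index, so the same post can end up in two files.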
In summary:
- Posts can be retrieved from a FriendFeed room in XML, RSS, Atom or JSON format
- Archiving all room content is not easy because it’s difficult to determine:
  - The total number of posts in a room
  - The index of each post
  - The best way to retrieve chunks without overlap or omission
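One way around the last two points is to let windows overlap on purpose (say start=0, 90, 180, ... with num=100) and then deduplicate by post ID, which also yields a count of distinct posts. A rough sketch; it assumes, hypothetically, that each post's <entry> tag is directly followed by its own <id> element, so that IDs belonging to comments are skipped. Check that against real output first; entry_ids and unique_posts are made-up names:

```shell
#!/bin/sh
# ASSUMPTION: every post's <entry> tag is immediately followed (ignoring
# whitespace) by its <id> element, e.g. <entry><id>...</id>. Comments may
# carry ids too, so only the id directly after <entry> is taken.
entry_ids() {
  # Join all lines so tags split across lines can still be matched.
  cat "$@" | tr -d '\n' |
    grep -o '<entry>[[:space:]]*<id>[^<]*</id>' |
    sed 's/.*<id>//; s/<\/id>//'
}

# Number of distinct posts across any number of (possibly overlapping) files.
unique_posts() {
  entry_ids "$@" | sort -u | wc -l | tr -d ' '
}
```

For example, unique_posts ismb0-99.xml ismb100-199.xml prints how many distinct posts the two chunks contain, regardless of any overlap between them.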
Still, it’s a start. Next steps: store the posts in a file format of choice and write parsers (e.g. XML to wiki syntax).
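As a first step toward an XML-to-wiki parser, the text of each <body> element (the element seen in the XML output earlier) can be turned into a wiki bullet with sed. A rough sketch, not a real XML parser: it assumes each <body> element sits on a single line and uses MediaWiki-style “* ” bullets; xml_to_wiki is a hypothetical name:

```shell
#!/bin/sh
# Rough "XML to wiki syntax" step: print the text of every <body>...</body>
# element as a MediaWiki-style bullet. ASSUMPTION: one <body> element per
# line; no real XML parsing is done here.
xml_to_wiki() {
  sed -n 's|.*<body>\(.*\)</body>.*|* \1|p' "$1" |
    # Decode the common XML entities; &amp; goes last so that "&amp;lt;"
    # becomes "&lt;" rather than "<".
    sed 's/&lt;/</g; s/&gt;/>/g; s/&quot;/"/g; s/&amp;/\&/g'
}
```

Usage: xml_to_wiki ismb.xml > ismb.wiki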