Web scraping using Mechanize: PMID to PMCID/NIHMSID

Web services are great. Pass them a URL. Structured data comes back. Parse it, analyse it, visualise it. Done.

Web scraping – interacting programmatically with a web page – is not so great. It requires more code, and when the web page changes, the code breaks. However, in the absence of a web service, scraping is better than nothing. It can even be rather satisfying. Early in my bioinformatics career, the realisation that code, rather than humans, can automate the process of submitting forms and reading the results was quite a revelation.

In this post: how to interact with a web page at the NCBI using the Mechanize library.

Read the rest…

Add FriendFeed comments and likes to WordPress.com posts using Ruby

The problem
FriendFeed aggregates your blog posts from WordPress.com. Naturally, people prefer to comment on your post at FriendFeed – it’s quicker, easier and more fun. However, you would like to see an indication of this activity back at the original blog post.

The solutions
You could self-host your blog using software from WordPress.org. This allows you to install plugins such as FriendFeed comments. But you’re at WordPress.com because you don’t want to self-host, right? So you just have to live with the absence of useful plugins. My advice: don’t try discussing issues like this one in the WordPress.com forums unless you’re the kind of person who enjoys comment threads at YouTube.

For intelligent, mature and constructive discussion go to FriendFeed, of course, where Lars writes:

How hard would it be to make a web service that reads the RSS feed from your blog, accesses FriendFeed via the API, identifies comments on FriendFeed related to your blog posts, and reposts them on your blog? If you want to, you could also keep track of “likes”…

Let's find out – using Ruby!
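
To give a flavour of the matching step Lars describes, here is a minimal sketch in plain Ruby. It assumes the blog posts (from the RSS feed) and the FriendFeed entries have already been fetched and reduced to simple structs; the Post and Entry shapes, their field names and the example URLs are all hypothetical and are not taken from the real FriendFeed API.

```ruby
# Hypothetical data shapes, invented for illustration only.
# Post  = one item from the blog's RSS feed
# Entry = one FriendFeed entry, with its comments and a count of likes
Post  = Struct.new(:title, :link)
Entry = Struct.new(:link, :comments, :likes)

# Crude URL normalisation: ignore surrounding whitespace and a trailing slash
def normalise(url)
  url.strip.chomp("/")
end

# For each post, collect the comments and likes from entries with the same link
def activity_by_post(posts, entries)
  by_link = entries.group_by { |entry| normalise(entry.link) }
  posts.map do |post|
    matched = by_link.fetch(normalise(post.link), [])
    {
      title:    post.title,
      comments: matched.flat_map(&:comments),
      likes:    matched.sum(&:likes)
    }
  end
end

posts = [
  Post.new("First post",  "http://example.wordpress.com/2009/01/01/first-post/"),
  Post.new("Second post", "http://example.wordpress.com/2009/01/02/second-post/")
]
entries = [
  Entry.new("http://example.wordpress.com/2009/01/01/first-post", ["Nice!", "Agreed"], 3)
]

activity_by_post(posts, entries).each do |row|
  puts "#{row[:title]}: #{row[:comments].size} comments, #{row[:likes]} likes"
end
```

Matching on the post URL is only one possible design choice; fetching the feed, calling the API and writing the results back to the blog are left out of this sketch.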
Read the rest…