Create your own Google Scholar RSS feed

Google Scholar is a useful tool and now has its own blog. The first post covers email alerts.

It’s unimaginable, in 2010, that an alert service would not provide an RSS feed, so I can only assume that this feature will appear “in due course”. In the meantime, a quick Google search for “create rss feed from website” led me to “7 Tools To Make An RSS Feed Of Any Website”. I quickly tested them all and I agree with the author of the article: Feed43 is the winner.

The process for creating a Google Scholar feed is a little complex. Here’s my first attempt.

Update: interesting FriendFeed thread, where people point out that (a) scraping Google Scholar is quite likely to fail and (b) this is not the same as an alert, since results are not ordered by date.

1. Enter the URL
Follow the “create your own feed” link, which takes you to a submission form. If your search term is, for example, “prion protein”, you enter the Google Scholar URL for that search and click “Reload”, as shown below:

[Figure feed43-1: Specify the URL]
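
If you want to build the search URL rather than copy it from the browser, here is a minimal Python sketch. The base URL and the `q` parameter are my assumptions about how a Scholar search URL looks, so check the result against what your browser shows; `scholar_search_url` is just a name I made up for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL for a Scholar search; verify it in your own browser.
SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_search_url(term: str) -> str:
    """Return a search URL with the term safely encoded in the query string."""
    # urlencode turns spaces into "+" and escapes other special characters.
    return SCHOLAR_BASE + "?" + urlencode({"q": term})


print(scholar_search_url("prion protein"))
```

The printed URL is what you would paste into the Feed43 form before clicking “Reload”.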

2. Define extraction rules
The form displays the HTML source of your URL. Examine the source, identify an item that you want to extract and paste the source for that item in the form field titled “Item (repeatable) Search Pattern”. In the form field above that, titled “Global Search Pattern”, type “{%}”.

Now, go back to the “Item (repeatable) Search Pattern” field and edit it, so that it looks like the example below:

[Figure feed43-2: Define extraction rules]

This is the complex part, but it should make sense if you have worked with regular expressions. Each publication in the search result lies between a div tag of class gs_rt and a span tag of class gs_fl. In the text between those tags are 4 regions that we want to extract: the publication URL, title, authors and (partial) abstract. So we replace each of those with the “{%}” symbol.
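
To make the idea concrete, here is a rough Python sketch of how a “{%}” pattern could be turned into a regular expression. This is my own approximation, not how Feed43 works internally. The sample HTML is invented: only the `gs_rt` and `gs_fl` class names come from the real page, and the `<p>` tags and example.org URLs are placeholders.

```python
import re

PLACEHOLDER = "{%}"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Escape the literal parts and turn each {%} into a non-greedy capture group."""
    literal_parts = [re.escape(part) for part in pattern.split(PLACEHOLDER)]
    # DOTALL lets a captured region span line breaks in the HTML source.
    return re.compile("(.*?)".join(literal_parts), re.DOTALL)


# Hypothetical item pattern: URL, title, authors, partial abstract.
ITEM_PATTERN = (
    '<div class="gs_rt"><a href="{%}">{%}</a></div>'
    '<p>{%}</p><p>{%}</p><span class="gs_fl">'
)

# Invented sample markup, NOT real Google Scholar HTML.
SAMPLE_HTML = (
    '<div class="gs_rt"><a href="http://example.org/a">First title</a></div>'
    '<p>A Smith, B Jones</p><p>First abstract</p><span class="gs_fl">Cited by 3</span>\n'
    '<div class="gs_rt"><a href="http://example.org/b">Second title</a></div>'
    '<p>C Lee</p><p>Second abstract</p><span class="gs_fl">Cited by 1</span>\n'
)

# Each match is a tuple of the four captured regions, in pattern order.
items = pattern_to_regex(ITEM_PATTERN).findall(SAMPLE_HTML)
for url, title, authors, abstract in items:
    print(title, "|", url, "|", authors, "|", abstract)
```

The order of the captured regions matters, because later steps refer to them by position.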

Click the “Extract” button and you should see this result:

[Figure feed43-3: Item extraction]

Note that this is my first pass at extraction. It detected 9 items, whereas I know that there should be 10 on the first page of search results (one is a direct PDF link). If you were doing this “for real”, you would want to spend some time refining the extraction rules.

3. Define output
The next step is to define the format of the RSS feed. My form looks like this:

[Figure feed43-4: Define output format]

Here, we specify which of the 4 parts that were extracted from the publication will comprise the title (“{%2}”), the article URL (“{%1}”) and the content (“{%3}”, “{%4}”). Click “Preview” to see what the final feed should look like:

[Figure feed43-5: Preview feed]
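
The numbered placeholders are just positional references to the extracted regions. A small Python sketch of the substitution, assuming 1-based numbering as in the form above; `fill_template`, the sample values and the `<br/>` separator in the content are my own illustrative choices.

```python
import re


def fill_template(template: str, fields: tuple) -> str:
    """Replace each {%N} with the N-th extracted field (1-based)."""
    return re.sub(
        r"\{%(\d+)\}",
        lambda match: fields[int(match.group(1)) - 1],
        template,
    )


# Invented example values, in extraction order: URL, title, authors, abstract.
fields = ("http://example.org/a", "First title", "A Smith, B Jones", "First abstract")

item = {
    "title": fill_template("{%2}", fields),
    "link": fill_template("{%1}", fields),
    # The separator between authors and abstract is an assumption.
    "description": fill_template("{%3}<br/>{%4}", fields),
}
print(item)
```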

4. Create feed
You can go back to any part of the form and make adjustments. When you’re done, click on the link to your new feed at the bottom of the page to view it in your browser or subscribe in your feed reader. Here is our prion protein feed (XML).

That’s it. Of course, the feed formatting will break as soon as Google Scholar decide to alter their HTML, but Feed43 also provide an edit URL for your feed, so at least you can go back and fix it. All in all, it’s a pretty useful tool – until such time as Google Scholar provides its own RSS.
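
Since the feed can break silently, it may be worth checking now and then that it still contains items. A minimal Python sketch, assuming the feed is plain RSS 2.0 with `item` elements inside `channel`; the sample XML is invented, and in practice you would fetch the text from your own feed URL.

```python
import xml.etree.ElementTree as ET


def count_feed_items(rss_xml: str) -> int:
    """Count the <item> elements in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return len(root.findall("./channel/item"))


# Invented sample feeds for illustration only.
WORKING_FEED = (
    '<rss version="2.0"><channel><title>prion protein</title>'
    "<item><title>First title</title><link>http://example.org/a</link></item>"
    "</channel></rss>"
)
BROKEN_FEED = '<rss version="2.0"><channel><title>prion protein</title></channel></rss>'

for name, xml_text in (("working", WORKING_FEED), ("broken", BROKEN_FEED)):
    n = count_feed_items(xml_text)
    status = "OK" if n > 0 else "no items, extraction rules may need fixing"
    print(f"{name}: {n} item(s), {status}")
```

A count of zero is a hint to go back to the edit URL and adjust the extraction rules.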

One thought on “Create your own Google Scholar RSS feed”

  1. An alternative is just to put the search URL directly into Google Reader and create a feed that way, but it’s a bit hit and miss and rather ugly.
