Automated SRS using Perl and LWP

One of my earliest discoveries in Perl was a module named LWP::Simple. It was one of those “aah…” moments – the realisation that I could automate the retrieval of data from the web without using a browser.

I ignored SRS for years, largely because its “pointy-clickiness” suggested that options for automated retrieval were limited. Recently though, I came across an excellent EBI guide called Linking to SRS, which explains how to construct URLs to query the EBI SRS server. Time then, for some Perl + LWP magic.

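There are numerous ways to query an SRS server. I’d like to retrieve every putative protein kinase from SwissProt. One way to do this is to look for sequences that are annotated with a protein kinase domain. We can do this by using the InterPro accession number, IPR000719. So, we construct an SRS URL like so (assembled here from the pieces used in the script below):

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-noSession+[swissprot-DBxref_:IPR000719*]+-vn+2+-ascii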

We retrieve results as plain text (no HTML) using the -ascii switch. We can also define various output formats – in this case “-vn+2” corresponds to the complete entry in SwissProt format. By default this will return 30 results, so we can add another switch “+-lv+NNNN” to fetch NNNN results. Wait – how do we know how many results to fetch? By first using the “+cResult” option to count them. Note that cResult is an option specific to the EBI SRS server.
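To make the role of each switch concrete, here is a minimal sketch that only assembles the two URLs and prints them, without contacting any server. The helper names count_url and fetch_url are hypothetical (they are not part of SRS or LWP); the base URL and switches are the ones described in this post.

```perl
use strict;
use warnings;

# Base wgetz URL, as used in this post.
my $BASE = 'http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-noSession';

# Hypothetical helper: URL that asks the EBI server to count the matches.
sub count_url {
    my ($query) = @_;
    return $BASE . "+[$query]" . '+-page+cResult+-ascii';
}

# Hypothetical helper: URL that fetches $n complete entries as plain text.
sub fetch_url {
    my ($query, $n) = @_;
    die "Result count must be a positive integer\n"
        unless defined $n && $n =~ /^[1-9]\d*$/;
    return $BASE . "+[$query]" . '+-vn+2+-ascii+-lv+' . $n;
}

my $kinase = 'swissprot-DBxref_:IPR000719*';
print count_url($kinase), "\n";
print fetch_url($kinase, 2500), "\n";
```

Keeping the query string separate from the switches makes it easy to reuse the same two-step pattern (count, then fetch) for other queries.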

Here’s how all of that translates into Perl.

1   use strict;
2   use LWP;
3   my $ua       = LWP::UserAgent->new;
4   my $base     = 'http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-noSession';
5   my $cResult  = '+-page+cResult+-ascii';
6   my $kinase   = '+[swissprot-DBxref_:IPR000719*]';
7   my $fullview = '+-vn+2+-ascii+-lv+';
8   my $regex    = '^(\d+)\s';
9   my $query;
10  my $count;
11  my $lv;
12  ## count + fetch kinases
13  $count = $ua->get($base.$kinase.$cResult);
14  die "Can't get URL -- ", $count->status_line unless $count->is_success;
15  if($count->content =~/$regex/) {
16      $lv = $1;
17      print "$lv kinase entries found\n";
18      print "Fetching kinases as swissprot...\n";
19      $query  = $ua->get($base.$kinase.$fullview.$lv, ':content_file' => 'kinases.dat');
20  }

First we declare modules and variables (lines 1-11). In line 13 we construct the SRS URL to count the results of the query. Line 14 illustrates a way to detect whether retrieval was successful. The cResult option returns a line like this:

2500 entries for [swissprot-DBxref_:IPR000719*]
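The count can be pulled out of a line like that with a small pattern match. The following is a minimal offline sketch using the same pattern as the script; parse_count is a hypothetical helper name, and the sample line is simply the one shown above rather than a live server response.

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading integer of a cResult line,
# or undef if the text does not start with digits followed by whitespace.
sub parse_count {
    my ($text) = @_;
    return undef unless defined $text;
    return $text =~ /^(\d+)\s/ ? $1 : undef;
}

my $line  = '2500 entries for [swissprot-DBxref_:IPR000719*]';
my $count = parse_count($line);
print defined $count ? "$count entries\n" : "no count found\n";
```

Returning undef when nothing matches lets the caller tell "the server sent something unexpected" apart from a genuine count.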

We use a regex (line 15) to extract the value (2500), which we then use as the “+-lv+” option (number of results to return). Line 19 constructs the URL which fetches all 2500 results as ASCII SwissProt entries. Rather than store these in a variable (which could become large for some queries), we write them to a file, “kinases.dat”, by passing ‘:content_file’ => FILE to the get() method in LWP. LWP::Simple has a similar function named getstore(), which saves a URL to a file; getprint(), by contrast, prints the content to standard output.
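The script does not check whether the second request succeeded. As a rough sketch of one way to do that, the LWP::Simple function getstore() saves a URL to a file and returns the HTTP status code, which can be tested with is_success(). The helper name fetch_to_file and the command-line usage are assumptions made for illustration.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical helper: save $url to $file and die on a non-2xx status.
sub fetch_to_file {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);    # getstore returns the status code
    die "Can't get $url -- status $status\n" unless is_success($status);
    return $status;
}

# Run as: perl fetch.pl URL FILE  (does nothing unless both are given)
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    fetch_to_file($url, $file);
    print "Saved $url to $file\n";
}
```

With LWP::UserAgent, the equivalent check is to call is_success on the response object returned by get(), exactly as line 14 does for the count request.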

And that’s it. Take a look at the documentation for more details about the options that you can add to URLs.
