Perl wrapper for DGPI

DGPI is a software tool (Java) for the prediction of GPI anchors (GPI = glycosylphosphatidylinositol), which various organisms use to attach proteins to membranes.
Are there GPI anchors in Archaea? We’re not sure. You can download DGPI for free to run locally but, like many packages, it accepts only one input sequence at a time. So a member of my lab asked: can we install it and analyse one of our archaeal genomes?
So I whipped up a quick and dirty Perl wrapper. Given a FASTA file with multiple sequences, it takes each one, runs it through DGPI, parses the output and prints a CSV file for spreadsheet import. One problem is that DGPI output is rather verbose and variable: without knowing all the output variations, we could be missing something when we parse. But the main thing is to extract a simple “yes or no” for each sequence, which is easy enough.
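For anyone who wants to try the same approach, here is a minimal sketch of the loop: split the file, run the predictor on each sequence, parse, write CSV. It is written in Python rather than Perl and is not the script described above. Two things are pure assumptions that you would have to fill in yourself: `command` (the argument list that launches DGPI on a single-sequence file) and `positive_pattern` (a regular expression that matches whatever DGPI prints for a positive prediction). I'm not reproducing DGPI's real command line or output format here.

```python
import csv
import os
import re
import subprocess
import tempfile


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def run_predictor(command, header, sequence):
    """Write one sequence to a temporary FASTA file, run `command` on it,
    and return whatever the program prints to standard output.

    `command` is a hypothetical argument list supplied by the caller;
    the file path is appended as its last argument (also an assumption).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(f">{header}\n{sequence}\n")
        path = tmp.name
    try:
        result = subprocess.run(
            command + [path], capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        os.remove(path)


def is_candidate(output, positive_pattern):
    """Reduce verbose predictor output to yes/no via an assumed regex."""
    return re.search(positive_pattern, output, re.IGNORECASE) is not None


def write_report(records, predict, positive_pattern, out_handle):
    """Run `predict(header, sequence)` on each record and write one CSV row
    per sequence. Returns the number of positive predictions."""
    writer = csv.writer(out_handle)
    writer.writerow(["id", "length", "gpi_anchor"])
    hits = 0
    for header, sequence in records:
        output = predict(header, sequence)
        hit = is_candidate(output, positive_pattern)
        hits += int(hit)
        seq_id = (header.split() or [""])[0]  # first word of the header line
        writer.writerow([seq_id, len(sequence), "yes" if hit else "no"])
    return hits
```

To use it, open the multi-sequence FASTA file and an output file, then call `write_report(read_fasta(fasta_handle), lambda h, s: run_predictor(command, h, s), positive_pattern, csv_handle)`. Passing the predictor in as a function keeps the parsing testable without DGPI installed. The caveat about variable output still applies: a single regex only catches the variations you already know about.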
It’s quick: it churned through 2,493 sequences in about 40 minutes (roughly one second per sequence) on my slow 2.66 GHz Celeron machine, and the parsing seemed to work well. It identified 128 candidates, about 5% of the input, which after all is the goal of bioinformatics: reducing the data down to something that you can sift through. Many of them appear to be predicted membrane proteins, some with interesting roles, so there should be good times ahead for the lab.