Posts Tagged ‘how to’
On parsing
Parsing, the act of ripping through a file, pulling out the relevant parts and doing something useful with them, is an integral part of bioinformatics. It can be a dull procedure. It can also be challenging, requiring creativity and imagination. Frequently, as a bioinformatician, you will generate output from an unfamiliar program, or a colleague will bring you a file in a format that you haven’t encountered. Your task is to figure out how the file is structured, which regular expressions are required to parse it, what kind of output to produce and, most importantly, how to handle those rogue files that don’t obey the rules.
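To make the "rogue files" point concrete, here is a minimal Perl sketch. The three-column, tab-delimited format and the parse_lines helper are invented for illustration and are not taken from any real program's output. The idea is simply to match each line against the pattern you expect and to record anything that fails, rather than dying or silently skipping it.

```perl
use strict;
use warnings;

# Hypothetical format (an assumption for this sketch): "ID<TAB>start<TAB>end".
# Lines starting with '#' are comments; blank lines are ignored.
sub parse_lines {
    my @lines = @_;                     # copy, so we can safely edit each line
    my (@records, @rogue);
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        $line =~ s/\r?\n\z//;           # cope with Unix and DOS line endings
        next if $line =~ /^\s*#/;       # skip comments
        next if $line !~ /\S/;          # skip blank lines
        if ($line =~ /^(\S+)\t(\d+)\t(\d+)$/) {
            push @records, { id => $1, start => $2, end => $3 };
        }
        else {
            push @rogue, $line_no;      # remember the offender for later inspection
        }
    }
    return (\@records, \@rogue);
}

my ($records, $rogue) = parse_lines(
    "# id\tstart\tend\n",
    "myod1_pig\t10\t42\n",
    "this line breaks the rules\n",
    "\n",
);
printf "%d record(s), rogue line(s): %s\n", scalar @$records, join(',', @$rogue);
```

Reporting line numbers for the rejects makes it easy to go back to the file and decide whether the pattern or the data is at fault.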
Here are my top ten (language-agnostic) parsing tips, focusing only on non-XML text files.
Read the rest…
Missing links: using SwissProt IDtracker in your code
The BioPerl Bio::DB::SwissProt module lets you fetch sequences from SwissProt by identifier (ID) or accession number (AC) and store them as sequence objects:
use Bio::DB::SwissProt;
my $sp = Bio::DB::SwissProt->new('-servertype' => 'expasy', '-hostlocation' => 'australia');
my $seq = $sp->get_Seq_by_id('myod1_pig');
If you obtained SwissProt identifiers from a database that hasn’t been updated for some time, you may find that the ID or AC has changed. For example, NLSdb still lists the entry fetched above under an older ID, “myod_pig”. If you request that outdated ID, BioPerl will throw an exception like this:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: id does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:154
STACK: test.pl:3
-----------------------------------------------------------
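Because the failure arrives as a Perl exception, a script that loops over many identifiers will stop at the first stale one unless the call is wrapped in eval. The sketch below shows one way to trap it. The fetch_or_undef helper is hypothetical (it is not part of BioPerl), and FakeDB is a stand-in class so that the example runs without BioPerl or a network connection. In real code you would pass the Bio::DB::SwissProt object instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): attempt a lookup and return
# undef instead of dying when the database object throws.
sub fetch_or_undef {
    my ($db, $id) = @_;
    my $seq;
    my $ok = eval { $seq = $db->get_Seq_by_id($id); 1 };
    unless ($ok) {
        my $err = $@ || "unknown error\n";
        warn "Lookup of '$id' failed: $err";
        return;
    }
    return $seq;
}

# Stand-in for Bio::DB::SwissProt (an assumption made so the sketch is
# self-contained); it only knows the current ID and dies on anything else.
{
    package FakeDB;
    sub new { return bless {}, shift }
    sub get_Seq_by_id {
        my ($self, $id) = @_;
        die "MSG: id does not exist\n" unless $id eq 'myod1_pig';
        return "sequence object for $id";
    }
}

my $db = FakeDB->new;
for my $id (qw(myod_pig myod1_pig)) {
    my $seq = fetch_or_undef($db, $id);
    print "$id: ", (defined $seq ? 'found' : 'needs a new identifier'), "\n";
}
```

Identifiers that come back undefined are the ones worth looking up again under a newer name.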
SwissProt provides a web page named IDtracker to help you find new identifiers using old ones. Here’s how we can integrate the service into Perl.
Read the rest…
How to: map protein sequence onto chromosomal coordinates using BioPerl
My coding challenge this week: given a protein sequence and its exons, how do you map single amino acid residues to a location on a DNA sequence? It’s trickier than you might think. Read on for my latest BioPerl how-to.
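To show where the trickiness comes from, here is the core arithmetic as a plain Perl sketch. It is not the BioPerl approach from the how-to. It assumes a forward-strand gene, 1-based inclusive coordinates, and a list of exons that contains only coding sequence in transcript order. The residue_to_genomic function and the toy coordinates are made up for illustration.

```perl
use strict;
use warnings;

# Map a 1-based residue number to the genomic positions of its three codon bases.
# Assumptions (for this sketch only): forward strand, 1-based inclusive
# coordinates, exons hold coding sequence only and are in transcript order.
sub residue_to_genomic {
    my ($residue, @exons) = @_;
    my $first = ($residue - 1) * 3 + 1;   # 1-based coding offset of codon base 1
    my @positions;
    my $consumed = 0;                     # coding bases in exons already passed
    for my $exon (@exons) {
        my ($start, $end) = @$exon;
        my $len = $end - $start + 1;
        for my $offset ($first .. $first + 2) {
            if ($offset > $consumed && $offset <= $consumed + $len) {
                push @positions, $start + ($offset - $consumed - 1);
            }
        }
        $consumed += $len;
    }
    die "Residue $residue lies beyond the coding sequence\n" unless @positions == 3;
    return @positions;
}

# Toy gene: two exons of 4 and 5 coding bases, so codon 2 spans the intron.
my @exons = ([101, 104], [201, 205]);
print join(',', residue_to_genomic(2, @exons)), "\n";
```

With these toy exons the second residue maps to 104,201,202: one base at the end of the first exon and two at the start of the second. A single residue therefore does not always correspond to one contiguous stretch of DNA, and reverse-strand genes and untranslated regions add further bookkeeping that this sketch ignores.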
Read the rest…


