What You’re Doing Is Rather Desperate

Notes from the life of a bioinformatics researcher

Posts Tagged ‘how to’

On parsing

with 8 comments

Parsing (the act of ripping through a file, pulling out the relevant parts and doing something useful with them) is an integral part of bioinformatics. It can be a dull procedure. It can also be challenging, requiring creativity and imagination. As a bioinformatician, you will frequently generate output from an unfamiliar program, or a colleague will bring you a file in a format that you haven’t encountered. Your task is to figure out how the file is structured, which regular expressions are required to parse it, what kind of output to produce and, most importantly, how to handle those rogue files that don’t obey the rules.

Here are my top ten (language-agnostic) parsing tips, focusing only on non-XML text files.
Read the rest…

Written by nsaunders

September 8, 2008 at 4:09 pm

Missing links: using SwissProt IDtracker in your code

without comments

The BioPerl Bio::DB::SwissProt module lets you fetch sequences from SwissProt by ID or AC and store them as sequence objects:

use Bio::DB::SwissProt;
my $sp = Bio::DB::SwissProt->new('-servertype' => 'expasy', '-hostlocation' => 'australia');
my $seq = $sp->get_Seq_by_id('myod1_pig');

If you obtained SwissProt identifiers from a database that hasn’t been updated for some time, you may find that the ID or AC has changed. For example, NLSdb gives the ID from the example above as “myod_pig”. If you try to fetch the sequence using that old ID, BioPerl will throw an error like this:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: id does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id /usr/local/share/perl/5.8.8/Bio/DB/WebDBSeqI.pm:154
STACK: test.pl:3
-----------------------------------------------------------
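
One way to keep a script running when an identifier has gone stale is to wrap the lookup in a Perl eval block and test whether it completed. The following is only a minimal sketch, not the approach from the full post: fetch_or_undef is a hypothetical helper name, and the only assumption is that the database object has a get_Seq_by_id method that dies on failure, as the trace above shows.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns the sequence object,
# or undef if the lookup throws, e.g. with "id does not exist".
sub fetch_or_undef {
    my ($db, $id) = @_;
    my $seq;
    # eval traps the exception; the trailing 1 marks a clean finish.
    my $ok = eval { $seq = $db->get_Seq_by_id($id); 1 };
    unless ($ok) {
        warn "Could not fetch '$id': lookup threw an exception\n";
        return;
    }
    return $seq;
}
```

With a Bio::DB::SwissProt object in $sp, calling fetch_or_undef($sp, 'myod_pig') would give undef rather than ending the program, which leaves room to look up the current identifier and try again.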

SwissProt provides a web page named IDtracker to help you find new identifiers using old ones. Here’s how we can integrate the service into Perl.
Read the rest…

Written by nsaunders

March 7, 2008 at 10:51 am

How to: map protein sequence onto chromosomal coordinates using BioPerl

with 6 comments

My coding challenge this week: given a protein sequence and its exons, how do you map single amino acid residues to a location on a DNA sequence? It’s trickier than you might think. Read on for my latest BioPerl how-to.
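
To show why this is trickier than it looks, here is a rough sketch of the underlying arithmetic in plain Perl. It is not the BioPerl method from the full post. It assumes exons given as [start, end] pairs in 1-based inclusive genomic coordinates, listed 5' to 3' on the forward strand, with the coding sequence starting at the first base of the first exon. The name residue_to_genomic is hypothetical, and reverse-strand genes and untranslated regions are not handled.

```perl
use strict;
use warnings;

# Map a 1-based residue number to the genomic positions of its three
# codon bases. Assumes forward-strand exons as [start, end] pairs
# (1-based, inclusive), with the CDS starting at the first exon base.
sub residue_to_genomic {
    my ($residue, @exons) = @_;
    my @positions;
    # Residue n is encoded by CDS bases 3n-2 .. 3n.
    for my $offset (3 * $residue - 2 .. 3 * $residue) {
        my $remaining = $offset;
        my $found;
        for my $exon (@exons) {
            my ($start, $end) = @$exon;
            my $length = $end - $start + 1;
            if ($remaining <= $length) {
                $found = $start + $remaining - 1;
                last;
            }
            $remaining -= $length;   # skip this exon and keep walking
        }
        return unless defined $found;   # residue lies beyond the exons
        push @positions, $found;
    }
    return @positions;
}

# Two exons: bases 101-104 and 201-210. Residue 2 uses CDS bases 4-6,
# so its codon is split by the intron.
my @codon = residue_to_genomic(2, [101, 104], [201, 210]);
print "@codon\n";   # prints "104 201 202"
```

The split codon in the usage example is the awkward case: a single residue does not always map to one contiguous stretch of DNA, so the function returns three positions rather than a start and an end.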
Read the rest…

Written by nsaunders

February 27, 2008 at 5:16 pm