Tie::Handle::CSV
Update 18/8/08: the code in this post is corrupted with smilies…working on it
Bioinformaticians like tabular data: plain ASCII text delimited by tabs, commas or whatever. In the past, I’ve written an awful lot of scripts that begin something like this:
open my $in, '<', $file or die "Cannot open $file: $!";
while( my $row = <$in> ) {
    chomp $row;
    my @line = split /\t/, $row;
    ## count fields in a line (last index 6 = 7 fields) and check for header
    if( $#line == 6 && $line[0] ne "ID" ) {
        ## do something with fields
    }
}
close $in;
Bad programmer! Why not use one of the many Perl modules for handling CSV files, such as Tie::Handle::CSV?
First, load the module and create your new object:
use Tie::Handle::CSV;
my $csv = Tie::Handle::CSV->new($file, header => 1, sep_char => "\t");
Fairly self-explanatory. Now, loop through each record like so:
while( my $record = <$csv> ) {
    ## get the value of column ID
    print $record->{ID}, "\n";
    ## or do whatever with each field
    if( $record->{ID} eq "YAL001C" ) {...}
}
Here’s a sample delimited record (raw output from InterProScan with a custom header added):
SEQ	CRC	LEN	METHOD	ACC	ID	START	END	SCORE	T	DATE	IPR	IPRDESC
YAL001C	E16ACB7893ED5365	1161	HMMPfam	PF04182	B-block_TFIIIC	8	411	0	T	14-Sep-2007	IPR007309	B-block binding subunit of TFIIIC
And here’s the corresponding record, an object of class Tie::Handle::CSV::Hash, displayed using Data::Dumper (“print Dumper $record”):
$VAR1 = bless( {
    'SEQ' => 'YAL001C',
    'CRC' => 'E16ACB7893ED5365',
    'LEN' => '1161',
    'METHOD' => 'HMMPfam',
    'ACC' => 'PF04182',
    'ID' => 'B-block_TFIIIC',
    'START' => '8',
    'END' => '411',
    'SCORE' => '0',
    'T' => 'T',
    'DATE' => '14-Sep-2007',
    'IPR' => 'IPR007309',
    'IPRDESC' => 'B-block binding subunit of TFIIIC'
}, 'Tie::Handle::CSV::Hash' );
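To make the header-to-field mapping concrete, here is a rough core-Perl sketch of what a header-keyed record amounts to. It is not how Tie::Handle::CSV is implemented; records_from_lines is a made-up helper name, and the input is simply the sample header and record from above joined with tabs.

```perl
use strict;
use warnings;

# Illustrative input: the header and record shown above, tab-delimited.
my @lines = (
    join("\t", qw(SEQ CRC LEN METHOD ACC ID START END SCORE T DATE IPR IPRDESC)),
    join("\t", 'YAL001C', 'E16ACB7893ED5365', 1161, 'HMMPfam', 'PF04182',
         'B-block_TFIIIC', 8, 411, 0, 'T', '14-Sep-2007', 'IPR007309',
         'B-block binding subunit of TFIIIC'),
);

# records_from_lines is a hypothetical helper, not part of Tie::Handle::CSV.
# It treats the first line as the header and returns one hash ref per data line.
sub records_from_lines {
    my @rows   = @_;
    my @header = split /\t/, shift @rows;
    my @records;
    for my $row (@rows) {
        my %record;
        # Hash slice: pair each header name with the field in the same position.
        # The -1 limit keeps trailing empty fields instead of dropping them.
        @record{@header} = split /\t/, $row, -1;
        push @records, \%record;
    }
    return @records;
}

my ($record) = records_from_lines(@lines);
print "$record->{SEQ}\t$record->{ID}\t$record->{IPRDESC}\n";
```

This naive split does no quote or escape handling at all, which is exactly the part a real CSV parser takes care of.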
One word of warning for files, like those from InterProScan, where fields can contain quotes:
YPR190C 9F739FA74F4A18BA 655 superfamily SSF46785 "Winged helix" DNA-binding domain 53 163 9.3e-07 T 25-Sep-2007 NULL NULL
Parsing is handled by Text::CSV_XS, which has quite strict rules for handling quotes. If in doubt, or if your code is throwing errors, you can always strip the quote characters from the file before parsing.
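One way to do that, sketched in core Perl: delete every double quote from each line before it reaches the parser. strip_quotes is a made-up helper name, and the sample line is the InterProScan record above joined with tabs. This assumes the quotes carry no meaning you need to keep. Text::CSV_XS also documents a quote_char attribute, which may be a cleaner route if you would rather configure the parser than edit the data.

```perl
use strict;
use warnings;

# strip_quotes is a hypothetical helper, not part of any module.
# It returns a copy of the line with all double-quote characters removed.
sub strip_quotes {
    my ($line) = @_;
    (my $clean = $line) =~ tr/"//d;   # tr with /d deletes every matching character
    return $clean;
}

# Illustrative input: the quoted InterProScan record from above, tab-delimited.
my $raw = join "\t", 'YPR190C', '9F739FA74F4A18BA', 655, 'superfamily', 'SSF46785',
    '"Winged helix" DNA-binding domain', 53, 163, '9.3e-07', 'T', '25-Sep-2007',
    'NULL', 'NULL';

my @fields = split /\t/, strip_quotes($raw);
print "$fields[5]\n";   # prints: Winged helix DNA-binding domain
```

In practice you would run this over the whole file and write the cleaned lines to a new file, then hand that file to Tie::Handle::CSV.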



I have a Perl template which has a timestamp and almost EXACTLY what you have written at the beginning of the post. Bad programmer!
paradoxus
December 12, 2007 at 3:39 am
I am not sure if a problem as simple as this really justifies the pains of dealing with external modules. I usually find it far simpler to write Perl than to read it (even with my own scripts, not to mention other people’s code), and understanding what exactly a foreign module does, where the pitfalls are, etc., is often much more tedious than writing a few lines of code. And your script will run on any computer and won’t require other users to install a bunch of modules first. If you are like me (which I hope you are not), you are also happy that by writing the code yourself you can get away without this ugly object-oriented syntax.
Of course, when it comes to more complex tasks, matters are different.
Kay at Suicyte
December 12, 2007 at 7:17 pm
I’ve personally used the Text::CSV Perl module for handling CSV-based results and have never had a problem.
Ryan Castillo
December 22, 2007 at 5:37 am
:) bad programmer here too. I have an alias in the editor for that block that reads tab delimited files.
pedrobeltrao
December 22, 2007 at 6:06 am