Term::ProgressBar

“Joy is a simple progress bar for your command line script”, to paraphrase Greg in a recent IRC chat.
Here’s the scenario – you have a command line script that processes a large number of sequence files. There’s no output until the very end and it may take minutes or hours to complete, so you’d like to see some indication that something is happening. Perl offers a range of progress bar modules for this task. Here’s a brief guide to one of them – Term::ProgressBar.

First, obtain Term::ProgressBar via CPAN:

sudo perl -MCPAN -e 'install Term::ProgressBar'

Installation was problem-free for me. Your next task is to figure out how to measure the progress of your script. If you’re processing sequences, the obvious solution is to increment a counter each time a sequence is processed, up to 100% completion when all sequences are processed. Let’s say your sequences are in a fasta file. You can count them up like so:

my $infile = shift || die("Usage: scriptname <fa file>\n");
my $seqcount = 0;
open(my $in, '<', $infile) or die("Cannot open $infile: $!\n");
while (<$in>) {
    $seqcount++ if /^>/;    # each FASTA record starts with a '>' header line
}
close($in);

You can devise other regexes for different sequence formats, e.g. /^ID\s+/ for SwissProt. There are probably cleaner ways to count sequences, but this way works for me. In BioPerl you could count with $seqio->next_seq, but why waste time and memory creating a bunch of SeqI objects just for counting, when you're going to create them later anyway?
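If you handle more than one format, the counting loop can be wrapped in a small subroutine that takes the header regex as an argument. This is a minimal sketch, assuming one header line per record; the subroutine name count_records and the %header_re table are hypothetical, and only the two regexes mentioned above are included.

```perl
use strict;
use warnings;

# Hypothetical lookup table: one regex per format, matching the line
# that starts each record (only the two formats mentioned in the text).
my %header_re = (
    fasta     => qr/^>/,
    swissprot => qr/^ID\s+/,
);

# Count records in $file by counting the lines that match $re.
# No sequence objects are created, so this stays cheap for large files.
sub count_records {
    my ($file, $re) = @_;
    open(my $fh, '<', $file) or die("Cannot open $file: $!\n");
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ $re;
    }
    close($fh);
    return $n;
}
```

In the script this would replace the inline loop with something like my $seqcount = count_records($infile, $header_re{fasta});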

Next, set up your progress bar:

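use Term::ProgressBar 2.00;
my $progress = Term::ProgressBar->new({
    name  => 'Parser',     # label shown next to the bar
    count => $seqcount,    # the value that represents 100%
    ETA   => 'linear',     # show an estimated time to completion
});
$progress->max_update_rate(1);    # redraw at most once per second
my $next_update = 0;              # count at which the bar next needs redrawing
my $max = $seqcount;              # keep the total for the final update
$seqcount = 0;                    # reuse the counter during processing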

Here, we give our bar a name (anything you like) and a maximum count, and say that we'd like to see an estimated time of completion. 'linear' is the only estimation method on offer for ETA, and it is obviously a very rough estimate, based on the time taken by previous cycles. We also limit redraws to at most once a second, so that drawing the bar doesn't slow down the real work, and declare some variables. $next_update holds the count at which the bar next needs redrawing (update() returns this value), $max keeps a copy of the total so that we can set the bar to 100% at the end, and $seqcount is reset to 0 so it can count sequences as they are processed.
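A linear estimate simply assumes that each remaining item will take as long as the average item so far: remaining time = elapsed × (max − done) / done. The sketch below illustrates that arithmetic only; it is not Term::ProgressBar's own code, and the name linear_eta is hypothetical.

```perl
use strict;
use warnings;

# Illustrative linear ETA, in the same time units as $elapsed.
# Assumes every remaining item costs the average time of the items done so far.
sub linear_eta {
    my ($elapsed, $done, $max) = @_;
    return undef if $done <= 0;    # nothing done yet, so no basis for an estimate
    return $elapsed * ($max - $done) / $done;
}
```

For example, if 100 of 400 sequences took 30 seconds, linear_eta(30, 100, 400) predicts 90 more seconds. Any change in per-sequence cost (very long sequences near the end of the file, say) throws this off, which is why the estimate is rough.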

OK – now we load up our sequences and process them. When a sequence is done, we increment $seqcount and use that to update our progress bar. Here’s how it looks when we put it all together:

use strict;
use warnings;
use Term::ProgressBar 2.00;
use Bio::SeqIO;
## count sequences
my $infile = shift || die("Usage: scriptname <fa file>\n");
my $seqcount = 0;
open(my $in, '<', $infile) or die("Cannot open $infile: $!\n");
while (<$in>) {
    $seqcount++ if /^>/;
}
close($in);
## set up the bar
my $progress = Term::ProgressBar->new({
    name  => 'Parser',
    count => $seqcount,
    ETA   => 'linear',
});
$progress->max_update_rate(1);
my $next_update = 0;
my $max = $seqcount;
$seqcount = 0;
## process the sequences
my $seqio = Bio::SeqIO->new('-file' => $infile, '-format' => 'fasta');
while (my $seqobj = $seqio->next_seq) {
    # ...do stuff with the sequence here...
    $seqcount++;
    ## update the bar, but only once it has moved far enough to need a redraw
    $next_update = $progress->update($seqcount) if $seqcount >= $next_update;
}
## set bar = 100% when finished
$progress->update($max) if $max >= $next_update;

That’s about it. Quite simple, effective and satisfying. Here it is in action:
[Screenshot: progress.png]

3 thoughts on "Term::ProgressBar"

  1. chris

    Cool! I usually just print out an index or something, which gives you a vague idea of how far along the processing is, but not of how long it will take.

  2. nsaunders (post author)

    I didn’t mention that Term::ProgressBar supports message printing as well as the bar:

    $progress->message("Completed $seqcount sequences");

    I find the bar alone is better visually, but it’s there if you want it.
