Sequencing for relics from the Sanger era part 1: getting the raw data

Sequencing in the good old days

In another life, way back in the mists of time, I did a Ph.D. Part of my project was to sequence a bacterial gene encoding an enzyme involved in nitrite metabolism. It took the best part of a year to obtain ~2 000 bp of DNA sequence: partly because I was rubbish at sequencing, but also because of the technology at the time. That technology was an elegant biochemical technique called the dideoxy chain termination method, or “Sanger sequencing” after its inventor. The sequence was visualised by exposing radioactively labelled DNA to X-ray film, resulting in images like the one at left, from my thesis. Yes, that photograph is glued in place. The sequence was read manually by placing the developed film on a light box, moving a ruler down the lanes and writing down the bases.
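
For readers who never stood at a light box: each of the four sequencing reactions contains one chain-terminating dideoxynucleotide (ddG, ddA, ddT or ddC), so each lane of the gel holds fragments that end at that base. Shorter fragments run further, so reading the bands from the bottom of the gel upwards, across all four lanes, spells out the newly synthesised strand (the complement of the template). The Python sketch below mimics that ruler-and-pencil step. The function name read_ladder and the idea of recording each band as a fragment length are illustrative assumptions, not part of any real software.

```python
from typing import Dict, List


def read_ladder(lanes: Dict[str, List[int]]) -> str:
    """Rebuild the synthesised strand from a (hypothetical) record of gel bands.

    `lanes` maps each ddNTP lane ("G", "A", "T", "C") to the lengths of the
    chain-terminated fragments seen in that lane.
    """
    # Pool every band from all four lanes as (fragment length, base).
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]

    # Shortest fragments migrate furthest, so they are read first (5' -> 3').
    bands.sort()

    # Two bands at the same height in different lanes make the call ambiguous.
    lengths = [length for length, _ in bands]
    if len(set(lengths)) != len(lengths):
        raise ValueError("ambiguous read: two lanes have a band at the same length")

    return "".join(base for _, base in bands)


# Usage: bands at lengths 1 and 4 in the G lane, 2 in A, 3 and 6 in T, 5 in C.
print(read_ladder({"G": [1, 4], "A": [2], "T": [3, 6], "C": [5]}))  # GATGCT
```

The ambiguity check is there because overlapping bands were exactly where a human reader had to squint and guess.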

By the time I started my first postdoc, technology had moved on a little. We still did Sanger sequencing, but the radioactive label had been replaced with coloured dyes and a laser scanner, which allowed automated reading of the sequence. During my second postdoc, this same technology was being applied to shotgun sequencing of complete bacterial genomes. Assembling the sequence reads into contigs was pretty straightforward: there were a few software packages around, but most people used a pipeline of Phred (to call bases and assign quality scores), Phrap (to assemble the reads) and Consed (for manual editing and gap-filling strategy).
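
Phred expresses each base’s quality as a log-scaled error probability, Q = -10 log10(P), so Q20 means a 1 in 100 chance that the base call is wrong and Q30 a 1 in 1 000 chance. A minimal Python sketch of the conversion in both directions; the function names are hypothetical and not taken from Phred itself:

```python
import math


def phred_quality(error_probability: float) -> int:
    """Convert a base-call error probability P to a Phred score Q = -10*log10(P)."""
    if not 0 < error_probability <= 1:
        raise ValueError("error probability must be in (0, 1]")
    # Round to the nearest integer, since quality scores are reported as integers.
    return round(-10 * math.log10(error_probability))


def error_probability(quality: int) -> float:
    """Invert the Phred formula: P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


# Usage: a 1 in 1 000 error rate corresponds to Q30.
print(phred_quality(0.001))  # 30
```

The logarithmic scale keeps scores small and easy to compare: every extra 10 points means a tenfold lower chance of error.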

The last time I worked directly on a project with sequencing data was around 2005. Jump forward 5 years to the BioStar bioinformatics Q&A forum and you’ll find many questions related to sequencing. But not sequencing as I knew it. No, this is so-called next-generation sequencing, or NGS. Suddenly, I realised that I am no longer a sequencing expert. In fact:

I am a relic from the Sanger era

I resolved to improve this state of affairs. There is plenty of publicly available NGS data, some of it relevant to my current work, and my organisation is predicting a move away from microarrays and towards NGS in 2012. So I figured: what better way to educate myself than to blog about it as I go along?

This is part 1 of a 4-part series; in this installment, we’ll look at how to get hold of public NGS data.