By the time I started my first postdoc, technology had moved on a little. We still did Sanger sequencing but the radioactive label had been replaced with coloured dyes and a laser scanner, which allowed automated reading of the sequence. During my second postdoc, this same technology was being applied to the shotgun sequencing of complete bacterial genomes. Assembling the sequence reads into contigs was pretty straightforward: there were a few software packages around, but most people used a pipeline of Phred (to call base qualities), Phrap (to assemble the reads) and Consed (for manual editing and gap-filling strategy).
The last time I worked directly on a project with sequencing data was around 2005. Jump forward 5 years to the BioStar bioinformatics Q&A forum and you’ll find many questions related to sequencing. But not sequencing as I knew it. No, this is so-called next-generation sequencing, or NGS. Suddenly, I realised that I am no longer a sequencing expert. In fact:
I am a relic from the Sanger era
I resolved to improve this state of affairs. There is plenty of publicly-available NGS data, some of it relevant to my current work and my organisation is predicting a move away from microarrays and towards NGS in 2012. So I figured: what better way to educate myself than to blog about it as I go along?
This is part 1 of a 4-part series and in this installment, we’ll look at how to get hold of public NGS data.
Read the rest…