Update: the “missing article” surfaced in my RSS reader on Nov 1; here’s the link
Giant panda genome: mapped or sequenced?
I’m with Ogden Nash, who said:
I love the baby giant panda,
I’d welcome one to my veranda
This week, I learned via Keith that Chinese scientists announced the completion of the giant panda genome. An impressive achievement, given that the project was announced in March this year, but what exactly has been completed? Has the genome been sequenced – that is, are there strings of A, C, G and T covering most chromosomes – or mapped – that is, has the approximate chromosomal location of most genes been determined? The media seem unsure.
- The Australian: Scientists in China have mapped the genome of the giant panda…
- Window of China (the official source, it seems): Chinese scientists have completed sequencing the genome of giant pandas…
- The China Post: The Chinese-led genome-mapping effort…
- Melbourne Herald Sun: Genome sequencing confirms pandas are bears…but then “endangered giant pandas have had their genomes mapped…”
- Indian Express: Genome mapping to find out why Pandas are sex-shy…
- Discovery Channel: Scientists Map Giant Panda Genome…but then later on “by sequencing the genome, we have laid the genetic and biological foundation to gain a deeper understanding of this peculiar species”
And so on. Here’s a Google News search with more hits.
So what has been achieved – sequencing or mapping? If the former, is it really complete (I doubt this) or draft – and if draft, what kind of quality? And where are the data? Nothing in the genome project section of NCBI as yet.
Not as many structures as you might think
I’m in the midst of preparing a talk for next Monday, and it occurred to me that perhaps we don’t see more protein structure-based prediction in bioinformatics because there aren’t enough structures.
Sure, the PDB has grown a lot in the past 5 years or so and 53 103 structures (as of now) looks impressive. However, if you’re interested in protein-protein interaction, you want at least 2 chains, which more or less halves the dataset. If you want two different protein chains, you lose almost another 75%. Let’s specify a reasonable resolution cutoff for X-ray diffraction data and there go ~3 000 entries. We probably don’t want multiple, similar proteins, so let’s filter out redundancy at 90% sequence identity. We’re left with about 2% of the original PDB, which might be useable for looking at interactions. No wonder that most bioinformatics focuses on sequences and high-throughput interaction data.
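For anyone who wants to follow the arithmetic, here is a rough sketch of that filtering funnel in Python. It only uses the approximate figures quoted above (a halving, a further 75% loss, ~3 000 entries dropped, about 2% left at the end); the function name `pdb_funnel` and the step labels are made up for illustration, and no real PDB query is involved. Since the effect of the 90% identity filter isn't stated directly, the sketch backs it out from the "about 2%" end point.

```python
def pdb_funnel(total=53103, final_fraction=0.02):
    """Back-of-envelope funnel using the approximate figures from the post.

    Every factor here is a rounded estimate, not a measured count.
    """
    steps = [("all PDB entries", float(total))]

    n = total * 0.5            # at least 2 chains: "more or less halves"
    steps.append(("at least 2 chains", n))

    n *= 0.25                  # two different chains: lose "almost another 75%"
    steps.append(("two different protein chains", n))

    n -= 3000                  # resolution cutoff: "there go ~3 000 entries"
    steps.append(("resolution cutoff", n))

    final = total * final_fraction   # "about 2% of the original PDB"
    steps.append(("90% identity redundancy filter", final))

    # Fraction of entries that must survive the redundancy filter
    # for the funnel to end at roughly 2% of the starting total.
    implied_retention = final / n
    return steps, implied_retention


if __name__ == "__main__":
    steps, retention = pdb_funnel()
    for label, count in steps:
        print(f"{label:35s} {count:10.0f}")
    print(f"implied retention at the redundancy step: {retention:.0%}")
```

With these rounded inputs the funnel ends at roughly a thousand entries, and the redundancy filter would have to keep something like 29% of what reaches it. Treat that as an order-of-magnitude figure only.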