In the midst of preparing a talk for next Monday. It occurred to me that perhaps we don’t see more protein structure-based prediction in bioinformatics because there simply aren’t enough structures.

Sure, the PDB has grown a lot in the past 5 years or so, and 53 103 structures (as of now) looks impressive. However, if you’re interested in protein-protein interactions, you want at least 2 chains, which more or less halves the dataset. If you want two different protein chains, you lose almost another 75%. Specify a reasonable resolution cutoff for X-ray diffraction data and there go another ~ 3 000 entries. We probably don’t want multiple, similar proteins, so let’s remove redundancy at 90% sequence identity. We’re left with about 2% of the original PDB that might be usable for looking at interactions.
No wonder that most bioinformatics focuses on sequences and high-throughput interaction data.
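For anyone who wants to check the arithmetic, here is a rough sketch of that filtering funnel in Python. It only multiplies out the approximate fractions quoted above; the variable names are mine, and the size of the final redundancy step is not stated in the post, so it is inferred from the "about 2%" figure rather than measured.

```python
# Rough funnel using the approximate figures quoted in the post.
# These are back-of-envelope numbers, not a query against the PDB.
TOTAL_STRUCTURES = 53103

def interaction_funnel(total=TOTAL_STRUCTURES):
    """Return the approximate number of entries left after each filter."""
    steps = {}
    steps["all entries"] = total
    # At least 2 chains: "more or less halves the dataset".
    steps["two or more chains"] = steps["all entries"] * 0.5
    # Two different protein chains: "lose almost another 75%".
    steps["different chains"] = steps["two or more chains"] * 0.25
    # Resolution cutoff: roughly 3 000 entries dropped.
    steps["resolution cutoff"] = steps["different chains"] - 3000
    # Final figure after removing redundancy: about 2% of the original PDB.
    steps["non-redundant (about 2%)"] = total * 0.02
    return steps

if __name__ == "__main__":
    for label, count in interaction_funnel().items():
        print(f"{label:28s} {count:10.0f}")
```

On these numbers, roughly 3 600 entries survive the resolution filter and about 1 100 remain at the end, which would mean the redundancy step removes around 70% of what is left. That last fraction is my inference from the figures above, not something counted directly.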
This is hardly earth-shattering stuff, but just for reference.
There are multiple ways to grab PDB files from the RCSB PDB servers. If you know the accession code of a structure, the simplest way is wget (or similar) straight off the FTP or HTTP server:
FTP: wget ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdbXXXX.ent.gz
HTTP: wget http://www.rcsb.org/pdb/files/XXXX.pdb.gz
where XXXX is the 4-character PDB accession code.
Note the recent change of URL for the PDB archive: ftp://ftp.wwpdb.org. Note also, confusingly, that the hostname has two “w”s, not three: wwpdb, not wwwpdb.
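If you would rather script this than type wget by hand, the same two URL patterns can be wrapped in a few lines of Python. This is a minimal sketch, not an official client: the function names pdb_url and fetch_pdb are hypothetical, the URL templates are simply the ones quoted above (they may well change again), and lowercasing the code for the FTP path is my assumption based on how files in that archive directory are named.

```python
import gzip
import shutil
import urllib.request

# URL patterns copied from the wget examples above; {code} is the
# 4-character PDB accession code.
FTP_TEMPLATE = "ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb{code}.ent.gz"
HTTP_TEMPLATE = "http://www.rcsb.org/pdb/files/{code}.pdb.gz"

def pdb_url(code, via="ftp"):
    """Build the download URL for a PDB accession code (hypothetical helper)."""
    code = code.strip()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a 4-character PDB code: {code!r}")
    if via == "ftp":
        # Assumption: file names in the FTP archive use lowercase codes.
        return FTP_TEMPLATE.format(code=code.lower())
    if via == "http":
        return HTTP_TEMPLATE.format(code=code)
    raise ValueError(f"unknown transport: {via!r}")

def fetch_pdb(code, via="ftp", dest=None):
    """Download one entry and write it uncompressed to dest (needs network)."""
    url = pdb_url(code, via)
    dest = dest or f"{code.strip().lower()}.pdb"
    with urllib.request.urlopen(url) as response:
        # The servers send gzip-compressed files; decompress while copying.
        with gzip.open(response) as unzipped, open(dest, "wb") as out:
            shutil.copyfileobj(unzipped, out)
    return dest
```

Calling fetch_pdb("XXXX") with a real accession code in place of XXXX should leave an uncompressed .pdb file in the current directory, provided the server still answers at that address.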
- People are finding many outlets for their work. Pierre maintains a repository of tools where you can find IBDStatus, his latest software for genetic analysis.
- Spotted in Nature this week:
Makes perfect sense, doesn’t it? If you publish an article on a structure, include a link to the PDB resource. Yet so far as I can tell this is a new feature, since it jumped out at me. Given that the WWW is such a rich publishing platform, precisely because hyperlinks connect data, how long before paper copies of all journals are considered quaint and obsolete?