Mysteries of CCP4

Since moving to a structural biology group, I’ve had to become somewhat familiar with CCP4, a suite of programs that do all manner of things using structural data, typically PDB files.

Being a bioinformatician, I tend to ignore the GUI in favour of the input -> script -> output approach, as I’m mostly interested in batch processing. Documentation describing how to run CCP4 programs this way is strangely lacking on the web. The best that I can find is the CCP4 wiki; if you know the package well, please contribute to it.

I eventually dug up what I was looking for in:

/opt/ccp4/ccp4-6.0.2/examples/unix/runnable/

or the equivalent on your system. Here, you’ll find a collection of shell scripts, confusingly named with the suffix “.exam”. As an example, here’s how you might run the program “contact”:

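#!/bin/sh
# Run CCP4 "contact" on the PDB file given as the first argument and
# write the report to <file>.contact. Keyword cards go in the here-document.
set -e
[ -n "$1" ] || { echo "usage: $0 file.pdb" >&2; exit 1; }
contact XYZIN "$1" << eof > "$1.contact"
MODE ALL
ATYPE ALL
eof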

You might run that on a bunch of files using e.g.:

find ./ -name '*.pdb' -exec contact.sh {} \;

(assuming contact.sh is executable and on your PATH; otherwise give its full path).
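The find one-liner can be wrapped in a small function so the pattern is always quoted and the per-file script is easy to swap. This is a minimal sketch; run_all is a hypothetical helper name, and the default runner ./contact.sh simply assumes the script above is saved under that name in the current directory.

```shell
#!/bin/sh
# Minimal sketch: run a per-file script on every .pdb file under a directory.
# run_all is a hypothetical helper; the runner defaults to ./contact.sh but can
# be any executable that takes a single file argument.
run_all() {
    dir=${1:-.}
    runner=${2:-./contact.sh}
    # -type f skips directories whose names happen to end in .pdb.
    # '*.pdb' is quoted so the shell passes the pattern to find unexpanded.
    # With \; each file gets its own run, and a failure on one file does not
    # stop find from processing the rest.
    find "$dir" -type f -name '*.pdb' -exec "$runner" {} \;
}
```

Called as run_all . ./contact.sh, it does the same job as the one-liner.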

Other mysteries: many CCP4 programs are quite happy to take gzipped files as input (such as you might download from the PDB FTP archive), but may choke if the uncompressed filename doesn’t contain “.ent” or “.pdb”.
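One way around the filename problem is to decompress the file yourself and make sure the result has a suffix CCP4 recognises. The sketch below assumes gzip is available; unzip_for_ccp4 is a hypothetical helper name, not part of CCP4, and appending “.pdb” is just one possible renaming choice.

```shell
#!/bin/sh
# Minimal sketch: decompress a gzipped coordinate file (e.g. pdb1abc.ent.gz)
# next to the original, make sure the new name contains ".pdb" or ".ent",
# and print the path of the uncompressed copy.
# unzip_for_ccp4 is a hypothetical helper name, not part of CCP4.
unzip_for_ccp4() {
    src=$1
    case $src in
        *.gz) ;;
        *) printf '%s\n' "$src"; return 0 ;;  # not gzipped: pass the path through
    esac
    out=${src%.gz}
    # Check only the file name, so directory names cannot affect the match.
    case ${out##*/} in
        *.pdb*|*.ent*) ;;
        *) out=$out.pdb ;;
    esac
    gzip -dc "$src" > "$out" || { rm -f "$out"; return 1; }
    printf '%s\n' "$out"
}
```

For example, f=$(unzip_for_ccp4 pdb1abc.ent.gz) followed by contact.sh "$f" writes pdb1abc.ent.contact, after which the uncompressed copy can be deleted.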

2 thoughts on “Mysteries of CCP4”

  1. Marcin Cieslik

    I know your pain. It gets even more mysterious when you try to run those apps from within e.g. Python. Some applications start running after the END card, some after the eof statement, others only after the input pipe is closed or the output pipe is opened, and some will mangle the output coordinates with summary information. It’s a pain.

  2. Eugene Valkov

    I see your point with regard to CCP4, but the idea behind the GUI was to organise the input and output files of the many separate programs that comprise the suite, rather than to make it more user-friendly per se. As a structural biologist, I still tend to run many CCP4 programs through command-line scripts, but solving a crystal structure is a fairly complex process: it may require running the same job many times with slight modifications and keeping track of output files from one program to feed into another. That can quickly mushroom out of all proportion. That is where the organisational benefits of a GUI become really useful; otherwise you end up with a complicated jumble of folders and cryptically named output files that made perfect sense just several days ago but become as mysterious as ancient Mayan in several weeks’ time! The CCP4 GUI keeps jobs neatly organised, lets you easily sort jobs by name, date and other parameters, and has many little tools for easy visualisation of log files and for displaying graphs from tables of data. It’s very interesting to get your perspective on it, though.
