Make your own NCBI handbook
My previous post reminded me of an Australian company that used to sell the NCBI Handbook on a CD for AUD 35. Yes, this NCBI handbook – available for free at the NCBI website. The only drawback is that if you want to download a copy, it’s distributed as 24 separate PDF files.
Well you could be stupid and pay 35 dollars plus postage for a free resource – or you could create a single PDF using some freely-available software and a small shell script. Specifically you’ll need:
- wget – to fetch files over HTTP
- PDFjam – to concatenate PDF files into one file
- xargs – to submit the PDF filenames to pdfjoin, part of the PDFjam package
All of these are either available or easy to install on any Linux machine. And possibly other platforms, for all I know.
Here’s a shell script, ncbihbk.sh, to fetch the PDFs and stitch them together. Notice how the sneaky NCBI have named 3 of the files using a different convention to the other 21. I’m sure that it wasn’t deliberate.
#!/bin/sh
# ncbihbk.sh
# fetch NCBI handbook chapters 1-24 and concatenate
base=http://www.ncbi.nlm.nih.gov/books/bookres.fcgi/handbook
# start with an empty list so that a re-run doesn't duplicate chapters
: > filelist
for i in $(seq 1 24)
do
    if [ "$i" -eq 5 ] || [ "$i" -eq 13 ] || [ "$i" -eq 18 ]; then
        # chapters 5, 13, 18
        file=ch$i.pdf
    else
        # all other chapters
        file=ch${i}d1.pdf
    fi
    echo "Fetching $file..."
    wget -q "$base/$file"
    echo "$file" >> filelist
    # don't bash the servers!
    sleep 3
done
# concatenate PDFs from list
echo "Concatenating PDF files..."
xargs pdfjoin --outfile ncbi.pdf < filelist
echo "Output in ncbi.pdf"
exit 0
Type “sh ncbihbk.sh”, sit back and relax. Voilà, the NCBI handbook in all its 407-page glory. Another triumph for free software. To concatenate any collection of PDF files, just run “pdfjoin --outfile mypdf.pdf file1.pdf file2.pdf file3.pdf ...”.
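One thing to watch if you join files that are already on disk rather than a generated list: a shell glob such as ch*.pdf sorts ch10 before ch2. Here's a rough sketch of one way to get numeric chapter order, assuming files named like the ones above (ch5.pdf, ch10d1.pdf). The function name list_chapters is my own invention, not part of PDFjam.

```shell
#!/bin/sh
# list_chapters [dir]
# Print chapter PDFs found in dir (default: current directory),
# one per line, sorted by chapter number rather than alphabetically.
# Assumes names of the form ch<number>.pdf or ch<number>d1.pdf.
list_chapters() {
    dir=${1:-.}
    for f in "$dir"/ch*.pdf; do
        [ -e "$f" ] || continue   # glob matched nothing
        n=${f##*/ch}              # drop the path and the "ch" prefix
        n=${n%%[!0-9]*}           # keep only the leading digits
        printf '%s %s\n' "$n" "$f"
    done | sort -n | cut -d' ' -f2-
}
```

Something like “list_chapters . | xargs pdfjoin --outfile ncbi.pdf” should then join the chapters in order, provided the paths contain no spaces (xargs splits on whitespace).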
To be honest, it’s probably as easy to browse the handbook online.



If you want to blow a bunch of your bandwidth, you could always host the file you just created. I believe that the handbook is Public Domain, according to the disclaimer linked to from the NCBI handbook page: http://www.ncbi.nlm.nih.gov/About/disclaimer.html
Regardless, a great peek at effective shell scripting.
Brian Haugen
May 31, 2007 at 2:15 am
If you want to blow a bunch of your bandwidth
The final PDF weighs in at about 36 MB. Not too outrageous. Perhaps the Nodalpoint wiki would be the place.
nsaunders
May 31, 2007 at 1:50 pm
Hi,
Thanks for your scripts. However, I noticed several code lines are obsolete or not up to date:
1) The chapter format is ch$i.pdf for *all* chapters. The ch${i}d1.pdf doesn’t seem to work anymore.
2) The -outfile argument needs a second “-” to be valid: --outfile
3) I added a cleaning part.
With those modifications, I get this much shorter code (note it is a bash code and not an sh one…):
#!/bin/bash
# ncbihbk.sh
# fetch NCBI handbook chapters 1-24 and concatenate
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
do
    echo "Fetching ch$i.pdf..."
    wget -q http://www.ncbi.nlm.nih.gov/books/bookres.fcgi/handbook/ch$i.pdf
    echo ch$i.pdf >> filelist
    sleep 3
done
# concatenate PDFs from list
echo "Concatenating PDF files..."
cat filelist | xargs pdfjoin --outfile ncbi.pdf
# clean up the downloaded chapters and the list
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
do
    rm ch$i.pdf
done
rm filelist
echo "Output in ncbi.pdf"
exit 0
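The cleaning part could probably skip the second loop and just delete whatever filelist names. A minimal sketch, assuming the list holds one filename per line as in the script above (cleanup_from_list is only an illustrative name):

```shell
#!/bin/sh
# cleanup_from_list [listfile]
# Remove every file named in listfile (default: filelist),
# then remove the list itself. Does nothing if the list is missing.
cleanup_from_list() {
    list=${1:-filelist}
    [ -f "$list" ] || return 0
    while IFS= read -r f; do
        rm -f -- "$f"
    done < "$list"
    rm -f -- "$list"
}
```

Calling “cleanup_from_list” after pdfjoin would replace the rm loop and the “rm filelist” line, and it leaves ncbi.pdf alone because that name never appears in the list.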
This said, thanks again!
http://personomics.wordpress.com
Personomics
July 3, 2008 at 12:54 am
Thanks for the update. This post is over a year old; in general, I don’t go back and revise old posts.
nsaunders
July 3, 2008 at 10:16 am