Exploring the NCBI taxonomy database using Entrez Direct

I’ve been meaning to write about Entrez Direct, henceforth called edirect, for some time. This tweet provided me with an excuse:

This post is not strictly the answer to that question. Instead we’ll ask: which parent IDs of records for insects in the NCBI Taxonomy database have the most species IDs?
Continue reading

Fixing aberrant files using R and the shell: a case study

Once in a while, you embark on what looks like a simple computational procedure only to encounter frustration very early on. “I can’t even read my file into R!” you cry.

Step back, take a deep breath and take note of what the software is trying to tell you. Most times, you’ve just missed something very straightforward. Here’s an example.

Update: this post is not about how best to perform the task; it’s about how to cope with frustration. Please stop sending me your solutions :-)
Continue reading

Text to fasta and other delights of the shell

One thing I’ve learned in my current job is that some familiarity with Linux tools for processing text files: awk, sed, grep, head/tail, cut/paste and so on, often provides a speedier solution than writing a script in (insert scripting language of choice here). I know this stuff is trivial to shell gurus, but I still get a little buzz out of it. A couple of real-life examples.
Read the rest…