A lot of bioinformatics consists of fetching files in various formats from databases and writing parsers to extract features. So what do you do when one of your trusty parsers unexpectedly fails?
- Don’t panic
- Make sure that you haven’t done something silly:
  - did you inadvertently alter the code recently?
  - did you run a different version of the code by mistake?
  - did you use the correct file(s) as input?
  - does the machine that you’re using have the libraries and software required by the parser?
- Take a look at the file: use something like grep if possible to examine specific lines and see if their format has changed.
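The grep-style check can also be scripted, so that a format change shows up as a number rather than as a puzzling empty output. The following is a minimal Python sketch, not part of any existing parser: the function name `format_check` and its parameters are hypothetical, and `expected` stands for whatever pattern your own parser relies on.

```python
import re

def format_check(lines, expected, key="MOD_RES"):
    """Count feature-table lines containing `key`, and how many match `expected`.

    `expected` is the regex (string or compiled) that the parser relies on.
    Returns a (total, matched) tuple.
    """
    total = 0
    matched = 0
    for line in lines:
        # Feature-table lines in the flat file start with the "FT" line code.
        if line.startswith("FT") and key in line:
            total += 1
            if re.search(expected, line):
                matched += 1
    return total, matched
```

Called on an open file handle, a result with a large total and zero matches suggests that the lines are still there but their layout has changed.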
One of my more robust Perl scripts is designed to examine the MOD_RES line in the feature table section of a SwissProt file for protein kinase names. Imagine my surprise when, out of the blue, not a single name appeared in the ~50,000-line output file. A quick “grep MOD_RES file.dat | less” revealed this alteration:
Previous: FT MOD_RES 353 353 Phosphoserine (by MAPK12 and MAPK9)
Current:  FT MOD_RES 353 353 Phosphoserine; by MAPK12 and MAPK9.
Might be time to fix up your regexes if you have code that parses SwissProt format.
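One way to fix up such a regex is to accept both layouts, so the parser keeps working on old and new files alike. The sketch below is in Python rather than the Perl of the original script, and the names `KINASE_CLAUSE` and `kinases` are made up for illustration. It assumes that kinase names are single tokens separated by commas, "and" or "or", and it has only been checked against the two example lines above.

```python
import re

# Matches the "by ..." clause in either layout:
#   old: Phosphoserine (by MAPK12 and MAPK9)
#   new: Phosphoserine; by MAPK12 and MAPK9.
KINASE_CLAUSE = re.compile(r"\(by\s+([^)]+)\)|;\s*by\s+([^.;]+)")

def kinases(ft_line):
    """Return the kinase names on a MOD_RES line, or [] if there are none."""
    if "MOD_RES" not in ft_line:
        return []
    match = KINASE_CLAUSE.search(ft_line)
    if match is None:
        return []
    # Exactly one of the two groups is set, depending on the layout.
    clause = match.group(1) or match.group(2)
    # Assumption: names are separated by commas, "and" or "or".
    parts = (p.strip() for p in re.split(r",|\band\b|\bor\b", clause))
    return [p for p in parts if p]

print(kinases("FT MOD_RES 353 353 Phosphoserine (by MAPK12 and MAPK9)"))
# ['MAPK12', 'MAPK9']
print(kinases("FT MOD_RES 353 353 Phosphoserine; by MAPK12 and MAPK9."))
# ['MAPK12', 'MAPK9']
```

Keeping the old alternative in the pattern costs little and means that archived releases can still be parsed with the same code.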