Poor reproducibility: understandable, if not desirable

Greg Wilson once told me a statistic about the mean lifetime of reproducibility for research software: the average time that elapses before you can no longer reproduce your own results using your own code, never mind anyone else's. I forget the exact number, but it was not high: a few months at best.

Why does this happen, aside from obvious bad practices? Well, here’s a typical exchange in an academic research setting:

Doctor X: Oh! I should have included the PDB header and diffraction resolution in my database table. Guess I’ll have to modify my parser.

Doctor Y: No, don’t do that. I have those columns in one of my tables. I’ll just dump them out and you can import them in.

Doctor X: Great, thanks!

Six months later…

Doctor X: Hey, I’m writing up my analysis, can you describe how you generated your database table?

Doctor Y: Erm…I don’t really remember. Did I get it from Doctor Z?

Doctor X: Hmm. (Writes: "Data were obtained from Doctor Y…")

It's not pretty, but it happens, because we're human, I guess. Read Greg's blog for much more on good software practice in research.