15-year-old error results in improved performance?

Here’s an interesting letter in the current issue of Nature Biotechnology (subscription only):

In the course of analyzing the evolution of the Blocks database, we noticed errors in the software source code used to create the initial BLOSUM family of matrices […] The result of these errors is that the BLOSUM matrices (BLOSUM62, BLOSUM50, etc.) are quite different from the matrices that should have been calculated using the algorithm described by Henikoff and Henikoff. Obviously, minor errors in research, and particularly in software source code, are quite common. This case is noteworthy for three reasons: first, the BLOSUM matrices are ubiquitous in computational biology; second, these errors have gone unnoticed for 15 years; and third, the ‘incorrect’ matrices perform better than the ‘intended’ matrices.

Are they right? Does it matter?

4 thoughts on “15-year-old error results in improved performance?”

  1. Deepak

    It may not matter in this case, but in the grand scheme of things, it scares the heck out of me. Unfortunately, I’ve seen some errors that should have been caught through simple testing and debugging, but instead propagated for a few years.

  2. Egon Willighagen

    Yeah, these things happen. The thing is, we don’t really care about the correctness of anything, particularly not if it just works. A famous example of this is the development of Partial Least Squares (PLS). Some student was supposed to write a regression method of a vector Y against a matrix X; I don’t remember which one exactly, but the student messed up while doing the rotations in X and Y to maximize the correlation, for which an iterative process was used. He messed up the individual steps by taking intermediate results on the X matrix and using them to change the Y matrix, and vice versa. However, instead of giving crap, which it ought to have, it actually gave much better results than the intended method should have given. That was the birth of PLS. Details can be found in the literature on PLS.

  3. emsi

    I’m scared even of my own programs if they do not have the right … unit tests :P

  4. Pingback: Around the web - March 8, 2008 : business|bytes|genes|molecules
