Data corruption using Excel: 12+ years and counting

Why, it seems like only 12 years since we read “Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics”.

And can it really be 4 years since we reviewed the topic of gene name corruption in “Gene name errors and Excel: lessons not learned”?

Well, here we are again in 2016 with “Gene name errors are widespread in the scientific literature”. This study examined 35 175 supplementary Excel data files from 3 597 published articles. Simple yet clever, isn’t it? I bet you wish you’d thought of doing that. I do. The conclusion: about 20% of the articles have associated data files in which gene names have been corrupted by Excel.
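For the curious, here is a rough sketch of what screening a column of gene names for Excel damage might look like. It is not the method used in the study; the two patterns and the function name suspicious_gene_names are my own assumptions, based on the well-known conversions of symbols such as SEPT2 to "2-Sep" and RIKEN identifiers such as 2310009E13 to "2.31E+13".

```python
import re

# Assumed pattern: day-month ("2-Sep") or month-day ("Sep-2") forms that
# Excel produces from symbols such as SEPT2, MARCH1 or DEC1.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$", re.IGNORECASE
)

# Assumed pattern: scientific notation such as "2.31E+13", which Excel
# produces from RIKEN identifiers such as "2310009E13".
FLOAT_LIKE = re.compile(r"^\d+(?:\.\d+)?E\+\d+$")


def suspicious_gene_names(names):
    """Return the entries that look like Excel-converted gene names."""
    hits = []
    for name in names:
        value = name.strip()
        if DATE_LIKE.match(value) or FLOAT_LIKE.match(value):
            hits.append(name)
    return hits


if __name__ == "__main__":
    column = ["TP53", "2-Sep", "1-Mar", "2.31E+13", "BRCA1"]
    print(suspicious_gene_names(column))
```

A hit only means the value looks like a date or a float where a gene name was expected. Recovering the original symbol is another matter, since "2-Sep" could have started life as SEPT2 or SEP2.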

What if there is no tomorrow? There wasn’t one today.

We tell you not to use Excel. You counter with a host of reasons why you have to use Excel. None of them are good reasons. I don’t know what else to say. Except to reiterate that probably 80% or more of the data analyst’s time is spent on data cleaning and a good proportion of the dirt arises from avoidable errors.

What’s in a (gene) name?

I’ve posted before on standard names (or lack thereof) for genes and proteins and, in particular, the wacky names of which biologists are so fond. Hopefully they now realise that in the age of bioinformatics – where we have to find stuff easily – descriptions such as ken and barbie, scott of the antarctic or glass-bottom boat are, um, unhelpful to say the least.

So hot on the heels of my “man, you can publish anything in bioinformatics these days” post comes:

Seringhaus, M. et al. (2008).
Uncovering trends in gene naming.
Genome Biology 9:401. DOI 10.1186/gb-2008-9-1-401

We take stock of current genetic nomenclature and attempt to organize strange and notable gene names. We categorize, for instance, those that involve a naming system transferred from another context (for example, Pavlov’s dogs). We hope this analysis provides clues to better steer gene naming in the future.

It’s actually a fun and informative read.

See also: “FlyNome”, “Clever Drosophila gene names” and “Sonic Hedgehog Sounded Funny, at First”. From the latter source: “It’s a cute name when you have stupid flies and you call it a ‘turnip,’ ” Dr. Doe said. “When it’s linked to development in humans, it’s not so cute any more.”