It’s what – 10 years or more? – since we began to wonder when web technologies such as RSS, wikis and social bookmarking sites would be widely adopted by most working scientists, to further their productivity.
The email that I received today, which began “I’ve read 3 interesting papers” and included 1 .doc, 3 .docx and 4 .pdf files as attachments, indicates the answer to this question: “not any time soon.”
I’ve given up trying to educate colleagues in best practices. Clearly, I’m the one with the problem, since this is completely normal, acceptable behaviour for practically everyone that I’ve ever worked with. Instead, I’m just waiting for them to retire (or die). I reckon most senior scientists (and they’re the ones running the show) are currently aged 45-55. So it’s going to be 10-20 years before things improve.
Until then, I’ll just have to keep deleting your emails. Sorry.
…please, try to follow these simple guidelines.
1. Don’t bother to format the cells
Where possible, I will not open your spreadsheet in a spreadsheet application. If I do, it will be only to marvel at the horror, then export it as rapidly as possible to a delimited text file. I do not care about the font, the font size or the font weight. I do not care whether there are grid lines around the cells. I especially do not care about cells which you have highlighted using some arbitrary (and unexplained) colour scheme.
2. No multiple tables
If you include multiple “tables” on one sheet, separated by blank rows, there is a good chance that I will not notice them. If you include multiple tables on multiple “sheets”, there is an excellent chance that I will not notice them.
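For what it's worth, here is roughly what I end up doing when someone sends me several tables stacked on one sheet. It is a minimal sketch, assuming the sheet has already been exported to delimited text and that a fully blank row marks the gap between tables; the function name `split_tables` and the sample data are made up for illustration.

```python
import csv
import io

def split_tables(rows):
    """Split a list of rows into separate tables wherever a fully blank row appears."""
    tables, current = [], []
    for row in rows:
        # A row counts as blank if it has no cells or every cell is empty/whitespace.
        if all(cell.strip() == "" for cell in row):
            if current:
                tables.append(current)
                current = []
        else:
            current.append(row)
    if current:
        tables.append(current)
    return tables

# Hypothetical export: two "tables" on one sheet, separated by a blank row.
exported = "Patient ID,Age\nP1,54\nP2,61\n,\nSample ID,Tissue\nS1,colon\n"
rows = list(csv.reader(io.StringIO(exported)))
tables = split_tables(rows)
print(len(tables))
```

This only catches tables separated vertically. Tables placed side by side, or on other sheets, still go unnoticed.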
3. Be consistent
If you must use confusing, abbreviated terms for your row and column names, at least keep them consistent. When you suddenly switch from “Patient ID” to “MCO_ID”, or from “Tissue Bank ID” to “TB ID” but leave everything else the same, I (and my software) assume that you’re talking about something different.
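To show what "my software assumes" means in practice, here is a small sketch that compares the column names I expected with the ones that arrived. The helper `header_mismatches` and the third column, "Age", are hypothetical; the point is only that an exact string comparison has no idea that "TB ID" was supposed to mean "Tissue Bank ID".

```python
def header_mismatches(expected, actual):
    """Return (missing, unexpected) column names, using exact string comparison."""
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    return missing, unexpected

expected = ["Patient ID", "Tissue Bank ID", "Age"]
actual = ["MCO_ID", "TB ID", "Age"]
missing, unexpected = header_mismatches(expected, actual)
print("missing:", missing)
print("unexpected:", unexpected)
```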
4. Yes/No = 1/0
Would it kill you to think as hard about the type and structure of your data as you do about the data itself? If your variable takes one of two values in a “yes/no” fashion, the best representation is 1 or 0. That goes for “wt/mut” too. If you must use “Y/N”, don’t suddenly switch to “Yes/No” (or case-sensitive variations thereof) just because you feel like it.
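This is the sort of clean-up I would rather not have to write. A minimal sketch, assuming that "mut" should map to 1 and "wt" to 0 (my choice here, not a standard), and that anything unrecognised should fail loudly rather than be guessed at. The names `TRUE_VALUES`, `FALSE_VALUES` and `to_binary` are illustrative.

```python
# Assumed encodings: which spellings count as 1 and which as 0.
TRUE_VALUES = {"y", "yes", "1", "mut"}
FALSE_VALUES = {"n", "no", "0", "wt"}

def to_binary(value):
    """Map a yes/no-style string to 1 or 0; a blank cell becomes None (missing)."""
    key = value.strip().lower()
    if key in TRUE_VALUES:
        return 1
    if key in FALSE_VALUES:
        return 0
    if key == "":
        return None
    raise ValueError(f"unrecognised binary value: {value!r}")

column = ["Y", "yes", "N", "No ", "wt", "mut", ""]
print([to_binary(v) for v in column])
```

Raising on unknown values is deliberate: silently coercing "maybe" to 0 would be worse than stopping.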
5. If it doesn’t exist, it shouldn’t be there
Just leave the cell blank. I don’t want to see “n/a”, “NA”, “?”, “-” or anything else.
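If the placeholders are already there, something like the following turns them back into blanks. It is a sketch under the assumption that none of these markers is ever a legitimate value in your data (if "NA" is a real code in one of your columns, this will eat it); `MISSING_MARKERS` and `blank_missing` are names I made up.

```python
# The markers listed above, lower-cased so that "NA", "na" and "N/A" all match.
MISSING_MARKERS = {"n/a", "na", "?", "-"}

def blank_missing(cell):
    """Return an empty string for known missing-value markers, else the cell unchanged."""
    return "" if cell.strip().lower() in MISSING_MARKERS else cell

row = ["P1", "n/a", "NA", "?", "-", "54", " N/A "]
print([blank_missing(c) for c in row])
```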
6. What belongs with what?
Have you noticed that certain bits of your data belong with other bits? For example, you can take several samples from a patient and do several experiments using those samples. Perhaps you’ve heard the term “relational data”? Well, that’s what it means.
If you could find a way to highlight those relations in your spreadsheet (no, not using coloured cells please), it would really help. On second thoughts: why don’t you come and see me before collecting your data? We’ll design a database together. You might even realise why I hate your stupid spreadsheets so much.
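To give a flavour of what that database might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and assay labels are all hypothetical; a real design would come out of that conversation. One patient has many samples, and one sample has many experiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE patient (
    patient_id TEXT PRIMARY KEY
);
CREATE TABLE sample (
    sample_id  TEXT PRIMARY KEY,
    patient_id TEXT NOT NULL REFERENCES patient(patient_id)
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    sample_id     TEXT NOT NULL REFERENCES sample(sample_id),
    assay         TEXT NOT NULL,
    result        INTEGER  -- 1/0, or NULL if it does not exist
);
""")

conn.execute("INSERT INTO patient VALUES ('P1')")
conn.executemany("INSERT INTO sample VALUES (?, ?)", [("S1", "P1"), ("S2", "P1")])
conn.executemany(
    "INSERT INTO experiment (sample_id, assay, result) VALUES (?, ?, ?)",
    [("S1", "assay_1", 1), ("S1", "assay_2", 0), ("S2", "assay_1", 0)],
)

# What belongs with what: samples and experiments per patient.
rows = conn.execute("""
    SELECT p.patient_id, COUNT(DISTINCT s.sample_id), COUNT(e.experiment_id)
    FROM patient p
    JOIN sample s ON s.patient_id = p.patient_id
    JOIN experiment e ON e.sample_id = s.sample_id
    GROUP BY p.patient_id
""").fetchall()
print(rows)
```

With the foreign keys switched on, a sample that points at a patient who is not in the patient table is rejected outright, which is exactly the kind of check a coloured cell cannot do.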