If you must send me an Excel spreadsheet…

…please, try to follow these simple guidelines.

1. Don’t bother to format the cells
Where possible, I will not open your spreadsheet in a spreadsheet application. If I do, it will be only to marvel at the horror, then export it as rapidly as possible to a delimited text file. I do not care about the font, the font size or the font weight. I do not care whether there are grid lines around the cells. I especially do not care about cells which you have highlighted using some arbitrary (and unexplained) colour scheme.

2. No multiple tables
If you include multiple “tables” on one sheet, separated by blank rows, there is a good chance that I will not notice them. If you include multiple tables on multiple “sheets”, there is an excellent chance that I will not notice them.
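For what it's worth, here is roughly how stacked tables could be detected after export. This is a minimal sketch, assuming the sheet has already been saved as delimited text and that a fully blank row is the only separator between tables; `split_tables` is a name invented for illustration.

```python
import csv
import io


def split_tables(text, delimiter=","):
    """Split delimited text into separate tables at fully blank rows."""
    tables, current = [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if any(cell.strip() for cell in row):
            # At least one non-empty cell: the row belongs to the current table.
            current.append(row)
        elif current:
            # A blank row closes the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


exported = "id,value\n1,10\n\nid,score\n2,0.5\n"
tables = split_tables(exported)
```

It cannot tell a genuine second table from a stray blank row in the middle of one table, which is one more reason to keep a single table per sheet.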

3. Be consistent
If you must use confusing, abbreviated terms for your row and column names, at least keep them consistent. When you suddenly switch from “Patient ID” to “MCO_ID”, or from “Tissue Bank ID” to “TB ID” but leave everything else the same, I (and my software) assume that you’re talking about something different.
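To show why inconsistent names hurt, here is a minimal sketch of the clean-up they force on the recipient. The alias table is an assumption: only the person who made the spreadsheet can confirm that “MCO_ID” really means “Patient ID”, so a mapping like this has to be checked by hand. The names `ALIASES` and `canonical_name` are hypothetical.

```python
import re

# Hypothetical aliases; each entry must be confirmed with the data owner.
ALIASES = {
    "mco_id": "patient_id",
    "tb_id": "tissue_bank_id",
}


def canonical_name(header):
    """Lower-case a header, collapse punctuation/spaces to '_', apply aliases."""
    key = re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")
    return ALIASES.get(key, key)


headers = ["Patient ID", "MCO_ID", "Tissue Bank ID", "TB ID"]
cleaned = [canonical_name(h) for h in headers]
```

Any header not covered by the alias table passes through unchanged, so an unexpected new variant still shows up as a separate column rather than being silently merged.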

4. Yes/No = 1/0
Would it kill you to think as hard about the type and structure of your data as the data itself? If your variable takes one of two values in a “yes/no” fashion, the best representation is 1 or 0. That goes for “wt/mut” too. If you must use “Y/N”, don’t suddenly switch to “Yes/No” (or case-sensitive variations thereof) just because you feel like it.

5. If it doesn’t exist, it shouldn’t be there
Just leave the cell blank. I don’t want to see “n/a”, “NA”, “?”, “-” or anything else.
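If the placeholders are already there, something like the following is needed before any analysis. This is a minimal sketch, assuming the tokens listed above are the only placeholders in use; `MISSING_TOKENS` and `clean_cell` are names made up for the example.

```python
# Placeholder strings treated as "no value"; extend only after checking the data.
MISSING_TOKENS = {"", "n/a", "na", "?", "-"}


def clean_cell(value):
    """Return None for blank or placeholder cells, else the stripped text."""
    text = value.strip()
    return None if text.lower() in MISSING_TOKENS else text


row = ["P001", "n/a", " NA ", "?", "-", "", "42"]
cleaned = [clean_cell(cell) for cell in row]
```

The catch is that a token such as “NA” can also be a legitimate value in some columns, and code cannot tell the difference. A blank cell is unambiguous.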

6. What belongs with what?
Have you noticed that certain bits of your data belong with other bits? For example, you can take several samples from a patient and do several experiments using those samples. Perhaps you’ve heard the term “relational data”? Well, that’s what it means.

If you could find a way to highlight those relations in your spreadsheet (no, not using coloured cells please), it would really help. On second thoughts: why don’t you come and see me before collecting your data? We’ll design a database together. You might even realise why I hate your stupid spreadsheets so much.
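To give a flavour of what that design session might produce, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names and sample values are all hypothetical; a real schema would come out of the conversation about your actual experiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE patient (
    patient_id TEXT PRIMARY KEY
);
CREATE TABLE sample (
    sample_id  TEXT PRIMARY KEY,
    patient_id TEXT NOT NULL REFERENCES patient(patient_id)
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    sample_id     TEXT NOT NULL REFERENCES sample(sample_id),
    assay         TEXT NOT NULL,
    result        REAL              -- NULL when there is no measurement
);
""")

# Made-up rows: one patient, two samples, three experiments.
conn.execute("INSERT INTO patient VALUES ('P1')")
conn.executemany("INSERT INTO sample VALUES (?, ?)",
                 [("S1", "P1"), ("S2", "P1")])
conn.executemany(
    "INSERT INTO experiment (sample_id, assay, result) VALUES (?, ?, ?)",
    [("S1", "qPCR", 1.5), ("S1", "western", None), ("S2", "qPCR", 2.0)],
)

# Samples and experiments per patient, recovered through the declared relations.
summary = conn.execute("""
    SELECT p.patient_id,
           COUNT(DISTINCT s.sample_id),
           COUNT(e.experiment_id)
    FROM patient AS p
    JOIN sample AS s ON s.patient_id = p.patient_id
    JOIN experiment AS e ON e.sample_id = s.sample_id
    GROUP BY p.patient_id
""").fetchall()
```

Each fact lives in exactly one place, and the foreign keys mean a sample cannot be recorded against a patient who does not exist.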

6 thoughts on “If you must send me an Excel spreadsheet…”

  1. Joerg Kurt Wegner

    My personal favorite is …

    Please do not merge rows and columns, though it might be visually appealing. Interoperability drops to zero. When software reads a sheet with merged cells, the rows and columns are typically misinterpreted and the wrong data is assigned to the wrong columns/rows.

  2. Jonathan Badger

    Seconded. Of course there’s always the opposite problem, where people want the whole relational database you built dumped out as a spreadsheet, never mind that this is going to result in a multi-gigabyte file….

  3. Margaret Smith

    Excellent summary. Another problem is being presented with a table of data with no unique identifier for each data point, two outcomes and the statement “They’re in order….”. Not after my stats package gets through with them, they ain’t.

  4. Robby

    Great points. But…

    I often get data in Excel spreadsheets violating almost all the points you mentioned, from a guy who has published several papers in high-impact journals (including Nature, Cell) over the last few years.

    How am I going to tell this guy that “he is doing it the wrong way”, in a way that does not make him think twice before sending me his data? Any advice?

    1. nsaunders Post author

      That would depend to a degree on the specific dynamics of your relationship with the offender. I guess there are 2 approaches.

      If you have a system in place to clean up the data and it doesn’t take too much of your time (or create too much stress), you can just take a deep breath, deal with it and say nothing.

      If you just can’t take it any more, then you need to say something, tactfully. I’d start by saying “there are a few aspects of your data which make it difficult to process efficiently.” Outline what they are and why they are a problem. Suggest what you would like to see instead and why this would help you. I think most scientists appreciate constructive criticism. They also appreciate faster results, so you need to explain that you can do a better, faster job if they follow your guidelines. The main thing is to be constructive and avoid any personal references or suggestions that “it’s their fault” or “they are stupid”.

  5. Pingback: Spreadsheet love « Suelibrarian
