Explaining the resistance to sharing data

I have a theory. My theory is that many scientists are prone to doublethink. They believe that they are acting in one way when, in fact, they are doing the exact opposite.

Take data sharing.

Young scientists are taught that science is all about being open: make your results available by publishing articles, and communicate them to others through seminars. Yet at the same time, they are taught work practices that encourage them to be solitary creatures inside their own little bubbles. Keep a private, jealously guarded lab notebook. Reinvent the wheel over and over again when developing protocols. Install software and store files on your own machine instead of using the lab server.

This makes it difficult for them to understand data sharing best practice. I’ve tried to explain to my colleagues on multiple occasions that if our lab wants to share bibliographic data, an Endnote binary ENL file is not an appropriate format. I point out that different versions of Endnote may cause compatibility problems, that they may want to try software other than Endnote in the future, that they may want to use the data for purposes other than Endnote import and that not everyone uses Endnote. I try to explain that export to a plain text format such as RIS or BibTeX is not difficult, that such a format will always be usable, that anyone can easily import those formats if they wish. To no avail. We can’t seem to get past the “but everyone uses Endnote – oh wait, you don’t – well, everyone but you uses Endnote, you’re just a techie Linux nerd weirdo” argument.
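To illustrate how little it takes to use a plain-text export outside Endnote, here is a minimal sketch that reads an RIS record and prints it as BibTeX using nothing but the Python standard library. It assumes the simple "TAG  - value" line layout of RIS, maps only a handful of tags (author, title, journal, year), and uses a hypothetical sample record; a real converter would need to handle many more tags and edge cases.

```python
# Minimal sketch: convert plain-text RIS records to BibTeX.
# Assumption: each RIS line looks like "TY  - JOUR" (two-letter tag,
# two spaces, a hyphen, then the value); only a few tags are mapped.

RIS_TO_BIBTEX = {"AU": "author", "TI": "title", "JO": "journal", "PY": "year"}
RIS_TYPES = {"JOUR": "article", "BOOK": "book"}


def parse_ris(text):
    """Split RIS text into records, each a dict mapping tag -> list of values."""
    records, current = [], {}
    for line in text.splitlines():
        if len(line) < 5 or line[2:5] != "  -":
            continue  # skip blank or malformed lines
        tag, value = line[:2], line[5:].strip()
        if tag == "ER":  # "ER" marks the end of a record
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


def to_bibtex(record, key):
    """Render one parsed RIS record as a BibTeX entry with the given key."""
    entry_type = RIS_TYPES.get(record.get("TY", [""])[0], "misc")
    fields = []
    for ris_tag, bib_field in RIS_TO_BIBTEX.items():
        if ris_tag in record:
            # Repeated tags (e.g. several AU lines) are joined with " and ",
            # which is how BibTeX separates multiple authors.
            value = " and ".join(record[ris_tag])
            fields.append(f"  {bib_field} = {{{value}}}")
    return f"@{entry_type}{{{key},\n" + ",\n".join(fields) + "\n}"


# Hypothetical sample record, as it might appear in a plain-text export.
SAMPLE = """TY  - JOUR
AU  - Doe, Jane
AU  - Roe, Richard
TI  - An example title
JO  - Journal of Examples
PY  - 2007
ER  - 
"""

if __name__ == "__main__":
    for i, rec in enumerate(parse_ris(SAMPLE), start=1):
        print(to_bibtex(rec, f"ref{i}"))
```

The point is not this particular script but that the data stays readable: any tool, or a few dozen lines of code, can get at it without the program that created it.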

It’s nothing to do with the software or the level of computer literacy. It’s simply that they are not used to sharing files and so have never thought about what it means to share data. Sharing data means that your personal preferences regarding software no longer count. To make data available is to accept that other people might use it in any number of ways, and so to provide it in a generic, usable, platform-independent format.

We do a reasonable job of teaching young researchers to communicate their results. Let’s do something about teaching them to communicate their data.

8 thoughts on “Explaining the resistance to sharing data”

  1. I agree. I think many scientists who work in biology are averse to sharing data and ideas in general. That is why we don’t have an arXiv for biology [yes, I know about q-bio, but look at the people who write articles there].
    PS: BibTeX… come on man, LaTeX scares the people who are used to Endnote :) Reminds me of the chapter [ http://www.catb.org/~esr/writings/taoup/html/textualitychapter.html ] from “The Art of Unix Programming” [http://www.catb.org/~esr/writings/taoup/html/] by ESR [http://www.catb.org/~esr/].

  2. I’m sure young scientists also quickly realize that science is rather competitive. So why make your experimental data and protocols available, when that costs you time, gets you no special credit, and may expose the corners you inevitably cut to get that nice graph for the Nature paper…

  3. LaTeX scares the people who are used to Endnote

    It might. I was using BibTeX as an example of a plain old ASCII file – it’s a good storage format, never mind LaTeX.

    science is rather competitive

    True, sharing brings time and privacy issues. I wasn’t thinking about sharing with the world so much as simply sharing with your lab or colleagues.

  4. I work in a rather large lab which is exactly the way you describe. Let me give you some examples.

    In order to share lab protocols, the idea was first to give everyone a zip disk with the Word file on it. Then they had the bright idea to put the Word file in a shared folder on one of the computers. However, not everyone can find it, because there are many different ad-hoc domains to which various computers in the workgroup belong. “Home” is a popular one.

    I floated the idea of writing a protocol wiki, but once I explained it was like wikipedia, but only for our lab, everyone thought it was a bad idea because “just anyone could change something”.

    Periodically, there will be emails sent around demanding that everyone remove their files from the various computers that run the instruments we use because the hard drives are filling up.

    I suggested that maybe we wouldn’t have to do that if all the machines were connected to a file server on which everyone had a user account and a quota.

    This would have made backups easier, too, but the department decided it would be a better idea to buy everyone external drives, which didn’t come with backup software and so are mostly sitting in shrink wrap on people’s shelves.

    So, yeah. Even if you’re just sharing among your colleagues, you have to convince them somehow that it’s worth their time and effort, and the only way you can do that is by setting some stuff up all on your own and waiting until someone has a problem you’ve already solved. Until then, they couldn’t care less.

  5. lolz, the “but everyone uses Endnote – oh wait, you don’t – well, everyone but you uses Endnote, you’re just a techie Linux nerd weirdo” argument is soo classic..

    I think the problem is the ‘burden’ that comes with doing things the hard way to make them easier. The burden is that you have to provide technical support for something new, and people might come to rely on you for it..
    btw, I use JabRef (BibTeX) with OpenOffice

  6. Related to Mr. Gunn’s wiki idea: I had recently brought up the idea of a basic genome annotation/protocol wiki/blog combo localized to a lab (not mine, BTW), where lab members could update gene annotations or protocols manually or run a script which automates annotation updates. The majority of them liked the idea, but one of them wanted to have the wiki installed on each computer (not on a lab server). Hmm… then why have a wiki?

    I wanted to chalk this up to that person not really getting the idea of a wiki (regardless of their obvious intelligence), but I think you have probably hit closer to the truth: the inability to share data, even among lab members.

  7. In my workplace, essentially a university, data sharing within the research groups has been embraced. Originally Novell was used but recently space considerations encouraged the move to simple FTP servers etc. Blogs are also being used to share information and files. But this doesn’t necessarily extend beyond the local groups.

    On the negative side, sharing of data is sometimes abused. As you know, many sequencing centres (e.g. Sanger, TIGR) release their data “for free use” EXCEPT for publication purposes, where the licence requires permission. There have been two recent examples of projects we were working on, using that data with permission and in collaboration, that were “trumped” by others who took and published results from the data without permission. People get burnt and learn to hide.
