Dumped on by data scientists

A story in The Chronicle of Higher Education reminded me that I’ve been meaning to write about “data science” for some time.

The headline to the story:

“Dumped On by Data: Scientists Say a Deluge Is Drowning Research”

Rather amusingly, this is abbreviated in the URL to “Dumped-On-by-Data-Scientists”; a nice example of how the same words, broken in the wrong place, can lead to a completely different meaning.

Anyway, to the point. The term “data scientist” – a good thing, or not?
I’m throwing this one out there because I spent much of 2010 (a) reading articles that used the term and (b) trying to decide whether I like it or not – and I still can’t decide.

Arguments for:

  • It’s an attention-grabber, designed to make us think about the tools and skills required to analyse “big data” in the same way that “NoSQL” is designed to make us think about alternative database solutions

Arguments against:

  • The “data” part is redundant, since all scientists deal with data
  • It belittles the job title of “scientist”; the term might be construed as dismissive of the education, training and skills required to do “boring old school science” as opposed to “new, flashy sexy data science”
  • Many (most?) “data scientists” do business intelligence, not science; crunching Twitter posts to help formulate a better product marketing strategy is not the same as addressing a genuine scientific problem

At the heart of the issue, I feel, is a different approach to data. In “data science” we start with everything, give it a shake and see if answers to our questions fall out. In “real science” we start with a specific question, generate data designed to answer that question and see what falls out. Perhaps they are just different philosophies and mindsets. Perhaps each can learn from the other.

I guess with one “for” and three “against” I’ve decided that I don’t like the term “data scientist”, but I can’t quite shake the feeling that it has some use. What do you think?