The terms “big science” and “big data” have recently become quite prominent on the Web. For commentary, I point you to the man with the tag.
There are those who believe that big data means fundamental change in how science is done. We’ll take all this data, make it machine-readable, put it in the cloud and – poof! – science will emerge. Almost as if it were self-aware. At the other extreme are those who see no fundamental difference in how we go about our business – there’ll just be “more” of it.
One analysis, of course, is that they’re both right and they’re both wrong.
There’s a word that I expected to see much more frequently than I did as the arguments flew back and forth. That word is questions. Science is, fundamentally, the business of asking questions. When we don’t know very much, we ask basic questions: why is the sky blue? As we learn more, questions get more specific. Knowing that cells divide, we ask: how do they know when to start? And stop? And what happens when those signals go wrong? Pretty soon we’re asking extremely specific questions, such as “what are the mechanisms of E2-mediated down-regulation of the BTG2 gene?” Is it the great irony of our age that as the data get bigger, our questions get smaller? I digress…
Data, no matter how “big”, are inert without questions. They just sit there. Great science arises out of smart questions. The difference with big data is that (1) we can think up questions that might once have been thought impractical, such as “how does the expression of every gene in my organism alter under these conditions?”, and (2) we need smart ways to ask and answer the questions, meaning technology and computation.
Hence the title of this post, which I think I’d summarise like this:
We used to ask questions, then generate the data.
Now we generate the data, then think of the questions.


