The real meaning of spurious correlations

Like many data nerds, I’m a big fan of Tyler Vigen’s Spurious Correlations, a humorous illustration of the old adage “correlation does not equal causation”. Technically, I suppose it should be called “spurious interpretations”, since the correlations themselves are quite real, but then good marketing is everything.

There is, however, a more formal definition of the term spurious correlation or, more specifically, spurious correlation of ratios, as the excellent Wikipedia page is now titled. It describes the following situation:

  1. You take a bunch of measurements X1, X2, X3…
  2. And a second bunch of measurements Y1, Y2, Y3…
  3. There’s no correlation between them
  4. Now divide both of them by a third set of measurements Z1, Z2, Z3…
  5. Guess what? Now there is correlation between the ratios X/Z and Y/Z

It’s easy to demonstrate for yourself, using R to create something like the chart in the Wikipedia article.
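A minimal sketch of such a demonstration might look like the following. The seed, sample size, means and standard deviations are assumptions chosen purely for illustration; they are not taken from the original post. The three samples are drawn independently, so any correlation between the ratios comes from the shared divisor.

```r
# Illustrative sketch only: all parameter values below are assumptions.
set.seed(42)                        # arbitrary seed, for reproducibility
n <- 1000                           # number of measurements in each set

# Three independent sets of measurements
x <- rnorm(n, mean = 10, sd = 1)
y <- rnorm(n, mean = 10, sd = 1)
z <- rnorm(n, mean = 30, sd = 3)    # kept far from zero so the ratios are stable

r_raw   <- cor(x, y)                # X and Y alone: should be near zero
r_ratio <- cor(x / z, y / z)        # ratios sharing Z: should be clearly positive

print(c(raw = r_raw, ratio = r_ratio))

# Scatter plot of the two ratios
plot(x / z, y / z,
     xlab = "X / Z", ylab = "Y / Z",
     main = "Spurious correlation of ratios")
```

Making Z more variable relative to X and Y (for example, raising its `sd`) should strengthen the induced correlation, while a nearly constant Z should leave the ratios almost as uncorrelated as the raw values.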
