Like many data nerds, I’m a big fan of Tyler Vigen’s Spurious Correlations, a humorous illustration of the old adage “correlation does not equal causation”. Technically, I suppose it should be called “spurious interpretations”, since the correlations themselves are quite real, but then good marketing is everything.
There is, however, a more formal definition of the term spurious correlation or, more specifically, as the excellent Wikipedia page is now titled, spurious correlation of ratios. It describes the following situation:
- You take a bunch of measurements X1, X2, X3…
- And a second bunch of measurements Y1, Y2, Y3…
- There’s no correlation between them
- Now divide both of them by a third set of measurements Z1, Z2, Z3…
- Guess what? Now there is a correlation between the ratios X/Z and Y/Z
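Why does this happen? Both ratios share the same denominator, so random variation in Z pushes X/Z and Y/Z up or down together. Karl Pearson, who first described the effect, gave a first-order approximation for the case where X, Y and Z are mutually uncorrelated. Here $v_X$, $v_Y$ and $v_Z$ are the coefficients of variation (standard deviation divided by mean):

$$
\rho\!\left(\frac{X}{Z}, \frac{Y}{Z}\right) \approx \frac{v_Z^2}{\sqrt{\left(v_X^2 + v_Z^2\right)\left(v_Y^2 + v_Z^2\right)}}
$$

When all three coefficients of variation are equal, this works out to $v^2 / (2v^2) = 1/2$. So three unrelated sets of measurements can give ratios that correlate at around 0.5.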
It’s easy to demonstrate this for yourself, using R to create something like the chart in the Wikipedia article.
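The post does this in R. What follows is a rough Python equivalent that uses only the standard library, and it is not the author's code. It assumes X, Y and Z are independent normal draws with mean 10 and standard deviation 1, which keeps Z well away from zero. The function names `pearson` and `simulate` are made up for illustration. Instead of drawing a chart, it prints the two correlations:

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n=500, seed=42):
    """Return (corr(X, Y), corr(X/Z, Y/Z)) for independent X, Y, Z.

    Assumption: each variable is drawn from normal(10, 1), so every value
    of Z stays far from zero and the ratios are well behaved.
    """
    rng = random.Random(seed)
    x = [rng.gauss(10, 1) for _ in range(n)]
    y = [rng.gauss(10, 1) for _ in range(n)]
    z = [rng.gauss(10, 1) for _ in range(n)]
    # Divide both uncorrelated series by the same third series
    ratio_x = [xi / zi for xi, zi in zip(x, z)]
    ratio_y = [yi / zi for yi, zi in zip(y, z)]
    return pearson(x, y), pearson(ratio_x, ratio_y)


if __name__ == "__main__":
    raw, ratios = simulate()
    print(f"corr(X, Y)     = {raw:.3f}")
    print(f"corr(X/Z, Y/Z) = {ratios:.3f}")
```

All three variables here share a coefficient of variation of 0.1, so Pearson's approximation predicts a correlation between the ratios of about 0.5. The raw X and Y should correlate at close to zero.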