On the passing of Hans Rosling

It would be remiss not to mention briefly the passing of Hans Rosling. Data needs storytellers and the world needs advocates for evidence-based decision making. We have lost one of the best.

For some insights into the man and his interesting (and at times challenging) life, I highly recommend this news feature. You can enjoy presentations at the Gapminder website: I’d start with the documentary The Joy of Stats.

Perhaps I should not be surprised or annoyed – but I am – at the lack of coverage this story received from news outlets, particularly in Australia. Aside from an obituary at Guardian Australia (not on the front page), I don’t believe the news featured at any other major Australian news publisher. Perhaps not unrelated: stories like this feature quite frequently.

I’m told that in Europe, this effort at the BBC was one of the few major reports.

Maybe I live in a data science bubble, but I would think this is a person and an event “of note”. Thanks for the stories, Hans.

Nice graphic? Are they taking the p…

Yes, it started with a tweet:

By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are 3D pie charts.

Can it be fixed?
Read the rest…

The real meaning of spurious correlations

Like many data nerds, I’m a big fan of Tyler Vigen’s Spurious Correlations, a humorous illustration of the old adage “correlation does not equal causation”. Technically, I suppose it should be called “spurious interpretations” since the correlations themselves are quite real, but then good marketing is everything.

There is, however, a more formal definition of the term spurious correlation or, more specifically, as the excellent Wikipedia page is now titled, spurious correlation of ratios. It describes the following situation:

  1. You take a bunch of measurements X1, X2, X3…
  2. And a second bunch of measurements Y1, Y2, Y3…
  3. There’s no correlation between them
  4. Now divide both of them by a third set of measurements Z1, Z2, Z3…
  5. Guess what? Now there is correlation between the ratios X/Z and Y/Z

It’s easy to demonstrate for yourself, using R to create something like the chart in the Wikipedia article.
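The five steps above can be sketched in a few lines. This is a minimal illustration in Python rather than R, and it is not the code behind the Wikipedia chart: the distributions (normal X and Y with mean 10 and standard deviation 1, uniform Z between 10 and 50), the sample size and the function names `pearson` and `simulate` are all assumptions chosen for the example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=2000, seed=1):
    """Return (correlation of X and Y, correlation of X/Z and Y/Z).

    X, Y and Z are drawn independently, so any correlation between
    the ratios comes purely from the shared divisor Z.
    """
    rng = random.Random(seed)
    # Assumed distributions, chosen only for illustration.
    x = [rng.gauss(10, 1) for _ in range(n)]
    y = [rng.gauss(10, 1) for _ in range(n)]
    z = [rng.uniform(10, 50) for _ in range(n)]  # strictly positive divisor

    x_over_z = [xi / zi for xi, zi in zip(x, z)]
    y_over_z = [yi / zi for yi, zi in zip(y, z)]
    return pearson(x, y), pearson(x_over_z, y_over_z)


if __name__ == "__main__":
    r_raw, r_ratio = simulate()
    print(f"cor(X, Y)     = {r_raw:.3f}")
    print(f"cor(X/Z, Y/Z) = {r_ratio:.3f}")
```

With these settings the first value should sit close to zero and the second should be strongly positive. The more Z varies relative to X and Y, the stronger the correlation between the ratios.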

Read the rest…

Taking steps (in XML)

So the votes are in:

I thank you, kind readers. So here’s the plan: (1) keep blogging here as frequently as possible (perhaps monthly), (2) on more general “how to do cool stuff with data and R” topics, (3) which may still include biology from time to time. Sounds OK? Good.

So: let’s use R to analyse data from the iOS Health app.

Read the rest…

Evidence for a limit to effective peer review

I missed it the first time around but apparently, back in October, Nature published a somewhat controversial article: Evidence for a limit to human lifespan. It came to my attention in a recent tweet:

The source: a fact-check article from Dutch news organisation NRC titled “Nature article is wrong about 115 year limit on human lifespan”. NRC seem rather interested in this research article. They have published another more recent critique of the work, titled “Statistical problems, but not enough to warrant a rejection” and a discussion of that critique, Peer review post-mortem: how a flawed aging study was published in Nature.

Unfortunately, the first NRC article does itself no favours by using non-comparable x-axis scales for its charts and not really explaining very well how the different datasets (IDL and GRG) were used. Data nerds everywhere, then, are wondering whether to repeat the analysis themselves and perhaps fire off a letter to Nature.

Read the rest…

An Analysis of Contributions to PubMed Commons

I recently saw a tweet floating by which included a link to some recent statistics from PubMed Commons, the NCBI service for commenting on scientific articles in PubMed. Perhaps it was this post at their blog. So I thought now would be a good time to write some code to analyse PubMed Commons data.

The tl;dr version: here’s the GitHub repository and the RPubs report.

For further details and some charts, read on.

Read the rest…

Putting data on maps using R: easier than ever

New Zealand earthquake density 2010 – November 2016

Using R to add data to maps has been pretty straightforward for a few years now. That said, it seems easier than ever to do things like use map APIs (e.g. Google, OpenStreetMap), overlay quite complex data visualisations (e.g. “heatmap-style” densities) and even generate animations.

A couple of key R packages in this space: ggmap and gganimate. To illustrate, I’ve used data from the recent New Zealand earthquake to generate some static maps and an animation. Here’s the GitHub repository and a report published at RPubs. Thanks to Florian Teschner for a great ggmap tutorial which got me started.

My own work in bioinformatics to date has not (sadly!) required much analysis of geospatial data but I can see use cases in many areas – environmental microbiology, for example.

The y-axis: to zero or not to zero

I don’t “do politics” at this blog, but I’m always happy to do charts. Here’s one that’s been doing the rounds on Twitter recently:

What’s the first thing that comes into your mind on seeing that chart?

It seems that there are two main responses to the chart:

  1. Wow, what happened to all those Democrat voters between 2008 and 2016?
  2. Wow, that’s misleading, it makes it look like Democrat support almost halved between 2008 and 2016

The question then is: when (if ever) is it acceptable to start a y-axis at a non-zero value?
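To see why the second response arises, it helps to compare the ratio of two values with the ratio of their drawn bar heights once the axis starts above zero. The sketch below uses Python and made-up numbers (70 and 60, with an axis starting at 50); they are not the figures from the chart, and `drawn_height_ratio` is a hypothetical helper written for this example.

```python
def drawn_height_ratio(first, second, baseline=0.0):
    """Ratio of the second bar's drawn height to the first bar's
    when the y-axis starts at `baseline` instead of zero."""
    if baseline >= min(first, second):
        raise ValueError("baseline must be below both values")
    return (second - baseline) / (first - baseline)


if __name__ == "__main__":
    # Hypothetical values, not taken from the chart under discussion.
    first, second = 70.0, 60.0
    print("axis from zero:", drawn_height_ratio(first, second))
    print("axis from 50:  ", drawn_height_ratio(first, second, baseline=50.0))
```

With the axis at zero, the second bar is drawn about 86% as tall as the first, matching the real ratio. With the axis starting at 50 it is drawn half as tall, which is how a modest fall can look like a halving.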

Read the rest…

Let’s (briefly) revisit the Nobel API

It’s always nice when 12-month-old code runs without a hitch. Not sure why this did not become a GitHub repo first time around, but now it is: my RMarkdown code to generate a report using data from the Nobel Prize API.

Now you too can generate a “gee, it’s all old white men” chart as seen in The Economist (“Greying of the Nobel laureates”), BBC News (“Why are Nobel Prize winners getting older?”) and no doubt, many other outlets every year including me at RPubs, updated from 2015. As for myself, perhaps I should be offering my services to news outlets instead of publishing on blogs and obscure web platforms :)