Nice graphic? Are they taking the p…

Yes, it started with a tweet:

By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are 3D pie charts.

Can it be fixed?

As far as I know, there isn’t a tool that extracts data labels from images, so I sat down and typed in the numbers manually. Here they are for download. In the all_other column, the top and bottom pie charts are identified by “all” and “other”, respectively.

Before we get into charts, we’d better make sure those percentages total 100.

library(ggplot2)
library(dplyr)
library(readr)

# the CSV has no header row, so supply the column names on import
urine1 <- read_csv("urine1.csv", col_names = c("component", "all_other", "percent"))

# top chart - good!
urine1 %>% filter(all_other == "all") %>% summarise(total = sum(percent)) %>% glimpse()

# Observations: 1
# Variables: 1
# $ total <dbl> 99.9

# bottom chart - not good
urine1 %>% filter(all_other == "other") %>% summarise(total = sum(percent)) %>% glimpse()

# Observations: 1
# Variables: 1
# $ total <dbl> 113.61
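The two checks above can also be run in one pass by grouping on all_other. Here is a minimal sketch that wraps this in a hypothetical helper, check_totals(), with an arbitrary tolerance for rounding in the published labels. The toy data frame is made up for illustration; it is not the urine data.

```r
library(dplyr)

# Hypothetical helper: total the slices of each chart and flag any chart
# whose total is not within `tol` percentage points of 100.
# The default tolerance of 0.5 is an arbitrary choice.
check_totals <- function(df, tol = 0.5) {
  df %>%
    group_by(all_other) %>%
    summarise(total = sum(percent), .groups = "drop") %>%
    mutate(ok = abs(total - 100) <= tol)
}

# Made-up toy data, NOT the real urine figures.
toy <- tibble(
  component = c("a", "b", "c", "d", "e"),
  all_other = c("all", "all", "all", "other", "other"),
  percent   = c(60, 30, 9.9, 70, 43.6)
)

check_totals(toy)
```

Grouping first means the same call works however many charts the CSV holds.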

Slices in the bottom chart sum to 113.61%. Problem. Not being an expert in urine composition, I have no idea which figures are incorrect, so I’ll just have to discard that data. However, on the subject of accuracy, I do know that it is lysozyme, not lyzozyme, and immunoglobulins, not immunoglobulines.
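If we assume (and it is only an assumption) that a single figure is wrong, the size of the error is easy to work out. The first reader comment below argues, from the slice areas, that the culprit is immunoglobulins at 22.85%; a few lines of R show what that slice would have to be for the chart to total 100%.

```r
# Figures from the post (total) and from the first reader comment (slice).
total_other    <- 113.61  # sum of the slices in the bottom ("other") chart
immunoglobulin <- 22.85   # the slice the comment suspects is wrong

excess  <- total_other - 100        # percentage points over 100
implied <- immunoglobulin - excess  # slice value IF it alone is wrong

round(c(excess = excess, implied = implied), 2)
# excess = 13.61, implied = 9.24
```

An implied 9.24% fits the comment's observation that the immunoglobulin slice looks about the same size as 5-HIAA (10.27%), and an excess of 13.61 points is consistent with its estimate of at least 12% too large.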

Back to the top chart. Why are pie charts bad? Because we are poor at visually assessing relative areas (but good at assessing relative heights). And why are 3D pie charts bad? Because they are nothing but a gimmick: they add nothing to the visualisation and, in fact, distort it in the attempt to render perspective. The commonly heard rejoinder is “but business people like them.” Well, that doesn’t make them right.
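To get a feel for how much a tilt can distort, here is a simplified model, which is an assumption and not how any particular charting tool renders: treat tilting the pie as an orthographic squash of the vertical axis by cos(tilt), with no perspective and no side walls, then measure how wide each slice's angle appears. apparent_span() is a hypothetical helper.

```r
# Apparent angular width (degrees) of a pie slice after the disc is tilted
# away from the viewer. Simplified model (an assumption): orthographic
# projection, so the tilt only squashes the vertical axis by cos(tilt).
# Angles are in radians, anticlockwise from the positive x axis; the slice
# must not cross the negative x axis, otherwise atan2() wraps around.
apparent_span <- function(from, to, tilt_deg) {
  squash <- cos(tilt_deg * pi / 180)
  proj <- function(theta) atan2(squash * sin(theta), cos(theta))
  (proj(to) - proj(from)) * 180 / pi
}

# Two slices that are each exactly 25% (90 degrees) of the pie.
front <- apparent_span(-3 * pi / 4, -pi / 4, tilt_deg = 60)  # nearest the viewer
side  <- apparent_span(-pi / 4,      pi / 4, tilt_deg = 60)  # off to the right

round(c(front = front, side = side), 1)
# front = 126.9, side = 53.1
```

Two equal quarters end up looking roughly 127 and 53 degrees wide. A pure squash scales every area by the same factor, so in this model the areas stay in proportion and all of the distortion shows up in the angles; real 3D renderers add perspective and visible side walls on the front slices on top of that.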

So we could try a bar chart, sorted by value.

# reorder(component, -percent) sorts the bars from largest to smallest
urine1 %>% filter(all_other == "all") %>% 
  ggplot(aes(reorder(component, -percent), percent)) + 
  geom_col(fill = "skyblue3") + theme_bw() + 
  labs(x = "component", y = "percent", title = "Composition of human urine", subtitle = "50 g dry weight / L")

[Figure: bar chart, composition of human urine (percent by component)]

Which is OK: it makes it easy to see and compare the relative proportion of each component. There’s a lot of white space, though. We could stack the bars, but with this many components it would be hard to choose a colour palette that keeps them all distinguishable. So here’s another alternative: a treemap, created using the rather wonderful highcharter package.

library(treemap)
library(highcharter)

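# treemap() computes the layout (and draws a static version as a side effect)
urine1_tm <- treemap(filter(urine1, all_other == "all"), index = "component", vSize = "percent", palette = "Spectral")
# convert it to an interactive chart; note hc_add_series_treemap() is deprecated in later highcharter releases
urine1_tmhc <- highchart() %>% hc_add_series_treemap(urine1_tm, name = "urine", layoutAlgorithm = "squarified") %>% 
  hc_title(text = "Composition of human urine (50 g dry weight / L)")
urine1_tmhc  # print to render in the RStudio viewer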
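The idea behind any treemap is that each tile's area is proportional to its value. The code above asks highcharter for the squarified layout; the sketch below swaps in the simpler slice-and-dice layout (a single row of strips), which is easier to follow but tends to produce long, thin tiles. slice_and_dice() is a hypothetical helper and the values are made up.

```r
# Slice-and-dice treemap layout: split a w x h rectangle into vertical
# strips whose widths, and therefore areas, are proportional to the values.
# One level only; squarified layouts also try to keep tiles near-square.
slice_and_dice <- function(labels, values, w = 100, h = 100) {
  ord    <- order(values, decreasing = TRUE)  # largest tile first
  labels <- labels[ord]
  values <- values[ord]
  widths <- w * values / sum(values)          # width proportional to value
  x0     <- c(0, head(cumsum(widths), -1))    # left edge of each strip
  data.frame(label = labels, x0 = x0, x1 = x0 + widths,
             y0 = 0, y1 = h, area = widths * h,
             stringsAsFactors = FALSE)
}

# Made-up values, NOT the urine data.
tiles <- slice_and_dice(c("a", "b", "c"), c(50, 30, 20))
tiles
```

The resulting data frame can be drawn with ggplot2's geom_rect(), mapping xmin, xmax, ymin and ymax to x0, x1, y0 and y1.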

The result is a nice interactive chart, published straight to RPubs from the RStudio viewer pane, by the way. RStudio is just great. Here’s the non-interactive screenshot.

[Figure: treemap screenshot, composition of human urine]

I’d suggest that if you must present proportions by area, this is a much nicer way to do it.

In summary then:

  • pie charts bad
  • 3D pie charts awful
  • columns functional, if not always compelling
  • there are far more wonderful tools for visualising data than the tired old options

6 thoughts on “Nice graphic? Are they taking the p…

  1. Hi Neil,

    Nice post, and thanks for the pointer to the highcharter package.

    The error in the second pie chart, where the percentages sum to 113.61%, lies in the immunoglobulins (22.85%). I say this not from any specialist knowledge of urine, but from comparing the numbers with the areas. The immunoglobulin area is about the same as for 5-HIAA (10.27%), and much less than for albumin or cholesterol (both ~20%), so it is clearly at least 12% too large.

    A double irony is that although pie charts are meant to avoid the need for numbers, they provide both chart and numbers here, and manage to get them wrong. And without the numbers you would have had no story.

    Tim Cole

    • No :)

      The “simple visual experiment” illustrates exactly why pie charts are useless. Where one category dominates, as in the simple 75/25 split, you may as well use a table of numbers. As soon as there are multiple categories, we have the “brain finds it hard to compare areas” problem.

      The donut is not bad, I’ll give you that, but when there are many very small proportions, what is gained by displaying them?
