Infographics. I’ve seen good examples. I’ve seen more bad examples. In general, I prefer a good chart to an infographic. That said, there’s a “genre” of infographic that I do think is useful, which I’ll call “if X were 100 Y”. A good example: if the world were 100 people.
That method of showing proportions has been called a waffle chart and, for extra “infographic-i-ness”, the squares can be replaced by icons. You want to do this using R? Of course you do. Here’s how.
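As a minimal sketch of the idea (not necessarily the approach in the full post), a 10 x 10 grid of tiles drawn with ggplot2 gives the basic waffle. The category names and counts below are made up for illustration, and the choice of `geom_tile()` is an assumption about one reasonable way to do it.

```r
library(ggplot2)

# Made-up counts for illustration only; they must sum to 100
parts <- c(A = 60, B = 25, C = 15)
stopifnot(sum(parts) == 100)

# One row per square on a 10 x 10 grid
waffle <- expand.grid(x = 1:10, y = 1:10)

# Assign each square to a category, in order, according to the counts
waffle$category <- rep(names(parts), times = parts)

# Square tiles with white borders and no axes
p <- ggplot(waffle, aes(x, y, fill = category)) +
  geom_tile(colour = "white") +
  coord_equal() +
  theme_void()

# Draw the chart when run interactively
if (interactive()) print(p)
```

Each square stands for one unit out of 100, so the counts can be read straight off the grid. Swapping the squares for icons would need more work than this sketch covers.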
I keep seeing years represented by coloured bars. First it was that demographic tsunami chart. Then there are examples like the one on the right, which came up in a web search today. I even saw one (whispers) at work today.
I get what they are trying to do – illustrate trends within categories over time – but I don’t think years as coloured bars is the way to go. To me, progression over time suggests that time should be an axis, so that the eye moves along the data from one end to the other without interruption. What I want to see is categories over time, not time within categories.
So what is the way to go? Let’s ask “what would ggplot2 do?”
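A rough sketch of what that might look like: year on the x-axis and one line per category. The data frame here is invented purely for illustration, and the column names `year`, `category` and `value` are assumptions.

```r
library(ggplot2)

# Made-up values for illustration only
trend <- data.frame(
  year     = rep(2011:2016, times = 2),
  category = rep(c("A", "B"), each = 6),
  value    = c(10, 12, 15, 14, 18, 21, 8, 9, 9, 11, 10, 13)
)

# Time runs along the x-axis; colour distinguishes categories, not years
p <- ggplot(trend, aes(year, value, colour = category)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(trend$year))

# Draw the chart when run interactively
if (interactive()) print(p)
```

With many categories, adding `facet_wrap(~ category)` and dropping the colour mapping is another option: one small panel per category, each with time along the axis.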
First there was “insert statistical method here”. Now we have R – making it easy “to do analysis stuff”.
Via Elisabeth; I’ll hand you over now for an entertaining summary.
To be fair, analysis stuff describes my working life quite well.
Search all the hashtags
ISMB (Intelligent Systems for Molecular Biology – which sounds rather old-fashioned now, doesn’t it?) is the largest conference for bioinformatics and computational biology. It is held annually and, when in Europe, jointly with the European Conference on Computational Biology (ECCB).
I’ve had the good fortune to attend twice: in Brisbane 2003 (very enjoyable early in my bioinformatics career, but unfortunately the seed for the “no more southern hemisphere meetings” decision), and in Toronto 2008. The latter was notable for its online coverage and for me, the pleasure of finally meeting in person many members of the online bioinformatics community.
The 2017 meeting and its satellite meetings were covered quite extensively on Twitter. My search using a variety of hashtags based on “ismb”, “eccb”, “17” and “2017” retrieved 9052 tweets, which form the basis of this summary at RPubs. Code and raw data can be found at Github.
Usually I just let these reports speak for themselves but in this case, I thought it was worth noting a few points:
July 21-22 saw the 18th incarnation of the Bioinformatics Open Source Conference, which generally precedes the ISMB meeting. I had the great pleasure of attending BOSC way back in 2003 and delivering a short presentation on Bioperl. I knew almost nothing in those days, but everyone was very kind and appreciative.
My trusty R code for Twitter conference hashtags pulled out 3268 tweets and without further ado here is:
The ISMB/ECCB meeting wraps today and analysis of Twitter coverage for that meeting will appear here in due course.
Highcharts has long been a favourite visualisation library of mine, and I’ve written before about Highcharter, my preferred way to use Highcharts in R.
Highcharter has a nice simple function, hcboxplot(), to generate boxplots. I recently generated some for a project at work and was asked: can we see how many observations make up the distribution for each category? This is a common issue with boxplots and there are a few solutions such as: overlay the box on a jitter plot to get some idea of the number of points, or try a violin plot, or a so-called bee-swarm plot. In Highcharts, I figured there should be a method to get the number of observations, which could then be displayed in a tool-tip on mouse-over.
There wasn’t, so I wrote one like this.
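The function itself isn’t reproduced in this excerpt. As a rough base R sketch of just the counting step (not the Highcharter code), the number of observations per category can be tabulated and turned into labels. The data and the names `obs`, `n_per_category` and `label` are made up for illustration; wiring the counts into a Highcharts tool-tip is not shown.

```r
# Made-up data for illustration only: three categories of unequal size
set.seed(1)
obs <- data.frame(
  category = rep(c("A", "B", "C"), times = c(5, 20, 50)),
  value    = rnorm(75)
)

# Count the observations behind each box
n_per_category <- as.data.frame(
  table(category = obs$category),
  responseName = "n"
)

# Build a text label per category, e.g. for an axis label or a tool-tip
n_per_category$label <- paste0(
  n_per_category$category, " (n = ", n_per_category$n, ")"
)

print(n_per_category)
```

The same counts would work as axis labels on a static boxplot if a tool-tip isn’t available.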
“‘Demographic tsunami’ will keep Sydney, Melbourne property prices high” screams the headline.
While the census showed Australia overall is aging, there’s been a noticeable lift in the number of people aged between 25 to 32.
As the accompanying graph shows…
Whoa, that is one ugly chart. First thought: let’s not be too hard on Fairfax Media, they’ve sacked most of their real journalists and they took the chart from someone else. Second thought: if you want to visualise change over time, time as an axis rather than a coloured bar is generally a good idea.
Can we do better?
Back in February, I wrote some R code to analyse tweets covering the 2017 Lorne Genome conference. It worked pretty well. So I reused the code for two recent bioinformatics meetings held in Sydney: the Sydney Bioinformatics Research Symposium and the VIZBI 2017 meeting.
So without further ado, here are the reports in markdown format, which display quite nicely when pushed to Github:
and you can dig around in the repository for the Rmarkdown, HTML and image files, if you like.
Update: also available as published reports at RPubs:
Just pushed an updated version of my nhmrcData R package to Github. A quick summary of the changes:
- In response to feedback, added the packages required for vignette building as dependencies (Imports) – commit
- Added 8 new datasets with funding outcomes by gender for 2003 – 2013, created from a spreadsheet that I missed first time around – commit and see the README
Vignette is not yet updated with new examples.
So now you can generate even more depressing charts of funding rates for even more years, such as the one featured on the right (click for full-size).
Enjoy and as ever, let me know if there are any issues.
update: just found a bunch of issues myself :) which are now hopefully fixed
Do you like R? Information about Australian biomedical research funding outcomes? Tidy data? If the answers to those questions are “yes”, then you may also like nhmrcData, a collection of datasets derived from funding statistics provided by the Australian National Health & Medical Research Council. It’s also my first R package (more correctly, R data package).
Read on for the details.