I keep seeing years represented by coloured bars. First it was that demographic tsunami chart. Then there are examples like the one on the right, which came up in a web search today. I even saw one (whispers) at work today.
I get what they are trying to do – illustrate trends within categories over time – but I don’t think years as coloured bars is the way to go. To me, progression over time suggests that time should be an axis, so that the eye moves along the data from one end to the other without interruption. What I want to see is categories over time, not time within categories.
So what is the way to go? Let’s ask “what would ggplot2 do?”
“‘Demographic tsunami’ will keep Sydney, Melbourne property prices high” screams the headline.
While the census showed Australia overall is aging, there’s been a noticeable lift in the number of people aged between 25 to 32.
As the accompanying graph shows…
Whoa, that is one ugly chart. First thought: let’s not be too hard on Fairfax Media, they’ve sacked most of their real journalists and they took the chart from someone else. Second thought: if you want to visualise change over time, time as an axis rather than a coloured bar is generally a good idea.
Can we do better?
Yes, it started with a tweet:
By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are exploded 3D pie charts.
Can it be fixed?
New Zealand earthquake density 2010 – November 2016
Using R to add data to maps has been pretty straightforward for a few years now. That said, it seems easier than ever to do things like use map APIs (e.g. Google, OpenStreetMap), overlay quite complex data visualisations (e.g. “heatmap-style” densities) and even generate animations.
A couple of key R packages in this space: ggmap and gganimate. To illustrate, I’ve used data from the recent New Zealand earthquake to generate some static maps and an animation. Here’s the GitHub repository and a report. Thanks to Florian Teschner for a great ggmap tutorial which got me started.
My own work in bioinformatics to date has not (sadly!) required much analysis of geospatial data but I can see use cases in many areas – environmental microbiology, for example.
I don’t “do politics” at this blog, but I’m always happy to do charts. Here’s one that’s been doing the rounds on Twitter recently:
What’s the first thing that comes into your mind on seeing that chart?
It seems that there are two main responses to the chart:
- Wow, what happened to all those Democrat voters between 2008 and 2016?
- Wow, that’s misleading, it makes it look like Democrat support almost halved between 2008 and 2016
The question then is: when (if ever) is it acceptable to start a y-axis at a non-zero value?
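One way to put a number on the second response is to compare the ratio of the drawn bar heights with the ratio of the underlying values. The sketch below is a minimal illustration in Python; the function name `apparent_ratio` and the values 70, 60 and 50 are hypothetical and are not taken from the chart.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at `baseline` instead of zero."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical values for illustration only (not read off the chart).
earlier, later = 70.0, 60.0

# With the axis starting at zero, bar heights match the data.
true_ratio = apparent_ratio(later, earlier)

# With the axis starting at 50, the same data looks very different.
truncated_ratio = apparent_ratio(later, earlier, baseline=50.0)

print(f"axis from 0:  later bar is {true_ratio:.2f} of the earlier bar")
print(f"axis from 50: later bar is {truncated_ratio:.2f} of the earlier bar")
```

With these made-up numbers, a drop of about 14% is drawn as a bar half the height of its neighbour, which is the kind of distortion the second response complains about.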
PeerJ, like PLoS ONE, aims to publish work on the basis of “soundness” (scientific and methodological) as opposed to subjective notions of impact, interest or significance. I’d argue that effective, appropriate data visualisation is a good measure of methodology. I’d also argue that on that basis, Evolution of a research field – a micro (RNA) example fails the soundness test.
6-way Venn banana
I thought nothing could top the classic “6-way Venn banana”, featured in The banana (Musa acuminata) genome and the evolution of monocotyledonous plants.
That is until I saw Figure 3 from Compact genome of the Antarctic midge is likely an adaptation to an extreme environment.
5-way Venn roadkill
What’s odd is that Figure 2 in the latter paper is a nice, clear R/ggplot2 creation, using facet_grid(), so someone knew what they were doing.
That aside, the Antarctic midge paper is an interesting read; go check it out.
This led to some amusing Twitter discussion which pointed me to *A New Rose : The First Simple Symmetric 11-Venn Diagram.
[*] +1 for referencing The Damned, if indeed that was the intention.
A recent question over at BioStar asked whether abstracts returned from a PubMed search could easily be visualised as “word clouds”, using Wordle.
This got me thinking about ways to solve the problem using R. Here’s my first attempt, which demonstrates some functions from the RCurl and XML packages.
update: corrected a couple of copy/paste errors in the code
The Life Scientists 2009
It’s Christmas Eve tomorrow and so I declare the year over. My Christmas gift to you is a summary of activity in 2009 at the FriendFeed Life Scientists group. It’s crafted using R + Ruby, with raw data and some code snippets available. If you want to see the most popular items from the group this year, head down to the bottom of this post.
(Note: this post is a work in progress)
Here’s a common problem solved: how to generate a pretty picture of your database schema. A Google search throws up all manner of home-brewed solutions using Graphviz, Perl scripts and so on. Or you can make life easier and simply install SQLFairy.
Under Ubuntu: as simple as “sudo apt-get install sqlfairy”.
Next, dump your database tables, e.g. for MySQL:
mysqldump -u username -p -d mydatabase > mydatabase.sql
Finally, for a PNG image of your schema:
sqlt-graph -f MySQL -o mydatabase.png -t png mydatabase.sql
Too easy. Example shown is the BioSQL schema.
update: if your schema lacks explicit foreign keys, try the --natural-join options (man sqlt-graph, man sqlt-diagram)