Highcharts has long been a favourite visualisation library of mine, and I’ve written before about Highcharter, my preferred way to use Highcharts in R.
Highcharter has a nice, simple function, hcboxplot(), to generate boxplots. I recently generated some for a project at work and was asked: can we see how many observations make up the distribution for each category? This is a common issue with boxplots, and there are a few solutions: overlay the box on a jitter plot to get some idea of the number of points, try a violin plot, or use a so-called bee-swarm plot. In Highcharts, I figured there should be a method to get the number of observations, which could then be displayed in a tooltip on mouse-over.
There wasn’t, so I wrote one like this.
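The counting step itself is straightforward. What follows is a minimal base-R sketch, not the Highcharter code referred to above. It relies on the fact that boxplot.stats() already returns the number of non-missing observations (n) alongside the five summary statistics (whisker ends, hinges and median). The function name box_stats_with_n() is hypothetical, the column names mirror the fields of a Highcharts boxplot point (low, q1, median, q3, high), and iris stands in for real data.

```r
# Hypothetical helper (not from the original post): one row per category
# holding the five boxplot statistics plus the observation count n.
box_stats_with_n <- function(values, groups) {
  # Split the numeric values into one vector per category
  by_group <- split(values, groups)
  rows <- lapply(names(by_group), function(g) {
    # stats: lower whisker, lower hinge, median, upper hinge, upper whisker
    # n: number of non-NA observations in this category
    s <- boxplot.stats(by_group[[g]])
    data.frame(
      category = g,
      low = s$stats[1], q1 = s$stats[2], median = s$stats[3],
      q3 = s$stats[4], high = s$stats[5],
      n = s$n,
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-category rows into a single data frame
  do.call(rbind, rows)
}

iris_stats <- box_stats_with_n(iris$Sepal.Length, iris$Species)
print(iris_stats)
```

Each row could then be supplied as a box point carrying n as an extra property, which a Highcharts tooltip formatter should be able to read as this.point.n; that wiring is not shown here because it depends on how the series is built in Highcharter.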
“‘Demographic tsunami’ will keep Sydney, Melbourne property prices high” screams the headline.
While the census showed Australia overall is aging, there’s been a noticeable lift in the number of people aged between 25 to 32.
As the accompanying graph shows…
Whoa, that is one ugly chart. First thought: let’s not be too hard on Fairfax Media; they’ve sacked most of their real journalists, and they took the chart from someone else. Second thought: if you want to visualise change over time, using time as an axis rather than as bar colour is generally a good idea.
Can we do better?
Back in February, I wrote some R code to analyse tweets covering the 2017 Lorne Genome conference. It worked pretty well. So I reused the code for two recent bioinformatics meetings held in Sydney: the Sydney Bioinformatics Research Symposium and the VIZBI 2017 meeting.
So without further ado, here are the reports in markdown format, which display quite nicely when pushed to GitHub:
and you can dig around in the repository for the Rmarkdown, HTML and image files, if you like.
Update: also available as published reports at RPubs:
Just pushed an updated version of my nhmrcData R package to GitHub. A quick summary of the changes:
- In response to feedback, added the packages required for vignette building as dependencies (Imports) – commit
- Added 8 new datasets with funding outcomes by gender for 2003–2013, created from a spreadsheet that I missed the first time around – commit and see the README
The vignette has not yet been updated with new examples.
So now you can generate even more depressing charts of funding rates for even more years, such as the one featured on the right.
Enjoy, and as ever, let me know if there are any issues.
Update: just found a bunch of issues myself :) which are now hopefully fixed.
Do you like R? Information about Australian biomedical research funding outcomes? Tidy data? If the answers to those questions are “yes”, then you may also like nhmrcData, a collection of datasets derived from funding statistics provided by the Australian National Health & Medical Research Council. It’s also my first R package (more correctly, R data package).
Read on for the details.
Short version: if RStudio on Windows 7 crashes when viewing vignettes in HTML format, it may be because those packages specify knitr::rmarkdown as the vignette engine, instead of knitr::knitr and you’re using
Longer version with details – read on.
Update: looks like this issue relates to the installed version of rmarkdown (1.3 in my case) – see here for details.
Things to know about Lorne in the state of Victoria, Australia.
- It’s situated on the Great Ocean Road, a major visitor attraction and a great way to see the scenic coastline of the region
- It’s home to a number of life science conferences including Lorne Genome 2017
This week’s project, then: use R to analyse coverage of the 2017 meeting on Twitter. I last did something similar for the ISMB meeting in 2012. How things have changed. Back then I prepared PDF reports using Sweave, retrieved tweets using the twitteR package and struggled with dates and times when plotting timelines. This time around I wrote RMarkdown in RStudio and tried out the newer rtweet package; thanks to packages such as lubridate, the data munging is all so much cleaner and simpler.
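For readers who haven’t used lubridate, here is a minimal sketch of the kind of timeline munging it simplifies: rounding timestamps down to the hour so that tweets can be counted per hour and plotted. The timestamps below are made-up stand-ins for the created_at column returned by rtweet; this is not the code from the report itself.

```r
library(lubridate)

# Made-up timestamps standing in for tweet creation times (illustration only)
created_at <- ymd_hms(c(
  "2017-02-12 09:05:00", "2017-02-12 09:40:00",
  "2017-02-12 10:15:00", "2017-02-12 10:20:00", "2017-02-12 10:55:00"
), tz = "UTC")

# Round each timestamp down to the start of the hour it falls in
hour_bin <- floor_date(created_at, unit = "hour")

# Count tweets in each hourly bin (bins are ordered chronologically)
tweets_per_hour <- table(hour_bin)
print(tweets_per_hour)
```

The same floor_date() call accepts other units such as "day" or "15 minutes", which is what makes re-binning a timeline a one-line change.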
So without further ado, here are:
The presentation examines several aspects of the conference coverage under the broad headings of timeline, users, networks, retweets, favourites, quotes, media and text. Click on the title page first; you can then navigate using the arrow keys. The latest version will always be on GitHub; simply download it and open it in a browser.
It would be remiss not to mention briefly the passing of Hans Rosling. Data needs storytellers and the world needs advocates for evidence-based decision making. We have lost one of the best.
For some insights into the man and his interesting (and at times challenging) life, I highly recommend this news feature. You can enjoy presentations at the Gapminder website: I’d start with the documentary The Joy of Stats.
Perhaps I should not be surprised or annoyed – but I am – at the lack of coverage this story received at news outlets, particularly in Australia. Aside from an obituary at Guardian Australia (not on the front page), I don’t believe the news featured at all in any other major Australian news publisher. Perhaps not unrelated, stories like this feature quite frequently.
I’m told that in Europe, this effort at the BBC was one of the few major reports.
Maybe I live in a data science bubble, but I would think this is a person and an event “of note”. Thanks for the stories, Hans.
Dual y-axes: yes or no? What about if one of them is also reversed, i.e. values increase from the top of the chart to the bottom?
Judging by this Stack Overflow question, hydrologists are fond of both of these things. It asks whether ggplot2 can be used to generate a “rainfall hyetograph and streamflow hydrograph”, which looks like this:
Yes, it started with a tweet:
By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are 3D pie charts.
Can it be fixed?