The recent ABC News article Australia’s pollution mapped by postcode reveals nation’s dirty truth is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps.
So here’s how I got them, using R.
Sydney’s congestion at ‘tipping point’
Dual-axes at tipping-point
blares the headline and to illustrate, an interactive chart with bars for city population densities, points for commute times and of course, dual-axes.
Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, but have comparable average commute times to higher-density cities. But if you’re plotting commute time versus population density…doesn’t a different kind of chart come to mind first? y versus x. C’mon.
I love it when researchers take the time to share their knowledge of the computational tools that they use. So first, let me point you at Environmental Computing, a site run by environmental scientists at the University of New South Wales, which has a good selection of R programming tutorials.
One of these is Making maps of your study sites. It was written with the specific purpose of generating simple, clean figures for publications and presentations, which it achieves very nicely.
I’ll be honest: the sole motivator for this post is that I thought it would be fun to generate the map using Leaflet for R as an alternative. You might use Leaflet if you want:
- An interactive map that you can drag, zoom, click for popup information
- A “fancier” static map with geographical features of interest
- concise and clean code which uses pipes and doesn’t require that you process shapefiles
The code that generated the report (which I’ve used heavily and written about before) is at Github too. A few changes required compared with previous reports, due to changes in the
rtweet package, and a weird issue with kable tables breaking markdown headers.
I love that the most popular media attachment is a screenshot of a Github repo.
A brief message for anyone who uses my PubMed retractions report. It’s no longer available at RPubs; instead, you will find it here at Github. Github pages hosting is great, once you figure out that
docs/ corresponds to your web root :)
Now I really must update the code and try to make it more interesting than a bunch of bar charts.
PubMed Commons, the NCBI’s experiment in comments for PubMed articles, has been discontinued. Thoroughly too, with all traces of it expunged from the NCBI website.
Last time I wrote about the service, I concluded “all it needs now is more active users, more comments per user and a real API.” None of those things happened. Result: “NIH has decided that the low level of participation does not warrant continued investment in the project, particularly given the availability of other commenting venues.”
NLM also write that “all comments are archived on our FTP site.” A CSV file is available at this location. So is it good for anything?
Infographics. I’ve seen good examples. I’ve seen more bad examples. In general, I prefer a good chart to an infographic. That said, there’s a “genre” of infographic that I do think is useful, which I’ll call “if X were 100 Y”. A good example: if the world were 100 people.
That method of showing proportions has been called a waffle chart and for extra “infographic-i-ness”, the squares can be replaced by icons. You want to do this using R? Of course you do. Here’s how.
Just pushed an updated version of my nhmrcData R package to Github. A quick summary of the changes:
- In response to feedback, added the packages required for vignette building as dependencies (Imports) – commit
- Added 8 new datasets with funding outcomes by gender for 2003 – 2013, created from a spreadsheet that I missed first time around – commit and see the README
Vignette is not yet updated with new examples.
So now you can generate even more depressing charts of funding rates for even more years, such as the one featured on the right (click for full-size).
Enjoy and as ever, let me know if there are any issues.
update: just found a bunch of issues myself :) which are now hopefully fixed
A recent tweet:
PubMed articles containing “novel” in title or abstract 1845 – 2014
made me think (1) has it really been 5 years, (2) gee, my ggplot skills were dreadful back then and (3) did I really not know how to correct for the increase in total publications?
So here is the update, at Github and the report.
“Novel” findings, as judged by the usage of that word in titles and abstracts really have undergone a startling increase since about 1975. Indeed, almost 7.2% of findings were “novel” in 2014, compared with 3.2% for the period 1845 – 2014. That said, if we plot using a log scale as suggested by Tal on the original post, the rate of usage appears to be slowing down. See image, right (click for larger version).
As before, none of this is novel.
The Nobel Prizes. Love them? Hate them? Are they still relevant, meaningful? Go on admit it, you always imagined you would win one day.
Whatever you think of them, the 2015 results are in. What’s more, the good people of the Nobel Foundation offer us free access to data via an API. I’ve published a document showing some of the ways to access and analyse their data using R. Just to get you started:
u <- "http://api.nobelprize.org/v1/laureate.json"
nobel <- fromJSON(u)
In this post, just the highlights. Click the images for larger versions.