It’s been a while. I hope you are all well. Shall we make some charts?
When Marlion Pickett runs onto the M.C.G for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 finals to debut on the big day.
library(dplyr) library(fitzRoy) afldata <- get_afltables_stats() afldata %>% select(Season, Round, Date, ID, First.name, Surname, Playing.for, Home.team, Home.score, Away.team, Away.score) %>% group_by(ID) %>% arrange(Date) %>% # a player's first game slice(1) %>% ungroup() %>% # grand finals only filter(Round == "GF") %>% # get the winning/losing margin mutate(Margin = case_when(Playing.for == Home.team ~ Home.score - Away.score, TRUE ~ Away.score - Home.score)) %>% select(-Home.team, -Away.team, -Home.score, -Away.score)
Each tweet takes one of two forms and is consistently formatted, making it easy to parse and extract information. Here are a couple of examples with the interesting parts highlighted in bold:
Between 16:00 and 18:30 today, 26% of trips experienced delays. #sydneytrains
The worst delay was 16 minutes, on the 18:16 City to Berowra via Gordon service. #sydneytrains
The take-home message: expect delays somewhere most days but in particular on Monday mornings, when students return to school after the holidays, and if you’re travelling in the far south-west or north-west of the network.
I’m not saying this is a good idea, but bear with me.
This week we return to Australian Rules Football, the R package fitzRoy and some statistics to ask – why can’t Geelong win after a bye?
(with apologies to long-time readers who used to come for the science)
Melvyn and I hail from the same part of the world, and I learned as a child that many of the local place names there were derived from Old Norse or Danish. Notably: places ending in -by denote a farmstead, settlement or village; those ending in -thwaite mean a clearing or meadow.
So how local are those names? Time for some quick and dirty maps using R.
For better or worse I spend some time each day at Stack Overflow [r], reading and answering questions. If you do the same, you probably notice certain features in questions that recur frequently. It’s as though everyone is copying from one source – perhaps the one at the top of the search results. And it seems highest-ranked is not always best.
Nowhere is this more apparent to me than in the way many users create data frames. So here is my introductory guide “how not to create data frames”, aimed at beginners writing their first questions.
The recent ABC News article Australia’s pollution mapped by postcode reveals nation’s dirty truth is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps.
So here’s how I got them, using R.
Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, but have comparable average commute times to higher-density cities. But if you’re plotting commute time versus population density…doesn’t a different kind of chart come to mind first? y versus x. C’mon.
I love it when researchers take the time to share their knowledge of the computational tools that they use. So first, let me point you at Environmental Computing, a site run by environmental scientists at the University of New South Wales, which has a good selection of R programming tutorials.
One of these is Making maps of your study sites. It was written with the specific purpose of generating simple, clean figures for publications and presentations, which it achieves very nicely.
I’ll be honest: the sole motivator for this post is that I thought it would be fun to generate the map using Leaflet for R as an alternative. You might use Leaflet if you want:
- An interactive map that you can drag, zoom, click for popup information
- A “fancier” static map with geographical features of interest
- concise and clean code which uses pipes and doesn’t require that you process shapefiles