When Marlion Pickett runs onto the MCG for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 grand finals to debut on the big day.
The sole purpose of this blog post is to illustrate how incredibly easy it is to figure this out, thanks to the dplyr and fitzRoy packages.
```r
library(dplyr)
library(fitzRoy)

afldata <- get_afltables_stats()

afldata %>%
  select(Season, Round, Date, ID, First.name, Surname, Playing.for,
         Home.team, Home.score, Away.team, Away.score) %>%
  # a player's first game
  group_by(ID) %>%
  filter(Date == min(Date)) %>%
  ungroup() %>%
  # grand finals only
  filter(Round == "GF") %>%
  # get the winning/losing margin
  mutate(Margin = case_when(Playing.for == Home.team ~ Home.score - Away.score,
                            TRUE ~ Away.score - Home.score)) %>%
  select(-Home.team, -Away.team, -Home.score, -Away.score)
```
The @sydstats Twitter account uses this code base and data from the Transport for NSW Open Data API to publish insights into delays on the Sydney Trains network.
Each tweet takes one of two forms and is consistently formatted, making it easy to parse and extract information. Here are a couple of examples with the interesting parts highlighted in bold:
Between 16:00 and 18:30 today, **26% of trips** experienced delays. #sydneytrains
The worst delay was **16 minutes**, on the **18:16 City to Berowra via Gordon** service. #sydneytrains
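For a flavour of the parsing, here’s a minimal sketch using stringr; the regular expressions are illustrative, not the exact patterns used by the code base:

```r
library(stringr)

tweets <- c(
  "Between 16:00 and 18:30 today, 26% of trips experienced delays. #sydneytrains",
  "The worst delay was 16 minutes, on the 18:16 City to Berowra via Gordon service. #sydneytrains"
)

# first form: percentage of trips delayed
str_match(tweets[1], "(\\d+)% of trips")[, 2]
# "26"

# second form: delay in minutes and the affected service
str_match(tweets[2], "was (\\d+) minutes, on the (.+) service")[, 2:3]
# "16"  "18:16 City to Berowra via Gordon"
```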
I’ve created a GitHub repository with code and a report showing some ways in which this data can be explored.
The take-home message: expect delays somewhere on most days, but particularly on Monday mornings, when students return to school after the holidays, and when travelling in the far south-west or north-west of the network.
I’m not saying this is a good idea, but bear with me.
This week we return to Australian Rules Football, the R package fitzRoy and some statistics to ask – why can’t Geelong win after a bye?
(with apologies to long-time readers who used to come for the science)
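For a flavour of the approach, here’s a rough sketch reusing the get_afltables_stats() columns from the snippet above. Treating any within-season gap of more than eight days between games as a bye is my simplification for illustration, not necessarily what the post ends up doing:

```r
library(dplyr)
library(fitzRoy)

geelong <- get_afltables_stats() %>%
  select(Season, Date, Home.team, Home.score, Away.team, Away.score) %>%
  distinct() %>%                       # one row per game, not per player
  filter(Home.team == "Geelong" | Away.team == "Geelong") %>%
  mutate(Date = as.Date(Date)) %>%     # make sure Date is a Date
  arrange(Date) %>%
  group_by(Season) %>%
  mutate(after_bye = as.numeric(Date - lag(Date)) > 8,   # crude bye detection
         Margin = if_else(Home.team == "Geelong",
                          Home.score - Away.score,
                          Away.score - Home.score)) %>%
  ungroup()

# win/loss record in the game immediately following a bye
geelong %>%
  filter(after_bye) %>%
  count(result = if_else(Margin > 0, "win", "loss or draw"))
```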
The commute to my workplace is 90 minutes each way. Podcasts are my friend. I’m a long-time listener of In Our Time and enjoyed the recent episode about The Danelaw.
Melvyn and I hail from the same part of the world, and I learned as a child that many of the local place names there were derived from Old Norse or Danish. Notably: places ending in -by denote a farmstead, settlement or village; those ending in -thwaite mean a clearing or meadow.
So how local are those names? Time for some quick and dirty maps using R.
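To give an idea of “quick and dirty”, something like the sketch below, where places is a hypothetical gazetteer data frame with name, lon and lat columns (a GeoNames extract would do):

```r
library(dplyr)
library(ggplot2)

# flag place names with the two Old Norse suffixes; everything else drops out
norse <- places %>%
  mutate(suffix = case_when(
    grepl("by$", name, ignore.case = TRUE)      ~ "-by",
    grepl("thwaite$", name, ignore.case = TRUE) ~ "-thwaite"
  )) %>%
  filter(!is.na(suffix))

# a quick and dirty map: one point per place, coloured by suffix
ggplot(norse, aes(lon, lat, colour = suffix)) +
  geom_point(alpha = 0.6) +
  coord_quickmap()
```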
For better or worse I spend some time each day at Stack Overflow [r], reading and answering questions. If you do the same, you probably notice certain features in questions that recur frequently. It’s as though everyone is copying from one source – perhaps the one at the top of the search results. And it seems highest-ranked is not always best.
Nowhere is this more apparent to me than in the way many users create data frames. So here is my introductory guide “how not to create data frames”, aimed at beginners writing their first questions.
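As a taste of one such pattern (a made-up example, not from any particular question): building a data frame by rbind-ing rows goes through a matrix, which coerces every column to character, whereas supplying one vector per column preserves types.

```r
# the anti-pattern: rbind() creates a character matrix first
df_bad <- data.frame(rbind(c("a", 1), c("b", 2)))
str(df_bad)   # both columns are character (or factor, in older R)

# the simple way: one vector per column, types preserved
df_good <- data.frame(x = c("a", "b"), y = c(1, 2))
str(df_good)  # x is character, y is numeric
```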
The recent ABC News article “Australia’s pollution mapped by postcode reveals nation’s dirty truth” is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps.
So here’s how I got them, using R.
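In outline, the mapping step looks something like this sketch; poa_shapes (an sf object of postal-area polygons with a POA_CODE column) and pollution (the extracted table with postcode and total_emissions columns) are placeholder names:

```r
library(dplyr)
library(ggplot2)
library(sf)

# join the pollution table to the postal-area polygons, then fill by value
poa_shapes %>%
  left_join(pollution, by = c("POA_CODE" = "postcode")) %>%
  ggplot() +
  geom_sf(aes(fill = total_emissions), colour = NA) +
  scale_fill_viridis_c(na.value = "grey90", name = "Total emissions")
```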
“Sydney’s congestion at ‘tipping point’” blares the headline and, to illustrate, an interactive chart with bars for city population densities, points for commute times and, of course, dual axes.
Yuck. OK, I guess it does show that Sydney is one of three cities that are low-density but have average commute times comparable to those of higher-density cities. But if you’re plotting commute time versus population density… doesn’t a different kind of chart come to mind first? y versus x. C’mon.
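Something like this, say, assuming a data frame cities with city, density and commute columns (the names are mine, for illustration):

```r
library(ggplot2)

# the obvious chart: commute time (y) versus population density (x)
ggplot(cities, aes(x = density, y = commute)) +
  geom_point() +
  geom_text(aes(label = city), vjust = -0.8, size = 3) +
  labs(x = "Population density (people per square km)",
       y = "Average commute time (minutes)")
```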
I love it when researchers take the time to share their knowledge of the computational tools that they use. So first, let me point you at Environmental Computing, a site run by environmental scientists at the University of New South Wales, which has a good selection of R programming tutorials.
One of these is Making maps of your study sites. It was written with the specific purpose of generating simple, clean figures for publications and presentations, which it achieves very nicely.
I’ll be honest: the sole motivator for this post is that I thought it would be fun to generate the map using Leaflet for R as an alternative. You might use Leaflet if you want:
- An interactive map that you can drag, zoom and click for popup information
- A “fancier” static map with geographical features of interest
- Concise, clean code that uses pipes and doesn’t require you to process shapefiles
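Here’s a minimal sketch of that last point, with made-up site coordinates; leaflet(), addTiles() and addMarkers() are the real Leaflet for R calls, everything else is placeholder:

```r
library(leaflet)

# hypothetical study sites; swap in your own coordinates
sites <- data.frame(
  site = c("Site A", "Site B"),
  lat  = c(-33.92, -33.98),
  lng  = c(151.23, 151.25)
)

leaflet(sites) %>%
  addTiles() %>%                          # default OpenStreetMap basemap
  addMarkers(~lng, ~lat, popup = ~site)   # click a marker for its name
```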
The code that generated the report (which I’ve used heavily and written about before) is on GitHub too. A few changes were required compared with previous reports, due to changes in the rtweet package and a weird issue with kable tables breaking Markdown headers.
I love that the most popular media attachment is a screenshot of a GitHub repo.