Using R to detect the pressure wave from the 2022 Hunga Tonga eruption in personal weather station data

It seems like an age ago, but in fact it was only mid-January 2022 when this happened:

Wow. Now, pause for a moment and try to recall the last time you read any news about Tonga since the event.
The eruption sent an atmospheric pressure wave, clearly visible in this imagery, around the world. Friends online reported that their personal weather stations (PWS) had detected it, which made me wonder: was the wave apparent in online weather station data, and can it be visualized using R?

The answers are yes and yes again.
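As a rough sketch of the idea (not the method used in the full post), a short-lived pressure spike can be separated from slow background drift by subtracting a running median. The series below is simulated, and the names `pressure`, `trend` and `residual` are made up for illustration.

```r
# Simulated pressure readings (hPa): slow linear drift plus a brief bump.
# These values are invented; they are not real weather station data.
n <- 145
pressure <- 1013 + 0.01 * seq_len(n)
pressure[70:73] <- pressure[70:73] + c(1, 2, 2, 1)

# A running median follows the slow drift but largely ignores the short bump
trend <- runmed(pressure, k = 25)

# What remains after removing the trend highlights the transient
residual <- pressure - trend
peak <- which.max(abs(residual))
peak
```

With real station data the window width `k` would need tuning to the sampling interval.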


Using R/fitzRoy to ask: how many times has a V/AFL team with the same lineup played together?

If you sit in the intersection of “likes Australian Rules football / finds sport statistics interesting / is on Twitter”, you’ve probably come across Swamp. One of his recent tweets tells us that:

You may go on to ask: has any team lineup from one of the almost 16 000 recorded games played together again in another game? And if so, how often?

The answer to that question is at once surprising, less surprising when you think about it, and quite easy to figure out using the ever-helpful fitzRoy package.
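One way to approach it, sketched here on toy data rather than the real fitzRoy output: collapse each team's players in a game into a single sorted key, then count how often each key occurs. The `Game` column and all values below are hypothetical; `ID` and `Playing.for` mirror the column names used elsewhere on this blog.

```r
library(dplyr)

# Toy data, one row per player per game. Values are made up for illustration.
players <- data.frame(
  Game        = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Playing.for = "Richmond",
  ID          = c(10, 11, 12, 12, 11, 10, 10, 11, 13)
)

lineups <- players %>%
  group_by(Game, Playing.for) %>%
  # sort the IDs so the same set of players always gives the same key
  summarise(Lineup = paste(sort(ID), collapse = "-"), .groups = "drop")

# lineups that appear in more than one game
repeats <- lineups %>%
  count(Playing.for, Lineup, name = "Games") %>%
  filter(Games > 1)

repeats
```

Sorting before pasting matters: without it, the same players listed in a different order would look like a different lineup.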


Debuting in a VFL/AFL Grand Final is rare

When Marlion Pickett runs onto the MCG for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 finals to debut on the big day.

The sole purpose of this blog post is to illustrate how incredibly easy it is to figure this out, thanks to the dplyr and fitzRoy packages.


library(dplyr)
library(fitzRoy)

# player statistics for every game, one row per player per game
afldata <- get_afltables_stats()

afldata %>%
  select(Season, Round, Date, ID, First.name, Surname, Playing.for,
         Home.team, Home.score, Away.team, Away.score) %>%
  group_by(ID) %>%
  arrange(Date) %>%
  # a player's first game
  slice(1) %>%
  ungroup() %>%
  # grand finals only
  filter(Round == "GF") %>%
  # get the winning/losing margin from the debutant's point of view
  mutate(Margin = case_when(Playing.for == Home.team ~ Home.score - Away.score,
                            TRUE ~ Away.score - Home.score)) %>%
  select(-Home.team, -Away.team, -Home.score, -Away.score)

Season  Round  Date        ID    First.name  Surname    Playing.for  Margin
1908    GF     1908-09-26  5573  Harry       Prout      Essendon         -9
1920    GF     1920-10-02  6677  Billy       James      Richmond         17
1923    GF     1923-10-20  6915  George      Rawle      Essendon         17
1926    GF     1926-10-09  3824  Francis     Vine       Melbourne        57
1952    GF     1952-09-27  9361  Keith       Batchelor  Collingwood     -46

How long since your team scored 100+ points? This blog’s first foray into the fitzRoy R package

When this blog moved from bioinformatics to data science I ran a Twitter poll to ask whether I should start afresh at a new site or continue here. “Continue here”, you said.

So let’s test the tolerance of the long-time audience and celebrate the start of the 2019 season as we venture into the world of – Australian football (AFL) statistics!
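To give a flavour of the question in the title, here is a minimal sketch on invented results: keep the games where a team scored 100 or more, find each team's most recent one, and count the days since. The data frame, its column names and the `as_of` date are all assumptions for illustration, not fitzRoy output.

```r
library(dplyr)

# Invented results, one row per team per game
results <- data.frame(
  Team  = c("A", "A", "A", "B", "B"),
  Date  = as.Date(c("2018-04-01", "2018-06-10", "2018-08-25",
                    "2018-05-05", "2018-07-14")),
  Score = c(112, 95, 101, 88, 120)
)

# arbitrary reference date to count from
as_of <- as.Date("2019-03-01")

last_ton <- results %>%
  filter(Score >= 100) %>%
  group_by(Team) %>%
  # most recent game with 100+ points for each team
  summarise(Last100 = max(Date), .groups = "drop") %>%
  mutate(DaysSince = as.integer(as_of - Last100))

last_ton
```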

Just use a scatterplot. Also, Sydney sprawls.

Dual-axes at tipping-point

“Sydney’s congestion at ‘tipping point’” blares the headline and, to illustrate, there’s an interactive chart with bars for city population densities, points for commute times and, of course, dual axes.

Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, but have comparable average commute times to higher-density cities. But if you’re plotting commute time versus population density…doesn’t a different kind of chart come to mind first? y versus x. C’mon.

Let’s explore.
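The basic shape of the alternative is a plain y-versus-x plot. In this sketch the city names and numbers are placeholders, not the figures from the article, and the column names are my own.

```r
library(ggplot2)

# Placeholder values only: NOT the data behind the original chart
cities <- data.frame(
  city    = c("City A", "City B", "City C", "City D"),
  density = c(400, 1200, 2500, 5000),  # people per square km
  commute = c(35, 28, 33, 36)          # average commute, minutes
)

# commute time (y) against population density (x), one labelled point per city
p <- ggplot(cities, aes(x = density, y = commute, label = city)) +
  geom_point() +
  geom_text(vjust = -0.8) +
  labs(x = "Population density (people per sq km)",
       y = "Average commute time (minutes)")

p
```

One shared pair of axes, no second scale to decode.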

Twitter coverage of the Australian Bioinformatics & Computational Biology Society Conference 2017

You know the drill by now. Grab the tweets. Generate the report using RMarkdown. Push to Github. Publish the report.

This time it’s the Australian Bioinformatics & Computational Biology Society Conference 2017, including the COMBINE symposium. Looks like a good time was had by all in Adelaide.

A couple of quirks this time around. First, the rtweet package went through a brief phase of returning lists instead of nice data frames. I hope that’s been discarded as a bad idea :) There also seem to be additional columns, new column names and list-columns in the output from the latest search_tweets(), so there goes my previous code…

Second, given that most Twitter users have had 280 characters since about November 7, 2017, is this reflected in the conference tweets?

With thanks to Andrew Lonsdale for clearing up my confusion and pointing me to Twitter extended mode, the answer is “yes, somewhat”. Plenty of tweets are still hitting the 140 limit though: time to update those clients?
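A check like this can be sketched with base R alone. The data frame below is a stand-in for the tweets; the `text` column name is an assumption, and the actual column returned by search_tweets() may differ between rtweet versions.

```r
# Stand-in tweets: one at exactly 140 characters, one longer, one short
tweets <- data.frame(
  text = c(strrep("a", 140), strrep("b", 200), "short tweet"),
  stringsAsFactors = FALSE
)

tweets$chars <- nchar(tweets$text)

# bucket tweet lengths around the old 140-character limit
tweets$bucket <- cut(tweets$chars,
                     breaks = c(0, 139, 140, 280),
                     labels = c("under 140", "exactly 140", "141-280"))

table(tweets$bucket)
```

A pile-up in the "exactly 140" bucket would suggest clients still truncating at the old limit.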

Mapping data using R and leaflet

The R language provides many different tools for creating maps and adding data to them. I’ve been using the leaflet package at work recently, so I thought I’d provide a short example here.

Whilst searching for some data that might make a nice map, I came across this article at ABC News. It includes a table containing Australian members of parliament, their electorate and their voting intention regarding legalisation of same-sex marriage. Since I reside in New South Wales, let’s map the data for electorates in that state.
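For readers new to leaflet, the smallest useful map looks something like the sketch below. The single marker on central Sydney (approximate coordinates) is a placeholder; it is not the electorate data from the article, which would need boundary polygons rather than a point.

```r
library(leaflet)

# Placeholder map: base tiles, a view centred near Sydney, one marker
m <- leaflet() %>%
  addTiles() %>%
  setView(lng = 151.21, lat = -33.87, zoom = 9) %>%
  addMarkers(lng = 151.21, lat = -33.87, popup = "Sydney")

m
```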


Feels like a dry winter – but what does the data say?

Update Feb 9 2020: Weather Underground retired their free API in 2018, so the code in this post no longer works.

A reminder that when idle queries pop into your head, the answer can often be found using R + online data. And a brief excursion into accessing the Weather Underground.

One interesting aspect of Australian life, even in coastal urban areas like Sydney, is that sometimes it just stops raining. For weeks or months at a time. The realisation hits slowly: at some point you look around at the yellow-brown lawns, ovals and “nature strips” and say “gee, I don’t remember the last time it rained.”

Thankfully in our data-rich world, it’s relatively easy to find out whether the dry spell is really as long as it feels. In Australia, meteorological data is readily available via the Bureau of Meteorology (known as BoM). Another source is the Weather Underground (WU), which has the benefit that there may be data from a personal weather station much closer to you than the BoM stations.

Here’s how you can access WU data using R and see whether your fuzzy recollection is matched by reality.
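Since the WU API is gone, here is a data-source-agnostic sketch of the core calculation: given daily rainfall from any provider, run-length encoding finds the longest dry spell. The dates, values and column names below are invented for illustration.

```r
# Invented daily rainfall (mm); swap in data from whatever source is available
rain <- data.frame(
  date    = seq(as.Date("2017-07-01"), by = "day", length.out = 10),
  rain_mm = c(0, 2.4, 0, 0, 0, 0, 1.0, 0, 0, 0)
)

# run-length encode the dry/wet indicator, then keep only the dry runs
runs <- rle(rain$rain_mm == 0)
longest_dry <- max(runs$lengths[runs$values])

# days elapsed since the most recent day with any rain
days_since_rain <- nrow(rain) - max(which(rain$rain_mm > 0))

longest_dry
days_since_rain
```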