When Marlion Pickett runs onto the M.C.G for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 finals to debut on the big day.
The sole purpose of this blog post is to illustrate how incredibly easy it is to figure this out, thanks to the dplyr and fitzRoy packages.
afldata <- get_afltables_stats()
select(Season, Round, Date, ID, First.name, Surname, Playing.for,
Home.team, Home.score, Away.team, Away.score) %>%
# a player's first game
# grand finals only
filter(Round == "GF") %>%
# get the winning/losing margin
mutate(Margin = case_when(Playing.for == Home.team ~ Home.score - Away.score,
TRUE ~ Away.score - Home.score)) %>%
select(-Home.team, -Away.team, -Home.score, -Away.score)
This week we return to Australian Rules Football, the R package fitzRoy and some statistics to ask – why can’t Geelong win after a bye?
(with apologies to long-time readers who used to come for the science)
When this blog moved from bioinformatics to data science I ran a Twitter poll to ask whether I should start afresh at a new site or continue here. “Continue here”, you said.
So let’s test the tolerance of the long-time audience and celebrate the start of the 2019 season as we venture into the world of – Australian football (AFL) statistics!
The recent ABC News article Australia’s pollution mapped by postcode reveals nation’s dirty truth is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps.
So here’s how I got them, using R.
Sydney’s congestion at ‘tipping point’
Dual-axes at tipping-point
blares the headline and to illustrate, an interactive chart with bars for city population densities, points for commute times and of course, dual-axes.
Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, but have comparable average commute times to higher-density cities. But if you’re plotting commute time versus population density…doesn’t a different kind of chart come to mind first? y versus x. C’mon.
You know the drill by now. Grab the tweets. Generate the report using RMarkdown. Push to Github. Publish the report.
This time it’s the Australian Bioinformatics & Computational Biology Society Conference 2017, including the COMBINE symposium. Looks like a good time was had by all in Adelaide.
A couple of quirks this time around. First, the rtweet package went through a brief phase of returning lists instead of nice data frames. I hope that’s been discarded as a bad idea :) There also seem to be additional columns, new column names and list-columns in the output from the latest
search_tweets(), so there goes my previous code…
Second, given that most Twitter users have had 280 characters since about November 7, is this reflected in the conference tweets?
With thanks to Andrew Lonsdale for clearing up my confusion and pointing me to Twitter extended mode, the answer is “yes, somewhat”. Plenty of tweets are still hitting the 140 limit though: time to update those clients?
The R language provides many different tools for creating maps and adding data to them. I’ve been using the leaflet package at work recently, so I thought I’d provide a short example here.
Whilst searching for some data that might make a nice map, I came across this article at ABC News. It includes a table containing Australian members of parliament, their electorate and their voting intention regarding legalisation of same-sex marriage. Since I reside in New South Wales, let’s map the data for electorates in that state.
Update Feb 9 2020: Weather Underground retired their free API in 2018 so the code in this post no longer works
A reminder that when idle queries pop into your head, the answer can often be found using R + online data. And a brief excursion into accessing the Weather Underground.
One interesting aspect of Australian life, even in coastal urban areas like Sydney, is that sometimes it just stops raining. For weeks or months at a time. The realisation hits slowly: at some point you look around at the yellow-brown lawns, ovals and “nature strips” and say “gee, I don’t remember the last time it rained.”
Thankfully in our data-rich world, it’s relatively easy to find out whether the dry spell is really as long as it feels. In Australia, meteorological data is readily available via the Bureau of Meteorology (known as BoM). Another source is the Weather Underground (WU), which has the benefit that there may be data from a personal weather station much closer to you than the BoM stations.
Here’s how you can access WU data using R and see whether your fuzzy recollection is matched by reality.
Off to Melbourne tomorrow for perhaps my favourite annual work event: the Bioinformatics FOAM (Focus on Analytical Methods) meeting, organised by CSIRO.
Unfortunately, but for good reasons, it’s an internal event this year, but I’m putting my presentations online. I’ll be speaking twice; the first for Thursday is called “Online bioinformatics forums: why do we keep asking the same questions?” It’s an informal, subjective survey of the questions that come up again and again at bioinformatics Q&A forums such as Biostars and my attempt to understand why this is the case. Of course one simple answer might be selection bias – we don’t observe the users who came, found that their question already had an answer and so did not ask it again. I’ll also try to articulate my concern that many people view bioinformatics as a collection of recipe-style solutions to specific tasks, rather than a philosophy of how to do biological data analysis.
My second talk on Friday is called “Should I be dead? a very personal genomics.” It’s a more practical talk, outlining how I converted my own 23andMe raw data to VCF format, for use with the Ensembl Variant Effect Predictor. The question for the end – which I’ve left open – is this: as personal genomics becomes commonplace, we’re going to need simple but effective reporting tools that patients and their clinicians can use. What are those tools going to look like?
Looking forward to spending some time in Melbourne and hopefully catching up with this awesome lady.
In 2015, I’d like to write, think and do more about things that I care about. One of those things happens to be the koala. Now, this being a blog about bioinformatics and computational biology, I can’t just start writing about any old thing that takes my fancy…I guess. So in this post I’m going to stretch the definition to include ecological informatics and tell you the story of how I achieved a long-held ambition using one of my favourite online resources, The Atlas of Living Australia. And then we’ll wrap up with a quick survey of the (sorry) state of marsupial genomics.