How I resurrected my ancient PhD thesis using R/bookdown (and some other tools)

An ancient thesis

I’ve long admired the look of publications generated using the R bookdown package, and thought it would be fun and educational to publish one myself. The problem is that I am not writing a book and have no plans to do so any time soon.

Then I remembered that I’ve already written a book. There it is on the right. It’s called “Cloning, sequence analysis and studies on the expression of the nirS gene, encoding cytochrome cd1 nitrite reductase, from Thiosphaera pantotropha”. Catchy title, hey. It’s from my former life, as a biochemistry graduate turned reluctant molecular microbiologist. I believe there are 3 printed copies in existence: mine, one for the lab and one deposited in the university library.

That’s simple enough then, Neil, you say: you just grab your digital files, copy/paste into R Markdown files, do a bit of editing and you’re set. Here’s the thing.

There are no digital files.

There were, once. A collection of documents: Word, PowerPoint and JPEGs. I think they lived on a 100 MB Zip drive for a while. At some point they were burned onto a CD. And at some other point, that CD became corrupted. And that was that. Like many (most?) people, I’d barely looked at the thesis since depositing a copy in the library anyway. It didn’t seem to matter much.

And then I grew older, and started looking at some of the documents in our family, and realising that in the event of accident or disaster, they’d be lost forever. So I started working on ways to digitally archive some of them. At some point my thoughts turned to that thesis, which took 4 years of my life. I wondered whether the university library had digitised it and if so, whether it might be available online. So far as I can tell, the answer is no. That seemed a shame.

So here, briefly, is the story of how I used R/bookdown and some other tools to resurrect that thesis.


When your tools are broken, just change the data

Update August 7 2020
The gene symbol renaming is now official. Here’s the publication (not open access, should be), coverage at The Verge and more coverage at The Register. The latter with quotes from me.

It’s been 3 years since we last visited that old favourite recurring topic, data corruption by Excel. Specifically, the unwanted auto-conversion of identifiers that look like dates, e.g. SEPT1, to – well, dates.

Here’s a new twist – well, a two-year-old twist in fact, as I don’t keep up to date with this field any longer:

Yes, in 2017 the HGNC decided that the solution to this long-standing issue is to rename the offending genes to prevent the auto-conversion. I’m yet to determine whether anything more came of the proposal.

It is, I suppose, a practical suggestion that will work. The newsletter states that:

Our initial consultation with the research community publishing on these genes had very mixed results

I bet it did. However, given that ongoing consultation with the research community about the inappropriate use of software has had essentially no results in 15+ years, perhaps it is the most effective solution to the problem.
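For anyone wanting to check their own gene lists, here is a minimal sketch of how the at-risk identifiers might be flagged in R. The helper `looks_like_date()` is hypothetical (it is not from any package), and its pattern is a simplifying assumption: a month name or abbreviation followed by one or two digits. Excel's real conversion rules are broader than this, so treat it as an illustration only. SEPTIN1 is included as an example of a renamed-style symbol that no longer matches.

```r
# Hypothetical helper, not part of any package: flag symbols made up of a
# month name/abbreviation plus a day-like number (e.g. SEPT1).
# The month list and the 1-2 digit rule are assumptions for illustration.
looks_like_date <- function(symbols) {
  months <- c("JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN", "JUL",
              "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC")
  pattern <- paste0("^(", paste(months, collapse = "|"), ")[0-9]{1,2}$")
  grepl(pattern, toupper(symbols))
}

symbols <- c("SEPT1", "MARCH1", "DEC1", "TP53", "BRCA1", "SEPTIN1")

# One row per symbol, with a logical column marking the date-like ones
at_risk <- data.frame(symbol = symbols, at_risk = looks_like_date(symbols))
print(at_risk)
```

A check like this could run before a gene list is handed to anyone who might open it in a spreadsheet.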

Debuting in a VFL/AFL Grand Final is rare

When Marlion Pickett runs onto the MCG for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 finals to debut on the big day.

The sole purpose of this blog post is to illustrate how incredibly easy it is to figure this out, thanks to the dplyr and fitzRoy packages.

library(dplyr)
library(fitzRoy)

afldata <- get_afltables_stats()

afldata %>% 
  select(Season, Round, Date, ID, First.name, Surname, Playing.for, 
         Home.team, Home.score, Away.team, Away.score) %>% 
  group_by(ID) %>% 
  arrange(Date) %>%
  # a player's first game 
  slice(1) %>% 
  ungroup() %>% 
  # grand finals only
  filter(Round == "GF") %>%
  # get the winning/losing margin 
  mutate(Margin = case_when(Playing.for == Home.team ~ Home.score - Away.score,
                            TRUE ~ Away.score - Home.score)) %>% 
  select(-Home.team, -Away.team, -Home.score, -Away.score)

Season  Round  Date        ID    First.name  Surname    Playing.for  Margin
1908    GF     1908-09-26  5573  Harry       Prout      Essendon     -9
1920    GF     1920-10-02  6677  Billy       James      Richmond     17
1923    GF     1923-10-20  6915  George      Rawle      Essendon     17
1926    GF     1926-10-09  3824  Francis     Vine       Melbourne    57
1952    GF     1952-09-27  9361  Keith       Batchelor  Collingwood  -46

Extracting Sydney transport data from Twitter

The @sydstats Twitter account uses this code base and data from the Transport for NSW Open Data API to publish insights into delays on the Sydney Trains network.

Each tweet takes one of two forms and is consistently formatted, making it easy to parse and extract information. Here are a couple of examples with the interesting parts highlighted in bold:

Between 16:00 and 18:30 today, 26% of trips experienced delays. #sydneytrains

The worst delay was 16 minutes, on the 18:16 City to Berowra via Gordon service. #sydneytrains
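Because the two forms are so regular, a pair of regular expressions is enough to pull out the fields. The following is a minimal sketch in base R, not the code from the repository: the patterns are written against the two example tweets above, the helper `extract_groups()` is hypothetical, and allowing a singular "minute" is an assumption about how a one-minute delay might be worded.

```r
tweets <- c(
  "Between 16:00 and 18:30 today, 26% of trips experienced delays. #sydneytrains",
  "The worst delay was 16 minutes, on the 18:16 City to Berowra via Gordon service. #sydneytrains"
)

# First form: start time, end time, percentage of delayed trips
summary_pattern <- "^Between ([0-9]{2}:[0-9]{2}) and ([0-9]{2}:[0-9]{2}) today, ([0-9]+)% of trips"
# Second form: delay in minutes, departure time, service name
# ("minutes?" also accepts the singular, which is an assumption)
worst_pattern <- "^The worst delay was ([0-9]+) minutes?, on the ([0-9]{2}:[0-9]{2}) (.+) service"

# Hypothetical helper: return the capture groups for one tweet,
# or NULL when the pattern does not match
extract_groups <- function(text, pattern) {
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) == 0) return(NULL)
  m[-1]  # drop the full match, keep only the groups
}

s <- extract_groups(tweets[1], summary_pattern)
w <- extract_groups(tweets[2], worst_pattern)

summary_fields <- list(from = s[1], to = s[2], pct_delayed = as.numeric(s[3]))
worst_fields <- list(delay_minutes = as.numeric(w[1]), departs = w[2], service = w[3])

str(summary_fields)
str(worst_fields)
```

Trying both patterns on every tweet and keeping whichever matches would sort a timeline into the two forms.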


I’ve created a GitHub repository with code and a report showing some ways in which this data can be explored.

The take-home message: expect delays somewhere most days but in particular on Monday mornings, when students return to school after the holidays, and if you’re travelling in the far south-west or north-west of the network.

Is your phone giving you horns?

No.

Why would you even ask that? Well, because this.

I sense problems immediately. First, the story is tagged “evolution”. The horns are not arising through inheritance of advantageous mutations, so that isn’t evolution.

Second:

Yes, last time I checked, horns were external and pointed upwards. The X-ray seems to show an internal, downward-pointing bone growth.

But wait, there’s more.

Mapping the Vikings using R

The commute to my workplace is 90 minutes each way. Podcasts are my friend. I’m a long-time listener of In Our Time and enjoyed the recent episode about The Danelaw.

Melvyn and I hail from the same part of the world, and I learned as a child that many of the local place names there were derived from Old Norse or Danish. Notably: places ending in -by denote a farmstead, settlement or village; those ending in -thwaite mean a clearing or meadow.

So how local are those names? Time for some quick and dirty maps using R.
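Before any mapping, the suffix rules described above can be sketched as a simple pattern match. This is only an illustration: the place names are a handful I have picked as examples rather than the data used for the maps, and the helper `norse_suffix()` is hypothetical. A plain suffix match is also crude, since it will happily label any name that ends in "by" whatever its origin.

```r
# Illustrative place names only; not the dataset used for the maps
places <- c("Whitby", "Grimsby", "Selby", "Braithwaite", "Satterthwaite",
            "York", "Leeds")

# Hypothetical helper: label a name by its suffix, or NA if neither applies
norse_suffix <- function(name) {
  ifelse(grepl("by$", name, ignore.case = TRUE), "-by",
         ifelse(grepl("thwaite$", name, ignore.case = TRUE), "-thwaite",
                NA_character_))
}

# One row per place, with the matching suffix where there is one
suffixes <- data.frame(place = places, suffix = norse_suffix(places))
print(suffixes)
```

With coordinates attached to each name, a table like this could feed straight into a plotting step.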