How I resurrected my ancient PhD thesis using R/bookdown (and some other tools)

An ancient thesis

I’ve long admired the look of publications generated using the R bookdown package, and thought it would be fun and educational to publish one myself. The problem is that I am not writing a book and have no plans to do so any time soon.

Then I remembered that I’ve already written a book. There it is on the right. It’s called “Cloning, sequence analysis and studies on the expression of the nirS gene, encoding cytochrome cd1 nitrite reductase, from Thiosphaera pantotropha”. Catchy title, hey. It’s from my former life, as a biochemistry graduate turned reluctant molecular microbiologist. I believe there are 3 printed copies in existence: mine, one for the lab and one deposited in the university library.

That’s simple enough then Neil, you say, you just grab your digital files, copy/paste into RMarkdown files, do a bit of editing and you’re set. Here’s the thing.

There are no digital files.

There were, once. A collection of documents: Word, PowerPoint and JPEGs. I think they lived on a 100 MB Zip drive for a while. At some point they were burned onto a CD. And at some other point, that CD became corrupted. And that was that. Like many (most?) people, I’d barely looked at the thesis since depositing a copy in the library anyway. It didn’t seem to matter much.

And then I grew older, and started looking at some of the documents in our family, and realising that in the event of accident or disaster, they’d be lost forever. So I started working on ways to digitally archive some of them. At some point my thoughts turned to that thesis, which took 4 years of my life. I wondered whether the university library had digitised it and if so, whether it might be available online. So far as I can tell, the answer is no. That seemed a shame.

So here, briefly, is the story of how I used R/bookdown and some other tools to resurrect that thesis.

Taking steps (in XML)

So the votes are in:

I thank you, kind readers. So here’s the plan: (1) keep blogging here as frequently as possible (perhaps monthly), (2) on more general “how to do cool stuff with data and R” topics, (3) which may still include biology from time to time. Sounds OK? Good.

So: let’s use R to analyse data from the iOS Health app.
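
As a rough idea of what that involves, here is a minimal sketch in Python rather than R. It assumes the Health app export is an XML file containing `Record` elements with `type`, `startDate` and `value` attributes, and that step counts carry the type `HKQuantityTypeIdentifierStepCount`. The sample records are made up for illustration, and `daily_steps` is a hypothetical helper name, not something from the original post.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample in the assumed shape of a Health app export file.
HEALTH_SAMPLE = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2016-05-01 08:00:00 +1000" value="120"/>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2016-05-01 18:30:00 +1000" value="200"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" startDate="2016-05-01 18:31:00 +1000" value="62"/>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2016-05-02 07:15:00 +1000" value="50"/>
</HealthData>"""

STEP_TYPE = "HKQuantityTypeIdentifierStepCount"  # assumed record type for steps


def daily_steps(source):
    """Sum step-count records per calendar day (hypothetical helper).

    `source` is a filename or a binary file-like object.
    """
    totals = defaultdict(int)
    # iterparse streams the file, so a large export need not fit in memory.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record" and elem.get("type") == STEP_TYPE:
            day = elem.get("startDate")[:10]  # keep only YYYY-MM-DD
            totals[day] += int(float(elem.get("value")))
        elem.clear()  # release the element once it has been read
    return dict(totals)


print(daily_steps(io.BytesIO(HEALTH_SAMPLE.encode("utf-8"))))
# {'2016-05-01': 320, '2016-05-02': 50}
```

For a real export, pass the path to the exported XML file instead of the in-memory sample.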

Hiatus, indefinite

May. No blog posts yet in 2016. “What’s going on Neil?” asked no-one at all. For anyone who may be wondering…

Last November, I resigned from my position with my previous employer after almost 7 years. Just before Christmas, I was offered a position as a data scientist with a Sydney-based healthcare technology start-up. I started working there in early January and so far, it has been a terrific experience. Had I known how enjoyable it could be, I would have made a move like this 10 years ago. Career advice: there are many more jobs that can engage scientists and utilise their skills than academic research.

So what does that mean for this blog? It means that I’m no longer a researcher, at least in the narrow sense that science would use that word. It means that the things I learn during a working day are unlikely to translate into blog posts of broader interest (confidentiality issues notwithstanding). And quite frankly, given where I’m at in my life (balancing working for a startup with raising my family), it means that I no longer have time to write regular blog posts.

Like a band that never officially breaks up, I’m not ready to declare the end just yet. So I’m placing the blog “on hiatus”, indefinitely. I’ll still be active online, which right now mostly means Twitter.

Better living through informatics: in search of koalas

In 2015, I’d like to write, think and do more about things that I care about. One of those things happens to be the koala. Now, this being a blog about bioinformatics and computational biology, I can’t just start writing about any old thing that takes my fancy…I guess. So in this post I’m going to stretch the definition to include ecological informatics and tell you the story of how I achieved a long-held ambition using one of my favourite online resources, The Atlas of Living Australia. And then we’ll wrap up with a quick survey of the (sorry) state of marsupial genomics.

This blog in 2013

In something of an end-of-year tradition, WordPress provides users with an effort-free blog post in the form of an annual report. Here is mine.

My ambitious plan at the start of 2013 was to aim for 4 posts a month. I managed 28 and I’m happy with that; about one every two weeks.

Looking forward to a new year of blogging. All the best to you and yours for 2014.

Using the Ensembl Variant Effect Predictor with your 23andme data

I subscribe to the Ensembl blog and found, in my feed reader this morning, a post which linked to the Variant Effect Predictor (VEP). The original blog post, strangely, has disappeared.

Not to worry. The VEP takes genotyping data in one of several formats, compares it with the Ensembl variation + core databases and returns a summary of how the variants affect transcripts and regulatory regions. My first thought: can I apply this to my own 23andme data?
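
One way to get started, sketched here in Python under a few assumptions: the 23andMe raw download is a tab-separated text file with `#` comment lines and four columns (rsid, chromosome, position, genotype), and a plain list of variant identifiers, one per line, is among the input formats the VEP accepts. The sample rows and the helper name `called_rsids` are made up for illustration.

```python
import io

# Made-up rows in the assumed layout of a 23andMe raw data file.
RAW_SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAG\n"
    "i3001920\tMT\t16470\tG\n"      # internal ID, not an rsID
    "rs12124819\t1\t776546\t--\n"   # no-call genotype
)


def called_rsids(handle):
    """Return rsIDs that have a genotype call (hypothetical helper)."""
    ids = []
    for line in handle:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        rsid, _chrom, _pos, genotype = line.rstrip("\n").split("\t")
        # Keep only dbSNP-style identifiers with a called genotype.
        if rsid.startswith("rs") and "-" not in genotype:
            ids.append(rsid)
    return ids


print("\n".join(called_rsids(io.StringIO(RAW_SAMPLE))))
```

Note that an identifier list carries no genotype information; it only asks what is known about each variant, not which alleles you hold.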

23 and me – yes, me – part 2

Sample journey and arrival

Spitting across the Pacific

My tube of spit arrived at the lab on May 19. Six days door-to-door via Guangzhou, Anchorage and Memphis to LA.

23andMe raw data menu

On arrival, a confirmatory email: “The spit sample you recently submitted to 23andMe for the person listed above has been received by the laboratory and is now pending analysis; the process usually takes 6-8 weeks. You will receive another email notification from us as soon as the data for this sample are ready to be accessed through your 23andMe account.”

In the meantime, there’s plenty to explore at the 23andMe website. Anyone can create a demo account, which allows you to explore anonymous sample data to get a feel for what you’ll see when your own sample is processed. Naturally, I’m most excited by the options to browse and download raw data. You can also participate in around 20 health and genetics surveys, which are a good way to kill time, although not many of them provide instant personal gratification.

Next update – some time in July.