Visualising Twitter coverage of recent bioinformatics conferences

Back in February, I wrote some R code to analyse tweets covering the 2017 Lorne Genome conference. It worked pretty well. So I reused the code for two recent bioinformatics meetings held in Sydney: the Sydney Bioinformatics Research Symposium and the VIZBI 2017 meeting.

So without further ado, here are the reports in markdown format, which display quite nicely when pushed to GitHub:

and you can dig around in the repository for the Rmarkdown, HTML and image files, if you like.

Update: also available as published reports at RPubs:

An update to the nhmrcData R package

Just pushed an updated version of my nhmrcData R package to GitHub. A quick summary of the changes:

  • In response to feedback, added the packages required for vignette building as dependencies (Imports) – commit
  • Added 8 new datasets with funding outcomes by gender for 2003 – 2013, created from a spreadsheet that I missed first time around – commit and see the README

Vignette is not yet updated with new examples.

So now you can generate even more depressing charts of funding rates for even more years.

Enjoy and, as ever, let me know if there are any issues.

update: just found a bunch of issues myself :) which are now hopefully fixed

The nhmrcData package: NHMRC funding outcomes data made tidy

Do you like R? Information about Australian biomedical research funding outcomes? Tidy data? If the answers to those questions are “yes”, then you may also like nhmrcData, a collection of datasets derived from funding statistics provided by the Australian National Health & Medical Research Council. It’s also my first R package (more correctly, R data package).

Read on for the details.

HTML vignettes crashing your RStudio? This may be the reason

Short version: if RStudio on Windows 7 crashes when viewing vignettes in HTML format, it may be because those packages specify knitr::rmarkdown as the vignette engine, instead of knitr::knitr and you’re using rmarkdown v1.

Longer version with details – read on.

update: looks like this issue relates to the installed version of rmarkdown (1.3 in my case) – see here for details.


Twitter Coverage of the Lorne Genome Conference 2017

Things to know about Lorne in the state of Victoria, Australia.

  • It’s situated on the Great Ocean Road, a major visitor attraction and a great way to see the scenic coastline of the region
  • It’s home to a number of life science conferences including Lorne Genome 2017

This week’s project then: use R to analyse coverage of the 2017 meeting on Twitter. I last did something similar for the ISMB meeting in 2012. How things have changed. Back then I prepared PDF reports using Sweave, retrieved tweets using the twitteR package and struggled with dates and times when plotting timelines. This time around I wrote RMarkdown in RStudio, tried out the newer rtweet package and, thanks to packages such as dplyr and lubridate, the data munging is all so much cleaner and simpler.

So without further ado here are:

The presentation examines several aspects of the conference coverage under the broad headings of timeline, users, networks, retweets, favourites, quotes, media and text. Make sure to click on the title page, then you can navigate using your arrow keys. The latest version will always be at GitHub; you can simply download it and open it in a browser.

Nice graphic? Are they taking the p…

Yes, it started with a tweet:

By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are 3D pie charts.

Can it be fixed?

The real meaning of spurious correlations

Like many data nerds, I’m a big fan of Tyler Vigen’s Spurious Correlations, a humorous illustration of the old adage “correlation does not equal causation”. Technically, I suppose it should be called “spurious interpretations” since the correlations themselves are quite real, but then good marketing is everything.

There is, however, a more formal definition of the term spurious correlation or more specifically, as the excellent Wikipedia page is now titled, spurious correlation of ratios. It describes the following situation:

  1. You take a bunch of measurements X1, X2, X3…
  2. And a second bunch of measurements Y1, Y2, Y3…
  3. There’s no correlation between them
  4. Now divide both of them by a third set of measurements Z1, Z2, Z3…
  5. Guess what? Now there is correlation between the ratios X/Z and Y/Z

It’s easy to demonstrate for yourself, using R to create something like the chart in the Wikipedia article.
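The five steps above can also be sketched outside R. The following is a minimal illustration in Python using only the standard library, not the code from the full post. The function names (`pearson`, `ratio_correlation_demo`), the sample size, the seed and the uniform ranges are all assumptions chosen for illustration; the only deliberate choice is that Z varies much more, relative to its mean, than X or Y do.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ratio_correlation_demo(n=5000, seed=1):
    """Return (r_raw, r_ratio) for independent X, Y and a shared divisor Z."""
    rng = random.Random(seed)
    # Assumed distributions: X and Y vary little, Z varies a lot.
    x = [rng.uniform(9, 11) for _ in range(n)]
    y = [rng.uniform(9, 11) for _ in range(n)]
    z = [rng.uniform(5, 15) for _ in range(n)]
    # Steps 1-3: X and Y are drawn independently, so they are uncorrelated.
    r_raw = pearson(x, y)
    # Steps 4-5: dividing both by the same Z induces correlation.
    r_ratio = pearson(
        [xi / zi for xi, zi in zip(x, z)],
        [yi / zi for yi, zi in zip(y, z)],
    )
    return r_raw, r_ratio


if __name__ == "__main__":
    r_raw, r_ratio = ratio_correlation_demo()
    print(f"cor(X, Y)     = {r_raw:.3f}")
    print(f"cor(X/Z, Y/Z) = {r_ratio:.3f}")
```

With these ranges the first value should sit close to zero and the second should be strongly positive, because most of the variation in both ratios comes from the shared divisor Z. Narrowing the range of Z relative to X and Y weakens the effect.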


Taking steps (in XML)

So the votes are in:

I thank you, kind readers. So here’s the plan: (1) keep blogging here as frequently as possible (perhaps monthly), (2) on more general “how to do cool stuff with data and R” topics, (3) which may still include biology from time to time. Sounds OK? Good.

So: let’s use R to analyse data from the iOS Health app.


Evidence for a limit to effective peer review

I missed it first time around but apparently, back in October, Nature published a somewhat-controversial article: Evidence for a limit to human lifespan. It came to my attention in a recent tweet:

The source: a fact-check article from Dutch news organisation NRC titled “Nature article is wrong about 115 year limit on human lifespan”. NRC seem rather interested in this research article. They have published another more recent critique of the work, titled “Statistical problems, but not enough to warrant a rejection” and a discussion of that critique, “Peer review post-mortem: how a flawed aging study was published in Nature”.

Unfortunately, the first NRC article does itself no favours by using non-comparable x-axis scales for its charts and not really explaining very well how the different datasets (IDL and GRG) were used. Data nerds everywhere, then, are wondering whether to repeat the analysis themselves and perhaps fire off a letter to Nature.
