A brief message for anyone who uses my PubMed retractions report: it's no longer available at RPubs; instead, you will find it here at Github. Github Pages hosting is great, once you figure out that docs/ corresponds to your web root :)
Now I really must update the code and try to make it more interesting than a bunch of bar charts.
Today in “blog posts that have spent two years in the draft folder” – “Humans are 50% banana.”
Perhaps you have heard this statement, or one like it. It seems to be widely quoted. As an example, it's hard to go past this article from the UK tabloid The Mirror which, in addition to the banana, also informs us that “the entire internet weighs about the same as one large strawberry”. I don't even know where to begin with that one.
A couple of years ago, whilst between jobs and with time on my hands, I thought I'd go in search of the source of this factoid.
If you still follow my Twitter feed, I pity you: it's been rather boring of late, consisting largely of Github commit messages, many including the words “knit to github document”.
Here's why. RPubs, an early offering from RStudio, has been a great platform for easy and free publishing of HTML documents generated from RMarkdown and written in RStudio. That said, it's always been very basic (e.g. there is no way to organise documents by content or tags). There's been no real development of the platform for several years and, of late, I've noticed that it has become less reliable, with bugs such as one document overwriting another when published from RStudio.
I think it's unlikely that these issues will be addressed, given that RStudio are now focused on RStudio Connect. So I've removed as many documents as I can and rewritten them as Github documents, which render as HTML when pushed to Github and make for attractive reports. Here's an example.
I’ve done my best to update all blog posts here with links to the new reports. If you do come across old broken links to RPubs reports, just remember that the content is probably now at Github.
PubMed Commons, the NCBI’s experiment in comments for PubMed articles, has been discontinued. Thoroughly too, with all traces of it expunged from the NCBI website.
Last time I wrote about the service, I concluded “all it needs now is more active users, more comments per user and a real API.” None of those things happened. Result: “NIH has decided that the low level of participation does not warrant continued investment in the project, particularly given the availability of other commenting venues.”
NLM also write that “all comments are archived on our FTP site.” A CSV file is available at this location. So is it good for anything?
You know the drill by now. Grab the tweets. Generate the report using RMarkdown. Push to Github. Publish the report.
This time it’s the Australian Bioinformatics & Computational Biology Society Conference 2017, including the COMBINE symposium. Looks like a good time was had by all in Adelaide.
A couple of quirks this time around. First, the rtweet package went through a brief phase of returning lists instead of nice data frames. I hope that's been discarded as a bad idea :) There also seem to be additional columns, new column names and list-columns in the output from the latest version of search_tweets(), so there goes my previous code…
Second, given that most Twitter users have had a 280-character limit since about November 7, is this reflected in the conference tweets?
With thanks to Andrew Lonsdale for clearing up my confusion and pointing me to Twitter's extended mode, the answer is “yes, somewhat”. Plenty of tweets are still hitting the 140-character limit, though: time to update those clients?
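The report itself was generated in R. As a language-neutral illustration of the counting step, here is a minimal Python sketch that sorts tweets into “under 140”, “exactly 140” and “over 140” buckets. It assumes you already have the full (extended-mode) text of each tweet as a plain string, and it uses a simple code-point count. That is an approximation: Twitter's own weighted character count (for example, URLs counting as a fixed length) is not modelled.

```python
from collections import Counter

OLD_LIMIT = 140  # the pre-November 2017 limit


def length_bucket(text: str) -> str:
    """Classify a tweet by its length in Unicode code points.

    Approximation only: Twitter's weighted character counting is not modelled.
    """
    n = len(text)
    if n < OLD_LIMIT:
        return "under 140"
    if n == OLD_LIMIT:
        return "exactly 140"
    return "over 140"


def summarise_lengths(tweets):
    """Count how many tweets fall into each length bucket."""
    return Counter(length_bucket(t) for t in tweets)


if __name__ == "__main__":
    # Hypothetical tweets, for illustration only.
    sample = ["short tweet", "x" * 140, "y" * 200]
    print(summarise_lengths(sample))
```

A large “exactly 140” bucket alongside a non-empty “over 140” bucket is the pattern you'd expect if some clients were still truncating at the old limit.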
The R language provides many different tools for creating maps and adding data to them. I’ve been using the leaflet package at work recently, so I thought I’d provide a short example here.
Whilst searching for some data that might make a nice map, I came across this article at ABC News. It includes a table containing Australian members of parliament, their electorate and their voting intention regarding legalisation of same-sex marriage. Since I reside in New South Wales, let’s map the data for electorates in that state.
Sometime in 2009, I began listening to a science podcast titled This Week in Virology, or TWiV for short. I thought it was pretty good and listened regularly up until sometime in 2016, when it seemed that most episodes were approaching two hours in duration. I listen to several podcasts when commuting to/from work, which takes up about 10 hours of my week, so I found it hard to justify two hours for one podcast, no matter how good.
Were the episodes really getting longer over time? Let’s find out using R.
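One simple way to answer “getting longer over time?” is to fit a straight line to episode duration against release date and look at the slope. The post does this in R. Below is a minimal Python sketch of the arithmetic (ordinary least squares), assuming you have already collected (date, duration in minutes) pairs, for example by parsing the podcast feed; that collection step is not shown, and the function name is hypothetical.

```python
from datetime import date


def duration_trend(episodes):
    """Least-squares slope of episode duration against release date.

    `episodes` is an iterable of (datetime.date, minutes) pairs.
    Returns the fitted change in duration, in minutes per year.
    """
    pairs = list(episodes)
    xs = [d.toordinal() for d, _ in pairs]  # dates as day numbers
    ys = [float(m) for _, m in pairs]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two episodes")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all episodes share the same date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx * 365.25  # minutes per day -> minutes per year


if __name__ == "__main__":
    # Hypothetical durations, for illustration only.
    eps = [(date(2010, 1, 1), 60), (date(2011, 1, 1), 75), (date(2012, 1, 1), 95)]
    print(round(duration_trend(eps), 1), "minutes per year")
```

A positive slope says episodes are, on average, getting longer. A straight line is only a first look, though: plotting the durations over time will show whether the change was gradual or a step.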
A reminder that when idle queries pop into your head, the answer can often be found using R and online data, plus a brief excursion into accessing the Weather Underground.
One interesting aspect of Australian life, even in coastal urban areas like Sydney, is that sometimes it just stops raining. For weeks or months at a time. The realisation hits slowly: at some point you look around at the yellow-brown lawns, ovals and “nature strips” and say “gee, I don’t remember the last time it rained.”
Thankfully, in our data-rich world, it's relatively easy to find out whether the dry spell is really as long as it feels. In Australia, meteorological data is readily available via the Bureau of Meteorology (known as BoM). Another source is the Weather Underground (WU), which has the advantage that a personal weather station reporting to it may be much closer to you than the nearest BoM station.
Here’s how you can access WU data using R and see whether your fuzzy recollection is matched by reality.
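Once you have daily rainfall totals, whichever service they come from, “how long was the dry spell?” reduces to finding the longest run of consecutive dry days. The WU access itself is done in R in the post and is not reproduced here. This minimal Python sketch covers only the counting step, under several assumptions. The input is a date-sorted list of (date, rainfall in mm) pairs with at most one record per day. A day counts as dry when rainfall is at or below a threshold (0 mm by default). A gap in the dates is treated as missing data and ends the current run rather than being counted as dry.

```python
from datetime import date, timedelta


def longest_dry_spell(daily, threshold_mm=0.0):
    """Return (length_in_days, start_date, end_date) of the longest dry run.

    `daily` is a list of (datetime.date, rainfall_mm) pairs sorted by date.
    Missing days (gaps in the dates) break a run. Returns (0, None, None)
    if there are no dry days.
    """
    best = (0, None, None)
    run_len, run_start, prev = 0, None, None
    for day, rain in daily:
        contiguous = prev is not None and day - prev == timedelta(days=1)
        if rain <= threshold_mm:
            if run_len and contiguous:
                run_len += 1  # extend the current dry run
            else:
                run_len, run_start = 1, day  # start a new dry run
            if run_len > best[0]:
                best = (run_len, run_start, day)
        else:
            run_len = 0  # a wet day ends the run
        prev = day
    return best


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    start = date(2017, 6, 1)
    rain = [0, 0, 5.0, 0, 0, 0, 1.2, 0]
    readings = [(start + timedelta(days=i), r) for i, r in enumerate(rain)]
    print(longest_dry_spell(readings))
```

Raising `threshold_mm` slightly (say, to treat a trace of a fraction of a millimetre as “no real rain”) is a judgement call. It can change the answer a lot, so it is worth stating whichever value you choose.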
Infographics. I’ve seen good examples. I’ve seen more bad examples. In general, I prefer a good chart to an infographic. That said, there’s a “genre” of infographic that I do think is useful, which I’ll call “if X were 100 Y”. A good example: if the world were 100 people.
That way of showing proportions has been called a waffle chart and, for extra “infographic-i-ness”, the squares can be replaced by icons. You want to do this using R? Of course you do. Here's how.
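Whatever tool draws the chart, an “if X were 100 Y” waffle needs whole squares that add up to exactly 100. Rounding each share independently can leave you with 99 or 101. Here is a minimal Python sketch of one standard fix, the largest-remainder method, together with a crude text rendering of the grid. The function names and symbols are hypothetical, and this is an illustration of the arithmetic, not the R approach from the post.

```python
import math


def waffle_counts(values, total_squares=100):
    """Allocate `total_squares` whole squares in proportion to `values`.

    `values` is a dict {category: non-negative number}. Uses the
    largest-remainder method, so the counts always sum to `total_squares`.
    """
    grand = sum(values.values())
    if grand <= 0:
        raise ValueError("values must sum to a positive number")
    exact = {k: v * total_squares / grand for k, v in values.items()}
    counts = {k: math.floor(x) for k, x in exact.items()}
    leftover = total_squares - sum(counts.values())
    # Give the remaining squares to the largest fractional parts
    # (ties keep the original category order, because sorted() is stable).
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def waffle_grid(counts, symbols, ncol=10):
    """Render counts as rows of `ncol` characters, one symbol per square."""
    cells = [symbols[k] for k, n in counts.items() for _ in range(n)]
    return "\n".join("".join(cells[i:i + ncol]) for i in range(0, len(cells), ncol))


if __name__ == "__main__":
    # Hypothetical shares, for illustration only.
    counts = waffle_counts({"A": 1, "B": 1, "C": 1})
    print(counts)
    print(waffle_grid(counts, {"A": "#", "B": "o", "C": "."}))
```

Swapping the characters for icon glyphs is the text equivalent of replacing squares with icons.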
I keep seeing years represented by coloured bars. First it was that demographic tsunami chart. Then there are examples like the one on the right, which came up in a web search today. I even saw one (whispers) at work today.
I get what they are trying to do – illustrate trends within categories over time – but I don't think years as coloured bars is the way to go. To me, progression over time suggests that time should be an axis, so that the eye moves along the data from one end to the other without interruption. What I want to see is categories over time, not time within categories.
So what is the way to go? Let’s ask “what would ggplot2 do?”
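In ggplot2 terms, “categories over time” usually means keeping the data in long format (one row per year and category) and mapping year to the x-axis, with category as colour or facet. The plotting is done in R in the post. Here is a minimal Python sketch of just the reshaping idea, assuming hypothetical (year, category, value) records: each category becomes its own time series ordered by year, ready to draw as one line or panel per category.

```python
from collections import defaultdict


def categories_over_time(rows):
    """Regroup (year, category, value) records into per-category time series.

    Returns {category: [(year, value), ...]} with each series sorted by year,
    i.e. one series per line or facet, with year on the x-axis.
    """
    series = defaultdict(list)
    for year, category, value in rows:
        series[category].append((year, value))
    return {cat: sorted(points) for cat, points in series.items()}


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    rows = [(2016, "A", 3), (2015, "A", 1), (2015, "B", 2), (2016, "B", 4)]
    for cat, points in categories_over_time(rows).items():
        print(cat, points)
```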