On the road: CSS and eResearch Conference 2014

Next week I’ll be in Melbourne for one of my favourite meetings, the annual Computational and Simulation Sciences and eResearch Conference.

The main reason for my visit is the Bioinformatics FOAM workshop. Day 1 (March 27) is not advertised since it is an internal CSIRO day, but I’ll be presenting a talk titled “SQL, noSQL or no database at all? Are databases still a core skill?”. Day 2 (March 28) is open to all and I’ll be talking about “Learning from complete strangers: social networking for bioinformaticians”.

I imagine these and other talks will appear on Slideshare soon, at both my account and that of the Australian Bioinformatics Network.

I’m also excited to see that Victoria Stodden is presenting a keynote at the main CSS meeting (PDF) on “Reproducibility in Computational Science: Opportunities and Challenges”.

Hope to see some of you there.

“Advance” access and DOIs: what’s the problem?

A DOI, this morning

When I arrive at work, the first task for the day is “check feeds”. If I’m lucky, in the “journal TOCs” category, there will be an abstract that looks interesting, like this one on the left.

Sometimes, the title is a direct link to the article at the journal website. Often though, the link is a Digital Object Identifier or DOI. Frequently, when the article is labelled as “advance access” or “early”, clicking on the DOI link leads to a page like the one below on the right.

DOI #fail

In the grand scheme of things I suppose this rates as “minor annoyance”; it means that I have to visit the journal website and search for the article in question. The question is: why does this happen? I’m not familiar with the practical details of setting up a DOI, but I assume that the journal submits article URLs to the DOI system for processing. So who do I blame – journals, for making URLs public before the DOI is ready, or the DOI system, for not processing new URLs quickly enough?
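
For the curious, here is a minimal Python sketch of how one might test whether a DOI resolves before clicking through. It assumes the public resolver at https://doi.org/ and that an unregistered DOI comes back as HTTP 404. The function names and the example DOI are my own, purely for illustration, and are not part of anything the journals or the DOI system provide.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

# Assumption: the public DOI resolver lives at this address.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL from a DOI, accepting an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Keep "/" as-is; percent-encode anything that is unsafe in a URL.
    return DOI_RESOLVER + quote(doi, safe="/")


def describe_status(status: int) -> str:
    """Turn an HTTP status code into a rough, human-readable verdict."""
    if 200 <= status < 400:
        return "resolves"
    if status == 404:
        # Assumption: 404 means the DOI is not (yet) registered.
        return "not registered (yet)"
    return "unclear"


def check_doi(doi: str, timeout: float = 10.0) -> str:
    """Ask the resolver about a DOI (needs network access).

    Redirects are followed, so the verdict reflects the final page.
    Network failures (urllib.error.URLError) are left to the caller.
    """
    request = urllib.request.Request(doi_to_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        return describe_status(err.code)
```

Calling `check_doi("10.1000/example")` (a made-up DOI) from a feed-reading script would at least say whether the click is worth it. It cannot say who is to blame: an "unclear" verdict may just mean the journal site dislikes HEAD requests.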

There’s also the issue of whether terms like “advance access” have any meaning in the era of instant, online publishing but that’s for another day.

A minor update to my “apply functions” post

One of my more popular posts is A brief introduction to “apply” in R. Come August, it will be four years old. Technology moves on, old blog posts do not.

So: thanks to BioStar user zx8754 for pointing me to this Stack Overflow post, in which someone complains that the code in the post does not work as described. The example for the by function is now fixed.

Side note: I often find that “contact the author” is the most direct approach to solving this kind of problem ;) I’m always happy to be contacted.

New publication: A panel of genes methylated with high frequency in colorectal cancer

I’m pleased to announce an open-access publication with my name on it:

Mitchell, S.M., Ross, J.P., Drew, H.R., Ho, T., Brown, G.S., Saunders, N.F.W., Duesing, K.R., Buckley, M.J., Dunne, R., Beetson, I., Rand, K.N., McEvoy, A., Thomas, M.L., Baker, R.T., Wattchow, D.A., Young, G.P., Lockett, T.J., Pedersen, S.K., LaPointe, L.C. and Molloy, P.L. (2014). A panel of genes methylated with high frequency in colorectal cancer. BMC Cancer 14:54.

A lesson in “reading before you tweet”

So, I read the title:

Mining locus tags in PubMed Central to improve microbial gene annotation

and skimmed the abstract:

The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases.

and thought, well OK, but wouldn’t it be better to incorporate annotations in the first place – when submitting to the public databases – rather than by this indirect method?

The point, of course, is to incorporate new findings from the literature into existing records, rather than to use the tool as a primary method of annotation. I do believe that public databases could do more to enforce data quality standards at deposition time, but that’s an entirely separate issue.

Big thanks to Michael Hoffman for a spirited Twitter discussion that put me straight.

Box plots. Like box plots, only…box plots.

On a rare, brief holiday (here and here, if you’re interested; both highly recommended), I make the mistake of checking my Twitter feed:

This points me to BoxPlotR. It draws box plots. Using Shiny Server. That’s the “innovation”, presumably.

With “quilt plots” and now this, I’m starting to think that I’ve been doing science wrong all these years. If I’d been told to submit the trivial computational work I do every single day to journals, I could have thousands of publications by now.

I’m still pretty relaxed post-holiday, so let’s just leave it there.

BLATting the internet: the most frequent gene?

I enjoyed this story from the OpenHelix blog today, describing a Microsoft Research project to mine DNA sequences from web pages and map them to UCSC genome builds.

Laura DeMare asks: what was the most-hit gene?

Quilt plots. Like heat maps, only…heat maps

Stephen tweets:

A "quilt plot"

Quilt plots. Sounds interesting. The link points to a short article in PLoS ONE, containing a table and a figure. Here is Figure 1.

If you looked at that and thought “Hey, that’s a heat map!”, you are correct. That is a heat map. Let’s be quite clear about that. It’s a heat map.

So, how do the authors justify publishing a method for drawing heat maps and then calling them “quilt plots”?

This blog in 2013

In something of an end-of-year tradition, WordPress provides users with an effort-free blog post in the form of an annual report. Here is mine.

My ambitious plan at the start of 2013 was to aim for 4 posts a month. I managed 28 in total, about one every two weeks, and I’m happy with that.

Looking forward to a new year of blogging. All the best to you and yours for 2014.