Stephen tweets:
Quilt plots. Sounds interesting. The link points to a short article in PLoS ONE, containing a table and a figure. Here is Figure 1.
If you looked at that and thought “Hey, that’s a heat map!”, you are correct. That is a heat map. Let’s be quite clear about that. It’s a heat map.
So, how do the authors justify publishing a method for drawing heat maps and then calling them “quilt plots”?
Well, they at least admit that the quilt plot is a heat map. Or, as they insist throughout, “heat maps”, in quotes; you know, “so-called heat maps”. Here’s a paragraph from the discussion, reproduced in full because it is so nonsensical and strange:
“Quilt plots” can be considered as a simple formulation of “heat maps”. They produce a similar graphical display to “heat maps” when the “clustering” and “dendrogram” options are turned off. In addition, “quilt plots” have several advantages over “heat maps”. Firstly, unlike “heat maps”, “quilt plots” come with easily understood R-functions (i.e. plot, legend and color). In addition, R is freely available software and supported by leading statistical experts around the world, and it is important to promote the use of this software among epidemiological researchers. In addition it is difficult to learn to use R compared to other statistical packages. For example, “heat maps” require the specification of 21 arguments including hierarchical clustering, weights for re-ordering the row and columns dendrogram, which are not always easily understood unless one has an extensive programming knowledge and skills. One of the aims of our paper is to present “quilt plots” as a useful tool with simply formulated R-functions that can be easily understood by researchers from different scientific backgrounds without high-level programming skills.
So what they’re saying is: quilt plots are heat maps. However, heat maps have complicated options. Turn those options off and you have a quilt plot. Quilt plots are written in R, which makes them easy. It’s good for people to learn R. Except that R is difficult. Especially when you use heat maps. Which are quilt plots. Which make R easier.
What?
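Taking the authors at their word, a “quilt plot” is a heat map with the clustering and dendrogram switched off. Stripped of those, such a plot does only two things. It colours each cell of a matrix according to which interval its value falls in, leaving rows and columns in their original order, and it supplies a legend saying which colour means which interval. The sketch below is in Python rather than R, and the names `quilt_cells` and `legend` are hypothetical, invented for illustration. It is not the authors' code, just a minimal instance of that colour-binning step, assuming sorted break points and intervals closed on the left.

```python
from bisect import bisect_right


def quilt_cells(matrix, breaks, palette):
    """Colour each cell of `matrix` by the interval its value falls in.

    Hypothetical helper for illustration. `breaks` are ascending cut points;
    a value v gets palette[bisect_right(breaks, v)], so intervals are closed
    on the left. Rows and columns keep their input order: no clustering,
    no dendrogram.
    """
    if list(breaks) != sorted(breaks):
        raise ValueError("breaks must be in ascending order")
    if len(palette) != len(breaks) + 1:
        raise ValueError("palette needs exactly one more colour than there are breaks")
    return [[palette[bisect_right(breaks, v)] for v in row] for row in matrix]


def legend(breaks, palette):
    """Return (interval label, colour) pairs, one per colour class."""
    edges = [float("-inf")] + list(breaks) + [float("inf")]
    return [(f"[{lo}, {hi})", colour) for lo, hi, colour in zip(edges, edges[1:], palette)]


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    grid = [[5, 10], [19, 25]]
    colours = ["white", "orange", "red"]
    for row in quilt_cells(grid, [10, 20], colours):
        print(row)
    for label, colour in legend([10, 20], colours):
        print(label, colour)
```

With breaks `[10, 20]`, the toy grid `[[5, 10], [19, 25]]` becomes `[["white", "orange"], ["orange", "red"]]`. Drawing coloured rectangles from that grid and printing the legend is all that remains, which is essentially what base R's `image()` does when given a colour palette, with the colour key added by hand.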
The last section of the discussion begins:
Although our method cannot be considered “new”, the novelty is to make these types of methodologies more accessible for researchers from different scientific backgrounds and without the need for strong computing skills.
I do not buy this argument at all. Providing a slightly simpler version of an R function is not going to “lower the bar” to using R for biologists lacking computational skills. I recommend a Software Carpentry bootcamp as a solution to that issue.
I certainly agree that the method cannot be considered new. Which raises the question: should it have been published?
We’re frequently told that the main criterion for publication in PLoS ONE is soundness of methodology. I guess this article is sound in that the code does what it says: generates a heat map. Sorry, quilt plot. However, that criterion is an over-simplification. The criteria for publication are listed here and include:
- The study presents the results of primary scientific research
- Results reported have not been published elsewhere
- Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail
I’d argue that this is not primary research and that the statistics are not of a high technical standard. I’d even make the case that the “results” can be summarised as “how to draw a heat map”, which has been published previously and is well known. My main argument, though, would be that this does not represent an advance in either methodology or knowledge and, as such, does not warrant publication in a scholarly journal. Code like this should go straight to GitHub, where it will be found and used by someone, if it is of use.
I doubt that @rvimieiro is the only person to have this thought:
but that’s another blog post. For now, let’s just say that the reviewers and/or editors could have done better with this one.
We should create the “maps of coolness”.
I don’t know. There are certainly papers of questionable value published in PLOS ONE, but I’m not entirely sure this is one. Yes, I dislike the needless neologism “quilt plot”, given that “heat map” is already well established, but the simple fact is that the existing heatmap function in R is a horrible mess; I’m not sure why that paragraph is “nonsensical and strange”. Having 21 arguments to a function kind of violates every rule of designing usable APIs. And this sort of thing permeates R, making it painful not only for people new to computation but also for those previously exposed to better-designed APIs, like that of MATLAB.
Always nice to hear the other view :)
I’d argue that the issue here is not the design of the R language, but whether this contribution is sufficient to merit publication. In my opinion, it is not. As someone pointed out on our (non-public) Yammer, you can recreate their findings using the heatmap() function and 4 options: http://pastebin.com/tUimRnTG.
Well, I agree that the paper is rather thin; I’d be happier if it were a library handling several plot types. But it isn’t really true that that call to heatmap() replicates their heatmap: heatmap() is more or less unusable for a meaningful figure because it lacks a legend telling you what the colors correspond to. It’s true that there are already multiple ways to get a decent heatmap in R: you can roll your own with a bit of work in ggplot, and the gplots package has the awkward but more functional heatmap.2(). But why does the core set of functions have to be so unusable and clunky? Is it because of compatibility with the proprietary S system (which I’ve never used)? I use R because of Bioconductor, but I can’t help wishing that they had chosen something like Octave or Scilab as the base rather than R.
This certainly doesn’t seem enough to merit publication.
I do understand, though, where the authors are coming from in thinking “hey, our code contributes something to the community, let’s publish an article about it”: scientific articles are the currency of academia, so every additional line on your CV under the heading “publications” will help.
Of course, whether whoever paid the $1,350 (I’m guessing you, the taxpayer, did) would agree with that money being spent on what software developers “publish” as a commit on GitHub (free of charge) is a different issue, or then again it might just be the same issue.
Nice catch; I enjoy your blog. This particular kind of heatmap (age and time axes) already has a name in the fields of demography and epidemiology: it’s a Lexis surface, a rather old graphical tool. The only differences are that the figure above does not have an aspect ratio of 1 (i.e. 1 year of age = 1 calendar year; to me that’s a sin), and that age increases downwards rather than upwards (no big deal). I *don’t* think that article should have been published. Also, there are several ways of producing the above rendering other than heatmap(), such as image() [you need to add the color key yourself], fields::image.plot(), lattice::levelplot(), and ggplot2, which means that there are arguments for every taste.
And here’s a link to a paper expounding essentially the same subclass of heatmap, originally from ye olde 1987.
In my view, some new development is OK, but quilt plots and heat maps are basically the same, so why write as if a new finding has been made?
I wrote this a couple of years ago. It took me about an hour.
http://www.r-bloggers.com/ggheat-a-ggplot2-style-heatmap-function/
Sadly, I never thought to publish it. Dang. I think ggplot2 has been updated since then, so I may redo it using the new options and send it to Nature... no, that’s silly... Nature Communications.
Nice rant. It is pretty silly of PLoS One to think that this is original research. I don’t understand why they conflate the heatmap R function with the actual plot produced. Heatmap plots have clearly been around a long time, and in this light it would have been clear that “quiltplot” is a needless addition to the lexicon (see The history of the cluster heat map (Wilkinson & Friendly, 2009) for examples going back over 100 years). Also, it is disingenuous, to say the least, to claim that you need to specify all 21 arguments to the current heatmap function in R! I’m sure you can find online examples of generating heatmaps in any major statistical package (going back at least five years, I would guess, and probably much further). And I agree with Malarkey that this is just redundant with functionality already present in lattice and ggplot2 if the original heatmap function does not suit your fancy for whatever reason.
I have read good visualization research coming out of PLoS One (see The Communicability of Graphical Alternatives to Tabular Displays of Statistical Simulation Studies (Cook & Teo, 2011) for an example), so this oversight is all the more unfortunate for the journal’s reputation.
I think Malarkey hit on it. What they basically have is a blog post: ‘Here is a way to do something easier in R’. I think what annoys me most is the ‘re-branding’ of a heatmap to a quiltplot, without actually changing anything about the heatmap.
Thanks for all the comments. Thinking further about “should it have been published”, I suppose there is a school of thought that says “publish everything and let the people decide.” My problem with that philosophy is that “the people” end up sifting through an awful lot of junk. Personally, I think journals still have a role to play as filters, and one obvious target to filter out is “trivial findings that are not new.”
Perhaps each month, journals should publish a list of what they rejected, with reasons and links…no, now I’m just having mad thoughts…
I’m writing a Matplotlib function for making “HueCharts”. I could just dump it on GitHub, but maybe I can get it published in PLOS ONE.
I have written a very important contribution to this story: http://phylogenomics.blogspot.com/2014/01/top-alternatives-to-quilt-plots-and.html
You didn’t mention the most absurd element of the quilt plot article. They didn’t provide R code: they provided *screenshots* of their R code. So they thought it was less effort to retype all their functions than to type “clustering=FALSE, dendrogram=FALSE”.
The fact it took four days for someone to notice this goes to show how few of us actually intend to use this function.
Yes, I just noticed the “images in a Word document” yesterday and much (more) hilarity ensued on Twitter. It’s a crime against science.
Great post Neil. I am conflicted about the paper – I understand the pressures to publish on the authors, yet I see the poor quality of the paper. Anyway, I have blogged about it here: http://biomickwatson.wordpress.com/2014/01/23/on-quilt-plots-and-the-need-for-editorial-consistency/
GitHub is all well and good, but CRAN exists specifically as a repository of R packages: http://cran.r-project.org/web/packages/
If the authors had actually bothered to write a user-friendly implementation of heat maps and submit it to CRAN, it might have been rather more useful than their PLOS paper.
I’m pleased to see that a correction has been published, which includes the code as R files. http://www.plosone.org/article/info:doi/10.1371%2Fjournal.pone.0093201