Counting things is hard for a given value of “things”

This post is just a summary of some interesting online discussion from last week around open access publishing. I learned a few things about definitions and PubMed/PMC filters.

It all begins with an opinion piece, “Open access is tiring out peer reviewers.” With a title like that, you might expect rebuttals from people like Michael Eisen, and you’d be right.

In his post, Michael presents a table showing numbers for total and open access (OA) publications from 2000-2013. Initially I thought his OA numbers were rather low, but it turns out that there is a very strict definition of what constitutes OA: membership of the PMC OA subset.

I still don’t agree entirely with Michael’s numbers, though. For example, his PMC OA count for the year 2000 is 3 438, whereas mine is:

library(rentrez)
# count PMC records in the open access subset with publication date 2000
es <- entrez_search("pmc", "open access[FILT] AND 2000[PDAT]")
es$count
[1] 4827

but we’re in the same ballpark, at least. Can I suggest that blog posts which use numbers to support an argument should show exactly how those numbers were derived?

For comparison, and mainly because I wanted an excuse to use RPubs for the first time, here are a couple of documents that I created. The first one looks at the increase in PubMed articles marked as “free full text” as compared with total PubMed articles:

# count PubMed records flagged as free full text with publication date 2000
es <- entrez_search("pubmed", "freetext[FILT] AND 2000[PDAT]")
es$count
[1] 109326
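
The single-year query above extends to the whole 2000-2013 range by looping over years. What follows is a minimal sketch rather than the code behind the RPubs documents: `build_term()` and `count_by_year()` are hypothetical helper names, and using a bare `2000[PDAT]` term for the yearly total is my assumption about how "all articles" would be counted.

```r
# Sketch only: build_term() and count_by_year() are hypothetical helpers,
# not part of rentrez. Running count_by_year() needs the rentrez package
# and network access to NCBI.

# Build a query such as "freetext[FILT] AND 2000[PDAT]".
# With no filter, the term is just "2000[PDAT]" (assumed to mean "all records").
build_term <- function(year, filter = NULL) {
  pdat <- paste0(year, "[PDAT]")
  if (is.null(filter)) pdat else paste(filter, "AND", pdat)
}

# Return a data frame with one row per year and the matching record count.
count_by_year <- function(db, years, filter = NULL) {
  counts <- vapply(years, function(y) {
    # retmax = 0: only the count is needed, not the record IDs
    es <- rentrez::entrez_search(db, build_term(y, filter), retmax = 0)
    as.numeric(es$count)
  }, numeric(1))
  data.frame(year = years, count = counts)
}

# Example calls (each year triggers one request to NCBI):
# free  <- count_by_year("pubmed", 2000:2013, "freetext[FILT]")
# total <- count_by_year("pubmed", 2000:2013)
```

Keeping the term construction in its own function means the exact query strings can be printed and published alongside the numbers.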

Figure: Growth of PMC OA subset 2000-2013

“Free text available” is not the same as “open access”. That hasn’t stopped others from using it as a proxy for OA and I think it’s worth examining. A major argument for OA is that publicly-funded research should be accessible to the public; if “free text available” achieves this then surely that is A Good Thing, regardless of whether it is “truly OA”. There are two messages from the OA movement, “accessibility” and “reusability”, and to be frank, I think there are times when those messages become confused, mixed or lost inside technical, rather zealous arguments.

My second document compares the growth of the PMC OA subset with all PMC articles. I’d argue that this is a more “like with like” comparison than PMC-OA to PubMed, although I can see the value of PubMed as a proxy for “all biomedical articles.”

To summarise, my documents contain broadly the same message as Michael’s: namely that whilst the proportion of OA (or if you like, freely-available) articles is rising, there is no rapid “year on year” inflationary increase that could be interpreted as driving the overall growth in literature. My additional message is that when presenting tables of numbers, it’s nice to make them reproducible :)
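
To make the “year on year” comparison concrete, the OA share and the annual growth rate can each be computed from a vector of yearly counts. This is a small base R sketch; `oa_share()` and `yoy_change()` are hypothetical names, and the numbers are toy values chosen for easy arithmetic, not real PMC or PubMed counts.

```r
# Hypothetical helpers, base R only.

# Percentage of all articles that are OA (or free full text), per year.
oa_share <- function(oa, total) 100 * oa / total

# Percentage change relative to the previous year; NA for the first year.
yoy_change <- function(x) c(NA, 100 * diff(x) / head(x, -1))

# Toy numbers for illustration only, NOT real counts.
oa    <- c(10, 20, 40)
total <- c(100, 160, 200)

oa_share(oa, total)   # 10.0 12.5 20.0
yoy_change(total)     # NA 60 25
```

With real yearly counts in place of the toy vectors, plotting both results against year shows how the OA share moves relative to the growth of the total.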

One thought on “Counting things is hard for a given value of “things””

  1. Bruce Tabor

    Thanks Neil, I enjoyed your insights and the tutorial on use of features of R of which I was not familiar.
