Farewell then, PubMed Commons

PubMed Commons, the NCBI’s experiment in comments for PubMed articles, has been discontinued. Thoroughly too, with all traces of it expunged from the NCBI website.

Last time I wrote about the service, I concluded “all it needs now is more active users, more comments per user and a real API.” None of those things happened. Result: “NIH has decided that the low level of participation does not warrant continued investment in the project, particularly given the availability of other commenting venues.”

NLM also write that “all comments are archived on our FTP site.” A CSV file is available at this location. So is it good for anything?

The CSV archive contains only 6 fields: CommentId, PubmedId, DateCreated, FirstName, LastName and Content. This is unfortunate, as a lot of information has been lost. For example:

  • user IDs, to disambiguate user names
  • comment up- and down-votes
  • threading, showing which comments were replies to other comments
  • information regarding comment moderation

However, there is still information to be extracted from the file. Here’s a summary document at GitHub. We can see, for example, that:

Comments per year peaked in the first full year of operation (2014) and declined to a minimum in 2017.

Comments per month also declined to a minimum in 2017, rarely surpassing 150 and often falling below 100.

We can count comments per article, showing that the most-commented, with 33 comments, is: “When is Science Ultimately Unreliable?” You will never know now, from looking at PubMed, that this article was controversial and caused debate.

We can count comments per author, showing that the “winner” is Lydia Maniatis, with 248 comments. You will never know now, from looking at PubMed, what inspired her and others or precisely how they interacted.
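
For anyone who wants to reproduce counts like these from the archive, here is a minimal Python sketch. It is not the code behind the summary document. It assumes only the six column names listed above; the sample rows are invented, and the assumption that DateCreated begins with a four-digit year should be checked against the real file.

```python
import csv
import io
from collections import Counter

# Invented sample rows using the six column names from the archive.
# Assumption: DateCreated starts with a four-digit year (e.g. YYYY-MM-DD).
SAMPLE = """CommentId,PubmedId,DateCreated,FirstName,LastName,Content
1,111,2014-03-01 10:00:00,Ann,Lee,First comment
2,111,2014-07-15 09:30:00,Bob,Ray,A second comment
3,222,2016-08-23 12:00:00,Ann,Lee,Incorrect trial ID
"""


def summarise(handle):
    """Count comments per year, per article and per author name."""
    per_year, per_article, per_author = Counter(), Counter(), Counter()
    for row in csv.DictReader(handle):
        per_year[row["DateCreated"][:4]] += 1
        per_article[row["PubmedId"]] += 1
        # Names are the only author key available; without user IDs,
        # two people with the same name are counted as one.
        per_author[(row["FirstName"], row["LastName"])] += 1
    return per_year, per_article, per_author


per_year, per_article, per_author = summarise(io.StringIO(SAMPLE))
print(per_year.most_common())
print(per_article.most_common(1))
print(per_author.most_common(1))
```

To run it on the real archive, pass an open file handle instead of the `io.StringIO` sample, for example `summarise(open(path, newline="", encoding="utf-8"))`, where `path` points at the downloaded CSV.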

We can at least analyse the comment text; this simple word cloud highlights the prevalence of human clinical studies in publications that generated debate.
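
A word cloud is essentially a picture of word frequencies. As a rough sketch of that underlying step (not the method used for the figure), the Python below counts words across comment texts. The stopword list and the example comments are illustrative assumptions; a real analysis would use a fuller stopword list and the Content column of the archive.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "was"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count lowercase alphabetic words, skipping stopwords and short words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stopwords and len(word) > 2:
                counts[word] += 1
    return counts


# Invented example comments standing in for the Content column.
comments = [
    "The patients in the trial were randomized.",
    "Were patients blinded to treatment?",
]
freq = word_frequencies(comments)
print(freq.most_common(3))
```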

For more data
I re-ran my code a few days before PubMed Commons closed its doors, to generate a richer data file (commons.csv) that you can find here. It contains 7619 comments, which I believe is only 10 fewer than the NCBI archive. I also re-ran my report one final time and you can see the results here.

It is a shame, in my opinion, that NCBI never fully committed to PubMed Commons, and that this same attitude is apparent in their approach to archiving the data. I guess it was an interesting if flawed experiment.

4 thoughts on “Farewell then, PubMed Commons”

  1. Nice article!

    One small remark about “You will never know now, from looking at PubMed, that this article was controversial and caused debate”: technically you do not know if it was controversial, only that it caused debate. It could be that people debated the degree of openness in this case but agreed on the main message, in which case it was not controversial.

    In addition, do you have any idea what happened in August 2016?


    • We have the comment text, you can decide from that if there was controversy :)

      August 23 2016: a bunch of comments were posted pointing out incorrect IDs in clinical trial publications, as part of the Open Trials Project.
