PubMed retractions report has moved

May 23, 2018 / nsaunders

A brief message for anyone who uses my PubMed retractions report. It’s no longer available at RPubs; instead, you will find it here at GitHub. GitHub Pages hosting is great, once you figure out that docs/ corresponds to your web root :)

Now I really must update the code and try to make it more interesting than a bunch of bar charts.

Novelty: an update

October 21, 2015 (updated March 16, 2018) / nsaunders

A recent tweet:

@neilfws I enjoyed this: https://t.co/ynyHRbgpLN Have you published (or are you thinking about publishing) this analysis anywhere?

— Marcus Munafo (@MarcusMunafo) October 7, 2015

PubMed articles containing “novel” in title or abstract 1845 – 2014

made me think (1) has it really been 5 years, (2) gee, my ggplot skills were dreadful back then and (3) did I really not know how to correct for the increase in total publications?

So here is the update, at GitHub and as a report.

“Novel” findings, as judged by the usage of that word in titles and abstracts, really have undergone a startling increase since about 1975. Indeed, almost 7.2% of findings were “novel” in 2014, compared with 3.2% for the period 1845 – 2014. That said, if we plot using a log scale as suggested by Tal on the original post, the rate of usage appears to be slowing down (see image).

As before, none of this is novel.

PubMed retraction reporting update

March 24, 2015 / nsaunders / 1 Comment

Just a quick update to the previous post. At the helpful suggestion of Steve Royle, I’ve added a new section to the report which attempts to normalise retractions by journal. So for example, J. Biol. Chem. has (as of now) 94 retracted articles and in total 170 842 publications indexed in PubMed. That becomes (100 000 / 170 842) * 94 = 55.022 retractions per 100 000 articles.
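
As a quick check of that arithmetic, here is a minimal Ruby sketch. The method name `retractions_per_100k` is made up for illustration and is not taken from the report's code; the two counts are the J. Biol. Chem. figures quoted above.

```ruby
# Retractions per 100,000 indexed articles for a single journal.
# Hypothetical helper, named here only for illustration.
def retractions_per_100k(retracted, total)
  raise ArgumentError, "total must be positive" unless total.positive?

  (100_000.0 / total) * retracted
end

# J. Biol. Chem. figures quoted in the post: 94 retracted, 170 842 indexed.
rate = retractions_per_100k(94, 170_842)
puts format("%.3f", rate) # prints 55.022
```

Scaling by the journal's total output is what lets a small journal with a handful of retractions rank above a large one with many.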

Top 20 journals, retracted articles per 100 000 publications

This leads to some startling changes to the journal “top 20” list. If you’re wondering what’s going on in the world of anaesthesiology, look no further (thanks again to Steve for the reminder).

PMRetract: PubMed retraction reporting rewritten as an interactive RMarkdown document

March 23, 2015 (updated March 24, 2015) / nsaunders / 4 Comments

Back in 2010, I wrote a web application called PMRetract to monitor retraction notices in the PubMed database. It was written primarily as a way for me to explore some technologies: the Ruby web framework Sinatra, MongoDB (hosted at MongoHQ, now Compose) and Heroku, where the app was hosted.

I automated the update process using Rake and the whole thing ran pretty smoothly, in a “set and forget” kind of way for four years or so. However, the first era of PMRetract is over. Heroku have shut down git pushes to their “Bamboo Stack” – which runs applications using Ruby version 1.8.7 – and will shut down the stack on June 16 2015. Currently, I don’t have the time either to update my code for a newer Ruby version or to figure out the (frankly, near-unintelligible) instructions for migration to the newer Cedar stack.

So I figured now was a good time to learn some new skills, deal with a few issues and relaunch PMRetract as something easier to maintain and more portable. Here it is. As all the code is “out there” for viewing, I’ll just add a few notes here regarding this latest incarnation.
Continue reading →

Just how many retracted articles are there in PubMed anyway?

March 20, 2015 (updated March 22, 2015) / nsaunders / 3 Comments

I am forever returning to PubMed data, downloaded as XML, trying to extract information from it and becoming deeply confused in the process.

Take the seemingly-simple question “how many retracted articles are there in PubMed?”
Continue reading →

From PMID to BibTeX via BioRuby

March 18, 2015 / nsaunders / 1 Comment

Chris writes:

Nothing like searching for an answer (PMIDs->Bibtex) and finding someone else pointing back to your own solution! http://t.co/ZOm0cK6o0d

— Chris Miller (@chrisamiller) March 17, 2015

The blog post in question concerns conversion of PubMed PMIDs to BibTeX citations. However, a few things have changed since 2010.

Here’s what currently works.

	# pmid2bibtex.rb
	# convert a PubMed PMID to BibTeX citation format
	# updated version of http://chrisamiller.com/science/2010/12/13/using-bioruby-to-fetch-citations-from-pubmed/
	# works as of 2015-03-18

	require 'bio'
	Bio::NCBI.default_email = "me@me.com" # required for EUtils

	id = "18265351"
	pm = Bio::PubMed::efetch(id) # array of MEDLINE-formatted strings
	med = Bio::MEDLINE.new(pm[0]) # MEDLINE object
	bib = med.reference.format("bibtex") # format is a method of Reference object

	# "@article{PMID:18265351,\n author = {Brown, T. and Mackey, K. and Du, T.},\n title = {Analysis of RNA by northern
	# and slot blot hybridization.},\n journal = {Curr Protoc Mol Biol},\n year = {2004},\n volume = {Chapter
	# 4},\n pages = {Unit 4.9},\n url = {http://www.ncbi.nlm.nih.gov/pubmed/18265351},\n}\n"
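
To make the target format easier to see without installing BioRuby or calling NCBI, here is a hand-rolled sketch that builds the same kind of entry from a hash of fields. The `to_bibtex` helper is hypothetical, and this is not how BioRuby's `format("bibtex")` is implemented; its whitespace will also differ slightly from the output shown above.

```ruby
# Hand-rolled BibTeX formatter using only core Ruby.
# Hypothetical helper for illustration; field names mirror the output above.
def to_bibtex(key, fields)
  body = fields.map { |name, value| "  #{name} = {#{value}}," }.join("\n")
  "@article{#{key},\n#{body}\n}\n"
end

entry = to_bibtex(
  "PMID:18265351",
  author:  "Brown, T. and Mackey, K. and Du, T.",
  title:   "Analysis of RNA by northern and slot blot hybridization.",
  journal: "Curr Protoc Mol Biol",
  year:    2004,
  volume:  "Chapter 4",
  pages:   "Unit 4.9",
  url:     "http://www.ncbi.nlm.nih.gov/pubmed/18265351"
)
puts entry
```

Printing with `puts` rather than inspecting the string shows the entry as it would appear in a .bib file, one field per line.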

Bioinformatics journals: time from submission to acceptance, revisited

October 14, 2014 / nsaunders / 1 Comment

Before we start: yes, we’ve been here before. There was the Biostars question “Calculating Time From Submission To Publication / Degree Of Burden In Submitting A Paper.” That gave rise to Pierre’s excellent blog post and code + data on Figshare.

So why are we here again? 1. It’s been a couple of years. 2. This is the R (+ Ruby) version. 3. It’s always worth highlighting how the poor state of publicly-available data prevents us from doing what we’d like to do. In this case the interesting question “which bioinformatics journal should I submit to for rapid publication?” becomes “here’s an incomplete analysis using questionable data regarding publication dates.”

Let’s get it out of the way then.
Continue reading →

PubMed Publication Date: what is it, exactly?

September 24, 2014 / nsaunders / 2 Comments

File this one under “has troubled me (and others) for some years now, let’s try to resolve it.”

Let’s use the excellent R/rentrez package to search PubMed for articles that were retracted in 2013.

library(rentrez)

es <- entrez_search("pubmed", "\"Retracted Publication\"[PTYP] 2013[PDAT]", usehistory = "y")
es$count
# [1] 117

117 articles. Now let’s fetch the records in XML format.

xml <- entrez_fetch("pubmed", WebEnv = es$WebEnv, query_key = es$QueryKey, 
                    rettype = "xml", retmax = es$count)
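
rentrez is a wrapper around the NCBI E-utilities, so the two calls above correspond roughly to an ESearch request followed by an EFetch request. The Ruby sketch below only builds the URLs and makes no network calls. The helper names and the `WEBENV_PLACEHOLDER` value are made up for illustration, and using `retmode=xml` to stand in for `rettype = "xml"` is an assumption about the intent of the R call.

```ruby
require 'uri'

# Base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# ESearch URL; usehistory=y asks NCBI to keep the result set on its
# history server and return a WebEnv / QueryKey pair.
def esearch_url(term, db: "pubmed")
  params = { db: db, term: term, usehistory: "y" }
  "#{EUTILS_BASE}/esearch.fcgi?#{URI.encode_www_form(params)}"
end

# EFetch URL that retrieves a stored result set as XML.
def efetch_url(web_env, query_key, retmax:, db: "pubmed")
  params = { db: db, WebEnv: web_env, query_key: query_key,
             retmode: "xml", retmax: retmax }
  "#{EUTILS_BASE}/efetch.fcgi?#{URI.encode_www_form(params)}"
end

term = '"Retracted Publication"[PTYP] 2013[PDAT]'
puts esearch_url(term)

# WEBENV_PLACEHOLDER stands in for the WebEnv string returned by ESearch.
puts efetch_url("WEBENV_PLACEHOLDER", 1, retmax: 117)
```

Seeing the raw query string makes it clear that `[PTYP]` and `[PDAT]` are ordinary PubMed field tags inside the search term, not separate parameters.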

Next question: which XML element specifies the “Date of publication” (PDAT)?
Continue reading →

-omics in 2013

June 25, 2013 (updated July 30, 2013) / nsaunders / 11 Comments

Just how many (bad) -omics are there anyway? Let’s find out.

Update: code and data now at GitHub
Read the rest…

Monitoring PubMed retractions: updates

August 16, 2011 / nsaunders / 6 Comments

PubMed cumulative retractions 1977-present

There’s been a recent flurry of interest in retractions. See for example: Scientific Retractions: A Growth Industry?; summarised also by GenomeWeb in Take That Back; articles in the WSJ and the Pharmalot blog; and academic articles in the Journal of Medical Ethics and Infection & Immunity.

Several of these sources cite data from my humble web application, PMRetract. So now seems like a good time to mention that:

The application is still going strong and is updated regularly
I’ve added a few enhancements to the UI; you can follow development at GitHub
I’ve also added a long-overdue about page with some extra information, including the fact that I wrote it :)

Now I just need to fix up my Git repositories. Currently there’s one which pushes to GitHub and a second, with a copy of the Sinatra code for pushing to Heroku, which isn’t too smart.

What You're Doing Is Rather Desperate

Notes from the life of a [data] scientist
