Findings increasingly novel, scientists say…

…was the tongue-in-cheek title of an image that I posted to Twitpic this week. It shows the usage of the word “novel” in PubMed article titles over time. As someone correctly pointed out at FriendFeed, it needs to be corrected for total publications per year.

It was inspired by a couple of items that caught my attention. First, a question at BioStar with the self-explanatory title Locations of plots of quantities of publicly available biological data. Second, an item at FriendFeed musing on the (over?) use of the word “insight” in scientific publications.

I’m sure that quite recently, I’ve read a letter to a journal which analysed the use of phrases such as “novel insights” in articles over time, but it’s currently eluding my search skills. So here’s my simple roll-your-own approach, using a little Ruby and R.

Initially, I entered “novel[Title]” at the PubMed website, downloaded all 143 031 results in Medline format and parsed the “DP” (publication date) field. Useful, in that I learned the year of the earliest title (1845); inefficient, in that the resulting download is ~ 397 MB.
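A minimal sketch of that parsing step in Ruby, assuming Medline records in which the publication date appears on a line such as `DP  - 2009 Dec 15`. The method name `years_from_medline` and the sample records are made up for illustration; a real run would read the downloaded file instead.

```ruby
# Tally publication years from Medline-format text.
# Assumes each record carries a line such as "DP  - 2009 Dec 15".
def years_from_medline(text)
  counts = Hash.new(0)
  text.each_line do |line|
    # field tag, padding, hyphen, then a four-digit year
    match = line.match(/\ADP\s*-\s*(\d{4})/)
    counts[match[1].to_i] += 1 if match
  end
  counts
end

# Made-up sample records, for illustration only
sample = <<~MEDLINE
  PMID- 1
  DP  - 2009 Dec 15
  TI  - A novel example.

  PMID- 2
  DP  - 2009
  TI  - Another novel example.

  PMID- 3
  DP  - 1845 Jan
  TI  - An early novel example.
MEDLINE

# Print one tab-delimited line per year, earliest first
years_from_medline(sample).sort.each do |year, count|
  puts "#{year}\t#{count}"
end
```

Sorting the tally puts the earliest year first, which is how a figure like 1845 would surface from the full download.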

Fortunately, BioRuby comes with a nice set of methods for search and retrieval from the NCBI Entrez databases, including esearch_count() – as the name suggests, it simply counts returned results for a query.

So, searching PubMed for (1) all articles published from 1845 – 2009 and (2) those articles with the word “novel” in the title or abstract is as simple as this:


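require "rubygems"
require "bio"

# NCBI asks for a contact email address with Entrez requests
Bio::NCBI.default_email = ""
ncbi = Bio::NCBI::REST.new

1845.upto(2009) do |year|
  # all articles for the year, then those with "novel" in title or abstract
  all   = ncbi.esearch_count("#{year}[dp]", {"db" => "pubmed"})
  novel = ncbi.esearch_count("novel[tiab] #{year}[dp]", {"db" => "pubmed"})
  puts "#{year}\t#{all}\t#{novel}"
end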

Save that as pmnovel.rb and run it as ruby pmnovel.rb > pmdata.txt. Obviously, we’re having a bit of fun here. You could search for any terms that you like and in a real script, you’d probably want to specify the terms and date range as command-line options.
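One way the command-line version might look, sketched with Ruby's standard OptionParser. The option names (--term, --field, --from, --to), the script name pmcount.rb and the helper methods are all hypothetical, and the counter is passed in as a callable so that the sketch runs without touching the network.

```ruby
require "optparse"

# Parse --term, --field, --from and --to; defaults mirror the "novel" query.
def parse_options(argv)
  options = { term: "novel", field: "tiab", from: 1845, to: 2009 }
  OptionParser.new do |opts|
    opts.banner = "Usage: pmcount.rb [options]"
    opts.on("-t", "--term TERM", "Search term") { |v| options[:term] = v }
    opts.on("-f", "--field FIELD", "PubMed field tag") { |v| options[:field] = v }
    opts.on("--from YEAR", Integer, "First year") { |v| options[:from] = v }
    opts.on("--to YEAR", Integer, "Last year") { |v| options[:to] = v }
  end.parse!(argv)
  options
end

# Build the two query strings for one year: all articles, matching articles.
def queries_for(year, term, field)
  ["#{year}[dp]", "#{term}[#{field}] #{year}[dp]"]
end

# `counter` is any callable that takes a query string and returns a count.
def count_rows(options, counter)
  (options[:from]..options[:to]).map do |year|
    all_q, term_q = queries_for(year, options[:term], options[:field])
    [year, counter.call(all_q), counter.call(term_q)]
  end
end

# Dry run: explicit arguments and a stand-in counter that always returns 0
opts = parse_options(%w[--term insight --from 2007 --to 2009])
count_rows(opts, ->(_query) { 0 }).each { |row| puts row.join("\t") }
```

For a real run, the stand-in could be swapped for a lambda wrapping the esearch_count call from the script above, with ARGV passed to parse_options.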

Next, load the tab-delimited output file into R for some simple plotting.

library(ggplot2)
library(reshape2)  # provides melt()

pmdata <- read.table("pmdata.txt", sep = "\t")
colnames(pmdata) <- c("year", "total", "novel")
pmdata$freq <- pmdata$novel/pmdata$total
# make year = end of year; then make it a real date
pmdata$year <- paste(pmdata$year, "12", "31", sep = "-")
pmdata$year <- as.Date(pmdata$year)
# reshape the data and plot each variable
pm <- melt(pmdata, id.vars = "year")
png(file = "pmdata.png", width = 800, height = 600)
print(ggplot(pm, aes(year, value)) + geom_line(aes(color = variable)) +
  scale_x_date(date_labels = "%Y", date_breaks = "15 years") + ggtitle("Novelty 1845 - 2009") +
  facet_grid(variable ~ ., scales = "free_y") + theme(legend.position = "none"))
# close the device so the PNG is written
dev.off()

And here's the result.

There you have it. We see a steady post-WWII increase in total publications (top panel), increasing more sharply around 1995. The exponential increase in “novel” findings (middle panel) looks like it begins in the early 1980s. And the fraction of total publications that are “novel” (bottom panel) also begins to increase in the 1980s and is now at an all-time high. Last year, ~ 6.1% of findings were “novel”, compared with an all-time proportion, sum(pmdata$novel)/sum(pmdata$total), of ~ 2.3%.
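Note that the all-time proportion is a ratio of sums, which weights each year by its publication count, and is not the same as averaging the yearly fractions. A small Ruby sketch of the difference, using invented counts rather than the real PubMed numbers:

```ruby
# Rows of [year, total, novel]; the numbers are invented for illustration
rows = [
  [2007, 1000, 40],
  [2008, 2000, 100],
  [2009, 4000, 240]
]

# Per-year fraction of articles containing the term
fractions = rows.map { |year, total, novel| [year, novel.to_f / total] }

# Pooled proportion: ratio of sums, as in sum(pmdata$novel)/sum(pmdata$total)
pooled = rows.sum { |_, _, novel| novel }.to_f / rows.sum { |_, total, _| total }

# Unweighted mean of the yearly fractions, for comparison
mean_of_fractions = fractions.sum { |_, f| f } / fractions.size

fractions.each { |year, f| puts format("%d\t%.3f", year, f) }
puts format("pooled\t%.3f", pooled)
puts format("mean\t%.3f", mean_of_fractions)
```

Because recent years contribute far more articles, the pooled figure is pulled towards the recent fractions, whereas a plain mean treats every year equally.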

Exciting times ;-)

16 thoughts on “Findings increasingly novel, scientists say…”

  1. It looks as if the percentage of “novel” publications increases by 2% every 5 years. Will the linear increase flatten when we reach 20% in 2045?

    And thanks a lot for the code, will be useful for more specific queries.

  2. What a noval post you wrote :)

    p.s: Maybe you could try log scaling the results, so we can see if there was a shift in the acceleration trend.


  3. Pingback: Quick Links | A Blog Around The Clock

    • Funny! Those are all from years ago though, we have to reinvent everything every 10 years :-)

      As I mentioned, I’m sure I read a similar study, but more recent than any of those.

  4. Everything is “novel” when it is discovered and described for the first time. No doubt there is a strong impetus to characterize one’s research findings as “novel” when it is a requirement for publication in journals and the granting of patent applications. However, I think that the rise of “novel” results in recent years stems more from the exponential improvements in the technologies for biomedical research. Such gains, for example, in nucleotide sequencing methods, gene and protein microarrays, mass spectrometry and other high throughput methods supported by robotics and bioinformatics in the last 10 years alone are yielding incredible amounts of biological data that few would have anticipated a decade or two earlier. It is indeed “exciting times” with no sarcasm intended.

    • Good to hear the less cynical viewpoint. I agree, we genuinely do live in exciting times. I wonder though, how much of what’s being “discovered” and described really is for the first time.

  5. FYI, there’s a typo in your R code. On line 11, “em” should be “pm”.

    Fun little script – I imagine I’ll be playing around more with it soon.

