It would be nice if there were an R package, along the lines of RMySQL, for MongoDB. For now there is not – so, how best to get data from a MongoDB database into R?
One option is to retrieve JSON via the MongoDB REST interface and parse it using the rjson package. Assuming, for example, that you have retrieved your CiteULike collection in JSON format from this URL:
- and saved it to a database named citeulike in a collection named articles, you can fetch the first 5 articles into R like so:
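library(RCurl)
library(rjson)
# REST endpoint: database "citeulike", collection "articles", first 5 documents
db <- "http://localhost:28017/citeulike/articles/?limit=5"
articles <- fromJSON(getURL(db))
# title of the first returned article
articles$rows[[1]]$title
# [1] "A computational genomics pipeline for prokaryotic sequencing projects"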
That works, but you may not want to use the MongoDB REST interface: for example, it may be slow for large queries or there might be security concerns.
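As a small aside on the REST approach above: once the JSON has been parsed, every title can be pulled out in one step rather than one row at a time. The sketch below uses an inline JSON string as a hypothetical stand-in for the REST response, so it runs without a MongoDB server. The "rows" field name is taken from the example above; the "total_rows" field and the second title are assumptions made up for illustration.

```r
library(rjson)

# Hypothetical stand-in for the REST response; only the "rows" field
# is taken from the example above, everything else is illustrative.
json <- '{
  "rows": [
    {"title": "A computational genomics pipeline for prokaryotic sequencing projects"},
    {"title": "Placeholder title for a second article"}
  ],
  "total_rows": 2
}'

articles <- fromJSON(json)

# rjson parses an array of JSON objects into a list of named lists,
# so each element of articles$rows is one article.
titles <- vapply(articles$rows, function(row) row$title, character(1))

print(titles)
```

Using vapply with character(1) rather than sapply means the call fails loudly if some row has a missing or non-string title, instead of silently returning a list.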
MongoDB has both C and Java drivers. R has packages that interface with these languages: .C/.Call and rJava, respectively. My only problem is that I can write what I know about C and Java on the back of a postage stamp.
Not to be deterred, I took the approach that has served me well my whole professional life: wing it, using what I could glean from Google searches and the Web. In the end, using Java in R to connect with MongoDB was surprisingly easy. Here’s a basic how-to.