Getting “stuff” into MongoDB

One of the aspects I like most about MongoDB is the “store first, ask questions later” approach. No need to worry about table design, column types or constant migrations as design changes. Provided that your data are in some kind of hash-like structure, you just drop them in.

Ruby is particularly useful for this task, since it has many gems which can parse common formats into a hash. Here are 3 quick examples with relevance to bioinformatics.
Read the rest…

MongoDB, Mongoid and Map/Reduce

I’ve been experimenting with MongoDB’s map-reduce, called from Ruby, as a replacement for Ruby’s Enumerable methods (map/collect, inject). It’s faster. Much faster.

Next, the details but first – the disclaimers:

  • My database is not optimised (using, e.g. indices)
  • My non-map/reduce code is not optimised; I’m sure there are better ways to perform database queries and use Enumerable than those in this post
  • My map/reduce code is not optimised – these are my first attempts

In short nothing is optimised, my code is probably awful and I’m making it up as I go along. Here we go then!
Read the rest…

MongoDB: post-discussion thoughts

It’s good to talk. In my previous post, I aired a few issues concerning MongoDB database design. There’s nothing like including a buzzword (“NoSQL”) in your blog posts to bring out comments from the readers. My thoughts are much clearer, thanks in part to the discussion – here are the key points:

It’s OK to be relational
When you read about storing JSON in MongoDB (or similar databases), it’s easy to form the impression that the “correct” approach is always to embed documents and that if you use relational associations, you have somehow “failed”. This is not true. If you feel that your design and subsequent operations benefit from relational association between collections, go for it. That’s why MongoDB provides the option – for flexibility.

Focus on the advantages
Remember why you chose MongoDB in the first place. In my case, a big reason was easy document saves. Even though I’ve chosen two or more collections in my design, the approach still affords advantages. The documents in those collections still contain arrays and hashes, which can be saved “as is”, without parsing. Not to mention that in a Rails environment, doing away with migrations is, in my experience, a huge benefit.

Get comfortable with map-reduce – now!
It’s tempting to pull records out of MongoDB and then “loop through” them, or use Ruby’s Enumerable methods (e.g. collect, inject) to process the results. If this is your approach, stop, right now and read Map-reduce basics by Kyle Banker. Then, start converting all of your old Ruby code to use map-reduce instead.

Before map-reduce, pages in my Rails application were taking seconds – sometimes tens of seconds – to render, when fetching only hundreds or thousands of database records. After: a second or less. That was my map-reduce epiphany and I’ll describe the details in the next blog post.

The “NoSQL” approach: struggling to see the benefits

Document-oriented data modeling is still young. The fact is, many more applications will need to be built on the document model before we can say anything definitive about best practices.
MongoDB Data Modeling and Rails


ISMB 2009 feed, entries by date

This quote from the MongoDB website sums up, for me, the key problem in moving to a document-oriented, schema-free database: design. It’s easy to end up with a solution which resembles a relational database to the extent that you begin to wonder – if you should not just use a relational database. I’d like to illustrate by example.
Read the rest…