map-reduce

It’s good to talk. In my previous post, I aired a few issues concerning MongoDB database design. There’s nothing like including a buzzword (“NoSQL”) in your blog posts to bring out comments from the readers. My thoughts are much clearer, thanks in part to the discussion – here are the key points:

It’s OK to be relational
When you read about storing JSON in MongoDB (or similar databases), it’s easy to form the impression that the “correct” approach is always to embed documents and that if you use relational associations, you have somehow “failed”. This is not true. If you feel that your design and subsequent operations benefit from relational association between collections, go for it. That’s why MongoDB provides the option – for flexibility.

Focus on the advantages
Remember why you chose MongoDB in the first place. In my case, a big reason was easy document saves. Even though I’ve chosen two or more collections in my design, the approach still affords advantages. The documents in those collections still contain arrays and hashes, which can be saved “as is”, without parsing. Not to mention that in a Rails environment, doing away with migrations is, in my experience, a huge benefit.

Get comfortable with map-reduce – now!
It’s tempting to pull records out of MongoDB and then “loop through” them, or use Ruby’s Enumerable methods (e.g. collect, inject) to process the results. If this is your approach, stop, right now and read Map-reduce basics by Kyle Banker. Then, start converting all of your old Ruby code to use map-reduce instead.

Before map-reduce, pages in my Rails application were taking seconds – sometimes tens of seconds – to render, when fetching only hundreds or thousands of database records. After: a second or less. That was my map-reduce epiphany and I’ll describe the details in the next blog post.

What You're Doing Is Rather Desperate

Notes from the life of a [data] scientist

MongoDB, Mongoid and Map/Reduce

MongoDB: post-discussion thoughts