Shift8 Creative Graphic Design and Website Development

Creating a Simple and Efficient Rating System with MongoDB

Posted by Tom on Fri, Feb 18 2011 13:51:00

I wrote a previous article on using MongoDB's non-locking updates and it's $inc operator in order to allow for a simple counter that would create minimal impact on your database. Related to that, let's suppose we want to make a rating system. Say it's a star rating; however, it can be any kind of voting or polling system really.

So here's the trick. You're storing this star rating on a piece of content (a document) and you already queried that so you can display the data on the page. We're in NoSQL land so we don't use JOINs and we really don't want to query another "rating" collection of some sort. There's no point. So we keep under this document a "ratings" array. Now when you load the document you can simply count and divide to get your average with the language of your choice. You could also make another count() query to MongoDB but again we want one query.

So how do you store ratings? You want to do an non-locking update to the database and much like incrementing, we want the impact on the database to be low. It's not that MongoDB isn't an amazing database that is more than capable of handling these updates, even if you need to make a read first, but we want to be efficient. Let's assume we're making millions of queries and this is all adding up to save us more than a few pennies in hosting costs.

Ok, so let's take a look at the $set operator. It's a simple one, nothing magical. It's just some cleverness with what we are setting. Under this "ratings" array we're going to set each users IP address (substitute for user id if you have some sort of authentication system you'd like to rely on instead) into this ratings array with a value. 

$set will set the value or update it. So you don't have to worry about repeat voters! They simply will change their vote and not skew your results. If you don't want the users to be able to vote twice then you can simply disable the link to vote on the page by looking for the IP in the array of ratings. Which, yes, is a simple deterrent and the user could technically lift the URL to hit to make the request to vote again...But again, your results aren't skewed and if you need further protection you can come up with something else. 

Let's apply this to my favorite framework here, Lithium. Here's what your query might look like:

$query = array(
  '$set'=> array('rating.' . str_replace('.', '-', $_SERVER['REMOTE_ADDR']) => $rating)
);

$conditions = array('_id' => $some_id);

$result = Page::update($query, $conditions, array('atomic' => false));

Do you see the gotchya? It's the IP address. You can't have a key name with dots in it. When passed to MongoDB it's going to translate to object hierarchy. So you'd end up with something like: rating" : { "192" : { "168" : { "126" : { "1" : "3" } } } } ... No good. So if we replace the dots with dashes, underscores, or hash it, etc. then you'll end up with something like this instead: "rating" : { "192-168-126-1" : "5", "192-168-126-2" : "3" }

Now if the same IP address voted again it would simply change the value in the "rating" array on the document. This way a user couldn't skew the rating by contstantly clicking on a star rating widget, submitting a form, etc. Of course you can further limit the impact to your database and server by setting a cookie.

Now it's just simple math to get the average. If you're using PHP, you'll run a count() on the ratings array and then add up the values and divide. Simple!

What would you have to do alternatively? Well, you might store another field with all IP's or user id's that voted in addition to a rating field... But now you've disassociated users with their votes. You could also store a different field for each option and then use $addToSet to add the IP to each option. Great, but then you allow a user to vote once for each option. You won't be completely skewed, but you wouldn't be as accurate. What about MySQL and relational database land? Well, you probably don't need to hear about all the ways you can use JOIN and how many rows you'll be scanning through to get your results and what kind of indexes you need to make sure you build.

Hope you found this little schema example useful in a schemaless world.


[Back To Blog Index]