Shift8 Creative Graphic Design and Website Development

Machine Learning in MongoDB

Posted by Tom on Sat, Dec 03 2011 11:19:00

I'm very excited to be speaking at MongoSV on Friday, December 9th about some of my research on machine learning in MongoDB. I've implemented a Naive Bayes classifier within MongoDB and it works quite well. I will post a good write-up (and slides) about that later.

I wanted to write a post listing some of the things I'll be blogging about here in the near future. More than just machine learning algorithms, there are also other data mining and indexing algorithms that I'm running within MongoDB that I want to discuss. While I'm not a mathematician or an expert in statistics, I have been able to dissect enough of that crazy math to get me where I need to be for my goals and the apps at hand.

So the question that keeps driving me is: what kind of creative things can one do with MongoDB? Mongo offers a lot of great features, and the 10gen team is hard at work adding more and improving existing ones (along with the all-important performance improvements).

Some of the things I'll be blogging about in the future include running algorithms like the Naive Bayes classifier inside MongoDB, as well as:

  • Other text processing algorithms and methods such as stemming
  • Internal, stored JavaScript within MongoDB, and benchmarking it to determine when you may want to use it and when you don't
  • Implementing the nearest neighbour algorithm in MongoDB
  • How about a search engine in MongoDB? What about stored JavaScript that is responsible for indexing other documents so they can be searched later?
  • Playing around with the new ability to have multiple geospatial indexes per document and what that can do for us
  • ...then maybe some more crazy stuff like trainable neural networks (farther down on my list of research items, but way cool)
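
To give a rough idea of what a Naive Bayes text classifier looks like in plain JavaScript (the language MongoDB uses for stored functions), here is a minimal sketch. It is not my MongoDB implementation, just an illustration: the function names (tokenize, train, classify), the { label, text } document shape, and the add-one smoothing are all assumptions made for this example, and it runs on an in-memory array rather than a real collection.

```javascript
// Illustrative sketch only: names and document shape are hypothetical,
// and nothing here talks to MongoDB.

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Build per-label word counts from an array of { label, text } documents.
function train(docs) {
  // Object.create(null) avoids collisions with built-in keys like "constructor".
  const model = { labels: Object.create(null), vocab: Object.create(null), totalDocs: 0 };
  for (const doc of docs) {
    if (!model.labels[doc.label]) {
      model.labels[doc.label] = { docCount: 0, wordCount: 0, words: Object.create(null) };
    }
    const entry = model.labels[doc.label];
    entry.docCount += 1;
    model.totalDocs += 1;
    for (const word of tokenize(doc.text)) {
      entry.words[word] = (entry.words[word] || 0) + 1;
      entry.wordCount += 1;
      model.vocab[word] = true;
    }
  }
  return model;
}

// Return the label with the highest log-probability for the given text.
// Add-one (Laplace) smoothing keeps unseen words from zeroing out a label.
function classify(model, text) {
  const vocabSize = Object.keys(model.vocab).length;
  const words = tokenize(text);
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.labels)) {
    const entry = model.labels[label];
    // Start from the log prior, then add the log likelihood of each word.
    let score = Math.log(entry.docCount / model.totalDocs);
    for (const word of words) {
      const count = entry.words[word] || 0;
      score += Math.log((count + 1) / (entry.wordCount + vocabSize));
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Usage with a tiny made-up training set.
const sampleModel = train([
  { label: "spam", text: "win money now" },
  { label: "spam", text: "free money offer" },
  { label: "ham", text: "meeting at noon" },
  { label: "ham", text: "lunch meeting tomorrow" }
]);
console.log(classify(sampleModel, "free money"));
```

Working in log space avoids multiplying lots of tiny probabilities together, which would underflow quickly on longer documents.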

So stay tuned! As always, I'm super swamped with work, but this weekend I've managed to lay some good groundwork for easily storing JavaScript within MongoDB using PHP and the Lithium framework. I've also started playing around with the Porter stemmer algorithm within MongoDB. For stemming, especially when using PHP, the PECL extension is likely going to be the better choice. However, it's good to see what we can do in MongoDB.

I'll leave you all with some parting gifts here: the knowledge and research of others.

Here is a great article on stored procedures in MongoDB with PHP.

Here is another geared toward Python.

An example of nearest neighbour in PHP.

