Almost every software developer could say a few words about MongoDB. We could say scalability and replication out of the box, easy clustering... and yes, this is true, but there are many things to do before you get to that point!
I have spent a few weeks researching, learning and developing a set of tools to handle all the MongoDB technologies and make them ready for real production, so...
For the last few weeks my company has been running a mongo cluster of more than 10 servers with more than 2 independent replicas, and it is still growing. This is basically the first technology we use that was built with the concept of “distributed computing” in mind. Because this kind of technology can be difficult to administer, we had to invest a lot of time in experiments to make sure it is safe for our production. A lot of work has gone into automation, orchestration, testing and reporting. We have a large set of tools based on puppet, pulsar (sorry, private repo), bipbip and cloud services like Copperegg and MMS for cluster backup and restore.
Our cluster is built from a few roles: routers (mongos), shards/replicas (mongod), config servers (mongod) and MMS agents. For easy modeling and scaling we introduced a Hiera-based description of the cluster and member roles (example). Each member of the cluster is controlled by puppet, which contains built-in MongoDB logic for replica sets and shards. Please have a look at the very rich set of tests prepared for the MongoDB Puppet Module.
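To give a rough feeling for what a role-based description buys you, here is a minimal Python sketch. It is not our Hiera data: the host names, the `role` and `replset` keys and both helper functions are made up for illustration. It only shows how a flat "member → role" description can be turned into the groups that automation needs.

```python
from collections import defaultdict

# Hypothetical cluster description. Host names, key names and role names
# are illustrative only and are not taken from the real Hiera data.
CLUSTER = {
    "router1.example.com": {"role": "mongos"},
    "config1.example.com": {"role": "configsvr"},
    "shard1a.example.com": {"role": "shard", "replset": "rs1"},
    "shard1b.example.com": {"role": "shard", "replset": "rs1"},
    "shard2a.example.com": {"role": "shard", "replset": "rs2"},
}


def members_by_role(cluster):
    """Group host names by their role, in sorted host order."""
    grouped = defaultdict(list)
    for host, attrs in sorted(cluster.items()):
        grouped[attrs["role"]].append(host)
    return dict(grouped)


def replica_sets(cluster):
    """Group shard members by the replica set they belong to."""
    sets = defaultdict(list)
    for host, attrs in sorted(cluster.items()):
        if attrs["role"] == "shard":
            sets[attrs["replset"]].append(host)
    return dict(sets)


if __name__ == "__main__":
    print(members_by_role(CLUSTER))
    print(replica_sets(CLUSTER))
```

Adding a new shard member then means adding one entry to the description; everything else is derived from it.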
The Pulsar tool, which we previously used to deploy many applications in many environments, was extended to orchestrate the MongoDB cluster. So far it works very well for us. The list of basic implemented tasks can be found here. Unfortunately the full set of tasks is private for now (I hope it will be public soon as well). There is simple built-in logic for testing the global health of the cluster, with output useful for e.g. monit or tools like pulsar-rest-api (in development, ETA end of 2014), which will be able to notify the sysadmin team over hubot and hipchat or e.g. pager-duty!
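The health check itself lives in the private tasks, so the following is only a sketch of the idea in Python, not the Pulsar implementation. It assumes you already have the document returned by MongoDB's `replSetGetStatus` command (with its `members`, `name`, `health` and `stateStr` fields) and turns it into an exit code plus a one-line message, which is the kind of output monit-style tools can consume. The function name and the OK/WARNING/CRITICAL convention are my own assumptions.

```python
# Exit-code convention assumed for this sketch (not taken from Pulsar).
OK, WARNING, CRITICAL = 0, 1, 2


def check_replica_set(status):
    """Summarise a replSetGetStatus document as (exit_code, message).

    `status` is expected to be a dict with a "members" list, where each
    member has "name", "health" (1 means up) and "stateStr".
    """
    members = status.get("members", [])
    unhealthy = [m["name"] for m in members if m.get("health") != 1]
    primaries = [m["name"] for m in members if m.get("stateStr") == "PRIMARY"]

    # Without exactly one primary the replica set cannot accept writes.
    if len(primaries) != 1:
        return CRITICAL, "expected exactly one PRIMARY, found %d" % len(primaries)
    if unhealthy:
        return WARNING, "unhealthy members: " + ", ".join(unhealthy)
    return OK, "%d members healthy" % len(members)


if __name__ == "__main__":
    # Hand-written sample document; the host names are made up.
    sample = {
        "set": "rs1",
        "members": [
            {"name": "shard1a:27017", "health": 1, "stateStr": "PRIMARY"},
            {"name": "shard1b:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
        ],
    }
    code, message = check_replica_set(sample)
    print(message)
    raise SystemExit(code)
```

A wrapper script can run this per replica set and exit with the worst code, so a single check reflects the global state of the cluster.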