Quick Intro
        
        Rob Pocklington
        Full-stack developer (jack of all trades)
        Started with the usual SQL DBs (8 - 10 years)
        Tinkered with a graph DB (Neo4j) for 1 year
        
Been working with MongoDB in production for 2+ years
      
      
      
        What does NoSQL typically mean?
        
        Web-scalable
        Fault tolerance
        Scalable architecture
        Different QLs
        K,V or JSON-based
        Eventual (or tunable) consistency
        
      
      
      
        What's out there? (K,V)
        DynamoDB
        Cassandra
        RocksDB
        Redis
        Neo4j - GraphDB
        
      
      
      
        What's out there? (NoSQL)
        RethinkDB
        Couchbase
        Riak
        MongoRocks
        Elasticsearch
        
          FoundationDB
        
        
      
      
      
        Introduction - what is MongoDB?
        
        A Document DB
        Horizontally scalable
        Designed for high performance
        Designed for flexibility
        Framework SDKs and drivers in all common languages
(Ruby, Java, .NET, JavaScript etc.)
        
      
      
      
        MongoDB - History
        
        Built in 2007 by 10gen to support their PaaS
        Suffered from some early bad press
(optimistic defaults = loss of data)
        Used by Foursquare, Forbes, Disney, Cisco, GitHub, Bitly, eBay, LinkedIn, Craigslist, Adobe etc.
        Now the fourth most popular DB and the most popular NoSQL DB
        It is open source!
        
      
      
      
        Features
        Strong Data Types (Dates, Booleans, Arrays ...)
        
        Extensive query support
        Large file storage (GridFS)
        Indexing / Load Balancing
        Capped Collections
        Map reduce
        Aggregation pipelines
      
      
      
        Features (cont...)
        
        No join tables
        No multi-document transactions as of v3.2 (tunable consistency)
        Atomic at a document-level
        Geo-query support (simple and complex)
        Full text searching (not as good as ES)
        
      
      
      
        Features (cont...)
        
        Command Line Interface (REPL)
        Sharding and Replication (for scaling and redundancy)
        Full set of tooling (backup and monitoring)
        Custom Validations (think: constraints)
        BI (v3.2)
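
As a rough illustration of the custom validations bullet: a validator is a query-style document attached to a collection. The `users` collection and its `email` and `age` fields are hypothetical, so treat this as a sketch rather than a recommended schema.

```javascript
// Hypothetical rules: email must be a string, age must be a non-negative number.
const userValidator = {
  $and: [
    { email: { $type: "string" } },
    { age: { $gte: 0 } }
  ]
};

// In the mongo shell (v3.2+) it would be attached when creating the collection:
//   db.createCollection("users", { validator: userValidator });
```

Writes that do not match the validator are rejected, which is about as close as MongoDB gets to a column constraint.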
        
      
      
      
        Software Vendors
        
        https://mongolab.com
        https://scalegrid.io
        https://cloud.mongodb.com
      
      
      
        Common Reasons for Use
        
        Flexible - good for relational, denormalised or graph data structures
        Cuts down on time to market and speeds development
        
natural choice for any JSON structured data
        popular choice for IoT (real time metrics)
        
        
      
      
      
        Considerations
        
        Max size of document (16MB)
        Big Files / Binaries >> GridFS
        No schema != schemaless
        Schema design (referencing vs. embedding)
        
Begin with the end in mind
        
        
      
      
      
        CAP Theorem
         
      
      
      
        MongoDB Architecture
        
        Optimised for 64-bit systems, written in C++ (some Go for newer tools)
        Tunable C.A.P (Consistency, Availability, Partition tolerance)
        
        Pluggable Storage engines (3.0)
        
        MMAPv1 (original)
        WiredTiger (with compression)
        MongoRocks (extra library)
      
      
      
        MongoDB Design
        
        ObjectIds (GUID-like unique values) for _id
        
Has an equivalent to foreign key relationships (referencing)
        1 -> M, M -> 1, M -> N
        
or you can embed (more later)
        
      
      
      
        Schema Considerations
        
        Design is the difference between loving and hating MongoDB
        Schema reviews are important
        DB Migrations are easy (but still necessary)
        Design for use (fast write / fast read)
        
Denormalisation is not evil
        
      
      
      
        Mongo Commands
        
        Commands in MongoDB (vs. SQL)
        Let's CRUD in Mongo!
        Other queries (Regex, etc.)
        Atomic updates
        
3T - MongoChef Demo
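
A minimal sketch of the CRUD commands next to their rough SQL equivalents. It assumes the `db` handle that the mongo shell provides; the `users` collection and its fields are made up for illustration.

```javascript
// `db` is the mongo shell's database handle; `users` is a hypothetical collection.
function crudDemo(db) {
  // CREATE ~ INSERT INTO users (name, age) VALUES ('Ada', 36)
  db.users.insert({ name: "Ada", age: 36 });

  // READ ~ SELECT * FROM users WHERE age > 30
  const over30 = db.users.find({ age: { $gt: 30 } });

  // READ with a regex ~ SELECT * FROM users WHERE name LIKE 'A%'
  const startsWithA = db.users.find({ name: /^A/ });

  // UPDATE ~ UPDATE users SET age = age + 1 WHERE name = 'Ada'
  // $inc is applied atomically on the matched document.
  db.users.update({ name: "Ada" }, { $inc: { age: 1 } });

  // DELETE ~ DELETE FROM users WHERE name = 'Ada'
  db.users.remove({ name: "Ada" });

  return { over30: over30, startsWithA: startsWithA };
}
```

In the shell this would be run as `crudDemo(db)`, or the statements can be typed one at a time.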
        
      
      
      
        Referenced Documents (foreign keys)
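
A small sketch of what referencing looks like as data. The `authors` and `posts` collections and the `authorId` field name are assumptions made for the example.

```javascript
// One document in a hypothetical authors collection.
const author = { _id: "author-1", name: "Rob" };

// Each post stores the author's _id, much like a foreign key (1 -> M).
const posts = [
  { _id: "post-1", authorId: author._id, title: "Intro to MongoDB" },
  { _id: "post-2", authorId: author._id, title: "Schema design" }
];

// Resolving the reference takes a second lookup; in the mongo shell roughly:
//   db.posts.find({ authorId: author._id })
const postsByAuthor = posts.filter(function (p) {
  return p.authorId === author._id;
});
```

The cost is the extra query; the benefit is that each post can grow or change without rewriting the author document.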
        
         
      
      
      
        Embedded Documents (nested)
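
A small sketch of what embedding looks like as data. The `authors` collection and the field names are assumptions made for the example.

```javascript
// The posts live inside the author document instead of in their own collection.
const authorWithPosts = {
  _id: "author-1",
  name: "Rob",
  posts: [
    { title: "Intro to MongoDB", tags: ["nosql"] },
    { title: "Schema design", tags: ["modelling"] }
  ]
};

// One read returns everything; in the mongo shell roughly:
//   db.authors.findOne({ _id: "author-1" })
const titles = authorWithPosts.posts.map(function (p) {
  return p.title;
});
```

Embedding suits data that is read together and stays bounded in size, since the whole document must fit in the 16MB limit.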
        
         
      
      
      
        Banking scenario
        
        You can't use non-transactional DBs for banking, right?
        Well, Stripe does.
        
        Just get it right!
        
        To be atomic in MongoDB you must execute an atomic operation on a single document.
      
      
      
        Banking scenario (cont...)
        
        There is no `BEGIN TRANSACTION` - time to do it another way.
        
        db.account.update({ _id: ..., balance: { $gte: amount }},
{ $inc: { balance: -amount }});
        
        More complex strategies can use MVCC or 2-phase commits if required
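
A minimal sketch of a withdrawal built on a single conditional update, assuming the update is meant as a debit and that `db` is the mongo shell handle. The `withdraw` helper is hypothetical; `nModified` is the field the shell's write result reports for updated documents.

```javascript
// Hypothetical helper: debit `amount` only if the balance can cover it.
function withdraw(db, accountId, amount) {
  const result = db.account.update(
    // The balance check is part of the filter, so check and debit
    // happen as one atomic operation on one document.
    { _id: accountId, balance: { $gte: amount } },
    { $inc: { balance: -amount } }
  );
  // Nothing modified means the account was missing or had insufficient funds.
  return result.nModified === 1;
}
```

Because no separate read happens before the write, two concurrent withdrawals cannot both pass the balance check against the same stale value.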
        
      
      
      
        Security
        
        Access Control (roles / permissions)
        Limiting network access (ports)
        Certificates (SSL)
        Encryption (file system)
        Trust between boxes
        
      
      
      
        Durability
        
        Durability is a question of how much data would be lost in a crash.
        Ultimately, you can define how consistent / available you want to be in MongoDB.
        
        
        Write concern
        0, 1, majority
        Read concern
        local, majority
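
A sketch of how a write concern is passed per operation, assuming the mongo shell's `db` handle. The `orders` collection and the helper name are made up, and the 5 second timeout is an arbitrary choice.

```javascript
// Hypothetical helper: `db` is the mongo shell handle, `orders` a made-up collection.
function insertAcknowledgedByMajority(db, order) {
  return db.orders.insert(order, {
    // Wait until a majority of replica set members have the write,
    // but stop waiting after 5 seconds (the write itself is not rolled back).
    writeConcern: { w: "majority", wtimeout: 5000 }
  });
}

// Read concern is set per query; in the 3.2 shell roughly:
//   db.orders.find({ status: "paid" }).readConcern("majority")
```

Using `w: 0` instead trades durability for speed, because the client does not wait for any acknowledgement.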
      
      
      
        Journalling
        
        Journalling is MongoDB's way to make pending operations durable (per node).
        By default it writes the journal every 50ms (configurable down to 2ms)
        
In practice, replication and good backup processes are more important than absolute durability.
        
        
        
      
      
      
        Replication (Replica Sets)
        
        Creates additional copies of the data and allows for automatic failover to another node.
        Requires heart-beat / time synchronisation
        Can improve read performance (unless reading from the primary is required)
        
Think: RAID 1 - mirroring aka duplication (for redundancy)
        
        
        
        
        
      
      
      
        Replication (cont...)
        
        Can use hidden and delayed replicas for analytics / monitoring
        
        
        Replicate locally (separate disk for example) just to sort out configuration first.
        Careful creating and adding a replica - don't do it in peak traffic!
        Consider restoring a primary backup to a replica then adding (less delta)
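
A sketch of a hidden, delayed replica as a member definition. The host name and `_id` are made up, and `slaveDelay` is the 3.2-era field name (newer releases renamed it), so check the version in use.

```javascript
// Hypothetical member definition for an analytics / safety-net replica.
const analyticsMember = {
  _id: 3,
  host: "analytics.example.net:27017",
  priority: 0,      // can never be elected primary
  hidden: true,     // invisible to normal application reads
  slaveDelay: 3600  // stay one hour behind the primary (seconds)
};

// In the mongo shell, connected to the primary:
//   rs.add(analyticsMember);
```

The delay gives a window to recover from a bad write or an accidental drop before it reaches this copy.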
        
      
      
      
        Sharding
        
        Allows for horizontal scaling of data writes by partitioning data across multiple servers using a shard key.
        It's important to choose a good shard key.
        Think RAID 0 - striping aka splitting (for performance).
        
        NOTE: Don't do sharding without replication first.
        
        
      
      
      
        Sharding (cont...)
        
        Sharding is done per collection
        Choosing shard key (eg. region / country)
        Shard locally before sharding over network (work out issues before adding latency)
        
        Avoid sharding unless you've explored all other scaling options.
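
A sketch of sharding one collection, assuming the `sh` helper the mongo shell provides when connected to a mongos. The `app` database, `events` collection and the compound `region` + `deviceId` key are hypothetical; a region-only key would likely have too few distinct values to spread data well.

```javascript
// `sh` is the mongo shell's sharding helper; all names below are made up.
function shardEvents(sh) {
  // Sharding is enabled for the database first...
  sh.enableSharding("app");
  // ...then each collection gets its own shard key.
  sh.shardCollection("app.events", { region: 1, deviceId: 1 });
}
```

In the shell this would be run as `shardEvents(sh)`.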
        
      
      
      
      
        IoT (Internet of Things)
        
        The next big Thing™   (sorry Cloud)
        efficient logging of RT metrics
        aggregate metrics to minute-level (in an array)
        store in per-hour documents
        
      
      
      
        IoT (cont...)
        
        pre-pad document to avoid fragmentation
        pre-allocate 60 seconds for 1 minute of data
        also for rolling aggregate metrics (i.e. last hour, last day, last week)
        use $slice to keep it the same size (less disk fragmentation)
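
One way the pre-allocated per-hour document could look, as a rough sketch. The helper names, the `sum` / `count` layout per minute and the `recent` array are assumptions, not the presenter's exact schema.

```javascript
// Build a pre-padded document holding 60 minute slots for one sensor and hour.
function newHourDocument(sensorId, hourStart) {
  const minutes = [];
  for (let m = 0; m < 60; m++) {
    minutes.push({ sum: 0, count: 0 });
  }
  return {
    _id: sensorId + ":" + hourStart.toISOString(),
    sensorId: sensorId,
    hourStart: hourStart,
    minutes: minutes
  };
}

// Build the update that folds one reading into its minute slot.
// The document never grows, because every slot already exists.
function minuteUpdate(timestamp, value) {
  const m = timestamp.getUTCMinutes();
  const inc = {};
  inc["minutes." + m + ".sum"] = value;
  inc["minutes." + m + ".count"] = 1;
  return { $inc: inc };
}

// Build the update for a rolling window: push one value, keep only the last `size`.
function rollingPush(value, size) {
  return { $push: { recent: { $each: [value], $slice: -size } } };
}
```

In the mongo shell these would be used roughly as `db.metrics.insert(newHourDocument(...))` once per hour and `db.metrics.update({ _id: ... }, minuteUpdate(...))` per reading, where `metrics` is a made-up collection name.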
        
      
      
      
        Metrics - Map-Reduce (old way)
        
        Original form of real-time data processing
        Outputs results to another collection
        Superseded by Aggregation Pipelines
        Can be useful for discovering what data you ultimately want
        Executed in JavaScript (vs. C++ for Aggregations)
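
A minimal sketch of a map-reduce job that totals readings per day. The `readings` collection, its `day` and `value` fields and the `daily_totals` output collection are hypothetical.

```javascript
// map runs once per document: `this` is the document, `emit` is provided by MongoDB.
function mapByDay() {
  emit(this.day, this.value);
}

// reduce folds all values emitted for one key into a single result.
function reduceSum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// In the mongo shell, results are written to another collection:
//   db.readings.mapReduce(mapByDay, reduceSum, { out: "daily_totals" });
```

Both functions are plain JavaScript, which is what makes map-reduce flexible for exploration and slower than the native aggregation operators.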
        
      
      
      
        Metrics - Aggregation Pipelines
        
        Faster than superman
        Good for daily, weekly, monthly data
        
Example
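
A sketch of a pipeline that produces daily totals and averages for one sensor. The `readings` collection and the `sensorId`, `day` and `value` fields are assumptions made for the example.

```javascript
// Each stage feeds its output into the next one.
const dailyTotals = [
  // 1. Keep only one sensor's readings (can use an index).
  { $match: { sensorId: "s1" } },
  // 2. Group by day and compute the aggregates.
  { $group: { _id: "$day", total: { $sum: "$value" }, average: { $avg: "$value" } } },
  // 3. Return the days in ascending order.
  { $sort: { _id: 1 } }
];

// In the mongo shell:
//   db.readings.aggregate(dailyTotals);
```

Putting `$match` first keeps the amount of data flowing into `$group` as small as possible.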
        
      
      
      
        Backup and Restoring
        
        Simpler than most:
        mongodump and mongorestore
        
        Backup admin db if you want to keep roles / permissions
        Ensure you restore the admin db as well
        
Demo
        
      
      
      
        Backup and Restoring (cont...)
        
         Can back up the volume (e.g. snapshot on EC2)
        
        Vendor Solutions
         Ops Manager (Enterprise $)
         MongoDB Cloud Manager ($)
      
      
      
        Performance Tuning
        
        Monitor the usual suspects (memory, disk and CPU)
        mongotop, mongostat, htop
        
Monitor page faults
        Monitor Index misses (tune your queries) and
        DB Queue length (is the node saturated / hammered?)
        
      
      
        Finishing
        
        Q & A
        Thanks!
Presentation is available at: 
http://rp.js.org/mongodb-pres