Co to jest MongoDB?
Data seem sometimes to have their own life and will, and they refuse to behave as we wish. Then, you need a firm hand to tame the wild data and turn them into quiet and obeying pets.
MongoDB is a document-oriented database. Each MongoDB instance has multiple databases, and each database can have multiple collections.
MongoDB can be thought of as the goodness that erupts when a traditional key-value store collides with a relational database management system, mixing their essences into something that’s not quite either, but rather something novel and fascinating.
MongoDB is written in C++ and offers the following features:
- Collection oriented storage: easy storage of object/JSON-style data
- Dynamic queries
- Full index support, including on inner objects and embedded arrays
- Query profiling
- Efficient storage of binary data including large objects (e.g. photos and videos)
Twierdzenie CAP (Eric Brewer, 2000)
Twierdzenie CAP mówi, że rozproszony system baz danych nie może jednocześnie zapewniać consistency, availability, partition tolerance.
Consistency (spójność) comes in various forms, and that one word covers a myriad of ways errors can creep into your life:
- update: write-write conflict – kilku użytkowników uaktualnia te same dane w tym samym czasie
Approaches for maintaining consistency in the face of concurrency are often described as pessimistic or optimistic. A pessimistic approach works by preventing conflicts from occurring; an optimistic approach lets conflicts occur, but detects them and takes action to sort them out.
- read: różne rodzaje – logical consistency, replication consistency, eventually consistent; również inconsistency window i atomic updates
Availability (dostępność) means that if you can talk to a node in the cluster, it can read and write data.
Left or right-brained?
Prof. R. Wiseman, Neuropsychologist and Magician, suggests easy and entertaining ways to discover whether you are left-brained or right-brained. One way is to clasp your hands casually and see whether right or left thumb is positioned on the top (see the picture). If the left thumb comes on top, you are right brained and artistic, adventurous and accommodative. If the right thumb is on the top, you are analytical, fluent with words and conservative.
Partition tolerance (odporność na podział sieci) means that the cluster can survive communication breakages in the cluster that separate the cluster into multiple partitions unable to communicate with each other (situation known as a split brain).
(źródło: P.J. Sadalage, M. Fowler, NoSQL Distilled)
CAP i MongoDB
Dane w klastrze MongoDB są sharded i replicated (shard możemy przetłumaczyć jako kawałek, replica – kopia).
MongoDB sharded cluster
A sharded cluster consists of shards, mongos routers (an interface to the cluster as a whole; typically reside on the same machines as the application servers), and config servers (store the shard cluster’s state – configuration, location of each database and collection, shard ranges of; typically reside in separate failure domains).
Each shard is a replica set. Replica sets are used for data redundancy, automated failover, read scaling, server maintenance without downtime, and disaster recovery. Shards are used fo write scaling.
Consistency in MongoDB database is configured by using the replica sets and choosing to wait for the writes to be replicated to all the slaves or a given number of slaves. Every write can specify the number of servers the write has to be propagated to before it returns as successful.
By default, a write is reported successful once the database receives it; you can change this so as to wait for the writes to be synced to disk or to propagate to two or more slaves. This is known as write concern.
Document databases try to improve on availability by replicating data using the master-slave setup. The same data is available on multiple nodes and the clients can get to the data even when the primary node is down.
Minimum deployment requirements:
- Each member of a replica set, whether it’s a complete replica or an arbiter, needs to live on a distinct machine.
- Every replicating replica set member needs its own machine.
- Replica set arbiters are lightweight enough to share a machine with another process.
- Config servers can optionally share a machine. The only hard requirement is that all config servers in the config cluster reside on distinct machines.
(źródło, K. Banker, MongoDB in Action)
Przykład jak to można zrobić dla dwóch shards (replica set = 2 kopie + 1 arbiter) i 3 config servers.
Jak widać wystarczą 4 komputery.
Manuale, samouczki, ściągi…
- Karl Seguin. The Little MongoDB Book.
- The MongoDB Manual; single page html, źródło github.
- Pytania z stackoverflow.com. oznaczone etykietką mongodb.
Schemas & Embedded/Non-Embedded Docs
K. Banker, MongoDB Aggregation:
- MapReduce Example Programs – proste grafiki pokazujące o co chodzi w mapreduce
- MapReduce and K-Means Clustering – tylko dla MongoDB
- Cube – an open-source system for visualizing time series data, built on MongoDB, Node and D3