# MongoDB

The database is a container of **collections**. The collections are containers of **documents**. The documents are _schema-less_: they have a dynamic structure that can change between documents in the same collection.

## Data Types

| Type              | Document                                         | Function                |
| ----------------- | ------------------------------------------------ | ----------------------- |
| Text              | `"Text"`                                         |                         |
| Boolean           | `true`                                           |                         |
| Number            | `42`                                             |                         |
| ObjectId          | `"_id": {"$oid": "<id>"}`                        | `ObjectId("<id>")`      |
| ISODate           | `"<key>": {"$date": "YYYY-MM-DDThh:mm:ss.sssZ"}` | `ISODate("YYYY-MM-DD")` |
| Timestamp         |                                                  | `Timestamp(11421532)`   |
| Embedded Document | `{"a": {...}}`                                   |                         |
| Embedded Array    | `{"b": [...]}`                                   |                         |

It's mandatory for each document to have a unique field `_id`. MongoDB automatically creates an `ObjectId()` if it's not provided.

## Databases & Collections Usage

To create a database it's sufficient to switch to a non-existing one with `use <database>` (implicit creation). The database is not actually created until a document is inserted.

```sh
show dbs  # list all databases
use <database>  # use a particular database
show collections  # list all collections for the current database
db.dropDatabase()  # delete current database

db.createCollection(name, { options })  # explicit collection creation
db.<collection>.insertOne({ document })  # implicit collection creation
```

## Operators

```json
/* --- Update Operators --- */
{ "$inc": { "<key>": <value>, ... } }  // increment value
{ "$set": { "<key>": "<value>", ... } }  // set value
{ "$push": { "<key>": "<value>", ... } }  // add a value to an array field

/* --- Query Operators --- */
{ "<key>": { "$in": [ "<value_1>", "<value_2>", ... ] } }  // membership
{ "<key>": { "$nin": [ "<value_1>", "<value_2>", ... ] } }  // negated membership
{ "<key>": { "$exists": true } }  // field exists

/* --- Comparison Operators (DEFAULT: $eq) --- */
{ "<key>": { "$gt":  "<value>" }}  // >
{ "<key>": { "$gte": "<value>" }}  // >=
{ "<key>": { "$lt":  "<value>" }}  // <
{ "<key>": { "$lte": "<value>" }}  // <=
{ "<key>": { "$eq":  "<value>" }}  // ==
{ "<key>": { "$ne":  "<value>" }}  // !=

/* --- Logic Operators (DEFAULT: $and) --- */
{ "$and": [ { <statement> }, ... ]
}
{ "$or": [ { <statement> }, ... ] }
{ "$nor": [ { <statement> }, ... ] }
{ "$not": { <statement> } }
```

### Expressive Query Operator

`$<field>` is used to access the value of the field dynamically.

```json
{ "$expr": { <expression> } }  // aggregation expression, variables, conditional expressions
{ "$expr": { "$<comparison_operator>": [ "$<field_1>", "$<field_2>" ] } }  // compare field values
```

## CRUD Operations

### Create

It's possible to insert a single document with the command `insertOne()` or multiple documents with `insertMany()`.

Insertion results:

- error -> rollback
- success -> entire document gets saved

```sh
# explicit collection creation, all options are optional
db.createCollection(<name>,
  {
    capped: <boolean>,
    autoIndexId: <boolean>,
    size: <number>,
    max: <number>,
    storageEngine: <document>,
    validator: <document>,
    validationLevel: <string>,
    validationAction: <string>,
    indexOptionDefaults: <document>,
    viewOn: <string>,
    pipeline: <pipeline>,
    collation: <document>,
    writeConcern: <document>
  }
)

# creation of a capped collection
# SIZE: int - will be rounded to a multiple of 256
db.createCollection("name", { capped: true, size: max_bytes, max: max_docs_num })

# implicit creation at doc insertion
db.<collection>.insertOne({ document }, options)  # insert a document in a collection
db.<collection>.insertMany([ { document }, { document }, ... ], options)  # insert multiple docs
db.<collection>.insertMany([ { document }, { document } ], { "ordered": false })  # allow unordered insertion: only the documents that cause errors won't be inserted
```

**NOTE**: If `insertMany()` fails, the already inserted documents are not rolled back, but all the successive ones (even the correct ones) will not be inserted.
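The ordered/unordered difference can be sketched against a hypothetical `people` collection, forcing an error with a duplicate `_id` (requires a running instance; names are illustrative):

```sh
db.people.insertMany([ { _id: 1 }, { _id: 1 }, { _id: 2 } ])
# ordered (default): stops at the duplicate _id error, so only { _id: 1 } is saved

db.people.drop()
db.people.insertMany([ { _id: 1 }, { _id: 1 }, { _id: 2 } ], { ordered: false })
# unordered: only the failing document is skipped, so { _id: 1 } and { _id: 2 } are saved
```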
### Read

```sh
db.<collection>.findOne()  # find only one document
db.<collection>.find(filter)  # show selected documents
db.<collection>.find(filter, { "<key>": 1 })  # show selected values from documents (1 or true => show, 0 or false => don't show; can't mix 0s and 1s)
db.<collection>.find(filter, { _id: 0, "<key>": 1 })  # only _id can be set to 0 with other keys at 1
db.<collection>.find().pretty()  # show documents formatted
db.<collection>.find().limit(n)  # show n documents
db.<collection>.find().limit(n).skip(k)  # show n documents skipping k docs
db.<collection>.find().count()  # number of found docs
db.<collection>.find().sort({ key_1: 1, ..., key_n: -1 })  # show documents sorted by the specified keys in ascending (1) or descending (-1) order

# GeoJSON - https://docs.mongodb.com/manual/reference/operator/query/near/index.html
db.<collection>.find(
  {
    <location_field>: {
      $near: {
        $geometry: { type: "Point", coordinates: [ <longitude>, <latitude> ] },
        $maxDistance: <distance_in_meters>,
        $minDistance: <distance_in_meters>
      }
    }
  }
)

db.<collection>.find().hint( { "<key>": 1 } )  # specify the index
db.<collection>.find().hint( "index-name" )  # specify the index using the index name
db.<collection>.find().hint( { $natural : 1 } )  # force the query to perform a forwards collection scan
db.<collection>.find().hint( { $natural : -1 } )  # force the query to perform a reverse collection scan
```

### Update

[Update Operators](https://docs.mongodb.com/manual/reference/operator/update/ "Update Operators Documentation")

```sh
db.<collection>.updateOne(filter, { $set: { "<key>": value } })  # add or modify values
db.<collection>.updateOne(filter, { $set: { "<key>": value } }, { upsert: true })  # add or modify values; if no document matches the filter, create it
db.<collection>.updateMany(filter, update)
db.<collection>.replaceOne(filter, { document }, options)
```

### Delete

```sh
db.<collection>.deleteOne(filter, options)
db.<collection>.deleteMany(filter, options)

db.<collection>.drop()  # delete whole collection
db.dropDatabase()  # delete entire database
```

## [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/)

Utility to import all docs into a specified collection.
If the collection already exists, `--drop` deletes it before re-uploading it.
**WARNING**: CSV separators must be commas (`,`)

```sh
mongoimport --uri=<connection-string>
--host=<hostname:port>, -h=<hostname:port>
--username=<username>, -u=<username>
--password=<password>, -p=<password>
--collection=<collection>, -c=<collection>  # specifies the collection to import
--ssl  # enables connection to a mongod or mongos that has TLS/SSL support enabled
--type=<json|csv|tsv>  # specifies the file type to import. DEFAULT: json
--drop  # drops the collection before importing the data from the input
--headerline  # if file is CSV and first line is header
--jsonArray  # accepts the import of data expressed with multiple MongoDB documents within a single json array. MAX 16 MB
```

## [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/)

Utility to export documents into a specified file.

```sh
mongoexport --collection=<collection> --uri=<connection-string>
--host=<hostname:port>, -h=<hostname:port>
--username=<username>, -u=<username>
--password=<password>, -p=<password>
--db=<database>, -d=<database>
--collection=<collection>, -c=<collection>
--type=<json|csv>
--out=<file>, -o=<file>  # specifies a file to write the export to. DEFAULT: stdout
--jsonArray  # write the entire contents of the export as a single json array
--pretty  # outputs documents in a pretty-printed JSON format
--skip=<number>
--limit=<number>  # specifies a maximum number of documents to include in the export
--sort=<json>  # specifies an ordering for exported results
```

## [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs]

`mongodump` exports the content of a running server into `.bson` files.

`mongorestore` restores backups generated with `mongodump` to a running server.
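A minimal backup/restore round trip might look like this (hypothetical host and database names):

```sh
# dump the "shop" database into ./dump/shop/*.bson
mongodump --uri="mongodb://localhost:27017" --db=shop --out=./dump

# restore the dump into a running server
mongorestore --uri="mongodb://localhost:27017" ./dump
```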
[mongodump_docs]: https://docs.mongodb.com/database-tools/mongodump/
[mongorestore_docs]: https://docs.mongodb.com/database-tools/mongorestore/

## Relations

**Nested / Embedded Documents**:

- Group data logically
- Optimal for data belonging together that do not overlap
- Should avoid nesting too deeply or making arrays too long (max doc size 16 MB)

```json
{
  "_id": ObjectId(),
  "<key>": "value",
  "<key>": "value",

  "innerDocument": {
    "<key>": "value",
    "<key>": "value"
  }
}
```

**References**:

- Divide data between collections
- Optimal for related but shared data used in relations or stand-alone
- Allow overcoming nesting and size limits

NoSQL databases do not have relations and references. It's the app that has to handle them.

```json
{
  "<key>": "value",
  "references": ["id1", "id2"]
}

// referenced
{
  "_id": "id1",
  "<key>": "value"
}
```

## [Indexes](https://docs.mongodb.com/manual/indexes/ "Index Documentation")

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a _collection scan_ (_COLLSCAN_): scan every document in a collection to select those documents that match the query statement.
If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect (_IXSCAN_).

Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.

Indexes _slow down write operations_ since the index must be updated at every write.
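The COLLSCAN vs IXSCAN difference can be observed with `explain()`; a sketch against a hypothetical `users` collection:

```sh
db.users.find({ age: 42 }).explain("executionStats")
# without an index on "age" the winning plan is a COLLSCAN

db.users.createIndex({ age: 1 })
db.users.find({ age: 42 }).explain("executionStats")
# with the index the winning plan is an IXSCAN on { age: 1 }
```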
![IXSCAN](../img/mongodb_ixscan.png ".find() using an index")

### [Index Types](https://docs.mongodb.com/manual/indexes/#index-types)

- **Normal**: fields sorted by name
- **Compound**: multiple fields sorted by name
- **Multikey**: values of sorted arrays
- **Text**: ordered text fragments
- **Geospatial**: ordered geodata

**Sparse** indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field.

### Diagnosis and query planning

```sh
db.<collection>.find({...}).explain()  # explain won't accept other functions
db.explain().<collection>.find({...})  # can accept other functions
db.explain("executionStats").<collection>.find({...})  # more info
```

### Index Creation

```sh
db.<collection>.createIndex( <keys>, <options> )

db.<collection>.createIndex( { "<key>": <type>, "<key>": <type>, ... } )  # normal, compound or multikey (field is array) index
db.<collection>.createIndex( { "<key>": "text" } )  # text index
db.<collection>.createIndex( { "<key>": "2dsphere" } )  # geospatial 2dsphere index

# sparse index
db.<collection>.createIndex(
  { "<key>": <type>, "<key>": <type>, ... },
  { sparse: true }  # sparse option
)

# custom name
db.<collection>.createIndex(
  { <keys> },
  { name: "index-name" }  # name option
)
```

### [Index Management](https://docs.mongodb.com/manual/tutorial/manage-indexes/)

```sh
# view all db indexes
db.getCollectionNames().forEach(function(collection) {
  indexes = db[collection].getIndexes();
  print("Indexes for " + collection + ":");
  printjson(indexes);
});

db.<collection>.getIndexes()  # view collection's indexes

db.<collection>.dropIndexes()  # drop all indexes
db.<collection>.dropIndex( { "index-name": 1 } )  # drop a specific index
```

## Database Profiling

Profiling levels:

- `0`: no profiling
- `1`: data on operations slower than `slowms`
- `2`: data on all operations

Logs are saved in the `system.profile` _capped_ collection.
```sh
db.setProfilingLevel(n)  # set profiler level
db.setProfilingLevel(1, { slowms: <ms> })
db.getProfilingStatus()  # check profiler status

db.system.profile.find().limit(n).sort( {} ).pretty()  # see logs
db.system.profile.find().limit(n).sort( { ts: -1 } ).pretty()  # sort by decreasing timestamp
```

## Roles and permissions

**Authentication**: identifies valid users
**Authorization**: identifies what a user can do

- **userAdminAnyDatabase**: can admin every db in the instance (role must be created on the admin db)
- **userAdmin**: can admin the specific db in which it is created
- **readWrite**: can read and write in the specific db in which it is created
- **read**: can read the specific db in which it is created

```sh
# create users in the current MongoDB instance
db.createUser(
  {
    user: "dbAdmin",
    pwd: "password",
    roles: [
      { role: "userAdminAnyDatabase", db: "admin" }
    ]
  }
)

db.createUser(
  {
    user: "username",
    pwd: "password",
    roles: [
      { role: "role", db: "database" }
    ]
  }
)
```

## Sharding

**Sharding** is a MongoDB concept through which big datasets are subdivided into smaller sets and distributed across multiple instances of MongoDB.
It's a technique used to improve the performance of large queries against large quantities of data that require a lot of resources from the server.

A collection containing several documents is split into smaller collections (_shards_). Shards are implemented via clusters, which are simply groups of MongoDB instances.

Shard components are:

- Shards (min 2): instances of MongoDB that contain a subset of the data
- A config server: an instance of MongoDB which contains metadata on the cluster, that is the set of instances that hold the shard data
- A router (or `mongos`): an instance of MongoDB used to redirect the user instructions from the client to the correct server
![Sharded Cluster](../img/mongodb_shared-cluster.png "Components of a sharded cluster")

### [Replica set](https://docs.mongodb.com/manual/replication/)

A **replica set** in MongoDB is a group of `mongod` processes that maintain the _same dataset_. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

## Aggregations

A sequence of operations applied to a collection as a _pipeline_ to get a result: `db.collection.aggregate(pipeline, options)`.

[Aggregation Stages][aggregation_stages_docs]:

- `$lookup`: Left Outer Join
- `$match`: Where
- `$sort`: Order By
- `$project`: Select \*
- ...

[aggregation_stages_docs]: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/

Example:

```sh
db.collection.aggregate([
  {
    $lookup: {
      from: <collection-to-join>,
      localField: <field>,
      foreignField: <field-in-from-collection>,
      as: <output-array-field>
    }
  },
  { $match: { <query> } },
  { $sort: { ... } },
  { $project: { ... } },
  { ... }
])
```
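A concrete version of such a pipeline, assuming hypothetical `orders` and `customers` collections where `orders.customer_id` references `customers._id`:

```sh
db.orders.aggregate([
  { $lookup: { from: "customers", localField: "customer_id", foreignField: "_id", as: "customer" } },
  { $match: { total: { $gte: 100 } } },  # keep only orders of at least 100
  { $sort: { total: -1 } },              # most expensive first
  { $project: { _id: 0, customer: 1, total: 1 } }
])
```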