# MongoDB Cheat Sheet

## Terminology & Basic Concepts

A database is a container of **collections**, and a collection is a container of **documents**.
Documents are *schema-less*: their structure is dynamic and can differ between documents in the same collection.

### Data Types

| Type              | Document                                       | Function                |
|-------------------|------------------------------------------------|-------------------------|
| Text              | `"Text"`                                       |                         |
| Boolean           | `true`                                         |                         |
| Number            | `42`                                           |                         |
| ObjectId          | `"_id": {"$oid": "<id>"}`                      | `ObjectId("<id>")`      |
| ISODate           | `"key": {"$date": "YYYY-MM-DDThh:mm:ss.sssZ"}` | `ISODate("YYYY-MM-DD")` |
| Timestamp         |                                                | `Timestamp(11421532)`   |
| Embedded Document | `{"a": {...}}`                                 |                         |
| Embedded Array    | `{"b": [...]}`                                 |                         |

Every document must have a unique `_id` field.
MongoDB automatically creates an `ObjectId()` if none is provided.

### Database Usage

To create a database it is sufficient to switch to a non-existing one with `use <database>` (implicit creation).
The database is not actually created until a document is inserted.

```sh
show dbs  # list all databases
use <database>  # use a particular database
show collections  # list all collections of the current database

db.dropDatabase()  # delete the current database
```

### Collection Usage

```sh
db.createCollection(name, {options})  # explicit collection creation
db.<collection>.insertOne({document})  # implicit collection creation
```

## CRUD Operations

### Filters

Base Syntax: `{ "outerKey.innerKey": "value" }`

Comparison: `{ key: { $operator : "value"} }`

| Operator | Math Symbol |
|----------|-------------|
| `$gt`    | >           |
| `$gte`   | >=          |
| `$lt`    | <           |
| `$lte`   | <=          |
| `$eq`    | ==          |
| `$ne`    | !=          |

Field Exists: `{ key: {$exists: true} }`

Logical `Or`: `{ $or: [ {filter_1}, {filter_2}, ... ] }`

Membership: `{ key: { $in: [value_1, value_2, ...] } }` or `{ key: { $nin: [value_1, value_2, ...] } }`

### Create

A single document can be inserted with `insertOne()`, multiple documents with `insertMany()`.

Result of inserting a single document:

- error -> rollback
- success -> the entire document gets saved

```sh
# explicit collection creation, all options are optional
db.createCollection( <name>,
    {
        capped: <boolean>,
        autoIndexId: <boolean>,
        size: <number>,
        max: <number>,
        storageEngine: <document>,
        validator: <document>,
        validationLevel: <string>,
        validationAction: <string>,
        indexOptionDefaults: <document>,
        viewOn: <string>,
        pipeline: <pipeline>,
        collation: <document>,
        writeConcern: <document>
    }
)

db.createCollection("name", { capped: true, size: max_bytes, max: max_docs_num } )  # creation of a capped collection
# SIZE: int - will be rounded to a multiple of 256

# implicit creation at document insertion
db.<collection>.insertOne({ document }, options)  # insert a document in a collection
db.<collection>.insertMany([ { document }, { document }, ... ], options)  # insert multiple documents
db.<collection>.insert()  # deprecated in favor of insertOne() and insertMany()
```

If `insertMany()` causes an error, the insertion process stops. Documents that were already inserted are not rolled back.

### Read

```sh
db.<collection>.findOne()  # find only one document
db.<collection>.find(filter)  # show selected documents
db.<collection>.find(filter, {key: 1})  # show selected values from documents (1 or true => show, 0 or false => don't show; 0 and 1 can't be mixed)
db.<collection>.find(filter, {_id: 0, key: 1})  # only _id can be set to 0 with other keys at 1
db.<collection>.find().pretty()  # show documents formatted
db.<collection>.find().limit(n)  # show n documents
db.<collection>.find().limit(n).skip(k)  # show n documents skipping k documents
db.<collection>.find().count()  # number of found documents
db.<collection>.find().sort({key1: 1, ..., key_n: -1})  # show documents sorted by the specified keys in ascending (1) or descending (-1) order

# GeoJSON - https://docs.mongodb.com/manual/reference/operator/query/near/index.html
db.<collection>.find(
    {
        <location field>: {
            $near: {
                $geometry: { type: "Point", coordinates: [ <longitude>, <latitude> ] },
                $maxDistance: <distance in meters>,
                $minDistance: <distance in meters>
            }
        }
    }
)

db.<collection>.find().hint( { <field>: 1 } )  # specify the index
db.<collection>.find().hint( "index-name" )  # specify the index using the index name

db.<collection>.find().hint( { $natural : 1 } )  # force the query to perform a forward collection scan
db.<collection>.find().hint( { $natural : -1 } )  # force the query to perform a reverse collection scan
```

### Update

[Update Operators](https://docs.mongodb.com/manual/reference/operator/update/ "Update Operators Documentation")

```sh
db.<collection>.updateOne(filter, { $set: {key: value} })  # add or modify values
db.<collection>.updateOne(filter, { $set: {key: value} }, {upsert: true})  # add or modify values; if no document matches the filter, insert a new one

db.<collection>.updateMany(filter, update)

db.<collection>.replaceOne(filter, { document }, options)
```

### Delete

```sh
db.<collection>.deleteOne(filter, options)
db.<collection>.deleteMany(filter, options)

db.<collection>.drop()  # delete the whole collection
db.dropDatabase()  # delete the entire database
```

## Mongoimport Tool

Utility to import all documents into a specified collection.
If the collection already exists, `--drop` deletes it before the documents are re-imported.
**WARNING**: CSV separators must be commas (`,`)

```sh
mongoimport -h <host> -d <database> -c <collection> --drop --jsonArray

mongoimport --host <host> --ssl --username <username> --password <password> --authenticationDatabase admin --db <database> --collection <collection> --type <filetype> --file <file>

# if the file is CSV and its first line is the header
mongoimport ... --headerline
```

## Mongoexport Tool

Utility to export documents into a specified file.

```sh
mongoexport -h <host> -d <database> -c <collection>

mongoexport --host <host> --ssl --username <username> --password <password> --authenticationDatabase admin --db <database> --collection <collection> --type <filetype> --out <file>
```

## Mongodump & Mongorestore

`mongodump` exports the content of a running server into `.bson` files.

`mongorestore` restores backups generated with `mongodump` to a running server.

## Relations

**Nested / Embedded Documents**:

- Group data logically
- Optimal for data that belongs together and does not overlap
- Avoid nesting too deeply or building very long arrays (max document size: 16 MB)

```json
{
    _id: ObjectId(),
    key: "value",
    key: "value",

    innerDocument: {
        key: "value",
        key: "value"
    }
}
```

**References**:

- Divide data between collections
- Optimal for related but shared data, used in relations or stand-alone
- Overcome nesting and size limits

NoSQL databases do not enforce relations or references: the application has to handle them.

```json
{
    key: "value",
    references: ["id1", "id2"]
}

// referenced
{
    _id: "id1",
    key: "value"
}
```

## [Indexes](https://docs.mongodb.com/manual/indexes/ "Index Documentation")

Indexes support the efficient execution of queries in MongoDB.

Without indexes, MongoDB must perform a *collection scan* (*COLLSCAN*): it scans every document in a collection to select those that match the query statement.
If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect (*IXSCAN*).

Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form.
The index stores the value of a specific field or set of fields, ordered by the value of the field.
The ordering of the index entries supports efficient equality matches and range-based query operations.
In addition, MongoDB can return sorted results by using the ordering in the index.

Indexes *slow down write operations*, since the index must be updated on every write.
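
The difference between a collection scan and an index scan can be sketched in plain JavaScript. This is only a conceptual illustration under stated assumptions: the `docs` array, the `age` field and all function names are hypothetical, and a sorted array with binary search stands in for the B-tree structure a real MongoDB index uses.

```javascript
// Hypothetical in-memory "collection".
const docs = [
  { _id: 1, age: 31 }, { _id: 2, age: 18 }, { _id: 3, age: 45 },
  { _id: 4, age: 27 }, { _id: 5, age: 62 }, { _id: 6, age: 39 },
];

// COLLSCAN analogue: every document is inspected.
function collScan(collection, min, max) {
  let inspected = 0;
  const matches = [];
  for (const doc of collection) {
    inspected++;
    if (doc.age >= min && doc.age <= max) matches.push(doc._id);
  }
  return { matches, inspected };
}

// "Index": one entry per document (field value + position), ordered by value.
function buildIndex(collection, field) {
  return collection
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => a.key - b.key);
}

// Binary search for the first index entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// IXSCAN analogue: jump to the start of the range, walk while still in range.
function indexScan(collection, index, min, max) {
  let inspected = 0;
  const matches = [];
  for (let i = lowerBound(index, min); i < index.length && index[i].key <= max; i++) {
    inspected++;
    matches.push(collection[index[i].pos]._id);
  }
  return { matches, inspected };
}

const ageIndex = buildIndex(docs, "age");
const full = collScan(docs, 25, 40);
const viaIndex = indexScan(docs, ageIndex, 25, 40);
console.log(full.inspected, viaIndex.inspected);
```

Both scans find the same documents, but the index scan only touches the entries inside the range and returns them already ordered by `age`. The price is that `ageIndex` has to be rebuilt or adjusted whenever a document is written.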

![IXSCAN](https://docs.mongodb.com/manual/_images/index-for-sort.bakedsvg.svg ".find() using an index")

### [Index Types](https://docs.mongodb.com/manual/indexes/#index-types)

- **Normal**: values of a single field, sorted
- **Compound**: values of multiple fields, sorted
- **Multikey**: sorted values of array elements
- **Text**: ordered text fragments
- **Geospatial**: ordered geodata

**Sparse** indexes only contain entries for documents that have the indexed field, even if the indexed field contains a null value. The index skips over any document that is missing the indexed field.

### Diagnosis and query planning

```sh
db.<collection>.find({...}).explain()  # explain() comes last, no other functions can follow it
db.<collection>.explain().find({...})  # other functions can follow find()
db.<collection>.explain("executionStats").find({...})  # more info
```

### Index Creation

```sh
db.<collection>.createIndex( <keys>, <options> )

db.<collection>.createIndex( { <field>: <type>, <field>: <type>, ... } )  # normal, compound or multikey (field is array) index
db.<collection>.createIndex( { <field>: "text" } )  # text index
db.<collection>.createIndex( { <field>: "2dsphere" } )  # geospatial 2dsphere index

# sparse index
db.<collection>.createIndex(
    { <field>: <type>, <field>: <type>, ... },
    { sparse: true }  # sparse option
)

# custom name
db.<collection>.createIndex(
    { <field>: <type>, ... },
    { name: "index-name" }  # name option
)
```

### [Index Management](https://docs.mongodb.com/manual/tutorial/manage-indexes/)

```sh
# view all db indexes
db.getCollectionNames().forEach(function(collection) {
    indexes = db[collection].getIndexes();
    print("Indexes for " + collection + ":");
    printjson(indexes);
});
db.<collection>.getIndexes()  # view the collection's indexes

db.<collection>.dropIndexes()  # drop all indexes
db.<collection>.dropIndex( "index-name" )  # drop a specific index by name
```

## Database Profiling

Profiling Levels:

- `0`: no profiling
- `1`: data on operations slower than `slowms`
- `2`: data on all operations

Logs are saved in the `system.profile` *capped* collection.

```sh
db.setProfilingLevel(n)  # set profiler level
db.setProfilingLevel(1, { slowms: <milliseconds> })
db.getProfilingStatus()  # check profiler status

db.system.profile.find().limit(n).sort( {} ).pretty()  # see logs
db.system.profile.find().limit(n).sort( { ts : -1 } ).pretty()  # sort by decreasing timestamp
```

## Roles and permissions

**Authentication**: identifies valid users

**Authorization**: identifies what a user can do

- **userAdminAnyDatabase**: can administer every database in the instance (the role must be created on the `admin` database)
- **userAdmin**: can administer the specific database in which it is created
- **readWrite**: can read and write in the specific database in which it is created
- **read**: can read the specific database in which it is created

```sh
# create users in the current database (one createUser() call per user)
db.createUser(
    {
        user: "dbAdmin",
        pwd: "password",
        roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
    }
)

db.createUser(
    {
        user: "username",
        pwd: "password",
        roles: [ { role: "role", db: "database" } ]
    }
)
```

## Sharding

**Sharding** is a MongoDB concept through which big data sets are subdivided into smaller sets and distributed across multiple instances of MongoDB.
It is a technique used to improve the performance of large queries over large quantities of data, which would otherwise require a lot of resources from a single server.

A collection containing many documents is split into smaller subsets (*shards*).
Shards are implemented via a cluster, which is nothing more than a group of MongoDB instances.

The components of a sharded cluster are:

- Shards (min 2): instances of MongoDB that each contain a subset of the data
- A config server: an instance of MongoDB that contains metadata on the cluster, that is, the set of instances that hold the shard data
- A router (or `mongos`): an instance of MongoDB used to redirect the user instructions from the client to the correct server
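
How the three components cooperate can be sketched in plain JavaScript. Everything here is an assumption made for illustration: the `userId` routing key, the range layout in `configMetadata` and the function names are hypothetical, and a real cluster adds chunk splitting, balancing and replication on top of this idea.

```javascript
// Two shards (the minimum), each holding a subset of the documents.
const shards = { shard0: [], shard1: [] };

// Metadata a config server might hold: which shard owns which key range [min, max).
const configMetadata = [
  { shard: "shard0", min: 0, max: 1000 },
  { shard: "shard1", min: 1000, max: Infinity },
];

// Router (mongos analogue): look up the metadata and pick the owning shard.
function route(userId) {
  const entry = configMetadata.find((r) => userId >= r.min && userId < r.max);
  if (!entry) throw new Error(`no shard owns key ${userId}`);
  return entry.shard;
}

// Writes and reads go through the router, never directly to a shard.
function insert(doc) {
  shards[route(doc.userId)].push(doc);
}

function findByUserId(userId) {
  return shards[route(userId)].find((doc) => doc.userId === userId);
}

insert({ userId: 42, name: "Ada" });
insert({ userId: 5000, name: "Linus" });
console.log(route(42), route(5000), findByUserId(5000));
```

A query on `userId` only has to visit the one shard the router selects; the client never needs to know where the data lives.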

![Sharded Cluster](https://docs.mongodb.com/manual/_images/sharded-cluster-production-architecture.bakedsvg.svg "Components of a sharded cluster")

### [Replica set](https://docs.mongodb.com/manual/replication/)

A **replica set** in MongoDB is a group of `mongod` processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

## Aggregations

Sequence of operations applied to a collection as a *pipeline* to get a result: `db.collection.aggregate(pipeline, options)`.

[Aggregation Stages][AggrStgs]:

- `$lookup`: Left Outer Join
- `$match`: Where
- `$sort`: Order By
- `$project`: Select
- ...

[AggrStgs]: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/

Example:

```sh
db.collection.aggregate([
    {
        $lookup: {
            from: <collection to join>,
            localField: <field from the input documents>,
            foreignField: <field from the documents of the "from" collection>,
            as: <output array field>
        }
    },
    { $match: { <query> } },
    { $sort: { ... } },
    { $project: { ... } },
    { ... }
])
```
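
To make the SQL analogies concrete, the same four stages can be imitated on in-memory arrays in plain JavaScript. This is a rough sketch, not how the server executes a pipeline; the `orders` and `customers` data and every field name are hypothetical.

```javascript
// Hypothetical input collection and joined collection.
const orders = [
  { _id: 1, customerId: "c1", total: 40 },
  { _id: 2, customerId: "c2", total: 15 },
  { _id: 3, customerId: "c1", total: 90 },
  { _id: 4, customerId: "c9", total: 70 }, // no matching customer
];
const customers = [
  { _id: "c1", name: "Ada" },
  { _id: "c2", name: "Linus" },
];

const result = orders
  // $lookup analogue (left outer join): every order is kept and the
  // matching customers land in an array field, possibly empty.
  .map((o) => ({ ...o, customer: customers.filter((c) => c._id === o.customerId) }))
  // $match analogue: { total: { $gte: 40 } }
  .filter((o) => o.total >= 40)
  // $sort analogue: { total: -1 }
  .sort((a, b) => b.total - a.total)
  // $project analogue: { _id: 0, total: 1, customer: 1 }
  .map(({ total, customer }) => ({ total, customer }));

console.log(JSON.stringify(result, null, 2));
```

Each step consumes the output of the previous one, which is the essence of a pipeline. The order with the unknown customer survives the join with an empty `customer` array, which is what distinguishes a left outer join from an inner join.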