# MongoDB

The database is a container of **collections**. The collections are containers of **documents**. The documents are _schema-less_, that is, they have a dynamic structure that can change between documents in the same collection.

## Data Types

| Type              | Document                                    | Function                |
| ----------------- | ------------------------------------------- | ----------------------- |
| Text              | `"Text"`                                    |                         |
| Boolean           | `true`                                      |                         |
| Number            | `42`                                        |                         |
| ObjectId          | `"_id": {"$oid": "<id>"}`                   | `ObjectId("<id>")`      |
| ISODate           | `"<key>": {"$date": "YYYY-MM-DDThh:mm:ss.sssZ"}` | `ISODate("YYYY-MM-DD")` |
| Timestamp         |                                             | `Timestamp(11421532)`   |
| Embedded Document | `{"a": {...}}`                              |                         |
| Embedded Array    | `{"b": [...]}`                              |                         |

It's mandatory for each document to have a unique field `_id`. MongoDB automatically creates an `ObjectId()` if it's not provided.

## Databases & Collections Usage

To create a database it is sufficient to switch to a non-existing one with `use <database>` (implicit creation). The database is not actually created until a document is inserted.

```sh
show dbs  # list all databases
use <database>  # use a particular database
show collections  # list all collections for the current database
db.dropDatabase()  # delete current database

db.createCollection(<name>, { options })  # explicit collection creation
db.<collection>.insertOne({ document })  # implicit collection creation
```

## Operators (MQL Syntax)

```json
/* --- Update operators --- */
{ "$inc": { "<field>": <value>, ... } }  // increment value
{ "$set": { "<field>": <value>, ... } }  // set value
{ "$push": { "<field>": <value>, ... } }  // add a value to an array field or turn field into array

/* --- Query Operators --- */
{ "<field>": { "$in": [ <value>, <value>, ... ] } }  // membership
{ "<field>": { "$nin": [ <value>, <value>, ... ] } }  // non-membership
{ "<field>": { "$exists": true } }  // field exists

/* --- Comparison Operators (DEFAULT: $eq) --- */
{ "<field>": { "$gt": <value> } }   // >
{ "<field>": { "$gte": <value> } }  // >=
{ "<field>": { "$lt": <value> } }   // <
{ "<field>": { "$lte": <value> } }  // <=
{ "<field>": { "$eq": <value> } }   // ==
{ "<field>": { "$ne": <value> } }   // !=

/* --- Logic Operators (DEFAULT: $and) --- */
{ "$and": [ { <expression> }, ... ] }
{ "$or": [ { <expression> }, ... ] }
{ "$nor": [ { <expression> }, ... ] }
{ "$not": { <expression> } }

/* --- Array Operators --- */
{ "<field>": { "$all": [ <value>, <value>, ... ] } }  // field contains all values
{ "<field>": { "$size": <value> } }  // array has the specified size
{ "<field>": { "$elemMatch": { "<field>": <expression> } } }  // elements in array must match an expression

/* --- REGEX Operator --- */
{ "<field>": { "$regex": "/pattern/", "$options": "<options>" } }
{ "<field>": { "$regex": "pattern", "$options": "<options>" } }
{ "<field>": { "$regex": "/pattern/" } }
{ "<field>": "/pattern/" }
```

### Expressive Query Operator

> **Note**: `$<field>` is used to access the value of the field dynamically

```json
{ "$expr": { <expression> } }  // aggregation expression, variables, conditional expressions
{ "$expr": { "$<comparison_operator>": [ "$<field>", "$<field>" ] } }  // compare field values (operators use aggregation syntax)
```

## Mongo Query Language (MQL)

### Insertion

It's possible to insert a single document with `insertOne()` or multiple documents with `insertMany()`.

Insertion results:

- error -> rollback
- success -> entire document gets saved

```sh
# explicit collection creation, all options are optional
db.createCollection(
  <name>,
  {
    capped: <boolean>,
    autoIndexId: <boolean>,
    size: <number>,
    max: <number>,
    storageEngine: <document>,
    validator: <document>,
    validationLevel: <string>,
    validationAction: <string>,
    indexOptionDefaults: <document>,
    viewOn: <string>,
    pipeline: <pipeline>,
    collation: <document>,
    writeConcern: <document>
  }
)

# creation of a capped collection
# SIZE: int - will be rounded to a multiple of 256
db.createCollection("name", { capped: true, size: max_bytes, max: max_docs_num })

# implicit creation at doc insertion
db.<collection>.insertOne({ document }, options)  # insert a document in a collection
db.<collection>.insertMany([ { document }, { document }, ...
], options)  # insert multiple docs

# allow unordered insertion: only the documents that cause errors won't be inserted
db.<collection>.insertMany([ { document }, { document } ], { "ordered": false })
```

> **Note**: If `insertMany()` fails, the documents already inserted are not rolled back, but all the subsequent ones (even the correct ones) will not be inserted.

### Querying

```sh
db.<collection>.findOne()  # find only one document
db.<collection>.find(filter)  # show selected documents
db.<collection>.find().pretty()  # show documents formatted
db.<collection>.find().limit(n)  # show n documents
db.<collection>.find().limit(n).skip(k)  # show n documents skipping k docs
db.<collection>.find().count()  # number of found docs
db.<collection>.find().sort({ "<key-1>": 1, ... , "<key-n>": -1 })  # show documents sorted by the specified keys in ascending (1) or descending (-1) order

# projection
db.<collection>.find(filter, { "<field>": 1 })  # show only selected fields (1 or true => show, 0 or false => don't show, can't mix 0 and 1)
db.<collection>.find(filter, { "_id": 0, "<field>": 1 })  # only _id can be set to 0 while other keys are at 1
db.<collection>.find(filter, { "<array-field>": { "$elemMatch": { "<field>": <value> } } })  # project only elements matching the expression

# sub-documents & arrays
db.<collection>.find({ "<field>.<sub-field>": <value> })
db.<collection>.find({ "<array-field>.<index>": <value> })

# GeoJSON - https://docs.mongodb.com/manual/reference/operator/query/near/index.html
db.<collection>.find(
  {
    <location-field>: {
      $near: {
        $geometry: { type: "Point", coordinates: [ <longitude>, <latitude> ] },
        $maxDistance: <distance-in-meters>,
        $minDistance: <distance-in-meters>
      }
    }
  }
)

db.<collection>.find().hint({ "<field>": 1 })  # specify the index to use
db.<collection>.find().hint("index-name")  # specify the index using the index name
db.<collection>.find().hint({ $natural: 1 })  # force the query to perform a forwards collection scan
db.<collection>.find().hint({ $natural: -1 })  # force the query to perform a reverse collection scan
```

> **Note**: `{ <field>: <value> }`, when the field is an array, will match if the array _contains_ the value.

### Updating

[Update Operators](https://docs.mongodb.com/manual/reference/operator/update/ "Update Operators Documentation")

```sh
db.<collection>.replaceOne(filter, update, options)
db.<collection>.updateOne(filter, update, { upsert: true })  # modify the document if it exists, insert it otherwise
db.<collection>.updateOne(filter, { "$push": { ... }, "$set": { ... }, "$inc": { ... }, ... })
```

### Deletion

```sh
db.<collection>.deleteOne(filter, options)
db.<collection>.deleteMany(filter, options)

db.<collection>.drop()  # delete whole collection
db.dropDatabase()  # delete entire database
```

---

## MongoDB Database Tools

### [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/)

Utility to import all docs into a specified collection. If the collection already exists, `--drop` deletes it before re-uploading it.

> **WARNING**: CSV separators must be commas (`,`)

```sh
mongoimport --uri=<connectionString>
--host=<hostname><:port>, -h=<hostname><:port>
--username=<username>, -u=<username>
--password=<password>, -p=<password>
--collection=<collection>, -c=<collection>  # specifies the collection to import
--ssl  # enables connection to a mongod or mongos that has TLS/SSL support enabled
--type=<json|csv|tsv>  # specifies the file type to import. DEFAULT: json
--drop  # drops the collection before importing the data from the input
--headerline  # if the file is CSV and the first line is the header
--jsonArray  # accepts the import of data expressed with multiple MongoDB documents within a single json array. MAX 16 MB
```

### [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/)

Utility to export documents into a specified file.

```sh
mongoexport --uri=<connectionString>
--host=<hostname><:port>, -h=<hostname><:port>
--username=<username>, -u=<username>
--password=<password>, -p=<password>
--db=<database>, -d=<database>
--collection=<collection>, -c=<collection>
--type=<json|csv>
--out=<file>, -o=<file>  # specifies a file to write the export to. DEFAULT: stdout
--jsonArray  # write the entire contents of the export as a single json array
--pretty  # outputs documents in a pretty-printed JSON format
--skip=<number>
--limit=<number>  # specifies a maximum number of documents to include in the export
--sort=<json>  # specifies an ordering for exported results
```

### [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs]

`mongodump` exports the content of a running server into `.bson` files.

`mongorestore` restores backups generated with `mongodump` to a running server.
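The two tools combine into a simple backup-and-restore cycle. A minimal sketch, assuming a `mongod` listening on the default local port; the `shop` database name and the output path are hypothetical:

```sh
# dump the (hypothetical) "shop" database to BSON files under /backup/shop-dump
mongodump --uri="mongodb://localhost:27017" --db=shop --out=/backup/shop-dump

# restore it on the same (or another) server
mongorestore --uri="mongodb://localhost:27017" --drop /backup/shop-dump
```

Passing `--drop` to `mongorestore` removes each collection before recreating it, so repeating the restore does not duplicate documents.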
[mongodump_docs]: https://docs.mongodb.com/database-tools/mongodump/
[mongorestore_docs]: https://docs.mongodb.com/database-tools/mongorestore/

---

## [Indexes](https://docs.mongodb.com/manual/indexes/ "Index Documentation")

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a _collection scan_ (_COLLSCAN_): scan every document in a collection to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect (_IXSCAN_).

Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.

Indexes _slow down write operations_ since the index must be updated on every write.

![IXSCAN](../img/mongodb_ixscan.png ".find() using an index")

### [Index Types](https://docs.mongodb.com/manual/indexes/#index-types)

- **Normal**: fields sorted by name
- **Compound**: multiple fields sorted by name
- **Multikey**: values of sorted arrays
- **Text**: ordered text fragments
- **Geospatial**: ordered geodata

**Sparse** indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field.

### Diagnosis and query planning

```sh
db.<collection>.find({...}).explain()  # explain won't accept other functions
db.explain().<collection>.find({...})  # can accept other functions
db.explain("executionStats").<collection>.find({...})  # more info
```

### Index Creation

```sh
db.<collection>.createIndex(<key-and-index-type>, <options>)

# normal, compound or multikey (if the field is an array) index
db.<collection>.createIndex(
  { "<key>": <type>, "<key>": <type>, ... }
)

db.<collection>.createIndex({ "<key>": "text" })  # text index
db.<collection>.createIndex({ "<key>": "2dsphere" })  # geospatial 2dsphere index

# sparse index
db.<collection>.createIndex(
  { "<key>": <type>, "<key>": <type>, ... },
  { sparse: true }  # sparse option
)

# custom name
db.<collection>.createIndex(
  { <key-and-index-type> },
  { name: "index-name" }  # name option
)
```

### [Index Management](https://docs.mongodb.com/manual/tutorial/manage-indexes/)

```sh
# view all db indexes
db.getCollectionNames().forEach(function (collection) {
  indexes = db[collection].getIndexes();
  print("Indexes for " + collection + ":");
  printjson(indexes);
});

db.<collection>.getIndexes()  # view the collection's indexes

db.<collection>.dropIndexes()  # drop all indexes
db.<collection>.dropIndex("index-name")  # drop a specific index by name
db.<collection>.dropIndex({ "<field>": 1 })  # drop a specific index by key specification
```

---

## Cluster Administration

### `mongod`

`mongod` is the main daemon process for MongoDB. It's the core process of the database, handling connections, requests, and persisting the data.

`mongod` default configuration:

- port: `27017`
- dbpath: `/data/db`
- bind_ip: `localhost`
- auth: disabled

[`mongod` config file][mongod_config_file]
[`mongod` command line options][mongod_cli_options]

[mongod_config_file]: https://www.mongodb.com/docs/manual/reference/configuration-options "`mongod` config file docs"
[mongod_cli_options]: https://www.mongodb.com/docs/manual/reference/program/mongod/#options "`mongod` command line options docs"

### Basic Shell Helpers

```sh
db.<method>()  # database interaction
db.<collection>.<method>()  # collection interaction
rs.<method>()  # replica set deployment and management
sh.<method>()  # sharded cluster deployment and management

# user management
db.createUser()
db.dropUser()

# collection management
db.renameCollection()
db.<collection>.createIndex()
db.<collection>.drop()

# database management
db.dropDatabase()
db.createCollection()

# database status
db.serverStatus()

# database command (underlying to shell helpers and drivers)
db.runCommand({ "<command>" })

# help
db.commandHelp("<command>")
```

### Logging

The **process log** displays activity on the MongoDB instance and collects
activities of various components:

Log verbosity levels:

- `-1`: inherit from parent
- `0`: default verbosity (information)
- `1` to `5`: increases the verbosity up to debug messages

```sh
db.getLogComponents()  # get components and their verbosity
db.adminCommand({ "getLog": "<option>" })  # retrieve logs (getLog must be run on the admin db -> adminCommand)
db.setLogLevel(<level>, "<component>")  # set log level (the output shows the OLD verbosity levels)

tail -f /path/to/mongod.log  # read the end of the log file
```

> **Note**: Log message structure: `<timestamp> <severity> <component> [context] <message> ...`

### Database Profiling

Profiling levels:

- `0`: no profiling
- `1`: data on operations slower than `slowms` (default: 100 ms)
- `2`: data on all operations

Events captured by the profiler:

- CRUD operations
- administrative operations
- configuration operations

> **Note**: Profiling data is saved in the `system.profile` _capped_ collection.

```sh
db.setProfilingLevel(n)  # set profiler level
db.setProfilingLevel(1, { slowms: <ms> })
db.getProfilingStatus()  # check profiler status

db.system.profile.find().limit(n).sort({}).pretty()  # see profiling data
db.system.profile.find().limit(n).sort({ ts: -1 }).pretty()  # sort by decreasing timestamp
```

### Authentication

Client authentication mechanisms:

- **SCRAM** (default): Salted Challenge Response Authentication Mechanism
- **X.509**: `X.509` certificate
- **LDAP**: Lightweight Directory Access Protocol (Enterprise only)
- **Kerberos** (Enterprise only)

Cluster Authentication Mechanism:

### Authorization: Role Based Access Control (RBAC)

Each user has one or more **roles**. Each role has one or more **privileges**. A privilege represents a group of _actions_ and the _resources_ those actions apply to.

By default no user exists, so the ONLY way to act is to connect locally to the server. This is the "localhost exception", and it closes after the _first_ user is created.
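Concretely, the first user is usually created while connected through the localhost exception and given administrative rights. A minimal sketch (username, password, and role choice are illustrative; real deployments should use `passwordPrompt()` instead of an inline password):

```sh
use admin
db.createUser({
  user: "admin",  # hypothetical username
  pwd: "super-secret",  # hypothetical password
  roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
})
```

Once this user exists, the localhost exception closes and further connections must authenticate (e.g. start the server with `mongod --auth`).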
> **WARN**: Always create an admin user first (ideally with the `userAdmin` role)

Role's resources:

- specific database and collection: `{ "db": "<database>", "collection": "<collection>" }`
- all databases and collections: `{ "db": "", "collection": "" }`
- any database and specific collection: `{ "db": "", "collection": "<collection>" }`
- specific database and any collection: `{ "db": "<database>", "collection": "" }`
- cluster resource: `{ "cluster": true }`

Role's privileges: `{ resource: { <resource> }, actions: [ "<action>" ] }`

A role can _inherit_ from multiple other roles and can define **network restrictions** such as _server address_ and _client source_.

Built-in roles groups and names:

- Database user: `read`, `readWrite`, `readAnyDatabase`, `readWriteAnyDatabase`
- Database administration: `dbAdmin`, `userAdmin`, `dbOwner`, `dbAdminAnyDatabase`, `userAdminAnyDatabase`
- Cluster administration: `clusterAdmin`, `clusterManager`, `clusterMonitor`, `hostManager`
- Backup/restore: `backup`, `restore`
- Super user: `root`

```sh
db.createUser(
  {
    user: "<username>",
    pwd: "<password>",
    roles: [ { role: "<role>", db: "<database>" } ]
  }
)

# add a role to an existing user
db.grantRolesToUser( "<username>", [ { db: "<database>", role: "<role>" } ] )

# show a role's privileges
db.runCommand( { rolesInfo: { db: "<database>", role: "<role>" }, showPrivileges: true } )
```

### [Replica set](https://docs.mongodb.com/manual/replication/)

A **replica set** in MongoDB is a group of `mongod` processes that maintain the _same dataset_. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

### Sharding

**Sharding** is a MongoDB concept through which big datasets are subdivided into smaller sets and distributed across multiple instances of MongoDB. It's a technique used to improve the performance of queries over large quantities of data that would otherwise require a lot of resources from the server.

A collection containing several documents is split into smaller collections (_shards_). Shards are implemented via clusters, which are nothing more than groups of MongoDB instances.
Shard components are:

- **shards** (min 2): instances of MongoDB that contain a subset of the data
- a **config server**: an instance of MongoDB which contains metadata on the cluster, that is, the set of instances that hold the shard data
- a **router** (`mongos`): an instance of MongoDB that redirects the client's instructions to the correct server

![Sharded Cluster](../img/mongodb_shared-cluster.png "Components of a sharded cluster")

---

## [Aggregation Framework](https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/)

A sequence of operations applied to a collection as a _pipeline_ to get a result: `db.collection.aggregate(pipeline, options)`. Each step of the pipeline acts on its input, not on the original data in the collection.

### Variables

Variable syntax in aggregations:

- `$key`: field path
- `$$UPPERCASE`: system variable (e.g. `$$CURRENT`)
- `$$foo`: user-defined variable

### Aggregation Syntax

```sh
db.<collection>.aggregate([
  { "$project": { "_id": 0, "<field>": 1, ... } },

  { "$match": { <query> } },

  { "$group": {
      "_id": <expression>,  # group by expression (required)
      "<field>": { "<accumulator>": <expression> },
      ...
  } },

  { "$lookup": {
      "from": <collection-to-join>,
      "localField": <field>,
      "foreignField": <field>,
      "as": <output-array-field>
  } },

  { "$sort": { "<field>": <order>, ... } },

  { "$count": "<count-field>" },

  { "$skip": <number> },
  { "$limit": <number> },

  { ... }
])
```
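Putting the stages together: a sketch of a pipeline over a hypothetical `orders` collection that returns the three customers with the highest total amount of completed orders (all collection and field names are assumptions):

```sh
db.orders.aggregate([
  { "$match": { "status": "completed" } },  # filter first to reduce the working set
  { "$group": { "_id": "$customer_id", "total": { "$sum": "$amount" } } },
  { "$sort": { "total": -1 } },
  { "$limit": 3 }
])
```

Stage order matters: placing `$match` before `$group` lets the server use indexes and hand fewer documents to the later stages.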