diff --git a/docs/database/mongo-db.md b/docs/database/mongo-db.md index 66ee6f3..db214fc 100644 --- a/docs/database/mongo-db.md +++ b/docs/database/mongo-db.md @@ -1,543 +1,763 @@ -# MongoDB - -The database is a container of **collections**. The collections are containers of **documents**. - -The documents are _schema-less_ that is they have a dynamic structure that can change between documents in the same collection. - -## Data Types - -| Tipo | Documento | Funzione | -| ----------------- | ------------------------------------------------ | ----------------------- | -| Text | `"Text"` | -| Boolean | `true` | -| Number | `42` | -| Objectid | `"_id": {"$oid": ""}` | `ObjectId("")` | -| ISODate | `"": {"$date": "YYYY-MM-DDThh:mm:ss.sssZ"}` | `ISODate("YYYY-MM-DD")` | -| Timestamp | | `Timestamp(11421532)` | -| Embedded Document | `{"a": {...}}` | -| Embedded Array | `{"b": [...]}` | - -It's mandatory for each document ot have an unique field `_id`. -MongoDB automatically creates an `ObjectId()` if it's not provided. - -## Databases & Collections Usage - -To create a database is sufficient to switch towards a non existing one with `use ` (implicit creation). -The database is not actually created until a document is inserted. - -```sh -show dbs # list all databases -use # use a particular database -show collections # list all collection for the current database - -dbs.dropDatabase() # delete current database - -db.createCollection(name, {options}) # explicit collection creation -db..insertOne({document}) # implicit collection creation -``` - -## Operators (MQL Syntax) - -```json -/* --- Update operators --- */ -{ "$inc": { "": "", ... } } // Increment value -{ "$set": { "": "", ... } } // Set value -{ "$push": { "": "", ... } } // add a value to an array field or turn field into array - -/* --- Query Operators --- */ -{ "": { "$in": [ "", "", ...] } } // Membership -{ "": { "$nin": [ "", "", ...] 
} } // Membership -{ "": { "$exists": true } } // Field Exists - -/* --- Comparison Operators (DEFAULT: $eq) --- */ -{ "": { "$gt": "" }} // > -{ "": { "$gte": "" }} // >= -{ "": { "$lt": "" }} // < -{ "": { "$lte": "" }} // <= -{ "": { "$eq": "" }} // == -{ "": { "$ne": "" }} // != - -/* --- Logic Operators (DEFAULT $and) --- */ -{ "$and": [ { "" }, ...] } -{ "$or": [ { "" }, ...] } -{ "$nor": [ { "" }, ...] } -{ "$not": { "" } } - -/* --- Array Operators --- */ -{ "": { "$all": ["value>", "", ...] } } // field contains all values -{ "": { "$size": "" } } -{ "": { "$elemMatch": { "": "" } } } // elements in array must match an expression - -/* --- REGEX Operator --- */ -{ "": { "$regex": "/pattern/", "$options": "" } } -{ "": { "$regex": "pattern", "$options": "" } } -{ "": { "$regex": "/pattern/" } } -{ "": "/pattern/" } -``` - -### Expressive Query Operator - -> **Note**: `$` is used to access the value of the field dynamically - -```json -{ "$expr": { "" } } // aggregation expression, variables, conditional expressions -{ "$expr": { "$": [ "$", "$" ] } } // compare field values (operators use aggregation syntax) -``` - -## Mongo Query Language (MQL) - -### Insertion - -It's possible to insert a single document with the command `insertOne()` or multiple documents with `insertMany()`. 
- -Insertion results: - -- error -> rollback -- success -> entire documents gets saved - -```sh -# explicit collection creation, all options are optional -db.createCollection( , - { - capped: , - autoIndexId: , - size: , - max: , - storageEngine: , - validator: , - validationLevel: , - validationAction: , - indexOptionDefaults: , - viewOn: , - pipeline: , - collation: , - writeConcern: - } -) - -db.createCollection("name", { capped: true, size: max_bytes, max: max_docs_num } ) # creation of a capped collection -# SIZE: int - will be rounded to a multiple of 256 - -# implicit creation at doc insertion -db..insertOne({ document }, options) # insert a document in a collection -db..insertMany([ { document }, { document }, ... ], options) # insert multiple docs -db..insertMany([ { document }, { document } ] , { "ordered": false }) # allow the unordered insertion, only documents that cause errors wont be inserted -``` - -> **Note**: If `insertMany()` fails the already inserted documents are not rolled back but all the successive ones (even the correct ones) will not be inserted. - -### Querying - -```sh -db..findOne() # find only one document -db..find(filter) # show selected documents -db..find().pretty() # show documents formatted -db..find().limit(n) # show n documents -db..find().limit(n).skip(k) # show n documents skipping k docs -db..find().count() # number of found docs -db..find().sort({ "": 1, ... 
, "": -1 }) # show documents sorted by specified keys in ascending (1) or descending (-1) order - -# projection -db..find(filter, { "": 1 }) # show selected values form documents (1 or true => show, 0 or false => don't show, cant mix 0 and 1) -db..find(filter, { _id: 0, "": 1 }) # only _id can be set to 0 with other keys at 1 -db..find(filter, { "": { "$elemMatch": { "": "" } } }) # project only elements matching the expression - -# sub documents & arrays -db..find({ "..": "" }) -db..find({ "..": "" }) - -# GeoJSON - https://docs.mongodb.com/manual/reference/operator/query/near/index.html -db..find( - { - : { - $near: { - $geometry: { type: "Point", coordinates: [ , ] }, - $maxDistance: , - $minDistance: - } - } - } -) - -db..find().hint( { "": 1 } ) # specify the index -db..find().hint( "index-name" ) # specify the index using the index name - -db..find().hint( { $natural : 1 } ) # force the query to perform a forwards collection scan -db..find().hint( { $natural : -1 } ) # force the query to perform a reverse collection scan -``` - -> **Note**: `{ : }` in case of a field array will match if the array _contains_ the value - -### Updating - -[Update Operators](https://docs.mongodb.com/manual/reference/operator/update/ "Update Operators Documentation") - -```sh -db..replaceOne(filter, update, options) -db..updateOne(filter, update, {upsert: true}) # modify document if existing, insert otherwise - -db..updateOne(filter, { "$push": { ... }, "$set": { ... }, { "$inc": { ... }, ... } }) -``` - -### Deletion - -```sh -db..deleteOne(filter, options) -db..deleteMany(filter, options) - -db..drop() # delete whole collection -db.dropDatabase() # delete entire database -``` - ---- - -## MongoDB Database Tools - -### [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/) - -Utility to import all docs into a specified collection. -If the collection already exists `--drop` deletes it before reuploading it. 
-**WARNING**: CSV separators must be commas (`,`) - -```sh -mongoimport - ---uri= ---host=<:port>, -h=<:port> ---username=, -u= ---password=, -p= ---collection=, -c= # Specifies the collection to import. ---ssl # Enables connection to a mongod or mongos that has TLS/SSL support enabled. ---type # Specifies the file type to import. DEFAULT: json ---drop # drops the collection before importing the data from the input. ---headerline # if file is CSV and first line is header ---jsonarray # Accepts the import of data expressed with multiple MongoDB documents within a single json array. MAX 16 MB -``` - -### [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/) - -Utility to export documents into a specified file. - -```sh -mongoexport --collection= - ---uri= ---host=<:port>, -h=<:port> ---username=, -u= ---password=, -p= ---db=, -d= ---collection=, -c= ---type= ---out=, -o= #Specifies a file to write the export to. DEFAULT: stdout ---jsonArray # Write the entire contents of the export as a single json array. ---pretty # Outputs documents in a pretty-printed format JSON. ---skip= ---limit= # Specifies a maximum number of documents to include in the export ---sort= # Specifies an ordering for exported results -``` - -### [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs] - -`mongodump` exports the content of a running server into `.bson` files. - -`mongorestore` Restore backups generated with `mongodump` to a running server. - -[mongodump_docs]: https://docs.mongodb.com/database-tools/mongodump/ -[mongorestore_docs]: https://docs.mongodb.com/database-tools/mongorestore/ - ---- - -## [Indexes](https://docs.mongodb.com/manual/indexes/ "Index Documentation") - -Indexes support the efficient execution of queries in MongoDB. - -Without indexes, MongoDB must perform a _collection scan_ (_COLLSCAN_): scan every document in a collection, to select those documents that match the query statement. 
-If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect (_IXSCAN_). - -Indexes are special data structures that store a small portion of the collection's data set in an easy to traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index. - -Indexes _slow down writing operations_ since the index must be updated at every writing. - -![IXSCAN](../img/mongodb_ixscan.png ".find() using an index") - -### [Index Types](https://docs.mongodb.com/manual/indexes/#index-types) - -- **Normal**: Fields sorted by name -- **Compound**: Multiple Fields sorted by name -- **Multikey**: values of sorted arrays -- **Text**: Ordered text fragments -- **Geospatial**: ordered geodata - -**Sparse** indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. - -### Diagnosis and query planning - -```sh -db..find({...}).explain() # explain won't accept other functions -db.explain()..find({...}) # can accept other functions -db.explain("executionStats")..find({...}) # more info -``` - -### Index Creation - -```sh -db..createIndex( , ) - -db..createIndex( { "": , "": , ... } ) # normal, compound or multikey (field is array) index -db..createIndex( { "": "text" } ) # text index -db..createIndex( { "": 2dsphere } ) # geospatial 2dsphere index - -# sparse index -db..createIndex( - { "": , "": , ... 
}, - { sparse: true } # sparse option -) - -# custom name -db..createIndex( - { , }, - { name: "index-name" } # name option -) -``` - -### [Index Management](https://docs.mongodb.com/manual/tutorial/manage-indexes/) - -```sh -# view all db indexes -db.getCollectionNames().forEach(function(collection) { - indexes = db[collection].getIndexes(); - print("Indexes for " + collection + ":"); - printjson(indexes); -}); -db..getIndexes() # view collection's index - -db..dropIndexes() # drop all indexes -db..dropIndex( { "index-name": 1 } ) # drop a specific index -``` - ---- - -## Cluster Administration - -### `mongod` - -`mongod` is the main deamon process for MongoDB. It's the core process of the database, -handling connections, requests and persisting the data. - -`mongod` default configuration: - -- port: `27017` -- dbpath: `/data/db` -- bind_ip: `localhost` -- auth: disabled - -[`mongod` config file][mongod_config_file] -[`mongod` command line options][mongod_cli_options] - -[mongod_config_file]: https://www.mongodb.com/docs/manual/reference/configuration-options "`mongod` config file docs" -[mongod_cli_options]: https://www.mongodb.com/docs/manual/reference/program/mongod/#options "`mongod` command line options docs" - -### Basic Shell Helpers - -```sh -db.() # database interaction -db..() # collection interaction -rs.(); # replica set deployment and management -sh.(); # sharded cluster deployment and management - -# user management -db.createUser() -db.dropUser() - -# collection management -db.renameCollection() -db..createIndex() -db..drop() - -# database management -db.dropDatabase() -db.createCollection() - -# database status -db.serverStatus() - -# database command (underlying to shell helpers and drivers) -db.runCommand({ "" }) - -# help -db.commandHelp(") -``` - -### Logging - -The **process log** displays activity on the MongoDB instance and collects activities of various components: - -Log Verbosity Level: - -- `-1`: Inherit from parent -- `0`: Default 
Verbosity (Information) -- `1 - 5`: Increases the verbosity up to Debug messages - -```sh -db.getLogComponents() # get components and their verbosity -db.adminCommand({"getLog": ""}) # retrieve logs (getLog must be run on admin db -> adminCommand) -db.setLogLevel(, ""); # set log level (output is OLD verbosity levels) - -tail -f /path/to/mongod.log # read end og log file -``` - -> **Note**: Log Message Structure: ` ...` - -### Database Profiling - -Profiling Levels: - -- `0`: no profiling -- `1`: data on operations slower than `slowms` (default 100ms) -- `2`: data on all operations - -Events captured by the profiler: - -- CRUD operations -- Administrative operations -- Configuration operations - -> **Note**: Logs are saved in the `system.profile` _capped_ collection. - -```sh -db.setProfilingLevel(n) # set profiler level -db.setProfilingLevel(1, { slowms: }) -db.getProfilingStatus() # check profiler status - -db.system.profile.find().limit(n).sort( {} ).pretty() # see logs -db.system.profile.find().limit(n).sort( { ts : -1 } ).pretty() # sort by decreasing timestamp -``` - -### Authentication - -Client Authentication Mechanisms: - -- **SCRAM** (Default): Salted Challenge Response Authentication Mechanism -- **X.509**: `X.509` Certificate -- **LADP**: Lightweight Directory Access Protocol (Enterprise Only) -- **KERBEROS** (Enterprise Only) - -Cluster Authentication Mechanism: - -### Authorization: Role Based Access Control (RBAC) - -Each user has one or more **Roles**. Each role has one or more **Privileges**. -A privilege represents a group of _actions_ and the _resources_ those actions apply to. - -By default no user exists so the ONLY way to act is to connect locally to the server. -This is the "localhost exception" and it closes after the _first_ user is created. 
- -> **WARN**: Always create an admin user first (ideally with the `userAdmin` role) - -Role's Resources: - -- specific database and collection: `{ "db": "", "collection": "" }` -- all databases and collections: `{ "db": "", "collection": "" }` -- any databases and specific collection: `{ "db": "", "collection": "" }` -- specific database and any collection: `{ "db": "", "collection": "" }` -- cluster resource: `{ "cluster": true }` - -Role's Privileges: `{ resource: { }, actions: [ "" ] }` - -A role can _inherit_ from multiple others and can define **network restrictions** such as _Server Address_ and _Client Source_. - -Built-in Roles Groups and Names: - -- Database User: `read`, `readWrite`, `readAnyDatabase`, `readWriteAnyDatabase` -- Database Administration: `dbAdmin`, `userAdmin`, `dbOwner`, `dbAdminAnyDatabase`, `userAdminAnyDatabase` -- Cluster Administration: `clusterAdmin`, `clusterManager`, `clusterMonitor`, `hostManager` -- Backup/Restore: `backup`, `restore` -- Super User: `root` - -```sh -db.createUser( - { - user: "", - pwd: "", - roles: [ { role: "", db: "" } ] - } -) - -# add role to existing user -db.grantRolesToUser( "", [ { db: "", role: "" } ] ) - -# show role privilege -db.runCommand( { rolesInfo: { db: "", role: "" }, showPrivileges: true } ) -``` - -### [Replica set](https://docs.mongodb.com/manual/replication/) - -A **replica set** in MongoDB is a group of `mongod` processes that maintain the `same dataset`. Replica sets provide redundancy and high availability, and are the basis for all production deployments. - -### Sharding - -**Sharding** is a MongoDB concept through which big datasets are subdivided in smaller sets and distributed towards multiple instances of MongoDB. -It's a technique used to improve the performances of large queries towards large quantities of data that require al lot of resources from the server. 
- -A collection containing several documents is splitted in more smaller collections (_shards_) -Shards are implemented via cluster that are none other a group of MongoDB instances. - -Shard components are: - -- Shards (min 2), instances of MongoDB that contain a subset of the data -- A config server, instance of MongoDB which contains metadata on the cluster, that is the set of instances that have the shard data. -- A router (or `mongos`), instance of MongoDB used to redirect the user instructions from the client to the correct server. - -![Shared Cluster](../img/mongodb_shared-cluster.png "Components of a shared cluster") - ---- - -## [Aggregation Framework](https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/) - -Sequence of operations applied to a collection as a _pipeline_ to get a result: `db.collection.aggregate(pipeline, options)`. -Each step of the pipeline acts on its inputs and not on the original data in the collection. - -### Variables - -Variable syntax in aggregations: - -- `$key`: field path -- `$$UPPERCASE`: system variable (e.g.: `$$CURRENT`) -- `$$foo`: user defined variable - -### Aggregation Syntax - -```sh - -db..aggregate([ - { "$project": { "_id": 0, "": 1, ...} }, - - { "$match": { "" } }, - - { "$group": { - "_id": "", # Group By Expression (Required) - "": { "": "" }, - ... - } - }, - - { - "$lookup": { - "from": "", - "localField": "", - "foreignField": "", - "as": "" - } - }, - - { "$sort": { "": "", "": "", ... } }, - - { "$count": "" }, - - { "$skip": "" } - - { "$limit": "" } - - { ... } -]) -``` +# MongoDB + +The database is a container of **collections**. The collections are containers of **documents**. + +The documents are _schema-less_ that is they have a dynamic structure that can change between documents in the same collection. 
+ +## Data Types + +| Type | Example Value | Function | +| ----------------- | ------------------------------------------------ | ----------------------- | +| Text | `"Text"` | | +| Boolean | `true` | | +| Number | `42` | | +| ObjectId | `"_id": {"$oid": "<id>"}` | `ObjectId("<id>")` | +| ISODate | `"<key>": {"$date": "YYYY-MM-DDThh:mm:ss.sssZ"}` | `ISODate("YYYY-MM-DD")` | +| Timestamp | | `Timestamp(11421532)` | +| Embedded Document | `{"a": {...}}` | | +| Embedded Array | `{"b": [...]}` | | + +It's mandatory for each document to have a unique field `_id`. +MongoDB automatically creates an `ObjectId()` if it's not provided. + +## Databases & Collections Usage + +To create a database it's sufficient to switch to a non-existing one with `use <database>` (implicit creation). +The database is not actually created until a document is inserted. + +```sh +show dbs # list all databases +use <database> # use a particular database +show collections # list all collections for the current database + +db.dropDatabase() # delete current database + +db.createCollection(<name>, { <options> }) # explicit collection creation +db.<collection>.insertOne({ <document> }) # implicit collection creation +``` + +## Operators (MQL Syntax) + +```json +/* --- Update operators --- */ +{ "$inc": { "<key>": "<value>", ... } } // Increment value +{ "$set": { "<key>": "<value>", ... } } // Set value +{ "$push": { "<key>": "<value>", ... } } // add a value to an array field or turn field into array + +/* --- Query Operators --- */ +{ "<key>": { "$in": [ "<value>", "<value>", ...] } } // Membership +{ "<key>": { "$nin": [ "<value>", "<value>", ...] } } // Membership +{ "<key>": { "$exists": true } } // Field Exists + +/* --- Comparison Operators (DEFAULT: $eq) --- */ +{ "<key>": { "$gt": "<value>" }} // > +{ "<key>": { "$gte": "<value>" }} // >= +{ "<key>": { "$lt": "<value>" }} // < +{ "<key>": { "$lte": "<value>" }} // <= +{ "<key>": { "$eq": "<value>" }} // == +{ "<key>": { "$ne": "<value>" }} // != + +/* --- Logic Operators (DEFAULT $and) --- */ +{ "$and": [ { "<expression>" }, ...] } +{ "$or": [ { "<expression>" }, ...] } +{ "$nor": [ { "<expression>" }, ...] 
} +{ "$not": { "<expression>" } } + +/* --- Array Operators --- */ +{ "<key>": { "$all": [ "<value>", "<value>", ...] } } // field contains all values +{ "<key>": { "$size": "<number>" } } +{ "<key>": { "$elemMatch": { "<key>": "<value>" } } } // elements in array must match an expression + +/* --- REGEX Operator --- */ +{ "<key>": { "$regex": "/pattern/", "$options": "<options>" } } +{ "<key>": { "$regex": "pattern", "$options": "<options>" } } +{ "<key>": { "$regex": "/pattern/" } } +{ "<key>": "/pattern/" } +``` + +### Expressive Query Operator + +> **Note**: `$` is used to access the value of the field dynamically + +```json +{ "$expr": { "<expression>" } } // aggregation expression, variables, conditional expressions +{ "$expr": { "$<comparison operator>": [ "$<key>", "$<key>" ] } } // compare field values (operators use aggregation syntax) +``` + +## Mongo Query Language (MQL) + +### Insertion + +It's possible to insert a single document with the command `insertOne()` or multiple documents with `insertMany()`. + +Insertion results: + +- error -> rollback +- success -> the entire document is saved + +```sh +# explicit collection creation, all options are optional +db.createCollection( <name>, + { + capped: <boolean>, + autoIndexId: <boolean>, + size: <number>, + max: <number>, + storageEngine: <document>, + validator: <document>, + validationLevel: <string>, + validationAction: <string>, + indexOptionDefaults: <document>, + viewOn: <string>, + pipeline: <pipeline>, + collation: <document>, + writeConcern: <document> + } +) + +db.createCollection("<name>", { capped: true, size: max_bytes, max: max_docs_num } ) # creation of a capped collection +# SIZE: int - will be rounded up to a multiple of 256 + +# implicit creation at doc insertion +db.<collection>.insertOne({ <document> }, <options>) # insert a document in a collection +db.<collection>.insertMany([ { <document> }, { <document> }, ... ], <options>) # insert multiple docs +db.<collection>.insertMany([ { <document> }, { <document> } ], { "ordered": false }) # allow unordered insertion: only the documents that cause errors won't be inserted +``` + +> **Note**: If an ordered `insertMany()` fails, the documents already inserted are not rolled back, but all the subsequent ones (even the valid ones) will not be inserted.
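The ordered vs unordered behavior described in the note can be sketched with a small in-memory simulation (plain Python, not a MongoDB client; a duplicate `_id` stands in for any failing document):

```python
# Minimal sketch of insertMany() error semantics (not real MongoDB):
# ordered inserts stop at the first failing document, unordered inserts
# skip it and keep going. Inserted documents are never rolled back.

def insert_many(collection, docs, ordered=True):
    errors = []
    for doc in docs:
        if doc["_id"] in {d["_id"] for d in collection}:  # duplicate _id -> error
            errors.append(doc["_id"])
            if ordered:
                break     # ordered: abort all remaining inserts
            continue      # unordered: skip only the failing doc
        collection.append(doc)
    return errors

docs = [{"_id": 1}, {"_id": 1}, {"_id": 2}]  # second doc is a duplicate

coll = []
insert_many(coll, docs, ordered=True)
print([d["_id"] for d in coll])   # [1] -> insertion stopped at the duplicate

coll = []
insert_many(coll, docs, ordered=False)
print([d["_id"] for d in coll])   # [1, 2] -> duplicate skipped, rest inserted
```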
+ +### Querying + +```sh +db.<collection>.findOne() # find only one document +db.<collection>.find(<filter>) # show selected documents +db.<collection>.find().pretty() # show documents formatted +db.<collection>.find().limit(n) # show n documents +db.<collection>.find().limit(n).skip(k) # show n documents skipping k docs +db.<collection>.find().count() # number of found docs +db.<collection>.find().sort({ "<key>": 1, ... , "<key>": -1 }) # show documents sorted by specified keys in ascending (1) or descending (-1) order + +# projection +db.<collection>.find(<filter>, { "<key>": 1 }) # show only selected values from documents (1 or true => show, 0 or false => don't show; can't mix 0 and 1) +db.<collection>.find(<filter>, { _id: 0, "<key>": 1 }) # only _id can be set to 0 with other keys at 1 +db.<collection>.find(<filter>, { "<array-key>": { "$elemMatch": { "<key>": "<value>" } } }) # project only elements matching the expression + +# sub documents & arrays +db.<collection>.find({ "<key>.<sub-key>": "<value>" }) +db.<collection>.find({ "<array-key>.<index>": "<value>" }) + +# GeoJSON - https://docs.mongodb.com/manual/reference/operator/query/near/index.html +db.<collection>.find( + { + <location field>: { + $near: { + $geometry: { type: "Point", coordinates: [ <longitude>, <latitude> ] }, + $maxDistance: <distance in meters>, + $minDistance: <distance in meters> + } + } + } +) + +db.<collection>.find().hint( { "<key>": 1 } ) # specify the index to use +db.<collection>.find().hint( "<index-name>" ) # specify the index using the index name + +db.<collection>.find().hint( { $natural : 1 } ) # force the query to perform a forwards collection scan +db.<collection>.find().hint( { $natural : -1 } ) # force the query to perform a reverse collection scan +``` + +> **Note**: `{ <key>: <value> }` in case of an array field will match if the array _contains_ the value + +### Updating + +[Update Operators](https://docs.mongodb.com/manual/reference/operator/update/ "Update Operators Documentation") + +```sh +db.<collection>.replaceOne(<filter>, <update>, <options>) +db.<collection>.updateOne(<filter>, <update>, { upsert: true }) # modify document if existing, insert otherwise + +db.<collection>.updateOne(<filter>, { "$push": { ... }, "$set": { ... }, "$inc": { ... 
} }) +``` + +### Deletion + +```sh +db.<collection>.deleteOne(<filter>, <options>) +db.<collection>.deleteMany(<filter>, <options>) + +db.<collection>.drop() # delete whole collection +db.dropDatabase() # delete entire database +``` + +--- + +## MongoDB Database Tools + +### [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/) + +Utility to import all docs into a specified collection. +If the collection already exists, `--drop` deletes it before reuploading it. +**WARNING**: CSV separators must be commas (`,`) + +```sh +mongoimport <file to import> + +--uri=<connection string> +--host=<hostname:port>, -h=<hostname:port> +--username=<username>, -u=<username> +--password=<password>, -p=<password> +--collection=<collection>, -c=<collection> # Specifies the collection to import. +--ssl # Enables connection to a mongod or mongos that has TLS/SSL support enabled. +--type=<json|csv|tsv> # Specifies the file type to import. DEFAULT: json +--drop # drops the collection before importing the data from the input. +--headerline # if file is CSV and first line is header +--jsonArray # Accepts the import of data expressed with multiple MongoDB documents within a single json array. MAX 16 MB +``` + +### [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/) + +Utility to export documents into a specified file. + +```sh +mongoexport --collection=<collection> + +--uri=<connection string> +--host=<hostname:port>, -h=<hostname:port> +--username=<username>, -u=<username> +--password=<password>, -p=<password> +--db=<database>, -d=<database> +--collection=<collection>, -c=<collection> +--type=<json|csv> +--out=<file>, -o=<file> # Specifies a file to write the export to. DEFAULT: stdout +--jsonArray # Write the entire contents of the export as a single json array. +--pretty # Outputs documents in a pretty-printed JSON format. +--skip=<number> +--limit=<number> # Specifies a maximum number of documents to include in the export +--sort=<json> # Specifies an ordering for exported results +``` + +### [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs] + +`mongodump` exports the content of a running server into `.bson` files. + +`mongorestore` restores backups generated with `mongodump` to a running server.
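As a minimal backup/restore sketch (hypothetical URI and output directory; both commands need a reachable `mongod`):

```sh
# dump every database of the server into ./dump/<db>/<collection>.bson
mongodump --uri="mongodb://localhost:27017" --out=dump/

# restore the backup, dropping each existing collection before reloading it
mongorestore --uri="mongodb://localhost:27017" --drop dump/
```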
+ +[mongodump_docs]: https://docs.mongodb.com/database-tools/mongodump/ +[mongorestore_docs]: https://docs.mongodb.com/database-tools/mongorestore/ + +--- + +## [Indexes](https://docs.mongodb.com/manual/indexes/ "Index Documentation") + +Indexes support the efficient execution of queries in MongoDB. + +Without indexes, MongoDB must perform a _collection scan_ (_COLLSCAN_): scan every document in a collection to select the documents that match the query statement. +If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect (_IXSCAN_). + +Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index. + +Indexes _slow down write operations_ since the index must be updated on every write. + +![IXSCAN](../img/mongodb_ixscan.png ".find() using an index") + +### [Index Types](https://docs.mongodb.com/manual/indexes/#index-types) + +- **Normal**: single field, sorted by value +- **Compound**: multiple fields, sorted by value +- **Multikey**: indexes the values of array fields +- **Text**: ordered text fragments +- **Geospatial**: ordered geodata + +**Sparse** indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. + +### Diagnosis and query planning + +```sh +db.<collection>.find({ ... }).explain() # explain won't accept other functions +db.<collection>.explain().find({ ... }) # can accept other functions +db.<collection>.explain("executionStats").find({ ... }) # more info +``` + +### Index Creation + +```sh +db.<collection>.createIndex( <key and index type specification>, <options> ) + +db.<collection>.createIndex( { "<key>": <1 | -1>, "<key>": <1 | -1>, ... 
} ) # normal, compound or multikey (field is an array) index +db.<collection>.createIndex( { "<key>": "text" } ) # text index +db.<collection>.createIndex( { "<key>": "2dsphere" } ) # geospatial 2dsphere index + +# sparse index +db.<collection>.createIndex( + { "<key>": <1 | -1>, "<key>": <1 | -1>, ... }, + { sparse: true } # sparse option +) + +# custom name +db.<collection>.createIndex( + { <key and index type specification> }, + { name: "<index-name>" } # name option +) +``` + +### [Index Management](https://docs.mongodb.com/manual/tutorial/manage-indexes/) + +```sh +# view all db indexes +db.getCollectionNames().forEach(function(collection) { + indexes = db[collection].getIndexes(); + print("Indexes for " + collection + ":"); + printjson(indexes); +}); +db.<collection>.getIndexes() # view collection's indexes + +db.<collection>.dropIndexes() # drop all indexes +db.<collection>.dropIndex( "<index-name>" ) # drop a specific index +``` + +--- + +## Cluster Administration + +### `mongod` + +`mongod` is the main daemon process for MongoDB. It's the core process of the database, +handling connections, requests and persisting the data. + +`mongod` default configuration: + +- port: `27017` +- dbpath: `/data/db` +- bind_ip: `localhost` +- auth: disabled + +[`mongod` config file][mongod_config_file] +[`mongod` command line options][mongod_cli_options] + +[mongod_config_file]: https://www.mongodb.com/docs/manual/reference/configuration-options "`mongod` config file docs" +[mongod_cli_options]: https://www.mongodb.com/docs/manual/reference/program/mongod/#options "`mongod` command line options docs" + +### Basic Shell Helpers + +```sh +db.<method>() # database interaction +db.<collection>.<method>() # collection interaction +rs.<method>(); # replica set deployment and management +sh.<method>(); # sharded cluster deployment and management + +# user management +db.createUser() +db.dropUser() + +# collection management +db.renameCollection() +db.<collection>.createIndex() +db.<collection>.drop() + +# database management +db.dropDatabase() +db.createCollection() + +# database status +db.serverStatus() + +# database command (underlies the shell helpers and drivers) +db.runCommand({ "<command>" }) + +# help 
+db.commandHelp("<command>") +``` + +### Logging + +The **process log** displays activity on the MongoDB instance and collects activities of various components: + +Log Verbosity Level: + +- `-1`: Inherit from parent +- `0`: Default Verbosity (Information) +- `1 - 5`: Increases the verbosity up to Debug messages + +```sh +db.getLogComponents() # get components and their verbosity +db.adminCommand({ "getLog": "<scope>" }) # retrieve logs (getLog must be run on the admin db -> adminCommand) +db.setLogLevel(<level>, "<component>"); # set log level (output is OLD verbosity levels) + +tail -f /path/to/mongod.log # read end of log file +``` + +> **Note**: Log Message Structure: `<timestamp> <severity> <component> <context> <message> ...` + +### Database Profiling + +Profiling Levels: + +- `0`: no profiling +- `1`: data on operations slower than `slowms` (default 100ms) +- `2`: data on all operations + +Events captured by the profiler: + +- CRUD operations +- Administrative operations +- Configuration operations + +> **Note**: Logs are saved in the `system.profile` _capped_ collection. + +```sh +db.setProfilingLevel(<level>) # set profiler level +db.setProfilingLevel(1, { slowms: <ms> }) +db.getProfilingStatus() # check profiler status + +db.system.profile.find().limit(n).sort( {} ).pretty() # see logs +db.system.profile.find().limit(n).sort( { ts: -1 } ).pretty() # sort by decreasing timestamp +``` + +### Authentication + +Client Authentication Mechanisms: + +- **SCRAM** (Default): Salted Challenge Response Authentication Mechanism +- **X.509**: `X.509` Certificate +- **LDAP**: Lightweight Directory Access Protocol (Enterprise Only) +- **KERBEROS** (Enterprise Only) + +Cluster Authentication Mechanism: a shared **keyfile** or **X.509** certificates for intra-cluster membership. + +### Authorization: Role Based Access Control (RBAC) + +Each user has one or more **Roles**. Each role has one or more **Privileges**. +A privilege represents a group of _actions_ and the _resources_ those actions apply to. + +By default no user exists, so the ONLY way to act is to connect locally to the server.
+This is the "localhost exception", and it closes after the _first_ user is created. + +> **Warn**: Always create an admin user first (ideally with the `userAdmin` role) + +Role's Resources: + +- specific database and collection: `{ "db": "<database>", "collection": "<collection>" }` +- all databases and collections: `{ "db": "", "collection": "" }` +- any database and a specific collection: `{ "db": "", "collection": "<collection>" }` +- a specific database and any collection: `{ "db": "<database>", "collection": "" }` +- cluster resource: `{ "cluster": true }` + +Role's Privileges: `{ resource: { <resource> }, actions: [ "<action>" ] }` + +A role can _inherit_ from multiple others and can define **network restrictions** such as _Server Address_ and _Client Source_. + +Built-in Roles Groups and Names: + +- Database User: `read`, `readWrite`, `readAnyDatabase`, `readWriteAnyDatabase` +- Database Administration: `dbAdmin`, `userAdmin`, `dbOwner`, `dbAdminAnyDatabase`, `userAdminAnyDatabase` +- Cluster Administration: `clusterAdmin`, `clusterManager`, `clusterMonitor`, `hostManager` +- Backup/Restore: `backup`, `restore` +- Super User: `root` + +```sh +db.createUser( + { + user: "<username>", + pwd: "<password>", + roles: [ { role: "<role>", db: "<database>" } ] + } +) + +# add role to existing user +db.grantRolesToUser( "<username>", [ { db: "<database>", role: "<role>" } ] ) + +# show role privileges +db.runCommand( { rolesInfo: { db: "<database>", role: "<role>" }, showPrivileges: true } ) +``` + +### [Replica set](https://docs.mongodb.com/manual/replication/) + +A **replica set** in MongoDB is a group of `mongod` processes that maintain the _same dataset_. Replica sets provide redundancy and high availability, and are the basis for all production deployments. + +### Sharding + +**Sharding** is a MongoDB concept through which big datasets are subdivided into smaller sets and distributed across multiple instances of MongoDB. +It's a technique used to improve the performance of queries over large quantities of data that require a lot of resources from the server.
+
+A collection containing several documents is split into multiple smaller collections (_shards_).
+Shards are implemented via clusters, which are simply groups of MongoDB instances.
+
+Shard components are:
+
+- Shards (min 2): instances of MongoDB that contain a subset of the data
+- A config server: an instance of MongoDB which contains metadata on the cluster, that is the set of instances that hold the shard data.
+- A router (or `mongos`): an instance of MongoDB used to redirect the user instructions from the client to the correct server.
+
+![Sharded Cluster](../img/mongodb_shared-cluster.png "Components of a sharded cluster")
+
+---
+
+## [Aggregation Framework](https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/)
+
+A sequence of operations applied to a collection as a _pipeline_ to get a result: `db.collection.aggregate(pipeline, options)`.
+Each step of the pipeline acts on its inputs and not on the original data in the collection.
+
+### Variables
+
+Variable syntax in aggregations:
+
+- `$key`: field path
+- `$$UPPERCASE`: system variable (e.g.: `$$CURRENT`)
+- `$$foo`: user defined variable
+
+### [`$match` Aggregation Stage][$match_docs]
+
+Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
+
+```sh
+db.<collection>.aggregate([
+  { "$match": { <query> } },
+
+  # key exists and is an array
+  { $match: { "<array-key>": { $elemMatch: { $exists: true } } } }
+])
+```
+
+> **Note**: `$match` can contain the `$text` query operation but it **must** be the _first_ stage in the pipeline
+> **Note**: `$match` cannot use `$where`
+> **Note**: `$match` uses the same syntax as `find()`
+
+[$match_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/match/ "$match operator docs"
+
+### [`$project` Aggregation Stage][$project_docs]
+
+Passes along the documents with the requested fields to the next stage in the pipeline. 
The specified fields can be existing fields from the input documents or newly computed fields.
+
+`$project` Array Expression Operators:
+
+- [`$filter`][$filter_docs]
+- [`$map`][$map_docs]
+- [`$reduce`][$reduce_docs]
+
+`$project` Arithmetic Expression Operators:
+
+- [`$max`][$max_docs]
+- [`$min`][$min_docs]
+- [`$sum`][$sum_docs]
+- [`$avg`][$avg_docs]
+
+```sh
+db.<collection>.aggregate([
+  {
+    "$project": {
+      "_id": 0, # discard value
+      "<field>": 1, # keep value
+      "<field>": "$<other-field>", # reassign or create field
+      "<field>": { <expression> }, # calculate field value
+
+      # filter elements in an array
+      "<field>": {
+        "$filter": {
+          "input": "$<array>",
+          "as": "<item>",
+          "cond": { <bool-expression> }
+        }
+      },
+
+      # transform array items
+      "<field>": {
+        "$map": {
+          "input": "$<array>",
+          "as": "<item>",
+          "in": { <expression> }
+          # $$<item> is the current item's value
+        }
+      },
+
+      # apply expression to each element in an array and combine them
+      "<field>": {
+        "$reduce": {
+          "input": "$<array>",
+          "initialValue": <value>,
+          "in": { <expression> }
+          # $$this is the current document, $$value is the current accumulated value
+        }
+      }
+    }
+  }
+])
+```
+
+[$project_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/project/ "$project operator docs"
+[$filter_docs]: https://www.mongodb.com/docs/v4.4/reference/operator/aggregation/filter/ "$filter operator docs"
+[$map_docs]: https://www.mongodb.com/docs/v4.4/reference/operator/aggregation/map/ "$map operator docs"
+[$reduce_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/reduce/ "$reduce operator docs"
+
+[$sum_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/sum/ "$sum operator docs"
+[$max_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/max/ "$max operator docs"
+[$min_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/min/ "$min operator docs"
+[$avg_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/avg/ "$avg operator docs"
+
+### [`$addFields` Aggregation Stage][$addFields_docs]
+
+Adds new fields to documents (these can be the result of 
computation).
+`$addFields` outputs documents that contain _all existing fields_ from the input documents and the newly added fields.
+
+```sh
+db.<collection>.aggregate([
+  { $addFields: { <field>: <value>, ... } }
+])
+```
+
+[$addFields_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/addFields/ "$addFields operator docs"
+
+### [`$group` Aggregation Stage][$group_docs]
+
+The `$group` stage separates documents into groups according to a "group key". The output is one document for each unique group key.
+
+```sh
+db.<collection>.aggregate([
+  {
+    "$group": {
+      "_id": <expression>, # Group By Expression (Required)
+      "<field>": { "<accumulator>": <expression> },
+      ...
+    }
+  }
+])
+```
+
+[$group_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/group/ "$group operator docs"
+
+### [`$unwind` Aggregation Stage][$unwind_docs]
+
+Deconstructs an array field from the input documents to output a document for each element.
+Each output document is the input document with the value of the array field replaced by the element.
+
+```sh
+db.<collection>.aggregate([
+  { "$unwind": "$<array-field>" },
+
+  {
+    "$unwind": {
+      "path": "$<array-field>", # array to unwind
+      "includeArrayIndex": "<index-field>", # name of index field
+      "preserveNullAndEmptyArrays": <bool>
+    }
+  }
+], { "allowDiskUse": <bool> })
+```
+
+[$unwind_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/unwind/ "$unwind operator docs"
+
+### [`$count` Aggregation Stage][$count_docs]
+
+```sh
+db.<collection>.aggregate([
+  { "$count": "<count-field>" }
+])
+```
+
+[$count_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/count/ "$count operator docs"
+
+### [`$sort` Aggregation Stage][$sort_docs]
+
+```sh
+db.<collection>.aggregate([
+  {
+    "$sort": {
+      "<field>": <sort order>,
+      "<field>": <sort order>,
+      ...
+    }
+  }
+], { "allowDiskUse": <bool> })
+```
+
+> **Note**: `$sort` can take advantage of indexes if placed early in the pipeline and before any `$project`, `$group` and `$unwind`
+> **Note**: By default `$sort` will use up to 10 MB of RAM. 
Setting `allowDiskUse: true` will allow for larger sorts.
+
+[$sort_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/sort/ "$sort operator docs"
+
+### [`$skip` Aggregation Stage][$skip_docs]
+
+```sh
+db.<collection>.aggregate([
+  { "$skip": <positive integer> }
+])
+```
+
+[$skip_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/skip/ "$skip operator docs"
+
+### [`$limit` Aggregation Stage][$limit_docs]
+
+```sh
+db.<collection>.aggregate([
+  { "$limit": <positive integer> }
+])
+```
+
+[$limit_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/limit/ "$limit operator docs"
+
+### [`$lookup` Aggregation Stage][$lookup_docs]
+
+Performs a left outer join to a collection _in the same database_ to filter in documents from the "joined" collection for processing.
+The `$lookup` stage adds a new array field to each input document. The new array field contains the matching documents from the "joined" collection.
+
+> **Note**: To combine elements from two different collections, use the [`$unionWith`][$unionWith_docs] pipeline stage.
+
+```sh
+db.<collection>.aggregate([
+  {
+    "$lookup": {
+      "from": "<other-collection>",
+      "localField": "<field>",
+      "foreignField": "<other-collection-field>",
+      "as": "<output-array-field>"
+    }
+  }
+])
+```
+
+[$lookup_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/ "$lookup operator docs"
+
+[$unionWith_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/unionWith/ "$unionWith operator docs"
+
+### [`$graphLookup` Aggregation Stage][$graph_lookup_docs]
+
+Performs a recursive search on a collection, with options for restricting the search by recursion depth and query filter.
+
+The connection between documents follows `<from_collection>.<connectFromField>` => `<from_collection>.<connectToField>`. 
The collection on which the aggregation is performed and the `from` collection can be the same (in-collection search) or different (cross-collection search).
+
+```sh
+db.<collection>.aggregate([
+  {
+    $graphLookup: {
+      from: <collection>, # starting collection of the search
+      startWith: <expression>, # initial value(s) of the search
+      connectFromField: <string>, # source of the connection
+      connectToField: <string>, # destination of the connection
+      as: <string>, # array of found documents
+      maxDepth: <number>, # recursive search depth limit (steps inside the from collection)
+      depthField: <string>, # field containing distance from start
+      restrictSearchWithMatch: <document> # filter on found documents
+    }
+  }
+], { allowDiskUse: true })
+```
+
+> **Note**: Having the `connectToField` indexed will improve search performance
+> **Warn**: Can exceed the `100 MB` memory limit even with `{ allowDiskUse: true }`
+
+[$graph_lookup_docs]: https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/graphLookup/ "$graphLookup operator docs"
+
+### [`$sortByCount` Aggregation Stage][$sort_by_count_docs]
+
+Groups incoming documents based on the value of a specified expression, then computes the count of documents in each distinct group.
+
+Each output document contains two fields: an `_id` field containing the distinct grouping value, and a `count` field containing the number of documents belonging to that grouping or category.
+
+The documents are sorted by count in descending order.
+
+```sh
+db.<collection>.aggregate([
+  { $sortByCount: <expression> }
+])
+```
+
+[$sort_by_count_docs]: https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/sortByCount/ "$sortByCount operator docs"
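+
+As an illustrative sketch (the `restaurants` collection and `cuisine` field are hypothetical names, and a running instance is required), `$sortByCount` behaves like a `$group` counting stage followed by a descending `$sort` on the count:
+
+```sh
+# count restaurants per cuisine, most common first
+db.restaurants.aggregate([
+  { $sortByCount: "$cuisine" }
+])
+
+# equivalent explicit pipeline
+db.restaurants.aggregate([
+  { $group: { _id: "$cuisine", count: { $sum: 1 } } },
+  { $sort: { count: -1 } }
+])
+```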