diff --git a/docs/database/mongo-db.md b/docs/database/mongo-db.md
index 28552fd..db214fc 100644
--- a/docs/database/mongo-db.md
+++ b/docs/database/mongo-db.md
@@ -6,7 +6,7 @@ The documents are _schema-less_, that is, they have a dynamic structure that can c

 ## Data Types

-| Tipo | Documento | Funzione |
+| Type | Example Value | Function |
 | ----------------- | ------------------------------------------------ | ----------------------- |
 | Text              | `"Text"`                                          |                         |
 | Boolean           | `true`                                            |                         |
@@ -84,9 +84,9 @@ db.<collection>.insertOne({document}) # implicit collection creation
 { "$expr": { "$<comparison>": [ "$<field>", "$<field>" ] } } // compare field values (operators use aggregation syntax)
 ```

-## CRUD Operations
+## Mongo Query Language (MQL)

-### Create
+### Insertion

 It's possible to insert a single document with the command `insertOne()` or multiple documents with `insertMany()`.
@@ -126,7 +126,7 @@ db.<collection>.insertMany([ { document }, { document } ], { "ordered": false }

 > **Note**: If `insertMany()` fails, the already inserted documents are not rolled back, but all the subsequent ones (even the correct ones) will not be inserted.

-### Read
+### Querying

 ```sh
 db.<collection>.findOne() # find only one document
@@ -168,7 +168,7 @@ db.<collection>.find().hint( { $natural : -1 } ) # force the query to perform a

 > **Note**: `{ <key>: <value> }` will match an array field if the array _contains_ the value

-### Update
+### Updating

 [Update Operators](https://docs.mongodb.com/manual/reference/operator/update/ "Update Operators Documentation")
@@ -179,7 +179,7 @@ db.<collection>.updateOne(filter, update, {upsert: true}) # modify document if

 db.<collection>.updateOne(filter, { "$push": { ... }, "$set": { ... }, "$inc": { ... }, ... })
 ```

-### Delete
+### Deletion

 ```sh
 db.<collection>.deleteOne(filter, options)
@@ -189,7 +189,11 @@ db.<collection>.drop() # delete whole collection

 db.dropDatabase() # delete entire database
 ```

-## [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/)
+---
+
+## MongoDB Database Tools
+
+### [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/)

 Utility to import all docs into a specified collection. If the collection already exists, `--drop` deletes it before re-uploading it.
@@ -210,7 +214,7 @@ mongoimport
 --jsonArray # Accepts the import of data expressed with multiple MongoDB documents within a single JSON array. MAX 16 MB
 ```

-## [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/)
+### [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/)

 Utility to export documents into a specified file.
@@ -232,7 +236,7 @@ mongoexport --collection=<collection>
 --sort=<JSON> # Specifies an ordering for exported results
 ```

-## [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs]
+### [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs]

 `mongodump` exports the content of a running server into `.bson` files.
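+
+A minimal dump/restore round trip might look like this (a sketch: the URI, database name and `./dump` path are illustrative):
+
+```sh
+# dump a single database into ./dump/<database> as .bson files
+mongodump --uri="mongodb://localhost:27017" --db=<database> --out=./dump
+
+# restore the dump into a running server, dropping existing collections first
+mongorestore --uri="mongodb://localhost:27017" --drop ./dump
+```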
@@ -241,47 +245,7 @@ mongoexport --collection=<collection>
 [mongodump_docs]: https://docs.mongodb.com/database-tools/mongodump/
 [mongorestore_docs]: https://docs.mongodb.com/database-tools/mongorestore/

-## Relations
-
-**Nested / Embedded Documents**:
-
-- Group data logically
-- Optimal for data belonging together that do not overlap
-- Should avoid nesting too deep or making too long arrays (max doc size 16 MB)
-
-```json
-{
-  "_id": "ObjectId()",
-  "<key>": "value",
-  "<key>": "value",
-
-  "innerDocument": {
-    "<key>": "value",
-    "<key>": "value"
-  }
-}
-```
-
-**References**:
-
-- Divide data between collections
-- Optimal for related but shared data used in relations or stand-alone
-- Allows to overtake nesting and size limits
-
-NoSQL databases do not have relations and references. It's the app that has to handle them.
-
-```json
-{
-  "<key>": "value",
-  "references": ["id1", "id2"]
-}
-
-// referenced
-{
-  "_id": "id1",
-  "<key>": "value"
-}
-```
+---

 ## [Indexes](https://docs.mongodb.com/manual/indexes/ "Index Documentation")
@@ -351,15 +315,94 @@ db.<collection>.dropIndexes() # drop all indexes
 db.<collection>.dropIndex( { "index-name": 1 } ) # drop a specific index
 ```

-## Database Profiling
+---
+
+## Cluster Administration
+
+### `mongod`
+
+`mongod` is the main daemon process for MongoDB. It's the core process of the database,
+handling connections and requests, and persisting the data.
+
+`mongod` default configuration:
+
+- port: `27017`
+- dbpath: `/data/db`
+- bind_ip: `localhost`
+- auth: disabled
+
+[`mongod` config file][mongod_config_file]
+[`mongod` command line options][mongod_cli_options]
+
+[mongod_config_file]: https://www.mongodb.com/docs/manual/reference/configuration-options "`mongod` config file docs"
+[mongod_cli_options]: https://www.mongodb.com/docs/manual/reference/program/mongod/#options "`mongod` command line options docs"
+
+### Basic Shell Helpers
+
+```sh
+db.<method>() # database interaction
+db.<collection>.<method>() # collection interaction
+rs.<method>() # replica set deployment and management
+sh.<method>() # sharded cluster deployment and management
+
+# user management
+db.createUser()
+db.dropUser()
+
+# collection management
+db.renameCollection()
+db.<collection>.createIndex()
+db.<collection>.drop()
+
+# database management
+db.dropDatabase()
+db.createCollection()
+
+# database status
+db.serverStatus()
+
+# run a database command (the layer underlying shell helpers and drivers)
+db.runCommand({ "<command>" })
+
+# help
+db.commandHelp("<command>")
+```
+
+### Logging
+
+The **process log** displays activity on the MongoDB instance and collects the activity of its various components.
+
+Log Verbosity Levels:
+
+- `-1`: inherit from parent
+- `0`: default verbosity (Information)
+- `1 - 5`: increases the verbosity up to Debug messages
+
+```sh
+db.getLogComponents() # get components and their verbosity
+db.adminCommand({ "getLog": "<scope>" }) # retrieve logs (getLog must be run on the admin db -> adminCommand)
+db.setLogLevel(<level>, "<component>") # set log level (the output shows the OLD verbosity levels)
+
+tail -f /path/to/mongod.log # read the end of the log file
+```
+
+> **Note**: Log Message Structure: `<timestamp> <severity> <component> [<context>] <message>`
+
+### Database Profiling

 Profiling Levels:

 - `0`: no profiling
-- `1`: data on operations slower than `slowms`
+- `1`: data on operations slower than `slowms` (default 100ms)
 - `2`: data on all operations

-Logs are saved in the `system.profile` _capped_ collection.
+Events captured by the profiler:
+
+- CRUD operations
+- Administrative operations
+- Configuration operations
+
+> **Note**: Logs are saved in the `system.profile` _capped_ collection.
 ```sh
 db.setProfilingLevel(n) # set profiler level
@@ -370,43 +413,68 @@
 db.system.profile.find().limit(n).sort( {} ).pretty() # see logs
 db.system.profile.find().limit(n).sort( { ts: -1 } ).pretty() # sort by decreasing timestamp
 ```

-## Roles and permissions
+### Authentication

-**Authentication**: identifies valid users
-**Authorization**: identifies what a user can do
+Client Authentication Mechanisms:

-- **userAdminAnyDatabase**: can admin every db in the instance (role must be created on admin db)
-- **userAdmin**: can admin the specific db in which is created
-- **readWrite**: can read and write in the specific db in which is created
-- **read**: can read the specific db in which is created
+- **SCRAM** (default): Salted Challenge Response Authentication Mechanism
+- **X.509**: `X.509` certificate
+- **LDAP**: Lightweight Directory Access Protocol (Enterprise only)
+- **KERBEROS** (Enterprise only)
+
+Cluster Authentication Mechanism: **keyfile** or **X.509** certificates for internal (membership) authentication between cluster members.
+
+### Authorization: Role Based Access Control (RBAC)
+
+Each user has one or more **Roles**. Each role has one or more **Privileges**.
+A privilege represents a group of _actions_ and the _resources_ those actions apply to.
+
+By default no user exists, so the ONLY way to act is to connect locally to the server.
+This is the "localhost exception", and it closes after the _first_ user is created.
+
+> **Warn**: Always create an admin user first (ideally with the `userAdmin` role)
+
+Role's Resources:
+
+- specific database and collection: `{ "db": "<database>", "collection": "<collection>" }`
+- all databases and collections: `{ "db": "", "collection": "" }`
+- any database, specific collection: `{ "db": "", "collection": "<collection>" }`
+- specific database, any collection: `{ "db": "<database>", "collection": "" }`
+- cluster resource: `{ "cluster": true }`
+
+Role's Privileges: `{ resource: { <resource> }, actions: [ "<action>" ] }`
+
+A role can _inherit_ from multiple other roles and can define **network restrictions** such as _Server Address_ and _Client Source_.
+
+Built-in Roles Groups and Names:
+
+- Database User: `read`, `readWrite`, `readAnyDatabase`, `readWriteAnyDatabase`
+- Database Administration: `dbAdmin`, `userAdmin`, `dbOwner`, `dbAdminAnyDatabase`, `userAdminAnyDatabase`
+- Cluster Administration: `clusterAdmin`, `clusterManager`, `clusterMonitor`, `hostManager`
+- Backup/Restore: `backup`, `restore`
+- Super User: `root`

 ```sh
-# create users in the current MongoDB instance
 db.createUser(
   {
-    user: "dbAdmin",
-    pwd: "password",
-    roles: [
-      {
-        role: "userAdminAnyDatabase",
-        db: "admin"
-      }
-    ]
-  },
-  {
-    user: "username",
-    pwd: "password",
-    roles: [
-      {
-        role: "role",
-        db: "database"
-      }
-    ]
+    user: "<username>",
+    pwd: "<password>",
+    roles: [ { role: "<role>", db: "<database>" } ]
   }
 )
+
+# add a role to an existing user
+db.grantRolesToUser( "<username>", [ { db: "<database>", role: "<role>" } ] )
+
+# show a role's privileges
+db.runCommand( { rolesInfo: { db: "<database>", role: "<role>" }, showPrivileges: true } )
 ```

-## Sharding
+### [Replica set](https://docs.mongodb.com/manual/replication/)
+
+A **replica set** in MongoDB is a group of `mongod` processes that maintain the _same dataset_. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
+
+### Sharding

 **Sharding** is a MongoDB concept through which big datasets are subdivided into smaller sets and distributed across multiple instances of MongoDB.
 It's a technique used to improve the performance of large queries over large quantities of data that would otherwise require a lot of resources from the server.
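+
+Basic sharding helpers from the shell (a sketch: database, collection and shard-key names are illustrative; these commands are run against a `mongos`):
+
+```sh
+sh.enableSharding("<database>") # enable sharding on a database
+sh.shardCollection("<database>.<collection>", { "<shard-key>": 1 }) # shard a collection by key
+sh.status() # print the sharded cluster status
+```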
@@ -422,46 +490,274 @@ Shard components are:

 ![Sharded Cluster](../img/mongodb_shared-cluster.png "Components of a sharded cluster")

-### [Replica set](https://docs.mongodb.com/manual/replication/)
-
-A **replica set** in MongoDB is a group of `mongod` processes that maintain the `same dataset`. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
+---

 ## [Aggregation Framework](https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/)

 Sequence of operations applied to a collection as a _pipeline_ to get a result: `db.collection.aggregate(pipeline, options)`.
 Each step of the pipeline acts on its inputs and not on the original data in the collection.

+### Variables
+
+Variable syntax in aggregations:
+
+- `$key`: field path
+- `$$UPPERCASE`: system variable (e.g. `$$CURRENT`)
+- `$$foo`: user defined variable
+
+### [`$match` Aggregation Stage][$match_docs]
+
+Filters the documents, passing only those that match the specified condition(s) to the next pipeline stage.
+
 ```sh
-db.<collection>.aggregate([
-  { "$project": { "_id": 0, "<field>": 1, ... } },
-  { "$match": { <query> } },
-  { "$group": {
+db.<collection>.aggregate([
+  # key exists and is an array
+  { $match: { "<key>": { $elemMatch: { $exists: true } } } }
+])
+```
+
+> **Note**: `$match` can contain the `$text` query operator but it **must** be the _first_ stage of the pipeline
+> **Note**: `$match` cannot use `$where`
+> **Note**: `$match` uses the same query syntax as `find()`
+
+[$match_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/match/ "$match operator docs"
+
+### [`$project` Aggregation Stage][$project_docs]
+
+Passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
+
+`$project` Array Expression Operators:
+
+- [`$filter`][$filter_docs]
+- [`$map`][$map_docs]
+- [`$reduce`][$reduce_docs]
+
+`$project` Arithmetic Expression Operators:
+
+- [`$max`][$max_docs]
+- [`$min`][$min_docs]
+- [`$sum`][$sum_docs]
+- [`$avg`][$avg_docs]
+
+```sh
+db.<collection>.aggregate([
+  {
+    "$project": {
+      "_id": 0, # discard value
+      "<field>": 1, # keep value
+      "<field>": "$<other-field>", # reassign or create a field
+      "<field>": { <expression> }, # calculate the field value
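+
+      # e.g. (illustrative): a computed field using an arithmetic operator,
+      # "total": { "$sum": "$<numeric-array-field>" },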
+
+      # filter elements in an array
+      "<field>": {
+        "$filter": {
+          "input": "$<array-field>",
+          "as": "<name>",
+          "cond": { <bool-expression> }
+        }
+      },
+
+      # transform array items
+      "<field>": {
+        "$map": {
+          "input": "$<array-field>",
+          "as": "<name>",
+          "in": { <expression> }
+          # $$<name> is the current item's value
+        }
+      },
+
+      # apply an expression to each element of an array and combine them
+      "<field>": {
+        "$reduce": {
+          "input": "<array>",
+          "initialValue": <value>,
+          "in": { <expression> }
+          # $$this is the current element, $$value is the current accumulated value
+        }
+      }
+    }
+  }
+])
+```
+
+[$project_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/project/ "$project operator docs"
+[$filter_docs]: https://www.mongodb.com/docs/v4.4/reference/operator/aggregation/filter/ "$filter operator docs"
+[$map_docs]: https://www.mongodb.com/docs/v4.4/reference/operator/aggregation/map/ "$map operator docs"
+[$reduce_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/reduce/ "$reduce operator docs"
+
+[$sum_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/sum/ "$sum operator docs"
+[$max_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/max/ "$max operator docs"
+[$min_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/min/ "$min operator docs"
+[$avg_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/avg/ "$avg operator docs"
+
+### [`$addFields` Aggregation Stage][$addFields_docs]
+
+Adds new fields to documents (the value can be the result of a computation).
+`$addFields` outputs documents that contain _all existing fields_ from the input documents plus the newly added fields.
+
+```sh
+db.<collection>.aggregate([
+  { $addFields: { <field>: <expression>, ... } }
+])
+```
+
+[$addFields_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/addFields/ "$addFields operator docs"
+
+### [`$group` Aggregation Stage][$group_docs]
+
+The `$group` stage separates documents into groups according to a "group key". The output is one document for each unique group key.
+
+```sh
+db.<collection>.aggregate([
+  {
+    "$group": {
       "_id": "<expression>", # Group By Expression (Required)
       "<field>": { "<accumulator>": "<expression>" },
       ...
     }
-  },
-
-  {
-    "$lookup": {
-      "from": "<collection>",
-      "localField": "<field>",
-      "foreignField": "<field>",
-      "as": "<field>"
-    }
-  },
-
-  { "$sort": { "<field>": <order>, "<field>": <order>, ... } },
-
-  { "$count": "<field>" },
-
-  { "$skip": <n> }
-
-  { "$limit": <n> }
-
-  { ... }
+  }
 ])
 ```
+
+[$group_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/group/ "$group operator docs"
+
+### [`$unwind` Aggregation Stage][$unwind_docs]
+
+Deconstructs an array field from the input documents to output a document for each element.
+Each output document is the input document with the value of the array field replaced by the element.
+
+```sh
+db.<collection>.aggregate([
+  { "$unwind": "$<array-field>" } # short form
+
+  { # long form
+    "$unwind": {
+      "path": "$<array-field>", # array to unwind
+      "includeArrayIndex": "<field>", # name of the index field
+      "preserveNullAndEmptyArrays": <bool>
+    }
+  }
+], { "allowDiskUse": <bool> })
+```
+
+[$unwind_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/unwind/ "$unwind operator docs"
+
+### [`$count` Aggregation Stage][$count_docs]
+
+```sh
+db.<collection>.aggregate([
+  { "$count": "<count-field>" }
+])
+```
+
+[$count_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/count/ "$count operator docs"
+
+### [`$sort` Aggregation Stage][$sort_docs]
+
+```sh
+db.<collection>.aggregate([
+  {
+    "$sort": {
+      "<field>": <sort-order>,
+      "<field>": <sort-order>,
+      ...
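+      # (illustrative) <sort-order> is 1 for ascending, -1 for descending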
+    }
+  }
+], { "allowDiskUse": <bool> })
+```
+
+> **Note**: `$sort` can take advantage of indexes if it appears early in the pipeline, before any `$project`, `$group` or `$unwind`
+> **Note**: By default `$sort` will use up to 10 MB of RAM. Setting `allowDiskUse: true` will allow larger sorts
+
+[$sort_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/sort/ "$sort operator docs"
+
+### [`$skip` Aggregation Stage][$skip_docs]
+
+```sh
+db.<collection>.aggregate([
+  { "$skip": <positive-integer> }
+])
+```
+
+[$skip_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/skip/ "$skip operator docs"
+
+### [`$limit` Aggregation Stage][$limit_docs]
+
+```sh
+db.<collection>.aggregate([
+  { "$limit": <positive-integer> }
+])
+```
+
+[$limit_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/limit/ "$limit operator docs"
+
+### [`$lookup` Aggregation Stage][$lookup_docs]
+
+Performs a left outer join to a collection _in the same database_ to filter in documents from the "joined" collection for processing.
+The `$lookup` stage adds a new array field to each input document. The new array field contains the matching documents from the "joined" collection.
+
+> **Note**: To combine elements from two different collections, use the [`$unionWith`][$unionWith_docs] pipeline stage.
+
+```sh
+db.<collection>.aggregate([
+  {
+    "$lookup": {
+      "from": "<foreign-collection>",
+      "localField": "<local-field>",
+      "foreignField": "<foreign-field>",
+      "as": "<output-array-field>"
+    }
+  }
+])
+```
+
+[$lookup_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/ "$lookup operator docs"
+
+[$unionWith_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/unionWith/ "$unionWith operator docs"
+
+### [`$graphLookup` Aggregation Stage][$graph_lookup_docs]
+
+Performs a recursive search on a collection, with options for restricting the search by recursion depth and query filter.
+
+The connection between documents follows `<from-collection>.<connectFromField>` => `<from-collection>.<connectToField>`. The collection on which the aggregation is performed and the `from` collection can be the same (in-collection search) or different (cross-collection search).
+
+```sh
+db.<collection>.aggregate([
+  {
+    $graphLookup: {
+      from: <collection>, # starting collection of the search
+      startWith: <expression>, # initial value(s) of the search
+      connectFromField: <field>, # source of the connection
+      connectToField: <field>, # destination of the connection
+      as: <field>, # array of found documents
+      maxDepth: <number>, # recursive search depth limit (steps inside the from collection)
+      depthField: <field>, # field containing the distance from the start
+      restrictSearchWithMatch: <filter> # filter on found documents
+    }
+  }
+], { allowDiskUse: true })
+```
+
+> **Note**: Having the `connectToField` indexed will improve search performance
+> **Warn**: Can exceed the `100 MB` memory limit even with `{ allowDiskUse: true }`
+
+[$graph_lookup_docs]: https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/graphLookup/ "$graphLookup operator docs"
+
+### [`$sortByCount` Aggregation Stage][$sort_by_count_docs]
+
+Groups incoming documents based on the value of a specified expression, then computes the count of documents in each distinct group.
+
+Each output document contains two fields: an `_id` field containing the distinct grouping value, and a `count` field containing the number of documents belonging to that grouping or category.
+
+The documents are sorted by `count` in descending order.
+
+```sh
+db.<collection>.aggregate([
+  { $sortByCount: <expression> }
+])
+```
+
+[$sort_by_count_docs]: https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/sortByCount/ "$sortByCount operator docs"
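+
+As a closing sketch, several stages can be chained in one pipeline; the collection and field names here (`people`, `hobbies`, `age`) are hypothetical:
+
+```sh
+db.people.aggregate([
+  { "$match": { "age": { "$gte": 18 } } }, # filter early so indexes can be used
+  { "$unwind": "$hobbies" }, # one document per array element
+  { "$group": { "_id": "$hobbies", "count": { "$sum": 1 } } }, # count documents per hobby
+  { "$sort": { "count": -1 } }, # most common first
+  { "$limit": 5 } # keep the top 5
+])
+```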