Feat(mongodb): Add Aggregation Framework Notes Pt. 1 (#8)

* mongodb: `mongod` & logging notes

* mongodb: auth & users

* mongodb: Aggregation Framework Pt. 1
Marcello 2022-11-08 16:18:16 +01:00
parent f3568b7b06
commit 7e06e9db01


@ -6,7 +6,7 @@ The documents are _schema-less_ that is they have a dynamic structure that can c
## Data Types
| Tipo | Documento | Funzione |
| Type | Example Value | Function |
| ----------------- | ------------------------------------------------ | ----------------------- |
| Text | `"Text"` |
| Boolean | `true` |
@ -84,9 +84,9 @@ db.<collection>.insertOne({document}) # implicit collection creation
{ "$expr": { "$<comparison_operator>": [ "$<key>", "$<key>" ] } } // compare field values (operators use aggregation syntax)
```
## CRUD Operations
## Mongo Query Language (MQL)
### Create
### Insertion
It's possible to insert a single document with the command `insertOne()` or multiple documents with `insertMany()`.
@ -126,7 +126,7 @@ db.<collection>.insertMany([ { document }, { document } ] , { "ordered": false }
> **Note**: If `insertMany()` fails the already inserted documents are not rolled back but all the successive ones (even the correct ones) will not be inserted.
### Read
### Querying
```sh
db.<collection>.findOne() # find only one document
@ -168,7 +168,7 @@ db.<collection>.find().hint( { $natural : -1 } ) # force the query to perform a
> **Note**: `{ <key>: <value> }` on an array field matches if the array _contains_ the value
### Update
### Updating
[Update Operators](https://docs.mongodb.com/manual/reference/operator/update/ "Update Operators Documentation")
@ -179,7 +179,7 @@ db.<collection>.updateOne(filter, update, {upsert: true}) # modify document if
db.<collection>.updateOne(filter, { "$push": { ... }, "$set": { ... }, "$inc": { ... }, ... })
```
### Delete
### Deletion
```sh
db.<collection>.deleteOne(filter, options)
@ -189,7 +189,11 @@ db.<collection>.drop() # delete whole collection
db.dropDatabase() # delete entire database
```
## [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/)
---
## MongoDB Database Tools
### [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/)
Utility to import all docs into a specified collection.
If the collection already exists `--drop` deletes it before reuploading it.
@ -210,7 +214,7 @@ mongoimport <options> <connection-string> <file>
--jsonArray # Accepts the import of data expressed as multiple MongoDB documents within a single JSON array. MAX 16 MB
```
## [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/)
### [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/)
Utility to export documents into a specified file.
@ -232,7 +236,7 @@ mongoexport --collection=<collection> <options> <connection-string>
--sort=<JSON> # Specifies an ordering for exported results
```
## [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs]
### [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs]
`mongodump` exports the content of a running server into `.bson` files.
@ -241,47 +245,7 @@ mongoexport --collection=<collection> <options> <connection-string>
[mongodump_docs]: https://docs.mongodb.com/database-tools/mongodump/
[mongorestore_docs]: https://docs.mongodb.com/database-tools/mongorestore/
## Relations
**Nested / Embedded Documents**:
- Group data logically
- Optimal for data belonging together that do not overlap
- Should avoid nesting too deep or making too long arrays (max doc size 16 mb)
```json
{
"_id": "ObjectId()",
"<key>": "value",
"<key>": "value",
"innerDocument": {
"<key>": "value",
"<key>": "value"
}
}
```
**References**:
- Divide data between collections
- Optimal for related but shared data used in relations or stand-alone
- Allows to overtake nesting and size limits
NoSQL databases do not have relations and references. It's the app that has to handle them.
```json
{
"<key>": "value",
"references": ["id1", "id2"]
}
// referenced
{
"_id": "id1",
"<key>": "value"
}
```
---
## [Indexes](https://docs.mongodb.com/manual/indexes/ "Index Documentation")
@ -351,15 +315,94 @@ db.<collection>.dropIndexes() # drop all indexes
db.<collection>.dropIndex( { "index-name": 1 } ) # drop a specific index
```
## Database Profiling
---
## Cluster Administration
### `mongod`
`mongod` is the main daemon process for MongoDB. It's the core process of the database,
handling connections, requests and persisting the data.
`mongod` default configuration:
- port: `27017`
- dbpath: `/data/db`
- bind_ip: `localhost`
- auth: disabled
[`mongod` config file][mongod_config_file]
[`mongod` command line options][mongod_cli_options]
[mongod_config_file]: https://www.mongodb.com/docs/manual/reference/configuration-options "`mongod` config file docs"
[mongod_cli_options]: https://www.mongodb.com/docs/manual/reference/program/mongod/#options "`mongod` command line options docs"
### Basic Shell Helpers
```sh
db.<method>() # database interaction
db.<collection>.<method>() # collection interaction
rs.<method>(); # replica set deployment and management
sh.<method>(); # sharded cluster deployment and management
# user management
db.createUser()
db.dropUser()
# collection management
db.renameCollection()
db.<collection>.createIndex()
db.<collection>.drop()
# database management
db.dropDatabase()
db.createCollection()
# database status
db.serverStatus()
# database command (underlying to shell helpers and drivers)
db.runCommand({ "<COMMAND>" })
# help
db.commandHelp("<command>")
```
### Logging
The **process log** displays activity on the MongoDB instance and collects the activities of its various components.
Log Verbosity Level:
- `-1`: Inherit from parent
- `0`: Default Verbosity (Information)
- `1 - 5`: Increases the verbosity up to Debug messages
```sh
db.getLogComponents() # get components and their verbosity
db.adminCommand({"getLog": "<scope>"}) # retrieve logs (getLog must be run on admin db -> adminCommand)
db.setLogLevel(<level>, "<component>"); # set log level (output is OLD verbosity levels)
tail -f /path/to/mongod.log # read end of log file
```
> **Note**: Log Message Structure: `<timestamp> <severity-level> <component> <connection> <event> ...`
### Database Profiling
Profiling Levels:
- `0`: no profiling
- `1`: data on operations slower than `slowms`
- `1`: data on operations slower than `slowms` (default 100ms)
- `2`: data on all operations
Logs are saved in the `system.profile` _capped_ collection.
Events captured by the profiler:
- CRUD operations
- Administrative operations
- Configuration operations
> **Note**: Logs are saved in the `system.profile` _capped_ collection.
```sh
db.setProfilingLevel(n) # set profiler level
@ -370,43 +413,68 @@ db.system.profile.find().limit(n).sort( {} ).pretty() # see logs
db.system.profile.find().limit(n).sort( { ts : -1 } ).pretty() # sort by decreasing timestamp
```
## Roles and permissions
### Authentication
**Authentication**: identifies valid users
**Authorization**: identifies what a user can do
Client Authentication Mechanisms:
- **userAdminAnyDatabase**: can admin every db in the instance (role must be created on admin db)
- **userAdmin**: can admin the specific db in which is created
- **readWrite**: can read and write in the specific db in which is created
- **read**: can read the specific db in which is created
- **SCRAM** (Default): Salted Challenge Response Authentication Mechanism
- **X.509**: `X.509` Certificate
- **LDAP**: Lightweight Directory Access Protocol (Enterprise Only)
- **KERBEROS** (Enterprise Only)
Cluster Authentication Mechanism:
### Authorization: Role Based Access Control (RBAC)
Each user has one or more **Roles**. Each role has one or more **Privileges**.
A privilege represents a group of _actions_ and the _resources_ those actions apply to.
By default no user exists so the ONLY way to act is to connect locally to the server.
This is the "localhost exception" and it closes after the _first_ user is created.
> **Warn**: Always create an admin user first (ideally with the `userAdmin` role)
Role's Resources:
- specific database and collection: `{ "db": "<database>", "collection": "<collection>" }`
- all databases and collections: `{ "db": "", "collection": "" }`
- any database and a specific collection: `{ "db": "", "collection": "<collection>" }`
- specific database and any collection: `{ "db": "<database>", "collection": "" }`
- cluster resource: `{ "cluster": true }`
Role's Privileges: `{ resource: { <resource> }, actions: [ "<action>" ] }`
A role can _inherit_ from multiple others and can define **network restrictions** such as _Server Address_ and _Client Source_.
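The pieces above (resource, actions, inherited roles, network restrictions) can be put together in a single role document. The following is only a sketch: the role name `reportReader`, the database `shop`, the collection `orders` and the IP addresses are placeholders invented for illustration, not values from these notes.

```javascript
// Hypothetical role definition; every name and address here is a placeholder.
const roleDefinition = {
  role: "reportReader",
  // one privilege = a resource plus the actions allowed on it
  privileges: [
    { resource: { db: "shop", collection: "orders" }, actions: ["find"] },
  ],
  // roles this role inherits from
  roles: [{ role: "read", db: "shop" }],
  // network restrictions: where clients may come from and which server address they may use
  authenticationRestrictions: [
    { clientSource: ["192.0.2.0/24"], serverAddress: ["198.51.100.10"] },
  ],
};

console.log(JSON.stringify(roleDefinition, null, 2));
```

In `mongosh`, a document with this shape would be passed to `db.createRole()` on the database that should own the role.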
Built-in Roles Groups and Names:
- Database User: `read`, `readWrite`, `readAnyDatabase`, `readWriteAnyDatabase`
- Database Administration: `dbAdmin`, `userAdmin`, `dbOwner`, `dbAdminAnyDatabase`, `userAdminAnyDatabase`
- Cluster Administration: `clusterAdmin`, `clusterManager`, `clusterMonitor`, `hostManager`
- Backup/Restore: `backup`, `restore`
- Super User: `root`
```sh
# create users in the current MongoDB instance
db.createUser(
{
user: "dbAdmin",
pwd: "password",
roles:[
{
role: "userAdminAnyDatabase",
db:"admin"
}
]
},
{
user: "username",
pwd: "password",
roles:[
{
role: "role",
db: "database"
}
]
user: "<username>",
pwd: "<password>",
roles: [ { role: "<role>", db: "<database>" } ]
}
)
# add role to existing user
db.grantRolesToUser( "<user>", [ { db: "<database>", role: "<role>" } ] )
# show role privilege
db.runCommand( { rolesInfo: { db: "<database>", role: "<role>" }, showPrivileges: true } )
```
## Sharding
### [Replica set](https://docs.mongodb.com/manual/replication/)
A **replica set** in MongoDB is a group of `mongod` processes that maintain the `same dataset`. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
### Sharding
**Sharding** is a MongoDB concept through which big datasets are subdivided into smaller sets and distributed across multiple instances of MongoDB.
It's a technique used to improve the performance of large queries over large quantities of data that require a lot of resources from the server.
@ -422,46 +490,274 @@ Shard components are:
![Sharded Cluster](../img/mongodb_shared-cluster.png "Components of a sharded cluster")
### [Replica set](https://docs.mongodb.com/manual/replication/)
A **replica set** in MongoDB is a group of `mongod` processes that maintain the `same dataset`. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
---
## [Aggregation Framework](https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/)
Sequence of operations applied to a collection as a _pipeline_ to get a result: `db.collection.aggregate(pipeline, options)`.
Each step of the pipeline acts on its inputs and not on the original data in the collection.
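The idea that each stage only sees the output of the previous one can be illustrated without a server. This is a minimal in-memory sketch, not the MongoDB API: `match`, `project` and `runPipeline` are hypothetical helpers written for this example.

```javascript
// Hypothetical in-memory stand-ins for pipeline stages (not the MongoDB API).
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (mapper) => (docs) => docs.map(mapper);

// Run the stages in order: each stage receives the previous stage's output.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const collection = [
  { name: "a", qty: 5 },
  { name: "b", qty: 12 },
  { name: "c", qty: 20 },
];

const result = runPipeline(collection, [
  match((doc) => doc.qty > 10),           // first stage works on the collection
  project((doc) => ({ name: doc.name })), // second stage works on the matched docs only
]);

console.log(result);            // documents produced by the last stage
console.log(collection.length); // the source array is left untouched
```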
### Variables
Variable syntax in aggregations:
- `$key`: field path
- `$$UPPERCASE`: system variable (e.g.: `$$CURRENT`)
- `$$foo`: user defined variable
### [`$match` Aggregation Stage][$match_docs]
Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
```sh
db.<collection>.aggregate([
{ "$project": { "_id": 0, "<key>": 1, ...} },
{ "$match": { "<query>" } },
{ "$group": {
# key exists and is an array
{ $match: { "<array-key>": { $elemMatch: { $exists: true } } } }
])
```
> **Note**: `$match` can contain the `$text` query operator but it **must** be the _first_ stage in a pipeline
> **Note**: `$match` cannot use `$where`
> **Note**: `$match` uses the same syntax as `find()`
[$match_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/match/ "$match operator docs"
### [`$project` Aggregation Stage][$project_docs]
Passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
`$project` Array Expression Operators:
- [`$filter`][$filter_docs]
- [`$map`][$map_docs]
- [`$reduce`][$reduce_docs]
`$project` Arithmetic Expression Operators:
- [`$max`][$max_docs]
- [`$min`][$min_docs]
- [`$sum`][$sum_docs]
- [`$avg`][$avg_docs]
```sh
db.<collection>.aggregate([
{
"$project": {
"_id": 0, # discard value
"<key>": 1, # keep value
"<key>": "$<other-key>", # reassign or create field
"<key>": { "<expression>" }, # calculate field value
# filter elements in an array
"<key>": {
"$filter": {
"input": "$<array-key>",
"as": "<name-of-item>",
"cond": { "<bool-expression>" }
}
},
# transform array items
"<key>": {
"$map": {
"input": "$<array-key>",
"as": "<item>",
"in": { "<expression>" }
# $$<item> is the current item's value
}
},
# apply expression to each element in an array and combine them
"<key>": {
"$reduce": {
"input": "$<array-key>",
"initialValue": "<value>",
"in": { "<expression>" }
# $$this is the current element, $$value is the current accumulated value
}
}
}
}
])
```
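For readers more familiar with JavaScript arrays, the three array operators behave much like `Array.prototype.filter`, `map` and `reduce`. The sketch below runs in memory on a made-up document (the `scores` field and the output field names are assumptions for illustration); it mirrors the semantics and is not how the server evaluates `$project`.

```javascript
// Hypothetical document; field names are illustrative only.
const doc = { _id: 1, scores: [4, 9, 7, 10] };

const projected = {
  // "$filter": keep only the items for which the condition holds
  highScores: doc.scores.filter((score) => score >= 7),
  // "$map": transform every item ($$<item> plays the role of `score`)
  doubled: doc.scores.map((score) => score * 2),
  // "$reduce": fold the items into one value
  // ($$value is `total`, $$this is `score`, initialValue is 0)
  total: doc.scores.reduce((total, score) => total + score, 0),
};

console.log(projected);
```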
[$project_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/project/ "$project operator docs"
[$filter_docs]: https://www.mongodb.com/docs/v4.4/reference/operator/aggregation/filter/ "$filter operator docs"
[$map_docs]: https://www.mongodb.com/docs/v4.4/reference/operator/aggregation/map/ "$map operator docs"
[$reduce_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/reduce/ "$reduce operator docs"
[$sum_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/sum/ "$sum operator docs"
[$max_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/max/ "$max operator docs"
[$min_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/min/ "$min operator docs"
[$avg_docs]: https://www.mongodb.com/docs/v5.0/reference/operator/aggregation/avg/ "$avg operator docs"
### [`$addFields` Aggregation Stage][$addFields_docs]
Adds new fields to documents (can be result of computation).
`$addFields` outputs documents that contain _all existing fields_ from the input documents and newly added fields.
```sh
db.<collection>.aggregate([
{ $addFields: { <newField>: <expression>, ... } }
])
```
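The difference from `$project` is easiest to see side by side. This is an in-memory sketch on a made-up document (`price`, `qty` and `total` are assumed field names), meant only to show which fields survive each stage.

```javascript
// Hypothetical input document.
const doc = { _id: 1, price: 10, qty: 3 };

// $addFields-like: every existing field is kept, the computed one is appended.
const withAddFields = { ...doc, total: doc.price * doc.qty };

// $project-like: only the listed fields (plus computed ones) survive.
const withProject = { _id: doc._id, total: doc.price * doc.qty };

console.log(Object.keys(withAddFields)); // all original keys plus "total"
console.log(Object.keys(withProject));   // only "_id" and "total"
```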
[$addFields_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/addFields/ "$addFields operator docs"
### [`$group` Aggregation Stage][$group_docs]
The `$group` stage separates documents into groups according to a "group key". The output is one document for each unique group key.
```sh
db.<collection>.aggregate([
{
"$group": {
"_id": "<expression>", # Group By Expression (Required)
"<key-1>": { "<accumulator-1>": "<expression-1>" },
...
}
},
{
"$lookup": {
"from": "<collection to join>",
"localField": "<field from the input documents>",
"foreignField": "<field from the documents of the 'from' collection>",
"as": "<output array field>"
}
},
{ "$sort": { "<key-1>": "<sort order>", "<key-2>": "<sort order>", ... } },
{ "$count": "<count-key>" },
{ "$skip": "<positive 64-bit integer>" }
{ "$limit": "<positive 64-bit integer>" }
{ ... }
}
])
```
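As a rough illustration of the group key and accumulators, the sketch below emulates `{ $group: { _id: "$category", totalQty: { $sum: "$qty" }, count: { $sum: 1 } } }` in memory. The documents and field names are invented for the example, and the output order shown here comes from the JavaScript `Map`; on a real server the order of `$group` output should not be relied upon.

```javascript
// Hypothetical input documents.
const docs = [
  { category: "fruit", qty: 2 },
  { category: "veg", qty: 5 },
  { category: "fruit", qty: 3 },
];

// One accumulator object per distinct group key.
const groups = new Map();
for (const doc of docs) {
  const group = groups.get(doc.category) ?? { _id: doc.category, totalQty: 0, count: 0 };
  group.totalQty += doc.qty; // { $sum: "$qty" }
  group.count += 1;          // { $sum: 1 }
  groups.set(doc.category, group);
}

const result = [...groups.values()];
console.log(result);
```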
[$group_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/group/ "$group operator docs"
### [`$unwind` Aggregation Stage][$unwind_docs]
Deconstructs an array field from the input documents to output a document for each element.
Each output document is the input document with the value of the array field replaced by the element.
```sh
db.<collection>.aggregate([
{ "$unwind": "$<array-key>" }, # short form
{
"$unwind": {
"path": "$<array-key>", # array to unwind
"includeArrayIndex": "<string>", # name of index field
"preserveNullAndEmptyArrays": <bool>
}
}
], { "allowDiskUse": <bool> })
```
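The effect of the two options can be sketched in memory. The `unwind` helper below is hypothetical and simplified: it assumes the field is an array, `null` or missing, and it passes preserved documents through unchanged, which is not exactly what the server does for every edge case.

```javascript
// Simplified, hypothetical emulation of $unwind on an in-memory array.
function unwind(docs, field, { includeArrayIndex, preserveNullAndEmptyArrays = false } = {}) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (Array.isArray(value) && value.length > 0) {
      // One output document per element, with the array replaced by the element.
      return value.map((element, index) => ({
        ...doc,
        [field]: element,
        ...(includeArrayIndex ? { [includeArrayIndex]: index } : {}),
      }));
    }
    // Missing, null or empty array: dropped unless explicitly preserved.
    return preserveNullAndEmptyArrays ? [doc] : [];
  });
}

const docs = [{ _id: 1, tags: ["a", "b"] }, { _id: 2, tags: [] }, { _id: 3 }];

const plain = unwind(docs, "tags");
const preserved = unwind(docs, "tags", {
  includeArrayIndex: "idx",
  preserveNullAndEmptyArrays: true,
});

console.log(plain);     // only documents coming from _id 1
console.log(preserved); // _id 2 and _id 3 are kept as well
```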
[$unwind_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/unwind/ "$unwind operator docs"
### [`$count` Aggregation Stage][$count_docs]
```sh
db.<collection>.aggregate([
{ "$count": "<count-key>" }
])
```
[$count_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/count/ "$count operator docs"
### [`$sort` Aggregation Stage][$sort_docs]
```sh
db.<collection>.aggregate([
{
"$sort": {
"<key-1>": "<sort order>",
"<key-2>": "<sort order>",
...
}
}
], { "allowDiskUse": <bool> })
```
> **Note**: `$sort` can take advantage of indexes if it is early in the pipeline and before any `$project`, `$group` and `$unwind`
> **Note**: By default `$sort` will use up to 100 MB of RAM. Setting `allowDiskUse: true` will allow for larger sorts
[$sort_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/sort/ "$sort operator docs"
### [`$skip` Aggregation Stage][$skip_docs]
```sh
db.<collection>.aggregate([
{ "$skip": "<positive 64-bit integer>" }
])
```
[$skip_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/skip/ "$skip operator docs"
### [`$limit` Aggregation Stage][$limit_docs]
```sh
db.<collection>.aggregate([
{ "$limit": "<positive 64-bit integer>" }
])
```
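`$sort`, `$skip` and `$limit` are often chained to page through results, and their order matters. The following in-memory sketch (with an invented field `n`) shows the equivalent array operations; it is an analogy, not server code.

```javascript
// Hypothetical input documents.
const docs = [{ n: 3 }, { n: 1 }, { n: 5 }, { n: 2 }, { n: 4 }];

const page = [...docs]       // copy so the source array is not reordered
  .sort((a, b) => a.n - b.n) // { $sort: { n: 1 } }
  .slice(2)                  // { $skip: 2 }
  .slice(0, 2);              // { $limit: 2 }

console.log(page);
```

Swapping the last two steps would first cut the list to two documents and then skip both of them, leaving an empty page.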
[$limit_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/limit/ "$limit operator docs"
### [`$lookup` Aggregation Stage][$lookup_docs]
Performs a left outer join to a collection _in the same database_ to filter in documents from the "joined" collection for processing.
The `$lookup` stage adds a new array field to each input document. The new array field contains the matching documents from the "joined" collection.
> **Note**: To combine elements from two different collections, use the [`$unionWith`][$unionWith_docs] pipeline stage.
```sh
db.<collection>.aggregate([
{
"$lookup": {
"from": "<foreign-collection>",
"localField": "<key>",
"foreignField": "<foreign-key>", # field of the documents in the "from" collection
"as": "<output array field>"
}
}
])
```
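The "left outer" behaviour means that input documents without a match are kept and simply receive an empty array. A minimal in-memory sketch, with invented `orders` and `customers` data, emulating `{ $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } }`:

```javascript
// Hypothetical collections.
const orders = [
  { _id: 1, customerId: "c1" },
  { _id: 2, customerId: "c9" }, // no matching customer
];
const customers = [{ _id: "c1", name: "Ada" }];

// Every order is kept; "customer" holds all matching foreign documents.
const joined = orders.map((order) => ({
  ...order,
  customer: customers.filter((c) => c._id === order.customerId),
}));

console.log(JSON.stringify(joined, null, 2));
```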
[$lookup_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/ "$lookup operator docs"
[$unionWith_docs]: https://www.mongodb.com/docs/manual/reference/operator/aggregation/unionWith/ "$unionWith operator docs"
### [`$graphLookup` Aggregation Stage][$graph_lookup_docs]
Performs a recursive search on a collection, with options for restricting the search by recursion depth and query filter.
The search starts from `startWith`, evaluated on each input document, and matches it against `<from-collection>.<connectToField>`; the `connectFromField` value of every document found is then matched against `connectToField` again, recursively. The collection on which the aggregation is performed and the `from` collection can be the same (in-collection search) or different (cross-collection search).
```sh
db.<collection>.aggregate([
{
$graphLookup: {
from: <collection>, # starting collection of the search
startWith: <expression>, # initial value(s) of search
connectFromField: <string>, # source of the connection
connectToField: <string>, # destination of the connection
as: <string>, # array of found documents
maxDepth: <number>, # recursive search depth limit (steps inside from collection)
depthField: <string>, # field containing distance from start
restrictSearchWithMatch: <document> # filter on found documents
}
}
], { allowDiskUse: true })
```
> **Note**: Having the `connectToField` indexed will improve search performance
> **Warn**: Can exceed the `100 MB` memory limit even with `{ allowDiskUse: true }`
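The recursion can be sketched as a breadth-first search. The `graphLookup` function and the `employees` data below are hypothetical and simplified (single start value, scalar fields only); they only illustrate how `connectFromField`, `connectToField`, `maxDepth` and the depth field interact.

```javascript
// Hypothetical "employees" collection where reportsTo points at a manager's name.
const employees = [
  { name: "Dev" },
  { name: "Eliot", reportsTo: "Dev" },
  { name: "Ron", reportsTo: "Eliot" },
  { name: "Andrew", reportsTo: "Eliot" },
];

// Simplified emulation: follow connectFromField -> connectToField level by level.
function graphLookup(from, startValue, connectFromField, connectToField, maxDepth = Infinity) {
  const found = [];
  const seen = new Set();
  let frontier = [startValue];
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const doc of from) {
      if (frontier.includes(doc[connectToField]) && !seen.has(doc)) {
        seen.add(doc);
        found.push({ ...doc, depth }); // depth plays the role of depthField
        if (doc[connectFromField] !== undefined) next.push(doc[connectFromField]);
      }
    }
    frontier = next;
  }
  return found;
}

// Reporting chain above Ron: startWith is Ron's reportsTo value.
const ron = employees.find((e) => e.name === "Ron");
const chain = graphLookup(employees, ron.reportsTo, "reportsTo", "name");
const direct = graphLookup(employees, ron.reportsTo, "reportsTo", "name", 0);

console.log(chain);  // every manager above Ron, with the distance from the start
console.log(direct); // maxDepth 0 stops after the first lookup
```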
[$graph_lookup_docs]: https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/graphLookup/ "$graphLookup operator docs"
### [`$sortByCount` Aggregation Stage][$sort_by_count_docs]
Groups incoming documents based on the value of a specified expression, then computes the count of documents in each distinct group.
Each output document contains two fields: an `_id` field containing the distinct grouping value, and a `count` field containing the number of documents belonging to that grouping or category.
The documents are sorted by count in descending order.
```sh
db.<collection>.aggregate([
{ $sortByCount: <expression> }
])
```
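In effect the stage is a `$group` that counts documents per key followed by a descending `$sort` on the count. A small in-memory sketch with invented `tag` values:

```javascript
// Hypothetical input documents.
const docs = [
  { tag: "a" }, { tag: "b" }, { tag: "a" },
  { tag: "c" }, { tag: "a" }, { tag: "b" },
];

// Count documents per distinct value of the grouping expression.
const counts = new Map();
for (const doc of docs) {
  counts.set(doc.tag, (counts.get(doc.tag) ?? 0) + 1);
}

// Shape the output as { _id, count } and sort by count, descending.
const result = [...counts.entries()]
  .map(([_id, count]) => ({ _id, count }))
  .sort((x, y) => y.count - x.count);

console.log(result);
```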
[$sort_by_count_docs]: https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/sortByCount/ "$sortByCount operator docs"