# MongoDB

The database is a container of **collections**. Collections are containers of **documents**.

Documents are _schema-less_: they have a dynamic structure that can change between documents in the same collection.

## Data Types

| Type              | Document                                         | Function                |
| ----------------- | ------------------------------------------------ | ----------------------- |
| Text              | `"Text"`                                         |                         |
| Boolean           | `true`                                           |                         |
| Number            | `42`                                             |                         |
| ObjectId          | `"_id": {"$oid": "<id>"}`                        | `ObjectId("<id>")`      |
| ISODate           | `"<key>": {"$date": "YYYY-MM-DDThh:mm:ss.sssZ"}` | `ISODate("YYYY-MM-DD")` |
| Timestamp         |                                                  | `Timestamp(11421532)`   |
| Embedded Document | `{"a": {...}}`                                   |                         |
| Embedded Array    | `{"b": [...]}`                                   |                         |


Each document must have a unique `_id` field.
MongoDB automatically creates an `ObjectId()` if one is not provided.

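As a quick reference, here is a hypothetical document (collection and field names are made up) combining most of the types above, written in the shell style used throughout these notes:

```sh
db.people.insertOne({
  _id: ObjectId(),                          # generated automatically when omitted
  name: "Alice",                            # Text
  active: true,                             # Boolean
  age: 42,                                  # Number
  createdAt: ISODate("2021-02-16"),         # ISODate
  address: { city: "Rome", zip: "00100" },  # Embedded Document
  tags: ["admin", "user"]                   # Embedded Array
})
```
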
## Databases & Collections Usage

To create a database it's sufficient to switch to a non-existing one with `use <database>` (implicit creation).
The database is not actually created until a document is inserted.

```sh
show dbs    # list all databases
use <database>    # use a particular database
show collections    # list all collections in the current database
db.dropDatabase()    # delete current database

db.createCollection(name, {options})    # explicit collection creation
db.<collection>.insertOne({document})    # implicit collection creation
```
## Operators
```json
/* --- Update operators --- */
{ "$inc": { "<key>": <increment>, ... } }    // increment value
{ "$set": { "<key>": "<value>", ... } }      // set value
{ "$push": { "<key>": "<value>", ... } }     // add a value to an array field

/* --- Query operators --- */
{ "<key>": { "$in": [ "<value_1>", "<value_2>", ...] } }     // membership
{ "<key>": { "$nin": [ "<value_1>", "<value_2>", ...] } }    // non-membership
{ "<key>": { "$exists": true } }                             // field exists

/* --- Comparison operators (DEFAULT: $eq) --- */
{ "<key>": { "$gt": "<value>" } }     // >
{ "<key>": { "$gte": "<value>" } }    // >=
{ "<key>": { "$lt": "<value>" } }     // <
{ "<key>": { "$lte": "<value>" } }    // <=
{ "<key>": { "$eq": "<value>" } }     // ==
{ "<key>": { "$ne": "<value>" } }     // !=

/* --- Logic operators (DEFAULT: $and) --- */
{ "$and": [ { <statement> }, ...] }
{ "$or": [ { <statement> }, ...] }
{ "$nor": [ { <statement> }, ...] }
{ "$not": { <statement> } }
```
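
For instance, a query combining some of these operators might look like the following (a sketch; the `products` collection and its fields are hypothetical):

```sh
# products priced between 10 and 50 (inclusive) belonging to one of two categories
db.products.find({
  "price": { "$gte": 10, "$lte": 50 },
  "category": { "$in": ["book", "comic"] }
})

# increment the stock of every product that has a "restock" flag
db.products.updateMany(
  { "restock": { "$exists": true } },
  { "$inc": { "stock": 10 } }
)
```
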
### Expressive Query Operator

`$<key>` is used to access the value of a field dynamically.

```json
{ "$expr": { <expression> } }    // aggregation expressions, variables, conditional expressions

{ "$expr": { "$comparison_operator": [ "$<key>", "$<key>" ] } }    // compare field values
```
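
For example, `$expr` can compare two fields of the same document (a sketch; the `monthlyBudget` collection and its fields are hypothetical):

```sh
# find documents where the amount spent exceeds the budget
db.monthlyBudget.find({ "$expr": { "$gt": [ "$spent", "$budget" ] } })
```
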
## CRUD Operations
### Create

It's possible to insert a single document with `insertOne()` or multiple documents with `insertMany()`.

Insertion results:

- error -> rollback
- success -> the entire document gets saved

```sh
# explicit collection creation, all options are optional
db.createCollection( <name>,
  {
    capped: <boolean>,
    autoIndexId: <boolean>,
    size: <number>,
    max: <number>,
    storageEngine: <document>,
    validator: <document>,
    validationLevel: <string>,
    validationAction: <string>,
    indexOptionDefaults: <document>,
    viewOn: <string>,
    pipeline: <pipeline>,
    collation: <document>,
    writeConcern: <document>
  }
)

db.createCollection("name", { capped: true, size: max_bytes, max: max_docs_num })    # create a capped collection
# SIZE: int - will be rounded to a multiple of 256

# implicit creation at document insertion
db.<collection>.insertOne({ document }, options)    # insert a document in a collection
db.<collection>.insertMany([ { document }, { document }, ... ], options)    # insert multiple docs

db.<collection>.insertMany([ { document }, { document } ], { "ordered": false })    # unordered insertion: only documents that cause errors won't be inserted
```

**NOTE**: If `insertMany()` fails, the already inserted documents are not rolled back, but all the subsequent ones (even the correct ones) will not be inserted.

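A minimal sketch of the difference (assuming a hypothetical `users` collection, empty before each call, with a duplicate `_id` to force an error):

```sh
# ordered (default): insertion stops at the first error, "b" is never inserted
db.users.insertMany([ { _id: 1, name: "a" }, { _id: 1, name: "dup" }, { _id: 2, name: "b" } ])

# unordered: the duplicate fails, but "b" is still inserted
db.users.insertMany([ { _id: 1, name: "a" }, { _id: 1, name: "dup" }, { _id: 2, name: "b" } ], { "ordered": false })
```
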
### Read
```sh
db.<collection>.findOne()    # find only one document
db.<collection>.find(filter)    # show selected documents
db.<collection>.find(filter, {"<key>": 1})    # show selected fields from documents (1 or true => show, 0 or false => don't show, can't mix 0 and 1)
db.<collection>.find(filter, {_id: 0, "<key>": 1})    # only _id can be set to 0 with other keys at 1

db.<collection>.find().pretty()    # show documents formatted
db.<collection>.find().limit(n)    # show n documents
db.<collection>.find().limit(n).skip(k)    # show n documents skipping k docs
db.<collection>.find().count()    # number of found docs
db.<collection>.find().sort({key1: 1, ..., key_n: -1})    # show documents sorted by the specified keys in ascending (1) or descending (-1) order

# GeoJSON - https://docs.mongodb.com/manual/reference/operator/query/near/index.html
db.<collection>.find(
  {
    <location field>: {
      $near: {
        $geometry: { type: "Point", coordinates: [ <longitude>, <latitude> ] },
        $maxDistance: <distance in meters>,
        $minDistance: <distance in meters>
      }
    }
  }
)

db.<collection>.find().hint( { "<key>": 1 } )    # specify the index to use
db.<collection>.find().hint( "index-name" )    # specify the index using the index name
db.<collection>.find().hint( { $natural: 1 } )    # force the query to perform a forwards collection scan
db.<collection>.find().hint( { $natural: -1 } )    # force the query to perform a reverse collection scan
```
### Update
[Update Operators](https://docs.mongodb.com/manual/reference/operator/update/ "Update Operators Documentation")
```sh
db.<collection>.updateOne(filter, { $set: {"<key>": value} })    # add or modify values
db.<collection>.updateOne(filter, { $set: {"<key>": value} }, {upsert: true})    # add or modify values; if no document matches the filter, insert a new one
db.<collection>.updateMany(filter, update)
db.<collection>.replaceOne(filter, { document }, options)
```
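
A couple of concrete sketches (collection and field names are hypothetical):

```sh
# set a field and increment a counter on the first matching document
db.users.updateOne({ "name": "Alice" }, { "$set": { "active": true }, "$inc": { "logins": 1 } })

# append a value to an array field on every matching document
db.users.updateMany({ "active": true }, { "$push": { "tags": "verified" } })
```
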
### Delete
```sh
db.<collection>.deleteOne(filter, options)
db.<collection>.deleteMany(filter, options)
db.<collection>.drop()    # delete whole collection
db.dropDatabase()    # delete entire database
```
## [Mongoimport](https://docs.mongodb.com/database-tools/mongoimport/)

Utility to import all documents into a specified collection.
If the collection already exists, `--drop` deletes it before re-importing the data.

**WARNING**: CSV separators must be commas (`,`)

```sh
mongoimport <options> <connection-string> <file>

  --uri=<connectionString>
  --host=<hostname><:port>, -h=<hostname><:port>
  --username=<username>, -u=<username>
  --password=<password>, -p=<password>
  --collection=<collection>, -c=<collection>    # specifies the collection to import
  --ssl    # enables connection to a mongod or mongos that has TLS/SSL support enabled
  --type=<json|csv|tsv>    # specifies the file type to import. DEFAULT: json
  --drop    # drops the collection before importing the data from the input
  --headerline    # if file is CSV and first line is header
  --jsonArray    # accepts the import of data expressed with multiple MongoDB documents within a single json array. MAX 16 MB
```
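
A typical invocation might look like this (a sketch; the URI, collection and file names are placeholders):

```sh
# import a CSV with a header row, replacing the existing collection
mongoimport --uri="mongodb://localhost:27017/shop" --collection=products --type=csv --headerline --drop products.csv
```
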
## [Mongoexport](https://docs.mongodb.com/database-tools/mongoexport/)

Utility to export documents into a specified file.

```sh
mongoexport --collection=<collection> <options> <connection-string>

  --uri=<connectionString>
  --host=<hostname><:port>, -h=<hostname><:port>
  --username=<username>, -u=<username>
  --password=<password>, -p=<password>
  --db=<database>, -d=<database>
  --collection=<collection>, -c=<collection>
  --type=<json|csv>
  --out=<file>, -o=<file>    # specifies a file to write the export to. DEFAULT: stdout
  --jsonArray    # write the entire contents of the export as a single json array
  --pretty    # outputs documents in a pretty-printed JSON format
  --skip=<number>
  --limit=<number>    # specifies a maximum number of documents to include in the export
  --sort=<JSON>    # specifies an ordering for exported results
```
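
A typical invocation (a sketch; the URI and names are placeholders):

```sh
# export a collection as a pretty-printed JSON array
mongoexport --uri="mongodb://localhost:27017/shop" --collection=products --jsonArray --pretty --out=products.json
```
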
## [Mongodump][mongodump_docs] & [Mongorestore][mongorestore_docs]

`mongodump` exports the contents of a running server into `.bson` files.

`mongorestore` restores backups generated with `mongodump` to a running server.

[mongodump_docs]: https://docs.mongodb.com/database-tools/mongodump/
[mongorestore_docs]: https://docs.mongodb.com/database-tools/mongorestore/

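Typical invocations (a sketch; the URI and paths are placeholders):

```sh
mongodump --uri="mongodb://localhost:27017/shop" --out=./backup    # dump the database into ./backup as .bson files
mongorestore --uri="mongodb://localhost:27017" ./backup            # restore the dump to a running server
```
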
## Relations
**Nested / Embedded Documents**:

- Group data logically
- Optimal for data belonging together that do not overlap
- Should avoid nesting too deep or making arrays too long (max document size: 16 MB)

```json
{
  "_id": ObjectId(),
  "<key>": "value",
  "<key>": "value",

  "innerDocument": {
    "<key>": "value",
    "<key>": "value"
  }
}
```
**References**:

- Divide data between collections
- Optimal for related but shared data used in relations or stand-alone
- Allows overcoming nesting and size limits

NoSQL databases do not have built-in relations and references: it's the app that has to handle them.

```json
{
  "<key>": "value",
  "references": ["id1", "id2"]
}

// referenced document
{
  "_id": "id1",
  "<key>": "value"
}
```
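
Since the references are plain values, resolving them requires a second query issued by the application (a sketch; the `orders` and `products` collections are hypothetical):

```sh
# fetch the parent document, then fetch the referenced documents by _id
order = db.orders.findOne({ "_id": "order1" })
db.products.find({ "_id": { "$in": order.references } })
```
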
## [Indexes](https://docs.mongodb.com/manual/indexes/ "Index Documentation")
Indexes support the efficient execution of queries in MongoDB.

Without indexes, MongoDB must perform a _collection scan_ (_COLLSCAN_): scan every document in a collection to select the documents that match the query statement.
If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect (_IXSCAN_).

Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.

Indexes _slow down write operations_ since the index must be updated at every write.

### [Index Types](https://docs.mongodb.com/manual/indexes/#index-types)

- **Normal**: fields sorted by name
- **Compound**: multiple fields sorted by name
- **Multikey**: values of sorted arrays
- **Text**: ordered text fragments
- **Geospatial**: ordered geodata

**Sparse** indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field.

### Diagnosis and Query Planning
```sh
db.<collection>.find({...}).explain()    # explain won't accept other functions
db.explain().<collection>.find({...})    # can accept other functions
db.explain("executionStats").<collection>.find({...})    # more info
```
### Index Creation
```sh
db.<collection>.createIndex( <key and index type specification>, <options> )

db.<collection>.createIndex( { "<key>": <type>, "<key>": <type>, ... } )    # normal, compound or multikey (field is an array) index
db.<collection>.createIndex( { "<key>": "text" } )    # text index
db.<collection>.createIndex( { "<key>": "2dsphere" } )    # geospatial 2dsphere index

# sparse index
db.<collection>.createIndex(
  { "<key>": <type>, "<key>": <type>, ... },
  { sparse: true }    # sparse option
)

# custom name
db.<collection>.createIndex(
  { <key and index type specification> },
  { name: "index-name" }    # name option
)
```
### [Index Management](https://docs.mongodb.com/manual/tutorial/manage-indexes/)
```sh
# view all db indexes
db.getCollectionNames().forEach(function(collection) {
indexes = db[collection].getIndexes();
print("Indexes for " + collection + ":");
printjson(indexes);
});

db.<collection>.getIndexes()    # view the collection's indexes

db.<collection>.dropIndexes()    # drop all indexes
db.<collection>.dropIndex( "index-name" )    # drop a specific index by name
db.<collection>.dropIndex( { "<key>": 1 } )    # drop a specific index by key specification
```
## Database Profiling
Profiling Levels:

- `0`: no profiling
- `1`: data on operations slower than `slowms`
- `2`: data on all operations

Logs are saved in the `system.profile` _capped_ collection.

```sh
db.setProfilingLevel(n)    # set profiler level
db.setProfilingLevel(1, { slowms: <ms> })
db.getProfilingStatus()    # check profiler status

db.system.profile.find().limit(n).sort( {} ).pretty()    # see logs
db.system.profile.find().limit(n).sort( { ts: -1 } ).pretty()    # sort by decreasing timestamp
```
## Roles and permissions
**Authentication**: identifies valid users

**Authorization**: identifies what a user can do

- **userAdminAnyDatabase**: can administer every db in the instance (the role must be created on the admin db)
- **userAdmin**: can administer the specific db in which it is created
- **readWrite**: can read and write in the specific db in which it is created
- **read**: can read the specific db in which it is created

```sh
# create users in the current MongoDB instance (one user per createUser call)
db.createUser(
  {
    user: "dbAdmin",
    pwd: "password",
    roles: [
      { role: "userAdminAnyDatabase", db: "admin" }
    ]
  }
)

db.createUser(
  {
    user: "username",
    pwd: "password",
    roles: [
      { role: "role", db: "database" }
    ]
  }
)
```
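
To verify the setup, you can authenticate as one of the new users (a sketch; host and credentials are placeholders):

```sh
mongosh "mongodb://localhost:27017" --username dbAdmin --password --authenticationDatabase admin

# or, from an already open shell, against the db where the user was created
use admin
db.auth("dbAdmin", "password")
```
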
## Sharding

**Sharding** is a MongoDB concept through which big datasets are subdivided into smaller sets and distributed across multiple instances of MongoDB.
It's a technique used to improve the performance of large queries against large quantities of data that require a lot of resources from the server.

A collection containing several documents is split into smaller collections (_shards_).
Shards are implemented via clusters, which are simply groups of MongoDB instances.

Shard components are:

- Shards (min 2): instances of MongoDB that contain a subset of the data
- A config server: an instance of MongoDB which contains metadata on the cluster, that is, the set of instances that hold the shard data
- A router (or `mongos`): an instance of MongoDB used to redirect the user instructions from the client to the correct server

### [Replica set](https://docs.mongodb.com/manual/replication/)
A **replica set** in MongoDB is a group of `mongod` processes that maintain the same dataset. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
## Aggregations

Sequence of operations applied to a collection as a _pipeline_ to get a result: `db.collection.aggregate(pipeline, options)`.

[Aggregation Stages][aggregation_stages_docs]:

- `$lookup`: Left Outer Join
- `$match`: Where
- `$sort`: Order By
- `$project`: Select
- ...

[aggregation_stages_docs]: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/

Example:
```sh
db.collection.aggregate([
  {
    $lookup: {
      from: <collection to join>,
      localField: <field from the input documents>,
      foreignField: <field from the documents of the "from" collection>,
      as: <output array field>
    }
  },
  { $match: { <query> } },
  { $sort: { ... } },
  { $project: { ... } },
  { ... }
])
```
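
A concrete sketch of the same pipeline (the `orders` and `customers` collections and their fields are hypothetical):

```sh
db.orders.aggregate([
  { $lookup: { from: "customers", localField: "customer_id", foreignField: "_id", as: "customer" } },
  { $match: { total: { $gte: 100 } } },
  { $sort: { total: -1 } },
  { $project: { _id: 0, total: 1, "customer.name": 1 } }
])
```
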