MongoDB Data Model
Data modelling is the process of defining how data is stored and what relationships exist between different entities in our data.
Types of Data Models
Embedded Data Models
Embedded documents/data models capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document. These schemas are generally known as “denormalized” models.
When to use an Embedded Data Model?
In general, use embedded data models when:
- you have “contains” relationships between entities.
- you have one-to-many relationships between entities. In these relationships the “many” or child documents always appear with or are viewed in the context of the “one” or parent documents.
Advantages of Embedded Data Model
- Embedding provides better performance for read operations, because related data can be requested and retrieved in a single database operation.
- Embedded data models make it possible to update related data in a single atomic write operation.
Normalized Data Models
Normalized data models describe relationships using references between documents.
When to use a Normalized Data Model?
In general, use normalized data models:
- when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
- to represent more complex many-to-many relationships.
- to model large hierarchical data sets.
Advantages of Normalized Data Model
- No data duplication
- Can represent more complex many-to-many relationships
- Can represent hierarchical data sets
Model Relationships Between Documents
One-to-One Relationship
Consider the following example that maps patron and address relationships.
The example illustrates the advantage of embedding over referencing when you need to view one data entity in the context of the other. In this one-to-one relationship between patron and address data, the address belongs to the patron.
For Normalized Data Model:
// patron document
{
  _id: "joe",
  name: "Joe Bookreader"
}
// address document
{
  patron_id: "joe", // reference to patron document
  street: "123 Fake Street",
  city: "Faketon",
  state: "MA",
  zip: "12345"
}
For Embedded Data Model:
{
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake Street",
    city: "Faketon",
    state: "MA",
    zip: "12345"
  }
}
With the embedded data model, your application can retrieve the complete patron information with one query.
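To make the difference concrete, the following sketch imitates both models with plain in-memory arrays rather than a live database. The names patrons, addresses, patronsEmbedded, and both helper functions are hypothetical and chosen only for illustration; against a real deployment, each array lookup below would roughly correspond to one findOne round trip.

```javascript
// In-memory stand-ins for collections; all names are illustrative assumptions.
const patrons = [{ _id: "joe", name: "Joe Bookreader" }];
const addresses = [
  { patron_id: "joe", street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" }
];
const patronsEmbedded = [
  {
    _id: "joe",
    name: "Joe Bookreader",
    address: { street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" }
  }
];

// Normalized model: two lookups, one per collection.
function loadPatronNormalized(id) {
  const patron = patrons.find((p) => p._id === id);
  if (!patron) return null;
  const found = addresses.find((a) => a.patron_id === id);
  let address = null;
  if (found) {
    // Drop the reference field so the result has the embedded shape.
    const { patron_id, ...rest } = found;
    address = rest;
  }
  return { ...patron, address };
}

// Embedded model: a single lookup returns the patron and the address together.
function loadPatronEmbedded(id) {
  return patronsEmbedded.find((p) => p._id === id) || null;
}

console.log(loadPatronNormalized("joe"));
console.log(loadPatronEmbedded("joe"));
```

Both functions return the same shape; the normalized version simply needs an extra lookup and a merge step to get there.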
One-to-Many Relationship
Consider the following example that maps patron and multiple address relationships.
The example illustrates the advantage of embedding over referencing if you need to view many data entities in the context of another. In this one-to-many relationship between patron and address data, the patron has multiple address entities.
For Normalized Data Model:
// patron document
{
  _id: "joe",
  name: "Joe Bookreader"
}
// address documents
{
  patron_id: "joe", // reference to patron document
  street: "123 Fake Street",
  city: "Faketon",
  state: "MA",
  zip: "12345"
}
{
  patron_id: "joe",
  street: "1 Some Other Street",
  city: "Boston",
  state: "MA",
  zip: "12345"
}
For Embedded Data Model:
{
  "_id": "joe",
  "name": "Joe Bookreader",
  "addresses": [
    {
      "street": "123 Fake Street",
      "city": "Faketon",
      "state": "MA",
      "zip": "12345"
    },
    {
      "street": "1 Some Other Street",
      "city": "Boston",
      "state": "MA",
      "zip": "12345"
    }
  ]
}
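As a rough illustration of working with the embedded array, the sketch below filters patrons by a field of their embedded addresses and appends a new address to one patron. It runs on an in-memory array, not on MongoDB; the helper names patronsInCity and addAddress and the extra address are hypothetical. In mongosh, the closest counterparts would be a find with a dot-notation filter such as { "addresses.city": "Boston" } and an updateOne with the $push operator.

```javascript
// In-memory stand-in for a patrons collection using the embedded model.
const patrons = [
  {
    _id: "joe",
    name: "Joe Bookreader",
    addresses: [
      { street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" },
      { street: "1 Some Other Street", city: "Boston", state: "MA", zip: "12345" }
    ]
  }
];

// Return the patrons that have at least one embedded address in the given city.
function patronsInCity(collection, city) {
  return collection.filter((p) => p.addresses.some((a) => a.city === city));
}

// Append an address to one patron; only that single document is modified.
function addAddress(collection, patronId, address) {
  const patron = collection.find((p) => p._id === patronId);
  if (!patron) return false;
  patron.addresses.push(address);
  return true;
}

console.log(patronsInCity(patrons, "Boston").map((p) => p.name));

// Hypothetical third address, used only to exercise addAddress.
addAddress(patrons, "joe", { street: "9 Example Road", city: "Springfield", state: "MA", zip: "12345" });
console.log(patrons[0].addresses.length);
```

Because every address lives inside the patron document, both operations touch exactly one document.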
One-to-Many Relationship with Document References
Consider the following example that maps publisher and book relationships.
The example illustrates the advantage of referencing over embedding to avoid repetition of the publisher information.
Embedding the publisher document inside the book document would lead to repetition of the publisher data, as the following documents show:
{
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA"
  }
}
{
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA"
  }
}
To avoid repetition of the publisher data, use references and keep the publisher information in a separate collection from the book collection.
// publisher document
{
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA",
  books: [123456789, 234567890, ...] // references to book documents
}
// book documents
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English"
}
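With references, the application has to resolve the IDs itself. The sketch below shows one way this could look, using in-memory arrays in place of real collections. The helper booksOf and the unrelated third book are hypothetical, and the publisher's books array lists only the two IDs shown above. In mongosh, the same lookup would typically be a find on the book collection with an $in filter on _id.

```javascript
// Publisher document holding references (book _id values) to its books.
const publisher = {
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA",
  books: [123456789, 234567890]
};

// In-memory stand-in for the book collection.
const books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide" },
  { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer" },
  { _id: 999, title: "Unrelated Book" } // hypothetical, belongs to another publisher
];

// Resolve the publisher's references into full book documents.
function booksOf(publisherDoc, bookCollection) {
  const ids = new Set(publisherDoc.books);
  return bookCollection.filter((b) => ids.has(b._id));
}

console.log(booksOf(publisher, books).map((b) => b.title));
```

The publisher data is stored once, at the cost of a second lookup whenever the book details are needed.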
Model Tree Structures
Model Tree Structures with Parent References
Pattern
The Parent References pattern stores each tree node in a document; in addition to the tree node, the document stores the ID of the node’s parent.
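Consider the following hierarchy of categories: Books is the root and contains Programming; Programming contains Databases and Languages; Databases contains MongoDB and dbm.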
The following example models the tree using Parent References, storing the reference to the parent category in the field parent:
db.categories.insertMany( [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null }
] )
The query to retrieve the parent of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).parent
You can create an index on the field parent to enable fast search by the parent node:
db.categories.createIndex( { parent: 1 } )
You can query by the parent field to find a node's immediate child nodes:
db.categories.find( { parent: "Databases" } )
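Finding all ancestors of a node takes more work with this pattern, because each step up the tree needs another lookup. The following sketch walks the parent references on an in-memory copy of the documents above; the helper ancestors and the byId map are hypothetical, and each byId.get call stands in for one findOne against the collection.

```javascript
// In-memory copy of the categories collection (Parent References pattern).
const categories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null }
];

// Index the documents by _id, standing in for the collection's _id index.
const byId = new Map(categories.map((c) => [c._id, c]));

// Walk upward from a node, collecting each ancestor from nearest to root.
// Assumes the data forms a tree (no cycles).
function ancestors(id) {
  const path = [];
  let node = byId.get(id);
  while (node && node.parent !== null) {
    path.push(node.parent);
    node = byId.get(node.parent);
  }
  return path;
}

console.log(ancestors("MongoDB")); // [ 'Databases', 'Programming', 'Books' ]
```

The number of lookups grows with the depth of the node, which is the main cost of this pattern for whole-path queries.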
Model Tree Structures with Child References
Pattern
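The Child References pattern stores each tree node in a document; in addition to the tree node, the document stores the IDs of the node’s children in an array.
Consider the same hierarchy of categories, in which Books contains Programming, Programming contains Databases and Languages, and Databases contains MongoDB and dbm: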
The following example models the tree using Child References, storing the reference to the node’s children in the field children:
db.categories.insertMany( [
  { _id: "MongoDB", children: [] },
  { _id: "dbm", children: [] },
  { _id: "Databases", children: [ "MongoDB", "dbm" ] },
  { _id: "Languages", children: [] },
  { _id: "Programming", children: [ "Databases", "Languages" ] },
  { _id: "Books", children: [ "Programming" ] }
] )
The query to retrieve the immediate children of a node is fast and straightforward:
db.categories.findOne( { _id: "Databases" } ).children
You can create an index on the field children to enable fast search by the child nodes:
db.categories.createIndex( { children: 1 } )
You can query for a node in the children field to find its parent node as well as its siblings:
db.categories.find( { children: "MongoDB" } )
The Child References pattern provides a suitable solution to tree storage as long as no operations on subtrees are necessary. This pattern may also provide a suitable solution for storing graphs where a node may have multiple parents.
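To see why subtree operations are awkward here, consider collecting every descendant of a node: each level requires another round of lookups. The sketch below does this recursively on an in-memory copy of the documents above. The helpers descendants and parentsOf are hypothetical, the recursion assumes a tree without cycles, and each byId.get call stands in for one query.

```javascript
// In-memory copy of the categories collection (Child References pattern).
const categories = [
  { _id: "MongoDB", children: [] },
  { _id: "dbm", children: [] },
  { _id: "Databases", children: ["MongoDB", "dbm"] },
  { _id: "Languages", children: [] },
  { _id: "Programming", children: ["Databases", "Languages"] },
  { _id: "Books", children: ["Programming"] }
];

const byId = new Map(categories.map((c) => [c._id, c]));

// Collect all descendants of a node in depth-first order.
function descendants(id) {
  const node = byId.get(id);
  if (!node) return [];
  const result = [];
  for (const childId of node.children) {
    result.push(childId, ...descendants(childId));
  }
  return result;
}

// Find every node that lists the given id as a child.
// In a graph, this may return more than one parent.
function parentsOf(id) {
  return categories.filter((c) => c.children.includes(id)).map((c) => c._id);
}

console.log(descendants("Programming")); // [ 'Databases', 'MongoDB', 'dbm', 'Languages' ]
console.log(parentsOf("MongoDB"));       // [ 'Databases' ]
```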