Difference between revisions of "Join"
(One intermediate revision by the same user not shown) | |||
Line 65: | Line 65: | ||
Using a list of references reduces this problem but does not remove it entirely. To avoid this an additional field can be added to each book document: | Using a list of references reduces this problem but does not remove it entirely. To avoid this an additional field can be added to each book document: | ||
<pre> | <pre> | ||
− | |||
{ | { | ||
_id: "oreilly", | _id: "oreilly", | ||
Line 98: | Line 97: | ||
<pre> | <pre> | ||
SELECT *, inventory_docs | SELECT *, inventory_docs | ||
− | FROM orders | + | FROM orders |
− | WHERE inventory_docs IN (SELECT * | + | WHERE inventory_docs IN ( |
− | FROM inventory | + | SELECT * |
− | WHERE sku= orders.item); | + | FROM inventory |
+ | WHERE sku = orders.item | ||
+ | ); | ||
</pre> | </pre> | ||
<div class="hint" title="'orders' collection"> | <div class="hint" title="'orders' collection"> |
Latest revision as of 16:54, 18 July 2018
The examples in this page are taken from the MongoDB reference documentation.
Modelling Relations in Document Databases
Document databases typically do not implement JOIN operations. Unlike relational databases, data is usually stored in a denormalised format and document structure is not enforced by default.
Instead, One-to-One and One-to-Many relationships between documents are typically done using what are known as embedded or nested documents, e.g:
One-to-One
{ _id: "joe", name: "Joe Bookreader", address: { street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" } }
One-to-Many
{ _id: "joe", name: "Joe Bookreader", addresses: [{ street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" }, { street: "1 Some Other Street", city: "Boston", state: "MA", zip: "12345" } ] }
Some document databases also allow the use of references, where documents can point to other documents in the collection, allowing for Many-to-Many relationships.
In the following example concerning publishers and publications, the 'books' field contains the IDs of related documents - books published by O'Reilly Media.
{ name: "O'Reilly Media", founded: 1980, location: "CA", books: [123456789, 234567890, ...] }, { _id: 123456789, title: "MongoDB: The Definitive Guide", author: ["Kristina Chodorow", "Mike Dirolf"], published_date: ISODate("2010-09-24"), pages: 216, language: "English" }, { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", author: "Kristina Chodorow", published_date: ISODate("2011-05-06"), pages: 68, language: "English" }
It would be possible to embed the publisher information into each book document but this would lead to replication of data and any changes to O'Reilly's details would require updating several documents rather than just one, which in turn may lead to inconsistent data.
An alternative would be to embed each book in a list inside the publisher document, but over time this would lead to large document growth which may cause the document to be relocated on disk when it outgrows its allocated space - resulting in a drop in performance.
Using a list of references reduces this problem but does not remove it entirely. To avoid this an additional field can be added to each book document:
{ _id: "oreilly", name: "O'Reilly Media", founded: 1980, location: "CA" }, { _id: 123456789, title: "MongoDB: The Definitive Guide", author: ["Kristina Chodorow", "Mike Dirolf"], published_date: ISODate("2010-09-24"), pages: 216, language: "English", publisher_id: "oreilly" }, { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", author: "Kristina Chodorow", published_date: ISODate("2011-05-06"), pages: 68, language: "English", publisher_id: "oreilly" }
Unlike relational databases, there are no primary-foreign key constraints in document databases. Therefore, operations that preserve database integrity such as cascading delete must be implemented by the developer rather than the underlying database software.
MongoDB's $lookup
MongoDB's $lookup
stage of the aggregation pipeline can be used to provide similar behaviour to a LEFT OUTER JOIN in SQL.
The following example joins documents the orders and inventory collections and is equivalent to the following SQL statement:
SELECT *, inventory_docs FROM orders WHERE inventory_docs IN ( SELECT * FROM inventory WHERE sku = orders.item );
{"_id": 1, "item": "almonds", "price": 12, "quantity": 2} {"_id": 2, "item": "pecans", "price": 20, "quantity": 1} {"_id": 3}
{ "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 } { "_id" : 2, "sku" : "bread", "description" : "product 2", "instock" : 80 } { "_id" : 3, "sku" : "cashews", "description" : "product 3", "instock" : 60 } { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 } { "_id" : 5, "sku" : null, "description" : "Incomplete" } { "_id" : 6 }
db.orders.aggregate([ {$lookup: { from: "inventory", localField: "item", foreignField: "sku", as: "inventory_docs" } }]);