Difference between revisions of "AGGREGATE examples"
(→$limit) |
|||
(45 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | + | ==aggregate== | |
− | + | The aggregate function takes a list of operations - this is the pipeline. | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | The data passes through each stage of the pipeline in turn. | |
− | + | ||
+ | Pipeline stages can include: | ||
+ | * '''$group''' This is the aggregate special sauce; you specify the _id value, the output from this stage includes one entry for each distinct _id value. In SQL you would use a GROUP BY clause | ||
+ | * '''$match''' - this acts as a filter, some data items pass through this stage, some do not. Similar to the WHERE clause in SQL | ||
+ | * '''$project''' - this can be used to transform each element. Rather like the values on the SELECT line of an SQL query | ||
+ | * '''$limit''' | ||
+ | * '''$sort''' | ||
+ | * '''$skip''' | ||
<div class='extra_space' style='width:1em; height:6em;'></div> | <div class='extra_space' style='width:1em; height:6em;'></div> | ||
+ | ==$group== | ||
+ | <div class="q" data-lang="mongo"> | ||
+ | '''$group''' allows you to collect group items that share common features | ||
+ | * _id - this determines the values to be grouped | ||
+ | * $continent and $population refers to keys 'continent' and 'population' available in each item | ||
+ | * $sum is an aggregating function, it takes many values in and returns a single value. Other examples of aggregating functions are $min $max, $avg | ||
+ | <p class="strong">List the continents</p> | ||
+ | <pre class="def"><nowiki> | ||
+ | db.world.aggregate( | ||
+ | {$group: {_id: "$continent"}, pop:{$sum:'$population'}} | ||
+ | );</nowiki></pre> | ||
+ | </div> | ||
− | <div class=q data-lang=" | + | ==$match== |
− | + | <div class="q" data-lang="mongo"> | |
− | <p class=strong>Show all the details for France</p> | + | '''$match''' performs queries in a similar way to <syntaxhighlight lang="JavaScript" inline>find()</syntaxhighlight> |
− | <pre class=def> | + | <p class="strong">Show all the details for France</p> |
− | + | <pre class="def"><nowiki> | |
− | + | db.world.aggregate([ | |
− | + | {$match: {name: "France"}} | |
− | + | ]);</nowiki></pre> | |
− | + | <pre class="ans"><nowiki>db.world.aggregate([{$match:{name:"France"}}]);</nowiki></pre> | |
− | </pre> | ||
− | < | ||
− | |||
</div> | </div> | ||
+ | |||
+ | ==$limit== | ||
+ | <div class="q" data-lang="mongo">'''$limit''' sets the amount of documents to be handed to the next stage in the pipeline. | ||
+ | <p class="strong">Return the first two document</p> | ||
+ | <pre class="def"><nowiki> | ||
+ | db.world.aggregate([ | ||
+ | {$limit: 2} | ||
+ | ]);</nowiki></pre> | ||
+ | <pre class="ans"><nowiki>db.world.aggregate([{"$limit":2}]);</nowiki></pre> | ||
</div> | </div> | ||
− | <div class=q data-lang=" | + | ==$project== |
− | + | <div class="q" data-lang="mongo"> | |
− | It can also has the ability to | + | '''$project''' selects what fields to display.<br/> |
− | <p class=strong>Show the name and population density of all Asian countries. (population/area)</p> | + | It can also has the ability to create new fields and to compare fields against each other without using '''$where''' |
− | Note that "density" is a new field | + | <p class="strong">Show the name and population density of all Asian countries. (population/area)</p> |
− | <div class=hint title="Dealing with division by 0"> | + | Note that "density" is a new field made from the result of dividing two existing fields, and that '''$divide''' is an aggregate function. |
− | To avoid diving by 0 | + | <div class="hint" title="Dealing with division by 0"> |
+ | To avoid diving by 0 insert a '''$match''' to remove any countries with no area (Vatican City), then pipe these results through to '''$project'''<br/> | ||
+ | There is no need to check if values are <syntaxhighlight lang="JavaScript" inline>null</syntaxhighlight>, MongoDB will ignore these documents. | ||
</div> | </div> | ||
− | <pre class=def> | + | <pre class="def"><nowiki> |
− | + | db.world.aggregate([ | |
− | + | {$match: {area: {$ne: 0}, continent: "Asia"}}, | |
− | + | {$project: { | |
− | + | _id: 0, | |
− | + | name: 1, | |
− | + | density: {$divide: ["$population", "$area"]} | |
− | + | }} | |
− | + | ]);</nowiki></pre> | |
− | + | <pre class="ans"><nowiki>db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}]);</nowiki></pre> | |
− | |||
− | </pre> | ||
− | < | ||
− | |||
</div> | </div> | ||
+ | |||
+ | ==aggregate composition== | ||
+ | <div class="q" data-lang="mongo"> | ||
+ | You can have several pipeline stages, the data flows through each one in turn. | ||
+ | <p class="strong">Show the name of Asian countries with a density that's over 500 people per km<sup>2</sup>. (population/area)</p> | ||
+ | <pre class="def"><nowiki> | ||
+ | db.world.aggregate([ | ||
+ | {$match: {area: {$ne: 0}, continent: "Asia"}}, | ||
+ | {$project: { | ||
+ | _id: 0, | ||
+ | name: 1, | ||
+ | density: {$divide: ["$population", "$area"]} | ||
+ | }}, | ||
+ | {$match: {density: {$gt: 500}}} | ||
+ | ]);</nowiki></pre> | ||
+ | <pre class="ans"><nowiki>db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}},{"$match":{"density":{"$gt":500}}}]);</nowiki></pre> | ||
</div> | </div> | ||
− | <div class=q data-lang=" | + | ==$sort== |
− | + | <div class="q" data-lang="mongo"> | |
− | Note that | + | '''$sort''' allows ordering of the results set, where 1 is ascending and -1 is descending.<br/> |
− | <p class=strong>Show the name of all countries in descending order.</p> | + | Note that not including '''$match''' is the same as <syntaxhighlight lang="JavaScript" inline>{"$match":{}}</syntaxhighlight> |
− | <pre class=def> | + | <p class="strong">Show the name of all countries in descending order.</p> |
− | + | <pre class="def"><nowiki> | |
− | + | db.world.aggregate([ | |
− | + | {"$project":{ | |
− | + | "_id":0, | |
− | + | "name":1, | |
− | + | }}, | |
− | + | {"$sort":{ | |
− | + | "name":-1 | |
− | + | }} | |
− | + | ]);</nowiki></pre> | |
− | + | <pre class="ans"><nowiki>db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}])</nowiki></pre> | |
− | </pre> | ||
− | < | ||
− | |||
− | </ | ||
</div> | </div> | ||
==Grouping== | ==Grouping== | ||
− | Grouping | + | Grouping provides accumulator operations such as '''$sum'''<br /> |
− | + | All groups must have an '''_id'''. To see why this is useful imagine the following:<br/><br/> | |
− | < | + | |
− | <div class=q data-lang=" | + | So far you've been using the '''world''' collection</code><br/> |
− | + | As every country has a continent, it would make sense to have countries as a nested document inside continents: e.g: | |
− | <p class=strong>Get the smallest and largest GDPs.</p> | + | <syntaxhighlight lang="JavaScript"> |
− | <pre class=def> | + | [ |
− | + | {"name": "Africa", | |
− | + | "countries": [ | |
− | + | {"name": "Algeria", "capital": "Algiers", ...}, | |
− | + | {"name": "Angola", "capital": "Luanda", ...}, | |
− | + | {"name": "Benin", "capital": "Porto-Novo", ...}. | |
− | + | {...}, | |
− | + | ... | |
− | + | ]}, | |
− | + | {"name": "Asia", | |
− | + | "countries": [ | |
− | + | {"name": "Afghanistan", "capital": "Kabul", ...}, | |
− | + | {"name": "Azerbaijan", "capital": "Baku", ...}, | |
− | + | {"name": "Bahrain", "capital": "Manama", ...}, | |
− | + | {...}, | |
− | </pre> | + | ... |
− | < | + | ]}, |
− | + | {...}, | |
+ | ... | ||
+ | ] | ||
+ | </syntaxhighlight> | ||
+ | |||
+ | The '''world''' collection isn't like this however. It uses the following structure, which has a redundancy where '''continent''' is repeated for each country. | ||
+ | |||
+ | <syntaxhighlight lang="JavaScript"> | ||
+ | [ | ||
+ | {"name": "Afghanistan", "capital": "Kabul", "continent": "Asia", ...}, | ||
+ | {"name": "Albania", "capital": "Tirana", "continent": "Europe", ...}, | ||
+ | {"name": "Algeria", "capital": "Algiers", "contiennt": "Africa", ...}, | ||
+ | {...}, | ||
+ | ... | ||
+ | ] | ||
+ | </syntaxhighlight> | ||
+ | The code to group by continent is <syntaxhighlight lang="JavaScript" inline>"_id":"$continent"</syntaxhighlight><br/> | ||
+ | If instead the question was to group by country the code would be <syntaxhighlight lang="JavaScript" inline>"_id": "$name"</syntaxhighlight>.<br/> | ||
+ | To operate over the whole document (which would have the same effect as <syntaxhighlight lang="JavaScript" inline>"_id": "$name"</syntaxhighlight>) <syntaxhighlight lang="JavaScript" inline>"_id": "null"</syntaxhighlight> or <syntaxhighlight lang="JavaScript" inline>"_id": None</syntaxhighlight> can be used. | ||
+ | ==group operators== | ||
+ | <div class="q" data-lang="mongo"> | ||
+ | '''$max''' and '''$min''' can be used to get the largest and smallest values in a group. | ||
+ | <p class="strong">Get the smallest and largest GDPs of each continent.</p> | ||
+ | <pre class="def"><nowiki> | ||
+ | db.world.aggregate([ | ||
+ | {$group: { | ||
+ | _id: '$continent', | ||
+ | min: {$min: "$gdp"}, | ||
+ | max: {$max: "$gdp"} | ||
+ | }}, | ||
+ | {$project: { | ||
+ | _id: 1, | ||
+ | min: 1, | ||
+ | max: 1 | ||
+ | }} | ||
+ | ]);</nowiki></pre> | ||
+ | <pre class="ans"><nowiki>db.world.aggregate([{"$group":{'_id':'$continent','min':{"$min":"$gdp"},'max':{"$max":"$gdp"}}},{"$project":{"_id":1,"min":1,"max":1}}]);</nowiki></pre> | ||
</div> | </div> | ||
+ | |||
+ | <div class="q" data-lang="mongo"> | ||
+ | Some other useful aggregate functions to know are '''$sum''' and average: '''$avg'''<br/> | ||
+ | The example below combines previous example material. | ||
+ | <p class="strong">Order the continents in descending order by total GDP, Include the average GDP for each country.</p> | ||
+ | <pre class="def"><nowiki> | ||
+ | db.world.aggregate([ | ||
+ | {$match: {}}, | ||
+ | {$group: { | ||
+ | _id:"$continent", | ||
+ | "Total GDP": {"$sum": "$gdp"}, | ||
+ | "Average GDP": {"$avg": "$gdp"} | ||
+ | }}, | ||
+ | {$sort: { | ||
+ | "Total GDP":-1 | ||
+ | }}, | ||
+ | {$project:{ | ||
+ | "Area": "$_id", | ||
+ | "Total GDP": 1, | ||
+ | "Average GDP": 1, | ||
+ | _id: 0 | ||
+ | }} | ||
+ | ]);</nowiki></pre> | ||
+ | <pre class="ans"><nowiki>db.world.aggregate([{"$group":{"_id":"$continent","Total GDP":{"$sum":"$gdp"},"Average GDP":{"$avg":"$gdp"}}},{"$sort":{"Total GDP":-1}},{"$project":{"Area":"$_id","Total GDP":1,"Average GDP":1,"_id":0}}]);</nowiki></pre> | ||
</div> | </div> | ||
− | <div class=q data-lang=" | + | <div class="q" data-lang="mongo"> |
− | + | Using Conditions<br/><br/> | |
− | + | '''$cond''' is similar to a '''CASE''' statement in other languages.<br/> | |
− | + | It has the form <syntaxhighlight lang="JavaScript" inline>"$cond": [{<comparison>: [<field or value>, <field or value>]}, <true case>, <false case>]</syntaxhighlight><br/><br/> | |
− | < | + | <pre class="def"><nowiki> |
− | + | db.world.aggregate([ | |
− | </ | + | {$group: { |
− | <pre class=def> | + | _id: { |
− | + | $cond: [{"$eq": ["$continent", "Eurasia"]}, "Europe", "$continent"] | |
− | + | }, | |
− | + | area: {$sum: "$area"} | |
− | + | }}, | |
− | + | {$sort: { | |
− | + | area: -1 | |
− | + | }}, | |
− | + | {$project: { | |
− | + | _id: 1, | |
− | + | area: 1 | |
− | </ | + | }} |
+ | ]);</nowiki></pre> | ||
</div> | </div> |
Latest revision as of 20:22, 1 April 2022
Contents
aggregate
The aggregate function takes a list of operations - this is the pipeline.
The data passes through each stage of the pipeline in turn.
Pipeline stages can include:
- $group This is the aggregate special sauce; you specify the _id value, the output from this stage includes one entry for each distinct _id value. In SQL you would use a GROUP BY clause
- $match - this acts as a filter, some data items pass through this stage, some do not. Similar to the WHERE clause in SQL
- $project - this can be used to transform each element. Rather like the values on the SELECT line of an SQL query
- $limit
- $sort
- $skip
$group
$group allows you to collect group items that share common features
- _id - this determines the values to be grouped
- $continent and $population refers to keys 'continent' and 'population' available in each item
- $sum is an aggregating function, it takes many values in and returns a single value. Other examples of aggregating functions are $min $max, $avg
List the continents
db.world.aggregate( {$group: {_id: "$continent"}, pop:{$sum:'$population'}} );
$match
$match performs queries in a similar way to find()
Show all the details for France
db.world.aggregate([ {$match: {name: "France"}} ]);
db.world.aggregate([{$match:{name:"France"}}]);
$limit
Return the first two document
db.world.aggregate([ {$limit: 2} ]);
db.world.aggregate([{"$limit":2}]);
$project
$project selects what fields to display.
It can also has the ability to create new fields and to compare fields against each other without using $where
Show the name and population density of all Asian countries. (population/area)
Note that "density" is a new field made from the result of dividing two existing fields, and that $divide is an aggregate function.
To avoid diving by 0 insert a $match to remove any countries with no area (Vatican City), then pipe these results through to $project
There is no need to check if values are null
, MongoDB will ignore these documents.
db.world.aggregate([ {$match: {area: {$ne: 0}, continent: "Asia"}}, {$project: { _id: 0, name: 1, density: {$divide: ["$population", "$area"]} }} ]);
db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}]);
aggregate composition
You can have several pipeline stages, the data flows through each one in turn.
Show the name of Asian countries with a density that's over 500 people per km2. (population/area)
db.world.aggregate([ {$match: {area: {$ne: 0}, continent: "Asia"}}, {$project: { _id: 0, name: 1, density: {$divide: ["$population", "$area"]} }}, {$match: {density: {$gt: 500}}} ]);
db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}},{"$match":{"density":{"$gt":500}}}]);
$sort
$sort allows ordering of the results set, where 1 is ascending and -1 is descending.
Note that not including $match is the same as {"$match":{}}
Show the name of all countries in descending order.
db.world.aggregate([ {"$project":{ "_id":0, "name":1, }}, {"$sort":{ "name":-1 }} ]);
db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}])
Grouping
Grouping provides accumulator operations such as $sum
All groups must have an _id. To see why this is useful imagine the following:
So far you've been using the world collection
As every country has a continent, it would make sense to have countries as a nested document inside continents: e.g:
[
{"name": "Africa",
"countries": [
{"name": "Algeria", "capital": "Algiers", ...},
{"name": "Angola", "capital": "Luanda", ...},
{"name": "Benin", "capital": "Porto-Novo", ...}.
{...},
...
]},
{"name": "Asia",
"countries": [
{"name": "Afghanistan", "capital": "Kabul", ...},
{"name": "Azerbaijan", "capital": "Baku", ...},
{"name": "Bahrain", "capital": "Manama", ...},
{...},
...
]},
{...},
...
]
The world collection isn't like this however. It uses the following structure, which has a redundancy where continent is repeated for each country.
[
{"name": "Afghanistan", "capital": "Kabul", "continent": "Asia", ...},
{"name": "Albania", "capital": "Tirana", "continent": "Europe", ...},
{"name": "Algeria", "capital": "Algiers", "contiennt": "Africa", ...},
{...},
...
]
The code to group by continent is "_id":"$continent"
If instead the question was to group by country the code would be "_id": "$name"
.
To operate over the whole document (which would have the same effect as "_id": "$name"
) "_id": "null"
or "_id": None
can be used.
group operators
$max and $min can be used to get the largest and smallest values in a group.
Get the smallest and largest GDPs of each continent.
db.world.aggregate([ {$group: { _id: '$continent', min: {$min: "$gdp"}, max: {$max: "$gdp"} }}, {$project: { _id: 1, min: 1, max: 1 }} ]);
db.world.aggregate([{"$group":{'_id':'$continent','min':{"$min":"$gdp"},'max':{"$max":"$gdp"}}},{"$project":{"_id":1,"min":1,"max":1}}]);
Some other useful aggregate functions to know are $sum and average: $avg
The example below combines previous example material.
Order the continents in descending order by total GDP, Include the average GDP for each country.
db.world.aggregate([ {$match: {}}, {$group: { _id:"$continent", "Total GDP": {"$sum": "$gdp"}, "Average GDP": {"$avg": "$gdp"} }}, {$sort: { "Total GDP":-1 }}, {$project:{ "Area": "$_id", "Total GDP": 1, "Average GDP": 1, _id: 0 }} ]);
db.world.aggregate([{"$group":{"_id":"$continent","Total GDP":{"$sum":"$gdp"},"Average GDP":{"$avg":"$gdp"}}},{"$sort":{"Total GDP":-1}},{"$project":{"Area":"$_id","Total GDP":1,"Average GDP":1,"_id":0}}]);
Using Conditions
$cond is similar to a CASE statement in other languages.
It has the form "$cond": [{<comparison>: [<field or value>, <field or value>]}, <true case>, <false case>]
db.world.aggregate([ {$group: { _id: { $cond: [{"$eq": ["$continent", "Eurasia"]}, "Europe", "$continent"] }, area: {$sum: "$area"} }}, {$sort: { area: -1 }}, {$project: { _id: 1, area: 1 }} ]);