AGGREGATE examples: Difference between revisions
From NoSQLZoo
No edit summary |
|||
| Line 18: | Line 18: | ||
<div class='extra_space' style='width:1em; height:6em;'></div> | <div class='extra_space' style='width:1em; height:6em;'></div> | ||
<div class=q data-lang=" | <div class=q data-lang="mongo"> | ||
<code>$match</code> performs queries in a similar way to <code>find()</code> | <code>$match</code> performs queries in a similar way to <code>find()</code> | ||
<p class=strong>Show all the details for France</p> | <p class=strong>Show all the details for France</p> | ||
<pre class=def> | <pre class=def> | ||
db.world.aggregate([ | |||
{$match:{name:"France"}} | |||
]) | |||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
db.world.aggregate([{$match:{name:"France"}}]) | |||
</div> | </div> | ||
</div> | </div> | ||
<div class=q data-lang=" | <div class=q data-lang="mongo"><code>limit</code> sets the amount of documents to be handed to the next stage in the pipeline. | ||
<p class=strong>Return the first document</p> | <p class=strong>Return the first document</p> | ||
<pre class=def> | <pre class=def> | ||
db.world.aggregate([ | |||
{$limit:1} | |||
]) | |||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
db.world.aggregate([{"$limit":1}]) | |||
</div> | </div> | ||
</div> | </div> | ||
<div class=q data-lang=" | <div class=q data-lang="mongo"> | ||
<code>$project</code> selects what fields to display.<br/> | <code>$project</code> selects what fields to display.<br/> | ||
It can also has the ability to create new fields and to compare fields against each other without using <code>$where</code> | It can also has the ability to create new fields and to compare fields against each other without using <code>$where</code> | ||
| Line 57: | Line 53: | ||
</div> | </div> | ||
<pre class=def> | <pre class=def> | ||
db.world.aggregate([ | |||
{$match:{area:{$ne:0}, continent:"Asia"}}, | |||
{$project:{ | |||
_id:0, | |||
name:1, | |||
density: {$divide: ["$population","$area"]} | |||
}} | |||
]) | |||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}]) | |||
</div> | </div> | ||
</div> | </div> | ||
<div class=q data-lang=" | <div class=q data-lang="mongo"> | ||
Because aggregate is a pipeline stages may be repeated, and stages don't have to be used in a specific order. | Because aggregate is a pipeline stages may be repeated, and stages don't have to be used in a specific order. | ||
<p class=strong>Show the name of Asian countries with a density that's over 500 people per km<sup>2</sup>. (population/area)</p> | <p class=strong>Show the name of Asian countries with a density that's over 500 people per km<sup>2</sup>. (population/area)</p> | ||
<pre class=def> | <pre class=def> | ||
db.world.aggregate([ | |||
{$match:{area:{$ne:0},continent:"Asia"}}, | |||
{$project:{ | |||
_id:0, | |||
name:1, | |||
density: {$divide: ["$population","$area"]} | |||
}}, | |||
{$match:{density:{$gt:500}}} | |||
]) | |||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}},{"$match":{"density":{"$gt":500}}}]) | |||
</div> | </div> | ||
</div> | </div> | ||
<div class=q data-lang=" | <div class=q data-lang="mongo"> | ||
<code>$sort</code> allows ordering of the results set, where 1 is ascending and -1 is descending.<br/> | <code>$sort</code> allows ordering of the results set, where 1 is ascending and -1 is descending.<br/> | ||
Note that not including <code>$match</code> is the same as <code>{"$match":{}}</code> | Note that not including <code>$match</code> is the same as <code>{"$match":{}}</code> | ||
<p class=strong>Show the name of all countries in descending order.</p> | <p class=strong>Show the name of all countries in descending order.</p> | ||
<pre class=def> | <pre class=def> | ||
db.world.aggregate([ | |||
{"$project":{ | |||
"_id":0, | |||
"name":1, | |||
}}, | |||
{"$sort":{ | |||
"name":-1 | |||
}} | |||
]) | |||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}]) | |||
</div> | </div> | ||
</div> | </div> | ||
Revision as of 14:52, 3 December 2015
#ENCODING
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-16')
#MONGO
from pymongo import MongoClient
client = MongoClient()
client.progzoo.authenticate('scott','tiger')
db = client['progzoo']
#PRETTY
import pprint
pp = pprint.PrettyPrinter(indent=4, width=160)
Introducing the aggregation framework
These examples introduce the aggregation framework and its operators. Again we will be using the collection world
$match performs queries in a similar way to find()
Show all the details for France
db.world.aggregate([
{$match:{name:"France"}}
])
db.world.aggregate([{$match:{name:"France"}}])
limit sets the amount of documents to be handed to the next stage in the pipeline.
Return the first document
db.world.aggregate([
{$limit:1}
])
db.world.aggregate([{"$limit":1}])
$project selects what fields to display.
It can also has the ability to create new fields and to compare fields against each other without using $where
Show the name and population density of all Asian countries. (population/area)
Note that "density" is a new field, made from the result of dividing two existing fields, and that $divide is an aggregate function.
To avoid diving by 0 insert a $match to remove any countries with 0 area (Vatican City), then pipe these results through to $project
There is no need to check if values are null, MongoDB will ignore these documents.
db.world.aggregate([
{$match:{area:{$ne:0}, continent:"Asia"}},
{$project:{
_id:0,
name:1,
density: {$divide: ["$population","$area"]}
}}
])
db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}])
Because aggregate is a pipeline stages may be repeated, and stages don't have to be used in a specific order.
Show the name of Asian countries with a density that's over 500 people per km2. (population/area)
db.world.aggregate([
{$match:{area:{$ne:0},continent:"Asia"}},
{$project:{
_id:0,
name:1,
density: {$divide: ["$population","$area"]}
}},
{$match:{density:{$gt:500}}}
])
db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}},{"$match":{"density":{"$gt":500}}}])
$sort allows ordering of the results set, where 1 is ascending and -1 is descending.
Note that not including $match is the same as {"$match":{}}
Show the name of all countries in descending order.
db.world.aggregate([
{"$project":{
"_id":0,
"name":1,
}},
{"$sort":{
"name":-1
}}
])
db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}])
Grouping
Grouping provides accumulator operations such as $sum
All groups must have an _id. To see why this is useful imagine the following:
So far you've been using the collection world
As every country has a continent, it would make sense to have countries as a nested document inside continents: e.g:
[
{"name":"Africa",
"countries":[
{"name":"Algeria", "capital":"Algiers", ...},
{"name":"Angola", "capital":"Luanda", ...},
{"name":"Benin", "capital":"Porto-Novo",...}.
{...},
...
]},
{"name":"Asia",
"countries":[
{"name":"Afghanistan","capital":"Kabul", ...},
{"name":"Azerbaijan", "capital":"Baku", ...},
{"name":"Bahrain", "capital":"Manama",...},
{...},
...
]},
{...},
...
]
The world collection isn't like this however. It uses the following structure, which has a redundancy where continent is repeated for each country.
[
{"name":"Afghanistan","capital":"Kabul", "continent":"Asia", ...},
{"name":"Albania", "capital":"Tirana", "continent":"Europe, ...},
{"name":"Algeria", "capital":"Algiers","contiennt":"Africa",...},
{...},
...
]
The code to group by continent is "_id":"$continent"
If instead the question was to group by country the code would be "_id":"$name".
To operate over the whole document (which would have the same effect as "_id":"$name") "_id":"null" or "_id":None can be used.
$max and $min can be used to get the largest and smallest values in a group.
Get the smallest and largest GDPs of each continent.
pp.pprint(list(
db.world.aggregate([
{"$group":{
'_id':'$continent',
'min':{"$min":"$gdp"},
'max':{"$max":"$gdp"}
}},
{"$project":{
"_id":1,
"min":1,
"max":1
}},
])
))
pp.pprint(list(db.world.aggregate([{"$group":{'_id':'$continent','min':{"$min":"$gdp"},'max':{"$max":"$gdp"}}},{"$project":{"_id":1,"min":1,"max":1}},])))
Some other useful aggregate functions to know are $sum and average: $avg
This example combines all the material in these examples.
Order the continents in descending order by total GDP, Include the average GDP for each country.
pp.pprint(list(
db.world.aggregate([
{"$match":{}},
{"$group":{
"_id":"$continent",
"Total GDP": {"$sum": "$gdp"},
"Average GDP": {"$avg": "$gdp"}
}},
{"$sort":{
"Total GDP":-1
}},
{"$project":{
"Area":"$_id",
"Total GDP": 1,
"Average GDP":1,
"_id":0
}}
])
))
pp.pprint(list(db.world.aggregate([{"$group":{"_id":"$continent","Total GDP":{"$sum":"$gdp"},"Average GDP":{"$avg":"$gdp"}}},{"$sort":{"Total GDP":-1}},{"$project":{"Area":"$_id","Total GDP":1,"Average GDP":1,"_id":0}}])))
Using Conditions
$cond is similar to a CASE statement in other languages.
It has the form "$cond": [{<comparison> :[<field or value>,<field or value>]},<true case>,<false case>]
pp.pprint(list(
db.world.aggregate([
{"$group":{
"_id":{
"$cond": [{"$eq":["$continent","Eurasia"]},"Europe","$continent"]
},
"area":{"$sum": "$area"}
}},
{"$sort":{
"area": -1
}},
{"$project":{
"_id":1,
"area":1
}}
])
))