Difference between revisions of "AGGREGATE examples"
Line 105: | Line 105: | ||
<div class=ans> | <div class=ans> | ||
pp.pprint(list(db.world.aggregate([{"$group":{'_id':'null','min':{"$min":"$gdp"},'max':{"$max":"$gdp"},}},{"$project":{"_id":0,"min":1,"max":1}},]))) | pp.pprint(list(db.world.aggregate([{"$group":{'_id':'null','min':{"$min":"$gdp"},'max':{"$max":"$gdp"},}},{"$project":{"_id":0,"min":1,"max":1}},]))) | ||
+ | </div> | ||
+ | </div> | ||
+ | |||
+ | <div class=q data-lang="py3"> | ||
+ | In the previous example it is impossible to get the names of the countries with the largest and smallest GDPs.<br/> | ||
+ | If we want to do this, we can use <code>.find_one()</code> to get a single result and <code>.sort()</code> to order the collection on one field, for example:</br> | ||
+ | <p class=strong>Get the names and GDPs of the two countries with the smallest and largest GDPs.</p> | ||
+ | <pre class=def> | ||
+ | pp.pprint(list( | ||
+ | db.world.find_one({},{"name":1,"gdp":1,"_id":0}).sort("gdp":-1) | ||
+ | db.world.find_one({},{"name":1,"gdp":1,"_id":0}).sort("gdp":1) | ||
+ | )) | ||
+ | </pre> | ||
+ | <div class=ans> | ||
+ | pp.pprint(list(db.world.find_one({},{"name":1,"gdp":1,"_id":0}).sort("gdp":-1)db.world.find_one({},{"name":1,"gdp":1,"_id":0}).sort("gdp":1))) | ||
</div> | </div> | ||
</div> | </div> |
Revision as of 09:48, 17 July 2015
#ENCODING import io import sys sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-16') #MONGO from pymongo import MongoClient client = MongoClient() client.progzoo.authenticate('scott','tiger') db = client['progzoo'] #PRETTY import pprint pp = pprint.PrettyPrinter(indent=4)
Introducing the aggregation framework
These examples introduce the aggregation framework and its operators. Again we will be using the collection world
$match
Allows us to perform queries in a similar way to find()
Show all the details for France
pp.pprint(list( db.world.aggregate([ {"$match":{"name":"France"}} ]) ))
pp.pprint(list(db.world.aggregate([{"$match":{"name":"France"}}])))
$project
Allows us to select what fields to display.
It can also has the ability to insert new fields and allows you to compare fields against each other without using $where
Show the name and population density of all Asian countries. (population/area)
Note that "density" is a new field, made from the result of dividing two existing fields, and that $divide
is an aggregate function.
To avoid diving by 0 we do a $match
to remove any countries with 0 area (Vatican City), then pipe these results through to $project
pp.pprint(list( db.world.aggregate([ {"$match":{"area":{"$ne":0},"continent":"Asia"}}, {"$project":{ "_id":0, "name":1, "density": {"$divide": ["$population","$area"]} }} ]) ))
pp.pprint(list(db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}])))
$sort
Allows us to choose how the results are displayed, where 1 is ascending and -1 is descending.
Note that excluding $match
is the same as {"$match":{}}
Show the name of all countries in descending order.
pp.pprint(list( db.world.aggregate([ {"$project":{ "_id":0, "name":1, }}, {"$sort":{ "name":-1 }} ]) ))
pp.pprint(list(db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}])))
Grouping
Grouping allows us to use accumulator operations sum as $sum
All groups must have an _id
. To accumulate over all the results you can just use null
$max
and $min
can be used to get the largest and smallest values in a group.
Get the smallest and largest GDPs.
pp.pprint(list( db.world.aggregate([ {"$group":{ '_id':'null', 'min':{"$min":"$gdp"}, 'max':{"$max":"$gdp"}, }}, {"$project":{ "_id":0, "min":1, "max":1 }}, ]) ))
pp.pprint(list(db.world.aggregate([{"$group":{'_id':'null','min':{"$min":"$gdp"},'max':{"$max":"$gdp"},}},{"$project":{"_id":0,"min":1,"max":1}},])))
In the previous example it is impossible to get the names of the countries with the largest and smallest GDPs.
If we want to do this, we can use .find_one()
to get a single result and .sort()
to order the collection on one field, for example:
Get the names and GDPs of the two countries with the smallest and largest GDPs.
pp.pprint(list( db.world.find_one({},{"name":1,"gdp":1,"_id":0}).sort("gdp":-1) db.world.find_one({},{"name":1,"gdp":1,"_id":0}).sort("gdp":1) ))
pp.pprint(list(db.world.find_one({},{"name":1,"gdp":1,"_id":0}).sort("gdp":-1)db.world.find_one({},{"name":1,"gdp":1,"_id":0}).sort("gdp":1)))