AGGREGATE examples

#ENCODING
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-16')
#MONGO
from pymongo import MongoClient
client = MongoClient()
client.progzoo.authenticate('scott','tiger')
db = client['progzoo']
#PRETTY
import pprint
pp = pprint.PrettyPrinter(indent=4)

Introducing the aggregation framework

These examples introduce the aggregation framework and its operators. Again we will be using the collection world

$match Allows us to perform queries in a similar way to find()

Show all the details for France

pp.pprint(list(
    db.world.aggregate([
        {"$match":{"name":"France"}}
    ])
))

pp.pprint(list(db.world.aggregate([{"$match":{"name":"France"}}])))

$project Allows us to select what fields to display.
It can also has the ability to insert new fields and allows you to compare fields against each other without using $where

Show the name and population density of all Asian countries. (population/area)

Note that "density" is a new field, made from the result of dividing two existing fields, and that $divide is an aggregate function.

To avoid diving by 0 we do a $match to remove any countries with 0 area (Vatican City), then pipe these results through to $project

pp.pprint(list(
     db.world.aggregate([
        {"$match":{"area":{"$ne":0},"continent":"Asia"}},
        {"$project":{
            "_id":0,
            "name":1,
            "density": {"$divide": ["$population","$area"]}
        }}
    ])
))

pp.pprint(list(db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}])))

$sort Allows us to choose how the results are displayed, where 1 is ascending and -1 is descending.
Note that excluding $match is the same as {"$match":{}}

Show the name of all countries in descending order.

pp.pprint(list(
     db.world.aggregate([
        {"$project":{
            "_id":0,
            "name":1,
        }},
        {"$sort":{
            "name":-1
        }}  
    ])
))

pp.pprint(list(db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}])))

Grouping

Grouping allows us to use accumulator operations sum as $sum
All groups must have an _id. To accumulate over all the results you can just use null

$max and $min can be used to get the largest and smallest values in a group.

Get the smallest and largest GDPs.

pp.pprint(list(
     db.world.aggregate([
        {"$group":{
            '_id':'null',
            'min':{"$min":"$gdp"},
            'max':{"$max":"$gdp"},
        }},
        {"$project":{
          "_id":0,  
          "min":1,
          "max":1
        }},
    ])
))

pp.pprint(list(db.world.aggregate([{"$group":{'_id':'null','min':{"$min":"$gdp"},'max':{"$max":"$gdp"},}},{"$project":{"_id":0,"min":1,"max":1}},])))

In the previous example we can't use $name to get the names of the countries with the smallest and largest values as we lost the ability to identify these documents when we set the _id to null to perform a grouping

If we want to do this, a simple way is to use sort=[(<fieldname>, <direction>)] inside a find_one() statement. Performance will be improved if these fields have previously been indexed.

Get the names and GDPs of the two countries with the smallest and largest GDPs.

It is possible that we will occasionally encounter null values in a data collection. This could be either where the field is simply not included on a certain document, or a document has a redundancy where the field is specified as null. e.g: {"name": "Bob", "age": None} etc.

To deal with this we can use {<field>: {"$ne": None}} to prevent any null values from being included.

pp.pprint(
    db.world.find_one({"gdp":{"$ne":None}},{"name":1,"gdp":1,"_id":0},sort=[("gdp", 1)])
)
pp.pprint(
    db.world.find_one({},{"name":1,"gdp":1,"_id":0},sort=[("gdp", -1)])
)

pp.pprint(db.world.find_one({"gdp":{"$ne":None}},{"name":1,"gdp":1,"_id":0},sort=[("gdp", 1)]) pp.pprint(db.world.find_one({},{"name":1,"gdp":1,"_id":0},sort=[("gdp", -1)]))