Cookies help us deliver our services. By using our services, you agree to our use of cookies. More information

MapReduce

From NoSQLZoo
Revision as of 19:29, 20 July 2015 by 40166222 (talk | contribs) (Created page with "<pre class=setup> #ENCODING import io import sys sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-16') #MONGO from pymongo import MongoClient client = MongoClien...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
#ENCODING
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-16')
#MONGO
from pymongo import MongoClient
client = MongoClient()
client.progzoo.authenticate('scott','tiger')
db = client['progzoo']
#PRETTY
import pprint
pp = pprint.PrettyPrinter(indent=4)

Introducing the MapReduce function

The MapReduce function is an aggregate function that consists of two functions: Map and Reduce. As the name would suggest, the map is always performed before the reduce.

The map function takes data and breaks it down into tuples (key/value pairs) for each element in the dataset
The reduce function then takes the result of the map function and simply reduces it in to a smaller set of tuples by merging all values with the same key.

Map is used to deal with "embarassingly parallel problems" where a task can be broken down into subtasks that can then be ran simultaneously without affecting each other. Instead of just processing elements one by one, all elements can all be dealt with at the same time in parallel. This allows for massively reduced processing times as well as large scalability across multiple servers, making it an attractive solution to handling Big Data.