Difference between revisions of "RegEx Pattern Matching"
m (40166222 moved page FIND names to Pattern Matching (RegEx)) |
|||
Line 23: | Line 23: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':"^ | + | db.world.find({"name":{'$regex':"^Y"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 36: | Line 36: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':" | + | db.world.find({"name":{'$regex':"$Y"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
− | pp.pprint(list(db.world.find({"name":{'$regex':" | + | pp.pprint(list(db.world.find({"name":{'$regex':"$Y"}},{"name":1,"_id":0}))) |
</div> | </div> | ||
</div> | </div> | ||
Line 49: | Line 49: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':" | + | db.world.find({"name":{'$regex':"x"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 62: | Line 62: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':" | + | db.world.find({"name":{'$regex':"land$"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 76: | Line 76: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':"^.*$"}},{"name":1,"_id":0}) | + | db.world.find({"name":{'$regex':"^C.*ia$"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 89: | Line 89: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':" | + | db.world.find({"name":{'$regex':"oo"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 100: | Line 100: | ||
Bahamas has three <b>a</b>, who else?<br/> | Bahamas has three <b>a</b>, who else?<br/> | ||
<p class=strong>Find the country that has three or more a in the name</p> | <p class=strong>Find the country that has three or more a in the name</p> | ||
− | |||
<code>[Aa] matches both capital and lowercase A.</code> | <code>[Aa] matches both capital and lowercase A.</code> | ||
− | |||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':" | + | db.world.find({"name":{'$regex':"(.*[aA].*){3}"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 119: | Line 117: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':"^. | + | db.world.find({"name":{'$regex':"^.t"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 132: | Line 130: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':"o.o"}},{"name":1,"_id":0}) | + | db.world.find({"name":{'$regex':"o..o"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 145: | Line 143: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"name":{'$regex':"^ | + | db.world.find({"name":{'$regex':"^.{4}$"}},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
Line 152: | Line 150: | ||
</div> | </div> | ||
</div> | </div> | ||
− | == | + | ==Complex Examples== |
− | |||
− | |||
<div class=q data-lang="py3"> | <div class=q data-lang="py3"> | ||
The capital of <b>Luxembourg</b> is <b>Luxembourg</b>. Show all the countries where the capital is the same as the name of the country | The capital of <b>Luxembourg</b> is <b>Luxembourg</b>. Show all the countries where the capital is the same as the name of the country | ||
Line 164: | Line 160: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"$where":"this.name == | + | db.world.find({"$where":"this.name == this.capital"},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
− | pp.pprint(list( | + | pp.pprint(list(db.world.find({"$where":"this.name == this.capital"},{"name":1,"_id":0}))) |
− | |||
− | )) | ||
</div> | </div> | ||
</div> | </div> | ||
Line 179: | Line 173: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"$where":"this.capital == | + | db.world.find({"$where":"this.capital == this.name+' City'"},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
− | pp.pprint(list( | + | pp.pprint(list(db.world.find({"$where":"this.capital == this.name+' City'"},{"name":1,"_id":0}))) |
− | |||
− | )) | ||
</div> | </div> | ||
</div> | </div> | ||
Line 194: | Line 186: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"$where":"this.capital.match( | + | db.world.find({"$where":"this.capital.match(this.name)"},{"name":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
− | pp.pprint(list( | + | pp.pprint(list(db.world.find({"$where":"this.capital.match(this.name)"},{"name":1,"_id":0}))) |
− | |||
− | )) | ||
</div> | </div> | ||
</div> | </div> | ||
Line 215: | Line 205: | ||
<pre class=def> | <pre class=def> | ||
pp.pprint(list( | pp.pprint(list( | ||
− | db.world.find({"$where":"this.capital.match('^'+this.name+'. | + | db.world.find({"$where":"this.capital.match('^'+this.name+'.+$')"},{"name":1,"capital":1,"_id":0}) |
)) | )) | ||
</pre> | </pre> | ||
<div class=ans> | <div class=ans> | ||
− | pp.pprint(list( | + | pp.pprint(list(db.world.find({"$where":"this.capital.match('^'+this.name+'.+$')"},{"name":1,"capital":1,"_id":0}))) |
− | |||
− | )) | ||
</div> | </div> | ||
</div> | </div> |
Revision as of 10:30, 27 July 2015
#ENCODING
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-16')
#MONGO
from pymongo import MongoClient
client = MongoClient()
client.progzoo.authenticate('scott','tiger')
db = client['progzoo']
#PRETTY
import pprint
pp = pprint.PrettyPrinter(indent=4)
Pattern Matching String
This tutorial uses RegEx to match country names. We will be using find() on the collection world.
You can use '$regex':"^B" to get all the countries that start with B.
Find the countries that start with Y
pp.pprint(list( db.world.find({"name":{'$regex':"^Y"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"^Y"}},{"name":1,"_id":0})))
You can use '$regex':"a$" to get all the countries that end with a.
Find the countries that end with y
pp.pprint(list( db.world.find({"name":{'$regex':"y$"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"y$"}},{"name":1,"_id":0})))
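The two anchors can be tried out without a database. The following is a minimal sketch using Python's re module on a handful of sample names (the list and the helper matching are illustrative, not part of the tutorial's data). It assumes that these simple patterns behave the same way in Python as they do in MongoDB's regex engine. It also shows a common slip: an anchor placed on the wrong side of the letter matches nothing.

```python
import re

# Sample names for illustration only; the real data lives in db.world.
names = ["Yemen", "Italy", "Norway", "Luxembourg", "Iceland"]

def matching(pattern, items):
    """Return the items in which the pattern is found anywhere (unanchored search)."""
    return [s for s in items if re.search(pattern, s)]

starts_with_Y = matching(r"^Y", names)  # ^ anchors the match at the start of the name
ends_with_y = matching(r"y$", names)    # $ anchors the match at the end of the name
misplaced = matching(r"$y", names)      # "end of string, then y" can never match

print(starts_with_Y)
print(ends_with_y)
print(misplaced)
```

Matching is case-sensitive here, so "^Y" and "^y" select different names.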
Luxembourg has an x, and so does one other country. List them both.
Find the countries that contain the letter x
pp.pprint(list( db.world.find({"name":{'$regex':"x"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"x"}},{"name":1,"_id":0})))
Iceland and Switzerland end with land but where are the others?
Find the countries that end with land
pp.pprint(list( db.world.find({"name":{'$regex':"land$"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"land$"}},{"name":1,"_id":0})))
Colombia starts with a C and ends with ia - there are two other countries like this.
You can use .* to match any number of characters (including none), except newlines.
Find the countries that start with C and end with ia
pp.pprint(list( db.world.find({"name":{'$regex':"^C.*ia$"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"^C.*ia$"}},{"name":1,"_id":0})))
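As a rough illustration of how the two anchors combine with .*, here is a small sketch in plain Python. The sample names are assumptions chosen for the example, and Python's re module stands in for the database's regex engine.

```python
import re

# Illustrative sample names, not the contents of db.world.
names = ["Colombia", "Croatia", "Cambodia", "Canada", "Austria", "Chile"]

# ^C   : the name starts with C
# .*   : any run of characters, possibly empty
# ia$  : the name ends with ia
pattern = re.compile(r"^C.*ia$")

c_to_ia = [n for n in names if pattern.search(n)]
print(c_to_ia)

# Because .* may match nothing at all, even the bare string "Cia" qualifies.
shortest_possible = bool(pattern.search("Cia"))
```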
Greece has a double e. Who has a double o?
Find the country that has oo in its name
pp.pprint(list( db.world.find({"name":{'$regex':"oo"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"oo"}},{"name":1,"_id":0})))
Bahamas has three a's. Who else?
Find the countries that have three or more a's in the name
[Aa] matches both capital and lowercase A.
pp.pprint(list( db.world.find({"name":{'$regex':"(.*[aA].*){3}"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"(.*[aA].*){3}"}},{"name":1,"_id":0})))
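The repeated group (.*[aA].*){3} asks for three stretches of text that each contain an a or A. A flatter pattern, [aA].*[aA].*[aA], expresses the same condition. The sketch below checks both against a simple character count on a few sample names (the list is an assumption for illustration, and Python's re is used in place of the database).

```python
import re

# Illustrative sample names only.
names = ["Bahamas", "Panama", "Canada", "Albania", "Angola", "France"]

grouped = re.compile(r"(.*[aA].*){3}")    # the pattern used in the tutorial
flat = re.compile(r"[aA].*[aA].*[aA]")    # an equivalent, flatter spelling

by_grouped = [n for n in names if grouped.search(n)]
by_flat = [n for n in names if flat.search(n)]

# Cross-check without regex: count a/A characters directly.
by_count = [n for n in names if sum(ch in "aA" for ch in n) >= 3]

print(by_grouped)
```

Albania qualifies only because [aA] accepts the capital A at the start of the name.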
India and Angola have n as their second character.
.* indicates zero or more characters; . indicates exactly one.
Find the countries that have "t" as the second character.
pp.pprint(list( db.world.find({"name":{'$regex':"^.t"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"^.t"}},{"name":1,"_id":0})))
Lesotho and Moldova both have two o characters separated by two other characters.
Find the countries that have two "o" characters separated by two others.
pp.pprint(list( db.world.find({"name":{'$regex':"o..o"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"o..o"}},{"name":1,"_id":0})))
Cuba and Togo have four character names.
Find the countries that have exactly four characters
pp.pprint(list( db.world.find({"name":{'$regex':"^.{4}$"}},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"name":{'$regex':"^.{4}$"}},{"name":1,"_id":0})))
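The last three patterns all rely on . standing for exactly one character. A small sketch, run against sample names taken from the examples above plus two assumed extras (Italy and Ethiopia), shows each one in isolation. Python's re module is used here in place of the database.

```python
import re

# Sample names for illustration; Italy and Ethiopia are added for the "^.t" case.
names = ["Italy", "Ethiopia", "India", "Lesotho", "Moldova", "Cuba", "Togo"]

def matching(pattern):
    """Names in which the pattern is found anywhere."""
    return [n for n in names if re.search(pattern, n)]

second_is_t = matching(r"^.t")       # one arbitrary character, then t
o_gap_two = matching(r"o..o")        # o, exactly two characters, then o
four_letters = matching(r"^.{4}$")   # exactly four characters from start to end

print(second_is_t)
print(o_gap_two)
print(four_letters)
```

Togo is not matched by o..o because its two o characters are only one character apart.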
Complex Examples
The capital of Luxembourg is Luxembourg. Show all the countries where the capital is the same as the name of the country.
You can compare two fields by using $where: db.<collection>.find({"$where":"this.<field1> <operator> this.<field2>"})
$where evaluates a JavaScript expression against each document, which means you can call string methods such as .match()
Find the countries where the name is the same as the capital city.
pp.pprint(list( db.world.find({"$where":"this.name == this.capital"},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"$where":"this.name == this.capital"},{"name":1,"_id":0})))
The capital of Mexico is Mexico City. Show all the countries where the capital has the country together with the word "City".
Find the country where the capital is the country plus "City".
pp.pprint(list( db.world.find({"$where":"this.capital == this.name+' City'"},{"name":1,"_id":0}) ))
pp.pprint(list(db.world.find({"$where":"this.capital == this.name+' City'"},{"name":1,"_id":0})))
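One way to picture $where is as a predicate that is evaluated once per document, with this bound to the current document. The sketch below mimics that in plain Python on three hypothetical documents; the helper where is an illustration, not a pymongo function. It also builds a query dictionary using the $expr operator, which is an alternative (not used in this tutorial) that compares two fields without JavaScript on MongoDB 3.6 or later.

```python
# Hypothetical sample documents, not the contents of db.world.
docs = [
    {"name": "Luxembourg", "capital": "Luxembourg"},
    {"name": "Mexico", "capital": "Mexico City"},
    {"name": "France", "capital": "Paris"},
]

def where(documents, predicate):
    """Illustrative stand-in for $where: keep names whose document satisfies the predicate."""
    return [d["name"] for d in documents if predicate(d)]

# Mirrors "this.name == this.capital"
same_as_capital = where(docs, lambda d: d["name"] == d["capital"])

# Mirrors "this.capital == this.name+' City'"
name_plus_city = where(docs, lambda d: d["capital"] == d["name"] + " City")

# Assumed alternative for the first query: a field-to-field comparison via $expr.
# It would be passed to db.world.find() in place of the $where document.
same_as_capital_query = {"$expr": {"$eq": ["$name", "$capital"]}}

print(same_as_capital)
print(name_plus_city)
```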
Find the capital and the name where the capital includes the name of the country.
You should include countries like Luxembourg where the capital is Luxembourg, and countries like Mexico where the capital is Mexico City
pp.pprint(list( db.world.find({"$where":"this.capital.match(this.name)"},{"name":1,"capital":1,"_id":0}) ))
pp.pprint(list(db.world.find({"$where":"this.capital.match(this.name)"},{"name":1,"capital":1,"_id":0})))
Find the capital and the name where the capital is an extension of the name of the country.
You should include Mexico City as it is longer than Mexico. You should not include Luxembourg as the capital is the same as the country.
. matches a single character.
* matches zero or more of the previous character, or of the previous string if a string is given in brackets, e.g. (abc)*
+ matches one or more of the previous character or string.
.* is the same as "zero or more characters" and .+ is the same as "at least one character".
pp.pprint(list( db.world.find({"$where":"this.capital.match('^'+this.name+'.+$')"},{"name":1,"capital":1,"_id":0}) ))
pp.pprint(list(db.world.find({"$where":"this.capital.match('^'+this.name+'.+$')"},{"name":1,"capital":1,"_id":0})))
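The difference between "includes the name" and "extends the name" comes down to the trailing .+ in the pattern. The following sketch reproduces both checks in plain Python on hypothetical sample documents. It uses re.escape when building a pattern from the country name, which is a precaution added here (the tutorial's queries pass the name through unescaped), so that a name containing regex metacharacters would still be matched literally.

```python
import re

# Hypothetical sample documents, not the contents of db.world.
docs = [
    {"name": "Luxembourg", "capital": "Luxembourg"},
    {"name": "Mexico", "capital": "Mexico City"},
    {"name": "France", "capital": "Paris"},
]

def capital_includes_name(doc):
    """True when the country name appears anywhere in the capital."""
    return re.search(re.escape(doc["name"]), doc["capital"]) is not None

def capital_extends_name(doc):
    """True when the capital starts with the name and has at least one more character."""
    pattern = "^" + re.escape(doc["name"]) + ".+$"
    return re.search(pattern, doc["capital"]) is not None

includes = [d["name"] for d in docs if capital_includes_name(d)]
extends = [d["name"] for d in docs if capital_extends_name(d)]

print(includes)
print(extends)
```

Luxembourg drops out of the second list because .+ requires at least one character after the name, and there is none.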