
RegEx Pattern Matching

From NoSQLZoo
Latest revision as of 19:55, 21 June 2018

Pattern Matching String

This tutorial uses regular expressions (RegEx) to match country names. All of the queries use find() on the world collection.

You can use '$regex': "^B" to get all the countries that start with B.

Find the countries that start with Y

db.world.find({"name": {'$regex': "^Y"}}, {"name": 1, "_id": 0}).pretty();
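
The sketch below shows what this query does by running it against a small in-memory array instead of the database. The sample documents and the helper findByNameRegex are hypothetical. MongoDB evaluates $regex with a Perl-compatible (PCRE) engine, and this sketch assumes JavaScript's RegExp behaves the same way for the simple patterns in this tutorial.

```javascript
// Hypothetical sample documents; the real world collection has more rows and fields.
const world = [
  { name: "Yemen" },
  { name: "Belgium" },
  { name: "Bahamas" },
  { name: "Norway" },
];

// Hypothetical helper: keep documents whose name matches the pattern,
// then project to { name } only, like {"name": 1, "_id": 0}.
function findByNameRegex(docs, pattern) {
  const re = new RegExp(pattern);
  return docs
    .filter((doc) => re.test(doc.name))
    .map((doc) => ({ name: doc.name }));
}

const startsWithY = findByNameRegex(world, "^Y");
const startsWithB = findByNameRegex(world, "^B");
console.log(startsWithY); // [ { name: 'Yemen' } ]
console.log(startsWithB); // [ { name: 'Belgium' }, { name: 'Bahamas' } ]
```

The ^ anchor ties the match to the first character, so Norway, which contains no leading Y or B, is never returned.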

You can use '$regex': "a$" to get all the countries that end with a.

Find the countries that end with y

db.world.find({"name": {'$regex': "y$"}}, {"name": 1, "_id": 0}).pretty();
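
The $ anchor restricts the match to the end of the name. This sketch uses hypothetical sample names to compare "y$" with an unanchored "y". It also shows that matching is case-sensitive, so the capital Y in Yemen does not count.

```javascript
// Hypothetical sample names, not the full world collection.
const names = ["Italy", "Myanmar", "Germany", "Yemen", "Kenya"];

// "y$" only matches a y that is the last character of the name.
const endsWithY = names.filter((n) => /y$/.test(n));
// Without the $ anchor, "y" matches a lowercase y anywhere in the name.
const containsY = names.filter((n) => /y/.test(n));

console.log(endsWithY); // [ 'Italy', 'Germany' ]
console.log(containsY); // [ 'Italy', 'Myanmar', 'Germany', 'Kenya' ]
```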

Luxembourg has an x, and so does one other country. List them both.

Find the countries that contain the letter x

db.world.find({"name": {'$regex': "x"}}, {"name": 1, "_id": 0}).pretty();
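
Because $regex is case-sensitive by default, "x" and "X" find different countries. MongoDB accepts '$options': "i" next to '$regex' for case-insensitive matching. The sketch below reproduces both variants on a hypothetical list of names, using JavaScript's i flag in place of that option.

```javascript
// Hypothetical sample names.
const names = ["Luxembourg", "Mexico", "France"];

// Case-sensitive, like {'$regex': "X"}: no name contains a capital X.
const upperOnly = names.filter((n) => /X/.test(n));
// Case-insensitive, like {'$regex': "X", '$options': "i"}.
const anyCase = names.filter((n) => /X/i.test(n));

console.log(upperOnly); // []
console.log(anyCase); // [ 'Luxembourg', 'Mexico' ]
```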

Iceland and Switzerland end with land, but where are the others?

Find the countries that end with land

db.world.find({"name": {'$regex': "land$"}}, {"name": 1, "_id": 0}).pretty();

Colombia starts with a C and ends with ia; there are two other countries like this.
You can use .* to match any number of characters (including none), except newlines.

Find the countries that start with C and end with ia

db.world.find({"name": {'$regex': "^C.*ia$"}}, {"name": 1, "_id": 0}).pretty();
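
This short sketch runs ^C.*ia$ against hypothetical sample names. It also checks the claim that . (and therefore .*) does not match a newline.

```javascript
// Hypothetical sample names.
const names = ["Colombia", "Chad", "Czechia", "Slovakia", "Cuba"];
const re = /^C.*ia$/;

// Starts with C, then anything, then ends with "ia".
const matches = names.filter((n) => re.test(n));
console.log(matches); // [ 'Colombia', 'Czechia' ]

// "." does not match "\n", so ".*" cannot span a line break.
const spansNewline = re.test("C\nia");
console.log(spansNewline); // false
```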

Greece has a double e. Which country has a double o?

Find the country that has oo in its name

db.world.find({"name": {'$regex': "oo"}}, {"name": 1, "_id": 0}).pretty();
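
A doubled letter can also be written with a {n} quantifier, so "ee" and "e{2}" are equivalent patterns. The sketch below checks this with the Greece example on a few hypothetical names. The same rewrite turns "oo" into "o{2}".

```javascript
// Hypothetical sample names.
const names = ["Greece", "Sweden", "Belgium"];

// A literal double letter...
const doubleE = names.filter((n) => /ee/.test(n));
// ...and the same requirement written with a quantifier.
const doubleEQuantified = names.filter((n) => /e{2}/.test(n));

console.log(doubleE); // [ 'Greece' ]
console.log(doubleEQuantified); // [ 'Greece' ]
```

Sweden is excluded because its two e's are not adjacent.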

Bahamas has three a's. Which other countries do?

Find the countries that have three or more a's in the name.

Not getting countries that start with A? [Aa] matches both a capital and a lowercase a.

db.world.find({"name": {'$regex': "(.*[aA].*){3}"}}, {"name": 1, "_id": 0}).pretty();
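
The pattern (.*[aA].*){3} repeats a group three times, and each group consumes one a or A, so the name must contain at least three of them. This sketch on hypothetical sample names compares it with three alternatives: a lowercase-only pattern, the i flag (in MongoDB, '$options': "i"), and a direct character count.

```javascript
// Hypothetical sample names.
const names = ["Bahamas", "Australia", "Angola", "Peru"];

// The exercise's pattern: three groups, each containing one a or A.
const viaPattern = names.filter((n) => /(.*[aA].*){3}/.test(n));
// Lowercase only: misses the capital A in Australia.
const lowercaseOnly = names.filter((n) => /(.*a.*){3}/.test(n));
// Same idea with the case-insensitive flag.
const viaFlag = names.filter((n) => /(.*a.*){3}/i.test(n));
// Cross-check by counting a/A characters directly.
const countA = (n) => (n.match(/a/gi) || []).length;
const viaCount = names.filter((n) => countA(n) >= 3);

console.log(viaPattern); // [ 'Bahamas', 'Australia' ]
console.log(lowercaseOnly); // [ 'Bahamas' ]
```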

India and Angola have n as their second character.
.* matches zero or more characters, while . matches exactly one.

Find the countries that have "t" as the second character.

db.world.find({"name": {'$regex': "^.t"}}, {"name": 1, "_id": 0}).pretty();
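
Without the ^ anchor, ".n" would match an n preceded by any character anywhere in the name. The sketch below contrasts the anchored and unanchored patterns on hypothetical sample names, reusing the n example above.

```javascript
// Hypothetical sample names.
const names = ["India", "Angola", "Kenya", "Peru"];

// Exactly one character, then n, at the start of the name.
const secondIsN = names.filter((n) => /^.n/.test(n));
// Unanchored: any character followed by n, anywhere.
const anywhereN = names.filter((n) => /.n/.test(n));

console.log(secondIsN); // [ 'India', 'Angola' ]
console.log(anywhereN); // [ 'India', 'Angola', 'Kenya' ]
```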

Lesotho and Moldova both have two o characters separated by two other characters.

Find the countries that have two "o" characters separated by two others.

db.world.find({"name": {'$regex': "o..o"}}, {"name": 1, "_id": 0}).pretty();
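
Each . stands for exactly one character, so the number of dots fixes the gap between the two o's. Here is a sketch on hypothetical sample names comparing "o..o" with "o.o":

```javascript
// Hypothetical sample names.
const names = ["Lesotho", "Moldova", "Togo", "Peru"];

// Two characters between the o's.
const twoBetween = names.filter((n) => /o..o/.test(n));
// One character between the o's.
const oneBetween = names.filter((n) => /o.o/.test(n));

console.log(twoBetween); // [ 'Lesotho', 'Moldova' ]
console.log(oneBetween); // [ 'Togo' ]
```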

Cuba and Togo have four-character names.

Find the countries that have exactly four characters

db.world.find({"name": {'$regex': "^.{4}$"}}, {"name": 1, "_id": 0}).pretty();
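
Both anchors matter here. Without them, .{4} matches any name that contains at least four characters. Here is a sketch on hypothetical sample names:

```javascript
// Hypothetical sample names.
const names = ["Cuba", "Togo", "Brazil", "Niger"];

// Anchored at both ends: the whole name is exactly four characters.
const exactlyFour = names.filter((n) => /^.{4}$/.test(n));
// Unanchored: any run of four characters somewhere in the name.
const atLeastFour = names.filter((n) => /.{4}/.test(n));

console.log(exactlyFour); // [ 'Cuba', 'Togo' ]
console.log(atLeastFour); // [ 'Cuba', 'Togo', 'Brazil', 'Niger' ]
```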

Complex Examples

The capital of Luxembourg is Luxembourg. Show all the countries where the capital is the same as the name of the country.

You can compare two fields with the $where operator:
db.<collection>.find({"$where": "this.<field1> <operator> this.<field2>"})
$where evaluates a JavaScript expression against each document, so you can call string methods such as .match().

Find the country where the name is the capital city.

db.world.find({"$where": "this.name === this.capital"}, {"name": 1, "capital": 1, "_id": 0}).pretty();
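
$where runs its JavaScript with this bound to the current document. The sketch below emulates that behaviour in plain JavaScript over hypothetical sample documents. findWhere is an illustrative helper, not a MongoDB API. The predicate is a regular function because an arrow function's this cannot be rebound with call().

```javascript
// Hypothetical sample documents.
const world = [
  { name: "Luxembourg", capital: "Luxembourg" },
  { name: "Mexico", capital: "Mexico City" },
  { name: "France", capital: "Paris" },
];

// Same body as the $where string; `this` is the document being tested.
function whereSameName() {
  return this.name === this.capital;
}

// Hypothetical helper emulating find({"$where": ...}, {"name": 1, "capital": 1, "_id": 0}).
function findWhere(docs, predicate) {
  return docs
    .filter((doc) => predicate.call(doc))
    .map(({ name, capital }) => ({ name, capital }));
}

const sameName = findWhere(world, whereSameName);
console.log(sameName); // [ { name: 'Luxembourg', capital: 'Luxembourg' } ]
```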

The capital of Mexico is Mexico City. Show all the countries where the capital is the country's name followed by the word "City".

Find the country where the capital is the country plus "City".

db.world.find({"$where": "this.capital === this.name + ' City'"}, {"name": 1, "capital": 1, "_id": 0}).pretty();
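
Here is the same comparison written as an ordinary JavaScript predicate and applied to hypothetical sample documents. === is an exact, case-sensitive comparison, so the capital must be the name followed by exactly " City".

```javascript
// Hypothetical sample documents.
const world = [
  { name: "Mexico", capital: "Mexico City" },
  { name: "Panama", capital: "Panama City" },
  { name: "Luxembourg", capital: "Luxembourg" },
  { name: "France", capital: "Paris" },
];

// Same test as the $where string: capital equals name + " City".
const isNamePlusCity = (doc) => doc.capital === `${doc.name} City`;

const cityCapitals = world.filter(isNamePlusCity).map((doc) => doc.capital);
console.log(cityCapitals); // [ 'Mexico City', 'Panama City' ]
```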

Find the capital and the name where the capital includes the name of the country.

You should include countries like Luxembourg, where the capital is Luxembourg, and countries like Mexico, where the capital is Mexico City.

db.world.find({"$where": "this.capital.match(this.name)"}, {"name": 1, "capital": 1, "_id": 0}).pretty();
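
this.capital.match(this.name) has two quirks worth knowing. First, match() converts a string argument into a regular expression, so characters such as . or ( in a name would be read as regex syntax. Second, calling match() on a missing (null) capital throws a TypeError. The sketch below uses hypothetical documents to show both quirks and a guarded alternative based on includes(). The document without a capital is invented purely to show the failure mode.

```javascript
// Hypothetical sample documents; "Nowhere" is invented to show a missing capital.
const world = [
  { name: "Luxembourg", capital: "Luxembourg" },
  { name: "Mexico", capital: "Mexico City" },
  { name: "France", capital: "Paris" },
  { name: "Nowhere", capital: null },
];

// Mirrors the exercise: match() turns the name into a RegExp and
// returns an array (truthy) on a match, or null otherwise.
function viaMatch(doc) {
  return doc.capital.match(doc.name) !== null;
}

// Guarded alternative: skip non-string capitals and use a plain substring test.
function viaIncludes(doc) {
  return typeof doc.capital === "string" && doc.capital.includes(doc.name);
}

const withCapital = world.filter((doc) => typeof doc.capital === "string");
const matched = withCapital.filter(viaMatch).map((doc) => doc.name);
const included = world.filter(viaIncludes).map((doc) => doc.name);

// Calling match() on null throws.
let threw = false;
try {
  viaMatch(world[3]);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(matched, included, threw);
```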

Find the capital and the name where the capital is an extension of the name of the country.

You should include Mexico, because Mexico City is longer than Mexico. You should not include Luxembourg, because its capital is exactly the same as the country name.

. matches any single character.
* matches zero or more repetitions of the preceding character, or of a group in parentheses, e.g. (abc)*
+ matches one or more repetitions of the preceding character or group.
So .* means "zero or more characters" and .+ means "at least one character".

db.world.find({"$where": "this.capital.match(new RegExp('^' + this.name + '.+$'))"}, {"name": 1, "capital": 1, "_id": 0}).pretty();
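
The sketch below applies the extension test to hypothetical documents and contrasts .+ with .*. The helper escapeRegExp is an extra that the query above does not use. It escapes regex metacharacters so each country name is matched literally, which is one possible way to harden the query rather than part of the exercise.

```javascript
// Hypothetical helper: escape regex metacharacters so text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sample documents.
const world = [
  { name: "Mexico", capital: "Mexico City" },
  { name: "Luxembourg", capital: "Luxembourg" },
  { name: "Panama", capital: "Panama City" },
  { name: "France", capital: "Paris" },
];

// ".+" demands at least one extra character after the name.
function isExtension(doc) {
  return new RegExp("^" + escapeRegExp(doc.name) + ".+$").test(doc.capital);
}

// ".*" also accepts a capital identical to the name.
function isPrefixOrEqual(doc) {
  return new RegExp("^" + escapeRegExp(doc.name) + ".*$").test(doc.capital);
}

const extensions = world.filter(isExtension).map((d) => d.capital);
const prefixOrEqual = world.filter(isPrefixOrEqual).map((d) => d.capital);
console.log(extensions); // [ 'Mexico City', 'Panama City' ]
console.log(prefixOrEqual); // [ 'Mexico City', 'Luxembourg', 'Panama City' ]
```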