Python provides the re module for working with regular expressions. It can be used for searching strings, manipulating text, or web scraping. The re module's regular expression syntax and features are very similar to Perl's.
The re module provides several useful functions, including search, match, findall, finditer, split and sub.
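findall() and finditer() are not demonstrated elsewhere in this article, so here is a minimal sketch (the sample string is invented for illustration). findall() returns a list of the matched substrings, while finditer() yields match objects that also carry each match's position:

```python
import re

text = "order 12 shipped, order 345 pending"  # hypothetical sample string

# findall(): a list of every non-overlapping match, as plain strings
numbers = re.findall(r"\d+", text)
print(numbers)  # ['12', '345']

# finditer(): an iterator of match objects, so positions are available too
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]
print(spans)    # [('12', (6, 8)), ('345', (24, 27))]
```

Use finditer() when you need positions or groups for each match, or when the input is large and you would rather not build the whole list at once.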
```python
#!/usr/bin/env python3
import re
x = "this is bitarray.io website, remember io!"
z = re.search(r'(.*) web(.*)', x)
print(z.group(1))
print(z.group(2))
```

Output:

```
this is bitarray.io
site, remember io!
```
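In the above example, re.search() returns a match object, z. Each pair of parentheses in the pattern creates a group: z.group(1) holds the text matched by the first (.*) and z.group(2) the text matched by the second. Groups like these are very useful for parsing lines and extracting specific parts of a string.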
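As a sketch of using groups to parse lines, assume a hypothetical log format of "date level message" (the format, the LOG_PATTERN name and the sample lines are all invented for illustration). The function returns all captured groups at once with m.groups(), and checks for None first because search() returns None when nothing matches:

```python
import re

# Hypothetical log format: "YYYY-MM-DD LEVEL message"
LOG_PATTERN = r"(\d{4}-\d{2}-\d{2}) (\w+) (.*)"

def parse_line(line):
    """Return (date, level, message), or None if the line does not fit."""
    m = re.search(LOG_PATTERN, line)
    if m is None:          # search() returns None when nothing matches
        return None
    return m.groups()      # tuple of all captured groups, in order

print(parse_line("2024-01-05 ERROR disk full"))  # ('2024-01-05', 'ERROR', 'disk full')
print(parse_line("garbage"))                     # None
```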
search vs match
search(): looks for a match anywhere in the string.
match(): looks for a match only at the beginning of the string.
```python
#!/usr/bin/env python3
import re
x = "this is bitarray.io website, remember io!"
z = re.match(r'web(.*)', x)
print(type(z))
print(z.group(1))
```

Output:

```
<class 'NoneType'>
Traceback (most recent call last):
  File "./p1", line 6, in <module>
    print(z.group(1))
AttributeError: 'NoneType' object has no attribute 'group'
```
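It fails because re.match() only looks for a match at the beginning of the string, and x does not start with "web". re.match() therefore returns None, and calling .group() on None raises AttributeError.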
The example below does work, because the leading (.*) lets the pattern consume everything before "web". (This difference between match and search is a popular interview question!)
```python
#!/usr/bin/env python3
import re
x = "this is bitarray.io website, remember io!"
z = re.match(r'(.*)web(.*)', x)
print(type(z))
print(z.group(1))
```

Output:

```
<class 're.Match'>
this is bitarray.io 
```
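A minimal side-by-side sketch of the difference, using the same string: search() finds "web" in the middle of the string, match() does not, and a search() anchored with ^ behaves like match() (as long as re.MULTILINE is not used). The last lines show one way to guard against None before calling .group():

```python
import re

x = "this is bitarray.io website, remember io!"

# search() scans the whole string; match() only tries position 0
s = re.search(r"web", x)
m = re.match(r"web", x)
print(s.span())   # (20, 23)
print(m)          # None

# search() with a leading ^ behaves like match() (without re.MULTILINE)
print(re.search(r"^web", x))           # None
print(re.search(r"^this", x).group())  # this

# Guard against None before calling .group() to avoid AttributeError
result = re.match(r"web(.*)", x)
print(result.group(1) if result else "no match at start of string")
```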
How does it work?
| Character | Meaning |
|---|---|
| `^` | Matches the start of the string |
| `$` | Matches the end of the string |
| `*` | Matches the preceding pattern 0 or more times |
| `+` | Matches the preceding pattern 1 or more times |
| `?` | Matches the preceding pattern 0 or 1 time |
| `[...]` | Matches any one character from the listed set, e.g. `[abc]` |
| `(...)` | Defines a group |
| `\d` | Matches a digit; `\d+` matches a run of consecutive digits |
| `\D` | Matches any non-digit character |
| `\s` | Matches any whitespace character; `\s+` matches a run of whitespace |
| `\S` | Matches any non-whitespace character |
| `.` | Matches any character except a newline |
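A short sketch exercising most of the characters in the table (the sample strings are invented for illustration). It uses re.findall(), which returns every match as a list, so the effect of each pattern is easy to see:

```python
import re

# ^ and $ anchor to the start and end of the string
print(bool(re.search(r"^this", "this is it")))   # True
print(bool(re.search(r"it$", "this is it")))     # True

# * allows zero or more b's, + requires at least one
print(re.findall(r"ab*", "a ab abbb"))           # ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))           # ['ab', 'abbb']

# ? makes the preceding "u" optional
print(re.findall(r"colou?r", "color colour"))    # ['color', 'colour']

# [...] matches any one character from the set
print(re.findall(r"[aeiou]", "regex"))           # ['e', 'e']

# \d+ matches runs of digits; \s+ matches runs of whitespace
print(re.findall(r"\d+", "room 12, floor 3"))    # ['12', '3']
print(re.split(r"\s+", "a  b\tc"))               # ['a', 'b', 'c']

# . does not match a newline unless the re.DOTALL flag is given
print(re.search(r"a.b", "a\nb"))                    # None
print(bool(re.search(r"a.b", "a\nb", re.DOTALL)))   # True
```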
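re.split() splits a string wherever the pattern matches. Here 5\d matches "56":

```python
>>> import re
>>> x = "this is a 56th test"
>>> d = re.split(r"5\d", x)
>>> d[0]
'this is a '
>>> d[1]
'th test'
```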
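Two more split sketches (sample strings invented for illustration): a character class lets one call split on several different separators, and the maxsplit keyword limits how many splits are made, leaving the rest of the string intact:

```python
import re

# Split on a comma or semicolon, plus any whitespace that follows it
parts = re.split(r"[,;]\s*", "red, green;blue")
print(parts)  # ['red', 'green', 'blue']

# maxsplit limits the number of splits; the remainder stays intact
head = re.split(r"\s+", "one two  three", maxsplit=1)
print(head)   # ['one', 'two  three']
```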
Note: Python strings also have methods such as find, replace and split. These methods DO NOT accept regular expressions; they treat their arguments as plain text. If you need regular expressions, use the re module.
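A small sketch of the difference (the string is invented for illustration): str.replace() looks for the literal characters backslash and "d", which do not occur, while re.sub() treats \d as "any digit":

```python
import re

s = "a1b22c"

# str.replace() treats its argument literally: "\d" is not a pattern here
print(s.replace(r"\d", "#"))   # a1b22c (unchanged)

# re.sub() interprets r"\d" as "any digit"
print(re.sub(r"\d", "#", s))   # a#b##c
```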
What if you want to replace AA with BB in a string? Use re.sub():
```python
>>> import re
>>> x = "this is AA string"
>>> x = re.sub("AA", "BB", x)
>>> print(x)
this is BB string
```
What if the string has multiple occurrences of AA? By default, re.sub() replaces all of them:
```python
>>> x = "this is AA string AA"
>>> x = re.sub("AA", "BB", x)
>>> print(x)
this is BB string BB
```
To replace only the first occurrence, pass count=1:
```python
>>> import re
>>> x = "this is AA string AA"
>>> x = re.sub("AA", "BB", x, count=1)
>>> print(x)
this is BB string AA
```
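Two related sketches on the same string. re.subn() works like re.sub() but returns a (new_string, number_of_replacements) tuple, and the flags keyword changes how the pattern matches, here making it case-insensitive:

```python
import re

x = "this is AA string AA"

# subn() also reports how many replacements were made
new_x, n = re.subn("AA", "BB", x)
print(new_x, n)   # this is BB string BB 2

# flags must be passed by keyword: the 4th positional argument is count
print(re.sub("aa", "BB", x, flags=re.IGNORECASE))  # this is BB string BB
```

Passing re.IGNORECASE as the fourth positional argument is a common mistake: it is silently taken as the count, which is why the keyword form is used above.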
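re.compile() creates a regex (pattern) object that can be reused whenever the same type of match or search is needed several times in your code. Compiling once avoids re-parsing the pattern on every call (the re module also caches recently used patterns, so the gain is often modest). The object's behaviour can be altered with flags such as re.IGNORECASE, re.LOCALE and re.MULTILINE.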
```python
>>> import re
>>> x = "This is a 1st test,#1 test for regex"
>>> rexobj = re.compile("test")
>>> rexobj.search(x)
<re.Match object; span=(14, 18), match='test'>
>>> if rexobj.search(x):
...     print("Found")
...
Found
```
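A minimal sketch of reusing compiled patterns with flags (the input lines are invented for illustration). One pattern is compiled once with re.IGNORECASE and applied to every line in a loop; a second shows re.MULTILINE making ^ match at the start of each line rather than only at the start of the string:

```python
import re

# Compile once, with a flag, and reuse the pattern object in a loop
word = re.compile(r"test", re.IGNORECASE)

lines = [                      # hypothetical input lines
    "This is a 1st test,#1 test for regex",
    "TEST number two",
    "no match here",
]
counts = [len(word.findall(line)) for line in lines]
print(counts)   # [2, 1, 0]

# re.MULTILINE makes ^ match at the start of every line, not just the string
first_words = re.compile(r"^\w+", re.MULTILINE)
print(first_words.findall("alpha one\nbeta two"))  # ['alpha', 'beta']
```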