Python Regular Expressions (Regex) 2023

Regular Expressions (Regex): Python makes regular expressions available through the re module.

Regular expressions are combinations of characters that are interpreted as rules for matching substrings. For instance, the expression 'amount\D+\d+' will match any string composed of the word amount plus an integer, separated by one or more non-digits, such as: amount=100, amount is 3, amount is equal to: 33, etc.
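As a quick check of the pattern described above, the following sketch tries it against the three sample strings plus one counter-example. The extra sample "amount100" and the variable names are illustrative additions, not part of the original text.

```python
import re

# The word "amount", one or more non-digit characters, then one or more digits.
pattern = r"amount\D+\d+"

samples = ["amount=100", "amount is 3", "amount is equal to: 33", "amount100"]

# Map every sample to True/False depending on whether the pattern is found.
results = {s: bool(re.search(pattern, s)) for s in samples}

for sample, matched in results.items():
    print(sample, "->", matched)
```

The last sample fails because \D+ requires at least one non-digit between the word and the number.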

Matching the beginning of a string

The first argument of re.match() is the regular expression, the second is the string to match:

import re
pattern = r"123"
string = "123zzb"
re.match(pattern, string)

Out: <re.Match object; span=(0, 3), match='123'>

match = re.match(pattern, string)
match.group()

Out: '123'

You may notice that the pattern variable is a string prefixed with r, which indicates that the string is a raw string literal.

A raw string literal has a slightly different syntax than a string literal: a backslash \ in a raw string literal means “just a backslash”, so there is no need to double up backslashes to escape “escape sequences” such as newlines (\n), tabs (\t), backspaces (\b), form feeds (\f), and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.

Hence, r"\n" is a string of 2 characters: \ and n. Regex patterns also use backslashes, e.g. \d refers to any digit character. We can avoid having to double escape our strings ("\\d") by using raw strings (r"\d").

For instance:

string = "\\t123zzb" # here the backslash is escaped, so there's no tab, just '\' and 't'
pattern = "\\t123" # the regex engine sees \t123: a tab character followed by 123
re.match(pattern, string) # None: the string contains no tab
re.match(pattern, "\t123zzb").group() # matches '\t123'
pattern = r"\\t123" # the regex engine sees \\t123: a literal backslash followed by t123
re.match(pattern, string).group() # matches '\\t123'
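The difference between raw and normal literals can be verified directly. This is a small sketch with made-up variable names and a made-up sample string:

```python
import re

# A raw literal keeps the backslash; a normal literal turns \n into one newline.
raw_length = len(r"\n")
normal_length = len("\n")

# Both spellings below produce the same pattern text: backslash, d, plus.
doubled = "\\d+"
raw = r"\d+"
same_pattern = doubled == raw

# Either spelling can be used to find a run of digits.
digits = re.search(raw, "order 66").group()
print(raw_length, normal_length, same_pattern, digits)
```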

Matching is done from the start of the string only. If you want to match anywhere use re.search instead:

match = re.match(r"(123)", "a123zzb")
match is None

Out: True

match = re.search(r"(123)", "a123zzb")
match.group()

Out: '123'

Searching

pattern = r"(your base)"
sentence = "All your base are belong to us."
match = re.search(pattern, sentence)
match.group(1)

Out: 'your base'

match = re.search(r"(belong.*)", sentence)
match.group(1)

Out: 'belong to us.'

Searching is done anywhere in the string unlike re.match. You can also use re.findall.

You can also search at the beginning of the string (use ^),

match = re.search(r"^123", "123zzb")
match.group(0)

Out: '123'

match = re.search(r"^123", "a123zzb")
match is None

Out: True

at the end of the string (use $),

match = re.search(r"123$", "zzb123")
match.group(0)

Out: '123'

match = re.search(r"123$", "123zzb")
match is None

Out: True

or both (use both ^ and $):

match = re.search(r"^123$", "123")
match.group(0)

Out: '123'
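One subtlety when anchoring on both sides: $ also matches just before a trailing newline. If the whole string must match exactly, re.fullmatch may be a better fit. The sketch below compares the two; the input string is an illustrative assumption.

```python
import re

# "$" matches at the end of the string and also just before a final newline,
# so this search succeeds even though the string ends with "\n".
anchored = re.search(r"^123$", "123\n")

# re.fullmatch requires the pattern to cover the entire string,
# including the newline, so this returns None.
full = re.fullmatch(r"123", "123\n")

print(anchored is not None, full is None)
```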

Precompiled patterns

import re
precompiled_pattern = re.compile(r"(\d+)")
matches = precompiled_pattern.search("The answer is 41!")
matches.group(1)

Out: '41'

matches = precompiled_pattern.search("Or was it 42?")
matches.group(1)

Out: '42'

Compiling a pattern allows it to be reused later on in a program. However, note that Python caches recently used expressions, so, as the docs put it, “programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions”.

import re
precompiled_pattern = re.compile(r"(.*\d+)")
matches = precompiled_pattern.match("The answer is 41!")
print(matches.group(1))

Out: The answer is 41

matches = precompiled_pattern.match("Or was it 42?")
print(matches.group(1))

Out: Or was it 42

As the second example shows, a compiled pattern can also be used with match().
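Compiled pattern objects offer the same operations as the module-level functions. A brief sketch, using a made-up sentence, of reusing one compiled pattern for searching, listing and replacing:

```python
import re

# Compile once, then reuse the pattern object for several operations.
number = re.compile(r"\d+")

text = "The answer is 41, or was it 42?"

first = number.search(text).group()   # first run of digits
all_numbers = number.findall(text)    # every run of digits
masked = number.sub("#", text)        # replace each run with "#"

print(first, all_numbers, masked)
```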

Flags

For some special cases we need to change the behavior of the regular expression; this is done using flags. Flags can be set in two ways: through the flags keyword or directly in the expression.

Flags keyword

Below is an example for re.search, but the flags keyword works for most functions in the re module.

m = re.search("b", "ABC")
m is None

Out: True

m = re.search("b", "ABC", flags=re.IGNORECASE)
m.group()

Out: 'B'

m = re.search("a.b", "A\nBC", flags=re.IGNORECASE)
m is None

Out: True

m = re.search("a.b", "A\nBC", flags=re.IGNORECASE|re.DOTALL)
m.group()

Out: 'A\nB'

Common Flags

Flag	Short	Description
re.IGNORECASE	re.I	Makes the pattern ignore case
re.DOTALL	re.S	Makes . match everything, including newlines
re.MULTILINE	re.M	Makes ^ match the beginning of a line and $ the end of a line
re.DEBUG		Turns on debug information

For the complete list of all available flags, check the docs.
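re.MULTILINE is listed above but not demonstrated. A minimal sketch of its effect on ^, using a made-up three-line string:

```python
import re

text = "first line\nsecond line\nthird line"

# Without re.MULTILINE, ^ only matches at the very start of the string.
default_hits = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after every newline.
multiline_hits = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(default_hits)
print(multiline_hits)
```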

Inline flags

From the docs:

(?iLmsux) (One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.)

The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.

Note that the (?x) flag changes how the expression is parsed. It should be used first in the expression string, or after one or more whitespace characters. If there are non-whitespace characters before the flag, the results are undefined.
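To illustrate, here is a short sketch that repeats the earlier flags examples with inline flags placed at the very start of the pattern. The variable names are illustrative.

```python
import re

# (?i) at the start of the pattern has the same effect as flags=re.IGNORECASE.
inline = re.search(r"(?i)b", "ABC")
keyword = re.search(r"b", "ABC", flags=re.IGNORECASE)

# (?is) combines ignore-case and dot-matches-all in a single inline group.
combined = re.search(r"(?is)a.b", "A\nBC")

print(inline.group(), keyword.group(), repr(combined.group()))
```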

Replacing

Replacements can be made on strings using re.sub.

Replacing strings

re.sub(r"t[0-9][0-9]", "foo", "my name t13 is t44 what t99 ever t44")
Out: 'my name foo is foo what foo ever foo'

Using group references

Replacements with a small number of groups can be made as follows:

re.sub(r"t([0-9])([0-9])", r"t\2\1", "t13 t19 t81 t25")

Out: 't31 t91 t18 t52'

However, a numeric reference followed by a literal digit is ambiguous: in a replacement string, \10 is read as a reference to group 10, not as group 1 followed by the character '0'. So you have to be more specific and use the \g notation:

re.sub(r"t([0-9])([0-9])", r"t\g<2>\g<1>", "t13 t19 t81 t25")
Out: 't31 t91 t18 t52'
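The following sketch shows where the \g notation becomes necessary: appending a literal digit right after a group reference. It also shows \g with named groups. The patterns and inputs are illustrative assumptions.

```python
import re

# Goal: insert a literal "0" right after the first captured digit.
# r"t\10" is parsed as a reference to group 10, which this pattern
# does not have, so re.sub raises re.error.
try:
    re.sub(r"t([0-9])", r"t\10", "t1 t2")
    raised = False
except re.error:
    raised = True

# \g<1>0 is unambiguous: group 1 followed by the character "0".
padded = re.sub(r"t([0-9])", r"t\g<1>0", "t1 t2")

# Named groups can be referenced with the same notation.
swapped = re.sub(r"t(?P<a>[0-9])(?P<b>[0-9])", r"t\g<b>\g<a>", "t13 t19")

print(raised, padded, swapped)
```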

Using a replacement function

items = ["zero", "one", "two"]
re.sub(r"a\[([0-3])\]", lambda match: items[int(match.group(1))], "Items: a[0], a[1], something, a[2]")

Out: 'Items: zero, one, something, two'
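The replacement callable does not have to be a lambda. As a sketch, a named function (the hypothetical double below) receives each match object and returns the replacement text:

```python
import re

def double(match):
    """Return the matched number multiplied by two, as a string."""
    return str(int(match.group(0)) * 2)

# re.sub calls double() once per match and inserts whatever it returns.
doubled = re.sub(r"\d+", double, "3 apples and 12 pears")
print(doubled)
```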

Find All Non-Overlapping Matches

re.findall(r"[0-9]{2,3}", "some 1 text 12 is 945 here 4445588899")

Out: ['12', '945', '444', '558', '889']

Note that the r before "[0-9]{2,3}" tells Python to interpret the string as-is, as a “raw” string.

You could also use re.finditer(), which works in the same way as re.findall() but returns an iterator of match objects instead of a list of strings:

results = re.finditer(r"([0-9]{2,3})", "some 1 text 12 is 945 here 4445588899")
print(results)
# Out: a callable_iterator object (the repr includes a memory address)
for result in results:
    print(result.group(0))
''' Out:
12
945
444
558
889
'''
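One detail of re.findall worth knowing is that its return value depends on how many capturing groups the pattern contains. A short sketch with a made-up input string:

```python
import re

text = "x=1, y=22, z=333"

# No groups: a list of whole matches.
whole = re.findall(r"\w=\d+", text)

# One group: a list containing only that group's text.
values = re.findall(r"\w=(\d+)", text)

# Two groups: a list of tuples, one item per group.
pairs = re.findall(r"(\w)=(\d+)", text)

print(whole, values, pairs)
```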

Checking for allowed characters

If you want to check that a string contains only a certain set of characters, in this case a-z, A-Z, 0-9 and the period, you can do so like this:

import re
def is_allowed(string):
    character_regex = re.compile(r'[^a-zA-Z0-9.]')
    match = character_regex.search(string)
    return not bool(match)
print(is_allowed("abyzABYZ0099"))

Out: True

print(is_allowed("#*@#$%^"))

Out: False

You can also adapt the expression line from [^a-zA-Z0-9.] to [^a-z0-9.], to disallow uppercase letters for example.
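The same check can be phrased positively: instead of searching for a forbidden character, require that the whole string consists of allowed ones. This is only a sketch of one alternative; the names ALLOWED and is_allowed_fullmatch are hypothetical.

```python
import re

# Zero or more allowed characters; fullmatch requires the whole string to fit.
ALLOWED = re.compile(r"[a-zA-Z0-9.]*")

def is_allowed_fullmatch(string):
    """Return True if every character in the string is in the allowed set."""
    return ALLOWED.fullmatch(string) is not None

print(is_allowed_fullmatch("abyzABYZ0099"))
print(is_allowed_fullmatch("#*@#$%^"))
```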

Partial credit: https://stackoverflow.com/a/1325265/2697955

Splitting a string using regular expressions

You can also use regular expressions to split a string. For example,

import re
data = re.split(r'\s+', 'James 94 Samantha 417 Scarlett 74')
print(data)

Output: ['James', '94', 'Samantha', '417', 'Scarlett', '74']
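Two further options of re.split are sketched below with made-up inputs: maxsplit limits the number of splits, and a capturing group in the pattern keeps the separators in the result.

```python
import re

text = "James 94 Samantha 417"

# Split only once; the remainder of the string stays in one piece.
limited = re.split(r"\s+", text, maxsplit=1)

# A capturing group in the pattern keeps the separators in the result list.
with_separators = re.split(r"(\s+)", "a  b c")

print(limited)
print(with_separators)
```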

Grouping

Grouping is done with parentheses. Calling group() returns a string formed of the matching parenthesized subgroups.

match.group() # Group without argument returns the entire match found

Out: '123'

match.group(0) # Specifying 0 gives the same result as specifying no argument

Out: '123'

Arguments can also be provided to group() to fetch a particular subgroup.

From the docs:

If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument.

Calling groups(), on the other hand, returns a tuple containing all the subgroups.

sentence = "This is a phone number 672-123-456-9910"
pattern = r".*(phone).*?([\d-]+)"
m = re.match(pattern, sentence)
m.groups() # All parenthesized subgroups as a tuple

Out: ('phone', '672-123-456-9910')

m.group() # The entire match as a string

Out: 'This is a phone number 672-123-456-9910'

m.group(0) # The entire match as a string

Out: 'This is a phone number 672-123-456-9910'

m.group(1) # The first parenthesized subgroup.

Out: 'phone'

m.group(2) # The second parenthesized subgroup.

Out: '672-123-456-9910'

m.group(1, 2) # Multiple arguments give us a tuple.
Out: ('phone', '672-123-456-9910')

Named groups

match = re.search(r'My name is (?P<name>[A-Za-z ]+)', 'My name is John Smith')
match.group('name')

Out: 'John Smith'

match.group(1)

Out: 'John Smith'

The (?P<name>...) syntax creates a capture group that can be referenced by name as well as by index.
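When a pattern has several named groups, groupdict() returns all of them at once as a dictionary. A minimal sketch, with group names chosen for illustration:

```python
import re

# Two named groups separated by a single space.
match = re.search(r"(?P<first>[A-Za-z]+) (?P<last>[A-Za-z]+)", "John Smith")

# groupdict() maps each group name to the text it captured.
names = match.groupdict()
print(names)
```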

Non-capturing groups

Using (?:) creates a group, but the group isn’t captured. This means you can use it as a group, but it won’t pollute your “group space”.

re.match(r'(\d+)(\+(\d+))?', '11+22').groups()

Out: ('11', '+22', '22')

re.match(r'(\d+)(?:\+(\d+))?', '11+22').groups()

Out: ('11', '22')

This example matches 11+22 or 11, but not 11+. This is because the + sign and the second term are grouped. On the other hand, the + sign isn’t captured.
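Because re.match only anchors at the start, it would still match the leading 11 of the string 11+. To test the three cases against the whole string, the sketch below uses fullmatch on a compiled pattern; this choice is an assumption made for illustration.

```python
import re

pattern = re.compile(r"(\d+)(?:\+(\d+))?")

# Both terms present: two captured groups, the "+" itself is not captured.
both = pattern.fullmatch("11+22").groups()

# Only the first term: the optional group did not take part, so it is None.
single = pattern.fullmatch("11").groups()

# A dangling "+" cannot be matched, because "+" and the second term
# are grouped together and only optional as a unit.
dangling = pattern.fullmatch("11+")

print(both, single, dangling)
```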

Escaping Special Characters

Special characters (like the character class brackets [ and ] below) are not matched literally:

match = re.search(r'[b]', 'a[b]c')
match.group()

Out: 'b'

By escaping the special characters, they can be matched literally:
match = re.search(r'\[b\]', 'a[b]c')
match.group()

Out: '[b]'

The re.escape() function can be used to do this for you:

re.escape('a[b]c')

Out: 'a\\[b\\]c'

match = re.search(re.escape('a[b]c'), 'a[b]c')
match.group()

Out: 'a[b]c'

The re.escape() function escapes all special characters, so it is useful if you are composing a regular expression based on user input:

username = 'A.C.' # suppose this came from the user
re.findall(r'Hi {}!'.format(username), 'Hi A.C.! Hi ABCD!')

Out: ['Hi A.C.!', 'Hi ABCD!']

re.findall(r'Hi {}!'.format(re.escape(username)), 'Hi A.C.! Hi ABCD!')

Out: ['Hi A.C.!']

Match an expression only in specific locations

Often you want to match an expression only in specific places (leaving them untouched in others, that is). Consider the following sentence:

An apple a day keeps the doctor away (I eat an apple everyday).

Here “apple” occurs twice, once outside and once inside the parentheses. Matching only the occurrence outside can be done with so-called backtracking control verbs, which are supported by the newer third-party regex module. The idea is:

forget_this | or this | and this as well | (but keep this)

With our apple example, this would be:

import regex as re
string = "An apple a day keeps the doctor away (I eat an apple everyday)."
rx = re.compile(r'''
    \([^()]*\) (*SKIP)(*FAIL)  # match anything in parentheses and "throw it away"
    |                          # or
    apple                      # match an apple
    ''', re.VERBOSE)
apples = rx.findall(string)
print(apples)
# only one

This matches “apple” only when it can be found outside of the parentheses.

Here’s how it works:

While looking from left to right, the regex engine consumes everything to the left; the (*SKIP) acts as an “always-true assertion”. Afterwards, it correctly fails on (*FAIL) and backtracks.

Now it gets to the point of (*SKIP) from right to left (i.e. while backtracking), where it is forbidden to go any further to the left. Instead, the engine is told to throw away anything to the left and jump to the point where the (*SKIP) was invoked.
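If installing the third-party regex module is not an option, a similar result can be approximated with the standard re module. This is a different technique from the control verbs above: match the unwanted parenthesized parts first, capture only the wanted alternative, and discard matches whose group is empty. The sketch assumes parentheses are not nested.

```python
import re

string = "An apple a day keeps the doctor away (I eat an apple everyday)."

# First alternative: a parenthesized span, matched but not captured.
# Second alternative: the word we want, captured in group 1.
rx = re.compile(r"\([^()]*\)|(apple)")

# Keep only the matches where group 1 actually took part.
apples = [m.group(1) for m in rx.finditer(string) if m.group(1)]
print(apples)
```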

Iterating over matches using `re.finditer`

You can use re.finditer to iterate over all matches in a string. Compared to re.findall, this gives you extra information, such as the match location in the string (indexes):

import re
text = 'You can try to find an ant in this string'
pattern = r'an?\w' # find 'a', an optional 'n', then one word character
for match in re.finditer(pattern, text):
    # Start index of match (integer)
    sStart = match.start()
    # Final index of match (integer)
    sEnd = match.end()
    # Complete match (string)
    sGroup = match.group()
    # Print match
    print('Match "{}" found at: [{},{}]'.format(sGroup, sStart, sEnd))

Result:
Match "an" found at: [5,7]
Match "an" found at: [20,22]
Match "ant" found at: [23,26]
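The start and end indexes are also available together through match.span(), and they can be used directly to slice the original string. A brief sketch using the same text and pattern:

```python
import re

text = "You can try to find an ant in this string"

# span() returns the (start, end) pair for each match.
spans = [m.span() for m in re.finditer(r"an?\w", text)]

# Slicing the text with those indexes recovers the matched substrings.
slices = [text[start:end] for start, end in spans]

print(spans)
print(slices)
```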
