Regular Expressions in Python
Python’s re module provides powerful pattern matching, searching, and substitution.
Basics and flags
Compile with re.compile or use module functions; flags modify behavior.
import re
pattern = re.compile(r"abc", re.IGNORECASE)
bool(pattern.search("AbC")) # True
Common flags: re.IGNORECASE (case-insensitive matching), re.MULTILINE (^ and $ also match at the start and end of each line), re.DOTALL (. also matches newlines), re.VERBOSE (ignores whitespace in the pattern and allows # comments).
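To make the flag descriptions concrete, here is a minimal sketch of three of them in action. The date layout (YYYY-MM-DD) and the sample strings are assumptions chosen for illustration only.

```python
import re

# re.VERBOSE lets the pattern span several lines and carry comments.
# Assumed layout for this sketch: YYYY-MM-DD.
date_re = re.compile(
    r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
    """,
    re.VERBOSE,
)
date_parts = date_re.search("released 2024-05-17").groups()

# re.MULTILINE makes ^ match at the start of every line, not only the string.
lines = "alpha\nbeta\ngamma"
first_letters = re.findall(r"^\w", lines, re.MULTILINE)

# Without re.DOTALL, . stops at a newline; with it, . crosses newlines.
without_dotall = re.search(r"a.b", "a\nb") is not None
with_dotall = re.search(r"a.b", "a\nb", re.DOTALL) is not None
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.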
Searching and matching
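search finds the first match anywhere in the string; match succeeds only if the pattern matches at the start of the string; fullmatch requires the pattern to match the entire string. All three return a Match object or None.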
re.search(r"\d+", "a1b2").group() # "1"
re.match(r"ab", "abc") is not None # True
re.fullmatch(r"\w+", "name") # Match or None
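The difference between the three functions is easiest to see when they are applied to the same input. The sketch below does that; is_identifier is a hypothetical helper introduced only for this example, and its identifier rule is an assumption.

```python
import re

text = "id42x"

# search: the digits may sit anywhere in the string.
found_anywhere = re.search(r"\d+", text)

# match: the pattern must match at position 0; text starts with "i", so None.
found_at_start = re.match(r"\d+", text)

# fullmatch: the trailing "x" is left over, so the whole string does not match.
found_whole = re.fullmatch(r"[a-z]+\d+", text)


def is_identifier(s):
    """Hypothetical validator: word characters only, not starting with a digit."""
    return re.fullmatch(r"[A-Za-z_]\w*", s) is not None
```

For validating a complete value, fullmatch is usually the safer choice because a partial match cannot pass by accident.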
Groups and named groups
Capture subpatterns for extraction.
m = re.search(r"([A-Z])(\d+)", "A12")
m.group(0), m.group(1), m.group(2) # ("A12", "A", "12"); group 0 is the whole match
m = re.search(r"(?P<let>[A-Z])(?P<num>\d+)", "B99")
m.group("let"), m.group("num") # ("B", "99")
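Named groups pay off when several fields are extracted at once. The following sketch assumes a made-up log format ("LEVEL code: message") purely for illustration, and shows groupdict, groups, and span on the resulting Match object.

```python
import re

# Hypothetical log format for this sketch: "<LEVEL> <code>: <message>"
log_re = re.compile(r"(?P<level>[A-Z]+) (?P<code>\d+): (?P<msg>.*)")

m = log_re.match("ERROR 404: page not found")

# groupdict() maps each group name to its captured text.
fields = m.groupdict() if m else {}

# groups() returns all captures as a tuple, in pattern order.
all_groups = m.groups() if m else ()

# span(name) gives the (start, end) offsets of one group in the input.
code_span = m.span("code") if m else None
```

Checking the Match object for None before calling its methods avoids an AttributeError when the input does not fit the pattern.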
Find all and iterating
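findall returns a list of all non-overlapping matches as strings (or as tuples of groups when the pattern contains more than one capturing group); finditer returns an iterator of Match objects.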
re.findall(r"\d+", "a1 b22") # ["1","22"]
[m.group() for m in re.finditer(r"\d+", "a1 b22")] # ["1", "22"]
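What findall returns depends on how many capturing groups the pattern has, which is a common source of surprise. A small sketch, using an assumed "key=value" sample string:

```python
import re

text = "x=1, y=22"

# No groups: each item is the full match.
whole = re.findall(r"\w=\d+", text)

# One group: each item is that group's text only.
one_group = re.findall(r"\w=(\d+)", text)

# Several groups: each item is a tuple of the groups.
two_groups = re.findall(r"(\w)=(\d+)", text)

# finditer keeps the Match objects, so positions remain available.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
```

When both the full match and its location are needed, finditer is generally the more convenient of the two.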
Substitution and splitting
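sub replaces each match with a string or with the return value of a function that receives the Match object; split breaks a string at every match of the pattern.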
re.sub(r"\s+", " ", "a b\t c") # "a b c"
re.sub(r"\d+", lambda m: str(int(m.group())*2), "a1 b2") # "a2 b4"
re.split(r"[;,]\s*", "a,b; c") # ["a", "b", "c"]
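Replacement strings can also refer back to captured groups, and both sub and split accept a limit. The sketch below illustrates these options; the sample names and date layout are assumptions for the example.

```python
import re

# Numbered backreferences: turn "last, first" into "first last".
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada")

# Named groups are referenced in the replacement with \g<name>.
iso = re.sub(
    r"(?P<d>\d{2})/(?P<m>\d{2})/(?P<y>\d{4})",
    r"\g<y>-\g<m>-\g<d>",
    "17/05/2024",
)

# count limits how many matches are replaced.
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)

# maxsplit limits the number of splits.
head_tail = re.split(r",\s*", "a, b, c", maxsplit=1)

# A capturing group in the split pattern keeps the separators in the result.
with_seps = re.split(r"([;,])", "a,b;c")
```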
Lookaround and non-greedy
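Lookahead and lookbehind assertions test what follows or precedes a position without consuming characters. Lazy (non-greedy) quantifiers such as +? and *? match as little as possible.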
re.search(r"(?<=\$)\d+", "$15").group() # "15"
re.search(r"<.+?>", "<a><b>").group() # "<a>"
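A side-by-side comparison may help here: the sketch below contrasts the greedy and lazy forms on the same input and adds a positive lookahead and a negative lookbehind. The sample strings are illustrative assumptions.

```python
import re

html = "<a><b>"

# Greedy: .+ grabs as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", html).group()

# Lazy: .+? stops at the first ">" that lets the pattern succeed.
lazy = re.search(r"<.+?>", html).group()

# Positive lookahead: digits that are followed by "px"; "px" is not consumed.
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookbehind: words that are not directly preceded by "#".
plain_words = re.findall(r"(?<!#)\b\w+", "#tag word")
```

Note that Python's re module requires lookbehind patterns to have a fixed width.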
Escaping and safety
Escape user input before embedding into patterns.
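user_input = "a.b" # example value standing in for untrusted input
text = "see a.b here" # example text to search
needle = re.escape(user_input) # r"a\.b": the dot is now literal
re.search(fr"\b{needle}\b", text) is not None # True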
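The effect of escaping is clearest when the same input is tried with and without it. In this sketch, contains_literal is a hypothetical helper and the inputs are assumed examples. It also shows one caveat: \b only works as a delimiter when the input starts and ends with a word character, so lookarounds are used as a possible alternative.

```python
import re


def contains_literal(text, user_input):
    """Hypothetical helper: True if user_input occurs verbatim in text."""
    return re.search(re.escape(user_input), text) is not None


# Unescaped, "." is a wildcard and matches the "b" in "abc".
unescaped_hit = re.search("a.c", "abc") is not None

# Escaped, only a literal "a.c" matches.
escaped_miss = contains_literal("abc", "a.c")
escaped_hit = contains_literal("cost: a.c", "a.c")

# \b fails after "c++" because "+" and " " are both non-word characters.
needle = re.escape("c++")
sentence = "I like c++ a lot"
with_boundaries = re.search(rf"\b{needle}\b", sentence) is not None

# Lookarounds that forbid adjacent word characters handle this case.
with_lookarounds = re.search(rf"(?<!\w){needle}(?!\w)", sentence) is not None
```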
Performance tips
- Precompile frequently used patterns
- Keep patterns specific; avoid .* when possible
- Beware catastrophic backtracking; simplify nested quantifiers
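The three tips above can be sketched as follows. WORD_RE, count_words, and the sample inputs are hypothetical names and values for illustration. The nested-quantifier pattern is only run on a short string here, since on long non-matching input it can take exponential time.

```python
import re

# Tip 1: compile once at module level and reuse inside loops.
WORD_RE = re.compile(r"[A-Za-z]+")


def count_words(lines):
    """Count alphabetic words across lines using the precompiled pattern."""
    return sum(len(WORD_RE.findall(line)) for line in lines)


# Tip 2: a specific character class instead of a greedy .*
line = 'key="value" other="x"'
too_broad = re.search(r'key="(.*)"', line).group(1)    # runs to the last quote
specific = re.search(r'key="([^"]*)"', line).group(1)  # stops at the first quote

# Tip 3: nested quantifiers such as (a+)+ can backtrack exponentially
# on inputs like "aaaa...!"; the flat form a+ accepts the same strings.
risky = re.compile(r"(a+)+$")
safer = re.compile(r"a+$")
sample = "aaaa"  # kept short on purpose
same_result = bool(risky.search(sample)) == bool(safer.search(sample))
```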
Summary
- Use re for matching, extraction, substitution, and splitting
- Prefer compiled patterns for reuse; use flags for readability and power