Regular Expressions in Python

Python’s re module provides powerful pattern matching, searching, and substitution.

Basics and flags

Compile a pattern once with re.compile, or call the module-level functions (re.search, re.sub, and so on) directly; flags modify how the pattern matches.

import re
pattern = re.compile(r"abc", re.IGNORECASE)
bool(pattern.search("AbC"))  # True

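Common flags: re.IGNORECASE (case-insensitive matching), re.MULTILINE (^ and $ also match at the start and end of each line), re.DOTALL (. also matches newlines), re.VERBOSE (whitespace and # comments in the pattern are ignored, so it can be laid out readably).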
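
A minimal sketch of how these flags change matching behavior. The sample strings and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the very start of the string.
plain = re.findall(r"^\w+", text)                  # ["first"]
# With MULTILINE, ^ also matches right after each newline.
multi = re.findall(r"^\w+", text, re.MULTILINE)    # ["first", "second"]

# By default . stops at a newline; DOTALL lets it cross lines.
default_dot = re.search(r"first.*second", text)             # None
dotall_dot = re.search(r"first.*second", text, re.DOTALL)   # a Match

# VERBOSE permits layout and comments inside the pattern itself.
date_re = re.compile(
    r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
    """,
    re.VERBOSE,
)
date_parts = date_re.search("released 2024-01-15").groups()

print(plain, multi, default_dot is None, dotall_dot is not None, date_parts)
```

Flags can be combined with |, for example re.MULTILINE | re.IGNORECASE.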

Searching and matching

search finds the first match anywhere in the string; match only matches at the start of the string; fullmatch requires the entire string to match.

re.search(r"\d+", "a1b2").group()   # "1"
re.match(r"ab", "abc") is not None   # True
re.fullmatch(r"\w+", "name")         # Match or None
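
To make the difference concrete, the sketch below runs the same pattern through all three functions. The input strings are illustrative only.

```python
import re

pattern = r"\d+"
s = "abc123"

found_anywhere = re.search(pattern, s)     # scans forward until a match is found
at_start = re.match(pattern, s)            # only tries position 0, so this fails
whole_string = re.fullmatch(pattern, s)    # must consume the entire string
digits_only = re.fullmatch(pattern, "123")

print(found_anywhere.group())   # 123
print(at_start)                 # None
print(whole_string)             # None
print(digits_only.group())      # 123
```

All three return None on failure, so check the result before calling .group().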

Groups and named groups

Parentheses capture subpatterns for extraction; (?P<name>...) gives a group a name. Group 0 is always the whole match.

m = re.search(r"([A-Z])(\d+)", "A12")
m.group(0), m.group(1), m.group(2)   # ("A12", "A", "12")

m = re.search(r"(?P<let>[A-Z])(?P<num>\d+)", "B99")
m.group("let"), m.group("num")       # ("B", "99")
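
Beyond group(), a Match object can return all groups at once. A short sketch, using an invented input string and guarding against a failed search:

```python
import re

m = re.search(r"(?P<let>[A-Z])(?P<num>\d+)", "id: B99")

if m is not None:
    all_groups = m.groups()      # every captured group, as a tuple
    by_name = m.groupdict()      # named groups, as a dict
    position = m.span("num")     # (start, end) offsets of the "num" group
    print(all_groups, by_name, position)
```

Here all_groups is ("B", "99"), by_name is {"let": "B", "num": "99"}, and position is (5, 7).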

Find all and iterating

findall returns a list of matched strings (or a list of tuples when the pattern contains more than one group); finditer returns an iterator of Match objects.

re.findall(r"\d+", "a1 b22")           # ["1","22"]
[m.group() for m in re.finditer(r"\d+", "a1 b22")]  # ["1","22"]
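
The following sketch illustrates two details that are easy to miss: findall switches to tuples once the pattern has several groups, and finditer exposes positions that findall discards. The sample text is made up.

```python
import re

text = "x=1, y=22"

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r"(\w)=(\d+)", text)

# finditer keeps the Match objects, so offsets are available.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(pairs)       # [('x', '1'), ('y', '22')]
print(positions)   # [('1', 2), ('22', 7)]
```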

Substitution and splitting

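sub replaces each match with a string or with the return value of a function called on the Match; split divides a string wherever the pattern matches.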

re.sub(r"\s+", " ", "a   b\t c")         # "a b c"
re.sub(r"\d+", lambda m: str(int(m.group())*2), "a1 b2")  # "a2 b4"
re.split(r"[;,]\s*", "a,b; c")           # ["a","b","c"]
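
A few further options of sub and split, sketched with invented inputs: group references in the replacement string, the count and maxsplit limits, and subn.

```python
import re

# \g<name> in the replacement refers back to a named group.
swapped = re.sub(
    r"(?P<first>\w+) (?P<last>\w+)", r"\g<last>, \g<first>", "Ada Lovelace"
)

# count limits how many matches are replaced.
once = re.sub(r"\d", "#", "a1b2c3", count=1)

# subn also reports how many replacements were made.
masked, n = re.subn(r"\d", "#", "a1b2c3")

# maxsplit limits the number of splits.
head, rest = re.split(r",\s*", "a, b, c", maxsplit=1)

# A capturing group in the split pattern keeps the delimiters.
with_delims = re.split(r"([;,])", "a,b;c")

print(swapped)       # Lovelace, Ada
print(once)          # a#b2c3
print(masked, n)     # a#b#c# 3
print(head, rest)    # a b, c
print(with_delims)   # ['a', ',', 'b', ';', 'c']
```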

Lookaround and non-greedy

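Lookahead and lookbehind assert what surrounds a match without consuming it (in re, lookbehind patterns must be fixed-width). Adding ? to a quantifier makes it lazy, so it matches as little as possible.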

re.search(r"(?<=\$)\d+", "$15").group()  # "15"
re.search(r"<.+?>", "<a><b>").group()     # "<a>"
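
The sketch below adds the remaining lookaround forms and contrasts greedy with lazy matching. The strings are illustrative only.

```python
import re

# Positive lookahead: digits followed by "px", without consuming "px".
width = re.search(r"\d+(?=px)", "width: 120px").group()

# Negative lookahead: words starting with "foo" that do not continue with "bar".
no_bar = re.findall(r"foo(?!bar)\w*", "foobar foobaz food")

# Negative lookbehind: numbers not preceded by a dollar sign.
not_price = re.findall(r"(?<!\$)\b\d+", "$15 and 20")

# Greedy .+ runs to the last ">", lazy .+? stops at the first one.
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

print(width)       # 120
print(no_bar)      # ['foobaz', 'food']
print(not_price)   # ['20']
print(greedy)      # <a><b>
print(lazy)        # <a>
```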

Escaping and safety

Escape user input before embedding into patterns.

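user_input = "a.b"                  # example input; unescaped, "." would match any character
text = "see a.b here"
needle = re.escape(user_input)      # r"a\.b"
re.search(fr"\b{needle}\b", text)   # matches the literal "a.b"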
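
As a hedged illustration of why escaping matters, the sketch below compares an unescaped and an escaped query. It also shows a caveat with \b: a word boundary needs a word character on one side, so it fails when the input starts or ends with punctuation. The names query and haystack, and the lookaround alternative, are assumptions made for this example.

```python
import re

query = "1+1"               # hypothetical user-supplied text
haystack = "11 and 1+1"

# Unescaped, "+" is a quantifier, so the pattern matches "11".
unescaped = re.search(query, haystack).group()
# Escaped, the query matches only the literal text "1+1".
escaped = re.search(re.escape(query), haystack).group()

# \b caveat: there is no word boundary between "+" and a space.
needle = re.escape("c++")
with_b = re.search(fr"\b{needle}\b", "I like c++ a lot")
# One possible alternative: require no word character on either side.
with_lookaround = re.search(fr"(?<!\w){needle}(?!\w)", "I like c++ a lot")

print(unescaped)                 # 11
print(escaped)                   # 1+1
print(with_b)                    # None
print(with_lookaround.group())   # c++
```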

Performance tips

  • Precompile frequently used patterns
  • Keep patterns specific; avoid .* when possible
  • Beware catastrophic backtracking; simplify nested quantifiers
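
The tips above can be sketched as follows. The function count_words and the sample inputs are hypothetical, and the nested-quantifier pattern is only run on short strings here, since long near-miss inputs are exactly where it becomes slow.

```python
import re

# Compile once at module level and reuse inside loops.
WORD_RE = re.compile(r"[A-Za-z]+")

def count_words(lines):
    """Count alphabetic words across an iterable of strings."""
    return sum(len(WORD_RE.findall(line)) for line in lines)

total = count_words(["hello world", "a b c"])

# A specific character class avoids the overreach of a greedy .*
quoted = 'say "hi" and "bye"'
too_much = re.findall(r'".*"', quoted)
specific = re.findall(r'"[^"]*"', quoted)

# Nested quantifiers such as (a+)+ can backtrack exponentially on
# inputs like "aaaa...b"; the flat form a+ accepts the same strings.
risky = re.compile(r"^(a+)+$")
safe = re.compile(r"^a+$")
samples = ["a", "aaa", "aab", ""]
same_results = all(bool(risky.match(s)) == bool(safe.match(s)) for s in samples)

print(total)          # 5
print(too_much)       # ['"hi" and "bye"']
print(specific)       # ['"hi"', '"bye"']
print(same_results)   # True
```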

Summary

  • Use re for matching, extraction, substitution, and splitting
  • Prefer compiled patterns for reuse; use flags for readability and power