GistDev

Regular Expressions in Ruby

Powerful pattern matching with concise literals and rich APIs.

Basics and flags

Regex literals are written /.../, with optional flags after the closing slash. The %r{...} form avoids escaping when the pattern itself contains /.

/abc/i       # case-insensitive
/./m         # m: dot also matches newlines (Ruby calls this "multiline")
/\w+/x       # extended: allows whitespace/comments in pattern
%r{https?://[^\s]+}
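Here is a small runnable sketch of the three flags and the %r form. The constant names (CASELESS, DEFAULT, DOT_ALL, EXTENDED, URL) are only illustrative, and the phone-style digits and URL are made-up sample inputs.

```ruby
# Hypothetical constant names; each literal is compiled once when this file loads.
CASELESS = /abc/i
DEFAULT  = /a.b/              # no flags: . does not match "\n"
DOT_ALL  = /a.b/m             # m: . also matches "\n"
EXTENDED = /
  (\d{3})   # exchange
  -         # literal hyphen
  (\d{4})   # line number
/x
URL = %r{https?://[^\s]+}     # no need to escape the slashes

p CASELESS.match?("ABC")                # => true
p DEFAULT.match?("a\nb")                # => false
p DOT_ALL.match?("a\nb")                # => true
p EXTENDED.match("555-1234")[1]         # => "555"
p "see https://example.com now"[URL]    # => "https://example.com"
```

With /x, whitespace and # comments inside the pattern are ignored, so a literal space would need to be written as \s or "\ ".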

Inline flags scope a flag to part of a pattern: (?i:...), (?m:...), (?x:...). A minus sign turns a flag off inside the group, as in (?-i:...).
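This sketch of scoped inline flags uses made-up sample strings. Each flag applies only inside its group, and (?-i:...) switches case-insensitivity back off within a pattern that has /i.

```ruby
pattern = /ruby (?i:gems)/      # only "gems" ignores case

p pattern.match?("ruby GEMS")   # => true
p pattern.match?("RUBY gems")   # => false: "ruby" is outside the group

p(/x(?m:.)y/.match?("x\ny"))    # => true: dot-all applies inside the group only
p(/a(?-i:b)/i.match?("Ab"))     # => true: "a" ignores case, "b" does not
p(/a(?-i:b)/i.match?("AB"))     # => false
```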

Anchors and classes

Use anchors to pin a match to a position, and character classes or ranges to match one character from a set.

/\Astart/    # beginning of string
/end\z/      # end of string
/^line$/      # start/end of any line; ^ and $ are always line anchors in Ruby (no flag needed)
/[A-Za-z0-9_]/   # one ASCII letter, digit, or underscore (same as \w)
/\d{2,4}/        # two to four digits
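The difference between string and line anchors shows up as soon as the input contains newlines. The sample text below is hypothetical. Because ^ and $ match at every line boundary, \A...\z is the safer choice when validating an entire string.

```ruby
text = "first line\nstart here\nlast\n"

p text.match?(/\Astart/)   # => false: "start" is not at the very beginning
p text.match?(/^start/)    # => true: ^ matches after any newline
p text.match?(/last\z/)    # => false: a trailing "\n" follows "last"
p text.match?(/last\Z/)    # => true: \Z allows one final newline
p text.match?(/last$/)     # => true: $ matches before any newline
p "2024"[/\d{2,4}/]        # => "2024"
p "7"[/\d{2,4}/]           # => nil: fewer than two digits
```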

Matching and MatchData

=~ returns the index of the first match, or nil. match returns a MatchData object holding the captures, or nil when nothing matches.

"Hello" =~ /llo/         # => 2
m = /([A-Z])(\d+)/.match("A12")
m[0]  # => "A12" (whole)
m[1]  # => "A"
m[2]  # => "12"
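Because match returns nil on failure, guard before indexing. This short sketch uses an assumed input string and also shows a few other MatchData readers (pre_match, post_match, begin, captures).

```ruby
m = /([A-Z])(\d+)/.match("id: A12!")
if m
  p m.pre_match    # => "id: "
  p m.post_match   # => "!"
  p m.begin(0)     # => 4 (offset of the whole match)
  p m.captures     # => ["A", "12"]
end

p(/([A-Z])(\d+)/.match("no digits"))  # => nil
p("Hello" =~ /xyz/)                   # => nil
```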

Named captures improve readability.

m = /(?<letter>[A-Z])(?<num>\d+)/.match("B99")
m[:letter]  # => "B"
m[:num]     # => "99"
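Named groups come with two extra conveniences, sketched here with the same pattern as above: MatchData#names and #named_captures, and the rule that a regex literal (without interpolation) on the left of =~ assigns its named groups to local variables.

```ruby
m = /(?<letter>[A-Z])(?<num>\d+)/.match("B99")
p m.names                   # => ["letter", "num"]
p m.named_captures["num"]   # => "99"

# Regex literal on the LEFT of =~: named groups also become local variables.
if /(?<letter>[A-Z])(?<num>\d+)/ =~ "C7"
  p letter   # => "C"
  p num      # => "7"
end
```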

Globals and $~ (use sparingly)

After a match, $~ holds the last MatchData and $1, $2, ... hold its capture groups.

/([a-z]+)/ =~ "abc"
$1  # => "abc" (first capture group)
/([a-z])+/ =~ "abc"
$1  # => "c" (a repeated group keeps only its last iteration)

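Prefer an explicit MatchData from match for clarity: the globals are overwritten by the next match attempt, and a failed attempt resets them to nil.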
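This sketch, with assumed sample strings, shows why the globals are fragile. A later failed match clears $1, while a MatchData held in a variable keeps its captures. String#match also accepts a block that runs only when the match succeeds.

```ruby
/(\d+)/ =~ "order 42"
p $1                     # => "42"
"no numbers" =~ /(\d+)/
p $1                     # => nil: the failed match cleared $~

order = /(\d+)/.match("order 42")   # explicit MatchData
"no numbers" =~ /(\d+)/
p order[1]               # => "42": unaffected by later matches

# The block runs only on a match; this prints 42.
"order 42".match(/(\d+)/) { |md| puts md[1] }
```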

scan, sub, gsub, split

scan returns every match. sub replaces the first match and gsub replaces all of them; both can compute each replacement in a block. split breaks a string on a regex delimiter.

"a1 b22".scan(/\d+/)            # => ["1", "22"]
"color".sub(/or/, "our")       # => "colour"
"a1 b2".gsub(/\d+/) { |d| d.to_i * 2 }  # => "a2 b4"
"a,b; c".split(/[;,]\s*/)      # => ["a","b","c"]

Lookaround and non-greedy

Ruby supports lookahead ((?=...), (?!...)), lookbehind ((?<=...), (?<!...)), and lazy quantifiers (*?, +?, ??) that match as little as possible.

/(?<=\$)\d+/      # digits preceded by $
/\w+(?=:)/         # word before a colon
/<.+?>/            # minimal match between < and >
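The sketch below compares greedy and lazy quantifiers and shows the negative lookaround forms on hypothetical sample strings. In the last example, the \b anchors keep the engine from backtracking to a shorter word that happens not to be followed by a colon.

```ruby
html = "<b>bold</b>"
p html[/<.+>/]     # => "<b>bold</b>": greedy .+ runs to the last ">"
p html[/<.+?>/]    # => "<b>": lazy .+? stops at the first ">"

p "price: $42"[/(?<=\$)\d+/]   # => "42"
p "key: value"[/\w+(?=:)/]     # => "key"

# Negative forms reject a position instead of requiring something there.
p "a1 $2 b3".scan(/(?<!\$)\d/)              # => ["1", "3"]
p "key: value other:".scan(/\b\w+\b(?!:)/)  # => ["value"]
```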

Escaping and safety

Escape user input before interpolating it into a pattern; otherwise characters such as . + ? and ( are treated as regex syntax.

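user_input = "example.com"            # e.g. text typed by a user
needle = Regexp.escape(user_input)    # => "example\\.com"
pattern = /\b#{needle}\b/i            # matches "Example.com" but not "examplexcom"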
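This sketch, assuming a hypothetical user_input string, shows what escaping prevents and how Regexp.union escapes a list of literal strings. The last line illustrates a caveat of the \b...\b wrapper: \b needs a word character on one side, so it never matches right after a needle that ends in a symbol such as +.

```ruby
user_input = "1+1=2?"                       # hypothetical untrusted text

needle = Regexp.escape(user_input)
p needle                                    # => "1\\+1=2\\?"
p(/#{needle}/.match?("is 1+1=2? yes"))      # => true
p(/#{user_input}/.match?("is 1+1=2? yes"))  # => false: + and ? act as quantifiers

# Regexp.union escapes each string and joins them with |.
langs = Regexp.union("c++", "f#", "node.js")   # hypothetical list
p langs.match?("i write c++")                  # => true
p langs.match?("nodexjs")                      # => false: the dot is literal

# No \b between "+" and " ", so the word-boundary wrapper fails here.
p(/\b#{Regexp.escape("c++")}\b/.match?("c++ rocks"))  # => false
```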

Performance tips

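  • Hoist dynamically built patterns (interpolation, Regexp.new) into a constant such as REGEX = /.../ so they are not rebuilt on every call; plain literals are already compiled once
  • Use match? when you only need true/false; it skips building MatchData and setting $~
  • Prefer specific classes and anchors over .*, so non-matching input fails fast
  • Beware catastrophic backtracking: nested quantifiers such as (a+)+ can take exponential time on near-miss input; keep patterns simple, and on Ruby 3.2+ consider Regexp.timeout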
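This sketch applies the tips above to a hypothetical list of log levels and a date format. The interpolated pattern is built once into a constant. The date pattern is anchored with \A...\z and uses specific classes instead of .*. The helper (the name alert? is made up) uses match? because it only needs a boolean.

```ruby
LEVELS   = %w[error warn fatal].freeze       # hypothetical keyword list
LEVEL_RE = /\A#{Regexp.union(LEVELS)}:/      # interpolation runs once, at load time
DATE_RE  = /\A\d{4}-\d{2}-\d{2}\z/            # anchored and specific, no .*

def alert?(line)
  LEVEL_RE.match?(line)   # true/false only: no MatchData, $~ untouched
end

p alert?("error: disk full")    # => true
p alert?("info: all good")      # => false
p DATE_RE.match?("2024-05-01")  # => true
p DATE_RE.match?("2024-5-1")    # => false
```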

Summary

  • Use /.../ or %r{...}; flags: i case‑insensitive, m dot‑all, x extended
  • match, scan, sub/gsub, split cover most needs; favor named captures and escaping user input