
String Operations in Python

Strings are immutable Unicode sequences with rich manipulation APIs and formatting options.

Creating strings and basics

Use single, double, or triple quotes; triple quotes may span lines. Raw strings (the r prefix) leave backslash escapes uninterpreted.

s = "hello\nworld"
raw = r"path\to\file"
multiline = """Line 1
Line 2"""
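
A minimal sketch of what immutability and raw strings mean in practice. The variable names (mutated, capitalized, cooked) are illustrative only and do not come from the text above.

```python
# Strings cannot be modified in place; item assignment raises TypeError.
s = "hello"
try:
    s[0] = "H"
    mutated = True
except TypeError:
    mutated = False

# Build a new string instead of changing the old one.
capitalized = "H" + s[1:]   # "Hello"

# A raw string keeps each backslash as a literal character.
raw = r"path\to\file"       # 12 characters
cooked = "path\to\file"     # "\t" becomes a tab and "\f" a form feed: 10 characters
```

The length difference between raw and cooked is the quickest way to see that escapes were interpreted in one and not the other.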

Common methods

Each method returns a new string; the original is never modified.

"  hi ".strip()            # "hi"
"a,b,c".split(",")       # ["a","b","c"]
"test".replace("t", "T")  # "TesT"
"go" * 3                   # "gogogo"
"python".startswith("py")  # True
"py".upper(), "PY".lower(), "Py".swapcase()  # ("PY", "py", "pY")

Formatting

Prefer f‑strings; use format specifiers for alignment, precision, and number formatting.

name, n = "Ada", 42
f"{name} = {n}"            # f‑string
"{} = {}".format(name, n)  # str.format
"%s = %d" % (name, n)      # legacy

pi = 3.14159
f"pi ≈ {pi:.2f}"            # "pi ≈ 3.14"
f"{n:04d}"                  # "0042" (zero-padded to width 4)
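
The alignment and number-formatting specifiers mentioned above can be sketched as follows. The sample values (total, the 0.256 ratio) are made up for illustration.

```python
name, n = "Ada", 42
total = 1234567.891

left = f"{name:<6}|"        # "Ada   |"  left-align in width 6
right = f"{name:>6}|"       # "   Ada|"  right-align in width 6
center = f"{name:^7}|"      # "  Ada  |" center in width 7
grouped = f"{total:,.2f}"   # "1,234,567.89" thousands separator, 2 decimals
percent = f"{0.256:.1%}"    # "25.6%"
hexed = f"{n:#x}"           # "0x2a"
```

The same specifiers work with str.format, for example "{:>6}".format(name).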

Joining and building

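Join an iterable of strings with separator.join(...). When building a string from many pieces, collect them in a list and call "".join once, or write to an io.StringIO buffer; repeated + concatenation may copy the growing string on every step.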

"-".join(["a", "b"])     # "a-b"
from io import StringIO
buf = StringIO()
buf.write("a")
buf.write("b")
built = buf.getvalue()      # "ab"
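
The list-plus-join approach might look like the sketch below. The names parts, joined, and mixed are illustrative.

```python
# Collect the pieces first, then join once at the end.
parts = []
for i in range(5):
    parts.append(str(i))
joined = ",".join(parts)    # "0,1,2,3,4"

# join accepts only strings, so convert other types explicitly.
mixed = [1, 2.5, "x"]
mixed_joined = " ".join(str(item) for item in mixed)    # "1 2.5 x"
```

Passing non-string items to join directly raises TypeError, which is why the generator expression converts each item with str first.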

Searching

s = "hello world"
s.find("lo")       # 3 (index of the first match, or -1 if absent)
s.index("lo")      # 3 (raises ValueError if absent)
s.count("l")       # 3
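
A short sketch contrasting the search methods, using a made-up sample string (haystack). The in operator is the usual choice when only a yes/no answer is needed.

```python
haystack = "banana bread"

contains = "nan" in haystack        # True: membership test, no index needed
first = haystack.find("na")         # 2: first occurrence
last = haystack.rfind("na")         # 4: last occurrence
absent = haystack.find("xyz")       # -1: find signals a miss with -1

# index raises instead of returning -1, so a miss cannot be mistaken for a position.
try:
    haystack.index("xyz")
    found = True
except ValueError:
    found = False
```

Because -1 is also a valid index in Python (the last character), checking the result of find before slicing avoids a subtle bug.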

Encoding and bytes

Encode to bytes and decode back; specify encodings explicitly.

b = "café".encode("utf-8")
text = b.decode("utf-8")
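
To illustrate why the encoding should be stated explicitly, here is a small sketch. The variable names are illustrative, and latin-1 stands in for any mismatched codec.

```python
text = "café"
data = text.encode("utf-8")

char_count = len(text)      # 4 code points
byte_count = len(data)      # 5 bytes: "é" takes two bytes in UTF-8

# ASCII cannot represent "é"; the default errors="strict" raises.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# errors="replace" substitutes a placeholder instead of raising.
lossy = text.encode("ascii", errors="replace")    # b"caf?"

# Decoding with the wrong codec silently produces mojibake, not an error.
mojibake = data.decode("latin-1")                 # "cafÃ©"
```

The last line shows the failure mode that explicit encodings guard against: the wrong codec often succeeds and quietly corrupts the text.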

Normalization and graphemes

Equal‑looking strings may have different codepoint sequences. Use unicodedata.normalize for canonical forms. User‑perceived characters (graphemes) can span multiple codepoints.

import unicodedata as ud
ud.normalize("NFC", "e\u0301")  # "é"
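
The following sketch shows two equal-looking strings comparing unequal until they are normalized. The helper nfc_equal is hypothetical, introduced here only for illustration.

```python
import unicodedata as ud

composed = "\u00e9"       # "é" as a single code point
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

same_raw = composed == decomposed                         # False
same_nfc = ud.normalize("NFC", decomposed) == composed    # True
same_nfd = ud.normalize("NFD", composed) == decomposed    # True

# len counts code points, not user-perceived characters.
lengths = (len(composed), len(decomposed))                # (1, 2)


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return ud.normalize("NFC", a) == ud.normalize("NFC", b)
```

Normalizing both sides to the same form before comparing or using strings as dictionary keys avoids mismatches between visually identical inputs.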

Regex (preview)

See the Regular Expressions section; use the re module for complex searching and substitution.
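
As a brief preview, and only a sketch, the re module handles patterns that plain find and replace cannot express. The sample text and patterns are made up for illustration.

```python
import re

text = "Order 66 shipped 2024-05-17, order 7 pending"

# findall returns every non-overlapping match of the pattern.
numbers = re.findall(r"\d+", text)    # ["66", "2024", "05", "17", "7"]

# sub replaces each match; here an ISO-style date becomes a placeholder.
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", text)
# "Order 66 shipped <date>, order 7 pending"

# flags=re.IGNORECASE matches both "Order" and "order".
orders = re.findall(r"order (\d+)", text, flags=re.IGNORECASE)    # ["66", "7"]
```

Raw strings are used for the patterns so that backslashes reach the regex engine unchanged.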

Summary

  • Strings are immutable; use methods that return new strings
  • Prefer f‑strings; join efficiently; handle encodings and normalization explicitly