This article contains 30 hands-on exercises that build practical Python regex skills using Python’s re module — from quantifiers, character classes, and anchors to data validation, text extraction with re.search(), re.findall(), and re.finditer(), and substitution with re.sub(). Real-world tasks like date reformatting, URL parsing, and email extraction show where regex earns its place.

Each coding challenge includes a Practice Problem, Hint, Solution code, and detailed Explanation, ensuring you don’t just copy code, but genuinely practice and understand how and why it works.

All solutions have been fully tested on Python 3.
Read Python Regex: Python’s RE module for pattern matching with regular expressions.

What you’ll practice:

Basic Matching & Quantifiers: Character classes, sets, and repetition (*, +, ?, {m,n})
Word & Character Boundaries: Using ^, $, \b, and \B
Data Validation & Cleaning: Validating IDs, formatting, and standardizing data
Search & Extraction: Using re.search(), re.findall(), and re.finditer()
String Manipulation: Performing advanced replacements with re.sub()

Who is this for?
Beginner to intermediate Python developers with basic knowledge of Python strings who want practical experience with the re module.

Table of Contents (30 Exercises)

Exercise 1: Check Allowed Characters
Exercise 2: Match Zero or More
Exercise 3: Match One or More
Exercise 4: Match Optional Characters
Exercise 5: Match Exact Occurrences
Exercise 6: Match Range of Occurrences
Exercise 7: Find Underscore Joined Lowercase
Exercise 8: PascalCase Match
Exercise 9: Match Start and End
Exercise 10: Match Word at Start
Exercise 11: Match Word at End
Exercise 12: Find a Specific Letter
Exercise 13: Find Letter in Middle
Exercise 14: Match Adjacent Words
Exercise 15: Filter by Starting Letter
Exercise 16: Validate Alphanumeric ID
Exercise 17: Check Starting Number
Exercise 18: Number at End
Exercise 19: Clean IP Addresses
Exercise 20: Convert Date Format
Exercise 21: Extract 1-3 Digit Numbers
Exercise 22: Search Literal Strings
Exercise 23: Find Pattern Location
Exercise 24: Find All Substrings
Exercise 25: Iterate Matches
Exercise 26: Extract Date from URL
Exercise 27: Extract All Numbers
Exercise 28: Extract Email Addresses
Exercise 29: Swap Characters
Exercise 30: Replace Multiple Delimiters

Exercise 1: Check Allowed Characters

Problem Statement: Write a Python program to verify that a string contains only alphanumeric characters (a-z, A-Z, and 0-9).

Purpose: This exercise helps you practice using regular expressions to validate input strings. Checking for allowed characters is a foundational technique used in form validation, data sanitisation, and security-sensitive input handling.

Given Input: text = "Hello123"

Expected Output: Valid: contains only alphanumeric characters

▼ Solution & Explanation

import re

text = "Hello123"

if re.fullmatch(r"[a-zA-Z0-9]+", text):
    print("Valid: contains only alphanumeric characters")
else:
    print("Invalid: contains non-alphanumeric characters")

Explanation:

import re: Loads Python’s built-in regular expression module, which is required for all re functions.
[a-zA-Z0-9]: A character class that matches any single uppercase letter, lowercase letter, or digit.
+: A quantifier meaning one or more of the preceding character class, so the string must not be empty.
re.fullmatch(): Requires the pattern to match the entire string from start to finish. This is stricter than re.search(), which would match even a partial substring.
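
To make the difference between re.fullmatch() and re.search() concrete, here is a minimal sketch. The helper name is_alphanumeric is hypothetical and used only for illustration.

```python
import re

def is_alphanumeric(text):
    """Return True only if every character is an ASCII letter or digit."""
    return re.fullmatch(r"[a-zA-Z0-9]+", text) is not None

print(is_alphanumeric("Hello123"))   # True
print(is_alphanumeric("Hello 123"))  # False: the space is not allowed
print(is_alphanumeric(""))           # False: + needs at least one character

# re.search() only needs one matching run somewhere in the string,
# so on its own it cannot validate the whole input.
partial = re.search(r"[a-zA-Z0-9]+", "Hello 123")
print(partial.group())               # Hello
```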

Exercise 2: Match Zero or More

Problem Statement: Write a Python program to match a string that has an a followed by zero or more bs (e.g., a, ab, abb).

Purpose: This exercise introduces the * quantifier, one of the most commonly used tools in regular expressions. Understanding zero-or-more matching is essential for parsing optional repeated elements in text processing and pattern recognition.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

See: Python Regex Metacharacters and Operators

Expected Output:

a      -> Match
ab     -> Match
abb    -> Match
abbb   -> Match
b      -> No match
ba     -> No match

▼ Solution & Explanation

import re
pattern = r"ab*"
test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]
for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

ab*: Matches a literal a followed by zero or more occurrences of b. The * quantifier means the b is entirely optional but can repeat any number of times.
re.fullmatch(): Ensures the entire string is evaluated against the pattern. Without it, re.search() would match the a inside ba and produce a false positive.
f"{s:<6}": A format specifier that left-aligns the string in a field of width 6, making the output easier to read in a column.
Why b and ba fail: b has no leading a, and ba has the letters in the wrong order, so neither satisfies the ab* pattern.
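
The false positive mentioned above can be checked directly. This short sketch, using a few extra strings chosen for illustration, shows what re.search() finds compared with what re.fullmatch() accepts.

```python
import re

pattern = r"ab*"
strings = ["ba", "abb", "abbc"]

# re.search() reports the first place the pattern occurs, wherever that is.
search_hits = {s: re.search(pattern, s).group() for s in strings}
# re.fullmatch() succeeds only when the pattern accounts for the whole string.
full_hits = {s: re.fullmatch(pattern, s) is not None for s in strings}

for s in strings:
    print(f"{s:<5} search found {search_hits[s]!r:<6} fullmatch: {full_hits[s]}")
```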

Exercise 3: Match One or More

Problem Statement: Write a Python program to match a string that has an a followed by one or more bs (e.g., ab, abb, but not a).

Purpose: This exercise demonstrates the + quantifier, which enforces that at least one occurrence of a character must be present. It is a small but critical distinction from * and is widely used when a repeated element is required rather than optional.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

Expected Output:

a      -> No match
ab     -> Match
abb    -> Match
abbb   -> Match
b      -> No match
ba     -> No match

▼ Solution & Explanation

import re
pattern = r"ab+"
test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]
for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

ab+: Matches the letter a followed by one or more bs. The + quantifier requires at least one b to be present, unlike * which allows zero.
Why a now fails: The lone a matched in Exercise 2 because * allowed zero bs. Here, + demands at least one, so a alone is rejected.
re.fullmatch(): Continues to play an important role by preventing partial matches. Without it, re.search(r"ab+", "abXYZ") would incorrectly return a match.
Key distinction: * means zero or more; + means one or more. This single character difference changes whether the repeated element is optional or required.
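
As a quick side-by-side check of that distinction, the sketch below runs both quantifiers over the same small list (the list itself is just an illustrative subset of the test strings).

```python
import re

strings = ["a", "ab", "abbb"]

star = [s for s in strings if re.fullmatch(r"ab*", s)]  # b is optional
plus = [s for s in strings if re.fullmatch(r"ab+", s)]  # b is required

print(star)  # ['a', 'ab', 'abbb']
print(plus)  # ['ab', 'abbb']
```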

Exercise 4: Match Optional Characters

Problem Statement: Write a Python program to match a string that has an a followed by zero or one b (i.e., exactly a or ab, nothing else).

Purpose: This exercise introduces the ? quantifier, which marks a character as optional but non-repeating. It is commonly used when parsing elements that may or may not appear, such as an optional sign in a number, an optional prefix, or an optional suffix in a word.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

Expected Output:

a      -> Match
ab     -> Match
abb    -> No match
abbb   -> No match
b      -> No match
ba     -> No match

▼ Solution & Explanation

import re
pattern = r"ab?"
test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]
for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

ab?: Matches the letter a followed by an optional single b. The ? quantifier means zero or one occurrence, so only a and ab are valid.
Why abb fails: The ? quantifier allows at most one b. Two or more bs exceed the limit, so abb and abbb do not match when using re.fullmatch().
Comparison with * and +: All three quantifiers are closely related. ? is 0-1, * is 0 to infinity, and + is 1 to infinity. Choosing the right one depends on how many repetitions are acceptable.
Common use case: The ? quantifier is frequently used in real-world patterns, for example https? matches both http and https in URL validation.
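
The https? use case can be sketched as follows. The URLs are made-up examples, and the pattern only checks the scheme prefix; it is not a complete URL validator.

```python
import re

urls = [
    "http://example.com",
    "https://example.com",
    "httpss://example.com",
    "ftp://example.com",
]

# s? makes the trailing s optional; :// is matched literally.
scheme = re.compile(r"https?://")

# Pattern.match() only tries to match at the start of each string.
accepted = [u for u in urls if scheme.match(u)]
print(accepted)  # ['http://example.com', 'https://example.com']
```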

Exercise 5: Match Exact Occurrences

Problem Statement: Write a Python program to match a string that has an a followed by exactly three bs (i.e., only abbb is a valid match).

Purpose: This exercise introduces curly-brace quantifiers, which allow you to specify an exact number of repetitions. Exact-count matching is useful in tasks such as validating fixed-length codes, parsing structured data fields, and enforcing strict formatting rules.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "abbbb", "b"]

Expected Output:

a      -> No match
ab     -> No match
abb    -> No match
abbb   -> Match
abbbb  -> No match
b      -> No match

▼ Solution & Explanation

import re
pattern = r"ab{3}"
test_strings = ["a", "ab", "abb", "abbb", "abbbb", "b"]
for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

b{3}: A curly-brace quantifier that matches the character b repeated exactly three times. It is equivalent to writing bbb explicitly, but is more readable and easier to adjust.
ab{3}: The {3} applies only to the immediately preceding element, which is b. The a is still a single literal character.
Why abbbb fails: re.fullmatch() requires the entire string to match the pattern. Four bs exceed the exact count of three, so the match fails.
Range variant: {m,n} matches between m and n repetitions inclusive. For example, b{2,4} would match bb, bbb, or bbbb. Omitting n, as in b{2,}, means two or more, which behaves like + with a higher minimum.
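
Exact counts are handy for fixed-length codes. The sketch below assumes a hypothetical code format of two uppercase letters followed by four digits; the format is invented for illustration and does not come from any real standard.

```python
import re

# Hypothetical format: exactly two uppercase letters, then exactly four digits.
code_pattern = re.compile(r"[A-Z]{2}[0-9]{4}")

codes = ["AB1234", "A1234", "AB12345", "ab1234"]
valid = [c for c in codes if code_pattern.fullmatch(c)]

print(valid)  # ['AB1234']
```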

Exercise 6: Match Range of Occurrences

Problem Statement: Write a Python program to match a string that has an a followed by two to three bs (i.e., abb or abbb).

Purpose: This exercise introduces the range form of curly-brace quantifiers, {m,n}, which lets you set a lower and upper bound on repetitions. Range quantifiers are useful when validating fields that must fall within a length window, such as short codes, postal abbreviations, or bounded identifiers.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "abbbb", "b"]

Expected Output:

a      -> No match
ab     -> No match
abb    -> Match
abbb   -> Match
abbbb  -> No match
b      -> No match

▼ Solution & Explanation

import re

pattern = r"ab{2,3}"
test_strings = ["a", "ab", "abb", "abbb", "abbbb", "b"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

b{2,3}: A range quantifier that matches b repeated at least twice and at most three times. Both bounds are inclusive, so two and three are both accepted.
Why ab fails: A single b falls below the minimum of two required by {2,3}, so ab does not satisfy the pattern.
Why abbbb fails: Four bs exceed the upper bound of three. Because re.fullmatch() requires the entire string to be consumed, the extra b causes the match to fail.
Relationship to other quantifiers: {2,3} is the bounded middle ground between exact matching ({3}) and open-ended matching ({2,}, which means two or more). Choosing the right form depends on how tightly you need to constrain the input.
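
To compare the curly-brace forms directly, this sketch runs four variants over one list of strings. The {,2} form (no lower bound, so zero to two) is included for completeness.

```python
import re

strings = ["ab", "abb", "abbb", "abbbb", "abbbbb"]
patterns = [r"ab{3}", r"ab{2,3}", r"ab{2,}", r"ab{,2}"]

# For each pattern, keep the strings it matches in full.
accepted = {p: [s for s in strings if re.fullmatch(p, s)] for p in patterns}

for p in patterns:
    print(f"{p:<8} -> {accepted[p]}")
```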

Exercise 7: Find Underscore Joined Lowercase

Problem Statement: Write a Python program to find sequences of lowercase letters joined with an underscore (e.g., hello_world).

Purpose: This exercise practises matching multi-part patterns that involve a separator character between word segments. Recognising underscore-joined identifiers is directly applicable to parsing Python variable names, snake_case tokens in configuration files, and structured log fields.

Given Input: test_strings = ["hello_world", "foo_bar", "hello", "hello_", "_world", "Hello_world", "hello_World"]

Expected Output:

hello_world  -> Match
foo_bar      -> Match
hello        -> No match
hello_       -> No match
_world       -> No match
Hello_world  -> No match
hello_World  -> No match

▼ Solution & Explanation

import re

pattern = r"[a-z]+_[a-z]+"
test_strings = ["hello_world", "foo_bar", "hello", "hello_", "_world", "Hello_world", "hello_World"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<12} -> {'Match' if result else 'No match'}")

Explanation:

[a-z]+: A character class that matches one or more lowercase ASCII letters. The + ensures at least one letter must appear on each side of the underscore.
_: A literal underscore acting as the required separator between the two lowercase word segments.
Why hello_ and _world fail: The pattern requires at least one lowercase letter both before and after the underscore. A trailing or leading underscore with nothing on the other side leaves one side unsatisfied.
Why Hello_world and hello_World fail: The character class [a-z] matches only lowercase letters. An uppercase letter anywhere in the string causes re.fullmatch() to return no match.
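
The same idea can be used to pull identifiers out of running text. The sketch below is one possible extension, not part of the original solution: it adds \b on both sides and a repeated non-capturing group so that names with more than one underscore are also found. The sample sentence is invented.

```python
import re

text = "set max_retry_count and log_level, but not MaxRetry or _private"

# Exactly two lowercase segments joined by a single underscore.
two_part = re.findall(r"\b[a-z]+_[a-z]+\b", text)
# Two or more lowercase segments: (?:_[a-z]+)+ repeats the "_segment" part.
multi = re.findall(r"\b[a-z]+(?:_[a-z]+)+\b", text)

print(two_part)  # ['log_level']
print(multi)     # ['max_retry_count', 'log_level']
```

Because the underscore counts as a word character, \b does not fire between retry and _count, which is why the two-segment pattern skips max_retry_count entirely instead of returning a fragment of it.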

Exercise 8: PascalCase Match

Problem Statement: Write a Python program to find sequences of one uppercase letter followed by lowercase letters (e.g., Hello, World, Python).

Purpose: This exercise practises combining character classes to enforce a strict positional rule: one thing here, something else there. Matching capitalised (title-case) words, the building blocks of PascalCase identifiers, is a common requirement when parsing names, class identifiers, or capitalised tokens in natural language processing.

Given Input: test_strings = ["Hello", "World", "python", "HELLO", "Hello123", "H", "Ha"]

Expected Output:

Hello    -> Match
World    -> Match
python   -> No match
HELLO    -> No match
Hello123 -> No match
H        -> No match
Ha       -> Match

▼ Solution & Explanation

import re
pattern = r"[A-Z][a-z]+"
test_strings = ["Hello", "World", "python", "HELLO", "Hello123", "H", "Ha"]
for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<8} -> {'Match' if result else 'No match'}")

Explanation:

[A-Z]: Matches exactly one uppercase ASCII letter. No quantifier is attached, so it cannot match zero or two uppercase letters.
[a-z]+: Matches one or more lowercase letters immediately after the uppercase one. The + ensures the string cannot end at just the capital letter.
Why H fails: The + on [a-z] requires at least one lowercase letter to follow the capital. A lone uppercase letter does not satisfy the pattern.
Why HELLO fails: After matching the first H with [A-Z], the pattern expects lowercase letters. The remaining characters ELLO are uppercase, so [a-z]+ finds nothing to match and the overall match fails.
Why Hello123 fails: re.fullmatch() requires the entire string to be consumed. After matching Hello, the digits 123 remain unmatched, causing the full match to fail.
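
To find such words inside a sentence rather than validate a whole string, the pattern can be wrapped in word boundaries and passed to re.findall(). This is a sketch on an invented sentence.

```python
import re

text = "Alice met Bob at NASA headquarters in Houston"

# \b on both sides keeps the match to whole words, so NASA is skipped
# instead of contributing a partial match.
capitalised = re.findall(r"\b[A-Z][a-z]+\b", text)

print(capitalised)  # ['Alice', 'Bob', 'Houston']
```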

Exercise 9: Match Start and End

Problem Statement: Write a Python program to match a string that starts with a, ends with b, and has any characters in between (e.g., a123b, axyzb).

Purpose: This exercise introduces the dot . wildcard and combines it with re.fullmatch(), which pins the pattern to both ends of the string much as the anchors ^ and $ would. Matching by a known start and end while allowing arbitrary content in between is a practical technique used in file extension checks, protocol parsing, and delimiter-bounded field extraction.

Given Input: test_strings = ["a123b", "axyzb", "ab", "a b", "ab ", "b123a", "a123"]

Refer: Python regex re.match() for pattern matching

Expected Output (the second "ab" row is the input "ab " with a trailing space, which the padded column hides):

a123b  -> Match
axyzb  -> Match
ab     -> Match
a b    -> Match
ab     -> No match
b123a  -> No match
a123   -> No match

▼ Solution & Explanation

import re

pattern = r"a.*b"
test_strings = ["a123b", "axyzb", "ab", "a b", "ab ", "b123a", "a123"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

.: The dot wildcard matches any single character except a newline. It does not represent a literal period; to match a literal dot you would need to escape it with a backslash (\.).
.*: Combines the dot with the * quantifier to match zero or more of any character. This allows the middle section of the string to be empty, one character, or arbitrarily long.
Why ab matches: The .* portion matches zero characters, so a immediately followed by b satisfies the pattern.
Why ab (trailing space) fails: re.fullmatch() requires the entire string to be consumed. The trailing space is not part of the pattern, so the match fails. This highlights how re.fullmatch() is stricter than re.search() or re.match() for boundary checking.
Greedy behaviour: By default .* is greedy and will match as many characters as possible while still allowing the overall pattern to succeed. In this pattern it consumes everything up to the last b in the string.
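
Greedy matching is easiest to see with re.search() on a string that contains more than one b. The sketch below also shows the lazy form .*?, which is not used in the solution above and is included only for contrast.

```python
import re

text = "a1b2b3"

greedy = re.search(r"a.*b", text).group()   # runs to the last b
lazy = re.search(r"a.*?b", text).group()    # stops at the first b

print(greedy)  # a1b2b
print(lazy)    # a1b

# With re.fullmatch() the lazy form must still stretch to cover the whole string.
print(re.fullmatch(r"a.*?b", "a1b2b").group())  # a1b2b
```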

Exercise 10: Match Word at Start

Problem Statement: Write a Python program to match a specific word only if it appears at the very beginning of a string.

Purpose: This exercise introduces the caret anchor ^, which asserts that a match must occur at the start of the string. Start-of-string anchoring is essential in command parsing, log processing, and any situation where the position of a token in a line carries meaning.

Given Input: test_strings = ["Hello world", "Hello", "Say Hello", "hello world", "HelloWorld"]

Refer: Python regex search

Expected Output:

Hello world  -> Match
Hello        -> Match
Say Hello    -> No match
hello world  -> No match
HelloWorld   -> No match

▼ Solution & Explanation

import re
pattern = r"^Hello\b"
test_strings = ["Hello world", "Hello", "Say Hello", "hello world", "HelloWorld"]
for s in test_strings:
    result = re.search(pattern, s)
    print(f"{s:<12} -> {'Match' if result else 'No match'}")

Explanation:

^: The start-of-string anchor. It does not consume any characters; it simply asserts that the next part of the pattern must begin at position zero of the string.
Hello: A literal sequence of five characters. The match is case-sensitive by default, so hello (all lowercase) does not satisfy this part of the pattern.
\b: A word boundary assertion. It matches the position between a word character and a non-word character, ensuring that Hello is treated as a complete word. Without it, HelloWorld would also match because Hello appears at the start.
Why re.search() is used here instead of re.fullmatch(): The goal is only to check the beginning of the string. The string may legitimately contain more content after the word, as in Hello world. Using re.fullmatch() would incorrectly reject those valid strings.
Why Say Hello fails: Although Hello appears in the string, it is not at the start. The ^ anchor fails at position zero because the string begins with S, not H.
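
When the input has several lines, ^ behaves differently depending on the re.MULTILINE flag. The sketch below assumes a small three-line string invented for illustration.

```python
import re

log = "Hello world\nSay Hello\nHello again"

# Without flags, ^ matches only at the very start of the string.
first_only = re.findall(r"^Hello\b", log)
# With re.MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^Hello\b", log, flags=re.MULTILINE)

print(len(first_only))  # 1
print(len(each_line))   # 2
```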

Exercise 11: Match Word at End

Problem Statement: Write a Python program to match a specific word only if it appears at the end of a string, ignoring any optional trailing punctuation.

Purpose: This exercise introduces the dollar anchor $ and combines it with an optional character class to handle real-world strings that may end with punctuation. End-of-string anchoring is commonly used in sentence parsing, command validation, and log line analysis where the final token carries meaning.

Given Input: test_strings = ["I love Python", "Python is great", "I love Python!", "python", "I love Python."]

Expected Output:

I love Python   -> Match
Python is great -> No match
I love Python!  -> Match
python          -> No match
I love Python.  -> Match

▼ Solution & Explanation

import re

pattern = r"\bPython[.,!?]?$"
test_strings = ["I love Python", "Python is great", "I love Python!", "python", "I love Python."]

for s in test_strings:
    result = re.search(pattern, s)
    print(f"{s:<15} -> {'Match' if result else 'No match'}")

Explanation:

\b: A word boundary assertion placed before Python to ensure the match begins at a word edge and does not accidentally match a longer token like CPython.
Python: A literal, case-sensitive sequence. The lowercase string python does not match because regular expressions are case-sensitive by default.
[.,!?]?: A character class covering common punctuation marks, made optional by ?. This allows the word to be followed by at most one punctuation character before the end of the string.
$: The end-of-string anchor. It asserts that nothing may follow the matched content, so Python is great fails because Python is not at the end.
Why re.search() is used: The target word is preceded by other content in most test strings. re.search() scans the entire string for a match at any position, while still respecting the $ anchor to enforce the end-of-string constraint.
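
One detail worth knowing about $: without flags it also matches just before a newline at the very end of the string. If that matters, \Z matches only at the true end. A minimal sketch, assuming a line that still carries its trailing newline:

```python
import re

line = "I love Python\n"  # e.g. a line read from a file, newline still attached

dollar = re.search(r"\bPython[.,!?]?$", line)
strict = re.search(r"\bPython[.,!?]?\Z", line)

print(bool(dollar))  # True: $ also matches just before a final newline
print(bool(strict))  # False: \Z matches only at the true end of the string
```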

Exercise 12: Find a Specific Letter

Problem Statement: Write a Python program to find all words in a string that contain the letter z.

Purpose: This exercise practises using re.findall() to extract multiple matches from a string in a single call. Scanning text for words that contain a specific character is a foundational technique in search tools, spell checkers, and vocabulary analysis.

Given Input: text = "The pizza was amazing but the fizz and buzz were too loud"

Expected Output: ['pizza', 'amazing', 'fizz', 'buzz']

Refer: Python regex find all matches

▼ Solution & Explanation

import re

text = "The pizza was amazing but the fizz and buzz were too loud"
pattern = r"\b\w*z\w*\b"

matches = re.findall(pattern, text)
print(matches)

Explanation:

\w*: Matches zero or more word characters on either side of the z. Using * rather than + ensures the pattern also captures words where z appears at the very start or end, such as fizz or buzz.
z: The literal character being searched for. Because the match is case-sensitive by default, uppercase Z would not be captured. To include both cases you could use [zZ] or pass the re.IGNORECASE flag.
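\b on both sides: Word boundary assertions state explicitly that the pattern matches complete words. In this particular pattern the greedy \w* on each side already stretches to the edges of the word, so the results would be the same without them, but the boundaries make the intent clear and keep the pattern safe if the \w* parts are later replaced with something narrower.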
re.findall(): Scans the entire string from left to right and returns a list of all non-overlapping matches. This is more concise than manually looping over words and checking each one with re.search().
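
The case-sensitivity point can be checked with a sentence that contains an uppercase Z. This sketch uses an invented sentence and the re.IGNORECASE flag mentioned above.

```python
import re

text = "Zebras zigzag past the Plaza at a lazy pace"
pattern = r"\b\w*z\w*\b"

case_sensitive = re.findall(pattern, text)
case_insensitive = re.findall(pattern, text, flags=re.IGNORECASE)

print(case_sensitive)    # ['zigzag', 'Plaza', 'lazy']
print(case_insensitive)  # ['Zebras', 'zigzag', 'Plaza', 'lazy']
```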

Exercise 13: Find Letter in Middle

Problem Statement: Write a Python program to find words that contain the letter z, but only if the word neither starts nor ends with z.

Purpose: This exercise builds on the previous one by adding positional constraints within a word. Requiring at least one character on both sides of a target letter is a practical technique used in linguistic pattern matching, morphological analysis, and filtering tokens by internal structure.

Given Input: text = "The pizza was amazing but the fizz and buzz were too loud"

Expected Output: ['pizza', 'amazing']

▼ Solution & Explanation

import re

text = "The pizza was amazing but the fizz and buzz were too loud"
# [^\Wz] means "any word character except z"
pattern = r"\b[^\Wz]\w*z\w*[^\Wz]\b"

matches = re.findall(pattern, text)
print(matches)

Explanation:

[^\Wz] at the start: A negated class that reads as "any word character except z" (not a non-word character and not z). It forces the first letter of the word to be something other than z.
\w*z\w*: Requires a z somewhere after the first letter, with any number of word characters around it.
[^\Wz] at the end: Forces the last letter of the word to be something other than z, so every z that matched must sit strictly inside the word.
Contrast with Exercise 12: Simply swapping \w* for \w+ is not enough. Because \w also matches z, \w+z\w+ accepts fizz by splitting it as fi + z + z. Excluding z from the first and last positions is what enforces the interior constraint.
Why fizz and buzz are excluded: Both words end in z. The final [^\Wz] cannot match that z, and \b prevents the match from stopping earlier in the word, so these words are filtered out.
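
Since \w matches z as well as every other word character, it is worth checking how two candidate patterns behave on the exercise text. The sketch below compares a pattern that only demands one character on each side of a z with one that bans z from the first and last positions.

```python
import re

text = "The pizza was amazing but the fizz and buzz were too loud"

# At least one word character on each side of some z.
# fizz still qualifies, split as fi + z + z.
loose = re.findall(r"\b\w+z\w+\b", text)

# First and last characters must be word characters other than z.
strict = re.findall(r"\b[^\Wz]\w*z\w*[^\Wz]\b", text)

print(loose)   # ['pizza', 'amazing', 'fizz', 'buzz']
print(strict)  # ['pizza', 'amazing']
```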

Exercise 14: Match Adjacent Words

Problem Statement: Write a Python program to match if two consecutive words in a sentence both start with the letter P.

Purpose: This exercise practises matching multi-token patterns separated by whitespace. Detecting adjacent words that share a property is useful in natural language processing tasks such as identifying repeated initials, alliterative phrases, and consecutive proper nouns.

Given Input: test_strings = ["Peter Parker is here", "Paul and Peter met", "Pretty Please", "Python Programming is fun", "No match here"]

Expected Output:

Peter Parker is here      -> Match: Peter Parker
Paul and Peter met        -> No match
Pretty Please             -> Match: Pretty Please
Python Programming is fun -> Match: Python Programming
No match here             -> No match

▼ Solution & Explanation

import re

pattern = r"P\w*\s+P\w*"
test_strings = [
    "Peter Parker is here",
    "Paul and Peter met",
    "Pretty Please",
    "Python Programming is fun",
    "No match here"
]

for s in test_strings:
    result = re.search(pattern, s)
    if result:
        print(f"{s:<25} -> Match: {result.group()}")
    else:
        print(f"{s:<25} -> No match")

Explanation:

P\w*: Matches any word that begins with an uppercase P, followed by zero or more word characters. Using * rather than + means the single letter P on its own would also be a valid match.
\s+: Matches one or more whitespace characters between the two words. Using + rather than a literal space handles edge cases such as multiple spaces or a tab character separating the words.
result.group(): Returns the exact substring that was matched. This makes the output more informative by showing precisely which consecutive pair was found.
Why Paul and Peter met fails: Although both Paul and Peter start with P, they are not consecutive. The word and sits between them, breaking the adjacency required by the pattern.
Case sensitivity note: The pattern only matches words starting with uppercase P. To also match lowercase p, you could use [Pp]\w* or pass the re.IGNORECASE flag to re.search().
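
One refinement to consider, offered as a sketch and not as part of the original solution: without a leading \b, the first P can be found at the end of a longer token. The sentence below is invented to show the effect, and re.findall() is used to list every pair.

```python
import re

text = "MP Peter met Peter Parker and Pepper Potts"

# No boundary: the P at the end of "MP" is accepted as the first word.
unanchored = re.findall(r"P\w*\s+P\w*", text)
# Leading \b: the first word must actually begin with P.
anchored = re.findall(r"\bP\w*\s+P\w*", text)

print(unanchored)  # ['P Peter', 'Peter Parker', 'Pepper Potts']
print(anchored)    # ['Peter Parker', 'Pepper Potts']
```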

Exercise 15: Filter by Starting Letter

Problem Statement: Write a Python program to find all words starting with either a or e in a given string.

Purpose: This exercise practises using a character class to match any one of several possible starting characters. Filtering words by their first letter is a common requirement in text analysis, vocabulary sorting, concordance building, and educational language tools.

Given Input: text = "an eagle soared above the endless empty arena every afternoon"

Expected Output: ['an', 'eagle', 'above', 'endless', 'empty', 'arena', 'every', 'afternoon']

▼ Solution & Explanation

import re

text = "an eagle soared above the endless empty arena every afternoon"
pattern = r"\b[ae]\w*"

matches = re.findall(pattern, text)
print(matches)

Explanation:

\b: A word boundary placed at the start of the pattern ensures that matching begins only at the edge of a word. Without it, the pattern could match e or a appearing in the interior of a longer word.
[ae]: A character class that matches either the lowercase letter a or the lowercase letter e. This is more concise than the alternation operator (a|e) for single-character options and integrates naturally with the rest of the pattern.
\w*: Matches zero or more word characters following the initial letter, capturing the full remainder of the word. Using * means single-letter words like a are also captured if they appear in the text.
re.findall(): Returns every non-overlapping match as a plain list of strings. Because the pattern contains no capturing groups, each element in the list is the full matched word rather than a tuple.
Extending the pattern: To match words starting with any vowel, expand the character class to [aeiou]. To make the match case-insensitive and include uppercase initials, either use [aeAE] or pass re.IGNORECASE as a flag to re.findall().
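
The two extensions just described can be sketched together: a wider character class for all vowels, with and without re.IGNORECASE. The sentence is invented for illustration.

```python
import re

text = "An eagle soared over Ohio until evening"
pattern = r"\b[aeiou]\w*"

lowercase_only = re.findall(pattern, text)
any_case = re.findall(pattern, text, flags=re.IGNORECASE)

print(lowercase_only)  # ['eagle', 'over', 'until', 'evening']
print(any_case)        # ['An', 'eagle', 'over', 'Ohio', 'until', 'evening']
```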

Exercise 16: Validate Alphanumeric ID

Problem Statement: Write a Python program to match a string that contains only uppercase letters, lowercase letters, numbers, and underscores, with no spaces or special characters allowed.

Purpose: This exercise practises building strict allowlist patterns for input validation. Alphanumeric-plus-underscore strings are the standard format for identifiers in most programming languages, database column names, and API keys. Being able to validate this format reliably is a foundational defensive-programming skill.

Given Input: test_strings = ["user_123", "User_Name", "invalid id", "bad-char!", "_leadingUnderscore", "ALL_CAPS_99"]

Expected Output:

user_123           -> Valid
User_Name          -> Valid
invalid id         -> Invalid
bad-char!          -> Invalid
_leadingUnderscore -> Valid
ALL_CAPS_99        -> Valid

▼ Solution & Explanation

import re

pattern = r"\w+"
test_strings = ["user_123", "User_Name", "invalid id", "bad-char!", "_leadingUnderscore", "ALL_CAPS_99"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<18} -> {'Valid' if result else 'Invalid'}")

Explanation:

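\w: A shorthand character class for word characters. With the re.ASCII flag it is exactly equivalent to [a-zA-Z0-9_]; without that flag, a Python 3 str pattern also lets \w match letters and digits from other scripts (for example ß or 数). It does not match spaces, hyphens, or punctuation. The test strings here are all ASCII, so both behaviours give the same results.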
\w+: Requires at least one word character. An empty string would not match, which is typically the correct behaviour for an identifier validator.
re.fullmatch(): Enforces that every character in the string belongs to \w. A single disallowed character anywhere in the string, such as the space in invalid id or the hyphen in bad-char!, causes the entire match to fail.
Why _leadingUnderscore is valid: The underscore is part of the \w character class, so a leading underscore is perfectly acceptable. This aligns with Python’s own identifier rules, where _name is a valid variable name.
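Alternative approach: You could write the explicit character class [a-zA-Z0-9_]+ instead of \w+. The two agree on ASCII input, but the explicit class (or \w+ combined with re.ASCII) also rejects non-ASCII letters and digits, which makes it the safer choice when an ID must be strictly ASCII. \w+ is shorter and works well when Unicode word characters are acceptable.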
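
The effect of re.ASCII on \w can be seen with a couple of non-ASCII identifiers. The candidate strings below are invented for illustration.

```python
import re

candidates = ["user_123", "αβγ_1", "数据_1", "bad-char!"]

# Default for str patterns in Python 3: \w is Unicode-aware.
unicode_ok = [s for s in candidates if re.fullmatch(r"\w+", s)]
# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_ok = [s for s in candidates if re.fullmatch(r"\w+", s, flags=re.ASCII)]

print(unicode_ok)  # ['user_123', 'αβγ_1', '数据_1']
print(ascii_ok)    # ['user_123']
```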

Exercise 17: Check Starting Number

Problem Statement: Write a Python program to verify if a string starts with a specific number.

Purpose: This exercise practises anchoring a numeric pattern at the start of a string. Detecting a specific leading number is useful in tasks such as validating version strings, parsing log lines that begin with a status code, and routing input based on a numeric prefix.

Given Input: test_strings = ["42 is the answer", "42", "The answer is 42", "420 wide", "142 steps"], target number: 42

Expected Output:

42 is the answer -> Match
42               -> Match
The answer is 42 -> No match
420 wide         -> No match
142 steps        -> No match

▼ Solution & Explanation

import re

target = "42"
pattern = rf"^{target}\b"
test_strings = ["42 is the answer", "42", "The answer is 42", "420 wide", "142 steps"]

for s in test_strings:
    result = re.search(pattern, s)
    print(f"{s:<16} -> {'Match' if result else 'No match'}")

Explanation:

rf"^{target}\b": An f-string prefixed with both r and f. The r prefix treats backslashes as raw characters (needed for \b), and the f prefix allows the {target} variable to be interpolated into the pattern at runtime. This makes the pattern reusable for any target number without editing the regex directly. If the target could contain regex metacharacters (for example the dot in 3.14), pass it through re.escape() before interpolating it.
^: Anchors the match to the very start of the string. Strings like The answer is 42 and 142 steps fail immediately because their first character is not 4, the first character of the target.
\b after the number: A word boundary that prevents the pattern from matching 42 at the start of 420. Without this, 420 wide would incorrectly be reported as a match because its first two characters are 42.
Why 142 steps fails: The ^ anchor requires the match to start at position zero. The string begins with 1, not 4, so the pattern fails before consuming any characters.
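
When the target is not a plain integer, interpolating it unescaped can change the meaning of the pattern. The sketch below wraps the target in re.escape(); the helper name starts_with_number is hypothetical.

```python
import re

def starts_with_number(text, target):
    """Check whether text begins with target as a complete token."""
    # re.escape() turns characters such as . into literals.
    pattern = rf"^{re.escape(target)}\b"
    return re.search(pattern, text) is not None

print(starts_with_number("3.14 is pi", "3.14"))  # True
print(starts_with_number("3514 items", "3.14"))  # False: the dot is literal

# Without escaping, the dot is a wildcard and matches the 5.
unescaped = re.search(r"^3.14\b", "3514 items") is not None
print(unescaped)  # True
```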

Exercise 18: Number at End

Problem Statement: Write a Python program to check if a string ends with a number.

Purpose: This exercise practises anchoring a numeric pattern at the end of a string using the $ anchor combined with \d. Detecting a trailing number is useful when processing filenames with version suffixes, log entries that end with a numeric code, or any structured string where a numeric tail carries meaning.

Given Input: test_strings = ["version 2", "file_backup_3", "hello", "order 99b", "track5", "2024"]

Expected Output:

version 2     -> Ends with a number
file_backup_3 -> Ends with a number
hello         -> Does not end with a number
order 99b     -> Does not end with a number
track5        -> Ends with a number
2024          -> Ends with a number

▼ Solution & Explanation

import re

pattern = r"\d+$"
test_strings = ["version 2", "file_backup_3", "hello", "order 99b", "track5", "2024"]

for s in test_strings:
    result = re.search(pattern, s)
    if result:
        print(f"{s:<13} -> Ends with a number")
    else:
        print(f"{s:<13} -> Does not end with a number")

Explanation:

\d: A shorthand character class that matches any single decimal digit. For ASCII input it is equivalent to [0-9]; in Python 3, \d on str patterns also matches other Unicode decimal digits unless the re.ASCII flag is set. It does not match letters, underscores, or any other character.
\d+: Matches one or more consecutive digits. Using + means the pattern captures the full trailing numeric run (e.g., 2024 rather than just the final 4), which is useful if you later want to extract the value via result.group().
$: Anchors the match so the digit sequence must appear at the very end of the string. Combined with re.search(), the engine scans the string for a digit run that terminates exactly at the last position. Note that $ also matches just before a single trailing newline; use \Z if the digits must be the very last characters.
Why order 99b fails: The string ends with the letter b, not a digit. Even though digits appear earlier in the string, the $ anchor requires the digit run to reach the end of the string.
Extracting the number: If you need the trailing number itself rather than just confirming its presence, replace the print statement with print(result.group()) to display the matched digit sequence.
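
The extraction idea from the last point can be sketched as a small function. This is an illustrative sketch only; trailing_number is a hypothetical helper name, and converting to int is an assumption about what the caller wants.

```python
import re

def trailing_number(s):
    """Return the trailing digit run of s as an int, or None (hypothetical helper)."""
    match = re.search(r"\d+$", s)
    return int(match.group()) if match else None

print(trailing_number("file_backup_3"))  # 3
print(trailing_number("2024"))           # 2024
print(trailing_number("order 99b"))      # None
```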

Exercise 19: Clean IP Addresses

Problem Statement: Write a Python program to remove leading zeros from each segment of an IP address (e.g., convert 192.168.001.001 to 192.168.1.1).

Purpose: This exercise introduces re.sub() with a callable replacement function, a powerful technique that goes beyond simple string substitution. Normalising IP address segments is a practical data-cleaning task encountered in network log processing, configuration file parsing, and input sanitisation pipelines.

Given Input: ip_addresses = ["192.168.001.001", "010.000.000.001", "255.255.255.000", "192.168.1.1"]

Expected Output:

192.168.001.001 -> 192.168.1.1
010.000.000.001 -> 10.0.0.1
255.255.255.000 -> 255.255.255.0
192.168.1.1     -> 192.168.1.1

Refer: Python re.sub() regex replace

▼ Solution & Explanation

import re

ip_addresses = ["192.168.001.001", "010.000.000.001", "255.255.255.000", "192.168.1.1"]

def remove_leading_zeros(ip):
    return re.sub(r"\d+", lambda m: str(int(m.group())), ip)

for ip in ip_addresses:
    cleaned = remove_leading_zeros(ip)
    print(f"{ip:<15} -> {cleaned}")

Explanation:

re.sub(pattern, repl, string): Finds every non-overlapping match of pattern in string and replaces each one with the value returned by repl. When repl is a callable rather than a plain string, it receives the match object as its argument and its return value is used as the replacement text.
\d+: Matches each run of one or more consecutive digits. Because the dot separators in the IP address are not digits, re.sub() naturally processes each of the four segments independently without needing to split the string manually.
lambda m: str(int(m.group())): The replacement function receives a match object m for each segment. m.group() returns the matched text (e.g., "001"), int() converts it to an integer (dropping leading zeros), and str() converts it back to a string for substitution.
Why 192.168.1.1 is unchanged: The segments 192, 168, 1, and 1 have no leading zeros, so converting them to int and back to str produces the same value. re.sub() handles already-clean inputs safely.
Alternative without a lambda: You could split on ".", apply str(int(seg)) to each part in a list comprehension, and rejoin with ".". The re.sub() approach is more concise and generalises better to less-regular input formats.
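
The split-and-join alternative mentioned above might look like the sketch below. It assumes every segment consists only of digits; remove_leading_zeros_split is a hypothetical name chosen to distinguish it from the regex version.

```python
import re

def remove_leading_zeros_split(ip):
    """Split on dots, normalise each segment, and rejoin (hypothetical helper)."""
    # int() raises ValueError if a segment is not purely numeric.
    return ".".join(str(int(seg)) for seg in ip.split("."))

def remove_leading_zeros_regex(ip):
    """The re.sub() version from the solution, repeated for comparison."""
    return re.sub(r"\d+", lambda m: str(int(m.group())), ip)

ip = "010.000.000.001"
print(remove_leading_zeros_split(ip))  # 10.0.0.1
print(remove_leading_zeros_regex(ip))  # 10.0.0.1
```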

Exercise 20: Convert Date Format

Problem Statement: Write a Python program to convert a date string from yyyy-mm-dd format to dd-mm-yyyy format.

Purpose: This exercise introduces capturing groups in re.sub(), one of the most practical regex techniques for restructuring text. Reformatting dates is a ubiquitous data-wrangling task in ETL pipelines, report generation, and any system that exchanges data between regions with different date conventions.

Given Input: dates = ["2024-01-15", "1999-12-31", "2000-07-04", "2024-11-05"]

Expected Output:

2024-01-15 -> 15-01-2024
1999-12-31 -> 31-12-1999
2000-07-04 -> 04-07-2000
2024-11-05 -> 05-11-2024

▼ Solution & Explanation

import re

dates = ["2024-01-15", "1999-12-31", "2000-07-04", "2024-11-05"]
pattern = r"(\d{4})-(\d{2})-(\d{2})"
replacement = r"\3-\2-\1"

for date in dates:
    converted = re.sub(pattern, replacement, date)
    print(f"{date} -> {converted}")

Explanation:

(\d{4}): The first capturing group. It matches exactly four consecutive digits and captures them as group 1, representing the year portion of the date.
(\d{2}): Used twice: once for the month (group 2) and once for the day (group 3). Each matches exactly two consecutive digits, corresponding to the zero-padded month and day values in the source format.
Backreferences in the replacement string: \1, \2, and \3 refer to the text captured by the first, second, and third groups respectively. Writing \3-\2-\1 reorders them to day-month-year without any manual string slicing.
Zero-padding is preserved: Because the groups capture the raw digit strings rather than converting them to integers, leading zeros in the month and day (e.g., 01, 07) are carried over unchanged into the output. This is the correct behaviour for date formatting.
Named groups as an alternative: For improved readability in complex patterns, you can use named groups: (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2}) and reference them in the replacement as \g<day>-\g<month>-\g<year>. Named groups make the intent of each captured segment self-documenting.
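
The named-group variant described in the last point can be sketched as follows. The function name to_dmy is hypothetical and used only for illustration.

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
replacement = r"\g<day>-\g<month>-\g<year>"

def to_dmy(date):
    """Reorder yyyy-mm-dd to dd-mm-yyyy using named groups (hypothetical helper)."""
    return re.sub(pattern, replacement, date)

print(to_dmy("2024-01-15"))  # 15-01-2024
print(to_dmy("not a date"))  # not a date (no match, so the input is returned unchanged)
```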

Exercise 21: Extract 1-3 Digit Numbers

Problem Statement: Write a Python program to search a string and extract all numbers that are between 1 and 3 digits long.

Purpose: This exercise practises combining numeric patterns with range quantifiers and word boundaries to extract only the numbers that satisfy a length constraint. Selective numeric extraction is commonly needed in data parsing tasks where very large numbers (such as timestamps or IDs) must be excluded from results that are intended to capture shorter codes, counts, or scores.

Given Input: text = "There are 3 cats, 12 dogs, 500 fish, 1000 birds, and 42 turtles in the sanctuary"

Expected Output: ['3', '12', '500', '42']

▼ Solution & Explanation

import re

text = "There are 3 cats, 12 dogs, 500 fish, 1000 birds, and 42 turtles in the sanctuary"
pattern = r"\b\d{1,3}\b"

matches = re.findall(pattern, text)
print(matches)

Explanation:

\d{1,3}: A range quantifier that matches a consecutive run of digits with a minimum length of one and a maximum of three. On its own, without boundaries, this is a greedy sub-pattern that will match within any larger digit sequence.
\b on both sides: Word boundaries are essential here. They assert that the digit run must be bordered by a non-word character (or the start or end of the string) on each side. This prevents 1000 from being partially matched as 100 and correctly excludes it from the results entirely.
Why 1000 is excluded: The number 1000 has four digits. The opening \b anchors the pattern at the start of the digit run, and \d{1,3} can match at most three of those digits. The closing \b then falls between two digit characters, which is not a word boundary, and backtracking to two digits or one digit hits the same problem. No later position inside 1000 satisfies the opening \b either, so the whole token is skipped.
re.findall(): Returns each match as a plain string rather than a match object. Because the pattern contains no capturing groups, the full matched text is returned for each hit, giving a clean list of number strings that can be converted to integers with list(map(int, matches)) if needed.
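
To see why the word boundaries matter, the short comparison below runs the same quantifier with and without \b on the exercise's input. The variable names are illustrative only.

```python
import re

text = "There are 3 cats, 12 dogs, 500 fish, 1000 birds, and 42 turtles in the sanctuary"

without_boundaries = re.findall(r"\d{1,3}", text)
with_boundaries = re.findall(r"\b\d{1,3}\b", text)

# Without \b, 1000 is split into "100" and "0".
print(without_boundaries)  # ['3', '12', '500', '100', '0', '42']
print(with_boundaries)     # ['3', '12', '500', '42']
```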

Exercise 22: Search Literal Strings

Problem Statement: Write a Python program to search for a set of specific literal strings within a larger text and report which ones are found and where.

Purpose: This exercise introduces the alternation operator |, which allows a single pattern to match any one of several fixed strings. Searching for multiple literals simultaneously is more efficient than running separate searches and is widely used in keyword filtering, content moderation, and log scanning.

Given Input: text = "The quick brown fox jumps over the lazy dog", target words: fox and dog

Expected Output:

Found "fox" at index 16-19
Found "dog" at index 40-43

▼ Solution & Explanation

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\b(fox|dog)\b"

for match in re.finditer(pattern, text):
    print(f'Found "{match.group()}" at index {match.start()}-{match.end()}')

Explanation:

fox|dog: The alternation operator | instructs the regex engine to attempt matching fox first and, if that fails at the current position, to attempt dog. The alternatives are evaluated left to right. Any number of alternatives can be chained with additional | characters.
Parentheses around the alternatives: The grouping (fox|dog) ensures the | operator applies only between fox and dog, not to any surrounding pattern elements. Without parentheses, a pattern like \bfox|dog\b would be parsed as (\bfox) or (dog\b), producing incorrect boundary behaviour.
re.finditer(): Returns an iterator of match objects rather than a list of strings. This gives access to positional metadata for each hit without storing all matches in memory at once, which matters for large texts.
match.start() and match.end(): Return the start index (inclusive) and end index (exclusive) of the matched substring within the original string. For fox in this text, start() returns 16 and end() returns 19, meaning the match spans characters at positions 16, 17, and 18.
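
When the keywords live in a list rather than in the pattern itself, the alternation can be assembled at runtime. The sketch below assumes a hypothetical keyword list named words and uses a non-capturing group, since only the full match is needed.

```python
import re

text = "The quick brown fox jumps over the lazy dog"
words = ["fox", "dog", "cat"]  # hypothetical keyword list

# Escape each word so metacharacters stay literal, then join with "|".
pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"

found = [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]
print(found)  # [('fox', 16, 19), ('dog', 40, 43)]
```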

Exercise 23: Find Pattern Location

Problem Statement: Write a Python program to find a literal string in a text and return its exact starting and ending index position.

Purpose: This exercise focuses on using re.search() to locate a single pattern and then extracting precise position information from the resulting match object. Knowing the exact span of a match is essential in text editors, syntax highlighters, and any tool that needs to annotate or replace a specific region of a string.

Given Input: text = "The quick brown fox jumps over the lazy dog", target: "brown fox"

Expected Output: Found "brown fox" at start=10, end=19

▼ Solution & Explanation

import re

text = "The quick brown fox jumps over the lazy dog"
target = "brown fox"
pattern = re.escape(target)

match = re.search(pattern, text)
if match:
    print(f'Found "{match.group()}" at start={match.start()}, end={match.end()}')
else:
    print(f'"{target}" not found in the text')

Explanation:

re.escape(target): Escapes any characters in the target string that have special meaning in a regular expression, such as ., *, +, or (. For the input "brown fox" this makes no visible difference, but it is the correct practice whenever the search term comes from user input or an external source where special characters cannot be guaranteed absent.
re.search(): Scans the entire string from left to right and returns the first match object it finds, or None if the pattern is not present. Unlike re.match(), it does not restrict the search to the start of the string.
match.start() and match.end(): Return the zero-based start and end positions of the matched substring. The end index is exclusive, following Python’s standard slice convention. For "brown fox", which begins at position 10, the end value is 19 because the substring occupies indices 10 through 18.
match.span() as an alternative: Calling match.span() returns the tuple (start, end) in a single call. This is convenient when you need to pass the position to another function or unpack it with start, end = match.span().
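
As a small illustration of match.span(), the sketch below unpacks the span and uses it to slice and annotate the original string. The bracket markers are an arbitrary choice for the example.

```python
import re

text = "The quick brown fox jumps over the lazy dog"
match = re.search(re.escape("brown fox"), text)

start, end = match.span()
print(start, end)       # 10 19
print(text[start:end])  # brown fox

# Rebuild the string with the matched region wrapped in brackets.
highlighted = text[:start] + "[" + text[start:end] + "]" + text[end:]
print(highlighted)      # The quick [brown fox] jumps over the lazy dog
```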

Exercise 24: Find All Substrings

Problem Statement: Write a Python program to find all occurrences of a specific substring within a string using re.findall().

Purpose: This exercise demonstrates how re.findall() handles repeated occurrences of the same substring and builds familiarity with its return behaviour. Counting and collecting all occurrences of a substring is a routine operation in text analysis, frequency counting, and search-and-highlight features.

Given Input: text = "cat and cattle and catfish and catch and tomcat", target: "cat"

Expected Output:

Occurrences of "cat": ['cat', 'cat', 'cat', 'cat', 'cat']
Total count: 5

▼ Solution & Explanation

import re

text = "cat and cattle and catfish and catch and tomcat"
target = "cat"
pattern = re.escape(target)

matches = re.findall(pattern, text)
print(f'Occurrences of "{target}": {matches}')
print(f"Total count: {len(matches)}")

Explanation:

re.escape(target): Wraps the target string to neutralise any regex metacharacters it might contain. For "cat" this has no effect, but it makes the code robust against targets like "c.t", which without escaping would be interpreted as a regex pattern rather than a literal string.
No word boundaries by design: This exercise deliberately omits \b to perform a raw substring search. All five occurrences of "cat" are captured regardless of whether they appear as standalone words (cat), prefixes (cattle, catfish, catch), or suffixes (tomcat). Compare this with Exercise 12, where \b was used to restrict matches to complete words.
re.findall() return value: When the pattern contains no capturing groups, re.findall() returns a list of the matched strings themselves. Every element is identical here because the pattern is a fixed literal, but the list length directly encodes the frequency of the substring.
len(matches): A straightforward way to obtain the occurrence count without a separate loop or counter variable. It is equivalent to text.count(target) for plain substring counting, but the regex approach scales to more complex patterns where str.count() is not applicable.
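
The effect of re.escape() is easier to see with a target that does contain a metacharacter. The sample text below is made up for this illustration.

```python
import re

text = "cat cot c.t cut"  # made-up sample text
target = "c.t"

unescaped = re.findall(target, text)           # "." acts as a wildcard
escaped = re.findall(re.escape(target), text)  # "." is matched literally

print(unescaped)  # ['cat', 'cot', 'c.t', 'cut']
print(escaped)    # ['c.t']
print(len(escaped) == text.count(target))  # True
```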

Exercise 25: Iterate Matches

Problem Statement: Write a Python program to find the occurrence and position of all matches of a substring within a string using re.finditer().

Purpose: This exercise demonstrates how re.finditer() differs from re.findall() by returning full match objects rather than plain strings. Having access to the position of every occurrence, alongside the matched text itself, is essential in tools that need to annotate, highlight, or replace matches at precise locations within a document.

Given Input: text = "cat and cattle and catfish and catch and tomcat", target: "cat"

Expected Output:

Match 1: "cat" found at position 0-3
Match 2: "cat" found at position 8-11
Match 3: "cat" found at position 19-22
Match 4: "cat" found at position 31-34
Match 5: "cat" found at position 44-47

Refer: Python regex capturing groups

▼ Solution & Explanation

import re

text = "cat and cattle and catfish and catch and tomcat"
target = "cat"
pattern = re.escape(target)

for i, match in enumerate(re.finditer(pattern, text), start=1):
    print(f'Match {i}: "{match.group()}" found at position {match.start()}-{match.end()}')

Explanation:

re.finditer(): Returns a lazy iterator of match objects rather than materialising all results into a list at once. This is more memory-efficient than re.findall() for large texts, because each match object is produced and processed one at a time.
enumerate(..., start=1): Wraps the iterator to produce (counter, match_object) pairs. The start=1 argument makes the counter begin at 1 instead of 0, which reads more naturally in human-facing output like Match 1, Match 2, and so on.
match.group(): Returns the exact text that was matched. In this exercise every match is the same string "cat", but for variable patterns this method would return different text for each hit.
Contrast with Exercise 24: Both exercises search the same text for the same target. Exercise 24 uses re.findall() to retrieve matched strings and a total count. This exercise uses re.finditer() to retrieve matched strings together with their exact positions. The two functions are complementary: use re.findall() when you only need the values, and re.finditer() when you also need positional information or want to process matches one at a time without building a full list in memory.
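
One quick way to sanity-check the positions reported by re.finditer() is to slice the original string with each span. A minimal sketch, with illustrative variable names:

```python
import re

text = "cat and cattle and catfish and catch and tomcat"
spans = [m.span() for m in re.finditer(re.escape("cat"), text)]

# Every (start, end) pair should slice back to the matched text.
all_consistent = all(text[start:end] == "cat" for start, end in spans)

print(len(spans))      # 5
print(spans[:2])       # [(0, 3), (8, 11)]
print(all_consistent)  # True
```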

Exercise 26: Extract Date from URL

Problem Statement: Write a Python program to extract the year, month, and day components from a URL string formatted as https://example.com/yyyy/mm/dd/article-slug.

Purpose: This exercise practises using multiple capturing groups to pull structured data out of a predictably formatted string. Extracting date segments from URLs is a common task in web scraping, content management systems, and analytics pipelines where publication dates are embedded in permalink structures.

Given Input: urls = ["https://example.com/2026/05/22/my-article", "https://news.site.org/2019/11/03/breaking-story", "https://blog.example.com/2023/07/30/summer-update"]

Expected Output:

URL: https://example.com/2026/05/22/my-article
  Year: 2026 | Month: 05 | Day: 22

URL: https://news.site.org/2019/11/03/breaking-story
  Year: 2019 | Month: 11 | Day: 03

URL: https://blog.example.com/2023/07/30/summer-update
  Year: 2023 | Month: 07 | Day: 30

▼ Solution & Explanation

import re

urls = [
    "https://example.com/2026/05/22/my-article",
    "https://news.site.org/2019/11/03/breaking-story",
    "https://blog.example.com/2023/07/30/summer-update"
]

pattern = r"/(\d{4})/(\d{2})/(\d{2})/"

for url in urls:
    match = re.search(pattern, url)
    if match:
        year, month, day = match.groups()
        print(f"URL: {url}")
        print(f"  Year: {year} | Month: {month} | Day: {day}\n")

Explanation:

Leading and trailing / in the pattern: The forward slashes outside the capturing groups anchor each date segment within the URL path structure. This prevents the pattern from accidentally matching a four-digit number that appears in a different part of the URL, such as a port number or a numeric slug.
(\d{4}), (\d{2}), (\d{2}): Three capturing groups that isolate the year, month, and day respectively. The fixed-width quantifiers mirror the expected format exactly, so a three-digit year or single-digit month would not match.
match.groups(): Returns all captured groups as a tuple in the order they appear in the pattern. Unpacking directly into year, month, day gives each value a meaningful name without needing to index into the tuple manually.
Zero-padding is preserved: Because the groups capture raw digit strings rather than converting to integers, leading zeros in the month and day (e.g., 05, 03) are retained in the output. This matches the source format and avoids an unintended change in representation.
Named groups as an alternative: The pattern could be rewritten as /(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/, after which values can be accessed by name with match.group("year") and so on. Named groups improve readability when a pattern has many components.
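
The named-group version pairs naturally with match.groupdict(). The sketch below goes one step further and builds a datetime.date, which is an assumption about how the values might be used; url_date is a hypothetical helper, and date() raises ValueError for impossible dates such as month 13.

```python
import re
from datetime import date

pattern = r"/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"

def url_date(url):
    """Return the date embedded in url, or None if absent (hypothetical helper)."""
    match = re.search(pattern, url)
    if not match:
        return None
    # groupdict() maps group names to captured strings; convert each to int.
    parts = {name: int(value) for name, value in match.groupdict().items()}
    return date(**parts)

print(url_date("https://example.com/2026/05/22/my-article"))  # 2026-05-22
print(url_date("https://example.com/about"))                  # None
```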

Exercise 27: Extract All Numbers

Problem Statement: Write a Python program to separate and extract all numeric values from a mixed string of text and digits.

Purpose: This exercise practises using re.findall() with a digit pattern to strip numbers out of unstructured mixed content. Extracting numeric values from prose is a frequent requirement in data entry parsing, invoice processing, scientific text mining, and any pipeline that ingests human-written content containing figures.

Given Input: text = "In 2024 there were 1200 participants across 3 events, with scores of 98.5, 76, and 100"

Expected Output: ['2024', '1200', '3', '98.5', '76', '100']

▼ Solution & Explanation

import re

text = "In 2024 there were 1200 participants across 3 events, with scores of 98.5, 76, and 100"
pattern = r"\d+\.?\d*"

matches = re.findall(pattern, text)
print(matches)

Explanation:

\d+: Matches one or more consecutive digit characters before any decimal point. This handles pure integers like 2024, 1200, 3, 76, and 100, and also anchors the start of a decimal number such as 98.5.
\.?: Matches a literal dot zero or one time. The backslash is necessary because an unescaped . in a regex pattern matches any character. The ? makes it optional so that integers without a fractional part are still matched.
\d*: Matches zero or more digits after the optional dot. Using * rather than + means a trailing dot with no digits (e.g., 98.) is still captured, with the fractional part being empty. To exclude such cases, use the alternation \d+\.\d+|\d+ instead, which consumes a dot only when at least one digit follows it.
Results are strings: re.findall() always returns strings, not numeric types. To work with the values arithmetically, convert them with float(n) or int(n) as appropriate: [float(n) for n in matches].
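
The difference between the two patterns shows up when a number is followed by a sentence-ending period. The sample text below is made up for this comparison.

```python
import re

text = "Scores: 98.5, 76, and 100."  # made-up sample ending in a period

loose = re.findall(r"\d+\.?\d*", text)
strict = re.findall(r"\d+\.\d+|\d+", text)

print(loose)   # ['98.5', '76', '100.']
print(strict)  # ['98.5', '76', '100']

# Convert the extracted strings to numbers for arithmetic.
values = [float(n) for n in strict]
print(values)  # [98.5, 76.0, 100.0]
```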

Exercise 28: Extract Email Addresses

Problem Statement: Write a Python program to extract all valid email addresses from a large block of unstructured text.

Purpose: This exercise practises constructing a multi-part pattern that mirrors the structural rules of a real-world format. Email extraction is one of the most common practical applications of regular expressions, appearing in contact harvesting tools, data cleaning pipelines, and communication platform integrations.

Refer: Regex Special Sequences and Character classes

Given Input:

text = """Please reach out to support@example.com for help.
You can also contact the team at admin.team@company.org or sales@shop.co.uk.
Invalid addresses like @nodomain and user@ should be ignored.
For billing queries write to billing_dept+invoices@finance.example.net."""

Expected Output: ['support@example.com', 'admin.team@company.org', 'sales@shop.co.uk', 'billing_dept+invoices@finance.example.net']

▼ Solution & Explanation

import re

text = """Please reach out to support@example.com for help.
You can also contact the team at admin.team@company.org or sales@shop.co.uk.
Invalid addresses like @nodomain and user@ should be ignored.
For billing queries write to billing_dept+invoices@finance.example.net."""

pattern = r"[\w.+\-]+@[\w\-]+(?:\.[\w\-]+)*\.[a-zA-Z]{2,}"

matches = re.findall(pattern, text)
print(matches)

Explanation:

[\w.+\-]+: Matches the local part of the email address. The character class includes word characters (\w, which covers letters, digits, and underscores), dots, plus signs, and hyphens. The + quantifier requires at least one character, which rejects entries like @nodomain that have nothing before the @.
@: A literal at-sign acting as the required separator between the local part and the domain. Its presence is mandatory, so bare strings without @ are never matched.
[\w\-]+: Matches the first label of the domain (e.g., example, company, shop). This rejects user@ because there is nothing after the @ to satisfy the + quantifier.
(?:\.[\w\-]+)*: A non-capturing group that matches zero or more additional dot-separated domain labels. The ?: prefix means the group exists purely so that the * quantifier applies to the whole \.[\w\-]+ sequence, without creating a capture that would change re.findall()'s return value. This handles multi-label domains such as finance.example.net in the final test address.
\.[a-zA-Z]{2,}: Matches the mandatory top-level domain: a literal dot followed by at least two letters. This enforces that the address ends with a recognisable TLD (e.g., .com, .org, .uk, .net) and rejects fragments that trail off without one.
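
To illustrate why the group is non-capturing, the sketch below runs the same pattern with and without ?: on a shortened, made-up sample. When a pattern contains a capturing group, re.findall() returns the group's text instead of the full match; re.finditer() with group(0) still recovers the full addresses.

```python
import re

text = "Contact support@example.com or sales@shop.co.uk."  # shortened sample

non_capturing = r"[\w.+\-]+@[\w\-]+(?:\.[\w\-]+)*\.[a-zA-Z]{2,}"
capturing = r"[\w.+\-]+@[\w\-]+(\.[\w\-]+)*\.[a-zA-Z]{2,}"

full = re.findall(non_capturing, text)
print(full)  # ['support@example.com', 'sales@shop.co.uk']

# With a capturing group, findall() yields the group contents, not the addresses.
groups_only = re.findall(capturing, text)
print(groups_only)

# group(0) is always the full match, regardless of groups in the pattern.
recovered = [m.group(0) for m in re.finditer(capturing, text)]
print(recovered)  # ['support@example.com', 'sales@shop.co.uk']
```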

Exercise 29: Swap Characters

Problem Statement: Write a Python program to replace every space with an underscore, and every underscore with a space, in a single pass over the string.

Purpose: This exercise introduces the technique of performing two simultaneous character substitutions without one replacement interfering with the other. True in-place swapping requires a strategy that distinguishes the original characters from those that have already been substituted, and is a practical problem in slug generation, identifier normalisation, and format conversion pipelines.

Given Input: test_strings = ["hello world", "hello_world", "the quick_brown fox_jumps", "no_change"]

Expected Output:

hello world                -> hello_world
hello_world                -> hello world
the quick_brown fox_jumps  -> the_quick brown_fox jumps
no_change                  -> no change

▼ Solution & Explanation

import re

test_strings = [
    "hello world",
    "hello_world",
    "the quick_brown fox_jumps",
    "no_change"
]

def swap(s):
    return re.sub(r"[ _]", lambda m: "_" if m.group() == " " else " ", s)

for s in test_strings:
    print(f"{s:<26} -> {swap(s)}")

Explanation:

Why two sequential re.sub() calls fail: If you first replace spaces with underscores and then replace underscores with spaces, the second call converts both the original underscores and the newly inserted ones back to spaces, producing a result with no underscores at all. The single-pass approach avoids this by deciding the replacement for each character before any substitutions have been written back into the string.
[ _]: A character class that matches either a single space or a single underscore. Each character is handled individually as the regex engine scans left to right, so mixed strings like the quick_brown fox_jumps are processed correctly in one pass.
Lambda as the replacement: The callable form of re.sub() receives a fresh match object for each character hit. The lambda inspects m.group() and returns the opposite character. Because the decision is made per-match before the string is modified, there is no risk of a substituted character being re-evaluated.
Alternative using a translation table: For plain character-for-character swaps without regex, Python’s str.translate(str.maketrans(" _", "_ ")) performs the same operation in a single pass and is slightly more efficient for this specific case. The regex approach is shown here because it scales to more complex conditional substitutions that str.translate() cannot handle.
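
The sequential-call pitfall and the translation-table alternative described above can be compared side by side. This is an illustrative sketch on one of the exercise's inputs.

```python
import re

s = "the quick_brown fox_jumps"

# Two sequential passes: the second call cannot tell original
# underscores from the ones the first call just inserted.
sequential = re.sub(r"_", " ", re.sub(r" ", "_", s))
print(sequential)  # the quick brown fox jumps

# Single pass with a translation table; no regex needed for 1:1 swaps.
table = str.maketrans(" _", "_ ")
translated = s.translate(table)
print(translated)  # the_quick brown_fox jumps
```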

Exercise 30: Replace Multiple Delimiters

Problem Statement: Write a Python program to replace all occurrences of spaces, commas, and dots in a string with a colon.

Purpose: This exercise demonstrates how a single re.sub() call with a character class can replace several different delimiters at once, removing the need for chained str.replace() calls. Normalising mixed delimiters into a single consistent separator is a standard data-cleaning step in CSV processing, configuration parsing, and token splitting.

Given Input: test_strings = ["one two three", "one,two,three", "one.two.three", "one, two. three", "no.delimiters,here today"]

Expected Output:

one two three              -> one:two:three
one,two,three              -> one:two:three
one.two.three              -> one:two:three
one, two. three            -> one::two::three
no.delimiters,here today   -> no:delimiters:here:today

▼ Solution & Explanation

import re

test_strings = [
    "one two three",
    "one,two,three",
    "one.two.three",
    "one, two. three",
    "no.delimiters,here today"
]

pattern = r"[ ,.]"

for s in test_strings:
    result = re.sub(pattern, ":", s)
    print(f"{s:<26} -> {result}")

Explanation:

[ ,.]: A character class that matches any one of three characters: a space, a comma, or a dot. Inside a character class, the dot loses its wildcard meaning and is treated as a literal period, so no backslash is needed. Each character in the class is an independent alternative; the regex engine replaces whichever one it encounters at each position.
Plain string replacement: The second argument to re.sub() is the string ":". Because every matched delimiter is replaced with the same fixed value, no lambda or backreference is needed, keeping the call simple and readable.
Why one, two. three produces double colons: The comma and the space are two separate characters, and each is an independent match. The comma is replaced with : and the space immediately after it is also replaced with :, producing ::. This is the correct and expected behaviour for a per-character replacement. If you want to collapse consecutive delimiters into a single colon, change the pattern to [ ,.]+ to match one or more delimiters as a group.
Advantage over chained str.replace(): Replacing three delimiters with str.replace() would require three separate calls: s.replace(" ", ":").replace(",", ":").replace(".", ":"). The re.sub() approach handles all three in a single pass over the string, which is more concise and easier to extend as the set of delimiters grows.
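
The collapsing variant mentioned above, [ ,.]+, can be compared with the per-character pattern on the mixed-delimiter input. The re.split() call at the end is an extra illustration of the same character class, not part of the original solution.

```python
import re

s = "one, two. three"

per_char = re.sub(r"[ ,.]", ":", s)    # each delimiter replaced separately
collapsed = re.sub(r"[ ,.]+", ":", s)  # runs of delimiters become one colon
tokens = re.split(r"[ ,.]+", s)        # same class used for splitting

print(per_char)   # one::two::three
print(collapsed)  # one:two:three
print(tokens)     # ['one', 'two', 'three']
```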
