Course 2 · Ch 9
Regular Expressions in PHP: preg_match, preg_replace
Pattern matching for text that's more flexible than the exact-match string functions from Fundamentals can handle

Chapter 7 of Fundamentals covered str_contains(), strpos(), and str_replace() — all built around exact, literal text. Regular expressions ("regex") describe patterns instead — "a sequence of digits," "an optional plus sign followed by numbers," "anything that looks like a UK postcode" — and PHP's preg_* functions test and act on those patterns.

preg_match — Does a Pattern Appear?

<?php
$text = "My phone number is 07911 123456.";

if (preg_match('/\d{5} \d{6}/', $text)) {
    echo "Found a UK-style phone number pattern.<br>";
}
?>

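The pattern is written between two / delimiters. preg_match() returns 1 if the pattern is found anywhere in the string and 0 if not, so the result can go straight into an if. This differs from strpos() in Chapter 7, where a valid result of 0 (a match at the very start of the string) is falsy and needed a strict comparison. preg_match() returns false only if an error occurs, such as a malformed pattern.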
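The three possible return values can be told apart with strict comparisons. The sketch below is illustrative only: describeMatch() is a hypothetical helper, not part of the chapter, and the @ operator is used here just to silence the warning that the deliberately broken pattern would otherwise raise.

```php
<?php
// Hypothetical helper: turns preg_match()'s result into a readable label.
function describeMatch(string $pattern, string $text): string
{
    // @ suppresses the warning PHP emits for an invalid pattern,
    // so the false return value can be inspected directly.
    $result = @preg_match($pattern, $text);

    if ($result === false) {
        return "error";
    }
    return $result === 1 ? "match" : "no match";
}

echo describeMatch('/\d{5} \d{6}/', "My phone number is 07911 123456."), PHP_EOL; // match
echo describeMatch('/\d{5} \d{6}/', "No digits here."), PHP_EOL;                  // no match
echo describeMatch('/(\d+/', "07911 123456"), PHP_EOL; // error (missing closing parenthesis)
```

In everyday code a plain if (preg_match(...)) is usually enough; the strict checks matter mostly when the pattern is built at runtime.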

Core Pattern Syntax

Pattern   Matches
\d        Any single digit (0-9)
\w        Any "word" character (letters, digits, underscore)
\s        Any whitespace character (space, tab, newline)
.         Any single character at all (except newline by default)
+         One or more of the preceding item
*         Zero or more of the preceding item
?         Zero or one of the preceding item (makes it optional)
{5}       Exactly 5 of the preceding item
[abc]     Any one of the characters listed inside the brackets
^ / $     Start / end of the string

Example: /^\d{3}-\d{4}$/
^         start of the string
\d{3}     three digits
-         literal hyphen
\d{4}     four digits
$         end of the string
Matches "123-4567" exactly: three digits, a hyphen, four digits, nothing more.
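To see what the ^ and $ anchors contribute, the same pattern can be tried with and without them. This is a small sketch; the sample strings and variable names are made up for illustration.

```php
<?php
$anchored   = '/^\d{3}-\d{4}$/';
$unanchored = '/\d{3}-\d{4}/';

$results = [];
foreach (["123-4567", "call 123-4567 now", "1234-5678"] as $sample) {
    $results[$sample] = [
        'anchored'   => preg_match($anchored, $sample),
        'unanchored' => preg_match($unanchored, $sample),
    ];
}

print_r($results);
// "123-4567"          anchored: 1, unanchored: 1
// "call 123-4567 now" anchored: 0, unanchored: 1
// "1234-5678"         anchored: 0, unanchored: 1 (it contains "234-5678")
```

Without anchors the pattern only has to appear somewhere inside the string, which is why anchors matter when validating a whole input rather than searching within it.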

Capturing Groups — Extracting Parts of a Match

<?php
$text = "Order #4521 was placed on 2026-06-20.";

preg_match('/Order #(\d+)/', $text, $matches);
echo $matches[0]; // "Order #4521": the whole match
echo $matches[1]; // "4521": just the part inside the parentheses

preg_match('/(\d{4})-(\d{2})-(\d{2})/', $text, $dateParts);
echo "Year: {$dateParts[1]}, Month: {$dateParts[2]}, Day: {$dateParts[3]}";
?>

Parentheses ( ) create a "capturing group". Beyond confirming that a match exists, preg_match() fills its third argument ($matches, passed by reference) with the whole match at index 0 and each group's captured text at index 1, 2, and so on.
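When the pattern does not match, $matches is left empty, so reading $matches[1] would trigger an undefined-index warning. One way to guard against that is to check the return value first. In this sketch, extractOrderNumber() is a hypothetical helper name, not something defined in the chapter.

```php
<?php
// Hypothetical helper: returns the order number as a string,
// or null when the text contains no "Order #<digits>" pattern.
function extractOrderNumber(string $text): ?string
{
    if (preg_match('/Order #(\d+)/', $text, $matches) === 1) {
        return $matches[1]; // the digits captured by the parentheses
    }
    return null;
}

var_dump(extractOrderNumber("Order #4521 was placed on 2026-06-20.")); // string(4) "4521"
var_dump(extractOrderNumber("No order mentioned here."));              // NULL
```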

preg_match_all — Every Match, Not Just the First

<?php
$text = "Contact: alice@example.com or bob@test.org";

preg_match_all('/[\w.]+@[\w.]+/', $text, $matches);
print_r($matches[0]); // ["alice@example.com", "bob@test.org"]
?>

preg_match() stops after the first match anywhere in the string; preg_match_all() finds every occurrence. It fills $matches with them (all whole matches are in $matches[0]) and returns the number of matches found.
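With capturing groups in the pattern, preg_match_all() groups its results by index under the default ordering: $matches[0] holds every whole match, $matches[1] every first-group capture, and so on. The sketch below adds two groups to the email pattern purely for illustration.

```php
<?php
$text = "Contact: alice@example.com or bob@test.org";

// Group 1 captures the part before the @, group 2 the part after it.
$count = preg_match_all('/([\w.]+)@([\w.]+)/', $text, $matches);

echo $count, PHP_EOL;  // 2
print_r($matches[0]);  // whole matches: alice@example.com, bob@test.org
print_r($matches[1]);  // group 1:       alice, bob
print_r($matches[2]);  // group 2:       example.com, test.org
```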

preg_replace — Pattern-Based Find and Replace

<?php
$text = "Call 07911 123456 or 07700 900900.";

$censored = preg_replace('/\d{5} \d{6}/', '[REDACTED]', $text);
echo $censored; // "Call [REDACTED] or [REDACTED]."

// Using a captured group inside the replacement, with $1, $2, etc.
$reformatted = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', "2026-06-20");
echo $reformatted; // "20/06/2026"
?>

preg_replace() mirrors str_replace() (Chapter 7 of Fundamentals), but matches by pattern rather than exact text, and replaces every match by default. In the replacement string, $1, $2, and $3 refer back to the groups captured by the pattern, which makes it possible to rearrange or reformat the matched text rather than only delete or overwrite it.
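"Every match by default" can be changed: preg_replace() accepts an optional fourth argument that limits the number of replacements, and an optional fifth that receives the number actually made. A brief sketch, with illustrative variable names:

```php
<?php
$text = "Call 07911 123456 or 07700 900900.";

// Default behaviour: every match is replaced.
$all = preg_replace('/\d{5} \d{6}/', '[REDACTED]', $text);

// Limit of 1: only the first match is replaced; $count receives how many were made.
$firstOnly = preg_replace('/\d{5} \d{6}/', '[REDACTED]', $text, 1, $count);

echo $all, PHP_EOL;       // Call [REDACTED] or [REDACTED].
echo $firstOnly, PHP_EOL; // Call [REDACTED] or 07700 900900.
echo $count, PHP_EOL;     // 1
```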

Regex special characters need escaping with a backslash if you want them matched literally
Characters like ., +, ?, (, ) have special meaning in a pattern. To match a literal period in, say, a domain name, write \. — an unescaped . would instead match "any character at all," which is usually not the intent and can cause subtly wrong matches that are easy to miss in testing.
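The difference is easy to demonstrate with a domain name. The sketch below also uses preg_quote(), a built-in that escapes special characters in a literal string for you; it is not covered in the chapter and is shown here only as one possible convenience when the text to match comes from a variable.

```php
<?php
$unescaped = '/example.com/';   // . matches any character
$escaped   = '/example\.com/';  // \. matches only a literal period

$loose  = preg_match($unescaped, "exampleXcom"); // 1: the X satisfies the dot
$strict = preg_match($escaped, "exampleXcom");   // 0
$exact  = preg_match($escaped, "example.com");   // 1

// preg_quote() adds the backslashes; the second argument names the delimiter to escape as well.
$quoted = preg_quote("example.com", '/');

echo "$loose $strict $exact", PHP_EOL; // 1 0 1
echo $quoted, PHP_EOL;                 // example\.com
```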

Validating an Email Format (Compared to filter_var)

<?php
$email = "philip@example.com";

if (preg_match('/^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/', $email)) {
    echo "Looks like a valid email format.<br>";
}
?>
filter_var() with FILTER_VALIDATE_EMAIL is still the better choice for real email validation
Email address rules are more complex than they first appear, and Chapter 8's FILTER_VALIDATE_EMAIL covers far more of them than a short pattern like this one. The pattern is shown purely as a realistic regex example; in real code, prefer the built-in filter for anything regex could get subtly wrong, and reach for regex when no equivalent built-in filter is available.
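One concrete case where the two approaches disagree is an address with a subdomain: the pattern's [\w-]+ after the @ cannot contain a dot, so it allows only one dot in the domain part. The comparison below is a sketch; the sample addresses are made up for illustration.

```php
<?php
$pattern = '/^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/';

$comparison = [];
foreach (["philip@example.com", "philip@mail.example.com", "not-an-email"] as $candidate) {
    $comparison[$candidate] = [
        'regex'  => preg_match($pattern, $candidate) === 1,
        // filter_var() returns the address on success and false on failure.
        'filter' => filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false,
    ];
}

var_dump($comparison);
// philip@example.com       regex: true,  filter: true
// philip@mail.example.com  regex: false, filter: true
// not-an-email             regex: false, filter: false
```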

Coding Challenges

Challenge 1

Write a pattern that matches a UK postcode in the simplified format "AA1 1AA" (two letters, one digit, a space, one digit, two letters). Test it with preg_match() against both a matching and a non-matching string, echoing the result of each.

Challenge 2

Given a string containing several prices like "Items: £12.50, £3.99, and £100.00", use preg_match_all() with a capturing group to extract just the numeric amounts (without the £ sign) into an array, then use array_sum() to total them.

Challenge 3

Write a function maskCardNumber($number) that takes a 16-digit card number string and uses preg_replace() with capturing groups to keep only the last 4 digits visible, replacing the rest with asterisks (e.g. "1234567812345678" becomes "************5678").


Chapter 9 Quick Reference

  • preg_match($pattern, $str) — returns 1 if the pattern is found, 0 if not (false only on an error such as a malformed pattern)
  • \d \w \s . + * ? {n} [abc] ^ $ — core pattern building blocks
  • ( ) capturing groups — extract matched parts into $matches[1], [2], etc.
  • preg_match_all() — finds every match in the string, not just the first
  • preg_replace($pattern, $replacement, $str) — pattern-based find/replace; $1, $2 reference captured groups
  • Escape special characters (., +, ?, etc.) with a backslash to match them literally
  • Prefer filter_var() over regex for things like email validation where a reliable built-in filter exists
  • Next chapter: course capstone — a simple blog with login, CRUD posts, and database storage