Regex You Won't Hate
Pattern building from primitives. Lookarounds. JS flags.
Regular expressions are a tiny language for matching patterns in text. They look like cat-walked-on-keyboard at first, and they are a little ugly, but a few of them you'll see and write every week. This lesson teaches the pieces, then turns you loose with a live playground.
Two ways to write a regex in JS
// literal syntax
const a = /hello/i;
// constructor (for dynamic patterns built from strings)
const b = new RegExp("hello", "i");
a.test("Hello world") // true
"Hello world".match(a) // ["Hello", index: 0, ...]
"Hello world".replace(a, "Hi") // "Hi world"
Character classes: what kinds of characters?
- . any character except newline
- \d digit, \D non-digit
- \w word character ([A-Za-z0-9_]), \W the opposite
- \s whitespace, \S non-whitespace
- [abc] any of a, b, c. [^abc] none of those. [a-z] any lowercase letter. [0-9] same as \d.
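As a quick check of the classes above, here is a minimal sketch that runs a few of them against sample strings. The sample strings and variable names are made up for illustration.

```javascript
// Sample inputs are invented for illustration.
const hasDigit = /\d/.test("room 42");             // true: "4" is a digit
const onlyWordChars = /^\w+$/.test("snake_case1"); // true: letters, digits, underscore
const hasSpace = /\s/.test("no-spaces");           // false: no whitespace present

// A set matches one character from the list; ^ inside the brackets negates it.
const vowels = "regex".match(/[aeiou]/g);          // ["e", "e"]
const nonVowels = "regex".match(/[^aeiou]/g);      // ["r", "g", "x"]

console.log(hasDigit, onlyWordChars, hasSpace, vowels, nonVowels);
```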
Anchors: where in the string?
- ^ start of string (or of each line with the m flag)
- $ end of string (or of each line with the m flag)
- \b word boundary (between a \w and a \W, or between a \w and the start or end of the string)
/^hello/.test("hello world") // true (starts with hello)
/world$/.test("hello world") // true (ends with world)
/\bcat\b/.test("the cat sat") // true (cat as a whole word)
/\bcat\b/.test("category") // false (cat is part of a longer word)
Quantifiers: how many?
- ? 0 or 1
- * 0 or more
- + 1 or more
- {n} exactly n
- {n,} n or more
- {n,m} between n and m
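A short sketch of the quantifiers in action may help. The test strings and variable names here are assumptions chosen for illustration.

```javascript
// ? makes the preceding item optional.
const optionalU = /^colou?r$/;
const both = [optionalU.test("color"), optionalU.test("colour")]; // [true, true]

// {n,m} bounds the repeat count.
const threeToFiveDigits = /^\d{3,5}$/;
const lengths = ["12", "123", "12345", "123456"].map((s) => threeToFiveDigits.test(s));
// [false, true, true, false]

// * allows zero repeats, + requires at least one.
const star = /^ab*$/.test("a"); // true
const plus = /^ab+$/.test("a"); // false

console.log(both, lengths, star, plus);
```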
By default, quantifiers are greedy: they match as much as possible. Add a ? after a quantifier to make it lazy: match as little as possible.
"<b>hi</b> <i>there</i>".match(/<.+>/)[0]
// "<b>hi</b> <i>there</i>" <-- greedy, eats everything
"<b>hi</b> <i>there</i>".match(/<.+?>/)[0]
// "<b>" <-- lazy, stops at the first >
Groups and alternation
Parentheses (...) capture matched content for later use. (?:...) groups without capturing. The pipe | is "or".
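To make "for later use" concrete, here is a small sketch of reusing captures in a replacement, and of what changes when a group is non-capturing. The sample strings are invented for illustration.

```javascript
// Captured groups can be referenced in the replacement string as $1, $2, ...
const swapped = "Lovelace, Ada".replace(/(\w+), (\w+)/, "$2 $1");
// "Ada Lovelace"

// (?:...) still groups for the quantifier but adds no entry to the match array.
const capturing = "hahaha".match(/(ha)+/);
const nonCapturing = "hahaha".match(/(?:ha)+/);
// capturing has 2 entries (whole match + group 1); nonCapturing has 1.

console.log(swapped, capturing.length, nonCapturing.length);
```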
// alternation
/cat|dog|fish/.test("I have a dog") // true
// capture groups for extraction
const m = "(415) 555-1212".match(/\((\d{3})\) (\d{3})-(\d{4})/);
m[0] // "(415) 555-1212" -- whole match
m[1] // "415" -- group 1
m[2] // "555" -- group 2
m[3] // "1212" -- group 3
// named groups
const r = /(?<year>\d{4})-(?<month>\d{2})/;
"2024-05".match(r).groups // { year: "2024", month: "05" }
Lookarounds: match if surrounded by
Sometimes you want to match X but only when it's followed (or preceded) by Y, without including Y in the match itself.
- (?=...) lookahead: must be followed by
- (?!...) negative lookahead: must NOT be followed by
- (?<=...) lookbehind: must be preceded by
- (?<!...) negative lookbehind: must NOT be preceded by
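As a sketch of the positive lookahead and the negative lookbehind from the list, under made-up sample strings:

```javascript
// (?=...) lookahead: digits that are followed by "px", without including "px".
const pxValues = "10px 20em 30px".match(/\d+(?=px)/g);
// ["10", "30"]

// (?<!...) negative lookbehind: digit runs not preceded by "$".
// The \d inside the lookbehind stops the match from starting in the
// middle of "$99" (the second 9 is preceded by a 9, not by "$").
const notDollars = "$99 or 50 yen".match(/(?<![$\d])\d+/g);
// ["50"]

console.log(pxValues, notDollars);
```

The extra \d in the lookbehind is a design choice for this sample: without it, the pattern would match the second 9 of "$99".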
// price in dollars only, capture just the number
"$99 or 50 yen".match(/(?<=\$)\d+/)[0]
// "99"
// digits that aren't followed by px
"width: 100px; size: 20".match(/\d+(?!px)/g)
// ["10", "20"] <-- the 10 in "100px" matches because only the final 0 is directly followed by px
// (lookaround is tricky - test carefully!)
JS flags
- g global, find all matches not just the first
- i case-insensitive
- m multiline, ^ and $ match line boundaries
- s dotall, . matches newlines too
- u Unicode mode (treat code points correctly)
- v Unicode v-mode (modern, smarter sets, ES2024+)
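The m and s flags are easiest to see on a string with a newline in it. The following is a minimal sketch with invented sample text; it also shows matchAll, which needs the g flag, as one way to walk every match together with its capture groups.

```javascript
const text = "first line\nsecond line";

// m: ^ and $ match at every line boundary, not only the ends of the string.
const lineStarts = text.match(/^\w+/gm); // ["first", "second"]

// s: . also matches newline characters.
const withoutS = /first.+second/.test(text); // false
const withS = /first.+second/s.test(text);   // true

// g with matchAll: iterate over every match along with its groups.
const pairs = [..."a=1, b=2".matchAll(/(\w)=(\d)/g)].map((m) => [m[1], m[2]]);
// [["a", "1"], ["b", "2"]]

console.log(lineStarts, withoutS, withS, pairs);
```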
"Cat cat CAT".match(/cat/g) // ["cat"]
"Cat cat CAT".match(/cat/gi) // ["Cat", "cat", "CAT"]
// emoji counted as one character with u flag
/^.$/.test("🤖") // false (without u, robot is 2 code units)
/^.$/u.test("🤖") // true
/\p{Emoji}/u.test("🤖") // true (Unicode property)
Patterns you'll actually copy-paste
// hex color (#fff or #ffffff)
/^#([0-9a-f]{3}|[0-9a-f]{6})$/i
// ISO date (YYYY-MM-DD)
/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/
// "email-ish" (NOT real validation - see callout below)
/^[^\s@]+@[^\s@]+\.[^\s@]+$/
// URL slug (lowercase letters, digits, hyphens)
/^[a-z0-9]+(?:-[a-z0-9]+)*$/
// extract YouTube video ID from various URL forms
/(?:youtu\.be\/|v=)([\w-]{11})/
// whitespace at end of line (useful for cleanup)
/[ \t]+$/gm
// IPv4 (loose - for tighter, you need to check each octet ≤ 255)
/\b(?:\d{1,3}\.){3}\d{1,3}\b/
The official email spec is hundreds of lines of grammar. Real email regexes are dozens of characters long and still wrong. To validate email, send a confirmation message. To "parse" HTML, use a parser (the browser's DOMParser, cheerio, jsdom).
HTML is not a regular language. Trying to regex it will work 80% of the time and silently corrupt your data the other 20%.
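The same "regex for shape, code for meaning" idea applies to the loose IPv4 pattern above, which notes that a tighter check needs each octet to be at most 255. One possible sketch follows; the helper name isIPv4 is hypothetical, not something from this lesson.

```javascript
// Hypothetical helper: the name isIPv4 is an assumption for illustration.
function isIPv4(candidate) {
  // The regex handles the shape: four dot-separated groups of 1-3 digits.
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(candidate);
  if (m === null) return false;
  // Plain arithmetic handles the range: every octet must be 0..255.
  return m.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isIPv4("192.168.0.1")); // true
console.log(isIPv4("256.1.1.1"));   // false (first octet out of range)
console.log(isIPv4("1.2.3"));       // false (only three octets)
```

This sketch still accepts leading zeros such as "010.0.0.1"; whether that is acceptable depends on where the input comes from.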
Live regex playground
Below is an interactive sandbox. Edit the regex in index.js and the text. The console shows what matches. Try the example patterns from above, or invent your own.
Performance: catastrophic backtracking
A regex with nested quantifiers like (a+)+$, run against a string of a's followed by a b, can take exponential time to fail. This is called catastrophic backtracking, and it's a known way to DoS a server (ReDoS). Real outages have been traced to a single such regex (Cloudflare, Stack Overflow).
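In this particular case the nesting buys nothing. As a sketch, the flat pattern a+$ gives the same answers as (a+)+$ on these inputs, without the exponential blowup on failure. The input is kept short on purpose so the nested pattern still finishes quickly; the variable names are made up for illustration.

```javascript
// The nested and flat versions agree on these inputs, but only the
// nested one backtracks exponentially when the match fails.
const nested = /(a+)+$/;
const flat = /a+$/;

// Short input on purpose: 12 a's keeps the nested pattern fast enough to run.
const bad = "a".repeat(12) + "b";
const good = "a".repeat(12);

const results = {
  nestedBad: nested.test(bad),   // false
  flatBad: flat.test(bad),       // false
  nestedGood: nested.test(good), // true
  flatGood: flat.test(good),     // true
};

console.log(results);
```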
// DON'T do this on untrusted input
/(a+)+$/.test("aaaaaaaaaaaaaaaaaaab") // false, but slowly: each extra a roughly doubles the work
// no flag (including v) changes how the engine backtracks
// the real fix is: rewrite to avoid nested quantifiers
// or use a proper parser, or set a time budget
Quick quiz
What does the regex /foo+/ match?
Recap
- Character classes (\d \w \s [abc]), anchors (^ $ \b), and quantifiers (? * + {n}) are the building blocks.
- Groups: (...) capture, (?:...) just groups, (?<name>...) names a capture.
- Flags: g all, i case-blind, m per-line, s dotall, u/v Unicode-aware.
- Don't regex email or HTML. Beware catastrophic backtracking on untrusted input.