Regular Expressions Crash Course: From Zero to Pattern Matching

Published March 2026 · 14 min read

1. Introduction

Regular expressions (often abbreviated as regex or regexp) are sequences of characters that define search patterns. They are one of the most powerful tools in a developer's toolkit, letting you search, match, validate, and transform text precisely. Whether you are validating user input, parsing log files, scraping web content, or performing complex find-and-replace operations, regex is often the most direct tool for the job.

Regular expressions originated in the 1950s when mathematician Stephen Cole Kleene formalized the concept of regular languages. They made their way into computing through early Unix text-processing tools like grep, sed, and awk. Today, virtually every mainstream programming language supports regular expressions: JavaScript, Python, Java, Go, Ruby, and C# ship a regex engine with the language or its standard library, and Rust provides one through the widely used regex crate.

Despite their reputation for being cryptic, regex follows a logical structure. Once you understand the building blocks, even complex patterns become readable. This crash course takes you from absolute zero to confidently writing your own patterns. We will cover syntax, metacharacters, quantifiers, groups, lookaheads, and finish with a cookbook of real-world patterns you can copy and adapt.

Where regex is used: Form validation, server-side input sanitization, log analysis, code linting, search-and-replace in IDEs, web scraping, URL routing, syntax highlighting, data extraction from unstructured text, and countless other tasks.

2. Basic Matching

At its simplest, a regular expression is just a literal string. The pattern hello matches the exact sequence of characters "hello" anywhere in the target string. There is no magic here — literal characters match themselves.

Pattern: cat
Text:    "The cat sat on the mat"
Match:        ^^^
Result:  Matches "cat" at index 4 (zero-based)

By default, regex is case-sensitive. Cat does not match cat. To perform a case-insensitive search, you use the i flag (covered in the JavaScript section below). Regex engines also search left-to-right by default, returning the first match they find unless you request all matches with the global flag.

Pattern: /cat/i
Text:    "The Cat sat on the mat"
Match:        ^^^
Result:  Matches "Cat" (case-insensitive)
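The two examples above can be reproduced in JavaScript. This is a minimal sketch, assuming a runtime such as Node.js or a browser console; the variable names are illustrative and not part of any API.

```javascript
// Literal characters match themselves; search() returns the zero-based index.
const sentence = "The cat sat on the mat";
const catIndex = sentence.search(/cat/); // 4

// Matching is case-sensitive unless the i flag is set.
const strictHit = /cat/.test("The Cat sat on the mat");   // false
const relaxedHit = /cat/i.test("The Cat sat on the mat"); // true

// Without g, match() reports only the first hit; with g, every hit.
const firstHit = "cat concat".match(/cat/).index; // 0
const allHits = "cat concat".match(/cat/g);       // ["cat", "cat"]

console.log(catIndex, strictHit, relaxedHit, firstHit, allHits);
```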

3. Special Characters (Metacharacters)

Metacharacters are characters with special meaning in regex. They are the building blocks that elevate regex beyond simple string matching. If you want to match a metacharacter literally, you must escape it with a backslash (\).

. — Matches any single character except a newline. The pattern h.t matches "hat", "hot", "hit", and even "h9t".
^ — Anchors the match to the start of the string (or line in multiline mode). ^Hello matches "Hello" only when it appears at the very beginning.
$ — Anchors the match to the end of the string (or line). world$ matches "world" only at the end.
\ — Escapes a metacharacter so it is treated literally. \. matches an actual dot character instead of "any character."
| — Acts as a logical OR (alternation). cat|dog matches either "cat" or "dog".

// Dot matches any character
Pattern: c.t
Matches: "cat", "cot", "cut", "c3t", "c t"
No match: "ct", "cart"

// Alternation
Pattern: yes|no
Text:    "yes or no"
Matches: "yes", "no"

// Escaping a dot
Pattern: file\.txt
Matches: "file.txt"
No match: "fileTtxt", "file-txt"
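The same three cases can be sketched in JavaScript. The ^ and $ anchors (introduced in section 6) are an addition here so that each sample string is tested as a whole; the variable names are illustrative.

```javascript
// Dot: exactly one character between "c" and "t".
const dotPattern = /^c.t$/;
const dotResults = ["cat", "c3t", "c t", "ct", "cart"].map((s) => dotPattern.test(s));
// [true, true, true, false, false]

// Alternation: collect every "yes" or "no".
const answers = "yes or no".match(/yes|no/g); // ["yes", "no"]

// Escaped dot: only a literal "." is accepted between "file" and "txt".
const literalDot = /^file\.txt$/;
const escapedResults = ["file.txt", "fileTtxt", "file-txt"].map((s) => literalDot.test(s));
// [true, false, false]

console.log(dotResults, answers, escapedResults);
```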

4. Character Classes

Character classes let you define a set of characters that can match at a single position. They are written inside square brackets and provide far more control than the generic dot metacharacter.

Custom Character Classes

[abc] — Matches "a", "b", or "c". Exactly one character from the set.
[a-z] — Matches any lowercase letter from a to z using a range.
[A-Z0-9] — Matches any uppercase letter or digit. You can combine ranges.
[^abc] — Negated class. Matches any character except a, b, or c. The caret inside brackets means "not."

Shorthand Character Classes

Regex provides shorthand notations for the most commonly used character classes. Each shorthand also has an uppercase counterpart that matches the inverse:

\d — Any digit (equivalent to [0-9])
\D — Any non-digit (equivalent to [^0-9])
\w — Any word character: letter, digit, or underscore (equivalent to [a-zA-Z0-9_])
\W — Any non-word character (equivalent to [^a-zA-Z0-9_])
\s — Any whitespace character: space, tab, newline, carriage return
\S — Any non-whitespace character

// Match a hex color code
Pattern: #[0-9a-fA-F]{6}
Matches: "#ff5733", "#1A2B3C", "#000000"

// Match a 3-digit number
Pattern: \d{3}
Text:    "Room 404 is on floor 4"
Matches: "404"

// Match non-digits
Pattern: \D+
Text:    "abc123def"
Matches: "abc", "def"
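As a quick check of the classes above, here is a small JavaScript sketch. The sample strings beyond those in the examples (such as "#gggggg" and "regex") are made up for illustration.

```javascript
// Custom class with a quantifier: "#" followed by six hex digits.
const hexColor = /#[0-9a-fA-F]{6}/;
const hexResults = ["#ff5733", "#1A2B3C", "#12", "#gggggg"].map((s) => hexColor.test(s));
// [true, true, false, false]

// Shorthand classes.
const roomNumber = "Room 404 is on floor 4".match(/\d{3}/)[0]; // "404"
const nonDigitRuns = "abc123def".match(/\D+/g);                // ["abc", "def"]

// Negated class: every character that is not a lowercase vowel.
const consonants = "regex".match(/[^aeiou]/g); // ["r", "g", "x"]

console.log(hexResults, roomNumber, nonDigitRuns, consonants);
```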

5. Quantifiers

Quantifiers specify how many times the preceding element should appear. Without quantifiers, each element in a pattern matches exactly once. Quantifiers give you control over repetition.

* — Zero or more times. The pattern ab*c matches "ac", "abc", "abbc", "abbbc", etc.
+ — One or more times. The pattern ab+c matches "abc", "abbc" but not "ac".
? — Zero or one time (optional). The pattern colou?r matches both "color" and "colour".
{n} — Exactly n times. \d{4} matches exactly four digits.
{n,} — At least n times. \d{2,} matches two or more consecutive digits.
{n,m} — Between n and m times (inclusive). \d{2,4} matches two, three, or four digits.
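The quantifiers listed above can be compared side by side. This is a sketch with illustrative sample strings; the anchors and word boundaries are added so that each result reflects the quantifier alone.

```javascript
const samples = ["ac", "abc", "abbbc"];

const zeroOrMore = samples.map((s) => /^ab*c$/.test(s)); // [true, true, true]
const oneOrMore = samples.map((s) => /^ab+c$/.test(s));  // [false, true, true]

// "?" makes the "u" optional, but allows at most one.
const optionalU = ["color", "colour", "colouur"].map((s) => /^colou?r$/.test(s));
// [true, true, false]

// {2,4} with word boundaries: only whole numbers of two to four digits.
const midSized = "7 42 123 98765".match(/\b\d{2,4}\b/g); // ["42", "123"]

console.log(zeroOrMore, oneOrMore, optionalU, midSized);
```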

Greedy vs Lazy Matching

By default, quantifiers are greedy: they match as many characters as possible while still allowing the overall pattern to succeed. Adding a ? after a quantifier makes it lazy (also called reluctant), meaning it matches as few characters as possible.

// Greedy (default)
Pattern: ".+"
Text:    He said "hello" and "goodbye"
Match:   "hello" and "goodbye"  (matches everything between first " and last ")

// Lazy
Pattern: ".+?"
Text:    He said "hello" and "goodbye"
Match:   "hello"  (stops at the first closing quote)
Match:   "goodbye"

Rule of thumb: Use lazy quantifiers when you want the shortest possible match. This is especially important when matching content between delimiters like quotes, HTML tags, or brackets.
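The difference is easy to see by running both forms on the same text. The sketch below also includes a negated character class, a common alternative that is not part of the examples above: it cannot run past a closing quote because the quote itself is excluded from the class.

```javascript
const quoteText = 'He said "hello" and "goodbye"';

// Greedy: runs to the last quote in the string.
const greedy = quoteText.match(/".+"/g);
// ['"hello" and "goodbye"']

// Lazy: stops at the first closing quote each time.
const lazy = quoteText.match(/".+?"/g);
// ['"hello"', '"goodbye"']

// Negated class: same result here, without relying on backtracking.
const negated = quoteText.match(/"[^"]+"/g);
// ['"hello"', '"goodbye"']

console.log(greedy, lazy, negated);
```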

6. Anchors and Boundaries

Anchors do not match characters — they match positions. They assert that the engine is at a particular position in the string without consuming any characters.

^ — Start of string. In multiline mode (m flag), matches the start of each line.
$ — End of string. In multiline mode, matches the end of each line.
\b — Word boundary. Matches the position between a word character (\w) and a non-word character. This is incredibly useful for matching whole words.
\B — Non-word boundary. Matches any position that is not a word boundary.

// Word boundary prevents partial matches
Pattern: \bcat\b
Text:    "The cat sat on the catalog"
Match:        ^^^
No match:                    ^^^ (inside "catalog")

// Start and end anchors for full string validation
Pattern: ^\d{3}-\d{4}$
Match:   "555-1234"
No match: "Call 555-1234 now"  (not at start/end)

// Non-word boundary
Pattern: \Bcat
Text:    "concatenate"
Match:       ^^^  (matches "cat" inside the word)
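The anchor examples above translate directly to JavaScript. In this sketch the multiline sample string is made up for illustration.

```javascript
// \b keeps "cat" from matching inside "catalog".
const wholeWords = "The cat sat on the catalog".match(/\bcat\b/g); // ["cat"]

// ^ and $ force the pattern to cover the entire string.
const localNumber = /^\d{3}-\d{4}$/;
const exact = localNumber.test("555-1234");             // true
const embedded = localNumber.test("Call 555-1234 now"); // false

// \B matches here because "n" and "c" are both word characters.
const innerIndex = "concatenate".search(/\Bcat/); // 3

// With the m flag, ^ matches at the start of every line.
const lineInitials = "one\ntwo\nthree".match(/^\w/gm); // ["o", "t", "t"]

console.log(wholeWords, exact, embedded, innerIndex, lineInitials);
```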

7. Groups and Capturing

Parentheses serve two purposes in regex: they group parts of a pattern together (so quantifiers can apply to the whole group), and they capture the matched text for later use. Groups are one of the most powerful features of regex.

Numbered Capturing Groups

Every pair of plain parentheses creates a capturing group. Groups are numbered left to right by the position of their opening parenthesis, starting from 1. Group 0 always refers to the entire match.

// Extract date components
Pattern: (\d{4})-(\d{2})-(\d{2})
Text:    "2026-03-10"
Group 0: "2026-03-10"  (full match)
Group 1: "2026"        (year)
Group 2: "03"          (month)
Group 3: "10"          (day)
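In JavaScript the numbered groups arrive as array elements on the match result. The helper below, parseIsoDate, is a hypothetical name used only for this sketch; it shows one way to unpack the groups and to handle the no-match case.

```javascript
// Hypothetical helper: returns the date parts, or null when nothing matches.
function parseIsoDate(text) {
  const match = /(\d{4})-(\d{2})-(\d{2})/.exec(text);
  if (match === null) return null;

  // Element 0 is the full match; elements 1..3 are the capturing groups.
  const [full, year, month, day] = match;
  return {
    full,
    year: Number(year),
    month: Number(month),
    day: Number(day),
    index: match.index, // where the match starts in the input
  };
}

const parsed = parseIsoDate("Released on 2026-03-10.");
// { full: "2026-03-10", year: 2026, month: 3, day: 10, index: 12 }
const missing = parseIsoDate("no date here"); // null

console.log(parsed, missing);
```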

Named Capturing Groups

Named groups use the syntax (?<name>...) to assign a name to a capturing group. This makes complex patterns far more readable and makes extracted values easier to work with in code.

// Named groups for readability
Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Text:    "2026-03-10"

// In JavaScript:
const match = "2026-03-10".match(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
);
match.groups.year;   // "2026"
match.groups.month;  // "03"
match.groups.day;    // "10"

Non-Capturing Groups

Sometimes you need grouping for structure (e.g., to apply a quantifier) but do not need to capture the result. Non-capturing groups use (?:...) and have a minor performance advantage since the engine does not need to store the matched text.

// Non-capturing group for alternation
Pattern: (?:https?|ftp)://[\w.-]+
Matches: "https://example.com", "http://test.org", "ftp://files.net"

// Non-capturing group with quantifier
Pattern: (?:ab)+
Matches: "ab", "abab", "ababab"

// Compare: capturing vs non-capturing
/(foo)(bar)/.exec("foobar")   // result: ["foobar", "foo", "bar"]
/(?:foo)(bar)/.exec("foobar") // result: ["foobar", "bar"]

8. Lookahead and Lookbehind

Lookaround assertions check whether a pattern exists ahead of or behind the current position, without including it in the match. They are zero-width assertions — they do not consume characters. This is extremely useful when you want to match something only if it is (or is not) followed by or preceded by a specific pattern.

Positive Lookahead `(?=...)`

Matches the preceding element only if it is followed by the pattern inside the lookahead.

// Match "data" only if followed by "base"
Pattern: data(?=base)
Text:    "database datafile databank"
Match:   "data" in "database"
No match: "data" in "datafile" or "databank"

Negative Lookahead `(?!...)`

Matches the preceding element only if it is not followed by the specified pattern.

// Match "data" only if NOT followed by "base"
Pattern: data(?!base)
Text:    "database datafile databank"
Match:   "data" in "datafile", "data" in "databank"
No match: "data" in "database"
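Both lookahead forms can be checked against the sample text by collecting the position of each match. This is a small sketch; the variable names are illustrative.

```javascript
const words = "database datafile databank";

// Each match is just "data"; the lookahead text is checked but not consumed.
const followedByBase = [...words.matchAll(/data(?=base)/g)].map((m) => m.index);
// [0]
const notFollowedByBase = [...words.matchAll(/data(?!base)/g)].map((m) => m.index);
// [9, 18]

// Because "base" is not consumed, the rest of the pattern can still match it.
const stillAvailable = /data(?=base)base/.test("database"); // true

console.log(followedByBase, notFollowedByBase, stillAvailable);
```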

Positive Lookbehind `(?<=...)`

Matches the following element only if it is preceded by the pattern inside the lookbehind.

// Match a number only if preceded by "$"
Pattern: (?<=\$)\d+\.?\d*
Text:    "Price: $42.99 and €35.00"
Match:   "42.99"
No match: "35.00" (preceded by €, not $)

Negative Lookbehind `(?<!...)`

Matches the following element only if it is not preceded by the specified pattern.

// Match digits NOT preceded by a minus sign
Pattern: (?<!-)\b\d+\b
Text:    "Scores: 10 -5 20 -3 15"
Matches: "10", "20", "15"
No match: "5", "3" (preceded by minus sign)
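The two lookbehind examples can be run as follows. This sketch assumes an engine with lookbehind support (part of the language since ES2018, available in current Node.js and browsers).

```javascript
// Positive lookbehind: the "$" must be there, but is not part of the match.
const dollarAmounts = "Price: $42.99 and €35.00".match(/(?<=\$)\d+\.?\d*/g);
// ["42.99"]

// Negative lookbehind: skip numbers that directly follow a minus sign.
const nonNegative = "Scores: 10 -5 20 -3 15".match(/(?<!-)\b\d+\b/g);
// ["10", "20", "15"]

console.log(dollarAmounts, nonNegative);
```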

Practical example: Password validation commonly uses multiple lookaheads to enforce rules simultaneously. The pattern ^(?=.*[A-Z])(?=.*\d)(?=.*[@#$]).{8,}$ requires at least one uppercase letter, one digit, one special character (@, #, or $), and a minimum of 8 characters, all without dictating the order. The ^ and $ anchors make the rule apply to the whole string.
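A short sketch of the stacked-lookahead idea follows. The pattern is written with ^ and $ so the rule covers the whole string, and the sample passwords are invented for illustration.

```javascript
// Each lookahead scans from the start of the string without consuming anything,
// so the three requirements can be satisfied in any order.
const passwordRule = /^(?=.*[A-Z])(?=.*\d)(?=.*[@#$]).{8,}$/;

const passwordChecks = [
  "Secret#2026", // uppercase, digit, special, long enough
  "2026#Secret", // same ingredients, different order
  "secret#2026", // no uppercase letter
  "S#1",         // all three kinds, but too short
].map((candidate) => passwordRule.test(candidate));
// [true, true, false, false]

console.log(passwordChecks);
```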

9. Common Patterns Cookbook

Here is a collection of practical regex patterns you can use as starting points. Each pattern includes a brief explanation of how it works.

Email Address

Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:
  ^                     Start of string
  [a-zA-Z0-9._%+-]+    Local part: letters, digits, dots, underscores, etc.
  @                     Literal "@" symbol
  [a-zA-Z0-9.-]+       Domain: letters, digits, dots, hyphens
  \.                   Literal dot before TLD
  [a-zA-Z]{2,}         TLD: at least 2 letters
  $                     End of string

Matches: "user@example.com", "dev.ops+tag@mail.co.uk"
No match: "@example.com", "user@", "user@.com"

URL

Pattern: https?:\/\/[\w.-]+(?:\.[a-zA-Z]{2,})(?:\/[\w./?#&=%-]*)*

Breakdown:
  https?               "http" or "https"
  :\/\/              Literal "://"
  [\w.-]+             Domain name
  (?:\.[a-zA-Z]{2,})  TLD
  (?:\/[...])*        Optional path, query, and fragment

Matches: "https://example.com", "http://sub.domain.org/path?q=1"
No match: "ftp://files.net", "not-a-url"
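The URL pattern is unanchored, so it can pull links out of running text. The sketch below does that and, as a stricter alternative for validating a single value, uses the standard URL constructor; isHttpUrl is a hypothetical helper name and the sample text is made up.

```javascript
// Extraction: the g flag collects every URL-shaped substring.
const urlPattern = /https?:\/\/[\w.-]+(?:\.[a-zA-Z]{2,})(?:\/[\w./?#&=%-]*)*/g;
const note = "Docs: https://example.com and http://sub.domain.org/path?q=1, not ftp://files.net";
const foundUrls = note.match(urlPattern);
// ["https://example.com", "http://sub.domain.org/path?q=1"]

// Validation: let the built-in parser decide, then check the scheme.
function isHttpUrl(value) {
  try {
    const parsedUrl = new URL(value);
    return parsedUrl.protocol === "http:" || parsedUrl.protocol === "https:";
  } catch {
    return false; // the URL constructor throws on unparseable input
  }
}

const urlChecks = ["https://example.com", "ftp://files.net", "not-a-url"].map(isHttpUrl);
// [true, false, false]

console.log(foundUrls, urlChecks);
```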

Phone Number (US Format)

Pattern: ^(?:\+1[-\s]?)?\(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4}$

Breakdown:
  (?:\+1[-\s]?)?      Optional country code "+1"
  \(?\d{3}\)?        Area code with optional parentheses
  [-\s.]?              Optional separator (dash, space, or dot)
  \d{3}[-\s.]?\d{4}  Seven remaining digits with optional separator

Matches: "(555) 123-4567", "555-123-4567", "+1 555.123.4567"
No match: "123-456", "555-1234-567"
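Once a number passes the pattern, it is usually convenient to store it in one canonical form. The sketch below assumes a ten-digit canonical form; normalizeUsPhone is a hypothetical helper, not a library function.

```javascript
const usPhone = /^(?:\+1[-\s]?)?\(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4}$/;

// Hypothetical helper: returns the ten national digits, or null if invalid.
function normalizeUsPhone(input) {
  if (!usPhone.test(input)) return null;
  const digits = input.replace(/\D/g, ""); // drop everything except digits
  return digits.length === 11 ? digits.slice(1) : digits; // strip the leading "1"
}

const fromParens = normalizeUsPhone("(555) 123-4567");   // "5551234567"
const fromCountry = normalizeUsPhone("+1 555.123.4567"); // "5551234567"
const tooShort = normalizeUsPhone("123-456");            // null

// Limitation: the two parentheses are optional independently of each other.
const unbalanced = usPhone.test("(555-123-4567"); // true

console.log(fromParens, fromCountry, tooShort, unbalanced);
```

The last line shows why the pattern is only a starting point: an opening parenthesis without its partner is still accepted.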

Date (YYYY-MM-DD)

Pattern: ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Breakdown:
  \d{4}             Four-digit year
  (0[1-9]|1[0-2])   Month: 01–12
  (0[1-9]|[12]\d|3[01])  Day: 01–31

Matches: "2026-03-10", "1999-12-31"
No match: "2026-13-01", "2026-00-15", "26-03-10"
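The pattern checks the shape of a date, not the calendar: "2026-02-31" passes. One way to close that gap, sketched below, is to follow the regex with a round trip through the built-in Date object. The extra capturing group around the year and the helper name isRealDate are additions for this example.

```javascript
const isoDate = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: shape check first, then a calendar check.
function isRealDate(text) {
  const match = isoDate.exec(text);
  if (match === null) return false;

  const [, year, month, day] = match.map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));

  // An impossible day such as Feb 31 rolls over into the next month,
  // so the parts no longer agree with the input.
  // Note: Date.UTC maps years 0 to 99 onto 1900 to 1999, so this sketch
  // rejects such years.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

const shapeOnly = isoDate.test("2026-02-31"); // true: the regex alone accepts it
const dateChecks = ["2026-03-10", "2026-02-31", "2026-13-01"].map(isRealDate);
// [true, false, false]

console.log(shapeOnly, dateChecks);
```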

IPv4 Address

Pattern: ^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

Breakdown:
  25[0-5]         Matches 250–255
  2[0-4]\d        Matches 200–249
  [01]?\d\d?      Matches 0–199
  \.              Literal dot separator
  {3}             Repeated for first three octets
                  Then the same pattern once more for the fourth octet

Matches: "192.168.1.1", "255.255.255.0", "0.0.0.0"
No match: "256.1.1.1", "192.168.1", "1.2.3.4.5"
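Running the pattern over the listed samples confirms the expected results. The sketch adds one extra input, chosen for illustration, to show that the [01]?\d\d? branch also accepts leading zeros.

```javascript
const ipv4Pattern =
  /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

const ipSamples = [
  "192.168.1.1",
  "255.255.255.0",
  "0.0.0.0",
  "256.1.1.1", // first octet out of range
  "192.168.1", // only three octets
  "1.2.3.4.5", // five octets
];
const ipResults = ipSamples.map((s) => ipv4Pattern.test(s));
// [true, true, true, false, false, false]

// Leading zeros pass, which may or may not be acceptable for your use case.
const leadingZeros = ipv4Pattern.test("010.001.000.001"); // true

console.log(ipResults, leadingZeros);
```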

Password Validation

Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Breakdown:
  (?=.*[a-z])       At least one lowercase letter
  (?=.*[A-Z])       At least one uppercase letter
  (?=.*\d)          At least one digit
  (?=.*[@$!%*?&])   At least one special character
  [A-Za-z\d@$!%*?&]{8,}  Minimum 8 characters from allowed set

Matches: "MyP@ss1word", "Str0ng!Pass"
No match: "password", "12345678", "NoDigit!"

Warning: These patterns are simplified for learning purposes. For production use, consider edge cases carefully. Email validation in particular is notoriously difficult to get right with regex alone — the full RFC 5322 specification is far more complex. For critical validation, combine regex with server-side checks or dedicated libraries.

10. Regex in JavaScript

JavaScript has first-class regex support through the RegExp object and regex literal syntax. Here is how to use the most important methods and flags.

Creating Regex Patterns

// Regex literal (preferred for static patterns)
const pattern = /hello/gi;

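// RegExp constructor (useful for dynamic patterns)
const dynamic = new RegExp("hello", "gi");

// Escape user input first so characters such as "." or "(" match literally
const userInput = "search term";
const escaped = userInput.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const search = new RegExp(escaped, "i");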

test() — Check if a Pattern Matches

Returns true or false. The fastest way to check for a match when you do not need the matched text.

const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

emailRegex.test("user@example.com");  // true
emailRegex.test("invalid-email");      // false
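One detail to keep in mind when reusing a regex object with test(): if the pattern has the g flag, the object remembers where the last match ended (its lastIndex property) and the next call resumes from there. A minimal sketch of that behavior, with illustrative names:

```javascript
const globalCat = /cat/g;

const firstCall = globalCat.test("cat");  // true; lastIndex is now 3
const secondCall = globalCat.test("cat"); // false; the search resumed at index 3
const thirdCall = globalCat.test("cat");  // true; lastIndex was reset to 0 after the miss

// Without the g flag, every call starts from the beginning.
const plainCat = /cat/;
const repeatable = [plainCat.test("cat"), plainCat.test("cat")]; // [true, true]

console.log(firstCall, secondCall, thirdCall, repeatable);
```

For simple yes/no checks, leaving off the g flag avoids this surprise.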

match() — Find Matches

Called on a string, returns an array of matches. Without the g flag, it returns the first match with capturing groups. With g, it returns all matches but without groups.

// Without global flag — detailed first match
"2026-03-10".match(/(\d{4})-(\d{2})-(\d{2})/);
// ["2026-03-10", "2026", "03", "10"]

// With global flag — all matches, no groups
"cat bat hat".match(/[a-z]at/g);
// ["cat", "bat", "hat"]

matchAll() — Iterate Over All Matches with Groups

Returns an iterator of all matches, each with full group information. Requires the g flag.

const text = "Dates: 2026-03-10, 2025-12-25";
const regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;

for (const match of text.matchAll(regex)) {
  console.log(match.groups);
}
// { year: "2026", month: "03", day: "10" }
// { year: "2025", month: "12", day: "25" }

replace() — Search and Replace

Replaces matched text with a replacement string or the return value of a function. The replacement string can refer to captured groups with $1, $2, etc. Unless the pattern has the g flag, only the first match is replaced.

// Simple replacement
"Hello World".replace(/world/i, "Regex");
// "Hello Regex"

// Using capturing groups in replacement
"2026-03-10".replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
// "03/10/2026"

// Using a function for dynamic replacement
"hello world".replace(/\b\w/g, (char) => char.toUpperCase());
// "Hello World"
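Two further details of replace() are sketched below: the effect of the g flag, and the $<name> syntax for referring to named groups in the replacement string. The sample strings are illustrative.

```javascript
// Without g, only the first match is replaced.
const firstOnly = "a-b-c".replace(/-/, "+");  // "a+b-c"
const everyDash = "a-b-c".replace(/-/g, "+"); // "a+b+c"

// Named groups can be referenced as $<name> in the replacement string.
const european = "2026-03-10".replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  "$<day>.$<month>.$<year>"
);
// "10.03.2026"

console.log(firstOnly, everyDash, european);
```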

Flags

Flags modify how the regex engine behaves. They are appended after the closing slash of a regex literal or passed as the second argument to the RegExp constructor.

g — Global: Find all matches, not just the first.
i — Case-insensitive: Ignore uppercase/lowercase distinctions.
m — Multiline: Make ^ and $ match the start/end of each line, not just the entire string.
s — Dotall: Make . match newline characters as well.

// Combining flags
const regex = /^hello$/gim;

// Multiline mode
const text = "Hello\nworld\nhello";
text.match(/^hello$/gim);  // ["Hello", "hello"]

// Dotall mode
const html = "<p>Line 1\nLine 2</p>";
html.match(/<p>.*<\/p>/);   // null (dot doesn't match \n by default)
html.match(/<p>.*<\/p>/s);  // ["<p>Line 1\nLine 2</p>"]

11. Tips and Pitfalls

Regex is a powerful tool, but it comes with some common traps that can cause performance issues, incorrect matches, or unmaintainable code. Here are the most important things to watch out for.

Catastrophic Backtracking

When a regex engine encounters a pattern with nested quantifiers and overlapping possibilities, it can enter a state called catastrophic backtracking. The engine tries an exponential number of paths before concluding that there is no match. This can freeze your application or cause a denial-of-service vulnerability (ReDoS).

// DANGEROUS: nested quantifiers with overlapping patterns
const bad = /^(a+)+$/;
bad.test("aaaaaaaaaaaaaaaaaaaaaaaa!");
// This can take SECONDS or more — exponential backtracking

// SAFE: flatten the pattern
const good = /^a+$/;
good.test("aaaaaaaaaaaaaaaaaaaaaaaa!");
// Instant result

Warning: Patterns like (a+)+, (a|a)+, and (.*a){10} are classic examples of patterns vulnerable to catastrophic backtracking. Always test your regex against both matching and non-matching inputs, especially with long strings.
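Before swapping a risky pattern for a flattened one, it helps to confirm that both accept the same strings. The sketch below does that for (a+)+ and a+ on a handful of probes, which are deliberately kept short so the nested version finishes quickly; the probe list is illustrative, not exhaustive.

```javascript
const nested = /^(a+)+$/; // vulnerable form
const flat = /^a+$/;      // flattened form

const probes = ["", "a", "aaaa", "aaab", "baaa", "aaaaaaaa!"];

// Both patterns should give the same verdict on every probe.
const verdicts = probes.map((s) => [nested.test(s), flat.test(s)]);
const patternsAgree = verdicts.every(([a, b]) => a === b); // true

console.log(verdicts, patternsAgree);
```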

Avoid Over-Engineering

Not every string problem needs regex. Simple operations like checking if a string starts with a prefix, splitting on a single character, or trimming whitespace are better handled with built-in string methods. Regex adds cognitive overhead — use it when you genuinely need pattern matching, not as a hammer for every nail.

// Over-engineered — don't do this
/^Hello/.test(str);

// Simpler and faster
str.startsWith("Hello");

// Regex IS appropriate here — no simple alternative
const isValidHex = /^#[0-9a-f]{6}$/i.test(color);

Prioritize Readability

Complex regex patterns are notoriously difficult to read. When a pattern grows beyond a single line, consider breaking it into smaller parts with comments, using named groups, or building the pattern programmatically. Your future self (and your teammates) will thank you.

// Hard to read
const ugly = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

// Build it step by step for clarity
const octet = "(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)";
const ipv4 = new RegExp(`^(?:${octet}\\.){3}${octet}$`);

Always Test Edge Cases

Regex patterns that work for your initial test cases often fail on edge cases. Test with empty strings, strings with only whitespace, strings with special characters, very long strings, and strings that almost match but should not. Automated test suites for your regex patterns are just as valuable as tests for any other code.
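A table of inputs and expected results is often enough to start with. The sketch below checks the hex color pattern from earlier against the kinds of edge cases listed above; the case list is illustrative, and in a real project the same table could feed whatever test runner you already use.

```javascript
const hexColorRule = /^#[0-9a-f]{6}$/i;

// [input, expected result]
const edgeCases = [
  ["#ff5733", true],
  ["#FF5733", true],    // uppercase digits, allowed by the i flag
  ["", false],          // empty string
  ["   ", false],       // whitespace only
  ["#ff573", false],    // one digit short
  ["#ff57333", false],  // one digit too many
  ["#ff573g", false],   // "g" is not a hex digit
  [" #ff5733", false],  // leading space
  ["#ff5733\n", false], // trailing newline
];

// Collect every case whose actual result differs from the expectation.
const failures = edgeCases.filter(
  ([input, expected]) => hexColorRule.test(input) !== expected
);

console.log(failures.length === 0 ? "all cases pass" : failures);
```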

Pro tip: Use online regex testing tools to visualize how your pattern matches against test strings in real time. This makes debugging far easier than staring at the raw pattern and trying to trace it mentally.

12. Try It Yourself

The best way to internalize regex is by practicing with real patterns against real text. Build, test, and refine your regular expressions using our free online regex tester. Paste your pattern and test string to see matches highlighted instantly, inspect capturing groups, and experiment with different flags.
