Regex Syntax Reference

ripgrep supports two regex engines that you can switch between at the command line. The default engine is the Rust regex crate. The alternative is PCRE2, enabled with -P.

Default engine: Rust regex

The Rust regex crate guarantees search time that is linear in the length of the input, so no pattern can trigger catastrophic backtracking. The trade-off is that features which rely on backtracking, such as look-arounds and backreferences, are not supported.

Supported syntax

Syntax      Meaning                       Example
.           Any character except newline  f.o matches foo, f1o
*           0 or more of previous         fo* matches f, fo, foo
+           1 or more of previous         fo+ matches fo, foo
?           0 or 1 of previous            colou?r matches color, colour
{n}         Exactly n repetitions         \d{4} matches 4 digits
{n,m}       Between n and m repetitions   \w{2,5} matches 2 to 5 word characters
^           Start of line                 ^fn matches fn at line start
$           End of line                   error$ matches error at line end
\b          Word boundary                 \bfoo\b matches foo but not foobar
[abc]       Character class               [aeiou] matches any vowel
[^abc]      Negated character class       [^\d] matches non-digit
[a-z]       Character range               [a-z] matches lowercase letter
(abc)       Capture group                 (foo|bar)
(?:abc)     Non-capturing group           (?:foo|bar)
a|b         Alternation                   cat|dog
\d          Digit (Unicode-aware)         \d+ matches 42 or ٤٢
\w          Word character                \w+ matches identifiers
\s          Whitespace                    \s+ matches spaces, tabs
\D, \W, \S  Negated class                 \D matches non-digit
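
To see a few of these constructs in action, the sketch below runs them against a throwaway file. The file name sample.txt and its contents are made up for illustration, and -N is passed only to keep line numbers out of the captured output.

```shell
# Build a small sample file in a temporary directory (names are illustrative).
tmp=$(mktemp -d)
printf '%s\n' 'color' 'colour' 'foobar' 'foo bar' 'year 2024' > "$tmp/sample.txt"

# ? makes the preceding "u" optional; -c counts matching lines.
optional_count=$(rg -c 'colou?r' "$tmp/sample.txt")

# \b limits the match to the whole word "foo", so the line "foobar" is skipped.
word_lines=$(rg -N '\bfoo\b' "$tmp/sample.txt")

# {4} requires exactly four digits; -o prints only the matched text.
digits=$(rg -o -N '\d{4}' "$tmp/sample.txt")

echo "colou?r matched $optional_count lines"
echo "word match: $word_lines"
echo "digits: $digits"
rm -rf "$tmp"
```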

Not supported in the default engine

  • Look-ahead ((?=...), (?!...)) — use -P
  • Look-behind ((?<=...), (?<!...)) — use -P
  • Backreferences (\1, \k<name>) — use -P
  • Possessive quantifiers (*+, ++) — use -P
  • Atomic groups ((?>...)) — use -P
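
When -P is not an option, some look-ahead searches can be approximated with two passes. The sketch below is one such workaround, not an equivalent: it filters whole lines, so a line that contains both foobar and a standalone foo is dropped. The sample file is hypothetical.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'foobar' 'foobaz' 'foo' > "$tmp/sample.txt"

# The default engine rejects look-around syntax, so rg exits with an error status.
if rg 'foo(?!bar)' "$tmp/sample.txt" >/dev/null 2>&1; then
  lookahead_status=accepted
else
  lookahead_status=rejected
fi

# Line-level approximation: keep lines containing "foo",
# then drop lines containing "foobar". The trailing "-" reads from stdin.
filtered=$(rg -N 'foo' "$tmp/sample.txt" | rg -N -v 'foobar' -)

echo "look-ahead in default engine: $lookahead_status"
echo "$filtered"
rm -rf "$tmp"
```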

PCRE2 engine (-P / --pcre2)

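PCRE2 is a separate library that implements Perl-compatible regex syntax. It is not the engine built into Perl or Python's re module, but it supports most of the same constructs, including look-arounds and backreferences. Enable it with rg -P 'pattern'.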

Note: PCRE2 is not available in all ripgrep builds. If you installed via a package manager and get an error, you may need to build ripgrep from source with cargo install ripgrep --features pcre2.
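
A quick way to find out whether your build includes PCRE2 is to ask for its version. This is a minimal sketch; the variable name pcre2_available is only illustrative.

```shell
# --pcre2-version prints the PCRE2 version in use,
# and exits with an error if this build has no PCRE2 support.
if rg --pcre2-version >/dev/null 2>&1; then
  pcre2_available=yes
else
  pcre2_available=no
fi
echo "PCRE2 available: $pcre2_available"
```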

PCRE2-only features

# Look-ahead: match "foo" only when followed by "bar"
rg -P 'foo(?=bar)'

# Negative look-ahead: match "foo" NOT followed by "bar"
rg -P 'foo(?!bar)'

# Look-behind: match "bar" only preceded by "foo"
rg -P '(?<=foo)bar'

# Negative look-behind
rg -P '(?<!foo)bar'

# Backreference: find repeated word
rg -P '(\b\w+\b) \1'

# Named capture group (also works in the default engine without -P)
rg -P '(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

Hybrid mode

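Use --auto-hybrid-regex to let ripgrep choose the engine per pattern: it uses the default engine whenever the pattern compiles there and switches to PCRE2 only when the pattern needs a feature the default engine lacks. Simple patterns therefore avoid the PCRE2 overhead. The flag requires a build with PCRE2 support; newer ripgrep releases also offer --engine auto for the same behavior.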

rg --auto-hybrid-regex '(?<=def )\w+'
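
The sketch below runs the look-behind pattern above against a hypothetical two-line file, app.py, and guards the call with a PCRE2 check, since hybrid mode cannot help on builds without PCRE2.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'def main():' 'class Foo:' > "$tmp/app.py"

if rg --pcre2-version >/dev/null 2>&1; then
  # The default engine rejects the look-behind, so ripgrep retries with PCRE2.
  # -o prints only the matched text, which here is the function name.
  names=$(rg -o -N --auto-hybrid-regex '(?<=def )\w+' "$tmp/app.py")
else
  names='(this build has no PCRE2, so the look-behind pattern cannot run)'
fi

echo "$names"
rm -rf "$tmp"
```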

Unicode support

ripgrep has Unicode support enabled by default in both engines. \d, \w, and \s match their Unicode equivalents, not just ASCII.

# \p{L} — any Unicode letter
rg '\p{L}+'

# \p{Han} — CJK Han characters
rg '\p{Han}'

# \p{Ll} — lowercase letters
rg '\p{Ll}{3,}'

# \p{N} — numeric characters (includes Arabic-Indic digits, etc.)
rg '\p{N}+'

# Disable Unicode for ASCII-only matching (faster on ASCII corpora)
rg '(?-u)\w+'
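
The difference between Unicode and ASCII-only matching is easiest to see with digits. In this sketch the file digits.txt is invented for the demonstration and is assumed to be UTF-8 encoded.

```shell
tmp=$(mktemp -d)
# One line with ASCII digits, one with Arabic-Indic digits.
printf '%s\n' 'ascii 42' 'arabic ٤٢' > "$tmp/digits.txt"

# With Unicode enabled (the default), \d matches digits in any script.
unicode_count=$(rg -c '\d+' "$tmp/digits.txt")

# With (?-u), \d is limited to the ASCII digits 0 through 9.
ascii_count=$(rg -c '(?-u)\d+' "$tmp/digits.txt")

echo "unicode: $unicode_count lines, ascii-only: $ascii_count lines"
rm -rf "$tmp"
```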

Common Unicode categories

Pattern       Matches
\p{L}         Any letter (all scripts)
\p{Lu}        Uppercase letter
\p{Ll}        Lowercase letter
\p{N}         Any number
\p{Nd}        Decimal digit (0–9 in any script)
\p{P}         Punctuation
\p{Z}         Separator (space, line sep, paragraph sep)
\p{Latin}     Latin script characters
\p{Han}       CJK unified ideographs
\p{Arabic}    Arabic script
\p{Cyrillic}  Cyrillic script

Common patterns cheatsheet

# Email address (simplified)
rg '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# URL
rg 'https?://[^\s"<>]+'

# IPv4 address (simplified; does not check that each octet is 0-255)
rg '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'

# Hex color (exactly 3 or 6 hex digits)
rg '#(?:[0-9a-fA-F]{3}){1,2}\b'

# ISO 8601 date
rg '\d{4}-\d{2}-\d{2}'

# Version number (semver)
rg '\bv?\d+\.\d+\.\d+\b'

# TODO/FIXME comment
rg '(TODO|FIXME|HACK|XXX):'

# Function definition (Rust)
rg 'pub fn \w+'

# Import statement (Python)
rg '^(import|from) \w'

# JSON key-value (approximate)
rg '"[^"]+": "[^"]*"'
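
These patterns are often more useful combined with -o, which prints only the matched text, and -r, which rewrites each match using its capture groups. The following sketch assumes a hypothetical log.txt and reorders an ISO date into day/month/year.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'released v1.2.3 on 2024-03-15' 'no data here' > "$tmp/log.txt"

# -o prints only the part of the line that matched.
dates=$(rg -o -N '\d{4}-\d{2}-\d{2}' "$tmp/log.txt")

# -r replaces each match; $1, $2 and $3 refer to the capture groups.
# Single quotes keep the shell from expanding the $ references.
reordered=$(rg -o -N '(\d{4})-(\d{2})-(\d{2})' -r '$3/$2/$1' "$tmp/log.txt")

echo "$dates"
echo "$reordered"
rm -rf "$tmp"
```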

Performance tips

  • Anchor patterns: ^pattern or pattern$ can reduce the search space.
  • Prefer literals: foo is faster than [f][o][o]. ripgrep can use SIMD for literal prefixes.
  • Use -F for literals: if you are not using regex features, -F (fixed strings) skips regex parsing entirely.
  • Use -w for words: -w foo is the simpler way to write \bfoo\b and saves you from adding the boundaries by hand.
  • Avoid unnecessary .*: foo.*bar forces ripgrep to scan the rest of each line after foo. If foo alone is selective enough, search for it and review the matches manually.
  • Use the default engine: PCRE2 is powerful but slower. Only switch with -P when you genuinely need it.
  • Disable Unicode when not needed: (?-u)\w+ uses ASCII-only matching and can be faster on ASCII-heavy corpora.
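
Note that -F and -w change what matches, not only how fast the search runs. The sketch below illustrates this on a made-up file, perf.txt. It does not measure speed, which depends on the corpus and the ripgrep build.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a.b' 'axb' 'foo' 'foobar' > "$tmp/perf.txt"

# As a regex, the dot matches any character, so both "a.b" and "axb" match.
regex_count=$(rg -c 'a.b' "$tmp/perf.txt")

# With -F the pattern is a literal string, so only "a.b" matches.
fixed_count=$(rg -c -F 'a.b' "$tmp/perf.txt")

# -w requires word boundaries around the match, so "foobar" is skipped.
word_count=$(rg -c -w 'foo' "$tmp/perf.txt")

echo "regex: $regex_count, fixed: $fixed_count, word: $word_count"
rm -rf "$tmp"
```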