Pattern Search
LIKE, ILIKE, regex, trigram, full-text — the spectrum of find rows that match a string pattern.
What it is#
Pattern search is the family of SQL queries that ask “find rows whose text column matches a pattern.” The pattern can be a literal prefix, a sub-string, a wildcard expression, a regular expression, a fuzzy match, or a natural-language query against a full-text index. SQL gives you a spectrum of operators with very different cost models:
WHERE name LIKE 'Acme%' -- prefix, indexableWHERE name LIKE '%acme%' -- substring, scans by defaultWHERE name ILIKE '%Acme%' -- case-insensitive (Postgres)WHERE name ~ '^Ac[me]+$' -- regex (Postgres)WHERE name REGEXP '^Ac[me]+$' -- regex (MySQL)WHERE MATCH(body) AGAINST('acme corp') -- full-text (MySQL)WHERE to_tsvector(body) @@ to_tsquery('acme & corp') -- FTS (Postgres)The cheapest operator is a LIKE with an anchored prefix on an indexed column — that’s an index range scan. The most expressive is a tokenised full-text index — that’s a posting-list lookup with ranking. Most performance bugs in this area come from picking an operator whose cost model the planner couldn’t satisfy with the available indexes.
When to use it#
Reach for pattern search when the question is “find text that looks like X”:
- Type-ahead and autocomplete.
LIKE 'query%'on an indexed name column. Cheap and accurate for prefixes. - Substring containment. Finding email addresses with a domain, log lines with an error code. Use
LIKE '%pattern%'with a trigram index, or full-text for word-level matching. - Structured pattern matching. Validating that a column looks like a phone number or extracting a code — regex.
- Natural-language search. Searching article bodies, product descriptions, comment threads — full-text search with stemming, stopwords, and ranking.
- Fuzzy / typo-tolerant search. Names that might be misspelled — trigram similarity (
pg_trgmon Postgres) or Levenshtein in extensions.
Reach for a real search engine (Elasticsearch, OpenSearch, Meilisearch) when you outgrow what the relational engine can do — multi-field ranking, faceting, synonyms, language-aware analyzers at scale. The boundary is usually around tens of millions of documents or when relevance tuning becomes the product.
How it works#
LIKE is the standard operator. Two wildcards: % matches any number of characters, _ matches one. ESCAPE '\' lets you match literal % and _.
-- Users whose email is on the acme.com domainSELECT user_id, emailFROM usersWHERE email LIKE '%@acme.com';The planner cannot use a plain B-tree index on email for that query because the pattern is right-anchored (the wildcard is on the left). An index range scan needs a prefix it can range over; %@acme.com has no fixed prefix. The plan degrades to a sequential scan.
Anchoring the wildcard on the right lets the index work:
SELECT user_id, emailFROM usersWHERE email LIKE 'alice%';-- B-tree on (email) does a range scan: [alice, alicf) and stops.For substring search on indexed columns, Postgres has trigram indexes via the pg_trgm extension:
CREATE EXTENSION pg_trgm;CREATE INDEX users_email_trgm ON users USING gin (email gin_trgm_ops);
-- Now this uses the index:SELECT user_id, email FROM users WHERE email LIKE '%@acme.com';SELECT user_id, email FROM users WHERE email ILIKE '%acme%';The trigram index breaks each value into overlapping three-character sequences and indexes them as a GIN posting list. Substring queries match the trigrams of the pattern. There’s overhead — the index is larger than a B-tree and writes are slower — but it makes the previously un-indexable queries scale.
For word-level text search, both major engines have full-text indexes.
-- PostgresCREATE INDEX articles_body_fts ON articles USING gin (to_tsvector('english', body));
SELECT id, titleFROM articlesWHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & index')ORDER BY ts_rank(to_tsvector('english', body), to_tsquery('english', 'database & index')) DESCLIMIT 20;-- MySQLALTER TABLE articles ADD FULLTEXT INDEX (body);
SELECT id, title, MATCH(body) AGAINST('database index' IN NATURAL LANGUAGE MODE) AS scoreFROM articlesWHERE MATCH(body) AGAINST('database index' IN NATURAL LANGUAGE MODE)ORDER BY score DESCLIMIT 20;Both engines tokenise the text, drop stopwords, optionally stem, and index the resulting word list. Queries match against the analysed form and return ranked results. The cost model is “posting-list intersection,” roughly proportional to the rarest term — not the table size.
Variants#
Case sensitivity. Postgres LIKE is case-sensitive; ILIKE is the case-insensitive twin. MySQL’s LIKE follows the column’s collation — utf8mb4_general_ci is case-insensitive, utf8mb4_bin is sensitive. SQL Server: collation determines both. SQLite: LIKE is ASCII-case-insensitive by default; GLOB is case-sensitive.
Regex. Postgres uses ~ (case-sensitive), ~* (case-insensitive), !~ (negated). MySQL: REGEXP / RLIKE, case-insensitive by default. Oracle: REGEXP_LIKE(col, pattern). SQL Server has no native regex — LIKE patterns or CLR.
Trigram similarity. pg_trgm’s % operator (different from the % wildcard) does “similarity ≥ threshold,” useful for typo tolerance: WHERE name % 'Jhon' finds “John” via shared trigrams.
tsvector columns. Materialise the analysed form as a generated column to avoid re-analysing on every query:
ALTER TABLE articles ADD COLUMN body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
CREATE INDEX articles_body_tsv ON articles USING gin (body_tsv);
SELECT id FROM articles WHERE body_tsv @@ to_tsquery('english', 'database');Engine-specific full-text features. Postgres FTS supports phrase queries (<->), proximity, weight classes (title vs body), and language configurations. MySQL InnoDB FTS supports boolean mode (+required -excluded ~deprioritized) and natural-language ranking. SQL Server has CONTAINS and FREETEXT against full-text catalogs.
Common pitfalls#
- Leading wildcard kills B-tree index.
LIKE '%foo'andLIKE '%foo%'cannot use a normal B-tree. Either add a trigram index, change the query to a prefix match, or store a reversed copy of the column for suffix queries (LIKE 'oof%'onreverse(col)). - Collation surprises. A column declared with
utf8mb4_general_cimatches case-insensitively even when you wroteLIKE 'Acme%'. Verify withSHOW CREATE TABLEbefore debugging missed matches. ILIKEdoesn’t use B-tree on Postgres.ILIKE 'acme%'cannot use a plain B-tree because it would have to scan all case variants. Either indexlower(col)and queryWHERE lower(col) LIKE 'acme%', or use a trigram index, or use a citext column.- Regex catastrophic backtracking.
^(a+)+$against a long string ofas with a finalbtakes exponential time. Most engines run an NFA-based matcher that can backtrack pathologically. Avoid nested quantifiers; use anchored, atomic groups when supported. - Full-text stopwords. “to be or not to be” returns nothing in default English FTS because every word is a stopword. Either change configuration or use a different operator for short common-word queries.
- Mixing
LIKEand full-text expectations.LIKE '%database%'matches “databases” because it’s a substring; FTS with English stemming matches “database” → “databases” via stemming but might miss “DB”. They feel similar but have different boundaries. %and_as literals. Searching for “50% off” withLIKE '%50% off%'does the wrong thing — the middle%is a wildcard. Escape it:LIKE '%50\% off%' ESCAPE '\'.- Trigram on short patterns. Trigrams need ≥3 characters to be useful.
LIKE '%ab%'falls back to a scan even with a trigram index.
Practice problem#
You have a 100M-row articles(id, title, body, published_at) table. The product team wants a search box that finds articles whose body contains a phrase the user types, ordered by recency. The query needs to return in under 200 ms p95. Walk through which operator and index you’d pick, and explain why a plain LIKE '%phrase%' won’t meet the latency target.
Solution
Use Postgres full-text search with a stored tsvector column and a GIN index, then order by recency from a B-tree:
ALTER TABLE articles ADD COLUMN body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
CREATE INDEX articles_body_tsv ON articles USING gin (body_tsv);CREATE INDEX articles_published_at ON articles (published_at DESC);
SELECT id, title, published_atFROM articlesWHERE body_tsv @@ phraseto_tsquery('english', :query)ORDER BY published_at DESCLIMIT 50;LIKE '%phrase%' does a sequential scan of all 100M rows — single-digit minutes at best. The GIN index reduces the candidate set to documents containing the phrase tokens (typically thousands), and the B-tree on published_at lets the planner pick a BitmapAnd plan or a merge with the recency index. If recency dominates over relevance, an index on (published_at DESC) WHERE body_tsv @@ ... isn’t possible, so the planner does a bitmap heap scan; a covering index on published_at INCLUDE (id, title) keeps the sort cheap.
Related patterns#
- Existence Check — “does any row match” rather than “find all matching.”
- Set Compare — combining multiple search results across queries.
- B-tree Indexes — the index type that prefix
LIKErides on. - Hash and Bitmap Indexes — GIN posting lists are a bitmap-flavoured structure.
- Query Execution and Plans — how to read
EXPLAINoutput to confirm which index aLIKEactually used.