DBMS SQL

Subqueries and CTEs

Correlated vs non-correlated subqueries, the readability case for CTEs, recursive CTEs for trees.

Building Block Intermediate
6 min read
sql subquery cte recursive

What it is#

A subquery is a query nested inside another query. A CTE (Common Table Expression) is a named subquery introduced with WITH name AS (...) that you can reference once or many times in the outer query. Both are ways to compose SQL out of smaller pieces; both compile to the same underlying relational algebra; both can usually be rewritten as joins, but readability and reuse make them distinct tools.

The most common forms:

-- non-correlated subquery
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM vip_customers);
-- correlated subquery
SELECT o.* FROM orders o
WHERE o.amount > (SELECT AVG(amount) FROM orders WHERE customer_id = o.customer_id);
-- CTE
WITH recent_orders AS (
SELECT customer_id, COUNT(*) AS n FROM orders WHERE created_at > NOW() - INTERVAL '30 days' GROUP BY 1
)
SELECT c.name, r.n FROM customers c JOIN recent_orders r ON c.id = r.customer_id;

CTEs are a SQL-99 feature; recursive CTEs (WITH RECURSIVE ...) were added in SQL:1999 too but landed in major engines years later. MySQL only got both in 8.0 (2018).

When to use it#

Reach for a non-correlated subquery when you need a small lookup set inside IN / NOT IN / EXISTS / =. Reach for a correlated subquery when each outer row needs a different inner answer (“the average for this customer”). Reach for a CTE when:

  • The same derived table is referenced more than once.
  • The query is deep enough that flattening it into one block hurts readability.
  • You need a recursive query (tree walk, graph reachability, generated series).

Don’t reach for subqueries/CTEs when a plain JOIN would do — the planner can almost always handle them, but inline joins read more naturally and don’t tempt junior readers into believing the subquery is somehow “separate” work.

How it works#

Logically every SQL construct is an algebraic expression, and subqueries/CTEs are just labelled subtrees in that expression. Operationally:

  • Non-correlated subquery. Executed once. The result is materialised (or kept in a hash) and the outer query consumes it. For WHERE x IN (...), modern planners convert it to a semijoin — equivalent to INNER JOIN ... DISTINCT against the subquery’s output.
  • Correlated subquery. Logically executed per outer row, because it depends on a value from that row. Planners aggressively rewrite — often into a join or window function — but if rewriting fails you can hit per-row execution and quadratic cost.
  • Non-recursive CTE. A named subquery. Postgres 12+ inlines CTEs by default unless materialisation is forced (WITH x AS MATERIALIZED (...)); MySQL and SQL Server have always inlined them. Pre-12 Postgres treated CTEs as an “optimisation fence” — they materialised, and queries that joined a CTE to a table multiple times got slow.
  • Recursive CTE. Two arms: a non-recursive anchor and a recursive UNION ALL step. The engine iterates the recursive step until it produces no new rows. Used for trees, hierarchies, graph reachability, and generated number/date series.
-- Walk a manager hierarchy down from a CEO
WITH RECURSIVE chain AS (
SELECT id, name, manager_id, 0 AS depth FROM employees WHERE name = 'CEO'
UNION ALL
SELECT e.id, e.name, e.manager_id, c.depth + 1
FROM employees e JOIN chain c ON e.manager_id = c.id
)
SELECT * FROM chain ORDER BY depth, name;

Variants#

  • EXISTS vs IN vs JOIN. All three express “is there a match in the inner set?”. EXISTS is usually fastest for correlated existence checks because it stops on the first match per outer row. NOT IN has a NULL trap: if the inner set ever produces a NULL, NOT IN returns nothing. NOT EXISTS doesn’t have this trap.
  • Scalar subquery. A subquery that returns a single row, single column, usable wherever a value is expected (SELECT ..., (SELECT MAX(...) FROM ...) AS x FROM ...). If the subquery returns more than one row, runtime error.
  • Lateral subquery. LEFT JOIN LATERAL (subquery) ON true — the subquery can reference earlier rows in the FROM clause. Useful for “top-N per group” patterns when a window function is awkward.
  • Materialised CTE. Postgres’ WITH x AS MATERIALIZED (...) forces materialisation and re-use; the inverse NOT MATERIALIZED forces inlining. Use sparingly — the planner usually picks correctly.
  • Recursive vs iterative. A recursive CTE walks a hierarchy in one query; an iterative procedure (PL/pgSQL loop, app-side recursion) walks it in many. The CTE is one round-trip; the procedure is N. CTEs win for short, well-bounded hierarchies; procedures win when the iteration body needs side effects.

Trade-offs#

Subqueries inline. Compact; no extra scope to scan. Good for one-shot derived sets. Hard to reuse, and nested subqueries past two levels get unreadable fast.
CTEs (WITH). Named; reusable; readable; recursive cases possible. One extra mental step (you read the CTE, then the outer query). Materialisation behaviour varied by engine until ~2018 — always test on your version.

Other axes:

  • IN vs EXISTS. Functionally close; performance can differ. EXISTS short-circuits per outer row; IN against a small static list is unbeatable on cardinality. Read the plan.
  • CTE vs subquery vs temp table. A CTE is logically a one-shot named subquery; a temp table is a real on-disk table you write once and query many times. For very large derived sets queried repeatedly across statements, a temp table can outperform a CTE that re-materialises.
  • Recursive depth limit. Recursive CTEs can run away. Postgres has no default limit; MySQL’s cte_max_recursion_depth defaults to 1000. Always add LIMIT or a depth guard for production queries.

Common pitfalls#

  • NOT IN + NULL. WHERE x NOT IN (SELECT y FROM t) returns empty if any row of t.y is NULL. Use NOT EXISTS instead, or filter NULLs explicitly.
  • Correlated subquery in SELECT. Looks innocuous (SELECT (SELECT ... WHERE inner.id = outer.id) AS x FROM outer) but executes per outer row. Often a window function or LATERAL is the better shape.
  • Self-referencing CTE without termination. Forgetting the join condition in a recursive CTE’s recursive step produces a runaway scan. The recursive step’s predicate must shrink the working set.
  • CTE as “view”. A CTE is not a view — it’s scoped to the statement, evaluated as part of it, and may or may not be materialised. Don’t expect view-level reuse.
  • Misleading row counts. Subqueries inside IN deduplicate; subqueries inside EXISTS don’t. Subtle when refactoring between them.
  • CTE inlining surprises. Postgres pre-12 always materialised CTEs (optimisation fence); 12+ inlines unless told otherwise. A query that was fast on 11 can be slow on 12 because the planner now joins a CTE multiple times instead of materialising once.
A recursive-CTE pattern worth memorising

WITH RECURSIVE series AS (SELECT 1 AS n UNION ALL SELECT n+1 FROM series WHERE n < 100) SELECT * FROM series; — generates 1..100 without a generate_series table. Works on every engine with recursive-CTE support. Adapt to date ranges, calendar tables, or padding zero-count buckets in time-series queries.

Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.