Recursive Tree — DBMS · Engineering Playbook

What it is#

The recursive tree pattern uses a WITH RECURSIVE common table expression to walk a parent-child relationship across an arbitrary number of levels. It’s the SQL answer to “find all descendants of X”, “list every node reachable from Y”, and “compute the depth of each node in a hierarchy”. Without recursion you’d write a self-join per level — which only works when the depth is known and bounded.

WITH RECURSIVE subordinates AS (
  -- Anchor: the starting node(s)
  SELECT employee_id, manager_id, name, 0 AS depth
  FROM   employees
  WHERE  employee_id = 1
  UNION ALL
  -- Recursive: rows whose manager_id matches a row in the CTE
  SELECT e.employee_id, e.manager_id, e.name, s.depth + 1
  FROM   employees   e
  JOIN   subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates ORDER BY depth, employee_id;

The shape is always: anchor query that produces the seed rows, UNION ALL, recursive query that references the CTE’s own name to produce the next level. The engine evaluates the anchor once, then iterates the recursive query — each iteration treats the rows produced by the previous iteration as its input — until the recursive query produces zero rows.

WITH RECURSIVE is standard SQL since SQL:1999. Engine support: Postgres, MySQL 8+, SQL Server (no RECURSIVE keyword — every CTE may be recursive), Oracle (and the older CONNECT BY syntax), SQLite, BigQuery, Snowflake. Older MySQL (< 8.0) has no recursive CTE — you must use stored procedures or application-side iteration.

When to use it#

Reach for a recursive CTE whenever the data is shaped like a tree or a directed graph and the query needs to walk across nodes:

Organisational hierarchies — “all subordinates of the CEO”, “every employee under a given manager”, “depth of each role from the top”.
Category / taxonomy trees — “all leaf categories under Electronics”, “the full breadcrumb path for a product”.
Threaded comments / replies — “every reply (direct or transitive) to a given comment”.
Bill of materials — “every component, sub-component, and raw material in a finished product”.
Transitive closures — “every account reachable from this one through transfer edges”.
Date / number series generation — generate (1..N) or every day in a range without a calendar table.
Shortest path / depth labelling — annotate each node with its distance from a chosen root.

If the depth is fixed and small (say, three levels), a series of self-joins or a flat hierarchical column may be simpler and faster — recursion has overhead. If the data is large and queried frequently, consider materialising the closure into a separate table (a “closure table”). If you only need some aggregate per node rather than the full traversal, see Aggregates, GROUP BY, HAVING.

How it works#

A recursive CTE has three required pieces and one optional one.

The CTE name — referenced inside the recursive query.
The anchor query — runs once, produces the seed rows. Must produce columns matching the CTE definition.
The recursive query — references the CTE’s own name; runs repeatedly until it produces zero rows. Joined to the CTE on the edge condition.
An optional ORDER BY / LIMIT in the outer query.

WITH RECURSIVE descendants AS (
  -- Anchor
  SELECT id, parent_id, name, ARRAY[id] AS path, 0 AS depth
  FROM   categories
  WHERE  parent_id IS NULL                -- start from roots
  UNION ALL
  -- Recursive step
  SELECT c.id, c.parent_id, c.name,
         d.path || c.id,                  -- append to path
         d.depth + 1
  FROM   categories c
  JOIN   descendants d ON c.parent_id = d.id
)
SELECT * FROM descendants ORDER BY path;

The engine maintains a working set: after the anchor runs, the working set is the seed rows. Each iteration runs the recursive query against the previous working set (not the cumulative result), produces the next batch, appends it to the result, and replaces the working set with the new batch. When the recursive query produces zero rows, iteration stops.

UNION ALL is the standard choice: it preserves duplicates and is cheap. UNION (distinct) is allowed but rarely useful — it forces the engine to de-duplicate at every iteration, which is slow and usually masks a real bug (a cycle, or the same path discovered twice).

Top-down (anchor = roots, descend through children). The most common shape — start from the top of the tree and recursively visit children.

WITH RECURSIVE org AS (
  SELECT employee_id, manager_id, name, 0 AS depth
  FROM   employees
  WHERE  manager_id IS NULL              -- the CEO
  UNION ALL
  SELECT e.employee_id, e.manager_id, e.name, o.depth + 1
  FROM   employees e
  JOIN   org o ON e.manager_id = o.employee_id
)
SELECT * FROM org;

Bottom-up (anchor = a leaf or target, ascend through parents). Find every ancestor of a given node.

WITH RECURSIVE ancestors AS (
  SELECT employee_id, manager_id, name, 0 AS depth
  FROM   employees
  WHERE  employee_id = 42                -- the starting employee
  UNION ALL
  SELECT e.employee_id, e.manager_id, e.name, a.depth + 1
  FROM   employees e
  JOIN   ancestors a ON e.employee_id = a.manager_id
)
SELECT * FROM ancestors;

The recursive join condition is the only thing that changes between top-down and bottom-up: which side of the edge is the CTE row, and which side is the new row.

Variants#

Transitive closure. Compute the full set of reachable nodes from each starting node. The classic graph-reachability query.

-- Every account reachable from each starting account through transfer edges
WITH RECURSIVE reach AS (
  SELECT from_account AS start, to_account AS reachable
  FROM   transfers
  UNION
  SELECT r.start, t.to_account
  FROM   reach r
  JOIN   transfers t ON t.from_account = r.reachable
)
SELECT * FROM reach;

Note UNION (distinct) here — necessary because the same node may be reachable through multiple paths and we only want it once per starting node. This is exactly the case where UNION is the right call over UNION ALL.

Shortest path with depth. Each iteration tracks depth; the minimum depth per (start, reachable) pair is the shortest path length.

WITH RECURSIVE paths AS (
  SELECT from_node, to_node, 1 AS hops
  FROM   edges
  UNION ALL
  SELECT p.from_node, e.to_node, p.hops + 1
  FROM   paths p
  JOIN   edges e ON e.from_node = p.to_node
  WHERE  p.hops < 10                       -- bound the depth
)
SELECT from_node, to_node, MIN(hops) AS shortest
FROM   paths
GROUP BY from_node, to_node;

The hops < 10 bound is a manual safety net. Without it, a cyclic graph runs forever (or until the engine hits its recursion limit).

Cycle detection by carrying a path. When the graph might have cycles, carry the visited-nodes path and exclude rows that revisit a node.

WITH RECURSIVE walk AS (
  SELECT from_node, to_node, ARRAY[from_node, to_node] AS path
  FROM   edges
  WHERE  from_node = 1
  UNION ALL
  SELECT w.from_node, e.to_node, w.path || e.to_node
  FROM   walk w
  JOIN   edges e ON e.from_node = w.to_node
  WHERE  e.to_node <> ALL(w.path)         -- skip if already visited
)
SELECT DISTINCT to_node FROM walk;

Standard SQL also provides a dedicated CYCLE clause: WITH RECURSIVE walk(...) AS (...) CYCLE node SET is_cycle TO 1 DEFAULT 0 USING path. Postgres 14+ supports it. Older engines need the manual array-path approach above.

Series generation. Generate integers or dates without a numbers table.

-- Integers 1 to 100
WITH RECURSIVE nums(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM nums WHERE n < 100
)
SELECT * FROM nums;

Postgres has generate_series(1, 100). SQL Server has a recursive-CTE-or-cross-join idiom. MySQL has only the recursive form. Performance: generate_series is one to two orders of magnitude faster — prefer it on Postgres.

Closure table materialisation. For trees queried often, materialise the transitive closure into a permanent table:

CREATE TABLE category_closure (
  ancestor    BIGINT NOT NULL,
  descendant  BIGINT NOT NULL,
  depth       INT    NOT NULL,
  PRIMARY KEY (ancestor, descendant)
);

Maintained by triggers or application code on every insert/delete of an edge. Queries become a single join instead of a recursive CTE — orders of magnitude faster at read time, at the cost of write-side complexity and storage.

Common pitfalls#

Missing terminating condition leads to infinite recursion. A recursive CTE without a stopping condition runs until the engine hits its max-recursion safety limit and errors out.

Wrong — no terminating condition.

WITH RECURSIVE nums(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM nums
)
SELECT * FROM nums;

The recursive query always produces a row — there’s no WHERE to stop it. The engine eventually errors with “max recursion exceeded”.

Right — explicit bound.

WITH RECURSIVE nums(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM nums WHERE n < 100
)
SELECT * FROM nums;

The WHERE n < 100 stops the recursion when n = 100 — the recursive query produces no row, and iteration ends.

Recursion limits. Every engine has a default max iteration count to protect against runaway queries. Postgres has no hard limit on iteration count (memory is the real bound). MySQL 8 defaults cte_max_recursion_depth = 1000. SQL Server defaults MAXRECURSION to 100 — override with OPTION (MAXRECURSION 0) for unlimited or any explicit number. SQLite has no limit by default. If your query errors with “recursion limit exceeded”, check the engine’s setting — and consider whether your data has a cycle.

Cyclic graphs without cycle protection. If the data has a cycle and the query doesn’t track visited nodes, the recursion never terminates. Either use the standard CYCLE clause (Postgres 14+) or carry a path array and filter on it as shown in the variants section.

Wrong join direction. A top-down join (children from parents) is e.manager_id = cte.employee_id. Flipping it to e.employee_id = cte.manager_id walks bottom-up instead. Both are valid recursions; pick the one that matches the question.

Type mismatch between anchor and recursive query. The anchor and recursive query must produce the same column types. A common slip: the anchor’s depth is INT but the recursive step writes 0 + 1 as BIGINT in some engines, causing a type mismatch error. Cast explicitly in both branches.

Performance — recursive CTEs are materialised. Most engines do not push predicates from the outer SELECT into the recursive CTE. A query like WITH RECURSIVE org AS (...) SELECT * FROM org WHERE depth = 3 computes the entire tree and filters afterwards. Push the filter into the recursive query when possible — bound the depth in the recursive step itself rather than the outer query.

No index on the join column. The recursive query joins on the parent / child column on every iteration. Without an index on that column, each iteration is a full scan. For an employees table joined on manager_id, you want CREATE INDEX ON employees(manager_id). See B-tree indexes.

Using UNION when UNION ALL is intended. UNION de-duplicates the entire accumulated result on every iteration — extremely expensive and rarely necessary. Use UNION ALL by default; switch to UNION only when you specifically need set semantics (typically for transitive-closure queries on a graph with multiple paths).

Confusing the recursive reference with self-reference inside the recursive query. Inside the recursive query, the CTE name refers to the previous iteration’s output, not the cumulative result and not the recursive query itself. You cannot reference the CTE name in a subquery in a way that produces a self-join within a single iteration — only one reference per iteration is allowed in standard SQL.

Practice problem#

You have an employees table:

CREATE TABLE employees (
  employee_id BIGSERIAL PRIMARY KEY,
  name        TEXT    NOT NULL,
  title       TEXT    NOT NULL,
  manager_id  BIGINT  REFERENCES employees(employee_id)
);

Given employee_id = 1 is the CEO, find all descendants of the CEO — every employee who reports to the CEO directly or transitively. Return employee_id, name, title, and depth (1 for direct reports, 2 for their reports, and so on). Order by depth ascending, then employee_id.

Solution

WITH RECURSIVE descendants AS (
  -- Anchor: direct reports of the CEO
  SELECT employee_id, name, title, manager_id, 1 AS depth
  FROM   employees
  WHERE  manager_id = 1
  UNION ALL
  -- Recursive: reports of anyone already in the set
  SELECT e.employee_id, e.name, e.title, e.manager_id, d.depth + 1
  FROM   employees e
  JOIN   descendants d ON e.manager_id = d.employee_id
)
SELECT employee_id, name, title, depth
FROM   descendants
ORDER BY depth, employee_id;

Top-down recursion starting at the CEO’s direct reports — the anchor seeds depth 1 so the CEO itself (depth 0) is excluded. The recursive step joins each employee back to the existing CTE by manager_id = employee_id, incrementing depth. UNION ALL is correct because a tree has no cycles or shared paths — every employee appears in exactly one row.

Subqueries and CTEs — covers non-recursive CTEs as the building block; recursion is one extra keyword on top.
Joins inner outer cross — the recursive step is a join; understanding join shapes is prerequisite.
Gaps and Islands — an alternative tool for sequence problems; recursive CTEs can do the same but window functions are usually simpler.
Window functions — often combined with recursive CTEs for ordering or path indexing within the result.
Query execution and plans — recursive CTEs have a distinct plan node; reading it is essential for performance tuning.