Window Functions — DBMS · Engineering Playbook

What it is#

A window function computes an aggregate-like value across a set of rows defined relative to the current row, without collapsing the rows the way GROUP BY does. The syntax is function() OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN ...). Common functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM(), AVG(), FIRST_VALUE(), LAST_VALUE(), NTILE(n), PERCENT_RANK().

Window functions are the SQL feature that most often closes the gap between “I had to dump this to a script” and “I can do it in one query.” Running totals per user, top-N per group, percentile bands, lag-based diffs — all of these are window-function problems. Every relational engine since the late 2000s supports them; MySQL was the last major holdout and added them in 8.0.

-- Rank orders per customer by amount
SELECT customer_id, order_id, amount,
       RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rank
FROM orders;

-- 7-day moving average of daily revenue
SELECT day,
       AVG(revenue) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma7
FROM daily_revenue;

When to use it#

Reach for a window function when you need to compute a value per row that depends on other rows. Concretely:

Ranking and top-N per group. ROW_NUMBER() OVER (PARTITION BY group ORDER BY metric) plus a WHERE rn <= K outer filter beats every other top-N pattern.
Running totals and moving averages. SUM(x) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
Diff to previous / next row. LAG(x) OVER (ORDER BY date) and LEAD(x).
First / last value in a partition. FIRST_VALUE(x) OVER (PARTITION BY g ORDER BY t).
Percentile and bucketing. NTILE(10), PERCENT_RANK(), CUME_DIST().
De-duplication keeping the “best” row per key. ROW_NUMBER() OVER (PARTITION BY key ORDER BY priority) with WHERE rn = 1.

Don’t reach for them for plain aggregates — GROUP BY is simpler when you actually want to collapse rows. Don’t reach for them when an index can satisfy the same shape (a LIMIT 1 ORDER BY per key may outperform a window for very tall tables).

How it works#

A window query is conceptually executed in three phases:

Compute the result of the rest of the query as if the window function weren’t there. This produces a working row set.
For each row, define its window — the subset of rows from the working set that share its PARTITION BY values, sorted by ORDER BY, with the framing ROWS BETWEEN ... clause cutting the range.
Evaluate the window function over that window and attach the result to the row.

The window doesn’t collapse rows. Every input row produces exactly one output row.

The three knobs#

SUM(amount) OVER (
  PARTITION BY customer_id    -- split rows into independent groups
  ORDER BY order_date         -- order within each group
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW   -- frame: 7-row trailing window
)

PARTITION BY — optional. If omitted, the entire result is one partition. If present, the function restarts per partition.
ORDER BY — optional. Defines order within the partition (and is required for any function that depends on row order: LAG, LEAD, RANK, running aggregates).
ROWS / RANGE / GROUPS frame — optional. Default for ordered windows is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Use ROWS for row-count-based windows; RANGE for value-range windows.

Plan shape#

Postgres’ EXPLAIN shows window functions as a WindowAgg node above a Sort. The sort cost is the main thing to watch — if a window partitions on an indexed column the engine may use a Memoize or skip-sort plan. Otherwise it sorts the entire working set once per OVER clause shape. Sharing the same OVER across multiple aggregates is free; mixing different OVER clauses costs separate sorts.

Variants#

Ranking family. ROW_NUMBER (always unique, 1..N), RANK (ties share a number, gaps after), DENSE_RANK (ties share, no gaps), NTILE(n) (bucket into n quantiles).
Lag/lead. LAG(x, n DEFAULT default) reads from n rows back; LEAD is the mirror. Default n is 1.
First/last/nth value. FIRST_VALUE, LAST_VALUE (watch the default frame — RANGE UNBOUNDED PRECEDING AND CURRENT ROW, which surprises people: LAST_VALUE returns the current row’s value by default; usually you want ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).
Cumulative distribution. CUME_DIST, PERCENT_RANK — useful for percentile bands.
Aggregate family. Any aggregate (SUM, AVG, MIN, MAX, COUNT, even ARRAY_AGG / STRING_AGG) becomes a window function when paired with OVER (...).
Named windows. WINDOW w AS (PARTITION BY ... ORDER BY ...) lets you name a window once and reference it multiple times in the same SELECT — improves both readability and planner cost-sharing.

Trade-offs#

Window functions. Keep all rows; compute per-row aggregates relative to neighbours. Replace ugly self-joins. Standardised across engines. Plan cost dominated by partition+sort.

GROUP BY aggregates. Collapse rows to one per group. Cheaper when collapsing is what you want. Can’t return both group-level and row-level data in one query.

Other axes:

PARTITION BY vs GROUP BY. Window partition keeps rows; group collapses. Choose by whether the consumer needs the original rows.
ROWS vs RANGE. ROWS BETWEEN N PRECEDING AND CURRENT ROW is a row-count window — useful for moving averages on row sequences. RANGE is value-range based — useful for date ranges. RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW is the correct shape for time-windowed metrics with irregular timestamps.
Performance vs subquery. A ROW_NUMBER() OVER (PARTITION BY k ORDER BY t) + WHERE rn = 1 top-N is usually faster than a correlated subquery. For very tall tables, an index on (k, t) may make a LATERAL/index-only-scan plan even faster.

Common pitfalls#

LAST_VALUE default frame. The default frame for ordered windows ends at the current row, so LAST_VALUE(x) OVER (ORDER BY t) returns the current row’s x — almost certainly not what you wanted. Either swap to FIRST_VALUE(x) OVER (ORDER BY t DESC) or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
Forgetting PARTITION BY. Without it the window spans the whole result, which is rarely what you want for per-group ranking.
Window function in WHERE. Illegal — window functions are evaluated after WHERE. To filter by a window result, wrap in a subquery or CTE and filter outside.
NULLs in ORDER BY. Postgres treats NULLS LAST by default for ASC and NULLS FIRST for DESC. Other engines differ. Explicit NULLS FIRST/LAST is portable.
Tied rows with ROW_NUMBER. ROW_NUMBER is deterministic only if the ORDER BY produces no ties. Two rows with identical sort keys can get different numbers on re-execution. Add a tie-breaker key.
Confusing RANK and DENSE_RANK. RANK skips numbers after ties; DENSE_RANK doesn’t. Pick by whether the consumer expects the next rank to be tie_count + 1 or just tie + 1.
Forgetting that COUNT(*) OVER () gives total rows. A handy pattern for “show each row alongside the total count” — but make sure that’s what the consumer wants. Cheap because the planner computes it once across the partition.

The `top-N per group` pattern, memorised

SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY metric DESC) AS rn
  FROM source
) t WHERE rn <= 3;

Replaces every awkward correlated-subquery / self-join attempt at “top 3 per group”. Reads cleanly, performs well, and works on every engine. Memorise it.

Why window functions feel revelatory the first time

Coming from GROUP BY, the implicit assumption is that aggregation collapses rows. Window functions break that assumption — they let you stand inside a row and look around at its neighbours. Once you have that mental model, a lot of queries that previously required dumping to a script (running totals, deltas, top-N per group, percentile bands) become one-liners. The cost is one new piece of syntax; the benefit is permanent.