Rank and Row — DBMS · Engineering Playbook

What it is#

The rank and row pattern is the standard way to assign an ordinal position to each row within a partition. Three window functions cover ninety-nine percent of interview questions in this space: ROW_NUMBER, RANK, and DENSE_RANK. They share the same skeleton.

SELECT *,
       ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY sort_col DESC) AS rn,
       RANK()       OVER (PARTITION BY group_col ORDER BY sort_col DESC) AS rk,
       DENSE_RANK() OVER (PARTITION BY group_col ORDER BY sort_col DESC) AS drk
FROM   t;

The three differ only in how they treat ties on sort_col. For an input ordered as 100, 90, 90, 80:

ROW_NUMBER produces 1, 2, 3, 4 — every row gets a unique number, ties are broken arbitrarily.
RANK produces 1, 2, 2, 4 — tied rows share a rank, the next rank skips ahead.
DENSE_RANK produces 1, 2, 2, 3 — tied rows share a rank, the next rank is the next integer.

Once you internalise the contrast, every variant of “top-N per group”, “the Nth largest by group”, and “the second-highest salary” maps onto the same shape — pick the right function, filter on the rank column.

-- Top 3 highest-paid employees per department
SELECT department_id, employee_id, salary
FROM (
  SELECT department_id, employee_id, salary,
         DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rk
  FROM   employees
) ranked
WHERE rk <= 3
ORDER BY department_id, rk;

When to use it#

Reach for rank-and-row whenever the answer is “one specific row per group” or “the top few rows per group” and the grouping has more than a handful of distinct values. The pattern also fits “the Nth largest X” questions where N is fixed.

“The most recent order per customer” — ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC), keep rn = 1.
“The top 3 products by revenue per category” — DENSE_RANK() ... rk <= 3.
“The second-highest salary overall” — DENSE_RANK() OVER (ORDER BY salary DESC), rk = 2.
“Median salary per department” — ROW_NUMBER ascending and descending, average the rows where the two ranks meet in the middle.

If you only need a single aggregate value per group (max, min, sum), prefer plain GROUP BY — see Aggregates, GROUP BY, HAVING. Ranking is the right hammer when you need the row the aggregate came from, not just the value. If you want adjacent-row navigation rather than ordinal ranking, see Lead-Lag Navigation.

How it works#

Window functions compute over a frame defined by an OVER (...) clause. For the ranking family, two clauses matter:

PARTITION BY — splits the input into independent windows. Ranking resets at every partition boundary. Omit it to rank across the whole table.
ORDER BY — the sort key that determines the rank order. Mandatory for the ranking family; without it the rank is undefined.

A frame clause (ROWS BETWEEN ...) is not used for ROW_NUMBER / RANK / DENSE_RANK — they don’t aggregate over a sliding window, they assign a position based on the sort.

-- One row per (department, employee), ordinal within department
SELECT department_id, employee_id, salary,
       ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
FROM   employees;

The engine evaluates window functions after WHERE and GROUP BY but before the outer SELECT projection and ORDER BY. That means you cannot reference the rank column in the same query’s WHERE. Wrap the ranking in a subquery or CTE and filter outside.

-- The classic top-N-per-group with a CTE
WITH ranked AS (
  SELECT department_id, employee_id, salary,
         ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
  FROM   employees
)
SELECT department_id, employee_id, salary
FROM   ranked
WHERE  rn = 1;

The second-highest-salary canon. Every SQL interview eventually asks for the second-highest salary. The clean answer uses DENSE_RANK:

SELECT salary
FROM (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rk
  FROM   employees
) x
WHERE rk = 2;

DENSE_RANK is the right choice because it treats tied salaries as the same rank with no gap — if two people share the highest salary, the second-highest is the next distinct value. RANK would skip past rank 2 if there are two rank-1 ties. ROW_NUMBER is wrong because it picks an arbitrary single row for each ordinal.

Variants#

NTILE(N) for equal-count buckets. Splits the partition into N buckets of (nearly) equal row count and labels each row with its bucket number 1..N. Useful for percentiles, quartiles, deciles.

-- Quartiles of order amount
SELECT order_id, amount,
       NTILE(4) OVER (ORDER BY amount) AS quartile
FROM   orders;

NTILE(100) gives percentile buckets. NTILE(4) gives quartiles. If the row count isn’t divisible by N, the engine puts the extra rows into the earlier buckets — bucket 1 may have one more row than bucket 4.

PERCENT_RANK and CUME_DIST. Both return a fractional value in [0, 1] rather than an integer rank.

PERCENT_RANK — (rank - 1) / (rows_in_partition - 1). The lowest row is 0; the highest is 1.
CUME_DIST — rows_with_value_at_or_below_current / total_rows. Always greater than 0; the highest row is 1.

Most interviews don’t ask for these by name, but knowing they exist saves you from inventing the math manually.

Top-N with ties resolved by a tiebreaker. When ties on the primary sort key are unwanted, add a tiebreaker column to ORDER BY.

-- One row per customer: most recent order, tiebreak by largest amount
WITH ranked AS (
  SELECT customer_id, order_id, created_at, amount,
         ROW_NUMBER() OVER (
           PARTITION BY customer_id
           ORDER BY created_at DESC, amount DESC, order_id DESC
         ) AS rn
  FROM   orders
)
SELECT customer_id, order_id, created_at, amount
FROM   ranked
WHERE  rn = 1;

A unique column at the end (order_id) makes the result deterministic even when the leading sort columns tie.

Modern alternatives. Postgres supports DISTINCT ON (col) — a non-standard but very compact way to pick one row per partition.

-- Postgres: one most-recent order per customer, no window
SELECT DISTINCT ON (customer_id) customer_id, order_id, created_at
FROM   orders
ORDER BY customer_id, created_at DESC;

The ordering is mandatory and must lead with the DISTINCT ON column. SQL Server has TOP (1) WITH TIES paired with ORDER BY. Neither is portable; the window form is.

Top-N-per-group with LATERAL. Postgres and standard SQL allow a correlated lateral join that fetches the top-N for each row of an outer table.

-- Top 3 orders per customer, lateral form
SELECT c.customer_id, o.order_id, o.amount
FROM   customers c
CROSS JOIN LATERAL (
  SELECT order_id, amount
  FROM   orders
  WHERE  customer_id = c.customer_id
  ORDER BY amount DESC
  LIMIT 3
) o;

On large tables with a matching index this can outperform the window-function form because the engine does a tight index lookup per customer rather than ranking every order.

Common pitfalls#

Using ROW_NUMBER when ties matter. This is the most common slip. The interviewer asks for “the top 3 salaries per department” and a candidate writes ROW_NUMBER() ... rn <= 3. If the department has two people tied at the highest salary, one of them is silently dropped from the result. RANK or DENSE_RANK includes both.

ROW_NUMBER — pick one per tier.

Use when you need exactly one row per partition — “the latest order per customer”, “any one record per duplicate group”. A deterministic tiebreaker in ORDER BY is essential.

-- One row per customer
ROW_NUMBER() OVER (
  PARTITION BY customer_id
  ORDER BY created_at DESC, id DESC
)

RANK / DENSE_RANK — keep tied rows.

Use when “top 3” should include all rows tied at rank 3. DENSE_RANK is the safer default; RANK is right only when the gap behaviour is wanted.

-- Top 3 including ties
DENSE_RANK() OVER (
  PARTITION BY department_id
  ORDER BY salary DESC
)

Filtering the rank in WHERE. Window functions are computed after WHERE, so this fails:

-- WRONG: rk is not visible in WHERE of the same query
SELECT employee_id, DENSE_RANK() OVER (ORDER BY salary DESC) AS rk
FROM   employees
WHERE  rk = 2;

Wrap in a subquery or CTE and filter outside, or use QUALIFY (BigQuery, Snowflake, Teradata) which is designed exactly for this purpose:

-- BigQuery / Snowflake: QUALIFY filters on a window expression
SELECT employee_id, salary
FROM   employees
QUALIFY DENSE_RANK() OVER (ORDER BY salary DESC) = 2;

Forgetting ORDER BY in OVER. A ranking function without ORDER BY is undefined behaviour. Some engines accept it (assigning rank 1 to every row); others error. Always provide a sort key.

Non-deterministic results from tied sorts. ROW_NUMBER() OVER (ORDER BY salary DESC) with two rows tied at salary 100000 picks one of them arbitrarily, and the choice can change between runs. If determinism matters (and it usually does in interview settings), add a unique column to the ORDER BY.

Index mismatch on large tables. PARTITION BY a ORDER BY b DESC over a billion rows without an index on (a, b DESC) forces a full sort. The plan shows a Sort node before the WindowAgg. See query execution and plans for how to read the plan.

NTILE distribution off-by-one. With 11 rows and NTILE(4), the engine produces buckets of sizes 3, 3, 3, 2 — not 2, 3, 3, 3. The extra rows go to earlier buckets. If you need percentile boundaries with exact equality semantics, use PERCENT_RANK instead.

Counting distinct ranks for “top N”. A common bug: writing WHERE rk <= 3 with DENSE_RANK and assuming “exactly 3 rows per group”. If the top three salaries are 100, 100, 90, 90, 80, you get four rows, not three. This is usually what you want, but the interviewer may expect exactly three — clarify the requirement.

Practice problem#

You have an employees table:

CREATE TABLE employees (
  employee_id   BIGSERIAL PRIMARY KEY,
  department_id BIGINT  NOT NULL,
  name          TEXT    NOT NULL,
  salary        NUMERIC NOT NULL
);

Find the second-highest salary in each department, including departments where multiple employees share that salary. Return one row per (department_id, salary) pair plus the list of employee names tied at that salary (comma-separated). If a department has fewer than two distinct salary values, omit it.

Solution

WITH ranked AS (
  SELECT department_id, name, salary,
         DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rk
  FROM   employees
)
SELECT department_id,
       salary,
       STRING_AGG(name, ', ' ORDER BY name) AS employees
FROM   ranked
WHERE  rk = 2
GROUP BY department_id, salary
ORDER BY department_id;

DENSE_RANK because ties at rank 1 still leave rank 2 as the next distinct salary. Group by (department_id, salary) so all employees tied at the second-highest salary are aggregated into a single row. Departments with only one distinct salary naturally drop out — they have no rank-2 row. STRING_AGG is Postgres / SQL Server 2017+; use GROUP_CONCAT in MySQL or LISTAGG in Oracle / Snowflake.

Lead-Lag Navigation — when you want the adjacent row by sort order rather than the ordinal rank.
Gaps and Islands — uses ROW_NUMBER paired with a value to identify runs of consecutive rows.
Window functions — the building block; covers frames, partitioning, and the aggregate window family.
Aggregates, GROUP BY, HAVING — when you need only the aggregate value per group, not the row it came from.
Subqueries and CTEs — the wrapping required to filter on a ranked column.