Pivot Flip — DBMS · Engineering Playbook

What it is#

The pivot flip pattern reshapes data between two layouts: long (one row per measurement, with a column naming the metric) and wide (one row per entity, with separate columns for each metric). Pivot is long → wide; unpivot is wide → long. Both are essential when the analytics layer expects one shape but the storage layer holds the other.

Long form (one row per (entity, metric) pair)

| user_id | event_type | count |
|---------|------------|-------|
| 1       | click      | 12    |
| 1       | purchase   | 3     |
| 2       | click      | 8     |
| 2       | purchase   | 1     |

Pivot ───────────►

Wide form (one row per entity, columns per metric)

| user_id | clicks | purchases |
|---------|--------|-----------|
| 1       | 12     | 3         |
| 2       | 8      | 1         |

There are three ways to write a pivot in SQL: a vendor-specific PIVOT clause, an extension function like Postgres crosstab, and conditional aggregation — SUM(CASE WHEN ... THEN ... END) — which is portable to every engine and works regardless of vendor support. Conditional aggregation is the default professional choice; vendor pivots are syntactic sugar that occasionally win on readability.

The unpivot direction — wide → long — has a similar story: vendor UNPIVOT clauses in SQL Server and Oracle, the portable form is UNION ALL over column-shaped subqueries (or VALUES (...) joined laterally on modern engines).

When to use it#

Reach for pivot whenever:

A dashboard or report expects one row per entity with metric columns, but the data is stored in event-log form.
A CSV export needs to be in spreadsheet-wide format.
A query produces a comparison table — months as columns, products as rows.
You want a contingency table (cross-tabulation of two categorical variables).

Reach for unpivot whenever:

Source data is in wide form (e.g., a survey table with one column per question) and you want to normalise it for aggregation.
An ETL pipeline flattens wide rows into key-value pairs.
You want to compare values across columns as if they were rows — for example, finding which of q1_score, q2_score, q3_score is largest per row.

If you only need a per-group breakdown (without reshaping into columns), prefer Tally Count — it’s the same GROUP BY shape without the conditional projection. If the pivot column set is unknown at query time, see the “dynamic pivot” variant — but consider whether the application layer should pivot instead.

How it works#

The portable form: conditional aggregation. The skeleton is GROUP BY entity_key with one SUM(CASE WHEN metric = '...' THEN value END) expression per pivot column.

-- Pivot long → wide using conditional aggregation
SELECT
  user_id,
  SUM(CASE WHEN event_type = 'click'    THEN cnt END) AS clicks,
  SUM(CASE WHEN event_type = 'purchase' THEN cnt END) AS purchases,
  SUM(CASE WHEN event_type = 'signup'   THEN cnt END) AS signups
FROM   user_event_counts
GROUP BY user_id
ORDER BY user_id;

Read the CASE expression as a column-level filter: each row’s cnt is either kept (if it matches the metric) or replaced with NULL. SUM ignores NULLs, so the aggregation collapses to the matching rows. For a count rather than a sum: COUNT(*) FILTER (WHERE event_type = 'click') on Postgres / SQLite, or SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) everywhere else.

This works on every engine. It scales to any number of pivot columns (with the caveat that they must be enumerated at write time — see “dynamic pivot” below). It composes with other clauses naturally.

The SQL Server / Oracle PIVOT clause. Vendor-specific syntax that takes a list of pivot values and an aggregate function.

-- SQL Server / Oracle PIVOT
SELECT user_id, [click] AS clicks, [purchase] AS purchases, [signup] AS signups
FROM (
  SELECT user_id, event_type, cnt FROM user_event_counts
) src
PIVOT (
  SUM(cnt) FOR event_type IN ([click], [purchase], [signup])
) AS p;

Cleaner-looking for a small known set of pivot values. Not supported on Postgres, MySQL, SQLite, or BigQuery. Snowflake supports PIVOT as of recent versions.

Postgres crosstab from the tablefunc extension.

-- Postgres crosstab (requires CREATE EXTENSION tablefunc)
SELECT *
FROM   crosstab(
  $$SELECT user_id, event_type, cnt
    FROM   user_event_counts
    ORDER BY 1, 2$$,
  $$VALUES ('click'), ('purchase'), ('signup')$$
) AS ct(user_id BIGINT, clicks BIGINT, purchases BIGINT, signups BIGINT);

crosstab is faster than conditional aggregation on very large data because it doesn’t evaluate the CASE per row. The tradeoff is two string-quoted SQL fragments and an explicit column type list — fiddly enough that most teams stick with conditional aggregation unless performance demands the switch.

Unpivot: wide → long. SQL Server and Oracle have an UNPIVOT clause. The portable form is UNION ALL:

-- Unpivot survey table: one row per (user_id, question, score)
SELECT user_id, 'q1' AS question, q1_score AS score FROM survey
UNION ALL
SELECT user_id, 'q2', q2_score FROM survey
UNION ALL
SELECT user_id, 'q3', q3_score FROM survey;

A nicer form on modern Postgres / SQL Server uses VALUES with a lateral join:

-- Postgres: unpivot via LATERAL VALUES
SELECT s.user_id, v.question, v.score
FROM   survey s
CROSS JOIN LATERAL (VALUES
  ('q1', s.q1_score),
  ('q2', s.q2_score),
  ('q3', s.q3_score)
) AS v(question, score);

One pass over survey produces three rows per input row — same result as the UNION ALL, cleaner syntax. SQL Server 2017+ supports a near-identical pattern with CROSS APPLY VALUES (...).

Variants#

Pivot with multiple aggregates per column. Conditional aggregation makes this trivial — just project additional aggregate columns per metric:

-- For each (department, role): sum salary, average salary, headcount
SELECT department_id,
       SUM(CASE WHEN role = 'engineer'  THEN salary END) AS eng_total,
       AVG(CASE WHEN role = 'engineer'  THEN salary END) AS eng_avg,
       COUNT(*) FILTER (WHERE role = 'engineer')         AS eng_count,
       SUM(CASE WHEN role = 'manager'   THEN salary END) AS mgr_total,
       AVG(CASE WHEN role = 'manager'   THEN salary END) AS mgr_avg,
       COUNT(*) FILTER (WHERE role = 'manager')          AS mgr_count
FROM   employees
GROUP BY department_id;

Vendor PIVOT syntax is awkward for multiple aggregates — you’d typically write multiple PIVOT queries and join them.

Pivot with a derived bucket column. The pivot column doesn’t have to be a stored value — it can be a CASE expression over a continuous column.

-- One row per product, columns = price tier
SELECT product_id,
       SUM(CASE WHEN price <  10 THEN qty END) AS cheap_qty,
       SUM(CASE WHEN price BETWEEN 10 AND 50 THEN qty END) AS mid_qty,
       SUM(CASE WHEN price >  50 THEN qty END) AS premium_qty
FROM   order_items
GROUP BY product_id;

Dynamic pivot — column names unknown at write time. None of the above forms cover the case where the pivot columns are determined by the data (e.g., one column per distinct event_type that exists today). Standard SQL has no answer; every engine handles it differently:

Application-side: query distinct pivot values first, build the SQL string with those column names, execute.
Postgres: a function that returns a RECORD with dynamic shape, or PL/pgSQL that returns a refcursor.
SQL Server: dynamic SQL via EXEC sp_executesql with a generated column list.
Snowflake: PIVOT with ANY (FOR event_type IN (ANY ORDER BY event_type)).

For interview purposes, “dynamic pivot is application-side; the database can’t return rows with column names unknown at parse time” is the right answer. The Snowflake ANY form is a recent exception.

GROUPING SETS, ROLLUP, CUBE for sub-totals. Not strictly pivot, but often the right tool when the question is “show me totals broken down multiple ways at once”:

-- One query, results include per-(country, plan), per-country, per-plan, grand total
SELECT country, plan, COUNT(*) AS cnt
FROM   users
GROUP BY GROUPING SETS ((country, plan), (country), (plan), ());

The output is long-form with NULL in the columns that aren’t part of the current grouping. If you then need it wide, pivot.

Pivot in the application vs the database. A common temptation: query long-form data, pivot in Python / pandas / the report engine. This is usually wrong for two reasons. First, you pull more rows over the wire than necessary. Second, application-side pivot can’t be aggregated further in SQL — once it’s in the application, it’s stuck there. Pivot in the database when the consumer expects wide; pivot in the application only when the wide layout is truly the final form.

Common pitfalls#

Pivoting in the application instead of the database. A 100k-row result over the wire is much more expensive than a 1k-row pivoted result. Always prefer in-database pivot for serving dashboards.

Dynamic pivot via string concatenation. Building a pivot SQL string by concatenating distinct values directly into the query text is a SQL injection vector. Use parameterised dynamic SQL where the engine supports it (sp_executesql on SQL Server with named parameters), or validate the value list against an allowlist before substitution.

SUM(CASE ...) returning NULL when you expected 0. SUM(CASE WHEN event_type = 'click' THEN cnt END) returns NULL when no rows match (the entire SUM is over an all-NULL set). If you need 0 instead, wrap with COALESCE:

COALESCE(SUM(CASE WHEN event_type = 'click' THEN cnt END), 0) AS clicks

This matters for downstream arithmetic — NULL + anything = NULL, so missing buckets contaminate computations unless coalesced.

UNPIVOT losing column types. The UNION ALL form of unpivot requires every branch to produce the same column types. If q1_score is INTEGER and q2_comment is TEXT, you cannot unpivot them into the same score column without a cast. SQL Server’s UNPIVOT enforces this at write time (column types must match); the portable form lets you write the bug and find out at execution.

Wrong — type mismatch silently broken.

SELECT user_id, 'q1' AS field, q1_score AS value FROM survey
UNION ALL
SELECT user_id, 'q2', q2_comment FROM survey;

q1_score is integer; q2_comment is text. Some engines error; some implicitly cast to text, silently corrupting the integer column.

Right — explicit cast to a common type.

SELECT user_id, 'q1' AS field, q1_score::text AS value FROM survey
UNION ALL
SELECT user_id, 'q2', q2_comment FROM survey;

Force both branches to text. Downstream queries that need the numeric value must re-cast — pay attention to the actual data shape.

Forgetting to group all non-pivot columns. A pivot is just a GROUP BY with conditional aggregates. Every column in the SELECT that isn’t aggregated must be in GROUP BY. Missing this produces “column must appear in GROUP BY” errors.

Performance: pivot over many distinct values. A pivot with 50 conditional aggregates evaluates 50 CASE expressions per row. On a billion-row table, this is the slow part — not the GROUP BY. If the pivot column is read-heavy, consider materialising the pivoted form into a summary table refreshed periodically.

Confusing pivot with cross join. A pivot reshapes the same data into a different layout — the row count decreases (one row per group). A cross join with a values table increases the row count (one row per group × value). They look superficially similar in queries but produce opposite shapes.

GROUP BY on the wrong key. Pivoting on event_type while accidentally including event_at in GROUP BY produces one wide row per timestamp instead of one wide row per entity. Strip the time column or aggregate it (DATE_TRUNC('day', event_at)).

Hardcoded column lists drift from data. A pivot with clicks, purchases, signups will silently miss a new event_type = 'subscribe' introduced last week. Either codegen the pivot from a metadata table or accept that pivot column lists need maintenance — there’s no fully automatic version in portable SQL.

Unpivot loses rows when source columns are nullable. UNION ALL over q1_score, q2_score, q3_score produces three rows per source row, including rows where the score is NULL. If you want to drop those, add WHERE score IS NOT NULL after the unpivot.

Practice problem#

You have an events table tracking user actions on a product:

CREATE TABLE events (
  user_id     BIGINT      NOT NULL,
  event_type  TEXT        NOT NULL,  -- one of: 'view', 'click', 'purchase'
  ts          TIMESTAMPTZ NOT NULL
);

Produce a pivot where each row is one user and the columns are views, clicks, purchases — each the count of that event type for the user. Include only users with at least one event. Order by purchases descending, then clicks descending, then user_id ascending.

Solution

SELECT user_id,
       COUNT(*) FILTER (WHERE event_type = 'view')     AS views,
       COUNT(*) FILTER (WHERE event_type = 'click')    AS clicks,
       COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases
FROM   events
GROUP BY user_id
ORDER BY purchases DESC, clicks DESC, user_id ASC;

Conditional aggregation with COUNT(*) FILTER (...), supported on Postgres and SQLite. For MySQL / SQL Server, replace each COUNT(*) FILTER (WHERE p) with SUM(CASE WHEN p THEN 1 ELSE 0 END). GROUP BY user_id ensures one row per user; the filter narrows each count to the matching event type. Users with no events naturally don’t appear (no row in the source). Three-key ordering breaks ties deterministically.

Tally Count — the long-form precursor; pivot is “tally, but with one column per value”.
Aggregates, GROUP BY, HAVING — the underlying mechanism; pivot is conditional aggregation with GROUP BY.
Subqueries and CTEs — pivots often live in a CTE that downstream queries consume.
Joins inner outer cross — pivot and join compose: pivot first, then join the wide result.
Select and the WHERE clause — pre-pivot filtering goes in WHERE; post-pivot in HAVING or an outer query.