Rolling Totals — DBMS · Engineering Playbook

What it is#

A rolling total is a cumulative or sliding aggregation computed per row over a frame of nearby rows. The standard template is SUM(x) OVER (ORDER BY t ROWS BETWEEN ... AND CURRENT ROW).

SELECT   day,
         signups,
         SUM(signups) OVER (
           ORDER BY day
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         ) AS cumulative,
         AVG(signups) OVER (
           ORDER BY day
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
         ) AS moving_avg_7
FROM     daily_signups
ORDER BY day;

Two shapes, both critical to know:

Cumulative (UNBOUNDED PRECEDING AND CURRENT ROW) — running total from the beginning. The “total revenue to date” report.
Sliding (N PRECEDING AND CURRENT ROW) — moving sum or average over the last N rows. The “7-day moving average” report.

This pattern is a Window functions usage that comes up so often in interviews and dashboards that it deserves its own page. If you have ever computed a running total or moving average in a spreadsheet, this is the SQL equivalent — and it scales to billions of rows.

When to use it#

Reach for rolling totals when:

You have an ordered series of data (events, daily metrics, transactions per day) and need per-row context from preceding rows.
You need a cumulative value: total to date, items shipped year-to-date, balance after each transaction.
You need a sliding aggregation: 7-day moving average, last-100-trades volume, 30-day rolling churn.
You need a running maximum or minimum: highest stock price so far, lowest latency to date.

If the aggregation has no ordering (just per-group totals across all rows), use plain GROUP BY instead — see Aggregates, GROUP BY, HAVING. If the rolling computation needs to look at adjacent rows but not aggregate (current row minus previous row), use LAG() / LEAD().

How it works#

A window function has three optional clauses inside OVER (...):

PARTITION BY — restart the computation per partition. Each partition is independent.
ORDER BY — the ordering that determines which rows precede which. Required for rolling totals — without it the frame is undefined.
ROWS BETWEEN ... AND ... or RANGE BETWEEN ... AND ... — the frame: the subset of rows in the partition that the aggregate sees from this row’s perspective.

SELECT   user_id,
         day,
         spend,
         SUM(spend) OVER (
           PARTITION BY user_id
           ORDER BY day
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         ) AS cumulative_spend
FROM     daily_user_spend;

For each row, the engine identifies the partition (user_id), orders the partition by day, and sums spend from the start of the partition up to and including the current row. The result has the same number of rows as the input — window functions do not collapse rows the way GROUP BY does.

ROWS vs RANGE. The difference is subtle and a frequent interview gotcha.

ROWS BETWEEN N PRECEDING AND CURRENT ROW

Counts physical rows. Three rows preceding means exactly three rows, regardless of their ORDER BY values.

SUM(x) OVER (
  ORDER BY day
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)

On 7 distinct days, the frame is exactly 7 rows.

RANGE BETWEEN N PRECEDING AND CURRENT ROW

Counts logical values. RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW is 7 days of data — which may be more or fewer than 7 rows if there are duplicates or gaps.

SUM(x) OVER (
  ORDER BY day
  RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
)

Handles missing days and ties correctly. Postgres, Oracle, SQL Server 2022+ support RANGE with intervals.

If you omit the frame clause but include ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — which sums all rows with the same ORDER BY value as the current row, plus all rows ordered before. For SUM this is usually fine if there are no duplicate ORDER BY values; with duplicates, you get the same total for all peer rows, which is rarely what you want.

Always specify the frame explicitly unless you’ve thought through the peer-row behaviour.

-- Recommended: explicit frame, no surprises
SUM(x) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- Risky: implicit frame, defaults to RANGE on peers
SUM(x) OVER (ORDER BY day)

Variants#

Cumulative total per partition. The most common shape — running total restarting per user, per account, per region.

SELECT   account_id,
         txn_date,
         amount,
         SUM(amount) OVER (
           PARTITION BY account_id
           ORDER BY txn_date, txn_id    -- include a tie-breaker
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         ) AS running_balance
FROM     transactions;

The txn_id tie-breaker matters: two transactions on the same day in undefined order would produce different running balances. A deterministic ORDER BY is a requirement, not a nicety.

Moving average over N rows.

-- 30-day moving average of daily revenue
SELECT   day,
         revenue,
         AVG(revenue) OVER (
           ORDER BY day
           ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
         ) AS ma_30
FROM     daily_revenue;

For the first 29 days, the window has fewer than 30 rows, so the average is over whatever rows exist. If you need a strict “only emit the average once 30 days of data are present” rule, wrap in a CTE and filter WHERE day_number >= 30.

Moving average over a time window (handles gaps).

-- Postgres: 7-day rolling average that handles missing days
SELECT   day,
         revenue,
         AVG(revenue) OVER (
           ORDER BY day
           RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
         ) AS ma_7_days
FROM     daily_revenue;

If daily_revenue already has a row per day (including zeros for empty days from the Patch Gaps pattern), ROWS BETWEEN 6 PRECEDING and RANGE BETWEEN INTERVAL '6 days' give the same answer. They diverge as soon as some days are missing — RANGE correctly averages over the actual 7-day window, ROWS averages over the last 7 recorded days.

Running min / max / count. Same template, different aggregate.

-- High-water-mark balance: largest cumulative balance to date
SELECT   account_id,
         txn_date,
         amount,
         MAX(SUM(amount) OVER (
           PARTITION BY account_id ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         )) OVER (
           PARTITION BY account_id ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         ) AS high_water_mark
FROM     transactions;

Or — cleaner — compute the running sum in a CTE, then MAX over the running sum in the outer query.

WITH running AS (
  SELECT account_id, txn_date,
         SUM(amount) OVER (PARTITION BY account_id ORDER BY txn_date
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS bal
  FROM transactions
)
SELECT account_id, txn_date, bal,
       MAX(bal) OVER (PARTITION BY account_id ORDER BY txn_date
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS high_water
FROM running;

Centered moving average. Look both directions.

-- 7-day centered moving average: 3 days back, 3 days forward
AVG(revenue) OVER (
  ORDER BY day
  ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
)

Period-over-period change. Combine a rolling total with LAG.

SELECT   day,
         revenue,
         revenue - LAG(revenue) OVER (ORDER BY day) AS day_over_day_change,
         SUM(revenue) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7
FROM     daily_revenue;

Common pitfalls#

Forgetting ROWS BETWEEN and getting RANGE behaviour with peer rows. As covered above, the default frame is RANGE from unbounded preceding through all peer rows. On a column like day with duplicate values, this means rows on the same day all see the same total. Always specify ROWS BETWEEN ... AND CURRENT ROW (or RANGE BETWEEN) explicitly.

No ORDER BY in the window. A window with PARTITION BY but no ORDER BY has an undefined frame — the engine may return any value. Some engines will silently produce the partition-wide total for every row; others will error. Add the ORDER BY always, even if you think rows are pre-sorted.

Non-deterministic ordering when ORDER BY has ties. Two rows with the same day may appear in either order across runs. The running total is technically correct (the sum is the same), but the per-row running total assigned to each peer is arbitrary. Add a secondary key — ORDER BY day, id or ORDER BY day, created_at — for reproducibility.

Performance on large tables. A windowed SUM over a billion-row partition is O(n) per partition — much cheaper than the self-join alternative (O(n^2)). But partitioning on a low-cardinality column followed by a high-cardinality ordering may still spill to disk. Tips:

Pre-filter with WHERE to shrink the input before the window function runs.
Match the partition + order to an existing index — Postgres can avoid the sort if the index covers PARTITION BY ... , ORDER BY ... keys. See B-tree indexes.
For warehouse workloads, row-store vs column-store matters — columnar engines (BigQuery, Snowflake, Redshift) parallelise window functions across partitions efficiently.

MySQL pre-8.0 has no window functions. If you’re stuck on MySQL 5.7, simulate with a correlated subquery (SELECT SUM(x) FROM t t2 WHERE t2.day <= t.day) — O(n^2) and painful past a few thousand rows. Upgrade to 8.0+.

Mixing OVER () with GROUP BY. Window functions execute after GROUP BY. So SUM(x) OVER () in the same query as GROUP BY y operates on the post-aggregation rows, not the original rows. This is sometimes exactly what you want (running total of grouped daily revenue) and sometimes a surprise.

-- Cumulative revenue by day, computed on already-aggregated daily totals
SELECT   day,
         SUM(amount) AS daily,
         SUM(SUM(amount)) OVER (ORDER BY day
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative
FROM     orders
GROUP BY day
ORDER BY day;

Practice problem#

You have a signups table with one row per signup event:

CREATE TABLE signups (
  id          BIGSERIAL PRIMARY KEY,
  user_id     BIGINT NOT NULL,
  signed_up_at TIMESTAMPTZ NOT NULL
);

Write a query that returns, for each day from 30 days ago to today, the daily signup count and a 7-day rolling average of daily signups (current day plus the six preceding days).

Solution

WITH daily AS (
  SELECT DATE(signed_up_at) AS day, COUNT(*) AS signups
  FROM   signups
  WHERE  signed_up_at >= CURRENT_DATE - INTERVAL '30 days'
  GROUP  BY DATE(signed_up_at)
)
SELECT   day,
         signups,
         AVG(signups) OVER (
           ORDER BY day
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
         ) AS ma_7
FROM     daily
ORDER BY day;

Aggregate to a daily count first, then compute the 7-day rolling average with ROWS BETWEEN 6 PRECEDING AND CURRENT ROW. If empty days are possible and need to be averaged as zero, prefix this with the Patch Gaps pattern to fill the calendar.

Window functions — the building block this pattern uses; covers RANK, LAG, LEAD, NTILE.
Patch Gaps — fill missing days before computing a rolling average over a time window.
Tally Count — the per-day count that this pattern typically rolls.
Group Bucket — for time buckets larger than a day (week, month).
Aggregates, GROUP BY, HAVING — when you do not need the per-row preservation that windows give you.

What it is#

When to use it#

How it works#

Variants#

Common pitfalls#

Practice problem#

Related patterns#