Patch Gaps

Fill missing rows in a sequence with default values via LEFT JOIN against a generated calendar or series.

Pattern Intermediate
9 min read
sql sql-pattern joins time-series calendar

What it is#

The patch gaps pattern fills missing rows in an ordered sequence — usually dates — with default values (typically zero or null). It is built from two ingredients: a complete sequence generated on the fly, then a LEFT JOIN against the sparse data.

-- Postgres: daily signups for the last 30 days, including days with zero signups
WITH calendar AS (
SELECT generate_series(
CURRENT_DATE - INTERVAL '29 days',
CURRENT_DATE,
INTERVAL '1 day'
)::date AS day
)
SELECT c.day,
COALESCE(COUNT(s.id), 0) AS signups
FROM calendar c
LEFT JOIN signups s
ON DATE(s.signed_up_at) = c.day
GROUP BY c.day
ORDER BY c.day;

Without this pattern, a daily-signup query produces a row only for days where at least one signup happened. A dashboard with a sawtooth gap pattern is almost always missing a calendar join.

It is also a critical companion to the Rolling Totals pattern: a 7-day moving average computed over a series with missing days gives wrong answers — the missing days were zero, not “not in the window.”

When to use it#

Use patch gaps when:

  • A time-series report needs to show every period in a range, including periods with no events.
  • A funnel report needs all steps even if some steps had no users.
  • A grid join is needed: every combination of two dimensions (every user × every product, every region × every month).
  • A downstream window function (moving average, rolling sum) must treat missing periods as zero.
  • A line chart visualiser needs evenly-spaced x-axis values; missing days produce a misleading slope.

Don’t use it when:

  • The downstream consumer can interpret missing periods correctly on its own.
  • The “gap” is actually an artifact of an upstream filter that should be relaxed instead.
  • The series cardinality is huge and most rows would be zero — sparse representations are more efficient.

How it works#

Three steps:

  1. Generate the complete sequence as a CTE or subquery. Date series for time, integer series for sequence numbers, cross-join for grids.
  2. LEFT JOIN the sparse table onto the sequence. The sequence is on the left; the data is on the right. Missing rows on the right come back as NULL.
  3. COALESCE the right-side columns to apply the default value — usually 0 for counts and sums, but NULL is sometimes the correct default.
SELECT c.day,
COALESCE(COUNT(o.id), 0) AS orders,
COALESCE(SUM(o.amount), 0) AS revenue
FROM calendar c
LEFT JOIN orders o
ON DATE(o.created_at) = c.day
GROUP BY c.day
ORDER BY c.day;

The filter must be in the ON clause, not the WHERE clause. This is the single most common bug in this pattern.

Correct — filter in ON:

SELECT c.day, COUNT(o.id) AS orders
FROM calendar c
LEFT JOIN orders o
ON DATE(o.created_at) = c.day
AND o.status = 'paid'
GROUP BY c.day;

Days with no paid orders still appear with count 0. The status filter is part of the join condition.

Wrong — filter in WHERE:

SELECT c.day, COUNT(o.id) AS orders
FROM calendar c
LEFT JOIN orders o
ON DATE(o.created_at) = c.day
WHERE o.status = 'paid'
GROUP BY c.day;

On days with no orders at all, the LEFT JOIN produces a row with o.status = NULL. The WHERE clause then drops it (because NULL = 'paid' is NULL, not TRUE), turning the LEFT JOIN into an effective INNER JOIN and re-introducing the gaps.

Mnemonic: filters on the right side of a LEFT JOIN belong in ON; filters on the left side belong in WHERE. The right-side filter in WHERE defeats the LEFT JOIN.

Variants#

Generate a series — by engine#

Postgres / Redshift: generate_series is the canonical helper.

SELECT day::date
FROM generate_series(DATE '2026-01-01', DATE '2026-12-31', INTERVAL '1 day') AS day;

It also generates integer series, useful for grid joins or numbers-table substitutes.

SELECT n FROM generate_series(1, 100) AS n;

BigQuery: GENERATE_DATE_ARRAY, then UNNEST.

SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2026-01-01', '2026-12-31', INTERVAL 1 DAY)) AS day;

Snowflake: GENERATOR + ROW_NUMBER arithmetic, or DATEADD from a row sequence.

SELECT DATEADD(day, SEQ4(), DATE '2026-01-01') AS day
FROM TABLE(GENERATOR(ROWCOUNT => 365));

SQL Server: recursive CTE or sys.all_objects row trick.

WITH calendar AS (
SELECT CAST('2026-01-01' AS DATE) AS day
UNION ALL
SELECT DATEADD(day, 1, day) FROM calendar WHERE day < '2026-12-31'
)
SELECT day FROM calendar
OPTION (MAXRECURSION 366);

MAXRECURSION 0 disables the default 100-row cap entirely.

MySQL 8+: recursive CTE.

WITH RECURSIVE calendar(day) AS (
SELECT DATE('2026-01-01')
UNION ALL
SELECT DATE_ADD(day, INTERVAL 1 DAY) FROM calendar WHERE day < '2026-12-31'
)
SELECT day FROM calendar;

The default cte_max_recursion_depth is 1000; raise it with SET cte_max_recursion_depth = 100000; for longer ranges.

MySQL pre-8.0 / SQLite: persistent numbers table.

-- Materialised once: integers 0..99999
CREATE TABLE numbers (n INT PRIMARY KEY);
INSERT INTO numbers SELECT n FROM (
-- some join over information_schema or hand-rolled population
) src;
-- Use it as a series
SELECT DATE_ADD('2026-01-01', INTERVAL n DAY) AS day
FROM numbers
WHERE n < 365;

A persistent numbers table is the most portable, cheapest source of an integer series across engines — useful even on MySQL 8+ where it can be a view.

Patch gaps in a non-date sequence#

The pattern is not date-specific. Anywhere you have a known full set joined to a partial set, the same shape applies.

-- All product × month pairs, including months with zero sales
WITH products AS (SELECT id FROM products),
months AS (
SELECT day FROM generate_series(DATE '2026-01-01', DATE '2026-12-01', INTERVAL '1 month') AS day
)
SELECT p.id,
m.day AS month,
COALESCE(SUM(s.amount), 0) AS revenue
FROM products p
CROSS JOIN months m
LEFT JOIN sales s
ON s.product_id = p.id
AND DATE_TRUNC('month', s.sold_at) = m.day
GROUP BY p.id, m.day
ORDER BY p.id, m.day;

The CROSS JOIN between products and months produces the complete grid; the LEFT JOIN to sales then fills it.

Continuous gaps — finding which values are missing#

If the goal is not to patch the gaps but to find them, the inverse query uses NOT EXISTS or LEFT JOIN ... IS NULL against the same calendar.

-- Which days in the range had zero signups
SELECT c.day
FROM calendar c
LEFT JOIN signups s ON DATE(s.signed_up_at) = c.day
WHERE s.id IS NULL;

This is a use of the Existence Check pattern on top of the calendar.

Common pitfalls#

Right-side filter in WHERE (covered above). The single most common bug. If your LEFT JOIN suddenly looks like an INNER JOIN, check the WHERE clause.

Timezone bugs when generating the calendar. CURRENT_DATE and DATE(timestamp) follow the session timezone. If your data is stored in UTC and users see local time, a day in user-local time may straddle two days in UTC. Decide on one timezone (usually the report’s timezone), and convert both the calendar and the timestamp into it:

-- Postgres: report in America/Los_Angeles
WITH calendar AS (
SELECT generate_series(
(CURRENT_DATE - INTERVAL '29 days') AT TIME ZONE 'UTC' AT TIME ZONE 'America/Los_Angeles',
CURRENT_DATE AT TIME ZONE 'UTC' AT TIME ZONE 'America/Los_Angeles',
INTERVAL '1 day'
)::date AS day
)
SELECT c.day, COUNT(s.id)
FROM calendar c
LEFT JOIN signups s
ON DATE(s.signed_up_at AT TIME ZONE 'America/Los_Angeles') = c.day
GROUP BY c.day;

COALESCE on the aggregate, not the column. When patching counts, COUNT(s.id) is already 0 for unmatched rows (because COUNT of zero rows is 0, not NULL). The COALESCE is redundant. But SUM(s.amount) over zero rows is NULL — there COALESCE(SUM(s.amount), 0) is required.

COUNT(s.id) -- 0 when no match, no COALESCE needed
COALESCE(SUM(s.amount), 0) -- NULL when no match, COALESCE needed
COALESCE(AVG(s.amount), 0) -- NULL when no match (cannot average zero rows)
COALESCE(MAX(s.amount), 0) -- NULL when no match

Calendar too narrow. If the range is parameterised, an off-by-one on the upper bound silently drops the most recent day. Test with a query like SELECT MIN(day), MAX(day), COUNT(*) FROM calendar before joining.

Performance on huge calendars. A daily calendar across 10 years is 3650 rows — tiny. A minute-by-minute calendar across the same range is 5.3 million rows. If a fact table has 100M rows and the LEFT JOIN doesn’t have a good index on the join column, the plan can degrade to a nested loop with a 5.3M outer side. Aggregate to a coarser grain first.

Generated series ignores DST. Postgres generate_series(..., INTERVAL '1 day') in a session timezone with DST produces some 23-hour and 25-hour days, but the ::date cast smooths over that. For minute-level or hour-level series, generate in UTC and convert per-row.

Practice problem#

You have a signups table:

CREATE TABLE signups (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
signed_up_at TIMESTAMPTZ NOT NULL
);

Write a Postgres query that returns daily signup counts for the last 30 days, including the current day, with one row per day even when no one signed up. Days with zero signups should show signups = 0, not be absent.

Solution
WITH calendar AS (
SELECT generate_series(
CURRENT_DATE - INTERVAL '29 days',
CURRENT_DATE,
INTERVAL '1 day'
)::date AS day
)
SELECT c.day,
COUNT(s.id) AS signups
FROM calendar c
LEFT JOIN signups s
ON DATE(s.signed_up_at) = c.day
GROUP BY c.day
ORDER BY c.day;

generate_series produces 30 dates, the LEFT JOIN matches each against signup rows by truncated date, and COUNT(s.id) correctly returns 0 for unmatched calendar rows (no COALESCE needed because COUNT of zero rows is 0, not NULL).

  • Joins (inner, outer, cross) — the LEFT JOIN semantics this pattern relies on.
  • Rolling Totals — frequently consumes the gap-patched output for moving averages over time windows.
  • Existence Check — the inverse use case: finding which days are missing instead of patching them.
  • Tally Count — the per-period aggregate this pattern wraps with a calendar.
  • Subqueries and CTEs — the CTE structure used to define the calendar cleanly.
Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.