Lead-Lag Navigation — DBMS · Engineering Playbook

What it is#

The lead-lag navigation pattern uses two paired window functions — LEAD and LAG — to reach across rows for the next or previous value in a sorted partition. They are the SQL equivalent of array index arithmetic: instead of computing arr[i] - arr[i-1], you write value - LAG(value) OVER (ORDER BY ...).

SELECT user_id, event_at,
       LAG(event_at)  OVER (PARTITION BY user_id ORDER BY event_at) AS prev_event_at,
       LEAD(event_at) OVER (PARTITION BY user_id ORDER BY event_at) AS next_event_at
FROM   events;

Both functions take three arguments: LEAD(col, offset, default) and LAG(col, offset, default). The offset defaults to 1 (one row ahead or behind). The default is the value returned when the offset reaches past the partition boundary — without it you get NULL, which is fine for many queries but breaks subtraction and equality tests at the edges.

-- Difference from previous day's revenue, NULL on the first day
SELECT day, revenue,
       revenue - LAG(revenue) OVER (ORDER BY day) AS delta
FROM   daily_revenue;

-- Same, but with a sensible default so the first day shows 0
SELECT day, revenue,
       revenue - LAG(revenue, 1, 0) OVER (ORDER BY day) AS delta
FROM   daily_revenue;

Once you internalise that LAG = previous row and LEAD = next row, every “change from previous”, “compare this row to the next”, and “inter-event gap” query reduces to a one-liner.

When to use it#

Use lead-lag whenever the answer depends on comparing each row to a neighbouring row in a defined order. The grouping is rarely “the whole table” — usually it’s per-user, per-session, per-product.

“How long between consecutive logins for each user?” — event_at - LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at).
“Did the user’s status change since their last record?” — status != LAG(status) OVER (PARTITION BY user_id ORDER BY recorded_at).
“What’s the day-over-day delta in active users?” — dau - LAG(dau) OVER (ORDER BY day).
“How long was each session?” — LEAD(event_at, 1, event_at) - event_at for inter-event gap, or boundary detection for session ends.
“Identify the start of every streak” — LAG(status) != status flags transitions.

If you want an ordinal position rather than an adjacent value, see Rank and Row. If you want to identify runs of consecutive rows by some grouping key, see Gaps and Islands — lead-lag is often the first step.

How it works#

LEAD and LAG are window functions, so they follow the standard OVER (...) rules: PARTITION BY defines the per-group reset, ORDER BY defines the row order within the partition. Frame clauses are not applicable — neither function aggregates.

-- Previous and next event per user
SELECT user_id, event_id, event_at,
       LAG(event_at, 1, NULL) OVER w  AS prev_at,
       LEAD(event_at, 1, NULL) OVER w AS next_at
FROM   events
WINDOW w AS (PARTITION BY user_id ORDER BY event_at);

The WINDOW clause is standard SQL — you can name a window definition once and reference it by name from multiple functions in the same SELECT. Postgres, MySQL 8, SQL Server 2022, and Oracle support it; older MySQL and SQLite require repeating the OVER (...) clause each time.

The arguments. LAG(col) is shorthand for LAG(col, 1, NULL). The full form is occasionally useful:

LAG(col, n) — reach n rows back instead of 1.
LAG(col, n, d) — same, but return d instead of NULL when there’s no such row.

-- 7-day moving comparison (this week vs last week)
SELECT day, signups,
       signups - LAG(signups, 7, 0) OVER (ORDER BY day) AS wow_delta
FROM   daily_signups;

LEAD mirrors LAG exactly with the offset going forward instead of backward.

Change-from-previous as the canonical use. The pattern is so common it has a name in time-series analytics: “delta from previous”.

-- Day-over-day percent change in revenue
SELECT day,
       revenue,
       LAG(revenue) OVER (ORDER BY day)                                       AS prev_revenue,
       (revenue - LAG(revenue) OVER (ORDER BY day)) AS abs_delta,
       ROUND(
         100.0 * (revenue - LAG(revenue) OVER (ORDER BY day))
              / NULLIF(LAG(revenue) OVER (ORDER BY day), 0),
         2
       ) AS pct_change
FROM   daily_revenue
ORDER BY day;

NULLIF guards against division by zero in case the previous day’s revenue was zero — without it the query errors on the first such row.

Streak detection as the second canonical use. A streak is a run of consecutive rows where some condition holds. The first step is always: compare each row to its predecessor and mark transitions.

-- Mark every row where the user's status changed since the previous row
SELECT user_id, recorded_at, status,
       CASE
         WHEN LAG(status) OVER (PARTITION BY user_id ORDER BY recorded_at) IS DISTINCT FROM status
         THEN 1 ELSE 0
       END AS is_change
FROM   user_status_history;

IS DISTINCT FROM (Postgres / SQL Server / SQLite) treats two NULLs as equal — important because the very first row has LAG = NULL and a plain != would return NULL instead of TRUE. In MySQL use NOT (a <=> b) for the same semantics.

Variants#

FIRST_VALUE and LAST_VALUE. These window functions look superficially similar — return a value from somewhere in the frame — but they read the first or last row of the frame rather than an offset from the current row.

-- First and last login per user, in one pass
SELECT user_id, login_at,
       FIRST_VALUE(login_at) OVER w AS first_login,
       LAST_VALUE(login_at)  OVER w AS last_login
FROM   logins
WINDOW w AS (
  PARTITION BY user_id ORDER BY login_at
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

The frame clause is critical — see the pitfalls. FIRST_VALUE rarely needs it (the default frame already starts at the first row); LAST_VALUE almost always does.

NTH_VALUE(col, n). Returns the nth value in the frame. Useful when you want, say, the third login per user without ranking the whole partition.

Multi-step lookback. Reach back more than one row by passing a higher offset.

-- 30-day-ago value for a daily metric
SELECT day, metric,
       LAG(metric, 30) OVER (PARTITION BY metric_name ORDER BY day) AS metric_30d_ago
FROM   daily_metrics;

Session boundary detection. A common product-analytics use: define a session as events with less than 30 minutes between consecutive events. Lead-lag finds the boundaries.

-- Mark the first event of every new session per user
SELECT user_id, event_at,
       CASE
         WHEN EXTRACT(EPOCH FROM (event_at - LAG(event_at)
                       OVER (PARTITION BY user_id ORDER BY event_at))) > 1800
         OR   LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at) IS NULL
         THEN 1 ELSE 0
       END AS is_session_start
FROM   events;

A running sum over the is_session_start column then gives each event a session id — the same trick used in Gaps and Islands.

Default value tricks. The third argument of LAG / LEAD can be more than a literal — it can be the current row’s column, which is a clean way to express “same value if no neighbour”.

-- Time until next event, or 0 if last event in partition
SELECT user_id, event_at,
       LEAD(event_at, 1, event_at) OVER (PARTITION BY user_id ORDER BY event_at)
         - event_at AS time_to_next
FROM   events;

Common pitfalls#

Forgetting ORDER BY in OVER. LAG(col) OVER (PARTITION BY user_id) without ORDER BY is undefined — the engine may pick any previous row in the partition. Always specify the order, and add a tiebreaker column if the primary sort key is not unique.

PARTITION BY too coarse. A common bug: writing LAG(event_at) OVER (ORDER BY event_at) when you meant LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at). Without the partition, LAG crosses user boundaries — the first event of user B references the last event of user A.

Wrong — crosses user boundaries.

SELECT user_id, event_at,
       event_at - LAG(event_at) OVER (ORDER BY event_at) AS gap
FROM   events;

The gap for user B’s first event is the time since user A’s last event — usually meaningless.

Right — partitioned by user.

SELECT user_id, event_at,
       event_at - LAG(event_at) OVER (
         PARTITION BY user_id ORDER BY event_at
       ) AS gap
FROM   events;

LAG is NULL for each user’s first event — exactly what you want.

The LAST_VALUE frame bug. This trap catches even experienced SQL developers. The default frame for a window with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — meaning LAST_VALUE returns the current row’s value, not the last row of the partition.

-- WRONG: returns current row's login_at, not the last per user
SELECT user_id, login_at,
       LAST_VALUE(login_at) OVER (PARTITION BY user_id ORDER BY login_at) AS last_login
FROM   logins;

-- RIGHT: explicit frame to reach the last row
SELECT user_id, login_at,
       LAST_VALUE(login_at) OVER (
         PARTITION BY user_id ORDER BY login_at
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_login
FROM   logins;

FIRST_VALUE doesn’t have this problem because the default frame already starts at the partition’s first row. LAG and LEAD ignore the frame entirely — they navigate by offset, not by frame.

NULL propagation through equality. LAG(status) != status returns NULL (not TRUE) when the previous status was NULL or doesn’t exist. Use IS DISTINCT FROM (Postgres / SQL Server / SQLite) or NOT (a <=> b) (MySQL) for null-safe comparison.

Division by zero in pct-change. (curr - prev) / prev * 100 errors when prev = 0. Wrap with NULLIF(prev, 0) so the result is NULL rather than an error.

Performance — sort is the cost. LAG and LEAD require sorting the partition. On a billion-row events table partitioned by user_id and ordered by event_at, the sort dominates. A composite index on (user_id, event_at) lets the engine stream rows in order; without it, expect a Sort node above the WindowAgg. See query execution and plans.

Confusing offset direction. LAG looks backward (earlier row by sort order); LEAD looks forward. A quick rule: in ORDER BY day ASC, LAG(day) is yesterday, LEAD(day) is tomorrow. In ORDER BY day DESC, the meanings flip.

Using LAG/LEAD to bridge unrelated tables. If you need values from two different tables in the same row, you want a JOIN, not lead-lag — see joins. Lead-lag is purely within a single sorted partition.

Practice problem#

You have a login_events table tracking when each user logged into the product:

CREATE TABLE login_events (
  user_id     BIGINT      NOT NULL,
  login_date  DATE        NOT NULL,
  PRIMARY KEY (user_id, login_date)
);

Assume one row per (user_id, login_date) — at most one login record per day per user. Find the longest streak of consecutive days each user logged in, returning one row per user with user_id and longest_streak. A streak of 1 counts.

Solution

WITH marked AS (
  SELECT user_id, login_date,
         CASE
           WHEN LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date)
                = login_date - INTERVAL '1 day'
           THEN 0 ELSE 1
         END AS is_streak_start
  FROM   login_events
),
streak_ids AS (
  SELECT user_id, login_date,
         SUM(is_streak_start) OVER (PARTITION BY user_id ORDER BY login_date) AS streak_id
  FROM   marked
),
streak_lengths AS (
  SELECT user_id, streak_id, COUNT(*) AS streak_len
  FROM   streak_ids
  GROUP BY user_id, streak_id
)
SELECT user_id, MAX(streak_len) AS longest_streak
FROM   streak_lengths
GROUP BY user_id
ORDER BY user_id;

LAG flags every row where the previous login wasn’t exactly one day earlier — those are streak starts. A running SUM over the start flags assigns each row a streak id (it increments only at boundaries). Group by (user_id, streak_id) to get streak lengths, then take the max per user. This is the canonical Gaps and Islands workflow, with LAG as the boundary detector.

Rank and Row — when you want an ordinal position rather than an adjacent value.
Gaps and Islands — extends lead-lag with running sums to identify whole runs of consecutive rows.
Window functions — the building block; covers FIRST_VALUE, LAST_VALUE, NTH_VALUE, and frame semantics in depth.
Subqueries and CTEs — multi-step lead-lag queries chain naturally through CTEs.
Aggregates, GROUP BY, HAVING — the final step in streak queries: aggregate per streak id.

What it is#

When to use it#

How it works#

Variants#

Common pitfalls#

Practice problem#

Related patterns#