Lead-Lag Navigation
LEAD() and LAG() for previous/next-row comparison. Streak detection, change-from-previous metrics, run-length tracking.
What it is#
The lead-lag navigation pattern uses two paired window functions — LEAD and LAG — to reach across rows for the next or previous value in a sorted partition. They are the SQL equivalent of array index arithmetic: instead of computing arr[i] - arr[i-1], you write value - LAG(value) OVER (ORDER BY ...).
SELECT user_id, event_at, LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at) AS prev_event_at, LEAD(event_at) OVER (PARTITION BY user_id ORDER BY event_at) AS next_event_atFROM events;Both functions take three arguments: LEAD(col, offset, default) and LAG(col, offset, default). The offset defaults to 1 (one row ahead or behind). The default is the value returned when the offset reaches past the partition boundary — without it you get NULL, which is fine for many queries but breaks subtraction and equality tests at the edges.
-- Difference from previous day's revenue, NULL on the first daySELECT day, revenue, revenue - LAG(revenue) OVER (ORDER BY day) AS deltaFROM daily_revenue;
-- Same, but with a sensible default so the first day shows 0SELECT day, revenue, revenue - LAG(revenue, 1, 0) OVER (ORDER BY day) AS deltaFROM daily_revenue;Once you internalise that LAG = previous row and LEAD = next row, every “change from previous”, “compare this row to the next”, and “inter-event gap” query reduces to a one-liner.
When to use it#
Use lead-lag whenever the answer depends on comparing each row to a neighbouring row in a defined order. The grouping is rarely “the whole table” — usually it’s per-user, per-session, per-product.
- “How long between consecutive logins for each user?” —
event_at - LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at). - “Did the user’s status change since their last record?” —
status != LAG(status) OVER (PARTITION BY user_id ORDER BY recorded_at). - “What’s the day-over-day delta in active users?” —
dau - LAG(dau) OVER (ORDER BY day). - “How long was each session?” —
LEAD(event_at, 1, event_at) - event_atfor inter-event gap, or boundary detection for session ends. - “Identify the start of every streak” —
LAG(status) != statusflags transitions.
If you want an ordinal position rather than an adjacent value, see Rank and Row. If you want to identify runs of consecutive rows by some grouping key, see Gaps and Islands — lead-lag is often the first step.
How it works#
LEAD and LAG are window functions, so they follow the standard OVER (...) rules: PARTITION BY defines the per-group reset, ORDER BY defines the row order within the partition. Frame clauses are not applicable — neither function aggregates.
-- Previous and next event per userSELECT user_id, event_id, event_at, LAG(event_at, 1, NULL) OVER w AS prev_at, LEAD(event_at, 1, NULL) OVER w AS next_atFROM eventsWINDOW w AS (PARTITION BY user_id ORDER BY event_at);The WINDOW clause is standard SQL — you can name a window definition once and reference it by name from multiple functions in the same SELECT. Postgres, MySQL 8, SQL Server 2022, and Oracle support it; older MySQL and SQLite require repeating the OVER (...) clause each time.
The arguments. LAG(col) is shorthand for LAG(col, 1, NULL). The full form is occasionally useful:
LAG(col, n)— reachnrows back instead of 1.LAG(col, n, d)— same, but returndinstead ofNULLwhen there’s no such row.
-- 7-day moving comparison (this week vs last week)SELECT day, signups, signups - LAG(signups, 7, 0) OVER (ORDER BY day) AS wow_deltaFROM daily_signups;LEAD mirrors LAG exactly with the offset going forward instead of backward.
Change-from-previous as the canonical use. The pattern is so common it has a name in time-series analytics: “delta from previous”.
-- Day-over-day percent change in revenueSELECT day, revenue, LAG(revenue) OVER (ORDER BY day) AS prev_revenue, (revenue - LAG(revenue) OVER (ORDER BY day)) AS abs_delta, ROUND( 100.0 * (revenue - LAG(revenue) OVER (ORDER BY day)) / NULLIF(LAG(revenue) OVER (ORDER BY day), 0), 2 ) AS pct_changeFROM daily_revenueORDER BY day;NULLIF guards against division by zero in case the previous day’s revenue was zero — without it the query errors on the first such row.
Streak detection as the second canonical use. A streak is a run of consecutive rows where some condition holds. The first step is always: compare each row to its predecessor and mark transitions.
-- Mark every row where the user's status changed since the previous rowSELECT user_id, recorded_at, status, CASE WHEN LAG(status) OVER (PARTITION BY user_id ORDER BY recorded_at) IS DISTINCT FROM status THEN 1 ELSE 0 END AS is_changeFROM user_status_history;IS DISTINCT FROM (Postgres / SQL Server / SQLite) treats two NULLs as equal — important because the very first row has LAG = NULL and a plain != would return NULL instead of TRUE. In MySQL use NOT (a <=> b) for the same semantics.
Variants#
FIRST_VALUE and LAST_VALUE. These window functions look superficially similar — return a value from somewhere in the frame — but they read the first or last row of the frame rather than an offset from the current row.
-- First and last login per user, in one passSELECT user_id, login_at, FIRST_VALUE(login_at) OVER w AS first_login, LAST_VALUE(login_at) OVER w AS last_loginFROM loginsWINDOW w AS ( PARTITION BY user_id ORDER BY login_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);The frame clause is critical — see the pitfalls. FIRST_VALUE rarely needs it (the default frame already starts at the first row); LAST_VALUE almost always does.
NTH_VALUE(col, n). Returns the nth value in the frame. Useful when you want, say, the third login per user without ranking the whole partition.
Multi-step lookback. Reach back more than one row by passing a higher offset.
-- 30-day-ago value for a daily metricSELECT day, metric, LAG(metric, 30) OVER (PARTITION BY metric_name ORDER BY day) AS metric_30d_agoFROM daily_metrics;Session boundary detection. A common product-analytics use: define a session as events with less than 30 minutes between consecutive events. Lead-lag finds the boundaries.
-- Mark the first event of every new session per userSELECT user_id, event_at, CASE WHEN EXTRACT(EPOCH FROM (event_at - LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at))) > 1800 OR LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at) IS NULL THEN 1 ELSE 0 END AS is_session_startFROM events;A running sum over the is_session_start column then gives each event a session id — the same trick used in Gaps and Islands.
Default value tricks. The third argument of LAG / LEAD can be more than a literal — it can be the current row’s column, which is a clean way to express “same value if no neighbour”.
-- Time until next event, or 0 if last event in partitionSELECT user_id, event_at, LEAD(event_at, 1, event_at) OVER (PARTITION BY user_id ORDER BY event_at) - event_at AS time_to_nextFROM events;Common pitfalls#
Forgetting ORDER BY in OVER. LAG(col) OVER (PARTITION BY user_id) without ORDER BY is undefined — the engine may pick any previous row in the partition. Always specify the order, and add a tiebreaker column if the primary sort key is not unique.
PARTITION BY too coarse. A common bug: writing LAG(event_at) OVER (ORDER BY event_at) when you meant LAG(event_at) OVER (PARTITION BY user_id ORDER BY event_at). Without the partition, LAG crosses user boundaries — the first event of user B references the last event of user A.
Wrong — crosses user boundaries.
SELECT user_id, event_at, event_at - LAG(event_at) OVER (ORDER BY event_at) AS gapFROM events;The gap for user B’s first event is the time since user A’s last event — usually meaningless.
Right — partitioned by user.
SELECT user_id, event_at, event_at - LAG(event_at) OVER ( PARTITION BY user_id ORDER BY event_at ) AS gapFROM events;LAG is NULL for each user’s first event — exactly what you want.
The LAST_VALUE frame bug. This trap catches even experienced SQL developers. The default frame for a window with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — meaning LAST_VALUE returns the current row’s value, not the last row of the partition.
-- WRONG: returns current row's login_at, not the last per userSELECT user_id, login_at, LAST_VALUE(login_at) OVER (PARTITION BY user_id ORDER BY login_at) AS last_loginFROM logins;
-- RIGHT: explicit frame to reach the last rowSELECT user_id, login_at, LAST_VALUE(login_at) OVER ( PARTITION BY user_id ORDER BY login_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_loginFROM logins;FIRST_VALUE doesn’t have this problem because the default frame already starts at the partition’s first row. LAG and LEAD ignore the frame entirely — they navigate by offset, not by frame.
NULL propagation through equality. LAG(status) != status returns NULL (not TRUE) when the previous status was NULL or doesn’t exist. Use IS DISTINCT FROM (Postgres / SQL Server / SQLite) or NOT (a <=> b) (MySQL) for null-safe comparison.
Division by zero in pct-change. (curr - prev) / prev * 100 errors when prev = 0. Wrap with NULLIF(prev, 0) so the result is NULL rather than an error.
Performance — sort is the cost. LAG and LEAD require sorting the partition. On a billion-row events table partitioned by user_id and ordered by event_at, the sort dominates. A composite index on (user_id, event_at) lets the engine stream rows in order; without it, expect a Sort node above the WindowAgg. See query execution and plans.
Confusing offset direction. LAG looks backward (earlier row by sort order); LEAD looks forward. A quick rule: in ORDER BY day ASC, LAG(day) is yesterday, LEAD(day) is tomorrow. In ORDER BY day DESC, the meanings flip.
Using LAG/LEAD to bridge unrelated tables. If you need values from two different tables in the same row, you want a JOIN, not lead-lag — see joins. Lead-lag is purely within a single sorted partition.
Practice problem#
You have a login_events table tracking when each user logged into the product:
CREATE TABLE login_events ( user_id BIGINT NOT NULL, login_date DATE NOT NULL, PRIMARY KEY (user_id, login_date));Assume one row per (user_id, login_date) — at most one login record per day per user. Find the longest streak of consecutive days each user logged in, returning one row per user with user_id and longest_streak. A streak of 1 counts.
Solution
WITH marked AS ( SELECT user_id, login_date, CASE WHEN LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) = login_date - INTERVAL '1 day' THEN 0 ELSE 1 END AS is_streak_start FROM login_events),streak_ids AS ( SELECT user_id, login_date, SUM(is_streak_start) OVER (PARTITION BY user_id ORDER BY login_date) AS streak_id FROM marked),streak_lengths AS ( SELECT user_id, streak_id, COUNT(*) AS streak_len FROM streak_ids GROUP BY user_id, streak_id)SELECT user_id, MAX(streak_len) AS longest_streakFROM streak_lengthsGROUP BY user_idORDER BY user_id;LAG flags every row where the previous login wasn’t exactly one day earlier — those are streak starts. A running SUM over the start flags assigns each row a streak id (it increments only at boundaries). Group by (user_id, streak_id) to get streak lengths, then take the max per user. This is the canonical Gaps and Islands workflow, with LAG as the boundary detector.
Related patterns#
- Rank and Row — when you want an ordinal position rather than an adjacent value.
- Gaps and Islands — extends lead-lag with running sums to identify whole runs of consecutive rows.
- Window functions — the building block; covers
FIRST_VALUE,LAST_VALUE,NTH_VALUE, and frame semantics in depth. - Subqueries and CTEs — multi-step lead-lag queries chain naturally through CTEs.
- Aggregates, GROUP BY, HAVING — the final step in streak queries: aggregate per streak id.