JOIN — INNER, LEFT, RIGHT, FULL, CROSS
Connecting tables. When LEFT JOIN is the right tool, what FULL OUTER is for, and how CROSS JOIN should be feared.
What it is#
A JOIN combines rows from two relations based on a predicate, producing a new relation whose schema is the concatenation of the inputs’ schemas. SQL ships five flavors:
INNER JOIN— only rows where the predicate matches in both sides.LEFT [OUTER] JOIN— every left-side row; right-side columns NULL when no match.RIGHT [OUTER] JOIN— symmetric mirror of LEFT. Rarely seen; people swap the operand order and use LEFT.FULL [OUTER] JOIN— every row from both sides; NULLs on whichever side has no match.CROSS JOIN— Cartesian product; no predicate. Every left row paired with every right row.
Algebraically, INNER is a theta join (σ on top of ×), CROSS is the bare product, and the outer joins are extensions that pad unmatched tuples with NULL — they require the language to have NULL semantics, which is why pure relational algebra doesn’t include them natively.
When to use it#
INNER JOIN— the default for connecting tables when you only want rows that have a match on both sides. “Orders and their users” — exclude orders without a known user (shouldn’t happen, but the join enforces it).LEFT JOIN— when one side is the “main” relation and the other is “optional information about it”. “Users and their last order (if any)”. Anti-joins (LEFT JOIN ... WHERE other.id IS NULL) find “rows in A with no matching row in B”.FULL OUTER JOIN— reconciling two datasets where either side may be missing rows. ETL diff jobs, comparing yesterday’s snapshot to today’s, finding rows that exist in only one of two systems.CROSS JOIN— explicit when you genuinely want the Cartesian product. Generating combinations (every product × every region for a planning matrix), generate_series cross-multiplied, calendar tables for reporting.- Self-join — joining a table to itself with a different alias. Hierarchies (
employees e JOIN employees m ON e.manager_id = m.id), comparing rows pairwise.
Avoid implicit cross joins (a comma in FROM with no join predicate in WHERE) — they look like a missing condition and produce silently catastrophic result sizes.
How it works#
INNER JOIN#
SELECT u.id, u.email, o.id AS order_id, o.total_centsFROM users uINNER JOIN orders o ON o.user_id = u.idWHERE o.created_at >= now() - INTERVAL '7 days';A row appears in the result iff there’s at least one match on each side. A user with no orders is dropped. A user with three orders appears three times. Inner joins are the default; INNER keyword is optional but recommended for clarity.
LEFT OUTER JOIN#
SELECT u.id, u.email, COUNT(o.id) AS order_countFROM users uLEFT JOIN orders o ON o.user_id = u.idGROUP BY u.id, u.email;Every user appears at least once. Users with no orders show order_count = 0 because COUNT(o.id) excludes NULL — the NULL-padding on the unmatched right side becomes a 0 count, which is usually what you want.
Anti-join — “users with no orders”:
SELECT u.id, u.emailFROM users uLEFT JOIN orders o ON o.user_id = u.idWHERE o.id IS NULL;This is the canonical “find rows in A with no match in B” idiom. The alternative WHERE NOT EXISTS (...) is equivalent and often planner-equivalent too.
FULL OUTER JOIN#
SELECT COALESCE(a.id, b.id) AS id, a.value AS value_a, b.value AS value_bFROM source_a aFULL OUTER JOIN source_b b ON a.id = b.idWHERE a.id IS NULL OR b.id IS NULL OR a.value <> b.value;Rows in only A, only B, or both with differing values. The reconciliation classic.
CROSS JOIN#
-- Generate one row per (region, day) combination for a reporting matrix:SELECT r.name AS region, d.dayFROM regions rCROSS JOIN generate_series('2026-01-01'::date, '2026-12-31'::date, INTERVAL '1 day') d(day);CROSS JOIN is the right tool when you genuinely want every combination. Always spell it out; never let it happen by accident.
Self-join#
SELECT e.name AS employee, m.name AS managerFROM employees eLEFT JOIN employees m ON e.manager_id = m.id;The two aliases (e and m) are essential — without rename, the column names collide.
ON vs WHERE with outer joins#
A predicate in ON is part of the join condition; a predicate in WHERE is applied after the join. With INNER JOIN they’re interchangeable. With LEFT JOIN they’re not:
-- Returns every user, with `o.id` NULL for users with no recent orders:SELECT u.id, o.id FROM users uLEFT JOIN orders o ON o.user_id = u.id AND o.created_at >= now() - INTERVAL '7 days';
-- Returns only users who DO have a recent order — silently turns LEFT JOIN into INNER:SELECT u.id, o.id FROM users uLEFT JOIN orders o ON o.user_id = u.idWHERE o.created_at >= now() - INTERVAL '7 days';In the second form, NULL-padded rows are filtered out by the WHERE because NULL >= ... is UNKNOWN.
Join algorithms#
The planner picks one of three physical join algorithms per join:
- Nested Loop Join — for each outer row, look up matching inner rows (usually via an index on the inner side). Best when one side is small or the inner side has a great index.
- Hash Join — build a hash table on one input, probe it with the other. Best for large unindexed equi-joins.
- Merge Join — sort both inputs by the join key, walk in lockstep. Best when both sides are already sorted (e.g., index-ordered scans on the same key).
EXPLAIN shows the chosen algorithm. A query that’s slow because of the wrong join algorithm is usually a sign of stale statistics — ANALYZE first.
Variants#
USING (col)— shorthand forON a.col = b.colwhen both sides share a column name. Coalesces the column in the output.NATURAL JOIN— joins on every shared column name. Avoid: implicit, breaks when someone adds a column to either side.LATERAL JOIN(Postgres) — lets the right side reference columns from the left side. Replaces correlated subquery patterns:LATERAL (SELECT ... FROM ... WHERE other.fk = outer.id LIMIT 5)for “top 5 per group”.- Semi-join / anti-join — not separate SQL keywords; expressed via
EXISTS/NOT EXISTSorLEFT JOIN ... IS NULL. The planner often rewrites between them. STRAIGHT_JOIN(MySQL) — forces join order; planner override.
Trade-offs#
Other practical considerations:
- Join order is the planner’s call — for INNER joins, the planner picks the order based on cost. For outer joins, the order is semantically meaningful (LEFT JOIN B then LEFT JOIN C is different from LEFT JOIN C then LEFT JOIN B in general).
- Composite join conditions — multi-column joins are fine and common; index the columns together if the join is hot.
IN (subquery)vsJOIN— the planner often rewrites between them. Logically, IN matches first occurrence and doesn’t multiply rows; JOIN can multiply rows if there are duplicates on the joined side. Choose by intent.- Cardinality estimation — the planner needs to estimate how many rows will match. Bad estimates lead to nested loop where hash join would win, or vice versa. Up-to-date statistics matter.
Common pitfalls#
WHEREon the outer side of a LEFT JOIN turning it into an INNER JOIN. Put the right-table filter inON, notWHERE, if you want to preserve unmatched left rows.- Multiplying rows. Joining
userstoorderstoorder_itemsreturns one row per(user, order, item)triple. Counts and sums over user attributes get multiplied; aggregate withDISTINCTor use a CTE to pre-aggregate one side. - Missing join condition.
FROM a, b, cwithWHERE a.id = b.idand nob.id = c.idis a CROSS JOIN withc. Always prefer explicitJOIN ... ON. - NULL = NULL not matching. Joining on a nullable column where both sides may be NULL drops those rows because
NULL = NULLis UNKNOWN. UseIS NOT DISTINCT FROMor coalesce both sides. - Self-join without distinct aliases. Column references become ambiguous, the query errors or worse — picks the wrong column silently.
RIGHT JOINfor left-side filtering. It works, but readers parseLEFT JOINfaster. Swap operands and use LEFT.SELECT *in a multi-table join. Returns every column from every table, with name collisions. The ORM-styleSELECT users.*, orders.id AS order_idpattern is much cleaner.
The mental model that prevents most join bugs
Visualize the join as a result set, not as a control flow. For INNER JOIN, the result is the matching pairs. For LEFT JOIN, the result is every left row paired with its matches (or with NULL padding). For FULL OUTER, the result is the union of LEFT and RIGHT outer joins. Then ask: which rows do you want in the result? If only matched rows, INNER. If every row of the main table, LEFT. If every row of both, FULL. Predicates in ON shape what counts as a match; predicates in WHERE filter the post-join result. The order matters for outer joins, never for inner.
Related building blocks#