JOIN — INNER, LEFT, RIGHT, FULL, CROSS — DBMS

What it is#

A JOIN combines rows from two relations based on a predicate, producing a new relation whose schema is the concatenation of the inputs’ schemas. SQL ships five flavors:

INNER JOIN — only rows where the predicate matches in both sides.
LEFT [OUTER] JOIN — every left-side row; right-side columns NULL when no match.
RIGHT [OUTER] JOIN — symmetric mirror of LEFT. Rarely seen; people swap the operand order and use LEFT.
FULL [OUTER] JOIN — every row from both sides; NULLs on whichever side has no match.
CROSS JOIN — Cartesian product; no predicate. Every left row paired with every right row.

Algebraically, INNER is a theta join (σ on top of ×), CROSS is the bare product, and the outer joins are extensions that pad unmatched tuples with NULL — they require the language to have NULL semantics, which is why pure relational algebra doesn’t include them natively.

When to use it#

INNER JOIN — the default for connecting tables when you only want rows that have a match on both sides. “Orders and their users” — exclude orders without a known user (shouldn’t happen, but the join enforces it).
LEFT JOIN — when one side is the “main” relation and the other is “optional information about it”. “Users and their last order (if any)”. Anti-joins (LEFT JOIN ... WHERE other.id IS NULL) find “rows in A with no matching row in B”.
FULL OUTER JOIN — reconciling two datasets where either side may be missing rows. ETL diff jobs, comparing yesterday’s snapshot to today’s, finding rows that exist in only one of two systems.
CROSS JOIN — explicit when you genuinely want the Cartesian product. Generating combinations (every product × every region for a planning matrix), generate_series cross-multiplied, calendar tables for reporting.
Self-join — joining a table to itself with a different alias. Hierarchies (employees e JOIN employees m ON e.manager_id = m.id), comparing rows pairwise.

Avoid implicit cross joins (a comma in FROM with no join predicate in WHERE) — they look like a missing condition and produce silently catastrophic result sizes.

How it works#

INNER JOIN#

SELECT u.id, u.email, o.id AS order_id, o.total_cents
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE o.created_at >= now() - INTERVAL '7 days';

A row appears in the result iff there’s at least one match on each side. A user with no orders is dropped. A user with three orders appears three times. Inner joins are the default; INNER keyword is optional but recommended for clarity.

LEFT OUTER JOIN#

SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.email;

Every user appears at least once. Users with no orders show order_count = 0 because COUNT(o.id) excludes NULL — the NULL-padding on the unmatched right side becomes a 0 count, which is usually what you want.

Anti-join — “users with no orders”:

SELECT u.id, u.email
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;

This is the canonical “find rows in A with no match in B” idiom. The alternative WHERE NOT EXISTS (...) is equivalent and often planner-equivalent too.

FULL OUTER JOIN#

SELECT
  COALESCE(a.id, b.id) AS id,
  a.value AS value_a,
  b.value AS value_b
FROM source_a a
FULL OUTER JOIN source_b b ON a.id = b.id
WHERE a.id IS NULL OR b.id IS NULL OR a.value <> b.value;

Rows in only A, only B, or both with differing values. The reconciliation classic.

CROSS JOIN#

-- Generate one row per (region, day) combination for a reporting matrix:
SELECT r.name AS region, d.day
FROM regions r
CROSS JOIN generate_series('2026-01-01'::date, '2026-12-31'::date, INTERVAL '1 day') d(day);

CROSS JOIN is the right tool when you genuinely want every combination. Always spell it out; never let it happen by accident.

Self-join#

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

The two aliases (e and m) are essential — without rename, the column names collide.

`ON` vs `WHERE` with outer joins#

A predicate in ON is part of the join condition; a predicate in WHERE is applied after the join. With INNER JOIN they’re interchangeable. With LEFT JOIN they’re not:

-- Returns every user, with `o.id` NULL for users with no recent orders:
SELECT u.id, o.id FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.created_at >= now() - INTERVAL '7 days';

-- Returns only users who DO have a recent order — silently turns LEFT JOIN into INNER:
SELECT u.id, o.id FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.created_at >= now() - INTERVAL '7 days';

In the second form, NULL-padded rows are filtered out by the WHERE because NULL >= ... is UNKNOWN.

Join algorithms#

The planner picks one of three physical join algorithms per join:

Nested Loop Join — for each outer row, look up matching inner rows (usually via an index on the inner side). Best when one side is small or the inner side has a great index.
Hash Join — build a hash table on one input, probe it with the other. Best for large unindexed equi-joins.
Merge Join — sort both inputs by the join key, walk in lockstep. Best when both sides are already sorted (e.g., index-ordered scans on the same key).

EXPLAIN shows the chosen algorithm. A query that’s slow because of the wrong join algorithm is usually a sign of stale statistics — ANALYZE first.

Variants#

USING (col) — shorthand for ON a.col = b.col when both sides share a column name. Coalesces the column in the output.
NATURAL JOIN — joins on every shared column name. Avoid: implicit, breaks when someone adds a column to either side.
LATERAL JOIN (Postgres) — lets the right side reference columns from the left side. Replaces correlated subquery patterns: LATERAL (SELECT ... FROM ... WHERE other.fk = outer.id LIMIT 5) for “top 5 per group”.
Semi-join / anti-join — not separate SQL keywords; expressed via EXISTS / NOT EXISTS or LEFT JOIN ... IS NULL. The planner often rewrites between them.
STRAIGHT_JOIN (MySQL) — forces join order; planner override.

Trade-offs#

INNER JOIN — small result, planner-friendly, the default. Drops orphans cleanly. Use when the relationship is mandatory in both directions and you only want matched rows. Can hide data integrity issues (orphaned rows just silently disappear from results).

LEFT OUTER JOIN — preserves left side; right side is “optional”. Use for “main relation plus optional info”. Slightly more expensive (NULL-padded rows in the output), and the temptation to slap WHERE on the right side silently turns it into an INNER JOIN.

Other practical considerations:

Join order is the planner’s call — for INNER joins, the planner picks the order based on cost. For outer joins, the order is semantically meaningful (LEFT JOIN B then LEFT JOIN C is different from LEFT JOIN C then LEFT JOIN B in general).
Composite join conditions — multi-column joins are fine and common; index the columns together if the join is hot.
IN (subquery) vs JOIN — the planner often rewrites between them. Logically, IN matches first occurrence and doesn’t multiply rows; JOIN can multiply rows if there are duplicates on the joined side. Choose by intent.
Cardinality estimation — the planner needs to estimate how many rows will match. Bad estimates lead to nested loop where hash join would win, or vice versa. Up-to-date statistics matter.

Common pitfalls#

WHERE on the outer side of a LEFT JOIN turning it into an INNER JOIN. Put the right-table filter in ON, not WHERE, if you want to preserve unmatched left rows.
Multiplying rows. Joining users to orders to order_items returns one row per (user, order, item) triple. Counts and sums over user attributes get multiplied; aggregate with DISTINCT or use a CTE to pre-aggregate one side.
Missing join condition. FROM a, b, c with WHERE a.id = b.id and no b.id = c.id is a CROSS JOIN with c. Always prefer explicit JOIN ... ON.
NULL = NULL not matching. Joining on a nullable column where both sides may be NULL drops those rows because NULL = NULL is UNKNOWN. Use IS NOT DISTINCT FROM or coalesce both sides.
Self-join without distinct aliases. Column references become ambiguous, the query errors or worse — picks the wrong column silently.
RIGHT JOIN for left-side filtering. It works, but readers parse LEFT JOIN faster. Swap operands and use LEFT.
SELECT * in a multi-table join. Returns every column from every table, with name collisions. The ORM-style SELECT users.*, orders.id AS order_id pattern is much cleaner.

The mental model that prevents most join bugs

Visualize the join as a result set, not as a control flow. For INNER JOIN, the result is the matching pairs. For LEFT JOIN, the result is every left row paired with its matches (or with NULL padding). For FULL OUTER, the result is the union of LEFT and RIGHT outer joins. Then ask: which rows do you want in the result? If only matched rows, INNER. If every row of the main table, LEFT. If every row of both, FULL. Predicates in ON shape what counts as a match; predicates in WHERE filter the post-join result. The order matters for outer joins, never for inner.