Existence Check
EXISTS vs NOT EXISTS vs IN vs LEFT JOIN IS NULL — four ways to say where this related row does or does not exist.
What it is#
The existence check pattern asks: for each row in table A, does a related row exist (or not exist) in table B? SQL gives four ways to say this, and they are not interchangeable — they differ in NULL handling, performance, and readability.
-- Users who have placed at least one order
-- 1. EXISTS (correlated subquery)SELECT u.* FROM users uWHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- 2. IN (subquery)SELECT u.* FROM users uWHERE u.id IN (SELECT user_id FROM orders);
-- 3. LEFT JOIN ... IS NOT NULLSELECT u.* FROM users uLEFT JOIN orders o ON o.user_id = u.idWHERE o.user_id IS NOT NULL;
-- 4. INNER JOIN with DISTINCTSELECT DISTINCT u.* FROM users uINNER JOIN orders o ON o.user_id = u.id;For the affirmative case (“users with orders”), all four return the same rows on a NULL-free schema and modern optimisers often produce identical plans. The interesting case is the negative form (“users without orders”) — where NOT IN will silently return zero rows if the subquery contains any NULL, and the other three are safer.
Picking the right form is a frequent interview discussion point and a real correctness hazard in production.
When to use it#
Use an existence check when:
- The question is “does at least one matching row exist?” — not “how many” or “what value.”
- You want to filter parent rows by the presence or absence of child rows.
- You don’t need any columns from the child table in the output.
Use a regular JOIN instead when:
- You need columns from the child table.
- You want one row per parent–child match, not one row per parent.
Use Tally Count instead when you actually need the count of child rows. Use a HAVING COUNT(*) > N when the threshold is > N, not just “at least one.”
How it works#
EXISTS — semi-join, short-circuits, NULL-safe#
EXISTS (subquery) returns TRUE if the subquery returns one or more rows, FALSE if it returns zero. The values inside the subquery don’t matter — SELECT 1, SELECT *, SELECT user_id all behave the same.
SELECT u.* FROM users uWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id);This is a semi-join: every user appears at most once in the output, regardless of how many matching orders exist. It is the cleanest and safest form, and modern optimisers (Postgres, SQL Server 2017+, Oracle, MySQL 8.0+) translate it to a hash-semi-join or nested-loop-semi-join automatically.
EXISTS is NULL-safe: NULL user_id values in orders simply don’t match any user, and NULL id in users doesn’t match anything either — no silent zero-row trap.
NOT EXISTS — anti-join, also NULL-safe#
-- Users who have NEVER placed an orderSELECT u.* FROM users uWHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id);NOT EXISTS is an anti-join: every user with zero matching orders. NULL-safe in the same way EXISTS is. This is the recommended form for “missing” queries.
IN — concise for static lists, NULL-trap for subqueries#
-- Users in a known set (no NULL risk)SELECT * FROM users WHERE id IN (1, 2, 3, 4);
-- Users with orders (subquery — works when no NULLs)SELECT * FROM users WHERE id IN (SELECT user_id FROM orders);For the affirmative case, IN is fine and concise. Modern optimisers will rewrite IN (subquery) into a semi-join — identical to EXISTS.
The trap is the negative case with a nullable column.
-- DANGER: if any order has user_id IS NULL, this returns nothingSELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);The semantics are three-valued logic: x NOT IN (a, b, NULL) evaluates to x <> a AND x <> b AND x <> NULL — and x <> NULL is NULL, not TRUE. The whole AND chain becomes NULL, which is treated as FALSE by WHERE. Result: every row is filtered out.
Workarounds (if you must use NOT IN):
-- Filter NULLs from the subquerySELECT * FROM usersWHERE id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL);But NOT EXISTS is simpler, safer, and identically fast on modern planners. Just use it.
LEFT JOIN … IS NULL — anti-join via joins#
-- Users with no ordersSELECT u.* FROM users uLEFT JOIN orders o ON o.user_id = u.idWHERE o.user_id IS NULL;This expresses the anti-join via a LEFT JOIN: every user is joined to their matching orders; users with no matches get a single row with all orders columns NULL; the WHERE o.user_id IS NULL filter keeps only those.
It is NULL-safe for the join condition (NULL never equals anything, so unmatched rows correctly come through), but the choice of column for the IS NULL check matters: pick a NOT NULL column of the right side (typically the primary key or join key), not an arbitrary column that could legitimately be NULL.
-- Wrong if orders.notes can be NULL: matches users WITH orders that have null notesWHERE o.notes IS NULL
-- Right: orders.id is NOT NULL when the join matchedWHERE o.id IS NULLPerformance: when does each win?#
On modern optimisers (Postgres 12+, SQL Server 2017+, Oracle, BigQuery, Snowflake), EXISTS, IN, and LEFT JOIN ... IS NULL for an affirmative or negative semi/anti-join are typically planned identically — a hash-semi-join or hash-anti-join with the same cost.
Edge cases where the planner picks differently:
- Large outer, small inner: nested-loop-semi-join with the inner table indexed is cheapest.
EXISTSmakes this most obvious to the planner. - Small outer, large inner: hash-anti-join with the outer hashed. All three forms should plan this way, but
LEFT JOIN ... IS NULLsometimes loses materialisation opportunities on older versions of MySQL. - Subquery returns very few distinct values:
IN (subquery)can hash the inner set and probe the outer — sometimes faster thanEXISTSif the inner is much smaller than the outer.
When in doubt, EXPLAIN the query. See Query execution and plans for how to read the output.
EXPLAIN ANALYZESELECT u.* FROM users uWHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);Variants#
Existence with extra conditions in the subquery.
-- Users with at least one paid order in the last 30 daysSELECT u.* FROM users uWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'paid' AND o.created_at >= NOW() - INTERVAL '30 days');Cleaner than the equivalent join + DISTINCT.
Existence in multiple tables (OR EXISTS).
-- Users with an order OR a reviewSELECT u.* FROM users uWHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id) OR EXISTS (SELECT 1 FROM reviews r WHERE r.user_id = u.id);The two existence checks can be planned independently and combined.
Existence with ANY / SOME / ALL.
-- ANY/SOME are aliases — equivalent to INSELECT * FROM users WHERE id = ANY (SELECT user_id FROM orders);
-- ALL means: u.amount is greater than every value in the subquerySELECT * FROM orders WHERE amount > ALL (SELECT amount FROM refunds);ANY and IN are interchangeable but IN is more idiomatic. ALL and <> ALL have NULL-handling quirks similar to NOT IN — prefer NOT EXISTS for “greater than the max” style queries.
Existence with INTERSECT / EXCEPT.
-- Users who appear in BOTH tables (intersect)SELECT id FROM usersINTERSECTSELECT user_id FROM orders;
-- Users in the first table but NOT the second (except / anti-join)SELECT id FROM usersEXCEPTSELECT user_id FROM orders;EXCEPT is NULL-safe the way NOT EXISTS is — it doesn’t have the NOT IN trap. MySQL added EXCEPT in 8.0.31; before that, use NOT EXISTS.
Common pitfalls#
The NOT IN + NULL trap (covered above). Default to NOT EXISTS for any “missing” query. The performance is equivalent; the correctness is not.
Forgetting the correlation in EXISTS. A subquery inside EXISTS without a reference to the outer table is TRUE whenever the subquery returns any row at all — usually a bug.
-- BUG: EXISTS is TRUE for every user as long as ANY order exists in the tableSELECT u.* FROM users uWHERE EXISTS (SELECT 1 FROM orders);
-- Correct: correlate on user_idSELECT u.* FROM users uWHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);Picking the wrong column for IS NULL in the join form.
-- BUG: orders.amount can legitimately be NULLSELECT u.* FROM users uLEFT JOIN orders o ON o.user_id = u.idWHERE o.amount IS NULL;-- Matches users with NO orders AND users with orders that had a NULL amountUse the primary key or the join key — they’re guaranteed non-null on a successful match.
INNER JOIN with DISTINCT instead of EXISTS. It works, but DISTINCT is more expensive than a semi-join — the engine has to materialise all joined rows and then dedupe.
Semi-join — efficient:
SELECT u.*FROM users uWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id);Stops scanning orders as soon as one match is found per user. Hash-semi-join is O(n + m).
INNER JOIN + DISTINCT — wasteful:
SELECT DISTINCT u.*FROM users uINNER JOIN orders o ON o.user_id = u.id;Materialises every user × matching-order row, then deduplicates by all columns of users. Becomes O(n × k) where k is the average orders per user, plus a sort/hash for DISTINCT.
Subquery rewrite quirks on older engines. MySQL 5.6 famously executed IN (subquery) as a correlated subquery — re-running the inner query per outer row, O(n × m). MySQL 5.7 fixed this with semi-join optimisation; 8.0 is faster still. If you’re stuck on an old engine, rewriting IN (subquery) to a JOIN can produce a 100x speedup. On modern engines, the planner handles it.
Mixing existence with aggregation. WHERE EXISTS (... HAVING COUNT(*) > 5 ...) is almost right but slow — the subquery materialises the count. Prefer WHERE id IN (SELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5).
Practice problem#
You have a users table with 100M rows and an orders table with 500M rows:
CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, email TEXT NOT NULL);CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, user_id BIGINT REFERENCES users(id), -- nullable: anonymous orders allowed amount NUMERIC NOT NULL);CREATE INDEX ON orders (user_id);Find every user who has never placed an order. Show three correct queries, then identify which one is safest for production given that orders.user_id is nullable.
Solution
-- 1. NOT EXISTS (NULL-safe, recommended)SELECT u.* FROM users uWHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- 2. LEFT JOIN ... IS NULL (also NULL-safe, sometimes faster on large outer sets)SELECT u.* FROM users uLEFT JOIN orders o ON o.user_id = u.idWHERE o.id IS NULL;
-- 3. NOT IN (UNSAFE here — orders.user_id is nullable)SELECT * FROM usersWHERE id NOT IN (SELECT user_id FROM orders);-- Returns 0 rows if any order has user_id IS NULL.Use form (1) NOT EXISTS for production. It is NULL-safe, the optimiser plans it as a hash-anti-join, and on a 100M × 500M dataset with the index on orders(user_id) the cost is the same as form (2). Form (3) is incorrect for this schema because orders.user_id is nullable; even if the data happens to have no NULLs today, a single future row with user_id IS NULL will silently break the report.
Related patterns#
- Joins (inner, outer, cross) — the LEFT JOIN form of the anti-join.
- Subqueries and CTEs — the building block for
EXISTSandINsubqueries. - Tally Count — when “at least one” should become “how many.”
- Query execution and plans — how to use
EXPLAINto verify the optimiser picked the semi-join. - SELECT and the WHERE clause — the three-valued NULL logic that drives the
NOT INtrap.