SQL DML — INSERT, UPDATE, DELETE, MERGE
Mutating data. RETURNING, UPSERT / ON CONFLICT, the things to know about row-locking on writes.
What it is#
DML is the data-manipulation subset of SQL — the statements that change rows rather than schema. Four verbs cover the territory: INSERT adds rows, UPDATE modifies existing rows, DELETE removes rows, and MERGE combines insert/update/delete based on a match condition (the SQL standard’s UPSERT, plus more).
The interesting parts are the modifiers that have accumulated around these four. RETURNING lets a write statement also act as a read. ON CONFLICT (Postgres) and ON DUPLICATE KEY UPDATE (MySQL) turn INSERT into a single-statement upsert. Row-level locking via SELECT ... FOR UPDATE is technically DQL but lives in the same conversation. Every shipping system has its own dialect; the standard captures only the lowest common denominator.
When to use it#
INSERT— adding new rows. Single-row or batched. Use batched inserts (INSERT ... VALUES (...), (...), (...)orINSERT ... SELECT) for bulk ingestion — round-trip overhead dominates per-row inserts.UPDATE— modifying existing rows. Always with aWHEREclause; an unboundedUPDATEis the canonical “oh no” incident.DELETE— removing rows. SameWHEREdiscipline. Soft-delete (setdeleted_at) is often safer than hard-delete and lets you reverse mistakes.MERGE/INSERT ... ON CONFLICT— idempotent writes (“create this row, or update it if it already exists”). The right tool for ingestion of records that may arrive multiple times, upsert-style sync from external systems, and any retry-safe write path.SELECT ... FOR UPDATE— pessimistic row lock. Read a row with the intent to update it later in the same transaction; the engine blocks other writers from touching it until you commit.
Avoid unbatched single-row inserts in tight loops, DELETE without a WHERE, and the temptation to write your own upsert via “SELECT then INSERT-or-UPDATE” — that pattern is racy under concurrency.
How it works#
INSERT#
-- Single rowINSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice');
-- Multi-row (one statement, one transaction)INSERT INTO users (email, name) VALUES ('bob@example.com', 'Bob'), ('carol@example.com','Carol'), ('dave@example.com', 'Dave');
-- From a SELECTINSERT INTO archived_orders (id, total_cents, archived_at)SELECT id, total_cents, now()FROM ordersWHERE created_at < '2024-01-01';
-- With RETURNING (Postgres, SQLite 3.35+)INSERT INTO users (email) VALUES ('eve@example.com')RETURNING id, created_at;RETURNING is one of the most useful Postgres-ism in production code: you avoid a separate SELECT round-trip to get the generated id after insert. MySQL has LAST_INSERT_ID() as a less-flexible alternative.
UPDATE#
UPDATE usersSET status = 'suspended', updated_at = now()WHERE id = 42;
-- With a join (UPDATE ... FROM in Postgres)UPDATE orders oSET region = u.regionFROM users uWHERE o.user_id = u.id AND o.region IS NULL;
-- Returning the updated rowsUPDATE inventorySET quantity = quantity - 1WHERE sku = 'WIDGET-1' AND quantity > 0RETURNING quantity;UPDATE ... RETURNING is especially powerful when combined with a conditional WHERE: it lets you atomically decrement, check the result, and abort if the precondition wasn’t met — all in one round trip with no race.
DELETE#
DELETE FROM sessions WHERE expires_at < now() - INTERVAL '30 days';
-- With RETURNING for an audit trailDELETE FROM users WHERE id = 42RETURNING id, email;
-- Bulk delete via subqueryDELETE FROM logsWHERE id IN ( SELECT id FROM logs WHERE created_at < now() - INTERVAL '90 days' LIMIT 10000);Bulk delete on a hot table is best done in batches — a single 10-million-row DELETE holds locks and bloats the WAL. Loop the batched version above until it returns zero rows.
Upsert: ON CONFLICT and MERGE#
The Postgres-style upsert (and the SQL standard’s MERGE, which Postgres also supports as of 15):
-- Postgres-style: idiomatic, fastINSERT INTO daily_metrics (day, metric, value)VALUES ('2026-05-24', 'signups', 42)ON CONFLICT (day, metric)DO UPDATE SET value = EXCLUDED.value + daily_metrics.value;
-- MySQL equivalentINSERT INTO daily_metrics (day, metric, value)VALUES ('2026-05-24', 'signups', 42)ON DUPLICATE KEY UPDATE value = VALUES(value) + value;
-- SQL standard MERGE (Postgres 15+, SQL Server, Oracle, MySQL 8.0.31+)MERGE INTO daily_metrics tUSING (VALUES ('2026-05-24', 'signups', 42)) AS s(day, metric, value)ON t.day = s.day AND t.metric = s.metricWHEN MATCHED THEN UPDATE SET value = t.value + s.valueWHEN NOT MATCHED THEN INSERT (day, metric, value) VALUES (s.day, s.metric, s.value);The EXCLUDED pseudo-table in Postgres’s ON CONFLICT refers to the row that would have been inserted, letting the update clause compose insert-and-update values.
Row locking#
BEGIN;SELECT * FROM accounts WHERE id = 1 FOR UPDATE; -- row locked until commitUPDATE accounts SET balance = balance - 100 WHERE id = 1;COMMIT;FOR UPDATE acquires an exclusive row lock. FOR SHARE acquires a shared lock (other readers OK, no writers). FOR UPDATE SKIP LOCKED skips rows another transaction has already locked — the classic recipe for queue-style workloads:
-- Worker pulls one available jobSELECT id, payload FROM jobsWHERE status = 'pending'ORDER BY priority DESC, idLIMIT 1FOR UPDATE SKIP LOCKED;Variants#
INSERT ... ON CONFLICT DO NOTHING— silent insert-if-not-exists. Useful for idempotent ingestion where the existing row’s content doesn’t matter.INSERT ... ON CONFLICT (col) WHERE pred DO UPDATE— conflict scoped to a partial unique index. Powerful and underused.UPDATE ... FROM/DELETE ... USING(Postgres) — join in additional tables for the WHERE clause without a subquery.COPY(Postgres) /LOAD DATA INFILE(MySQL) — bulk-load path that bypasses the row-by-row INSERT machinery. 10-100x faster for big imports.- CTE-based DML —
WITH x AS (DELETE FROM ... RETURNING *) INSERT INTO archive SELECT * FROM xmoves rows between tables in a single statement, atomically. TRUNCATEinstead ofDELETEfor clearing a whole table — DDL-fast but takes ACCESS EXCLUSIVE lock and doesn’t fire row-level triggers.
Trade-offs#
INSERT ... ON CONFLICT (single statement upsert) — atomic, no race, one round trip, leverages the unique index the engine already maintains. The right default for idempotent writes. Limited to a single conflict target; complex multi-key upserts need MERGE or per-key logic. Other practical considerations:
- Batching —
INSERT ... VALUES (...), (...), (...)is dramatically faster than N single-row inserts because of per-statement parse/plan/network overhead. Batch sizes of 500-5000 rows typically hit the sweet spot. - Soft delete vs hard delete — soft delete (
UPDATE ... SET deleted_at = now()) keeps the row recoverable and avoids cascading FK action; hard delete reclaims space but is irreversible without a backup. UPDATErewriting the whole row (Postgres MVCC) — every UPDATE creates a new tuple version; on a heavily-updated row this generates bloat. HOT (heap-only-tuple) optimisation kicks in when the updated column isn’t indexed and the new tuple fits in the same page.- Trigger amplification — every INSERT/UPDATE/DELETE may fire BEFORE and AFTER triggers, statement-level and row-level. A “simple” bulk update can run many million trigger invocations.
Common pitfalls#
UPDATE/DELETEwithoutWHERE. The canonical disaster. Most teams add a check (Postgres’s--single-transactionthen preview-and-confirm, or aLIMITclause in MySQL) to make this hard.- Forgetting
updated_at. Most teams maintain it via trigger; some via application code. Inconsistency between writers is the usual bug. - Racy SELECT-then-INSERT. Two workers see “no row” simultaneously, both insert, one wins, the other gets a unique violation. Fix:
ON CONFLICTorSELECT ... FOR UPDATEplus the insert in the same transaction. - Bulk inserts inside a giant transaction. Holds locks for the duration, balloons the WAL, blocks autovacuum on the target table. Batch into transactions of thousands-to-tens-of-thousands of rows.
DELETEcascading further than expected.ON DELETE CASCADEpropagates through FK chains. A single delete can trigger million-row cascades — measure before you commit.RETURNINGoverheads. Returning many rows from a bulk DML doubles the network footprint. UseRETURNINGfor IDs and small confirmations, not for full row dumps.- Locking the same row twice in one transaction.
SELECT ... FOR UPDATEis reentrant within a transaction; problems happen when two transactions lock rows in different orders and deadlock. Always lock in a consistent order (e.g., byid ASC).
The upsert pattern most production code converges on
For idempotent ingestion (events, metrics, sync), use INSERT ... ON CONFLICT (...) DO UPDATE SET col = EXCLUDED.col. For the queue pattern, use SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1 to pull a job and UPDATE jobs SET status = 'done' WHERE id = ? to mark it complete, all in one transaction. For decrement-with-precondition (inventory, budget), use UPDATE ... SET col = col - 1 WHERE id = ? AND col > 0 RETURNING col and check the row count — zero means the precondition failed, no race, no separate SELECT. These three patterns cover most of what people write custom locking code for.
Related building blocks#