DBMS SQL

SQL DML — INSERT, UPDATE, DELETE, MERGE

Mutating data. RETURNING, UPSERT / ON CONFLICT, the things to know about row-locking on writes.

Building Block Foundational
8 min read
dml insert update delete upsert

What it is#

DML is the data-manipulation subset of SQL — the statements that change rows rather than schema. Four verbs cover the territory: INSERT adds rows, UPDATE modifies existing rows, DELETE removes rows, and MERGE combines insert/update/delete based on a match condition (the SQL standard’s UPSERT, plus more).

The interesting parts are the modifiers that have accumulated around these four. RETURNING lets a write statement also act as a read. ON CONFLICT (Postgres) and ON DUPLICATE KEY UPDATE (MySQL) turn INSERT into a single-statement upsert. Row-level locking via SELECT ... FOR UPDATE is technically DQL but lives in the same conversation. Every shipping system has its own dialect; the standard captures only the lowest common denominator.

When to use it#

  • INSERT — adding new rows. Single-row or batched. Use batched inserts (INSERT ... VALUES (...), (...), (...) or INSERT ... SELECT) for bulk ingestion — round-trip overhead dominates per-row inserts.
  • UPDATE — modifying existing rows. Always with a WHERE clause; an unbounded UPDATE is the canonical “oh no” incident.
  • DELETE — removing rows. Same WHERE discipline. Soft-delete (set deleted_at) is often safer than hard-delete and lets you reverse mistakes.
  • MERGE / INSERT ... ON CONFLICT — idempotent writes (“create this row, or update it if it already exists”). The right tool for ingestion of records that may arrive multiple times, upsert-style sync from external systems, and any retry-safe write path.
  • SELECT ... FOR UPDATE — pessimistic row lock. Read a row with the intent to update it later in the same transaction; the engine blocks other writers from touching it until you commit.

Avoid unbatched single-row inserts in tight loops, DELETE without a WHERE, and the temptation to write your own upsert via “SELECT then INSERT-or-UPDATE” — that pattern is racy under concurrency.

How it works#

INSERT#

-- Single row
INSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice');
-- Multi-row (one statement, one transaction)
INSERT INTO users (email, name) VALUES
('bob@example.com', 'Bob'),
('carol@example.com','Carol'),
('dave@example.com', 'Dave');
-- From a SELECT
INSERT INTO archived_orders (id, total_cents, archived_at)
SELECT id, total_cents, now()
FROM orders
WHERE created_at < '2024-01-01';
-- With RETURNING (Postgres, SQLite 3.35+)
INSERT INTO users (email) VALUES ('eve@example.com')
RETURNING id, created_at;

RETURNING is one of the most useful Postgres-ism in production code: you avoid a separate SELECT round-trip to get the generated id after insert. MySQL has LAST_INSERT_ID() as a less-flexible alternative.

UPDATE#

UPDATE users
SET status = 'suspended',
updated_at = now()
WHERE id = 42;
-- With a join (UPDATE ... FROM in Postgres)
UPDATE orders o
SET region = u.region
FROM users u
WHERE o.user_id = u.id
AND o.region IS NULL;
-- Returning the updated rows
UPDATE inventory
SET quantity = quantity - 1
WHERE sku = 'WIDGET-1' AND quantity > 0
RETURNING quantity;

UPDATE ... RETURNING is especially powerful when combined with a conditional WHERE: it lets you atomically decrement, check the result, and abort if the precondition wasn’t met — all in one round trip with no race.

DELETE#

DELETE FROM sessions WHERE expires_at < now() - INTERVAL '30 days';
-- With RETURNING for an audit trail
DELETE FROM users WHERE id = 42
RETURNING id, email;
-- Bulk delete via subquery
DELETE FROM logs
WHERE id IN (
SELECT id FROM logs WHERE created_at < now() - INTERVAL '90 days'
LIMIT 10000
);

Bulk delete on a hot table is best done in batches — a single 10-million-row DELETE holds locks and bloats the WAL. Loop the batched version above until it returns zero rows.

Upsert: ON CONFLICT and MERGE#

The Postgres-style upsert (and the SQL standard’s MERGE, which Postgres also supports as of 15):

-- Postgres-style: idiomatic, fast
INSERT INTO daily_metrics (day, metric, value)
VALUES ('2026-05-24', 'signups', 42)
ON CONFLICT (day, metric)
DO UPDATE SET value = EXCLUDED.value + daily_metrics.value;
-- MySQL equivalent
INSERT INTO daily_metrics (day, metric, value)
VALUES ('2026-05-24', 'signups', 42)
ON DUPLICATE KEY UPDATE value = VALUES(value) + value;
-- SQL standard MERGE (Postgres 15+, SQL Server, Oracle, MySQL 8.0.31+)
MERGE INTO daily_metrics t
USING (VALUES ('2026-05-24', 'signups', 42)) AS s(day, metric, value)
ON t.day = s.day AND t.metric = s.metric
WHEN MATCHED THEN UPDATE SET value = t.value + s.value
WHEN NOT MATCHED THEN INSERT (day, metric, value) VALUES (s.day, s.metric, s.value);

The EXCLUDED pseudo-table in Postgres’s ON CONFLICT refers to the row that would have been inserted, letting the update clause compose insert-and-update values.

Row locking#

BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE; -- row locked until commit
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

FOR UPDATE acquires an exclusive row lock. FOR SHARE acquires a shared lock (other readers OK, no writers). FOR UPDATE SKIP LOCKED skips rows another transaction has already locked — the classic recipe for queue-style workloads:

-- Worker pulls one available job
SELECT id, payload FROM jobs
WHERE status = 'pending'
ORDER BY priority DESC, id
LIMIT 1
FOR UPDATE SKIP LOCKED;

Variants#

  • INSERT ... ON CONFLICT DO NOTHING — silent insert-if-not-exists. Useful for idempotent ingestion where the existing row’s content doesn’t matter.
  • INSERT ... ON CONFLICT (col) WHERE pred DO UPDATE — conflict scoped to a partial unique index. Powerful and underused.
  • UPDATE ... FROM / DELETE ... USING (Postgres) — join in additional tables for the WHERE clause without a subquery.
  • COPY (Postgres) / LOAD DATA INFILE (MySQL) — bulk-load path that bypasses the row-by-row INSERT machinery. 10-100x faster for big imports.
  • CTE-based DMLWITH x AS (DELETE FROM ... RETURNING *) INSERT INTO archive SELECT * FROM x moves rows between tables in a single statement, atomically.
  • TRUNCATE instead of DELETE for clearing a whole table — DDL-fast but takes ACCESS EXCLUSIVE lock and doesn’t fire row-level triggers.

Trade-offs#

INSERT ... ON CONFLICT (single statement upsert) — atomic, no race, one round trip, leverages the unique index the engine already maintains. The right default for idempotent writes. Limited to a single conflict target; complex multi-key upserts need MERGE or per-key logic.
SELECT-then-INSERT-or-UPDATE in application code — readable but racy: between your SELECT and your INSERT, another transaction can insert the same key. You’ll hit a unique-violation error you have to retry. Avoid unless you have a reason the engine’s conflict handling can’t express.

Other practical considerations:

  • BatchingINSERT ... VALUES (...), (...), (...) is dramatically faster than N single-row inserts because of per-statement parse/plan/network overhead. Batch sizes of 500-5000 rows typically hit the sweet spot.
  • Soft delete vs hard delete — soft delete (UPDATE ... SET deleted_at = now()) keeps the row recoverable and avoids cascading FK action; hard delete reclaims space but is irreversible without a backup.
  • UPDATE rewriting the whole row (Postgres MVCC) — every UPDATE creates a new tuple version; on a heavily-updated row this generates bloat. HOT (heap-only-tuple) optimisation kicks in when the updated column isn’t indexed and the new tuple fits in the same page.
  • Trigger amplification — every INSERT/UPDATE/DELETE may fire BEFORE and AFTER triggers, statement-level and row-level. A “simple” bulk update can run many million trigger invocations.

Common pitfalls#

  • UPDATE / DELETE without WHERE. The canonical disaster. Most teams add a check (Postgres’s --single-transaction then preview-and-confirm, or a LIMIT clause in MySQL) to make this hard.
  • Forgetting updated_at. Most teams maintain it via trigger; some via application code. Inconsistency between writers is the usual bug.
  • Racy SELECT-then-INSERT. Two workers see “no row” simultaneously, both insert, one wins, the other gets a unique violation. Fix: ON CONFLICT or SELECT ... FOR UPDATE plus the insert in the same transaction.
  • Bulk inserts inside a giant transaction. Holds locks for the duration, balloons the WAL, blocks autovacuum on the target table. Batch into transactions of thousands-to-tens-of-thousands of rows.
  • DELETE cascading further than expected. ON DELETE CASCADE propagates through FK chains. A single delete can trigger million-row cascades — measure before you commit.
  • RETURNING overheads. Returning many rows from a bulk DML doubles the network footprint. Use RETURNING for IDs and small confirmations, not for full row dumps.
  • Locking the same row twice in one transaction. SELECT ... FOR UPDATE is reentrant within a transaction; problems happen when two transactions lock rows in different orders and deadlock. Always lock in a consistent order (e.g., by id ASC).
The upsert pattern most production code converges on

For idempotent ingestion (events, metrics, sync), use INSERT ... ON CONFLICT (...) DO UPDATE SET col = EXCLUDED.col. For the queue pattern, use SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1 to pull a job and UPDATE jobs SET status = 'done' WHERE id = ? to mark it complete, all in one transaction. For decrement-with-precondition (inventory, budget), use UPDATE ... SET col = col - 1 WHERE id = ? AND col > 0 RETURNING col and check the row count — zero means the precondition failed, no race, no separate SELECT. These three patterns cover most of what people write custom locking code for.

Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.