Integrity Constraints — DBMS · Engineering Playbook

Summary#

An integrity constraint is a declarative rule the DBMS enforces on every write so the data never enters a state the schema considers invalid. Four families cover almost everything:

Entity integrity — every row in a relation has a complete, non-null primary key.
Referential integrity — every foreign-key value points at a real row in the referenced relation.
Domain constraints — each attribute value belongs to the declared domain (data type, NOT NULL, CHECK, length bounds).
User-defined constraints — arbitrary business rules expressed via CHECK, partial unique indexes, exclusion constraints, or (as a last resort) triggers.

The defensible default is to push as much as you can into the declarative layer. The DBMS evaluates constraints inside the same transaction as the write, atomically, with no race window. Application-level validation alone is always racy across concurrent writers, and code paths multiply faster than the schema does.

Why it matters#

A constraint is a free regression test that runs on every write forever. The day someone adds a new ingestion script, a one-off backfill, an analyst REPL session, or a third-party tool, the constraint is still there. Application checks live in one codebase; database constraints live underneath all codebases.

The cost of a missing constraint is a class of bug that doesn’t surface until a customer report comes in months later — orphaned rows pointing at deleted users, two “current” addresses for the same person, a negative inventory count, a referral relationship that loops back on itself. Each one is repairable once, but repairing it is reactive work; the constraint prevents the bug class entirely.

In interviews, integrity constraints are how you signal you’ve shipped real systems. A candidate who reaches for a CHECK or a partial unique index when the natural answer is “validate in the app” is signalling correctly.

How it works#

Entity integrity#

Every relation must have a primary key, and primary key columns are implicitly NOT NULL. Two consequences: rows can always be uniquely identified, and the primary key can be referenced by foreign keys without nullability ambiguity.

CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,    -- implicitly NOT NULL UNIQUE
  email TEXT NOT NULL UNIQUE
);

Referential integrity#

A foreign key declares that values in one table must exist in another. The DBMS rejects any write that would violate this, evaluated at the end of the statement (or end of transaction if the FK is DEFERRABLE INITIALLY DEFERRED).

CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
  total_cents INT NOT NULL CHECK (total_cents >= 0)
);

ON DELETE actions are the policy when a referenced row is deleted: RESTRICT / NO ACTION reject the delete; CASCADE deletes child rows too; SET NULL or SET DEFAULT nulls out the reference. Match the action to the business meaning.

Domain constraints#

NOT NULL, type, length, and CHECK constraints constrain a single column or row’s values:

CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  sku TEXT NOT NULL CHECK (length(sku) BETWEEN 4 AND 32),
  price_cents INT NOT NULL CHECK (price_cents > 0),
  status TEXT NOT NULL CHECK (status IN ('draft', 'active', 'archived')),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CHECK expressions can reference multiple columns of the same row but not other rows (use a trigger or exclusion constraint for cross-row rules):

ALTER TABLE bookings
  ADD CONSTRAINT chk_dates CHECK (ends_at > starts_at);

UNIQUE and partial UNIQUE#

UNIQUE enforces no two rows share the same value (per the set semantics, NULLs are treated as distinct in most engines):

CREATE UNIQUE INDEX one_primary_address_per_user
  ON addresses (user_id)
  WHERE is_primary;   -- partial: only enforce among primary addresses

The partial form is the right tool for “one of these per parent” rules that don’t apply to all rows.

Exclusion constraints (Postgres)#

For “no two rows can both satisfy P” where P involves ranges or geometry:

CREATE TABLE room_bookings (
  room_id INT NOT NULL,
  during TSTZRANGE NOT NULL,
  EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

This rejects any insert where two rows have the same room_id and overlapping (&&) time ranges — a classic constraint that would otherwise need a trigger plus a lock.

Triggers — the catch-all#

When the rule cannot be expressed declaratively (cross-row aggregates, derived columns, audit trails), a trigger runs application logic inside the transaction:

CREATE FUNCTION enforce_max_orders_per_day() RETURNS TRIGGER AS $$
BEGIN
  IF (SELECT COUNT(*) FROM orders
        WHERE user_id = NEW.user_id
          AND created_at::date = CURRENT_DATE) >= 100 THEN
    RAISE EXCEPTION 'daily order cap exceeded';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER limit_orders BEFORE INSERT ON orders
  FOR EACH ROW EXECUTE FUNCTION enforce_max_orders_per_day();

Triggers are correct but heavy: every write to the table runs the function, and they tend to accumulate side effects that surprise the next maintainer. Prefer them when nothing else fits.

Variants and trade-offs#

Declarative constraints (PK, FK, UNIQUE, CHECK, EXCLUDE) — pushed down to the storage engine, evaluated with row locks, visible in \d and tooling, named in error messages. The right default. Cost: a small write-time check, usually amortised by indexes that already exist.

Triggers — arbitrary logic in the transaction. Necessary for cross-row rules, audit trails, computed columns in older engines. Cost: hidden control flow, harder to reason about, easy to deadlock under concurrency, can multiply work invisibly. Reach for them when declarative doesn’t fit.

Other practical considerations:

DEFERRABLE INITIALLY DEFERRED — defers FK checks to commit time, letting you insert circular references within one transaction. Useful for graph data and bulk loads.
NOT VALID then VALIDATE CONSTRAINT (Postgres) — adding a CHECK to a huge table without a long write lock: declare it NOT VALID (applies to new rows only), then run VALIDATE CONSTRAINT in the background.
MySQL CHECK — silently ignored before 8.0.16. Test on the target engine version.
Cross-database referential integrity — there isn’t any. Once data lives in two systems, you’re back to application-level reconciliation or sagas.

Why constraints belong in the database, even when 'the app handles it'

Three things eventually happen to every long-lived schema. A new write path is added (a job, an admin tool, a CLI) and forgets the validation. A bug in the existing path lets an invalid value slip through under load. A migration or backfill runs raw SQL that bypasses the ORM. Application validation prevents the happy path from misbehaving; the database constraint is the only thing that survives all three. The cost of the constraint is one line in DDL. The cost of cleaning up months of bad rows is open-ended.

When this is asked in interviews#

Usually as a follow-up to schema design: “what constraints would you put on this table?” The answer should walk through:

Primary key and NOT NULL on every identity column.
Foreign keys with explicit ON DELETE for every reference, with the action chosen to match the business meaning.
CHECK constraints on enums and ranges — status IN ('a','b','c'), price > 0, ends_at > starts_at.
Partial UNIQUE for “one X per parent” rules.
Triggers only when nothing else fits, and only with a comment explaining why.

Senior rounds may dig into ordering: when constraints fire during a transaction, deferred vs immediate, what happens on bulk inserts. The cleanest answer is “I enforce as much as the engine lets me declaratively, deferred only when I need cyclic inserts, and I add triggers as a documented exception”.