Keys — Primary, Candidate, Foreign, Composite — DBMS

Summary#

A key in the relational model is a minimal set of attributes whose values uniquely identify a tuple. Four flavours show up constantly: super key (any set that’s unique, including trivially the whole row), candidate key (a minimal super key), primary key (the candidate the schema designer picks as the canonical identifier), and foreign key (an attribute set whose values must appear as a primary key in another relation). A composite key is any of the above with more than one column; a surrogate key is an artificial column (usually a serial integer or UUID) introduced specifically to be the primary key when no natural choice fits.

The interesting design question is rarely “what’s a key?” — it’s “what should the primary key be?” The defensible default in 2026 is a surrogate key (a generated integer or UUID) with the natural identifier kept as a separate UNIQUE constraint. The reasoning compounds: natural keys mutate (people change email, companies rename, ISBNs collide), they leak into foreign keys (forcing cascading updates), and they make joins wider than they need to be. Surrogate keys decouple identity from any business meaning.

Why it matters#

Keys are the load-bearing concept in every schema. A primary key choice locks in: how every foreign key referencing this table looks; the clustered-index layout (in InnoDB and SQL Server); the URL or external API identifier; what “the same row” even means across replications and exports. Get it wrong and the cost compounds quietly until a migration costs weeks.

In interviews, “what would you pick as the primary key for this users table?” is a near-universal warm-up. The answer the interviewer is listening for is the meta-answer: name the candidate keys (email, phone, employee_id), explain why each is risky as a primary key (mutable, nullable, scope-limited), and propose a surrogate (id BIGSERIAL or id UUID) with the natural identifiers kept as UNIQUE constraints. That sequence shows you’ve shipped a schema before.

How it works#

The hierarchy#

Super key — any set of attributes that uniquely identifies a tuple. {id}, {id, email}, and {id, email, name} are all super keys if id is unique. There can be many.
Candidate key — a super key with no redundant attributes (removing any column breaks uniqueness). {id} is a candidate; {id, email} is not, because dropping email still uniquely identifies.
Primary key — the candidate the designer elevates to canonical. Exactly one per table. Implicitly NOT NULL and UNIQUE. By convention, columns named id.
Alternate key — any candidate key that isn’t the primary key. Usually enforced with a UNIQUE constraint.

Foreign keys#

A foreign key declares that the values in some column(s) of one table must appear as a primary (or unique) key in another:

CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT UNIQUE NOT NULL
);

CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(id),
  total_cents INT NOT NULL
);

The DBMS enforces three things at write time: an INSERT or UPDATE on orders.user_id must reference an existing users.id; a DELETE on users of a row that’s still referenced is rejected (default) or cascades; an UPDATE of the referenced key behaves similarly.

ON DELETE and ON UPDATE can be NO ACTION (the default, reject), CASCADE (propagate), SET NULL, SET DEFAULT, or RESTRICT. The choice should match the business meaning: usually CASCADE for owned-child relationships (order_items deleted with their order), SET NULL for optional references, RESTRICT for shared-reference data.

Composite keys#

A composite key has multiple columns. Common in junction tables for many-to-many relationships:

CREATE TABLE enrollments (
  student_id BIGINT NOT NULL REFERENCES students(id),
  course_id BIGINT NOT NULL REFERENCES courses(id),
  enrolled_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (student_id, course_id)
);

Column order in a composite key matters for index access — (student_id, course_id) answers queries on student_id alone but not course_id alone. Pick the leading column to match the predominant access pattern.

Surrogate vs natural#

A natural key is drawn from the domain (email, ISBN, SSN, employee number). A surrogate key is a generated value with no meaning outside the database (BIGSERIAL, UUID, ULID).

-- Surrogate primary key, natural identifiers as UNIQUE:
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  sku TEXT UNIQUE NOT NULL,
  isbn TEXT UNIQUE
);

The surrogate is immutable, narrow (8 bytes for BIGSERIAL), and decoupled from business changes. The UNIQUE constraints on sku and isbn give the database the same integrity guarantees a natural primary key would have.

Variants and trade-offs#

Auto-increment integer (BIGSERIAL, AUTO_INCREMENT) — narrow (8 bytes), monotonic (good for B-tree append patterns and clustered indexes), cheap to generate. The default for single-region OLTP. Leaks rough row count and creation order externally — use a separate slug or UUID for public URLs if that matters.

UUID (v4 random or v7 time-ordered) — 16 bytes, globally unique without coordination, safe to generate client-side and on multiple writers. v4 randomness causes index fragmentation; v7 (time-prefixed) restores monotonic insertion. The right default for distributed systems and externally-visible IDs.

Other tradeoffs to be aware of:

Natural keys when they’re truly stable — country codes (US, IN), currency codes (USD, INR), ISO date — these don’t change and using them directly avoids a join. Even then, a surrogate id with a UNIQUE on the code is often cleaner.
Composite keys as primary keys — fine for junction tables; questionable for entity tables because every referencing FK has to carry the same composite, widening all child tables.
SERIAL vs IDENTITY (Postgres) — GENERATED ALWAYS AS IDENTITY is the SQL-standard successor to SERIAL and avoids a few sequence-ownership quirks. New code should prefer it.
ULID and Snowflake IDs — variants of UUID with deliberate sort order. Useful when you want client-generatable IDs that also cluster well in a B-tree.

Why surrogate-by-default works in practice

The argument for surrogate keys isn’t theoretical purity, it’s a sequence of bets: that any natural identifier you pick will eventually need to change (or already does in edge cases the product team hasn’t surfaced); that you’ll want to expose internal IDs to external systems in a controlled way; that your foreign keys should be as narrow as possible; and that the cost of an extra 8-byte column on the parent table is dwarfed by the cost of a primary key migration. Surrogates trade a small column for optionality. In a decade-long codebase, optionality wins.

When this is asked in interviews#

Always, in some form. The typical sequence:

“Design a schema for X.” — the very first decision is the primary key on the central entity. Pick a surrogate, declare the natural identifiers as UNIQUE, move on.
“What’s wrong with using email as the primary key?” — mutability, case sensitivity, the cascade problem, and the visibility leak. One-by-one.
“What’s the difference between a candidate and a primary key?” — candidates are all the minimal unique sets; the primary is the one you pick.
“How would you handle a many-to-many?” — junction table with composite primary key (a_id, b_id), possibly plus a surrogate id if the relationship itself has attributes.

In system-design rounds the question is often framed around scale: “what’s the primary key for this 50-billion-row events table?” — answer with (tenant_id, event_id) composite or a time-ordered UUID, depending on the partitioning story.