Relational Model Concepts — DBMS · Engineering Playbook

Summary#

Codd’s 1970 paper proposed a deceptively simple idea: model all data as relations — finite sets of tuples, each tuple a fixed-arity assignment of values to named attributes, each attribute drawn from a domain. A table is the practical face of a relation; a row is the practical face of a tuple. The names are different in the formal literature and in SQL, and people fight about that needlessly. The model that won is the one with three properties: a well-defined vocabulary (relation, tuple, attribute, domain), a small algebra closed over relations (every operator takes relations and returns a relation), and a declarative query language (SQL) that compiles down to that algebra.

The relational model won over the hierarchical and network models of the 1960s because composition is free. Any query result is itself a relation, so it can feed into another query without translation. The closed algebra is what makes views, subqueries, CTEs, and the entire SQL ecosystem possible.

Why it matters#

Every relational concept downstream — keys, normalization, joins, views, transactions — assumes you can talk about a relation as a first-class value. Without that, “what does this query return?” has no clean answer; with it, every query returns the same kind of thing the inputs were.

In interviews, this is rarely asked head-on, but it underlies everything. When a candidate says “I’ll join orders with users and group by region”, they’re using the closed-algebra property without naming it. A clean mental model of relation-in / relation-out helps when reasoning about NULL semantics, set vs bag operations, and why SELECT DISTINCT exists.

The model also clarifies one of the oldest sources of confusion: a relation is a set (no duplicates, no ordering), but a SQL table is a bag (duplicates allowed, ordering implementation-defined). The gap is why UNION defaults to set semantics (deduplicating) while UNION ALL is the bag operation, and why every primary-key choice is really a declaration that this particular table aspires to relation-like uniqueness.

Relation vs table — the practical translation

Formal	SQL
Relation	Table
Tuple	Row
Attribute	Column
Domain	Data type (with `CHECK` constraints)
Relation schema	`CREATE TABLE` statement
Relation instance	Current rows in the table

How it works#

The vocabulary#

A relation R has a fixed schema — an ordered list of attribute names, each tagged with a domain. The schema is part of the relation’s identity: R(a: INT, b: TEXT) and R(a: TEXT, b: INT) are different relations even at the same name.

An instance of R is a set of tuples conforming to that schema. The instance changes over time as rows are inserted, updated, deleted; the schema is more static.

A domain is the set of legal values for an attribute. In SQL, the domain is most of INT, TEXT, TIMESTAMP, etc., usually narrowed further by NOT NULL, CHECK, or UNIQUE constraints.

The closure property#

Every relational operator — select (filter), project (pick columns), join, union, difference, rename — takes one or two relations and returns a relation. Nothing else. This is why an arbitrary SQL expression can be a subquery, a view, a CTE, or the target of another join — they all have the same shape.

-- The result of this query is itself a relation with attributes (region, n)
SELECT region, COUNT(*) AS n
FROM users
GROUP BY region;

You can wrap it in another query without translation:

SELECT region FROM (
  SELECT region, COUNT(*) AS n FROM users GROUP BY region
) t
WHERE n > 1000;

Set vs bag semantics#

The pure relational model uses set semantics — a relation cannot contain two identical tuples. SQL relaxes this to bag semantics by default: a table can have duplicate rows unless a uniqueness constraint forbids it. The gap matters most for:

SELECT col FROM t — returns a bag (duplicates kept). SELECT DISTINCT col FROM t is the set version.
UNION ALL (bag) vs UNION (set, deduplicating).
COUNT(*) (counts the bag) vs COUNT(DISTINCT col) (counts the set).

Picking the right one is a performance choice as much as a correctness one — DISTINCT typically requires a sort or hash, while bag operations stream.

NULL: the awkward fourth value#

Codd’s original model didn’t have NULL. SQL added it to represent “unknown”, and the resulting three-valued logic (TRUE / FALSE / UNKNOWN) is the most-complained-about wart in the relational world. NULL = NULL is UNKNOWN, not TRUE. WHERE col = NULL matches no rows; you have to write WHERE col IS NULL. Most aggregates skip NULL silently. The model still works, but the rough edges are real.

Variants and trade-offs#

Pure relational (the textbook) — sets only, no NULL, every operator total. Beautiful algebra; useful for proofs, formal reasoning, normalization theory. Not how any shipping database actually behaves.

SQL relational (what you ship) — bags by default, NULL with three-valued logic, ordering exposed via ORDER BY, duplicate elimination opt-in. Pragmatic, performant, and consistent with what the storage engine actually does. The model the rest of the database stack is built on.

Other model variants worth knowing:

Object-relational — Postgres’s heritage. Adds composite types, inheritance, custom domains. Most teams use a sliver of this (arrays, JSONB) without buying the full story.
Document model — relaxes the schema-per-relation contract; each document carries its own structure. Trades the closed algebra for flexibility, then spends a decade reinventing joins.
Graph model — promotes relationships to first-class objects. The relational equivalent is a many-to-many junction table with extra columns; the graph version is more ergonomic for variable-depth traversal.

Why composition matters more than syntax

The relational model would not be worth much if every query language had to be hand-written. The point of the closed algebra is that an optimiser can rewrite queries — push selections below joins, reorder joins, materialise common subexpressions — and prove the rewrites preserve semantics because every step is a relation-in / relation-out operator. SQL is declarative because the algebra underneath is composable. Take the closure property away and you’re back to writing imperative loops over file records.

When this is asked in interviews#

Rarely as a direct “define the relational model” question; usually woven into normalization, JOIN, or schema-design discussions. Be ready to:

State the vocabulary cleanly — relation, tuple, attribute, domain. One sentence each.
Explain set vs bag — why SELECT DISTINCT exists, why UNION and UNION ALL differ.
Handle NULL — three-valued logic, IS NULL vs = NULL, what aggregates do.
Defend the closure property — “why can I put a subquery in any expression position?” because every relational operation returns a relation.

System-design rounds use this as a foundation: when the candidate proposes a relational schema, the interviewer is implicitly checking that they think in tables-as-relations rather than tables-as-files.

Summary#

Why it matters#

How it works#

The vocabulary#

The closure property#

Set vs bag semantics#

NULL: the awkward fourth value#

Variants and trade-offs#

When this is asked in interviews#

Related concepts#