DBMS Architecture — The Three-Schema Model
Conceptual, logical, and physical schemas; data independence; how a schema change at one level shouldn't break the others.
Summary#
The three-schema architecture (ANSI/SPARC, 1975) splits a database’s description into three layers — external (what each user or application sees), conceptual (the unified logical model of the whole database), and internal (how the data is physically stored). The point isn’t bureaucratic separation; it’s data independence. A change at one layer should not cascade into the layers above it. Add an index? The conceptual schema doesn’t move. Reorganise a column? Existing views keep working. Add a new view for a new app? The underlying tables don’t change.
Most working engineers don’t consciously think in three layers, but they do operate them: CREATE TABLE is conceptual, CREATE INDEX is internal, CREATE VIEW is external. Naming the layers is what lets you reason about migrations without breaking consumers.
Why it matters#
Every long-lived system eventually needs to change one layer without changing the others. A team adds a covering index — the SQL on top should be unaffected. A team renames a column for an internal client but can’t touch the dozen reports built on the old name — a view does the rename. A team migrates the storage engine from heap to clustered index — the application queries are unchanged. The three-schema model is the mental scaffolding for all of these moves.
In interviews, it shows up as “explain data independence” or “why do we use views?”. Both questions are answered by the same diagram. It also frames the deeper conversation about why physical reorganisations are cheap and logical migrations are expensive — the answer is “because physical lives below the schema everyone references; logical is the schema everyone references”.
How it works#
The three layers and what lives at each:
┌────────────────────────────────────────────────┐ │ External / View level │ │ CREATE VIEW active_customers AS … │ │ per-user, per-app subsets and projections │ └────────────────────────────────────────────────┘ ▲ │ external/conceptual mapping ▼ ┌────────────────────────────────────────────────┐ │ Conceptual / Logical level │ │ CREATE TABLE customers (id, email, …) │ │ the single unified model of all data │ └────────────────────────────────────────────────┘ ▲ │ conceptual/internal mapping ▼ ┌────────────────────────────────────────────────┐ │ Internal / Physical level │ │ heap, B-tree on (email), partitioned by date │ │ files, blocks, page layouts │ └────────────────────────────────────────────────┘External level#
Each user, application, or role sees a tailored view. A customer-support tool sees customers without the marketing-opt-in column; a billing service sees only the rows where status = 'active'. Implemented as SQL views, row-level-security policies, or per-role projections.
Conceptual level#
The single, canonical description of all data — entities, attributes, keys, relationships, constraints. This is what CREATE TABLE and CREATE FOREIGN KEY describe. The conceptual schema is owned by the schema author and changed via migrations; it’s the artefact every external view and every internal storage decision is built on top of.
Internal level#
How the conceptual schema is physically realised — which storage engine, what file layout, which indexes, which partitions, which page size, which compression. CREATE INDEX, CLUSTER, ALTER TABLE ... SET TABLESPACE, partition definitions. Two databases with identical conceptual schemas can have radically different internal schemas.
Mappings and data independence#
Between each pair of layers there’s a mapping — a translation from one layer’s terms to the next. Two kinds of independence follow:
- Logical data independence — change the conceptual schema (add a column, split a table) without changing external views. Achieved by writing views that reference only the columns the consumer cares about.
- Physical data independence — change the internal schema (add an index, reorganise storage, switch engines) without changing the conceptual schema. The DBMS query planner reads the new physical layout from the catalogue and adjusts plans automatically.
Physical independence is the stronger and more universally delivered guarantee. Logical independence is partial — adding a column is safe, renaming or dropping one almost always breaks something.
Variants and trade-offs#
A few real-world refinements:
- Logical replication and CDC. A consumer outside the DBMS subscribes to changes on the conceptual schema and projects them into its own store. This is the external level reframed as an asynchronous feed — Debezium, Kafka Connect, Postgres logical decoding.
- Information schema and catalogue. Every DBMS exposes a self-describing catalogue (
information_schemain standard SQL,pg_catalogin Postgres) that documents the conceptual and internal schemas at runtime. The query planner reads it on every query. - Layer leakage. Internal decisions sometimes show up at the conceptual level — partition keys become part of the primary key, denormalisations leak storage concerns into the model. Pure independence is the goal; production schemas usually have one or two acknowledged leaks.
Why renaming a column is almost never safe even with logical independence
Logical data independence promises that additive conceptual changes (new columns, new tables) don’t break external views. Subtractive or renaming changes propagate by definition — every view, every saved query, every ORM model that references the old name has to move. The “expand then contract” migration pattern (add the new column, dual-write, switch readers, drop the old) exists precisely because column renames cannot ride on independence guarantees.
When this is asked in interviews#
Three contexts. As a pure-DBMS question — “explain the three-schema architecture” — the answer walks the diagram with one or two sentences per layer plus the independence definitions. As a schema-evolution question — “how would you rename a column without downtime?” — the answer leans on logical independence and the expand-contract pattern. As a system-design question — “how would you let analytics teams query the production data?” — the right answer is “give them a read-replica with views over the conceptual schema; don’t let them touch the tables directly”, which is the external-level argument applied operationally.
A common follow-up is “give me an example where data independence fails”. Renaming a column is the easy answer; dropping an index in a way that changes which plans are cheap (a physical change that breaks SLA without breaking semantics) is the subtler one.
Related concepts#