Data Models, Schemas, and Instances — DBMS

Summary#

A data model is the vocabulary used to describe data — the noun categories the system understands. The relational model has relations, tuples, attributes, domains. The document model has documents, fields, arrays, objects. The graph model has nodes and edges. A schema is the specific structure declared in that vocabulary for a particular database — the tables, columns, types, constraints. An instance is the actual data sitting in that schema at a moment in time. Three levels: vocabulary, structure, contents.

The distinction sounds pedantic until you watch two engineers argue past each other because one means “schema” (the DDL) and the other means “data model” (the conceptual approach). Naming the three layers explicitly is what makes migration discussions, validation rules, and design reviews coherent.

Why it matters#

Every database conversation eventually touches all three layers. “Which database should we use?” is a data-model question. “What should the orders table look like?” is a schema question. “How many orders are pending?” is an instance question. Conflating them produces vague answers — “we’ll use MongoDB because we need a flexible schema” is mixing a data-model choice (document) with a schema-evolution argument (additive changes are easy).

The three-level vocabulary also maps cleanly onto the three-schema architecture (external / conceptual / internal). The conceptual schema is what most engineers mean by “the schema”; the data model is the vocabulary that schema is written in; the instance is what flows through it at runtime.

How it works#

Data model#

The vocabulary. Each model gives you a small set of building blocks plus rules for combining them:

Relational — relations (tables), tuples (rows), attributes (columns), domains (types). Closed under a set of operations (the relational algebra).
Document — documents (JSON-like), fields, arrays, nested objects. No native join; documents are usually denormalised.
Key-value — opaque keys mapped to opaque values. No structure beyond the key.
Wide-column — (partition_key, sort_key) → row of arbitrary columns. Schema flexible per row.
Graph — nodes with properties, edges with properties and types, optional traversal language.
Object — classes, instances, inheritance, references. (Object-oriented databases lost the war but the model survives in ORMs.)

The model is fixed by the DBMS choice. Postgres gives you relational (plus document via JSONB); MongoDB gives you document; Neo4j gives you graph. You can’t “redesign the model” without switching systems.

Schema#

The structure declared in the model’s vocabulary, for one specific database. In a relational system, the schema is the set of CREATE TABLE, CREATE INDEX, CREATE VIEW, CREATE TYPE, CREATE CONSTRAINT statements. It says what tables exist, what columns each has, what types those columns hold, what constraints relate them, and what views project them.

-- the schema
CREATE TABLE users (
  id          bigserial PRIMARY KEY,
  email       text UNIQUE NOT NULL,
  created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE orders (
  id          bigserial PRIMARY KEY,
  user_id     bigint NOT NULL REFERENCES users (id),
  amount      numeric(12, 2) NOT NULL CHECK (amount >= 0),
  placed_at   timestamptz NOT NULL DEFAULT now()
);

A schema in a document store is the set of validators on each collection (MongoDB’s $jsonSchema, for example). In a graph store, it’s the set of node labels and edge types. Same idea, different syntax.

Instance#

The data conforming to the schema at a given moment. For the schema above, the instance might be:

users:    (1, 'ada@example.com',  2026-01-15 09:00:00+00)
          (2, 'grace@example.com',2026-02-03 11:00:00+00)
orders:   (10, 1,  42.00,         2026-03-12 14:00:00+00)
          (11, 2, 199.99,         2026-03-13 10:00:00+00)

Instances change continuously as writes commit. The schema changes only via migrations. The model changes only by adopting a different DBMS family.

Why three levels#

The levels have different lifetimes and different change costs:

Model — changed once, when the system was first designed. Switching is a multi-quarter project.
Schema — changed on a sprint-to-sprint cadence via migrations. Each change has a backwards-compatibility story.
Instance — changed every millisecond by application writes.

Treating instance churn as a schema question, or schema migrations as a model question, leads to over-engineered solutions.

Variants and trade-offs#

Schema-on-write (strict) — schema declared up front, every write validated against it. Errors surface at insert time. Postgres tables, MySQL with strict mode, MongoDB with $jsonSchema, Avro with a registry. The right default for anything where data quality matters more than ingestion speed.

Schema-on-read (flexible) — store whatever, interpret it when reading. Data lakes, JSON files, MongoDB without validators, Hive over CSV. Cheap to ingest, expensive to query (every reader has to handle every quirk). Right for log ingestion and exploratory analytics; wrong for the system of record.

Other dimensions to keep in mind:

Logical vs physical schema. The logical schema is what CREATE TABLE declares; the physical schema is the on-disk realisation (heap, B-tree, partition, compression). The three-schema architecture covers this; for modelling discussions, “schema” usually means logical.
Schema evolution. Additive changes (new column, new table) are safe. Subtractive changes (dropped column, renamed table) require expand-then-contract migrations. The data model doesn’t change either way.
Schemaless is a misnomer. A “schemaless” document database still has a schema — it just lives in the application code instead of the database catalogue. The choice is where the schema is enforced, not whether one exists.

Why 'instance' is rarely said out loud in industry

Academic DBMS texts say “instance” and “state” interchangeably. Working engineers usually say “the data” or “the contents of the table” instead. The vocabulary still matters in interviews and in formal discussions about isolation (an isolation level is a guarantee about which instances of the database a transaction may observe). If you only ever hear “instance” referring to “the running database process” — that’s a different and unrelated use of the word.

When this is asked in interviews#

Two shapes. As a definitions question — “what’s the difference between a data model, a schema, and an instance?” — the answer is the three one-liners plus an example. As a setup for schema design — “design a data model for X” — the question is technically asking for a schema in the relational model, and the right move is to clarify which model is assumed and then walk the ER drill.

A trap question: “is JSON a data model?” — JSON is a data format. The document model is the data model that uses JSON-shaped values; documents in Mongo, Couchbase, Firestore all use JSON or JSON-adjacent serialisations.

Another follow-up: “how does the data model constrain the schema?” — the model determines what you can express. You cannot declare a foreign key in a pure key-value store; you cannot declare a nested array as a column in a strict relational table without resorting to a JSONB extension.