Classification of Database Systems

Relational vs document vs key-value vs graph vs columnar vs time-series. What each is good at and where the boundaries blur.

Concept Foundational
7 min read
classification nosql oltp olap data-models

Summary#

“Database” is a category with at least six commonly-discussed families: relational, document, key-value, wide-column, graph, and specialised systems like columnar analytics and time-series. Each family picks a different weighting of the four database properties (structured, queryable, shared, persistent) and a different access pattern. The right way to classify a DBMS is not by vendor or by “SQL vs NoSQL”, but by the data model it exposes and the workload it’s tuned for.

The boundaries blur in practice. PostgreSQL ships a JSONB type and behaves as a document store when you use it. MongoDB added ACID transactions and looks more relational every release. The classification is still useful — it tells you where each system is most natural — but real systems often combine two or three.

Why it matters#

System-design interviews almost always include a “which database?” decision. The expected answer is not “Postgres” or “DynamoDB” reflexively; it’s a walk through the access pattern, the consistency requirements, the scale, and the right family for each. Picking the wrong family — using a graph database for tabular analytics, or a relational database for a billion-edge social graph — turns into a multi-quarter rewrite.

In day-to-day work, the classification frames build-vs-buy and integration decisions. A new feature wants a flexible schema for user-defined attributes; that’s a document or a JSONB column, not a new table-per-tenant. A new feature needs fast multi-hop traversal; that’s a graph database or a recursive CTE, not a hand-rolled join.

How it works#

The six families and their core access pattern:

Relational (row-oriented OLTP)#

PostgreSQL, MySQL, Oracle, SQL Server, SQLite. Data lives in rows, each row a fixed-schema tuple. Access is via SQL: declarative, set-oriented, supports joins, aggregates, transactions. The right default for anything financial, account-centric, or with rich relationships between entities. Strong on ACID. Weaker on horizontal write scale beyond one node — though Aurora, Spanner, CockroachDB chase this.

Document#

MongoDB, Couchbase, Firestore. Data lives in JSON-like documents grouped into collections. Each document is self-contained and can have its own fields. Access is via a query language that mixes filter, project, and aggregate but doesn’t natively join. Good fit for hierarchical or denormalised data where the document boundary aligns with the access pattern (a product with nested variants, an order with nested line items).

Key-value#

Redis, DynamoDB (in its simplest mode), Memcached, etcd. Data is key → value where the value is opaque to the store (Redis is the exception — it understands lists, sets, hashes, sorted sets). Access is by exact key. No queries beyond the key. Microsecond-scale reads when the working set fits in memory.

Wide-column#

Cassandra, ScyllaDB, HBase, Bigtable. Data is (partition_key, sort_key) → row of arbitrary columns. Access is by partition key plus a range on the sort key. Optimised for high write throughput at horizontal scale. Schemas are flexible per row but query patterns must be known up front (because you choose partition and sort keys before you can query).

Graph#

Neo4j, JanusGraph, Amazon Neptune, TigerGraph. Data is nodes and edges, each potentially with properties. Access via a graph query language (Cypher, Gremlin) optimised for multi-hop traversal — “friends of friends”, “shortest path”, “all transitive dependencies”. Where the access pattern is recursion, graph wins; where it’s tabular, relational beats it.

Columnar analytics#

ClickHouse, BigQuery, Snowflake, Redshift, Druid, DuckDB. Data is stored column-by-column, compressed heavily, scanned in vectorised chunks. Optimised for SELECT agg(...) FROM big_table WHERE ... GROUP BY .... Useless at point updates; ideal for analytical scans across billions of rows.

Time-series (specialised)#

InfluxDB, TimescaleDB, Prometheus, QuestDB. Data is (timestamp, dimensions, metric) triples. Storage is partitioned by time, compressed per dimension, with TTLs and downsampling baked in. Where the workload is metrics, logs, IoT readings, observability, this family beats general-purpose stores by an order of magnitude.

A summary table to keep in your head:

FamilyModelStrong onWeak on
Relationalrows, tables, FKsjoins, ACID, complex querieshorizontal write scale
DocumentJSON-like documentsnested data, flexible schemacross-document joins
Key-valuek → v opaquelatency at scaleanything query-shaped
Wide-columnpartition+sort keyshuge write throughput, scansad-hoc queries
Graphnodes and edgesmulti-hop traversalaggregations
Columnarcolumnar pagesanalytical scanspoint writes
Time-seriestime + tags + metricmetrics, observabilitynon-temporal data

Variants and trade-offs#

Polyglot persistence — different families for different bounded contexts. Postgres for the transactional core, DynamoDB for sessions, Elasticsearch for full-text search, ClickHouse for analytics, Redis for cache. Each fits its workload, but the operational tax and consistency-across-stores story grow with each new system.
One database, many features — Postgres with JSONB, full-text search, pgvector, TimescaleDB extension, and a read-replica for analytics. Less specialised at each task, but a single source of truth and one operations team. Often the right answer for the first several years of a system’s life.

Other classification dimensions that come up:

  • OLTP vs OLAP — orthogonal to the data model. ClickHouse and Postgres are both “SQL” but solve opposite problems. The HTAP (hybrid transactional/analytical) story is real but still partial; most systems still split the two.
  • Embedded vs server — SQLite, DuckDB, RocksDB run inside the application process; Postgres, MySQL, MongoDB run as separate servers. Embedded is one-writer-process simple; server is multi-user proper.
  • Open-source vs proprietary cloud-native — Postgres / MySQL / MongoDB are open and runnable anywhere; Spanner / DynamoDB / BigQuery / Snowflake exist only as managed services. The cloud-native side often gives up portability for radically better scale.
  • CAP positioning — across replicas, every distributed DBMS picks (mostly) CP or (mostly) AP. Spanner, CockroachDB lean CP. Cassandra, Dynamo lean AP. Single-node systems don’t have to choose.
Why 'NoSQL' is a useless classification by itself

“NoSQL” lumps together MongoDB (document), Redis (key-value), Cassandra (wide-column), and Neo4j (graph) — four families whose data models, query languages, scaling stories, and use cases have almost nothing in common. The label tells you what a system isn’t; it tells you nothing about what to reach for it for. The data-model axis is the only one with predictive value.

When this is asked in interviews#

Two main shapes. As a vocabulary check — “name the database families and what each is good at” — the answer is the six families with one sentence each, plus the OLTP/OLAP axis. As a design choice — “you need to store X; which database?” — the expected motion is: clarify the access pattern (reads vs writes, point vs range, latency vs throughput), then map to one or two families, then pick a representative.

The follow-up that separates juniors from seniors: “what would force you to switch?”. A good answer names the access pattern shift (a new analytical workload, a new graph traversal, a 10x growth in writes) and the next family up. Treating the choice as static is a tell.

A common trap is over-eager polyglot persistence — proposing five databases for what could be one Postgres with extensions. The operational tax is the right counter-argument.

Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.