DBMS Workbook
53 topics across foundations, non-functional requirements, building blocks, full system designs, and public postmortems. Every system uses the same 7-step interview walk-through; every building block has a consistent design template.
Foundations
4 items What a database is, what a DBMS adds over a pile of files, the three-schema architecture, and the family of database systems.
Foundations
4 items- What Is a Database?
A shared, structured, queryable, persistent collection of data — and the DBMS that gives those four properties teeth.
Concept Foundational - Why Not Files? The Case for a DBMS
The limits of file-based systems that drove the move to databases — redundancy, integrity, concurrency, recovery.
Concept Foundational - DBMS Architecture — The Three-Schema Model
Conceptual, logical, and physical schemas; data independence; how a schema change at one level shouldn't break the others.
Concept Foundational - Classification of Database Systems
Relational vs document vs key-value vs graph vs columnar vs time-series. What each is good at and where the boundaries blur.
Concept Foundational
Data Modeling
5 items ER modeling — entities, attributes, relationships, constraints, weak entities — and translating ER to relational tables.
Data Modeling
5 items- Data Models, Schemas, and Instances
Model = vocabulary. Schema = structure. Instance = data. The difference matters for migration, validation, and discussion.
Concept Foundational - Entity-Relationship Model — Entities, Attributes, Keys
Entities, attributes, primary keys; entity sets vs entity types; the smallest vocabulary to design a schema.
Concept Foundational - ER Relationships and Constraints
Relationship types, degree (binary / ternary), cardinality constraints, participation constraints, attributes on relationships.
Concept Intermediate - Weak Entities and Identifying Relationships
Entities whose identity depends on another entity, identifying keys, and where this pattern shows up in real schemas.
Concept Intermediate - Translating ER Diagrams to Relational Tables
The standard rules — each entity becomes a table, each relationship becomes a foreign key or junction table, weak entities collapse.
Building Block Intermediate
Relational Model
6 items Relations, keys, integrity constraints, the algebra, transactions, ACID, isolation levels.
Relational Model
6 items- Relational Model Concepts
Relations, tuples, attributes, domains. Why the relational model won — composability and a closed algebra.
Concept Foundational - Keys — Primary, Candidate, Foreign, Composite
What each key is for, how to choose a primary key, why surrogate keys are usually the right default.
Concept Foundational - Integrity Constraints
Entity integrity, referential integrity, domain constraints, CHECK constraints, and triggers as the catch-all.
Concept Foundational - Relational Algebra Basics
Selection, projection, join, union, difference. The algebra SQL is compiled into.
Building Block Intermediate - Transactions and ACID
Atomicity, Consistency, Isolation, Durability. What each one promises and where each one bends in practice.
Concept Foundational - Isolation Levels and Anomalies
Read-uncommitted to serializable; the anomalies each prevents (dirty read, non-repeatable, phantom, write skew).
Concept Intermediate
Normalization
3 items Functional dependencies and the normal forms (1NF, 2NF, 3NF, BCNF) — and when stopping is the right answer.
Normalization
3 items- Functional Dependencies
A → B as the unit of schema reasoning; closures; minimal covers; how FDs drive the normal forms.
Concept Intermediate - Normalization — 1NF, 2NF, 3NF
Atomicity, full functional dependency, transitive dependency — the three levels every relational schema starts with.
Concept Intermediate - BCNF, 4NF, and When to Stop Normalizing
BCNF as the strict-3NF; the pragmatic stopping point and where denormalization is genuinely worth it.
Concept Intermediate
SQL
8 items SQL as the user-facing surface — DDL, DML, SELECT, joins, aggregates, subqueries, CTEs, window functions, EXPLAIN.
SQL
8 items- SQL DDL — CREATE, ALTER, DROP
Defining the schema. Tables, columns, constraints, indexes — and the ALTER traps on a live system.
Building Block Foundational - SQL DML — INSERT, UPDATE, DELETE, MERGE
Mutating data. RETURNING, UPSERT / ON CONFLICT, the things to know about row-locking on writes.
Building Block Foundational - SELECT and the WHERE Clause
The shape of every read. Predicate logic, NULL handling, the difference between AND/OR and short-circuit eval.
Building Block Foundational - JOIN — INNER, LEFT, RIGHT, FULL, CROSS
Connecting tables. When LEFT JOIN is the right tool, what FULL OUTER is for, and how CROSS JOIN should be feared.
Building Block Foundational - Aggregates, GROUP BY, HAVING
COUNT / SUM / AVG / MIN / MAX, the GROUP BY contract, why HAVING is not WHERE.
Building Block Foundational - Subqueries and CTEs
Correlated vs non-correlated subqueries, the readability case for CTEs, recursive CTEs for trees.
Building Block Intermediate - Window Functions
OVER, PARTITION BY, ORDER BY, frame clauses. Running totals, ranks, lag/lead — the part of SQL most engineers underuse.
Building Block Intermediate - Query Execution and EXPLAIN Plans
How the planner reads a query, the cost-based optimizer, EXPLAIN/EXPLAIN ANALYZE, the four operators you'll see most.
Concept Intermediate
Storage & Indexes
6 items Row-store vs column-store, B-trees, hash and bitmap indexes, LSM-trees, WAL, MVCC.
Storage & Indexes
6 items- Row-Store vs Column-Store
Why OLTP is row-oriented and OLAP is column-oriented; the storage layout that explains the performance gap.
Building Block Intermediate - B-Tree Indexes
The classic disk-friendly tree, fanout, leaf-page linking, why every relational DBMS defaults to B-tree.
Building Block Intermediate - Hash and Bitmap Indexes
Hash indexes for equality lookups, bitmap indexes for low-cardinality columns, the trade-offs vs B-tree.
Building Block Intermediate - LSM-Tree Storage
Memtables, SSTables, compaction. The structure behind LevelDB, RocksDB, Cassandra, ScyllaDB, and DynamoDB's storage.
Building Block Advanced - Write-Ahead Logging and Recovery
Why every durable database writes a WAL, ARIES, redo/undo, checkpointing, crash recovery.
Building Block Intermediate - MVCC — Multi-Version Concurrency Control
Snapshot isolation, vacuum, the trade-off vs lock-based concurrency. Why Postgres, MySQL InnoDB, and Spanner all use MVCC.
Building Block Advanced
Systems & Postmortems
6 items Named systems — PostgreSQL, MySQL/InnoDB, DynamoDB, Spanner — and public postmortems worth reading.
Systems & Postmortems
6 items- PostgreSQL — The Reference Open-Source RDBMS
Process-per-connection model, MVCC, the planner, extensibility (FDW, custom types), and what makes Postgres the default.
System Intermediate - MySQL / InnoDB
The B+ tree clustered-index storage engine, group replication, the historical mistakes and how 8.x undid them.
System Intermediate - Amazon DynamoDB
Partition + sort key model, single-table design, on-demand vs provisioned capacity, GSIs, transactions, the cost model.
System Advanced - Google Spanner
TrueTime, externally consistent transactions, Paxos groups, the breakthrough that made global ACID actually work.
System Advanced - GitLab 2017 — The Database Outage
A mistaken `rm -rf` on the primary; five backup mechanisms that all failed; the public postmortem everyone should read.
Postmortem Foundational - Knight Capital 2012 — $440M in 45 Minutes
An old code path enabled by a deploy; loose-state assumptions in a trading system; what 'shared mutable state' costs at scale.
Postmortem Foundational
SQL Patterns
15 items Fifteen reusable query templates that show up in every SQL interview — aggregation, filtering, comparison, sequencing/hierarchical, transformation. The decomposition vocabulary for data problems.
SQL Patterns
15 items- Tally Count
Count occurrences of values in a column. The COUNT(*) GROUP BY template every SQL interview opens with.
Pattern Foundational - Group Bucket
Bucket continuous values into ranges (age bands, price tiers) using CASE WHEN inside GROUP BY.
Pattern Foundational - Rolling Totals
Cumulative aggregations over a window. SUM() OVER (ORDER BY ... ROWS BETWEEN ...). The time-series running-total template.
Pattern Intermediate - Patch Gaps
Fill missing rows in a sequence with default values via LEFT JOIN against a generated calendar or series.
Pattern Intermediate - Existence Check
EXISTS vs NOT EXISTS vs IN vs LEFT JOIN IS NULL — four ways to say where this related row does or does not exist.
Pattern Foundational - Deduplication
ROW_NUMBER() OVER (PARTITION BY ...) = 1 and the DISTINCT ON (Postgres) alternatives for keeping one row per group.
Pattern Foundational - Pattern Search
LIKE, ILIKE, regex, trigram, full-text — the spectrum of find rows that match a string pattern.
Pattern Foundational - Set Compare
INTERSECT / EXCEPT / UNION and the equivalent JOIN / NOT EXISTS forms. When set algebra beats joins.
Pattern Intermediate - Join Variants
Self-join, anti-join, semi-join, lateral join, theta-join. Beyond the standard four kinds of join.
Pattern Intermediate - Nested Query
Correlated vs uncorrelated subqueries; subquery in SELECT vs WHERE vs FROM. When CTEs beat nested subqueries.
Pattern Intermediate - Rank and Row
ROW_NUMBER vs RANK vs DENSE_RANK. Per-partition top-N selection. The second-highest-salary canon.
Pattern Intermediate - Lead-Lag Navigation
LEAD() and LAG() for previous/next-row comparison. Streak detection, change-from-previous metrics, run-length tracking.
Pattern Intermediate - Recursive Tree
Recursive CTEs (WITH RECURSIVE) for tree traversal, transitive closures, organisational hierarchies.
Pattern Advanced - Gaps and Islands
The streak-detection template: subtract row number from a value, group by the difference. Identifies runs of consecutive rows.
Pattern Advanced - Pivot Flip
Long-to-wide and wide-to-long reshapes. PIVOT (T-SQL), crosstab (Postgres), conditional aggregation as the portable form.
Pattern Intermediate