Normalization
Functional dependencies and the normal forms (1NF, 2NF, 3NF, BCNF) — and when stopping is the right answer.
Normalization is the part of database theory most engineers learn once and then internalise to the point of invisibility. The procedure is mechanical (apply 1NF → 2NF → 3NF, stop when functional dependencies are satisfied), but the value is in the reasoning behind it — eliminate redundancy, prevent update anomalies, make the schema robust to changing business rules.
The sweet spot for most OLTP schemas is 3NF or BCNF. Going further (4NF, 5NF) is rarely worth it; stopping earlier (2NF or even 1NF) is usually a denormalization decision worth defending in writing.
Key concepts
- Functional dependency A → B means: given A, B is determined
- 1NF: atomic values per cell
- 2NF: every non-key column depends on the whole primary key (only matters with composite keys)
- 3NF: no transitive dependencies — non-key columns don't determine other non-key columns
- BCNF: every functional dependency's left side is a superkey (strict 3NF)
Reference template
// Normalising step by step
1. Identify candidate keys (every FD's LHS that's a superkey)
2. Apply 1NF (atomic cells)
3. Apply 2NF (remove partial dependencies)
4. Apply 3NF (remove transitive dependencies)
5. Check BCNF (every non-trivial FD has a superkey LHS)
6. Decide whether to denormalise (and write down the reason) Adapt to your problem; the structure is the load-bearing part.
Common pitfalls
- Normalising to 5NF for an OLTP schema — overkill, hurts query performance
- Denormalising without measurement — you're betting on a problem you may not have
- Forgetting that normalisation theorems assume FDs are stable; business rules change
- Treating multi-valued attributes as 'just a JSON column' without thinking about queries
Related topics
Items (3)
- Functional Dependencies
A → B as the unit of schema reasoning; closures; minimal covers; how FDs drive the normal forms.
Concept Intermediate - Normalization — 1NF, 2NF, 3NF
Atomicity, full functional dependency, transitive dependency — the three levels every relational schema starts with.
Concept Intermediate - BCNF, 4NF, and When to Stop Normalizing
BCNF as the strict-3NF; the pragmatic stopping point and where denormalization is genuinely worth it.
Concept Intermediate