Functional Dependencies
A → B as the unit of schema reasoning; closures; minimal covers; how FDs drive the normal forms.
Summary#
A functional dependency (FD) is a statement that within one relation, the values of one set of attributes uniquely determine the values of another. Written X → Y, read “X functionally determines Y”: for every two tuples that agree on X, they must also agree on Y. student_id → name says “two rows with the same student_id always have the same name”. FDs are the formal vocabulary normalisation theory is built on — every normal form (2NF, 3NF, BCNF, 4NF) is defined in terms of which FDs are allowed in a relation.
The reason to know the theory, even casually, is that it gives you a precise way to argue “this schema has a redundancy bug”. Without FDs, the argument is hand-wavy (“looks denormalised”); with FDs, you can point to a specific dependency that violates a specific normal form.
Why it matters#
Schema bugs caused by missing or wrong dependencies are silent and durable. A schema that allows (course_code, course_name) to drift apart will eventually drift apart — different rows will record different names for the same course — and the bug surfaces months later as inconsistent reports. FD analysis catches this at design time by asking: what depends on what, and is the key strong enough to enforce it?
In interviews, FDs are the prerequisite for any serious normalisation discussion. The exam-style version is “given FDs {A → B, B → C}, is the relation in 3NF?”. The applied version is “you have a users table with (user_id, email, country_code, country_name). Find the redundancy.” Both want FD reasoning.
How it works#
What X → Y actually means#
X → Y is a constraint on the relation, not on the current instance. It says: for any valid state of this relation, no two tuples can agree on X and disagree on Y. The fact that today’s data happens to satisfy X → Y is not enough; you have to know it always will.
This matters because FDs come from the domain, not from the data:
ssn → name— true if we know SSN identifies a person and their name is recorded.name → ssn— usually false (two people can share a name).(order_id, line_number) → product_id— true if line numbers are unique within an order.
Trivial vs non-trivial FDs#
X → Y is trivial if Y ⊆ X — every set of attributes determines itself. (user_id, email) → email is trivial. Non-trivial FDs are the interesting ones; normalisation cares only about non-trivial dependencies.
Closure#
The closure of an attribute set X under a set of FDs F, written X+, is the set of all attributes that can be inferred to depend on X. Compute it by repeatedly applying FDs in F:
F = { A → B, B → C, CD → E }
X = {A}X+ = {A} -- start with X itselfA → B applies => X+ = {A, B}B → C applies => X+ = {A, B, C}CD → E does NOT apply (D is not in X+ yet)done. X+ = {A, B, C}X is a superkey of the relation iff X+ equals all attributes. X is a candidate key if it’s a minimal superkey — removing any attribute makes X+ lose at least one attribute.
This is the algorithmic core of every normalisation check: compute closures, find candidate keys, then evaluate which FDs do or don’t satisfy the form you’re targeting.
Minimal cover#
A minimal cover of an FD set F is an equivalent set F' (same closures) with three properties:
- Every FD in
F'has a single attribute on the right-hand side. - No FD in
F'is redundant (removing it changes some closure). - No left-hand side has an extraneous attribute.
Minimal covers are what synthesis-style 3NF decomposition operates on. A relation can have several minimal covers; any of them works.
How FDs drive normal forms#
Each normal form is a constraint on the allowed FDs:
- 2NF — no non-prime attribute depends on only part of a candidate key. (Forbids partial dependency on composite keys.)
- 3NF — for every non-trivial
X → A, eitherXis a superkey orAis a prime attribute. (Forbids transitive dependency through a non-key.) - BCNF — for every non-trivial
X → A,Xis a superkey. (Strict version of 3NF.) - 4NF — generalises BCNF to multi-valued dependencies.
You cannot evaluate any of these without first writing down the FDs the relation is supposed to satisfy. That’s the whole reason FDs come first.
Variants and trade-offs#
Other angles:
- Multi-valued dependencies (MVDs).
X →→ Ymeans: givenX, the set ofYvalues is independent of the rest. MVDs drive 4NF; they show up when a single relation tries to record two independent multi-valued facts about the same entity. - Inclusion dependencies.
R[X] ⊆ S[Y]— every value ofXinRappears as a value ofYinS. This is what foreign keys enforce. Inclusion dependencies are a separate family from FDs but interact with them in design. - Probabilistic / approximate FDs. “Holds in 99.5% of rows”. Used in data cleaning and quality work, not in schema design. Don’t normalise based on them.
- Embedded FDs. An FD that holds on a projection of a relation but not on the relation itself. Mostly an academic concept; rare in practice.
Worked example — finding candidate keys via closures
Suppose R(A, B, C, D, E) with F = { A → BC, CD → E, B → D, E → A }. To find candidate keys, try minimal supersets and compute closures:
{A}+ = {A, B, C, D, E}✓ (A → BC gives B and C; B → D gives D; CD → E gives E). SoAis a superkey.{E}+ = {E, A, B, C, D}✓ (E → A, then the same chain).Eis a superkey.{B}+ = {B, D}✗ —Balone is not a superkey.{C}+ = {C}✗ — not a superkey.{BC}+ = {B, C, D, E, A}✓ — superkey, and minimal (drop either B or C and it fails).
So the candidate keys are {A}, {E}, {BC}. Multiple candidate keys is common; the choice of primary key among them is a design call.
When this is asked in interviews#
Three shapes. As theory — “what’s a functional dependency?” — the answer is the formal definition plus an example plus how it relates to keys. As a worked problem — “given these FDs, find the candidate keys / decompose to 3NF” — the answer walks closures and applies the synthesis algorithm.
As an applied design question — “where’s the redundancy in this table?” — the candidate is expected to write down the FDs they see, find the one that violates BCNF (or 3NF), and propose a decomposition. Senior interviewers care less about the formal proof and more about whether the candidate can name the offending dependency.
A common trap is reading FDs off the data rather than the domain. “Looking at this sample, name → email” — no, that’s a coincidence in the current rows; the domain doesn’t promise it.
Related concepts#