NFRs in Interviews — System Design

Summary#

NFRs (non-functional requirements) are the how well questions — availability, latency, throughput, durability, consistency, scale, security, cost. They’re what interviewers grade you on without saying so. Surfacing the right NFRs in the right order, with the right level of numeric specificity, is the single biggest differentiator between a junior and a senior answer.

Why it matters#

Functional requirements are easy — “users can post, like, comment”. The interviewer can grade an FR answer in 30 seconds. NFRs are where the real signal lives, because they force the candidate to commit to numbers that constrain every later choice. Without NFRs, every design is correct in some scale; with NFRs, the design has to be correct at the agreed scale.

The candidate who volunteers “let’s target 99.95% availability, sub-100 ms p99 read, eventual consistency on the timeline” has already done half the design work. The candidate who waits to be asked has signalled they don’t think about NFRs until forced.

How it works#

The NFR checklist (the order to surface them in)#

In step 1 of the walk-through, after FRs are agreed, run this list. Don’t quote all of them — pick the 3–5 that matter for the prompt and put a number on each.

NFR	Default number	Question to ask the “PM”
Availability	99.9% or 99.95%	“What’s the user impact of being down for 5 min vs 5 hours?”
Latency (read p50, p99)	50 ms / 300 ms	”Is this a foreground action or a background sync?”
Latency (write p99)	500 ms	”Does the write block the user?”
Throughput / QPS	derived from DAU × actions/day	(compute, don’t ask)
Durability (RPO)	minutes to zero	”How much data are we willing to lose if the primary dies?”
Recovery time (RTO)	minutes	”How long can the system be in read-only mode?”
Consistency	read-your-writes + monotonic	”Show user their own posts immediately. Other users can lag by seconds.”
Geographic scope	single-region to start	”Where are our users? Single market or global?”
Security / privacy	TLS, auth, PII handling	”Are we storing PII? Regulated data?”
Cost ceiling	rarely numeric in interview	”What’s the order-of-magnitude budget here?”

Converting vague asks to numbers#

“It should be fast” → “p99 read latency under 200 ms; p99 write latency under 500 ms.”

“It should be highly available” → “99.95% — ~4 hours of downtime per year.”

“It should scale” → “Designed for 100k QPS today, 1M QPS in 18 months.”

“It should be reliable” → “RPO of 1 minute on writes; RTO of 5 minutes on failover; zero tolerance for silent data corruption.”

The signal isn’t the number itself — it’s that the candidate insists on a number and ties it to a design choice. “99.95% means multi-AZ with automated failover; 99.99% means multi-region active-active; you tell me which side of that line we’re on.”

NFRs in step 7 (the loop-back)#

The other place NFRs matter is the evaluation step. After the design is on the board, the senior candidate re-states the NFRs and walks through which ones the design hits and which ones it doesn’t:

“We hit 99.95% — multi-AZ covers zone failure. To get 99.99%, we’d need multi-region active-active, and the consistency story changes.”
“P99 read latency is under 100 ms in the happy path; under a cache miss, it’s ~150 ms. P99.9 during deploys spikes to ~500 ms — would that violate the budget?”
“RPO is 1 second from the async replica. If we need zero RPO, we’d need synchronous replication and the write latency budget goes up to ~50 ms.”

This loop-back is what distinguishes a candidate who designed to spec from one who drew boxes and named the boxes.

Variants and trade-offs#

Quote conservative NFRs (three nines, seconds of staleness, 100 ms p99) — gives you room to design simply. Most consumer products genuinely live here. Risk: interviewer pushes for tighter and now you’re retrofitting.

Quote aggressive NFRs (four+ nines, milliseconds of staleness, sub-50 ms p99) — forces sophisticated design choices. Signals you can work at scale. Risk: every component now needs justification at that bar; one weak link tanks the whole answer.

The trap: NFRs that don’t trade off against each other.

Availability vs consistency — strong consistency requires coordination, coordination drops during partitions, so you either fail or you go stale. Pick.
Latency vs durability — synchronous replication adds RTT to every write; async loses writes during failover. Pick.
Latency vs throughput — beyond saturation, batching trades p99 latency for QPS. Pick which side of the knee you’re on.
Cost vs everything — more nines, lower latency, multi-region all cost real money. Numeric NFR targets are also implicit cost commitments.

A senior candidate volunteers these trade-offs unprompted: “We can have low latency or strong consistency on this read path, not both. I’m picking low latency with read-your-writes session guarantees — most reads will look strong to the user without paying the full coordination cost.”

The NFR most candidates forget

Recoverability — how do you get back from a bad state? RTO and RPO cover the happy-path failover, but recoverability also covers: “we shipped a bug that wrote bad data for 6 hours, how do we roll back?” The answer involves backups, point-in-time recovery, idempotent migrations, and audit logs. It almost never comes up unprompted; interviewers respect candidates who surface it.

When this is asked in interviews#

Every system-design interview, at minimum twice — once in step 1 (clarify) and once in step 7 (evaluate). The first time is the candidate proposing targets; the second is grading the design against them.

NFRs are also the most common path through which the interview goes up a level. A solid functional design earns the candidate the mid-level rating. A solid NFR conversation — proactive targets, trade-off awareness, loop-back validation — earns the senior rating. The same candidate can hit either level depending on how they handled NFRs.

Heavier emphasis at:

L5+ FAANG-equivalent bars — NFR rigour is part of the rubric.
Infrastructure / SRE / platform interviews — NFRs are the job.
Companies with regulatory or contractual SLAs (Stripe, Twilio, AWS, telco) — NFRs are externally enforced.

Common follow-ups:

“Why those targets? Defend each number.”
“Walk me through one trade-off you accepted to hit those targets.”
“Which NFR is the design closest to violating? Where’s the weak link?”
“If I doubled the availability target, which design choices change?”