Domain Name System (DNS)

Hierarchical name resolution, caching, TTL trade-offs, and DNS as a load-balancing primitive.

Building Block Foundational
dns networking caching
Companies this resembles: Cloudflare · Route 53 · Google Public DNS · NS1

Use cases

DNS is the substrate every other distributed system sits on. Three places it earns its keep:

  • Service discovery for users: api.stripe.com resolves to whichever IPs are healthy and closest.
  • Internal service discovery: Consul and Kubernetes’ CoreDNS expose service endpoints over DNS (CoreDNS can also serve records stored in etcd), so microservices can find each other without a hardcoded registry client.
  • Traffic engineering: geo-routing, weighted routing, latency-based routing, and failover all collapse into “return a different A record” at the authoritative server.
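The routing policies in the last bullet reduce to a selection function at the authoritative server. A minimal Python sketch, assuming a hypothetical record set, region labels, and weights (real providers add health checks and client-location signals such as EDNS Client Subnet):

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    ip: str
    region: str  # region this endpoint serves (hypothetical labels)
    weight: int  # relative share of answers within its region


# Hypothetical record set for a single name such as api.example.com.
RECORDS = [
    Candidate("192.0.2.10", "us-east", 3),
    Candidate("192.0.2.11", "us-east", 1),
    Candidate("198.51.100.20", "eu-west", 1),
]


def answer(client_region: str, rng: random.Random) -> str:
    """Pick one A record: geo filter first, then a weighted random draw."""
    local = [c for c in RECORDS if c.region == client_region]
    pool = local or RECORDS  # no endpoint in the client's region: use all of them
    chosen = rng.choices(pool, weights=[c.weight for c in pool], k=1)[0]
    return chosen.ip


rng = random.Random(42)
print(answer("eu-west", rng))  # → 198.51.100.20 (the only eu-west candidate)
```

With weights 3:1, about three quarters of us-east answers name 192.0.2.10, though each recursive resolver caches whichever answer it drew, so the split seen by end users is coarser.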

Functional requirements

  • Resolve a domain name to one or more IPv4 (A) or IPv6 (AAAA) addresses.
  • Support record types beyond A: CNAME (alias), MX (mail), TXT (verification, SPF), SRV (service host and port), NS (delegation).
  • Allow operators to publish multiple answers for the same name with weights or geo-rules.
  • Cache responses for their TTL, and cache NXDOMAIN answers too (negative caching).
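A resolver cache honoring the last requirement can be sketched as follows. This is a toy in Python under simplifying assumptions: the class and method names are hypothetical, and an injectable clock stands in for real time so expiry is easy to test.

```python
import time
from typing import Callable, Dict, List, Tuple, Union

NXDOMAIN = "NXDOMAIN"  # marker returned for a cached "name does not exist"


class DnsCache:
    """Toy resolver cache in which positive and negative answers both expire.

    Positive entries are keyed by (name, record type). An NXDOMAIN says the
    name has no records of any type, so negative entries are keyed by name.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can fake the passage of time
        self._positive: Dict[Tuple[str, str], Tuple[List[str], float]] = {}
        self._negative: Dict[str, float] = {}

    def put(self, name: str, rtype: str, values: List[str], ttl: float) -> None:
        name = name.lower()  # DNS names compare case-insensitively
        self._negative.pop(name, None)  # a fresh positive answer overrides NXDOMAIN
        self._positive[(name, rtype)] = (values, self._clock() + ttl)

    def put_nxdomain(self, name: str, negative_ttl: float) -> None:
        self._negative[name.lower()] = self._clock() + negative_ttl

    def get(self, name: str, rtype: str) -> Union[List[str], str, None]:
        """Return cached values, NXDOMAIN, or None on a miss or expiry."""
        name, now = name.lower(), self._clock()
        neg_expiry = self._negative.get(name)
        if neg_expiry is not None:
            if now < neg_expiry:
                return NXDOMAIN
            del self._negative[name]  # expired negative entry
        hit = self._positive.get((name, rtype))
        if hit is not None:
            values, expiry = hit
            if now < expiry:
                return values
            del self._positive[(name, rtype)]  # expired positive entry
        return None
```

A real resolver would also hand clients the remaining TTL rather than the original one, and cap very large TTLs.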

Non-functional requirements

  • Latency: a cold resolution should clear in under 50 ms p99; a warm cache hit, sub-millisecond. Most new connections to a hostname begin with a DNS lookup unless the answer is already cached on the client, so resolution latency sits on the user-perceived critical path.
  • Availability: 100% is the design target. Public resolvers run anycast across hundreds of POPs precisely because DNS being down means the internet being down.
  • Throughput: Google’s 8.8.8.8 reportedly fields well over a trillion queries per day; authoritative servers for hot zones see hundreds of thousands of QPS.
  • Consistency: eventually consistent within the TTL window. There is no “the answer” — there’s only “an answer that was correct N seconds ago”.

High-level design

client ─> stub resolver (OS, cache)
            └─> recursive resolver (1.1.1.1 / ISP, cache)
                  ├─> root NS (.)
                  ├─> TLD NS (.com)
                  └─> authoritative NS (stripe.com)
                        └─> answer (edge POP / origin IP) returned to the client

The stub resolver lives in the OS and talks to one recursive resolver (usually the ISP, or 1.1.1.1 / 8.8.8.8). The recursive resolver walks the hierarchy: root → TLD → authoritative — caching each step. Once a recursive cache is warm, it answers locally and the hierarchy is invisible to the user.

Detailed design

The resolution walk

  1. Client asks recursive resolver for api.stripe.com.
  2. Resolver checks its cache. Cache hit → return immediately.
  3. Cache miss → ask a root server. Roots return the .com TLD NS records.
  4. Resolver asks a .com TLD server. It returns the NS records for stripe.com.
  5. Resolver asks stripe.com’s authoritative NS. It returns the A record(s).
  6. Resolver caches each step for the record’s TTL and returns the answer to the client.
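The six steps can be simulated against hard-coded data. This Python sketch is hypothetical throughout: the server names, record data, and IPs are invented, each "server" is a dictionary instead of a UDP endpoint, and TTLs are omitted. It caches referrals as well as answers, which is why a warm resolver skips the upper levels.

```python
# Hypothetical data: what each server replies. ("NS", server) is a referral
# to the next server down; ("A", ips) is the final answer.
SERVERS = {
    "root": {"com.": ("NS", "tld-com")},
    "tld-com": {"stripe.com.": ("NS", "ns-stripe")},
    "ns-stripe": {
        "api.stripe.com.": ("A", ["192.0.2.10"]),
        "www.stripe.com.": ("A", ["192.0.2.20"]),
    },
}


def suffixes(qname: str) -> list:
    """'api.stripe.com.' -> ['api.stripe.com.', 'stripe.com.', 'com.']"""
    labels = qname.split(".")  # the trailing dot yields a final empty label
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def ask(server: str, qname: str):
    """One simulated query: the server replies for the longest name it holds."""
    for name in suffixes(qname):
        if name in SERVERS[server]:
            kind, data = SERVERS[server][name]
            return name, kind, data
    raise LookupError(f"{server} has no data for {qname}")


def resolve(qname: str, answers: dict, referrals: dict, trace: list) -> list:
    """Iterative resolution that caches answers and referrals (TTLs omitted)."""
    if qname in answers:  # step 2: cache hit
        trace.append("cache")
        return answers[qname]
    # Start at the deepest zone whose nameserver is cached, else at the root.
    server = next((referrals[z] for z in suffixes(qname) if z in referrals), "root")
    while True:
        trace.append(server)
        name, kind, data = ask(server, qname)
        if kind == "A":  # step 5: the authoritative answer
            answers[qname] = data  # step 6: cache it
            return data
        referrals[name] = data  # steps 3-4: cache the referral, then follow it
        server = data


answers, referrals = {}, {}
for qname in ("api.stripe.com.", "api.stripe.com.", "www.stripe.com."):
    trace = []
    resolve(qname, answers, referrals, trace)
    print(qname, trace)
```

The first lookup visits root, TLD, and authoritative; the repeat is a pure cache hit; and www.stripe.com. goes straight to the cached stripe.com nameserver.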

In practice a recursive resolver rarely walks the full hierarchy: root and TLD delegations carry long TTLs (days), and popular names are almost always already cached in any busy resolver.

TTL is the central dial

Lower TTL means faster propagation of changes (failover, IP rotation, geo-rebalance) but more queries against the authoritative servers and more user-visible cold-cache latency. Higher TTL is cheaper and faster steady-state but slower to react.
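The dial can be put in rough numbers. A back-of-the-envelope Python sketch, assuming every resolver honors the TTL and has continuous demand for the name (so each refetches once per TTL); the resolver count is hypothetical.

```python
def ttl_tradeoff(ttl_s: float, active_resolvers: int) -> tuple:
    """Return (worst-case staleness in s, upper bound on authoritative QPS).

    Assumes every resolver honors the TTL and keeps asking for the name, so
    each one refetches exactly once per TTL. Resolvers that clamp or stretch
    TTLs shift both numbers; resolvers with idle demand lower the QPS.
    """
    staleness_s = ttl_s  # an entry cached just before a change lives a full TTL
    qps = active_resolvers / ttl_s
    return staleness_s, qps


# Hypothetical audience: 100,000 recursive resolvers with steady demand.
for ttl in (30, 300, 3600):
    stale, qps = ttl_tradeoff(ttl, active_resolvers=100_000)
    print(f"TTL {ttl:>4} s: stale for up to {stale:>4} s, at most {qps:7.1f} QPS")
```

Under these assumptions, cutting the TTL from 3600 s to 30 s shrinks worst-case staleness from an hour to 30 s while raising the authoritative load ceiling from about 28 to about 3,333 QPS.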

Anycast

Public DNS providers announce the same IP prefix (the one covering 1.1.1.1 or 8.8.8.8, for example) from hundreds of POPs via BGP. BGP path selection usually delivers each query to a topologically nearby POP, giving sub-10 ms RTT in most metros. Anycast also provides failover for free: withdraw the route from a sick POP and traffic shifts to its neighbors within seconds.
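A toy model of that failover, in Python: each POP gets a single hypothetical path cost as seen from one client network, standing in for BGP best-path selection (which really weighs local preference, AS-path length, and other attributes).

```python
def pick_pop(path_cost: dict, announcing: set) -> str:
    """Return the POP that traffic from one client network reaches.

    path_cost maps POP -> a single hypothetical cost standing in for BGP
    best-path selection; only POPs still announcing the prefix are eligible.
    """
    live = {pop: cost for pop, cost in path_cost.items() if pop in announcing}
    if not live:
        raise RuntimeError("no POP is announcing the prefix")
    return min(live, key=live.get)


# Hypothetical costs from a client network near Frankfurt.
costs = {"fra": 1, "ams": 2, "lhr": 3}
announcing = {"fra", "ams", "lhr"}
print(pick_pop(costs, announcing))  # → fra
announcing.discard("fra")  # the sick POP withdraws its route
print(pick_pop(costs, announcing))  # → ams
```

Nothing changes on the client side: it keeps sending to the same anycast address, and routing alone moves it to the next-best POP.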

DNS as a load-balancing primitive

The simplest load balancer in existence is round-robin DNS: return all healthy IPs and let the client pick one. Many clients simply connect to the first address returned, so the spreading comes from servers and resolvers rotating the order between responses. Limitations:

  • TTL caching defeats fine-grained traffic shifting.
  • Failed instances stay in the rotation until the TTL expires.
  • Buggy clients, or ones that cache aggressively (such as a process that resolves a name once at startup), keep using an old answer long after it changes.
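The rotation, and the way caching undoes it, can be seen in a few lines of Python. This is a hypothetical sketch: one authoritative server that rotates its answer list, and a client that always takes the first address (real clients may reorder addresses first, for example under RFC 6724 sorting in getaddrinfo).

```python
from collections import deque


class RoundRobinServer:
    """Authoritative sketch: returns every IP, rotating the order per query."""

    def __init__(self, ips: list) -> None:
        self._ips = deque(ips)

    def query(self) -> list:
        answer = list(self._ips)
        self._ips.rotate(-1)  # the next response starts with the following IP
        return answer


def first_address_client(answer: list) -> str:
    """Model of a client that connects to the first address it is given."""
    return answer[0]


server = RoundRobinServer(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
# Three uncached lookups land on three different IPs...
print([first_address_client(server.query()) for _ in range(3)])
# ...but every client behind a resolver that cached one answer gets the same IP.
cached = server.query()
print([first_address_client(cached) for _ in range(3)])
```

The second print shows the first limitation directly: until the cached answer expires, no amount of rotation at the server reaches those clients.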

This is why production load balancing uses DNS only for coarse decisions (which region, which CDN POP) and a real L4/L7 load balancer for the rest.

Protocols

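Classic DNS runs over UDP/53. When a response does not fit (over 512 bytes without EDNS, or over the buffer size the client advertises with EDNS, typically between 1232 and 4096 bytes), the server sets the TC (truncated) bit and the client retries over TCP/53. Modern privacy-aware DNS uses: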

  • DoT (DNS over TLS, TCP/853): encrypts the hop between client and resolver.
  • DoH (DNS over HTTPS, port 443): same protection, carried over HTTPS, so it blends in with ordinary web traffic.
  • DoQ (DNS over QUIC, UDP/853): same goals, with QUIC’s cheaper connection setup and no TCP head-of-line blocking.
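To make the UDP/TCP fallback concrete, here is a Python sketch that encodes a minimal RFC 1035 query and checks the TC bit in a reply. It sends nothing over the network and adds no EDNS OPT record, so a real server would apply the 512-byte limit to it; the default `query_id` is an arbitrary example value.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the header flags word
RD_FLAG = 0x0100  # "recursion desired": ask the resolver to do the walk


def build_query(qname: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS query in RFC 1035 wire format (qtype 1 = A, class IN)."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0, all big-endian.
    header = struct.pack("!HHHHHH", query_id, RD_FLAG, 1, 0, 0, 0)
    wire_name = b""
    for label in qname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label {label!r}")
        wire_name += bytes([len(raw)]) + raw  # length-prefixed label
    wire_name += b"\x00"  # the empty root label ends the name
    return header + wire_name + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def is_truncated(message: bytes) -> bool:
    """True if the TC bit is set: the reply did not fit, so retry over TCP/53."""
    (flags,) = struct.unpack_from("!H", message, 2)
    return bool(flags & TC_FLAG)
```

A client would send these bytes with `socket.sendto` to port 53 of a resolver; when `is_truncated` is true for the reply, it repeats the query over TCP, where each message carries a two-byte length prefix.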

Negative caching

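NXDOMAIN responses are cached too, for the lesser of the SOA record’s own TTL and its MINIMUM field (RFC 2308); the authoritative server includes the SOA in the negative response so the resolver can compute this. Without negative caching, a misspelled name would cost an upstream round trip on every retry, a classic foot-gun when something repeatedly probes internal-service-typo.local.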
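Both halves of that paragraph fit in a few lines of Python. The first function is the RFC 2308 rule; the second is a hypothetical simulation of a client re-asking for a nonexistent name at a fixed interval, counting how often the resolver must go upstream.

```python
def negative_cache_ttl(soa_record_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: cache a negative answer for min(SOA record TTL, SOA MINIMUM)."""
    return min(soa_record_ttl, soa_minimum)


def upstream_queries(probe_interval_s: float, duration_s: float,
                     negative_ttl_s: float) -> int:
    """Count resolver cache misses for a client re-asking a nonexistent name."""
    misses, expires_at, t = 0, float("-inf"), 0.0
    while t < duration_s:
        if t >= expires_at:  # no unexpired negative entry: go upstream
            misses += 1
            expires_at = t + negative_ttl_s
        t += probe_interval_s
    return misses


ttl = negative_cache_ttl(soa_record_ttl=3600, soa_minimum=300)
print(ttl, upstream_queries(1, 3600, ttl), upstream_queries(1, 3600, 0))  # → 300 12 3600
```

Probing once a second for an hour with a 300 s negative TTL costs 12 upstream queries instead of 3,600.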

Trade-offs

Short TTL (30-60 s): fast failover, and geo-rebalancing takes effect within a minute, which suits active-active setups. The cost is more query load on the authoritative servers and more cache misses, each adding lookup latency to a new connection.
Long TTL (1 hour+): cheap, stays warm in more recursive caches, and lookups feel fast. Failover takes up to a full TTL; a regional outage can stay user-visible for up to an hour.

Other dials:

  • Anycast vs unicast authoritative: anycast gives sub-50 ms global p99 but requires BGP peering at every POP. Unicast is simpler but slower for far-away clients.
  • Hosting your own authoritative vs managed (Route 53, NS1, Cloudflare): managed providers run massive anycast meshes and DDoS scrubbing. Self-hosting is fine for internal zones, risky for public ones.
  • CNAME vs ALIAS/ANAME: a CNAME cannot share its name with any other record, and the zone apex always holds SOA and NS records, so apex domains (example.com) need provider-specific ALIAS/ANAME records that resolve to A records server-side.
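The apex restriction and the ALIAS workaround can be sketched as a tiny zone editor in Python. Everything here is hypothetical: the helper names, the zone representation, and `resolve_a`, a stand-in for the provider’s own lookup of the target. Real providers also re-resolve the target as its records change.

```python
from typing import Callable, Dict, List, Tuple

Zone = Dict[str, List[Tuple[str, str]]]  # name -> [(type, value), ...]


def add_record(zone: Zone, name: str, rtype: str, value: str) -> None:
    """Add a record, enforcing that a CNAME never shares its name with other data.

    (DNSSEC types such as RRSIG may legally coexist with a CNAME; ignored here.)
    """
    types_here = {t for t, _ in zone.get(name, [])}
    if rtype == "CNAME" and types_here:
        raise ValueError(f"CNAME at {name} conflicts with {sorted(types_here)}")
    if rtype != "CNAME" and "CNAME" in types_here:
        raise ValueError(f"{name} already has a CNAME")
    zone.setdefault(name, []).append((rtype, value))


def add_alias(zone: Zone, name: str, target: str,
              resolve_a: Callable[[str], List[str]]) -> None:
    """ALIAS-style flattening: resolve the target now and publish plain A records."""
    for ip in resolve_a(target):
        add_record(zone, name, "A", ip)


zone: Zone = {}
add_record(zone, "example.com.", "SOA",
           "ns1.example.net. hostmaster.example.com. 1 7200 3600 1209600 300")
add_record(zone, "example.com.", "NS", "ns1.example.net.")
try:
    add_record(zone, "example.com.", "CNAME", "lb.example.net.")
except ValueError as err:
    print(err)  # → CNAME at example.com. conflicts with ['NS', 'SOA']
add_alias(zone, "example.com.", "lb.example.net.", lambda _: ["203.0.113.7"])
print(zone["example.com."][-1])  # → ('A', '203.0.113.7')
```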

Real-world examples

  • Cloudflare’s 1.1.1.1 runs anycast across 300+ POPs, validates DNSSEC, and supports DoT/DoH/DoQ, with sub-15 ms median resolution globally.
  • AWS Route 53 integrates health checks: a failing endpoint is pulled from DNS answers automatically, typically within about a minute depending on the check interval and failure threshold.
  • Netflix’s Open Connect is a useful contrast: its control plane steers each client to a nearby Open Connect appliance by handing out appliance-specific URLs rather than relying on DNS geo-routing.
  • Kubernetes’ CoreDNS answers <service>.<namespace>.svc.cluster.local for every service in the cluster; each pod’s resolv.conf points at the cluster IP of the CoreDNS service.
  • The October 2016 Dyn outage showed the blast radius: a DDoS against a major managed DNS provider made Twitter, Spotify, and GitHub unreachable for many users even though their own origins were healthy.
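Pods reach names like the CoreDNS ones above through the search list in resolv.conf, which Kubernetes pairs with `options ndots:5` by default. A simplified Python sketch of the glibc-style expansion order (the function name is hypothetical; the search domains are those of a pod in the `default` namespace):

```python
def candidate_names(name: str, search: list, ndots: int) -> list:
    """Order in which a glibc-style stub tries names (resolv.conf search/ndots).

    Simplified: ignores HOSTALIASES, per-query options, and resolver errors.
    """
    if name.endswith("."):
        return [name]  # fully qualified: no search-list expansion
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # "enough" dots: try the name as-is first
    return expanded + [absolute]


k8s_search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for candidate in candidate_names("api.stripe.com", k8s_search, ndots=5):
    print(candidate)
```

A short name such as `my-svc` resolves on the first candidate when the service lives in the pod’s namespace, but `api.stripe.com` has fewer than five dots, so the stub tries three cluster suffixes before the real name; writing it with a trailing dot skips the expansion.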