Intradomain Routing — OSPF
Link-state routing inside an AS, Dijkstra's algorithm, hello protocol, area hierarchies — the routing protocol of enterprises and ISPs.
What it is#
OSPF (Open Shortest Path First, RFC 2328 for v2, RFC 5340 for v3) is the dominant intradomain routing protocol — the one running inside enterprises, ISPs, and many cloud underlay fabrics. It is a link-state protocol: every router floods a description of its local links to every other router in the routing domain, every router builds the same picture of the topology (the Link-State Database, LSDB), and every router runs Dijkstra’s algorithm locally to compute shortest paths.
OSPF runs directly on IP (protocol number 89), with no TCP or UDP underneath. It uses multicast (224.0.0.5 for “all OSPF routers”, 224.0.0.6 for “all designated routers”, meaning the DR and BDR) so that protocol traffic is not broadcast to every host on a segment. It supports authentication (plaintext, MD5, or SHA on modern implementations), area hierarchies for scaling, and equal-cost multipath (ECMP) for load splitting.
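To make the "directly on IP" point concrete, here is a minimal sketch that decodes the 24-byte OSPFv2 common header as laid out in RFC 2328. The function name `parse_ospf_header` and the dictionary keys are illustrative choices, not part of any library; the sketch does not verify the checksum or authentication data.

```python
import socket
import struct

OSPF_IP_PROTOCOL = 89            # IP protocol number used by OSPF
ALL_SPF_ROUTERS = "224.0.0.5"    # all OSPF routers
ALL_D_ROUTERS = "224.0.0.6"      # DR and BDR

# OSPFv2 common header: version, type, packet length, router ID, area ID,
# checksum, auth type, 8 bytes of auth data (24 bytes, network byte order).
OSPF_HEADER = struct.Struct("!BBH4s4sHH8s")

PACKET_TYPES = {
    1: "Hello",
    2: "Database Description",
    3: "Link State Request",
    4: "Link State Update",
    5: "Link State Acknowledgment",
}


def parse_ospf_header(packet: bytes) -> dict:
    """Decode the common header from the start of an OSPFv2 packet."""
    if len(packet) < OSPF_HEADER.size:
        raise ValueError("truncated OSPF header")
    (version, ptype, length, router_id, area_id,
     checksum, auth_type, _auth_data) = OSPF_HEADER.unpack_from(packet)
    return {
        "version": version,
        "type": PACKET_TYPES.get(ptype, f"unknown({ptype})"),
        "length": length,
        "router_id": socket.inet_ntoa(router_id),
        "area_id": socket.inet_ntoa(area_id),
        "checksum": checksum,
        "auth_type": auth_type,
    }
```

Receiving real packets would need a raw socket bound to protocol 89 and joined to the multicast groups, which requires elevated privileges and is left out here.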
The competitor is IS-IS (Intermediate System to Intermediate System, ISO 10589 / RFC 1142). It is conceptually the same (link-state with Dijkstra), but encoded in TLVs and carried over the link layer rather than IP. IS-IS dominates large ISPs and tier-1 backbones; OSPF dominates enterprises and smaller ISPs. The core mechanics covered here (flooding, a shared LSDB, a local SPF run) apply to both; the area model and terminology differ.
When to use it#
OSPF is the default intradomain choice when:
- The network has more than ~20 routers and needs fast convergence on failures. Distance-vector protocols (RIP) cannot keep up.
- You operate a single administrative domain — one organisation, one set of policies, one IGP. (Multiple ASes need BGP between them.)
- Topology can be split into areas. A backbone (area 0) plus regional areas keeps each router’s LSDB and Dijkstra cost bounded.
- You need ECMP — Dijkstra naturally returns equal-cost paths; OSPF installs them all into the routing table for hash-based load splitting.
- Datacenter underlay. Spine-leaf fabrics often run OSPF (or IS-IS, or BGP-as-IGP) to distribute loopback reachability for an overlay (VXLAN / EVPN) on top.
Choose IS-IS instead when running a large ISP backbone or when you need protocol-agnostic encoding (IS-IS adds IPv6 by extending TLVs, no v3 split). Choose EIGRP only if you are 100% Cisco and accept the lock-in. Choose BGP when crossing AS boundaries.
How it works#
Hello and adjacencies#
Every OSPF interface sends Hello packets to 224.0.0.5 every 10 seconds (default on point-to-point and broadcast networks; 30s on NBMA). A neighbour goes through states Down → Init → 2-Way → ExStart → Exchange → Loading → Full. Reaching Full means LSDBs are synchronised; the neighbour is now a peer for forwarding decisions.
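The state progression can be sketched as a small transition table. This is a simplified happy-path model: the event names follow RFC 2328, but the real state machine has more transitions (for example, a neighbour on a broadcast segment may legitimately stay in 2-Way), and `step` is a hypothetical helper.

```python
from enum import Enum

HELLO_INTERVAL = 10   # seconds, default on broadcast and point-to-point
DEAD_INTERVAL = 40    # seconds, default dead timer


class State(Enum):
    DOWN = "Down"
    INIT = "Init"
    TWO_WAY = "2-Way"
    EXSTART = "ExStart"
    EXCHANGE = "Exchange"
    LOADING = "Loading"
    FULL = "Full"


# Happy-path transitions only.
TRANSITIONS = {
    (State.DOWN, "HelloReceived"): State.INIT,
    (State.INIT, "2-WayReceived"): State.TWO_WAY,
    (State.TWO_WAY, "AdjOK?"): State.EXSTART,
    (State.EXSTART, "NegotiationDone"): State.EXCHANGE,
    (State.EXCHANGE, "ExchangeDone"): State.LOADING,
    (State.LOADING, "LoadingDone"): State.FULL,
}


def step(state: State, event: str) -> State:
    """Advance the neighbour state; unknown events leave it unchanged."""
    if event == "InactivityTimer":   # no Hello seen within the dead interval
        return State.DOWN
    return TRANSITIONS.get((state, event), state)
```

Feeding the six events in order takes a neighbour from `Down` to `Full`; an `InactivityTimer` event at any point drops it back to `Down`.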
On broadcast networks (Ethernet), OSPF elects a Designated Router (DR) and a Backup DR (BDR). Other routers form adjacencies only with the DR and BDR, which avoids O(n^2) adjacencies on a LAN with n routers and reduces flooding. The DR is the eligible router with the highest priority; ties are broken by highest router-id. A router with priority 0 never becomes DR or BDR.
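A simplified sketch of the election outcome follows. It is not the full RFC 2328 procedure: it assumes every router sees the same neighbour list, models incumbents as sticky (the election is non-preemptive), and promotes the BDR when the DR disappears. `Router`, `rank` and `elect` are illustrative names.

```python
import ipaddress
from typing import Iterable, NamedTuple, Optional, Tuple


class Router(NamedTuple):
    router_id: str
    priority: int   # 0 means "never DR or BDR"


def rank(r: Router) -> Tuple[int, int]:
    # Highest priority wins; router-id (compared numerically) breaks ties.
    return (r.priority, int(ipaddress.IPv4Address(r.router_id)))


def elect(routers: Iterable[Router],
          current_dr: Optional[str] = None,
          current_bdr: Optional[str] = None) -> Tuple[Optional[str], Optional[str]]:
    eligible = sorted((r for r in routers if r.priority > 0),
                      key=rank, reverse=True)
    ids = {r.router_id for r in eligible}
    # Incumbents keep their role while they are still present.
    dr = current_dr if current_dr in ids else None
    bdr = current_bdr if current_bdr in ids else None
    if dr is None and bdr is not None:
        dr, bdr = bdr, None          # BDR is promoted when the DR is gone
    remaining = [r.router_id for r in eligible if r.router_id not in (dr, bdr)]
    if dr is None and remaining:
        dr = remaining.pop(0)
    if bdr is None and remaining:
        bdr = remaining.pop(0)
    return dr, bdr
```

Comparing router-ids numerically matters: as strings, "10.0.0.1" would sort below "2.2.2.2".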
Link-State Advertisements (LSAs)#
Each router floods LSAs describing its own links. Other routers store them in the LSDB and re-flood (with split-horizon-on-receive) until every router has every LSA. LSA types you’ll see:
| Type | Name | Purpose |
|------|------|---------|
| 1 | Router LSA | this router's links and their costs |
| 2 | Network LSA | DR-originated, lists routers on a multi-access network |
| 3 | Summary LSA | area-border router (ABR) summarising another area |
| 4 | ASBR Summary LSA | ABR reaching the autonomous-system boundary router |
| 5 | External LSA | ASBR-originated, routes redistributed from outside OSPF |
| 7 | NSSA External LSA | External LSA inside a not-so-stubby area |

OSPFv3 (IPv6) restructures LSAs but the concepts map directly.
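The store-and-re-flood behaviour can be sketched as follows. This is a deliberately reduced model: it decides freshness by sequence number alone (real OSPF also compares checksum and age) and omits acknowledgements and retransmission. The `LSA` and `Router` classes are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class LSA(NamedTuple):
    ls_type: int      # 1 = Router LSA, 2 = Network LSA, ...
    ls_id: str
    adv_router: str   # router-id of the originator
    seq: int          # higher means newer


class Router:
    def __init__(self, name: str, interfaces: List[str]):
        self.name = name
        self.interfaces = interfaces
        self.lsdb: Dict[Tuple[int, str, str], LSA] = {}

    def receive(self, lsa: LSA, in_iface: str) -> List[str]:
        """Install the LSA if it is newer; return interfaces to re-flood on."""
        key = (lsa.ls_type, lsa.ls_id, lsa.adv_router)
        have = self.lsdb.get(key)
        if have is not None and have.seq >= lsa.seq:
            return []   # duplicate or older copy: flooding stops here
        self.lsdb[key] = lsa
        # Re-flood everywhere except the interface the LSA arrived on.
        return [i for i in self.interfaces if i != in_iface]
```

Dropping duplicates is what makes flooding terminate: once every router holds the newest copy, no further copies are sent.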
Dijkstra (the SPF run)#
When the LSDB changes, the router (re)runs Dijkstra’s shortest-path-first algorithm with itself as the root. The output is a shortest-path tree; the router installs the next hops into the routing table.
```
spf(root):
    dist[*] = infinity
    dist[root] = 0
    open = {root}
    closed = {}
    while open is not empty:
        v = node in open with smallest dist
        move v from open to closed
        for each link (v, w, cost) in LSDB:
            if w in closed: skip
            candidate = dist[v] + cost
            if candidate < dist[w]:
                dist[w] = candidate
                next_hop[w] = first hop on path from root via v
                open.add(w)
```

`cost` is the OSPF metric, by default reference_bandwidth / interface_bandwidth with a minimum of 1. The Cisco default reference is 100 Mbps, so 100 Mbps, 1 Gbps, and 10 Gbps interfaces all get cost 1; raise the reference for modern fabrics.
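The SPF pseudocode above keeps a single next hop per destination. A runnable sketch that also collects equal-cost next hops (the ECMP case) might look like this. The LSDB is reduced to a dictionary of `{router: {neighbour: cost}}`, which is an assumption for illustration; a real LSDB holds typed LSAs. Costs are assumed to be at least 1, as in OSPF.

```python
import heapq
from typing import Dict, Set, Tuple


def spf(lsdb: Dict[str, Dict[str, int]],
        root: str) -> Tuple[Dict[str, int], Dict[str, Set[str]]]:
    """Dijkstra from root; returns (distance, set of first hops) per node."""
    dist = {root: 0}
    next_hops: Dict[str, Set[str]] = {root: set()}
    done: Set[str] = set()
    heap = [(0, root)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue            # stale heap entry
        done.add(v)
        for w, cost in lsdb.get(v, {}).items():
            candidate = d + cost
            # The first hop is w itself when leaving the root,
            # otherwise it is inherited from v.
            hops = {w} if v == root else next_hops[v]
            if candidate < dist.get(w, float("inf")):
                dist[w] = candidate
                next_hops[w] = set(hops)
                heapq.heappush(heap, (candidate, w))
            elif candidate == dist.get(w):
                next_hops[w] |= hops   # equal-cost path: keep both
    return dist, next_hops
```

On a diamond topology (A to D via B or C at equal cost), `next_hops["D"]` holds both B and C, which is the set a router would install for hash-based load splitting.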
Areas — the scaling lever#
```
             +-------------+
             |   Area 0    |
             | (backbone)  |
             +-------+-----+
                     |
        +------------+------------+
        |            |            |
    +-------+    +-------+    +-------+
    | ABR-1 |    | ABR-2 |    | ABR-3 |
    +---+---+    +---+---+    +---+---+
        |            |            |
    +---+---+    +---+---+    +---+---+
    | Area  |    | Area  |    | Area  |
    |   1   |    |   2   |    |   3   |
    +-------+    +-------+    +-------+
```

Each area is an SPF-isolated zone: internal LSAs (Types 1, 2) stay inside. Area Border Routers (ABRs) summarise area routes into Type 3 LSAs and inject them into Area 0. Every non-backbone area must be connected to Area 0; virtual links exist as a workaround when physical connectivity is impossible.
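From the point of view of a router inside an area, an inter-area destination costs the intra-area SPF distance to an ABR plus the cost that ABR advertises in its Type 3 LSA. A minimal sketch of that selection, with hypothetical input dictionaries standing in for SPF results and received summaries:

```python
from typing import Dict, Tuple


def best_inter_area_routes(
        cost_to_abr: Dict[str, int],
        summaries: Dict[str, Dict[str, int]]) -> Dict[str, Tuple[int, str]]:
    """Pick, per prefix, the ABR with the lowest total cost.

    cost_to_abr: intra-area SPF cost from this router to each reachable ABR.
    summaries:   {abr: {prefix: cost advertised in the Type 3 LSA}}.
    Returns {prefix: (total cost, chosen ABR)}.
    """
    best: Dict[str, Tuple[int, str]] = {}
    for abr, prefixes in summaries.items():
        if abr not in cost_to_abr:
            continue    # ABR unreachable inside the area: ignore its LSAs
        for prefix, advertised in prefixes.items():
            total = cost_to_abr[abr] + advertised
            if prefix not in best or total < best[prefix][0]:
                best[prefix] = (total, abr)
    return best
```

The router never sees the other area's topology, only the advertised cost, which is why areas bound the SPF work but can hide a better path.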
Practical sizing: 50-100 routers per area is comfortable; 200-300 starts to stress SPF on commodity CPUs. Real-world deployments run tens of areas attached to Area 0, with 5-10 areas per region.
Special area types#
- Stub area. No External LSAs (Type 5). ABR injects a default route. Reduces LSDB size in branches that do not need full Internet routing.
- Totally stubby. No External, no Inter-area. ABR injects only a default. Not part of RFC 2328; it originated as a Cisco extension and is now supported by other vendors as well. Even smaller LSDB.
- NSSA (Not-So-Stubby Area). No Type 5, but allows local ASBR injection via Type 7 (translated to Type 5 by ABR). Useful when a stub area also has a small external redistribution.
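The area types above amount to a filter on which LSA types may exist inside an area. A small sketch of that filter; the area-type strings and the `admits` helper are illustrative, and the default route in stub-like areas is modelled as a Type 3 LSA flagged `is_default`.

```python
ALWAYS_ALLOWED = {1, 2}   # Router and Network LSAs stay inside every area


def admits(area_type: str, lsa_type: int, is_default: bool = False) -> bool:
    """Return True if an LSA of this type may appear inside the area."""
    if lsa_type in ALWAYS_ALLOWED:
        return True
    if area_type == "normal":
        return lsa_type in {3, 4, 5}
    if area_type == "stub":
        return lsa_type == 3                  # summaries, incl. the default
    if area_type == "totally_stubby":
        return lsa_type == 3 and is_default   # only the injected default
    if area_type == "nssa":
        return lsa_type in {3, 7}             # Type 7 replaces Type 5 here
    raise ValueError(f"unknown area type: {area_type}")
```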
Variants#
- OSPFv2 (RFC 2328, 1998). The IPv4 version. Runs on IP protocol 89.
- OSPFv3 (RFC 5340, 2008). IPv6 (and dual-stack via RFC 5838). Restructured LSAs, per-link configuration instead of per-subnet, uses link-local addresses for neighbour adjacency.
- IS-IS. Different encoding (TLV over the link layer, not IP), same algorithm (link-state + Dijkstra). Dominates large ISP backbones.
- OSPF over MPLS / Traffic Engineering. OSPF-TE extensions carry bandwidth and admin-group information so RSVP-TE or SR-TE can compute traffic-engineered paths.
- OSPF with Bidirectional Forwarding Detection (BFD). BFD detects link failure in milliseconds instead of waiting for OSPF hello-dead timer (default 40s). Sub-second convergence requires BFD or aggressive timer tuning.
- OSPF vs BGP as the IGP in the DC. Some datacenter underlay designs use BGP as the IGP (BGP “unnumbered” or RFC 7938) instead of OSPF. The argument is BGP’s policy expressiveness and a single protocol up and down the stack.
Trade-offs#
Other tensions:
- LSDB size vs visibility. Single-area OSPF is conceptually simple but every router runs Dijkstra over every link in the network. Areas bound this cost but introduce inter-area path issues (sub-optimal routes through the wrong ABR).
- Convergence vs stability. Faster hellos / shorter dead-interval mean faster failure detection, but more false positives on busy CPUs. BFD separates failure detection from the routing protocol cleanly.
- Metric assignment. The default `reference / bw` metric is intuitive but coarse. Modern fabrics with 100 Gbps links need a higher reference bandwidth (1000000 Mbps, i.e. 1 Tbps, is common) to differentiate link speeds.
- Authentication. Plaintext is theatre. MD5 is acceptable. Use SHA (or IPsec for OSPFv3) for new deployments. Rogue OSPF speakers can otherwise inject any topology they want.
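The reference-bandwidth arithmetic behind the metric can be sketched as below. It assumes Cisco-style integer truncation with a floor of 1 and a ceiling of 65535 (the cost field is 16 bits); `ospf_cost` is an illustrative helper, not a vendor API.

```python
def ospf_cost(interface_mbps: int, reference_mbps: int = 100) -> int:
    """Interface cost = reference bandwidth / interface bandwidth."""
    if interface_mbps <= 0:
        raise ValueError("interface bandwidth must be positive")
    return max(1, min(65535, reference_mbps // interface_mbps))


# With the 100 Mbps default, everything at or above 100 Mbps collapses to 1.
default_costs = {bw: ospf_cost(bw) for bw in (10, 100, 1_000, 10_000)}

# With a 1 Tbps reference, fast links become distinguishable again.
tuned_costs = {bw: ospf_cost(bw, 1_000_000) for bw in (1_000, 10_000, 100_000)}
```

Whatever reference is chosen has to be configured consistently on every router; otherwise two routers will assign different costs to links of the same speed.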
Why does OSPF need a DR on a broadcast network?
Without a DR, n routers on the same LAN would form O(n^2) adjacencies and flood O(n^2) LSAs every time anything changed. The DR centralises this: every router forms an adjacency with just the DR (and the BDR for resilience). The DR originates a single Type 2 Network LSA describing all routers on the segment, replacing what would otherwise be n separate adjacency LSAs. The cost is the complexity of DR election and the corner case of DR loss: the BDR takes over without re-election, and only then is a new BDR elected.
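The saving is easy to quantify. A quick sketch comparing a full mesh with the DR/BDR arrangement, where each of the other n - 2 routers keeps two adjacencies and the DR and BDR keep one between themselves:

```python
def full_mesh_adjacencies(n: int) -> int:
    """Every pair of routers on the segment is adjacent."""
    return n * (n - 1) // 2


def dr_bdr_adjacencies(n: int) -> int:
    """Each non-DR router peers with DR and BDR, plus the DR-BDR adjacency."""
    if n < 2:
        return 0
    return 2 * (n - 2) + 1


comparison = {n: (full_mesh_adjacencies(n), dr_bdr_adjacencies(n))
              for n in (2, 3, 10, 50)}
```

For 10 routers this is 45 adjacencies against 17; for 50 routers, 1225 against 97.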
Common pitfalls#
- Mismatched Hello / Dead intervals. A Hello whose intervals differ from those configured on the receiving interface is dropped, so the adjacency never forms. Symptoms: the neighbour is missing from `show ip ospf neighbor` (or drops out after the dead interval) and never reaches `Full`. Always match timers on both ends.
- MTU mismatch. OSPF Database Description packets carry the interface MTU. If they disagree, adjacency hangs in `ExStart`/`Exchange`. Fix the MTU, or use `ip ospf mtu-ignore`, which only suppresses the check and does not fix the underlying mismatch.
- Area 0 partitioned. Once Area 0 splits, summary LSAs cannot reach across, and non-backbone areas on the wrong side become unreachable. Use virtual links as a temporary bridge; redesign the backbone for permanence.
- Redistribution loops. OSPF redistributing from BGP, BGP redistributing back from OSPF — metrics can flip and routes flap. Always set explicit metrics on redistribution and use route filters.
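- DR election surprises. The election is non-preemptive: a higher-priority router that joins later does not displace the current DR, so the DR is often whichever router came up first rather than the one you intended. The change only happens when the DR fails or the segment restarts, and LSDB churn follows. Set priorities deliberately and do not force re-elections unless the DR has actually failed.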
- Cargo-cult timer tuning. Halving hello timers without BFD increases load without much benefit; deploy BFD instead.
- Forgetting authentication. A laptop with a free OSPF daemon plugged into the LAN can inject a default route. Always configure area or interface auth.
- Misunderstanding “cost”. Lower is better. A `cost 1` link is preferred over a `cost 10` link. Reverse this intuition and you’ll set policy backward.
- Believing OSPF replaces BGP. OSPF is intradomain only. The moment you cross AS boundaries (peering with another organisation), BGP is required.
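Several of the pitfalls above are plain mismatches between the two ends of a link, which makes them easy to check before deployment. A minimal sketch of such a check; the `OspfInterface` fields and `adjacency_problems` are hypothetical and cover only the timer and MTU cases named in the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OspfInterface:
    hello_interval: int = 10   # seconds
    dead_interval: int = 40    # seconds
    mtu: int = 1500            # bytes


def adjacency_problems(local: OspfInterface,
                       remote: OspfInterface) -> List[str]:
    """List configuration mismatches that would block an adjacency."""
    problems = []
    if local.hello_interval != remote.hello_interval:
        problems.append(
            f"hello interval mismatch: {local.hello_interval}s vs "
            f"{remote.hello_interval}s")
    if local.dead_interval != remote.dead_interval:
        problems.append(
            f"dead interval mismatch: {local.dead_interval}s vs "
            f"{remote.dead_interval}s")
    if local.mtu != remote.mtu:
        problems.append(f"MTU mismatch: {local.mtu} vs {remote.mtu}")
    return problems
```

An empty list means the two ends agree on these parameters; it does not prove the adjacency will form, since authentication and other settings are not modelled.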
Related building blocks#