Project — A Distance-Vector (RIP) Routing Simulator

Build a small network of router processes that exchange distance vectors, converge to shortest paths, and survive a link failure.

Building Block Intermediate
11 min read
project python routing rip distance-vector

What it is#

A simulator of RIPv2-style distance-vector routing. Each router is a separate process (or thread) that knows only its direct neighbours and the cost to reach them. Routers periodically exchange their full distance vector — a table of {destination: cost} — with each neighbour. On receipt, a router applies the Bellman-Ford update: for each destination D in the neighbour’s vector, the candidate cost via that neighbour is link_cost + neighbour_says_about_D, and the router keeps the minimum across all neighbours.

The project teaches three things in one go: how a distributed shortest-path algorithm converges with no global view, why the count-to-infinity problem is fundamental to distance-vector, and how split-horizon (or poison reverse) tames it. Implementing it makes the algorithm’s mechanics — which mostly look like accountancy — much harder to forget than reading about them.

When to use it#

Build this when you want to:

  • Watch convergence happen step by step. Print every vector exchange and you can literally trace the algorithm forward by eye.
  • See count-to-infinity with your own data. Drop a link and watch two routers update each other’s bad information up toward the configured infinity value. Add split-horizon and watch it stop.
  • Compare with the link-state project. Same topology, same shortest paths, very different mechanics. Build both and the design trade-off becomes obvious.
  • Understand RIP’s timer constants. 30 s update interval, 180 s route-expiry, 16 as infinity — they all make sense once you see what they’re protecting against.

Skip this and use a real routing daemon (FRR, BIRD) when you want to actually route packets between Linux network namespaces or VMs. The simulator is for learning the algorithm, not running production routes.

How it works#

Topology#

A 4-node cycle: A — B — C — D — A. Every link cost is 1 for simplicity. The shortest paths look like:

1
A ───── B
│ │
1 │ │ 1
│ │
D ───── C
1

From A: B=1 (via B), C=2 (via B or D), D=1 (via D).

Every router will independently arrive at this view by exchanging vectors. Then we’ll cut the A-B link and watch it converge again.

The simulator#

"""rip_sim.py — a distance-vector routing simulator.
Routers run as threads. They exchange vectors over an in-process queue
(a stand-in for UDP/520). Set SPLIT_HORIZON=True to enable the loop-breaker
and watch count-to-infinity disappear.
"""
import queue
import threading
import time
from dataclasses import dataclass, field
INFINITY = 16 # RIP's "unreachable" marker
UPDATE_INTERVAL = 2.0 # condensed from RIP's real 30 s, for fast demos
EXPIRY_SECONDS = 8.0 # neighbour silent this long --> mark routes via it infinite
SPLIT_HORIZON = True
@dataclass
class Route:
cost: int
next_hop: str | None
last_updated: float = field(default_factory=time.monotonic)
class Router(threading.Thread):
def __init__(self, name: str, links: dict[str, int], inbox: queue.Queue) -> None:
super().__init__(daemon=True)
self.name = name
self.links = dict(links) # neighbour -> direct link cost
self.inbox = inbox
self.neighbour_inboxes: dict[str, queue.Queue] = {}
self.table: dict[str, Route] = {name: Route(0, None)}
for n, c in self.links.items():
self.table[n] = Route(c, n)
self.last_seen: dict[str, float] = {n: time.monotonic() for n in self.links}
self.lock = threading.Lock()
self.running = True
def attach(self, neighbour: str, inbox: queue.Queue) -> None:
self.neighbour_inboxes[neighbour] = inbox
def vector_for(self, neighbour: str) -> dict[str, int]:
"""Return the vector to advertise to `neighbour` (with optional split-horizon)."""
out: dict[str, int] = {}
with self.lock:
for dest, route in self.table.items():
if SPLIT_HORIZON and route.next_hop == neighbour and dest != neighbour:
out[dest] = INFINITY # poison reverse: tell N you can't reach D via N
else:
out[dest] = min(route.cost, INFINITY)
return out
def send_updates(self) -> None:
for n, inbox in self.neighbour_inboxes.items():
inbox.put(('update', self.name, self.vector_for(n)))
def expire_stale(self) -> None:
now = time.monotonic()
changed = False
with self.lock:
for n, last in list(self.last_seen.items()):
if now - last > EXPIRY_SECONDS:
# neighbour gone silent -- mark all routes via it as infinite
for dest, route in list(self.table.items()):
if route.next_hop == n and route.cost < INFINITY:
route.cost = INFINITY
route.last_updated = now
changed = True
if changed:
self.send_updates() # triggered update
def absorb(self, sender: str, vector: dict[str, int]) -> None:
now = time.monotonic()
changed = False
with self.lock:
self.last_seen[sender] = now
link_cost = self.links.get(sender, INFINITY)
for dest, advertised in vector.items():
if dest == self.name:
continue
candidate = min(link_cost + advertised, INFINITY)
existing = self.table.get(dest)
if existing is None:
self.table[dest] = Route(candidate, sender, now)
changed = changed or candidate < INFINITY
elif existing.next_hop == sender:
# always trust the current next-hop, even if cost worsened
if existing.cost != candidate:
existing.cost = candidate
existing.last_updated = now
changed = True
elif candidate < existing.cost:
existing.cost = candidate
existing.next_hop = sender
existing.last_updated = now
changed = True
if changed:
self.send_updates() # triggered update on change
def break_link(self, neighbour: str) -> None:
"""Simulate a link failure: stop sending to N and forget routes via N."""
with self.lock:
self.links.pop(neighbour, None)
self.neighbour_inboxes.pop(neighbour, None)
for dest, route in self.table.items():
if route.next_hop == neighbour:
route.cost = INFINITY
self.send_updates()
def run(self) -> None:
next_update = time.monotonic() + UPDATE_INTERVAL
self.send_updates()
while self.running:
timeout = max(0.0, next_update - time.monotonic())
try:
msg = self.inbox.get(timeout=timeout)
except queue.Empty:
msg = None
if msg is not None:
kind, sender, payload = msg
if kind == 'update':
self.absorb(sender, payload)
if time.monotonic() >= next_update:
self.expire_stale()
self.send_updates()
next_update = time.monotonic() + UPDATE_INTERVAL
def snapshot(self) -> list[tuple[str, int, str | None]]:
with self.lock:
return sorted(
(d, r.cost, r.next_hop) for d, r in self.table.items()
)
def build_ring() -> dict[str, Router]:
topology = {
'A': {'B': 1, 'D': 1},
'B': {'A': 1, 'C': 1},
'C': {'B': 1, 'D': 1},
'D': {'A': 1, 'C': 1},
}
inboxes = {n: queue.Queue() for n in topology}
routers = {n: Router(n, links, inboxes[n]) for n, links in topology.items()}
for n, r in routers.items():
for m in r.links:
r.attach(m, inboxes[m])
for r in routers.values():
r.start()
return routers
def print_tables(routers: dict[str, Router]) -> None:
print('-' * 40)
for name in sorted(routers):
rows = [f'{d}:{c}/{nh or "-"}' for d, c, nh in routers[name].snapshot()]
print(f' {name} ' + ' '.join(rows))
if __name__ == '__main__':
routers = build_ring()
time.sleep(6) # let it converge
print('converged on the ring:')
print_tables(routers)
print('\nbreaking A <-> B link...')
routers['A'].break_link('B')
routers['B'].break_link('A')
time.sleep(6)
print('after link failure:')
print_tables(routers)

Run it:

converged on the ring:
A A:0/- B:1/B C:2/B D:1/D
B A:1/A B:0/- C:1/C D:2/A
C B:1/B C:0/- D:1/D A:2/B
D A:1/A B:2/A C:1/C D:0/-
breaking A <-> B link...
after link failure:
A A:0/- B:3/D C:2/D D:1/D
B A:3/C B:0/- C:1/C D:2/C
C B:1/B C:0/- D:1/D A:2/D
D A:1/A B:2/C C:1/C D:0/-

A’s cost to B was 1 via B; now it’s 3 via D → C → B. Symmetric for B. The other two tables barely change.

Why the code does what it does#

  • Triggered updates on change. RIP says “send your vector every 30 s, and also immediately on change”. The simulator does both — periodic in run(), triggered in absorb and break_link. Triggered updates are the difference between 90 s and 3 s recovery from a failure.
  • existing.next_hop == sender branch in absorb. Always trust the current next hop, even if its advertised cost goes up. Without this, a router would refuse to accept the bad news that its preferred path got worse and would happily forward to a black hole.
  • Split-horizon with poison reverse. When advertising vector to neighbour N, every route whose next hop is N gets advertised as INFINITY. Without this, A tells D “I can reach B at cost 1”, D tells A “I can reach B at cost 2 (via A)”, and on a link failure they ping-pong cost upward until both hit 16.
  • Expiry timer. If a neighbour stops sending updates for 8 s (the scaled-down version of RIP’s 180 s), every route via that neighbour is forced to infinity and a triggered update goes out. Without this, dead neighbours stay in the routing table forever.
  • INFINITY = 16. RIP defines 16 as “unreachable”. The small upper bound is what makes count-to-infinity terminate in finite time — without it, two routers in a loop would keep incrementing forever.

Watching count-to-infinity#

Set SPLIT_HORIZON = False at the top, rerun, and break a link. You’ll see the cost climb 2, 3, 4 … 15, 16 across several updates before the routes finally stabilise as “unreachable”. With SPLIT_HORIZON = True the cost jumps cleanly to the new value (or to INFINITY if no alternative exists) and stays there.

Variants#

  • Real UDP transport. Replace the in-process queues with UDP sockets on different ports per router. Now the simulator survives across machines. Watch for datagram loss: triggered updates may need a small retry.
  • Multiple link costs. Use {'B': 1, 'D': 4} so the ring is asymmetric and the shortest paths are no longer obvious. Convergence still works; the printed tables become more interesting.
  • Poison reverse without split-horizon. Advertise the route with cost INFINITY only on the reverse link (instead of omitting it). Slightly more bandwidth but identical effect.
  • RIPng / RIPv2 message formats. Encode the vector as a real RIPv2 PDU (UDP port 520, 25-route limit per packet). The simulator becomes interoperable with frr-rip.
  • Hold-down timers. When a route goes to infinity, refuse updates for that destination for a fixed window. Trades convergence speed for resistance to flapping.
  • Failure injection driver. Add a small REPL that lets you break A B, restore A B, setcost A B 5 interactively. Great for teaching.

Trade-offs#

Distance-vector — every router stores only its own table (O(N) memory) and sends only its own vector (O(N) bandwidth per update). Algorithm is dead simple: a min over neighbours. But convergence is slow because bad news has to propagate hop-by-hop, and the count-to-infinity problem is fundamental to the design.
Link-state (OSPF) — every router floods its link-state advertisement and stores the full topology (O(N+E) memory and bandwidth). Then runs Dijkstra locally. No count-to-infinity, faster convergence, but more memory, more bandwidth, more complex protocol machinery.

Other tensions worth weighing:

  • Update interval. Short (1 s) detects failures fast but wastes bandwidth on a stable network. Long (60 s) is efficient but a failure can blackhole traffic for a minute.
  • Infinity value. RIP picks 16 because it limits networks to 15 hops — small enough to terminate count-to-infinity in seconds, large enough for the autonomous systems of 1988. Today it’s a hard limit.
  • Split-horizon vs poison reverse. Same effect in two-router loops; different bandwidth profile (split-horizon omits, poison reverse advertises infinity). Both fail to prevent loops on three or more routers — that’s why RIP also has hold-down timers.
Why does RIP have a hop-count limit of 15 if the algorithm has no such requirement?

The limit exists exactly because of count-to-infinity. The algorithm itself happily handles any number of hops; what it doesn’t handle is a routing loop. A small INFINITY is RIP’s escape hatch: the longest legitimate path is 15 hops, so any cost of 16 is “definitely not real, drop it”. Larger INFINITY values would mean longer count-to-infinity episodes during failures.

Common pitfalls#

  • Forgetting the next_hop == sender rule. A router that ignores its current next hop’s worsening news will route through a stale path forever. Easy bug; subtle effects.
  • Not sending triggered updates. Wait for the periodic 30 s tick and a single failure can blackhole traffic for the full interval. Always trigger on change.
  • Missing the expiry timer. If a neighbour just stops sending (cable yanked), no message ever arrives to mark its routes infinite. The expiry timer is what catches this.
  • Mutating the table outside the lock. Two threads write to self.table (the receive path and the expiry path). Without the lock you’ll see intermittent KeyError or stale next_hop values.
  • Advertising the same vector to all neighbours. Without split-horizon, count-to-infinity is unavoidable for any link failure in a loop. Compute a per-neighbour vector.
  • Hard-coding INFINITY = 999. RIP defines INFINITY = 16 for a reason. A larger value extends count-to-infinity from seconds to minutes.
  • Treating “no update from neighbour” as “neighbour is fine”. Only an expired timer tells you a neighbour is gone. Missing this is how routes via dead neighbours stay in the table indefinitely.
  • Comparing costs without saturating at INFINITY. 15 + 1 = 16, which is fine; 15 + 5 = 20, which is wrong — the route is unreachable, not “cost 20”. Always min(candidate, INFINITY).
  • Sending the local entry (self.name: 0) without thinking. Other routers add link cost and conclude “I can reach you at cost link_cost”. That’s correct and intentional — but easy to filter out by mistake.
  • No daemon=True on threads. SIGINT then leaves zombie router threads running. Either mark them daemon or implement a clean stop.
Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.