Project — A Distance-Vector (RIP) Routing Simulator
Build a small network of router processes that exchange distance vectors, converge to shortest paths, and survive a link failure.
What it is#
A simulator of RIPv2-style distance-vector routing. Each router is a separate process (or thread) that knows only its direct neighbours and the cost to reach them. Routers periodically exchange their full distance vector — a table of {destination: cost} — with each neighbour. On receipt, a router applies the Bellman-Ford update: for each destination D in the neighbour’s vector, the candidate cost via that neighbour is link_cost + neighbour_says_about_D, and the router keeps the minimum across all neighbours.
The project teaches three things in one go: how a distributed shortest-path algorithm converges with no global view, why the count-to-infinity problem is fundamental to distance-vector, and how split-horizon (or poison reverse) tames it. Implementing it makes the algorithm’s mechanics — which mostly look like accountancy — much harder to forget than reading about them.
When to use it#
Build this when you want to:
- Watch convergence happen step by step. Print every vector exchange and you can literally trace the algorithm forward by eye.
- See count-to-infinity with your own data. Drop a link and watch two routers update each other’s bad information up toward the configured infinity value. Add split-horizon and watch it stop.
- Compare with the link-state project. Same topology, same shortest paths, very different mechanics. Build both and the design trade-off becomes obvious.
- Understand RIP’s timer constants. 30 s update interval, 180 s route-expiry, 16 as infinity — they all make sense once you see what they’re protecting against.
Skip this and use a real routing daemon (FRR, BIRD) when you want to actually route packets between Linux network namespaces or VMs. The simulator is for learning the algorithm, not running production routes.
How it works#
Topology#
A 4-node cycle: A — B — C — D — A. Every link cost is 1 for simplicity. The shortest paths look like:
1 A ───── B │ │ 1 │ │ 1 │ │ D ───── C 1From A: B=1 (via B), C=2 (via B or D), D=1 (via D).
Every router will independently arrive at this view by exchanging vectors. Then we’ll cut the A-B link and watch it converge again.
The simulator#
"""rip_sim.py — a distance-vector routing simulator.
Routers run as threads. They exchange vectors over an in-process queue(a stand-in for UDP/520). Set SPLIT_HORIZON=True to enable the loop-breakerand watch count-to-infinity disappear."""import queueimport threadingimport timefrom dataclasses import dataclass, field
INFINITY = 16 # RIP's "unreachable" markerUPDATE_INTERVAL = 2.0 # condensed from RIP's real 30 s, for fast demosEXPIRY_SECONDS = 8.0 # neighbour silent this long --> mark routes via it infiniteSPLIT_HORIZON = True
@dataclassclass Route: cost: int next_hop: str | None last_updated: float = field(default_factory=time.monotonic)
class Router(threading.Thread): def __init__(self, name: str, links: dict[str, int], inbox: queue.Queue) -> None: super().__init__(daemon=True) self.name = name self.links = dict(links) # neighbour -> direct link cost self.inbox = inbox self.neighbour_inboxes: dict[str, queue.Queue] = {} self.table: dict[str, Route] = {name: Route(0, None)} for n, c in self.links.items(): self.table[n] = Route(c, n) self.last_seen: dict[str, float] = {n: time.monotonic() for n in self.links} self.lock = threading.Lock() self.running = True
def attach(self, neighbour: str, inbox: queue.Queue) -> None: self.neighbour_inboxes[neighbour] = inbox
def vector_for(self, neighbour: str) -> dict[str, int]: """Return the vector to advertise to `neighbour` (with optional split-horizon).""" out: dict[str, int] = {} with self.lock: for dest, route in self.table.items(): if SPLIT_HORIZON and route.next_hop == neighbour and dest != neighbour: out[dest] = INFINITY # poison reverse: tell N you can't reach D via N else: out[dest] = min(route.cost, INFINITY) return out
def send_updates(self) -> None: for n, inbox in self.neighbour_inboxes.items(): inbox.put(('update', self.name, self.vector_for(n)))
def expire_stale(self) -> None: now = time.monotonic() changed = False with self.lock: for n, last in list(self.last_seen.items()): if now - last > EXPIRY_SECONDS: # neighbour gone silent -- mark all routes via it as infinite for dest, route in list(self.table.items()): if route.next_hop == n and route.cost < INFINITY: route.cost = INFINITY route.last_updated = now changed = True if changed: self.send_updates() # triggered update
def absorb(self, sender: str, vector: dict[str, int]) -> None: now = time.monotonic() changed = False with self.lock: self.last_seen[sender] = now link_cost = self.links.get(sender, INFINITY) for dest, advertised in vector.items(): if dest == self.name: continue candidate = min(link_cost + advertised, INFINITY) existing = self.table.get(dest) if existing is None: self.table[dest] = Route(candidate, sender, now) changed = changed or candidate < INFINITY elif existing.next_hop == sender: # always trust the current next-hop, even if cost worsened if existing.cost != candidate: existing.cost = candidate existing.last_updated = now changed = True elif candidate < existing.cost: existing.cost = candidate existing.next_hop = sender existing.last_updated = now changed = True if changed: self.send_updates() # triggered update on change
def break_link(self, neighbour: str) -> None: """Simulate a link failure: stop sending to N and forget routes via N.""" with self.lock: self.links.pop(neighbour, None) self.neighbour_inboxes.pop(neighbour, None) for dest, route in self.table.items(): if route.next_hop == neighbour: route.cost = INFINITY self.send_updates()
def run(self) -> None: next_update = time.monotonic() + UPDATE_INTERVAL self.send_updates() while self.running: timeout = max(0.0, next_update - time.monotonic()) try: msg = self.inbox.get(timeout=timeout) except queue.Empty: msg = None if msg is not None: kind, sender, payload = msg if kind == 'update': self.absorb(sender, payload) if time.monotonic() >= next_update: self.expire_stale() self.send_updates() next_update = time.monotonic() + UPDATE_INTERVAL
def snapshot(self) -> list[tuple[str, int, str | None]]: with self.lock: return sorted( (d, r.cost, r.next_hop) for d, r in self.table.items() )
def build_ring() -> dict[str, Router]: topology = { 'A': {'B': 1, 'D': 1}, 'B': {'A': 1, 'C': 1}, 'C': {'B': 1, 'D': 1}, 'D': {'A': 1, 'C': 1}, } inboxes = {n: queue.Queue() for n in topology} routers = {n: Router(n, links, inboxes[n]) for n, links in topology.items()} for n, r in routers.items(): for m in r.links: r.attach(m, inboxes[m]) for r in routers.values(): r.start() return routers
def print_tables(routers: dict[str, Router]) -> None: print('-' * 40) for name in sorted(routers): rows = [f'{d}:{c}/{nh or "-"}' for d, c, nh in routers[name].snapshot()] print(f' {name} ' + ' '.join(rows))
if __name__ == '__main__': routers = build_ring() time.sleep(6) # let it converge print('converged on the ring:') print_tables(routers)
print('\nbreaking A <-> B link...') routers['A'].break_link('B') routers['B'].break_link('A') time.sleep(6) print('after link failure:') print_tables(routers)Run it:
converged on the ring: A A:0/- B:1/B C:2/B D:1/D B A:1/A B:0/- C:1/C D:2/A C B:1/B C:0/- D:1/D A:2/B D A:1/A B:2/A C:1/C D:0/-
breaking A <-> B link...after link failure: A A:0/- B:3/D C:2/D D:1/D B A:3/C B:0/- C:1/C D:2/C C B:1/B C:0/- D:1/D A:2/D D A:1/A B:2/C C:1/C D:0/-A’s cost to B was 1 via B; now it’s 3 via D → C → B. Symmetric for B. The other two tables barely change.
Why the code does what it does#
- Triggered updates on change. RIP says “send your vector every 30 s, and also immediately on change”. The simulator does both — periodic in
run(), triggered inabsorbandbreak_link. Triggered updates are the difference between 90 s and 3 s recovery from a failure. existing.next_hop == senderbranch inabsorb. Always trust the current next hop, even if its advertised cost goes up. Without this, a router would refuse to accept the bad news that its preferred path got worse and would happily forward to a black hole.- Split-horizon with poison reverse. When advertising vector to neighbour N, every route whose next hop is N gets advertised as
INFINITY. Without this, A tells D “I can reach B at cost 1”, D tells A “I can reach B at cost 2 (via A)”, and on a link failure they ping-pong cost upward until both hit 16. - Expiry timer. If a neighbour stops sending updates for 8 s (the scaled-down version of RIP’s 180 s), every route via that neighbour is forced to infinity and a triggered update goes out. Without this, dead neighbours stay in the routing table forever.
INFINITY = 16. RIP defines 16 as “unreachable”. The small upper bound is what makes count-to-infinity terminate in finite time — without it, two routers in a loop would keep incrementing forever.
Watching count-to-infinity#
Set SPLIT_HORIZON = False at the top, rerun, and break a link. You’ll see the cost climb 2, 3, 4 … 15, 16 across several updates before the routes finally stabilise as “unreachable”. With SPLIT_HORIZON = True the cost jumps cleanly to the new value (or to INFINITY if no alternative exists) and stays there.
Variants#
- Real UDP transport. Replace the in-process queues with UDP sockets on different ports per router. Now the simulator survives across machines. Watch for datagram loss: triggered updates may need a small retry.
- Multiple link costs. Use
{'B': 1, 'D': 4}so the ring is asymmetric and the shortest paths are no longer obvious. Convergence still works; the printed tables become more interesting. - Poison reverse without split-horizon. Advertise the route with cost
INFINITYonly on the reverse link (instead of omitting it). Slightly more bandwidth but identical effect. - RIPng / RIPv2 message formats. Encode the vector as a real RIPv2 PDU (UDP port 520, 25-route limit per packet). The simulator becomes interoperable with
frr-rip. - Hold-down timers. When a route goes to infinity, refuse updates for that destination for a fixed window. Trades convergence speed for resistance to flapping.
- Failure injection driver. Add a small REPL that lets you
break A B,restore A B,setcost A B 5interactively. Great for teaching.
Trade-offs#
Other tensions worth weighing:
- Update interval. Short (1 s) detects failures fast but wastes bandwidth on a stable network. Long (60 s) is efficient but a failure can blackhole traffic for a minute.
- Infinity value. RIP picks 16 because it limits networks to 15 hops — small enough to terminate count-to-infinity in seconds, large enough for the autonomous systems of 1988. Today it’s a hard limit.
- Split-horizon vs poison reverse. Same effect in two-router loops; different bandwidth profile (split-horizon omits, poison reverse advertises infinity). Both fail to prevent loops on three or more routers — that’s why RIP also has hold-down timers.
Why does RIP have a hop-count limit of 15 if the algorithm has no such requirement?
The limit exists exactly because of count-to-infinity. The algorithm itself happily handles any number of hops; what it doesn’t handle is a routing loop. A small INFINITY is RIP’s escape hatch: the longest legitimate path is 15 hops, so any cost of 16 is “definitely not real, drop it”. Larger INFINITY values would mean longer count-to-infinity episodes during failures.
Common pitfalls#
- Forgetting the
next_hop == senderrule. A router that ignores its current next hop’s worsening news will route through a stale path forever. Easy bug; subtle effects. - Not sending triggered updates. Wait for the periodic 30 s tick and a single failure can blackhole traffic for the full interval. Always trigger on change.
- Missing the expiry timer. If a neighbour just stops sending (cable yanked), no message ever arrives to mark its routes infinite. The expiry timer is what catches this.
- Mutating the table outside the lock. Two threads write to
self.table(the receive path and the expiry path). Without the lock you’ll see intermittent KeyError or stalenext_hopvalues. - Advertising the same vector to all neighbours. Without split-horizon, count-to-infinity is unavoidable for any link failure in a loop. Compute a per-neighbour vector.
- Hard-coding
INFINITY = 999. RIP defines INFINITY = 16 for a reason. A larger value extends count-to-infinity from seconds to minutes. - Treating “no update from neighbour” as “neighbour is fine”. Only an expired timer tells you a neighbour is gone. Missing this is how routes via dead neighbours stay in the table indefinitely.
- Comparing costs without saturating at INFINITY.
15 + 1 = 16, which is fine;15 + 5 = 20, which is wrong — the route is unreachable, not “cost 20”. Alwaysmin(candidate, INFINITY). - Sending the local entry (
self.name: 0) without thinking. Other routers add link cost and conclude “I can reach you at cost link_cost”. That’s correct and intentional — but easy to filter out by mistake. - No
daemon=Trueon threads. SIGINT then leaves zombie router threads running. Either mark them daemon or implement a clean stop.
Related building blocks#