Spanning Tree Protocol (STP) — Computer Networks

What it is

Spanning Tree Protocol (STP) is a Layer 2 control protocol that prevents loops on a switched Ethernet network by blocking redundant links. Ethernet frames carry no TTL, so without STP two switches connected by two cables will re-flood every broadcast frame back and forth indefinitely, and the looping traffic saturates the links within seconds (a broadcast storm). STP is enabled by default on most managed Ethernet switches, even when not strictly required, because the cost of accidentally creating a loop is catastrophic and the cost of running STP is near zero.

Defined originally as IEEE 802.1D in 1990 (Radia Perlman’s design), the protocol elects one switch as the “root bridge”, computes the lowest-cost path from every other switch to that root, and blocks any port not on that path. The blocked ports stay in standby — they wake up and forward traffic only when the active path fails. The result is a loop-free tree spanning every switch in the broadcast domain.

When to use it

STP runs by default. The real questions are which variant, and where to override its defaults:

Classic STP (802.1D) — original 1990 protocol. Convergence takes 30–50 seconds after a topology change. Mostly replaced by RSTP.
RSTP (802.1w, 2001) — Rapid STP. Convergence typically in 1–6 seconds. Backwards-compatible with classic STP. The default, or the basis of the default, on most current managed switches.
MSTP (802.1s) — Multiple STP. Runs multiple spanning-tree instances, one per VLAN group. Lets traffic on different VLANs use different paths through the network (load balancing).
PVST+ / Rapid-PVST+ (Cisco) — Per-VLAN Spanning Tree. One spanning tree per VLAN. Less efficient than MSTP at scale but simpler to reason about.
No STP at all — modern datacentre leaf-spine fabrics replace STP with Layer 3 routing (eBGP between leaves and spines). Loops at Layer 3 are handled by TTL, not by blocking links.

If you control a flat Layer 2 network larger than a single switch, you need STP (or its replacement). If you’re operating a routed fabric, you’ve designed STP out of the picture.

How it works

STP is a distance-vector-like protocol where every switch exchanges Bridge Protocol Data Units (BPDUs) on every link. The protocol converges in three phases: root election, path-cost computation, and port-state assignment.

Topology before and after STP

     Without STP (loop)            With STP (spanning tree)

         +-------+                       +-------+
         |  S1   |                       |  S1   | ROOT
         +-------+                       +-------+
        /         \                     /         \
       /           \                   /           \
   +-------+   +-------+           +-------+   +-------+
   |  S2   |---|  S3   |           |  S2   |-X-|  S3   |
   +-------+   +-------+           +-------+   +-------+
                                    (X = S2-S3 link blocked at one end)

Both diagrams represent the same physical cabling. STP elects S1 as root and blocks one port on the S2-S3 link so the tree has no cycle.

BPDUs

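BPDUs are sent every 2 seconds (the “hello interval”). In classic STP the root originates each configuration BPDU and the other switches relay it out of their designated ports; in RSTP every switch generates its own BPDUs on its designated ports. The BPDU carries: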

Bridge ID format — 8 bytes total: 2-byte priority (default 32768) + 6-byte MAC. Used for both the root and sender IDs below. Lower wins.
Root Bridge ID — the sender’s current belief about who the root is.
Root Path Cost — the cost from the sender to the root.
Sender Bridge ID — who sent this BPDU.
Port ID — which port it went out on.
Timers — Hello (2 s), Max Age (20 s), Forward Delay (15 s).
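
The "lower wins" rule for Bridge IDs can be illustrated with a minimal Python sketch. The `BridgeID` class, the `better` helper, and the example MAC addresses are hypothetical names and values chosen for illustration; they are not part of any switch API. The sketch packs the priority and MAC into the 8-byte form described above and compares the results byte by byte.

```python
from typing import NamedTuple


class BridgeID(NamedTuple):
    """Illustrative Bridge ID: 2-byte priority followed by a 6-byte MAC."""
    priority: int
    mac: str  # colon-separated hex, e.g. "00:1a:2b:3c:4d:5e"

    def to_bytes(self) -> bytes:
        # Priority is encoded big-endian so byte order matches numeric order.
        mac_bytes = bytes.fromhex(self.mac.replace(":", ""))
        return self.priority.to_bytes(2, "big") + mac_bytes


def better(a: BridgeID, b: BridgeID) -> BridgeID:
    """Return the winning (numerically lower) Bridge ID."""
    return a if a.to_bytes() < b.to_bytes() else b


# Hypothetical switches for the example.
s1 = BridgeID(32768, "00:1a:2b:3c:4d:5e")
s2 = BridgeID(32768, "00:0c:29:aa:bb:cc")
s3 = BridgeID(4096, "00:ff:ff:ff:ff:ff")

print(s1.to_bytes().hex())  # 8000001a2b3c4d5e
print(better(s1, s2).mac)   # equal priority, so the lower MAC (s2) wins
print(better(s2, s3).mac)   # lower priority (s3) wins regardless of MAC
```

Because the priority occupies the most significant bytes, any priority difference decides the comparison before the MAC is considered.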

Root election

Every switch starts believing it is the root and announces itself in its BPDUs. When a switch hears a BPDU advertising a root with a lower Bridge ID than the one it currently believes in, it adopts that root and relays the claim in its own BPDUs. Eventually the entire network converges on one root: the switch with the lowest Bridge ID. Operators force a specific switch to be root by lowering its priority, for example with spanning-tree vlan 1 priority 4096 on Cisco IOS (on current switches the priority is set in multiples of 4096).
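
The election can be sketched as a small round-based simulation. This is a simplified model under stated assumptions, not the real protocol state machine: the three-switch triangle, the MAC addresses, and the `elect_root` function are hypothetical, and each "round" stands in for one exchange of BPDUs on every link.

```python
# Hypothetical triangle topology. Bridge IDs are (priority, MAC) tuples;
# equal-length lowercase MAC strings compare in the same order as their values.
bridge_ids = {
    "S1": (4096, "00:00:00:00:00:03"),   # priority lowered by the operator
    "S2": (32768, "00:00:00:00:00:01"),
    "S3": (32768, "00:00:00:00:00:02"),
}
links = [("S1", "S2"), ("S1", "S3"), ("S2", "S3")]


def elect_root(bridge_ids, links):
    """Return each switch's final belief about the root Bridge ID."""
    # Every switch starts by believing it is the root.
    believed = dict(bridge_ids)
    changed = True
    while changed:
        changed = False
        snapshot = dict(believed)  # the claims advertised in this round
        for a, b in links:
            for sender, receiver in ((a, b), (b, a)):
                # A lower advertised root ID replaces the receiver's belief.
                if snapshot[sender] < believed[receiver]:
                    believed[receiver] = snapshot[sender]
                    changed = True
    return believed


result = elect_root(bridge_ids, links)
print(result)  # every switch ends up with S1's Bridge ID as the root
```

With all priorities left at 32768, the same function would converge on S2, the switch with the lowest MAC, which is why operators set the priority explicitly.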

Port roles

Once the root is elected, every other switch computes the lowest-cost path to the root and assigns roles to its ports:

Root port (RP) — the single port on this switch closest to the root. Always forwarding.
Designated port (DP) — for each LAN segment, the port on the switch that offers that segment the lowest-cost path to the root (ties broken by lower Bridge ID). Forwarding.
Blocked / Alternate port — every other port. Receives BPDUs but does not forward data frames.

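Costs are pre-set per link speed. Under the original short (16-bit) scheme: 10 Mbps = 100, 100 Mbps = 19, 1 Gbps = 4, 10 Gbps = 2, and some implementations assign 1 to anything faster. The long (32-bit) scheme introduced in 802.1t and used by RSTP and MSTP scales to faster links: 10 Mbps = 2,000,000, 100 Mbps = 200,000, 1 Gbps = 20,000, 10 Gbps = 2,000, 100 Gbps = 200.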
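
The role assignment can be sketched for the triangle topology shown earlier. This is an illustrative model, not a switch implementation: `port_roles`, `DEFAULT_COST`, and the Bridge IDs are hypothetical names and values, all links are assumed to be 1 Gbps, and the sketch assumes at most one link between any pair of switches (so the real Port ID tie-break is not modelled).

```python
import heapq

# Per-speed costs quoted above for 10 Mbps to 10 Gbps (speed in Mbps -> cost).
DEFAULT_COST = {10: 100, 100: 19, 1_000: 4, 10_000: 2}

bridge_id = {
    "S1": (4096, "00:00:00:00:00:03"),
    "S2": (32768, "00:00:00:00:00:01"),
    "S3": (32768, "00:00:00:00:00:02"),
}
links = [("S1", "S2", 1_000), ("S1", "S3", 1_000), ("S2", "S3", 1_000)]


def port_roles(bridge_id, links):
    root = min(bridge_id, key=bridge_id.get)  # lowest Bridge ID wins
    adj = {sw: [] for sw in bridge_id}
    for a, b, mbps in links:
        adj[a].append((b, DEFAULT_COST[mbps]))
        adj[b].append((a, DEFAULT_COST[mbps]))

    # Dijkstra: lowest root path cost for every switch.
    cost = {root: 0}
    heap = [(0, root)]
    while heap:
        c, sw = heapq.heappop(heap)
        if c > cost.get(sw, float("inf")):
            continue
        for nb, w in adj[sw]:
            if c + w < cost.get(nb, float("inf")):
                cost[nb] = c + w
                heapq.heappush(heap, (c + w, nb))

    roles = {}  # (switch, neighbour) -> role of the switch's port facing neighbour
    # Root port: the port toward the neighbour offering the cheapest path to the root.
    for sw in bridge_id:
        if sw != root:
            nb, _ = min(adj[sw], key=lambda e: (cost[e[0]] + e[1], bridge_id[e[0]]))
            roles[(sw, nb)] = "root"
    # Designated port: on each link, the end closer to the root (Bridge ID breaks ties).
    for a, b, _ in links:
        winner, loser = sorted((a, b), key=lambda s: (cost[s], bridge_id[s]))
        roles[(winner, loser)] = "designated"
        roles.setdefault((loser, winner), "blocked")  # unless it is already a root port
    return root, cost, roles


root, cost, roles = port_roles(bridge_id, links)
for (sw, nb), role in sorted(roles.items()):
    print(f"{sw} port facing {nb}: {role}")
```

In this example S2 and S3 have equal root path cost, so S2's lower Bridge ID makes its end of the S2-S3 link designated and S3's end blocked, matching the single blocked port in the diagram.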

Port-state machine (classic STP)

Disabled  ──admin up──▶  Blocking
                            │
                     Max Age (up to 20s)
                            │
                            ▼
                        Listening  ─── 15s (Forward Delay) ───┐
                                                              ▼
                                                          Learning ──15s──▶ Forwarding

A port takes 30–50 seconds to start forwarding traffic in classic STP: 30 seconds (two Forward Delay intervals) for a newly connected link, and up to 50 seconds when a blocked port must first wait for Max Age to expire after an indirect failure. This is why a freshly plugged laptop on an enterprise network used to “wait” so long before getting an IP: the switch port was still in Listening/Learning and DHCP could not traverse it. PortFast (Cisco term, standardised as “edge port” in RSTP) skips this on access ports where no other switch should ever be plugged in.
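
The arithmetic behind the 30–50 second range can be written out as a short sketch. The function name and its parameters are hypothetical; the timer values are the defaults listed in the BPDU section.

```python
# Default classic-STP timers in seconds.
MAX_AGE = 20
FORWARD_DELAY = 15


def time_to_forwarding(wait_for_max_age: bool, edge_port: bool = False) -> int:
    """Seconds before a port forwards data under the default timers."""
    if edge_port:
        # PortFast / edge port: go straight to Forwarding.
        return 0
    blocking = MAX_AGE if wait_for_max_age else 0  # stale BPDU must expire first
    listening = FORWARD_DELAY
    learning = FORWARD_DELAY
    return blocking + listening + learning


print(time_to_forwarding(wait_for_max_age=False))                  # 30
print(time_to_forwarding(wait_for_max_age=True))                   # 50
print(time_to_forwarding(wait_for_max_age=False, edge_port=True))  # 0
```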

RSTP improvements

RSTP collapses the state machine to three states (Discarding, Learning, Forwarding) and adds proposal/agreement handshakes so a new link can transition to Forwarding in 1–6 seconds instead of 30+. It also adds new port roles (Backup port, Alternate port) so failover doesn’t require re-running the whole algorithm — the alternate path is precomputed.

Topology change

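In classic STP, when a link goes down (or a port moves to Forwarding), the switch that noticed sends a Topology Change Notification (TCN) toward the root. The root then sets the topology-change flag in the BPDUs it sends out. While that flag is set, every switch shortens its MAC address table aging time from the default 5 minutes to Forward Delay (15 seconds) so stale entries from the old path get flushed. RSTP skips the round trip through the root: the detecting switch floods the topology change itself, and switches flush the affected entries immediately.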
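
The effect of the shortened aging time on a MAC address table can be sketched as follows. `MacTable` and its methods are hypothetical names for illustration, timestamps are passed in explicitly to keep the example deterministic, and the model covers only the classic-STP behaviour of shortening the aging timer.

```python
class MacTable:
    """Toy MAC address table with a switchable aging time (seconds)."""
    DEFAULT_AGING = 300  # 5 minutes
    FAST_AGING = 15      # Forward Delay, used while a topology change is signalled

    def __init__(self):
        self.aging = self.DEFAULT_AGING
        self.entries = {}  # mac -> (port, last_seen)

    def learn(self, mac, port, now):
        self.entries[mac] = (port, now)

    def topology_change(self):
        self.aging = self.FAST_AGING

    def topology_stable(self):
        self.aging = self.DEFAULT_AGING

    def lookup(self, mac, now):
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.aging:
            del self.entries[mac]  # stale: the switch would flood instead
            return None
        return port


table = MacTable()
table.learn("aa:bb:cc:00:00:01", "port1", now=0)
print(table.lookup("aa:bb:cc:00:00:01", now=60))  # port1 (within 300 s)
table.topology_change()
print(table.lookup("aa:bb:cc:00:00:01", now=60))  # None (older than 15 s)
```

An entry that would have survived for minutes is discarded as soon as the shorter timer applies, so frames for that host are flooded and the address is relearned on the new path.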

Variants

STP (802.1D, 1990) — the original. 30–50 second convergence.
RSTP (802.1w, 2001) — rapid convergence. Now the usual default, directly or via per-VLAN and multi-instance variants.
MSTP (802.1s, 2002) — multiple spanning-tree instances mapped to VLAN groups. Enables load-sharing across redundant links.
PVST+ (Cisco) — one STP instance per VLAN. CPU-heavy at scale.
Rapid-PVST+ — RSTP per VLAN. Cisco’s default on recent Catalyst software; older releases default to PVST+.
TRILL (IETF RFC 6325) / SPB (IEEE 802.1aq) — replace STP with link-state routing at Layer 2. Used in some datacentre and carrier networks; never went mainstream because Layer 3 leaf-spine won that battle.

Trade-offs

Classic STP — convergence in 30–50 seconds. Simple, single tree, every blocked link is wasted capacity. Acceptable when topology rarely changes; painful when a flapping link causes repeated convergence.

RSTP / MSTP — convergence in 1–6 seconds. With MSTP (or a per-VLAN variant), per-instance trees can put different VLANs on different paths, so a “blocked” link in one tree is forwarding in another; plain RSTP still builds a single tree. The default for any modern network with redundant uplinks.

Wasted bandwidth. Every redundant link is blocked. A pair of switches connected by two 10 Gbps cables forwards over one and wastes the other. Solutions: MSTP (use both for different VLANs), LACP (bond the two cables into one logical link STP sees as a single port), or replace Layer 2 with routed leaf-spine.
Diameter limit. STP’s default timers (Max Age 20 s, Forward Delay 15 s) are calculated for a network diameter of at most 7 switch hops. Modern variants extend this, but no STP scales to thousands of switches in one broadcast domain; datacentre fabrics route at Layer 3 instead.
Convergence under flap. A flapping link or unstable BPDU source can cause repeated re-elections. BPDU Guard on access ports drops the port the moment it receives a BPDU (a host should never send one).
Root election fragility. A misconfigured switch with priority 0 announces itself as root, redirecting all traffic suboptimally. Root Guard prevents downstream switches from becoming root.
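
The link between the diameter limit above and the default timers can be shown with a short calculation. The two formulas below are the relation commonly quoted in vendor timer-tuning guidance (for example Cisco's); treat them here as an assumption rather than a quotation of the standard. The function name `stp_timers` is hypothetical.

```python
import math


def stp_timers(diameter: int, hello: int = 2):
    """Derive (Max Age, Forward Delay) in seconds from the network diameter.

    Assumed relation (commonly quoted, not verified against the standard here):
      max_age       = 4 * hello + 2 * diameter - 2
      forward_delay = (4 * hello + 3 * diameter) / 2, rounded up
    """
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return max_age, forward_delay


print(stp_timers(7))  # (20, 15): the default timers correspond to diameter 7
```

Read in reverse, this is why the defaults of 20 and 15 seconds imply a 7-hop ceiling: a larger diameter would need longer timers, and therefore slower convergence.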

Common pitfalls

Disabling STP “for performance”. STP’s overhead is negligible: one small BPDU per port every 2 seconds, with no effect on data-plane forwarding. A loop costs everything. Never disable it on a switch with redundant uplinks.
PortFast on a trunk port. PortFast / edge-port assumes the port connects to a host, not another switch. Enabling it on a switch-to-switch link bypasses loop protection. Always pair PortFast with BPDU Guard.
Forgetting BPDU Guard. A user plugging a cheap switch into the office wall can win the root-bridge election if that device advertises a lower priority or, at equal priority, has a lower MAC address. BPDU Guard shuts down any access port that receives a BPDU.
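Mixed STP / RSTP / MSTP domains. An RSTP switch that hears classic-STP BPDUs on a port falls back to classic-STP behaviour and timers on that port, so any failover that crosses the legacy switch converges at 30–50 second speeds. Audit firmware versions before assuming RSTP convergence.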
Asymmetric paths under MSTP. Different VLANs taking different paths can confuse troubleshooting tools that assume one path per pair of hosts. Document the MSTP region and which instance carries which VLAN.
STP on a leaf-spine fabric. If you have eBGP between leaves and spines, STP should not be running between them. Some operators leave STP on as a “belt and suspenders” measure. That is fine as long as the BGP-managed Layer 3 ECMP paths are not blocked by an over-eager Layer 2 election.

Why does the root bridge matter so much?

Every frame follows the tree rooted at the root bridge, so traffic between different branches is pulled up toward the root even when a shorter physical path exists. If the root is a slow, congested, or geographically wrong switch, the entire network suffers. Operators almost always force the root to be a high-capacity switch at a topologically central point (typically a core/distribution switch) by setting its priority to 0 or 4096. Letting the network “pick” the root by MAC address means an old, slow switch often ends up running the show, because older hardware tends to carry the lower MAC addresses.