Ethernet — Frame Format, Switches, VLANs
Preamble + dst + src + EtherType + payload + FCS; the switch fabric; VLAN tagging (802.1Q); the spanning tree.
What it is#
Ethernet is the family of IEEE 802.3 standards for wired local-area networking. It defines a frame format, a 48-bit MAC addressing scheme, a physical-layer specification for copper and fibre, and (historically) a medium-access control protocol. Born in 1973 at Xerox PARC over a single coaxial cable shared by many hosts, modern Ethernet is almost unrecognisable from its origin — every host has a dedicated full-duplex link to a switch, collisions never happen, and speeds run from 10 Mbps to 800 Gbps.
What survived the evolution is the frame format. Every Ethernet payload — IP packet, ARP message, LACP control frame — sits between the same header (destination MAC, source MAC, EtherType) and the same trailer (32-bit CRC FCS). That stability is why Ethernet won the LAN: an Ethernet frame on a 1990s 10BASE-T cable and a 2026 datacentre 400GBASE-DR4 link share the bytes between preamble and FCS, even though the physics moving them is unrecognisable.
When to use it#
Ethernet is the default wired LAN. The decisions are which Ethernet:
- 1 Gbps copper (1000BASE-T) — current baseline for offices and homes. Cheap, works over Cat 5e/6 up to 100m.
- 10 Gbps copper (10GBASE-T) — server-to-top-of-rack in older datacentres. Requires Cat 6A and dissipates real heat.
- 10/25/40/100/400/800 Gbps fibre (SFP+, QSFP+, QSFP28, QSFP-DD, OSFP) — modern datacentre fabric. The number on the optic, the connector, and the reach define the deployment.
- 2.5/5 Gbps (NBASE-T) — middle ground for Wi-Fi 6/6E/7 access points whose backhaul exceeds 1 Gbps but doesn’t justify 10 Gbps.
- Power over Ethernet (PoE / PoE+ / PoE++) — same Ethernet frame format, but the cable also carries up to 90W of power. Used for IP phones, cameras, access points.
Wi-Fi replaces Ethernet on the last hop to client devices. Fibre Channel replaces it in some storage networks. Otherwise: if a LAN is wired, it’s Ethernet.
How it works#
A 1500-byte payload, a 14-byte header, a 4-byte trailer, and a 7-byte preamble + 1-byte start-of-frame delimiter make up the on-the-wire frame. The preamble and SFD are stripped by the receiver before the frame reaches the OS, so software-visible Ethernet starts at the destination MAC.
Ethernet II frame layout#
+----------+-----+--------+--------+-----------+-------------+-----+------+
| Preamble | SFD | Dst    | Src    | EtherType | Payload     | FCS | IFG  |
| 7 B      | 1 B | 6 B    | 6 B    | 2 B       | 46-1500 B   | 4 B | 12 B |
+----------+-----+--------+--------+-----------+-------------+-----+------+
                 |<------------- covered by FCS ------------>|

- Preamble + SFD — 7 bytes of 10101010 + 1 byte 10101011. Lets the receiver clock-recover and find the frame boundary.
- Destination MAC / Source MAC — 6 bytes each. Globally unique (manufacturer’s OUI in the top 3 bytes, device ID in the bottom 3). Broadcast is FF:FF:FF:FF:FF:FF; multicast addresses have the least-significant bit of the first byte set.
- EtherType — 2 bytes naming the upper-layer protocol. 0x0800 = IPv4, 0x86DD = IPv6, 0x0806 = ARP, 0x8100 = 802.1Q tag, 0x8809 = LACP.
- Payload — 46 to 1500 bytes. The 46-byte minimum ensures a frame is long enough that a collision on the longest legal cable run is detected before the sender finishes transmitting. Payloads under 46 bytes are padded with zeros.
- FCS — 32-bit CRC over destination, source, EtherType, and payload. Recomputed by every switch that modifies the frame (e.g. on VLAN tag insertion).
- IFG (Inter-Frame Gap) — 12 bytes of idle line. Not part of the frame; gives the receiver time to handle one frame before the next starts.
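The software-visible part of the layout above (destination MAC through FCS) can be sketched in a few lines of Python. This is a minimal illustration, not a production parser: the helper names `build_frame`, `parse_frame`, and `format_mac` are made up for this example, and it assumes the FCS is still attached to the frame, which many NICs strip before the OS sees it. The FCS uses the same CRC-32 that `zlib.crc32` computes, stored least-significant byte first.

```python
import struct
import zlib

# EtherType values taken from the field list above.
ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x86DD: "IPv6",
    0x8100: "802.1Q tag",
    0x8809: "LACP",
}

MIN_PAYLOAD = 46  # shorter payloads are zero-padded


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as FF:FF:FF:FF:FF:FF."""
    return ":".join(f"{b:02X}" for b in raw)


def build_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
    """Assemble dst + src + EtherType + padded payload + FCS (hypothetical helper)."""
    padded = payload.ljust(MIN_PAYLOAD, b"\x00")
    body = dst + src + struct.pack("!H", ethertype) + padded
    # CRC-32 over everything before the FCS, appended little-endian.
    return body + struct.pack("<I", zlib.crc32(body))


def parse_frame(frame: bytes) -> dict:
    """Split a frame into its fields and check the FCS (hypothetical helper)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    expected_fcs = struct.pack("<I", zlib.crc32(frame[:-4]))
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "ethertype": ETHERTYPES.get(ethertype, hex(ethertype)),
        "payload": frame[14:-4],           # still includes any padding
        "multicast": bool(dst[0] & 0x01),  # LSB of the first byte
        "fcs_ok": frame[-4:] == expected_fcs,
    }


frame = build_frame(bytes.fromhex("ffffffffffff"),
                    bytes.fromhex("020000000001"),
                    0x0806, b"hello")
print(len(frame), parse_frame(frame))
```

A 5-byte payload is padded to 46 bytes, so the example frame is 14 + 46 + 4 = 64 bytes, the minimum Ethernet frame size. Note that the parser cannot tell padding from data; that is the upper-layer protocol's job.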
Switches#
A switch is a learning bridge. It maintains a MAC address table mapping MAC → outgoing port. When a frame arrives:
- Record source MAC → ingress port (learning).
- Look up destination MAC in the table.
- If found, forward out that one port. If not found (or broadcast), flood out every port except the ingress (this is how learning eventually populates the table).
A switch is forwarding-only — it does not modify the source/destination MAC. (Routers do, on every Layer-3 hop.) Switches operate at line rate in hardware ASICs; a modern top-of-rack switch can forward at 12.8 Tbps aggregate with sub-microsecond port-to-port latency.
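The learn, look up, forward-or-flood loop described above can be sketched as a small Python class. This is a toy model under stated assumptions: the `LearningSwitch` class and its `receive` method are hypothetical names, ports are plain integers, and real switches also age out table entries and handle multicast, both of which are omitted here.

```python
BROADCAST = "FF:FF:FF:FF:FF:FF"


class LearningSwitch:
    """Toy model of a learning bridge (illustrative only)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, ingress_port, src_mac, dst_mac):
        """Process one frame and return the list of egress ports."""
        # 1. Learn: the source MAC is reachable via the ingress port.
        self.mac_table[src_mac] = ingress_port

        # 2. Look up the destination.
        egress = self.mac_table.get(dst_mac)

        # 3. Flood on broadcast or unknown destination, never back out the ingress.
        if dst_mac == BROADCAST or egress is None:
            return [p for p in self.ports if p != ingress_port]

        # Destination lives on the ingress segment: nothing to forward.
        if egress == ingress_port:
            return []

        return [egress]


sw = LearningSwitch(ports=[1, 2, 3])
print(sw.receive(1, "AA", "BB"))  # BB unknown: flooded to ports 2 and 3
print(sw.receive(2, "BB", "AA"))  # AA was learned on port 1: forwarded there only
print(sw.receive(1, "AA", "BB"))  # BB is now known on port 2
```

The first frame is flooded because the table is empty; the reply teaches the switch where both hosts live, after which traffic between them uses a single port each way.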
VLANs and 802.1Q#
A single switch can host multiple isolated broadcast domains by tagging frames with a 12-bit VLAN ID. The 802.1Q tag is a 4-byte insertion between the source MAC and the EtherType:
+--------+--------+---------+----------+-----------+
| Dst    | Src    | 0x8100  | TCI(VID) | EtherType | ...
| 6 B    | 6 B    | 2 B     | 2 B      | 2 B       |
+--------+--------+---------+----------+-----------+

The TCI field carries a 3-bit priority code point (PCP), a 1-bit drop-eligible indicator (DEI), and the 12-bit VLAN ID (1–4094; 0 and 4095 reserved). A switch port is configured as either:
- Access port — carries one untagged VLAN. Endpoint hosts plug in here; they have no idea VLANs exist.
- Trunk port — carries many tagged VLANs. Switch-to-switch and switch-to-router uplinks use trunks.
The router-on-a-stick pattern uses one router interface as a trunk; the router routes between VLAN sub-interfaces.
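The tag layout can be made concrete with a short Python sketch that inserts and reads the 4-byte 802.1Q tag. The function names `add_vlan_tag` and `parse_vlan_tag` are hypothetical, and the sketch works on a frame without its FCS; a real switch would recompute the FCS after inserting the tag.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Insert an 802.1Q tag between the source MAC and the EtherType."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be 1-4094 (0 and 4095 are reserved)")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP is 3 bits, DEI is 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    # Bytes 0-11 are dst + src; everything after is shifted right by 4 bytes.
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def parse_vlan_tag(frame: bytes):
    """Return the tag fields, or None if the frame is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    (inner_ethertype,) = struct.unpack("!H", frame[16:18])
    return {
        "pcp": tci >> 13,
        "dei": (tci >> 12) & 0x1,
        "vid": tci & 0x0FFF,
        "ethertype": inner_ethertype,  # the "real" EtherType after the tag
    }


untagged = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + bytes(46)
tagged = add_vlan_tag(untagged, vid=100, pcp=5)
print(parse_vlan_tag(tagged))
```

In this model, an access port would call something like `add_vlan_tag` on ingress and strip the tag on egress, which is why the attached host never sees it; a trunk port passes the tagged form through unchanged.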
Switched vs hubbed Ethernet#
Variants#
- Ethernet II (DIX) — the default. EtherType field. What everything uses today.
- IEEE 802.3 with LLC/SNAP — replaces EtherType with a length field plus an LLC/SNAP header. Used by some legacy protocols (Spanning Tree BPDUs, AppleTalk). Coexists with Ethernet II: a value of 1500 or less in the EtherType slot is interpreted as a length, and a value of 1536 (0x0600) or more as an EtherType.
- Jumbo frames — payload up to 9000 bytes. Non-standard but universally supported in datacentres. Cuts per-byte CPU cost by ~5x for large transfers. Must be configured end-to-end; a single 1500-MTU hop fragments or drops jumbo frames.
- 802.1Q tagged — VLAN-tagged Ethernet. The norm in any non-trivial network.
- 802.1ad QinQ — double-tagging. A service-provider outer tag wraps a customer inner tag. Lets ISPs carry customer VLANs without VLAN-ID collisions across customers.
- 802.3ad / LACP — link aggregation. Two or more physical links bonded into one logical link. Hashes flows across members for higher aggregate throughput and link-failure survival.
- PoE (802.3af / at / bt) — Power over Ethernet. Up to 15W / 30W / 90W per port over the same cable.
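The "hashes flows across members" behaviour in the link-aggregation entry above can be illustrated with a small Python sketch. It is an assumption-laden stand-in: real switches use vendor-specific hash functions and field selections, whereas this example simply applies `zlib.crc32` to a 5-tuple, and `pick_member` is a hypothetical name.

```python
import zlib


def pick_member(flow: tuple, members: list) -> str:
    """Map a flow to one member link of the aggregate.

    `flow` is assumed to be (src_ip, dst_ip, protocol, src_port, dst_port).
    Hashing the whole tuple keeps every packet of a flow on the same link,
    which preserves packet ordering within that flow.
    """
    if not members:
        raise ValueError("no active member links")
    key = "|".join(str(field) for field in flow).encode()
    return members[zlib.crc32(key) % len(members)]


links = ["eth0", "eth1"]
flow_a = ("10.0.0.1", "10.0.0.2", 6, 49152, 443)
flow_b = ("10.0.0.1", "10.0.0.2", 6, 49153, 443)

print(pick_member(flow_a, links), pick_member(flow_b, links))

# If eth1 fails, flows are rehashed over the surviving members.
print(pick_member(flow_a, ["eth0"]))
```

The design consequence is that a single flow never exceeds the speed of one member link; the aggregate throughput gain only appears across many flows.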
Trade-offs#
- Frame size. 1500-byte MTU is a 1980s legacy. Cuts TCP throughput on high-bandwidth links because of per-packet overhead (Linux’s TCP stack spends ~1 µs/packet regardless of size). Jumbo frames fix it inside a datacentre; the public Internet is still 1500.
- MAC learning vs flooding. A switch whose MAC table overflows can no longer learn new addresses and floods frames for every destination it doesn’t know. Older switches topped out around 16k entries; modern switches scale to 256k+. Still, a botnet doing MAC spoofing can DoS a small switch by overflowing the table.
- Broadcast domain size. Every host on a VLAN sees every broadcast. ARP requests, DHCP discovers, mDNS — they all flood. Large flat VLANs (>1024 hosts) become broadcast-noisy. Split into smaller VLANs and route between them.
- Switched fabric topology. A tree of switches has predictable forwarding but limited bisection bandwidth. A leaf-spine Clos topology gives equal-cost paths and predictable congestion; it is the modern datacentre default.
- Latency. Cut-through switches start forwarding the moment they read the destination MAC (~200 ns); store-and-forward switches wait for the FCS (~1.5 µs at 1500 bytes on 10 Gbps). HFT shops pay for cut-through; everyone else uses store-and-forward.
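The frame-size trade-off can be put into numbers with a short Python calculation. It is a back-of-the-envelope sketch using only the field sizes from the frame layout (preamble, SFD, header, FCS, inter-frame gap); the function names are made up for this example, and it ignores IP and TCP headers as well as VLAN tags.

```python
# Per-frame bytes on the wire that are not payload:
# preamble (7) + SFD (1) + header (14) + FCS (4) + inter-frame gap (12)
WIRE_OVERHEAD = 7 + 1 + 14 + 4 + 12


def frames_per_second(link_bps: float, payload_bytes: int) -> float:
    """Maximum frame rate when every frame carries `payload_bytes`."""
    return link_bps / ((payload_bytes + WIRE_OVERHEAD) * 8)


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of wire time spent carrying payload."""
    return payload_bytes / (payload_bytes + WIRE_OVERHEAD)


LINK = 10e9  # 10 Gbps

for mtu in (1500, 9000):
    print(f"{mtu:>5} B payload: "
          f"{frames_per_second(LINK, mtu):>10,.0f} frames/s, "
          f"{payload_efficiency(mtu):.1%} efficient")

ratio = frames_per_second(LINK, 1500) / frames_per_second(LINK, 9000)
print(f"1500-byte frames need {ratio:.1f}x as many frames as 9000-byte frames")
```

The wire efficiency barely moves between the two sizes; what changes is the number of frames the hosts must process per second, which is where the per-packet CPU cost shows up.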
Common pitfalls#
- Confusing MAC address with IP address. MAC is a Layer 2 hardware identifier, local to one broadcast domain. IP is a Layer 3 logical identifier, routable across the Internet. ARP bridges them.
- Forgetting EtherType when reading a packet capture. 0x0800 is IPv4, 0x0806 is ARP, 0x86DD is IPv6, 0x8100 is a VLAN tag (read past it for the real EtherType). Without recognising the EtherType you can’t decode the payload.
- Untagged VLAN confusion. An access port carries an untagged VLAN; a trunk usually has a “native” untagged VLAN plus tagged ones. Mismatching native VLAN on both ends of a trunk silently merges two broadcast domains.
- Jumbo frames partially configured. Any MTU mismatch on the path drops or fragments jumbo frames. Symptom: small pings work, large pings or bulk transfers fail. Always set MTU end-to-end and run a ping -s 8972 -M do to verify.
- Spanning Tree off. Two switches uplinked to each other twice without STP create a Layer 2 loop. Frames multiply exponentially. The network melts in seconds.
- Storm-control off. A flood from one host (ARP storm, broadcast loop) saturates every port in the broadcast domain. Storm-control rate-limits broadcast/multicast/unknown-unicast per port.
- Trusting source MACs. Anyone with ip link set dev eth0 address ... can spoof a MAC. 802.1X authenticates the device, not the MAC; without 802.1X, the network trusts whoever asks.
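The 8972 in the jumbo-frame check above is not arbitrary: it is the largest ICMP echo payload that fits a 9000-byte MTU once the IPv4 and ICMP headers are subtracted. A tiny Python sketch of the arithmetic, assuming a 20-byte IPv4 header without options (the function name is hypothetical):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(ping_payload_for_mtu(9000))  # 8972, the value used in the check above
print(ping_payload_for_mtu(1500))  # 1472, the equivalent for a standard MTU
```

With fragmentation prohibited (`-M do`), a ping of that size succeeds only if every hop on the path accepts the full MTU.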
Why 1500 bytes specifically?
The DIX consortium picked 1500 as a balance between two concerns: frames should be short enough that one station cannot hold the shared cable for long, and long enough that header overhead doesn’t dominate. (Collision detection, by contrast, sets the minimum frame size, not the maximum.) The 1500-byte number is a compromise tuned to a 10 Mbps shared cable from 1980. Every other MTU on the Internet (PPPoE’s 1492, GRE-tunnel 1476, IPsec’s variable hit) is “1500 minus something”. The number is an accident of history that everything else compensates for.
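The "1500 minus something" pattern is easy to verify for the two fixed-size cases mentioned. The sketch below assumes the usual header sizes: 6 bytes of PPPoE header plus 2 bytes of PPP protocol ID, and a 20-byte outer IPv4 header plus a 4-byte basic GRE header. The names are illustrative, and IPsec is left out because its overhead depends on the cipher and mode.

```python
ETHERNET_MTU = 1500

# Encapsulation overhead in bytes (assumed header sizes, no optional fields).
ENCAP_OVERHEAD = {
    "PPPoE": 6 + 2,           # PPPoE header + PPP protocol ID
    "GRE over IPv4": 20 + 4,  # outer IPv4 header + basic GRE header
}


def effective_mtu(encapsulation: str) -> int:
    """MTU left for the inner packet after encapsulation."""
    return ETHERNET_MTU - ENCAP_OVERHEAD[encapsulation]


for name in ENCAP_OVERHEAD:
    print(f"{name}: {effective_mtu(name)}")
```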