Ethernet — Frame Format, Switches, VLANs

Preamble + dst + src + EtherType + payload + FCS; the switch fabric; VLAN tagging (802.1Q); the spanning tree.

Building Block Foundational
9 min read
ethernet switching vlan layer-2 802-1q

What it is

Ethernet is the family of IEEE 802.3 standards for wired local-area networking. It defines a frame format, a 48-bit MAC addressing scheme, a physical-layer specification for copper and fibre, and (historically) a medium-access control protocol. Born in 1973 at Xerox PARC over a single coaxial cable shared by many hosts, modern Ethernet is almost unrecognisable from its origin — every host has a dedicated full-duplex link to a switch, collisions never happen, and speeds run from 10 Mbps to 800 Gbps.

What survived the evolution is the frame format. Every Ethernet payload — IP packet, ARP message, LACP control frame — sits between the same header (destination MAC, source MAC, EtherType) and the same trailer (32-bit CRC FCS). That stability is why Ethernet won the LAN: an Ethernet frame on a 1990s 10BASE-T cable and a 2026 datacentre 400GBASE-DR4 link share the bytes between preamble and FCS, even though the physics moving them is unrecognisable.

When to use it

Ethernet is the default wired LAN. The decisions are which Ethernet:

  • 1 Gbps copper (1000BASE-T) — current baseline for offices and homes. Cheap, works over Cat 5e/6 up to 100m.
  • 10 Gbps copper (10GBASE-T) — server-to-top-of-rack in older datacentres. Needs Cat 6A for the full 100m (Cat 6 reaches about 55m) and dissipates real heat.
  • 10/25/40/100/400/800 Gbps fibre (SFP+, QSFP+, QSFP28, QSFP-DD, OSFP) — modern datacentre fabric. The number on the optic, the connector, and the reach define the deployment.
  • 2.5/5 Gbps (NBASE-T) — middle ground for Wi-Fi 6/6E/7 access points whose backhaul exceeds 1 Gbps but doesn’t justify 10 Gbps.
  • Power over Ethernet (PoE / PoE+ / PoE++) — same Ethernet frame format, but the cable also carries up to 90W of power. Used for IP phones, cameras, access points.

Wi-Fi replaces Ethernet for client devices. Fibre Channel replaces it in some storage networks. Otherwise: if a LAN is wired, it’s Ethernet.

How it works

A full-size frame on the wire is a 7-byte preamble, a 1-byte start-of-frame delimiter (SFD), a 14-byte header, up to 1500 bytes of payload, and a 4-byte trailer: 1526 bytes in total. The receiving NIC strips the preamble and SFD before the frame reaches the OS, so software-visible Ethernet starts at the destination MAC.
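
The byte counts above translate directly into line-rate limits. The following is a minimal sketch of that arithmetic, assuming the standard 12-byte inter-frame gap between frames; the function names (`wire_bytes`, `max_frames_per_second`, `payload_efficiency`) are illustrative and not taken from any library.

```python
PREAMBLE_SFD = 8   # 7-byte preamble + 1-byte start-of-frame delimiter
HEADER = 14        # destination MAC + source MAC + EtherType
FCS = 4            # CRC-32 trailer
IFG = 12           # idle gap between frames, counted in byte times
MIN_PAYLOAD = 46   # shorter payloads are padded up to this length


def wire_bytes(payload_len: int) -> int:
    """Byte times one frame occupies on the link, including preamble and gap."""
    padded = max(payload_len, MIN_PAYLOAD)
    return PREAMBLE_SFD + HEADER + padded + FCS + IFG


def max_frames_per_second(link_bps: float, payload_len: int) -> float:
    """Upper bound on frame rate for back-to-back frames of one size."""
    return link_bps / (wire_bytes(payload_len) * 8)


def payload_efficiency(payload_len: int) -> float:
    """Fraction of line time that carries payload rather than overhead."""
    return payload_len / wire_bytes(payload_len)
```

Under these assumptions `wire_bytes(1500)` is 1538, so a full-size frame spends about 97.5% of its line time on payload, while a 1 Gbps link saturated with minimum-size frames tops out near 1.49 million frames per second.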

Ethernet II frame layout

+----------+-----+--------+--------+-----------+-------------+-----+------+
| Preamble | SFD |  Dst   |  Src   | EtherType |   Payload   | FCS | IFG  |
|   7 B    | 1 B |  6 B   |  6 B   |    2 B    |  46-1500 B  | 4 B | 12 B |
+----------+-----+--------+--------+-----------+-------------+-----+------+
                 |<------------ covered by FCS ------------->|

  • Preamble + SFD — 7 bytes of 10101010 + 1 byte 10101011. Lets the receiver clock-recover and find the frame boundary.
  • Destination MAC / Source MAC — 6 bytes each. Globally unique (manufacturer’s OUI in the top 3 bytes, device ID in the bottom 3). Broadcast is FF:FF:FF:FF:FF:FF; multicast addresses have the least-significant bit of the first byte set.
  • EtherType — 2 bytes naming the upper-layer protocol. 0x0800 = IPv4, 0x86DD = IPv6, 0x0806 = ARP, 0x8100 = 802.1Q tag, 0x8809 = Slow Protocols (LACP).
  • Payload — 46 to 1500 bytes. The 46-byte minimum ensures a frame is long enough that a collision on the longest legal cable run is detected before the sender finishes transmitting. Payloads under 46 bytes are padded with zeros.
  • FCS — 32-bit CRC over destination, source, EtherType, and payload. Recomputed by every switch that modifies the frame (e.g. on VLAN tag insertion).
  • IFG (Inter-Frame Gap) — 12 bytes of idle line. Not part of the frame; gives the receiver time to handle one frame before the next starts.
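
The field list above can be exercised in a few lines. This is a minimal sketch, not a production parser: `build_frame`, `parse_frame`, and `mac_bytes` are hypothetical helper names, and the frame here is the software-visible part only (no preamble, SFD, or IFG). It assumes that `zlib.crc32` computes the same CRC-32 as the Ethernet FCS and that the FCS appears least-significant byte first in the byte stream. In practice most NICs compute and strip the FCS in hardware, so software rarely sees it.

```python
import struct
import zlib

MIN_PAYLOAD = 46  # payloads shorter than this are zero-padded


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_frame(dst: str, src: str, ethertype: int, payload: bytes) -> bytes:
    """Return dst + src + EtherType + padded payload + FCS."""
    padded = payload.ljust(MIN_PAYLOAD, b"\x00")
    body = mac_bytes(dst) + mac_bytes(src) + struct.pack("!H", ethertype) + padded
    fcs = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return body + fcs


def parse_frame(frame: bytes) -> dict:
    """Split a frame into its fields and verify the FCS."""
    body, fcs = frame[:-4], frame[-4:]
    (ethertype,) = struct.unpack("!H", body[12:14])
    return {
        "dst": body[0:6].hex(":"),
        "src": body[6:12].hex(":"),
        "ethertype": ethertype,
        "payload": body[14:],  # still includes any padding
        "multicast": bool(body[0] & 0x01),  # group bit of the first byte
        "fcs_ok": struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs,
    }
```

Building a broadcast ARP frame with a 2-byte payload yields a 64-byte frame (14 header + 46 padded payload + 4 FCS), which is the classic Ethernet minimum. Note that the parser cannot tell padding from payload; that is left to the upper-layer protocol's own length field.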

Switches

A switch is a learning bridge. It maintains a MAC address table mapping MAC → outgoing port. When a frame arrives:

  1. Record source MAC → ingress port (learning).
  2. Look up destination MAC in the table.
  3. If found, forward out that one port. If not found (or broadcast), flood out every port except the ingress (this is how learning eventually populates the table).
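
The three steps above fit in a short class. This is a toy model under stated assumptions: the `LearningSwitch` name and its `receive` method are invented for illustration, MAC addresses are plain strings, and table ageing and VLANs are ignored. It also adds one behaviour the steps do not mention, which is that a frame whose destination was learned on the ingress port is dropped rather than sent back out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, ingress, src, dst):
        """Process one frame and return the list of egress ports."""
        # Step 1: learn which port the source MAC lives behind.
        self.mac_table[src] = ingress

        # Step 2: look up the destination.
        egress = self.mac_table.get(dst)

        # Step 3: known unicast goes out one port; everything else floods.
        if dst != BROADCAST and egress is not None:
            # Destination is on the port the frame came from: nothing to do.
            return [] if egress == ingress else [egress]
        return [p for p in self.ports if p != ingress]
```

On a three-port switch, the first frame from host A to host B floods to the other two ports; once B replies, both MACs are in the table and later frames in either direction leave through exactly one port.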

A switch is forwarding-only — it does not modify the source/destination MAC. (Routers do, on every Layer-3 hop.) Switches operate at line rate in hardware ASICs; a modern top-of-rack switch can forward at 12.8 Tbps aggregate with sub-microsecond port-to-port latency.

VLANs and 802.1Q

A single switch can host multiple isolated broadcast domains by tagging frames with a 12-bit VLAN ID. The 802.1Q tag is a 4-byte insertion between the source MAC and the EtherType:

+--------+--------+--------+-----------+-----------+
|  Dst   |  Src   | 0x8100 | TCI (VID) | EtherType | ...
|  6 B   |  6 B   |  2 B   |    2 B    |    2 B    |
+--------+--------+--------+-----------+-----------+

The TCI field carries a 3-bit priority code point (PCP), a 1-bit drop-eligible indicator (DEI), and the 12-bit VLAN ID (1–4094; 0 and 4095 reserved). A switch port is configured as either:

  • Access port — carries one untagged VLAN. Endpoint hosts plug in here; they have no idea VLANs exist.
  • Trunk port — carries many tagged VLANs. Switch-to-switch and switch-to-router uplinks use trunks.

The router-on-a-stick pattern uses one router interface as a trunk; the router routes between VLAN sub-interfaces.
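
The tag layout described above (3-bit PCP, 1-bit DEI, 12-bit VLAN ID, inserted after the two MAC addresses) can be sketched as bit packing. The helper names `pack_tci`, `unpack_tci`, `add_tag`, and `strip_tag` are hypothetical, and the frames here are assumed to carry no FCS, since a real switch would recompute it after changing the frame.

```python
import struct

TPID_8021Q = 0x8100  # value in the EtherType slot that marks a tagged frame


def pack_tci(pcp: int, dei: int, vid: int) -> int:
    """Combine priority, drop-eligible bit and VLAN ID into the 16-bit TCI."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0xFFF):
        raise ValueError("TCI field out of range")
    return (pcp << 13) | (dei << 12) | vid


def unpack_tci(tci: int):
    """Return (pcp, dei, vid) from a 16-bit TCI."""
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0xFFF


def add_tag(frame: bytes, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Insert the 4-byte tag after the 12 bytes of dst + src MAC."""
    tag = struct.pack("!HH", TPID_8021Q, pack_tci(pcp, dei, vid))
    return frame[:12] + tag + frame[12:]


def strip_tag(frame: bytes):
    """Return (vid, untagged_frame); vid is None if the frame has no tag."""
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_8021Q:
        return None, frame
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0xFFF, frame[:12] + frame[16:]
```

Roughly speaking, `add_tag` is what a switch does when an access-port frame leaves on a trunk, and `strip_tag` is the reverse step when a tagged frame is delivered to an access port.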

Switched vs hubbed Ethernet

  • Hub (deprecated) — repeats every signal out of every port, so all hosts share one collision domain and CSMA/CD arbitrates access. Throughput is shared: a 10 Mbps hub with 10 busy hosts gives each at most ~1 Mbps, and less once collisions are counted.
  • Switch (modern) — forwards each frame only to the egress port found in the MAC table. Each port is its own collision domain, and with full duplex there are no collisions at all, so CSMA/CD never fires. Throughput is per-port: a 10-port 1 Gbps switch can sustain 10 simultaneous 1 Gbps flows.

Variants

  • Ethernet II (DIX) — the default. EtherType field. What everything uses today.
  • IEEE 802.3 with LLC/SNAP — replaces EtherType with a length field plus an LLC/SNAP header. Used by some legacy protocols (Spanning Tree BPDUs, AppleTalk). Coexists with Ethernet II because the two ranges never overlap: values of 1500 or less in the EtherType slot are interpreted as a length, values of 1536 (0x0600) or more as an EtherType.
  • Jumbo frames — payload up to 9000 bytes. Non-standard but universally supported in datacentres. Cuts per-byte CPU cost by ~5x for large transfers. Must be configured end-to-end; a single 1500-MTU hop fragments or drops jumbo frames.
  • 802.1Q tagged — VLAN-tagged Ethernet. The norm in any non-trivial network.
  • 802.1ad QinQ — double-tagging. A service-provider outer tag wraps a customer inner tag. Lets ISPs carry customer VLANs without VLAN-ID collisions across customers.
  • 802.3ad / LACP — link aggregation (the standard has since moved to IEEE 802.1AX). Two or more physical links bonded into one logical link. Hashes flows across members for higher aggregate throughput and link-failure survival.
  • PoE (802.3af / at / bt) — Power over Ethernet. Up to 15W / 30W / 90W per port over the same cable.
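
Ethernet II and IEEE 802.3 length-style frames can share a wire because a receiver can tell them apart from the 2-byte field after the source MAC. A minimal sketch of that check follows, assuming the usual rule that values up to 1500 are lengths and values from 1536 (0x0600) upward are EtherTypes; `classify_type_field` and the `KNOWN_ETHERTYPES` table are illustrative names, with the table limited to the values this article mentions.

```python
KNOWN_ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x86DD: "IPv6",
    0x8100: "802.1Q tag",
}


def classify_type_field(value: int) -> str:
    """Interpret the 16-bit field that follows the source MAC."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("field must fit in 16 bits")
    if value <= 1500:
        # IEEE 802.3 style: the field is the payload length, LLC follows.
        return "length"
    if value >= 0x0600:
        # Ethernet II style: the field names the upper-layer protocol.
        return KNOWN_ETHERTYPES.get(value, "unknown ethertype")
    # 1501 to 1535 belongs to neither range.
    return "undefined"
```

The gap between 1500 and 1536 is why no valid EtherType can be mistaken for a length, which is the property that lets both framings coexist.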

Trade-offs

  • Frame size. 1500-byte MTU is a 1980s legacy. Cuts TCP throughput on high-bandwidth links because of per-packet overhead (Linux’s TCP stack spends ~1 µs/packet regardless of size). Jumbo frames fix it inside a datacentre; the public Internet is still 1500.
  • MAC learning vs flooding. A switch with an overflowing MAC table starts flooding more frames. Used to be a 16k entry limit; modern switches scale to 256k+. Still, a host sending frames with many spoofed source MACs can DoS a small switch by overflowing the table.
  • Broadcast domain size. Every host on a VLAN sees every broadcast. ARP requests, DHCP discovers, mDNS — they all flood. Large flat VLANs (>1024 hosts) become broadcast-noisy. Split into smaller VLANs and route between them.
  • Switched fabric topology. A tree of switches has predictable forwarding but limited bisection bandwidth. A leaf-spine Clos topology gives equal-cost paths and predictable congestion. The modern datacentre default.
  • Latency. Cut-through switches start forwarding the moment they read the destination MAC (~200 ns); store-and-forward switches wait for the FCS, which adds the full serialization time of the frame (~1.2 µs for 1500 bytes at 10 Gbps). HFT shops pay for cut-through; everyone else uses store-and-forward.
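
The store-and-forward penalty in the latency item is the time needed to clock the whole frame in before forwarding can start. A rough sketch of that arithmetic is below; `serialization_delay_ns` is a hypothetical helper, and it counts only bits on the wire, not the switch's internal pipeline or lookup time.

```python
def serialization_delay_ns(frame_bytes: int, link_gbps: float) -> float:
    """Nanoseconds needed to receive frame_bytes at link_gbps.

    One Gbps is one bit per nanosecond, so bits / Gbps gives nanoseconds.
    """
    return frame_bytes * 8 / link_gbps


# A full frame as the switch sees it: 14-byte header + 1500 payload + 4 FCS.
FULL_FRAME = 14 + 1500 + 4

# Cut-through only needs the preamble/SFD and the 6-byte destination MAC.
CUT_THROUGH_PREFIX = 8 + 6
```

With these figures a store-and-forward hop at 10 Gbps waits roughly 1.2 µs for a full-size frame before it can forward, while the destination MAC has arrived after about 11 ns. The delay shrinks in proportion to link speed, so the gap between the two designs matters less at 100 Gbps than at 10 Gbps.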

Common pitfalls

  • Confusing MAC address with IP address. MAC is a Layer 2 hardware identifier, local to one broadcast domain. IP is a Layer 3 logical identifier, routable across the Internet. ARP bridges them.
  • Forgetting EtherType when reading a packet capture. 0x0800 is IPv4, 0x0806 is ARP, 0x86DD is IPv6, 0x8100 is a VLAN tag (read past it for the real EtherType). Without recognising the EtherType you can’t decode the payload.
  • Untagged VLAN confusion. An access port carries an untagged VLAN; a trunk usually has a “native” untagged VLAN plus tagged ones. Mismatching native VLAN on both ends of a trunk silently merges two broadcast domains.
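  • Jumbo frames partially configured. Any MTU mismatch on the path drops or fragments jumbo frames. Symptom: small pings work, large pings or bulk transfers fail. Always set MTU end-to-end and, on Linux, run ping -M do -s 8972 <host> to verify (8972 is the 9000-byte MTU minus a 20-byte IPv4 header and an 8-byte ICMP header; -M do forbids fragmentation).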
  • Spanning Tree off. Two switches uplinked to each other twice without STP (or link aggregation) create a Layer 2 loop. Ethernet frames have no TTL, so flooded broadcasts circulate and multiply until the links saturate. The network melts in seconds.
  • Storm-control off. A flood from one host (ARP storm, broadcast loop) saturates every port in the broadcast domain. Storm-control rate-limits broadcast/multicast/unknown-unicast per port.
  • Trusting source MACs. Anyone with ip link set dev eth0 address ... can spoof a MAC. 802.1X authenticates the device, not the MAC; without 802.1X, the network trusts whoever asks.

Why 1500 bytes specifically?

The DIX consortium picked 1500 as a balance between two concerns: frames should be short enough that one sender cannot hold the shared cable for long (which bounds how long every other host may have to wait) and long enough that header overhead doesn’t dominate. The 1500-byte number is a compromise tuned to a 10 Mbps shared cable from 1980. Every other MTU on the Internet (PPPoE’s 1492, GRE-tunnel 1476, IPsec’s variable hit) is “1500 minus something”. The number is an accident of history that everything else compensates for.
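
The "1500 minus something" pattern is plain subtraction. The sketch below reproduces the two fixed figures quoted above under the usual assumptions: PPPoE adds a 6-byte header plus a 2-byte PPP protocol field, and GRE over IPv4 adds a 20-byte outer IPv4 header plus a 4-byte basic GRE header. The names `OVERHEAD`, `inner_mtu`, and `tcp_mss_ipv4` are illustrative, and IPsec is left out because its overhead depends on the mode and ciphers in use.

```python
ETHERNET_MTU = 1500

# Bytes each encapsulation takes out of the Ethernet payload.
OVERHEAD = {
    "PPPoE": 6 + 2,           # PPPoE header + PPP protocol ID
    "GRE over IPv4": 20 + 4,  # outer IPv4 header + basic GRE header
}


def inner_mtu(encapsulation: str) -> int:
    """MTU left for the inner packet after the encapsulation overhead."""
    return ETHERNET_MTU - OVERHEAD[encapsulation]


def tcp_mss_ipv4(mtu: int) -> int:
    """Largest TCP segment that fits: MTU minus IPv4 and TCP headers (no options)."""
    return mtu - 20 - 20
```

The same subtraction explains the familiar 1460-byte TCP MSS on plain Ethernet and why tunnelled paths need a smaller one.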
