Encapsulation, Headers, and the Envelope Metaphor

How a packet picks up headers on the way down and sheds them on the way up — the mental model for every layered protocol.

Concept Foundational
7 min read
encapsulation headers layered-model packet-anatomy

Summary

Encapsulation is what every layered network protocol actually does: each layer wraps the data it receives from the layer above in its own header (and sometimes a trailer), then hands the result to the layer below. On the receiving side, each layer strips its own header and hands the payload up. The data on the wire is a stack of envelopes, each addressed to a different participant.

The envelope metaphor maps closely onto the real thing. The link-layer envelope (Ethernet frame) is addressed to the next hop: the next router, or the destination host itself when it sits on the same link. Open it: inside is an IP envelope, addressed to the final destination host. Open that: inside is a TCP envelope, addressed to a port (a process) on that host. Open that: inside is the application payload, such as an HTTP request, a DNS query, or an SMTP message. Each envelope is meant to be opened only by the participant it was addressed to.

Why it matters

Every networking debug session comes down to “which envelope, and what’s wrong with it.” A packet dropped at a firewall usually matched a rule on its IP addresses or its TCP/UDP port. A packet caught in a routing loop is eventually discarded because every router decrements the TTL in its IP header. A packet that gets fragmented, or dropped as too big, exceeded the MTU of some link on the path. Naming the envelope locates the bug.

It also matters for performance reasoning. Headers are overhead: bytes the wire carries that aren’t your payload. In a full-size frame on a 1500-byte Ethernet MTU, the headers (14 + 20 + 20 = 54 bytes for Ethernet + IPv4 + TCP) consume roughly 3.6% of the 1514-byte frame. With IPv6 or TCP options the share grows. With short packets (a bare TCP ACK carrying the timestamp option is 66 bytes, all of it header) the headers dominate. This is why header-compression schemes such as HPACK in HTTP/2 exist, and why batched/streamed protocols beat request-per-message protocols on small data.
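The overhead arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name `header_share` is made up for illustration, and the sizes assume option-free IPv4 and TCP headers and ignore the 4-byte FCS.

```python
# Header sizes in bytes: Ethernet, IPv4 without options, TCP without options.
ETH_HDR, IPV4_HDR, TCP_HDR = 14, 20, 20

def header_share(frame_len: int, header_len: int = ETH_HDR + IPV4_HDR + TCP_HDR) -> float:
    """Fraction of a frame (FCS excluded) that is header rather than payload."""
    if frame_len < header_len:
        raise ValueError("frame is shorter than its headers")
    return header_len / frame_len

# Full-size frame: 14-byte Ethernet header plus a 1500-byte IP packet.
full = header_share(ETH_HDR + 1500)

# Bare ACK: Ethernet + IPv4 + TCP with 12 bytes of options, no payload at all.
ack = header_share(66, ETH_HDR + IPV4_HDR + 32)

print(f"full-size frame: {full:.1%}")   # full-size frame: 3.6%
print(f"bare ACK:        {ack:.1%}")    # bare ACK:        100.0%
```

The same function makes it easy to see how quickly the share climbs as payloads shrink: a 100-byte frame is already more than half header.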

Finally, encapsulation explains how tunnels work — VPNs, GRE, MPLS, VXLAN. A tunnel is an outer encapsulation that carries an inner packet across an intermediate network as if it were payload. The model handles tunnels by simple recursion.

How it works

The send path

The sender’s stack walks layers top-down. At each step the layer prepends its own header (and sometimes appends a trailer):

Application:  [ HTTP request bytes ]
Transport:    [ TCP hdr | HTTP bytes ]
Internet:     [ IP hdr | TCP hdr | HTTP bytes ]
Link:         [ Eth hdr | IP hdr | TCP hdr | HTTP bytes | FCS ]
                                                          ^---- trailer
On the wire:  electrical/optical/RF signal of the bottom row

Each layer’s header carries exactly the information the peer at the same layer needs:

  • Ethernet — destination MAC, source MAC, EtherType (what protocol is inside: 0x0800 for IPv4, 0x86DD for IPv6, 0x0806 for ARP). The payload is followed by a CRC trailer (FCS) for error detection.
  • IP — source IP, destination IP, protocol field (6 for TCP, 17 for UDP, 1 for ICMP), TTL, fragmentation fields, header checksum.
  • TCP — source port, destination port, sequence and ack numbers, flags, window, options.
  • Application — whatever the protocol speaks: HTTP method+headers, DNS query, etc.
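The send path and the header fields listed above can be sketched with Python's `struct` module. This is an illustration under stated assumptions, not a working network stack: the function names, addresses, and ports are invented for the example, both checksums are left at zero, and the FCS is omitted because the network card normally appends it.

```python
import ipaddress
import struct

def tcp_header(src_port: int, dst_port: int, flags: int = 0x18) -> bytes:
    """20-byte TCP header, no options. Checksum is left at 0 in this sketch."""
    offset_and_flags = (5 << 12) | flags          # data offset = 5 words; 0x18 = PSH+ACK
    return struct.pack("!HHIIHHHH", src_port, dst_port,
                       0, 0,                      # sequence and ack numbers
                       offset_and_flags, 65535,   # flags word, window
                       0, 0)                      # checksum, urgent pointer

def ipv4_header(src: str, dst: str, payload_len: int, protocol: int = 6) -> bytes:
    """20-byte IPv4 header, no options. Header checksum is left at 0."""
    return struct.pack("!BBHHHBBH4s4s",
                       (4 << 4) | 5,              # version 4, IHL = 5 words
                       0,                         # DSCP/ECN
                       20 + payload_len,          # total length
                       0, 0,                      # identification, flags/fragment offset
                       64, protocol, 0,           # TTL, protocol (6 = TCP), checksum
                       ipaddress.IPv4Address(src).packed,
                       ipaddress.IPv4Address(dst).packed)

def ethernet_header(dst_mac: str, src_mac: str, ethertype: int = 0x0800) -> bytes:
    """14-byte Ethernet header: destination MAC, source MAC, EtherType."""
    to_bytes = lambda mac: bytes.fromhex(mac.replace(":", ""))
    return struct.pack("!6s6sH", to_bytes(dst_mac), to_bytes(src_mac), ethertype)

def encapsulate(app_bytes: bytes) -> bytes:
    """Wrap application bytes top-down: TCP, then IPv4, then Ethernet."""
    segment = tcp_header(49152, 80) + app_bytes
    packet = ipv4_header("192.0.2.1", "192.0.2.2", len(segment)) + segment
    return ethernet_header("02:00:00:00:00:02", "02:00:00:00:00:01") + packet

frame = encapsulate(b"GET / HTTP/1.1\r\n\r\n")
print(len(frame))   # 72 = 14 + 20 + 20 + 18
```

Each function only knows about its own layer; `encapsulate` is the only place where the layering order is visible.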

The receive path

The receiver walks layers bottom-up. At each step the layer reads its header (interpreting it), then strips it, then hands the remainder up:

Link layer:   sees Eth frame. Verify FCS, check dest MAC.
              EtherType = 0x0800 -> hand payload to IP.
Internet:     sees IP packet. Verify checksum, check dest IP.
              protocol = 6 -> hand payload to TCP.
Transport:    sees TCP segment. Demux by dest port.
              port 80 -> hand payload to the HTTP server process.
Application:  parses HTTP request, dispatches handler.

Notice the demux fields: EtherType, IP protocol, TCP port. Each answers the layer’s question “which peer up the stack should I deliver this to?” On the receive path each layer is essentially a demultiplexer; on the send path it is the matching multiplexer.
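A rough sketch of that bottom-up walk, reading only the demux fields at each layer. The function name `decapsulate` and the sample frame are assumptions made for the example; FCS and checksum verification are skipped, and only the Ethernet/IPv4/TCP path is handled.

```python
import struct

def decapsulate(frame: bytes) -> dict:
    """Walk an Ethernet/IPv4/TCP frame (FCS already removed) bottom-up."""
    # Link layer: the EtherType says which protocol the payload belongs to.
    _dst_mac, _src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:
        raise ValueError(f"not IPv4: EtherType {ethertype:#06x}")
    packet = frame[14:]

    # Internet layer: IHL gives the header length, protocol picks the transport.
    ihl = (packet[0] & 0x0F) * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    protocol = packet[9]
    if protocol != 6:
        raise ValueError(f"not TCP: protocol {protocol}")
    segment = packet[ihl:total_len]        # slicing by total length drops link padding

    # Transport layer: the destination port picks the process.
    _src_port, dst_port = struct.unpack("!HH", segment[:4])
    data_offset = (segment[12] >> 4) * 4
    return {"ethertype": ethertype, "protocol": protocol,
            "dst_port": dst_port, "payload": segment[data_offset:]}

# Hand-built sample frame (addresses and ports are arbitrary illustration values).
payload = b"GET / HTTP/1.1\r\n\r\n"
tcp = struct.pack("!HHIIHHHH", 49152, 80, 0, 0, (5 << 12) | 0x18, 65535, 0, 0)
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp) + len(payload), 0, 0,
                 64, 6, 0, bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
eth = bytes.fromhex("020000000002") + bytes.fromhex("020000000001") + b"\x08\x00"

result = decapsulate(eth + ip + tcp + payload)
print(result["dst_port"], result["payload"])
```

Note that each step touches only its own header and treats everything after it as opaque bytes for the next layer.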

Routers do partial decapsulation

A router on the path does not unwrap all the way to the app. It strips the link header, reads the IP header (to look up the destination and decrement TTL), updates the IPv4 header checksum to match the new TTL, re-wraps the packet with a new link header for the next hop, and forwards it. The TCP segment is never opened.

Host A             Router R1            Router R2              Host B
  |---[Eth1|IP|TCP]--->|---[Eth2|IP|TCP]--->|---[Eth3|IP|TCP]--->|

New Eth header on each link. IP addresses unchanged (TTL--). TCP untouched until B.

The MAC addresses change every hop. The IP addresses do not (NAT excepted). This is the “each layer talks to its peer” property in practice.
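What a router does to a frame can be sketched as follows. The function names and MAC/IP values are illustrative assumptions, the FIB lookup is omitted (the next-hop MAC is simply passed in), and the checksum routine is the standard one's-complement sum used for the IPv4 header.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                          # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward(frame: bytes, out_src_mac: bytes, next_hop_mac: bytes) -> bytes:
    """Strip the old link header, update the IP header, add a new link header."""
    ethertype = frame[12:14]
    packet = bytearray(frame[14:])
    ihl = (packet[0] & 0x0F) * 4
    if packet[8] <= 1:
        raise ValueError("TTL expired; a real router would send ICMP Time Exceeded")
    packet[8] -= 1                              # TTL lives at byte 8 of the IP header
    packet[10:12] = b"\x00\x00"                 # zero the checksum before recomputing
    struct.pack_into("!H", packet, 10, ipv4_checksum(bytes(packet[:ihl])))
    return next_hop_mac + out_src_mac + ethertype + bytes(packet)

MAC_A, MAC_R1_IN = bytes.fromhex("0200000000aa"), bytes.fromhex("020000000011")
MAC_R1_OUT, MAC_R2 = bytes.fromhex("020000000012"), bytes.fromhex("020000000021")

ip = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + 4, 0, 0, 64, 6, 0,
                           bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7])))
struct.pack_into("!H", ip, 10, ipv4_checksum(bytes(ip)))
# b"DATA" is an opaque stand-in for the TCP segment, which the router never parses.
frame_in = MAC_R1_IN + MAC_A + b"\x08\x00" + bytes(ip) + b"DATA"

frame_out = forward(frame_in, out_src_mac=MAC_R1_OUT, next_hop_mac=MAC_R2)
print(frame_out[22])                            # TTL after one hop: 63
```

A header whose checksum field is correct sums to zero under `ipv4_checksum`, which is a convenient way to verify the result.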

A real example, byte-counted

A minimal HTTPS GET request:

Ethernet header:         14 bytes  (6 dst MAC + 6 src MAC + 2 EtherType)
IPv4 header:             20 bytes  (no options)
TCP header:              20 bytes  (no options; often 32 with the timestamp option)
TLS record header:        5 bytes
HTTP/2 frame header:      9 bytes
HTTP/2 HEADERS payload: ~50 bytes  (HPACK-compressed)
Ethernet FCS:             4 bytes
-----------------------------------
Total on wire:         ~122 bytes  for what is logically "GET /"

Not counted: the TLS AEAD tag (16 bytes with the common AES-GCM and ChaCha20-Poly1305 suites) and the Ethernet preamble.

Variants and trade-offs

  • Layered (encapsulation) model — each layer is independent and its headers are self-describing, so one layer can evolve without touching the others (IPv6 came in without breaking TCP). Cost: overhead bytes per packet and a demux step at every layer.
  • Monolithic / cross-layer optimisation — wireless stacks (5G, Wi-Fi 6E) and some HPC fabrics fuse layers for performance: the link layer sees congestion signals, the transport sees radio quality. Cost: protocols stop being independently evolvable and interop gets harder.

Other axes:

  • Header vs trailer — most layers prepend headers; Ethernet’s FCS is one of the few trailers (it’s a CRC computed over the whole frame, so it must be appended after the payload is known).
  • Fixed vs variable headers — the IPv4 header is 20 bytes plus options (up to 60 bytes in total); the IPv6 header is fixed at 40 bytes, with extension headers chained after it. The TCP header is 20 bytes plus options (also up to 60). Fixed headers parse faster in hardware.
  • Tunnels — encapsulating an IP packet inside another IP packet (IP-in-IP, GRE, IPsec) adds 20-50 bytes and one more lookup. Cost: MTU shrinks (the inner payload must fit in the outer MTU); benefit: virtual networks, VPNs, BGP-free overlays.
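The MTU cost of tunnelling is simple subtraction, and nesting tunnels just repeats it. A small sketch, assuming IPv4 outer headers and the basic forms of each encapsulation; the table and function name are invented for the example, and real deployments add more bytes for options such as GRE keys or IPsec.

```python
# Per-tunnel overhead in bytes, assuming an IPv4 outer header.
TUNNEL_OVERHEAD = {
    "ipip": 20,                 # outer IPv4 header only
    "gre": 20 + 4,              # outer IPv4 + basic GRE header (no key or sequence)
    "vxlan": 20 + 8 + 8 + 14,   # outer IPv4 + UDP + VXLAN + inner Ethernet header
}

def inner_mtu(outer_mtu: int, *tunnels: str) -> int:
    """Largest inner IP packet that still fits after each layer of tunnelling."""
    mtu = outer_mtu
    for tunnel in tunnels:
        mtu -= TUNNEL_OVERHEAD[tunnel]
    if mtu <= 0:
        raise ValueError("tunnel overhead exceeds the outer MTU")
    return mtu

print(inner_mtu(1500, "gre"))            # 1476
print(inner_mtu(1500, "vxlan"))          # 1450
print(inner_mtu(1500, "gre", "ipip"))    # 1456, a tunnel inside a tunnel
```

The nested case is the recursion in practice: each outer envelope treats the whole inner packet, headers included, as payload.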

Why headers, not in-band markers?

Early networks (some serial protocols, BISYNC) used in-band markers: special byte sequences that meant “this is the start of a frame”. They required byte-stuffing to escape data that happened to contain the marker, and made parsing context-dependent. Most modern protocols instead use fixed-format headers with explicit length fields: you read a known number of bytes for the header, the header tells you how long the payload is, and you read that many. Predictable, hardware-friendly, no escape sequences.
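The two framing styles can be compared side by side. This is a toy sketch: the marker and escape values (0x7E, 0x7D, XOR 0x20) follow the convention used by PPP's HDLC-like framing, while the 2-byte length prefix and all function names are assumptions made for the example.

```python
import io
import struct

FLAG, ESC = 0x7E, 0x7D   # frame marker and escape byte

def stuff(payload: bytes) -> bytes:
    """In-band framing: wrap in FLAG bytes, escaping any FLAG/ESC in the data."""
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ 0x20])
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)

def unstuff(frame: bytes) -> bytes:
    """Reverse of stuff(): every byte must be inspected to find escapes."""
    out = bytearray()
    it = iter(frame[1:-1])
    for b in it:
        out.append(next(it) ^ 0x20 if b == ESC else b)
    return bytes(out)

def length_prefixed(payload: bytes) -> bytes:
    """Header framing: a fixed 2-byte length, then the payload untouched."""
    return struct.pack("!H", len(payload)) + payload

def read_length_prefixed(stream) -> bytes:
    (length,) = struct.unpack("!H", stream.read(2))
    return stream.read(length)

payload = bytes([0x01, 0x7E, 0x02, 0x7D])   # data that happens to contain both specials
print(stuff(payload).hex())                 # 7e017d5e027d5d7e
print(length_prefixed(payload).hex())       # 0004017e027d
```

The stuffed frame's size depends on the data (8 bytes here for 4 bytes of payload), while the length-prefixed frame is always payload plus two bytes and can be read without scanning.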

When this is asked in interviews

Usually as a follow-up to “explain TCP/IP” or “what happens when you type a URL”. The interviewer wants to hear “the packet picks up headers on the way down, sheds them on the way up, and each layer reads only its own header.”

Common probes:

  • “What’s in an Ethernet frame?” — destination MAC (6), source MAC (6), EtherType (2), payload (46-1500), FCS (4). EtherType identifies the next layer.
  • “How does a router know what to do with a packet?” — strips the Ethernet header, reads the IP destination, looks it up in the FIB, decrements the TTL, re-encapsulates with the next-hop MAC, and sends. It never needs to read the TCP header.
  • “Why is the TTL field important?” — guarantees forwarding loops eventually drop the packet. Each router decrements; at 0, the packet is discarded and ICMP Time-Exceeded is returned. traceroute exploits this.
  • “What is MTU and why does it matter?” — maximum transmission unit at the link layer; the largest payload the link can carry in one frame. Path MTU is the minimum across all hops; oversized packets get fragmented by routers (IPv4 without the DF bit) or dropped with an ICMP error telling the sender to shrink them (IPv4 with DF set, and always in IPv6).
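For the MTU probe, it helps to be able to work a fragmentation example by hand. A minimal sketch of the IPv4 arithmetic, assuming a 20-byte header on every fragment; the function name and the sample link MTUs are invented for illustration.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Split an IPv4 payload into (offset_in_8_byte_units, size, more_fragments)."""
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the fragment offset field counts in 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any payload")
    fragments, offset = [], 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

link_mtus = [1500, 1400, 1500]       # hypothetical per-hop MTUs along a path
path_mtu = min(link_mtus)            # the path MTU is the smallest link MTU

print(path_mtu)                      # 1400
print(fragment(4000, 1500))
# [(0, 1480, True), (185, 1480, True), (370, 1040, False)]
```

A 4000-byte payload over a 1500-byte MTU becomes three fragments, each with its own 20-byte IP header, so fragmentation also adds header overhead.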

The trap is treating headers as static metadata. They are active: TTL changes per hop, checksums must be recomputed when fields change (NAT rewrites both the IP and TCP checksums), and MAC addresses are rewritten every hop.
