The Evolution of HTTP — 1.1, 2, 3
Pipelining to multiplexing to QUIC. How each version solved the previous one's head-of-line problem and what that buys API designers.
What it is#
HTTP did not arrive at its current shape in one step. Four versions ship in production today, each solving a specific bottleneck the previous one left behind:
| Version | Year | RFC | Transport | The bottleneck it solved |
|---|---|---|---|---|
| HTTP/1.0 | 1996 | 1945 | TCP | First codified version; one connection per request |
| HTTP/1.1 | 1997 | 2068 (later 9112) | TCP | Persistent connections + pipelining + chunked encoding |
| HTTP/2 | 2015 | 7540 (later 9113) | TCP + TLS | Multiplexed streams + HPACK header compression |
| HTTP/3 | 2022 | 9114 | QUIC over UDP | TCP head-of-line blocking; mobile reconnection |
For an API designer, the practical effect of each upgrade is fewer round-trips, fewer connections, and fewer head-of-line stalls, while the semantics (methods, status codes, headers) carry forward unchanged. A GET /users/42 looks the same to application code (and in curl output) in HTTP/1.1, HTTP/2, and HTTP/3. The difference is in how the bytes are framed, multiplexed, and recovered when they’re lost.
This page is the evolution narrative: what each version added, what trade-off it made, and what the API designer should actually do with that. The wire details of methods, status codes, and headers are in HTTP — The Foundational Protocol for APIs.
When to use it#
This is less “when to use HTTP/x” and more “when do I need to care which version is on the wire”:
- Care about HTTP/2 when your public API serves browsers or mobile clients over TLS. Most CDNs, load balancers, and modern frameworks negotiate it transparently via ALPN; you mostly need to confirm it’s on and your server isn’t artificially capping concurrency.
- Care about HTTP/3 when your audience is mobile-heavy (lossy radio links), globally distributed (long RTTs), or moves between networks mid-request (Wi-Fi → LTE). Cloudflare, Google, Meta, and Akamai have served HTTP/3 to a majority of clients since 2023.
- Care about HTTP/1.1 when your service is internal-only and behind a load balancer that terminates HTTP/2 at the edge — many production setups still speak HTTP/1.1 between the LB and the origin. There’s nothing wrong with that.
You should not be picking versions per-request. You should be picking a server that negotiates the best version the client supports and lets you opt out only for diagnostics.
How it works#
HTTP/1.1 — persistent connections, but serial within them#
HTTP/1.0 opened one TCP connection per request. A page that loaded 30 assets opened 30 connections, each paying a TCP handshake and (over HTTPS) a TLS handshake. The latency was crippling.
HTTP/1.1 fixed this with persistent connections (Connection: keep-alive implicit) and pipelining: the client could send multiple requests back-to-back without waiting for each response. In theory this was a huge win. In practice, pipelining shipped broken in every major proxy and intermediary, was widely disabled by default, and is essentially never used today.
The remaining HTTP/1.1 model:
```
client ──── conn-A ──── server    A: GET /a → wait → response → GET /b → wait → response
client ──── conn-B ──── server    B: GET /c → wait → response
client ──── conn-C ──── server    C: GET /d → wait → response
```

Browsers worked around it by opening 6 parallel TCP connections per origin (the de facto cap). With 6 parallel pipes and serial requests within each pipe, a page with 30 assets needs roughly 5 round-trips of serialised time per pipe. Better than HTTP/1.0, far from optimal.
The head-of-line problem at the application layer: response B in the pipeline cannot start streaming until response A is done. A single slow endpoint stalls every request behind it on the same connection.
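The serial model can be made concrete with a few lines of arithmetic. This is a minimal sketch under simplifying assumptions: each response has a fixed service time in milliseconds, nothing overlaps within a connection, and the function names are illustrative rather than taken from any library.

```python
from math import ceil


def rounds_per_connection(assets: int, connections: int = 6) -> int:
    """Serial request/response rounds each connection has to carry."""
    return ceil(assets / connections)


def completion_times(service_ms: list) -> list:
    """Finish time of each response on ONE serial connection.

    Response i cannot start until response i-1 has finished, so a slow
    response delays everything queued behind it.
    """
    finished, clock = [], 0
    for t in service_ms:
        clock += t
        finished.append(clock)
    return finished


print(rounds_per_connection(30))         # 5
print(completion_times([50, 50, 50]))    # [50, 100, 150]
print(completion_times([2000, 50, 50]))  # [2000, 2050, 2100]
```

In the last line the two 50 ms responses finish after more than two seconds, purely because they were queued behind a slow one on the same connection.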
HTTP/2 — multiplexed streams over one connection#
HTTP/2 shipped in 2015 (RFC 7540, refreshed as 9113). Three big ideas:
1. Binary framing. The whole protocol is reframed in length-prefixed binary frames (HEADERS, DATA, SETTINGS, WINDOW_UPDATE, PING, GOAWAY). Curl and Wireshark show them as readable frames; the parser is bytes, not text.
2. Multiplexed streams. A single TCP connection carries many concurrent streams, each identified by a stream ID. Requests and responses interleave at the frame level:
```
TCP connection
│
├── stream 1: HEADERS ──── DATA ──── DATA ──── (end)
│             /users/1
├── stream 3: HEADERS ──── DATA ────── (end)
│             /orders/42
├── stream 5: HEADERS ──── DATA
│             /inventory
└── ...
```

Six parallel connections (HTTP/1.1) become one connection with hundreds of concurrent streams. The head-of-line problem at the application layer disappears: a slow /inventory stream no longer blocks /users/1.
3. HPACK header compression. HTTP headers are repetitive (User-Agent, Accept, Cookie, Authorization all repeat on every request). HPACK indexes them: the first send is Authorization: Bearer eyJhbGciOi..., the next is just <idx 62>. Header overhead on a request drops by 80-90% on a warm connection.
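To illustrate the binary framing from item 1, the sketch below decodes the fixed 9-byte header that precedes every HTTP/2 frame: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a 31-bit stream ID behind one reserved bit. The function name and the sample bytes are illustrative assumptions. A real client would use an HTTP/2 library rather than parse frames by hand.

```python
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def parse_frame_header(header: bytes) -> dict:
    """Decode the 9-byte header in front of an HTTP/2 frame payload."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    frame_type = header[3]
    return {
        "length": int.from_bytes(header[0:3], "big"),  # 24-bit payload length
        "type": FRAME_TYPES.get(frame_type, f"UNKNOWN(0x{frame_type:x})"),
        "flags": header[4],
        # The top bit of the last four bytes is reserved; mask it off.
        "stream_id": int.from_bytes(header[5:9], "big") & 0x7FFFFFFF,
    }


# Hand-built sample: a HEADERS frame with a 13-byte payload on stream 1,
# flags 0x05 = END_STREAM (0x1) | END_HEADERS (0x4).
sample = bytes.fromhex("00000d010500000001")
print(parse_frame_header(sample))
# {'length': 13, 'type': 'HEADERS', 'flags': 5, 'stream_id': 1}
```

The stream ID in every frame header is what lets frames from different requests interleave on one connection and still be reassembled per stream.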
A fourth feature, server push, let the server pre-send resources the client would ask for next. It was deprecated by Chrome in 2022 because the gains were marginal and the cache invalidation was hard. Don’t design around it.
HTTP/2 also requires (in practice) TLS — RFC 7540 permits cleartext, but every browser refuses to speak HTTP/2 without TLS, so the practical floor for HTTP/2 is HTTPS. ALPN (Application-Layer Protocol Negotiation, an extension to TLS) is how the client and server agree on HTTP/2 vs HTTP/1.1 during the TLS handshake.
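The ALPN step can be sketched with Python’s standard `ssl` module. The first function is a toy model of the server-side choice (an assumption about typical behaviour: the server picks its most-preferred protocol among those the client offered). The second uses the real `ssl` API to report what an actual handshake settled on. The function names are hypothetical.

```python
import socket
import ssl


def pick_protocol(client_offers, server_prefs):
    """Toy model: first server-preferred protocol the client also offered."""
    for proto in server_prefs:
        if proto in client_offers:
            return proto
    return None


def negotiated_protocol(host, port=443, timeout=5.0):
    """Open a TLS connection and report the protocol ALPN selected."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])  # offer HTTP/2 first
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()  # "h2", "http/1.1", or None


print(pick_protocol(["h2", "http/1.1"], ["h2", "http/1.1"]))  # h2
print(pick_protocol(["http/1.1"], ["h2", "http/1.1"]))        # http/1.1
```

Calling `negotiated_protocol` with a real hostname needs network access, so it is not invoked here.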
The remaining problem: TCP head-of-line blocking#
HTTP/2 solved head-of-line at the application layer. But TCP itself delivers bytes strictly in order: if a packet on the wire is lost, every byte after it must wait for the retransmission, even if those bytes belong to a different HTTP/2 stream. The browser sees frame 47 of stream 1 deliver fine; frame 12 of stream 3 was in the lost packet; frame 48 of stream 1 arrived, but TCP refuses to hand it to the application because TCP guarantees in-order bytes. Stream 1 stalls because stream 3 lost a packet. Head-of-line moved one layer down.
On a stable wired network this is invisible. On a flaky mobile connection with 2% packet loss, it costs visibly more than HTTP/1.1 with 6 connections — because HTTP/1.1 can route around a stalled connection by using one of the other 5.
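The stall is easy to reproduce in a toy model. The sketch below is a simplification, not a TCP implementation: each packet carries one frame, times are in milliseconds, and the lost packet simply arrives late after retransmission. For contrast, a second function orders delivery per stream, which is what a stream-aware transport would do. All names and numbers are illustrative assumptions.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    seq: int         # position in the connection-wide byte order
    stream: int      # HTTP/2 stream the payload belongs to
    arrival_ms: int  # when the packet (or its retransmission) arrives


def deliver_in_order(packets):
    """TCP rule: a packet is released only after every earlier seq arrived."""
    released, latest = {}, 0
    for p in sorted(packets, key=lambda p: p.seq):
        latest = max(latest, p.arrival_ms)
        released[p.seq] = latest
    return released


def deliver_per_stream(packets):
    """Stream-aware rule: only earlier packets of the SAME stream can block."""
    released, latest = {}, {}
    for p in sorted(packets, key=lambda p: p.seq):
        latest[p.stream] = max(latest.get(p.stream, 0), p.arrival_ms)
        released[p.seq] = latest[p.stream]
    return released


# seq 1 (stream 3) is lost and only arrives at 250 ms after retransmission.
packets = [
    Packet(0, 1, 10),
    Packet(1, 3, 250),
    Packet(2, 1, 12),
    Packet(3, 1, 14),
]
print(deliver_in_order(packets))    # {0: 10, 1: 250, 2: 250, 3: 250}
print(deliver_per_stream(packets))  # {0: 10, 1: 250, 2: 12, 3: 14}
```

Under the in-order rule, stream 1’s packets at 12 ms and 14 ms sit in the receive buffer until 250 ms, even though nothing of stream 1 was lost.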
HTTP/3 — multiplexed streams over QUIC, over UDP#
HTTP/3 (RFC 9114, 2022) keeps HTTP/2’s semantics — multiplexed streams, header compression (now QPACK, an HTTP/3-friendly variant) — and replaces TCP with QUIC (RFC 9000), a transport protocol over UDP that was originally designed at Google in 2012-2016.
QUIC’s contribution:
- Streams are first-class at the transport layer. Each QUIC stream is independently flow-controlled and lossless. A lost packet on stream A does not stall stream B.
- Encryption is built-in. TLS 1.3 is baked into QUIC. You do not negotiate “plaintext QUIC vs encrypted QUIC” — there is only encrypted QUIC.
- 0-RTT resumption. A returning client can send data in its very first flight, with no handshake round-trip. This cuts connection-setup latency dramatically for mobile. Because 0-RTT data can be replayed by an attacker, restrict it to idempotent requests.
- Connection migration. A connection has a connection ID that survives an IP-address change. A phone moving from Wi-Fi to LTE keeps the same QUIC connection. TCP cannot do this: its 4-tuple (src-ip, src-port, dst-ip, dst-port) defines the connection.
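Connection migration comes down to which key the server uses to find connection state. The sketch below is a deliberately small model with hypothetical names and addresses: one table is keyed by the 4-tuple, the other by a connection ID carried in the packet. Real QUIC also validates the new network path before using it, which this toy skips.

```python
def lookup_by_four_tuple(table, four_tuple):
    """TCP-style lookup: the addresses and ports ARE the connection."""
    return table.get(four_tuple)


def lookup_by_connection_id(table, connection_id, four_tuple):
    """QUIC-style lookup: the source address is not part of the key."""
    return table.get(connection_id)


wifi = ("192.0.2.10", 51000, "203.0.113.5", 443)   # before the switch
lte = ("198.51.100.77", 40222, "203.0.113.5", 443)  # after the switch

tcp_table = {wifi: "session-state"}
quic_table = {"cid-8f3a": "session-state"}

print(lookup_by_four_tuple(tcp_table, lte))                  # None
print(lookup_by_connection_id(quic_table, "cid-8f3a", lte))  # session-state
```

After the address change, the 4-tuple lookup finds nothing, so a TCP client must reconnect and redo its handshakes, while the connection-ID lookup still finds the existing state.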
The connection setup compared:
```
HTTP/1.1 + TLS 1.3:    TCP handshake (1 RTT) + TLS handshake (1 RTT) = 2 RTT
HTTP/2 + TLS 1.3:      TCP handshake (1 RTT) + TLS handshake (1 RTT) = 2 RTT
HTTP/3 (QUIC):         QUIC handshake (1 RTT, includes TLS)          = 1 RTT
HTTP/3 (QUIC, 0-RTT):  0 RTT for a returning client
```

For a mobile API call from a phone in a cafe with a 100 ms round-trip time, the difference between 2 RTT and 0 RTT is the difference between 200 ms and 0 ms before the first request byte goes out.
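The same comparison as a small calculation, so the RTT can be swapped for the one on your own latency budget. The 100 ms figure is an assumed round-trip time, and the dictionary simply restates the handshake counts listed above.

```python
# Round trips spent on connection setup before the first request byte.
SETUP_RTTS = {
    "HTTP/1.1 + TLS 1.3": 2,
    "HTTP/2 + TLS 1.3": 2,
    "HTTP/3 (QUIC)": 1,
    "HTTP/3 (QUIC, 0-RTT)": 0,
}


def setup_delay_ms(rtt_ms: int) -> dict:
    """Setup delay per protocol for a given round-trip time."""
    return {name: rtts * rtt_ms for name, rtts in SETUP_RTTS.items()}


for name, delay in setup_delay_ms(100).items():
    print(f"{name:<22} {delay:>4} ms")
```

On a 300 ms intercontinental or congested mobile link, the same table reads 600 ms, 600 ms, 300 ms, and 0 ms.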
Enabling HTTP/2 in a client — three languages#
Most clients negotiate HTTP/2 automatically over TLS via ALPN; you just have to make sure you’re not capping protocol negotiation. The examples below show the explicit-on path in Python, Go, and Node.js.
```python
# requests does not speak HTTP/2; use httpx with the http2 extra:
# pip install "httpx[http2]"
import httpx

with httpx.Client(http2=True, timeout=5.0) as client:
    resp = client.get(
        "https://api.example.com/v1/health",
        headers={"Authorization": "Bearer eyJhbGciOi..."},
    )
    print(resp.http_version)  # 'HTTP/2'
    print(resp.status_code)
```

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	// Go's net/http auto-negotiates HTTP/2 over TLS via ALPN.
	// The explicit transport below is only useful for diagnostics
	// or to force HTTP/2 over a non-default TLS config.
	tr := &http.Transport{TLSClientConfig: &tls.Config{}}
	if err := http2.ConfigureTransport(tr); err != nil {
		panic(err)
	}
	client := &http.Client{Transport: tr}

	resp, err := client.Get("https://api.example.com/v1/health")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Println(resp.Proto) // "HTTP/2.0"
	fmt.Println(resp.StatusCode)
}
```

```javascript
// Node's built-in 'http2' module speaks HTTP/2 natively.
const http2 = require("http2");

const client = http2.connect("https://api.example.com");
// Surface connection-level failures instead of crashing on an unhandled event.
client.on("error", (err) => console.error(err));

const req = client.request({
  ":path": "/v1/health",
  authorization: "Bearer eyJhbGciOi...",
});

req.on("response", (headers) => {
  console.log("status:", headers[":status"]);
});

let body = "";
req.setEncoding("utf8");
req.on("data", (chunk) => (body += chunk));
req.on("end", () => {
  console.log(body);
  client.close();
});
req.end();
```

Most production code does not need this level of ceremony: browser fetch and Go’s net/http negotiate the best protocol available on their own. Not every client does, though. As the Python example notes, requests stays on HTTP/1.1, so check what your HTTP library actually negotiates. The explicit form is useful for diagnostics, capacity testing, or forcing a particular protocol for a benchmark.
Checking which version actually shipped#
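```
$ curl -sv -o /dev/null https://api.example.com/v1/health 2>&1 | grep -i 'HTTP/'
> GET /v1/health HTTP/2
< HTTP/2 200
$ curl --http3 -sv -o /dev/null https://api.example.com/v1/health 2>&1 | grep -i 'HTTP/'
> GET /v1/health HTTP/3
< HTTP/3 200
```

The output is trimmed to the request and status lines; curl may print additional informational lines that also match the grep. The `--http3` flag requires a curl build with HTTP/3 support. Browser dev-tools also show the version in the Network panel under “Protocol”.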
Variants#
| Variant | Where it fits |
|---|---|
| HTTP/2 over TLS | The 2024 baseline for public APIs. Browsers, mobile SDKs, CDNs all speak it. |
| HTTP/2 cleartext (h2c) | Internal east-west traffic where TLS is terminated at the edge. gRPC inside a cluster commonly does this. |
| HTTP/3 with QUIC | Mobile-heavy / global APIs. Cloudflare, Fastly, AWS CloudFront, Google all serve it; clients fall back to HTTP/2 if QUIC is blocked. |
| HTTP/1.1 inside the cluster | Legacy services behind a modern edge. Still ubiquitous; nothing wrong with it. |
| gRPC over HTTP/2 | Internal RPC. Uses HTTP/2 streams for bidirectional streaming; covered in gRPC — Protobuf over HTTP/2. |
Trade-offs#
What HTTP/2 gives you:
- Hundreds of concurrent requests on one connection. Connection pools shrink dramatically — typically 1 connection per origin instead of 6.
- Header compression. Small frequent calls (auth heartbeats, telemetry) get 80-90% smaller.
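- Stream priorities and flow control. Critical resources can be scheduled ahead of less-critical ones, although the RFC 7540 priority scheme was deprecated in RFC 9113 and server support varies.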
What HTTP/2 costs you:
- TCP head-of-line blocking still bites on lossy networks. A single dropped packet stalls every multiplexed stream.
- A more complex on-the-wire format that is harder to debug with tcpdump (binary-framed and, in practice, almost always encrypted).
- A heavier server-side implementation. HTTP/1.1 servers can be 200 lines of code; conformant HTTP/2 servers are thousands.
What HTTP/3 gives you:
- No TCP head-of-line. Streams are independent at the transport layer.
- 0-RTT and 1-RTT handshakes. Faster cold start, especially over long-RTT or mobile links.
- Connection migration. A network switch doesn’t drop the connection.
What HTTP/3 costs you:
- UDP is sometimes blocked. Corporate firewalls and some carrier middleboxes drop or rate-limit UDP. Clients fall back to HTTP/2; the network has to permit that fallback.
- CPU cost for crypto. QUIC encrypts more of the packet (including parts TCP leaves cleartext, like sequence numbers). The offload story (kernel and NIC acceleration) is less mature than for TCP+TLS.
- Operational tooling lags. HTTP/3 support in many load balancers, WAFs, and IDS systems is still less mature than their HTTP/2 support, though it is improving steadily.
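The fallback behaviour in the first bullet can be sketched as an ordered list of attempts. This is only an illustration of the control flow: the connect functions are hypothetical stand-ins, the blocked-UDP failure is simulated, and real clients are usually more sophisticated (for example, racing protocols or remembering that QUIC failed on a given network).

```python
def connect_with_fallback(attempts):
    """Try each (name, connect) pair in order; return the first success."""
    errors = []
    for name, connect in attempts:
        try:
            return name, connect()
        except OSError as exc:  # covers timeouts and refused/blocked sockets
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all protocols failed: " + "; ".join(errors))


def quic_blocked():
    # Simulated: a middlebox silently drops UDP, so the handshake times out.
    raise TimeoutError("no response over UDP")


def h2_ok():
    return "h2-connection"  # stand-in for a real connection object


print(connect_with_fallback([("HTTP/3", quic_blocked), ("HTTP/2", h2_ok)]))
# ('HTTP/2', 'h2-connection')
```

The point for an API operator is that the HTTP/2 path must stay reachable and healthy, because it is what clients land on whenever the UDP path fails.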
HTTP/2. The right default for almost every public API as of 2024-26. Mature tooling, every browser and SDK negotiates it, ALPN handles fallback to HTTP/1.1 cleanly. CPU cost is manageable; the head-of-line edge case only matters on lossy links.
HTTP/3. The right upgrade when your audience is mobile, global, or moves between networks. CDN-enabled in one click at Cloudflare / Fastly / CloudFront. Falls back to HTTP/2 cleanly when UDP is blocked. Pays for itself on the long tail of slow connections, not the median.
Common pitfalls#
- Assuming HTTP/2 means no head-of-line blocking. It moved the problem to TCP. On a wired connection that’s fine; on a lossy mobile link it’s still visible.
- Capping concurrent streams on the server. A common default of 100 concurrent streams per connection is fine for browsers but cripples high-fan-out clients. Tune SETTINGS_MAX_CONCURRENT_STREAMS deliberately.
- Disabling HTTP/2 because of “WebSocket compatibility”. WebSockets over HTTP/2 (RFC 8441) work, but support is uneven. For streaming on modern stacks, the right answer is gRPC streaming, Server-Sent Events (server-to-client only), or WebSockets over HTTP/1.1 (which still ships everywhere).
- Not testing the HTTP/3 → HTTP/2 fallback. Some networks allow outbound UDP in general but block or throttle it on port 443, which QUIC uses. Your client must transparently fall back; verify it does.
- Logging only HTTP/1.1 because that’s what the LB → origin link speaks. The edge negotiates HTTP/2 or HTTP/3 with the client; the LB then talks HTTP/1.1 to the origin. Client-side telemetry tells the truth; server-side access logs may understate it.
- Believing server push will save you. It was deprecated by Chrome in 2022 over correctness and cache issues. Design as if server push does not exist.
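- HPACK compression-context exhaustion. Long-lived connections with many distinct values (e.g. per-request X-Trace-Id headers) eventually evict the dynamic-table entries you wanted to keep compressed. Keep the values of high-frequency headers stable, and where your HTTP library allows it, send per-request unique values as literals without indexing so they do not churn the table.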
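The eviction in the last pitfall can be seen in a toy model of HPACK’s dynamic table. It keeps the parts that matter here (a 61-entry static table so dynamic indices start at 62, a per-entry cost of name length plus value length plus 32, and oldest-first eviction) and omits the rest, including Huffman coding and the actual wire encoding. The class name and the deliberately small 256-byte table are assumptions for illustration.

```python
from collections import deque

STATIC_TABLE_ENTRIES = 61  # dynamic-table indices start right after these
ENTRY_OVERHEAD = 32        # per-entry size overhead in HPACK's accounting


class ToyDynamicTable:
    def __init__(self, max_size=4096):
        self.max_size = max_size
        self.size = 0
        self.entries = deque()  # newest entry on the left

    def _cost(self, name, value):
        return len(name) + len(value) + ENTRY_OVERHEAD

    def send(self, name, value):
        """Return what would be sent: an index reference or a literal."""
        for pos, entry in enumerate(self.entries):
            if entry == (name, value):
                return f"<idx {STATIC_TABLE_ENTRIES + 1 + pos}>"
        cost = self._cost(name, value)
        # Evict the oldest entries until the new one fits.
        while self.entries and self.size + cost > self.max_size:
            self.size -= self._cost(*self.entries.pop())
        if cost <= self.max_size:
            self.entries.appendleft((name, value))
            self.size += cost
        return f"{name}: {value}"


table = ToyDynamicTable(max_size=256)
auth = ("authorization", "Bearer eyJhbGciOi...")
print(table.send(*auth))  # literal on first use
print(table.send(*auth))  # <idx 62>
for i in range(10):       # ten unique per-request values churn the table
    table.send("x-trace-id", f"trace-{i:04d}")
print(table.send(*auth))  # literal again: the entry was evicted
```

With a realistic 4096-byte table the effect takes longer to appear, but the mechanism is the same: every unique value that gets indexed pushes an older, reusable entry closer to eviction.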
Related building blocks#
- HTTP — The Foundational Protocol for APIs — the methods, status codes, and headers that every version of HTTP carries.
- The Narrow Waist of the Internet — IP and the layered model HTTP sits on top of.
- Transport Layer Security (TLS) — TLS, which HTTP/2 effectively requires and HTTP/3 has baked in.
- Latency and Throughput — the two dimensions every protocol upgrade is trading on.
- Estimating API Latency — Back-of-Envelope — what changes in your latency budget when you flip HTTP/3 on.