WebSockets — Bidirectional Streaming

Upgrade-handshake, full-duplex frames, the long-lived connection model. When push beats poll.

Building Block Intermediate
13 min read
websocket streaming real-time push bidirectional

What it is#

WebSockets is a full-duplex messaging protocol that runs over a single TCP connection, specified by RFC 6455 (2011). A client opens a normal HTTP connection, asks the server to upgrade it to a WebSocket, and from that point on both sides can send framed messages whenever they like — no polling, no new requests, no server-side flush ceremony.

The shape of a WebSocket session:

┌─ HTTP/1.1 ──────────────────────────────────────────────┐
│ GET /ws HTTP/1.1                                        │
│ Upgrade: websocket                      ← handshake     │
│ Connection: Upgrade                                     │
│ Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==             │
│                                                         │
│ HTTP/1.1 101 Switching Protocols                        │
│ Upgrade: websocket                                      │
│ Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=      │
└─────────────────────────────────────────────────────────┘
↓ same TCP connection, now WebSocket frames ↓
client → server: TEXT frame "subscribe orders"
server → client: TEXT frame "{\"order\":\"123\",\"status\":\"paid\"}"
server → client: TEXT frame "{\"order\":\"124\",\"status\":\"paid\"}"
client → server: PING frame
server → client: PONG frame
client → server: CLOSE frame (1000)
server → client: CLOSE frame (1000)

The connection is long-lived — minutes to hours, sometimes days. The server can push to the client at any moment without the client asking. The client can push to the server at any moment without opening a new request. That is what “bidirectional” buys you, and it is what poll-based APIs cannot do without burning round-trips.

WebSockets is not a replacement for HTTP. It is a complement — built on HTTP for the handshake, then a custom frame protocol on the same TCP socket. Modern stacks use it for chat, live dashboards, collaborative editing, real-time multiplayer, financial market feeds, IoT telemetry.

When to use it#

Reach for WebSockets when:

  • The interaction is genuinely bidirectional. The server needs to push to the client and the client sends frequent commands. Chat is the canonical case.
  • Updates arrive at irregular, low-latency intervals. Order-book ticks, presence events (“user X is typing”), live game state, collaborative cursors.
  • Polling would be wasteful. A client polling every second sends 3600 requests an hour; a WebSocket sends one handshake plus the messages that matter.
  • The client is a browser. WebSockets are the most widely supported bidirectional channel native to browsers. gRPC streaming needs a proxy (gRPC-Web); raw TCP is not available.

Avoid WebSockets when:

  • The flow is server-push-only. Server-Sent Events (SSE) is a lighter protocol: it works over plain HTTP, passes through proxies that only understand HTTP, and reconnects automatically.
  • The flow is client-poll-only. A plain GET with If-None-Match or short polling is simpler and CDN-cacheable.
  • The interaction is short-lived. A two-message exchange does not justify a WebSocket handshake.
  • Your load balancer terminates HTTP and does not speak WebSocket upgrade. Verify before you design around it. Most modern LBs do; some legacy ones don’t.

How it works#

The upgrade handshake#

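The handshake is plain HTTP. The client sends a GET with four required headers (Upgrade, Connection, Sec-WebSocket-Key, Sec-WebSocket-Version). Sec-WebSocket-Protocol is optional, and browsers always add Origin: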

Client → server: upgrade request
GET /ws HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: chat.v1
Origin: https://app.example.com

The server proves it understands the protocol by appending a fixed GUID (258EAFA5-E914-47DA-95CA-C5AB0DC85B11) to the client’s key, taking the SHA-1 hash of the result, base64-encoding the digest, and returning it as Sec-WebSocket-Accept:

Server → client: upgrade accepted
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat.v1

After the 101 Switching Protocols, both sides stop speaking HTTP and start speaking the WebSocket frame protocol on the same TCP socket. The handshake’s whole purpose is to negotiate the upgrade with intermediaries that only understand HTTP (CDNs, reverse proxies, corporate firewalls) — to them it looks like an unusual but valid HTTP request, and once the upgrade succeeds they keep the TCP byte-stream open without inspection.
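The accept-key derivation described above can be sketched in a few lines of standard-library Python. The function name compute_accept is an assumption made for illustration; the GUID and the sample key are the ones shown in the handshake listings.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key.

    Hypothetical helper name; the steps are: concatenate, SHA-1, base64.
    """
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from the upgrade request shown earlier.
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
    # prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client performs the same computation and rejects the connection if the server's header does not match, which is how it detects an intermediary that answered without understanding the upgrade.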

The frame format#

A WebSocket frame has a small binary header (2-14 bytes) followed by a payload:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |                               |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|             Masking-key (4 bytes, only if MASK=1)             |
+---------------------------------------------------------------+
|                         Payload data                          |
+---------------------------------------------------------------+

Key fields:

  • FIN: set on the last frame of a logical message (messages can be fragmented).
  • opcode: 0x0 continuation, 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong.
  • MASK: client→server frames must be masked (XOR with a 4-byte key) to defeat certain cache-poisoning attacks; server→client frames must not be masked.
  • Payload len: the length encoding is variable. Lengths up to 125 fit in the 7-bit field; a value of 126 means the real length follows in 16 bits, and 127 means it follows in 64 bits. A single frame's payload can therefore be up to 2^63 − 1 bytes (the top bit must be zero), and fragmentation puts no protocol limit on total message size, so in practice servers cap it (max_frame_size or a similar setting, often 1 MiB).

The control frames — PING, PONG, CLOSE — are short (<= 125 bytes) and used for liveness and graceful shutdown. Application data flows as TEXT (UTF-8) or BINARY (anything).
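To make the header layout concrete, here is a minimal sketch of a frame encoder and decoder for single, complete frames. The names encode_frame, decode_frame, and max_payload are assumptions made for illustration, not part of any library, and the sketch ignores the reserved bits, fragmentation reassembly, and partial reads that a real implementation has to handle.

```python
import struct
from typing import Optional, Tuple


def encode_frame(opcode: int, payload: bytes,
                 mask_key: Optional[bytes] = None, fin: bool = True) -> bytes:
    """Build one frame. Pass a 4-byte mask_key for client-to-server frames."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n <= 125:                       # length fits in the 7-bit field
        header = struct.pack("!BB", first, mask_bit | n)
    elif n <= 0xFFFF:                  # 126 signals a 16-bit extended length
        header = struct.pack("!BBH", first, mask_bit | 126, n)
    else:                              # 127 signals a 64-bit extended length
        header = struct.pack("!BBQ", first, mask_bit | 127, n)
    if mask_key is None:
        return header + payload
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


def decode_frame(data: bytes, max_payload: int = 1 << 20) -> Tuple[bool, int, bytes, int]:
    """Parse one complete frame; return (fin, opcode, payload, bytes_consumed)."""
    if len(data) < 2:
        raise ValueError("incomplete header")
    fin = bool(data[0] & 0x80)
    opcode = data[0] & 0x0F
    masked = bool(data[1] & 0x80)
    length = data[1] & 0x7F
    offset = 2
    if length == 126:
        if len(data) < offset + 2:
            raise ValueError("incomplete header")
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:
        if len(data) < offset + 8:
            raise ValueError("incomplete header")
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    if length > max_payload:           # refuse oversized frames before buffering them
        raise ValueError("payload too large")
    key = b""
    if masked:
        key = data[offset:offset + 4]
        offset += 4
    end = offset + length
    if len(data) < end:
        raise ValueError("incomplete frame")
    payload = data[offset:end]
    if masked:                         # unmasking is the same XOR as masking
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload, end
```

For example, encode_frame(0x1, b"Hello") yields the 7 bytes 81 05 48 65 6c 6c 6f: a 2-byte header followed by the unmasked text. Checking the declared length against max_payload before reading the body is what lets a server reject a huge frame without allocating memory for it.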

Subprotocols and message shape#

WebSockets itself is just framing. What goes inside the frames is the subprotocol — JSON, MessagePack, Protobuf, raw bytes, anything. The Sec-WebSocket-Protocol handshake header lets the client and server negotiate a named subprotocol (chat.v1, mqtt, wamp).

A common subprotocol convention is typed JSON messages:

{"type": "subscribe", "channel": "orders"}
{"type": "order.updated", "id": "ord_123", "status": "paid"}
{"type": "ping", "t": 1685440800}

Frameworks like Socket.IO (Node), Phoenix Channels (Elixir), and SignalR (.NET) layer their own message conventions, presence systems, and reconnection logic on top of raw WebSockets.
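The typed-JSON convention above maps naturally onto a small dispatcher keyed on the "type" field. The following is a sketch under assumptions: the reply types ("subscribed", "pong", "error") and the helper names on and dispatch are hypothetical and are not defined by WebSockets or by any of the frameworks mentioned.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], dict]
HANDLERS: Dict[str, Handler] = {}


def on(message_type: str) -> Callable[[Handler], Handler]:
    """Register a handler for one message type (hypothetical helper)."""
    def register(fn: Handler) -> Handler:
        HANDLERS[message_type] = fn
        return fn
    return register


@on("subscribe")
def handle_subscribe(msg: dict) -> dict:
    return {"type": "subscribed", "channel": msg["channel"]}


@on("ping")
def handle_ping(msg: dict) -> dict:
    return {"type": "pong", "t": msg["t"]}


def _error(reason: str) -> str:
    return json.dumps({"type": "error", "reason": reason})


def dispatch(raw: str) -> str:
    """Validate one TEXT frame's payload and route it by its "type" field."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return _error("invalid_json")
    if not isinstance(msg, dict) or not isinstance(msg.get("type"), str):
        return _error("missing_type")
    handler = HANDLERS.get(msg["type"])
    if handler is None:
        return _error("unknown_type")
    try:
        return json.dumps(handler(msg))
    except KeyError as exc:
        # A handler looked up a field the client did not send.
        return _error(f"missing_field:{exc.args[0]}")
```

Calling dispatch('{"type": "subscribe", "channel": "orders"}') returns a JSON string with type "subscribed" and channel "orders". Rejecting malformed input with an explicit error message, rather than raising, keeps one bad frame from tearing down the whole connection.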

A simple WebSocket server and client in three languages#

A chat-style relay server that broadcasts each message it receives to every other connected client, plus a small client. Same protocol on the wire; idiomatic shape per language.

WebSocket echo server + client — Python (websockets library)
# Server
import asyncio
import websockets

CONNECTIONS = set()

async def handler(ws):
    CONNECTIONS.add(ws)
    try:
        async for msg in ws:
            # Broadcast to every other connected peer
            for peer in list(CONNECTIONS):
                if peer is not ws:
                    try:
                        await peer.send(msg)
                    except websockets.ConnectionClosed:
                        # Peer went away mid-broadcast; drop it and carry on
                        CONNECTIONS.discard(peer)
    finally:
        CONNECTIONS.discard(ws)

async def main():
    # The library pings every 20 s and closes the connection if no pong arrives within 10 s
    async with websockets.serve(handler, "0.0.0.0", 8080,
                                ping_interval=20, ping_timeout=10):
        await asyncio.Future()  # run forever

# asyncio.run(main())

# Client
async def client():
    async with websockets.connect("wss://api.example.com/ws") as ws:
        await ws.send('{"type": "subscribe", "channel": "orders"}')
        async for msg in ws:
            print("recv:", msg)

The wire format is the same whatever the language. Production code adds reconnection with exponential backoff, heartbeat pings (ping_interval and ping_timeout in the Python example), idle-connection cleanup, and per-message authentication or rate-limiting checks.

Auth — the awkward part#

HTTP APIs put auth in the Authorization header. WebSockets in a browser cannot set arbitrary headers on the upgrade request — the WebSocket constructor only takes a URL and a subprotocol list. So bearer tokens have three common landing spots:

  1. Cookie auth. The browser sends cookies on the upgrade request automatically. If you already use cookie sessions, this is the cleanest. Cross-site WS requires careful SameSite settings.
  2. Token in the URL. wss://api.example.com/ws?token=<JWT>. Simple; works everywhere. Risk: URLs are recorded in server access logs and proxy logs. Use short-lived tokens (1-5 minutes) minted just for the upgrade.
  3. Token in the first message. Connect anonymously; client sends {"type":"auth","token":"..."} as the first frame; server validates and either marks the session as authenticated or closes the connection. Works around URL-logging concerns; requires a “pending” state on the server.

Sec-WebSocket-Protocol is also abused as an auth channel — the client passes the token as a subprotocol name. It works in browsers and shows up less in logs than a query string. It is a hack, but a common one.

Non-browser clients (mobile native, server-to-server) can set headers freely and should use Authorization: Bearer <token> on the upgrade.
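For the token-in-the-URL option, the short-lived token can be as simple as an HMAC-signed "user:expiry" pair. The sketch below assumes a hypothetical ticket format and hypothetical helper names (mint_ticket, verify_ticket); it is not a JWT and is only meant to show the mint-then-verify flow with an expiry check.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def mint_ticket(user_id: str, secret: bytes, ttl_seconds: int = 60,
                now: Optional[float] = None) -> str:
    """Issue a ticket that is only valid for ttl_seconds (hypothetical format)."""
    issued = time.time() if now is None else now
    body = f"{user_id}:{int(issued + ttl_seconds)}".encode("utf-8")
    sig = hmac.new(secret, body, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(body).decode("ascii") + "."
            + base64.urlsafe_b64encode(sig).decode("ascii"))


def verify_ticket(ticket: str, secret: bytes,
                  now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the ticket is authentic and unexpired, else None."""
    try:
        body_b64, sig_b64 = ticket.split(".")
        body = base64.urlsafe_b64decode(body_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
        user_id, expires = body.decode("utf-8").rsplit(":", 1)
        expires_at = int(expires)
    except (ValueError, UnicodeDecodeError):
        return None                      # malformed ticket
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None                      # signature mismatch
    current = time.time() if now is None else now
    if current >= expires_at:
        return None                      # expired
    return user_id
```

An authenticated HTTP endpoint would call mint_ticket and hand the result to the page, which appends it to the wss:// URL; the WebSocket server calls verify_ticket during the upgrade. Because the ticket expires within a minute, a copy that lands in an access log is of little use to an attacker.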

Scaling — the long-lived connection problem#

One WebSocket connection = one TCP socket open on a server for the duration of the session. A million concurrent connections = one server with a million sockets, or 100 servers with ten thousand each. The constraints:

  • File descriptor limits. Every socket is a file descriptor. A modern Linux box can handle ~1M sockets per process with ulimit -n and memory tuning, but the default per-process soft limit is often as low as 1024 until you raise it.
  • Load balancer affinity. A WebSocket session lives on one specific origin. The LB must route every frame on that connection to the same backend. Layer-4 LB or layer-7 with sticky sessions.
  • Horizontal scaling and fan-out. A “broadcast order update” event must reach every server that holds a relevant subscriber. Production architectures use Redis pub/sub, Kafka, or NATS as the broadcast backbone between WebSocket servers.
  • Graceful deploy. Rolling a server kicks every connected client. Clients must reconnect cleanly with exponential backoff and jitter. 1001 Going Away is the close code your server should send during graceful shutdown.
┌──────────┐     ┌──────────┐     ┌──────────┐
│ ws-srv-1 │     │ ws-srv-2 │     │ ws-srv-3 │   each holds 10k connections
└────┬─────┘     └────┬─────┘     └────┬─────┘
     │                │                │
     └────────────────┼────────────────┘
              ┌───────▼────────┐
              │ Redis pub/sub  │   broadcast bus
              │   (or Kafka)   │
              └────────────────┘

The operational burden of long-lived connections is one of the loudest reasons teams choose Server-Sent Events for push-only use cases: SSE is plain HTTP, so it passes through HTTP middleboxes without special handling, and the browser's EventSource reconnects automatically.
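The fan-out pattern from the scaling list and diagram can be sketched without any infrastructure. In the code below, InMemoryBus is a hypothetical stand-in for Redis pub/sub, Kafka, or NATS, and WsServer records what it would send instead of writing to real sockets; both names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set, Tuple


class InMemoryBus:
    """Stand-in for the broadcast bus: every event reaches every server."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str, str], None]] = []

    def attach(self, listener: Callable[[str, str], None]) -> None:
        self._listeners.append(listener)

    def publish(self, channel: str, message: str) -> None:
        for listener in list(self._listeners):
            listener(channel, message)


class WsServer:
    """One WebSocket server process holding its own local subscribers."""

    def __init__(self, name: str, bus: InMemoryBus) -> None:
        self.name = name
        self.subscribers: DefaultDict[str, Set[str]] = defaultdict(set)
        # (connection_id, message) pairs; stands in for ws.send() calls.
        self.sent: List[Tuple[str, str]] = []
        bus.attach(self.on_bus_event)

    def subscribe(self, connection_id: str, channel: str) -> None:
        self.subscribers[channel].add(connection_id)

    def on_bus_event(self, channel: str, message: str) -> None:
        # Deliver only to connections that this server holds for the channel.
        for connection_id in sorted(self.subscribers.get(channel, ())):
            self.sent.append((connection_id, message))


if __name__ == "__main__":
    bus = InMemoryBus()
    servers = [WsServer(f"ws-srv-{i}", bus) for i in (1, 2, 3)]
    servers[0].subscribe("conn-a", "orders")
    servers[2].subscribe("conn-b", "orders")
    bus.publish("orders", '{"id": "ord_123", "status": "paid"}')
    for server in servers:
        print(server.name, server.sent)
```

The publisher never needs to know which server holds which client: it publishes once, and each server filters against its own subscription table.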

Variants#

| Variant | Mechanism | When it fits |
|---|---|---|
| Raw WebSockets (RFC 6455) | The framing protocol described here | Custom protocols; minimum dependency |
| Server-Sent Events (SSE) | HTTP streaming response, server-push only, auto-reconnect | Push-only feeds (live tickers, notifications); simpler ops |
| Socket.IO | WebSocket + long-polling fallback + presence + rooms | Older / heterogeneous browser fleets; many built-in features |
| WAMP / STOMP | Pub/sub + RPC framing on top of WebSocket | Enterprise message-bus style integrations |
| WebTransport (over HTTP/3) | Newer browser API; multiple streams + datagrams | Game and AR/VR workloads; not yet ubiquitous |
| gRPC bidirectional streaming | HTTP/2 streams; not browser-native (needs gRPC-Web) | Internal services; see gRPC — Protobuf over HTTP/2 |

WebSockets. Full-duplex. Client and server can both push. Single connection per session. The right call for chat, collaborative editing, multiplayer games, live trading interfaces.

Server-Sent Events. Server-to-client only. Plain HTTP — proxies and CDNs handle it. Auto-reconnect built into the browser. The right call for notification feeds, live tickers, log streaming, anything one-way.

Trade-offs#

What WebSockets give you:

  • True bidirectional, low-latency messaging. No polling overhead; a message arrives after roughly one one-way network trip, with no request round-trip first.
  • Browser-native. Every modern browser ships the WebSocket API.
  • Lightweight after handshake. A frame header is 2-14 bytes; far smaller than an HTTP request envelope.
  • Custom subprotocols. You’re not boxed into JSON; binary protocols (MessagePack, Protobuf) ride happily.

What WebSockets cost you:

  • A new operational model. Long-lived connections, sticky routing, broadcast buses, reconnection logic — all problems that pure HTTP APIs don’t have.
  • Worse CDN story. CDNs can proxy WebSockets but cannot cache them. The CDN becomes a dumb byte pipe.
  • Awkward auth in browsers. The header limitation forces one of the three workarounds above.
  • Harder observability. A WebSocket session is one log line per connection, not one per request. Instrumentation must speak the subprotocol to be useful.
  • Connection limits. Mobile networks, corporate firewalls, and some load balancers drop idle WebSocket connections after 60-300 seconds. Heartbeat pings are mandatory.

Common pitfalls#

  • No heartbeat. Idle WebSockets get silently dropped by NATs and middleboxes. Send a PING every 20-30 seconds; if no PONG arrives within a small timeout, treat the connection as dead and reconnect.
  • No reconnection backoff. A flapping server gets DDoS’d by thousands of clients reconnecting in the same 100ms window. Exponential backoff with jitter — minimum 250ms, doubling to 30s, plus 10-20% random jitter.
  • No max message size. Clients can stream a 10 GB frame and exhaust server memory. Set max_frame_size (or the equivalent) to something sane — 1 MiB is common.
  • Trusting the client message blindly. WebSocket frames bypass every per-request middleware you have on the HTTP side. Re-do authn, authz, rate-limit, and input-validation per message.
  • Forgetting to mask frames from the client. RFC 6455 requires it; non-conformant clients are rejected by strict servers. The libraries handle this for you; raw socket implementations don’t.
  • Sticky sessions that aren’t sticky. Reconnects land on different servers; the new server has no subscription state. Either replicate state via the broadcast bus or accept that the client resubscribes on every reconnect.
  • Mixing ws:// and wss:// like http:// and https://. An https:// page that opens a ws:// connection gets blocked by mixed-content rules in every browser. Always use wss:// in production.
  • Counting one connection as one user. A user may have three tabs open, each holding a WebSocket. Quota and rate limits must be per-user, not per-connection.
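The reconnection-backoff pitfall above comes down to one small delay function. This is a minimal sketch that uses the numbers from that bullet (250 ms base, doubling up to 30 s, plus 10-20% random jitter); the function and parameter names are assumptions made for illustration.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 0.25, cap: float = 30.0,
                  jitter_min: float = 0.10, jitter_max: float = 0.20,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before reconnect number `attempt` (0-based)."""
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so a very large attempt count cannot overflow.
    exponent = min(max(attempt, 0), 32)
    delay = min(cap, base * (2 ** exponent))
    # Add 10-20% on top so clients that dropped together do not retry together.
    return delay * (1.0 + rng.uniform(jitter_min, jitter_max))


if __name__ == "__main__":
    for attempt in range(10):
        print(attempt, round(backoff_delay(attempt), 3))
```

A client would sleep for backoff_delay(attempt) before each reconnect and reset attempt to zero once a connection has stayed healthy for a while. The jitter is the part that matters during a deploy, when thousands of clients lose their connection within the same instant.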