Network Sockets — The Foundation
Sockets as the OS-level primitive every HTTP server stands on. Why API designers should know what's underneath.
Summary#
A socket is the operating-system handle that a process uses to send and receive bytes over a network. It is the API between user space and the kernel’s networking stack, and it has been almost unchanged since the original BSD design in 1983. Every HTTP server you have ever used — nginx, Apache, Go’s net/http, Node’s http module, Python’s aiohttp, Tomcat — eventually calls socket(), bind(), listen(), accept(), read(), and write(). The high-level frameworks hide it, but the call sequence is universal.
There are two socket types you will meet in API work. A stream socket (TCP, type SOCK_STREAM) gives you a reliable, ordered, byte-stream connection: you write bytes on one end and the same bytes come out the other end in the same order, with retransmission and flow control handled by the kernel. A datagram socket (UDP, type SOCK_DGRAM) gives you a best-effort, unordered, message-oriented endpoint: you send a packet, and it might arrive, might not, or might arrive out of order. HTTP/1.1 and HTTP/2 run over TCP stream sockets. HTTP/3 (via QUIC) and most DNS traffic run over UDP datagram sockets. Your API code almost never touches either directly, but understanding what is happening beneath your HTTP library is the difference between debugging a connection-pool issue and being confused by one.
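The stream-versus-datagram distinction can be observed on the loopback interface. The following is a minimal sketch, not a benchmark: the function names are hypothetical, and it assumes loopback delivers both datagrams, which is typical but not something UDP guarantees.

```python
import socket


def udp_preserves_boundaries() -> list:
    """Send two datagrams over loopback UDP and read them back one at a time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))      # port 0: the kernel picks a free port
        rx.settimeout(2.0)             # do not hang if a datagram is lost
        tx.sendto(b"hello", rx.getsockname())
        tx.sendto(b"world", rx.getsockname())
        # Each recvfrom() returns exactly one datagram, never two merged.
        return [rx.recvfrom(4096)[0], rx.recvfrom(4096)[0]]


def tcp_is_a_byte_stream() -> bytes:
    """Send two writes over loopback TCP and read until end of stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        with socket.create_connection(srv.getsockname()) as client:
            conn, _ = srv.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(b"hello")
                client.sendall(b"world")
                client.shutdown(socket.SHUT_WR)   # signal end of stream
                chunks = []
                # recv() may return the bytes in one chunk or several;
                # TCP keeps the order but not the write boundaries.
                while chunk := conn.recv(4096):
                    chunks.append(chunk)
                return b"".join(chunks)
```

The UDP receiver gets two separate messages, while the TCP receiver can only rely on the concatenated bytes. This is why protocols over TCP need their own framing, such as HTTP's Content-Length header.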
An API designer is rarely writing socket code directly. The point of knowing sockets is that every property of HTTP — connection pooling, keep-alive, timeouts, head-of-line blocking, the cost of a new connection, the meaning of an EADDRINUSE, the reason a server can run out of file descriptors — falls out of the socket abstraction underneath.
Why it matters#
Three reasons sockets matter to API design even though you almost never see them.
- Every HTTP behaviour you care about traces to a socket behaviour. Keep-alive is “reuse the same TCP socket for multiple requests.” Connection pooling is “keep a set of open sockets and hand them out.” HTTP/2 multiplexing is “send multiple logical streams over one TCP socket.” Head-of-line blocking is “TCP’s ordering guarantee means a lost packet stalls every stream sharing the socket.” If you cannot trace a behaviour back to a socket, you do not yet understand it.
- Sockets are the cost centre. A TCP connection costs roughly 1 RTT to establish (3-way handshake), plus 1–2 RTTs for TLS (1 for a full TLS 1.3 handshake, 2 for TLS 1.2). A socket consumes a file descriptor, kernel memory for send/receive buffers, and, where a stateful firewall or NAT sits in the path, a slot in the connection-tracking table. A server that handles 10,000 concurrent connections holds 10,000 sockets open. The OS limits, the load balancer’s connection table, the TLS session cache, the conntrack table: all of these are dimensioned in sockets.
- The socket API is the same on Linux, macOS, BSD, and (with minor twists) Windows. The portability is why HTTP libraries look similar across languages — they all wrap the same call sequence. Knowing the underlying calls makes any HTTP library’s source code legible.
The pragmatic version: when a production API gets slow and the metrics show “connection pool exhausted” or “too many open files” or “TIME_WAIT pile-up,” the engineer who can name the underlying socket state ships the fix in an hour. The engineer who cannot has to learn it during the incident.
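The "too many open files" failure comes from the per-process file-descriptor limit, which can be inspected directly. The sketch below assumes a Unix-like system, because Python's `resource` module is not available on Windows. The helper names are hypothetical.

```python
import resource
import socket


def fd_budget() -> tuple:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_sockets(n: int) -> list:
    """Create n TCP sockets; each one consumes one file descriptor."""
    return [socket.socket(socket.AF_INET, socket.SOCK_STREAM) for _ in range(n)]


if __name__ == "__main__":
    soft, hard = fd_budget()
    socks = open_sockets(5)
    fds = [s.fileno() for s in socks]
    print(f"soft limit={soft}, hard limit={hard}, socket fds={fds}")
    for s in socks:
        s.close()   # returns the descriptor to the process's budget
```

Every accepted connection, every pooled outbound connection, and every open file counts against the same soft limit, so a leaked socket is also a leaked descriptor.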
How it works#
The classic Berkeley sockets API has six calls that form the lifecycle of a server-side TCP socket.

    server                                          client
      │                                               │
      │ socket() ──────────── create ─►               │ socket()
      │ bind()                                        │
      │ listen()                                      │
      │ accept() ◄──── 3-way handshake ────           │ connect()
      │          ◄──── data ───────────────           │ send()
      │ recv()                                        │
      │ send()   ──── data ───────────────►           │ recv()
      │ close()  ──── 4-way close ────────►           │ close()
      ▼                                               ▼

The six calls#
- socket(): ask the kernel for a new socket file descriptor. Specify the family (AF_INET, AF_INET6, AF_UNIX) and the type (SOCK_STREAM for TCP, SOCK_DGRAM for UDP).
- bind(): attach the socket to a specific IP address and port. A server binds to 0.0.0.0:443 to listen on all IPv4 interfaces; a client typically skips this and lets the kernel pick an ephemeral port.
- listen(): mark the socket as passive (accepting incoming connections). The argument is the backlog: how many pending connections the kernel will queue before it starts dropping or refusing new ones.
- accept(): block until a connection arrives, then return a new socket for that specific connection. The original listening socket stays open for the next one.
- send() / recv() (or write() / read()): move bytes through the connection. For TCP, the kernel handles retransmission, ordering, and flow control transparently.
- close(): tear down the connection. For TCP, this triggers the 4-way close handshake (FIN/ACK/FIN/ACK). The side that closes first then holds the socket in TIME_WAIT for 2 × MSL (60 seconds on Linux, roughly 30 to 240 seconds on other systems) before the kernel fully releases it.
The TCP state machine is famous and worth glancing at once. For a server-side connection that the server closes first, the path is CLOSED → LISTEN → SYN_RCVD → ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED. The TIME_WAIT state is the usual cause of the “Address already in use” error when you restart a server too fast: the kernel is still holding the port to absorb stray packets from the previous connection, and bind() fails unless the listening socket sets SO_REUSEADDR.
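The “Address already in use” error can be reproduced deterministically. The sketch below is a stand-in for the restart case, not the restart case itself: it triggers EADDRINUSE with a second bind() against a port that still has a live listener, because the TIME_WAIT variant depends on timing. The function name is hypothetical.

```python
import errno
import socket


def second_bind_errno() -> int:
    """Bind two sockets to the same address; return the errno from the second bind."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as first:
        first.bind(("127.0.0.1", 0))   # port 0: the kernel picks a free port
        first.listen(1)
        addr = first.getsockname()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as second:
            try:
                second.bind(addr)      # same IP and port as the live listener
            except OSError as exc:
                return exc.errno
            return 0                   # no error (not expected here)


if __name__ == "__main__":
    code = second_bind_errno()
    print(errno.errorcode.get(code, code))
```

For the restart case, servers conventionally set SO_REUSEADDR on the listening socket before bind(), as the Python echo server later in this section does.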
Blocking vs non-blocking#
By default, sockets are blocking. A call to recv() waits until data arrives, possibly forever. A server that calls accept() in a loop and recv() in a thread per connection is the classic one-thread-per-connection model — simple, but it does not scale past a few thousand concurrent connections because each thread has a kernel stack and a context-switch cost.
The modern approach is non-blocking sockets plus an event loop. Each socket is marked O_NONBLOCK; recv() returns immediately with EAGAIN (or EWOULDBLOCK) if no data is ready. The application uses a multiplexing syscall to ask the kernel “which of these N sockets is ready?” The progression of multiplexing APIs is one of the canonical stories in systems programming:
- select(): POSIX. Takes bitmasks (fd_set) of file descriptors. Limited to FD_SETSIZE (typically 1024). O(n) per call.
- poll(): POSIX. Takes an array of pollfd structs. No fixed size limit but still O(n) per call.
- epoll: Linux (epoll_create(), epoll_ctl(), epoll_wait()). Register file descriptors once, then ask “which ones are ready?” at a cost proportional to the number of ready descriptors rather than the number registered. The basis of nginx, Node.js, Go’s runtime, and most modern Linux servers.
- kqueue(): BSD/macOS equivalent of epoll. Same idea, different API.
- io_uring: Linux. The next step: submit and complete I/O operations through ring buffers shared with the kernel, avoiding a syscall per operation. Used in some modern high-performance servers.
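The non-blocking behaviour described above is easy to see on a connected socket pair. This is a small sketch with a hypothetical function name. Python surfaces EAGAIN/EWOULDBLOCK as `BlockingIOError`, and `select.select()` stands in for the whole multiplexing family.

```python
import select
import socket


def nonblocking_recv_demo() -> tuple:
    """Show recv() failing fast when empty, then succeeding once select() reports data."""
    a, b = socket.socketpair()         # two already-connected stream sockets
    with a, b:
        b.setblocking(False)           # equivalent to setting O_NONBLOCK
        try:
            b.recv(4096)
            first = "data"
        except BlockingIOError:        # EAGAIN / EWOULDBLOCK: nothing to read yet
            first = "would block"
        a.sendall(b"ping")
        # Ask the kernel which sockets are readable, waiting at most 2 seconds.
        ready, _, _ = select.select([b], [], [], 2.0)
        second = b.recv(4096) if ready else b""
        return first, second
```

An event loop is this pattern in a `while` loop: wait for readiness, then issue only the calls that will not block.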
An API designer does not write epoll code. But “the server uses an event loop” is a one-line summary of the entire epoll/kqueue/io_uring family, and knowing the family explains why a single Node.js thread can hold on the order of 100,000 mostly idle WebSocket connections.
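For a sense of what “the server uses an event loop” means in code, here is a minimal single-threaded sketch built on Python's `selectors` module, which picks epoll or kqueue where available. The names `echo_event_loop` and `demo` are hypothetical. As a simplification it calls `sendall()` on a non-blocking socket, which is fine for tiny payloads but would need write-readiness handling in real code.

```python
import selectors
import socket
import threading


def echo_event_loop(listener: socket.socket, stop: threading.Event) -> None:
    """Serve every connection on one thread until `stop` is set."""
    sel = selectors.DefaultSelector()          # epoll on Linux, kqueue on BSD/macOS
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    try:
        while not stop.is_set():
            for key, _ in sel.select(timeout=0.1):
                sock = key.fileobj
                if sock is listener:           # readable listener: a new connection
                    conn, _ = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ)
                else:                          # readable connection: bytes or EOF
                    data = sock.recv(4096)
                    if data:
                        sock.sendall(data)     # simplification: small payloads only
                    else:
                        sel.unregister(sock)
                        sock.close()
    finally:
        for key in list(sel.get_map().values()):
            if key.fileobj is not listener:
                key.fileobj.close()
        sel.close()


def demo() -> list:
    """Run the loop in a background thread and echo through two clients."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(16)
    stop = threading.Event()
    worker = threading.Thread(target=echo_event_loop, args=(listener, stop))
    worker.start()
    try:
        with socket.create_connection(listener.getsockname(), timeout=2) as c1, \
             socket.create_connection(listener.getsockname(), timeout=2) as c2:
            c1.sendall(b"one")
            c2.sendall(b"two")
            return [c1.recv(4096), c2.recv(4096)]
    finally:
        stop.set()
        worker.join()
        listener.close()
```

Both clients are served by the same thread. Nothing in the loop waits on a single connection, which is what lets the connection count grow without a matching thread count.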
A minimal TCP echo server#
Concrete code, for grounding only — production servers use frameworks that wrap this.
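Python:

    import socket

    # Blocking server: handles one connection at a time.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        # Allow a quick restart while old connections sit in TIME_WAIT.
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", 8080))
        srv.listen(128)
        while True:
            conn, addr = srv.accept()
            with conn:
                # Echo until the client closes its side (recv returns b"").
                while data := conn.recv(4096):
                    conn.sendall(data)

Go:

    package main

    import (
        "io"
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            log.Fatal(err) // e.g. the port is already in use
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                continue
            }
            // One goroutine per connection; the runtime multiplexes them.
            go func(c net.Conn) {
                defer c.Close()
                io.Copy(c, c) // echo until the client closes
            }(conn)
        }
    }

Node.js:

    const net = require("net");

    // Event loop: the callback runs whenever a chunk arrives on any connection.
    net.createServer((conn) => {
      conn.on("data", (chunk) => conn.write(chunk));
    }).listen(8080);

The three implementations differ in concurrency model: Python blocks and serves one connection at a time, Go uses goroutines (m:n threading), and Node uses an event loop. Underneath, they all issue the same socket(), bind(), listen(), accept(), read/write, and close() calls.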
Variants and trade-offs#
The big variant axis is TCP vs UDP, and the choice cascades into the API design.
| Property | TCP (SOCK_STREAM) | UDP (SOCK_DGRAM) |
|---|---|---|
| Reliability | Guaranteed delivery | Best-effort |
| Ordering | In-order | Out-of-order possible |
| Connection | Required (handshake) | Connectionless |
| Boundaries | Byte stream, no message boundaries | Datagram with boundaries |
| Flow control | Yes | No |
| Congestion control | Yes | No (application’s job) |
| Head-of-line blocking | Yes | No |
| Use cases | HTTP/1, HTTP/2, SSH, SMTP | DNS, NTP, QUIC, RTP, gaming |
TCP is the default for almost every API protocol. UDP wins where latency matters more than completeness (real-time video, voice, gaming), or where the application can do its own reliability (QUIC, which builds reliability on UDP precisely to escape TCP’s head-of-line blocking).
A second variant axis is synchronous threaded vs asynchronous event-loop server design. Threaded servers are simpler to reason about but cap out at low thousands of concurrent connections per host. Event-loop servers scale to hundreds of thousands but require non-blocking I/O all the way down: a single blocking call in the wrong place stalls everything. Go’s goroutines hide this trade-off by giving you blocking semantics on top of an event loop. Most modern API frameworks have made this choice for you; the cost-vs-benefit shows up only when you push throughput limits.
When this is asked in interviews#
Sockets rarely appear by name in an API-design round, but socket-level intuition shows up in three specific moments.
The first is the connection-management discussion. The interviewer asks “how do you handle 10,000 concurrent clients?” The senior answer mentions connection pooling, keep-alive, HTTP/2 multiplexing — all of which reduce socket churn. The junior answer says “we’d spin up more servers” without addressing why the current servers run out of capacity.
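Keep-alive as “reuse the same TCP socket” can be checked by looking at the client's local port across requests. The sketch below uses only the standard library. The handler class and function name are hypothetical, and reading `conn.sock` relies on an `http.client` attribute that is better treated as an implementation detail.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Hello(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"          # lets http.server keep connections alive

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # required for reuse
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):          # silence per-request logging
        pass


def local_ports_for_two_requests() -> tuple:
    """Issue two GETs on one HTTPConnection; return the local port used by each."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Hello)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        ports = []
        for _ in range(2):
            conn.request("GET", "/")
            conn.getresponse().read()      # drain the body so the socket can be reused
            ports.append(conn.sock.getsockname()[1])  # implementation detail
        conn.close()
        return ports[0], ports[1]
    finally:
        server.shutdown()
        server.server_close()
```

Both requests report the same local port, so they travelled over one socket and paid for one handshake. A connection pool generalises this to a set of such sockets.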
The second is the timeout question. “What timeouts do you set on a downstream call?” The right answer names all of them: connection timeout, read timeout, idle keep-alive timeout, total deadline. Each one corresponds to a socket-level event — TCP handshake, recv() blocking, idle connection eviction, request budget. A candidate who collapses all of these into one “timeout” is missing the socket model.
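Three of the four timeouts named above map directly onto socket calls. Here is a minimal sketch with a hypothetical function name and illustrative default values. The idle keep-alive timeout is omitted because it belongs to the connection pool rather than to a single call.

```python
import socket
import time


def fetch_with_timeouts(addr, connect_timeout=1.0, read_timeout=0.5, deadline=2.0) -> bytes:
    """Read until EOF from addr, enforcing connect, per-read, and total time limits."""
    start = time.monotonic()
    # Connection timeout: bounds the TCP handshake.
    with socket.create_connection(addr, timeout=connect_timeout) as sock:
        chunks = []
        while True:
            remaining = deadline - (time.monotonic() - start)
            if remaining <= 0:
                # Total deadline: bounds the whole call, however many reads it takes.
                raise TimeoutError("total deadline exceeded")
            # Read timeout: bounds each recv(), capped by what is left of the deadline.
            sock.settimeout(min(read_timeout, remaining))
            chunk = sock.recv(4096)        # raises socket.timeout if the peer stays silent
            if not chunk:                  # b"" means the peer closed the connection
                return b"".join(chunks)
            chunks.append(chunk)
```

A peer that completes the handshake but never sends passes the connection timeout and trips the read timeout. A peer that trickles one byte at a time passes every read timeout and trips the deadline. This is why collapsing them into one value loses information.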
The third is the load-balancer / NAT question. “What changes when there is a load balancer in front?” The socket model gives you the answer: a proxying (layer-7) load balancer terminates the client TCP connection and opens a new one to the backend; the source IP the backend sees is the LB’s, not the client’s, unless the LB passes it along in a header such as X-Forwarded-For; the conntrack tables on the path now have to track two connections where there used to be one; and idle backend connections can be killed by the LB’s idle timeout. None of this is intuitive without the socket lens.
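The two-connections point can be made concrete with a toy proxy. The sketch below is an illustration under simplifying assumptions: all names are hypothetical, the proxy forwards exactly one request and one reply, and everything runs on 127.0.0.1, so the proxy is distinguishable from the client by port rather than by IP address.

```python
import socket
import threading


def listen_local() -> socket.socket:
    """Return a TCP listener on a kernel-chosen loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    return s


def peer_seen_through_proxy() -> dict:
    """Send one request through a toy proxy and record who the backend thinks called."""
    backend, proxy = listen_local(), listen_local()
    seen = {}

    def run_proxy():
        downstream, _ = proxy.accept()                   # socket 1: client <-> proxy
        with downstream, socket.create_connection(backend.getsockname()) as upstream:
            seen["proxy_upstream"] = upstream.getsockname()  # socket 2: proxy <-> backend
            upstream.sendall(downstream.recv(4096))      # forward one request
            downstream.sendall(upstream.recv(4096))      # forward one reply

    def run_backend():
        conn, peer = backend.accept()
        with conn:
            seen["backend_peer"] = peer                  # the address the backend sees
            conn.sendall(conn.recv(4096).upper())

    threads = [threading.Thread(target=f) for f in (run_proxy, run_backend)]
    for t in threads:
        t.start()
    with socket.create_connection(proxy.getsockname(), timeout=5) as client:
        seen["client"] = client.getsockname()
        client.sendall(b"ping")
        seen["reply"] = client.recv(4096)
    for t in threads:
        t.join()
    backend.close()
    proxy.close()
    return seen
```

The backend's peer address matches the proxy's upstream socket, not the client's socket. Across real hosts, that is the reason the backend logs the load balancer's IP unless the client address is forwarded some other way.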
The summary line: every API runs over sockets, and most production API incidents are socket incidents wearing an HTTP costume.
Related concepts#
- The Narrow Waist of the Internet — the IP layer beneath every socket, which is what the kernel routes packets over.
- HTTP — The Foundational Protocol for APIs — the request-response protocol built on top of TCP sockets.
- WebSockets — Bidirectional Streaming — long-lived bidirectional sockets after an HTTP upgrade, used for streaming APIs.
- Latency and Throughput — the performance dimensions whose physical floors come from socket-level handshakes.
- What Is API Design? — the contract framing that sits several layers above the socket.