BitTorrent
Tracker + swarm + pieces + tit-for-tat. The protocol that made P2P scale and still teaches the patterns.
What it is#
BitTorrent is a peer-to-peer file-distribution protocol designed by Bram Cohen in 2001, with the first implementation released in July of that year. The premise: when many people want the same file, the bandwidth cost should be shared among all of them rather than borne by a single origin server whose egress grows linearly with demand. A torrent breaks a file into fixed-size pieces, distributes those pieces across a swarm of peers, and uses a tit-for-tat algorithm to reward peers that upload as well as download.
The protocol’s headline result was that releasing a popular Linux ISO via BitTorrent cost the publisher dramatically less in egress bandwidth than serving it over HTTP, and the saving grows with the size of the swarm. Its less-visible result was that the design patterns it introduced (content-addressed pieces, swarm coordination, choke/unchoke for rate control) recur across modern P2P and decentralised systems, from IPFS to Bitcoin’s block propagation to game-engine asset distribution.
Architecture overview#
A torrent deployment has four parts:
    ┌────────────────────────────────────────┐
    │  .torrent file                         │
    │  ┌──────────────────────────────────┐  │
    │  │ tracker URL(s) + announce list   │  │
    │  │ piece size (typ. 256KB-16MB)     │  │
    │  │ SHA-1 hash per piece             │  │
    │  │ file list + sizes                │  │
    │  │ infohash (SHA-1 of "info" dict)  │  │
    │  └──────────────────────────────────┘  │
    └─────────────────┬──────────────────────┘
                      │
          ┌───────────▼────────────┐
          │  Tracker (or DHT)      │
          │  maps infohash → list  │
          │  of peer addresses     │
          └─────────────┬──────────┘
                        │
    ┌───────────────────┼──────────────────────┐
    │                   │                      │
┌───▼────┐         ┌────▼───┐             ┌────▼───┐
│ Peer A │◄──────► │ Peer B │ ◄─────────► │ Peer C │
│ (seed) │         │ (leech)│             │ (leech)│
└────────┘         └────────┘             └────────┘
          ◄── exchanges pieces over TCP ──►

- .torrent file: metadata. Lists tracker URLs, piece size, the SHA-1 hash of each piece (the content-addressing that makes pieces verifiable independent of which peer served them), the file layout, and an infohash: the SHA-1 of the “info” section, which is the swarm’s identifier.
- Tracker: a central server (originally) or distributed hash table (DHT, in modern deployments) that maps an infohash to a list of peer addresses currently in the swarm.
- Peers: every participant. Has a partial or complete copy of the file. Seeds have the whole file and only upload; leeches are still downloading. The distinction is dynamic.
- The swarm: the set of peers currently exchanging pieces of a particular torrent.
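The relationship between piece hashes and the infohash can be sketched in a few lines. This is a minimal illustration, not a full .torrent writer: it assumes a single-file torrent, uses the bencoding and key names ("name", "length", "piece length", "pieces") from the published BitTorrent specification rather than anything stated in this article, and the helper names are hypothetical.

```python
import hashlib


def bencode(value):
    """Minimal bencode encoder for ints, byte/text strings, lists, and dicts."""
    if isinstance(value, bool):
        raise TypeError("booleans are not part of bencoding")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA-1 digest of every fixed-size piece."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def make_info(name: str, data: bytes, piece_length: int) -> dict:
    """Build the "info" section for a single-file torrent (assumed layout)."""
    return {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": piece_hashes(data, piece_length),
    }


def infohash(info: dict) -> bytes:
    """The swarm identifier: SHA-1 of the bencoded "info" section."""
    return hashlib.sha1(bencode(info)).digest()


def verify_piece(info: dict, index: int, piece: bytes) -> bool:
    """Check a received piece against the hash stored in the metadata."""
    expected = info["pieces"][index * 20:(index + 1) * 20]
    return hashlib.sha1(piece).digest() == expected
```

Because `verify_piece` depends only on the metadata and the bytes received, it gives the same answer regardless of which peer supplied the piece.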
Protocol and pieces#
Three operations dominate the protocol:
- Announce. A peer joining the swarm announces itself to the tracker (or queries the DHT) and asks for the swarm’s current peer list. The tracker responds with a random subset (typically ~50 peers). The peer re-announces periodically to refresh the list and signal liveness.
- Handshake. Two peers connect via TCP, exchange a fixed-format handshake including the infohash (to confirm they’re talking about the same swarm) and a peer ID, then begin exchanging bitfields — each peer’s bit vector showing which pieces it has.
- Piece request and response. A peer requests a block (a 16KB or 32KB chunk within a piece) from a connected peer who has it. The provider sends the block. When all blocks of a piece are received, the piece’s SHA-1 is computed and compared to the .torrent metadata. On a mismatch, the piece is discarded and re-requested.
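The handshake and bitfield steps can be sketched as follows. The byte layout (a length-prefixed protocol string, 8 reserved bytes, the 20-byte infohash, a 20-byte peer ID, and bitfields with piece 0 in the high bit of the first byte) follows the published BitTorrent specification; the function names are hypothetical and the sketch leaves out sockets and extension bits.

```python
PSTR = b"BitTorrent protocol"  # protocol identifier string from the spec
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes in total


def build_handshake(infohash: bytes, peer_id: bytes) -> bytes:
    """Serialise the fixed-format handshake sent right after TCP connect."""
    if len(infohash) != 20 or len(peer_id) != 20:
        raise ValueError("infohash and peer_id must be 20 bytes each")
    reserved = bytes(8)  # all-zero reserved bytes: no extensions advertised
    return bytes([len(PSTR)]) + PSTR + reserved + infohash + peer_id


def parse_handshake(data: bytes, expected_infohash: bytes) -> bytes:
    """Validate a received handshake and return the remote peer ID."""
    if len(data) != HANDSHAKE_LEN or data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    infohash, peer_id = data[28:48], data[48:68]
    if infohash != expected_infohash:
        raise ValueError("peer is in a different swarm")
    return peer_id


def pieces_in_bitfield(bitfield: bytes, num_pieces: int) -> set:
    """Decode a bitfield payload into the set of piece indices the peer has."""
    return {
        i for i in range(num_pieces)
        if bitfield[i // 8] & (0x80 >> (i % 8))
    }
```

Rejecting a mismatched infohash before any piece traffic is what keeps two peers from accidentally exchanging data for different torrents.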
The cleverness is in the policies layered on top:
- Rarest-first piece selection. Peers ask for the rarest pieces in the swarm first. This keeps the swarm balanced: no piece becomes a bottleneck because only one peer holds it.
- Endgame mode. When a peer is close to finishing (only a few pieces left), it requests those last pieces from all connected peers simultaneously and cancels redundant requests as they arrive. This shaves the long tail of completion time.
- Random first piece. New peers grab one random piece to start, both to have something to upload immediately (the tit-for-tat algorithm needs an upload offer) and to avoid all new peers swarming on the same “rarest” piece.
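A rough sketch of how random-first-piece and rarest-first selection might fit together is shown below. It assumes the client already knows which pieces each connected peer holds (for example from their bitfields); the function name and data shapes are hypothetical, and endgame mode is left out.

```python
import random
from collections import Counter


def pick_piece(have: set, peer_pieces: dict, rng=random):
    """Choose the next piece to request, or None if nothing useful is on offer.

    have:        indices of pieces we already hold
    peer_pieces: mapping of peer name -> set of piece indices that peer holds
    """
    # Count how many connected peers hold each piece we still need.
    availability = Counter(
        piece
        for pieces in peer_pieces.values()
        for piece in pieces
        if piece not in have
    )
    if not availability:
        return None
    if not have:
        # Random first piece: get something to trade as quickly as possible.
        return rng.choice(sorted(availability))
    # Rarest first: restrict to the pieces held by the fewest peers,
    # breaking ties randomly so peers do not all chase the same piece.
    rarest = min(availability.values())
    return rng.choice(sorted(p for p, n in availability.items() if n == rarest))
```

With `peer_pieces = {"a": {0, 1, 2}, "b": {0, 1}, "c": {0}}` and `have = {0}`, piece 2 is held by one peer and piece 1 by two, so the function returns 2.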
Peer selection (tit-for-tat)#
The single most important policy in BitTorrent is choking. Every peer maintains a small number of unchoked connections (typically 4) — the peers it’s currently willing to upload to. All other connections are choked — open but not transferring.
The unchoke decision is the tit-for-tat:
- Every 10 seconds, the peer ranks all its connected peers by how much they’ve uploaded to it recently.
- The top 4 (by upload rate) get unchoked. Everyone else gets choked.
- Optimistic unchoke. Every 30 seconds, one random peer is additionally unchoked, regardless of upload rate. This gives newcomers a chance to prove themselves — without optimistic unchoke, the initial bootstrap (when you have nothing to offer) would never start.
The mechanism gives BitTorrent its key economic property: peers that upload get faster downloads. Free-riding (downloading without uploading) is structurally inefficient — the protocol routes around it. The result is that swarms self-organise into a high-bandwidth core of mutually-uploading peers.
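The unchoke rounds described above can be sketched as two small functions. The constants mirror the typical values quoted in this section (4 regular slots, a 10-second rechoke, a 30-second optimistic rotation); the function names and the shape of `upload_rates` are assumptions for illustration, and real clients add further conditions such as whether a peer is actually interested.

```python
import random

REGULAR_SLOTS = 4          # peers unchoked on merit
RECHOKE_INTERVAL = 10      # seconds between tit-for-tat rounds
OPTIMISTIC_INTERVAL = 30   # seconds between optimistic rotations


def rechoke(upload_rates: dict, optimistic=None) -> set:
    """Return the peers to unchoke for the next RECHOKE_INTERVAL seconds.

    upload_rates maps each connected peer to how fast it has recently
    been uploading to us; every peer not returned is choked.
    """
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(ranked[:REGULAR_SLOTS])
    if optimistic is not None:
        unchoked.add(optimistic)  # one extra slot, regardless of rate
    return unchoked


def rotate_optimistic(upload_rates: dict, rng=random):
    """Pick a random peer from outside the top slots, or None if there is none."""
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    candidates = ranked[REGULAR_SLOTS:]
    return rng.choice(candidates) if candidates else None
```

A client loop would call `rechoke` every `RECHOKE_INTERVAL` seconds and `rotate_optimistic` on every third round, which is how a newcomer with nothing to offer eventually receives a first piece.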
Tracker and DHT#
The original BitTorrent design relied on a central tracker per torrent. This was the system’s single point of failure: when The Pirate Bay’s trackers went down, swarms fragmented. Two mechanisms removed the dependence on a centralised tracker:
- PEX (Peer Exchange). Peers in a swarm gossip their peer lists to each other. A peer can learn of new peers without contacting the tracker. Limits the tracker’s role to “first introduction”.
- DHT (Kademlia-based). A globally shared distributed hash table where the key is the infohash and the value is the peer list. Most modern BitTorrent clients participate in the DHT, so the swarm’s existence is itself decentralised. The DHT is the reason a torrent can survive, and even continue to attract new peers, long after its original tracker is dead.
Modern torrent clients use trackers (when available) + DHT + PEX simultaneously, with each path independently sufficient for the swarm to function.
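Kademlia locates the nodes responsible for a key by XOR distance between 160-bit identifiers, so finding who stores the peer list for an infohash reduces to sorting node IDs by that distance. The sketch below shows only that metric; the function names are hypothetical, the default of 8 results is an assumption about typical bucket size, and the iterative network lookup is omitted.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: the two IDs XORed and read as an unsigned integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(target: bytes, node_ids: list, k: int = 8) -> list:
    """Return the k known node IDs closest to the target (e.g. an infohash)."""
    return sorted(node_ids, key=lambda node: xor_distance(node, target))[:k]
```

A lookup repeatedly asks the closest known nodes for even closer ones until the result stops improving, then requests the peer list for the infohash from the nodes it ends up with.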
Operational characteristics#
The properties that made BitTorrent the dominant P2P protocol:
- Bandwidth cost is paid by peers, not the publisher. In the ideal case, a 1GB ISO with 10,000 downloads costs the publisher roughly one upload’s worth of bandwidth (to seed the swarm initially); the remaining 9,999 GB of egress is paid by the swarm itself.
- Capacity scales with swarm size. More peers → more upload capacity, so aggregate throughput grows roughly in step with the swarm and per-peer download rates hold up as it grows. The opposite shape from HTTP, where more clients → more load on the origin → slower for everyone.
- Resilient to peer churn. A peer leaving the swarm doesn’t kill the download; remaining peers continue exchanging the pieces they have. As long as every piece exists somewhere in the swarm, the swarm is complete.
- Content-addressed verification. Every piece is SHA-1-verified against the .torrent metadata. Pieces can come from anyone; verification is independent of source.
- NAT traversal. Many peers are behind NATs. The protocol works around this through hole-punching (coordinated through a third party that both peers can reach) and by accepting that peers behind symmetric NATs can connect outbound but not be connected to; the swarm still works as long as some peers are publicly reachable.
In practice, a leecher in a healthy public swarm for a popular release can sustain 10-50 MB/s even when the original seed is a single peer on a home connection, because the swarm’s collective upload capacity dwarfs any one peer’s.
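The publisher-side arithmetic from the bandwidth-cost bullet can be written out directly. This is an idealised model: it assumes the publisher uploads a fixed number of full copies (one by default) and that the swarm supplies everything else; the function name and the `seed_copies` parameter are hypothetical.

```python
def publisher_egress_gb(file_gb: float, downloads: int, seed_copies: float = 1.0):
    """Compare publisher egress for HTTP and for an ideal BitTorrent swarm.

    Returns (http_gb, swarm_gb, saved_gb), where swarm_gb is what the
    publisher uploads if it seeds `seed_copies` full copies of the file.
    """
    http_gb = file_gb * downloads      # origin serves every download itself
    swarm_gb = file_gb * seed_copies   # origin only seeds the swarm
    return http_gb, swarm_gb, http_gb - swarm_gb
```

For the 1GB ISO with 10,000 downloads, `publisher_egress_gb(1, 10_000)` gives 10,000 GB over HTTP against 1 GB of seeding, leaving 9,999 GB carried by the swarm. In practice publishers usually seed more than one copy, so `seed_copies` is the knob to adjust.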
Trade-offs and gotchas#
Recurring operational gotchas:
- The “no seeds” problem. A torrent with no seeds is stuck as soon as some piece is held by none of the remaining leeches: peers can exchange the pieces they have, but no peer can fill in the gaps. Long-tail content needs periodic re-seeding or a fallback HTTP mirror.
- NAT/firewall behaviour. Two peers that are both behind symmetric NATs can’t connect to each other; at least one side of every connection must be reachable from outside. Practical effect: in a swarm of 100 leechers, perhaps only 30 can actually accept incoming connections.
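- ISP throttling. Many ISPs throttle or shape BitTorrent traffic (visible by port and protocol). Protocol encryption (MSE/PE) obfuscates the handshake and payload to evade classification, at the cost of added complexity. uTP, often mentioned alongside it, is a UDP-based transport with delay-based congestion control rather than an encryption scheme.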
- Legal and ethical surface. BitTorrent is content-agnostic; popular use has been heavily skewed toward copyright infringement, which has driven both ISP-level filtering and a public perception that “BitTorrent” means “piracy”. Legitimate uses (Linux ISOs, game patches, scientific datasets) still rely on the protocol, but the brand burden is real.
Where BitTorrent's ideas live now
Even though most users encounter BitTorrent as the way Linux ISOs are distributed, its design patterns are everywhere. IPFS’s content-addressing is BitTorrent’s piece-hashing scaled to the web; Bitcoin’s block propagation gossips hash-identified data between peers in a similar spirit; and Steam’s game-update system, Facebook’s BitTorrent-based binary deployment to data centres, and Twitter’s Murder tool all reach for the same ideas when one-to-many distribution at scale is the bottleneck. The pattern is content-addressed pieces, swarm coordination, and incentive-compatible peer selection, and it is fundamental enough to outlive its original P2P-file-sharing context.
Related systems#