Reliable Data Transfer — ARQ, Sliding Window, Go-Back-N

Stop-and-wait → Go-Back-N → Selective Repeat. The recipe TCP refines into its flow- and congestion-control machinery.

Concept Intermediate
8 min read
arq sliding-window go-back-n selective-repeat transport-layer

Summary#

Reliable Data Transfer (RDT) is the family of protocols that turn an unreliable channel into a reliable one. The unreliable channel can lose, corrupt, reorder, or duplicate packets; the reliable one delivers each byte exactly once, in order, to the application above. The recipe is always the same shape: number the packets, Automatic-Repeat-reQuest (ARQ) on loss, acknowledge what arrived, and time out anything that didn’t.

Three canonical algorithms — Stop-and-Wait, Go-Back-N, and Selective Repeat — climb a complexity ladder, trading sender/receiver state for throughput. TCP is the production-grade hybrid: cumulative ACKs in the spirit of Go-Back-N, selective acknowledgements (SACK) for the precision of Selective Repeat, a sliding window for pipelining, and congestion control bolted on top. Every TCP design decision is best understood as a refinement of one of these three textbook protocols.

Why it matters#

You will never write Stop-and-Wait or Go-Back-N at work. You will read TCP traces, design RPC retry logic, build QUIC-based protocols, debug at-least-once vs exactly-once delivery in message queues — and every one of those problems is a reliable-data-transfer problem. Knowing the three textbook protocols gives you the vocabulary to reason about which trade-offs TCP made and why your own retry layer needs the same primitives (sequence numbers, ACKs, timeouts, idempotency).

It also matters because the same patterns reappear at every layer: Wi-Fi’s link-layer ARQ is Stop-and-Wait per frame; HTTP/2’s RST_STREAM is a stream-level reset; Kafka’s idempotent producer is a sliding-window protocol with cumulative offsets. The transport-layer textbook is the most concentrated form of the patterns.

How it works#

Stop-and-Wait — the simplest ARQ#

Sender sends packet 0, waits for ACK 0, sends packet 1, waits for ACK 1, and so on. One packet in flight at a time.

sender receiver
| -- pkt 0 ---> |
| <-- ack 0 --- |
| -- pkt 1 ---> |
| <-- ack 1 --- |

Loss handling: the sender sets a timer when it sends a packet. If the timer fires before the ACK arrives, retransmit. Receiver discards duplicates by checking the sequence number.

The headline weakness: throughput is bounded by 1 packet / RTT regardless of bandwidth. On a 100 Mbps link with 50 ms RTT and 1500-byte packets, Stop-and-Wait achieves about 240 Kbps — under a quarter of one percent of capacity. The link spends almost all its time idle.

Go-Back-N — sliding-window pipelining#

Allow up to N unacknowledged packets in flight at once. The sender maintains a window [base, base + N); the receiver sends cumulative ACKs (“I have everything up through sequence X”). On timeout for the oldest outstanding packet, the sender retransmits everything in the window from base onward.

window of size 4
sender receiver
| -- pkt 0 ------> |
| -- pkt 1 ------> | (pkt 1 lost on the wire)
| -- pkt 2 ------> | receiver: discard (out of order)
| -- pkt 3 ------> | receiver: discard
| <-- ack 0 ------ |
| (timer for pkt 1 fires)
| -- pkt 1, 2, 3 -> | retransmit everything

The receiver is simple — it accepts only the next expected packet, discards anything else, and re-sends the ACK for the highest contiguous sequence it has. The sender does the work: tracks the window, runs one timer per outstanding packet (or per window in simple implementations), retransmits on loss.

Go-Back-N pipelines effectively but wastes bandwidth on retransmissions when a single packet is lost in a long window. With N = 100 and one packet lost mid-window, you retransmit 100 packets to recover one.

Selective Repeat — buffer out-of-order, ack individually#

Same sliding window on the sender, but the receiver buffers out-of-order arrivals and ACKs each packet individually. The sender retransmits only the packets that timed out.

window of size 4
sender receiver
| -- pkt 0 ------> | ack 0
| -- pkt 1 ------> | (lost)
| -- pkt 2 ------> | ack 2, buffer
| -- pkt 3 ------> | ack 3, buffer
| (timer for pkt 1 fires)
| -- pkt 1 ------> | ack 1, deliver 1, 2, 3 in order to app

Both ends now carry more state — the sender tracks per-packet timers, the receiver maintains a buffer of out-of-order packets up to its own window. In return, retransmissions are precise: lose one packet, retransmit one packet.

Three things every ARQ scheme needs#

  • Sequence numbers. Disambiguate duplicates and detect gaps. Range must be wide enough that no two in-flight packets share a number (TCP’s 32-bit space is large but still wraps on fast links — hence PAWS and timestamps).
  • A timer. Without it you cannot detect loss. RTT estimation (Karn’s algorithm, Jacobson/Karels) sets the timeout — too short causes spurious retransmissions, too long stalls recovery.
  • An idempotent receiver. Duplicates will arrive (e.g. when an ACK is lost and the sender retransmits). The receiver must drop them silently using the sequence number.

How TCP combines them#

TCP’s reliable-data-transfer engine is a hybrid: cumulative ACKs (Go-Back-N’s economy of acknowledgement) plus selective acknowledgements (SACK option, Selective Repeat’s precision of retransmission) plus a sliding window for pipelining. On loss it triggers fast retransmit (three duplicate ACKs is treated as “the next packet is lost, retransmit immediately, don’t wait for the timer”) and fast recovery (don’t drop the window all the way to 1 — halve it and keep going).

Variants and trade-offs#

Go-Back-N — sender keeps a window, receiver is dumb (only accepts the next expected packet). Cumulative ACKs make the receiver almost free. On loss you retransmit the whole window from the loss point. Bandwidth-wasteful when windows are large and loss is rare.
Selective Repeat — both ends maintain windows; receiver buffers out-of-order; per-packet ACKs and timers. Retransmits only the lost packets. More bookkeeping; more memory at the receiver; better throughput when loss is sporadic.

Trade-offs across the family:

  • Window size vs throughput. Throughput is bounded by window / RTT. To fill a 1 Gbps link with 100 ms RTT you need a window of about 12 MB. TCP’s window-scaling option exists precisely to allow this.
  • Number of timers. Per-window timer (Go-Back-N-ish) is cheap but coarse. Per-packet timer (pure Selective Repeat) is precise but bookkeeping-heavy. TCP uses per-connection RTO plus fast-retransmit triggered by duplicate ACKs.
  • Sequence-number space. Wider is safer but costs header bytes. The number must be wide enough that the sender can never have two unacknowledged packets with the same sequence in flight (Nyquist-style; TCP’s 32 bits and timestamps handle this on multi-gigabit links).
  • NAK vs implicit timeout. Some schemes send explicit negative acknowledgements (“I didn’t get packet 5”). TCP doesn’t — duplicate ACKs play that role implicitly. Saves header overhead at the cost of slower loss detection on the last packet of a flight (no future packet to trigger a duplicate ACK).
  • Cumulative vs selective ACK. Cumulative is simpler and resilient to ACK loss (a later ACK supersedes an earlier one). Selective is precise. TCP carries both — the ACK field is cumulative; the SACK option is selective.
Why is the receiver in Go-Back-N so simple?

Cumulative ACK does all the work. The receiver only ever needs to remember “the highest contiguous sequence number I have.” If an out-of-order packet arrives, drop it; the next ACK will tell the sender what’s missing (the cumulative number won’t advance). The receiver is essentially stateless beyond a single counter — at the cost of forcing the sender to retransmit good packets along with the lost one.

When this is asked in interviews#

Common as a build-up to deeper TCP questions. Expect “design a reliable protocol over a lossy channel” — the interviewer wants you to invent Stop-and-Wait, identify the throughput bottleneck, evolve to Go-Back-N, then refine to Selective Repeat. Bonus points for naming the components (sequence numbers, ACKs, timers, idempotent receiver) and connecting them to TCP’s choices.

Common follow-ups:

  • “How does the sender pick a timeout?” — RTT estimation via exponentially weighted moving average; Karn’s algorithm excludes retransmitted samples; Jacobson/Karels adds RTT variance to the bound.
  • “What happens if the ACK is lost?” — the sender’s timer fires, it retransmits; receiver sees a duplicate (sequence already delivered), drops it silently, re-sends the ACK.
  • “How big should the window be?” — bandwidth-delay product. Filling a B bps link with R second RTT needs a window of at least B * R / 8 bytes.
  • “How does TCP differ from textbook Selective Repeat?” — cumulative ACK + SACK option for selectivity, single retransmission timer per connection (RTO) rather than per packet, fast retransmit on three duplicate ACKs, congestion window layered on top.
  • “How does Kafka’s idempotent producer relate?” — it’s a sliding-window protocol with per-partition sequence numbers, cumulative offsets, and duplicate detection — Go-Back-N-ish at the application layer.

Asked across backend, distributed-systems, and SRE loops. A staple of grad-level networking and most interview prep.

Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.