Managing Retries
Exponential backoff, jitter, retry budgets, the retry-storm that takes down a recovering service. Idempotency is mandatory.
What it is
Retries are the client-side discipline of reissuing a failed request in the hope that the failure was transient. Done right, retries hide the dropped packets, brief congestion spikes, and momentary partial failures that every distributed system has — the request that would have failed at 200ms succeeds at 800ms after one retry, and the user never sees the seam.
Done wrong, retries are the mechanism by which a recovering service gets killed for a second time. A service that goes down for 30 seconds and starts coming back up faces every client’s accumulated retry queue all at once — a self-inflicted denial of service that prevents recovery. The 2015 AWS DynamoDB outage and the 2020 Cloudflare Workers KV incident both included a “retry-storm extended the outage by 20+ minutes” finding.
The vocabulary that separates safe retries from dangerous ones has four parts: exponential backoff (each retry waits longer than the last), jitter (randomise the wait so clients don’t synchronise), retry budgets (cap the cost — max attempts, max time, max queue size), and idempotency (the prerequisite — retrying a non-idempotent operation creates duplicates).
The senior signal in any interview that touches retries: retries are not a substitute for reliability. They’re a tool for masking transient failure. If a service is failing 50% of requests, retrying won’t help — the right answer is a circuit breaker, not more retries.
When to use it
Retries are appropriate for:
- Network-level transient failures. TCP resets, DNS hiccups, brief TLS handshake failures. These resolve quickly; a retry hides them.
- Server-side transient failures. `503 Service Unavailable` with `Retry-After`, occasional `502 Bad Gateway` from a load balancer’s brief view of a restarting backend, `504 Gateway Timeout` for a stuck request.
- Rate-limit responses (429). With `Retry-After` honored.
- Idempotent operations. Reads, `PUT`s, `DELETE`s, and `POST`s with an idempotency key. See idempotency-in-api-design.
Retries are inappropriate for:
- Client errors (4xx other than 429). A `400 Bad Request` won’t become `200 OK` on retry; the request is malformed. A `403 Forbidden` won’t grant access on retry. A `404 Not Found` won’t find the resource on retry. Don’t retry these.
- Non-idempotent operations without an idempotency key. Charging a card, sending an email, creating an order. A retry creates a duplicate.
- Persistent failures. If 5 retries don’t help, retry 6 won’t either. Move to the fail path.
How it works
Exponential backoff
The wait between retries doubles (or scales by some factor) on each attempt:
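
    delay = base * factor^attempt

Here attempt is the zero-based index of the retry, so the first retry uses attempt = 0. With base = 100ms and factor = 2:
- Initial request: sent immediately
- Retry 1 (attempt = 0): wait 100ms
- Retry 2 (attempt = 1): wait 200ms
- Retry 3 (attempt = 2): wait 400ms
- Retry 4 (attempt = 3): wait 800ms
- Retry 5 (attempt = 4): wait 1600ms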
The rationale: a brief blip (100ms) clears on the first retry. A medium failure (1-2s) clears by the fourth or fifth retry, by which point the client has waited 1.5s and 3.1s in total. A sustained outage takes the client past the budget without hammering the server.
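To make the schedule concrete, the short sketch below tabulates the un-jittered wait before each retry and the running total. The function name `backoff_delays` and its defaults are illustrative assumptions; `attempt` is taken as the zero-based index of the retry so that the first wait equals the base delay, matching the waits listed above.

```python
def backoff_delays(base_ms=100, factor=2, retries=5):
    """Return the un-jittered wait (in ms) before each retry."""
    # attempt is the zero-based retry index: 0 for the first retry.
    return [base_ms * factor ** attempt for attempt in range(retries)]


delays = backoff_delays()
# Running total of time spent waiting, excluding the requests themselves.
cumulative = [sum(delays[: i + 1]) for i in range(len(delays))]

for retry, (delay, total) in enumerate(zip(delays, cumulative), start=1):
    print(f"retry {retry}: wait {delay}ms (total waited {total}ms)")
```

The cumulative column is the useful one when choosing a budget: five retries at these settings cost a little over three seconds of waiting before the call is abandoned.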
Jitter — the bit everyone forgets
Without jitter, every client retries on the same schedule. If 1,000 clients all see a failure at the same instant, they all retry at exactly 100ms, then 200ms, then 400ms — a synchronised thundering herd hitting the recovering service.
The fix is jitter: randomise the wait. Three common variants:

    full jitter:         wait = rand(0, base * factor^attempt)
    equal jitter:        wait = base * factor^attempt / 2 + rand(0, base * factor^attempt / 2)
    decorrelated jitter: wait = rand(base, last_wait * 3)   # capped at max

AWS’s “Exponential Backoff And Jitter” blog post (2015) compared these in simulation: un-jittered backoff was the clear loser, equal jitter trailed the other two, and full jitter and decorrelated jitter performed similarly, with full jitter making slightly fewer calls overall. Full jitter is a common default in retry libraries today.
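The three variants can be written as small functions. This is a minimal sketch, not any particular library's implementation: the names, the 0.1s base, and the 30s cap are assumptions chosen for illustration, and `attempt` is the zero-based retry index.

```python
import random

BASE = 0.1   # seconds; illustrative base delay (assumption)
CAP = 30.0   # seconds; illustrative ceiling on any single wait (assumption)


def full_jitter(attempt, base=BASE, cap=CAP):
    """Wait anywhere between zero and the exponential ceiling."""
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0, ceiling)


def equal_jitter(attempt, base=BASE, cap=CAP):
    """Keep half the exponential delay fixed and randomise the other half."""
    ceiling = min(cap, base * 2 ** attempt)
    return ceiling / 2 + random.uniform(0, ceiling / 2)


def decorrelated_jitter(last_wait, base=BASE, cap=CAP):
    """Derive the next wait from the previous one instead of the attempt count."""
    return min(cap, random.uniform(base, last_wait * 3))
```

Decorrelated jitter is stateful: the caller starts with `last_wait = BASE` and feeds each result back in as the next `last_wait`.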

    Without jitter:              With full jitter:
    ----▼-----▼-----▼-----       ----▼-----:▼--▼-----:--▼:--
    all clients                  spread out
    retry together

Retry budgets — the cost cap
A budget caps how many retries the system as a whole will spend. Three forms:
- Per-call attempts cap. Maximum retry count for a single call, often 3-5. Beyond this, fail the call.
- Per-call time budget. Maximum total elapsed time including retries, often 30s. If the budget runs out mid-backoff, abandon.
- Per-target retry quota. Across all calls to the same target, cap retries as a fraction of total requests (e.g., “at most 10% of requests can be retries”). Prevents retry-storm during incidents.
The third form is what Google’s SRE Book calls a “retry budget” specifically and what gRPC implements as RetryThrottlingPolicy. The idea: when the system is healthy, retries are cheap and rare. When the system is degraded, the retry rate spikes and starts to consume the budget. When the budget is exhausted, retries are disabled until the underlying success rate recovers.
This is the discipline that prevents a recovering service from being re-killed.
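A per-target retry quota can be sketched as a token bucket. The class below is a simplified illustration loosely modeled on the token scheme used by gRPC retry throttling (failures cost one token, successes refund a fraction of one, and retries are allowed only while the bucket is more than half full). The class name, method names, and default values are assumptions for this example, and the sketch is not thread-safe.

```python
class RetryBudget:
    """Token-bucket retry throttle shared by all calls to one target."""

    def __init__(self, max_tokens=10.0, token_ratio=0.1):
        self.max_tokens = max_tokens
        self.token_ratio = token_ratio
        self.tokens = max_tokens  # start full: a healthy system may retry

    def record_success(self):
        # Each success refunds a fraction of a token, up to the maximum.
        self.tokens = min(self.max_tokens, self.tokens + self.token_ratio)

    def record_failure(self):
        # Each failure costs a whole token, never dropping below zero.
        self.tokens = max(0.0, self.tokens - 1.0)

    def can_retry(self):
        # Retries are permitted only while more than half the tokens remain.
        return self.tokens > self.max_tokens / 2


budget = RetryBudget()
for _ in range(5):
    budget.record_failure()
print(budget.can_retry())  # the bucket is now exactly half full
```

With these defaults it takes ten successes to pay back one failure, so retries stay switched off until the success rate has genuinely recovered.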
The retry-storm anti-pattern
Without budgets, retries form a positive-feedback loop during partial failures:

    Backend at 80% capacity, 20% of requests time out
    Each timeout triggers 3 retries → effective request rate = 1.0 + 0.2*3 = 1.6x
    Now backend is at ~130% capacity → 60% time out
    Each timeout triggers 3 retries → effective rate = 1.0 + 0.6*3 = 2.8x
    Now backend is at ~225% capacity → 100% time out
    Service is now in a self-sustaining outage.

The cure has three parts:
- Cap attempts per call. 3-5, not unlimited.
- Honor `Retry-After`. When the server says “wait 30 seconds”, wait 30 seconds.
- Coordinate via a retry budget. When too high a fraction of requests are retries, stop retrying.
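The feedback-loop arithmetic in this section is easy to reproduce. The sketch below uses the same simplified model as the walkthrough (every timed-out request triggers a fixed number of retries); the function name and the 80% baseline utilisation are taken from that example and are otherwise assumptions.

```python
def effective_rate(timeout_fraction, retries_per_timeout=3):
    """Offered load as a multiple of the original request rate."""
    return 1.0 + timeout_fraction * retries_per_timeout


baseline_utilisation = 0.8  # backend load before any retries, as in the example

for timeout_fraction in (0.2, 0.6, 1.0):
    rate = effective_rate(timeout_fraction)
    load = baseline_utilisation * rate
    print(f"{timeout_fraction:.0%} timeouts -> {rate:.1f}x traffic, "
          f"{load:.0%} of capacity")
```

Setting `retries_per_timeout` to 1, or gating retries behind a budget, is what flattens this curve.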
Idempotency is the prerequisite
Retries change the question from “did the request succeed?” to “did the server execute the operation, and if so, exactly once?” For reads (GET), the answer is obvious — reading twice is the same as reading once. For non-idempotent writes (POST to create), retrying a request that already succeeded creates a duplicate.
The two solutions, in order of preference:
- Use idempotent verbs where possible. `PUT /users/42` is idempotent by definition; `DELETE /orders/42` is idempotent. Both can be retried freely.
- Use idempotency keys for non-idempotent operations. A client-generated `Idempotency-Key` header lets the server deduplicate retries. See idempotency-in-api-design.
A retry policy without one of these is a duplicate-write factory waiting for the first network hiccup.
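On the server side, deduplication by key can be as simple as remembering the first result for each key and replaying it. The sketch below is a deliberately minimal, in-memory illustration: `IdempotentHandler` and `create_charge` are hypothetical names, and a real service would need a durable store, an atomic check-and-set, and key expiry.

```python
import uuid


class IdempotentHandler:
    """Runs an operation once per idempotency key and replays the result."""

    def __init__(self, operation):
        self._operation = operation
        self._results = {}  # key -> stored result (in-memory, illustration only)

    def handle(self, idempotency_key, payload):
        if idempotency_key in self._results:
            # A retry of a request we already executed: replay, don't re-run.
            return self._results[idempotency_key]
        result = self._operation(payload)
        self._results[idempotency_key] = result
        return result


charges = []  # stands in for the side effect we must not duplicate


def create_charge(payload):
    charges.append(payload)
    return {"charge_id": len(charges), "amount": payload["amount"]}


handler = IdempotentHandler(create_charge)
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = handler.handle(key, {"amount": 500})
retry = handler.handle(key, {"amount": 500})
```

Here `retry` is the same response as `first`, and only one charge has been recorded.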
Which responses to retry
A canonical decision table:
| Outcome | Retry? | Notes |
|---|---|---|
| Connection refused | yes | Service not up; transient if restarting |
| DNS lookup failure | yes | Transient |
| TCP reset | yes | Brief network issue |
| TLS handshake failure | yes | Often transient (load balancer warming up) |
| Request timeout (client side) | conditionally | Only if idempotent — request may have succeeded |
| 503 Service Unavailable | yes | Honor Retry-After |
| 502 Bad Gateway | yes | Load balancer can’t reach a backend; brief |
| 504 Gateway Timeout | yes | Upstream slow; backoff and retry |
| 429 Too Many Requests | yes | Honor Retry-After |
| 500 Internal Server Error | conditionally | Idempotent only; usually transient |
| 4xx (other than 429) | no | Client bug; won’t fix on retry |
| Successful response | no | |
The “conditionally” rows are where idempotency does its work. A 500 on a POST /charges without an idempotency key is dangerous to retry; a 500 on the same call with an idempotency key is safe.
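The table translates fairly directly into a predicate. This is one possible encoding, not a standard API: the function name, the error labels (`"connection_refused"`, `"timeout"`, and so on), and the parameter names are all assumptions made for the example.

```python
ALWAYS_RETRY_STATUS = {429, 502, 503, 504}
ALWAYS_RETRY_ERRORS = {
    "connection_refused", "dns_failure", "tcp_reset", "tls_handshake",
}
# Methods that HTTP defines as idempotent.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}


def should_retry(method, status=None, error=None, has_idempotency_key=False):
    """Return True if the table above says this outcome may be retried."""
    safe_to_repeat = method.upper() in IDEMPOTENT_METHODS or has_idempotency_key

    if error is not None:
        if error in ALWAYS_RETRY_ERRORS:
            return True
        if error == "timeout":
            # The request may have been executed; retry only if repeating is safe.
            return safe_to_repeat
        return False

    if status in ALWAYS_RETRY_STATUS:
        return True
    if status == 500:
        return safe_to_repeat
    return False  # other 4xx, success, and anything unrecognised
```

For example, `should_retry("POST", status=500)` is `False`, while the same call with `has_idempotency_key=True` is `True`.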
Three-language example
A retry wrapper with exponential backoff, full jitter, and Retry-After handling:
Python:

    import random
    import time

    import requests

    RETRYABLE_STATUS = {429, 500, 502, 503, 504}
    MAX_ATTEMPTS = 5
    BASE_DELAY = 0.1   # seconds
    MAX_DELAY = 30.0   # seconds


    def call_with_retry(method, url, idempotency_key=None, **kwargs):
        # Copy so the caller's headers dict is not mutated.
        headers = dict(kwargs.pop("headers", None) or {})
        if idempotency_key:
            headers["Idempotency-Key"] = idempotency_key

        for attempt in range(MAX_ATTEMPTS):
            last_attempt = attempt == MAX_ATTEMPTS - 1
            try:
                resp = requests.request(
                    method, url, headers=headers, timeout=10, **kwargs,
                )
            except (requests.ConnectionError, requests.Timeout):
                if last_attempt:
                    raise
                _sleep(attempt, retry_after=None)
                continue

            if resp.status_code not in RETRYABLE_STATUS or last_attempt:
                return resp

            _sleep(attempt, resp.headers.get("Retry-After"))


    def _sleep(attempt, retry_after):
        try:
            # Retry-After as delay-seconds; a missing header or the HTTP-date
            # form raises and falls through to computed backoff.
            wait = max(float(retry_after), 0.0)
        except (TypeError, ValueError):
            # Full jitter exponential backoff
            ceiling = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
            wait = random.uniform(0, ceiling)
        time.sleep(wait)

Go:

    package main

    import (
        "context"
        "math"
        "math/rand"
        "net/http"
        "strconv"
        "time"
    )

    var retryable = map[int]bool{429: true, 500: true, 502: true, 503: true, 504: true}

    const (
        maxAttempts = 5
        baseDelay   = 100 * time.Millisecond
        maxDelay    = 30 * time.Second
    )

    func callWithRetry(ctx context.Context, req *http.Request, idempotencyKey string) (*http.Response, error) {
        if idempotencyKey != "" {
            req.Header.Set("Idempotency-Key", idempotencyKey)
        }
        client := &http.Client{Timeout: 10 * time.Second}

        var lastResp *http.Response
        for attempt := 0; attempt < maxAttempts; attempt++ {
            attemptReq := req.Clone(ctx)
            // Clone shares the body reader, so get a fresh one per attempt when possible.
            if req.GetBody != nil {
                body, err := req.GetBody()
                if err != nil {
                    return nil, err
                }
                attemptReq.Body = body
            }

            resp, err := client.Do(attemptReq)
            if err != nil {
                if attempt == maxAttempts-1 {
                    return nil, err
                }
                sleep(attempt, "")
                continue
            }
            if !retryable[resp.StatusCode] {
                return resp, nil
            }
            lastResp = resp
            if attempt == maxAttempts-1 {
                return resp, nil
            }
            retryAfter := resp.Header.Get("Retry-After")
            resp.Body.Close() // release the connection before waiting
            sleep(attempt, retryAfter)
        }
        return lastResp, nil
    }

    func sleep(attempt int, retryAfter string) {
        // Retry-After as delay-seconds; an empty value or the HTTP-date form
        // fails to parse and falls through to computed backoff.
        if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
            time.Sleep(time.Duration(secs) * time.Second)
            return
        }
        // Full jitter exponential backoff
        ceiling := time.Duration(math.Min(
            float64(baseDelay)*math.Pow(2, float64(attempt)),
            float64(maxDelay),
        ))
        time.Sleep(time.Duration(rand.Int63n(int64(ceiling))))
    }

JavaScript:

    const RETRYABLE = new Set([429, 500, 502, 503, 504]);
    const MAX_ATTEMPTS = 5;
    const BASE_DELAY_MS = 100;
    const MAX_DELAY_MS = 30_000;

    async function callWithRetry(url, options = {}, idempotencyKey) {
      const headers = { ...(options.headers || {}) };
      if (idempotencyKey) headers["Idempotency-Key"] = idempotencyKey;

      let lastResp;
      for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        let resp;
        try {
          resp = await fetch(url, {
            ...options,
            headers,
            signal: AbortSignal.timeout(10_000),
          });
        } catch (err) {
          if (attempt === MAX_ATTEMPTS - 1) throw err;
          await sleep(attempt, null);
          continue;
        }

        if (!RETRYABLE.has(resp.status)) return resp;
        lastResp = resp;
        if (attempt === MAX_ATTEMPTS - 1) return resp;
        await sleep(attempt, resp.headers.get("retry-after"));
      }
      return lastResp;
    }

    function sleep(attempt, retryAfter) {
      // Retry-After as delay-seconds; the HTTP-date form parses to NaN
      // and falls through to computed backoff.
      let ms = retryAfter ? Number(retryAfter) * 1000 : NaN;
      if (!Number.isFinite(ms) || ms < 0) {
        // Full jitter
        const ceiling = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
        ms = Math.random() * ceiling;
      }
      return new Promise((r) => setTimeout(r, ms));
    }

Three details across the implementations:
- The `Retry-After` header takes priority over computed backoff. The server knows when it’s ready; trust it.
- The retry list is restrictive — only specific 5xx codes and 429. Don’t retry 4xx (other than 429); they won’t change.
- The idempotency key is sent on every retry, identical. That’s how the server dedupes.
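One detail worth handling explicitly: HTTP allows `Retry-After` to carry either a number of seconds or an HTTP-date. The helper below is a sketch of parsing both forms with the standard library. The function name, the `now` parameter (included so the result is testable), and the 60-second `max_wait` clamp are assumptions for illustration, not part of any spec.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None, max_wait=60.0):
    """Return seconds to wait for a Retry-After value, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        # delay-seconds form, e.g. "120"
        seconds = float(value)
    else:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        try:
            retry_at = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None
        if retry_at.tzinfo is None:
            retry_at = retry_at.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (retry_at - now).total_seconds()
    # Never wait a negative time, and never longer than the caller's cap.
    return min(max(seconds, 0.0), max_wait)
```

A caller would fall back to computed backoff whenever this returns `None`. Whether to clamp a very long server hint or give up instead is a policy choice that depends on the per-call time budget.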
Where retries should live — and where they shouldn’t
A common mistake: retries at every layer. If the SDK makes up to 5 attempts, and the proxy makes up to 5 attempts, and the gateway makes up to 5 attempts, a single user click becomes up to 125 backend requests under failure.
The right pattern: retry at one layer, ideally the highest layer that has full context. Often that’s the SDK or the application code. Lower layers (gateway, proxy) should not retry unless they’re the only one — and if so, they should be aware of being the only one.
Some teams adopt the convention: client SDK retries; service-to-service calls do not retry (the upstream caller is expected to retry instead). This prevents amplification through the service graph.
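The amplification is multiplicative, which a few lines make explicit. This is a worst-case sketch assuming every attempt at every layer fails; the function name is hypothetical.

```python
import math


def worst_case_requests(attempts_per_layer):
    """Backend requests from one user action if every attempt fails."""
    return math.prod(attempts_per_layer)


stacked = worst_case_requests([5, 5, 5])       # SDK, proxy, gateway all retry
single_layer = worst_case_requests([5, 1, 1])  # only the SDK retries
print(stacked, single_layer)
```

Moving retries to a single layer turns a cubic blow-up into a linear one without reducing the number of attempts the user-facing call gets.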
Variants
| Variant | Mechanism | When it fits |
|---|---|---|
| Fixed delay | Constant wait between retries | Rare; only when load on the dependency is irrelevant |
| Linear backoff | delay = base * attempt | Niche; usually pick exponential instead |
| Exponential backoff (no jitter) | delay = base * 2^attempt | Single-client tools; unsafe for many clients |
| Exponential + full jitter | wait = rand(0, base * 2^attempt) | The default for production |
| Decorrelated jitter | wait = rand(base, prev * 3) | Alternative from the AWS post; derives each wait from the previous one |
| Adaptive (e.g., gRPC retry throttle) | Disable retries when ratio of retries to requests exceeds a threshold | Service meshes, retry budgets |
Trade-offs
What good retry policies give you:
- Masked transient failures. The 1% of requests that hit a TCP reset never surface to users.
- Higher effective success rate. A 99% backend looks like 99.99% to clients with a single retry, assuming failures are independent.
- Operational margin. Brief incidents (a pod restart, a brief network blip) don’t cause user-visible failure.
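The success-rate figure follows from a one-line formula: a call fails only if every attempt fails. The sketch below assumes attempts fail independently, which holds for random transient errors but not during a correlated outage; the function name is illustrative.

```python
def effective_success_rate(per_attempt_success, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - per_attempt_success) ** attempts


one_retry = effective_success_rate(0.99, attempts=2)    # roughly 0.9999
degraded = effective_success_rate(0.50, attempts=3)     # roughly 0.875
print(f"{one_retry:.4%}  {degraded:.1%}")
```

The second figure shows the limit of the technique: against a backend failing half its requests, two extra attempts per call still leave about one call in eight failing while tripling the worst-case load.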
What good retry policies cost you:
- Higher peak load. Retries on a degraded backend amplify load exactly when you can least afford it.
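- Latency variance. A request that succeeds on its first retry pays for the failed attempt, the backoff wait, and the second attempt (for example 100ms + backoff + 100ms), which is noticeably slower than a clean first try.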
- Duplicate writes if idempotency isn’t perfect.
- Operational subtlety. Retries are easy to write and hard to tune. The default settings are wrong for most workloads.
Common pitfalls
- Retrying without backoff. A tight retry loop is a denial-of-service attack against your own dependency.
- Retrying without jitter. Synchronised retries from many clients form a thundering herd.
- Retrying non-idempotent operations without an idempotency key. Each retry under timeout creates a duplicate.
- Retrying 4xx errors. A 400 won’t become 200; you’re spending budget on a bug.
- Ignoring `Retry-After`. The server’s hint is more accurate than your computed backoff. Use it.
- Unbounded retries. “Retry until success” is a great way to hang a thread forever during an outage.
- Retries at every layer. Five attempts at each of three layers (SDK × gateway × proxy) is up to 125x amplification. Pick one layer.
- No retry budget. A retry-storm during a partial outage extends the outage.
- Not distinguishing connection errors from response errors. A connection refused is safe to retry; a `500` on a non-idempotent `POST` is not.
Related building blocks
- The Role of Idempotency in API Design — the prerequisite; retries are unsafe without idempotency.
- The Circuit Breaker Pattern — the complement; retries handle transient failures, circuit breakers handle sustained ones.
- Rate Limiting — the server-side counterpart to client-side retries; 429 + Retry-After is the contract.
- API Monitoring — retry rate, retry budget consumption, and `Retry-After` header counts are core metrics to dashboard.
- What Causes API Failures — A Taxonomy — retry storms are a named failure mode; this is where to study them.