Managing Retries

Exponential backoff, jitter, retry budgets, the retry-storm that takes down a recovering service. Idempotency is mandatory.

Building Block Intermediate
12 min read
retries backoff jitter resilience idempotency

What it is#

Retries are the client-side discipline of reissuing a failed request in the hope that the failure was transient. Done right, retries hide the dropped packets, brief congestion spikes, and momentary partial failures that every distributed system has — the request that would have failed at 200ms succeeds at 800ms after one retry, and the user never sees the seam.

Done wrong, retries are the mechanism by which a recovering service gets killed for a second time. A service that goes down for 30 seconds and starts coming back up faces every client’s accumulated retry queue all at once — a self-inflicted denial of service that prevents recovery. The 2015 AWS DynamoDB outage and the 2020 Cloudflare Workers KV incident both included a “retry-storm extended the outage by 20+ minutes” finding.

The vocabulary that separates safe retries from dangerous ones has four parts: exponential backoff (each retry waits longer than the last), jitter (randomise the wait so clients don’t synchronise), retry budgets (cap the cost — max attempts, max time, max queue size), and idempotency (the prerequisite — retrying a non-idempotent operation creates duplicates).

The senior signal in any interview that touches retries: retries are not a substitute for reliability. They’re a tool for masking transient failure. If a service is failing 50% of requests, retrying won’t help — the right answer is a circuit breaker, not more retries.

When to use it#

Retries are appropriate for:

  • Network-level transient failures. TCP resets, DNS hiccups, brief TLS handshake failures. These resolve quickly; a retry hides them.
  • Server-side transient failures. 503 Service Unavailable with Retry-After, occasional 502 Bad Gateway from a load balancer’s brief view of a restarting backend, 504 Gateway Timeout for a stuck request.
  • Rate-limit responses (429). With Retry-After honored.
  • Idempotent operations. Reads, PUTs, DELETEs, and POSTs with an idempotency key. See idempotency-in-api-design.

Retries are inappropriate for:

  • Client errors (4xx other than 429). A 400 Bad Request won’t become 200 OK on retry; the request is malformed. A 403 Forbidden won’t grant access on retry. A 404 Not Found won’t find the resource on retry. Don’t retry these.
  • Non-idempotent operations without an idempotency key. Charging a card, sending an email, creating an order. A retry creates a duplicate.
  • Persistent failures. If 5 retries don’t help, retry 6 won’t either. Move to the fail path.

How it works#

Exponential backoff#

The wait between retries doubles (or scales by some factor) on each attempt:

delay = base * factor^attempt

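With base = 100ms and factor = 2, where attempt is the zero-based index of the attempt that just failed:

  • Initial try (attempt 0): send immediately
  • After attempt 0 fails: wait 100ms
  • After attempt 1 fails: wait 200ms
  • After attempt 2 fails: wait 400ms
  • After attempt 3 fails: wait 800ms
  • After attempt 4 fails: wait 1600ms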

The rationale: a brief blip (100ms) clears on the first retry. A medium failure (1-2s) clears by the fourth or fifth retry, since the waits add up to 1.5s after four retries and 3.1s after five. A sustained outage takes the client past the budget without hammering the server.
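
A minimal sketch of the schedule above. The helper name backoff_delays_ms and the cap_ms parameter are assumptions made for illustration; the formula itself has no cap, but an uncapped exponential grows without bound, so the sketch adds one.

```python
def backoff_delays_ms(base_ms=100, factor=2, retries=5, cap_ms=30_000):
    """Return the wait before each retry, in milliseconds.

    Index i is the wait after attempt i fails, so the first retry waits
    base_ms and each later retry waits factor times longer, up to cap_ms.
    """
    return [min(base_ms * factor ** attempt, cap_ms) for attempt in range(retries)]


delays = backoff_delays_ms()
print(delays)       # [100, 200, 400, 800, 1600]
print(sum(delays))  # 3100 ms of waiting in total across five retries
```

Integer milliseconds keep the arithmetic exact; a real client would convert to seconds only at the point where it sleeps.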

Jitter — the bit everyone forgets#

Without jitter, every client retries on the same schedule. If 1,000 clients all see a failure at the same instant, they all retry at exactly 100ms, then 200ms, then 400ms — a synchronised thundering herd hitting the recovering service.

The fix is jitter: randomise the wait. Three common variants:

full jitter: wait = rand(0, base * factor^attempt)
equal jitter: wait = base * factor^attempt / 2 + rand(0, base * factor^attempt / 2)
decorrelated jitter: wait = rand(base, last_wait * 3) # capped at max

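AWS’s “Exponential Backoff And Jitter” blog post (2015) compared these variants in simulation. Any jitter sharply reduced the total work done by contending clients; full jitter needed the fewest calls, decorrelated jitter came close behind, and equal jitter trailed both. Full jitter is a sensible default unless a library documents a reason to choose otherwise.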

Figure: retry timelines. Without jitter, all clients retry together at the same instants; with full jitter, the retries are spread out across each backoff window.
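
The three jitter formulas can be sketched as plain functions. The function names, the rng parameter, and the max_wait argument are assumptions for illustration and are not taken from any particular library.

```python
import random


def full_jitter(base, factor, attempt, rng=random):
    """Wait anywhere between 0 and the exponential ceiling."""
    return rng.uniform(0, base * factor ** attempt)


def equal_jitter(base, factor, attempt, rng=random):
    """Keep half of the exponential delay fixed and randomise the other half."""
    half = base * factor ** attempt / 2
    return half + rng.uniform(0, half)


def decorrelated_jitter(base, last_wait, max_wait, rng=random):
    """Draw from [base, 3 * last_wait], then cap the result at max_wait."""
    return min(max_wait, rng.uniform(base, last_wait * 3))


rng = random.Random(42)  # seeded only so the demo is repeatable
wait = 0.1               # decorrelated jitter starts from the base delay
for attempt in range(4):
    wait = decorrelated_jitter(0.1, wait, max_wait=30.0, rng=rng)
    print(attempt,
          round(full_jitter(0.1, 2, attempt, rng), 3),
          round(equal_jitter(0.1, 2, attempt, rng), 3),
          round(wait, 3))
```

Decorrelated jitter is the only variant that carries state between attempts: it depends on the previous wait rather than on the attempt number.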

Retry budgets — the cost cap#

A budget caps how many retries the system as a whole will spend. Three forms:

  • Per-call attempts cap. Maximum retry count for a single call, often 3-5. Beyond this, fail the call.
  • Per-call time budget. Maximum total elapsed time including retries, often 30s. If the budget runs out mid-backoff, abandon.
  • Per-target retry quota. Across all calls to the same target, cap retries as a fraction of total requests (e.g., “at most 10% of requests can be retries”). Prevents retry-storm during incidents.
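
The first two forms, the attempts cap and the time budget, can be sketched in one loop. Everything named here (call_with_budget, RetryBudgetExceeded, the injectable sleep and clock parameters) is hypothetical. The sketch also catches every exception for brevity, whereas real code would catch only errors it knows to be retryable.

```python
import random
import time


class RetryBudgetExceeded(Exception):
    """Raised when the attempt cap or the time budget runs out."""


def call_with_budget(operation, max_attempts=4, time_budget=30.0, base=0.1,
                     sleep=time.sleep, clock=time.monotonic):
    """Run operation() until it succeeds or a per-call budget is spent.

    operation is any zero-argument callable that raises on failure.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + time_budget
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            wait = random.uniform(0, base * 2 ** attempt)  # full jitter
            out_of_attempts = attempt == max_attempts - 1
            out_of_time = clock() + wait > deadline
            if out_of_attempts or out_of_time:
                # Abandon now instead of sleeping past the budget.
                raise RetryBudgetExceeded(
                    f"gave up after {attempt + 1} attempts") from exc
            sleep(wait)
```

Checking the deadline before sleeping, rather than after, is what lets the call abandon mid-backoff instead of waking up only to find the budget already gone.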

The third form is what Google’s SRE Book calls a “retry budget” specifically and what gRPC implements as RetryThrottlingPolicy. The idea: when the system is healthy, retries are cheap and rare. When the system is degraded, the retry rate spikes and starts to consume the budget. When the budget is exhausted, retries are disabled until the underlying success rate recovers.

This is the discipline that prevents a recovering service from being re-killed.
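
A per-target retry quota can be sketched with two counters. This is an illustrative model only, not gRPC's throttling algorithm or the SRE Book's implementation. The RetryBudget class and its method names are assumptions, and it measures retries against first-attempt requests.

```python
class RetryBudget:
    """Allow retries only while they stay under a percentage of requests.

    Lifetime counters keep the sketch short; a production budget would
    normally use a sliding window or a token bucket so old traffic ages out.
    """

    def __init__(self, max_retry_percent=10):
        self.max_retry_percent = max_retry_percent
        self.requests = 0
        self.retries = 0

    def record_request(self):
        """Count one first-attempt request sent to the target."""
        self.requests += 1

    def try_spend_retry(self):
        """Return True and count the retry if the budget allows one more."""
        # Integer arithmetic avoids floating-point surprises at the boundary.
        if (self.retries + 1) * 100 > self.max_retry_percent * self.requests:
            return False
        self.retries += 1
        return True


budget = RetryBudget(max_retry_percent=10)
for _ in range(100):
    budget.record_request()
granted = sum(budget.try_spend_retry() for _ in range(25))
print(granted)  # 10: the remaining 15 retry requests are refused
```

The budget is shared across all calls to one target, so a single slow call cannot consume it alone, and a burst of failures exhausts it quickly.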

The retry-storm anti-pattern#

Without budgets, retries form a positive-feedback loop during partial failures:

Backend at 80% capacity, 20% of requests time out
Each timeout triggers 3 retries → effective request rate = 1.0 + 0.2*3 = 1.6x
Now backend is at 0.8 * 1.6 ≈ 128% capacity → 60% time out
Each timeout triggers 3 retries → effective rate = 1.0 + 0.6*3 = 2.8x
Now backend is at 0.8 * 2.8 ≈ 224% capacity → 100% time out
Service is now in a self-sustaining outage.

The cure has three parts:

  • Cap attempts per call. 3-5, not unlimited.
  • Honour Retry-After. When the server says “wait 30 seconds”, wait 30 seconds.
  • Coordinate via a retry budget. When too high a fraction of requests are retries, stop retrying.
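
The arithmetic in the feedback-loop walkthrough can be reproduced with a small first-order model. The function name retry_amplification is hypothetical, and the model ignores retries of retries, as the walkthrough does.

```python
def retry_amplification(base_utilisation, timeout_fraction, retries_per_timeout):
    """Return (request-rate multiplier, resulting utilisation).

    First-order model: every timed-out request is retried
    retries_per_timeout times, and retries of retries are ignored.
    """
    multiplier = 1.0 + timeout_fraction * retries_per_timeout
    return multiplier, base_utilisation * multiplier


for timeout_fraction in (0.2, 0.6):
    multiplier, utilisation = retry_amplification(0.8, timeout_fraction, 3)
    print(f"{timeout_fraction:.0%} timeouts -> {multiplier:.1f}x rate, "
          f"{utilisation:.0%} of capacity")
```

For the two steps this gives roughly 128% and 224% of capacity, in line with the walkthrough. With a per-call cap of one retry the first step would only reach 0.8 * 1.2 = 96%, which is why capping attempts matters.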

Idempotency is the prerequisite#

Retries change the question from “did the request succeed?” to “did the server execute the operation, and if so, exactly once?” For reads (GET), the answer is obvious — reading twice is the same as reading once. For non-idempotent writes (POST to create), retrying a request that already succeeded creates a duplicate.

The two solutions, in order of preference:

  • Use idempotent verbs where possible. PUT /users/42 is idempotent by definition; DELETE /orders/42 is idempotent. Both can be retried freely.
  • Use idempotency keys for non-idempotent operations. A client-generated Idempotency-Key header lets the server deduplicate retries. See idempotency-in-api-design.

A retry policy without one of these is a duplicate-write factory waiting for the first network hiccup.
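
To illustrate the second option, here is a toy in-memory sketch of server-side deduplication by idempotency key. The ChargeService class and its fields are hypothetical. A real service would need a durable store, an atomic check-and-insert, key expiry, and a check that a replayed key carries the same request body.

```python
import uuid


class ChargeService:
    """Toy server-side handler that deduplicates by idempotency key."""

    def __init__(self):
        self.charges = []     # side effects that actually happened
        self._responses = {}  # idempotency key -> stored response

    def create_charge(self, idempotency_key, amount_cents):
        # A replayed key returns the stored response without charging again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        charge = {"id": len(self.charges) + 1, "amount_cents": amount_cents}
        self.charges.append(charge)
        self._responses[idempotency_key] = charge
        return charge


service = ChargeService()
key = str(uuid.uuid4())                   # generated once per logical operation
first = service.create_charge(key, 1999)
retry = service.create_charge(key, 1999)  # client retry after a lost response
print(first == retry, len(service.charges))  # True 1
```

The key is generated once per logical operation, not once per attempt. A fresh key on each retry would defeat the deduplication.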

Which responses to retry#

A canonical decision table:

| Outcome | Retry? | Notes |
|---|---|---|
| Connection refused | yes | Service not up; transient if restarting |
| DNS lookup failure | yes | Transient |
| TCP reset | yes | Brief network issue |
| TLS handshake failure | yes | Often transient (load balancer warming up) |
| Request timeout (client side) | conditionally | Only if idempotent: the request may have succeeded |
| 503 Service Unavailable | yes | Honor Retry-After |
| 502 Bad Gateway | yes | Load balancer can’t reach a backend; brief |
| 504 Gateway Timeout | yes | Upstream slow; backoff and retry |
| 429 Too Many Requests | yes | Honor Retry-After |
| 500 Internal Server Error | conditionally | Idempotent only; usually transient |
| 4xx (other than 429) | no | Client bug; won’t fix on retry |
| Successful response | no | |

The “conditionally” rows are where idempotency does its work. A 500 on a POST /charges without an idempotency key is dangerous to retry; a 500 on the same call with an idempotency key is safe.
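
The decision table can be encoded as a single predicate. This is a sketch under stated assumptions: should_retry and its parameters are hypothetical names, and connect_failed stands for any failure in the first four rows of the table (refused connection, DNS failure, TCP reset, TLS handshake failure).

```python
ALWAYS_RETRYABLE_STATUS = {429, 502, 503, 504}
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}


def should_retry(method, status=None, timed_out=False, connect_failed=False,
                 has_idempotency_key=False):
    """Apply the decision table to a single failed or completed call."""
    safe_to_repeat = method.upper() in IDEMPOTENT_METHODS or has_idempotency_key
    if connect_failed:
        # Refused connection, DNS failure, TCP reset or TLS failure.
        return True
    if timed_out:
        # The server may already have executed the request.
        return safe_to_repeat
    if status in ALWAYS_RETRYABLE_STATUS:
        return True
    if status == 500:
        return safe_to_repeat
    return False  # success, other 4xx, or anything not listed


print(should_retry("POST", status=500))                            # False
print(should_retry("POST", status=500, has_idempotency_key=True))  # True
print(should_retry("GET", status=404))                             # False
```

Keeping the policy in one function makes the “conditionally” rows explicit and testable, instead of leaving them implied by whichever exceptions a wrapper happens to catch.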

Three-language example#

A retry wrapper with exponential backoff, full jitter, and Retry-After handling:

Retry wrapper with backoff + jitter — Python
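import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

import requests

RETRYABLE_STATUS = {429, 500, 502, 503, 504}
MAX_ATTEMPTS = 5
BASE_DELAY = 0.1   # seconds
MAX_DELAY = 30.0   # seconds; ceiling for the computed backoff


def call_with_retry(method, url, idempotency_key=None, **kwargs):
    """Send a request, retrying transient failures.

    Only use this for idempotent calls, or pass an idempotency_key.
    """
    # Copy so the caller's headers dict is never mutated.
    headers = dict(kwargs.pop("headers", None) or {})
    if idempotency_key:
        # The same key goes out on every attempt so the server can dedupe.
        headers["Idempotency-Key"] = idempotency_key

    for attempt in range(MAX_ATTEMPTS):
        last_attempt = attempt == MAX_ATTEMPTS - 1
        try:
            resp = requests.request(
                method, url, headers=headers, timeout=10, **kwargs,
            )
        except (requests.ConnectionError, requests.Timeout):
            if last_attempt:
                raise
            _sleep(attempt, retry_after=None)
            continue

        if resp.status_code not in RETRYABLE_STATUS or last_attempt:
            return resp
        _sleep(attempt, resp.headers.get("Retry-After"))


def _sleep(attempt, retry_after):
    wait = _parse_retry_after(retry_after)
    if wait is None:
        # Full-jitter exponential backoff.
        ceiling = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
        wait = random.uniform(0, ceiling)
    time.sleep(wait)


def _parse_retry_after(value):
    """Return the Retry-After delay in seconds, or None if absent or invalid."""
    if value is None:
        return None
    try:
        return max(0.0, float(value))  # delay-seconds form
    except ValueError:
        pass
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
        return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
    except (TypeError, ValueError):
        return None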

Three details across the implementations:

  • The Retry-After header takes priority over computed backoff. The server knows when it’s ready; trust it.
  • The retry list is restrictive — only specific 5xx codes and 429. Don’t retry 4xx (other than 429); they won’t change.
  • The idempotency key is sent on every retry, identical. That’s how the server dedupes.

Where retries should live — and where they shouldn’t#

A common mistake: retries at every layer. If the SDK makes up to 5 attempts, and the proxy makes up to 5, and the gateway makes up to 5, a single user click becomes up to 5 × 5 × 5 = 125 backend requests under failure.

The right pattern: retry at one layer, ideally the highest layer that has full context. Often that’s the SDK or the application code. Lower layers (gateway, proxy) should not retry unless they’re the only one — and if so, they should be aware of being the only one.

Some teams adopt the convention: client SDK retries; service-to-service calls do not retry (the upstream caller is expected to retry instead). This prevents amplification through the service graph.
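
The amplification through stacked layers is a simple product, sketched below. The helper name worst_case_backend_requests is hypothetical, and each list entry is the maximum number of attempts (first try included) that one layer makes.

```python
import math


def worst_case_backend_requests(attempts_per_layer):
    """Multiply the maximum attempts at each layer of the call path."""
    return math.prod(attempts_per_layer)


print(worst_case_backend_requests([5, 5, 5]))  # 125: SDK, proxy and gateway all retry
print(worst_case_backend_requests([5, 1, 1]))  # 5: only the SDK retries
```

Because the growth is multiplicative, removing retries from a single layer cuts the worst case by that layer's full factor.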

Variants#

| Variant | Mechanism | When it fits |
|---|---|---|
| Fixed delay | Constant wait between retries | Rare; only when load on the dependency is irrelevant |
| Linear backoff | delay = base * attempt | Niche; usually pick exponential instead |
| Exponential backoff (no jitter) | delay = base * 2^attempt | Single-client tools; unsafe for many clients |
| Exponential + full jitter | wait = rand(0, base * 2^attempt) | The default for production |
| Decorrelated jitter | wait = rand(base, prev * 3) | Proposed in the AWS backoff post; smoother under load |
| Adaptive (e.g., gRPC retry throttle) | Disable retries when ratio of retries to requests exceeds a threshold | Service meshes, retry budgets |

Trade-offs#

What good retry policies give you:

  • Masked transient failures. The 1% of requests that hit a TCP reset never surface to users.
  • Higher effective success rate. With one retry and independent failures, a 99% backend looks like 99.99% to clients (1 - 0.01^2 = 0.9999).
  • Operational margin. Brief incidents (a pod restart, a brief network blip) don’t cause user-visible failure.
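
The success-rate gain can be estimated under the assumption that attempts fail independently. That assumption is optimistic during real incidents, where failures are usually correlated. The function name effective_success_rate is hypothetical.

```python
def effective_success_rate(per_attempt_success, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - per_attempt_success) ** attempts


for attempts in (1, 2, 3):
    print(attempts, f"{effective_success_rate(0.99, attempts):.4%}")
```

Two attempts take a 99% backend to about 99.99%, and a third adds far less. Most of the benefit comes from the first retry.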

What good retry policies cost you:

  • Higher peak load. Retries on a degraded backend amplify load exactly when you can least afford it.
  • Latency variance. A retried request pays for both attempts plus the wait between them: for example 100ms + backoff + 100ms, noticeably slower than a single 100ms call.
  • Duplicate writes if idempotency isn’t perfect.
  • Operational subtlety. Retries are easy to write and hard to tune. The default settings are wrong for most workloads.

Common pitfalls#

  • Retrying without backoff. A tight retry loop is a denial-of-service attack against your own dependency.
  • Retrying without jitter. Synchronised retries from many clients form a thundering herd.
  • Retrying non-idempotent operations without an idempotency key. Each retry under timeout creates a duplicate.
  • Retrying 4xx errors. A 400 won’t become 200; you’re spending budget on a bug.
  • Ignoring Retry-After. The server’s hint is more accurate than your computed backoff. Use it.
  • Unbounded retries. “Retry until success” is a great way to hang a thread forever during an outage.
  • Retries at every layer. Five attempts at each of the SDK, gateway, and proxy multiply to 5 × 5 × 5 = 125x amplification. Pick one layer.
  • No retry budget. A retry-storm during a partial outage extends the outage.
  • Not distinguishing connection errors from response errors. A connection refused is safe to retry; a 500 on a non-idempotent POST is not.