Ticketmaster (Flash Sale)
Inventory reservation under coordinated burst load: waiting rooms, holds, atomic seat allocation, and the bot-vs-fan arms race.
Step 1 — Clarify Requirements
Functional
- A user browses upcoming events; picks an event; sees a seat map; selects seats; checks out within a hold window; pays; receives a ticket.
- A seat can be sold exactly once. Ever. Double-booking is the worst possible failure.
- An on-sale moment (“Taylor Swift Eras Tour, sale at 10:00 ET sharp”) collapses a year of event browsing into ~10 minutes of stampede.
- Waiting room: between “queue opens” and “you get a turn”, show me my approximate place in line and an honest ETA.
- Hold: once I have seats in my cart, they’re mine for 8 minutes; if I don’t pay, they return to the pool.
- Out of scope: dynamic pricing, secondary-market resale, accessibility-seat handling rules, event-day check-in scanning.
Non-functional
- 99.99% availability for the browse path year-round.
- 99.9% acceptable for the on-sale path during the spike — degradation is OK as long as no double-bookings.
- p99 “click → hold confirmed” under 2 s during peak. The user is staring at a spinner; any longer and cart abandonment climbs.
- Single biggest sale we plan for: 10 M users in the waiting room, 50 K seats available. The mismatch is the whole problem.
- Strong consistency on seat ownership. Weak consistency on every other axis (waiting-room position, displayed inventory) is fine.
Step 2 — Capacity Estimation
A “normal” day:
- ~50 K events listed, ~1 M tickets purchased/day → ~12 purchases/sec average, peaks of a few hundred per second on a Friday afternoon sale.
A flash sale (the design driver):
- Waiting room population: 10 M users in queue for a 50 K-seat tour leg.
- Arrival rate: those 10 M show up within ~5 minutes (waiting-room “doors open” + a coordinated email blast + push notification). That is ~33 K requests/sec sustained for the first 5 minutes, with sharp peaks in the second the doors open.
- Active hold rate: at most 50 K simultaneous holds (one per seat). The remaining 9.95 M users are in queue, not actively reserving.
- Reservation attempts per second (the actual contention): each hold cycle is roughly 5 minutes (a ~3-minute payment step plus abandonment and re-queue). Through a sale’s first hour we cycle each seat ~10–12 times before it sticks. That’s ~50 K × 10 / 3600 ≈ 140 reservation attempts/sec, but every attempt touches the same hot inventory.
- Storage: a 50 K-seat venue has ~50 K seat rows. Even a year of tours is under 100 M rows in the inventory store. Tiny.
- Payment QPS: bounded by seat capacity, not waiting-room size. ~10–20 payments/sec sustained, well within any payment gateway.
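The whole system is shaped by one ratio: arrival rate ÷ admission throughput. When it exceeds 1, the queue grows by (arrival rate − throughput) users every second. At ~33 K arrivals/sec against ~140 reservation attempts/sec, it exceeds 1 by more than two orders of magnitude. Everything we build is there to keep that backlog from blowing up the actual seat-allocation engine.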
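To make the ratio concrete, here is a fluid-model sketch of backlog growth during the doors-open burst. The 33 K arrivals/sec and 5-minute window come from the estimate above. The 100 admissions/sec gate rate is an illustrative assumption, not a figure from this design.

```python
def queue_depth(arrival_rate: float, admission_rate: float, seconds: float) -> float:
    """Backlog after `seconds` of steady arrivals, in a simple deterministic fluid model."""
    # While arrivals outpace admissions, the backlog grows by the difference every second;
    # when admissions keep up, nobody waits.
    return max(0.0, (arrival_rate - admission_rate) * seconds)


def drain_seconds(backlog: float, admission_rate: float) -> float:
    """Time to admit an existing backlog once arrivals have stopped."""
    return backlog / admission_rate


# Doors-open burst from the capacity estimate: ~33 K arrivals/sec for 5 minutes.
ARRIVALS_PER_SEC = 33_333
# ASSUMPTION: illustrative gate rate only.
ADMISSIONS_PER_SEC = 100

backlog = queue_depth(ARRIVALS_PER_SEC, ADMISSIONS_PER_SEC, seconds=300)
print(f"backlog after 5 min: {backlog:,.0f} users")
print(f"time to admit all of them: {drain_seconds(backlog, ADMISSIONS_PER_SEC) / 3600:.1f} h")
```

With these inputs the backlog is roughly 10 M users and would take more than a day to drain. The 50 K seats are gone long before that, which is why the waiting room, not the inventory engine, absorbs the burst.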
Step 3 — System Interface
GET /events/:id
  Returns: event metadata (name, venue, dates, sections)
GET /events/:id/seatmap
  Returns: section layouts, availability summary (count only; per-seat is too hot)
POST /queue/join
  Body: { event_id, user_id }
  Returns: { queue_token, position_estimate, eta_seconds }
GET /queue/status?token=...
  Returns: { position, eta_seconds, ready: false }
  On readiness: { ready: true, session_token (expires in 8 min) }
POST /reserve
  Header: session_token
  Body: { event_id, seat_ids: [...] }
  Returns: { hold_id, expires_at } or { conflict: [seat_id, ...] }
POST /cart/:hold_id/checkout
  Body: { payment_token, ... }
  Returns: { order_id } or { hold_expired: true }
DELETE /cart/:hold_id
  (explicit release)
The waiting room is a strict gate: without a session_token, /reserve returns 403. The session token is a short-lived JWT signed by the queue service.
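The interface only says the session token is a short-lived JWT signed by the queue service. The sketch below shows one standard way to do that: HS256 (HMAC-SHA256) built from the Python standard library. It assumes a shared secret between the queue service and the reservation service. The `event_id` claim is hypothetical; `sub` and `exp` are registered JWT claims. The 480 s lifetime matches the 8-minute window above.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use base64url with the "=" padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint_session_token(secret: bytes, user_id: str, event_id: str,
                       ttl_s: int = 480, now: Optional[float] = None) -> str:
    """Queue service: sign an HS256 JWT once the user reaches the front of the line."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "event_id": event_id, "exp": int(now) + ttl_s}
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(claims).encode())
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_session_token(secret: bytes, token: str, event_id: str,
                         now: Optional[float] = None) -> Optional[dict]:
    """Reservation service: claims if valid, else None (-> 403): forged, expired, or other event."""
    now = time.time() if now is None else now
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode("ascii"),
                            hashlib.sha256).digest()
        # Constant-time comparison so the signature can't be guessed byte by byte.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:  # wrong segment count, bad base64, bad JSON, non-ASCII input
        return None
    if claims.get("exp", 0) <= now or claims.get("event_id") != event_id:
        return None
    return claims
```

Verification needs only the secret and a clock. /reserve can therefore reject ungated traffic without calling back to the queue service. An asymmetric algorithm such as RS256 or ES256 would let the reservation tier verify tokens without holding the signing key.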
Step 4 — High-Level Design
[10M users]
     │
     ▼
CDN edge (rate-limit, bot scoring) ──→ static seatmap, event metadata
     │
     ▼
waiting room service ──→ admission counter (Redis, sliding window)
     │
     │  session token issued
     ▼
application tier (stateless) ──→ reservation service ──→ inventory shard (per event)
     │                                   │
     │                                   ├──→ hold store (Redis, TTL-based)
     │                                   └──→ durable inventory DB
     ▼
payment service ──→ external PSP
     │
     ▼
order service ──→ ticket issuance, email, wallet pass
The critical isolation is at the event level. A hot event's inventory runs on capacity dedicated to that event, while cold events share a pool. A Taylor Swift sale doesn’t share infrastructure with a small-club show. This is the only way to bound blast radius: a single hot event can fully consume its own capacity without affecting the rest of the catalog.
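The diagram routes each event to its inventory capacity but leaves the lookup unspecified. Here is a minimal sketch. It assumes a hypothetical `DEDICATED_SHARDS` placement table that is filled in before each announced on-sale. Every other event is hashed into an illustrative shared pool.

```python
import hashlib

# ASSUMPTION: hypothetical placement table, populated ahead of each announced on-sale.
DEDICATED_SHARDS = {"evt-hot-0001": "hot-pool-1"}
# ASSUMPTION: the shared pool size is illustrative.
SHARED_POOL = [f"shared-{i}" for i in range(8)]


def inventory_shard_for(event_id: str) -> str:
    """Pinned capacity for hot events; a stable hash into the shared pool for everything else."""
    if event_id in DEDICATED_SHARDS:
        return DEDICATED_SHARDS[event_id]
    # Use a stable digest, not Python's hash(), which is randomized per process,
    # so every stateless app server routes the same event to the same shard.
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    return SHARED_POOL[int.from_bytes(digest[:8], "big") % len(SHARED_POOL)]
```

Plain modulo hashing remaps most cold events if the shared pool is resized. Consistent hashing avoids that at the cost of a little more code.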
Step 5 — Data Model
Events (read-heavy, mostly static):
table events
  event_id      uuid PK
  artist        string
  venue_id      uuid
  starts_at     timestamp
  on_sale_at    timestamp
  status        enum(upcoming, on_sale, sold_out, ended)
  seat_map_url  string      // static asset on CDN
Inventory (sharded by event_id, partitioned within event by section):
table seats
  event_id      uuid PK
  seat_id       string CK   // e.g. "A-12-15"
  section       string
  row           string
  number        int
  price_tier    string
  status        enum(available, held, sold)
  hold_id       uuid?
  held_until    timestamp?
  sold_to       uuid?
  sold_at       timestamp?
A single row per seat. The status transition is the consistency-critical operation, and the held_until field is what enables TTL-based hold expiry.
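The status column is a small state machine, and the sketch below spells out its legal edges. It is an illustration, not the source's code. `SOLD` is treated as terminal here; that is an assumption, because the design does not say what a refunded order does to the seat row.

```python
from enum import Enum


class SeatStatus(Enum):
    AVAILABLE = "available"
    HELD = "held"
    SOLD = "sold"


# Legal transitions for one seat row. ASSUMPTION: SOLD is terminal in this sketch.
ALLOWED = {
    (SeatStatus.AVAILABLE, SeatStatus.HELD),   # /reserve succeeded
    (SeatStatus.HELD, SeatStatus.SOLD),        # checkout payment succeeded
    (SeatStatus.HELD, SeatStatus.AVAILABLE),   # hold expired or was explicitly released
}


def check_transition(current: SeatStatus, target: SeatStatus) -> None:
    """Raise if a write would move a seat along an edge the design does not allow."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"illegal seat transition {current.value} -> {target.value}")
```

`available → sold` is deliberately missing: a seat can only be sold through a live hold. That is what ties checkout to the hold window.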
Holds (Redis primary, durable log secondary):
hold:{hold_id}              → { user_id, event_id, seat_ids, expires_at }   TTL 8 min
user_active_hold:{user_id}  → hold_id   (enforce one hold per user)
Waiting room (Redis sorted set per event):
queue:{event_id}            → ZSET { user_id : enqueue_timestamp }
queue:{event_id}:counter    → monotonic admission counter
Orders (durable, append-only):
table orders
  order_id      uuid PK
  user_id       uuid
  event_id      uuid
  seat_ids      list
  total         money
  payment_ref   string
  status        enum(pending, paid, failed, refunded)
  created_at    timestamp
Step 6 — Detailed Design
The waiting room
When on_sale_at - 30 min arrives, the system opens the queue. Each user submits to /queue/join; the service appends to a sorted set with their request timestamp. They get a queue_token (an opaque, signed identifier of their position).
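A Redis sorted set gives the queue its ordering. As a sketch, the class below models `queue:{event_id}` in memory so the semantics can be run and tested. The Redis commands it imitates (ZADD NX, ZRANK, ZPOPMIN) are named in comments; using exactly those commands is an assumption about the implementation, not something the source states.

```python
import bisect
from typing import Optional


class WaitingRoom:
    """In-memory stand-in for queue:{event_id} (member = user_id, score = enqueue timestamp)."""

    def __init__(self) -> None:
        self._order: list = []   # (enqueue_ts, user_id), kept sorted; ties break on user_id like a ZSET
        self._score: dict = {}   # user_id -> enqueue_ts

    def join(self, user_id: str, enqueue_ts: float) -> None:
        # Like ZADD NX: a refresh or second tab keeps the original place in line.
        if user_id in self._score:
            return
        self._score[user_id] = enqueue_ts
        bisect.insort(self._order, (enqueue_ts, user_id))

    def position(self, user_id: str) -> Optional[int]:
        # Like ZRANK: 0-based rank, or None if not queued (or already admitted).
        if user_id not in self._score:
            return None
        return bisect.bisect_left(self._order, (self._score[user_id], user_id))

    def admit(self, n: int) -> list:
        # Like ZPOPMIN with a count: hand the n earliest arrivals to the token issuer.
        admitted, self._order = self._order[:n], self._order[n:]
        for _, user_id in admitted:
            del self._score[user_id]
        return [user_id for _, user_id in admitted]
```

The exact rank stays server-side. What the user sees is derived from it, as described next.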
The waiting room admits users to the active shopping path at a controlled rate. The admission rate is not a constant; it’s a function of current cart-flow throughput:
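admission_rate = (seat_availability_estimate / avg_session_minutes) × safety_factor
For a 50 K-seat sale with 5-minute average sessions, the baseline is 50 K / 5 min = 10 K admissions per minute, about 167/sec. The safety factor (~0.7) trims that to ~117/sec, which keeps roughly 35 K shoppers in flight against 50 K seats. It keeps us from over-admitting and creating frustration when users arrive at an already-empty seatmap. As seats sell, seat_availability_estimate falls and the admission rate falls with it.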
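The admission formula can be evaluated directly. The second function applies Little's law (concurrent sessions = admission rate × session length) to show how many shoppers the gate keeps active at once. Function names are illustrative; the 0.7 default is the safety factor quoted above.

```python
def admission_rate_per_sec(seats_available: int, avg_session_minutes: float,
                           safety_factor: float = 0.7) -> float:
    """(seat_availability_estimate / avg_session_minutes) x safety_factor, converted to per second."""
    per_minute = seats_available / avg_session_minutes * safety_factor
    return per_minute / 60.0


def steady_state_shoppers(rate_per_sec: float, avg_session_minutes: float) -> float:
    # Little's law: concurrent sessions = arrival rate x time each session lasts.
    return rate_per_sec * avg_session_minutes * 60.0


rate = admission_rate_per_sec(50_000, 5)     # about 117 admissions/sec
in_flight = steady_state_shoppers(rate, 5)   # about 35 K concurrent shoppers
```

Read this way, the safety factor is simply the fraction of remaining seats the gate lets users compete for at any moment.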
Position display is intentionally approximate. We never tell a user “you are #4 837 261 in line” — that’s information warfare. We tell them “your wait is approximately 12 minutes” with a coarse-grained bucket, and we always under-promise a bit. The exact position is derivable from the sorted set but never surfaced raw.
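Turning a raw rank into a coarse, slightly pessimistic wait estimate takes only a few lines. In this sketch the 20% padding and 5-minute buckets are assumptions chosen for illustration. The admission rate would come from the gate's current setting.

```python
import math


def eta_bucket_minutes(position: int, admission_rate_per_sec: float,
                       pad: float = 1.2, bucket_min: int = 5) -> int:
    """Coarse wait estimate shown instead of the raw ZSET rank. ASSUMPTION: pad and bucket size."""
    raw_seconds = position / admission_rate_per_sec
    padded_seconds = raw_seconds * pad   # under-promise: show a slightly longer wait
    # Round up to the next bucket so the displayed number never undershoots the padded estimate.
    return max(bucket_min, math.ceil(padded_seconds / 60 / bucket_min) * bucket_min)
```

At 100 admissions/sec, rank 70 000 maps to "about 15 minutes" rather than an exact number that changes every few milliseconds.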
Atomic seat reservation
The reservation service receives POST /reserve { seat_ids: ['A-12-15', 'A-12-16'] } with a valid session token. It must atomically transition each seat from available to held or return a conflict listing the seats that were already gone.
Two-phase implementation:
- Optimistic phase (Redis): a Lua script checks the status field of each seat:{event}:{seat_id} hash, sets it to held, sets held_until = now + 480 s, and sets hold_id. The script runs atomically across all seats requested; if any seat fails the check, none change.
- Durable phase (async): the hold is journaled to the inventory DB within ~200 ms. If the durable write fails, the Redis state is rolled back via a compensating Lua script.
Why Redis first: the Lua script runs in microseconds on a single instance and handles thousands of attempts/sec on one core. The DB write is durability insurance, not the critical path.
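-- KEYS = seat:{event}:A-12-15, seat:{event}:A-12-16
-- ARGV = hold_id, user_id, ttl_seconds
local now = tonumber(redis.call('TIME')[1])  -- Redis server clock, so every app server agrees on "now"
-- Pass 1: validate every seat before touching any; an expired hold counts as available.
for _, k in ipairs(KEYS) do
  local status = redis.call('HGET', k, 'status')
  if status == 'held' and (tonumber(redis.call('HGET', k, 'held_until')) or 0) < now then
    status = 'available'  -- lazy-evict the expired hold
  end
  if status ~= 'available' then
    return { 'conflict', k }
  end
end
-- Pass 2: claim every seat; no other command runs on this primary until the script returns.
for _, k in ipairs(KEYS) do
  redis.call('HSET', k, 'status', 'held', 'hold_id', ARGV[1], 'held_until', now + tonumber(ARGV[3]))
end
return 'ok'
Redis executes the whole script atomically on the shard's primary. No other command runs until it returns, so no concurrent client can observe or claim a seat between the check and the write. The available → held transition therefore has a single writer, and a double-allocation can’t happen for seats that live on the same primary.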
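The same all-or-nothing semantics can be modelled in plain Python for testing reservation logic without a Redis server. In this sketch a single lock stands in for Redis running one script at a time. Expired holds count as available, per the read-time validation rule under Hold expiry. `SeatShard` is a hypothetical name.

```python
import threading
import time
from typing import Optional


class SeatShard:
    """In-memory model of one Redis primary: one lock = one script executing at a time."""

    def __init__(self, seat_ids: list) -> None:
        self._lock = threading.Lock()
        self._seats = {s: {"status": "available", "hold_id": None, "held_until": 0.0}
                       for s in seat_ids}

    def reserve(self, seat_ids: list, hold_id: str, ttl_s: float,
                now: Optional[float] = None) -> tuple:
        now = time.time() if now is None else now
        with self._lock:
            # Pass 1: validate every requested seat before changing any of them.
            for s in seat_ids:
                seat = self._seats[s]
                expired = seat["status"] == "held" and seat["held_until"] < now
                if seat["status"] != "available" and not expired:
                    return ("conflict", s)
            # Pass 2: claim them all; nothing else can interleave while the lock is held.
            for s in seat_ids:
                self._seats[s].update(status="held", hold_id=hold_id, held_until=now + ttl_s)
            return ("ok", None)

    def status(self, seat_id: str) -> str:
        return self._seats[seat_id]["status"]
```

Checking everything before writing anything avoids the partial-write rollback path entirely. A conflict on the second seat leaves the first seat untouched.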
Inventory sharding
Sized for the worst case, in which the whole ~33 K rps arrival burst reaches inventory at once, a 50 K-seat venue is too hot for a single Redis instance. We shard the inventory inside the event by section:
shard 1: sections A-G     (10 K seats)
shard 2: sections H-N     (10 K seats)
shard 3: sections O-T     (10 K seats)
shard 4: sections U-Z     (10 K seats)
shard 5: sections AA-GG   (10 K seats)
Each shard handles ~7 K rps (33 K / 5) independently. A reservation that spans sections on two different shards has to coordinate both; this is rare, since most users pick within one section. We use a saga with explicit compensating releases rather than 2PC.
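A minimal saga for a cross-shard reservation is sketched below. It assumes hypothetical per-shard `try_hold` and `release` operations, for example the Lua reserve script and its compensating script. Shards are held one at a time; if any step fails, the already-completed steps are undone in reverse order.

```python
from typing import Callable

# ASSUMPTION: hypothetical per-shard operations. try_hold returns True only if every
# seat in that shard was claimed; release should clear only seats still carrying hold_id.
TryHold = Callable[[str, list, str], bool]
Release = Callable[[str, list, str], None]


def reserve_across_shards(by_shard: dict, hold_id: str,
                          try_hold: TryHold, release: Release) -> bool:
    """Saga: hold shard by shard; on failure, compensate completed steps in reverse order."""
    done = []
    for shard, seats in by_shard.items():
        if not try_hold(shard, seats, hold_id):
            for prev in reversed(done):
                release(prev, by_shard[prev], hold_id)   # compensating action
            return False
        done.append(shard)
    return True
```

Unlike 2PC, other users can briefly see the first shard's seats as held before the saga unwinds. For seats that is harmless: a held seat just looks unavailable for a moment.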
Hold expiry
A hold has a held_until field set at reservation time. Three mechanisms enforce it:
- Redis TTL on the hold:{hold_id} key: the hold record is automatically gone after 8 min (the seat hashes still say held until one of the next two mechanisms resets them).
- Scheduled scanner every 30 s checks for status='held' AND held_until < now in the durable inventory and resets those seats to available (covers any Redis evictions).
- Read-time validation: any read of a held seat that finds held_until < now lazy-evicts the hold.
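The durable-side scanner is a single conditional update. The sketch uses SQLite purely as a stand-in for the inventory DB, with a subset of the `seats` columns. The `status = 'held'` guard means the scan can never undo a seat that checkout has already moved to sold.

```python
import sqlite3


def release_expired_holds(conn: sqlite3.Connection, now: float) -> int:
    """Backstop scanner: return expired holds in the durable store to the pool; returns rows reset."""
    cur = conn.execute(
        "UPDATE seats SET status = 'available', hold_id = NULL, held_until = NULL "
        "WHERE status = 'held' AND held_until < ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    # ASSUMPTION: simplified schema; held_until stored as epoch seconds.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seats (event_id TEXT, seat_id TEXT, status TEXT, "
                 "hold_id TEXT, held_until REAL, PRIMARY KEY (event_id, seat_id))")
    return conn
```

Running it every 30 s keeps the worst-case delay before an abandoned seat returns to the pool at about one scan interval past the TTL.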
The reconciliation job between Redis and durable storage runs every minute to repair any drift.
Checkout
Once the user submits payment, the hold transitions to sold. This must be irreversible from the user’s perspective: a payment confirmation followed by “actually, your seats are gone” is the worst UX outcome.
POST /cart/:hold_id/checkout
  load hold from Redis (must exist, not expired)
  call payment service (synchronous, idempotent on hold_id)
  on payment.success:
    Lua: for each seat in hold, transition held → sold
         (only if the seat still carries this hold_id; idempotent on hold_id)
    durable write: insert order row, update seats
    emit ticket-issuance event
    return order_id
  on payment.failure:
    keep hold alive (let user retry); or release on user action
The payment call is the longest piece of the checkout path (~1 s). It’s idempotent on hold_id, so client retries don’t double-charge. If the payment succeeds but our subsequent durable write fails (rare but possible), a refund-safety job reconciles payment events against order events nightly.
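The two idempotency properties of checkout can be sketched together. First, the charge is keyed on hold_id, so a retry replays the first result. Second, held → sold succeeds only for seats that still carry this hold_id. `FakePSP` is a hypothetical stand-in, not any real processor's API, and the seat dictionaries simplify the Redis hashes.

```python
from dataclasses import dataclass, field


@dataclass
class FakePSP:
    """Hypothetical payment stand-in: a repeated idempotency key replays the first result."""
    charges: dict = field(default_factory=dict)

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = f"ch_{len(self.charges) + 1}"  # pretend a real charge
        return self.charges[idempotency_key]


def mark_sold(seats: dict, seat_ids: list, hold_id: str) -> bool:
    """held -> sold, idempotent on hold_id; False if the hold was lost to someone else."""
    for s in seat_ids:
        seat = seats[s]
        if seat["hold_id"] != hold_id or seat["status"] not in ("held", "sold"):
            return False
    for s in seat_ids:
        seats[s]["status"] = "sold"
    return True


def checkout(psp: FakePSP, seats: dict, hold_id: str, seat_ids: list, amount_cents: int) -> str:
    charge_id = psp.charge(idempotency_key=hold_id, amount_cents=amount_cents)
    if not mark_sold(seats, seat_ids, hold_id):
        # Paid but no longer owns the seats: surface it for the refund-safety path.
        raise RuntimeError(f"charge {charge_id} needs refund: hold {hold_id} lost its seats")
    return charge_id
```

Checking hold ownership after the charge turns the rare "paid but expired" race into an explicit refund case instead of a silent double-allocation.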
Bot defenses
This is the arms race that makes Ticketmaster Ticketmaster.
- CDN-level rate limiting keyed on IP and on browser-fingerprint. The first wave of bots dies here; sophisticated ones use residential-IP rotation and pass.
- Proof-of-work or invisible CAPTCHA at queue-join time. Adds 1–3 s of CPU work the bot must spend per request. Real users barely notice; bots burn meaningful compute.
- Behavior scoring: mouse-movement entropy, time-on-page before clicking, prior-event history. Low-score sessions face heavier challenges (visible CAPTCHA, longer admission delay).
- Account aging: an account created 30 minutes ago that joins 15 different sale queues has a low score by definition.
- Per-account purchase limits: 4 tickets per account per event, with backend dedup across payment cards / shipping addresses to catch obvious sybils.
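The list names proof-of-work without a scheme. A hashcash-style partial-preimage search on SHA-256 is one common choice, sketched below. The challenge format and difficulty are assumptions: in practice the server would issue a fresh challenge bound to the event and session. Difficulty would be tuned so a typical device spends the 1–3 s quoted above.

```python
import hashlib


def _leading_zero_bits(digest: bytes) -> int:
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one SHA-256, however long the client had to search."""
    if not 0 <= nonce < 2**64:
        return False
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return _leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force search; expected cost is about 2**difficulty_bits hashes."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

Each extra difficulty bit doubles the expected client work while verification stays at one hash. The cost falls almost entirely on whoever is joining thousands of queues.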
None of these stop a determined adversary; all of them raise the cost enough that the bot economics get marginal. The honest framing is: we make scalping less profitable, not impossible.
Latency budget
queue admission → /reserve: sub-100 ms (cached session token)
/reserve roundtrip:
  edge / TLS:                        30 ms
  application tier hop:               5 ms
  reservation service → Redis Lua:    2 ms
  durable journal (async):           not on critical path
  response back:                     30 ms
  total:                             ~70–120 ms p99
Far inside the 2 s budget; the spare time is what swallows network jitter on slow mobile connections.
Step 7 — Evaluation & Trade-offs
Bottleneck #1: the hot inventory shard. A single section becomes a contention hotspot when its first row is the most-coveted real estate. Even with intra-event sharding, “section A row 1” is a single key. Mitigation: lazy availability display (don’t tell users a specific row is available; tell them “front section available”) so 100 K users don’t all hammer the same key in the same millisecond. The reservation service serializes contention, so throughput is bounded by Redis's single-threaded performance (~100 K ops/sec per primary; more shards add headroom).
Bottleneck #2: payment-gateway throughput. External PSPs (Stripe, Adyen, regional processors) have per-merchant rate limits in the low thousands per second. A sold-out 50 K-seat sale completes in ~30 min, so ~30 payments/sec — comfortable. A truly extreme event (multi-venue tour going on sale globally) can exceed PSP limits; we negotiate burst quotas in advance and have a payment-queue fallback that delays charging while keeping the hold alive.
Bottleneck #3: waiting-room state size. 10 M ZSET entries per event in Redis is fine; 50 simultaneous mega-events with 10 M each is 500 M entries. We allocate dedicated queue-cluster capacity per major event rather than mixing.
Bottleneck #4: bot adversaries. Not a technical bottleneck so much as an economic one. Every defensive measure (CAPTCHA, IP scoring, account-aging) trades user friction for bot resistance. The honest evaluation: we will never win the arms race outright. The goal is making the bot economics worse than ticket prices, so any individual scalper’s margin is thin. Verified-fan presales (allowlists of known accounts) are the only mechanism that genuinely changes the shape, at the cost of audit complexity.
Alternative I’d push back on: real-time per-seat availability rendered on the seatmap during a flash sale. Customers ask for it; UX designers prototype it. At 10 M concurrent viewers on the same seatmap, the broadcast cost (every seat-status change pushed to every viewer) is unbounded. The hybrid we use — show aggregate availability (“12 left in Section A”) and reveal individual seats only after the user enters the active selection flow — is the right trade. It frustrates a small fraction of power-users; the majority get a faster, more reliable experience.
What breaks first at 10× scale (a 500 K-seat global tour going on sale in one moment): the waiting-room admission service. The current per-event sorted set hits memory and write-throughput limits beyond ~50 M entries. The fix is to shard the queue itself by region and run a distributed admission protocol — but that complicates the fairness story (does a US user have any chance against an EU user with the same enqueue time?). The product-policy answer probably becomes “regional allocations” rather than “global free-for-all.”
Companies this resembles
Ticketmaster (the canonical), AXS, See Tickets, DICE, Eventbrite (lighter end of the same shape), and the queue technology overlaps heavily with high-demand product launches (Supreme, Nintendo Switch restocks, GPU drops) and IPO retail subscription windows.
Related systems
- Rate Limiter — the front-line defense layer for the queue-join endpoint.
- Distributed Cache — Redis is the linchpin of both the holds plane and the waiting room.
- Payment System — checkout depends on this design directly; idempotency and reconciliation are co-designed.