Uber
Driver / rider matching, real-time location, surge pricing, payments, fraud detection.
Step 1 — Clarify Requirements#
Functional
- A rider requests a ride from point A to point B → system finds and dispatches a nearby driver.
- Drivers stream their location to the server; riders see nearby drivers on a live map.
- Trip lifecycle: request → match → en route to pickup → in trip → drop-off → payment.
- Surge pricing when demand outpaces supply in a region.
- Fare estimation before request.
- Out of scope: ratings & reviews, driver onboarding, the routing engine itself (see
/system-design/google-maps), pool / shared rides.
Non-functional
- 99.99% availability for the rider request path.
- p99 match latency under 3 seconds (request to driver assigned).
- 25 M trips/day globally; ~5 M active drivers; ~100 M monthly riders.
- Strong consistency for payments and trip state; eventual consistency acceptable for driver location and surge.
Step 2 — Capacity Estimation#
- Active drivers: 5 M streaming location every 4 seconds → 1.25 M location writes/sec.
- Active riders polling map: ~1 M concurrent at peak → 200 K read QPS for nearby-driver queries.
- Trip requests: 25 M/day ≈ 300 trips/sec average, 2 K/sec peak.
- Storage (location pings): 1.25 M writes/sec × 86 400 × 60 B ≈ 6 TB/day raw. Retain only the latest per driver for matching; archive trip routes only.
- Storage (trips): 25 M × ~10 KB = 250 GB/day, 90 TB/year of trip records.
- Surge calc: city-grid cells × 30-second windows; aggregation runs continuously over the location stream.
The whole problem is a geo-indexed lookup under continuous updates — a write-heavy spatial index for drivers, a read-heavy nearest-neighbor query for riders.
Step 3 — System Interface#
// DriverPOST /driver/location { driver_id, lat, lng, heading, ts }POST /driver/online { driver_id }POST /driver/accept { trip_id }
// RiderPOST /rider/estimate { origin, destination } → { fare, eta }POST /rider/request { origin, destination, payment_method } → { trip_id }GET /rider/nearby ?lat&lng → [ {driver_id, lat, lng}, ... ]GET /trip/:id (state + driver location)
// Lifecycle (server-emitted via WebSocket to both rider and driver)TRIP_MATCHED, DRIVER_ARRIVING, TRIP_STARTED, TRIP_ENDED, PAYMENT_SETTLEDTrip requests are idempotent on (rider_id, request_id) — if the rider’s phone retries on a flaky network, the server must not create two trips.
Step 4 — High-Level Design#
driver app rider app │ │ ▼ ▼ location ingest ──→ Kafka ──→ geo index (Redis Geo + city shard) │ ▲ │ │ │ ▼ │ │ nearby-drivers service │ │ │ ▼ │ ▼ trip state ◀── trip orchestrator ──┴── matching service ◀── request service │ ▲ ▼ │ payments + ledger surge engine (city-grid demand vs supply)The hot loop on request is: rider → matching service → query geo index for K nearest drivers → score / pick one → dispatch via push notification → driver accepts → trip orchestrator binds state.
Step 5 — Data Model#
Geo index (Redis with GEOADD, sharded by city / H3 hex):
key: drivers:city:{city_id} → GEOSET (lng, lat, driver_id)GEORADIUS drivers:city:42 BYRADIUS lng lat 3 km COUNT 20Sharding by city (or an H3 hex of resolution 6, ~1 km diameter) keeps each shard’s working set small and reads cheap. Cross-city queries are rare (no rider in Manhattan needs to see Brooklyn drivers).
Trips (relational, partitioned by trip_id):
table trips trip_id uuid PK rider_id uuid driver_id uuid null until matched origin { lat, lng, addr } destination { lat, lng, addr } state enum(requested, matched, en_route_pickup, in_trip, completed, cancelled) fare money null until completed surge_factor decimal requested_at timestamp ...Driver state (Redis, hot):
key: driver:{id}:state → { online, on_trip, current_trip_id, last_ping }Trip events (append-only Kafka topic, persisted to OLAP):
{ trip_id, event: 'matched', ts, driver_id, ... }Surge cells (Redis):
key: surge:{city}:{h3_cell}:{minute_bucket} → { demand: N, supply: M, factor }Step 6 — Detailed Design#
Driver location ingest#
Drivers send a location ping every 4 seconds (configurable; less frequent if stationary). Pings go to the nearest regional gateway over a long-lived connection.
ping → gateway → kafka topic 'driver-pings' (partition by driver_id) ↓ consumer updates Redis GEOSET drivers:city:{city} ↓ consumer updates driver:{id}:last_pingWe absorb 1.25 M writes/sec by discarding stale pings: if a ping arrives older than the latest seen for that driver, drop it. Kafka per-driver ordering plus consumer offset tracking gives us this.
Nearest-driver query#
A rider request lands at the matching service:
1. Look up rider's home city: city_id.2. GEORADIUS drivers:city:{city_id} BYRADIUS lat lng 3 km COUNT 20.3. Filter: online, not on_trip, vehicle_type matches.4. Score each candidate: score = w1 / eta + w2 * acceptance_rate + w3 / recent_cancellations5. Pick top 1, send dispatch push to that driver.6. Wait for accept (timeout 15s). If declined or timed out, try next.The matching algorithm is one of the most-tuned pieces of any rideshare backend. Real-world variants include batched matching (every 5 seconds, optimize over the bipartite graph of pending requests × free drivers) — see below.
Surge pricing#
A separate streaming pipeline aggregates demand and supply per (city, H3 cell, 30-second bucket):
demand[city][cell][bucket] = count of requests in windowsupply[city][cell][bucket] = count of online unmatched driverssurge_factor = clip(demand / supply, 1.0, 5.0)The surge factor is eventually consistent and is shown in the rider app on the fare estimate. Once a rider accepts at surge factor X, the factor is locked into the trip record at request time and is not revised — payments must be predictable.
Trip orchestration#
The trip state machine is implemented as a per-trip workflow (think Temporal or an in-house orchestrator):
REQUESTED → MATCHED → DRIVER_EN_ROUTE → ARRIVED → IN_TRIP → COMPLETED → PAID ↓ CANCELLED (rider or driver)State transitions are persisted; events emit on Kafka. The orchestrator is the source of truth for “what’s the current trip state” — clients subscribe and render accordingly.
Payment path#
When the trip ends:
1. Compute final fare (distance × per-km + time × per-min + surge × base).2. Idempotent ledger write: debit rider's saved payment, credit Uber + driver split.3. Async settlement with the payment processor (see /system-design/payment-system).4. If charge fails, attempt fallback method; mark trip as `payment_pending` and notify.The double-entry ledger guarantees no money is created or lost in the accounting. Real settlement (bank rails) happens hours later.
Driver / rider matching at edge cases#
- No drivers within radius: expand radius progressively (3 → 5 → 8 km); if still nothing after 30 seconds, return “no drivers available” to the rider.
- Driver loses connectivity mid-trip: trip orchestrator detects stale ping (>30 s), pings the rider for confirmation, and either continues with a partially-tracked route or marks as completed at last-known location.
- Multiple concurrent requests target the same driver: matching service uses Redis SETNX
driver:{id}:lockto claim a driver atomically. Losing requests fall back to next-best driver.
Step 7 — Evaluation & Trade-offs#
Bottleneck #1: location write throughput on the geo index. 1.25 M writes/sec into Redis is achievable only by sharding by city (or H3 hex) and accepting that all in-city writes hit one shard. Hot cities (NYC, SF, Mumbai) get sub-sharded by neighborhood-level H3 cells.
Bottleneck #2: matching latency at high demand. During surge, the matching service is the bottleneck — a city with 50 K pending requests/min and bipartite optimization runs into solver budget limits. Real systems degrade to greedy matching during extreme spikes and accept a quality dip.
Bottleneck #3: payment processor throughput. Even at 25 M trips/day, the payment processor can throttle. Async settlement with retry-with-backoff and a durable outbox prevents the trip path from blocking on payment.
Alternative I’d push back on: storing all historical location pings forever for the live geo index. A ping older than the current one is useless for matching; archive to OLAP for analytics, not the live store. Mixing “where are drivers now” and “where has driver X been” in the same index turns hot writes into hot writes against cold data.
What breaks first at 10× scale (250 M trips/day): the trip orchestrator’s per-trip state. A naive design (one row + state column) becomes a hot-write hellscape; the workflow engine pattern (per-trip durable timer + event log) scales horizontally. Plan for that re-architecture early.
Companies this resembles#
Uber, Lyft, Ola, Didi, Grab, Bolt. Cousins: Lime / Bird (geo-indexed vehicles without a driver dispatch), food delivery (see /system-design/ride-hailing-eats).
Related systems#
- Google Maps — the routing and ETA engine Uber depends on (or builds in-house equivalents of).
- Food Delivery (Uber Eats) — three-sided variant of the same dispatch problem.
- Proximity Service (Yelp) — same geo-index, read-only queries.
- Payment System — substrate for the fare collection path.