Food Delivery (Uber Eats)
Three-sided marketplace: restaurants, couriers, customers. Order routing, ETA, courier dispatch.
Step 1 — Clarify Requirements
Functional
- A customer browses nearby restaurants, builds a cart, places an order.
- The order is sent to the restaurant; once accepted, a courier is dispatched.
- Real-time tracking: customer sees courier en route to the restaurant, then to them.
- ETA computed continuously, accounting for restaurant prep time and traffic.
- Out of scope: payments path (covered by /system-design/payment-system), ratings & reviews, restaurant management dashboard.
Non-functional
- 99.9% availability for the customer-facing path; 99.99% for in-progress orders.
- p99 page load and search under 400 ms.
- p99 order-to-courier-assigned under 30 seconds.
- Real-time location updates with ~5 second freshness.
- Three-sided fairness: a fair share of high-value orders across restaurants and couriers.
Step 2 — Capacity Estimation
- Orders/day: 10 M globally (Uber Eats scale).
- Order QPS: 10 M / 86 400 = ~120 orders/sec average, ~1 K/sec peak (lunch and dinner rushes).
- Active couriers: ~1 M, ping every ~5 s → 200 K location pings/sec.
- Active restaurants: ~1 M; each gets order events occasionally.
- Customers browsing: ~5 M concurrent at peak; 50 K search QPS.
- Storage (orders): 10 M × 5 KB = 50 GB/day, ~20 TB/year.
- Storage (location pings): handled like rideshare; only the latest per courier stored hot, full track archived for completed orders.
This is two-sided rideshare (/system-design/uber) plus a third leg (the restaurant). The restaurant introduces a long pause (prep time) into the delivery flow that doesn’t exist in rideshare.
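The arithmetic behind the estimates above can be re-derived in a few lines. This is a minimal sketch: the function name `estimate_capacity` and its parameter names are illustrative, and it assumes decimal units (1 GB = 10^6 KB).

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(
    orders_per_day: int = 10_000_000,
    active_couriers: int = 1_000_000,
    ping_interval_s: int = 5,
    order_size_kb: int = 5,
) -> dict:
    """Re-derive the back-of-envelope figures (decimal units: 1 GB = 1e6 KB)."""
    storage_gb_per_day = orders_per_day * order_size_kb / 1_000_000
    return {
        "avg_order_qps": orders_per_day / SECONDS_PER_DAY,
        "location_pings_per_sec": active_couriers / ping_interval_s,
        "storage_gb_per_day": storage_gb_per_day,
        "storage_tb_per_year": storage_gb_per_day * 365 / 1_000,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value:,.2f}")
```

The average comes out at roughly 116 orders/sec and the yearly order storage at 18.25 TB, which the list rounds to ~120/sec and ~20 TB.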
Step 3 — System Interface
    // Customer
    GET  /restaurants?lat&lng&q=&cuisine&open_now
    POST /orders                      (cart submission)
    GET  /orders/:id                  (state + courier location)

    // Restaurant
    POST /restaurants/orders/accept   { order_id, prep_eta }
    POST /restaurants/orders/ready    { order_id }   -- food is ready for pickup

    // Courier
    POST /courier/location            { lat, lng, heading, ts }
    POST /courier/accept              { order_id }
    POST /courier/arrived             { trip_leg: pickup|dropoff }
    POST /courier/delivered           { order_id }

    // Lifecycle events streamed to customer + restaurant + courier
    ORDER_PLACED, RESTAURANT_ACCEPTED, COURIER_DISPATCHED, COURIER_ARRIVED_PICKUP,
    COURIER_PICKED_UP, COURIER_EN_ROUTE_DROPOFF, COURIER_ARRIVED_DROPOFF, DELIVERED

Step 4 — High-Level Design
    customer app        restaurant tablet        courier app
         │                      │                     │
         ▼                      ▼                     ▼
    discovery svc        restaurant svc          courier svc
         │                      │                     │
         └──────────────────────┼─────────────────────┘
                                ▼
                       order orchestrator
                        (workflow engine)
                                │
             ┌──────────────────┼──────────────────┐
             ▼                  ▼                  ▼
        payment svc       dispatch svc     ETA svc + traffic
                           (matching)       (Maps overlay)
                                │
                    geo index + courier state

The order orchestrator runs the per-order state machine. The dispatch service is the rideshare-style matcher with the extra wrinkle of timing.
Step 5 — Data Model
Orders:

    table orders
      order_id       uuid PK
      customer_id    uuid
      restaurant_id  uuid
      courier_id     uuid?        // null until matched
      state          enum(placed, accepted, ready, picked_up, en_route, delivered, cancelled)
      items          list<{id, name, qty, price}>
      totals         money
      origin         point        // restaurant location
      destination    point        // delivery address
      placed_at      timestamp
      prep_eta       timestamp?
      delivered_at   timestamp?

Restaurants (geo-indexed):

    table restaurants
      restaurant_id  uuid PK
      geo            point
      cuisines       list<string>
      is_open        bool
      current_load   int          // orders in queue; affects accept latency
      avg_prep_min   int

Couriers (state cache, Redis):

    key: courier:{id}:state      → { online, on_order, current_order_id, last_ping, vehicle_type }
    key: geo:couriers:{city_h3}  → GEOSET of available couriers

Order events (Kafka + OLAP):

    { order_id, event, ts, actor_id, ... }

Step 6 — Detailed Design
The three-leg dispatch problem
A delivery has three legs of time:
- Restaurant accept + prep (5-25 min): from order placement to food being ready.
- Courier to restaurant (3-10 min): from courier assignment to arriving at restaurant.
- Restaurant to customer (5-20 min): from pickup to dropoff.
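If we dispatch the courier at order placement, the courier idles at the restaurant during prep, which is wasteful, although the food goes out the moment it is ready. If we dispatch only when the food is ready, the courier never idles, but the food sits while the courier travels to the restaurant and the customer waits roughly 5 extra minutes.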
The right answer: predict the food-ready time and dispatch a courier so that they arrive at that time. This is the central optimization:

    prep_eta = accepted_at + restaurant.avg_prep_min + load_factor + item_specific_buffer
    courier_eta_to_restaurant = MapsAPI.eta(courier.location, restaurant.location)
    dispatch_courier_at = prep_eta - courier_eta_to_restaurant

Here prep_eta is the predicted food-ready timestamp (the prep durations added to the time the restaurant accepted), and courier_eta_to_restaurant is a travel duration. The dispatcher runs this calculation continuously; when dispatch_courier_at <= now, it sends a job to the matching service.
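The timing rule above can be sketched with plain datetimes. This is an illustration, not the production dispatcher: `PrepEstimate`, `accepted_at`, and the `*_min` field names are hypothetical, the prep clock is assumed to start when the restaurant accepts, and the courier travel time is passed in rather than fetched from a maps API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PrepEstimate:
    accepted_at: datetime      # assumption: prep starts at restaurant accept
    avg_prep_min: float        # restaurant.avg_prep_min
    load_factor_min: float     # extra minutes for the current kitchen queue
    item_buffer_min: float     # extra minutes for slow-to-cook items

    def food_ready_at(self) -> datetime:
        """Predicted food-ready timestamp (prep_eta in the text)."""
        total = self.avg_prep_min + self.load_factor_min + self.item_buffer_min
        return self.accepted_at + timedelta(minutes=total)


def dispatch_courier_at(food_ready_at: datetime,
                        courier_eta_to_restaurant: timedelta) -> datetime:
    """Latest moment to dispatch so the courier arrives as the food is ready."""
    return food_ready_at - courier_eta_to_restaurant


def is_dispatch_due(dispatch_at: datetime, now: datetime) -> bool:
    """True once the order should be handed to the matching service."""
    return dispatch_at <= now
```

With a 12:00 accept, 20 minutes of total prep, and a 7-minute courier trip, the order becomes dispatch-due at 12:13.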
Matching service
Similar to rideshare matching (/system-design/uber), with the added input that the destination (restaurant) is known and fixed:
    nearby_couriers = GEOSEARCH geo:couriers:{city_h3} FROMLONLAT rest.lng rest.lat BYRADIUS 2 km ASC COUNT 20
    filter: online, not on_order, vehicle_type allowed
    score each candidate:
      distance_to_restaurant (lower is better)
      predicted_acceptance_rate
      fairness term (couriers with fewer recent orders get priority)
    pick top 1, send dispatch push

Picking the top candidate per order is the greedy baseline. Batched matching every few seconds beats greedy first-come-first-served: the dispatcher treats all pending dispatch-due orders × all free couriers as a bipartite assignment problem.
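To see why batching can beat greedy matching, here is a toy comparison. The cost function, its weights `W_ACCEPT` and `W_FAIR`, and both function names are assumptions for illustration. The batch solver enumerates every assignment, which is only viable for a handful of orders; a real dispatcher would use a polynomial-time assignment algorithm such as the Hungarian method.

```python
from itertools import permutations
from math import inf

# Hypothetical weights; a real system would tune or learn these.
W_ACCEPT = 2.0   # penalty per unit of predicted rejection probability
W_FAIR = 0.5     # penalty per recent order the courier already received


def courier_cost(distance_km, acceptance_rate, recent_orders):
    """Lower is better: near, likely to accept, not recently favoured."""
    return distance_km + W_ACCEPT * (1.0 - acceptance_rate) + W_FAIR * recent_orders


def greedy_assign(cost):
    """First-come-first-served: each order takes its cheapest free courier.

    cost[i][j] is the cost of giving order i to courier j.
    Assumes there are at least as many couriers as orders.
    """
    taken, total, assignment = set(), 0.0, []
    for row in cost:
        j = min((c for c in range(len(row)) if c not in taken), key=lambda c: row[c])
        taken.add(j)
        assignment.append(j)
        total += row[j]
    return total, assignment


def batch_assign(cost):
    """Exhaustive minimum-cost assignment; only sensible for tiny batches.

    Returns (inf, []) when there are fewer couriers than orders.
    """
    n_orders = len(cost)
    n_couriers = len(cost[0]) if cost else 0
    best_total, best = inf, ()
    for perm in permutations(range(n_couriers), n_orders):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best = total, perm
    return best_total, list(best)
```

For the cost matrix `[[1.0, 2.0], [1.5, 10.0]]`, greedy gives the first order its nearest courier and strands the second order with a total cost of 11.0, while the batch solution swaps the couriers for a total of 3.5.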
Order state machine
    PLACED
      → on restaurant accept:  ACCEPTED { prep_eta }
      → on courier dispatch:   COURIER_ASSIGNED
      → on courier arrival:    COURIER_AT_RESTAURANT
      → on pickup confirmed:   PICKED_UP
      → on dropoff arrival:    COURIER_AT_DROPOFF
      → on delivered:          DELIVERED
    (cancellations possible at most steps with refund logic)

Persisted as a workflow with durable per-state transitions. Each transition emits events on Kafka for downstream (ETA service, notifications, analytics).
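The transitions above map naturally onto a lookup table. The sketch below is in-memory only, with no durability or Kafka emission. The event strings and the `CANCELLABLE` set are assumptions, since the text says only that cancellation is possible "at most steps".

```python
from enum import Enum, auto


class State(Enum):
    PLACED = auto()
    ACCEPTED = auto()
    COURIER_ASSIGNED = auto()
    COURIER_AT_RESTAURANT = auto()
    PICKED_UP = auto()
    COURIER_AT_DROPOFF = auto()
    DELIVERED = auto()
    CANCELLED = auto()


class InvalidTransition(Exception):
    pass


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.PLACED, "restaurant_accept"): State.ACCEPTED,
    (State.ACCEPTED, "courier_dispatch"): State.COURIER_ASSIGNED,
    (State.COURIER_ASSIGNED, "courier_arrival"): State.COURIER_AT_RESTAURANT,
    (State.COURIER_AT_RESTAURANT, "pickup_confirmed"): State.PICKED_UP,
    (State.PICKED_UP, "dropoff_arrival"): State.COURIER_AT_DROPOFF,
    (State.COURIER_AT_DROPOFF, "delivered"): State.DELIVERED,
}

# Assumption: cancellation is allowed only before the food is picked up.
CANCELLABLE = {
    State.PLACED,
    State.ACCEPTED,
    State.COURIER_ASSIGNED,
    State.COURIER_AT_RESTAURANT,
}


def next_state(state: State, event: str) -> State:
    """Return the state after `event`, or raise InvalidTransition."""
    if event == "cancel" and state in CANCELLABLE:
        return State.CANCELLED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(f"{event!r} is not allowed in {state.name}") from None
```

Rejecting unknown `(state, event)` pairs makes duplicate or out-of-order events (a second "delivered", or a pickup before accept) fail loudly rather than corrupt the order.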
Live ETA
The customer sees an ETA that updates continuously:
    ETA = max(prep_eta, courier_eta_to_restaurant) + travel_time_to_customer

    before pickup: pull courier location (Redis), Map ETA to restaurant, Map ETA from restaurant to customer
    after pickup:  pull courier location, Map ETA from current location to customer

Updated every 30-60 seconds. The order orchestrator emits these via a WebSocket / SSE stream to the customer.
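A minimal sketch of the two ETA branches, with every quantity expressed as minutes remaining from now. The function and parameter names are illustrative, and the map lookups are assumed to have been done by the caller.

```python
def eta_minutes(
    picked_up: bool,
    prep_remaining_min: float,
    courier_to_restaurant_min: float,
    restaurant_to_customer_min: float,
    courier_to_customer_min: float,
) -> float:
    """Minutes until delivery, following the before/after-pickup split."""
    if picked_up:
        # Food is on the bike: only the remaining drive matters.
        return courier_to_customer_min
    # Before pickup the later of "food ready" and "courier arrives" gates
    # departure from the restaurant; then add the drive to the customer.
    return max(prep_remaining_min, courier_to_restaurant_min) + restaurant_to_customer_min
```

The `max` term captures that pickup cannot happen until both the food and the courier are at the restaurant, so whichever is slower sets the departure time.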
Discovery (browsing restaurants)
    GET /restaurants?lat&lng
      → geo-radius query against restaurants table
      filter: open_now, cuisines, dietary, distance
      rank: by predicted delivery time, rating, personal preference
      return top 50

A personalized ranking model takes the candidate list, scores it by predicted conversion + freshness + diversity, and returns the top 50. This is the standard newsfeed-style two-stage design: candidate retrieval, then ranking.
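The two stages can be sketched in memory as follows. This is an illustration under stated assumptions: a linear haversine scan stands in for the geo index, sorting by a hypothetical `predicted_delivery_min` field stands in for the personalized ranking model, and the dictionary keys are invented for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def discover(restaurants, lat, lng, radius_km=5.0, limit=50):
    """Stage 1: filter to open restaurants in range. Stage 2: rank and cut."""
    candidates = [
        r for r in restaurants
        if r["is_open"] and haversine_km(lat, lng, r["lat"], r["lng"]) <= radius_km
    ]
    # Stand-in for the ranking model: fastest predicted delivery first.
    return sorted(candidates, key=lambda r: r["predicted_delivery_min"])[:limit]
```

Keeping retrieval cheap and generous, then spending the expensive scoring only on the surviving candidates, is what lets the ranking stage stay within the search latency target.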
Surge / dynamic pricing
When demand exceeds courier supply in a region (Friday 7 PM in a college town), the system applies a surcharge:
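    surcharge = clip(orders_pending / max(couriers_free, 1) - 1, 0, 1.5) * base_delivery_fee

The surcharge is zero while free couriers cover pending orders and is capped at 1.5× the base delivery fee. It is locked at order placement. It compensates couriers (higher per-order earnings during busy times) and dampens demand.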
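A small sketch of the surcharge rule. It makes two assumptions beyond the text: the surcharge should be zero until demand actually exceeds supply (so 1 is subtracted from the ratio before clipping), and a region with zero free couriers is treated as having one to avoid dividing by zero. The function name and the `cap` parameter are illustrative.

```python
def surcharge(orders_pending: int, couriers_free: int,
              base_delivery_fee: float, cap: float = 1.5) -> float:
    """Delivery-fee surcharge for a region, fixed at order placement."""
    # Assumption: guard against an empty courier pool.
    ratio = orders_pending / max(couriers_free, 1)
    # Assumption: only the excess of demand over supply is charged.
    multiplier = min(max(ratio - 1.0, 0.0), cap)
    return multiplier * base_delivery_fee
```

With a base fee of 4.00, a region with 20 pending orders and 10 free couriers adds 4.00, and a region with 100 pending orders hits the cap at 6.00.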
Restaurant tablet UX
Restaurants run an order tablet. The system pushes new orders to it over a long-lived connection (WebSocket / FCM). When the restaurant accepts, it enters a prep_eta or accepts the default. Marking the food ready triggers a courier nudge; marking an issue escalates to support.
Tablet reliability is operationally critical: a restaurant whose tablet is offline misses orders. The system pings the tablet every minute; on missed pings, it falls back to SMS or a phone call to the restaurant.
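The heartbeat fallback can be sketched as a pure function over timestamps. The threshold of two missed pings and the channel names are assumptions; the text specifies only the one-minute ping interval and the SMS / phone fallback.

```python
def notification_channel(last_ping_ts: float, now_ts: float,
                         ping_interval_s: float = 60.0,
                         missed_before_fallback: int = 2) -> str:
    """Choose how to reach the restaurant, given its last successful ping.

    Timestamps are seconds (for example, Unix epoch seconds).
    """
    missed = int((now_ts - last_ping_ts) // ping_interval_s)
    # Assumption: tolerate one missed ping before leaving the tablet channel.
    return "sms_or_phone" if missed >= missed_before_fallback else "tablet_push"
```

Tolerating a single missed ping avoids flapping to SMS on a brief network blip, at the cost of up to two minutes before the fallback engages.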
Latency budget for “place order to assignment” (target 30 s)
    Order placed (durable write):              300 ms
    Restaurant push + accept (human action):   5-60 s
    Dispatch decision (when to send courier):  asynchronous
    Courier nudge (push notification):         5 s
    Courier accept (human action):             5-30 s
    total:                                     ~30-90 s typical

The human-action steps dominate, which is why typical totals overshoot the 30 s target. The system can’t speed those steps up beyond reducing friction in the UI.
Step 7 — Evaluation & Trade-offs
Bottleneck #1: courier supply. Most operational issues in food delivery aren’t engineering — they’re “not enough couriers in this neighborhood at this time”. Engineering can mitigate (surge pricing, batched deliveries when multiple orders are near each other) but can’t create couriers from thin air.
Bottleneck #2: restaurant acceptance latency. No amount of faster code on our side compensates for a restaurant that takes 2 minutes to accept an order. Mitigation: auto-accept for high-trust restaurants; track and surface slow-to-accept restaurants in operational dashboards.
Bottleneck #3: stacked orders for one courier. A courier picking up at one restaurant and delivering to two nearby customers (“stacking”) is a 30%+ efficiency win, but planning it requires predicting time windows accurately. Real implementations stack opportunistically, when a second order arrives during the courier’s pickup-arrival window for the first order.
Alternative I’d push back on: a pure greedy match-when-order-placed approach (rideshare-style). It works, but courier idle time at restaurants is the system’s worst unit-economics outcome. Time-aware dispatch (with food-ready prediction) is the design difference that makes delivery profitable.
What breaks first at 10× scale (100 M orders/day): the order orchestrator’s per-order state. It already runs one workflow per order at present scale; at 10× we’d shard the orchestrator by city. Also: courier scarcity becomes acute in many markets; the system’s design around scarcity (dispatch waves, dynamic radii) becomes the dominant operational concern.
Companies this resembles
Uber Eats, DoorDash, Deliveroo, Zomato, Swiggy, Grubhub, Meituan. Cousins: Instacart (grocery, multi-stop), GoPuff (warehouse-fulfilled), Postmates.
Related systems
- Uber — the rideshare sibling; same dispatch primitives without the restaurant leg.
- Google Maps — ETA and routing substrate.
- Payment System — settles each order across customer, restaurant, courier, platform.
- Proximity Service (Yelp) — the discovery layer’s geo-search substrate.