Proximity Service (Yelp) — System Design

Step 1 — Clarify Requirements#

Functional

Given a location and radius (or a viewport), return nearby businesses matching a query (coffee, dentist, vegan ramen).
Filter by category, rating, open-now, price tier.
Sort by distance, rating, or relevance.
Business profile pages (name, hours, photos, reviews) — but the read path for those is uninteresting and out of scope.
Out of scope: write path for reviews, photo upload, business onboarding.

Non-functional

99.9% availability.
p99 search latency under 300 ms.
100 M MAU, ~10 K searches/sec peak.
~50 M businesses globally.
Updates to business data are infrequent (hours, address); the index can refresh nightly.

Step 2 — Capacity Estimation#

Businesses indexed: 50 M.
Search QPS: 10 K/sec peak, ~3 K/sec average.
Per-search work: query the geo index for businesses within radius, then rank top 20. With ~100 candidates per typical urban query, ranking cost is small.
Storage: business metadata averages 5 KB (name, address, hours, categories, summary). Photos and full reviews are out of band (blob store). Total: 50 M × 5 KB = 250 GB of structured data. Tiny — fits comfortably on a single shard, but partitioning by region helps with locality.
Geo index size: lat/lng + business_id + category bitset per row ≈ 40 B; 50 M × 40 B = 2 GB. In-memory across the fleet.

The system is read-heavy and modest in scale. The interesting parts are the spatial query, the relevance ranking, and the freshness story.

Step 3 — System Interface#

GET /search
    ?q=<query>                  (e.g., "coffee", optional)
    &lat=<float>&lng=<float>
    &radius_m=<int>             (or &viewport=...)
    &category=<id>              (optional)
    &open_now=<bool>
    &min_rating=<float>
    &sort=<distance|rating|relevance>
    &limit=20
    &cursor=<opaque>
→ { results: [...], next_cursor }

GET /businesses/:id            (full profile)

The search response includes only enough for a list (name, photo URL, rating, distance); detail is fetched on tap.

Step 4 — High-Level Design#

                       ┌── photo + review CDN
                       │
client → LB → API ──┬── /search ──→ search service ──→ geo index (sharded by region)
                    │                    │                       │
                    │                    ▼                       ▼
                    │             ranker / scorer       business store (Postgres / KV)
                    │                    │
                    │                    └── reads metadata for candidates
                    │
                    └── /businesses/:id ──→ business store

The geo index and the business store can be one and the same (a Postgres database with PostGIS, partitioned by region), or split (an Elasticsearch / OpenSearch geo index + a relational store for the truth). Both shapes work at this scale; the trade-off is below.

Step 5 — Data Model#

Businesses:

table businesses
  business_id   uuid     PK
  name          string
  geo           point     // lat, lng
  geohash       string    // for non-PostGIS implementations
  categories    array<string>
  rating        decimal
  review_count  int
  hours         jsonb     // for "open now" filter
  metadata      jsonb

Geo index — several viable approaches:

PostGIS GiST index on geo (point) — proper R-tree / GiST, accurate distances, native to Postgres. Best when the business store and geo index are co-located. Scales to 50 M rows easily.

Geohash-based string index — convert lat/lng to a string like 9q8yyk, index as B-tree. Cheap in any KV store. Querying “within radius” requires querying multiple neighboring geohashes — coarse but very simple.

A third option, often better at scale: quadtree or H3 hex indexes. Each business lives in an H3 cell at resolution ~8 (~0.7 km²); a search expands outward, ring by ring, until it has enough candidates.

Step 6 — Detailed Design#

The geo query#

Most “search within radius” queries are answered as “search within bounding box, then post-filter”:

1. Compute bbox: (lat ± dlat, lng ± dlng) where dlat, dlng approximate the radius.
2. SELECT * FROM businesses WHERE
     ST_DWithin(geo, ST_Point(lat,lng)::geography, radius)
     AND categories @> [category]
     AND CASE WHEN open_now THEN check_hours(hours, now())
   LIMIT 200;
3. Rank the 200 candidates by ranker.
4. Return top 20.

PostGIS uses the GiST index to prune the bbox before the precise ST_DWithin filter. For 50 M rows partitioned by region, each query touches one partition and returns in 5-20 ms.

Why H3 / quadtree for huge cities#

Manhattan has ~100 K businesses in 60 km². A naive radius query returns hundreds of thousands of candidates. H3 indexing lets us bound candidate count by cell hit, not by area:

center_cell = h3_index(lat, lng, resolution=8)
candidate_cells = h3_grid_ring(center_cell, k=1)  // 7 cells
candidates = SELECT * FROM businesses WHERE h3_cell IN candidate_cells AND ...
if len(candidates) < limit:
    expand to k=2 (19 cells), and so on

Adaptive expansion gives k-nearest semantics — return the closest 20, regardless of how dense the area is.

Ranking#

Sort by relevance is the interesting one; distance and rating are trivial.

Relevance combines:

Lexical match: business name and description match the query.
Distance: closer is better, exponentially decaying.
Quality: rating × log(1 + review_count).
Personalization: user’s category preferences (from past clicks), if logged in.
Recency / freshness: very recent positive reviews bump a business.

Implementation: a lightweight scorer (linear model or boosted trees) over the candidate set. Latency budget for ranking ~50 ms over 200 candidates.

Caching#

Each search query has a natural cache key: (query, geohash_prefix, filters, sort). Identical queries from the same neighborhood hit the cache (Redis, 60-second TTL). Hit rate is modest (~20%) because users tweak filters often; cache mostly absorbs viral queries (“happy hour” on a Friday evening in a popular area).

Freshness#

Business data changes slowly: hours updated by owners, new businesses onboarded daily. The index updates run hourly (incremental) and nightly (full rebuild). For most queries, hour-old data is invisible to the user.

The exception is “open now” — opens and closes happen by the minute. Implement as a runtime filter over hours jsonb rather than precomputing an is_open boolean (which would go stale). Cost: ~1 µs per row, well within budget.

Photos and reviews#

Listed search results show one photo URL and a star rating. The photo lives in the CDN; the URL is precomputed and cached as part of the business metadata. Full review text is fetched only when the user taps into the business detail page.

Step 7 — Evaluation & Trade-offs#

Bottleneck #1: ranking cost during peak. 10 K QPS × 200 candidates × 50 ms ranking = ~5 cores fully saturated per shard. Fan out ranking work across the search-service fleet; cache rankings for popular (query, neighborhood) pairs.

Bottleneck #2: cold-cache latency in dense urban areas. First search of the day in San Francisco hits the GiST index, fans into ~5,000 candidates, picks 20. Page-warming on app launch (a background prefetch of “near you” results) makes the first interaction feel instant.

Bottleneck #3: write amplification on a viral business. A new restaurant featured in a major article gets 100× normal review-write traffic for a week. Write-only paths (post review, post photo) are out of scope here but worth noting: the rating field must be a sharded counter or a periodic recompute, not a per-write update.

Alternative I’d push back on: building this on Elasticsearch alone with no relational store. ES is fine for the geo index and lexical scoring, but it’s the wrong store for “business hours and ownership info” — those are mutable, transactional, and benefit from relational constraints. Two stores, joined at query time, is the right shape.

What breaks first at 10× scale (1 B MAU): the global business store would still fit on one DB, but write amplification on review counts and ratings would dominate. Partition by region from day one; even if the data fits centrally, the operational shape does not.

Companies this resembles#

Yelp, Google Maps Places, Foursquare, TripAdvisor, Zomato. Cousins: real-estate search (Zillow, same geo problem, different ranking signals), dating apps (geo + soft-match).

Google Maps — same geo substrate, different problem (routing vs proximity search).
Distributed Search — the lexical-relevance backbone.
Uber — same H3 geo-index pattern, write-heavy variant.
Typeahead Suggestion — what runs as the user types the query.