Design a Search Service API
Query, suggest, rank, paginate. Where latency budget lives and how to write a search API that won't go viral on the wrong page.
Context#
A search service is the foundational API every product team eventually owns. E-commerce search, employee search, content search, log search, support-ticket search — the surface looks different but the contract is the same: take a query string, return a ranked list of matches, support pagination, support suggestions, expose enough filtering for the use cases that matter.
The interviewer’s hidden objectives, roughly in order:
- Can you scope the search without spinning into an entire Google rebuild?
- Can you write a clean read API with sensible pagination, sorting, and filtering?
- Can you produce a latency budget that respects the 100 ms perception threshold for typeahead and the 300 ms threshold for full results?
- Can you handle the suggest vs query distinction correctly — they have different latency budgets, different SLOs, often different backends?
- Can you defend the ranking story without claiming to have built BM25 in the room?
This is a Foundational-tier system in the source curriculum precisely because it forces you to engage with the core API-design vocabulary: endpoints, schemas, pagination, filtering, ranking, idempotency (read APIs are inherently idempotent — but cache semantics are a real trade), and a latency budget.
Requirements (functional and non-functional)#
Functional — in scope:
- Query by free-text string. Returns ranked results.
- Suggest (autocomplete) endpoint optimised for sub-100 ms response.
- Pagination via opaque cursor (not page numbers).
- Optional filters (faceted by stable enum fields — category, date range, language).
- Optional sort overrides (relevance is default; price-asc, recency, etc. as overrides).
- Result hit counts (approximate, not exact at scale).
Functional — out of scope:
- Index-management API (admin only; not part of the public search contract).
- Personalisation (treat the query as anonymous in v1; user-aware ranking is a future axis).
- Spell correction as a separate endpoint (server applies it transparently and reports it in the response).
- Vector / semantic search (call out as a follow-up; not in v1 scope).
Non-functional:
- Latency: typeahead
<= 100 ms p95, full query<= 300 ms p95. - Throughput: 10k QPS sustained, 50k QPS burst.
- Availability: 99.95% on the read path.
- Freshness: writes show up in results within 60 seconds.
Use case diagram#
┌──────────────┐ │ End user │ └──────┬───────┘ │ ┌──────────────┼──────────────┐ ▼ ▼ ▼ [type in box] [paginate] [filter / sort] │ │ │ ▼ ▼ ▼ ┌──────────────────────────────────────┐ │ Search Service API │ └────────────────┬─────────────────────┘ │ ▼ ┌───────────────┐ │ Internal team │ … index manage, observe metrics └───────────────┘Two actors (End user, Internal team). The internal-team use cases are admin and out of public-API scope.
Class diagram#
┌───────────────────────┐ │ SearchService │ ├───────────────────────┤ │ query(req): SearchResp│ │ suggest(req): SuggestResp │ └──────────┬────────────┘ │ ▼ ┌───────────────────────┐ ┌─────────────────────┐ │ SearchRequest │ │ SearchResponse │ ├───────────────────────┤ ├─────────────────────┤ │ q : string │ │ hits : Hit[] │ │ cursor? : string │ │ next_cursor? : str │ │ page_size : int (≤100)│ │ total_estimate : int│ │ filters? : Filter[] │ │ took_ms : int │ │ sort? : SortKey │ │ correction? : Corr │ └───────────────────────┘ └─────────────────────┘ │ │ │ ▼ │ ┌──────────────┐ │ │ Hit │ │ ├──────────────┤ │ │ id, title │ │ │ snippet │ │ │ score │ │ │ fields{...} │ │ └──────────────┘ ▼ ┌───────────────────────┐ │ Filter │ facet : enum op : in | eq | range │ SortKey │ field : enum dir : asc | desc └───────────────────────┘SearchService is the API. SearchRequest and SearchResponse are the wire shapes. Hit is per-result. Filter is constrained — only stable enum fields are filterable, so the URL can be reasoned about and the result cached.
Sequence diagram (key flows)#
The full query flow, client to backend and back:
Client Gateway SearchAPI Indexer Cache │ GET /search?q=...│ │ │ │ │─────────────────►│ │ │ │ │ │ rate limit? │ │ │ │ │ auth? │ │ │ │ │─────────────►│ │ │ │ │ │ cache key │ │ │ │ │ ───────────────────────────────► │ │ │ │ hit? │ │ │ │ │ ◄─────────────────────────────── │ │ │ │ miss │ │ │ │ │ query(q,…) │ │ │ │ │───────────────►│ │ │ │ │ ranked hits │ │ │ │ │◄───────────────│ │ │ │ │ set cache │ │ │ │ │ ───────────────────────────────► │ │ │ SearchResp │ │ │ │ │◄─────────────│ │ │ │ 200 OK + body │ │ │ │ │◄─────────────────│ │ │ │The suggest flow is identical in shape but bypasses the Cache layer (typeahead changes every keystroke; cache hit rate is poor) and runs against a separate index optimised for prefix matches.
Activity diagram (for non-trivial state)#
The query lifecycle is simple — read-only, no state machine — but the error / degradation path has structure worth showing:
[request arrives] │ ▼ ┌───────────────┐ │ validate q │── invalid → 400 Bad Request └───────┬───────┘ │ valid ▼ ┌───────────────┐ │ check rate │── exceeded → 429 + Retry-After └───────┬───────┘ │ within budget ▼ ┌───────────────┐ │ cache lookup │── hit → 200 (cached) └───────┬───────┘ │ miss ▼ ┌───────────────┐ │ indexer │── timeout → 503 + cached-fallback? └───────┬───────┘ │ success ▼ ┌───────────────┐ │ rank + slice │ └───────┬───────┘ │ ▼ 200 OKThe 503 + cached-fallback branch is the interesting one — when the live indexer is slow, returning a slightly-stale cached result for the same query is almost always better than failing the request. Stripe’s Tracing-vs-Outage trade-off illustrates this principle.
API implementation#
Endpoint catalogue#
| Method | Path | Purpose |
|---|---|---|
GET | /v1/search | Full search; returns ranked hits + cursor |
GET | /v1/suggest | Typeahead; returns top-N completions only |
Two endpoints. No POSTs. Read-only. Cacheable at every layer.
OpenAPI schema (excerpt)#
paths: /v1/search: get: operationId: search parameters: - name: q in: query required: true schema: { type: string, minLength: 1, maxLength: 256 } - name: cursor in: query schema: { type: string } - name: page_size in: query schema: { type: integer, minimum: 1, maximum: 100, default: 20 } - name: filter in: query style: deepObject schema: type: object additionalProperties: { type: string } - name: sort in: query schema: type: string enum: [relevance, recency, price_asc, price_desc] default: relevance responses: '200': description: Ranked hits headers: X-RateLimit-Remaining: schema: { type: integer } content: application/json: schema: $ref: '#/components/schemas/SearchResponse' '400': { description: Invalid query } '429': { description: Too many requests }components: schemas: SearchResponse: type: object required: [hits, took_ms] properties: hits: type: array items: { $ref: '#/components/schemas/Hit' } next_cursor: { type: string, nullable: true } total_estimate: { type: integer } took_ms: { type: integer } correction: type: object nullable: true properties: applied_q: { type: string } original_q: { type: string } Hit: type: object required: [id, score] properties: id: { type: string } title: { type: string } snippet: { type: string } score: { type: number } fields: { type: object, additionalProperties: true }Client samples — three languages#
The same GET /v1/search call in Python, Go, and Node. Matches the multi-language convention used across this workbook.
import requests
def search(q, cursor=None, page_size=20, sort="relevance", filters=None): params = {"q": q, "page_size": page_size, "sort": sort} if cursor: params["cursor"] = cursor if filters: for key, val in filters.items(): params[f"filter[{key}]"] = val resp = requests.get( "https://api.example.com/v1/search", params=params, headers={"Authorization": "Bearer eyJhbGciOi..."}, timeout=2, ) resp.raise_for_status() return resp.json()
page = search("wireless mouse", filters={"category": "peripherals"})for hit in page["hits"]: print(hit["id"], hit["title"], hit["score"])package main
import ( "encoding/json" "fmt" "net/http" "net/url")
type Hit struct { ID string `json:"id"` Title string `json:"title"` Score float64 `json:"score"`}
type SearchResponse struct { Hits []Hit `json:"hits"` NextCursor string `json:"next_cursor"` TookMs int `json:"took_ms"`}
func search(q, cursor string) (*SearchResponse, error) { u, _ := url.Parse("https://api.example.com/v1/search") qs := u.Query() qs.Set("q", q) qs.Set("page_size", "20") qs.Set("sort", "relevance") if cursor != "" { qs.Set("cursor", cursor) } u.RawQuery = qs.Encode()
req, _ := http.NewRequest("GET", u.String(), nil) req.Header.Set("Authorization", "Bearer eyJhbGciOi...") resp, err := http.DefaultClient.Do(req) if err != nil { return nil, err } defer resp.Body.Close()
var sr SearchResponse if err := json.NewDecoder(resp.Body).Decode(&sr); err != nil { return nil, err } return &sr, nil}
func main() { page, _ := search("wireless mouse", "") for _, h := range page.Hits { fmt.Println(h.ID, h.Title, h.Score) }}async function search(q, { cursor, pageSize = 20, sort = "relevance", filters } = {}) { const params = new URLSearchParams({ q, page_size: String(pageSize), sort }); if (cursor) params.set("cursor", cursor); if (filters) { for (const [k, v] of Object.entries(filters)) { params.set(`filter[${k}]`, v); } } const resp = await fetch(`https://api.example.com/v1/search?${params}`, { headers: { Authorization: "Bearer eyJhbGciOi..." }, }); if (!resp.ok) throw new Error(`HTTP ${resp.status}`); return resp.json();}
const page = await search("wireless mouse", { filters: { category: "peripherals" } });for (const hit of page.hits) { console.log(hit.id, hit.title, hit.score);}Latency budget#
The 300 ms p95 full-query budget breaks down as:
| Phase | Budget | Notes |
|---|---|---|
| TLS + HTTP/2 setup | 0 ms | Connection pool keeps these warm. |
| Gateway (rate limit, auth) | 5 ms | In-memory token-bucket, JWT verify cached. |
| Cache lookup | 2 ms | Redis in the same DC. |
| Indexer fan-out + merge | 80 ms p95 | The dominant cost. |
| Ranking + scoring | 30 ms | After candidate retrieval. |
| Serialize + transport | 20 ms | JSON encoding + ~20 KB body. |
| Margin for tail | 60 ms | Headroom for slow shards. |
| Total | 197 ms | Safely under 300 ms. |
Typeahead’s 100 ms budget halves every line — and skips the cache (low hit rate) and ranking (top-N by prefix match is enough).
Trade-offs and extensions#
| Decision | Why | Cost if requirements change |
|---|---|---|
| Cursor pagination, not page numbers | Stable under writes; index-friendly | Can’t jump to arbitrary page — UX trade-off |
| Filters as enum-only fields | Cacheable URLs; tractable index | Free-text filters require a re-design |
Approximate total_estimate | O(1) to compute at scale | Users expect exact counts under 100 |
| GET-only API | Cacheable at every layer | No personalisation possible without auth-aware caching |
| Single relevance + sort dimension | Simple model; predictable | Multi-dimensional ranking needs a richer sort grammar |
| 60 s freshness SLO | Indexers can batch | Real-time use cases (chat search) need < 5 s |
Likely follow-up extensions and the shape of the answer:
- Personalisation. Move from anonymous to per-user ranking. Adds an auth requirement; changes cache key shape (user-aware); forces splitting the cache into per-user buckets.
- Vector search. A
vectormode that uses dense embeddings. Same API shape; new backend. Most teams build vector and lexical in parallel and merge results. - Multi-tenant search. Add tenant scoping at the API level (
X-Tenant-Idheader); per-tenant rate limits; per-tenant indexes for isolation. - Webhook on index-staleness. Push notifications to clients when their last query is now stale enough to re-run.
Mock interview follow-ups#
Questions interviewers reach for and the shortest correct answer:
- “How do you prevent a deep-paginated request from blowing up?” — Cursor-based pagination caps it naturally; we never seek past
Nresults. Hard limit at 10k. - “What’s the cache invalidation story?” — Cache key on the full query; TTL of 60 s (matching freshness SLO); plus a soft “stampede” lock around concurrent misses.
- “How do you do typeahead in 100 ms?” — Separate index optimised for prefix matches; in-memory; warm cache at the edge; degrade gracefully on miss (return fewer results, not slower).
- “How do you rank?” — Out of scope to design the ranking algorithm in an API round. The API contract is
score: number; the engine behind it is BM25 / dense retrieval / a learned-to-rank model — the API doesn’t care. - “What happens at 10x scale?” — Shard the index by document partition (e.g. category); fan out the query; merge top-K. Add a read-replica gateway closer to the user.
- “How do you A/B test ranking changes?” — Header-based experiment routing; the API contract stays unchanged; observability captures
experiment_idper request. - “What if the indexer is down?” — Return cached results with a
Warning: 110header; fail-soft. If cache is also cold, return503withRetry-After: 1.
Related#
- Design a File Service API — the next foundational system; range-requests + signed URLs.
- Design a Comment Service API — cursor-based pagination, but mutating.
- Design a Pub-Sub Service API — the asynchronous backbone you’ll plug into search-indexing later.
- The API-Design Walk-through — the seven-step recipe this writeup followed.
- REST — The Architectural Style — the architectural style behind the endpoint shape.