Design a Search Service API — API Design

Context#

A search service is the foundational API every product team eventually owns. E-commerce search, employee search, content search, log search, support-ticket search — the surface looks different but the contract is the same: take a query string, return a ranked list of matches, support pagination, support suggestions, expose enough filtering for the use cases that matter.

The interviewer’s hidden objectives, roughly in order:

Can you scope the search without spinning into an entire Google rebuild?
Can you write a clean read API with sensible pagination, sorting, and filtering?
Can you produce a latency budget that respects the 100 ms perception threshold for typeahead and the 300 ms threshold for full results?
Can you handle the suggest vs query distinction correctly — they have different latency budgets, different SLOs, often different backends?
Can you defend the ranking story without claiming to have built BM25 in the room?

This is a Foundational-tier system in the source curriculum precisely because it forces you to engage with the core API-design vocabulary: endpoints, schemas, pagination, filtering, ranking, idempotency (read APIs are inherently idempotent — but cache semantics are a real trade), and a latency budget.

Requirements (functional and non-functional)#

Functional — in scope:

Query by free-text string. Returns ranked results.
Suggest (autocomplete) endpoint optimised for sub-100 ms response.
Pagination via opaque cursor (not page numbers).
Optional filters (faceted by stable enum fields — category, date range, language).
Optional sort overrides (relevance is default; price-asc, recency, etc. as overrides).
Result hit counts (approximate, not exact at scale).

Functional — out of scope:

Index-management API (admin only; not part of the public search contract).
Personalisation (treat the query as anonymous in v1; user-aware ranking is a future axis).
Spell correction as a separate endpoint (server applies it transparently and reports it in the response).
Vector / semantic search (call out as a follow-up; not in v1 scope).

Non-functional:

Latency: typeahead <= 100 ms p95, full query <= 300 ms p95.
Throughput: 10k QPS sustained, 50k QPS burst.
Availability: 99.95% on the read path.
Freshness: writes show up in results within 60 seconds.

Use case diagram#

                ┌──────────────┐
                │   End user   │
                └──────┬───────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
   [type in box]  [paginate]    [filter / sort]
        │              │              │
        ▼              ▼              ▼
   ┌──────────────────────────────────────┐
   │          Search Service API          │
   └────────────────┬─────────────────────┘
                    │
                    ▼
            ┌───────────────┐
            │ Internal team │  …  index manage, observe metrics
            └───────────────┘

Two actors (End user, Internal team). The internal-team use cases are admin and out of public-API scope.

Class diagram#

   ┌───────────────────────┐
   │   SearchService       │
   ├───────────────────────┤
   │ query(req): SearchResp│
   │ suggest(req): SuggestResp │
   └──────────┬────────────┘
              │
              ▼
   ┌───────────────────────┐         ┌─────────────────────┐
   │   SearchRequest       │         │   SearchResponse    │
   ├───────────────────────┤         ├─────────────────────┤
   │ q : string            │         │ hits : Hit[]        │
   │ cursor? : string      │         │ next_cursor? : str  │
   │ page_size : int (≤100)│         │ total_estimate : int│
   │ filters? : Filter[]   │         │ took_ms : int       │
   │ sort? : SortKey       │         │ correction? : Corr  │
   └───────────────────────┘         └─────────────────────┘
              │                                  │
              │                                  ▼
              │                          ┌──────────────┐
              │                          │     Hit      │
              │                          ├──────────────┤
              │                          │ id, title    │
              │                          │ snippet      │
              │                          │ score        │
              │                          │ fields{...}  │
              │                          └──────────────┘
              ▼
   ┌───────────────────────┐
   │      Filter           │   facet : enum    op : in | eq | range
   │      SortKey          │   field : enum    dir : asc | desc
   └───────────────────────┘

SearchService is the API. SearchRequest and SearchResponse are the wire shapes. Hit is per-result. Filter is constrained — only stable enum fields are filterable, so the URL can be reasoned about and the result cached.

Sequence diagram (key flows)#

The full query flow, client to backend and back:

 Client          Gateway          SearchAPI         Indexer            Cache
   │ GET /search?q=...│              │                │                  │
   │─────────────────►│              │                │                  │
   │                  │ rate limit?  │                │                  │
   │                  │ auth?        │                │                  │
   │                  │─────────────►│                │                  │
   │                  │              │  cache key     │                  │
   │                  │              │ ───────────────────────────────► │
   │                  │              │  hit?          │                  │
   │                  │              │ ◄─────────────────────────────── │
   │                  │              │   miss         │                  │
   │                  │              │  query(q,…)    │                  │
   │                  │              │───────────────►│                  │
   │                  │              │  ranked hits   │                  │
   │                  │              │◄───────────────│                  │
   │                  │              │  set cache    │                  │
   │                  │              │ ───────────────────────────────► │
   │                  │  SearchResp  │                │                  │
   │                  │◄─────────────│                │                  │
   │   200 OK + body  │              │                │                  │
   │◄─────────────────│              │                │                  │

The suggest flow is identical in shape but bypasses the Cache layer (typeahead changes every keystroke; cache hit rate is poor) and runs against a separate index optimised for prefix matches.

Activity diagram (for non-trivial state)#

The query lifecycle is simple — read-only, no state machine — but the error / degradation path has structure worth showing:

              [request arrives]
                    │
                    ▼
            ┌───────────────┐
            │  validate q   │── invalid → 400 Bad Request
            └───────┬───────┘
                    │ valid
                    ▼
            ┌───────────────┐
            │ check rate    │── exceeded → 429 + Retry-After
            └───────┬───────┘
                    │ within budget
                    ▼
            ┌───────────────┐
            │  cache lookup │── hit → 200 (cached)
            └───────┬───────┘
                    │ miss
                    ▼
            ┌───────────────┐
            │   indexer     │── timeout → 503 + cached-fallback?
            └───────┬───────┘
                    │ success
                    ▼
            ┌───────────────┐
            │ rank + slice  │
            └───────┬───────┘
                    │
                    ▼
                 200 OK

The 503 + cached-fallback branch is the interesting one — when the live indexer is slow, returning a slightly-stale cached result for the same query is almost always better than failing the request. Stripe’s Tracing-vs-Outage trade-off illustrates this principle.

API implementation#

Endpoint catalogue#

Method	Path	Purpose
`GET`	`/v1/search`	Full search; returns ranked hits + cursor
`GET`	`/v1/suggest`	Typeahead; returns top-N completions only

Two endpoints. No POSTs. Read-only. Cacheable at every layer.

OpenAPI schema (excerpt)#

paths:
  /v1/search:
    get:
      operationId: search
      parameters:
        - name: q
          in: query
          required: true
          schema: { type: string, minLength: 1, maxLength: 256 }
        - name: cursor
          in: query
          schema: { type: string }
        - name: page_size
          in: query
          schema: { type: integer, minimum: 1, maximum: 100, default: 20 }
        - name: filter
          in: query
          style: deepObject
          schema:
            type: object
            additionalProperties: { type: string }
        - name: sort
          in: query
          schema:
            type: string
            enum: [relevance, recency, price_asc, price_desc]
            default: relevance
      responses:
        '200':
          description: Ranked hits
          headers:
            X-RateLimit-Remaining:
              schema: { type: integer }
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SearchResponse'
        '400': { description: Invalid query }
        '429': { description: Too many requests }
components:
  schemas:
    SearchResponse:
      type: object
      required: [hits, took_ms]
      properties:
        hits:
          type: array
          items: { $ref: '#/components/schemas/Hit' }
        next_cursor: { type: string, nullable: true }
        total_estimate: { type: integer }
        took_ms: { type: integer }
        correction:
          type: object
          nullable: true
          properties:
            applied_q: { type: string }
            original_q: { type: string }
    Hit:
      type: object
      required: [id, score]
      properties:
        id: { type: string }
        title: { type: string }
        snippet: { type: string }
        score: { type: number }
        fields: { type: object, additionalProperties: true }

Client samples — three languages#

The same GET /v1/search call in Python, Go, and Node. Matches the multi-language convention used across this workbook.

import requests

def search(q, cursor=None, page_size=20, sort="relevance", filters=None):
    params = {"q": q, "page_size": page_size, "sort": sort}
    if cursor:
        params["cursor"] = cursor
    if filters:
        for key, val in filters.items():
            params[f"filter[{key}]"] = val
    resp = requests.get(
        "https://api.example.com/v1/search",
        params=params,
        headers={"Authorization": "Bearer eyJhbGciOi..."},
        timeout=2,
    )
    resp.raise_for_status()
    return resp.json()

page = search("wireless mouse", filters={"category": "peripherals"})
for hit in page["hits"]:
    print(hit["id"], hit["title"], hit["score"])

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
)

type Hit struct {
    ID    string  `json:"id"`
    Title string  `json:"title"`
    Score float64 `json:"score"`
}

type SearchResponse struct {
    Hits       []Hit  `json:"hits"`
    NextCursor string `json:"next_cursor"`
    TookMs     int    `json:"took_ms"`
}

func search(q, cursor string) (*SearchResponse, error) {
    u, _ := url.Parse("https://api.example.com/v1/search")
    qs := u.Query()
    qs.Set("q", q)
    qs.Set("page_size", "20")
    qs.Set("sort", "relevance")
    if cursor != "" {
        qs.Set("cursor", cursor)
    }
    u.RawQuery = qs.Encode()

    req, _ := http.NewRequest("GET", u.String(), nil)
    req.Header.Set("Authorization", "Bearer eyJhbGciOi...")
    resp, err := http.DefaultClient.Do(req)
    if err != nil { return nil, err }
    defer resp.Body.Close()

    var sr SearchResponse
    if err := json.NewDecoder(resp.Body).Decode(&sr); err != nil { return nil, err }
    return &sr, nil
}

func main() {
    page, _ := search("wireless mouse", "")
    for _, h := range page.Hits {
        fmt.Println(h.ID, h.Title, h.Score)
    }
}

async function search(q, { cursor, pageSize = 20, sort = "relevance", filters } = {}) {
  const params = new URLSearchParams({ q, page_size: String(pageSize), sort });
  if (cursor) params.set("cursor", cursor);
  if (filters) {
    for (const [k, v] of Object.entries(filters)) {
      params.set(`filter[${k}]`, v);
    }
  }
  const resp = await fetch(`https://api.example.com/v1/search?${params}`, {
    headers: { Authorization: "Bearer eyJhbGciOi..." },
  });
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  return resp.json();
}

const page = await search("wireless mouse", { filters: { category: "peripherals" } });
for (const hit of page.hits) {
  console.log(hit.id, hit.title, hit.score);
}

Latency budget#

The 300 ms p95 full-query budget breaks down as:

Phase	Budget	Notes
TLS + HTTP/2 setup	0 ms	Connection pool keeps these warm.
Gateway (rate limit, auth)	5 ms	In-memory token-bucket, JWT verify cached.
Cache lookup	2 ms	Redis in the same DC.
Indexer fan-out + merge	80 ms p95	The dominant cost.
Ranking + scoring	30 ms	After candidate retrieval.
Serialize + transport	20 ms	JSON encoding + ~20 KB body.
Margin for tail	60 ms	Headroom for slow shards.
Total	197 ms	Safely under 300 ms.

Typeahead’s 100 ms budget halves every line — and skips the cache (low hit rate) and ranking (top-N by prefix match is enough).

Trade-offs and extensions#

Decision	Why	Cost if requirements change
Cursor pagination, not page numbers	Stable under writes; index-friendly	Can’t jump to arbitrary page — UX trade-off
Filters as enum-only fields	Cacheable URLs; tractable index	Free-text filters require a re-design
Approximate `total_estimate`	O(1) to compute at scale	Users expect exact counts under 100
GET-only API	Cacheable at every layer	No personalisation possible without auth-aware caching
Single relevance + sort dimension	Simple model; predictable	Multi-dimensional ranking needs a richer sort grammar
60 s freshness SLO	Indexers can batch	Real-time use cases (chat search) need `< 5 s`

Likely follow-up extensions and the shape of the answer:

Personalisation. Move from anonymous to per-user ranking. Adds an auth requirement; changes cache key shape (user-aware); forces splitting the cache into per-user buckets.
Vector search. A vector mode that uses dense embeddings. Same API shape; new backend. Most teams build vector and lexical in parallel and merge results.
Multi-tenant search. Add tenant scoping at the API level (X-Tenant-Id header); per-tenant rate limits; per-tenant indexes for isolation.
Webhook on index-staleness. Push notifications to clients when their last query is now stale enough to re-run.

Mock interview follow-ups#

Questions interviewers reach for and the shortest correct answer:

“How do you prevent a deep-paginated request from blowing up?” — Cursor-based pagination caps it naturally; we never seek past N results. Hard limit at 10k.
“What’s the cache invalidation story?” — Cache key on the full query; TTL of 60 s (matching freshness SLO); plus a soft “stampede” lock around concurrent misses.
“How do you do typeahead in 100 ms?” — Separate index optimised for prefix matches; in-memory; warm cache at the edge; degrade gracefully on miss (return fewer results, not slower).
“How do you rank?” — Out of scope to design the ranking algorithm in an API round. The API contract is score: number; the engine behind it is BM25 / dense retrieval / a learned-to-rank model — the API doesn’t care.
“What happens at 10x scale?” — Shard the index by document partition (e.g. category); fan out the query; merge top-K. Add a read-replica gateway closer to the user.
“How do you A/B test ranking changes?” — Header-based experiment routing; the API contract stays unchanged; observability captures experiment_id per request.
“What if the indexer is down?” — Return cached results with a Warning: 110 header; fail-soft. If cache is also cold, return 503 with Retry-After: 1.

Design a File Service API — the next foundational system; range-requests + signed URLs.
Design a Comment Service API — cursor-based pagination, but mutating.
Design a Pub-Sub Service API — the asynchronous backbone you’ll plug into search-indexing later.
The API-Design Walk-through — the seven-step recipe this writeup followed.
REST — The Architectural Style — the architectural style behind the endpoint shape.