Design the CamelCamelCamel API — API Design

Context#

CamelCamelCamel is a price-tracking site for Amazon products. A user pastes an Amazon URL, the service records the product’s price over time, and the user can subscribe to an alert that fires when the price drops below a target. The product has existed since 2008 and predates a meaningful Amazon partner program for this use case — its entire data pipeline is shaped by the constraint that it does not own the underlying catalogue.

The interesting API-design question is not “how do you store price history” — that part is straightforward time-series storage. The question is: how do you design a public API on top of a partner whose terms you do not control, whose rate limits you cannot raise, and whose product page format changes whenever they want it to?

Hidden objectives an interviewer is probing for:

Can you draw the partner boundary correctly — what’s yours, what’s Amazon’s, what bridges the two?
Can you design a polite ingestion path that respects robots.txt, the Product Advertising API quotas, and 429 semantics?
Can you separate the read API (fast, cached, public) from the ingestion pipeline (slow, partner-rate-limited, internal)?
Can you write a clean alert subscription model without conflating “alert created” with “alert evaluated”?
Can you talk about the legal/ToS exposure honestly without pretending it isn’t a design constraint?

This is an Intermediate-tier system because the architecture is straightforward once the partner constraint is internalised — but candidates who miss the partner constraint design something that wouldn’t survive its first cease-and-desist.

Requirements (functional and non-functional)#

Functional — in scope:

Submit an Amazon URL or ASIN; service returns the canonical product record.
Fetch price history for a product over a time range (1 day, 1 month, 1 year, all-time).
Subscribe to a price alert: notify me when product X drops to Y or below.
Manage alert subscriptions: list, cancel, snooze.
Receive alert notifications via email or webhook.
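
The first in-scope item, turning a pasted URL or a bare ASIN into one canonical key, might look like the sketch below. The function name `extract_asin`, the two URL path shapes (`/dp/` and `/gp/product/`), and the `amazon.com` host check are assumptions for illustration; real product URLs come in more variants. The 10-character pattern is the same one the OpenAPI excerpt uses for the `asin` path parameter.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Same shape as the `asin` pattern in the OpenAPI excerpt.
ASIN_RE = re.compile(r"[A-Z0-9]{10}")
# Assumed URL path shapes; not an exhaustive list of Amazon URL formats.
PATH_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url_or_asin: str) -> Optional[str]:
    """Return the ASIN for a bare ASIN or a recognised product URL, else None."""
    candidate = url_or_asin.strip()
    if ASIN_RE.fullmatch(candidate):
        return candidate
    parsed = urlparse(candidate)
    host = parsed.hostname or ""
    # Hypothetical single-marketplace check: v1 tracks one locale only.
    if host != "amazon.com" and not host.endswith(".amazon.com"):
        return None
    match = PATH_RE.search(parsed.path)
    return match.group(1) if match else None
```

Normalising to the ASIN before any lookup is what makes `POST /v1/products` idempotent: two differently decorated URLs for the same product resolve to the same key.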

Functional — out of scope:

The scraper infrastructure itself (headless-browser farms, residential proxy pools, fingerprinting evasion). This writeup designs the API that sits in front of an ingestion system; how the ingestion system fetches data is a separate problem covered downstream.
Affiliate-revenue reporting (the business model exists but is orthogonal to the read API).
Cross-marketplace tracking (multiple Amazon locales, Walmart, eBay) — single-marketplace v1.
Camel-style buy box vs new vs used price decomposition — return a single “price” in v1; multiple price types in v2.

Non-functional:

Read latency: price-history reads <= 200 ms p95. Almost everything cacheable.
Ingestion freshness: per-product re-scrape interval of 1-6 hours depending on popularity tier. No real-time guarantees.
Throughput: 1k read QPS sustained, 5k burst.
Partner rate limit: respect Amazon’s Product Advertising API (PAAPI) request budget per second and per day. Treat 429 from any partner endpoint as an absolute stop signal with exponential back-off.
Availability: 99.9% on the read path. Ingestion can degrade without taking the read API with it.

Use case diagram#

              ┌──────────────┐
              │   End user   │
              └──────┬───────┘
                     │
       ┌─────────────┼──────────────┐
       ▼             ▼              ▼
 [paste URL]   [set alert]    [view history]
       │             │              │
       ▼             ▼              ▼
   ┌───────────────────────────────────────┐
   │       CamelCamelCamel Public API      │
   └────────────────┬──────────────────────┘
                    │
                    ▼
       ┌────────────────────────────┐
       │   Ingestion pipeline       │  ──►  [PAAPI / web scraper]
       │   (internal, rate-limited) │              │
       └────────────────────────────┘              ▼
                                            ┌───────────┐
                                            │  Amazon   │
                                            └───────────┘

Two actors. The end user talks only to the public API. The ingestion pipeline is internal — it bridges into Amazon under partner-rate-limit constraints, and it must never be on the synchronous read path.

Class diagram#

   ┌────────────────────────┐
   │       Product         │
   ├────────────────────────┤
   │ asin : string (PK)     │
   │ title : string         │
   │ image_url : string     │
   │ category : enum        │
   │ first_seen : ts        │
   │ last_refreshed : ts    │
   │ popularity_tier : enum │  // hot / warm / cold
   └──────────┬────────────┘
              │ 1..*
              ▼
   ┌────────────────────────┐
   │     PricePoint         │
   ├────────────────────────┤
   │ asin : string (FK)     │
   │ observed_at : ts       │
   │ price_cents : int      │
   │ currency : enum        │
   │ source : enum          │  // paapi / scraper / cache
   │ in_stock : bool        │
   └────────────────────────┘

   ┌────────────────────────┐         ┌─────────────────────┐
   │        Alert          │         │       User          │
   ├────────────────────────┤         ├─────────────────────┤
   │ id : uuid (PK)         │  ────►  │ id : uuid (PK)      │
   │ user_id : uuid (FK)    │         │ email : string      │
   │ asin : string (FK)     │         │ webhook_url? : url  │
   │ target_cents : int     │         │ tier : enum         │
   │ direction : enum       │         └─────────────────────┘
   │ state : enum           │  // Active | Triggered | Acknowledged | Cancelled
   │ created_at : ts        │
   │ last_evaluated : ts    │
   │ triggered_at? : ts     │
   └────────────────────────┘

The schema is deliberately minimal. Product.asin is the primary key: it is Amazon’s stable identifier across page rewrites. PricePoint.source records whether the observation came from PAAPI (trusted, but rate-limited) or the scraper (less trusted, can drift if Amazon changes layout). Alert.state is the only field on the write surface with a real lifecycle, so it deserves a state machine.

Sequence diagram (key flows)#

The two flows worth showing: read price history (synchronous, cached) and create alert + trigger (asynchronous, evaluated against incoming PricePoint writes).

Read price history#

 Client          Gateway          ReadAPI         Cache         PriceDB
   │ GET /products/B07.../price-history?range=1y │              │
   │─────────────────►│              │            │              │
   │                  │ rate limit + auth │       │              │
   │                  │─────────────►│            │              │
   │                  │              │  cache key │              │
   │                  │              │───────────►│              │
   │                  │              │   hit?     │              │
   │                  │              │◄───────────│              │
   │                  │              │   miss     │              │
   │                  │              │   read     │              │
   │                  │              │─────────────────────────► │
   │                  │              │   rows     │              │
   │                  │              │◄─────────────────────────│
   │                  │              │  set cache │              │
   │                  │              │───────────►│              │
   │                  │   200 OK     │            │              │
   │                  │◄─────────────│            │              │
   │   200 OK + body  │              │            │              │
   │◄─────────────────│              │            │              │

The read path never talks to Amazon. It is served entirely from the cache and, on a miss, from the service’s own price time-series store (PriceDB).
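
The hit/miss flow in the diagram is a cache-aside read. A minimal sketch follows, assuming an in-process dictionary as a stand-in for the shared cache and a `load_rows` callable as a stand-in for the PriceDB query; the class name, the key format, and both stand-ins are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class PriceHistoryReader:
    """Cache-aside read: try the cache, fall back to PriceDB, then populate."""

    def __init__(
        self,
        load_rows: Callable[[str, str], List[dict]],  # stand-in for PriceDB
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_rows = load_rows
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, rows); a dict stands in for the shared cache.
        self._cache: Dict[str, Tuple[float, List[dict]]] = {}

    def get(self, asin: str, range_: str) -> List[dict]:
        key = f"price-history:{asin}:{range_}"
        entry = self._cache.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # hit: PriceDB is not touched
        rows = self._load_rows(asin, range_)  # miss: read PriceDB only
        self._cache[key] = (self._clock() + self._ttl, rows)
        return rows
```

Nothing in this path can reach the partner: the only dependency besides the cache is the local store, which is the property the diagram is meant to show.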

Alert evaluation (asynchronous)#

Ingestion        AlertEvaluator      AlertDB        Notifier
   │   new PricePoint(asin, $)│           │            │
   │─────────────────►│       │           │            │
   │                  │ load alerts(asin) │            │
   │                  │──────────────────►│            │
   │                  │ Alert[]           │            │
   │                  │◄──────────────────│            │
   │                  │ for each:         │            │
   │                  │   met target?     │            │
   │                  │   transition state│            │
   │                  │──────────────────►│            │
   │                  │ updated alerts    │            │
   │                  │◄──────────────────│            │
   │                  │   enqueue notify  │            │
   │                  │───────────────────────────────►│
   │                  │                   │            │  email / webhook

Alerts are evaluated on the ingestion path, not on the read path. An alert on a popular product is evaluated within about an hour of a price change because that product is on a 1-hour refresh; an alert on a cold product can lag by up to 6 hours. Both are fine for the use case.
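
The evaluator step in the diagram can be sketched as a pure function over the alerts loaded for one ASIN. The dataclass below mirrors only the fields the check needs, and the function names are hypothetical. The reading of `direction: above` as "price at or above target" is an assumption; the source only spells out the "below" case.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alert:
    id: str
    asin: str
    target_cents: int
    direction: str  # "below" or "above", as in the createAlert schema
    state: str = "Active"


def target_met(alert: Alert, price_cents: int) -> bool:
    if alert.direction == "below":
        return price_cents <= alert.target_cents
    # Assumed semantics for "above": fire at or above the target.
    return price_cents >= alert.target_cents


def evaluate(alerts: Iterable[Alert], asin: str, price_cents: int) -> List[Alert]:
    """Move matching Active alerts to Triggered and return the ones to notify."""
    to_notify = []
    for alert in alerts:
        if alert.asin != asin or alert.state != "Active":
            continue  # only Active alerts are eligible to fire
        if target_met(alert, price_cents):
            alert.state = "Triggered"
            to_notify.append(alert)
    return to_notify
```

Because only Active alerts are considered, a second PricePoint at the same price does not enqueue a second notification for an alert that has already fired.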

Activity diagram (for non-trivial state)#

The Alert state machine:

                  [user creates alert]
                          │
                          ▼
                  ┌───────────────┐
                  │   Active     │
                  └───────┬───────┘
                          │ price <= target
                          ▼
                  ┌───────────────┐
                  │  Triggered   │── ack by user ─►┌────────────────┐
                  └───────┬───────┘                │  Acknowledged  │
                          │                        └────────┬───────┘
                          │ (not acked in 7d)              │ re-arm
                          │                                  │
                          ▼                                  ▼
                  ┌───────────────┐                  ┌───────────────┐
                  │  Cancelled   │                  │   Active     │
                  └───────────────┘                  └───────────────┘

Two transitions are worth calling out. The auto-cancel after 7 days unacknowledged keeps alerts from going stale on accounts the user has forgotten about. The re-arm from Acknowledged lets a user say “tell me if the price drops again” without recreating the alert. Without these two transitions the alerts table grows without bound and users get spammed.
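
One way to keep the state machine honest is to encode the arrows as a lookup table and reject everything else. This is a sketch: the event names are hypothetical labels for the arrows in the diagram, and the `user_cancel` event is an assumption added to cover `DELETE /v1/alerts/{id}`, which the diagram does not draw.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "Active"
    TRIGGERED = "Triggered"
    ACKNOWLEDGED = "Acknowledged"
    CANCELLED = "Cancelled"


# (current state, event) -> next state, one entry per arrow in the diagram.
TRANSITIONS = {
    (State.ACTIVE, "target_met"): State.TRIGGERED,
    (State.TRIGGERED, "ack"): State.ACKNOWLEDGED,
    (State.TRIGGERED, "ack_timeout"): State.CANCELLED,  # not acked in 7 days
    (State.ACKNOWLEDGED, "rearm"): State.ACTIVE,
}


def transition(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an undefined arrow."""
    # Assumption: a user-initiated cancel is allowed from any live state.
    if event == "user_cancel" and state is not State.CANCELLED:
        return State.CANCELLED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.value} on {event}") from None
```

Rejecting undefined pairs means a late acknowledgement on an already cancelled alert fails loudly instead of silently re-activating it.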

API implementation#

Endpoint catalogue#

Method	Path	Purpose
`POST`	`/v1/products`	Register a new product to track (idempotent by ASIN)
`GET`	`/v1/products/{asin}`	Get product metadata
`GET`	`/v1/products/{asin}/price-history`	Time-series for a product
`GET`	`/v1/products/{asin}/current`	Current observed price
`POST`	`/v1/alerts`	Create an alert
`GET`	`/v1/alerts`	List user’s alerts
`GET`	`/v1/alerts/{id}`	Single alert detail
`PATCH`	`/v1/alerts/{id}`	Acknowledge, snooze, update target
`DELETE`	`/v1/alerts/{id}`	Cancel an alert

Note the absence of any endpoint that talks about scraping, refresh schedules, or partner sources. Those are internal concerns; clients don’t get to ask “refresh this product right now” — that would be a denial-of-service vector against the partner.

OpenAPI schema (excerpt)#

paths:
  /v1/products/{asin}/price-history:
    get:
      operationId: getPriceHistory
      parameters:
        - name: asin
          in: path
          required: true
          schema: { type: string, pattern: '^[A-Z0-9]{10}$' }
        - name: range
          in: query
          schema:
            type: string
            enum: [1d, 7d, 1m, 3m, 1y, all]
            default: 1m
        - name: aggregation
          in: query
          schema:
            type: string
            enum: [raw, hourly, daily]
            default: daily
      responses:
        '200':
          description: Time-series
          headers:
            Cache-Control:
              schema: { type: string, example: 'public, max-age=3600' }
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PriceHistoryResponse'
        '404': { description: Product not tracked }
        '429': { description: Too many requests }
  /v1/alerts:
    post:
      operationId: createAlert
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [asin, target_cents]
              properties:
                asin: { type: string }
                target_cents: { type: integer, minimum: 1 }
                direction:
                  type: string
                  enum: [below, above]
                  default: below
      responses:
        '201':
          description: Alert created
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Alert' }
        '409': { description: Duplicate alert for asin/user/target }
components:
  schemas:
    PriceHistoryResponse:
      type: object
      required: [asin, points]
      properties:
        asin: { type: string }
        currency: { type: string }
        points:
          type: array
          items:
            type: object
            required: [t, price_cents]
            properties:
              t: { type: string, format: date-time }
              price_cents: { type: integer }
              in_stock: { type: boolean }
        last_refreshed: { type: string, format: date-time }
    Alert:
      type: object
      properties:
        id: { type: string }
        asin: { type: string }
        target_cents: { type: integer }
        direction: { type: string, enum: [below, above] }
        state:
          type: string
          enum: [Active, Triggered, Acknowledged, Cancelled]
        created_at: { type: string, format: date-time }
        triggered_at: { type: string, format: date-time, nullable: true }

A representative price-history response:

{
  "asin": "B07XYZ1234",
  "currency": "USD",
  "points": [
    { "t": "2026-04-30T00:00:00Z", "price_cents": 4999, "in_stock": true },
    { "t": "2026-05-01T00:00:00Z", "price_cents": 4999, "in_stock": true },
    { "t": "2026-05-15T00:00:00Z", "price_cents": 3499, "in_stock": true },
    { "t": "2026-05-29T00:00:00Z", "price_cents": 3499, "in_stock": false }
  ],
  "last_refreshed": "2026-05-30T09:14:00Z"
}
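
The `aggregation=daily` default implies a server-side downsampling step between raw observations and the response above. A minimal sketch, assuming the rule "keep the last observation of each UTC day"; the source does not say which statistic a bucket reports (last, minimum, or mean), and the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List


def _parse(t: str) -> datetime:
    # Timestamps follow the "...Z" form used in the sample response.
    return datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def downsample_daily(points: List[dict]) -> List[dict]:
    """Collapse raw points into one point per UTC day (assumed: last one wins)."""
    by_day: Dict[str, dict] = {}
    for point in sorted(points, key=lambda p: _parse(p["t"])):
        day = _parse(point["t"]).date().isoformat()
        # Later observations on the same day overwrite earlier ones.
        by_day[day] = {**point, "t": f"{day}T00:00:00Z"}
    return list(by_day.values())
```

Whatever the bucket statistic ends up being, it is part of the API contract and should be stated in the endpoint documentation, since a daily minimum and a daily close produce different charts.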

Client samples — three languages#

The same “create an alert” flow in Python, Go, and Node.

import requests

def create_alert(asin, target_cents, token):
    resp = requests.post(
        "https://api.camelcamelcamel.example/v1/alerts",
        json={"asin": asin, "target_cents": target_cents, "direction": "below"},
        headers={
            "Authorization": f"Bearer {token}",
            "Idempotency-Key": f"alert-{asin}-{target_cents}",
        },
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()

alert = create_alert("B07XYZ1234", 2999, token="eyJhbGciOi...")
print(alert["id"], alert["state"])

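package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "time"
)

// CreateAlertReq is the JSON body for POST /v1/alerts.
type CreateAlertReq struct {
    ASIN        string `json:"asin"`
    TargetCents int    `json:"target_cents"`
    Direction   string `json:"direction"`
}

// Alert holds the response fields this sample reads.
type Alert struct {
    ID    string `json:"id"`
    State string `json:"state"`
}

// A bounded timeout, matching the 5-second timeout in the Python sample.
var client = &http.Client{Timeout: 5 * time.Second}

func createAlert(asin string, target int, token string) (*Alert, error) {
    body, err := json.Marshal(CreateAlertReq{ASIN: asin, TargetCents: target, Direction: "below"})
    if err != nil {
        return nil, err
    }
    req, err := http.NewRequest(http.MethodPost, "https://api.camelcamelcamel.example/v1/alerts", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Authorization", "Bearer "+token)
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Idempotency-Key", fmt.Sprintf("alert-%s-%d", asin, target))

    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    // Treat any non-2xx status as an error, as the Python and Node samples do.
    if resp.StatusCode < 200 || resp.StatusCode > 299 {
        return nil, fmt.Errorf("HTTP %d", resp.StatusCode)
    }

    var a Alert
    if err := json.NewDecoder(resp.Body).Decode(&a); err != nil {
        return nil, err
    }
    return &a, nil
}

func main() {
    a, err := createAlert("B07XYZ1234", 2999, "eyJhbGciOi...")
    if err != nil {
        log.Fatal(err) // avoid dereferencing a nil *Alert on failure
    }
    fmt.Println(a.ID, a.State)
}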

async function createAlert(asin, targetCents, token) {
  const resp = await fetch("https://api.camelcamelcamel.example/v1/alerts", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${token}`,
      "Idempotency-Key": `alert-${asin}-${targetCents}`,
    },
    body: JSON.stringify({
      asin,
      target_cents: targetCents,
      direction: "below",
    }),
  });
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  return resp.json();
}

const alert = await createAlert("B07XYZ1234", 2999, "eyJhbGciOi...");
console.log(alert.id, alert.state);

Caching and partner-respect#

The whole architecture rests on aggressive caching of the read path so the public API’s traffic shape is decoupled from the partner’s rate limit.

Resource	TTL	Reason
`/products/{asin}` (metadata)	24 h	Title and image rarely change
`/products/{asin}/current`	5 min	Front-page accuracy without hammering ingestion
`/products/{asin}/price-history?range=1y`	1 h	Aggregate already lossy at hourly buckets
`/products/{asin}/price-history?range=1d`	5 min	Closer to ingestion cadence

Ingestion-side: a token-bucket per partner endpoint. If Amazon’s PAAPI returns 429, the bucket halves its allowance and waits the Retry-After value, doubling back up only after a clean window. Scraping (the fallback path) has a stricter budget — robots.txt is parsed at boot and respected, and request rate is throttled per (domain, IP) pair to single-digit requests-per-minute.
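
The ingestion-side behaviour described above can be sketched as a token bucket whose refill rate halves on a 429 and doubles back after a clean window. The class and method names are hypothetical, and what counts as a "clean window" is left to the caller because the source does not define it.

```python
import time
from typing import Callable


class PartnerBucket:
    """Token bucket for one partner endpoint with halve-on-429 back-off."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_rate = rate_per_sec
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self._clock = clock
        self._updated = clock()
        self._blocked_until = 0.0

    def _refill(self) -> None:
        now = self._clock()
        elapsed = now - self._updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self._updated = now

    def try_acquire(self) -> bool:
        """Take one token if allowed; never blocks the caller."""
        self._refill()
        if self._clock() < self._blocked_until or self.tokens < 1:
            return False
        self.tokens -= 1
        return True

    def on_429(self, retry_after_seconds: float) -> None:
        """Partner said stop: halve the rate and honour Retry-After."""
        self._refill()
        self.rate /= 2
        self.tokens = 0
        self._blocked_until = self._clock() + retry_after_seconds

    def on_clean_window(self) -> None:
        """Caller reports a window with no 429s: double back, capped."""
        self.rate = min(self.max_rate, self.rate * 2)
```

Keeping one bucket per partner endpoint, as the text suggests, means a 429 from one endpoint does not throttle calls to the others.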

Trade-offs and extensions#

Decision	Why	Cost if requirements change
Read API never talks to partner	Public traffic decoupled from partner rate limit	“Live price” feature impossible without rebuild
Aggregate downsampling at server	Cheaper transport, faster render	Fine-grained analysis needs `aggregation=raw`, slower
Alert evaluation on ingestion path	Alerts fire as data arrives, no separate scheduler	Cold-tier products evaluated less often
ASIN as PK	Stable identifier; Amazon’s own key	Multi-marketplace requires `(asin, locale)` composite
No “refresh now” endpoint	Protects partner	Power users will scrape us instead
PAAPI preferred, scraping as fallback	Partner-compliant when possible	Partner deprecation = scrape harder, ToS exposure

Likely follow-up extensions:

Multi-marketplace. Add locale to the PK. Per-locale ingestion buckets. Per-locale alert evaluation.
Buy-box decomposition. Split price into new, used, third-party. Schema-additive; safe to roll out.
Webhook hardening. Webhook delivery is already in scope and in the schema (User.webhook_url). At volume, Notifier becomes a separate worker pool; webhooks are signed with HMAC; retries use exponential back-off and a dead-letter queue.
Public price-history embed. A read-only iframe widget for affiliates. Signed embed URLs; cached for hours; same data plane.
GraphQL gateway. For browser extensions that want product + history + alert state in one call. Same backend; thin gateway in front.
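
For the HMAC-signed webhooks mentioned in the list above, a minimal sketch using the Python standard library follows. The signing scheme (timestamp, a dot, then the JSON body, hashed with SHA-256) and the five-minute tolerance are assumptions modelled on common webhook conventions, not something the source specifies; the function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Tuple


def sign_webhook(secret: bytes, timestamp: int, payload: dict) -> Tuple[bytes, str]:
    """Serialise the payload and return (body, hex signature)."""
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    signed = str(timestamp).encode("ascii") + b"." + body
    return body, hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    timestamp: int,
    body: bytes,
    signature: str,
    now: int,
    tolerance_seconds: int = 300,
) -> bool:
    """Receiver-side check: reject stale timestamps, then compare digests."""
    if abs(now - timestamp) > tolerance_seconds:
        return False  # limits replay of a captured delivery
    signed = str(timestamp).encode("ascii") + b"." + body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the digest through timing.
    return hmac.compare_digest(expected, signature)
```

Signing the exact bytes that go on the wire, rather than a re-serialised object, keeps the receiver from depending on matching JSON key order.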

Mock interview follow-ups#

“What if Amazon blocks your scraper?” The PAAPI path is the durable surface; the scraper is the fallback for products not in the partner catalogue. We accept that some products go stale; we don’t degrade the read API for the rest. Long-term: partner API is the right answer.
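“How do you scale to 10M tracked products?” Tier products by popularity. Hot products stay on a 1-hour refresh and warm on 6-hour; at this scale the long tail of cold products moves to a weekly refresh, which relaxes the 1-6 hour freshness target for that tier. PriceDB is time-series (Druid / Timescale / ClickHouse); reads hit aggregated rollups.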
“What’s the alert-trigger latency?” Bounded by the popularity tier’s refresh interval. A 30-second SLA is unrealistic; a 1-6 hour SLA is honest. Alerts on products newly added jump to hot tier for 24 hours so the first observation lands fast.
“How do you handle Amazon changing the product page format?” The scraper has per-section selectors with health monitoring. When a selector’s success rate drops, ingestion routes around it and alerts the team, and the read path keeps serving the last-known-good observation until the selector is repaired. The read API never returns “unknown price”; it returns the most recent valid observation with last_refreshed so the client can decide how stale is too stale.
“What about the legal angle?” PAAPI usage is contractually fine. Scraping is ToS-grey; we mitigate by respecting robots.txt, identifying with a known User-Agent, throttling aggressively, and not caching content we don’t have permission to redistribute (we redistribute prices, which are facts and uncopyrightable). When PAAPI covers the catalogue, the scraper retires.
“How is this different from a generic price-comparison API?” Single partner = single rate-limit envelope to negotiate, single product schema to normalise, single ToS to respect. Multi-partner price comparison adds per-partner adapters and a normalisation layer — same architecture, broader integration footprint.
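
The tiering answers above reduce to a small scheduling rule. The sketch below uses the intervals quoted in the scaling answer (hot 1 hour, warm 6 hours, cold weekly) and the 24-hour hot boost for newly added products; the function and constant names are hypothetical, and a real scheduler would also have to spend the partner request budget across due products.

```python
from datetime import datetime, timedelta, timezone

# Intervals as quoted in the scaling answer; tune per partner budget.
REFRESH_INTERVAL = {
    "hot": timedelta(hours=1),
    "warm": timedelta(hours=6),
    "cold": timedelta(days=7),
}
NEW_PRODUCT_BOOST = timedelta(hours=24)


def effective_tier(tier: str, first_seen: datetime, now: datetime) -> str:
    """Newly added products are treated as hot for their first 24 hours."""
    if now - first_seen < NEW_PRODUCT_BOOST:
        return "hot"
    return tier


def is_due(tier: str, first_seen: datetime, last_refreshed: datetime, now: datetime) -> bool:
    """True when the product's refresh interval has elapsed."""
    interval = REFRESH_INTERVAL[effective_tier(tier, first_seen, now)]
    return now - last_refreshed >= interval
```

The same interval is also the honest upper bound on alert-trigger latency for a product in that tier.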

Design a Search Service API — the read-heavy / cache-heavy pattern this design inherits.
Design a Pub-Sub Service API — how alert evaluation hands off to notifier workers.
Rate Limiting — the partner-side discipline that defines the entire architecture.
Design the Stripe Payment API — idempotency-key conventions copied directly into alert creation.
What Causes API Failures — A Taxonomy — partner-coupling is failure mode three in the taxonomy.