Design the CamelCamelCamel API

Price-tracking for Amazon products. A scraper-shaped API that must respect a partner's rate limits and ToS.

System Intermediate
14 min read
price-tracking scraping rate-limiting partner-api alerts
Companies this resembles: CamelCamelCamel · Amazon

Context

CamelCamelCamel is a price-tracking site for Amazon products. A user pastes an Amazon URL, the service records the product’s price over time, and the user can subscribe to an alert that fires when the price drops below a target. The product has existed since 2008 and predates a meaningful Amazon partner program for this use case — its entire data pipeline is shaped by the constraint that it does not own the underlying catalogue.

The interesting API-design question is not “how do you store price history” — that part is straightforward time-series storage. The question is: how do you design a public API on top of a partner whose terms you do not control, whose rate limits you cannot raise, and whose product page format changes whenever they want it to?

Hidden objectives an interviewer is probing for:

  • Can you draw the partner boundary correctly — what’s yours, what’s Amazon’s, what bridges the two?
  • Can you design a polite ingestion path that respects robots.txt, the Product Advertising API quotas, and 429 semantics?
  • Can you separate the read API (fast, cached, public) from the ingestion pipeline (slow, partner-rate-limited, internal)?
  • Can you write a clean alert subscription model without conflating “alert created” with “alert evaluated”?
  • Can you talk about the legal/ToS exposure honestly without pretending it isn’t a design constraint?

This is an Intermediate-tier system because the architecture is straightforward once the partner constraint is internalised — but candidates who miss the partner constraint design something that wouldn’t survive its first cease-and-desist.

Requirements (functional and non-functional)

Functional — in scope:

  • Submit an Amazon URL or ASIN; service returns the canonical product record.
  • Fetch price history for a product over a time range (1 day, 1 month, 1 year, all-time).
  • Subscribe to a price alert: notify me when product X drops to Y or below.
  • Manage alert subscriptions: list, cancel, snooze.
  • Receive alert notifications via email or webhook.
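
The first requirement implies a canonicalisation step: whatever the user pastes has to reduce to one ASIN before anything else happens. A minimal sketch follows, assuming the common `/dp/` and `/gp/product/` path shapes and an `amazon.com`-only v1. The function name `extract_asin` and the list of path shapes are illustrative assumptions, and real product URLs have more variants than these.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Same pattern the OpenAPI excerpt in this writeup uses for the asin parameter.
_ASIN_RE = re.compile(r"[A-Z0-9]{10}")
# Assumed path shapes; not an exhaustive list of Amazon URL formats.
_PATH_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url_or_asin: str) -> Optional[str]:
    """Return the 10-character ASIN, or None if the input is not recognised."""
    candidate = url_or_asin.strip()
    if _ASIN_RE.fullmatch(candidate):
        return candidate  # caller passed a bare ASIN
    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()
    # Single-marketplace v1 (assumption): accept amazon.com hosts only.
    if host != "amazon.com" and not host.endswith(".amazon.com"):
        return None
    match = _PATH_RE.search(parsed.path)
    return match.group(1) if match else None
```

Rejecting unknown hosts up front keeps arbitrary URLs from ever reaching the ingestion queue.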

Functional — out of scope:

  • The scraper infrastructure itself (headless-browser farms, residential proxy pools, fingerprinting evasion). This writeup designs the API that sits in front of an ingestion system; how the ingestion system fetches data is a separate problem covered downstream.
  • Affiliate-revenue reporting (the business model exists but is orthogonal to the read API).
  • Cross-marketplace tracking (multiple Amazon locales, Walmart, eBay) — single-marketplace v1.
  • Camel-style buy box vs new vs used price decomposition — return a single “price” in v1; multiple price types in v2.

Non-functional:

  • Read latency: price-history reads <= 200 ms p95. Almost everything cacheable.
  • Ingestion freshness: per-product re-scrape interval of 1-6 hours depending on popularity tier. No real-time guarantees.
  • Throughput: 1k read QPS sustained, 5k burst.
  • Partner rate limit: respect Amazon’s Product Advertising API (PAAPI) request budget per second and per day. Treat 429 from any partner endpoint as an absolute stop signal with exponential back-off.
  • Availability: 99.9% on the read path. Ingestion can degrade without taking the read API with it.

Use case diagram

               ┌──────────────┐
               │   End user   │
               └──────┬───────┘
        ┌─────────────┼──────────────┐
        ▼             ▼              ▼
   [paste URL]   [set alert]  [view history]
        │             │              │
        ▼             ▼              ▼
  ┌───────────────────────────────────────┐
  │      CamelCamelCamel Public API       │
  └────────────────┬──────────────────────┘
    ┌────────────────────────────┐
    │ Ingestion pipeline         │ ──► [PAAPI / web scraper]
    │ (internal, rate-limited)   │               │
    └────────────────────────────┘               ▼
                                           ┌───────────┐
                                           │  Amazon   │
                                           └───────────┘

Two actors. The end user talks only to the public API. The ingestion pipeline is internal — it bridges into Amazon under partner-rate-limit constraints, and it must never be on the synchronous read path.

Class diagram

┌────────────────────────┐
│ Product                │
├────────────────────────┤
│ asin : string (PK)     │
│ title : string         │
│ image_url : string     │
│ category : enum        │
│ first_seen : ts        │
│ last_refreshed : ts    │
│ popularity_tier : enum │  // hot / warm / cold
└──────────┬─────────────┘
           │ 1..*
┌────────────────────────┐
│ PricePoint             │
├────────────────────────┤
│ asin : string (FK)     │
│ observed_at : ts       │
│ price_cents : int      │
│ currency : enum        │
│ source : enum          │  // paapi / scraper / cache
│ in_stock : bool        │
└────────────────────────┘

┌────────────────────────┐       ┌─────────────────────┐
│ Alert                  │       │ User                │
├────────────────────────┤       ├─────────────────────┤
│ id : uuid (PK)         │ ────► │ id : uuid (PK)      │
│ user_id : uuid (FK)    │       │ email : string      │
│ asin : string (FK)     │       │ webhook_url? : url  │
│ target_cents : int     │       │ tier : enum         │
│ direction : enum       │       └─────────────────────┘
│ state : enum           │  // Active | Triggered | Acknowledged | Cancelled
│ created_at : ts        │
│ last_evaluated : ts    │
│ triggered_at? : ts     │
└────────────────────────┘

The schema is deliberately minimal. Product.asin is the primary key — Amazon’s stable identifier across page rewrites. PricePoint.source records whether the observation came from PAAPI (trusted, but rate-limited) or the scraper (less trusted, can drift if Amazon changes layout). Alert.state is the only piece of meaningful state on a write surface — it deserves a state machine.

Sequence diagram (key flows)

The two flows worth showing: read price history (synchronous, cached) and create alert + trigger (asynchronous, evaluated against incoming PricePoint writes).

Read price history

Client             Gateway              ReadAPI       Cache         PriceDB
│ GET /products/B07.../price-history?range=1y         │             │
│─────────────────►│                    │             │             │
│                  │ rate limit + auth  │             │             │
│                  │───────────────────►│             │             │
│                  │                    │ cache key   │             │
│                  │                    │────────────►│             │
│                  │                    │ hit?        │             │
│                  │                    │◄────────────│             │
│                  │                    │ miss        │             │
│                  │                    │ read        │             │
│                  │                    │──────────────────────────►│
│                  │                    │ rows        │             │
│                  │                    │◄──────────────────────────│
│                  │                    │ set cache   │             │
│                  │                    │────────────►│             │
│                  │ 200 OK             │             │             │
│                  │◄───────────────────│             │             │
│ 200 OK + body    │                    │             │             │
│◄─────────────────│                    │             │             │

The read path never talks to Amazon. It is entirely self-contained on the cached time-series.
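
The hit/miss branch in that flow is an ordinary read-through cache. A minimal in-process sketch follows; the class name `PriceHistoryCache`, the `load_from_db` callback, and the injected clock are hypothetical, and a real deployment would more likely put Redis or a CDN here. The `1d` and `1y` TTLs follow the caching table later in this writeup; the other TTL values are assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple

Points = List[dict]

# Seconds. "1d" and "1y" come from the caching table; the rest are assumed.
TTL_BY_RANGE = {"1d": 300, "7d": 900, "1m": 1800, "3m": 3600, "1y": 3600, "all": 3600}


class PriceHistoryCache:
    """Read-through cache in front of PriceDB. It never calls the partner."""

    def __init__(self, load_from_db: Callable[[str, str, str], Points],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._clock = clock
        # (asin, range, aggregation) -> (expiry time, cached points)
        self._entries: Dict[Tuple[str, str, str], Tuple[float, Points]] = {}

    def get(self, asin: str, range_: str = "1m", aggregation: str = "daily") -> Points:
        key = (asin, range_, aggregation)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: served without touching PriceDB
        points = self._load(asin, range_, aggregation)  # miss: read PriceDB
        self._entries[key] = (now + TTL_BY_RANGE[range_], points)
        return points
```

The cache key includes every query parameter that changes the response body, so `range=1d` and `range=1y` never share an entry.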

Alert evaluation (asynchronous)

Ingestion                AlertEvaluator      AlertDB      Notifier
│ new PricePoint(asin, $)│                   │            │
│───────────────────────►│                   │            │
│                        │ load alerts(asin) │            │
│                        │──────────────────►│            │
│                        │ Alert[]           │            │
│                        │◄──────────────────│            │
│                        │ for each:         │            │
│                        │   met target?     │            │
│                        │   transition state│            │
│                        │──────────────────►│            │
│                        │ updated alerts    │            │
│                        │◄──────────────────│            │
│                        │ enqueue notify    │            │
│                        │───────────────────────────────►│
│                        │                   │            │ email / webhook

Alerts are evaluated on the ingestion path, not on the read path. A user setting up an alert that targets a popular product gets near-real-time evaluation because that product is on a 1-hour refresh; a user setting up an alert on a cold product gets up to 6-hour latency. Both are fine for the use case.
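
The evaluator's inner loop is small enough to sketch. The version below is an illustration under assumptions, not the service's actual code: the trimmed `Alert` dataclass, the `evaluate` function, and the rule that out-of-stock observations never fire an alert are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alert:
    id: str
    asin: str
    target_cents: int
    direction: str  # "below" or "above", as in the createAlert schema
    state: str = "Active"


def target_met(alert: Alert, price_cents: int) -> bool:
    """A "below" alert fires at or under the target, "above" at or over it."""
    if alert.direction == "below":
        return price_cents <= alert.target_cents
    return price_cents >= alert.target_cents


def evaluate(alerts: Iterable[Alert], asin: str, price_cents: int,
             in_stock: bool) -> List[Alert]:
    """Move matching Active alerts to Triggered and return them for notification."""
    fired = []
    for alert in alerts:
        if alert.asin != asin or alert.state != "Active":
            continue  # only Active alerts for this product are evaluated
        # Assumption: a price on an out-of-stock listing is not actionable.
        if in_stock and target_met(alert, price_cents):
            alert.state = "Triggered"
            fired.append(alert)
    return fired
```

Because only `Active` alerts are considered, a repeated observation at the same price does not fire the same alert twice.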

Activity diagram (for non-trivial state)

The Alert state machine:

  [user creates alert]
    ┌───────────────┐
    │    Active     │
    └───────┬───────┘
            │ price <= target
    ┌───────────────┐
    │   Triggered   │── ack by user ─►┌────────────────┐
    └───────┬───────┘                 │  Acknowledged  │
            │                         └────────┬───────┘
            │ (not acked in 7d)                │ re-arm
            │                                  │
            ▼                                  ▼
    ┌───────────────┐                  ┌───────────────┐
    │   Cancelled   │                  │    Active     │
    └───────────────┘                  └───────────────┘

Two interesting transitions: the auto-cancel after 7 days unacknowledged prevents alerts from going stale on accounts the user forgot about. The re-arm from Acknowledged lets a user say “tell me if the price drops again” without recreating the alert. Without these two transitions the table grows unboundedly and users get spammed.
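
One way to keep those transitions honest is an explicit transition table that rejects anything not drawn in the diagram. The sketch below encodes only the four drawn transitions; the event names (`price_met`, `ack`, `ack_timeout`, `rearm`) and helper functions are hypothetical, and a user-initiated cancel via `DELETE` is left out because the diagram does not draw it.

```python
from datetime import datetime, timedelta
from enum import Enum


class State(Enum):
    ACTIVE = "Active"
    TRIGGERED = "Triggered"
    ACKNOWLEDGED = "Acknowledged"
    CANCELLED = "Cancelled"


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.ACTIVE, "price_met"): State.TRIGGERED,
    (State.TRIGGERED, "ack"): State.ACKNOWLEDGED,
    (State.TRIGGERED, "ack_timeout"): State.CANCELLED,
    (State.ACKNOWLEDGED, "rearm"): State.ACTIVE,
}

ACK_WINDOW = timedelta(days=7)  # auto-cancel window from the diagram


def transition(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an undrawn transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.value} on {event!r}") from None


def expire_if_unacked(state: State, triggered_at: datetime, now: datetime) -> State:
    """Cancel a Triggered alert that has gone unacknowledged for the full window."""
    if state is State.TRIGGERED and now - triggered_at >= ACK_WINDOW:
        return transition(state, "ack_timeout")
    return state
```

Note that the expiry check needs something to call it periodically, for example a sweep job, since no incoming price observation is guaranteed to arrive for a stale alert.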

API implementation

Endpoint catalogue

| Method | Path | Purpose |
|---|---|---|
| POST | /v1/products | Register a new product to track (idempotent by ASIN) |
| GET | /v1/products/{asin} | Get product metadata |
| GET | /v1/products/{asin}/price-history | Time-series for a product |
| GET | /v1/products/{asin}/current | Current observed price |
| POST | /v1/alerts | Create an alert |
| GET | /v1/alerts | List user’s alerts |
| GET | /v1/alerts/{id} | Single alert detail |
| PATCH | /v1/alerts/{id} | Acknowledge, snooze, update target |
| DELETE | /v1/alerts/{id} | Cancel an alert |

Note the absence of any endpoint that talks about scraping, refresh schedules, or partner sources. Those are internal concerns; clients don’t get to ask “refresh this product right now” — that would be a denial-of-service vector against the partner.

OpenAPI schema (excerpt)

OpenAPI 3.1 — CamelCamelCamel API
paths:
  /v1/products/{asin}/price-history:
    get:
      operationId: getPriceHistory
      parameters:
        - name: asin
          in: path
          required: true
          schema: { type: string, pattern: '^[A-Z0-9]{10}$' }
        - name: range
          in: query
          schema:
            type: string
            enum: [1d, 7d, 1m, 3m, 1y, all]
            default: 1m
        - name: aggregation
          in: query
          schema:
            type: string
            enum: [raw, hourly, daily]
            default: daily
      responses:
        '200':
          description: Time-series
          headers:
            Cache-Control:
              schema: { type: string, example: 'public, max-age=3600' }
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PriceHistoryResponse'
        '404': { description: Product not tracked }
        '429': { description: Too many requests }
  /v1/alerts:
    post:
      operationId: createAlert
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [asin, target_cents, direction]
              properties:
                asin: { type: string }
                target_cents: { type: integer, minimum: 1 }
                direction:
                  type: string
                  enum: [below, above]
                  default: below
      responses:
        '201':
          description: Alert created
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Alert' }
        '409': { description: Duplicate alert for asin/user/target }
components:
  schemas:
    PriceHistoryResponse:
      type: object
      required: [asin, points]
      properties:
        asin: { type: string }
        currency: { type: string }
        points:
          type: array
          items:
            type: object
            required: [t, price_cents]
            properties:
              t: { type: string, format: date-time }
              price_cents: { type: integer }
              in_stock: { type: boolean }
        last_refreshed: { type: string, format: date-time }
    Alert:
      type: object
      properties:
        id: { type: string }
        asin: { type: string }
        target_cents: { type: integer }
        direction: { type: string, enum: [below, above] }
        state:
          type: string
          enum: [Active, Triggered, Acknowledged, Cancelled]
        created_at: { type: string, format: date-time }
        # OpenAPI 3.1 dropped `nullable`; a null-able field is a type union.
        triggered_at: { type: [string, 'null'], format: date-time }

A representative price-history response:

GET /v1/products/B07XYZ1234/price-history?range=1m response
{
  "asin": "B07XYZ1234",
  "currency": "USD",
  "points": [
    { "t": "2026-04-30T00:00:00Z", "price_cents": 4999, "in_stock": true },
    { "t": "2026-05-01T00:00:00Z", "price_cents": 4999, "in_stock": true },
    { "t": "2026-05-15T00:00:00Z", "price_cents": 3499, "in_stock": true },
    { "t": "2026-05-29T00:00:00Z", "price_cents": 3499, "in_stock": false }
  ],
  "last_refreshed": "2026-05-30T09:14:00Z"
}
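
The midnight-aligned timestamps above are what `aggregation=daily` would produce from raw observations. The writeup does not say which value represents a day, so the sketch below assumes "last observation of each UTC day wins"; the function name `downsample_daily` is hypothetical, and a minimum-per-day rule would be an equally defensible choice.

```python
from datetime import datetime, timezone
from typing import Dict, List


def _parse(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing Z into an aware UTC datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def downsample_daily(points: List[dict]) -> List[dict]:
    """Keep the last observation of each UTC day, re-stamped at midnight UTC."""
    by_day: Dict[str, dict] = {}
    for point in sorted(points, key=lambda p: _parse(p["t"])):
        day = _parse(point["t"]).strftime("%Y-%m-%dT00:00:00Z")
        # Points arrive in time order, so a later point overwrites an earlier one.
        by_day[day] = {**point, "t": day}
    return list(by_day.values())
```

Running this once at write time into a rollup table, rather than per request, is what would keep the read path within its latency budget.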

Client samples: three languages

The same “create an alert” flow in Python, Go, and Node.

Create alert — Python
import requests


def create_alert(asin, target_cents, token):
    """Create a price-drop alert and return the created Alert as a dict."""
    resp = requests.post(
        "https://api.camelcamelcamel.example/v1/alerts",
        json={"asin": asin, "target_cents": target_cents, "direction": "below"},
        headers={
            "Authorization": f"Bearer {token}",
            # Same ASIN and target produce the same key, which lets the
            # server deduplicate a retried request.
            "Idempotency-Key": f"alert-{asin}-{target_cents}",
        },
        timeout=5,  # seconds
    )
    resp.raise_for_status()  # raise on any 4xx/5xx, including 409 and 429
    return resp.json()


alert = create_alert("B07XYZ1234", 2999, token="eyJhbGciOi...")
print(alert["id"], alert["state"])

Caching and partner-respect

The whole architecture rests on aggressive caching of the read path so the public API’s traffic shape is decoupled from the partner’s rate limit.

| Resource | TTL | Reason |
|---|---|---|
| /products/{asin} (metadata) | 24 h | Title and image rarely change |
| /products/{asin}/current | 5 min | Front-page accuracy without hammering ingestion |
| /products/{asin}/price-history?range=1y | 1 h | Aggregate already lossy at hourly buckets |
| /products/{asin}/price-history?range=1d | 5 min | Closer to ingestion cadence |

Ingestion-side: a token-bucket per partner endpoint. If Amazon’s PAAPI returns 429, the bucket halves its allowance and waits the Retry-After value, doubling back up only after a clean window. Scraping (the fallback path) has a stricter budget — robots.txt is parsed at boot and respected, and request rate is throttled per (domain, IP) pair to single-digit requests-per-minute.
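
The halve-on-429, double-after-a-clean-window behaviour can be sketched as a token bucket with an adjustable refill rate. Everything named here (`PartnerBudget`, `burst`, `clean_window_s`, `min_rate`, the injected clock) is a hypothetical illustration, and the numeric defaults are placeholders rather than PAAPI's real limits.

```python
import time
from typing import Callable


class PartnerBudget:
    """Token bucket whose refill rate halves on 429 and recovers by doubling."""

    def __init__(self, max_rate: float, burst: float = 1.0,
                 clean_window_s: float = 60.0, min_rate: float = 0.01,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_rate = max_rate          # requests/second at full allowance
        self.rate = max_rate              # current allowance
        self.burst = burst                # bucket capacity in tokens
        self.clean_window_s = clean_window_s
        self.min_rate = min_rate          # floor so the rate never reaches zero
        self._clock = clock
        self._tokens = burst
        self._last_refill = clock()
        self._blocked_until = self._last_refill
        self._clean_since = self._last_refill

    def try_acquire(self) -> bool:
        """Return True if one partner request may be sent now."""
        now = self._clock()
        if now < self._blocked_until:
            return False                  # still honouring Retry-After
        if self.rate < self.max_rate and now - self._clean_since >= self.clean_window_s:
            self.rate = min(self.max_rate, self.rate * 2)  # one doubling per clean window
            self._clean_since = now
        elapsed = now - self._last_refill
        self._tokens = min(self.burst, self._tokens + elapsed * self.rate)
        self._last_refill = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False

    def on_429(self, retry_after_s: float) -> None:
        """Halve the allowance and stop sending for the Retry-After period."""
        now = self._clock()
        self.rate = max(self.rate / 2, self.min_rate)
        self._tokens = 0.0
        self._blocked_until = now + retry_after_s
        self._last_refill = self._blocked_until   # no tokens accrue while blocked
        self._clean_since = self._blocked_until   # clean window starts after the wait
```

One bucket per partner endpoint, as the paragraph above describes, would mean one `PartnerBudget` instance per endpoint, shared by all ingestion workers that call it.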

Trade-offs and extensions

| Decision | Why | Cost if requirements change |
|---|---|---|
| Read API never talks to partner | Public traffic decoupled from partner rate limit | “Live price” feature impossible without rebuild |
| Aggregate downsampling at server | Cheaper transport, faster render | Fine-grained analysis needs aggregation=raw, slower |
| Alert evaluation on ingestion path | Alerts fire as data arrives, no separate scheduler | Cold-tier products evaluated less often |
| ASIN as PK | Stable identifier; Amazon’s own key | Multi-marketplace requires (asin, locale) composite |
| No “refresh now” endpoint | Protects partner | Power users will scrape us instead |
| PAAPI preferred, scraping as fallback | Partner-compliant when possible | Partner deprecation = scrape harder, ToS exposure |

Likely follow-up extensions:

  • Multi-marketplace. Add locale to the PK. Per-locale ingestion buckets. Per-locale alert evaluation.
  • Buy-box decomposition. Split price into new, used, third-party. Schema-additive; safe to roll out.
  • Webhook alerts. Already in the schema (User.webhook_url). Notifier becomes a separate worker pool; webhooks signed with HMAC; retries with exponential back-off and a dead-letter queue.
  • Public price-history embed. A read-only iframe widget for affiliates. Signed embed URLs; cached for hours; same data plane.
  • GraphQL gateway. For browser extensions that want product + history + alert state in one call. Same backend; thin gateway in front.
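
For the webhook extension, HMAC signing can be sketched with the standard library alone. The message layout (`<timestamp>.<body>`), the five-minute tolerance, and both function names are assumptions made for illustration; the writeup only states that webhooks are signed with HMAC.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_webhook(secret: bytes, body: bytes, timestamp: int) -> str:
    """Hex HMAC-SHA256 over "<timestamp>.<body>" (layout is an assumption)."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, timestamp: int, signature: str,
                   now: Optional[int] = None, tolerance_s: int = 300) -> bool:
    """Receiver-side check: reject stale deliveries, then compare signatures."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > tolerance_s:
        return False  # too old or too far in the future: possible replay
    expected = sign_webhook(secret, body, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Signing the timestamp together with the body is what lets the receiver bound replay attacks; signing the body alone would not.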

Mock interview follow-ups

  • “What if Amazon blocks your scraper?” The PAAPI path is the durable surface; the scraper is the fallback for products not in the partner catalogue. We accept that some products go stale; we don’t degrade the read API for the rest. Long-term: partner API is the right answer.
  • “How do you scale to 10M tracked products?” Tier products by popularity: hot on 1-hour refresh, warm on 6-hour, and, at this scale, cold relaxed from the 1-6 hour v1 band to weekly. PriceDB is a time-series store (Druid / Timescale / ClickHouse); reads hit aggregated rollups.
  • “What’s the alert-trigger latency?” Bounded by the popularity tier’s refresh interval. A 30-second SLA is unrealistic; a 1-6 hour SLA is honest. Alerts on products newly added jump to hot tier for 24 hours so the first observation lands fast.
  • “How do you handle Amazon changing the product page format?” The scraper has per-section selectors with health monitoring. When a selector’s success rate drops, ingestion routes around it and alerts the team, and cached values are held at last-known-good until the selector is repaired. The read API never returns “unknown price”: it returns the most recent valid observation along with last_refreshed, so the client can decide how much to trust it.
  • “What about the legal angle?” PAAPI usage is contractually fine. Scraping is ToS-grey; we mitigate by respecting robots.txt, identifying with a known User-Agent, throttling aggressively, and not caching content we don’t have permission to redistribute (we redistribute prices, which are facts and uncopyrightable). When PAAPI covers the catalogue, the scraper retires.
  • “How is this different from a generic price-comparison API?” Single partner = single rate-limit envelope to negotiate, single product schema to normalise, single ToS to respect. Multi-partner price comparison adds per-partner adapters and a normalisation layer — same architecture, broader integration footprint.
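
The tiering and new-product answers above reduce to a small scheduling rule. A minimal sketch follows, assuming the hot/warm/cold intervals quoted in the scaling answer and reading "newly added" as "first seen less than 24 hours ago"; the names `REFRESH_INTERVAL`, `effective_tier`, and `next_refresh` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Intervals from the scaling answer: hot 1 h, warm 6 h, cold weekly.
REFRESH_INTERVAL = {
    "hot": timedelta(hours=1),
    "warm": timedelta(hours=6),
    "cold": timedelta(days=7),
}
NEW_PRODUCT_BOOST = timedelta(hours=24)


def effective_tier(tier: str, first_seen: datetime, now: datetime) -> str:
    """Treat a product as hot for its first 24 hours, whatever its stored tier."""
    if now - first_seen < NEW_PRODUCT_BOOST:
        return "hot"
    return tier


def next_refresh(tier: str, first_seen: datetime,
                 last_refreshed: Optional[datetime], now: datetime) -> datetime:
    """Return when the ingestion pipeline should next fetch this product."""
    if last_refreshed is None:
        return now  # never observed: due immediately
    return last_refreshed + REFRESH_INTERVAL[effective_tier(tier, first_seen, now)]
```

The scheduler only decides when a product is due; whether the fetch actually goes out is still gated by the partner rate budget.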