Design the CamelCamelCamel API
Price-tracking for Amazon products. A scraper-shaped API that must respect a partner's rate limits and ToS.
Context#
CamelCamelCamel is a price-tracking site for Amazon products. A user pastes an Amazon URL, the service records the product’s price over time, and the user can subscribe to an alert that fires when the price drops below a target. The product has existed since 2008 and predates a meaningful Amazon partner program for this use case — its entire data pipeline is shaped by the constraint that it does not own the underlying catalogue.
The interesting API-design question is not “how do you store price history” — that part is straightforward time-series storage. The question is: how do you design a public API on top of a partner whose terms you do not control, whose rate limits you cannot raise, and whose product page format changes whenever they want it to?
Hidden objectives an interviewer is probing for:
- Can you draw the partner boundary correctly — what’s yours, what’s Amazon’s, what bridges the two?
- Can you design a polite ingestion path that respects `robots.txt`, the Product Advertising API quotas, and `429` semantics?
- Can you separate the read API (fast, cached, public) from the ingestion pipeline (slow, partner-rate-limited, internal)?
- Can you write a clean alert subscription model without conflating “alert created” with “alert evaluated”?
- Can you talk about the legal/ToS exposure honestly without pretending it isn’t a design constraint?
This is an Intermediate-tier system because the architecture is straightforward once the partner constraint is internalised — but candidates who miss the partner constraint design something that wouldn’t survive its first cease-and-desist.
Requirements (functional and non-functional)#
Functional — in scope:
- Submit an Amazon URL or ASIN; service returns the canonical product record.
- Fetch price history for a product over a time range (1 day, 1 month, 1 year, all-time).
- Subscribe to a price alert: notify me when product `X` drops to `Y` or below.
- Manage alert subscriptions: list, cancel, snooze.
- Receive alert notifications via email or webhook.
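The first requirement implies a canonicalisation step from pasted URL to ASIN. A minimal sketch follows, assuming the ASIN appears after `/dp/` or `/gp/product/` in the URL path and that v1 accepts only `amazon.com` hosts; the function name `extract_asin` and both patterns are illustrative, not part of the original design.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Ten uppercase alphanumerics, the same pattern the OpenAPI excerpt uses for {asin}.
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")
# Assumption: product URLs carry the ASIN right after /dp/ or /gp/product/.
PATH_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url_or_asin: str) -> Optional[str]:
    """Return the ASIN for a bare ASIN or a product URL, or None if unrecognised."""
    candidate = url_or_asin.strip()
    if ASIN_RE.match(candidate):
        return candidate
    parsed = urlparse(candidate)
    host = parsed.hostname or ""
    # Single-marketplace v1: reject anything that is not amazon.com (assumption).
    if not (host == "amazon.com" or host.endswith(".amazon.com")):
        return None
    match = PATH_RE.search(parsed.path)
    return match.group(1) if match else None
```

Keying everything on the extracted ASIN is what makes `POST /v1/products` idempotent: two differently decorated URLs for the same product collapse to one record.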
Functional — out of scope:
- The scraper infrastructure itself (headless-browser farms, residential proxy pools, fingerprinting evasion). This writeup designs the API that sits in front of an ingestion system; how the ingestion system fetches data is a separate problem covered downstream.
- Affiliate-revenue reporting (the business model exists but is orthogonal to the read API).
- Cross-marketplace tracking (multiple Amazon locales, Walmart, eBay) — single-marketplace v1.
- Camel-style buy box vs new vs used price decomposition — return a single “price” in v1; multiple price types in v2.
Non-functional:
- Read latency: price-history reads `<= 200 ms p95`. Almost everything cacheable.
- Ingestion freshness: per-product re-scrape interval of 1-6 hours depending on popularity tier. No real-time guarantees.
- Throughput: 1k read QPS sustained, 5k burst.
- Partner rate limit: respect Amazon’s Product Advertising API (PAAPI) request budget per second and per day. Treat `429` from any partner endpoint as an absolute stop signal with exponential back-off.
- Availability: 99.9% on the read path. Ingestion can degrade without taking the read API with it.
Use case diagram#
                      ┌──────────────┐
                      │   End user   │
                      └──────┬───────┘
                             │
               ┌─────────────┼──────────────┐
               ▼             ▼              ▼
          [paste URL]   [set alert]   [view history]
               │             │              │
               ▼             ▼              ▼
         ┌───────────────────────────────────────┐
         │      CamelCamelCamel Public API       │
         └────────────────┬──────────────────────┘
                          │
                          ▼
            ┌────────────────────────────┐
            │ Ingestion pipeline         │ ──► [PAAPI / web scraper]
            │ (internal, rate-limited)   │            │
            └────────────────────────────┘            ▼
                                                ┌───────────┐
                                                │  Amazon   │
                                                └───────────┘

Two actors. The end user talks only to the public API. The ingestion pipeline is internal: it bridges into Amazon under partner-rate-limit constraints, and it must never be on the synchronous read path.
Class diagram#
    ┌────────────────────────┐
    │ Product                │
    ├────────────────────────┤
    │ asin : string (PK)     │
    │ title : string         │
    │ image_url : string     │
    │ category : enum        │
    │ first_seen : ts        │
    │ last_refreshed : ts    │
    │ popularity_tier : enum │  // hot / warm / cold
    └──────────┬─────────────┘
               │ 1..*
               ▼
    ┌────────────────────────┐
    │ PricePoint             │
    ├────────────────────────┤
    │ asin : string (FK)     │
    │ observed_at : ts       │
    │ price_cents : int      │
    │ currency : enum        │
    │ source : enum          │  // paapi / scraper / cache
    │ in_stock : bool        │
    └────────────────────────┘

    ┌────────────────────────┐        ┌─────────────────────┐
    │ Alert                  │        │ User                │
    ├────────────────────────┤        ├─────────────────────┤
    │ id : uuid (PK)         │  ────► │ id : uuid (PK)      │
    │ user_id : uuid (FK)    │        │ email : string      │
    │ asin : string (FK)     │        │ webhook_url? : url  │
    │ target_cents : int     │        │ tier : enum         │
    │ direction : enum       │        └─────────────────────┘
    │ state : enum           │  // Active | Triggered | Acknowledged | Cancelled
    │ created_at : ts        │
    │ last_evaluated : ts    │
    │ triggered_at? : ts     │
    └────────────────────────┘

The schema is deliberately minimal. `Product.asin` is the primary key, Amazon’s stable identifier across page rewrites. `PricePoint.source` records whether the observation came from PAAPI (trusted, but rate-limited) or the scraper (less trusted, can drift if Amazon changes layout). `Alert.state` is the only piece of meaningful state on a write surface, so it deserves a state machine.
Sequence diagram (key flows)#
The two flows worth showing: read price history (synchronous, cached) and create alert + trigger (asynchronous, evaluated against incoming PricePoint writes).
Read price history#
    Client            Gateway        ReadAPI       Cache       PriceDB
      │ GET /products/B07.../price-history?range=1y               │
      │─────────────────►│              │            │            │
      │                  │ rate limit + auth         │            │
      │                  │─────────────►│            │            │
      │                  │              │ cache key  │            │
      │                  │              │───────────►│            │
      │                  │              │ hit?       │            │
      │                  │              │◄───────────│            │
      │                  │              │ miss       │            │
      │                  │              │ read       │            │
      │                  │              │────────────────────────►│
      │                  │              │ rows       │            │
      │                  │              │◄────────────────────────│
      │                  │              │ set cache  │            │
      │                  │              │───────────►│            │
      │                  │ 200 OK       │            │            │
      │                  │◄─────────────│            │            │
      │ 200 OK + body    │              │            │            │
      │◄─────────────────│              │            │            │

The read path never talks to Amazon. It is entirely self-contained on the cached time-series.
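The hit/miss branch in the read flow is the cache-aside pattern. The sketch below is an in-memory stand-in, assuming a hypothetical `TTLCache` class and a caller-supplied `load_rows` function that reads PriceDB; the `1d` and `1y` TTLs follow the caching table, and the 3600-second default for other ranges is an assumption.

```python
import time

# TTLs in seconds: 1d and 1y follow the caching table; the default is an assumption.
RANGE_TTL_S = {"1d": 300, "1y": 3600}
DEFAULT_TTL_S = 3600


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for a real cache tier)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_s):
        self._store[key] = (self._clock() + ttl_s, value)


def get_price_history(cache, load_rows, asin, range_="1m", aggregation="daily"):
    """Cache-aside read: serve from cache, else read PriceDB and populate the cache."""
    key = f"price-history:{asin}:{range_}:{aggregation}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    rows = load_rows(asin, range_, aggregation)  # PriceDB only, never the partner
    cache.set(key, rows, RANGE_TTL_S.get(range_, DEFAULT_TTL_S))
    return rows
```

Note that nothing in this path can trigger an ingestion call: a miss falls through to PriceDB and stops there.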
Alert evaluation (asynchronous)#
    Ingestion     AlertEvaluator          AlertDB     Notifier
      │ new PricePoint(asin, $)              │            │
      │─────────────────►│                   │            │
      │                  │ load alerts(asin) │            │
      │                  │──────────────────►│            │
      │                  │ Alert[]           │            │
      │                  │◄──────────────────│            │
      │                  │ for each:         │            │
      │                  │   met target?     │            │
      │                  │   transition state│            │
      │                  │──────────────────►│            │
      │                  │ updated alerts    │            │
      │                  │◄──────────────────│            │
      │                  │ enqueue notify    │            │
      │                  │───────────────────────────────►│
      │                  │                   │            │ email / webhook

Alerts are evaluated on the ingestion path, not on the read path. A user setting up an alert that targets a popular product gets near-real-time evaluation because that product is on a 1-hour refresh; a user setting up an alert on a cold product gets up to 6-hour latency. Both are fine for the use case.
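The "for each: met target?" loop can be sketched as a pure function over the alerts loaded for one ASIN. This is a minimal illustration, assuming `direction="above"` means "price at or above target" (the source only defines the `below` case) and using a trimmed, hypothetical `Alert` dataclass; persistence and the notifier hand-off are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    id: str
    asin: str
    target_cents: int
    direction: str  # "below" or "above"
    state: str = "Active"


def target_met(alert: Alert, price_cents: int) -> bool:
    if alert.direction == "below":
        return price_cents <= alert.target_cents
    # Assumption: "above" fires when the price reaches or exceeds the target.
    return price_cents >= alert.target_cents


def evaluate_price_point(asin: str, price_cents: int, alerts: List[Alert]) -> List[Alert]:
    """Move matching Active alerts to Triggered and return them for notification."""
    fired = []
    for alert in alerts:
        # Only Active alerts are eligible, so a triggered alert cannot fire twice.
        if alert.asin != asin or alert.state != "Active":
            continue
        if target_met(alert, price_cents):
            alert.state = "Triggered"
            fired.append(alert)
    return fired
```

Because only `Active` alerts are considered, re-processing the same price point is harmless: the second pass finds nothing left to fire.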
Activity diagram (for non-trivial state)#
The Alert state machine:
     [user creates alert]
              │
              ▼
      ┌───────────────┐
      │    Active     │
      └───────┬───────┘
              │ price <= target
              ▼
      ┌───────────────┐                 ┌────────────────┐
      │   Triggered   │── ack by user ─►│  Acknowledged  │
      └───────┬───────┘                 └────────┬───────┘
              │ (not acked in 7d)                │ re-arm
              ▼                                  ▼
      ┌───────────────┐                 ┌───────────────┐
      │   Cancelled   │                 │    Active     │
      └───────────────┘                 └───────────────┘

Two interesting transitions: the auto-cancel after 7 days unacknowledged prevents alerts from going stale on accounts the user forgot about. The re-arm from Acknowledged lets a user say “tell me if the price drops again” without recreating the alert. Without these two transitions the table grows unboundedly and users get spammed.
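One way to keep these transitions honest is an explicit transition table that rejects anything not drawn in the diagram. The sketch below covers only the four arrows shown; the event names (`price_met`, `ack`, `expire`, `rearm`) are hypothetical labels, and user-initiated cancellation via `DELETE` is deliberately left out.

```python
from datetime import datetime, timedelta, timezone

# (current state, event) -> next state; only the arrows drawn in the diagram.
TRANSITIONS = {
    ("Active", "price_met"): "Triggered",
    ("Triggered", "ack"): "Acknowledged",
    ("Triggered", "expire"): "Cancelled",
    ("Acknowledged", "rearm"): "Active",
}
ACK_WINDOW = timedelta(days=7)


def transition(state: str, event: str) -> str:
    """Return the next state, or raise ValueError for a transition not in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {event}") from None


def expire_if_stale(state: str, triggered_at: datetime, now: datetime) -> str:
    """Auto-cancel a Triggered alert that has gone unacknowledged for 7 days."""
    if state == "Triggered" and now - triggered_at >= ACK_WINDOW:
        return transition(state, "expire")
    return state
```

A periodic sweep calling `expire_if_stale` is enough to implement the 7-day rule; it does not need to sit on the ingestion path.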
API implementation#
Endpoint catalogue#
| Method | Path | Purpose |
|---|---|---|
| POST | `/v1/products` | Register a new product to track (idempotent by ASIN) |
| GET | `/v1/products/{asin}` | Get product metadata |
| GET | `/v1/products/{asin}/price-history` | Time-series for a product |
| GET | `/v1/products/{asin}/current` | Current observed price |
| POST | `/v1/alerts` | Create an alert |
| GET | `/v1/alerts` | List user’s alerts |
| GET | `/v1/alerts/{id}` | Single alert detail |
| PATCH | `/v1/alerts/{id}` | Acknowledge, snooze, update target |
| DELETE | `/v1/alerts/{id}` | Cancel an alert |
Note the absence of any endpoint that talks about scraping, refresh schedules, or partner sources. Those are internal concerns; clients don’t get to ask “refresh this product right now” — that would be a denial-of-service vector against the partner.
OpenAPI schema (excerpt)#
    paths:
      /v1/products/{asin}/price-history:
        get:
          operationId: getPriceHistory
          parameters:
            - name: asin
              in: path
              required: true
              schema: { type: string, pattern: '^[A-Z0-9]{10}$' }
            - name: range
              in: query
              schema:
                type: string
                enum: [1d, 7d, 1m, 3m, 1y, all]
                default: 1m
            - name: aggregation
              in: query
              schema:
                type: string
                enum: [raw, hourly, daily]
                default: daily
          responses:
            '200':
              description: Time-series
              headers:
                Cache-Control:
                  schema: { type: string, example: 'public, max-age=3600' }
              content:
                application/json:
                  schema:
                    $ref: '#/components/schemas/PriceHistoryResponse'
            '404': { description: Product not tracked }
            '429': { description: Too many requests }
      /v1/alerts:
        post:
          operationId: createAlert
          requestBody:
            required: true
            content:
              application/json:
                schema:
                  type: object
                  required: [asin, target_cents, direction]
                  properties:
                    asin: { type: string }
                    target_cents: { type: integer, minimum: 1 }
                    direction:
                      type: string
                      enum: [below, above]
                      default: below
          responses:
            '201':
              description: Alert created
              content:
                application/json:
                  schema: { $ref: '#/components/schemas/Alert' }
            '409': { description: Duplicate alert for asin/user/target }
    components:
      schemas:
        PriceHistoryResponse:
          type: object
          required: [asin, points]
          properties:
            asin: { type: string }
            currency: { type: string }
            points:
              type: array
              items:
                type: object
                required: [t, price_cents]
                properties:
                  t: { type: string, format: date-time }
                  price_cents: { type: integer }
                  in_stock: { type: boolean }
            last_refreshed: { type: string, format: date-time }
        Alert:
          type: object
          properties:
            id: { type: string }
            asin: { type: string }
            target_cents: { type: integer }
            direction: { type: string, enum: [below, above] }
            state:
              type: string
              enum: [Active, Triggered, Acknowledged, Cancelled]
            created_at: { type: string, format: date-time }
            triggered_at: { type: string, format: date-time, nullable: true }

A representative price-history response:
    {
      "asin": "B07XYZ1234",
      "currency": "USD",
      "points": [
        { "t": "2026-04-30T00:00:00Z", "price_cents": 4999, "in_stock": true },
        { "t": "2026-05-01T00:00:00Z", "price_cents": 4999, "in_stock": true },
        { "t": "2026-05-15T00:00:00Z", "price_cents": 3499, "in_stock": true },
        { "t": "2026-05-29T00:00:00Z", "price_cents": 3499, "in_stock": false }
      ],
      "last_refreshed": "2026-05-30T09:14:00Z"
    }

Client samples: three languages#
The same “create an alert” flow in Python, Go, and Node.
Python:

    import requests

    def create_alert(asin, target_cents, token):
        """Create a price alert; raises requests.HTTPError on a 4xx/5xx response."""
        resp = requests.post(
            "https://api.camelcamelcamel.example/v1/alerts",
            json={"asin": asin, "target_cents": target_cents, "direction": "below"},
            headers={
                "Authorization": f"Bearer {token}",
                # Deterministic key: retrying the same request cannot create a duplicate.
                "Idempotency-Key": f"alert-{asin}-{target_cents}",
            },
            timeout=5,
        )
        resp.raise_for_status()
        return resp.json()

    alert = create_alert("B07XYZ1234", 2999, token="eyJhbGciOi...")
    print(alert["id"], alert["state"])

Go:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    type CreateAlertReq struct {
        ASIN        string `json:"asin"`
        TargetCents int    `json:"target_cents"`
        Direction   string `json:"direction"`
    }

    type Alert struct {
        ID    string `json:"id"`
        State string `json:"state"`
    }

    func createAlert(asin string, target int, token string) (*Alert, error) {
        body, err := json.Marshal(CreateAlertReq{ASIN: asin, TargetCents: target, Direction: "below"})
        if err != nil {
            return nil, err
        }
        req, err := http.NewRequest(http.MethodPost, "https://api.camelcamelcamel.example/v1/alerts", bytes.NewReader(body))
        if err != nil {
            return nil, err
        }
        req.Header.Set("Authorization", "Bearer "+token)
        req.Header.Set("Content-Type", "application/json")
        req.Header.Set("Idempotency-Key", fmt.Sprintf("alert-%s-%d", asin, target))

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        // Fail on non-2xx instead of decoding an error body into an Alert.
        if resp.StatusCode < 200 || resp.StatusCode >= 300 {
            return nil, fmt.Errorf("create alert: unexpected status %s", resp.Status)
        }

        var a Alert
        if err := json.NewDecoder(resp.Body).Decode(&a); err != nil {
            return nil, err
        }
        return &a, nil
    }

    func main() {
        a, err := createAlert("B07XYZ1234", 2999, "eyJhbGciOi...")
        if err != nil {
            log.Fatal(err) // a is nil here, so never dereference it
        }
        fmt.Println(a.ID, a.State)
    }

Node:

    async function createAlert(asin, targetCents, token) {
      const resp = await fetch("https://api.camelcamelcamel.example/v1/alerts", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Authorization": `Bearer ${token}`,
          "Idempotency-Key": `alert-${asin}-${targetCents}`,
        },
        body: JSON.stringify({
          asin,
          target_cents: targetCents,
          direction: "below",
        }),
      });
      if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
      return resp.json();
    }

    // Top-level await requires an ES module context.
    const alert = await createAlert("B07XYZ1234", 2999, "eyJhbGciOi...");
    console.log(alert.id, alert.state);

Caching and partner-respect#
The whole architecture rests on aggressive caching of the read path so the public API’s traffic shape is decoupled from the partner’s rate limit.
| Resource | TTL | Reason |
|---|---|---|
| `/products/{asin}` (metadata) | 24 h | Title and image rarely change |
| `/products/{asin}/current` | 5 min | Front-page accuracy without hammering ingestion |
| `/products/{asin}/price-history?range=1y` | 1 h | Aggregate already lossy at hourly buckets |
| `/products/{asin}/price-history?range=1d` | 5 min | Closer to ingestion cadence |
Ingestion-side: a token-bucket per partner endpoint. If Amazon’s PAAPI returns 429, the bucket halves its allowance and waits the Retry-After value, doubling back up only after a clean window. Scraping (the fallback path) has a stricter budget — robots.txt is parsed at boot and respected, and request rate is throttled per (domain, IP) pair to single-digit requests-per-minute.
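The halve-on-`429`, double-after-a-clean-window behaviour can be sketched as a small token bucket. The class and method names (`PartnerBucket`, `on_429`, `on_clean_window`) are hypothetical, what counts as a "clean window" is left to the caller, and the sketch is single-threaded; treat it as an illustration of the policy rather than a production limiter.

```python
import time


class PartnerBucket:
    """Token bucket for one partner endpoint with 429-driven rate halving."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.max_rate = rate_per_s
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self._clock = clock
        self._last = clock()
        self._blocked_until = 0.0

    def _refill(self):
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self):
        """Return True if a partner request may be sent right now."""
        self._refill()
        if self._clock() < self._blocked_until or self.tokens < 1:
            return False
        self.tokens -= 1
        return True

    def on_429(self, retry_after_s):
        """Partner said stop: halve the rate, drop tokens, wait out Retry-After."""
        self._refill()
        self.rate = self.rate / 2
        self.tokens = 0.0
        self._blocked_until = self._clock() + retry_after_s

    def on_clean_window(self):
        """Caller observed a window with no 429s: double back up, capped at the original rate."""
        self._refill()
        self.rate = min(self.max_rate, self.rate * 2)
```

Repeated `429` responses compound the halving, which gives the exponential back-off the requirements ask for without a separate retry timer.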
Trade-offs and extensions#
| Decision | Why | Cost if requirements change |
|---|---|---|
| Read API never talks to partner | Public traffic decoupled from partner rate limit | “Live price” feature impossible without rebuild |
| Aggregate downsampling at server | Cheaper transport, faster render | Fine-grained analysis needs `aggregation=raw`, slower |
| Alert evaluation on ingestion path | Alerts fire as data arrives, no separate scheduler | Cold-tier products evaluated less often |
| ASIN as PK | Stable identifier; Amazon’s own key | Multi-marketplace requires (asin, locale) composite |
| No “refresh now” endpoint | Protects partner | Power users will scrape us instead |
| PAAPI preferred, scraping as fallback | Partner-compliant when possible | Partner deprecation = scrape harder, ToS exposure |
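To make the server-side downsampling row concrete, here is a minimal sketch of `aggregation=daily`. It assumes each daily bucket keeps the lowest observed price per UTC day, which is a guess at a sensible rollup for a price-drop tracker; the source does not say which aggregate is used, and `downsample_daily` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def downsample_daily(points: List[Tuple[datetime, int]]) -> List[Tuple[datetime, int]]:
    """Collapse (observed_at, price_cents) pairs into one lowest price per UTC day."""
    lowest = {}
    for observed_at, price_cents in points:
        # Bucket key: midnight UTC of the observation's day.
        day = observed_at.astimezone(timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        if day not in lowest or price_cents < lowest[day]:
            lowest[day] = price_cents
    return sorted(lowest.items())
```

Whatever aggregate is chosen, the rollup is lossy by design, which is why `aggregation=raw` stays available for fine-grained analysis.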
Likely follow-up extensions:
- Multi-marketplace. Add `locale` to the PK. Per-locale ingestion buckets. Per-locale alert evaluation.
- Buy-box decomposition. Split `price` into `new`, `used`, `third-party`. Schema-additive; safe to roll out.
- Webhook alerts. Already in the schema (`User.webhook_url`). Notifier becomes a separate worker pool; webhooks signed with HMAC; retries with exponential back-off and a dead-letter queue.
- Public price-history embed. A read-only iframe widget for affiliates. Signed embed URLs; cached for hours; same data plane.
- GraphQL gateway. For browser extensions that want product + history + alert state in one call. Same backend; thin gateway in front.
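For the webhook extension, HMAC signing could look like the sketch below. The signed message format (timestamp, a dot, then the raw body), the SHA-256 digest, and the 300-second replay tolerance are all assumptions chosen for illustration; the source only says webhooks are signed with HMAC.

```python
import hashlib
import hmac


def sign_webhook(secret: bytes, timestamp: int, body: bytes) -> str:
    """Hex HMAC-SHA256 over "<timestamp>.<body>" (format is an assumption)."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, timestamp: int, body: bytes,
                   signature: str, now: int, tolerance_s: int = 300) -> bool:
    """Receiver-side check: reject stale timestamps, then compare in constant time."""
    if abs(now - timestamp) > tolerance_s:
        return False
    expected = sign_webhook(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

Including the timestamp in the signed message lets receivers reject replays, and `hmac.compare_digest` avoids leaking the signature through timing differences.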
Mock interview follow-ups#
- “What if Amazon blocks your scraper?” The PAAPI path is the durable surface; the scraper is the fallback for products not in the partner catalogue. We accept that some products go stale; we don’t degrade the read API for the rest. Long-term: partner API is the right answer.
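- “How do you scale to 10M tracked products?” Tier products by popularity. Hot products on 1-hour refresh, warm on 6-hour, cold on weekly; the weekly cold tier relaxes the 1-6 hour freshness target from the requirements, which is the price of tracking a long tail that large. PriceDB is a time-series store (Druid / Timescale / ClickHouse); reads hit aggregated rollups.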
- “What’s the alert-trigger latency?” Bounded by the popularity tier’s refresh interval. A 30-second SLA is unrealistic; a 1-6 hour SLA is honest. Alerts on products newly added jump to hot tier for 24 hours so the first observation lands fast.
- “How do you handle Amazon changing the product page format?” The scraper has per-section selectors with health-monitoring. When success rate drops, ingestion routes around the broken selector, alerts the team, and the cache TTL is held at last-known-good until repair. The read API never returns “unknown price”; it returns the most recent valid observation with `last_refreshed` so the client can decide.
- “What about the legal angle?” PAAPI usage is contractually fine. Scraping is ToS-grey; we mitigate by respecting `robots.txt`, identifying with a known User-Agent, throttling aggressively, and not caching content we don’t have permission to redistribute (we redistribute prices, which are facts and uncopyrightable). When PAAPI covers the catalogue, the scraper retires.
- “How is this different from a generic price-comparison API?” Single partner = single rate-limit envelope to negotiate, single product schema to normalise, single ToS to respect. Multi-partner price comparison adds per-partner adapters and a normalisation layer: same architecture, broader integration footprint.
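The tiering answers above reduce to a small "is this product due for a refresh?" check. The sketch uses the intervals named in the scaling answer (1 hour, 6 hours, weekly) and the 24-hour hot boost for newly added products; the function names and the exact comparison rules are assumptions.

```python
from datetime import datetime, timedelta, timezone

REFRESH_INTERVAL = {
    "hot": timedelta(hours=1),
    "warm": timedelta(hours=6),
    "cold": timedelta(days=7),
}
NEW_PRODUCT_BOOST = timedelta(hours=24)


def effective_tier(tier: str, first_seen: datetime, now: datetime) -> str:
    """Newly added products are treated as hot for their first 24 hours."""
    return "hot" if now - first_seen < NEW_PRODUCT_BOOST else tier


def is_due(tier: str, first_seen: datetime, last_refreshed: datetime, now: datetime) -> bool:
    """True when the product's tier interval has elapsed since the last refresh."""
    interval = REFRESH_INTERVAL[effective_tier(tier, first_seen, now)]
    return now - last_refreshed >= interval
```

The scheduler only decides what is due; whether the fetch actually goes out is still gated by the partner rate limit.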
Related#
- Design a Search Service API — the read-heavy / cache-heavy pattern this design inherits.
- Design a Pub-Sub Service API — how alert evaluation hands off to notifier workers.
- Rate Limiting — the partner-side discipline that defines the entire architecture.
- Design the Stripe Payment API — idempotency-key conventions copied directly into alert creation.
- What Causes API Failures — A Taxonomy — partner-coupling is failure mode three in the taxonomy.