Design a Pub-Sub Service API

Topics, subscribers, fan-out, at-least-once vs exactly-once. The asynchronous backbone of every modern microservice mesh.

System Intermediate
15 min read
pub-sub api-design messaging event-driven

Context#

A pub-sub service is the asynchronous backbone of every microservice mesh built since 2010. AWS SNS+SQS, Google Cloud Pub/Sub, Azure Service Bus, NATS, Apache Pulsar, and (with caveats) Apache Kafka all expose variations on the same shape. One producer emits an event to a topic; N consumers receive it. The producer doesn’t know who the consumers are; the consumers don’t know who produced it. This decoupling is the whole point — services evolve independently, scale independently, fail independently.

What makes this a strong interview question is that the API surface looks small (publish, subscribe, pull) but the semantics are unusually rich:

  • Delivery guarantees: at-most-once, at-least-once, exactly-once — each has a different API consequence.
  • Push vs pull: a delivery_url on the subscription versus the consumer calling pull on a schedule.
  • Fan-out: 1 topic → N subscriptions; each subscription is an independent queue.
  • Ordering: per-topic, per-key, or none — pick one and live with the trade.
  • Dead-letter queues: where messages go when they can’t be processed after N retries.
  • Backpressure: what happens when consumers can’t keep up.

The interviewer’s hidden objectives:

  • Can you explain the difference between a topic and a subscription correctly? (Topics are stateless; subscriptions are queues.)
  • Can you defend at-least-once as the default and explain why exactly-once is hard?
  • Do you specify acknowledgement as a first-class operation, not a happy-path side effect?
  • Can you sketch the fan-out model at scale — does each subscription get its own copy?
  • Can you write the dead-letter and retry policy as an explicit config, not folklore?

Reference: Google Cloud Pub/Sub for the API shape; AWS SNS+SQS for the topic-fans-into-queues mental model; Kafka for ordering and partition guarantees.

Requirements (functional and non-functional)#

Functional — in scope:

  • Create / delete topics.
  • Publish a message (or a batch) to a topic.
  • Create / delete subscriptions on a topic (push or pull mode).
  • Pull messages from a pull-subscription.
  • Acknowledge messages (so they’re removed from the queue).
  • Configure dead-letter policy per subscription.
  • Configure per-subscription filtering (by message attributes).

Functional — out of scope:

  • Stream-replay / re-consumption — Kafka’s specialty; we are a queue-shaped pub-sub, not a log.
  • Schema registry — assume the producer and consumer agree out-of-band.
  • Encryption-at-rest — assume the underlying storage handles it.
  • Multi-region replication — separate axis; v1 is single-region.

Non-functional:

  • Latency: publish <= 50 ms p95; end-to-end (publish → delivered) <= 1 s p95.
  • Throughput: 1 M messages/s aggregate; 10k messages/s per topic.
  • Availability: 99.95% on the publish path (writes must not fail); 99.9% on the pull/push path.
  • Delivery guarantee: at-least-once by default. Idempotent consumers achieve effectively-once.
  • Durability: 11 nines once publish returns 200.
  • Maximum message size: 1 MB (larger payloads use file-service and pass a reference).
  • Retention: up to 7 days unacknowledged; configurable.

Use case diagram#

┌──────────────┐ ┌──────────────┐
│ Producer │ │ Consumer │
└──────┬───────┘ └──────┬───────┘
│ │
▼ ▼
[publish event] [pull messages]
[batch publish] [acknowledge]
[confirm receipt] [reject (nack)]
│ │
└──────────────┬──────────────────┘
┌───────────────────────────┐
│ Pub-Sub Service API │
└───────────────────────────┘
┌──────────┴──────────┐
▼ ▼
[Admin] [Async push]
create/delete to webhook URL
topic/subscription

Three actors: producer, consumer, admin. Both producer and consumer talk only to the API; they never know each other.

Class diagram#

┌───────────────────────┐
│ PubSubService │
├───────────────────────┤
│ createTopic(req) │
│ publish(t, msgs) │
│ createSubscription(r) │
│ pull(sub, max) │
│ ack(sub, ids) │
│ nack(sub, ids) │
└──────────┬────────────┘
│ owns
┌───────────────────────┐ ┌─────────────────────┐
│ Topic │ 1 ─── * │ Subscription │
├───────────────────────┤ ├─────────────────────┤
│ name │ │ name │
│ created_at │ │ topic_name │
│ partition_count │ │ mode (push|pull) │
│ retention_seconds │ │ delivery_url? │
└──────────┬────────────┘ │ ack_deadline_sec │
│ publishes │ filter? │
▼ │ dead_letter? │
┌───────────────────────┐ └──────────┬──────────┘
│ Message │ │ enqueues into
├───────────────────────┤ ▼
│ id │ ┌─────────────────────┐
│ topic │ │ DeliveryAttempt │
│ ordering_key? │ ├─────────────────────┤
│ attributes (map) │ │ subscription │
│ data (bytes) │ │ message_id │
│ publish_time │ │ attempt_count │
└───────────────────────┘ │ lease_expires_at │
│ status │
└─────────────────────┘

Topic is stateless: messages flow through but aren’t owned by it. Subscription is where state lives — each subscription is conceptually a queue, with its own backlog. A message arrives at a topic and is copied into the queue of every matching subscription. Within a subscription, DeliveryAttempt tracks per-message lease and retry state.

Sequence diagram (key flows)#

The publish + fan-out flow:

Producer PubSubAPI Topic Sub-A Sub-B
│ POST /topics/orders:publish │ │ │
│──────────────────►│ │ │ │
│ │ validate│ │ │
│ │ persist │ │ │
│ │────────►│ │ │
│ │ matches subscription filters │
│ │ enqueue into Sub-A queue │
│ │─────────────────────────►│ │
│ │ enqueue into Sub-B queue │
│ │─────────────────────────────────────────►│
│ 200 + msg_id │ │ │ │
│◄──────────────────│ │ │ │

The pull + ack flow (pull-mode subscription):

Consumer PubSubAPI SubscriptionQueue
│ POST /subscriptions/{s}/pull?max=10
│──────────────────►│ │
│ │ fetch up to 10│
│ │ lease them for│ ack_deadline_seconds
│ │──────────────►│
│ │ 10 messages │
│ │◄──────────────│
│ 10 messages + ack_ids │
│◄──────────────────│ │
│ │
│ ... consumer processes ... │
│ │
│ POST /subscriptions/{s}/ack │
│ ack_ids: [a, b, c, ...] │
│──────────────────►│ │
│ │ delete acked │
│ │──────────────►│
│ 204 No Content │ │
│◄──────────────────│ │

The push flow inverts the direction — the service POSTs to the consumer’s delivery_url:

PubSubService Consumer (webhook)
│ POST <delivery_url> │
│ body: {message, ack_id} │
│─────────────────────────────────────►│
│ │ process
│ HTTP 200 OK ◄───────────────────────│
│ (200 == ack; non-2xx == nack) │

The nack + retry + dead-letter flow:

ack fails after N attempts
┌─────────────────────────┐
│ Subscription queue │
└─────────┬───────────────┘
│ message exhausts retry policy
┌─────────────────────────┐
│ Dead-Letter Subscription│
└─────────┬───────────────┘
│ operator inspects, replays, or drops
[out of scope]

Activity diagram (for non-trivial state)#

DeliveryAttempt.status is the meaningful state machine — what does a message look like during its retry-loop life?

[message published]
┌──────────┐
│ Queued │
└────┬─────┘
│ pull / push
┌──────────┐
│ Leased │ (consumer holds it for
└────┬─────┘ ack_deadline_seconds)
┌───────────┼──────────────────┐
ack │ │ lease │ nack /
│ │ expires │ explicit nack
▼ ▼ ▼
┌───────┐ ┌──────────┐ ┌──────────┐
│ Acked │ │ Re-Queued│ │ Re-Queued│
│(done) │ └────┬─────┘ └────┬─────┘
└───────┘ │ │
│ retry │ retry
▼ ▼
┌──────────────────────────┐
│ attempt_count++ │
└────────────┬─────────────┘
max attempts │ exceeded
┌────────────────────┐
│ Dead-Letter │
│ Subscription │
└────────────────────┘

The lease mechanism is the load-bearing invariant: when a consumer pulls a message, it gets a lease for ack_deadline_seconds (default 30 s). If the consumer crashes, the lease expires and the message is redelivered. This is why at-least-once is the default and the consumer must be idempotent. Exactly-once requires a deduplication store keyed on message ID — pushed to the consumer’s side, not the broker’s.

API implementation#

Endpoint catalogue#

MethodPathPurpose
POST/v1/topicsCreate topic
DELETE/v1/topics/{name}Delete topic
POST/v1/topics/{name}:publishPublish 1..N messages
POST/v1/subscriptionsCreate subscription (push or pull)
DELETE/v1/subscriptions/{name}Delete subscription
POST/v1/subscriptions/{name}:pullPull up to max_messages (pull only)
POST/v1/subscriptions/{name}:ackAcknowledge by ack_ids
POST/v1/subscriptions/{name}:nackNegative-acknowledge (immediate redelivery)
PATCH/v1/subscriptions/{name}Update push URL, deadline, filter, DLQ

We use resource-action verbs (:publish, :pull, :ack) because the operations aren’t naturally CRUD. This matches GCP’s API style guide and pub-sub-specific verbs across the industry.

OpenAPI schema (excerpt)#

OpenAPI 3.1 — Pub-Sub API (publish + pull + ack)
paths:
/v1/topics/{name}:publish:
post:
operationId: publish
parameters:
- { name: name, in: path, required: true, schema: { type: string } }
requestBody:
required: true
content:
application/json:
schema:
type: object
required: [messages]
properties:
messages:
type: array
maxItems: 1000
items:
type: object
required: [data]
properties:
data: { type: string, format: byte, maxLength: 1048576 }
attributes:
type: object
additionalProperties: { type: string }
ordering_key:
type: string
nullable: true
maxLength: 1024
responses:
'200':
description: Published
content:
application/json:
schema:
type: object
required: [message_ids]
properties:
message_ids:
type: array
items: { type: string }
'413': { description: Batch too large }
/v1/subscriptions/{name}:pull:
post:
operationId: pull
parameters:
- { name: name, in: path, required: true, schema: { type: string } }
requestBody:
required: false
content:
application/json:
schema:
type: object
properties:
max_messages: { type: integer, minimum: 1, maximum: 1000, default: 10 }
return_immediately: { type: boolean, default: false }
responses:
'200':
description: Up to max_messages messages
content:
application/json:
schema:
type: object
properties:
received_messages:
type: array
items:
type: object
required: [ack_id, message]
properties:
ack_id: { type: string }
message: { $ref: '#/components/schemas/Message' }
/v1/subscriptions/{name}:ack:
post:
operationId: acknowledge
parameters:
- { name: name, in: path, required: true, schema: { type: string } }
requestBody:
required: true
content:
application/json:
schema:
type: object
required: [ack_ids]
properties:
ack_ids:
type: array
maxItems: 1000
items: { type: string }
responses:
'204': { description: Acked }
components:
schemas:
Message:
type: object
required: [id, data, publish_time]
properties:
id: { type: string }
data: { type: string, format: byte }
attributes:
type: object
additionalProperties: { type: string }
ordering_key: { type: string, nullable: true }
publish_time: { type: string, format: date-time }
Subscription:
type: object
required: [name, topic, mode, ack_deadline_seconds]
properties:
name: { type: string }
topic: { type: string }
mode:
type: string
enum: [push, pull]
delivery_url:
type: string
format: uri
nullable: true
ack_deadline_seconds: { type: integer, minimum: 10, maximum: 600 }
filter: { type: string, nullable: true }
dead_letter:
type: object
nullable: true
properties:
topic: { type: string }
max_delivery_attempts: { type: integer, minimum: 5, maximum: 100 }

Push body — raw HTTP#

The push delivery webhook receives this body:

{
"message": {
"id": "msg_01HZ...",
"data": "eyJvcmRlcl9pZCI6IDEyMyB9",
"attributes": { "event": "order.created", "region": "us-east-1" },
"publish_time": "2026-05-30T12:34:56Z",
"ordering_key": "customer-789"
},
"ack_id": "ack_proj-1234_sub-orders_abc"
}

A 2xx response from the consumer means ack; anything else means nack and re-deliver after the policy’s backoff.

Client samples — three languages#

The publish + pull + ack triad, in three languages.

Pub-Sub client — Python
import base64, json, time, requests
API = "https://api.example.com"
TOKEN = "Bearer eyJhbGciOi..."
def publish(topic, messages):
body = {"messages": [
{"data": base64.b64encode(json.dumps(m).encode()).decode(),
"attributes": {"event": m.get("type", "unknown")}}
for m in messages
]}
return requests.post(
f"{API}/v1/topics/{topic}:publish",
json=body,
headers={"Authorization": TOKEN},
timeout=2,
).json()
def pull_and_ack(sub, processor):
r = requests.post(
f"{API}/v1/subscriptions/{sub}:pull",
json={"max_messages": 100},
headers={"Authorization": TOKEN},
timeout=10,
).json()
ack_ids = []
for env in r.get("received_messages", []):
data = base64.b64decode(env["message"]["data"])
try:
processor(json.loads(data))
ack_ids.append(env["ack_id"])
except Exception:
pass # don't ack; will redeliver
if ack_ids:
requests.post(
f"{API}/v1/subscriptions/{sub}:ack",
json={"ack_ids": ack_ids},
headers={"Authorization": TOKEN},
)
publish("orders", [{"type": "order.created", "order_id": 123}])
pull_and_ack("orders-billing", lambda m: print("got", m))

Latency budget#

PhaseBudgetNotes
Auth + validate5 msCached JWT
Persist to topic log20 ms p95Quorum write to 3 replicas
Fan-out to subscriptions15 msAsync; respond to publisher first
Publish response50 ms p95Total budget
Sub queue → pull/push1 s p95 end-to-endIncludes fan-out latency
Ack write10 msSingle quorum write

The publish budget is tight because producers block on it in the synchronous path. The end-to-end (publish to delivery) budget is looser because consumers tolerate it.

Trade-offs and extensions#

DecisionWhyCost if requirements change
At-least-once defaultAchievable with simple infraConsumers must be idempotent
Per-subscription queueIndependent consumer paceMore storage; 1 message becomes N copies
Push and pull both supportedDifferent consumer ergonomicsTwo delivery paths to operate
1 MB max message sizeTractable disk + networkLarger payloads via file-service reference
7-day retentionBounds storage; matches GCP/SNSCompliance use cases want 30+ days
Resource-action verbsFits non-CRUD semanticsLess RESTful; mild style mismatch
Lease + nack modelCrash-safe redeliverySlow consumers can extend lease or fall over

A clean contrast on the delivery-mode axis:

Push mode

  • Service POSTs to webhook
  • Low operational burden for consumer
  • Tight feedback loop on consumer failures
  • Consumer must be publicly reachable
  • Throughput limited by webhook 2xx rate

Pull mode

  • Consumer calls :pull on a schedule
  • Backpressure is consumer-controlled
  • Long-poll keeps latency low
  • Behind-NAT consumers work fine
  • Consumer owns scheduling logic

Likely follow-up extensions:

  • Ordering keys. Messages sharing an ordering_key are delivered in publish order to the same consumer. This requires per-key affinity at the queue level; throughput per key is bounded.
  • Filtering. Per-subscription filter expression on message attributes (e.g. event=order.* AND region=us-*). Cheap to add at fan-out time; saves consumer bandwidth.
  • Replay. Subscription becomes time-windowed; admin can rewind to T or to a message_id. Pushes the model towards Kafka and complicates retention.
  • Cross-region replication. Topics replicate asynchronously; subscriptions are per-region. Aurora-style topology.
  • Schema registry. Producer declares Protobuf/Avro/JSON-Schema per topic; service validates at publish time.

Mock interview follow-ups#

  • “What’s the difference between Pub/Sub and Kafka?” — Pub/Sub is a queue with topic-fanned-into-queues semantics; once consumed, messages are gone. Kafka is a distributed log; messages persist for the retention window regardless of consumption. Different abstractions, different operational profiles.
  • “How do you handle a slow consumer?” — Lease-renew API for long-running work; per-subscription backlog metrics; eventually the backlog policy triggers either dead-lettering (drop oldest) or backpressure (publishers see 429).
  • “What if a consumer crashes mid-processing?” — The lease expires; the message is redelivered to another instance. The consumer must be idempotent, keyed on message.id.
  • “How do you achieve exactly-once?” — At-least-once delivery plus consumer-side dedup keyed on message.id against a persistent store. Some platforms (Kafka, GCP Pub/Sub) offer transactional helpers, but the durable dedup is still the load-bearing primitive.
  • “What about ordering?” — Off by default. Opt-in per topic; ordering_key directs same-key messages to the same partition and the same consumer instance. Throughput per key is bounded by single-threaded consumer.
  • “Why not just use a database table as a queue?” — Polling latency, fairness, contention on UPDATE ... SET locked=true. Production-grade pub-sub is a different optimisation target.
  • “How is this different from SQS?” — SQS is the per-subscription queue without the multi-subscriber topic; you compose SNS+SQS to get our shape. Our API is closer to GCP Pub/Sub, which presents topic and subscription as first-class resources.
Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.