Event-Driven Architecture Protocols

Webhooks, server-sent events, Kafka, message queues. The push-shaped alternative to request/response.

Building Block Intermediate
11 min read
webhooks sse kafka queues event-driven

What it is#

An event-driven architecture inverts the request/response contract. Instead of the client asking and the server answering, the server emits an event when something happens and any interested party can consume it. There is no single wire-level protocol; there is a family of them, each with different durability, ordering, and delivery guarantees.

Four protocols cover most of the territory:

  • Webhooks — the server makes an HTTP POST to a URL the consumer registered. One-way, request/response-shaped, fire-and-forget-ish.
  • Server-Sent Events (SSE) — the consumer holds open an HTTP GET, the server streams events as text/event-stream. One-way, server → client, long-lived.
  • Apache Kafka (and similar log-shaped brokers) — a durable, ordered, replayable log of events. Consumers track their own position; events are kept for days.
  • Message queues (RabbitMQ, AWS SQS, Google Cloud Pub/Sub) — a broker holds messages until a consumer acks. No replay; once acked, it’s gone.

Each has a different contract. Webhook delivery is at-least-once with retry; SSE is best-effort with reconnection on Last-Event-ID; Kafka is durable and ordered per-partition; SQS is durable but unordered (FIFO queues exist but at cost). The choice between them is one of the most consequential in an API design.

When to use it#

Reach for event-driven protocols when:

  • The producer doesn’t know who’s listening. Stripe doesn’t know which webhook URL on which merchant will be hit; Kafka producer doesn’t know which consumer group will read the topic next week. Decoupling producers and consumers is the canonical event-driven win.
  • The consumer wants liveness, not polls. A chat app polling /messages every second is wasting cycles. SSE or WebSocket pushes when there’s news, sleeps when there isn’t.
  • Multiple consumers want the same event. A new-order event triggers fulfilment, billing, analytics, and a customer notification — all separate consumers. A pub/sub broker fans this out for free.
  • You want replay. Kafka keeps events for a week (or a year if you want); a new consumer can replay history. Webhook and SSE typically can’t.

Avoid event-driven protocols when:

  • You actually want a synchronous reply. Request/response is simpler; if the caller needs the answer immediately, don’t bounce through a queue.
  • Strong consistency is required. Eventual-consistency is the event-driven default. A bank transfer that has to debit then credit atomically is a transaction, not two events.
  • Operational simplicity matters more than throughput. Kafka is a system to operate. RabbitMQ is a system to operate. If you can solve it with a POST and a retry, that’s cheaper.

How it works#

Webhooks — HTTP POST from server to consumer#

The simplest event-driven protocol. The consumer registers a URL; the server POSTs to it when an event fires.

Stripe webhook delivery
POST /webhooks/stripe HTTP/1.1
Host: api.merchant.com
Content-Type: application/json
Stripe-Signature: t=1717075200,v1=hexsignature...
User-Agent: Stripe/1.0
{
"id": "evt_1NkXyzABC",
"type": "payment_intent.succeeded",
"created": 1717075200,
"data": { "object": { "id": "pi_3NkXyz", "amount": 4999, ... } }
}

The delivery contract is at-least-once with retry on non-2xx. Stripe retries failed webhooks with exponential backoff over 3 days. The consumer must:

  • Respond with a 2xx promptly (Stripe’s deadline is 30 seconds, GitHub’s is 10 seconds; whatever the producer’s docs say).
  • Verify the signature (HMAC-SHA256 over the raw body, keyed by a secret the consumer shared at registration). Without this, anyone can POST to the public webhook URL.
  • Deduplicate by event ID. Retries arrive with the same id. Storing seen IDs (Redis with TTL = the producer’s retry window) avoids double-processing.
  • Handle out-of-order delivery. Webhooks have no per-recipient ordering guarantee; payment_intent.created and payment_intent.succeeded can arrive in either order. Idempotent state machines, not sequential processing.

Slack, GitHub, Twilio, Auth0, and every payments processor ship webhooks. The pattern is universal; the details (signature scheme, retry window, ordering claim) vary per provider.

Server-Sent Events — one-way streaming over HTTP#

SSE is the lighter-weight cousin of WebSocket: one-way (server → client), HTTP-native, text-only. The client opens a GET with Accept: text/event-stream; the server holds the connection open and writes events.

Server response — SSE
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
id: 14721
event: message
data: {"user": "alice", "text": "hello"}
id: 14722
event: typing
data: {"user": "bob"}
id: 14723
event: message
data: {"user": "bob", "text": "hi"}

Each event is id:, event:, data: lines followed by a blank line. The browser’s EventSource API handles reconnection automatically — on a dropped connection, it reconnects and sends the last seen id as the Last-Event-ID header, letting the server resume.

Reconnect — client sends Last-Event-ID
GET /events HTTP/1.1
Accept: text/event-stream
Last-Event-ID: 14723

SSE is the right tool for: LLM token streaming (OpenAI, Anthropic), live notifications, dashboards, chat read receipts (one-way is enough), public-event tails (GitHub events). Not for: full-duplex chat, binary frames, anything requiring client-to-server pushes (use WebSocket).

Apache Kafka — the durable log#

Kafka is a different model: a distributed, replicated, append-only log with partitioned topics. Producers write to a topic; consumers read at their own pace and track their own offset.

Producer Topic "orders" (partitions p0, p1, p2)
│ produce(key=order_42, value=...)
│─────────────────────────────────►│ p1: [..., 1023, 1024, 1025, NEW]
│ │ p0: [..., 487, 488]
│ │ p2: [..., 612, 613, 614]
Consumer group "fulfilment" │
│ consume(p1, offset=1023) │
│─────────────────────────────────◄│
│ consume(p1, offset=1024) │
│─────────────────────────────────◄│
│ ... commits offset back │
│ │
Consumer group "analytics" │
│ reads from offset=0 (replay) │
│─────────────────────────────────◄│

Key contracts:

  • Ordering is per-partition, not per-topic. Messages with the same key (e.g. order_42) go to the same partition and are read in order. Messages with different keys can arrive in any order across the topic.
  • Durability is replicated. Each partition is replicated to N brokers; a write isn’t acked until M replicas have it (configurable).
  • Replay is built-in. A new consumer can start from offset 0 and re-read everything still in retention (7 days default, configurable up to forever).
  • Delivery is at-least-once by default, exactly-once with transactions. The exactly-once mode (idempotent producer + transactional consumer) has overhead; most teams stay at at-least-once and dedupe at the consumer.

Kafka is the right tool when: events are the source of truth (event sourcing), replay is needed, multiple consumer groups will read the same topic, throughput is high (>10k events/sec). Wrong tool when: small scale, simple notification, no replay need (use a queue or webhook instead).

Message queues — RabbitMQ, SQS, Cloud Pub/Sub#

Queues are simpler than logs: a broker holds a message until a consumer acks. Once acked, it’s gone.

Producer Queue "fulfilment" Consumer
│ send(msg) [msg1, msg2, msg3] │
│─────────────►│ │
│ │ deliver(msg1) │
│ │──────────────────────►│
│ │ │ process(msg1)
│ │ ack(msg1) │
│ │◄──────────────────────│
│ │ msg1 removed │
│ │ [msg2, msg3] │

SQS, RabbitMQ, and Cloud Pub/Sub all expose this shape with variations:

  • Visibility timeout — when a message is delivered, it’s hidden for N seconds. If the consumer acks within N, it’s removed; if not, it reappears for redelivery.
  • Dead-letter queue (DLQ) — after K redeliveries, the message moves to a separate queue for inspection. Prevents poison-message loops.
  • Fan-out — SQS topics + queues (or RabbitMQ exchanges + queues) let one publish hit multiple downstream queues, one per consumer.
  • FIFO mode — strict ordering with deduplication, at lower throughput. SQS FIFO is capped at 3000 msg/sec per group; standard SQS has no cap.

Queues are right for: work queues (process this asynchronously), email/SMS dispatch, image processing, background jobs. Wrong for: event replay, multiple consumer groups reading the same stream (Kafka is the right tool there).

Consuming a webhook — three languages#

Verifying a Stripe-style HMAC-signed webhook in Python, Go, and Node:

Webhook receiver with signature verification — Python (Flask)
import hmac
import hashlib
import time
from flask import Flask, request, abort
app = Flask(__name__)
WEBHOOK_SECRET = "whsec_..."
TOLERANCE_SECONDS = 300
seen = {} # in production use Redis
def verify_signature(body: bytes, header: str) -> None:
parts = dict(p.split("=", 1) for p in header.split(","))
timestamp = int(parts["t"])
if abs(time.time() - timestamp) > TOLERANCE_SECONDS:
abort(400, "stale signature")
expected = hmac.new(
WEBHOOK_SECRET.encode(),
f"{timestamp}.".encode() + body,
hashlib.sha256,
).hexdigest()
if not hmac.compare_digest(expected, parts["v1"]):
abort(400, "bad signature")
@app.post("/webhooks/stripe")
def webhook():
body = request.get_data()
verify_signature(body, request.headers["Stripe-Signature"])
event = request.get_json()
if event["id"] in seen:
return "", 200 # dedupe
seen[event["id"]] = True
# process the event idempotently
return "", 200

Variants#

ProtocolDirectionDurabilityOrderingReplayBest for
WebhookServer → consumer (HTTP POST)Producer retries 1-3 daysNoneNoneExternal integrations, partner notifications
SSEServer → client (long-lived GET)None (best-effort)In-order on one streamVia Last-Event-IDLive dashboards, LLM streaming, chat read
WebSocketBidirectionalNoneIn-orderNoneReal-time bidirectional (chat, games)
KafkaProducer → log → consumersReplicated, retention 7d–∞Per-partitionYes (offset-based)Event sourcing, multi-consumer fan-out, replay
Message queue (SQS, RabbitMQ)Producer → queue → one consumerUntil ackedNone (FIFO mode = yes)NoneWork queues, async jobs
MQTTPub/sub over TCP, low overheadOptionalPer-topicNoneIoT, mobile (low bandwidth)

Trade-offs#

What event-driven protocols give you:

  • Decoupling. Producers and consumers evolve independently.
  • Liveness without polling. Push is more efficient than pull when the event rate is low and the latency target is tight.
  • Fan-out. One event, many consumers, each at its own pace.
  • Replay (Kafka specifically). Reprocess history when a bug forces it.

What event-driven protocols cost you:

  • Operational complexity. A Kafka cluster is a system to operate; a webhook receiver is a system to monitor.
  • Debugging is harder. No per-event HTTP log line; you trace across producer, broker, and consumer.
  • Ordering and consistency are weaker by default. Most event systems are at-least-once; consumers must be idempotent. Strong-ordering modes exist but at cost.
  • Backpressure is the producer’s problem to think about. A slow consumer with no DLQ is a queue that grows until the broker tips over.

Common pitfalls#

  • No signature verification on webhooks. Public URL + no HMAC = anyone can fake events. Stripe, GitHub, Twilio all ship HMAC; use it.
  • Not deduplicating by event ID. Retries are guaranteed; non-idempotent consumers double-charge / double-email / double-ship.
  • Treating webhook delivery as synchronous. Respond with 202 Accepted and process asynchronously. A slow consumer chain causes the producer to mark the webhook as failed and retry the whole train.
  • Assuming webhook order. Events are reordered in flight; build state machines that tolerate any order.
  • Using Kafka for a simple notification. A 3-broker cluster to deliver one email-sent event is over-investing. A queue or a webhook is fine.
  • Using a queue for event sourcing. Once acked, the message is gone. If you want history, you want a log (Kafka), not a queue.
  • No DLQ. Poison messages loop forever, consuming worker capacity. Every queue needs a DLQ and an alert on it.
  • SSE through a CDN that buffers. Some CDNs buffer responses for compression; the stream arrives in chunks of N. Disable buffering on the SSE path (X-Accel-Buffering: no for nginx).
Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.