Event-Driven Architecture Protocols
Webhooks, server-sent events, Kafka, message queues. The push-shaped alternative to request/response.
What it is#
An event-driven architecture inverts the request/response contract. Instead of the client asking and the server answering, the server emits an event when something happens and any interested party can consume it. There is no single wire-level protocol; there is a family of them, each with different durability, ordering, and delivery guarantees.
Four protocols cover most of the territory:
- Webhooks — the server makes an HTTP
POSTto a URL the consumer registered. One-way, request/response-shaped, fire-and-forget-ish. - Server-Sent Events (SSE) — the consumer holds open an HTTP
GET, the server streams events astext/event-stream. One-way, server → client, long-lived. - Apache Kafka (and similar log-shaped brokers) — a durable, ordered, replayable log of events. Consumers track their own position; events are kept for days.
- Message queues (RabbitMQ, AWS SQS, Google Cloud Pub/Sub) — a broker holds messages until a consumer acks. No replay; once acked, it’s gone.
Each has a different contract. Webhook delivery is at-least-once with retry; SSE is best-effort with reconnection on Last-Event-ID; Kafka is durable and ordered per-partition; SQS is durable but unordered (FIFO queues exist but at cost). The choice between them is one of the most consequential in an API design.
When to use it#
Reach for event-driven protocols when:
- The producer doesn’t know who’s listening. Stripe doesn’t know which webhook URL on which merchant will be hit; Kafka producer doesn’t know which consumer group will read the topic next week. Decoupling producers and consumers is the canonical event-driven win.
- The consumer wants liveness, not polls. A chat app polling
/messagesevery second is wasting cycles. SSE or WebSocket pushes when there’s news, sleeps when there isn’t. - Multiple consumers want the same event. A new-order event triggers fulfilment, billing, analytics, and a customer notification — all separate consumers. A pub/sub broker fans this out for free.
- You want replay. Kafka keeps events for a week (or a year if you want); a new consumer can replay history. Webhook and SSE typically can’t.
Avoid event-driven protocols when:
- You actually want a synchronous reply. Request/response is simpler; if the caller needs the answer immediately, don’t bounce through a queue.
- Strong consistency is required. Eventual-consistency is the event-driven default. A bank transfer that has to debit then credit atomically is a transaction, not two events.
- Operational simplicity matters more than throughput. Kafka is a system to operate. RabbitMQ is a system to operate. If you can solve it with a
POSTand a retry, that’s cheaper.
How it works#
Webhooks — HTTP POST from server to consumer#
The simplest event-driven protocol. The consumer registers a URL; the server POSTs to it when an event fires.
POST /webhooks/stripe HTTP/1.1Host: api.merchant.comContent-Type: application/jsonStripe-Signature: t=1717075200,v1=hexsignature...User-Agent: Stripe/1.0
{ "id": "evt_1NkXyzABC", "type": "payment_intent.succeeded", "created": 1717075200, "data": { "object": { "id": "pi_3NkXyz", "amount": 4999, ... } }}The delivery contract is at-least-once with retry on non-2xx. Stripe retries failed webhooks with exponential backoff over 3 days. The consumer must:
- Respond with a 2xx promptly (Stripe’s deadline is 30 seconds, GitHub’s is 10 seconds; whatever the producer’s docs say).
- Verify the signature (HMAC-SHA256 over the raw body, keyed by a secret the consumer shared at registration). Without this, anyone can
POSTto the public webhook URL. - Deduplicate by event ID. Retries arrive with the same
id. Storing seen IDs (Redis with TTL = the producer’s retry window) avoids double-processing. - Handle out-of-order delivery. Webhooks have no per-recipient ordering guarantee;
payment_intent.createdandpayment_intent.succeededcan arrive in either order. Idempotent state machines, not sequential processing.
Slack, GitHub, Twilio, Auth0, and every payments processor ship webhooks. The pattern is universal; the details (signature scheme, retry window, ordering claim) vary per provider.
Server-Sent Events — one-way streaming over HTTP#
SSE is the lighter-weight cousin of WebSocket: one-way (server → client), HTTP-native, text-only. The client opens a GET with Accept: text/event-stream; the server holds the connection open and writes events.
HTTP/1.1 200 OKContent-Type: text/event-streamCache-Control: no-cacheConnection: keep-alive
id: 14721event: messagedata: {"user": "alice", "text": "hello"}
id: 14722event: typingdata: {"user": "bob"}
id: 14723event: messagedata: {"user": "bob", "text": "hi"}Each event is id:, event:, data: lines followed by a blank line. The browser’s EventSource API handles reconnection automatically — on a dropped connection, it reconnects and sends the last seen id as the Last-Event-ID header, letting the server resume.
GET /events HTTP/1.1Accept: text/event-streamLast-Event-ID: 14723SSE is the right tool for: LLM token streaming (OpenAI, Anthropic), live notifications, dashboards, chat read receipts (one-way is enough), public-event tails (GitHub events). Not for: full-duplex chat, binary frames, anything requiring client-to-server pushes (use WebSocket).
Apache Kafka — the durable log#
Kafka is a different model: a distributed, replicated, append-only log with partitioned topics. Producers write to a topic; consumers read at their own pace and track their own offset.
Producer Topic "orders" (partitions p0, p1, p2) │ │ produce(key=order_42, value=...) │─────────────────────────────────►│ p1: [..., 1023, 1024, 1025, NEW] │ │ p0: [..., 487, 488] │ │ p2: [..., 612, 613, 614] │ Consumer group "fulfilment" │ │ consume(p1, offset=1023) │ │─────────────────────────────────◄│ │ consume(p1, offset=1024) │ │─────────────────────────────────◄│ │ ... commits offset back │ │ │ Consumer group "analytics" │ │ reads from offset=0 (replay) │ │─────────────────────────────────◄│Key contracts:
- Ordering is per-partition, not per-topic. Messages with the same key (e.g.
order_42) go to the same partition and are read in order. Messages with different keys can arrive in any order across the topic. - Durability is replicated. Each partition is replicated to N brokers; a write isn’t acked until M replicas have it (configurable).
- Replay is built-in. A new consumer can start from offset 0 and re-read everything still in retention (7 days default, configurable up to forever).
- Delivery is at-least-once by default, exactly-once with transactions. The exactly-once mode (idempotent producer + transactional consumer) has overhead; most teams stay at at-least-once and dedupe at the consumer.
Kafka is the right tool when: events are the source of truth (event sourcing), replay is needed, multiple consumer groups will read the same topic, throughput is high (>10k events/sec). Wrong tool when: small scale, simple notification, no replay need (use a queue or webhook instead).
Message queues — RabbitMQ, SQS, Cloud Pub/Sub#
Queues are simpler than logs: a broker holds a message until a consumer acks. Once acked, it’s gone.
Producer Queue "fulfilment" Consumer │ send(msg) [msg1, msg2, msg3] │ │─────────────►│ │ │ │ deliver(msg1) │ │ │──────────────────────►│ │ │ │ process(msg1) │ │ ack(msg1) │ │ │◄──────────────────────│ │ │ msg1 removed │ │ │ [msg2, msg3] │SQS, RabbitMQ, and Cloud Pub/Sub all expose this shape with variations:
- Visibility timeout — when a message is delivered, it’s hidden for N seconds. If the consumer acks within N, it’s removed; if not, it reappears for redelivery.
- Dead-letter queue (DLQ) — after K redeliveries, the message moves to a separate queue for inspection. Prevents poison-message loops.
- Fan-out — SQS topics + queues (or RabbitMQ exchanges + queues) let one publish hit multiple downstream queues, one per consumer.
- FIFO mode — strict ordering with deduplication, at lower throughput. SQS FIFO is capped at 3000 msg/sec per group; standard SQS has no cap.
Queues are right for: work queues (process this asynchronously), email/SMS dispatch, image processing, background jobs. Wrong for: event replay, multiple consumer groups reading the same stream (Kafka is the right tool there).
Consuming a webhook — three languages#
Verifying a Stripe-style HMAC-signed webhook in Python, Go, and Node:
import hmacimport hashlibimport timefrom flask import Flask, request, abort
app = Flask(__name__)WEBHOOK_SECRET = "whsec_..."TOLERANCE_SECONDS = 300seen = {} # in production use Redis
def verify_signature(body: bytes, header: str) -> None: parts = dict(p.split("=", 1) for p in header.split(",")) timestamp = int(parts["t"]) if abs(time.time() - timestamp) > TOLERANCE_SECONDS: abort(400, "stale signature") expected = hmac.new( WEBHOOK_SECRET.encode(), f"{timestamp}.".encode() + body, hashlib.sha256, ).hexdigest() if not hmac.compare_digest(expected, parts["v1"]): abort(400, "bad signature")
@app.post("/webhooks/stripe")def webhook(): body = request.get_data() verify_signature(body, request.headers["Stripe-Signature"]) event = request.get_json() if event["id"] in seen: return "", 200 # dedupe seen[event["id"]] = True # process the event idempotently return "", 200package main
import ( "crypto/hmac" "crypto/sha256" "encoding/hex" "encoding/json" "io" "net/http" "strings" "time")
const webhookSecret = "whsec_..."const toleranceSeconds = 300
var seen = map[string]bool{} // in production use Redis
func verifySignature(body []byte, header string) bool { parts := map[string]string{} for _, p := range strings.Split(header, ",") { kv := strings.SplitN(p, "=", 2) if len(kv) == 2 { parts[kv[0]] = kv[1] } } ts, _ := time.Parse(time.RFC3339, parts["t"]) if time.Since(ts) > toleranceSeconds*time.Second { return false } mac := hmac.New(sha256.New, []byte(webhookSecret)) io.WriteString(mac, parts["t"]+".") mac.Write(body) return hmac.Equal([]byte(hex.EncodeToString(mac.Sum(nil))), []byte(parts["v1"]))}
func webhook(w http.ResponseWriter, r *http.Request) { body, _ := io.ReadAll(r.Body) if !verifySignature(body, r.Header.Get("Stripe-Signature")) { http.Error(w, "bad sig", 400) return } var event struct{ ID string `json:"id"` } json.Unmarshal(body, &event) if seen[event.ID] { w.WriteHeader(200); return } seen[event.ID] = true // process idempotently w.WriteHeader(200)}import express from "express";import crypto from "crypto";
const app = express();const WEBHOOK_SECRET = "whsec_...";const TOLERANCE_SECONDS = 300;const seen = new Set(); // in production use Redis
function verifySignature(body, header) { const parts = Object.fromEntries(header.split(",").map(p => p.split("="))); const ts = Number(parts.t); if (Math.abs(Date.now() / 1000 - ts) > TOLERANCE_SECONDS) return false; const expected = crypto .createHmac("sha256", WEBHOOK_SECRET) .update(`${ts}.${body}`) .digest("hex"); return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(parts.v1));}
app.post( "/webhooks/stripe", express.raw({ type: "application/json" }), (req, res) => { const body = req.body.toString(); if (!verifySignature(body, req.headers["stripe-signature"])) { return res.status(400).send("bad sig"); } const event = JSON.parse(body); if (seen.has(event.id)) return res.status(200).end(); seen.add(event.id); // process idempotently res.status(200).end(); });Variants#
| Protocol | Direction | Durability | Ordering | Replay | Best for |
|---|---|---|---|---|---|
| Webhook | Server → consumer (HTTP POST) | Producer retries 1-3 days | None | None | External integrations, partner notifications |
| SSE | Server → client (long-lived GET) | None (best-effort) | In-order on one stream | Via Last-Event-ID | Live dashboards, LLM streaming, chat read |
| WebSocket | Bidirectional | None | In-order | None | Real-time bidirectional (chat, games) |
| Kafka | Producer → log → consumers | Replicated, retention 7d–∞ | Per-partition | Yes (offset-based) | Event sourcing, multi-consumer fan-out, replay |
| Message queue (SQS, RabbitMQ) | Producer → queue → one consumer | Until acked | None (FIFO mode = yes) | None | Work queues, async jobs |
| MQTT | Pub/sub over TCP, low overhead | Optional | Per-topic | None | IoT, mobile (low bandwidth) |
Trade-offs#
What event-driven protocols give you:
- Decoupling. Producers and consumers evolve independently.
- Liveness without polling. Push is more efficient than pull when the event rate is low and the latency target is tight.
- Fan-out. One event, many consumers, each at its own pace.
- Replay (Kafka specifically). Reprocess history when a bug forces it.
What event-driven protocols cost you:
- Operational complexity. A Kafka cluster is a system to operate; a webhook receiver is a system to monitor.
- Debugging is harder. No per-event HTTP log line; you trace across producer, broker, and consumer.
- Ordering and consistency are weaker by default. Most event systems are at-least-once; consumers must be idempotent. Strong-ordering modes exist but at cost.
- Backpressure is the producer’s problem to think about. A slow consumer with no DLQ is a queue that grows until the broker tips over.
Common pitfalls#
- No signature verification on webhooks. Public URL + no HMAC = anyone can fake events. Stripe, GitHub, Twilio all ship HMAC; use it.
- Not deduplicating by event ID. Retries are guaranteed; non-idempotent consumers double-charge / double-email / double-ship.
- Treating webhook delivery as synchronous. Respond with
202 Acceptedand process asynchronously. A slow consumer chain causes the producer to mark the webhook as failed and retry the whole train. - Assuming webhook order. Events are reordered in flight; build state machines that tolerate any order.
- Using Kafka for a simple notification. A 3-broker cluster to deliver one email-sent event is over-investing. A queue or a webhook is fine.
- Using a queue for event sourcing. Once acked, the message is gone. If you want history, you want a log (Kafka), not a queue.
- No DLQ. Poison messages loop forever, consuming worker capacity. Every queue needs a DLQ and an alert on it.
- SSE through a CDN that buffers. Some CDNs buffer responses for compression; the stream arrives in chunks of N. Disable buffering on the SSE path (
X-Accel-Buffering: nofor nginx).
Related building blocks#
- WebSockets — Bidirectional Streaming — bidirectional version of SSE; same upgrade-handshake idea, full-duplex frames.
- Data Fetching Patterns — push vs pull, paginated vs streamed; the policy that picks event-driven over polling.
- Design a Pub-Sub Service API — designing a Kafka-shaped or SQS-shaped pub/sub API from scratch.
- Managing Retries — webhook retry policies and DLQ design; the same exponential-backoff playbook.
- The Role of Idempotency in API Design — at-least-once delivery requires idempotent consumers; same key-store pattern.