Design the Facebook Messenger API

Real-time delivery, presence, read receipts, group threads, end-to-end encryption. WebSockets + a careful state machine.

System Advanced
20 min read
api-design real-time e2e-encryption
Companies this resembles: Meta · Facebook

Context#

Messenger is the canonical “real-time chat” API question and an Advanced-tier prompt because most candidates spend the first twenty minutes drawing fan-out diagrams that belong in the HLD round. The discriminator at API-design altitude is whether you can lock down the contract — endpoints, payload shapes, the message state machine, the WebSocket frame format — without sliding into infra architecture.

This writeup is an API-design round, not an HLD round. That means:

  • The MQTT-style fan-out broker behind the WebSocket is a black box. We do not design it.
  • The storage tier (HBase, RocksDB, whatever) is a black box. We design the read API over it.
  • Voice / video calling, payments, M-the-assistant — out of scope. Messenger has historically bundled them; the API round should not.
  • The Signal Protocol itself is a black box. The server is blind to plaintext; it routes ciphertext. We describe the contract, not the cryptography.

What remains is rich enough for a 45-minute round:

  • A dual-surface API: REST for history + thread management, WebSocket for live delivery.
  • A per-recipient message state machine (Sent → Delivered → Read | Failed), which matters because a group thread runs one state machine per recipient for every message.
  • A presence channel that is intentionally separate from the message channel because it has different throughput, different ordering needs, and different caching.
  • A read-receipt protocol that is its own write path, not a side-effect of fetching.
  • A group-thread model with participant membership as a first-class resource.
  • An E2E-encryption seam: the server stores and forwards ciphertext, but plaintext never crosses the API boundary.

The interviewer’s hidden objectives, roughly in order:

  • Can you separate persistent state (REST) from live transport (WebSocket) cleanly?
  • Can you write a WebSocket frame protocol with the same care you write REST endpoints — frame types, ACK semantics, sequence numbers?
  • Can you reason about delivery semantics (at-least-once vs exactly-once) and the dedup ID the client owns?
  • Can you handle read receipts without quadratic write amplification in a 250-person group?
  • Can you scope the E2E contract — what the server can see and what it can’t?

Requirements (functional and non-functional)#

Functional — in scope:

  • Create 1:1 and group threads (up to 250 participants).
  • Send a message into a thread; receive a delivery acknowledgement.
  • Receive incoming messages in near-real-time via WebSocket.
  • Mark a message as read; propagate read receipts to other participants.
  • Paginated fetch of a thread’s history (newest first, cursor-based).
  • Edit / delete a message within a 15-minute window (the edit window is a product call; the API enforces it).
  • Typing indicators and online-presence as a separate ephemeral channel.
  • End-to-end encryption: the client uploads its identity / signed-pre / one-time prekey bundles; the server stores opaque ciphertext payloads.

Functional — out of scope:

  • Voice and video calling. Separate API surface, not covered here.
  • Group video / Rooms. Same — out.
  • Payments inside a chat. Out.
  • The M assistant / Smart Reply.
  • Stickers / GIFs / reactions storefront. Reactions exist as a primitive (POST /messages/{id}/reactions) but the sticker catalogue itself is out.
  • Search inside Messenger. See the Search Service API for that pattern.
  • Backup / restore of an encrypted history (the device-handoff problem the Signal Protocol partially solves).

Non-functional:

  • Delivery latency: send to receive <= 200 ms p95 within-region; <= 500 ms p95 cross-region.
  • WebSocket connection: 1 long-lived connection per device; reconnect with backoff on drop.
  • Send-API SLO: <= 100 ms p95 for the synchronous “your message is stored” response.
  • Throughput: 100M concurrent WebSockets globally; 5M messages sent per second peak.
  • Availability: 99.99% on send; 99.95% on history read; presence is best-effort (no SLA).
  • Durability: messages durably stored within 100 ms of accept; replicated cross-AZ before ACK.
  • Ordering: total order per thread (so participants see the same sequence); the client sequence number is the tiebreaker.

Use case diagram#

            ┌─────────────────┐
            │  Participant A  │
            └────────┬────────┘
    ┌────────────────┼────────────────┐
    ▼                ▼                ▼
 [create      [send / edit /     [mark read]
  thread]      delete msg]
    │                │                │
    └────────────────┴────────────────┘
┌─────────────────┐         WebSocket          ┌─────────────────┐
│  Messenger API  │◄──────── live frames ─────►│  Participant B  │
└─────────────────┘                            └─────────────────┘
┌─────────────────┐
│  Presence API   │ ──── typing/online ───► all participants
└─────────────────┘

Two API surfaces, one logical actor (a participant). The WebSocket is the back-channel that carries the live delivery, read-receipt, and presence frames; REST is the durable write/read path for messages and thread state.

Class diagram#

┌──────────────────────────┐
│ ThreadService │
├──────────────────────────┤
│ createThread(req) │
│ listThreads(viewer) │
│ getThread(id) │
│ addParticipant(id, uid) │
│ removeParticipant(id,uid)│
│ leaveThread(id, viewer) │
└──────────────┬───────────┘
┌──────────────────────────┐
│ Thread │
├──────────────────────────┤
│ id : UUID │
│ type : 1_TO_1 | GROUP │
│ participants : [User] │
│ created_at : timestamp │
│ last_msg_id : str │
└──────────────────────────┘
┌──────────────────────────┐
│ MessageService │
├──────────────────────────┤
│ sendMessage(req) │
│ fetchHistory(tid, cur) │
│ editMessage(id, body) │
│ deleteMessage(id) │
│ react(id, emoji) │
└──────────────┬───────────┘
┌──────────────────────────┐ ┌─────────────────────┐
│ Message │ │ RecipientState │
├──────────────────────────┤ ├─────────────────────┤
│ id : ULID │ 1 N │ recipient_id : str │
│ thread_id : UUID │────────►│ state : enum │
│ sender_id : str │ │ delivered_at : ts? │
│ client_seq : int │ │ read_at : ts? │
│ ciphertext : bytes │ └─────────────────────┘
│ created_at : timestamp │
│ edited_at : timestamp? │
│ deleted : bool │
└──────────────────────────┘
┌──────────────────────────┐
│ PresenceService │
├──────────────────────────┤
│ subscribe(thread_id) │ pushes typing + online via WebSocket
│ markTyping(thread_id) │
│ heartbeat() │
└──────────────────────────┘

RecipientState is the row-explosion that makes group reads tricky. A message in a 250-person thread has 249 RecipientState rows (one per recipient, excluding the sender), sharded by (thread_id, recipient_id) so a per-user “mark all read” can update only the relevant rows.
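
As a rough illustration of the row explosion, the sketch below models RecipientState as a Python dataclass and fans one message out into per-recipient rows. It assumes the sender gets no row of their own, matching the N - 1 count in the state-machine section; the `State` enum and the `fan_out_rows` helper are illustrative names, not part of the API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    SENT = "sent"
    DELIVERED = "delivered"
    READ = "read"
    FAILED = "failed"


@dataclass
class RecipientState:
    thread_id: str
    msg_id: str
    recipient_id: str
    state: State = State.SENT
    delivered_at: Optional[float] = None
    read_at: Optional[float] = None


def fan_out_rows(thread_id, msg_id, sender_id, participants):
    """Create one RecipientState row per participant other than the sender."""
    return [
        RecipientState(thread_id, msg_id, uid)
        for uid in participants
        if uid != sender_id
    ]


participants = [f"u_{i}" for i in range(250)]
rows = fan_out_rows("t_1", "m_1", "u_0", participants)
print(len(rows))  # 249
```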

Sequence diagram (key flows)#

Flow 1: send + deliver in a 1:1 thread.

Sender(A) MessengerAPI PubSub Receiver(B)
│ POST /threads/{tid}/messages │ │ (WS open)
│ { ciphertext, client_seq } │ │
│──────────────────────────────►│ │
│ validate, persist, mint id │ │
│ fan out to PubSub │ │
│ ────────────────────────────►│ │
│ 202 Accepted + msg_id, ts │ │
│◄──────────────────────────────│ │
│ │ push frame │
│ │ ───────────────►│
│ │ │ render
│ │ ACK delivered │
│ │◄────────────────│
│ (back-channel) WS frame │ │
│ { event: delivered, id, by } │ │
│◄──────────────────────────────│ │

The sender’s 202 Accepted happens before B receives the message. The delivered event comes back later via the sender’s own WebSocket — it’s a separate frame, not part of the original REST response. The client_seq is the dedupe key in case the client retries the send.
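
To make the retry behaviour concrete, here is a minimal in-memory sketch of server-side dedupe on client_seq. The `MessageStore` class and the counter-based ids are hypothetical stand-ins (a real service would mint ULIDs and write to durable storage); the 202 and 409 status codes follow the schema in this writeup.

```python
import itertools


class MessageStore:
    """In-memory stand-in for the durable store, keyed for retry dedupe."""

    def __init__(self):
        self._by_key = {}               # (thread_id, device_id, client_seq) -> message
        self._ids = itertools.count(1)  # stand-in for ULID minting

    def send(self, thread_id, device_id, client_seq, ciphertext):
        key = (thread_id, device_id, client_seq)
        existing = self._by_key.get(key)
        if existing is not None:
            # Retry of a send that was already stored: return the original message.
            return 409, existing
        msg = {
            "id": f"m_{next(self._ids)}",
            "thread_id": thread_id,
            "client_seq": client_seq,
            "ciphertext": ciphertext,
        }
        self._by_key[key] = msg
        return 202, msg


store = MessageStore()
first = store.send("t_1", "dev_a", 7, "AAJ8Rk")
retry = store.send("t_1", "dev_a", 7, "AAJ8Rk")
print(first[0], retry[0], first[1]["id"] == retry[1]["id"])  # 202 409 True
```

Keying on the device as well as the thread keeps two devices of the same user from colliding when both start their sequence at the same number.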

Flow 2: read receipt in a group of 5.

ReaderC MessengerAPI OtherFour(WS open)
│ POST /threads/{tid}/read │
│ { up_to_msg_id } │
│──────────────────────────────►│
│ update RecipientState │
│ for C: state=READ, read_at=…│
│ enqueue 1 fan-out event │
│ 204 No Content │
│◄──────────────────────────────│
│ │ WS frame to each other participant
│ │ { event: read, by: C, up_to: id }
│ │─────────────────►
│ │─────────────────►
│ │─────────────────►
│ │─────────────────►

One write into the reader’s row, one fan-out event to the thread topic. Each participant’s client merges the read event into its local view; no quadratic writes.
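
The "one write, one fan-out event" shape can be sketched as follows. This is an assumption-laden simplification: it stores the read position as a single per-reader watermark, relies on message ids sorting lexicographically (as ULIDs do), and uses a plain list in place of the thread topic. `ReadTracker` is a hypothetical name.

```python
class ReadTracker:
    def __init__(self):
        self.read_up_to = {}  # (thread_id, reader_id) -> highest message id read
        self.events = []      # stand-in for the thread's fan-out topic

    def mark_read(self, thread_id, reader_id, up_to_msg_id):
        """Advance the reader's watermark; return True if anything changed."""
        key = (thread_id, reader_id)
        current = self.read_up_to.get(key)
        if current is not None and up_to_msg_id <= current:
            return False  # stale or repeated receipt: no write, no fan-out
        self.read_up_to[key] = up_to_msg_id          # the single write
        self.events.append({                         # the single fan-out event
            "t": "read",
            "thread_id": thread_id,
            "by": reader_id,
            "up_to": up_to_msg_id,
        })
        return True


tracker = ReadTracker()
tracker.mark_read("t_1", "C", "01B")
tracker.mark_read("t_1", "C", "01A")  # older than the watermark, ignored
print(len(tracker.events))  # 1
```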

Flow 3: reconnect and catch up.

Client MessengerAPI
│ WS connect (last_seen_id=Mxyz)
│──────────────────────────────►
│ resolve subscriber position │
│ replay missed frames │
│ { event: msg, id, ... } │ for each missed message
│◄──────────────────────────────
│ { event: msg, id, ... } │
│◄──────────────────────────────
│ { event: caught_up } │
│◄──────────────────────────────
│ ACK each msg with `client_ack` frame
│──────────────────────────────►

The client passes last_seen_id on the upgrade query; the server replays from there. After the caught_up frame, live delivery resumes. This is the contract that makes reconnect feel instant on flaky networks.
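
A minimal sketch of the replay contract, assuming the server can scan a thread log sorted by message id and that ids compare lexicographically. `replay_frames` and the in-memory log are hypothetical; the frame shapes follow the WebSocket catalogue in this writeup.

```python
def replay_frames(thread_log, last_seen_id):
    """Yield a msg frame for every message after last_seen_id, then caught_up."""
    for msg in thread_log:  # thread_log is assumed sorted by id, oldest first
        if last_seen_id is None or msg["id"] > last_seen_id:
            yield {"t": "msg", **msg}
    yield {"t": "caught_up"}


log = [
    {"id": "01A", "ciphertext": "x"},
    {"id": "01B", "ciphertext": "y"},
    {"id": "01C", "ciphertext": "z"},
]
frames = list(replay_frames(log, "01A"))
print([f["t"] for f in frames])  # ['msg', 'msg', 'caught_up']
```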

Activity diagram (for non-trivial state)#

The per-recipient message state machine:

[sendMessage returns 202 + msg_id]
┌───────────────────────┐
│ SENT │
│ (server has it, │
│ recipient not │
│ yet acked) │
└─────────┬─────────────┘
┌─────────────┼────────────┐
▼ │ ▼
┌───────────────────┐ │ ┌───────────────────┐
│ DELIVERED │ │ │ FAILED │
│ recipient WS ack │ │ │ TTL exceeded or │
│ received │ │ │ recipient blocked│
└─────────┬─────────┘ │ └───────────────────┘
│ │
▼ │
┌───────────────────┐ │
│ READ │ │
│ recipient POSTed │ │
│ /read up_to_id │ │
└───────────────────┘ │
│ │
▼ │
[terminal] │
[edit / delete by sender, within 15 min]
┌─────────────────────────┐
│ EDITED / DELETED │
│ (overlay on the row, │
│ not a new state) │
└─────────────────────────┘

A few invariants the API enforces:

  • READ implies DELIVERED — if a read event arrives without a prior delivered, the server upgrades both in one write. (Some networks drop the delivered-ack but the read-receipt POST is reliable.)
  • FAILED is recoverable only by the sender retrying with the same client_seq within 24 hours.
  • EDITED / DELETED are not new states in the recipient state machine; they are content overlays. A deleted message keeps its RecipientState rows so read-position math stays consistent.
  • Group threads have N - 1 parallel state machines for each message (one per recipient). The thread-level “all delivered” / “all read” view is computed lazily, never stored.
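
The invariants above can be expressed as a small transition function. This is a sketch under assumptions: rows are plain dicts, a retry of a FAILED send is modelled as a move back to SENT, and the function and table names are hypothetical.

```python
from enum import Enum


class State(Enum):
    SENT = 1
    DELIVERED = 2
    READ = 3
    FAILED = 4


ALLOWED = {
    State.SENT: {State.DELIVERED, State.READ, State.FAILED},
    State.DELIVERED: {State.READ},
    State.READ: set(),            # terminal
    State.FAILED: {State.SENT},   # sender retried with the same client_seq
}


def apply_transition(row, new_state, now):
    """Return the updated row, or the row unchanged if the move is not allowed."""
    if new_state not in ALLOWED[row["state"]]:
        return row
    updated = dict(row, state=new_state)
    # READ implies DELIVERED: backfill delivered_at in the same write.
    if new_state in (State.DELIVERED, State.READ) and updated["delivered_at"] is None:
        updated["delivered_at"] = now
    if new_state is State.READ:
        updated["read_at"] = now
    return updated


def all_read(rows):
    """Thread-level view, computed on demand rather than stored."""
    return all(r["state"] is State.READ for r in rows)


row = {"state": State.SENT, "delivered_at": None, "read_at": None}
read = apply_transition(row, State.READ, 100.0)
print(read["state"].name, read["delivered_at"], read["read_at"])  # READ 100.0 100.0
```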

API implementation#

Endpoint catalogue — REST surface#

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /v1/threads | Create a 1:1 or group thread |
| GET | /v1/threads | List threads for the authenticated viewer |
| GET | /v1/threads/{id} | Thread metadata + participants |
| POST | /v1/threads/{id}/participants | Add a participant (group only) |
| DELETE | /v1/threads/{id}/participants/{uid} | Remove a participant |
| POST | /v1/threads/{id}/messages | Send a message |
| GET | /v1/threads/{id}/messages | Fetch history, cursor-paginated |
| PATCH | /v1/messages/{id} | Edit a message (within 15 min) |
| DELETE | /v1/messages/{id} | Delete a message |
| POST | /v1/threads/{id}/read | Mark all messages up to a given id as read |
| POST | /v1/messages/{id}/reactions | Add a reaction |
| GET | /v1/users/{id}/prekeys | Fetch a recipient’s prekey bundle (E2E setup) |
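
For the PATCH endpoint, the 15-minute rule reduces to a timestamp comparison on the server. The sketch below is one way to write that check; the function name is hypothetical, and the 200 success status is an assumption since the writeup only specifies the 403 rejection.

```python
from datetime import datetime, timedelta, timezone

EDIT_WINDOW = timedelta(minutes=15)


def edit_status(created_at, sender_id, editor_id, now=None):
    """Return the HTTP status an edit request would receive."""
    now = now or datetime.now(timezone.utc)
    if editor_id != sender_id:
        return 403  # only the sender may edit
    if now - created_at > EDIT_WINDOW:
        return 403  # outside the edit window
    return 200      # assumed success status


created = datetime(2026, 5, 30, 17, 42, tzinfo=timezone.utc)
print(edit_status(created, "u_42", "u_42", now=created + timedelta(minutes=14)))  # 200
print(edit_status(created, "u_42", "u_42", now=created + timedelta(minutes=16)))  # 403
```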

Endpoint catalogue — WebSocket surface#

A single connection: GET wss://api.messenger.example/v1/realtime?last_seen_id=...&device_id=...

Frame types are carried as { "t": "<type>", ... } JSON envelopes. The frame catalogue:

| Direction | Type | Purpose |
| --- | --- | --- |
| S → C | msg | A new message arrived in a subscribed thread |
| S → C | edit | A message in a subscribed thread was edited |
| S → C | del | A message in a subscribed thread was deleted |
| S → C | delivered | A message you sent reached recipient X |
| S → C | read | Recipient X has read up to message Y |
| S → C | typing | Recipient X is typing in thread Y (TTL 5 s) |
| S → C | presence | User X went online / offline |
| S → C | caught_up | Replay complete, live mode resumed |
| S → C | ping | Server keepalive (every 30 s) |
| C → S | client_ack | I received frame N |
| C → S | typing | I’m typing in thread Y |
| C → S | pong | I’m alive |

Two surfaces, two contracts. REST is the durable write path; the WebSocket is a notification channel that mirrors state changes back to the client. The client should never trust the WebSocket as the source of truth — on reconnect or doubt, refetch via REST.
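
On the client, the `t` field of the envelope is enough to route frames. The router below is a hypothetical sketch, not a prescribed client design; it ignores unknown frame types so that a server adding a new type does not break older clients.

```python
import json


class FrameRouter:
    def __init__(self):
        self._handlers = {}

    def on(self, frame_type):
        """Decorator that registers a handler for one frame type."""
        def register(fn):
            self._handlers[frame_type] = fn
            return fn
        return register

    def dispatch(self, raw):
        """Decode one JSON frame and call its handler; return False if unhandled."""
        frame = json.loads(raw)
        handler = self._handlers.get(frame.get("t"))
        if handler is None:
            return False
        handler(frame)
        return True


router = FrameRouter()
seen = []


@router.on("msg")
def on_msg(frame):
    seen.append(frame["id"])


handled = router.dispatch('{"t": "msg", "id": "01A"}')
unknown = router.dispatch('{"t": "future_type"}')
print(handled, unknown, seen)  # True False ['01A']
```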

OpenAPI schema (excerpt)#

OpenAPI 3.1 — Messenger REST API (core endpoints)
paths:
  /v1/threads/{id}/messages:
    post:
      operationId: sendMessage
      security: [{ bearerAuth: [chat.write] }]
      parameters:
        - { name: id, in: path, required: true, schema: { type: string, format: uuid } }
        - name: Idempotency-Key
          in: header
          required: true
          schema: { type: string, maxLength: 64 }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [client_seq, ciphertext]
              properties:
                client_seq: { type: integer, description: monotonic per-thread per-device }
                ciphertext: { type: string, format: byte, description: Signal Protocol envelope }
                content_type: { type: string, enum: [text, image_ref, video_ref, audio_ref, file_ref], default: text }
                attachment_id: { type: [string, "null"] }
                reply_to_msg_id: { type: [string, "null"] }
      responses:
        '202':
          description: Accepted and durably stored
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Message' }
        '400': { description: Invalid payload }
        '403': { description: Not a participant }
        '409': { description: Duplicate client_seq for this thread + device }
        '413': { description: Ciphertext exceeds 64 KiB }
    get:
      operationId: fetchHistory
      security: [{ bearerAuth: [chat.read] }]
      parameters:
        - { name: id, in: path, required: true, schema: { type: string, format: uuid } }
        - { name: cursor, in: query, schema: { type: string } }
        - { name: page_size, in: query, schema: { type: integer, minimum: 1, maximum: 100, default: 50 } }
      responses:
        '200':
          description: Page of messages, newest first
          content:
            application/json:
              schema:
                type: object
                required: [messages]
                properties:
                  messages:
                    type: array
                    items: { $ref: '#/components/schemas/Message' }
                  next_cursor: { type: [string, "null"] }
  /v1/threads/{id}/read:
    post:
      operationId: markRead
      security: [{ bearerAuth: [chat.write] }]
      parameters:
        - { name: id, in: path, required: true, schema: { type: string, format: uuid } }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [up_to_msg_id]
              properties:
                up_to_msg_id: { type: string }
      responses:
        '204': { description: Read receipt recorded }
        '403': { description: Not a participant }
components:
  schemas:
    Message:
      type: object
      required: [id, thread_id, sender_id, ciphertext, created_at]
      properties:
        id: { type: string, description: "ULID, sortable" }
        thread_id: { type: string, format: uuid }
        sender_id: { type: string }
        client_seq: { type: integer }
        ciphertext: { type: string, format: byte }
        content_type: { type: string }
        created_at: { type: string, format: date-time }
        edited_at: { type: [string, "null"], format: date-time }
        deleted: { type: boolean }
        reactions:
          type: array
          items:
            type: object
            properties:
              emoji: { type: string }
              by: { type: array, items: { type: string } }
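
A client walks fetchHistory by following next_cursor until it comes back null. The generator below sketches that loop with the HTTP call injected as a function, so the paging logic can be read on its own; `iter_history` and `fake_fetch_page` are hypothetical names, and the fake's numeric cursor is only a stand-in for whatever opaque cursor the server issues.

```python
def iter_history(fetch_page, thread_id, page_size=50):
    """Yield messages newest first, following next_cursor until it is empty."""
    cursor = None
    while True:
        page = fetch_page(thread_id, cursor=cursor, page_size=page_size)
        yield from page["messages"]
        cursor = page.get("next_cursor")
        if not cursor:
            return


def fake_fetch_page(thread_id, cursor=None, page_size=50):
    """Stand-in for GET /v1/threads/{id}/messages over a three-message thread."""
    ids = ["01C", "01B", "01A"]  # newest first
    start = int(cursor) if cursor else 0
    end = start + page_size
    return {
        "messages": [{"id": i} for i in ids[start:end]],
        "next_cursor": str(end) if end < len(ids) else None,
    }


history = [m["id"] for m in iter_history(fake_fetch_page, "t_1", page_size=2)]
print(history)  # ['01C', '01B', '01A']
```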

WebSocket frame examples#

Server -> Client: new message frame
{
  "t": "msg",
  "id": "01HX9G3M2J8K4N5P7Q8R0S1T",
  "thread_id": "5e9b...",
  "sender_id": "u_42",
  "ciphertext": "AAJ8Rk...",
  "content_type": "text",
  "created_at": "2026-05-30T17:42:11.234Z",
  "frame_seq": 18742
}

Client -> Server: frame ack
{ "t": "client_ack", "frame_seq": 18742 }

frame_seq is a monotonic per-connection sequence; the client’s client_ack tells the server which frames on the current connection have arrived. Because frame sequences reset across connections, the client passes last_seen_id (a message id, not a frame seq) on reconnect, and the server replays from that message.
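
A sketch of the client side of this contract: ack every msg frame by frame_seq, dedupe by message id, and track the highest id seen for the next reconnect. The `InboundFrames` class and its bounded cache size are assumptions made for illustration, and ids are assumed to sort lexicographically.

```python
from collections import OrderedDict


class InboundFrames:
    def __init__(self, max_ids=1024):
        self._seen = OrderedDict()  # recently seen message ids, oldest first
        self._max = max_ids
        self.last_seen_id = None    # sent as last_seen_id on the next reconnect

    def receive(self, frame):
        """Return (is_new, ack_frame) for one server-to-client msg frame."""
        ack = {"t": "client_ack", "frame_seq": frame["frame_seq"]}
        msg_id = frame["id"]
        if msg_id in self._seen:
            return False, ack  # duplicate delivery: ack again, do not re-render
        self._seen[msg_id] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # evict the oldest id
        if self.last_seen_id is None or msg_id > self.last_seen_id:
            self.last_seen_id = msg_id
        return True, ack


inbox = InboundFrames()
is_new, ack = inbox.receive({"t": "msg", "id": "01HX", "frame_seq": 18742})
# The same message replayed on a new connection arrives with a fresh frame_seq.
is_dup_new, dup_ack = inbox.receive({"t": "msg", "id": "01HX", "frame_seq": 3})
print(is_new, is_dup_new, inbox.last_seen_id)  # True False 01HX
```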

Client samples#

The send-message path, shown here in Python. (WebSocket clients use language-specific libraries; the REST send is the primary write path and is what most server-side integrations need.)

Send a message — Python
import uuid

import requests

API = "https://api.messenger.example"
TOKEN = "Bearer eyJhbGciOi..."


def send_message(thread_id, ciphertext_b64, client_seq, reply_to=None, idem_key=None):
    # Pass the same idem_key when retrying a send so the server can deduplicate it;
    # a fresh key is minted only for a new logical send.
    idem = idem_key or str(uuid.uuid4())
    body = {
        "client_seq": client_seq,
        "ciphertext": ciphertext_b64,
        "content_type": "text",
    }
    if reply_to:
        body["reply_to_msg_id"] = reply_to
    resp = requests.post(
        f"{API}/v1/threads/{thread_id}/messages",
        json=body,
        headers={
            "Authorization": TOKEN,
            "Idempotency-Key": idem,
            "Content-Type": "application/json",
        },
        timeout=2,
    )
    if resp.status_code == 409:
        # Duplicate client_seq: the server already accepted this message;
        # the existing message id is in the body.
        return resp.json()
    resp.raise_for_status()
    return resp.json()


def mark_read(thread_id, up_to_msg_id):
    # Read receipts are their own write path: one POST per read position.
    requests.post(
        f"{API}/v1/threads/{thread_id}/read",
        json={"up_to_msg_id": up_to_msg_id},
        headers={"Authorization": TOKEN},
        timeout=2,
    ).raise_for_status()
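
A retry loop only deduplicates on the server if every attempt carries the same Idempotency-Key. The sketch below shows that shape with the HTTP call injected as a `post` function returning `(status, payload)`, so it runs without a network; `send_with_retry`, the backoff values, and the `flaky_post` stand-in are all assumptions for illustration.

```python
import time
import uuid


def send_with_retry(post, url, body, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Send once per attempt, reusing one Idempotency-Key across all attempts."""
    idem_key = str(uuid.uuid4())  # one key per logical send
    for attempt in range(attempts):
        try:
            status, payload = post(url, body, {"Idempotency-Key": idem_key})
        except ConnectionError:
            status, payload = None, None  # dropped mid-send: outcome unknown
        if status in (202, 409):
            return payload  # stored now, or already stored by an earlier attempt
        if status is not None and status < 500:
            raise ValueError(f"send rejected with status {status}")
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise TimeoutError("send did not succeed after retries")


keys_seen = []


def flaky_post(url, body, headers):
    """Fake transport: drops the first attempt, accepts the second."""
    keys_seen.append(headers["Idempotency-Key"])
    if len(keys_seen) == 1:
        raise ConnectionError("dropped")
    return 202, {"id": "m_1"}


result = send_with_retry(flaky_post, "https://api.messenger.example/v1/threads/t_1/messages",
                         {"client_seq": 1, "ciphertext": "AAJ8Rk"}, sleep=lambda s: None)
print(result, keys_seen[0] == keys_seen[1])  # {'id': 'm_1'} True
```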

Latency budget — send#

The 100 ms p95 send-API budget breaks down as:

| Phase | Budget | Notes |
| --- | --- | --- |
| TLS / HTTP setup | 0 ms | Pinned connection |
| Auth | 5 ms | JWT verify cached |
| Idempotency check | 5 ms | Redis lookup keyed on idem-key |
| Persist + cross-AZ replication | 50 ms | Synchronous quorum write |
| Enqueue fan-out event | 10 ms | PubSub append |
| Serialize + transport | 10 ms | Small JSON body |
| Margin | 20 ms | |
| Total | 100 ms | At budget |

The fan-out from PubSub to recipient WebSockets is asynchronous to this budget. Recipients see the message roughly 50–500 ms after the send, depending on regional WebSocket hop topology, which has to fit the send-to-receive targets of 200 ms p95 inside a region and 500 ms p95 across regions.
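
The budget arithmetic is easy to keep honest in code. The snippet below restates the phase budgets from the table and flags any phase whose measured p95 exceeds its share; the dictionary keys and the `over_budget` helper are hypothetical names, and the measured values in the example are made up.

```python
SEND_BUDGET_MS = {
    "tls_http_setup": 0,
    "auth": 5,
    "idempotency_check": 5,
    "persist_replicate": 50,
    "enqueue_fanout": 10,
    "serialize_transport": 10,
    "margin": 20,
}


def over_budget(measured_ms):
    """Return the phases whose measured p95 exceeds their budget."""
    return {
        phase: ms
        for phase, ms in measured_ms.items()
        if ms > SEND_BUDGET_MS.get(phase, 0)
    }


print(sum(SEND_BUDGET_MS.values()))                        # 100
print(over_budget({"auth": 4, "persist_replicate": 63}))   # {'persist_replicate': 63}
```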

Trade-offs and extensions#

| Decision | Why | Cost if requirements change |
| --- | --- | --- |
| Two surfaces (REST + WebSocket) | REST gives durable writes + offline-tolerant history; WS gives push | Two surfaces to version and observe |
| client_seq + Idempotency-Key | Either is enough but together they catch retries even after id loss | Two dedupe paths to keep consistent |
| Read-receipt as 1 write + 1 fan-out | Avoids N-per-recipient writes in 250-person groups | Fine-grained “delivered to N of M” is approximate |
| Server-blind E2E ciphertext | Trust model is “server cannot read messages” | Server-side search, smart-reply, abuse-mining all need on-device equivalents |
| Per-recipient state machine | Correct delivery + read tracking | Many rows per message; storage cost |
| 15-min edit window | Product call against eternal mutability | Cannot edit historical mistakes; harms compliance use cases |
| Presence as separate channel | Different throughput, different freshness | Two channels to keep in sync |
| ULIDs for message id | Sortable + cursor-friendly | A bit larger than auto-inc; can’t conceal client ordering |
| 64 KiB ciphertext cap | Forces large media into the attachment service | Attachment-API hop adds latency for image / video msgs |
| At-least-once delivery to WS clients | Network drops happen; client must dedupe by id | Client-side dedup cache (recently-seen ids) is mandatory |

Likely follow-up extensions and how the API absorbs them:

  • Disappearing messages. Add a ttl_seconds field on send; the server schedules a delete at created_at + ttl. The recipient receives a del frame at expiry. Server doesn’t need to read the ciphertext.
  • Reactions as full first-class objects. Already in the schema as a list on Message; extend to support custom emoji + per-reaction timestamps.
  • Pinned messages. A POST /v1/threads/{id}/pins collection with a max of 3 pinned items per thread. New endpoint, no schema change to existing types.
  • Message search. Out of E2E reach by definition — on-device index only. The API exposes a GET /v1/threads/{id}/messages?since=... for incremental sync into the on-device index.
  • Multi-device sync. Each device gets its own E2E session; the server fans out to all the user’s devices. The prekey-bundle endpoint already supports this.
  • Backup and restore. The Signal Protocol’s session state can be backed up encrypted-at-rest with a user-provided key. Out of scope for the API round but the prekey endpoint accommodates it.
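
For the disappearing-messages extension, the server-side expiry check needs only metadata, never the ciphertext. A minimal sketch, assuming created_at is stored as epoch seconds and that a periodic sweep emits the del frames; `due_for_deletion` is a hypothetical helper.

```python
def due_for_deletion(messages, now):
    """Return del frames for messages whose ttl has elapsed at time `now`."""
    frames = []
    for m in messages:
        ttl = m.get("ttl_seconds")
        if ttl is not None and m["created_at"] + ttl <= now:
            frames.append({"t": "del", "id": m["id"], "thread_id": m["thread_id"]})
    return frames


messages = [
    {"id": "01A", "thread_id": "t_1", "created_at": 1000.0, "ttl_seconds": 60},
    {"id": "01B", "thread_id": "t_1", "created_at": 1000.0, "ttl_seconds": None},
    {"id": "01C", "thread_id": "t_1", "created_at": 1000.0, "ttl_seconds": 3600},
]
expired = due_for_deletion(messages, now=1061.0)
print(expired)  # [{'t': 'del', 'id': '01A', 'thread_id': 't_1'}]
```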

Mock interview follow-ups#

  • “How does the client deduplicate when the network drops mid-send?” Idempotency-Key (one UUID per logical send, reused on every retry of that send) plus client_seq (monotonic per thread, per device). The server stores the resolution under the idem key for 24 hours; a retry returns the same msg_id.
  • “What happens if the recipient is offline for a week?” — Server holds the ciphertext indefinitely in durable storage. When the WebSocket reconnects with last_seen_id, the server replays missed messages. Push notifications via the platform service (APNs / FCM) act as the wakeup.
  • “How do you handle read receipts in a group of 250?” — One write per reader to their own RecipientState row, one fan-out event. Clients aggregate the read events into a per-message “read by X / Y” view client-side. Quadratic on display, linear on writes.
  • “How does typing not flood the connection at scale?” — Client debounces — emit typing at most every 3 seconds. Server fans out typing frames with a TTL of 5 seconds; if no refresh arrives, recipients fade the indicator client-side. Typing is not durable; it’s a best-effort presence-shaped signal.
  • “What’s the prekey bundle for?” — Each user uploads identity, signed-pre, and a batch of one-time prekeys. A sender fetches the recipient’s bundle once to establish a Signal session. The server treats the bundle as opaque; only key counts and timestamps are inspectable. When one-time prekeys run low, the API surfaces a 503 on bundle-fetch until the recipient’s device tops up.
  • “How do you scope server-side abuse detection if you can’t read the ciphertext?” — Metadata only: send rate, thread fan-out shape, attachment hashes (clients hash and the server checks against a known-bad list), recipient reports. Trade-offs are explicit; the interviewer wants you to articulate the privacy / safety tension, not solve it.
  • “At 10x scale, what breaks first?” — The per-region WebSocket gateway. We’d shard by (user_id, device_id) to a region-local gateway, and the PubSub fan-out would need cross-region fast paths for international 1:1 threads. The REST APIs scale horizontally without contract change.
  • “How do clients handle an edit they receive after the edit window?” — Server rejects the edit with 403. The client never sees an out-of-window edit. Within the 15-minute window, the edit frame replaces the message body client-side and the API returns the new edited_at.
  • “Why ULIDs for message ids and not auto-increment?” — Sortable, distributed-id-friendly (no central allocator), and short enough for URLs. Auto-increment would force a single id allocator and leak send-order to other clients.
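
The typing answer above describes a client-side debounce of at most one frame every 3 seconds. One possible shape for it is sketched below, with the clock injected so the behaviour is easy to check; `TypingThrottle` is a hypothetical name and the thread_id field on the frame is an assumption about the payload.

```python
import time


class TypingThrottle:
    def __init__(self, min_interval=3.0, clock=time.monotonic):
        self._min = min_interval
        self._clock = clock
        self._last = {}  # thread_id -> time the last typing frame was emitted

    def frame_for(self, thread_id):
        """Return a typing frame to send, or None if one was sent too recently."""
        now = self._clock()
        last = self._last.get(thread_id)
        if last is not None and now - last < self._min:
            return None
        self._last[thread_id] = now
        return {"t": "typing", "thread_id": thread_id}


fake_now = [0.0]
throttle = TypingThrottle(clock=lambda: fake_now[0])
first = throttle.frame_for("t_1")
fake_now[0] = 1.0
second = throttle.frame_for("t_1")   # suppressed: only 1 s since the last frame
fake_now[0] = 3.0
third = throttle.frame_for("t_1")    # allowed again after 3 s
print(first is not None, second, third is not None)  # True None True
```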

One channel does everything (e.g. all REST, polling-driven). Polling forces a freshness vs cost trade — short polls hammer the server, long polls feel laggy. Read receipts and typing become storms of small requests. The 200 ms p95 send-to-receive target is unattainable above small scale.

REST for state, WebSocket for delivery. Each surface plays to its strength. REST is cacheable, idempotent, observable; WebSocket is push-shaped, low-overhead per frame, naturally subscribes to threads. The contract has two halves but the seams are clean — the server is the source of truth, WS is the mirror.
