Design the Facebook Messenger API — API Design

Context

Messenger is the canonical “real-time chat” API question and an Advanced-tier prompt because most candidates spend the first twenty minutes drawing fan-out diagrams that belong in the HLD round. The discriminator at API-design altitude is whether you can lock down the contract — endpoints, payload shapes, the message state machine, the WebSocket frame format — without sliding into infra architecture.

This writeup is an API-design round, not an HLD round. That means:

The MQTT-style fan-out broker behind the WebSocket is a black box. We do not design it.
The storage tier (HBase, RocksDB, whatever) is a black box. We design the read API over it.
Voice / video calling, payments, M-the-assistant — out of scope. Messenger has historically bundled them; the API round should not.
The Signal Protocol itself is a black box. The server is blind to plaintext; it routes ciphertext. We describe the contract, not the cryptography.

What remains is rich enough for a 45-minute round:

A dual-surface API: REST for history + thread management, WebSocket for live delivery.
A per-recipient message state machine (Sent → Delivered → Read | Failed), important because a group thread runs one state machine per recipient for every message.
A presence channel that is intentionally separate from the message channel because it has different throughput, different ordering needs, and different caching.
A read-receipt protocol that is its own write path, not a side-effect of fetching.
A group-thread model with participant membership as a first-class resource.
An E2E-encryption seam: the server stores and forwards ciphertext, but plaintext never crosses the API boundary.

The interviewer’s hidden objectives, roughly in order:

Can you separate persistent state (REST) from live transport (WebSocket) cleanly?
Can you write a WebSocket frame protocol with the same care you write REST endpoints — frame types, ACK semantics, sequence numbers?
Can you reason about delivery semantics (at-least-once vs exactly-once) and the dedup ID the client owns?
Can you handle read receipts without quadratic write amplification in a 250-person group?
Can you scope the E2E contract — what the server can see and what it can’t?

Requirements (functional and non-functional)

Functional — in scope:

Create 1:1 and group threads (up to 250 participants).
Send a message into a thread; receive a delivery acknowledgement.
Receive incoming messages in near-real-time via WebSocket.
Mark a message as read; propagate read receipts to other participants.
Paginated fetch of a thread’s history (newest first, cursor-based).
Edit / delete a message within a 15-minute window (the edit window is a product call; the API enforces it).
Typing indicators and online-presence as a separate ephemeral channel.
End-to-end encryption: the client uploads its prekey bundle (identity key, signed prekey, and one-time prekeys); the server stores opaque ciphertext payloads.

Functional — out of scope:

Voice and video calling. Separate API surface, not covered here.
Group video / Rooms. Same — out.
Payments inside a chat. Out.
The M assistant / Smart Reply.
Stickers / GIFs / reactions storefront. Reactions exist as a primitive (POST /messages/{id}/reactions) but the sticker catalogue itself is out.
Search inside Messenger. See the Search Service API for that pattern.
Backup / restore of an encrypted history (the device-handoff problem the Signal Protocol partially solves).

Non-functional:

Delivery latency: send to receive <= 200 ms p95 within-region; <= 500 ms p95 cross-region.
WebSocket connection: 1 long-lived connection per device; reconnect with backoff on drop.
Send-API SLO: <= 100 ms p95 for the synchronous “your message is stored” response.
Throughput: 100M concurrent WebSockets globally; 5M messages sent per second peak.
Availability: 99.99% on send; 99.95% on history read; presence is best-effort (no SLA).
Durability: messages durably stored within 100 ms of accept; replicated cross-AZ before ACK.
Ordering: total order per thread (so participants see the same sequence); the client sequence number is the tiebreaker.
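The reconnect-with-backoff requirement can be sketched as follows. The writeup does not fix a backoff policy, so the full-jitter exponential scheme, the `reconnect_delay` name, and the 0.5 s base and 30 s cap are all assumptions chosen for illustration.

```python
import random


def reconnect_delay(attempt, base_s=0.5, cap_s=30.0, rng=random.random):
    """Full-jitter exponential backoff (assumed policy, not from the writeup).

    Returns a delay drawn uniformly from [0, min(cap_s, base_s * 2**attempt)).
    `rng` is injectable so the function can be tested deterministically.
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng() * ceiling


# Delays a device might wait before its first five reconnect attempts.
delays = [reconnect_delay(n) for n in range(5)]
print(delays)
```

Jitter matters here because a gateway restart drops many sockets at once; without it, every device would retry on the same schedule.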

Use case diagram

                  ┌─────────────────┐
                  │  Participant A  │
                  └────────┬────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
     [create        [send / edit /     [mark read]
      thread]        delete msg]
          │                │                │
          └────────────────┴────────────────┘
                           │
                           ▼
                  ┌─────────────────┐         WebSocket          ┌─────────────────┐
                  │  Messenger API  │◄──────── live frames ─────►│  Participant B  │
                  └─────────────────┘                            └─────────────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │  Presence API   │ ──── typing/online ───► all participants
                  └─────────────────┘

Two API surfaces, one logical actor (a participant). The WebSocket is the back-channel that carries the live delivery, read-receipt, and presence frames; REST is the durable write/read path for messages and thread state.

Class diagram

   ┌──────────────────────────┐
   │   ThreadService          │
   ├──────────────────────────┤
   │ createThread(req)        │
   │ listThreads(viewer)      │
   │ getThread(id)            │
   │ addParticipant(id, uid)  │
   │ removeParticipant(id,uid)│
   │ leaveThread(id, viewer)  │
   └──────────────┬───────────┘
                  ▼
   ┌──────────────────────────┐
   │   Thread                 │
   ├──────────────────────────┤
   │ id : UUID                │
   │ type : 1_TO_1 | GROUP    │
   │ participants : [User]    │
   │ created_at : timestamp   │
   │ last_msg_id : str        │
   └──────────────────────────┘

   ┌──────────────────────────┐
   │   MessageService         │
   ├──────────────────────────┤
   │ sendMessage(req)         │
   │ fetchHistory(tid, cur)   │
   │ editMessage(id, body)    │
   │ deleteMessage(id)        │
   │ react(id, emoji)         │
   └──────────────┬───────────┘
                  ▼
   ┌──────────────────────────┐         ┌─────────────────────┐
   │   Message                │         │  RecipientState     │
   ├──────────────────────────┤         ├─────────────────────┤
   │ id : ULID                │ 1     N │ recipient_id : str  │
   │ thread_id : UUID         │────────►│ state : enum        │
   │ sender_id : str          │         │ delivered_at : ts?  │
   │ client_seq : int         │         │ read_at : ts?       │
   │ ciphertext : bytes       │         └─────────────────────┘
   │ created_at : timestamp   │
   │ edited_at : timestamp?   │
   │ deleted : bool           │
   └──────────────────────────┘

   ┌──────────────────────────┐
   │   PresenceService        │
   ├──────────────────────────┤
   │ subscribe(thread_id)     │   pushes typing + online via WebSocket
   │ markTyping(thread_id)    │
   │ heartbeat()              │
   └──────────────────────────┘

RecipientState is the row explosion that makes group reads tricky. A message in a 250-person thread has 249 RecipientState rows, one per recipient excluding the sender, sharded by (thread_id, recipient_id) so a per-user “mark all read” updates only that user’s rows.

Sequence diagram (key flows)

Flow 1: send + deliver in a 1:1 thread.

 Sender(A)      MessengerAPI       PubSub          Receiver(B)
   │ POST /threads/{tid}/messages  │                 │  (WS open)
   │ { ciphertext, client_seq }    │                 │
   │──────────────────────────────►│                 │
   │  validate, persist, mint id   │                 │
   │  fan out to PubSub            │                 │
   │  ────────────────────────────►│                 │
   │  202 Accepted + msg_id, ts    │                 │
   │◄──────────────────────────────│                 │
   │                               │  push frame     │
   │                               │ ───────────────►│
   │                               │                 │ render
   │                               │  ACK delivered  │
   │                               │◄────────────────│
   │  (back-channel) WS frame      │                 │
   │  { event: delivered, id, by } │                 │
   │◄──────────────────────────────│                 │

The sender’s 202 Accepted happens before B receives the message. The delivered event comes back later via the sender’s own WebSocket — it’s a separate frame, not part of the original REST response. The client_seq is the dedupe key in case the client retries the send.
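The retry behaviour can be sketched as a server-side lookup. This is a minimal in-memory sketch under stated assumptions: the `SendDeduper` class, its dict-backed store, the `(thread_id, device_id, client_seq)` key shape, and the counter-based ids are hypothetical stand-ins, not the production design.

```python
import itertools


class SendDeduper:
    """In-memory stand-in for the server's dedupe store (hypothetical)."""

    def __init__(self):
        # (thread_id, device_id, client_seq) -> msg_id
        self._by_seq = {}
        # Stand-in for real id minting (the writeup uses ULIDs).
        self._ids = itertools.count(1)

    def accept(self, thread_id, device_id, client_seq):
        """Return (status, msg_id): 202 for a first send, 409 for a retry."""
        key = (thread_id, device_id, client_seq)
        if key in self._by_seq:
            # Retry of a send the server already stored: hand back the same id.
            return 409, self._by_seq[key]
        msg_id = f"m{next(self._ids)}"
        self._by_seq[key] = msg_id
        return 202, msg_id


dedupe = SendDeduper()
first = dedupe.accept("t1", "dev-a", 7)
retry = dedupe.accept("t1", "dev-a", 7)   # same client_seq, e.g. after a timeout
other = dedupe.accept("t1", "dev-a", 8)
print(first, retry, other)
```

The point of the sketch is that a retry never mints a second message id, so the sender can safely resend after a timeout.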

Flow 2: read receipt in a group of 5.

 ReaderC       MessengerAPI                 OtherFour(WS open)
   │ POST /threads/{tid}/read      │
   │ { up_to_msg_id }              │
   │──────────────────────────────►│
   │  update RecipientState        │
   │  for C: state=READ, read_at=…│
   │  enqueue 1 fan-out event      │
   │  204 No Content               │
   │◄──────────────────────────────│
   │                               │  WS frame to each other participant
   │                               │  { event: read, by: C, up_to: id }
   │                               │─────────────────►
   │                               │─────────────────►
   │                               │─────────────────►
   │                               │─────────────────►

One write into the reader’s own RecipientState row, one fan-out event to the thread topic. Each participant’s client merges the read event into its local view, so no quadratic writes occur.
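The client-side merge of read events might look like the sketch below. The `ReadTracker` helper and its watermark representation are assumptions for illustration; only the `by` and `up_to` fields come from the flow above.

```python
class ReadTracker:
    """Client-side merge of `read` events for one thread (hypothetical helper)."""

    def __init__(self, ordered_msg_ids):
        # Position of each message in thread order, oldest first.
        self._pos = {mid: i for i, mid in enumerate(ordered_msg_ids)}
        # reader id -> index of the newest message that reader has read
        self._watermark = {}

    def on_read_event(self, by, up_to):
        idx = self._pos[up_to]
        # Read position only moves forward; stale or duplicate events are ignored.
        if idx > self._watermark.get(by, -1):
            self._watermark[by] = idx

    def readers_of(self, msg_id):
        """Everyone whose watermark is at or past this message."""
        idx = self._pos[msg_id]
        return sorted(u for u, mark in self._watermark.items() if mark >= idx)


tracker = ReadTracker(["m1", "m2", "m3"])
tracker.on_read_event(by="C", up_to="m2")
tracker.on_read_event(by="D", up_to="m3")
tracker.on_read_event(by="C", up_to="m1")  # stale event, ignored
print(tracker.readers_of("m1"), tracker.readers_of("m3"))
```

Storing one watermark per reader, rather than one flag per message per reader, is what keeps the client's state linear in the number of participants.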

Flow 3: reconnect and catch up.

 Client        MessengerAPI
   │ WS connect (last_seen_id=Mxyz)
   │──────────────────────────────►
   │  resolve subscriber position │
   │  replay missed frames        │
   │   { event: msg, id, ...  }   │  for each missed message
   │◄──────────────────────────────
   │   { event: msg, id, ...  }   │
   │◄──────────────────────────────
   │   { event: caught_up }       │
   │◄──────────────────────────────
   │ ACK each msg with `client_ack` frame
   │──────────────────────────────►

The client passes last_seen_id as a query parameter on the WebSocket upgrade request; the server replays from there. After the caught_up frame, live delivery resumes. This is the contract that makes reconnect feel instant on flaky networks.
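A rough sketch of the replay step, assuming message ids sort lexicographically (as ULIDs do) and that the server can iterate a thread's log oldest first. The `replay_frames` function and the list-backed log are hypothetical.

```python
def replay_frames(thread_log, last_seen_id):
    """Yield the frames a reconnecting client should receive, oldest first.

    `thread_log` is a list of message dicts sorted by id (assumed shape).
    Ids are assumed to be lexicographically sortable, so "missed" means
    "id greater than last_seen_id".
    """
    for msg in thread_log:
        if last_seen_id is None or msg["id"] > last_seen_id:
            yield {"t": "msg", **msg}
    # Marker frame: replay is complete and live delivery resumes.
    yield {"t": "caught_up"}


log = [{"id": "01A"}, {"id": "01B"}, {"id": "01C"}]
frames = list(replay_frames(log, last_seen_id="01A"))
print(frames)
```

A client that has never connected would pass no last_seen_id and receive the whole log, which in practice would be bounded by a history fetch over REST instead.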

Activity diagram (for non-trivial state)

The per-recipient message state machine:

                  [sendMessage returns 202 + msg_id]
                              │
                              ▼
                  ┌───────────────────────┐
                  │         SENT          │
                  │  (server has it,      │
                  │   recipient not       │
                  │   yet acked)          │
                  └─────────┬─────────────┘
                            │
              ┌─────────────┼────────────┐
              ▼             │            ▼
   ┌───────────────────┐    │  ┌───────────────────┐
   │     DELIVERED     │    │  │      FAILED       │
   │ recipient WS ack  │    │  │ TTL exceeded or   │
   │  received         │    │  │ recipient blocked │
   └─────────┬─────────┘    │  └───────────────────┘
             │              │
             ▼              │
   ┌───────────────────┐    │
   │      READ         │    │
   │ recipient POSTed  │    │
   │  /read up_to_id   │    │
   └───────────────────┘    │
             │              │
             ▼              │
       [terminal]           │
                            │
              [edit / delete by sender, within 15 min]
                            │
                            ▼
                  ┌─────────────────────────┐
                  │    EDITED / DELETED     │
                  │  (overlay on the row,   │
                  │   not a new state)      │
                  └─────────────────────────┘

A few invariants the API enforces:

READ implies DELIVERED — if a read event arrives without a prior delivered, the server upgrades both in one write. (Some networks drop the delivered-ack but the read-receipt POST is reliable.)
FAILED is recoverable only by the sender retrying with the same client_seq within 24 hours.
EDITED / DELETED are not new states in the recipient state machine; they are content overlays. A deleted message keeps its RecipientState rows so read-position math stays consistent.
Group threads have N - 1 parallel state machines for each message (one per recipient). The thread-level “all delivered” / “all read” view is computed lazily, never stored.
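The transition rules can be sketched as a small table-driven function. This is an illustrative sketch, not the service's implementation: the `advance` helper and the dict-shaped row are assumptions, and retry-based recovery from FAILED is deliberately left out.

```python
from enum import Enum


class State(Enum):
    SENT = "sent"
    DELIVERED = "delivered"
    READ = "read"
    FAILED = "failed"


# Forward-only transitions. SENT -> READ is allowed because READ implies DELIVERED.
ALLOWED = {
    State.SENT: {State.DELIVERED, State.READ, State.FAILED},
    State.DELIVERED: {State.READ},
    State.READ: set(),
    State.FAILED: set(),  # sender-retry recovery is not modelled in this sketch
}


def advance(row, new_state, at):
    """Apply a transition to one RecipientState row; return True if it changed."""
    if new_state not in ALLOWED[row["state"]]:
        return False  # duplicate or out-of-order event: ignore it
    if new_state is State.DELIVERED:
        row["delivered_at"] = at
    if new_state is State.READ:
        if row["delivered_at"] is None:
            row["delivered_at"] = at  # upgrade both in one write
        row["read_at"] = at
    row["state"] = new_state
    return True


row = {"state": State.SENT, "delivered_at": None, "read_at": None}
changed = advance(row, State.READ, at=100)       # read arrives without delivered
late_ack = advance(row, State.DELIVERED, at=200)  # late delivered-ack is ignored
print(changed, late_ack, row)
```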

API implementation

Endpoint catalogue — REST surface

Method	Path	Purpose
`POST`	`/v1/threads`	Create a 1:1 or group thread
`GET`	`/v1/threads`	List threads for the authenticated viewer
`GET`	`/v1/threads/{id}`	Thread metadata + participants
`POST`	`/v1/threads/{id}/participants`	Add a participant (group only)
`DELETE`	`/v1/threads/{id}/participants/{uid}`	Remove a participant
`POST`	`/v1/threads/{id}/messages`	Send a message
`GET`	`/v1/threads/{id}/messages`	Fetch history, cursor-paginated
`PATCH`	`/v1/messages/{id}`	Edit a message (within 15 min)
`DELETE`	`/v1/messages/{id}`	Delete a message
`POST`	`/v1/threads/{id}/read`	Mark all messages up to a given id as read
`POST`	`/v1/messages/{id}/reactions`	Add a reaction
`GET`	`/v1/users/{id}/prekeys`	Fetch a recipient’s prekey bundle (E2E setup)

Endpoint catalogue — WebSocket surface

A single connection: GET wss://api.messenger.example/v1/realtime?last_seen_id=...&device_id=...

Frame types are carried as { "t": "<type>", ... } JSON envelopes; the sequence diagrams above abbreviate the same field as event. The frame catalogue:

Direction	Type	Purpose
S → C	`msg`	A new message arrived in a subscribed thread
S → C	`edit`	A message in a subscribed thread was edited
S → C	`del`	A message in a subscribed thread was deleted
S → C	`delivered`	A message you sent reached recipient X
S → C	`read`	Recipient X has read up to message Y
S → C	`typing`	Recipient X is typing in thread Y (TTL 5 s)
S → C	`presence`	User X went online / offline
S → C	`caught_up`	Replay complete, live mode resumed
S → C	`ping`	Server keepalive (every 30 s)
C → S	`client_ack`	I received frame N
C → S	`typing`	I’m typing in thread Y
C → S	`pong`	I’m alive

Two surfaces, two contracts. REST is the durable write path; the WebSocket is a notification channel that mirrors state changes back to the client. The client should never trust the WebSocket as the source of truth — on reconnect or doubt, refetch via REST.
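A client-side dispatcher over the frame catalogue might be sketched like this. The `dispatch` function and the handler-map shape are assumptions; the frame type names and the `t` envelope key come from the catalogue above.

```python
import json

# Server-to-client frame types from the catalogue.
SERVER_FRAME_TYPES = {
    "msg", "edit", "del", "delivered", "read",
    "typing", "presence", "caught_up", "ping",
}


def dispatch(raw, handlers):
    """Route one incoming frame; return a reply frame to send, or None."""
    frame = json.loads(raw)
    kind = frame.get("t")
    if kind not in SERVER_FRAME_TYPES:
        return None  # unknown type: ignore rather than crash (assumed policy)
    if kind == "ping":
        return {"t": "pong"}  # keepalive reply
    handler = handlers.get(kind)
    if handler is not None:
        handler(frame)
    return None


seen = []
reply = dispatch('{"t": "ping"}', {})
dispatch('{"t": "read", "by": "C", "up_to": "m1"}', {"read": seen.append})
print(reply, seen)
```

Ignoring unknown frame types is one way to let the server add frames without breaking older clients; whether the real protocol wants that is a versioning decision the writeup leaves open.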

OpenAPI schema (excerpt)

paths:
  /v1/threads/{id}/messages:
    post:
      operationId: sendMessage
      security: [{ bearerAuth: [chat.write] }]
      parameters:
        - { name: id, in: path, required: true, schema: { type: string, format: uuid } }
        - name: Idempotency-Key
          in: header
          required: true
          schema: { type: string, maxLength: 64 }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [client_seq, ciphertext]
              properties:
                client_seq: { type: integer, description: monotonic per-thread per-device }
                ciphertext: { type: string, format: byte, description: Signal Protocol envelope }
                content_type: { type: string, enum: [text, image_ref, video_ref, audio_ref, file_ref], default: text }
                attachment_id: { type: string, nullable: true }
                reply_to_msg_id: { type: string, nullable: true }
      responses:
        '202':
          description: Accepted and durably stored
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Message' }
        '400': { description: Invalid payload }
        '403': { description: Not a participant }
        '409': { description: Duplicate client_seq for this thread + device }
        '413': { description: Ciphertext exceeds 64 KiB }

    get:
      operationId: fetchHistory
      security: [{ bearerAuth: [chat.read] }]
      parameters:
        - { name: id, in: path, required: true, schema: { type: string, format: uuid } }
        - { name: cursor, in: query, schema: { type: string } }
        - { name: page_size, in: query, schema: { type: integer, minimum: 1, maximum: 100, default: 50 } }
      responses:
        '200':
          description: Page of messages, newest first
          content:
            application/json:
              schema:
                type: object
                required: [messages]
                properties:
                  messages:
                    type: array
                    items: { $ref: '#/components/schemas/Message' }
                  next_cursor: { type: string, nullable: true }

  /v1/threads/{id}/read:
    post:
      operationId: markRead
      security: [{ bearerAuth: [chat.write] }]
      parameters:
        - { name: id, in: path, required: true, schema: { type: string, format: uuid } }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [up_to_msg_id]
              properties:
                up_to_msg_id: { type: string }
      responses:
        '204': { description: Read receipt recorded }
        '403': { description: Not a participant }

components:
  schemas:
    Message:
      type: object
      required: [id, thread_id, sender_id, ciphertext, created_at]
      properties:
        id: { type: string, description: "ULID, sortable" }
        thread_id: { type: string, format: uuid }
        sender_id: { type: string }
        client_seq: { type: integer }
        ciphertext: { type: string, format: byte }
        content_type: { type: string }
        created_at: { type: string, format: date-time }
        edited_at: { type: string, format: date-time, nullable: true }
        deleted: { type: boolean }
        reactions:
          type: array
          items:
            type: object
            properties:
              emoji: { type: string }
              by: { type: array, items: { type: string } }
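The fetchHistory contract implies a simple client loop: follow next_cursor until it comes back null. The sketch below injects the transport so it runs without a network; `iter_history`, `fake_fetch`, and the integer-offset cursor are hypothetical, and a real cursor should be treated as opaque.

```python
def iter_history(fetch_page, thread_id, page_size=50):
    """Yield messages newest first, following next_cursor until it is null."""
    cursor = None
    while True:
        page = fetch_page(thread_id, cursor=cursor, page_size=page_size)
        yield from page["messages"]
        cursor = page.get("next_cursor")
        if not cursor:
            return


def fake_fetch(thread_id, cursor=None, page_size=50):
    """Fake transport standing in for GET /v1/threads/{id}/messages."""
    data = [{"id": f"m{i}"} for i in range(5, 0, -1)]  # newest first
    start = int(cursor) if cursor else 0
    end = start + page_size
    return {
        "messages": data[start:end],
        "next_cursor": str(end) if end < len(data) else None,
    }


ids = [m["id"] for m in iter_history(fake_fetch, "t1", page_size=2)]
print(ids)  # ['m5', 'm4', 'm3', 'm2', 'm1']
```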

WebSocket frame examples

{
  "t": "msg",
  "id": "01HX9G3M2J8K4N5P7Q8R0S1T",
  "thread_id": "5e9b...",
  "sender_id": "u_42",
  "ciphertext": "AAJ8Rk...",
  "content_type": "text",
  "created_at": "2026-05-30T17:42:11.234Z",
  "frame_seq": 18742
}

{ "t": "client_ack", "frame_seq": 18742 }

frame_seq is a monotonic per-connection sequence that lets the server track which frames the client has acknowledged on the current connection. Because frame sequences reset across connections, the client passes last_seen_id (a message id, not a frame seq) on reconnect.
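On the client, acknowledging frames and deduplicating messages are separate concerns: every frame is acked by frame_seq, but a message is rendered once per message id. A minimal sketch, assuming a bounded recently-seen cache; the `FrameReceiver` class and its cache size are hypothetical.

```python
from collections import OrderedDict


class FrameReceiver:
    """Client-side handler: ack every frame, render each message id once."""

    def __init__(self, max_seen=1000):
        self._seen = OrderedDict()  # recently seen message ids, oldest first
        self._max_seen = max_seen

    def handle(self, frame):
        """Return (ack_frame, is_new) for an incoming `msg` frame."""
        ack = {"t": "client_ack", "frame_seq": frame["frame_seq"]}
        msg_id = frame["id"]
        if msg_id in self._seen:
            self._seen.move_to_end(msg_id)
            return ack, False  # duplicate delivery: ack again, do not re-render
        self._seen[msg_id] = True
        if len(self._seen) > self._max_seen:
            self._seen.popitem(last=False)  # evict the oldest id
        return ack, True


rx = FrameReceiver(max_seen=2)
ack1, new1 = rx.handle({"t": "msg", "id": "a", "frame_seq": 1})
# Same message replayed on a later connection, so frame_seq differs.
ack2, new2 = rx.handle({"t": "msg", "id": "a", "frame_seq": 5})
print(ack1, new1, ack2, new2)
```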

Client samples — three languages

The send-message path in Python, Go, and Node. (WebSocket clients use language-specific libraries; the REST send is the primary write path and is what most server-side integrations need.)

import uuid
import requests

API = "https://api.messenger.example"
TOKEN = "Bearer eyJhbGciOi..."

def send_message(thread_id, ciphertext_b64, client_seq, reply_to=None, idempotency_key=None):
    # Pass the same idempotency_key when retrying the same logical send so the
    # server can deduplicate; a fresh key is minted only when none is supplied.
    idem = idempotency_key or str(uuid.uuid4())
    body = {
        "client_seq": client_seq,
        "ciphertext": ciphertext_b64,
        "content_type": "text",
    }
    if reply_to:
        body["reply_to_msg_id"] = reply_to
    resp = requests.post(
        f"{API}/v1/threads/{thread_id}/messages",
        json=body,
        headers={
            "Authorization": TOKEN,
            "Idempotency-Key": idem,
            "Content-Type": "application/json",
        },
        timeout=2,
    )
    if resp.status_code == 409:
        # Duplicate client_seq: the message was already accepted (not
        # necessarily delivered); the existing message is in the body.
        return resp.json()
    resp.raise_for_status()
    return resp.json()

def mark_read(thread_id, up_to_msg_id):
    requests.post(
        f"{API}/v1/threads/{thread_id}/read",
        json={"up_to_msg_id": up_to_msg_id},
        headers={"Authorization": TOKEN},
        timeout=2,
    ).raise_for_status()

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"

    "github.com/google/uuid"
)

const API = "https://api.messenger.example"
const Token = "Bearer eyJhbGciOi..."

type Message struct {
    ID         string `json:"id"`
    ThreadID   string `json:"thread_id"`
    SenderID   string `json:"sender_id"`
    Ciphertext string `json:"ciphertext"`
}

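func sendMessage(threadID, ciphertextB64 string, clientSeq int, idemKey string) (*Message, error) {
    // Reuse idemKey when retrying the same logical send; mint one only if the caller has none.
    if idemKey == "" {
        idemKey = uuid.NewString()
    }
    body, err := json.Marshal(map[string]any{
        "client_seq":   clientSeq,
        "ciphertext":   ciphertextB64,
        "content_type": "text",
    })
    if err != nil { return nil, err }
    url := fmt.Sprintf("%s/v1/threads/%s/messages", API, threadID)
    req, err := http.NewRequest("POST", url, bytes.NewReader(body))
    if err != nil { return nil, err }
    req.Header.Set("Authorization", Token)
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Idempotency-Key", idemKey)
    resp, err := http.DefaultClient.Do(req)
    if err != nil { return nil, err }
    defer resp.Body.Close()
    // 202 is a fresh accept; 409 is a duplicate client_seq whose body carries the stored message.
    if resp.StatusCode != http.StatusAccepted && resp.StatusCode != http.StatusConflict {
        return nil, fmt.Errorf("send: HTTP %d", resp.StatusCode)
    }
    var m Message
    if err := json.NewDecoder(resp.Body).Decode(&m); err != nil {
        return nil, fmt.Errorf("send: decode response: %w", err)
    }
    return &m, nil
}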

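func markRead(threadID, upToMsgID string) error {
    body, _ := json.Marshal(map[string]string{"up_to_msg_id": upToMsgID})
    url := fmt.Sprintf("%s/v1/threads/%s/read", API, threadID)
    req, err := http.NewRequest("POST", url, bytes.NewReader(body))
    if err != nil { return err }
    req.Header.Set("Authorization", Token)
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil { return err }
    defer resp.Body.Close()
    // The endpoint answers 204 on success; surface anything else, as the Python and Node samples do.
    if resp.StatusCode != http.StatusNoContent {
        return fmt.Errorf("mark read: HTTP %d", resp.StatusCode)
    }
    return nil
}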

import { randomUUID } from "node:crypto";

const API = "https://api.messenger.example";
const TOKEN = "Bearer eyJhbGciOi...";

export async function sendMessage(threadId, ciphertextB64, clientSeq, replyTo = null, idempotencyKey = randomUUID()) {
  // Pass the same idempotencyKey when retrying the same logical send; the default mints a fresh one.
  const body = { client_seq: clientSeq, ciphertext: ciphertextB64, content_type: "text" };
  if (replyTo) body.reply_to_msg_id = replyTo;
  const resp = await fetch(`${API}/v1/threads/${threadId}/messages`, {
    method: "POST",
    headers: {
      Authorization: TOKEN,
      "Content-Type": "application/json",
      "Idempotency-Key": idempotencyKey,
    },
    body: JSON.stringify(body),
  });
  // 409 means a duplicate client_seq; the body carries the already-stored message.
  if (resp.status === 409) return resp.json();
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  return resp.json();
}

export async function markRead(threadId, upToMsgId) {
  const resp = await fetch(`${API}/v1/threads/${threadId}/read`, {
    method: "POST",
    headers: { Authorization: TOKEN, "Content-Type": "application/json" },
    body: JSON.stringify({ up_to_msg_id: upToMsgId }),
  });
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
}

Latency budget — send

The 100 ms p95 send-API budget breaks down as:

Phase	Budget	Notes
TLS / HTTP setup	0 ms	Pinned connection
Auth	5 ms	JWT verify cached
Idempotency check	5 ms	Redis lookup keyed on idem-key
Persist + cross-AZ replication	50 ms	Synchronous quorum write
Enqueue fan-out event	10 ms	PubSub append
Serialize + transport	10 ms	Small JSON body
Margin	20 ms
Total	100 ms	At budget

The fan-out from PubSub to recipient WebSockets is asynchronous to this budget. Recipients typically see the message 50–500 ms later depending on regional WebSocket hop topology, which lines up with the send-to-receive targets of 200 ms p95 within a region and 500 ms p95 across regions.
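The budget arithmetic is easy to keep honest with a small check. The dictionary keys below are shorthand labels for the table rows, introduced here for illustration.

```python
# Per-phase budget in milliseconds, copied from the table above.
BUDGET_MS = {
    "tls_http_setup": 0,
    "auth": 5,
    "idempotency_check": 5,
    "persist_and_replicate": 50,
    "enqueue_fanout": 10,
    "serialize_transport": 10,
    "margin": 20,
}
SEND_SLO_MS = 100  # p95 target for the synchronous send response

total = sum(BUDGET_MS.values())
print(total, total <= SEND_SLO_MS)  # 100 True
```

A check like this is useful when a phase is renegotiated: raising the persist budget by 10 ms has to come out of the margin or another row.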

Trade-offs and extensions

Decision	Why	Cost if requirements change
Two surfaces (REST + WebSocket)	REST gives durable writes + offline-tolerant history; WS gives push	Two surfaces to version and observe
`client_seq` + `Idempotency-Key`	Either is enough but together they catch retries even after id loss	Two dedupe paths to keep consistent
Read-receipt as 1 write + 1 fan-out	Avoids N-per-recipient writes in 250-person groups	Fine-grained “delivered to N of M” is approximate
Server-blind E2E ciphertext	Trust model is “server cannot read messages”	Server-side search, smart-reply, abuse-mining all need on-device equivalents
Per-recipient state machine	Correct delivery + read tracking	Many rows per message; storage cost
15-min edit window	Product call against eternal mutability	Cannot edit historical mistakes; harms compliance use cases
Presence as separate channel	Different throughput, different freshness	Two channels to keep in sync
ULIDs for message id	Sortable + cursor-friendly	A bit larger than auto-inc; can’t conceal client ordering
64 KiB ciphertext cap	Forces large media into the attachment service	Attachment-API hop adds latency for image / video msgs
At-least-once delivery to WS clients	Network drops happen; client must dedupe by id	Client-side dedup cache (recently-seen ids) is mandatory

Likely follow-up extensions and how the API absorbs them:

Disappearing messages. Add a ttl_seconds field on send; the server schedules a delete at created_at + ttl. The recipient receives a del frame at expiry. Server doesn’t need to read the ciphertext.
Reactions as full first-class objects. Already in the schema as a list on Message; extend to support custom emoji + per-reaction timestamps.
Pinned messages. A POST /v1/threads/{id}/pins collection with a max of 3 pinned items per thread. New endpoint, no schema change to existing types.
Message search. Out of E2E reach by definition — on-device index only. The API exposes a GET /v1/threads/{id}/messages?since=... for incremental sync into the on-device index.
Multi-device sync. Each device gets its own E2E session; the server fans out to all the user’s devices. The prekey-bundle endpoint already supports this.
Backup and restore. The Signal Protocol’s session state can be backed up encrypted-at-rest with a user-provided key. Out of scope for the API round but the prekey endpoint accommodates it.
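For the disappearing-messages extension, the expiry computation needs only metadata, never plaintext. A minimal sketch, assuming created_at is an ISO 8601 UTC timestamp as in the frame examples; the `expires_at` and `expiry_frame` helpers and the fields of the del frame are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expires_at(created_at_iso, ttl_seconds):
    """Return the instant at which the server would schedule the delete."""
    # Normalize a trailing "Z" so older Python versions of fromisoformat accept it.
    created = datetime.fromisoformat(created_at_iso.replace("Z", "+00:00"))
    return created + timedelta(seconds=ttl_seconds)


def expiry_frame(msg_id, thread_id):
    """The `del` frame recipients would receive at expiry (fields assumed)."""
    return {"t": "del", "id": msg_id, "thread_id": thread_id}


deadline = expires_at("2026-05-30T17:42:11.234Z", ttl_seconds=60)
frame = expiry_frame("m1", "t1")
print(deadline.isoformat(), frame)
```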

Mock interview follow-ups

“How does the client deduplicate when the network drops mid-send?” — Idempotency-Key (one UUID per logical send, reused on every retry of that send) plus client_seq (monotonic per thread, per device). The server stores the resolution under the idem key for 24 hours; a retry returns the same msg_id.
“What happens if the recipient is offline for a week?” — Server holds the ciphertext indefinitely in durable storage. When the WebSocket reconnects with last_seen_id, the server replays missed messages. Push notifications via the platform service (APNs / FCM) act as the wakeup.
“How do you handle read receipts in a group of 250?” — One write per reader to their own RecipientState row, one fan-out event. Clients aggregate the read events into a per-message “read by X / Y” view client-side. Quadratic on display, linear on writes.
“How does typing not flood the connection at scale?” — Client debounces — emit typing at most every 3 seconds. Server fans out typing frames with a TTL of 5 seconds; if no refresh arrives, recipients fade the indicator client-side. Typing is not durable; it’s a best-effort presence-shaped signal.
“What’s the prekey bundle for?” — Each user uploads an identity key, a signed prekey, and a batch of one-time prekeys. A sender fetches the recipient’s bundle once to establish a Signal session. The server treats the bundle as opaque; only key counts and timestamps are inspectable. When one-time prekeys run low, the API surfaces a 503 on bundle-fetch until the recipient’s device tops up.
“How do you scope server-side abuse detection if you can’t read the ciphertext?” — Metadata only: send rate, thread fan-out shape, attachment hashes (clients hash and the server checks against a known-bad list), recipient reports. Trade-offs are explicit; the interviewer wants you to articulate the privacy / safety tension, not solve it.
“At 10x scale, what breaks first?” — The per-region WebSocket gateway. We’d shard by (user_id, device_id) to a region-local gateway, and the PubSub fan-out would need cross-region fast paths for international 1:1 threads. The REST APIs scale horizontally without contract change.
“How do clients handle an edit they receive after the edit window?” — Server rejects the edit with 403. The client never sees an out-of-window edit. Within the 15-minute window, the edit frame replaces the message body client-side and the API returns the new edited_at.
“Why ULIDs for message ids and not auto-increment?” — Sortable, distributed-id-friendly (no central allocator), and short enough for URLs. Auto-increment would force a single id allocator and, because sequential ids are enumerable, leak message volume to other clients.
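The typing answer above (emit at most every 3 seconds, fade after a 5-second TTL) can be sketched as two small pieces of client logic. The `TypingThrottle` class and `indicator_visible` function are hypothetical names; timestamps are passed in explicitly so the logic is testable without a clock.

```python
class TypingThrottle:
    """Client-side limiter for outgoing `typing` frames (hypothetical helper)."""

    def __init__(self, min_interval_s=3.0):
        self._min_interval_s = min_interval_s
        self._last_sent = {}  # thread_id -> timestamp of the last emitted frame

    def should_send(self, thread_id, now):
        """Return True if a typing frame may be emitted for this thread now."""
        last = self._last_sent.get(thread_id)
        if last is not None and now - last < self._min_interval_s:
            return False
        self._last_sent[thread_id] = now
        return True


def indicator_visible(last_typing_at, now, ttl_s=5.0):
    """Recipient side: show the indicator until the TTL lapses without a refresh."""
    return now - last_typing_at < ttl_s


throttle = TypingThrottle()
decisions = [throttle.should_send("t1", t) for t in (0.0, 1.0, 2.9, 3.0, 4.0)]
print(decisions)  # [True, False, False, True, False]
```

Because the 3-second emit interval is shorter than the 5-second TTL, a continuously typing user keeps the indicator alive without gaps.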

One channel does everything (e.g. all REST, polling-driven). Polling forces a freshness-versus-cost trade: short polling intervals hammer the server, long intervals feel laggy. Read receipts and typing become storms of small requests. The 200 ms p95 send-to-receive target is unattainable above small scale.

REST for state, WebSocket for delivery. Each surface plays to its strength. REST is cacheable, idempotent, observable; WebSocket is push-shaped, low-overhead per frame, naturally subscribes to threads. The contract has two halves but the seams are clean — the server is the source of truth, WS is the mirror.

WebSockets — Bidirectional Streaming — the transport primitive under the live-delivery surface.
Design a Pub-Sub Service API — the asynchronous fan-out backbone behind the WebSocket gateway.
Design a Comment Service API — the read-mostly cursor-paginated cousin of this API.
Event-Driven Architecture Protocols — webhook and event-channel patterns.
The API-Design Walk-through — the seven-step recipe this writeup followed.