Design the Zoom API
Meetings, participants, signalling, recording, webhooks. The video-conferencing API behind pandemic-era infrastructure.
Context
Zoom is the canonical “video conferencing API” question and an Advanced-tier prompt, because the surface invites scope creep into territory that has nothing to do with API design. The genuinely interesting media work (encoding, jitter buffer, congestion control, SFU forwarding) happens inside the media-layer architecture, not at the API boundary.
This writeup is an API-design round, not an HLD round. That means:
- The SFU (Selective Forwarding Unit) that fans out RTP packets between participants is a black box. We do not design it.
- The media transport (UDP / WebRTC / SRTP) is a black box. The API hands clients credentials to connect to the media plane; the rest is out of scope.
- The jitter buffer, FEC, simulcast and SVC are media-layer concerns, not API concerns.
- Whiteboarding, breakout rooms, polls, captions — out of scope for one round. They’re sibling APIs that share the meeting object.
What remains is the right altitude for an API-design round:
- REST surface for scheduling, participant management, recording control, integration management.
- WebSocket signalling for the connection-establishment dance and live meeting events.
- Webhooks as the asynchronous notification channel for off-meeting events (started / ended / recording.completed).
- OAuth 2 as the third-party-integration auth model, with an admin-managed app marketplace.
- A clean boundary between what the API does (resource management, signalling handshake, lifecycle webhooks) and what the SFU does (forward packets).
The interviewer’s hidden objectives, roughly in order:
- Can you draw a clean seam between the API plane and the media plane?
- Can you design a scheduled-meeting object with the right level of state (scheduled / live / ended) without sliding into the SFU?
- Can you treat webhooks as a first-class API surface, with signed payloads, retry semantics, and an event taxonomy?
- Can you reason about OAuth scopes for an app marketplace (`meeting.write`, `recording.read`, `webhook.write`)?
- Can you handle the recording lifecycle (start, stop, processing, available) with a webhook contract that doesn’t tie callers to a polling pattern?
Requirements (functional and non-functional)
Functional — in scope:
- Schedule a meeting with start time, duration, host, optional passcode, optional waiting room.
- Update / cancel a scheduled meeting.
- List meetings for an authenticated user or workspace.
- Get meeting details, including the live-state if running.
- Join token issuance — credentials a client uses to authenticate to the SFU.
- List participants of a live or recently-ended meeting.
- Remove or mute a participant (host action via API).
- Start / stop recording (cloud recording), get the result URL on completion.
- Webhooks for `meeting.started`, `meeting.ended`, `participant.joined`, `participant.left`, `recording.completed`, `recording.failed`.
- OAuth 2 for third-party apps; scopes per API category.
Functional — out of scope:
- The media-plane architecture (SFU, codecs, jitter buffer, congestion control).
- Phone-bridge / SIP / H.323 endpoints.
- Whiteboarding, polls, breakout rooms, in-meeting chat, captions, Zoom Rooms.
- Webinar-specific features (registration funnels, Q&A panels, practice sessions).
- Marketplace billing for paid apps.
- Voice-call recording transcription (the recording webhook gives you a media URL; transcription is a sibling service).
Non-functional:
- Meeting CRUD latency: <= 200 ms p95 for schedule / read / update.
- Join-token issuance: <= 100 ms p95 (it gates the participant joining).
- Participant list freshness: <= 5 s after a join/leave event.
- Webhook delivery: at-least-once, with exponential backoff over 24 hours; first delivery attempt within 5 s of the underlying event.
- Throughput: 50k concurrent live meetings; 500k API calls/sec at peak. Traffic is read-heavy (mostly clients polling meeting state).
- Availability: 99.95% on the management surface; the media plane has a separate SLO managed by the SFU service.
- Webhook delivery SLO: 99.9% of events delivered within 24 hours.
Use case diagram
```
                 ┌─────────────────────┐
                 │     Host (user)     │
                 └──────────┬──────────┘
                            │
          ┌─────────────────┼─────────────────┐
          ▼                 ▼                 ▼
     [schedule]        [start/end]   [record start/stop]
          │                 │                 │
          └─────────────────┴─────────────────┘
                            │
                            ▼
                 ┌─────────────────────┐
                 │      Zoom API       │
                 └──────────┬──────────┘
                            │
                 ┌──────────┼──────────┐
                 ▼          ▼          ▼
         [WS signalling] [webhooks]  [REST]
                 │          │          │
                 ▼          ▼          ▼
           ┌─────────┐ ┌──────────┐ ┌──────────┐
           │ Client  │ │ 3rd-party│ │  Client  │
           │ (joins  │ │   app    │ │ (mgmt UI)│
           │  media  │ │          │ │          │
           │  plane) │ │          │ │          │
           └─────────┘ └──────────┘ └──────────┘

                 ┌─────────────────────┐
                 │  Media plane (SFU)  │  ◄── out of API scope
                 └─────────────────────┘
```

Three surfaces (REST, WebSocket, webhooks). One actor (host or attendee). The media plane sits beneath the line: clients connect to it directly using credentials minted by the API.
Class diagram
```
┌──────────────────────────┐
│      MeetingService      │
├──────────────────────────┤
│ scheduleMeeting(req)     │
│ updateMeeting(id, req)   │
│ cancelMeeting(id)        │
│ getMeeting(id)           │
│ listMeetings(filters)    │
│ issueJoinToken(id, uid)  │
└──────────────┬───────────┘
               ▼
┌──────────────────────────┐
│         Meeting          │
├──────────────────────────┤
│ id : str                 │
│ host_id : str            │
│ topic : str              │
│ start_time : timestamp   │
│ duration_minutes : int   │
│ passcode? : str          │
│ waiting_room : bool      │
│ state : enum             │
│ join_url : str           │
└──────────────────────────┘

┌──────────────────────────┐         ┌─────────────────────┐
│    ParticipantService    │         │     Participant     │
├──────────────────────────┤ returns ├─────────────────────┤
│ listParticipants(mtg_id) │────────►│ user_id / guest_id  │
│ removeParticipant(id,pid)│         │ display_name        │
│ muteParticipant(id, pid) │         │ joined_at           │
│ updateRole(id, pid, r)   │         │ left_at?            │
└──────────────────────────┘         │ role : host|co|att  │
                                     │ audio / video state │
                                     └─────────────────────┘

┌──────────────────────────┐         ┌─────────────────────┐
│     RecordingService     │         │      Recording      │
├──────────────────────────┤ returns ├─────────────────────┤
│ startRecording(mtg_id)   │────────►│ id, meeting_id      │
│ stopRecording(mtg_id)    │         │ state : enum        │
│ getRecording(rec_id)     │         │ files[]: { url,     │
│ listRecordings(mtg_id)   │         │   type, size, ts }  │
└──────────────────────────┘         └─────────────────────┘

┌──────────────────────────┐         ┌─────────────────────┐
│      WebhookService      │         │     WebhookSub      │
├──────────────────────────┤         ├─────────────────────┤
│ subscribe(req)           │         │ id, url             │
│ unsubscribe(id)          │         │ events[]            │
│ listSubs()               │         │ secret              │
│ rotateSecret(id)         │         │ active : bool       │
└──────────────────────────┘         └─────────────────────┘
```

Four services. `Meeting` is the central resource; everything else is keyed off `meeting_id`. `WebhookSub` is a configuration resource: the marketplace app creates it on installation, and it lives in the app’s settings.
Sequence diagram (key flows)
Flow 1: scheduling and joining.
```
Host                           ZoomAPI               SFU            Attendee
 │ POST /v1/meetings              │                   │                 │
 │ { topic, start_time, ... }     │                   │                 │
 │───────────────────────────────►│                   │                 │
 │ 201 + meeting { id, join_url } │                   │                 │
 │◄───────────────────────────────│                   │                 │
 │ (out-of-band: share join_url)  │                   │                 │
 │─────────────────────────────────────────────────────────────────────►│
 │                                │                   │    GET join_url │
 │                                │                   │   (browser/app) │
 │                                │ POST /v1/meetings/{id}/joinToken    │
 │                                │◄────────────────────────────────────│
 │                                │ 200 + { jwt, sfu_url }              │
 │                                │────────────────────────────────────►│
 │                                │                   │ WS to sfu_url   │
 │                                │                   │ with jwt        │
 │                                │                   │◄────────────────│
 │                                │                   │ media flows in  │
 │                                │                   │ the SFU plane   │
 │                                │                   │◄───────────────►│
```

The API’s last involvement before the meeting starts is issuing a `joinToken`: a short-lived JWT (5-minute TTL) that the attendee presents to the SFU. The SFU validates the token against a public key that the API rotates daily.
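To make the issuance step concrete, here is a minimal Python sketch of the policy checks and claims behind `POST /v1/meetings/{id}/joinToken`. It assumes an in-memory `Meeting` record and invents the `ApiError` type and the claim names `mtg` and `aud` for illustration. It stops short of signing, because the design calls for an asymmetric key the SFU can verify, and Python's standard library has no RSA implementation.

```python
import hmac
import time
from dataclasses import dataclass
from typing import Optional

JOIN_TOKEN_TTL_S = 300  # 5-minute TTL from the design


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status the endpoint would return."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status} {message}")
        self.status = status


@dataclass
class Meeting:
    id: str
    state: str  # SCHEDULED | LIVE | ENDED | CANCELLED
    passcode: Optional[str] = None


def join_token_claims(meeting: Meeting, user_id: str, passcode: Optional[str],
                      sfu_url: str, now: Optional[float] = None) -> dict:
    """Apply the issuance policy, then return the claims the API would sign."""
    if meeting.state == "CANCELLED":
        raise ApiError(410, "meeting cancelled")
    if meeting.passcode is not None:
        supplied = (passcode or "").encode()
        # Constant-time comparison so response timing does not leak the passcode.
        if not hmac.compare_digest(meeting.passcode.encode(), supplied):
            raise ApiError(403, "passcode mismatch")
    # Waiting-room and join-before-host checks are omitted from this sketch.
    issued_at = int(time.time() if now is None else now)
    return {
        "sub": user_id,
        "mtg": meeting.id,   # hypothetical claim name for the meeting id
        "aud": sfu_url,      # hypothetical: binds the token to the selected SFU
        "iat": issued_at,
        "exp": issued_at + JOIN_TOKEN_TTL_S,
    }
```

The SFU side would then verify the signature with the API's public key and refuse the token once `exp` has passed, which is what makes the 5-minute TTL effective.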
Flow 2: recording lifecycle.
```
Host (or app)                  ZoomAPI                       SFU          RecordingPipeline
 │ POST /v1/meetings/{id}/recordings:start                    │                    │
 │───────────────────────────────►│                           │                    │
 │                                │ signal SFU to capture     │                    │
 │                                │──────────────────────────►│                    │
 │                                │                           │ tap media stream   │
 │ 202 Accepted + rec_id          │                           │                    │
 │◄───────────────────────────────│                           │                    │
 │   ... meeting continues ...    │                           │                    │
 │ POST /v1/meetings/{id}/recordings:stop                     │                    │
 │───────────────────────────────►│                           │                    │
 │                                │──────────────────────────►│                    │
 │ 202 Accepted                   │                           │                    │
 │◄───────────────────────────────│                           │                    │
 │                                │                           │ hand off raw media │
 │                                │                           │───────────────────►│
 │                                │                           │                    │ transcode, mux, store
 │                                │                           │                    │ (async, minutes)
 │                                │ processing done           │                    │
 │                                │◄ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
 │ POST {webhook_url}             │                           │                    │
 │ { event: recording.completed,  │                           │                    │
 │   rec_id, files[]: [...] }     │                           │                    │
 │◄───────────────────────────────│                           │                    │
```

The API call is fast (202 in tens of ms); the actual recording processing happens out-of-band, and the webhook signals completion. The webhook subscriber gets file URLs valid for 7 days; after that, a fresh `GET /v1/recordings/{id}` returns newly signed links.
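A subscriber acting on `recording.completed` still has to cope with the 7-day URL expiry. A minimal sketch, assuming the payload shape from the webhook contract (`data.files[].expires_at` as an RFC 3339 UTC timestamp) and a hypothetical 10-minute safety margin, picks out the files whose links should be refreshed through `GET /v1/recordings/{id}` before use:

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as '2026-06-06T17:42:11Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def files_to_refresh(event: dict, now: datetime,
                     margin: timedelta = timedelta(minutes=10)) -> list:
    """Return the types of files whose signed URLs expire within `margin` of `now`.

    For these, the caller should re-read GET /v1/recordings/{recording_id}
    (which returns freshly signed URLs) instead of using the webhook's URL.
    """
    return [
        f["type"]
        for f in event["data"]["files"]
        if parse_ts(f["expires_at"]) <= now + margin
    ]


event = {
    "event": "recording.completed",
    "data": {
        "recording_id": "rec_8h2N9c0qK",
        "files": [
            {"type": "video", "url": "https://...mp4", "expires_at": "2026-06-06T17:42:11Z"},
            {"type": "audio", "url": "https://...m4a", "expires_at": "2026-06-06T17:42:11Z"},
        ],
    },
}
print(files_to_refresh(event, datetime(2026, 6, 1, tzinfo=timezone.utc)))          # []
print(files_to_refresh(event, datetime(2026, 6, 6, 17, 40, tzinfo=timezone.utc)))  # ['video', 'audio']
```

Refreshing slightly before expiry, rather than exactly at it, avoids handing a downstream consumer a URL that dies mid-download.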
Flow 3: participant management.
```
Host                           ZoomAPI        SignallingGateway       Participant
 │ POST /v1/meetings/{id}/participants/{pid}/mute      │                    │
 │───────────────────────────────►│                   │                    │
 │                                │ signal mute       │                    │
 │                                │──────────────────►│                    │
 │                                │                   │ set mic muted      │
 │                                │                   │───────────────────►│
 │                                │                   │◄───────────────────│ ACK
 │ 200 OK                         │                   │                    │
 │◄───────────────────────────────│                   │                    │
```

Mute is enforced client-side via the SDK with a server-vouched flag. The API doesn’t have direct media-plane control; it issues an authoritative state change that flows through the signalling channel.
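The mute round trip can be modelled as "push a command, then await an ACK with a timeout". The asyncio sketch below is a toy: an `asyncio.Queue` stands in for each participant's WebSocket, and the class name and message fields (`type`, `seq`, and so on) are hypothetical, since the design does not pin down the signalling wire format.

```python
import asyncio
import itertools


class SignallingGateway:
    """Toy stand-in for the signalling gateway in Flow 3 (hypothetical message shapes)."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)
        self._pending: dict = {}   # seq -> Future resolved by the client's ACK
        self._outbox: dict = {}    # participant_id -> queue standing in for its WebSocket

    def connect(self, participant_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._outbox[participant_id] = queue
        return queue

    async def mute(self, meeting_id: str, participant_id: str, timeout: float = 2.0) -> bool:
        """Push an authoritative mute and return True once the client ACKs it."""
        seq = next(self._seq)
        ack = asyncio.get_running_loop().create_future()
        self._pending[seq] = ack
        await self._outbox[participant_id].put(
            {"type": "mute", "meeting_id": meeting_id,
             "participant_id": participant_id, "seq": seq}
        )
        try:
            await asyncio.wait_for(ack, timeout)
            return True
        except asyncio.TimeoutError:
            return False  # no ACK in time: the API layer must report or retry
        finally:
            self._pending.pop(seq, None)

    def on_ack(self, seq: int) -> None:
        ack = self._pending.get(seq)
        if ack is not None and not ack.done():
            ack.set_result(None)


async def sdk_client(gateway: SignallingGateway, inbox: asyncio.Queue, state: dict) -> None:
    """Client SDK side: apply the server-vouched mute flag, then ACK it."""
    msg = await inbox.get()
    if msg["type"] == "mute":
        state["mic_muted"] = True
        gateway.on_ack(msg["seq"])


async def demo() -> tuple:
    gateway = SignallingGateway()
    state = {"mic_muted": False}
    inbox = gateway.connect("p_1")
    client = asyncio.create_task(sdk_client(gateway, inbox, state))
    acked = await gateway.mute("mtg_1", "p_1")
    await client
    return acked, state["mic_muted"]


print(asyncio.run(demo()))  # (True, True)
```

Correlating the ACK by sequence number is what lets the REST handler return `200 OK` only after the client has actually applied the state change.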
Activity diagram (for non-trivial state)
The Meeting state machine is the structure that justifies most of the design choices:
```
         [scheduleMeeting]
                 │
                 ▼
        ┌────────────────┐
        │   SCHEDULED    │── cancelMeeting ──► CANCELLED
        └────────┬───────┘
                 │ first joinToken issued + redeemed
                 ▼
        ┌────────────────┐
        │      LIVE      │── all participants leave or
        │                │   host ends ──► ENDED
        └────────┬───────┘
                 │
                 ▼ (host triggers)
        ┌────────────────┐
        │  RECORDING_ON  │── stopRecording or LIVE→ENDED
        └────────┬───────┘
                 │
                 ▼
        ┌────────────────┐
        │     ENDED      │
        └────────┬───────┘
                 │ recording-processing job done
                 ▼
        ┌────────────────┐
        │ RECORDING_READY│  (signalled via webhook)
        └────────────────┘
```

Invariants the API enforces:
- A `SCHEDULED` meeting can be edited; a `LIVE` meeting cannot have its `start_time` changed (only `duration_minutes` can be extended).
- `CANCELLED` is terminal; cancelled meetings keep their id (so external bookings remain valid lookups), but `joinToken` issuance returns `410 Gone`.
- `RECORDING_ON` is an overlay state on top of `LIVE`, not a separate state. The webhook taxonomy reflects this: `recording.started` and `recording.stopped` are independent of `meeting.started` / `meeting.ended`.
- A meeting can produce multiple `Recording` objects if recording is started and stopped multiple times during the meeting.
- The recording-processing step is the only state transition that happens after the meeting is `ENDED`; it typically takes 0.5x to 2x the meeting duration in wall-clock time.
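The state machine and the invariants above fit in a few lines of code. This Python sketch is illustrative only: `MeetingRecord`, `Conflict` (standing in for HTTP 409) and the `rec_N` id format are hypothetical, `RECORDING_ON` is modelled as a flag on `LIVE` following the overlay invariant, and post-meeting recording processing is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Meeting-state transitions from the diagram; RECORDING_ON is an overlay flag, not a state.
TRANSITIONS = {
    "SCHEDULED": {"LIVE", "CANCELLED"},
    "LIVE": {"ENDED"},
    "ENDED": set(),
    "CANCELLED": set(),
}


class Conflict(Exception):
    """Raised for a request the current state forbids (HTTP 409)."""


@dataclass
class MeetingRecord:
    start_time: str
    duration_minutes: int
    state: str = "SCHEDULED"
    recording_on: bool = False
    recordings: list = field(default_factory=list)  # one id per start/stop cycle

    def transition(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise Conflict(f"{self.state} -> {new_state} is not allowed")
        if new_state == "ENDED" and self.recording_on:
            self.recording_on = False  # LIVE -> ENDED also closes the active recording
        self.state = new_state

    def update(self, start_time: Optional[str] = None,
               duration_minutes: Optional[int] = None) -> None:
        if self.state == "LIVE":
            if start_time is not None:
                raise Conflict("start_time is fixed once the meeting is LIVE")
            if duration_minutes is not None and duration_minutes < self.duration_minutes:
                raise Conflict("a LIVE meeting can only be extended")
        elif self.state != "SCHEDULED":
            raise Conflict(f"cannot edit a {self.state} meeting")
        if start_time is not None:
            self.start_time = start_time
        if duration_minutes is not None:
            self.duration_minutes = duration_minutes

    def start_recording(self) -> str:
        if self.state != "LIVE" or self.recording_on:
            raise Conflict("meeting not LIVE or already recording")
        self.recording_on = True
        rec_id = f"rec_{len(self.recordings) + 1}"  # hypothetical id format
        self.recordings.append(rec_id)
        return rec_id

    def stop_recording(self) -> None:
        if not self.recording_on:
            # Assumption: the design does not say what stopping an idle recording returns.
            raise Conflict("no active recording")
        self.recording_on = False


m = MeetingRecord(start_time="2026-06-01T18:00:00Z", duration_minutes=30)
m.transition("LIVE")
first = m.start_recording()
m.stop_recording()
second = m.start_recording()
m.transition("ENDED")  # also closes the active recording
print(m.recordings, m.recording_on)  # ['rec_1', 'rec_2'] False
```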
API implementation
Endpoint catalogue — REST surface

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/v1/meetings` | Schedule a new meeting |
| `GET` | `/v1/meetings` | List meetings (filter by host, time window) |
| `GET` | `/v1/meetings/{id}` | Read meeting details |
| `PATCH` | `/v1/meetings/{id}` | Update a scheduled meeting |
| `DELETE` | `/v1/meetings/{id}` | Cancel a meeting |
| `POST` | `/v1/meetings/{id}/joinToken` | Issue a short-lived join JWT |
| `GET` | `/v1/meetings/{id}/participants` | List participants (live or post-meeting) |
| `DELETE` | `/v1/meetings/{id}/participants/{pid}` | Remove a participant |
| `POST` | `/v1/meetings/{id}/participants/{pid}/mute` | Force-mute |
| `POST` | `/v1/meetings/{id}/recordings:start` | Begin cloud recording |
| `POST` | `/v1/meetings/{id}/recordings:stop` | Stop cloud recording |
| `GET` | `/v1/meetings/{id}/recordings` | List recordings for a meeting |
| `GET` | `/v1/recordings/{rec_id}` | Read a recording (includes signed file URLs) |
| `POST` | `/v1/webhooks` | Subscribe to events |
| `DELETE` | `/v1/webhooks/{sub_id}` | Unsubscribe |
| `GET` | `/v1/webhooks` | List the caller’s subscriptions |
Endpoint catalogue — webhook events

| Event | Payload key fields |
|---|---|
| `meeting.started` | `meeting_id`, `host_id`, `started_at` |
| `meeting.ended` | `meeting_id`, `ended_at`, `duration_seconds`, `reason` |
| `participant.joined` | `meeting_id`, `participant_id`, `display_name`, `joined_at` |
| `participant.left` | `meeting_id`, `participant_id`, `left_at` |
| `recording.started` | `meeting_id`, `recording_id`, `started_at` |
| `recording.stopped` | `meeting_id`, `recording_id`, `stopped_at` |
| `recording.completed` | `recording_id`, `files[]`, `duration_seconds` |
| `recording.failed` | `recording_id`, `error_code`, `message` |
OpenAPI schema (excerpt)

```yaml
paths:
  /v1/meetings:
    post:
      operationId: scheduleMeeting
      security: [{ oauth2: [meeting.write] }]
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [topic, start_time, duration_minutes]
              properties:
                topic: { type: string, maxLength: 200 }
                start_time: { type: string, format: date-time }
                duration_minutes: { type: integer, minimum: 1, maximum: 1440 }
                timezone: { type: string, example: 'America/Los_Angeles' }
                passcode: { type: string, minLength: 1, maxLength: 10, nullable: true }
                waiting_room: { type: boolean, default: true }
                settings:
                  type: object
                  properties:
                    auto_recording:
                      type: string
                      enum: [none, local, cloud]
                      default: none
                    mute_upon_entry: { type: boolean, default: true }
                    allow_join_before_host: { type: boolean, default: false }
      responses:
        '201':
          description: Meeting scheduled
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Meeting' }

  /v1/meetings/{id}/joinToken:
    post:
      operationId: issueJoinToken
      security: [{ oauth2: [meeting.join] }]
      parameters:
        - { name: id, in: path, required: true, schema: { type: string } }
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                display_name: { type: string }
                guest: { type: boolean, default: false }
                passcode: { type: string, nullable: true }
      responses:
        '200':
          description: Join JWT + SFU endpoint
          content:
            application/json:
              schema:
                type: object
                required: [jwt, sfu_url, expires_in]
                properties:
                  jwt: { type: string }
                  sfu_url: { type: string, format: uri }
                  expires_in: { type: integer, example: 300 }
        '403': { description: Passcode mismatch or not yet joinable }
        '410': { description: Meeting cancelled }

  /v1/meetings/{id}/recordings:start:
    post:
      operationId: startRecording
      security: [{ oauth2: [recording.write] }]
      parameters:
        - { name: id, in: path, required: true, schema: { type: string } }
      responses:
        '202':
          description: Recording requested
          content:
            application/json:
              schema:
                type: object
                properties:
                  recording_id: { type: string }
                  state: { type: string, example: 'recording' }
        '409': { description: Meeting not LIVE or already recording }

  /v1/webhooks:
    post:
      operationId: subscribeWebhook
      security: [{ oauth2: [webhook.write] }]
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [url, events]
              properties:
                url: { type: string, format: uri }
                events:
                  type: array
                  items:
                    type: string
                    enum:
                      - meeting.started
                      - meeting.ended
                      - participant.joined
                      - participant.left
                      - recording.started
                      - recording.stopped
                      - recording.completed
                      - recording.failed
      responses:
        '201':
          description: Subscription created (secret returned once)
          content:
            application/json:
              schema:
                type: object
                properties:
                  id: { type: string }
                  secret:
                    type: string
                    description: HMAC-SHA256 secret; not retrievable later
                  url: { type: string }
                  events: { type: array, items: { type: string } }

components:
  schemas:
    Meeting:
      type: object
      required: [id, host_id, topic, start_time, duration_minutes, state, join_url]
      properties:
        id: { type: string }
        host_id: { type: string }
        topic: { type: string }
        start_time: { type: string, format: date-time }
        duration_minutes: { type: integer }
        passcode: { type: string, nullable: true }
        waiting_room: { type: boolean }
        join_url: { type: string, format: uri }
        state:
          type: string
          enum: [SCHEDULED, LIVE, ENDED, CANCELLED]
        settings:
          type: object
          additionalProperties: true
  securitySchemes:
    oauth2:
      type: oauth2
      flows:
        authorizationCode:
          authorizationUrl: https://api.zoom.example/oauth/authorize
          tokenUrl: https://api.zoom.example/oauth/token
          scopes:
            meeting.read: Read meeting metadata
            meeting.write: Schedule/update meetings
            meeting.join: Issue join tokens
            recording.read: Read recordings
            recording.write: Control recording
            webhook.write: Manage webhook subscriptions
```

Webhook signature contract
Every webhook POST carries two signing headers (`X-Zoom-Timestamp`, `X-Zoom-Signature`) and a signed body:

```http
POST /your-callback HTTP/1.1
Content-Type: application/json
X-Zoom-Timestamp: 1780162931
X-Zoom-Signature: t=1780162931,v1=5c4f8d7e1a...

{
  "event": "recording.completed",
  "ts": "2026-05-30T17:42:11Z",
  "data": {
    "recording_id": "rec_8h2N9c0qK",
    "meeting_id": "mtg_91j20kdoF",
    "duration_seconds": 1842,
    "files": [
      { "type": "video", "url": "https://...mp4", "size_bytes": 482000000, "expires_at": "2026-06-06T17:42:11Z" },
      { "type": "audio", "url": "https://...m4a", "size_bytes": 28000000, "expires_at": "2026-06-06T17:42:11Z" },
      { "type": "transcript", "url": "https://...vtt", "size_bytes": 124000, "expires_at": "2026-06-06T17:42:11Z" }
    ]
  }
}
```

Signature scheme: `v1 = HMAC-SHA256(secret, "v1:" + timestamp + ":" + raw_body)`. Subscribers reject events whose timestamp is more than 5 minutes old (replay defence). Every payload also carries a unique `event_id` (not shown in the sample) that subscribers use to deduplicate redeliveries. Delivery is at-least-once, with retries at 1 s, 5 s, 30 s, 5 min, 30 min, 6 h and 24 h after the first attempt.
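On the subscriber side, verification takes a few lines of standard-library code. A minimal sketch, assuming the signature is hex-encoded (the sample header value looks like hex) and that the timestamp is read from the `t=` field of `X-Zoom-Signature`; a stricter receiver might also check it against `X-Zoom-Timestamp`. The function names are illustrative.

```python
import hashlib
import hmac
import time
from typing import Optional

REPLAY_WINDOW_S = 300  # reject deliveries more than 5 minutes off


def sign(secret: bytes, timestamp: str, raw_body: bytes) -> str:
    """v1 = HMAC-SHA256(secret, "v1:" + timestamp + ":" + raw_body), hex-encoded (assumed)."""
    message = b"v1:" + timestamp.encode() + b":" + raw_body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def parse_signature_header(header: str) -> dict:
    """Split "t=1780162931,v1=5c4f..." into {"t": "1780162931", "v1": "5c4f..."}."""
    parts = {}
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        parts[key] = value
    return parts


def verify(secret: bytes, signature_header: str, raw_body: bytes,
           now: Optional[float] = None) -> bool:
    parts = parse_signature_header(signature_header)
    ts, sig = parts.get("t"), parts.get("v1")
    if not ts or not sig or not ts.isdigit():
        return False
    current = time.time() if now is None else now
    if abs(current - int(ts)) > REPLAY_WINDOW_S:
        return False  # too old (or too far in the future): treat as a replay
    expected = sign(secret, ts, raw_body)
    # Constant-time comparison; verify over the raw bytes, never a re-serialised JSON.
    return hmac.compare_digest(expected.encode(), sig.encode())


secret = b"example-secret"
body = b'{"event":"recording.completed"}'
header = "t=1780162931,v1=" + sign(secret, "1780162931", body)
print(verify(secret, header, body, now=1780162931 + 30))   # True
print(verify(secret, header, body, now=1780162931 + 600))  # False: outside the window
```

Binding the timestamp into the MAC is what makes the replay check meaningful: an attacker cannot bump `t=` on a captured delivery without invalidating `v1`.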
Client samples — three languages
The schedule-then-join-token flow in Python, Go, and Node.
```python
import requests

API = "https://api.zoom.example"
TOKEN = "Bearer eyJhbGciOi..."


def schedule_meeting(topic, start_time_iso, duration_minutes):
    body = {
        "topic": topic,
        "start_time": start_time_iso,
        "duration_minutes": duration_minutes,
        "timezone": "America/Los_Angeles",
        "waiting_room": True,
        "settings": {"auto_recording": "cloud", "mute_upon_entry": True},
    }
    resp = requests.post(
        f"{API}/v1/meetings",
        json=body,
        headers={"Authorization": TOKEN},
        timeout=2,
    )
    resp.raise_for_status()
    return resp.json()


def issue_join_token(meeting_id, display_name, passcode=None):
    body = {"display_name": display_name, "guest": False}
    if passcode:
        body["passcode"] = passcode
    resp = requests.post(
        f"{API}/v1/meetings/{meeting_id}/joinToken",
        json=body,
        headers={"Authorization": TOKEN},
        timeout=1,
    )
    resp.raise_for_status()
    return resp.json()


mtg = schedule_meeting("Weekly Sync", "2026-06-01T18:00:00Z", 30)
tok = issue_join_token(mtg["id"], "Suraj")
print(tok["sfu_url"], tok["expires_in"], "s")
```

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

const API = "https://api.zoom.example"
const Token = "Bearer eyJhbGciOi..."

// client bounds every call, like the timeouts in the Python sample.
var client = &http.Client{Timeout: 2 * time.Second}

type Meeting struct {
	ID      string `json:"id"`
	JoinURL string `json:"join_url"`
	State   string `json:"state"`
}

type JoinToken struct {
	JWT       string `json:"jwt"`
	SFUURL    string `json:"sfu_url"`
	ExpiresIn int    `json:"expires_in"`
}

// postJSON POSTs payload as JSON and decodes a 2xx response body into out.
func postJSON(url string, payload, out any) error {
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", Token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("POST %s: HTTP %d", url, resp.StatusCode)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

func scheduleMeeting(topic, start string, durMin int) (*Meeting, error) {
	var m Meeting
	err := postJSON(API+"/v1/meetings", map[string]any{
		"topic":            topic,
		"start_time":       start,
		"duration_minutes": durMin,
		"timezone":         "America/Los_Angeles",
		"waiting_room":     true,
		"settings":         map[string]any{"auto_recording": "cloud"},
	}, &m)
	if err != nil {
		return nil, err
	}
	return &m, nil
}

func issueJoinToken(meetingID, displayName string) (*JoinToken, error) {
	var t JoinToken
	url := fmt.Sprintf("%s/v1/meetings/%s/joinToken", API, meetingID)
	body := map[string]any{"display_name": displayName, "guest": false}
	if err := postJSON(url, body, &t); err != nil {
		return nil, err
	}
	return &t, nil
}

func main() {
	mtg, err := scheduleMeeting("Weekly Sync", "2026-06-01T18:00:00Z", 30)
	if err != nil {
		fmt.Println("schedule failed:", err)
		return
	}
	tok, err := issueJoinToken(mtg.ID, "Suraj")
	if err != nil {
		fmt.Println("join token failed:", err)
		return
	}
	fmt.Println(tok.SFUURL, tok.ExpiresIn, "s")
}
```

```javascript
// Run as an ES module (e.g. a .mjs file) on Node 18+, which provides a global fetch.
const API = "https://api.zoom.example";
const TOKEN = "Bearer eyJhbGciOi...";

export async function scheduleMeeting(topic, startTimeIso, durationMinutes) {
  const resp = await fetch(`${API}/v1/meetings`, {
    method: "POST",
    headers: { Authorization: TOKEN, "Content-Type": "application/json" },
    body: JSON.stringify({
      topic,
      start_time: startTimeIso,
      duration_minutes: durationMinutes,
      timezone: "America/Los_Angeles",
      waiting_room: true,
      settings: { auto_recording: "cloud", mute_upon_entry: true },
    }),
  });
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  return resp.json();
}

export async function issueJoinToken(meetingId, displayName, passcode = null) {
  const body = { display_name: displayName, guest: false };
  if (passcode) body.passcode = passcode;
  const resp = await fetch(`${API}/v1/meetings/${meetingId}/joinToken`, {
    method: "POST",
    headers: { Authorization: TOKEN, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  return resp.json();
}

const mtg = await scheduleMeeting("Weekly Sync", "2026-06-01T18:00:00Z", 30);
const tok = await issueJoinToken(mtg.id, "Suraj");
console.log(tok.sfu_url, tok.expires_in, "s");
```

Latency budget — join-token issuance
The 100 ms p95 budget on POST /v1/meetings/{id}/joinToken (gating user-perceived “joining”):
| Phase | Budget |
|---|---|
| TLS / HTTP | 0 ms (warm) |
| OAuth scope check | 10 ms |
| Meeting state lookup | 15 ms |
| Passcode / waiting-room policy check | 5 ms |
| SFU selection (regional homing) | 10 ms |
| JWT mint (RS256, so the SFU can verify with the public key) | 5 ms |
| Serialize + transport | 10 ms |
| Margin | 45 ms |
| Total | 100 ms |
The downstream SFU connection takes another 500-1500 ms (ICE candidate gathering, DTLS handshake), but that’s media-plane time, not API time.
Trade-offs and extensions
| Decision | Why | Cost if requirements change |
|---|---|---|
| Clean API / SFU seam via join JWT | API stays stateless about media; SFU is independent | Have to keep two systems in sync on key rotation |
| Webhooks for off-meeting events | Decouples callers from polling | Subscribers must implement retry-safe handlers |
| Verb suffixes for state-change ops (`:start`, `:stop`) | Honest about non-CRUD semantics | Diverges from pure REST aesthetic |
| OAuth 2 with granular scopes | Enables a third-party app marketplace | More scopes to document; consent screens get busy |
| Per-meeting recording lifecycle | Multiple recordings per meeting | UI complexity around “which recording?” |
| Cloud recording default off | Trust + cost defaults | Customers who want recording always on must enable it in app config |
| 5-minute join-token TTL | Limits replay if leaked | Re-issuance needed on long pre-meeting waiting periods |
| Webhooks signed with HMAC-SHA256 + timestamp | Replay defence + integrity | Subscriber-side bookkeeping (recent ts cache) |
| At-least-once delivery | Tolerates subscriber outages | Subscribers must dedupe by event_id (we send it) |
| Passcode in plaintext POST body | Standard for low-entropy meeting passcodes | Cannot be hashed at rest; rotated per meeting |
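Because delivery is at-least-once, subscribers need an idempotent handler keyed by `event_id`. The sketch below keeps recently seen ids in a bounded in-memory `OrderedDict`, which is an illustrative simplification; `EventDeduper` and `handle` are hypothetical names. A production subscriber would more likely use a shared store (for example a unique constraint in its database, which also closes the race between two concurrent redeliveries) and retain ids for at least the 24-hour retry horizon: retries arrive hours later and must carry fresh timestamps to pass the 5-minute replay check, so the replay window alone cannot catch them.

```python
from collections import OrderedDict
from typing import Callable


class EventDeduper:
    """Remembers the most recent `capacity` event_ids (in memory, oldest evicted first)."""

    def __init__(self, capacity: int = 100_000) -> None:
        self.capacity = capacity
        self._seen: OrderedDict = OrderedDict()

    def seen(self, event_id: str) -> bool:
        return event_id in self._seen

    def mark(self, event_id: str) -> None:
        self._seen[event_id] = None
        self._seen.move_to_end(event_id)
        while len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest id


def handle(event: dict, deduper: EventDeduper, process: Callable[[dict], None]) -> str:
    """Process an event once; redeliveries are acknowledged without reprocessing."""
    event_id = event["event_id"]
    if deduper.seen(event_id):
        return "duplicate"  # still answer 2xx so the sender stops retrying
    process(event)  # if this raises, the id stays unmarked and a retry reprocesses it
    deduper.mark(event_id)
    return "processed"


handled = []
dedupe = EventDeduper()
print(handle({"event_id": "evt_1", "event": "meeting.ended"}, dedupe, handled.append))  # processed
print(handle({"event_id": "evt_1", "event": "meeting.ended"}, dedupe, handled.append))  # duplicate
print(len(handled))  # 1
```

Marking the id only after `process` succeeds is deliberate: a crash mid-handler should let the sender's next retry through rather than silently dropping the event.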
Likely follow-up extensions and how the API absorbs them:
- Phone bridge / SIP joins. A separate `joinToken` issuance path keyed by a phone number + meeting id + PIN triple. It doesn’t change the SFU contract; the SIP gateway sits beside the SFU and presents itself as one more participant.
- Breakout rooms. A new sub-resource, `POST /v1/meetings/{id}/breakouts`, returns an array of sub-rooms, each with its own join URL. The SFU treats each breakout as a sibling room. The webhook taxonomy extends with `breakout.opened` / `breakout.closed`.
- Live transcription. A WebSocket subscription per meeting that emits transcribed text frames. New surface, new contract; it reuses the OAuth scope shape.
- Webinar mode. A flag on `Meeting` that flips it into webinar semantics (panelists vs attendees, raise-hand queue, Q&A). Same `Meeting` resource, additional fields.
- Meeting templates. `POST /v1/meeting_templates` plus a `start_from_template` field on the schedule endpoint, which saves customers from re-entering settings.
Mock interview follow-ups
- “Why is `joinToken` a separate endpoint, not a field on the meeting object?” Two reasons. (1) Tokens are short-lived (5 min), so embedding them in the meeting object would make every read of that object go stale within minutes. (2) Token issuance requires a synchronous policy check (passcode, waiting-room state), which is a write-shaped operation, not a read.
- “How does the API handle a meeting that goes long?” `duration_minutes` on the meeting object is advisory; the API does not auto-end. A long-running meeting fires `meeting.ended` when the last participant leaves or the host clicks End. Customers who want hard caps configure a tenant-level policy.
- “What happens if a webhook subscriber is down for hours?” Exponential backoff over 24 hours. After 24 hours of failures we mark the subscription `degraded` and surface a dashboard warning; after 7 days of continuous failure we disable it and email the app owner.
- “How does a third-party app authenticate to our API?” OAuth 2 Authorization Code flow with PKCE for user-context apps; OAuth 2 Client Credentials for server-to-server apps. Scopes are per API category. Refresh tokens rotate on use.
- “How do you support a meeting with 1000 participants?” The API contract doesn’t change; scaling the forwarding is the SFU’s job. The participant-list endpoint paginates (`page_size: 50`, `cursor`); the join-token endpoint stays under 100 ms. The interesting scaling work is below the seam.
- “What’s the deal with `start_url` vs `join_url`?” `start_url` carries a host token (a longer-lived JWT with broader scopes) that the meeting host uses to start the session. `join_url` triggers the `joinToken` flow for attendees. There are two paths because hosts hold authority and need a distinct credential.
- “How do recordings handle large meetings?” Recording is multiplexed at the SFU; processing is async. A 4-hour meeting at 1080p produces ~12 GB of video, which the recording pipeline transcodes into adaptive renditions and stores. The webhook fires when the lowest-bitrate rendition is ready; higher renditions arrive minutes later.
- “What if a participant leaves and rejoins?” Two `participant.joined` events with two different `participant_id` values. The API does not unify them; that’s a downstream reporting concern for the customer.
- “At 10x scale, what breaks first?” The participant-list endpoint’s freshness. We’d push participant state via a server-sent events stream (`GET /v1/meetings/{id}/participants/stream`) rather than 5-second polling. The schedule / read / write surface scales horizontally; webhook delivery scales by sharding subscribers across delivery workers.
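The retry and escalation policy from the webhook-outage answer can be sketched as two small functions. This assumes the listed delays (1 s, 5 s, 30 s, 5 min, 30 min, 6 h, 24 h) are offsets from the first delivery attempt, which keeps the schedule within the stated 24-hour window; the function and constant names are hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Retry offsets measured from the first delivery attempt (assumed interpretation).
RETRY_OFFSETS = [
    timedelta(seconds=1), timedelta(seconds=5), timedelta(seconds=30),
    timedelta(minutes=5), timedelta(minutes=30), timedelta(hours=6),
    timedelta(hours=24),
]
DEGRADED_AFTER = timedelta(hours=24)  # mark degraded, surface a dashboard warning
DISABLED_AFTER = timedelta(days=7)    # disable and email the app owner


def next_retry_offset(failed_attempts: int) -> Optional[timedelta]:
    """Offset of the next retry after `failed_attempts` failures, or None when exhausted."""
    if failed_attempts < 1:
        raise ValueError("call this only after at least one failed attempt")
    index = failed_attempts - 1
    return RETRY_OFFSETS[index] if index < len(RETRY_OFFSETS) else None


def subscription_status(continuous_failure: timedelta) -> str:
    """Escalation state for a subscription that has been failing for this long."""
    if continuous_failure >= DISABLED_AFTER:
        return "disabled"
    if continuous_failure >= DEGRADED_AFTER:
        return "degraded"
    return "active"


print(next_retry_offset(1))                      # 0:00:01
print(next_retry_offset(8))                      # None
print(subscription_status(timedelta(hours=30)))  # degraded
```

Keeping per-event retries and per-subscription health separate means one subscriber's 24-hour outage exhausts retries for individual events without immediately disabling the integration; the 7-day threshold governs that.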
API and media plane fused, with the API directly managing RTP streams. Tempting because it looks “simpler”, but the API surface becomes a leaky abstraction over the SFU. Every congestion-control tweak shows up in the API. Mobile SDKs can’t evolve independently. The web SDK has to mirror every server change.
API and media plane separated by a join JWT. The API is a stateless REST + webhook surface; the SFU is a media-routing fleet that the join token unlocks. Each can ship independently; the contract between them is a signed JWT and a public-key rotation schedule. This is the common pattern among real-time-media APIs.
Related
- WebSockets — Bidirectional Streaming — the transport primitive under the signalling channel.
- OAuth 2 — The Authorization Framework — the auth model for third-party apps.
- Event-Driven Architecture Protocols — the webhook delivery contract.
- The API-Design Walk-through — the seven-step recipe this writeup followed.
- REST — The Architectural Style — the architectural style behind the endpoint shape.