Design a Chess API
Game lifecycle, move validation, time control, spectator stream. The cleanest turn-based-game API in the catalogue.
Context
A chess API is the cleanest turn-based-game design problem in the interview catalogue. The rules are well-defined and centuries old, the state space is finite and serialisable to a single string (FEN), and the move language is standardised (UCI / SAN). Everything that’s interesting about real-time multiplayer infrastructure — authoritative server, anti-cheat, clock synchronisation, spectator fan-out — shows up at small enough scope to actually finish in 60 minutes.
Lichess and Chess.com are the production reference points. Both run authoritative servers, both stream board updates to spectators, both implement Glicko or Elo rating, both detect cheating server-side. Their APIs are public and battle-tested at millions of games per day.
The interviewer’s hidden objectives:
- Do you make the server authoritative for move validation, not the client?
- Can you model the game-state machine completely — including draws, resignations, timeouts?
- Do you encode the board state as FEN (or equivalent) rather than reinvent it?
- Do you handle time control as a server-side clock, not a client wall-clock?
- Do you separate the player API (writes moves) from the spectator API (read-only stream)?
Two design pitfalls to avoid: trusting the client (cheating becomes trivial), and using request-response for the spectator stream (latency and connection overhead explode).
Requirements (functional and non-functional)
Functional — in scope:
- Create a new game: choose colour, time control, optional opponent (or match-find).
- Submit a move (UCI notation: e2e4, or e7e8q for promotion).
- Resign or offer / accept a draw.
- Query current game state (board, clocks, last move).
- Stream board state to spectators in real time.
- Time control: bullet (1+0), blitz (5+0), rapid (10+0), classical (30+0). Each game has a clock per player.
Functional — out of scope:
- Match-finding queue — separate service.
- Player rating updates — handled by an async post-game pipeline.
- Tournaments — composition of game APIs.
- Analysis engine / hints — explicitly post-game, separate API.
- Chat — separate sub-channel; not part of the game contract.
Non-functional:
- Latency: submit move <= 100 ms p95; spectator delivery <= 500 ms p95.
- Throughput: 10k concurrent games; 50k concurrent spectators.
- Availability: 99.95%. Brief outages must preserve in-flight game state.
- Move validation: 100% server-side. Clients are untrusted.
- Clock precision: 100 ms.
- Persistence: every move durable before the response returns.
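The UCI move format in the functional requirements is cheap to reject before any engine work runs. A minimal sketch, assuming the same shape as the `pattern` on the submitMove request in the OpenAPI excerpt; `parse_uci` and `UciMove` are hypothetical names, and a syntactically valid move still has to pass the legality check against the current FEN.

```python
import re
from typing import NamedTuple, Optional

# From-square, to-square, optional promotion piece (q, r, b, n).
# fullmatch() is used so a trailing newline cannot sneak past the pattern.
UCI_RE = re.compile(r"([a-h][1-8])([a-h][1-8])([qrbn])?")


class UciMove(NamedTuple):
    from_sq: str
    to_sq: str
    promotion: Optional[str]


def parse_uci(text: str) -> UciMove:
    """Syntax check only; legality is decided later against the current FEN."""
    m = UCI_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a UCI move: {text!r}")
    from_sq, to_sq, promo = m.groups()
    if from_sq == to_sq:
        raise ValueError("from and to squares must differ")
    if promo is not None and to_sq[1] not in "18":
        raise ValueError("promotion is only possible on the first or eighth rank")
    return UciMove(from_sq, to_sq, promo)


print(parse_uci("e2e4"))   # UciMove(from_sq='e2', to_sq='e4', promotion=None)
print(parse_uci("e7e8q"))  # UciMove(from_sq='e7', to_sq='e8', promotion='q')
```

Rejecting malformed input this early keeps the locked, engine-backed part of the write path for moves that are at least well-formed.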
Use case diagram
- Player → create, move, resign/draw
- Spectator → stream board
- All four use cases → Chess API → GameDB, ClockSvc, Pub-Sub bus (fan-out to spectators)

Two actors. Players write moves; spectators only read.
Class diagram
- ChessService: createGame(req): Game; submitMove(id, mv); resign(id); offerDraw(id); getGame(id): GameState; stream(id): SSE. Owns Game.
- Game (one Game has many Moves): id, white_player_id, black_player_id, status, result, time_control, fen (current), ply_count, white_clock_ms, black_clock_ms, last_move_at, result_reason.
- Move: ply, uci ("e2e4"), san ("e4"), fen_after, time_left_ms, submitted_at.
- Clock: game_id, white_ms, black_ms, active_side, last_tick_at.

Game carries the authoritative state. Move is append-only: once submitted and validated, it is never mutated. Clock is a separate row updated atomically with each move, so the server has a definitive clock state rather than a derived one.
The time_control field is a structured tuple {base_seconds, increment_seconds}: blitz 5+0 is {300, 0}, blitz 3+2 is {180, 2}.
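The clock arithmetic behind the Clock row fits in a few lines. A minimal sketch, assuming a subset of the Clock fields from the class diagram (white_ms, black_ms, active_side, last_tick_at), all in integer milliseconds read from the server's own clock; `apply_move` and `remaining_ms` are hypothetical helpers. The increment is credited to the mover only after a move arrives in time (Fischer-style increment).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clock:
    white_ms: int
    black_ms: int
    active_side: str   # "white" or "black": whose clock is running
    last_tick_at: int  # server time (ms) when the active clock last started


def remaining_ms(clock: Clock, now_ms: int) -> int:
    """Time left for the side to move as of now_ms (negative once the flag has fallen)."""
    stored = clock.white_ms if clock.active_side == "white" else clock.black_ms
    return stored - (now_ms - clock.last_tick_at)


def apply_move(clock: Clock, now_ms: int, increment_s: int) -> Clock:
    """Charge the mover for elapsed time, add the increment, start the other clock."""
    left = remaining_ms(clock, now_ms)
    if left <= 0:
        # The flag fell before the move arrived: the caller ends the game as TimedOut.
        raise TimeoutError(f"{clock.active_side} flagged")
    left += increment_s * 1000
    if clock.active_side == "white":
        return replace(clock, white_ms=left, active_side="black", last_tick_at=now_ms)
    return replace(clock, black_ms=left, active_side="white", last_tick_at=now_ms)


# 3+2 blitz: white thinks for 4.5 s, then black's clock starts.
c = Clock(white_ms=180_000, black_ms=180_000, active_side="white", last_tick_at=0)
c = apply_move(c, now_ms=4_500, increment_s=2)
print(c.white_ms, c.active_side)  # 177500 black
```

The out-of-band timeout ticker can apply the same `remaining_ms(clock, now) <= 0` test when it scans active games, so the move path and the ticker never disagree about when a flag fell.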
Sequence diagram (key flows)
The submit move flow — the critical write path:
- Player → ChessAPI: POST /games/{id}/moves, body {uci: "e2e4"}.
- ChessAPI → GameDB: lock the game row.
- ChessAPI: verify it is the player's turn.
- ChessAPI → Validator: verify move legality on the current FEN; Validator → ChessAPI: valid + new FEN.
- ChessAPI: compute the new clock.
- ChessAPI → GameDB: persist Move + Game fen + Clock atomically; GameDB → ChessAPI: ok.
- ChessAPI → Pub-Sub: publish event {game_id, ply, fen, ...}.
- ChessAPI → Player: 200 + new state.
- Pub-Sub → spectator fan-out and the opponent's SSE push.

The spectator stream is a long-lived SSE connection; the opponent and spectators receive the same move.played event:
- Spectator → ChessAPI: GET /games/{id}/stream (Accept: text/event-stream).
- ChessAPI → Pub-Sub: subscribe.
- ChessAPI → Spectator: initial state event.
- ... long-lived connection ...
- Pub-Sub → ChessAPI: move.played event; ChessAPI → Spectator: event: move, data: {ply, fen}.
- Pub-Sub → ChessAPI: clock.tick (every 1 s); ChessAPI → Spectator: event: clock.

The timeout flow runs out-of-band, triggered when a clock-service tick detects expiry:
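- ClockTicker (1 Hz) → ChessAPI: scan active games where the active clock is ≤ 0.
- ChessAPI → GameDB: atomic update, status=TimedOut, result = opponent wins on time.
- ChessAPI → Pub-Sub: publish event.

Because the ticker runs only once per second while the clock precision target is 100 ms, the move handler must also check the mover's remaining time before accepting a move; otherwise a move arriving between the flag falling and the next tick would be accepted late. The TimedOut update should likewise apply only if the game is still InProgress, so it cannot overwrite a resignation or checkmate that committed first.

Activity diagram (for non-trivial state)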
Game.status is the canonical state machine — defending this picture is most of the interview:
- [POST /games] → Pending (waiting for opponent).
- Pending → InProgress when the opponent joins.
- InProgress → InProgress on each valid move (self-loop).
- InProgress → Checkmate (checkmate), Stalemate (stalemate), Resigned (resign), Draw (draw), TimedOut (timeout).

Five terminal states, all reachable from InProgress. The validator detects checkmate, stalemate, and the automatic draws after each move; resignation and draw by agreement are explicit player actions; timeout fires from the clock service.
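The transition picture above maps directly onto a lookup table that the service consults before every write. A minimal sketch, assuming the status names from the OpenAPI Game schema; the event names and the `transition` helper are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PENDING = "Pending"
    IN_PROGRESS = "InProgress"
    CHECKMATE = "Checkmate"
    STALEMATE = "Stalemate"
    RESIGNED = "Resigned"
    DRAW = "Draw"
    TIMED_OUT = "TimedOut"


# (current status, event) -> next status. Any pair not listed is rejected.
TRANSITIONS = {
    (Status.PENDING, "opponent_joined"): Status.IN_PROGRESS,
    (Status.IN_PROGRESS, "move"): Status.IN_PROGRESS,  # self-loop
    (Status.IN_PROGRESS, "checkmate"): Status.CHECKMATE,
    (Status.IN_PROGRESS, "stalemate"): Status.STALEMATE,
    (Status.IN_PROGRESS, "resign"): Status.RESIGNED,
    (Status.IN_PROGRESS, "draw"): Status.DRAW,
    (Status.IN_PROGRESS, "timeout"): Status.TIMED_OUT,
}

TERMINAL = {Status.CHECKMATE, Status.STALEMATE, Status.RESIGNED, Status.DRAW, Status.TIMED_OUT}


def transition(current: Status, event: str) -> Status:
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        # e.g. a move after checkmate, or a resignation before the game starts;
        # a natural fit for a 409 Conflict response.
        raise ValueError(f"{event!r} not allowed in {current.value}") from None


print(transition(Status.IN_PROGRESS, "resign").value)  # Resigned
```

Keeping the table explicit means every terminal state rejects further events by construction, rather than by a scattering of if-checks in each handler.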
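Draw is the most subtle because it has sub-cases: agreement (both sides agree), threefold repetition (auto), the fifty-move rule (auto), and insufficient material (auto); stalemate is a separate terminal state. We expose result_reason on Game to distinguish them. Under FIDE rules, threefold repetition and the fifty-move rule only entitle a player to claim a draw (fivefold repetition and the 75-move rule end the game automatically), so applying them automatically is a product decision worth stating explicitly.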
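The automatic draw sub-cases above can be driven from the FEN history alone. A minimal sketch, assuming the service keeps the fen_after of every ply and runs this only after the checkmate and stalemate checks; it treats the first four FEN fields (placement, side to move, castling, en passant) as the repetition key and covers only the simplest insufficient-material cases. `draw_reason` and its helpers are hypothetical. A full rules library such as python-chess handles the edge cases, including an en passant field that is set even when no capture is actually possible.

```python
from collections import Counter
from typing import List, Optional


def position_key(fen: str) -> str:
    # Placement, side to move, castling rights, en passant square.
    # The halfmove and fullmove counters are deliberately excluded.
    return " ".join(fen.split()[:4])


def insufficient_material(fen: str) -> bool:
    # Simplified: bare kings, or one lone bishop or knight against a bare king.
    pieces = [ch for ch in fen.split()[0] if ch.isalpha() and ch not in "Kk"]
    return not pieces or (len(pieces) == 1 and pieces[0] in "BNbn")


def draw_reason(fen_history: List[str]) -> Optional[str]:
    """result_reason for an automatic draw after the latest ply, or None."""
    current = fen_history[-1]
    if Counter(position_key(f) for f in fen_history)[position_key(current)] >= 3:
        return "threefold"
    if int(current.split()[4]) >= 100:  # halfmove clock: 50 moves by each side
        return "fifty_move"
    if insufficient_material(current):
        return "insufficient_material"
    return None


print(draw_reason(["8/8/4k3/8/8/4K3/4N3/8 b - - 0 60"]))  # insufficient_material
```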
API implementation
Endpoint catalogue
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/games | Create new game (or match-find) |
| GET | /v1/games/{id} | Get current game state |
| POST | /v1/games/{id}/moves | Submit move (UCI) |
| POST | /v1/games/{id}/resign | Resign |
| POST | /v1/games/{id}/draw:offer | Offer draw |
| POST | /v1/games/{id}/draw:accept | Accept open draw offer |
| POST | /v1/games/{id}/draw:decline | Decline open draw offer |
| GET | /v1/games/{id}/stream | SSE stream of board + clock events |
| GET | /v1/games/{id}/moves | Move history (PGN-equivalent) |
The split between moves (write, single move) and stream (read-only, all events) keeps the contract clean. :offer, :accept, :decline are resource-actions because draw is a multi-step protocol that doesn’t fit pure CRUD.
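The draw resource-actions form a small protocol layered on the game. A minimal sketch, assuming at most one open offer per game and that an offer lapses when the other side replies with a move instead of answering (a common convention, and an assumption here, since the endpoint table does not specify it); `DrawOffers` and its methods are hypothetical.

```python
from typing import Optional


class DrawOffers:
    """Tracks the single open draw offer for one game."""

    def __init__(self) -> None:
        self.open_from: Optional[str] = None  # "white", "black", or None

    def offer(self, side: str) -> None:  # POST /v1/games/{id}/draw:offer
        if self.open_from is not None:
            raise ValueError("an offer is already open")
        self.open_from = side

    def accept(self, side: str) -> None:  # POST /v1/games/{id}/draw:accept
        if self.open_from is None or self.open_from == side:
            raise ValueError("no offer from the opponent to accept")
        self.open_from = None
        # The caller now transitions the game to Draw with result_reason "agreement".

    def decline(self, side: str) -> None:  # POST /v1/games/{id}/draw:decline
        if self.open_from is None or self.open_from == side:
            raise ValueError("no offer from the opponent to decline")
        self.open_from = None

    def on_move(self, mover: str) -> None:
        # Replying with a move instead of accepting implicitly declines.
        if self.open_from is not None and self.open_from != mover:
            self.open_from = None


d = DrawOffers()
d.offer("white")    # white offers
d.on_move("black")  # black answers with a move instead
print(d.open_from)  # None
```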
OpenAPI schema (excerpt)
```yaml
paths:
  /v1/games:
    post:
      operationId: createGame
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [time_control]
              properties:
                color:
                  type: string
                  enum: [white, black, random]
                  default: random
                time_control:
                  type: object
                  required: [base_seconds, increment_seconds]
                  properties:
                    base_seconds:
                      type: integer
                      minimum: 30
                      maximum: 10800
                    increment_seconds:
                      type: integer
                      minimum: 0
                      maximum: 60
                opponent_id:
                  type: string
                  nullable: true
                start_fen:
                  type: string
                  nullable: true
                  description: Custom start position (Chess960 etc.)
      responses:
        '201':
          description: Game created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Game'
  /v1/games/{id}/moves:
    post:
      operationId: submitMove
      parameters:
        - { name: id, in: path, required: true, schema: { type: string } }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [uci]
              properties:
                uci:
                  type: string
                  pattern: '^[a-h][1-8][a-h][1-8][qrbn]?$'
                client_clock_ms:
                  type: integer
                  description: Client-reported clock; server authoritative
      responses:
        '200':
          description: Accepted
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Game'
        '409':
          description: Not your turn, or illegal move
  /v1/games/{id}/stream:
    get:
      operationId: streamGame
      parameters:
        - { name: id, in: path, required: true, schema: { type: string } }
      responses:
        '200':
          description: Server-sent event stream
          content:
            text/event-stream:
              schema: { type: string }
components:
  schemas:
    Game:
      type: object
      required: [id, status, fen, ply_count, white_clock_ms, black_clock_ms]
      properties:
        id: { type: string }
        white_player_id: { type: string }
        black_player_id: { type: string }
        status:
          type: string
          enum: [Pending, InProgress, Checkmate, Stalemate, Resigned, Draw, TimedOut]
        result:
          type: string
          enum: ["1-0", "0-1", "1/2-1/2", "*"]
        result_reason:
          type: string
          nullable: true
        fen: { type: string }
        ply_count: { type: integer }
        white_clock_ms: { type: integer }
        black_clock_ms: { type: integer }
        time_control:
          type: object
          properties:
            base_seconds: { type: integer }
            increment_seconds: { type: integer }
        last_move_at: { type: string, format: date-time }
```

SSE stream: raw wire
The spectator endpoint emits four event types: state (the initial snapshot), move, clock, and game_over:
```text
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

event: state
data: {"fen":"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1","ply":0,"white_clock_ms":300000,"black_clock_ms":300000}

event: move
data: {"ply":1,"uci":"e2e4","san":"e4","fen_after":"rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1","white_clock_ms":299870}

event: clock
data: {"white_clock_ms":299870,"black_clock_ms":298120}

event: game_over
data: {"status":"Checkmate","result":"1-0","result_reason":"checkmate"}
```

Clients use the standard EventSource (browser) or eventsource (Node/Python) to consume the stream. Reconnect with Last-Event-ID is supported: on reconnect, the server replays any events after the given ID. This requires every event to carry an id: field, which the excerpt above does not show. The browser EventSource API cannot set an Authorization header, so authenticated browser clients need cookie-based auth or a fetch-based reader such as the JavaScript sample below.
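Replay on reconnect works only if every event carries an id: line that the client echoes back in Last-Event-ID. A minimal server-side sketch, assuming the id is a per-game sequence number and that events are kept in an in-memory log; `sse_frame` and `EventLog` are hypothetical names, and a real deployment would back the log with the pub-sub bus. On a fresh connection (no Last-Event-ID) the server would send the state snapshot instead of a replay.

```python
import json
from typing import List, Tuple


def sse_frame(event_id: int, event: str, data: dict) -> str:
    """One event in text/event-stream format; the blank line terminates it."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"id: {event_id}\nevent: {event}\ndata: {payload}\n\n"


class EventLog:
    """Per-game, append-only event log with monotonically increasing ids."""

    def __init__(self) -> None:
        self.events: List[Tuple[int, str, dict]] = []

    def append(self, event: str, data: dict) -> int:
        event_id = len(self.events) + 1
        self.events.append((event_id, event, data))
        return event_id

    def replay(self, last_event_id: str) -> List[str]:
        """Frames the client missed, given the Last-Event-ID header value."""
        after = int(last_event_id)
        return [sse_frame(i, e, d) for i, e, d in self.events if i > after]


log = EventLog()
log.append("move", {"ply": 1, "uci": "e2e4"})
log.append("move", {"ply": 2, "uci": "e7e5"})
print("".join(log.replay("1")), end="")  # only the id: 2 frame is re-sent
```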
Client samples: three languages
The create-game-then-submit-move flow.
```python
import requests

API = "https://api.example.com"
TOKEN = "Bearer eyJhbGciOi..."


def create_game(time_control_seconds=300, increment=0):
    r = requests.post(
        f"{API}/v1/games",
        json={
            "color": "white",
            "time_control": {
                "base_seconds": time_control_seconds,
                "increment_seconds": increment,
            },
        },
        headers={"Authorization": TOKEN},
        timeout=5,
    )
    r.raise_for_status()  # expect 201 Created
    return r.json()


def submit_move(game_id, uci):
    r = requests.post(
        f"{API}/v1/games/{game_id}/moves",
        json={"uci": uci},
        headers={"Authorization": TOKEN},
        timeout=2,
    )
    if r.status_code == 409:
        # Not your turn, or illegal move
        raise RuntimeError(f"move rejected: {r.json().get('detail')}")
    r.raise_for_status()
    return r.json()


# Stream board updates as a spectator
def watch(game_id):
    with requests.get(
        f"{API}/v1/games/{game_id}/stream",
        headers={"Authorization": TOKEN, "Accept": "text/event-stream"},
        stream=True,
    ) as s:
        s.raise_for_status()
        for line in s.iter_lines():
            if line.startswith(b"data: "):
                print(line[6:].decode())


g = create_game(300, 0)
submit_move(g["id"], "e2e4")
```

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

const API = "https://api.example.com"
const TOKEN = "Bearer eyJhbGciOi..."

type Game struct {
	ID     string `json:"id"`
	Status string `json:"status"`
	FEN    string `json:"fen"`
}

func createGame(base, inc int) (*Game, error) {
	body, err := json.Marshal(map[string]any{
		"color": "white",
		"time_control": map[string]int{
			"base_seconds":      base,
			"increment_seconds": inc,
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", API+"/v1/games", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", TOKEN)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusCreated {
		return nil, fmt.Errorf("create game: unexpected status %s", resp.Status)
	}
	var g Game
	if err := json.NewDecoder(resp.Body).Decode(&g); err != nil {
		return nil, err
	}
	return &g, nil
}

func submitMove(gameID, uci string) (*Game, error) {
	body, err := json.Marshal(map[string]string{"uci": uci})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", fmt.Sprintf("%s/v1/games/%s/moves", API, gameID), bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", TOKEN)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
	case http.StatusConflict:
		return nil, fmt.Errorf("move rejected: not your turn, or illegal move")
	default:
		return nil, fmt.Errorf("submit move: unexpected status %s", resp.Status)
	}
	var g Game
	if err := json.NewDecoder(resp.Body).Decode(&g); err != nil {
		return nil, err
	}
	return &g, nil
}

// watch prints the data payload of every SSE event until the stream ends.
func watch(gameID string) error {
	req, err := http.NewRequest("GET", fmt.Sprintf("%s/v1/games/%s/stream", API, gameID), nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", TOKEN)
	req.Header.Set("Accept", "text/event-stream")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "data: ") {
			fmt.Println(strings.TrimPrefix(line, "data: "))
		}
	}
	return scanner.Err()
}

func main() {
	g, err := createGame(300, 0)
	if err != nil {
		fmt.Println("create game:", err)
		return
	}
	if _, err := submitMove(g.ID, "e2e4"); err != nil {
		fmt.Println("submit move:", err)
	}
}
```

```javascript
// Run as an ES module: top-level await is used at the bottom.
const API = "https://api.example.com";
const TOKEN = "Bearer eyJhbGciOi...";

async function createGame(baseSeconds = 300, increment = 0) {
  const r = await fetch(`${API}/v1/games`, {
    method: "POST",
    headers: { Authorization: TOKEN, "Content-Type": "application/json" },
    body: JSON.stringify({
      color: "white",
      time_control: { base_seconds: baseSeconds, increment_seconds: increment },
    }),
  });
  if (!r.ok) throw new Error(`create game failed: ${r.status}`);
  return r.json();
}

async function submitMove(gameId, uci) {
  const r = await fetch(`${API}/v1/games/${gameId}/moves`, {
    method: "POST",
    headers: { Authorization: TOKEN, "Content-Type": "application/json" },
    body: JSON.stringify({ uci }),
  });
  if (r.status === 409) throw new Error("move rejected: not your turn, or illegal move");
  if (!r.ok) throw new Error(`submit move failed: ${r.status}`);
  return r.json();
}

// Reads the SSE stream with fetch so the Authorization header can be sent
// (the browser EventSource API cannot set custom headers).
async function* watch(gameId) {
  const r = await fetch(`${API}/v1/games/${gameId}/stream`, {
    headers: { Authorization: TOKEN, Accept: "text/event-stream" },
  });
  if (!r.ok) throw new Error(`stream failed: ${r.status}`);
  const reader = r.body.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) return;
    buf += decoder.decode(value, { stream: true });
    let i;
    while ((i = buf.indexOf("\n")) >= 0) {
      const line = buf.slice(0, i).replace(/\r$/, ""); // tolerate CRLF line endings
      buf = buf.slice(i + 1);
      if (line.startsWith("data: ")) yield JSON.parse(line.slice(6));
    }
  }
}

const g = await createGame(300, 0);
await submitMove(g.id, "e2e4");
for await (const ev of watch(g.id)) console.log(ev);
```

Latency budget
| Phase | Budget | Notes |
|---|---|---|
| Auth + rate limit | 5 ms | Cached JWT |
| Game row lock | 5 ms | Single primary key |
| Move validation | 10 ms | Pure CPU; chess engine in-process |
| Atomic commit (FEN + Clock + Move) | 30 ms p95 | Quorum write |
| Pub-sub publish (async) | ~0 ms | Off the response path |
| Serialize + transport | 15 ms | JSON, ~2 KB |
| Margin | 35 ms | Slow shard / GC pause |
| Total | ~100 ms | At budget |
The spectator-delivery budget (<= 500 ms p95) is dominated by the pub-sub fan-out and the SSE write; even after the ~100 ms write path, that leaves roughly 400 ms for those two hops.
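The fan-out step is, at each gateway node, one subscription per game topic copied into a bounded queue per SSE connection. A minimal in-process sketch with asyncio, assuming a hypothetical per-game `Topic` object standing in for the pub-sub bus; when a queue is full it drops that slow spectator (who can reconnect with Last-Event-ID) rather than stalling the game, which is one reasonable policy among several.

```python
import asyncio


class Topic:
    """One game's events on one gateway node: a bounded queue per SSE connection."""

    def __init__(self, max_backlog: int = 100) -> None:
        self.max_backlog = max_backlog
        self.subscribers: set = set()

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=self.max_backlog)
        self.subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self.subscribers.discard(q)

    def publish(self, event: dict) -> int:
        """Copy the event to every subscriber without awaiting; returns the delivery count."""
        delivered = 0
        for q in list(self.subscribers):
            try:
                q.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                # Slow consumer: cut it loose instead of blocking the publisher.
                self.unsubscribe(q)
        return delivered


async def demo() -> None:
    topic = Topic(max_backlog=2)
    fast, slow = topic.subscribe(), topic.subscribe()
    for ply in (1, 2, 3):
        topic.publish({"ply": ply})
        await fast.get()  # the fast spectator keeps up; the slow one never reads
    print(len(topic.subscribers))  # 1


asyncio.run(demo())
```

Because `publish` never awaits, one stalled connection cannot add latency for the other spectators of the same game.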
Trade-offs and extensions
| Decision | Why | Cost if requirements change |
|---|---|---|
| Server-authoritative move validation | Anti-cheat baseline | Higher CPU per request |
| FEN as canonical state | Standard; bounded size | Custom variants need an extended encoding |
| Single atomic write (FEN + Clock + Move) | Avoids partial-update bugs | Slightly heavier write path |
| SSE for spectator stream | Browser-native; HTTP-only | One-way; chat needs a separate channel |
| Lichess-style 100 ms clock | Tight enough for blitz | Bullet (1+0) is tense at this precision |
| Resource-action verbs for draw | Multi-step protocol | Less REST-pure |
| Move history endpoint separate from stream | Stream is push-only | Two endpoints for “what happened” |
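The single-atomic-write trade-off in the table above is the heart of the submit-move path. A minimal sketch using Python's built-in sqlite3 as a stand-in for the real game store, assuming a simplified schema and an injected `validate` callable in place of the chess engine (both hypothetical); the Clock update would join the same transaction and is left out for brevity. `BEGIN IMMEDIATE` plays the role of the row lock by taking SQLite's write lock before the game row is read (coarser than a row lock, since SQLite locks the whole database).

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical validator: returns the FEN after the move, or None if illegal.
Validator = Callable[[str, str], Optional[str]]

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

SCHEMA = """
CREATE TABLE game (id TEXT PRIMARY KEY, fen TEXT NOT NULL,
                   ply_count INTEGER NOT NULL, status TEXT NOT NULL);
CREATE TABLE move (game_id TEXT NOT NULL, ply INTEGER NOT NULL, uci TEXT NOT NULL,
                   fen_after TEXT NOT NULL, PRIMARY KEY (game_id, ply));
"""


class MoveRejected(Exception):
    """Not your turn, game over, or illegal move (409 in the OpenAPI excerpt)."""


def submit_move(db: sqlite3.Connection, game_id: str, side: str, uci: str,
                validate: Validator) -> int:
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading the game row
    try:
        row = db.execute("SELECT fen, ply_count, status FROM game WHERE id = ?",
                         (game_id,)).fetchone()
        if row is None:
            raise LookupError(game_id)
        fen, ply_count, status = row
        if status != "InProgress":
            raise MoveRejected("game is not in progress")
        if side != ("white" if fen.split()[1] == "w" else "black"):
            raise MoveRejected("not your turn")
        fen_after = validate(fen, uci)
        if fen_after is None:
            raise MoveRejected("illegal move")
        ply = ply_count + 1
        # Both writes commit together or not at all.
        db.execute("INSERT INTO move VALUES (?, ?, ?, ?)", (game_id, ply, uci, fen_after))
        db.execute("UPDATE game SET fen = ?, ply_count = ? WHERE id = ?",
                   (fen_after, ply, game_id))
        db.execute("COMMIT")
        return ply
    except Exception:
        db.execute("ROLLBACK")
        raise


def fake_validate(fen: str, uci: str) -> Optional[str]:
    # Stand-in for a real chess engine: it only knows 1. e4 from the start position.
    if fen == START_FEN and uci == "e2e4":
        return "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
    return None


db = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
db.executescript(SCHEMA)
db.execute("INSERT INTO game VALUES ('g1', ?, 0, 'InProgress')", (START_FEN,))
print(submit_move(db, "g1", "white", "e2e4", fake_validate))  # 1
```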
A clean contrast on transport for the spectator stream:
SSE (chosen)
- One-way (server → client)
- HTTP-only; reverse proxy friendly
- Auto-reconnect with Last-Event-ID
- No native binary frames
- Browser EventSource built-in
WebSocket (alternative)
- Bidirectional
- Binary frames cheap (encode FEN as bytes)
- More complex auth (no per-request headers)
- Needs sticky session
- Worth it only if chat lives on the same channel
Likely follow-up extensions:
- Anti-cheat hook. Every move triggers an async engine-correlation check; suspicious patterns are flagged. The API doesn’t change shape — anti-cheat is a downstream consumer of the pub-sub stream.
- Takebacks. Player requests a takeback; opponent accepts. A takeback:offer / takeback:accept pair mirrors the draw protocol and rolls the FEN back to the previous ply.
- Premoves. Client sends a conditional move “if opponent plays X, submit Y”. Server validates after opponent’s move lands.
- Variants (Chess960, antichess, atomic). The validator becomes pluggable; start_fen and rules_variant go in createGame.
- Match-finding. Out of scope in v1; usually a separate queue service that creates the game when a pair is matched.
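The premove extension reduces to a small pending-move store that the server consults right after an opponent's move commits. A minimal sketch, assuming at most one premove per player and a `submit` callable that runs the normal validated write path (both hypothetical); a premove whose condition is not met, or that turns out to be illegal, is discarded and never half-applied.

```python
from typing import Callable, Dict, Optional, Tuple


class Premoves:
    """Pending conditional moves per side: 'if the opponent plays X, submit Y'."""

    def __init__(self, submit: Callable[[str, str], bool]) -> None:
        # submit(side, uci) is assumed to run the normal validated write path
        # and return True if the move was accepted.
        self.submit = submit
        self.pending: Dict[str, Tuple[str, str]] = {}  # side -> (if_opponent_plays, then_play)

    def register(self, side: str, if_opponent_plays: str, then_play: str) -> None:
        self.pending[side] = (if_opponent_plays, then_play)  # replaces any earlier premove

    def on_opponent_move(self, side: str, opponent_uci: str) -> Optional[str]:
        """Call after the opponent's move is durable; returns the premove played, if any."""
        entry = self.pending.pop(side, None)  # a premove is consumed either way
        if entry is None or entry[0] != opponent_uci:
            return None
        then_play = entry[1]
        # Validated exactly like a normal move; an illegal premove is simply dropped.
        return then_play if self.submit(side, then_play) else None


played = []


def fake_submit(side: str, uci: str) -> bool:
    # Hypothetical stand-in for the submit-move path; accepts everything.
    played.append((side, uci))
    return True


pm = Premoves(fake_submit)
pm.register("black", if_opponent_plays="e2e4", then_play="e7e5")
print(pm.on_opponent_move("black", "e2e4"))  # e7e5
```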
Mock interview follow-ups
- “What stops a player from cheating with an engine?” The API itself can’t; it’s enforced by the anti-cheat consumer of the move stream (timing analysis + engine-correlation). The API’s only job is to record what was played; cheating detection is downstream.
- “How do you handle clock drift between client and server?” The server’s clock is canonical. The client’s client_clock_ms is advisory only: it is used to render the UI, and the server’s clock decides timeouts. NTP-synced server clocks suffice.
- “What happens if the client disconnects mid-game?” Game state is durable. The client reconnects, calls GET /games/{id}, and resumes. If the player’s clock expires while disconnected, they lose on time, the same rule as over-the-board play.
- “Why SSE and not WebSocket?” SSE is one-way (server → spectators), HTTP-only (works through proxies), and has built-in reconnect-from-last-event semantics. WebSocket is overkill for a one-way fan-out stream.
- “How do you scale spectator fan-out?” Each game’s events publish to a topic; subscribers fan out at the edge. A viral game with 50k spectators is 50k SSE connections across N gateway nodes; handle it with sticky routing and a slim per-connection memory budget.
- “What about draws by threefold repetition?” The validator tracks position hashes per ply; if the same position (same piece placement, side to move, castling rights, and en passant rights) occurs three times, result_reason: threefold is set and the game terminates. The fifty-move rule is similar: it reads halfmove_clock from the FEN, which reaches 100 after fifty moves by each side with no capture or pawn move.
- “How would you support takebacks without breaking move integrity?” Append a Move with type=takeback rather than mutating prior moves. Update Game.fen and ply_count to point at the new tip; the history remains an audit trail.
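The append-only takeback answer can be made concrete: the log only grows, and the current position is derived from it. A minimal sketch, assuming each Move record carries a type field ("move" or "takeback"), that a takeback record stores the FEN it restores, and that games start from the standard position; `MoveRecord`, `live_line`, `take_back` and `tip` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


@dataclass(frozen=True)
class MoveRecord:
    seq: int            # position in the append-only log (never reused)
    type: str           # "move" or "takeback"
    uci: Optional[str]  # the move played; None for a takeback
    fen_after: str      # position once this record is applied


def live_line(log: List[MoveRecord]) -> List[MoveRecord]:
    """Replay the log: each takeback cancels the most recent live move."""
    line: List[MoveRecord] = []
    for rec in log:
        if rec.type == "move":
            line.append(rec)
        elif line:
            line.pop()
    return line


def take_back(log: List[MoveRecord]) -> MoveRecord:
    """Append (never rewrite) a takeback that restores the previous position."""
    line = live_line(log)
    if not line:
        raise ValueError("nothing to take back")
    restored = line[-2].fen_after if len(line) >= 2 else START_FEN
    rec = MoveRecord(seq=len(log) + 1, type="takeback", uci=None, fen_after=restored)
    log.append(rec)
    return rec


def tip(log: List[MoveRecord]) -> Tuple[str, int]:
    """What Game.fen and Game.ply_count should point at."""
    fen = log[-1].fen_after if log else START_FEN
    return fen, len(live_line(log))


log: List[MoveRecord] = [
    MoveRecord(1, "move", "e2e4", "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1")
]
take_back(log)
print(tip(log) == (START_FEN, 0), len(log))  # True 2
```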
Related
- Design a Search Service API — game-search (by player, opening, time control) is a downstream consumer.
- Design a Gaming API — the broader turn-based / real-time gaming context.
- Design a Pub-Sub Service API — the event bus that fans game-state out to spectators and anti-cheat.
- Design the LeetCode API — sibling write-heavy stateful API (submissions vs moves).
- The API-Design Walk-through — the seven-step recipe this writeup followed.