Design a Chess API

Game lifecycle, move validation, time control, spectator stream. The cleanest turn-based-game API in the catalogue.

System Intermediate
15 min read
chess api-design turn-based sse

Context#

A chess API is the cleanest turn-based-game design problem in the interview catalogue. The rules are well-defined and centuries old, the state space is finite and serialisable to a single string (FEN), and the move language is standardised (UCI / SAN). Everything that’s interesting about real-time multiplayer infrastructure — authoritative server, anti-cheat, clock synchronisation, spectator fan-out — shows up at small enough scope to actually finish in 60 minutes.

Lichess and Chess.com are the production reference points. Both run authoritative servers, both stream board updates to spectators, both implement Glicko or Elo rating, both detect cheating server-side. Their APIs are public and battle-tested at millions of games per day.

The interviewer’s hidden objectives:

  • Do you make the server authoritative for move validation, not the client?
  • Can you model the game-state machine completely — including draws, resignations, timeouts?
  • Do you encode the board state as FEN (or equivalent) rather than reinvent it?
  • Do you handle time control as a server-side clock, not a client wall-clock?
  • Do you separate the player API (writes moves) from the spectator API (read-only stream)?

Two design pitfalls to avoid: trusting the client (cheating becomes trivial), and using request-response for the spectator stream (latency and connection overhead explode).

Requirements (functional and non-functional)#

Functional — in scope:

  • Create a new game: choose colour, time control, optional opponent (or match-find).
  • Submit a move (UCI notation: e2e4, e7e8q for promotion).
  • Resign or offer / accept a draw.
  • Query current game state (board, clocks, last move).
  • Stream board state to spectators in real time.
  • Time control: bullet (1+0), blitz (5+0), rapid (10+0), classical (30+0). Each game has a clock per player.
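
As a rough sketch of the syntax check a server might run on the UCI strings above before any legality check, the snippet below uses the same pattern that appears in the schema excerpt later in this document. The names `UciMove` and `parse_uci` are illustrative assumptions, not part of any library, and the check covers syntax only.

```python
import re
from typing import NamedTuple, Optional

# From-square, to-square, optional promotion piece (same shape as "e2e4" / "e7e8q").
UCI_RE = re.compile(r"([a-h][1-8])([a-h][1-8])([qrbn])?")


class UciMove(NamedTuple):
    src: str
    dst: str
    promotion: Optional[str]


def parse_uci(text: str) -> UciMove:
    """Check syntax only; legality on the current position is a separate step."""
    m = UCI_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a UCI move: {text!r}")
    return UciMove(m.group(1), m.group(2), m.group(3))
```

A string that passes this check can still be an illegal move on the board, so the result feeds the validator rather than replacing it.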

Functional — out of scope:

  • Match-finding queue — separate service.
  • Player rating updates — handled by an async post-game pipeline.
  • Tournaments — composition of game APIs.
  • Analysis engine / hints — explicitly post-game, separate API.
  • Chat — separate sub-channel; not part of the game contract.

Non-functional:

  • Latency: submit move <= 100 ms p95; spectator delivery <= 500 ms p95.
  • Throughput: 10k concurrent games; 50k concurrent spectators.
  • Availability: 99.95%. Brief outages must preserve in-flight game state.
  • Move validation: 100% server-side. Clients are untrusted.
  • Clock precision: 100 ms.
  • Persistence: every move durable before the response returns.

Use case diagram#

    ┌──────────────┐            ┌──────────────┐
    │    Player    │            │  Spectator   │
    └──────┬───────┘            └──────┬───────┘
           │                           │
    ┌──────┼─────────────┐             │
    ▼      ▼             ▼             ▼
[create] [move]    [resign/draw] [stream board]
    │      │             │             │
    └──────┴──────┬──────┘             │
                  ▼                    ▼
  ┌──────────────────────────────────────────┐
  │                Chess API                 │
  └────────────────────┬─────────────────────┘
          ┌────────────┼─────────────┐
          ▼            ▼             ▼
      [GameDB]    [ClockSvc]   [Pub-Sub bus]
                                (fan-out to
                                 spectators)

Two actors. Players write moves; spectators only read.

Class diagram#

┌───────────────────────┐
│ ChessService          │
├───────────────────────┤
│ createGame(req): Game │
│ submitMove(id, mv)    │
│ resign(id)            │
│ offerDraw(id)         │
│ getGame(id): GameState│
│ stream(id): SSE       │
└──────────┬────────────┘
           │ owns
┌───────────────────────┐         ┌─────────────────────┐
│ Game                  │ 1 ─── * │ Move                │
├───────────────────────┤         ├─────────────────────┤
│ id                    │         │ ply                 │
│ white_player_id       │         │ uci ("e2e4")        │
│ black_player_id       │         │ san ("e4")          │
│ status                │         │ fen_after           │
│ result                │         │ time_left_ms        │
│ time_control          │         │ submitted_at        │
│ fen (current)         │         └─────────────────────┘
│ ply_count             │
│ white_clock_ms        │         ┌─────────────────────┐
│ black_clock_ms        │         │ Clock               │
│ last_move_at          │         ├─────────────────────┤
│ result_reason         │         │ game_id             │
└───────────────────────┘         │ white_ms, black_ms  │
                                  │ active_side         │
                                  │ last_tick_at        │
                                  └─────────────────────┘

Game carries the authoritative state. Move is append-only — once submitted and validated, never mutated. Clock is a separate row updated atomically with each move so the server has a definitive clock-state, not a derived one.

The time_control field is a structured tuple {base_seconds, increment_seconds}: blitz 5+0 is {300, 0}, blitz 3+2 is {180, 2}.
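
To make the tuple concrete, here is a minimal sketch of how the shorthand labels could map onto `{base_seconds, increment_seconds}` and how a mover's clock might be updated. The class and method names are hypothetical, and adding the increment after each completed move (Fischer-style) is an assumption the text does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeControl:
    base_seconds: int
    increment_seconds: int

    @classmethod
    def parse(cls, label: str) -> "TimeControl":
        """Parse 'minutes+increment' shorthand, e.g. '3+2' -> (180, 2)."""
        minutes, increment = label.split("+")
        return cls(int(minutes) * 60, int(increment))

    def initial_clock_ms(self) -> int:
        return self.base_seconds * 1000

    def after_move(self, remaining_ms: int, elapsed_ms: int) -> int:
        """Mover's clock after a move: spend the elapsed time, then add the increment.

        Returns 0 if the time ran out before the move arrived (no increment is granted).
        """
        left = remaining_ms - elapsed_ms
        if left <= 0:
            return 0
        return left + self.increment_seconds * 1000
```

For example, `TimeControl.parse("3+2")` yields `{180, 2}`, and a move that took 4 seconds on a full clock leaves 178 000 ms after the 2-second increment.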

Sequence diagram (key flows)#

The submit move flow — the critical write path:

Player          ChessAPI    Validator        GameDB        Pub-Sub
│ POST /games/{id}/moves        │               │             │
│ body: {uci: "e2e4"}           │               │             │
│──────────────────►│           │               │             │
│                   │ lock game │               │             │
│                   │ row       │               │             │
│                   │──────────────────────────►│             │
│                   │ verify it's player's turn │             │
│                   │ verify move legality      │             │
│                   │   on current FEN          │             │
│                   │──────────►│               │             │
│                   │ valid + new FEN           │             │
│                   │◄──────────│               │             │
│                   │ compute new clock         │             │
│                   │ persist Move + Game fen + │             │
│                   │   Clock atomically        │             │
│                   │──────────────────────────►│             │
│                   │ ok                        │             │
│                   │◄──────────────────────────│             │
│                   │ publish event             │             │
│                   │ {game_id, ply, fen, ...}  │             │
│                   │────────────────────────────────────────►│
│ 200 + new state   │           │               │             │
│◄──────────────────│           │               │             │
                                                    [spectators fan-out]
                                                    [opponent SSE push]
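
The write path above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a production handler: a `threading.Lock` stands in for the game-row lock, plain attribute writes stand in for the atomic commit, and `validate` and `publish` are hypothetical callables supplied by the caller (`validate` returns the new FEN, or `None` for an illegal move).

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class Game:
    id: str
    fen: str
    white_player_id: str
    black_player_id: str
    white_clock_ms: int
    black_clock_ms: int
    last_move_at: float                            # monotonic seconds
    moves: list = field(default_factory=list)      # append-only history
    lock: threading.Lock = field(default_factory=threading.Lock)


class MoveRejected(Exception):
    """Would map to the 409 response in the contract."""


def submit_move(game, player_id, uci, validate, publish, now=time.monotonic):
    with game.lock:                                # stands in for the row lock
        side = game.fen.split()[1]                 # FEN field 2: "w" or "b"
        expected = game.white_player_id if side == "w" else game.black_player_id
        if player_id != expected:
            raise MoveRejected("not your turn")
        new_fen = validate(game.fen, uci)          # server-side legality check
        if new_fen is None:
            raise MoveRejected("illegal move")
        t = now()
        elapsed_ms = round((t - game.last_move_at) * 1000)
        attr = "white_clock_ms" if side == "w" else "black_clock_ms"
        remaining = max(0, getattr(game, attr) - elapsed_ms)
        # "Commit": in a real store these writes would share one transaction.
        setattr(game, attr, remaining)
        game.fen, game.last_move_at = new_fen, t
        game.moves.append({"ply": len(game.moves) + 1, "uci": uci,
                           "fen_after": new_fen, "time_left_ms": remaining})
        event = {"game_id": game.id, **game.moves[-1]}
    publish(event)                                 # after commit, outside the lock
    return event
```

Publishing only after the commit succeeds means spectators never see a move that was not persisted. The sketch omits increments and leaves expired clocks to the timeout flow.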

The spectator stream is a long-lived SSE connection — opponent and spectators receive the same move.played event:

Spectator       ChessAPI       Pub-Sub
│ GET /games/{id}/stream (Accept: text/event-stream)
│──────────────────►│             │
│                   │ subscribe   │
│                   │────────────►│
│ initial state event             │
│◄──────────────────│             │
│                   │             │
│ ... long-lived connection ...   │
│                   │             │
│                   │◄────────────│ move.played event
│ event: move       │             │
│ data: {ply,fen}   │             │
│◄──────────────────│             │
│                   │             │
│ event: clock      │◄────────────│ clock.tick (every 1 s)
│◄──────────────────│             │
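
The subscribe-then-push pattern in this flow can be illustrated with an in-memory stand-in for the pub-sub bus. `GameTopics` and its methods are hypothetical names, and dropping a subscriber whose queue is full is one possible policy among several, chosen here so a slow spectator cannot stall delivery to the others.

```python
import asyncio
from collections import defaultdict


class GameTopics:
    """In-memory stand-in for the pub-sub bus: one topic per game id."""

    def __init__(self, max_backlog: int = 64):
        self._subs = defaultdict(set)
        self._max_backlog = max_backlog

    def subscribe(self, game_id: str) -> asyncio.Queue:
        q = asyncio.Queue(maxsize=self._max_backlog)
        self._subs[game_id].add(q)
        return q

    def unsubscribe(self, game_id: str, q: asyncio.Queue) -> None:
        self._subs[game_id].discard(q)

    def publish(self, game_id: str, event: dict) -> int:
        """Deliver to every subscriber without blocking; return how many received it."""
        delivered = 0
        for q in list(self._subs[game_id]):
            try:
                q.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                # Slow consumer: drop it; the client can reconnect and catch up.
                self._subs[game_id].discard(q)
        return delivered


async def demo():
    topics = GameTopics()
    a, b = topics.subscribe("g1"), topics.subscribe("g1")
    n = topics.publish("g1", {"event": "move", "ply": 1})
    return n, await a.get(), await b.get()
```

Each SSE connection handler would own one queue and write whatever it receives to its response stream; `asyncio.run(demo())` shows two subscribers receiving the same event.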

The timeout flow runs out-of-band — a clock-service tick detects expiry:

ClockTicker (1 Hz) ChessAPI        GameDB       Pub-Sub
│ scan active games │                 │            │
│ where active clock ≤ 0              │            │
│──────────────────►│                 │            │
│                   │ atomic update   │            │
│                   │ status=TimedOut │            │
│                   │ result=opponent │            │
│                   │   wins on time  │            │
│                   │────────────────►│            │
│                   │ publish event   │            │
│                   │─────────────────────────────►│
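
Because the stored clock values only change when a move is committed, the scan has to subtract the time elapsed since the last write. The sketch below shows that arithmetic using the Clock fields from the class diagram; the function names and the `"white"` / `"black"` encoding of `active_side` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClockRow:
    game_id: str
    white_ms: int
    black_ms: int
    active_side: str      # "white" or "black": whose clock is running
    last_tick_at: float   # server time in seconds when the stored values were written


def remaining_ms(clock: ClockRow, now: float) -> int:
    """Stored time for the side to move, minus time elapsed since the last write."""
    stored = clock.white_ms if clock.active_side == "white" else clock.black_ms
    return stored - round((now - clock.last_tick_at) * 1000)


def find_timeouts(clocks, now: float):
    """Return (game_id, winner) for every game whose active clock has run out."""
    expired = []
    for c in clocks:
        if remaining_ms(c, now) <= 0:
            winner = "black" if c.active_side == "white" else "white"
            expired.append((c.game_id, winner))
    return expired
```

The ticker would call `find_timeouts` once per second and hand each hit to the atomic status update shown in the diagram.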

Activity diagram (for non-trivial state)#

Game.status is the canonical state machine — defending this picture is most of the interview:

                [POST /games]
                 ┌──────────┐
                 │ Pending  │ (waiting for opponent)
                 └────┬─────┘
                      │ opponent joins
                 ┌────────────┐
                 │ InProgress │◄────┐
                 └────┬───────┘     │ valid move
                      │             │ (loop on self)
                      ├─────────────┘
      ┌───────────────┼────────────┬─────────────┬────────────┐
      │ checkmate     │ stalemate  │ resign      │ draw       │ timeout
      ▼               ▼            ▼             ▼            ▼
┌───────────┐   ┌──────────┐ ┌──────────┐  ┌──────────┐ ┌──────────┐
│ Checkmate │   │Stalemate │ │ Resigned │  │   Draw   │ │ TimedOut │
└───────────┘   └──────────┘ └──────────┘  └──────────┘ └──────────┘

Five terminal states, all reachable from InProgress. The validator computes checkmate / stalemate after each move; resignation and draw are explicit player actions; timeout fires from the clock service.

Draw is the most subtle state because it covers several sub-cases: agreement (both sides agree), threefold repetition (auto), the fifty-move rule (auto), and insufficient material (auto). Stalemate is also a drawn result, but it has its own terminal state. We expose result_reason on Game to tell the sub-cases apart.
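
One way to make the state machine enforceable is a transition table checked on every status change. The sketch below mirrors the seven statuses in the diagram. Apart from `threefold`, the `result_reason` strings are hypothetical labels, and `transition` is an illustrative helper rather than part of the API.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "Pending"
    IN_PROGRESS = "InProgress"
    CHECKMATE = "Checkmate"
    STALEMATE = "Stalemate"
    RESIGNED = "Resigned"
    DRAW = "Draw"
    TIMED_OUT = "TimedOut"


TERMINAL = {Status.CHECKMATE, Status.STALEMATE, Status.RESIGNED,
            Status.DRAW, Status.TIMED_OUT}

# Terminal states have no outgoing edges, so they are absent from the table.
ALLOWED = {
    Status.PENDING: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.IN_PROGRESS} | TERMINAL,
}

# Hypothetical labels for the draw sub-cases described above.
DRAW_REASONS = {"agreement", "threefold", "fifty_move", "insufficient_material"}


def transition(current: Status, target: Status, result_reason=None) -> Status:
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if target is Status.DRAW and result_reason not in DRAW_REASONS:
        raise ValueError("a Draw needs a recognised result_reason")
    return target
```

Keeping the table as data makes it easy to assert in tests that no terminal state can ever be left.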

API implementation#

Endpoint catalogue#

| Method | Path | Purpose |
|--------|------|---------|
| POST | /v1/games | Create new game (or match-find) |
| GET | /v1/games/{id} | Get current game state |
| POST | /v1/games/{id}/moves | Submit move (UCI) |
| POST | /v1/games/{id}/resign | Resign |
| POST | /v1/games/{id}/draw:offer | Offer draw |
| POST | /v1/games/{id}/draw:accept | Accept open draw offer |
| POST | /v1/games/{id}/draw:decline | Decline open draw offer |
| GET | /v1/games/{id}/stream | SSE stream of board + clock events |
| GET | /v1/games/{id}/moves | Move history (PGN-equivalent) |

The split between moves (write, single move) and stream (read-only, all events) keeps the contract clean. :offer, :accept, :decline are resource-actions because draw is a multi-step protocol that doesn’t fit pure CRUD.
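
The multi-step nature of the draw protocol is easier to see as a tiny state holder. This is a sketch under assumptions: one open offer per game, only the opponent may answer it, and an unanswered offer lapses when the next move is played. None of those rules are stated in the endpoint table, and `DrawOffer` is a hypothetical name.

```python
class DrawOffer:
    """Tracks at most one open draw offer for a game."""

    def __init__(self):
        self.offered_by = None            # "white", "black", or None

    def offer(self, side: str) -> None:
        if self.offered_by is not None:
            raise ValueError("an offer is already open")
        self.offered_by = side

    def _check_answer(self, side: str) -> None:
        if self.offered_by is None:
            raise ValueError("no open offer")
        if self.offered_by == side:
            raise ValueError("cannot answer your own offer")

    def accept(self, side: str) -> dict:
        self._check_answer(side)
        self.offered_by = None
        return {"status": "Draw", "result": "1/2-1/2", "result_reason": "agreement"}

    def decline(self, side: str) -> None:
        self._check_answer(side)
        self.offered_by = None

    def on_move(self) -> None:
        """Assumption: an unanswered offer lapses once a move is played."""
        self.offered_by = None
```

Under this reading, draw:offer, draw:accept and draw:decline map one-to-one onto `offer`, `accept` and `decline`.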

OpenAPI schema (excerpt)#

OpenAPI 3.1 — Chess API (create + submit move + state)
paths:
  /v1/games:
    post:
      operationId: createGame
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [time_control]
              properties:
                color:
                  type: string
                  enum: [white, black, random]
                  default: random
                time_control:
                  type: object
                  required: [base_seconds, increment_seconds]
                  properties:
                    base_seconds:
                      type: integer
                      minimum: 30
                      maximum: 10800
                    increment_seconds:
                      type: integer
                      minimum: 0
                      maximum: 60
                # OpenAPI 3.1 expresses nullability as a type union, not `nullable: true`.
                opponent_id:
                  type: [string, "null"]
                start_fen:
                  type: [string, "null"]
                  description: Custom start position (Chess960 etc.)
      responses:
        '201':
          description: Game created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Game'
  /v1/games/{id}/moves:
    post:
      operationId: submitMove
      parameters:
        - { name: id, in: path, required: true, schema: { type: string } }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [uci]
              properties:
                uci:
                  type: string
                  pattern: '^[a-h][1-8][a-h][1-8][qrbn]?$'
                client_clock_ms:
                  type: integer
                  description: Client-reported clock; server authoritative
      responses:
        '200':
          description: Accepted
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Game'
        '409':
          description: Not your turn, or illegal move
  /v1/games/{id}/stream:
    get:
      operationId: streamGame
      parameters:
        - { name: id, in: path, required: true, schema: { type: string } }
      responses:
        '200':
          description: Server-sent event stream
          content:
            text/event-stream:
              schema: { type: string }
components:
  schemas:
    Game:
      type: object
      required: [id, status, fen, ply_count, white_clock_ms, black_clock_ms]
      properties:
        id: { type: string }
        white_player_id: { type: string }
        black_player_id: { type: string }
        status:
          type: string
          enum: [Pending, InProgress, Checkmate, Stalemate, Resigned, Draw, TimedOut]
        result:
          type: string
          enum: ["1-0", "0-1", "1/2-1/2", "*"]
        result_reason:
          type: [string, "null"]
        fen: { type: string }
        ply_count: { type: integer }
        white_clock_ms: { type: integer }
        black_clock_ms: { type: integer }
        time_control:
          type: object
          properties:
            base_seconds: { type: integer }
            increment_seconds: { type: integer }
        last_move_at: { type: string, format: date-time }

SSE stream — raw wire#

The spectator endpoint emits four event types (state, move, clock, and game_over):

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

event: state
data: {"fen":"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1","ply":0,"white_clock_ms":300000,"black_clock_ms":300000}

event: move
data: {"ply":1,"uci":"e2e4","san":"e4","fen_after":"rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1","white_clock_ms":299870}

event: clock
data: {"white_clock_ms":299870,"black_clock_ms":298120}

event: game_over
data: {"status":"Checkmate","result":"1-0","result_reason":"checkmate"}

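Clients consume the stream with the standard EventSource in browsers, or with an SSE client library in Node or Python. Reconnecting with the Last-Event-ID header is supported, which requires each event to carry an id: field (not shown in the excerpt above): on reconnect, the server replays every event after the given ID.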
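
For clients without a ready-made SSE library, the framing is simple enough to parse by hand. The sketch below follows the standard SSE rules (a blank line dispatches an event, lines starting with a colon are comments, one leading space after the colon is stripped); `parse_sse` is an illustrative name, and the `id:` lines in the usage example are an assumption about how the server would number events.

```python
def parse_sse(lines):
    """Yield (event, data, last_id) tuples from an iterable of decoded text lines."""
    event, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                    # blank line: dispatch the buffered event
            if data:
                yield event, "\n".join(data), last_id
            event, data = "message", []   # the last id persists across events
            continue
        if line.startswith(":"):          # comment, often used as a keep-alive
            continue
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "event":
            event = value
        elif name == "data":
            data.append(value)
        elif name == "id":
            last_id = value


wire = [
    "id: 1",
    "event: move",
    'data: {"ply":1,"uci":"e2e4"}',
    "",
    ": keep-alive",
    "event: clock",
    'data: {"white_clock_ms":299870}',
    "",
]
events = list(parse_sse(wire))
```

A client would remember the most recent `last_id` and send it back in the Last-Event-ID request header when it reconnects.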

Client sample#

The create-game-then-submit-move flow.

Chess client — Python
import requests

API = "https://api.example.com"
TOKEN = "Bearer eyJhbGciOi..."


def create_game(time_control_seconds=300, increment=0):
    """Create a game as White with the given base time and increment (seconds)."""
    r = requests.post(
        f"{API}/v1/games",
        json={
            "color": "white",
            "time_control": {
                "base_seconds": time_control_seconds,
                "increment_seconds": increment,
            },
        },
        headers={"Authorization": TOKEN},
        timeout=2,
    )
    r.raise_for_status()
    return r.json()


def submit_move(game_id, uci):
    """Submit one move in UCI notation and return the updated game state."""
    r = requests.post(
        f"{API}/v1/games/{game_id}/moves",
        json={"uci": uci},
        headers={"Authorization": TOKEN},
        timeout=2,
    )
    if r.status_code == 409:
        # 409 covers both "not your turn" and "illegal move".
        raise RuntimeError(f"move rejected: {r.json().get('detail')}")
    r.raise_for_status()
    return r.json()


# Stream board updates as a spectator
def watch(game_id):
    with requests.get(
        f"{API}/v1/games/{game_id}/stream",
        headers={"Authorization": TOKEN, "Accept": "text/event-stream"},
        stream=True,
        timeout=(2, None),  # bound the connect; the stream itself is long-lived
    ) as s:
        s.raise_for_status()
        for line in s.iter_lines():  # yields bytes; blank lines separate events
            if line.startswith(b"data: "):
                print(line[6:].decode())


g = create_game(300, 0)
submit_move(g["id"], "e2e4")

Latency budget#

| Phase | Budget | Notes |
|-------|--------|-------|
| Auth + rate limit | 5 ms | Cached JWT |
| Game row lock | 5 ms | Single primary key |
| Move validation | 10 ms | Pure CPU; chess engine in-process |
| Atomic commit (FEN + Clock + Move) | 30 ms p95 | Quorum write |
| Pub-sub publish (async) | included | Off the response path |
| Serialize + transport | 15 ms | JSON, ~2 KB |
| Margin | 35 ms | Slow shard / GC pause |
| Total | ~100 ms | At budget |

The spectator-delivery budget (<= 500 ms) is dominated by the pub-sub fan-out and the SSE write — both well within the headroom.
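
As a quick sanity check on the arithmetic in the table, the phases can be summed against the 100 ms target. The dictionary keys are shorthand invented for this sketch. Adding per-phase figures is a rough planning device, since percentiles do not add exactly.

```python
# Per-phase budget in milliseconds, copied from the table above.
# The async pub-sub publish is off the response path, so it contributes nothing here.
BUDGET_MS = {
    "auth_rate_limit": 5,
    "game_row_lock": 5,
    "move_validation": 10,
    "atomic_commit_p95": 30,
    "serialize_transport": 15,
    "margin": 35,
}
TARGET_MS = 100


def check_budget(budget: dict, target: int):
    """Return (total, slack); negative slack means the budget is overspent."""
    total = sum(budget.values())
    return total, target - total


total, slack = check_budget(BUDGET_MS, TARGET_MS)
```

Here `total` equals the 100 ms target and `slack` is zero, which is what "at budget" means: any phase that grows has to take its extra time from the margin row.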

Trade-offs and extensions#

| Decision | Why | Cost if requirements change |
|----------|-----|-----------------------------|
| Server-authoritative move validation | Anti-cheat baseline | Higher CPU per request |
| FEN as canonical state | Standard; bounded size | Custom variants need an extended encoding |
| Single atomic write (FEN + Clock + Move) | Avoids partial-update bugs | Slightly heavier write path |
| SSE for spectator stream | Browser-native; HTTP-only | One-way; chat needs a separate channel |
| Lichess-style 100 ms clock | Tight enough for blitz | Bullet (1+0) is tense at this precision |
| Resource-action verbs for draw | Multi-step protocol | Less REST-pure |
| Move history endpoint separate from stream | Stream is push-only | Two endpoints for “what happened” |

A clean contrast on transport for the spectator stream:

SSE (chosen)

  • One-way (server → client)
  • HTTP-only; reverse proxy friendly
  • Auto-reconnect with Last-Event-Id
  • No native binary frames
  • Browser EventSource built-in

WebSocket (alternative)

  • Bidirectional
  • Binary frames cheap (encode FEN as bytes)
  • More complex auth (no per-request headers)
  • Needs sticky session
  • Worth it only if chat lives on the same channel

Likely follow-up extensions:

  • Anti-cheat hook. Every move triggers an async engine-correlation check; suspicious patterns are flagged. The API doesn’t change shape — anti-cheat is a downstream consumer of the pub-sub stream.
  • Takebacks. A player requests a takeback and the opponent accepts, using takeback:offer / takeback:accept actions that mirror the draw protocol. On acceptance, Game.fen reverts to the position at the previous ply.
  • Premoves. The client sends a conditional move: “if opponent plays X, submit Y”. The server validates it after the opponent’s move lands.
  • Variants (Chess960, antichess, atomic). The validator becomes pluggable; start_fen and rules_variant go in createGame.
  • Match-finding. Out of scope in v1; usually a separate queue service that creates the game when a pair is matched.

Mock interview follow-ups#

  • “What stops a player from cheating with an engine?” — The API itself can’t; it’s enforced by the anti-cheat consumer of the move stream (timing analysis + engine-correlation). The API’s only job is to record what was played; cheating detection is downstream.
  • “How do you handle clock drift between client and server?” — Server’s clock is canonical. Client’s client_clock_ms is advisory only — used to render UI; the server’s clock decides timeouts. NTP-synced server clocks suffice.
  • “What happens if the client disconnects mid-game?” — Game state is durable. Client reconnects, calls GET /games/{id}, resumes. If the player’s clock expires while disconnected, they lose on time — same rule as over-the-board play.
  • “Why SSE and not WebSocket?” — SSE is one-way (server → spectators), HTTP-only (works through proxies), and has built-in reconnect-from-last-event semantics. WebSocket is overkill for a one-way fan-out stream.
  • “How do you scale spectator fan-out?” — Each game’s events publish to a topic; subscribers fan out at the edge. A viral game with 50k spectators is 50k SSE connections across N gateway nodes — handle it with sticky routing and a slim per-connection memory budget.
  • “What about draws by threefold repetition?” — The validator tracks position hashes per ply; if the same position occurs three times with the same side to move, the same castling rights, and the same en passant rights, result_reason: threefold is set and the game terminates. The fifty-move rule works similarly, using the halfmove clock field in FEN: the game is drawn once that counter reaches 100 half-moves.
  • “How would you support takebacks without breaking move integrity?” — Append a Move with type=takeback rather than mutating prior moves. Update Game.fen and ply_count to point at the new tip; the history remains an audit trail.
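
The repetition and fifty-move answers above can be sketched directly on FEN strings. This is a simplified illustration: it keys positions on the first four FEN fields, which assumes the en passant field is only set when a capture is actually possible, and it uses the plain FEN prefix where a production validator would more likely use a position hash. The `fifty_move` label and the function names are hypothetical.

```python
from collections import Counter


def position_key(fen: str) -> str:
    """First four FEN fields: piece placement, side to move, castling, en passant."""
    return " ".join(fen.split()[:4])


def draw_reason(fen_history):
    """Return 'threefold', 'fifty_move', or None for the latest position in the history."""
    if not fen_history:
        return None
    latest = fen_history[-1]
    counts = Counter(position_key(f) for f in fen_history)
    if counts[position_key(latest)] >= 3:
        return "threefold"
    halfmove_clock = int(latest.split()[4])   # FEN field 5
    if halfmove_clock >= 100:                 # 100 half-moves = 50 full moves
        return "fifty_move"
    return None
```

Because Move rows already store fen_after, the history needed here is just that column read in ply order.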