Data Fetching Patterns — API Design

Summary

How a client gets data from an API has four independent design levers:

Eager vs lazy — does the server push the full graph on first call, or hand the client a shallow root and let it follow links?
Batch vs single — does one request return N items, or does the client fire N requests each returning one?
Paginated vs streamed — for large result sets, does the client ask for pages with cursors, or does the server keep a connection open and push as data arrives?
Cached vs uncached — is the response served from a cache (browser, CDN, gateway) or computed fresh every time?

These four levers are orthogonal. You can have a lazy + batched + paginated + cached API (a typical REST list endpoint that links to related resources instead of embedding them, served through a CDN). You can have an eager + single + streamed + uncached API (a real-time tail of a single resource, like a chat thread). The right combination depends on the access pattern: how often the data is read, how big each item is, how fresh it must be, and how predictable the client’s needs are.

Most APIs ship with a default combination and then discover, three years in, that some endpoints want the opposite. The senior move is to name the four levers up front and pick deliberately per endpoint.

Why it matters

Three reasons fetching patterns dominate API performance:

The wrong combination wastes round-trips or wastes payload. N+1 query problems on the server have a mirror image on the API: N+1 round-trips from the client. On a high-latency cellular network, a single eager call almost always beats a hundred lazy calls. A small paginated call beats a fat eager one when the user only looks at the first page.
Pagination is where most APIs ship a bug. Offset and page-number pagination both break when rows are inserted or deleted between requests. Cursor pagination stays stable under writes, yet most APIs don’t ship it on day one; they ship ?page=2 and regret it.
Streaming is the right tool more often than teams think. It fits long-running queries, large result sets, live data, and AI completion responses. A server-sent event stream usually beats polling when the client wants liveness; a gRPC server-stream beats N batched calls when the result is incrementally produced.

The senior signal in an interview: “Before I design any endpoint, I name the four levers: eager vs lazy, batch vs single, paginated vs streamed, cached vs uncached. Then I pick per endpoint, not per API.”

How it works

Eager vs lazy

Eager: the server returns the requested resource plus its commonly-needed related resources in one response. Saves round-trips. Costs payload size and server work even when the client doesn’t use the extras.

Lazy: the server returns only the requested resource; the client follows links (_links.author.href, HATEOAS-style) or foreign keys such as author_id to fetch related data in further requests. Smaller per-call payloads. More round-trips when the client wants the graph.

Lazy (two round-trips):

GET /posts/42
→ { "id": 42, "title": "...", "author_id": 7 }

GET /users/7
→ { "id": 7, "name": "..." }

Eager (one round-trip):

GET /posts/42?include=author
→ {
    "id": 42, "title": "...",
    "author": { "id": 7, "name": "..." }
  }

REST APIs often allow both via ?include= or ?embed= query parameters (JSON:API formalises ?include=). GraphQL leaves the choice to the client: whatever nested fields the query names come back in a single response. gRPC is whatever the proto says.

The right call depends on how often the client wants the related resource. If 95% of the calls to /posts/42 also need the author, embed it. If 5% do, leave it lazy.
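The ?include= switch described above can be sketched as a single handler that serves both shapes. This is a minimal illustration, not a reference implementation: the in-memory POSTS and USERS tables and the get_post function are hypothetical stand-ins for a real data layer.

```python
from typing import Any

# Hypothetical in-memory tables standing in for a database.
POSTS = {42: {"id": 42, "title": "Hello", "author_id": 7}}
USERS = {7: {"id": 7, "name": "Ada"}}


def get_post(post_id: int, include: str = "") -> dict[str, Any]:
    """Return a post; embed the related resources named in `include`."""
    post = dict(POSTS[post_id])  # copy so the stored row is not mutated
    requested = {name for name in include.split(",") if name}
    if "author" in requested:
        # Eager path: replace the foreign key with the embedded resource.
        author_id = post.pop("author_id")
        post["author"] = dict(USERS[author_id])
    # Lazy path: the caller gets author_id and fetches the user separately.
    return post
```

Calling get_post(42) returns the shallow form with author_id, while get_post(42, include="author") returns the embedded form, so the default stays lazy and the client opts into eager loading per call.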

Batch vs single

Single: each request asks for one item. RESTful, cacheable per URL, but expensive on round-trips for bulk reads.

GET /users/1
GET /users/2
GET /users/3

Batch: one request returns many items.

GET /users?ids=1,2,3
→ { "users": [ {...}, {...}, {...} ] }

Three batch-API design choices recur:

By IDs: GET /users?ids=1,2,3. Works for known IDs. Two things to decide: how to handle URL length limits on GET (use POST for large batches) and what partial failure looks like (200 with per-item errors vs. fail the whole batch).
By filter: GET /users?status=active&limit=20. The list endpoint, paginated. Most APIs ship this.
Multi-call envelope: POST /batch with an array of sub-requests, each with its own URL, verb, and body. Facebook’s Graph API and Microsoft Graph’s JSON batching work this way; the failure semantics are complex enough that many teams regret it.

The N+1 round-trip problem is the canonical reason to batch. A timeline view that calls /posts once and then /users/<author_id> for each of the N posts returned makes 1 + N round-trips. Batching the user fetch into a single /users?ids=... call makes it 2.
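The round-trip arithmetic can be checked with a small counting sketch. CountingClient and both timeline functions are hypothetical; the client only counts calls instead of doing network I/O.

```python
class CountingClient:
    """Hypothetical API client that counts round-trips instead of doing I/O."""

    def __init__(self, posts, users):
        self.posts, self.users = posts, users
        self.round_trips = 0

    def list_posts(self):            # stands in for GET /posts
        self.round_trips += 1
        return list(self.posts)

    def get_user(self, user_id):     # stands in for GET /users/<id>
        self.round_trips += 1
        return self.users[user_id]

    def get_users(self, user_ids):   # stands in for GET /users?ids=...
        self.round_trips += 1
        return {uid: self.users[uid] for uid in user_ids}


def timeline_naive(client):
    """1 call for the posts, then 1 call per post for its author."""
    posts = client.list_posts()
    return [(p["title"], client.get_user(p["author_id"])["name"]) for p in posts]


def timeline_batched(client):
    """1 call for the posts, then 1 call for all distinct authors."""
    posts = client.list_posts()
    authors = client.get_users(sorted({p["author_id"] for p in posts}))
    return [(p["title"], authors[p["author_id"]]["name"]) for p in posts]
```

With five posts, the naive version makes six round-trips and the batched version makes two, regardless of how many distinct authors there are.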

DataLoader (Facebook’s GraphQL batching library) automates this on the server side: collect the N user-by-ID requests within a tick, fire one batch query to the database, return results to each caller. The same idea on the wire is the API designer’s job.
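The collect-then-flush idea can be sketched with asyncio. This is not the DataLoader API; BatchLoader, its batch_fn parameter, and the fetch_users stub are assumptions made for illustration, and caching, per-key errors, and batch size limits are left out.

```python
import asyncio


class BatchLoader:
    """Coalesce load(key) calls made before the next event-loop turn into one batch."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async callable: list of keys -> {key: value}
        self._pending = {}         # key -> Future shared by every caller of that key
        self._flush_task = None

    def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        if self._flush_task is None:
            # The task starts only after the current synchronous code yields,
            # so every load() issued in this turn joins the same batch.
            self._flush_task = loop.create_task(self._flush())
        return self._pending[key]

    async def _flush(self):
        pending, self._pending, self._flush_task = self._pending, {}, None
        try:
            results = await self._batch_fn(list(pending))
        except Exception as exc:
            for future in pending.values():
                future.set_exception(exc)
            return
        for key, future in pending.items():
            if key in results:
                future.set_result(results[key])
            else:
                future.set_exception(KeyError(key))


async def demo():
    batches = []

    async def fetch_users(ids):  # hypothetical stand-in for one batched query
        batches.append(ids)
        return {i: {"id": i, "name": f"user-{i}"} for i in ids}

    loader = BatchLoader(fetch_users)
    users = await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))
    return users, batches
```

Running asyncio.run(demo()) issues three load calls but records a single batch containing keys 1 and 2, because the duplicate request for key 1 shares one future.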

Paginated vs streamed

Paginated: client asks for a page; server returns N items + a continuation marker.

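Three flavours, weakest first:

Offset:

GET /posts?offset=20&limit=20

If a row on page 1 (say row 5) is deleted between the page-1 and page-2 requests, every later row shifts up by one and offset 20 now lands on what used to be row 22: the user silently skips an item. If a row is inserted ahead of the offset instead, later rows shift down and the user sees an item twice.

Page number:

GET /posts?page=2&per_page=20

Equivalent to offset, since offset = (page - 1) × per_page; same write-instability.

Cursor:

GET /posts?limit=20&after=eyJpZCI6MTk5LCJ0cyI6Li4ufQ==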
→ {
    "items": [...],
    "next_cursor": "eyJpZCI6MTc5LCJ0cyI6Li4ufQ==",
    "has_more": true
  }

The cursor opaquely encodes the last item’s position (commonly (timestamp, id) to break ties). New writes don’t shift the cursor’s meaning; deletions just shrink the result set. Stripe, GitHub, Slack, and Twitter all ship cursor pagination.
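The difference under writes can be reproduced with a small in-memory sketch. The helper names, the base64-encoded JSON cursor layout, and the ascending (ts, id) ordering are assumptions for illustration; a real API would push the same comparison into its query.

```python
import base64
import json


def encode_cursor(ts: int, item_id: int) -> str:
    """Pack the last item's (ts, id) position into an opaque token."""
    raw = json.dumps({"ts": ts, "id": item_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[int, int]:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["ts"], data["id"]


def page_by_offset(rows, offset, limit):
    ordered = sorted(rows, key=lambda r: (r["ts"], r["id"]))
    return ordered[offset:offset + limit]


def page_by_cursor(rows, limit, after=None):
    ordered = sorted(rows, key=lambda r: (r["ts"], r["id"]))
    if after is not None:
        position = decode_cursor(after)
        # Keep only rows strictly after the cursor; id breaks timestamp ties.
        ordered = [r for r in ordered if (r["ts"], r["id"]) > position]
    items = ordered[:limit]
    next_cursor = encode_cursor(items[-1]["ts"], items[-1]["id"]) if items else None
    return {"items": items, "next_cursor": next_cursor, "has_more": len(ordered) > limit}


# Six rows, some sharing a timestamp, to exercise the tie-break.
rows = [{"id": i, "ts": 100 + i // 2} for i in range(1, 7)]
first = page_by_cursor(rows, limit=2)            # ids 1 and 2
remaining = [r for r in rows if r["id"] != 1]    # row 1 is deleted between requests

offset_page_2 = page_by_offset(remaining, offset=2, limit=2)
cursor_page_2 = page_by_cursor(remaining, limit=2, after=first["next_cursor"])
```

After the deletion, the offset request returns ids 4 and 5 and silently skips id 3, while the cursor request returns ids 3 and 4 because the cursor names a position in the ordering rather than a row count.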

Streamed: server keeps the connection open and pushes items as they’re produced. Four wire-level forms:

Server-Sent Events (SSE): text/event-stream; HTTP-native; one-way server → client; auto-reconnect with Last-Event-ID. Used by OpenAI’s chat-completion streaming and by browser-facing live tails.
WebSocket: full-duplex; long-lived; binary or text; client-side reconnection logic required.
gRPC server-streaming: server pushes a sequence of messages on one RPC; client cancels by closing the stream.
Chunked HTTP: Transfer-Encoding: chunked; the lowest-level form, and the substrate that SSE sits on over HTTP/1.1.

Stream when the result is produced over time (an LLM generating tokens; a query that finishes incrementally) or the data is live (chat, prices, scores). Paginate when the result set is already at rest in storage and the user navigates it in discrete steps.
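For the SSE case, the wire format is simple enough to sketch by hand. The function names below are hypothetical; the framing (id:, event:, and data: fields, with a blank line ending each event) follows the text/event-stream format, and last_event_id models the value a reconnecting client sends in its Last-Event-ID header.

```python
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event_id: Optional[str] = None, event: Optional[str] = None) -> str:
    """Serialise one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines becomes several data: lines;
    # the client joins them back together with newlines.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event


def stream_tokens(tokens: Iterable[str], last_event_id: int = 0) -> Iterator[str]:
    """Yield SSE frames, skipping events the client reports having seen."""
    for event_id, token in enumerate(tokens, start=1):
        if event_id > last_event_id:
            yield format_sse(token, event_id=str(event_id))
```

A web framework would write each yielded frame to a response with Content-Type: text/event-stream; resuming from last_event_id is what makes the browser’s automatic reconnect lossless in this sketch.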

Cached vs uncached

Orthogonal to all of the above. A GET /users/42 response can sit in:

Browser cache — Cache-Control: private, max-age=60. Fastest possible read; per-user.
CDN edge cache — Cache-Control: public, max-age=300. Shared across users; s-maxage for the CDN, max-age for the browser.
API gateway cache — between the gateway and the origin; useful for “expensive to compute, the same for every caller”.
Application cache — Redis / Memcached inside the service; not visible to the client.

The cache key is usually the full request URL plus selected request headers (for example Authorization and Accept-Encoding, typically declared through Vary). Cache invalidation is the hard part (see Caching at Different Layers).

Streamed responses and personalised responses are usually uncached. Paginated list responses with stable cursors are highly cacheable (the same cursor returns the same items, modulo new inserts).
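A cache key built from the URL plus selected headers can be sketched as follows. The cache_key function and its vary parameter are hypothetical, and sorting the query parameters is a design choice of this sketch; many caches key on the raw URL instead.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(method, url, headers, vary=("Accept-Encoding",)):
    """Derive a cache key from the method, URL, and the headers named in `vary`."""
    parts = urlsplit(url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    lowered = {name.lower(): value for name, value in headers.items()}
    varied = [f"{name.lower()}={lowered.get(name.lower(), '')}" for name in sorted(vary)]
    raw = "\n".join([method.upper(), parts.netloc, parts.path, query, *varied])
    return hashlib.sha256(raw.encode()).hexdigest()
```

Adding Authorization to vary gives each credential its own entries, which is the per-user behaviour a private cache needs; leaving it out lets every caller share one entry, which is only safe for responses that are identical across users.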

The four levers, picked deliberately

| Endpoint | Eager/lazy | Batch/single | Paginated/streamed | Cached/uncached |
| --- | --- | --- | --- | --- |
| `GET /users/42` (profile) | Lazy | Single | n/a | Cached (60s) |
| `GET /posts?author=42` (timeline) | Lazy | Batch | Cursor paginated | Cached (300s, private) |
| `GET /search?q=foo` | Lazy | Batch | Cursor paginated | Cached (60s) by query hash |
| `POST /chat/stream` (LLM) | n/a | n/a | Streamed (SSE) | Uncached |
| `GET /metrics/live` | n/a | Batch | Streamed (SSE) | Uncached |
| `GET /admin/audit-log` | Lazy | Batch | Cursor paginated | Uncached |
| `GET /catalog` (public, large) | Eager | Batch | Cursor paginated | Cached on CDN (1 hour) |

The pattern: profile-shaped reads are lazy + cached; list reads are batch + cursor-paginated; live reads are streamed + uncached; large public reads are cached on CDN; admin reads bypass caches.

Variants and trade-offs

Pull (request/response). Client asks; server answers. The default. Cacheable, idempotent on GET, easy to debug. Pays in latency for live data — every poll is a round-trip and most return nothing new.

Push (streamed / subscribed). Server pushes when data changes. Lower latency to live updates. Costs a long-lived connection per client, harder to scale fan-out, harder to debug (no per-event HTTP log line).

Lever-by-lever:

| Lever | A | B | Pick A when | Pick B when |
| --- | --- | --- | --- | --- |
| Eager / lazy | Eager | Lazy | Related resource needed >50% of the time | Mobile + payload-sensitive + related rarely needed |
| Batch / single | Batch | Single | Bulk reads, N+1 anywhere in sight | Per-item caching is the win, single-item access |
| Paginated / streamed | Paginated | Streamed | Result set at rest, user navigates | Live data, incrementally produced result |
| Cached / uncached | Cached | Uncached | Read >> write, and either the same answer across users or users tolerate brief staleness | Personal real-time data, strict freshness |

Pagination flavour, separately:

| Pagination | Stable under writes | URL-cacheable | Random-access |
| --- | --- | --- | --- |
| Offset (`?offset=20`) | No (skips/duplicates) | Yes | Yes |
| Page-number (`?page=2`) | No | Yes | Yes |
| Cursor (`?after=abc`) | Yes | Yes | No (sequential only) |
| Time-based (`?since=ts`) | Yes | Yes | No |

Cursor pagination is the senior default. If random-access pagination is a hard requirement (a UI that jumps to “page 50 of 100”), accept the write-instability and ship offset, with the understanding that the count is approximate.

When this is asked in interviews

Fetching patterns come up across every API-design interview:

In any list-endpoint design — “how do you paginate this?” The senior answer is cursor pagination, with (timestamp, id) as the encoded cursor, returned as an opaque base64 string. Justify the cursor choice with the write-stability argument.
In any mobile-app design — “how do you avoid N+1 round-trips?” The senior answer is batch reads for related items (or a BFF, or GraphQL) and eager embedding for the 95% common case.
In any live-data design (chat, prices, notifications, AI streaming), the senior answer names SSE for one-way pushes and WebSocket for bidirectional traffic, and notes that long-polling is now mostly a fallback for environments where neither is available.
In any large-result-set design — analytics, audit logs, full-table exports. The senior answer is streamed responses (chunked HTTP, gRPC server-streaming) so the client doesn’t OOM and the server doesn’t buffer.

Specific points to make:

Name all four levers explicitly. Eager/lazy, batch/single, paginated/streamed, cached/uncached.
Pick per endpoint, not per API. A profile read and a timeline read have different needs.
Ship cursor pagination on day one. Reference Stripe, GitHub, Slack.
Acknowledge the N+1 round-trip mirror image. Batch endpoints exist to solve it.

The strongest one-liner: “Eager or lazy, batch or single, paginated or streamed, cached or uncached — those are four independent levers, and the right combination is per endpoint, not per API.”

Client-Adapting APIs — the BFF / GraphQL pattern; client-adapting is the policy, fetching is the mechanism.
RESTful API Design in Practice — pagination conventions, embedded relations, the ?include= query parameter.
WebSockets — Bidirectional Streaming — bidirectional streaming when SSE’s one-way push isn’t enough.
Event-Driven Architecture Protocols — webhooks and SSE as the push-shaped alternative to polling.
Caching at Different Layers — browser, CDN, gateway, app — where cached responses live.