Remote Procedure Calls (RPC) — System Design

Summary

A Remote Procedure Call dresses a network round-trip in the syntax of a function call. The promise — user.get(id) looks the same whether it’s local or remote — is a useful illusion, until it isn’t. Every serious distributed-systems failure mode (partial failure, timeout ambiguity, retries, ordering) lives in the gap between the illusion and the wire.

Why it matters

RPC is the default vocabulary for service-to-service communication in modern systems, so the interviewer assumes you understand its leaks. The leaks aren’t bugs; they’re the eight fallacies of distributed computing (the network is reliable, latency is zero, bandwidth is infinite…) restated as a critique of the abstraction itself.

If you can name which specific guarantees of a local function call don’t carry across the wire — and what your RPC framework does about each — you’ve already cleared the bar for an SDE-2 system-design loop.

How it works

Every RPC framework, regardless of branding, is three layers stacked:

Interface Definition Language (IDL). A .proto (gRPC), .thrift (Thrift), or OpenAPI schema. Describes services, methods, and request/response types. A code generator turns it into client and server stubs in each target language.
Serialization format. Protobuf, Thrift binary, JSON, MessagePack, Avro. Determines wire size, schema evolution rules, and CPU cost of encode/decode.
Transport. HTTP/1.1, HTTP/2 (gRPC’s default), QUIC/HTTP/3, raw TCP. Determines multiplexing, head-of-line blocking, and what middleware (proxies, load balancers, observability) can inspect.

A call on the wire is: client encodes arguments → sends over transport → server decodes → executes handler → encodes response → sends it back → client decodes. The client stub hides this behind a method signature. The generated stub, together with the framework's runtime library, handles connection pooling, retries (sometimes), deadlines, and metadata propagation.
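The call path above can be sketched in a few lines. This is a minimal illustration, not how any particular framework is implemented: JSON stands in for the serialization format, an in-process function call stands in for the transport, and the names `InProcessTransport`, `Server`, and `UserStub` are hypothetical.

```python
import json
from typing import Any, Callable, Dict


class InProcessTransport:
    """Hypothetical transport: hands request bytes straight to a server callable."""

    def __init__(self, server: Callable[[bytes], bytes]) -> None:
        self._server = server

    def send(self, payload: bytes) -> bytes:
        return self._server(payload)


class Server:
    """Decodes a request, runs the named handler, encodes the response."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._handlers[name] = handler

    def handle(self, raw: bytes) -> bytes:
        request = json.loads(raw.decode("utf-8"))               # server decodes
        handler = self._handlers.get(request["method"])
        if handler is None:
            return json.dumps({"error": "unknown method"}).encode("utf-8")
        result = handler(**request["args"])                     # executes handler
        return json.dumps({"result": result}).encode("utf-8")   # encodes response


class UserStub:
    """Client stub: exposes get(id) and hides encode/send/decode."""

    def __init__(self, transport: InProcessTransport) -> None:
        self._transport = transport

    def get(self, id: int) -> Dict[str, Any]:
        request = {"method": "user.get", "args": {"id": id}}
        raw = self._transport.send(json.dumps(request).encode("utf-8"))
        response = json.loads(raw.decode("utf-8"))              # client decodes
        if "error" in response:
            raise RuntimeError(response["error"])
        return response["result"]


server = Server()
server.register("user.get", lambda id: {"id": id, "name": f"user-{id}"})
user = UserStub(InProcessTransport(server.handle))
print(user.get(7))  # {'id': 7, 'name': 'user-7'}
```

The caller only ever sees `user.get(7)`. Swapping `InProcessTransport` for a real socket is where timeouts and partial failure enter, and nothing in the method signature warns about it.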

The four guarantees that don’t carry across the wire

No partial failure locally. A function call either returns or throws. A network call can also time out without a verdict: the request may never have arrived, the server may have executed it, may have crashed mid-execution, or may just be slow. This single fact drives every retry, idempotency, and deduplication discussion.
No latency variance locally. In-process calls are nanoseconds; an RPC over a load balancer in the same region is 1–5 ms p50, 50–500 ms p99 on a bad day. Latency tails are not optional knowledge.
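No bandwidth limit locally. An in-process call passes a reference; an RPC payload is split into MTU-sized packets (about 1,500 bytes on Ethernet), competes for a real cross-region bandwidth budget, and must stay under the framework's message-size cap (gRPC's default receive limit is 4 MB). Designs that pass 10 MB blobs as RPC arguments will be challenged.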
No version skew locally. Client and server are deployed independently, so the IDL must evolve in compatible ways. Adding a required field is a deploy-order trap.
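The deploy-order trap in the last item can be shown with a toy decoder. This is a sketch under stated assumptions: the schema dictionaries, the `decode` helper, and the `currency` field are all hypothetical, and real IDLs express required and optional fields in their own syntax.

```python
from typing import Any, Dict

# Hypothetical schema versions for a CreateOrder request.
# v2 adds "currency". Declaring it optional with a default keeps old clients working.
SCHEMA_V2_OPTIONAL = {
    "required": ["order_id", "amount_cents"],
    "optional": {"currency": "USD"},
}
SCHEMA_V2_REQUIRED = {
    "required": ["order_id", "amount_cents", "currency"],
    "optional": {},
}


def decode(message: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a decoded message against a schema, ignoring unknown fields."""
    missing = [name for name in schema["required"] if name not in message]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    decoded = {name: message[name] for name in schema["required"]}
    for name, default in schema["optional"].items():
        decoded[name] = message.get(name, default)
    return decoded


# A client built against v1 has never heard of "currency".
old_client_message = {"order_id": "o-1", "amount_cents": 4999}

# Server upgraded first, field optional: the old client still works.
print(decode(old_client_message, SCHEMA_V2_OPTIONAL))

# Server upgraded first, field required: every old client starts failing.
try:
    decode(old_client_message, SCHEMA_V2_REQUIRED)
except ValueError as exc:
    print("rejected:", exc)
```

With the optional variant, server and clients can roll out in either order. With the required variant, every client has to be upgraded before the server, which independent deploys cannot guarantee.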

Variants and trade-offs

gRPC / Thrift (binary, schema-first) — small payloads, fast codec, strict schema evolution rules. Streaming first-class. Browser support is awkward (needs gRPC-Web proxy). Hardest to debug in a curl/log workflow.

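REST + JSON (resource-oriented, schema-optional): readable on the wire, every tool speaks it, easy to evolve loosely. Payloads are often several times larger than the binary equivalent (3–10x is a common rule of thumb); streaming is not part of the call model and needs add-ons such as Server-Sent Events or chunked responses; clients hand-roll method signatures.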
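The size gap between a self-describing text format and a schema-first binary format is easy to measure on a toy record. In this sketch Python's `struct` module stands in for a binary codec; it is not Protobuf or Thrift, and the record and its field layout are made up for illustration.

```python
import json
import struct

record = {"user_id": 123456, "amount_cents": 4999, "active": True}

# Self-describing text encoding: field names travel with every message.
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Schema-first binary encoding: both sides agree on the layout in advance
# (little-endian u64, u32, bool), so only the values go on the wire.
LAYOUT = "<QI?"
binary_bytes = struct.pack(
    LAYOUT, record["user_id"], record["amount_cents"], record["active"]
)

print(len(json_bytes), len(binary_bytes))  # 52 13

# Round-trip check: the fixed layout recovers the same values.
user_id, amount_cents, active = struct.unpack(LAYOUT, binary_bytes)
assert (user_id, amount_cents, active) == (123456, 4999, True)
```

For this record the JSON form is 4x the packed form. The ratio depends heavily on the data: long string values shrink the gap, while many short numeric fields widen it.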

REST-as-RPC is the common middle ground: the wire format is HTTP+JSON, but the API is verb-shaped (POST /orders/cancel) rather than resource-shaped. Most “REST APIs” in production are this — and that’s fine; the dogmatic resource model rarely pays off.

GraphQL is RPC-shaped from the client’s perspective (queries are typed function calls returning typed data), but pushes selection of fields to the caller. Useful for many-clients / one-backend; over-engineered for service-to-service.

Streaming RPC matters when responses are inherently sequential (LLM token streams, real-time location, log tails). HTTP/2 server-streaming and bidi-streaming are gRPC’s killer feature versus REST.
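What a streaming call buys the client is that the first message can be handled before the last one exists. A Python generator gives a rough in-process analogy; `stream_tokens` and the `events` log are hypothetical, and no real gRPC API is involved.

```python
from typing import Iterator, List

events: List[str] = []


def stream_tokens(tokens: List[str]) -> Iterator[str]:
    """Hypothetical server-streaming handler: hands over each token as soon
    as it is produced instead of buffering the full response."""
    for token in tokens:
        events.append(f"server produced {token}")
        yield token


# Streaming client: consumes each message as it arrives.
for token in stream_tokens(["Hello", "world"]):
    events.append(f"client received {token}")

print(events)
```

The log interleaves "server produced" and "client received" entries. A unary call would show every "server produced" entry before the first "client received" one.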

Why HTTP/2 specifically

HTTP/1.1 has head-of-line blocking per connection — one slow response blocks the queue behind it. HTTP/2 multiplexes streams over a single TCP connection, so an RPC framework can run thousands of concurrent calls over one socket. The catch: TCP-level head-of-line blocking still exists (a dropped packet stalls all streams), which is why HTTP/3 / QUIC over UDP is the next step.
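A back-of-the-envelope model makes the difference concrete. It assumes one connection, HTTP/1.1 without pipelining (one request in flight at a time), and made-up server processing times; it ignores shared bandwidth and TCP-level packet loss.

```python
from itertools import accumulate
from typing import List


def http1_completion_times(service_ms: List[int]) -> List[int]:
    """One HTTP/1.1 connection: each request waits for every response
    queued ahead of it, so completion times are running sums."""
    return list(accumulate(service_ms))


def http2_completion_times(service_ms: List[int]) -> List[int]:
    """One HTTP/2 connection: streams are multiplexed, so each call
    completes after its own processing time (simplified model)."""
    return list(service_ms)


# Hypothetical server processing times: one slow call followed by two fast ones.
service_ms = [500, 10, 10]

print(http1_completion_times(service_ms))  # [500, 510, 520]
print(http2_completion_times(service_ms))  # [500, 10, 10]
```

In this model the two fast calls finish in 10 ms under multiplexing but wait over 500 ms behind the slow one on a serial connection. HTTP/1.1 clients usually work around this by opening several connections per host.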

When this is asked in interviews

Three flavors show up:

The interface step (Step 3 of the walk-through). “Define the API.” A senior candidate uses an IDL-shape sketch — method, args, return, error type, idempotency, deadlines — not a vague “POST /foo”.
The retry / idempotency drill. “Your createOrder RPC times out. What do you do?” Wrong answer: “retry”. Right answer: “retry with an idempotency key the server dedupes by; if no key, expose a status endpoint and reconcile.”
The framework-choice question. Common at companies with polyglot stacks (Uber, Google, Coinbase). “Why gRPC over REST here?” The right answer cites payload size, streaming, schema discipline, or codegen, not “it’s faster”, which isn’t always true.
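The "retry with an idempotency key" answer from the drill above can be sketched as follows. Everything here is hypothetical: `OrderServer` keeps its dedupe table in memory, `LossyChannel` simulates the ambiguous case where the server executes but the response is lost, and the retry loop omits the backoff and jitter a production client would add.

```python
import uuid
from typing import Dict


class OrderServer:
    """Hypothetical server that dedupes createOrder by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}  # idempotency key -> order id
        self.orders_created = 0

    def create_order(self, idempotency_key: str, item: str) -> str:
        if idempotency_key in self._results:
            # Replay of a request already executed: return the stored result.
            return self._results[idempotency_key]
        self.orders_created += 1
        order_id = f"order-{self.orders_created}"
        self._results[idempotency_key] = order_id
        return order_id


class LossyChannel:
    """Simulates the ambiguous case: the server executes, but the first
    `drop_first` responses are lost, so the client sees a timeout."""

    def __init__(self, server: OrderServer, drop_first: int) -> None:
        self._server = server
        self._drops_left = drop_first

    def create_order(self, idempotency_key: str, item: str) -> str:
        order_id = self._server.create_order(idempotency_key, item)
        if self._drops_left > 0:
            self._drops_left -= 1
            raise TimeoutError("no response before deadline")
        return order_id


def create_order_with_retry(channel: LossyChannel, item: str, max_attempts: int = 3) -> str:
    # One key per logical operation, reused across every retry of it.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return channel.create_order(key, item)
        except TimeoutError:
            if attempt == max_attempts:
                raise
    raise RuntimeError("max_attempts must be at least 1")


server = OrderServer()
channel = LossyChannel(server, drop_first=2)
print(create_order_with_retry(channel, "book"), server.orders_created)  # order-1 1
```

The server ran the handler three times' worth of requests but created one order. Generating a fresh key on each attempt would have created three, which is exactly the failure the bare "retry" answer walks into.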

These questions are more common at infrastructure-leaning companies and in any platform/SRE-track loop. Frontend-leaning loops will use REST/GraphQL framing and ask roughly the same questions in different vocabulary.