Remote Procedure Calls (RPC)

Function-call abstractions over the network: gRPC, Thrift, REST-as-RPC, the leaks they hide and the ones they don't.

Concept Foundational
4 min read
rpc networking api-design

Summary

A Remote Procedure Call dresses a network round-trip in the syntax of a function call. The promise — user.get(id) looks the same whether it’s local or remote — is a useful illusion, until it isn’t. Every serious distributed-systems failure mode (partial failure, timeout ambiguity, retries, ordering) lives in the gap between the illusion and the wire.

Why it matters

RPC is the default vocabulary for service-to-service communication in modern systems, so the interviewer assumes you understand its leaks. The leaks aren’t bugs; they’re the eight fallacies of distributed computing (the network is reliable, latency is zero, bandwidth is infinite…) restated as a critique of the abstraction itself.

If you can name which specific guarantees of a local function call don’t carry across the wire — and what your RPC framework does about each — you’ve already cleared the bar for an SDE-2 system-design loop.

How it works

Every RPC framework, regardless of branding, is three layers stacked:

  1. Interface Definition Language (IDL). A .proto (gRPC), .thrift (Thrift), or OpenAPI schema. Describes services, methods, and request/response types; a code generator turns it into client and server stubs in many languages.
  2. Serialization format. Protobuf, Thrift binary, JSON, MessagePack, Avro. Determines wire size, schema evolution rules, and CPU cost of encode/decode.
  3. Transport. HTTP/1.1, HTTP/2 (gRPC’s default), QUIC/HTTP/3, raw TCP. Determines multiplexing, head-of-line blocking, and what middleware (proxies, load balancers, observability) can inspect.

A call on the wire is: encode arguments → send over transport → server decodes → executes handler → encodes response → returns. The client stub hides this behind a method signature. Generated stubs handle connection pooling, retries (sometimes), deadlines, and metadata propagation.
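
The encode, send, decode, execute, encode, return path can be sketched in a few lines. This is a minimal illustration, not how any real framework is implemented: JSON stands in for the serialization layer, a plain function call stands in for the transport, and every name (`HANDLERS`, `rpc_method`, `server_dispatch`, `ClientStub`, `get_user`) is hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical server-side registry: method name -> handler function.
HANDLERS: Dict[str, Callable[..., Any]] = {}


def rpc_method(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a remotely callable handler."""
    HANDLERS[fn.__name__] = fn
    return fn


@rpc_method
def get_user(user_id: int) -> dict:
    return {"id": user_id, "name": f"user-{user_id}"}


def server_dispatch(request_bytes: bytes) -> bytes:
    """Server side: decode the request, run the handler, encode the response."""
    request = json.loads(request_bytes.decode("utf-8"))
    handler = HANDLERS.get(request["method"])
    if handler is None:
        response = {"ok": False, "error": f"unknown method {request['method']}"}
    else:
        response = {"ok": True, "result": handler(**request["args"])}
    return json.dumps(response).encode("utf-8")


class RpcError(Exception):
    """Raised by the stub when the server reports a failure."""


class ClientStub:
    """Client side: hides encode/send/decode behind ordinary method calls."""

    def __init__(self, transport: Callable[[bytes], bytes]):
        # `transport` stands in for HTTP/2, raw TCP, etc.: bytes in, bytes out.
        self._transport = transport

    def _call(self, method: str, **args: Any) -> Any:
        request_bytes = json.dumps({"method": method, "args": args}).encode("utf-8")
        response_bytes = self._transport(request_bytes)
        response = json.loads(response_bytes.decode("utf-8"))
        if not response["ok"]:
            raise RpcError(response["error"])
        return response["result"]

    def get_user(self, user_id: int) -> dict:
        return self._call("get_user", user_id=user_id)


# In-process "network": the transport simply calls the server dispatcher.
stub = ClientStub(transport=server_dispatch)
print(stub.get_user(42))  # {'id': 42, 'name': 'user-42'}
```

The caller only ever sees `stub.get_user(42)`. Everything a real network adds (timeouts, lost responses, version skew) would surface inside `_call`, which is exactly where the abstraction leaks.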

The four guarantees that don’t carry across the wire

  • No partial failure locally. A function call either returns or throws. A network call can also time out without a verdict — the server may have executed, may have crashed mid-execution, may be slow. This single fact drives every retry, idempotency, and deduplication discussion.
  • No latency variance locally. In-process calls are nanoseconds; an RPC over a load balancer in the same region is 1–5 ms p50, 50–500 ms p99 on a bad day. Latency tails are not optional knowledge.
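  • No bandwidth limit locally. A local call passes a reference; an RPC payload is serialized, split into MTU-sized packets, and charged against a real cross-region bandwidth budget. Frameworks also cap message size (gRPC's default receive limit is 4 MB). Designs that pass 10 MB blobs as RPC arguments will be drilled.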
  • No version skew locally. Client and server are deployed independently, so the IDL must evolve in compatible ways. Adding a required field is a deploy-order trap.
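
The version-skew point is easy to see in miniature. The sketch below assumes dict-shaped messages and a tolerant decoder; the `UserV1`, `UserV2`, `UserV3` types and the `decode` helper are hypothetical, and real IDLs (Protobuf, Thrift) enforce similar rules through field tags rather than field names.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class UserV1:
    """Schema known to an old client: only id and name."""
    id: int
    name: str


@dataclass
class UserV2:
    """Schema on a new server: adds an optional field with a default."""
    id: int
    name: str
    locale: str = "en-US"  # the default keeps messages from old senders valid


@dataclass
class UserV3:
    """A breaking change: `email` is required and has no default."""
    id: int
    name: str
    email: str


def decode(cls, payload: Dict[str, Any]):
    """Tolerant decode: ignore unknown fields, let defaults fill missing ones."""
    known = {f: payload[f] for f in cls.__dataclass_fields__ if f in payload}
    return cls(**known)


# New server -> old client: the unknown `locale` field is dropped.
old_view = decode(UserV1, {"id": 7, "name": "ada", "locale": "fr-FR"})

# Old client -> new server: the missing `locale` field takes its default.
new_view = decode(UserV2, {"id": 7, "name": "ada"})

# Not-yet-upgraded sender -> receiver that requires `email`: decoding fails.
try:
    decode(UserV3, {"id": 7, "name": "ada"})
    required_field_broke = False
except TypeError:
    required_field_broke = True

print(old_view, new_view, required_field_broke)
```

Optional fields with defaults survive either deploy order. The required field only works if every sender is upgraded before any receiver, which is the deploy-order trap.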

Variants and trade-offs

gRPC / Thrift (binary, schema-first): small payloads, fast codec, strict schema evolution rules. Streaming is first-class in gRPC; Apache Thrift's core RPC model is request/response. Browser support is awkward (gRPC needs a gRPC-Web proxy). Hardest to debug in a curl/log workflow.
REST + JSON (resource-oriented, schema-optional): readable on the wire, every tool speaks it, easy to evolve loosely. Payloads are often 3–10x larger; there is no native bidirectional streaming (server-to-client is possible via chunked responses or Server-Sent Events); clients hand-roll method signatures.

REST-as-RPC is the common middle ground: the wire format is HTTP+JSON, but the API is verb-shaped (POST /orders/cancel) rather than resource-shaped. Most “REST APIs” in production are this — and that’s fine; the dogmatic resource model rarely pays off.

GraphQL is RPC-shaped from the client’s perspective (queries are typed function calls returning typed data), but pushes selection of fields to the caller. Useful for many-clients / one-backend; over-engineered for service-to-service.

Streaming RPC matters when responses are inherently sequential (LLM token streams, real-time location, log tails). HTTP/2 server-streaming and bidi-streaming are gRPC’s killer feature versus REST.
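
A rough way to picture the difference between unary and server-streaming calls, using a Python generator as a stand-in for a streaming handler. No network is involved, and `generate_tokens`, `unary_call`, and `streaming_call` are hypothetical names for illustration only.

```python
from typing import Iterator, List

# Records what the hypothetical server has generated so far.
produced: List[str] = []


def generate_tokens(prompt: str) -> Iterator[str]:
    """Server-side handler: yields each result as soon as it is ready."""
    for token in prompt.split():
        produced.append(token)
        yield token.upper()


def unary_call(prompt: str) -> List[str]:
    """Unary RPC: the caller gets nothing until the whole response exists."""
    return list(generate_tokens(prompt))


def streaming_call(prompt: str) -> Iterator[str]:
    """Server-streaming RPC: the caller consumes items as they are produced."""
    return generate_tokens(prompt)


stream = streaming_call("tokens arrive one by one")
first = next(stream)
produced_at_first = list(produced)  # snapshot: only one token exists so far
rest = list(stream)

print(first)              # TOKENS
print(produced_at_first)  # ['tokens']
print(rest)               # ['ARRIVE', 'ONE', 'BY', 'ONE']
```

With the streaming shape, time-to-first-item is decoupled from total response time, which is the property LLM token streams and log tails depend on.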

Why HTTP/2 specifically

HTTP/1.1 has head-of-line blocking per connection: one slow response blocks the queue behind it. HTTP/2 multiplexes streams over a single TCP connection, so an RPC framework can run many concurrent calls over one socket. The catch: TCP-level head-of-line blocking still exists (a dropped packet stalls all streams until it is retransmitted), which is why HTTP/3 over QUIC is the next step. QUIC runs over UDP and recovers loss per stream, so a dropped packet stalls only the stream it belongs to.
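
A back-of-the-envelope model of application-level head-of-line blocking. It assumes three requests issued at t=0 on a single connection, ignores transmission time and packet loss, and uses made-up service times; the function names are hypothetical.

```python
from itertools import accumulate
from typing import List


def serial_completion_ms(service_ms: List[int]) -> List[int]:
    """HTTP/1.1-style: one request at a time, so each waits for all earlier ones."""
    return list(accumulate(service_ms))


def multiplexed_completion_ms(service_ms: List[int]) -> List[int]:
    """HTTP/2-style: independent streams, so each finishes after its own service time."""
    return list(service_ms)


service_ms = [500, 5, 5]  # one slow call followed by two fast ones

print(serial_completion_ms(service_ms))       # [500, 505, 510]
print(multiplexed_completion_ms(service_ms))  # [500, 5, 5]
```

The two fast calls pay a 500 ms penalty on the serial connection and none on the multiplexed one. The model deliberately leaves out TCP-level loss, which is the case where HTTP/2 streams stall together.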

When this is asked in interviews

Three flavors show up:

  1. The interface step (Step 3 of the walk-through). “Define the API.” A senior candidate uses an IDL-shape sketch — method, args, return, error type, idempotency, deadlines — not a vague “POST /foo”.
  2. The retry / idempotency drill. “Your createOrder RPC times out. What do you do?” Wrong answer: “retry”, because a blind retry can create the order twice if the first attempt actually succeeded. Right answer: “retry with an idempotency key the server dedupes by; if no key, expose a status endpoint and reconcile.”
  3. The framework-choice question. Common at companies with polyglot stacks (Uber, Google, Coinbase). “Why gRPC over REST here?” The right answer cites payload size, streaming, schema discipline, or codegen, not “it’s faster”, which isn’t always true.
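
The first two flavors can be sketched together: an IDL-shaped request type that carries an idempotency key and a deadline, a server that dedupes by key, and a client that retries after a timeout. Everything here is an assumption for illustration (`CreateOrderRequest`, `OrderServer`, `FlakyTransport`, `create_order_with_retry`), the deadline is declared but not enforced, and a production dedupe store would need to be durable and atomic rather than an in-memory dict.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CreateOrderRequest:
    idempotency_key: str     # chosen by the client, reused across retries
    sku: str
    quantity: int
    deadline_ms: int = 2000  # part of the interface shape; not enforced here


@dataclass(frozen=True)
class CreateOrderResponse:
    order_id: str


class OrderServer:
    def __init__(self) -> None:
        self._by_key: Dict[str, CreateOrderResponse] = {}
        self.orders_created = 0

    def create_order(self, req: CreateOrderRequest) -> CreateOrderResponse:
        # Dedupe: a repeated key returns the original result, with no new side effect.
        cached = self._by_key.get(req.idempotency_key)
        if cached is not None:
            return cached
        self.orders_created += 1
        resp = CreateOrderResponse(order_id=f"order-{self.orders_created}")
        self._by_key[req.idempotency_key] = resp
        return resp


class FlakyTransport:
    """Executes the call on the server but loses the first N responses."""

    def __init__(self, server: OrderServer, lose_first: int) -> None:
        self._server = server
        self._lose = lose_first

    def create_order(self, req: CreateOrderRequest) -> CreateOrderResponse:
        resp = self._server.create_order(req)  # the server DID execute
        if self._lose > 0:
            self._lose -= 1
            raise TimeoutError("no verdict: response lost")
        return resp


def create_order_with_retry(transport, sku: str, quantity: int,
                            max_attempts: int = 3) -> CreateOrderResponse:
    # One key per logical operation, generated once, outside the retry loop.
    req = CreateOrderRequest(idempotency_key=str(uuid.uuid4()),
                             sku=sku, quantity=quantity)
    for attempt in range(max_attempts):
        try:
            return transport.create_order(req)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
    raise AssertionError("unreachable")


server = OrderServer()
transport = FlakyTransport(server, lose_first=2)
resp = create_order_with_retry(transport, sku="sku-1", quantity=1)
print(resp.order_id, server.orders_created)  # order-1 1
```

The client saw two timeouts and the server saw three calls, yet only one order exists. Generating the key inside the loop instead would create three orders, which is the bug the drill is probing for.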

These questions are more common at infrastructure-leaning companies and in any platform/SRE-track loop. Frontend-leaning loops will use REST/GraphQL framing and ask roughly the same questions in different vocabulary.
