Evolving an API Design
Backward-compatible additions, breaking-change taxonomy, deprecation timelines. How successful APIs survive a decade.
Summary#
Evolving an API is the day-to-day work of changing an interface that real clients depend on. The headline question — “v1 vs v2?” — is the wrong frame. Most changes to a successful API are not version bumps; they are additive evolutions inside the existing version. A few changes are genuinely breaking and require a multi-year deprecation. The skill is telling the two apart.
The taxonomy that matters: additive changes (new endpoints, new optional request fields, new response fields) are free and ship without ceremony. Behavioural changes (a field that used to be optional becomes required; an error code that used to be 400 becomes 422) are subtle breakages that bite clients who never read the changelog. Removals and renames (deleting a field, renaming an endpoint, changing an enum value) are genuinely breaking and need a deprecation timeline measured in quarters or years.
The cultural rule that holds it together is Postel’s principle: “Be conservative in what you send, liberal in what you accept.” Servers tolerate unknown fields in requests (so old clients sending stale shapes still work). Clients tolerate unknown fields in responses (so the server can add fields without breaking them). Together, these two halves of the principle make additive evolution actually safe.
The successful APIs — Stripe, GitHub, AWS, Twilio — share a pattern: they make almost every change additively, they have a documented deprecation policy with explicit sunset dates, they invest in telemetry to know which clients still use the deprecated thing, and they communicate sunsets through multiple channels (changelog, dashboard banner, email, API response header) for many months before the actual cut-off. None of it is glamorous; all of it is the difference between an API that lasts a decade and one that gets resented.
Why it matters#
Three reasons API evolution is the highest-stakes part of API design:
- The clients you can never reach are the loudest voices when you break them. A mobile app shipped two years ago by a developer who has since left the company is still calling your API. Their users still expect it to work. There is no upgrade path; there is no email you can send. Anything that breaks them is on you.
- The “obvious cleanup” is almost always a breaking change. Renaming `customer_id` to `account_id` for consistency feels like a quick fix. It is a months-long migration with telemetry, deprecation headers, dashboard banners, and a multi-quarter sunset. The first time you ship one of these without that machinery, the lesson is permanent.
- Compatibility is the load-bearing trust signal. Integrators choose APIs partly on documentation, partly on pricing, and largely on the felt sense of “will this still work in three years?” An API with a track record of additive-only evolution wins this on inspection of the changelog; one with a history of breaks loses it.
The senior-signal phrasing in an interview: “Every change has to fit into one of three buckets — additive, behavioural, or breaking — and the deprecation timeline is set by which bucket. We don’t ship a breaking change unless we’ve measured remaining usage, announced the sunset, and waited at least six months — usually longer.”
How it works#
The change taxonomy#
Every proposed change to an API falls into one of three categories. Putting it in the right category up front determines the rest of the rollout.
| Category | Examples | Action |
|---|---|---|
| Additive | New endpoint; new optional request field; new response field; new enum value (with documented behaviour for unknown values) | Ship it. No version bump. Document in changelog. |
| Behavioural | Optional field becomes required; error code changes; default value changes; rate limit introduced | Deprecate the old behaviour; ship the new in parallel; sunset after telemetry shows it is safe |
| Breaking | Remove a field; rename an endpoint; restructure response shape; change auth model | Announce deprecation; multi-quarter overlap; eventual cut-off; possibly a v2 |
The discipline is in the categorisation. A change that looks additive on the surface is sometimes behavioural in disguise — adding a new required field to a POST request body is “additive in the schema” but “breaking for every existing client that doesn’t send it”. Walk through the change from the perspective of an old client and ask: does it still work? If yes, additive. If no, you owe a deprecation.
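The "walk through it as an old client" check can be sketched in code. This is a minimal illustration, not a full schema differ: `RequestSchema` and `classify_request_change` are hypothetical names, and the schema is reduced to sets of required and optional field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSchema:
    """Hypothetical, simplified request schema: field names only."""
    required: frozenset
    optional: frozenset = frozenset()


def classify_request_change(old: RequestSchema, new: RequestSchema) -> str:
    """Classify a request-schema change from an old client's point of view."""
    old_fields = old.required | old.optional
    new_fields = new.required | new.optional
    # A field old clients may rely on has disappeared: removal or rename.
    if old_fields - new_fields:
        return "breaking"
    # The new server demands something old clients were never required to send.
    if new.required - old.required:
        return "behavioural"
    # Only new optional fields (or nothing) changed: old clients still work.
    return "additive"


old = RequestSchema(required=frozenset({"amount"}), optional=frozenset({"currency"}))
with_metadata = RequestSchema(frozenset({"amount"}), frozenset({"currency", "metadata"}))
currency_required = RequestSchema(frozenset({"amount", "currency"}))

print(classify_request_change(old, with_metadata))      # additive
print(classify_request_change(old, currency_required))  # behavioural
```

A check like this could run in CI against the schema diff, so that a change mislabelled as additive is caught before it ships.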
Additive changes — the path of least resistance#
The vast majority of API evolution should be additive. Concretely:
- New endpoints. `POST /v1/refunds` joins the API; existing endpoints are unchanged. Old clients never know.
- New optional request fields. Adding `metadata` as an optional field on `POST /v1/charges`. Old clients that don’t send it still work; the server uses the default.
- New response fields. Adding `risk_score` to the charge response. Clients that don’t read it ignore it; clients that need it can opt in. This relies on the client respecting Postel (see below).
- New optional query parameters. `GET /v1/charges?status=succeeded`. Old clients that don’t pass it get the unfiltered default.
- New webhook event types. Adding a `charge.refunded` event. Clients that don’t subscribe to it never receive it.
The cumulative effect is dramatic: GitHub’s v3 REST API has been in production since 2011 and has accumulated thousands of additive changes. The major version number has not changed, and many clients written in 2013 still work today.
Postel’s principle — what it actually means in practice#
Postel’s “be conservative in what you send, liberal in what you accept” has two halves:
- Server side: accept unknown fields in requests. If a client sends `{"amount": 4999, "currency": "usd", "unused_field": "x"}`, ignore `unused_field` and process the rest. Do not error. This means a client running an old API stub can send the full request body it knows about, including fields the new server has since deprecated.
- Client side: ignore unknown fields in responses. If the server adds `risk_score` to the response, the client deserialises into a type that ignores unknown fields. Do not error. This means the server can add response fields anytime without breaking existing clients.
Both halves are mandatory. A “strict” JSON deserialiser that rejects unknown fields on the client side turns every server-side additive change into a breaking change. A “strict” request validator on the server side that rejects unknown fields turns every client-side change into a coordination disaster.
The defaults matter:
- Python: `pydantic.BaseModel` ignores unknown fields by default (`extra='ignore'`), so it is safe. It rejects them only when a model sets `model_config = ConfigDict(extra='forbid')`; avoid that setting in client SDKs.
- Go: `encoding/json` ignores unknown by default, so it is safe. Use `DisallowUnknownFields` only for input validation where stricter behaviour is needed.
- Java/Jackson: rejects unknown by default; must set `@JsonIgnoreProperties(ignoreUnknown = true)`.
- JavaScript: ignores unknown by default in plain `JSON.parse` + property access. Schema validators like Zod default to stripping unknown fields, which is also fine.
If your client SDK is enforcing strict validation by default, you have made every server-side addition a coordination event. Loosen it.
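Both halves of the principle come down to the same tolerant-reader move: keep the fields you know and drop the rest. The sketch below uses only the Python standard library; `load_tolerant`, `ChargeRequest`, and `ChargeResponse` are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, fields


def load_tolerant(cls, body: str):
    """Deserialise JSON into a dataclass, silently dropping unknown fields."""
    raw = json.loads(body)
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})


@dataclass
class ChargeRequest:  # what the server reads
    amount: int
    currency: str


@dataclass
class ChargeResponse:  # what the client reads
    id: str
    amount: int
    currency: str


# Server side: an old client sends a field the server no longer knows about.
request = load_tolerant(
    ChargeRequest, '{"amount": 4999, "currency": "usd", "unused_field": "x"}'
)

# Client side: the server has since added risk_score to the response.
response = load_tolerant(
    ChargeResponse,
    '{"id": "ch_1", "amount": 4999, "currency": "usd", "risk_score": 0.12}',
)
```

Missing required fields still raise a `TypeError` here, which is the intended asymmetry: strict about what you need, tolerant of what you do not recognise.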
Behavioural changes — the subtle middle category#
Behavioural changes don’t change the shape of the request or response; they change the meaning. They are the most insidious category because the change is invisible to anyone reading the schema diff.
Examples:
- Error code changes. `POST /v1/charges` used to return `400` for “card declined”; it now returns `402`. The shape is unchanged; the meaning has shifted. Clients that branch on the status code now branch differently.
- Optional becomes required. `currency` used to default to `usd`; now it must be specified explicitly. Old clients that omitted it now get `400`.
- Default value changes. `limit` on `GET /v1/charges` used to default to 10; now it defaults to 25. Pagination behaviour shifts under clients that don’t pass an explicit limit.
- Rate limit added. No headers, no behavioural change, until a client crosses the threshold and starts getting `429`s. Clients that didn’t implement retry logic break.
- New required scope. An endpoint that previously worked with `read` scope now requires `read.detailed`. Clients with old tokens get `403`.
The treatment for behavioural changes is a hybrid of additive and breaking: ship the new behaviour as opt-in (via a header, a query parameter, or a version), gather telemetry on adoption, and only make it the default after a sunset period.
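As a rough sketch of the opt-in step, here is how the `limit` default change from the examples above might be gated. The header name `x-api-opt-in` and the flag `default-limit-25` are assumptions made up for this illustration, as is the `new_is_default` switch that would be flipped once the sunset period ends.

```python
OLD_DEFAULT_LIMIT = 10
NEW_DEFAULT_LIMIT = 25
OPT_IN_HEADER = "x-api-opt-in"    # hypothetical header name
OPT_IN_FLAG = "default-limit-25"  # hypothetical flag value


def resolve_limit(headers: dict, query: dict, new_is_default: bool = False) -> int:
    """Pick the page size for a list request.

    Assumes header names are already lower-cased by the framework.
    """
    # An explicit limit always wins; the default change never touches it.
    if "limit" in query:
        return int(query["limit"])
    flags = {f.strip() for f in headers.get(OPT_IN_HEADER, "").split(",") if f.strip()}
    # New behaviour applies to opted-in clients, or to everyone after the flip.
    if new_is_default or OPT_IN_FLAG in flags:
        return NEW_DEFAULT_LIMIT
    return OLD_DEFAULT_LIMIT


print(resolve_limit({}, {}))                               # 10
print(resolve_limit({"x-api-opt-in": OPT_IN_FLAG}, {}))    # 25
```

Counting how many requests take each branch gives the adoption telemetry that decides when `new_is_default` can safely become true.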
Breaking changes — the multi-quarter ceremony#
Genuinely breaking changes — removals, renames, semantic flips — are unavoidable occasionally. The rollout looks the same every time.
The standard timeline (industry default — Stripe, GitHub, AWS all approximately follow this):
- T+0: Announce deprecation in changelog
- T+0: Add `Deprecation: <date>` and `Sunset: <date>` response headers
- T+0: Add dashboard banner for affected accounts
- T+0: Ship the new behaviour as a parallel path
- T+30: Send first email to affected integrators
- T+60: Add inline-docs deprecation notice
- T+90: Telemetry review: what % still uses the old?
- T+180: Second email; possibly a brown-out test (return errors 1 hour/day)
- T+270: Final email; banner shifts to red
- T+365: Cut-off; old path returns 410 Gone

The headers in question are RFC 9745 (Deprecation) and RFC 8594 (Sunset):

    HTTP/1.1 200 OK
    Deprecation: @1764547199
    Sunset: Mon, 30 Nov 2026 23:59:59 GMT
    Link: <https://docs.example.com/migration/charges-v2>; rel="deprecation"

`Deprecation` carries a structured-field date (Unix seconds prefixed with `@`; here 30 Nov 2025 23:59:59 GMT), while `Sunset` carries an HTTP-date. These let well-written client SDKs log warnings, surface them in CI, and prod the integration team toward migration.
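A minimal sketch of producing and consuming these headers with the Python standard library follows. It assumes the value formats of the published RFCs: RFC 9745 defines `Deprecation` as a structured-field date (`@` plus Unix seconds) and RFC 8594 defines `Sunset` as an HTTP-date. The function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def deprecation_headers(deprecated_at: datetime, sunset_at: datetime,
                        migration_url: str) -> dict:
    """Server side: build the response headers for a deprecated endpoint.

    Both datetimes must be timezone-aware and in UTC.
    """
    return {
        # RFC 9745: structured-field Date, "@" followed by Unix seconds.
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # RFC 8594: HTTP-date, e.g. "Mon, 30 Nov 2026 23:59:59 GMT".
        "Sunset": format_datetime(sunset_at, usegmt=True),
        "Link": f'<{migration_url}>; rel="deprecation"',
    }


def days_until_sunset(headers: dict, now: datetime):
    """Client side: whole days left before the sunset, or None if not announced."""
    sunset = headers.get("Sunset")
    if sunset is None:
        return None
    return (parsedate_to_datetime(sunset) - now).days


headers = deprecation_headers(
    datetime(2025, 11, 30, 23, 59, 59, tzinfo=timezone.utc),
    datetime(2026, 11, 30, 23, 59, 59, tzinfo=timezone.utc),
    "https://docs.example.com/migration/charges-v2",
)
```

An SDK could call something like `days_until_sunset` on every response and emit a warning once the value drops below a chosen threshold, which is what turns the header into a CI signal.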
Telemetry is the missing ingredient#
The only way to make a sunset decision responsibly is to know who is still calling the deprecated thing. The instrumentation looks like:
- Per-request: which API key, which endpoint, which version, which deprecated field is touched. Aggregate to per-account, per-day.
- Dashboard: usage trend for the deprecated thing. Track the curve from “everyone uses it” to “single-digit accounts left”.
- Account-level contact. When usage drops below an agreed threshold, the remaining accounts are individually emailed (not a mass blast) and given a final timeline.
Without this, “sunset on 2026-11-30” is a guess. With it, the sunset is a decision: “v1 usage is down to 0.3% of total traffic and the remaining accounts are these eight enterprise customers — we will reach out individually and target sunset in Q1 2027.”
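The per-account, per-day aggregation can be sketched as follows. The `RequestLog` record and both function names are hypothetical; a real pipeline would read from request logs or a metrics store rather than an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical per-request record emitted by the API gateway."""
    account_id: str
    day: date
    endpoint: str
    deprecated_fields: tuple  # names of deprecated fields this request touched


def aggregate_deprecated_usage(logs) -> Counter:
    """Count requests per (account, day, deprecated field)."""
    usage = Counter()
    for log in logs:
        for name in log.deprecated_fields:
            usage[(log.account_id, log.day, name)] += 1
    return usage


def remaining_accounts(usage: Counter, field_name: str, since: date) -> list:
    """Accounts that still touched the deprecated field on or after `since`."""
    return sorted({
        account
        for (account, day, name) in usage
        if name == field_name and day >= since
    })


logs = [
    RequestLog("acct_a", date(2026, 3, 1), "/v1/charges", ("customer_id",)),
    RequestLog("acct_b", date(2026, 2, 1), "/v1/charges", ("customer_id",)),
    RequestLog("acct_c", date(2026, 3, 2), "/v1/charges", ()),
]
usage = aggregate_deprecated_usage(logs)
print(remaining_accounts(usage, "customer_id", since=date(2026, 3, 1)))  # ['acct_a']
```

The output of `remaining_accounts` is the list that drives the individual outreach: once it is short enough to email by hand, the sunset date stops being a guess.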
Versioning vs deprecation — they are not the same#
A common mistake: “we shipped v2; the deprecation is just turning off v1”. This conflates two questions.
- Versioning is the placement question: where does the version number live? URL, header, date? (See `api-versioning`.)
- Deprecation is the lifecycle question: how long does the old thing keep working after the new thing ships?
You can have lots of deprecation without a version bump (additive evolution with periodic deprecations of old fields). You can have a version bump without much deprecation (if the new version is a clean break, e.g. GitHub v3 REST → v4 GraphQL). They cover different concerns.
Variants and trade-offs#
Strict additive policy. Never ship a breaking change. Old fields stay forever; new fields supersede them. Pro: clients trust the contract absolutely. Con: schema becomes archaeological; the API gets harder to learn over time.
Disciplined breaking changes with long sunsets. Ship breaking changes when justified, with a 12-month or longer deprecation timeline and explicit telemetry. Pro: schema stays clean; old debt is shed. Con: requires investment in telemetry, comms, and dashboard machinery.
| Decision | Choice A | Choice B |
|---|---|---|
| Default for “small change” | Additive | Behavioural change with deprecation |
| Deprecation timeline | 6 months (aggressive) | 12-24 months (typical) |
| Comms channels | Changelog only | Changelog + email + banner + headers |
| Headers | Deprecation + Sunset (RFC) | Custom headers (avoid) |
| Telemetry | Per-endpoint | Per-account-per-field (detailed) |
| Final cut-off response | 410 Gone | 404 Not Found (loses semantics) |
The senior choice: aggressive additive policy combined with the full deprecation machinery for the rare breaking changes — RFC headers, telemetry by account, 12+ months overlap, individual emails to remaining users, 410 Gone as the post-sunset response.
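The end-of-life behaviour of a deprecated path (normal service, optional brown-out windows, then `410 Gone`) can be sketched as a small pure function. The function name and parameters are hypothetical, and returning `410` during the brown-out hour is an assumption; the status code used for brown-outs is a policy choice.

```python
from datetime import datetime, timezone


def deprecated_path_status(now: datetime, sunset_at: datetime,
                           brownout_start: datetime = None,
                           brownout_hour_utc: int = None) -> int:
    """Return the HTTP status a deprecated path should answer with.

    All datetimes must be timezone-aware.
    """
    # After the sunset the path is permanently gone.
    if now >= sunset_at:
        return 410
    # Brown-out: fail for one fixed UTC hour per day once the test period begins.
    in_brownout_period = brownout_start is not None and now >= brownout_start
    if in_brownout_period and now.astimezone(timezone.utc).hour == brownout_hour_utc:
        return 410  # assumption: brown-outs mimic the post-sunset response
    return 200


sunset = datetime(2026, 11, 30, 23, 59, 59, tzinfo=timezone.utc)
brownouts_from = datetime(2026, 5, 29, tzinfo=timezone.utc)

print(deprecated_path_status(datetime(2026, 6, 1, 14, 30, tzinfo=timezone.utc),
                             sunset, brownouts_from, 14))  # 410
print(deprecated_path_status(datetime(2026, 6, 1, 15, 0, tzinfo=timezone.utc),
                             sunset, brownouts_from, 14))  # 200
```

Keeping this logic in one place makes the cut-off reversible: if telemetry shows an unexpected spike of failures, moving `sunset_at` is a configuration change, not a redeploy of every handler.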
When this is asked in interviews#
API evolution comes up in two main places:
- As the immediate follow-up to “how would you change this design?” The first answer is “additively”. The follow-up is “and if I told you we have to remove an existing field?” — that’s the deprecation-timeline question.
- In the operational-concerns segment of any API system design. Once the design is sketched, the interviewer probes how it survives over time. The answer combines versioning (`api-versioning`), evolution policy (this writeup), and operational maturity (telemetry, comms, headers).
Specific points to make:
- State the three-bucket taxonomy (additive, behavioural, breaking) and classify any proposed change.
- Name Postel’s principle and explain what both halves mean for the client SDK and the server.
- Cite the RFC headers (`Deprecation` per RFC 9745, `Sunset` per RFC 8594) and explain the response telemetry chain.
- Give a concrete deprecation timeline with rough dates (T+0 announce, T+30 email, T+90 telemetry review, T+365 cut-off).
- Distinguish versioning from deprecation. They are different tools.
The strongest one-liner: “Most evolution is additive — new endpoints, new optional fields, new response fields. The rare breaking change gets a 12-month deprecation with Sunset headers, per-account telemetry, and individual outreach to the long tail before we cut over.”
Related concepts#
- API Versioning — the placement question (URL vs header vs date) that sits next to deprecation.
- REST — The Architectural Style — REST’s principles assume additive evolution; the resource model survives change well when you stick to additive.
- The Role of Idempotency in API Design — idempotency semantics are part of the contract; changing them is a breaking change.
- HTTP — The Foundational Protocol for APIs — the `Deprecation`/`Sunset` headers and content negotiation that make graceful evolution possible.
- What Is API Design? — the whole-system frame; evolution is one of the few things you cannot retrofit.