Securing APIs Using Input Validation

Validate at the boundary, reject early, normalise once. SQL/NoSQL injection, type confusion, the OWASP API top-10.

Building Block Foundational
9 min read
validation security owasp injection schema

What it is#

Input validation is the discipline of rejecting any request that does not match the contract before the request reaches business logic, database calls, or downstream services. A validator checks types, shapes, ranges, formats, and invariants — and rejects anything that fails with a deterministic error.

The principle that does the work is boundary validation: every byte that enters the system from an untrusted source is parsed, validated, and normalised once at the entry point. Internal code then operates on trusted typed objects. The boundary is where the trust transition happens: at the boundary, code must be paranoid; below it, code can be ergonomic.

Validation bears on eight of the OWASP API Security Top 10 (2023): broken object-level authorization (API1), broken authentication (API2), broken object-property-level authorization (API3), unrestricted resource consumption (API4), broken function-level authorization (API5), unrestricted access to sensitive business flows (API6), server-side request forgery (API7), and improper inventory management (API9). A large share of API vulnerabilities trace back to a validation gap: the input was trusted somewhere it should not have been.

When to use it#

Always. The harder question is where to validate:

  • At the API gateway / edge — coarse checks: size limits, rate limits, header validation, basic schema shape. Reject malformed requests before they consume backend resources.
  • At the handler entry — strict schema validation against a contract (OpenAPI, JSON Schema, Pydantic model, Zod schema). Reject any request whose body, headers, query, or path parameters don’t match. This is the boundary.
  • At the persistence layer — invariant checks (foreign key existence, uniqueness). Always present, even when the upstream layer also checked, because the database is the source of truth.

The same value should be validated once per trust boundary, not at every layer. Re-validating identical invariants at every function call wastes CPU and creates inconsistent rejection behaviour. Validate at the boundary; trust below it.

How it works#

Validation has four steps. Every well-designed handler walks them in this order.

1. Parse with a strict parser#

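Use a parser that rejects malformed input with a clear error. JSON parsers should reject duplicate keys (RFC 8259 says names SHOULD be unique but does not forbid duplicates; many parsers silently take the last value, which becomes a security bug when two components disagree about which value wins). YAML parsers should use a safe loader and avoid YAML 1.1's implicit type coercions, such as the bare word no becoming false; the YAML 1.2 core schema narrows these, and the failsafe schema removes them entirely. XML parsers must have external entity expansion disabled (XXE) and DTD processing disabled (billion-laughs DoS).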
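
As a minimal sketch of strict parsing, the standard-library json module can be made to reject duplicate keys through its object_pairs_hook parameter, and to reject the non-standard NaN and Infinity constants through parse_constant. The names strict_loads and DuplicateKeyError are hypothetical, chosen for this illustration.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a key."""


def _reject_duplicates(pairs):
    # json calls this hook with each object's (key, value) pairs in document order.
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


def _reject_constant(name):
    # json accepts NaN, Infinity and -Infinity by default; none of them is valid JSON.
    raise ValueError(f"non-standard JSON constant: {name}")


def strict_loads(raw: bytes):
    """Parse JSON, rejecting invalid UTF-8, duplicate keys and non-standard constants."""
    text = raw.decode("utf-8")  # UnicodeDecodeError is a ValueError subclass
    return json.loads(
        text,
        object_pairs_hook=_reject_duplicates,
        parse_constant=_reject_constant,
    )
```

Every failure mode here surfaces as a ValueError (json.JSONDecodeError and UnicodeDecodeError are both subclasses), so a handler can catch one exception type and return a single deterministic error.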

2. Validate against a typed schema#

The schema declares: every required field, every optional field with default, every type, every constraint (min, max, regex, enum). The validator rejects anything that does not conform. Output: a typed object the rest of the handler can trust.

The best validators in 2026:

  • Python — Pydantic v2 (or attrs with validators).
  • Go — go-playground/validator (struct tags) or schema-first via Protobuf / OpenAPI codegen.
  • Node / TypeScript — Zod, Valibot, or AJV against a JSON Schema.
  • Java — Bean Validation (@Valid + @NotNull, @Size, @Pattern).
  • Rust — serde with the validator crate.
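
To make concrete what a schema validator does, here is a hand-rolled sketch using only the Python standard library. It is an illustration, not a replacement for the libraries above: the CreateUser type, the newsletter field, the ValidationFailure exception, and the deliberately crude email pattern are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

_DISPLAY_NAME = re.compile(r"[\w\s\-']{1,64}")
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # crude placeholder, not RFC-complete


class ValidationFailure(ValueError):
    """The payload does not match the contract."""


@dataclass(frozen=True)
class CreateUser:
    email: str
    display_name: str
    age: int
    newsletter: bool = False  # optional field with a default


_REQUIRED = {"email", "display_name", "age"}
_ALLOWED = _REQUIRED | {"newsletter"}


def validate_create_user(payload: object) -> CreateUser:
    if not isinstance(payload, dict):
        raise ValidationFailure("body must be a JSON object")
    keys = set(payload)
    if keys - _ALLOWED or _REQUIRED - keys:
        raise ValidationFailure("unknown or missing fields")

    email, name, age = payload["email"], payload["display_name"], payload["age"]
    newsletter = payload.get("newsletter", False)

    # Types: exact checks, no coercion. bool is a subclass of int, so exclude it.
    if not isinstance(email, str) or not isinstance(name, str):
        raise ValidationFailure("email and display_name must be strings")
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValidationFailure("age must be an integer")
    if not isinstance(newsletter, bool):
        raise ValidationFailure("newsletter must be a boolean")

    # Constraints: length, format, range.
    if len(email) > 254 or not _EMAIL.fullmatch(email):
        raise ValidationFailure("email is malformed")
    if not _DISPLAY_NAME.fullmatch(name):
        raise ValidationFailure("display_name is malformed")
    if not 13 <= age <= 120:
        raise ValidationFailure("age out of range")

    return CreateUser(email, name, age, newsletter)
```

The return type is the point: callers receive a frozen CreateUser or an exception, never a half-checked dict.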

3. Normalise to canonical form#

Once validated, normalise: lowercase email addresses, strip leading/trailing whitespace from strings (or reject them), normalise Unicode (NFKC), canonicalise URLs (lowercase host, remove default port, decode percent-escapes once). Two equivalent inputs should produce one canonical value before any downstream code sees them.

This is where many identity-confusion bugs hide: validation accepts Bob and bob, then downstream code treats them as different users. Normalise.
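
The normalisation step could be sketched with the standard library as follows. The function names are hypothetical, and the URL helper makes some assumptions of its own: it accepts only absolute http(s) URLs, rejects embedded credentials, and leaves percent-decoding out of scope.

```python
import unicodedata
from urllib.parse import urlsplit, urlunsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def normalise_text(value: str) -> str:
    # NFKC folds compatibility forms (e.g. full-width letters) into one representation.
    return unicodedata.normalize("NFKC", value).strip()


def normalise_email(value: str) -> str:
    return normalise_text(value).lower()


def canonicalise_url(value: str) -> str:
    parts = urlsplit(value.strip())
    scheme = parts.scheme  # urlsplit already lowercases the scheme
    if scheme not in _DEFAULT_PORTS or parts.hostname is None:
        raise ValueError("only absolute http(s) URLs are accepted")
    if parts.username is not None or parts.password is not None:
        # Assumption of this sketch: credentials in URLs are rejected outright.
        raise ValueError("userinfo is not allowed")
    host = parts.hostname  # urlsplit already lowercases the hostname
    if ":" in host:
        host = f"[{host}]"  # restore the brackets around IPv6 literals
    port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    netloc = host if port in (None, _DEFAULT_PORTS[scheme]) else f"{host}:{port}"
    return urlunsplit((scheme, netloc, parts.path or "/", parts.query, parts.fragment))
```

Two spellings of the same address or URL now collapse to one value, so uniqueness checks and allowlist lookups compare like with like.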

4. Pass typed objects, not strings#

Below the boundary, code receives Email, OrderID, UserID — not str. Database calls use parameterised queries (SELECT * FROM users WHERE id = $1), never string interpolation. Subprocess calls use argument lists (subprocess.run(["ls", path])), never shell strings. HTTP calls use libraries that escape URL components.

The combination — typed objects + parameterised everything — closes most injection attack surfaces by construction.
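
A small sketch of that combination, using the standard-library sqlite3 driver. Note that sqlite3 uses ? placeholders where PostgreSQL drivers use $1 or %s. The Email and UserID wrappers, the users table, and find_user_id are hypothetical names for this illustration, and the Email check is deliberately minimal.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserID:
    value: int

    def __post_init__(self):
        if isinstance(self.value, bool) or not isinstance(self.value, int) or self.value <= 0:
            raise ValueError("UserID must be a positive integer")


@dataclass(frozen=True)
class Email:
    value: str

    def __post_init__(self):
        # Only already-normalised addresses can be constructed.
        if "@" not in self.value or self.value != self.value.strip().lower():
            raise ValueError("Email must be a normalised address")


def find_user_id(conn: sqlite3.Connection, email: Email) -> Optional[UserID]:
    # The value travels separately from the SQL text, so it is never parsed as SQL.
    row = conn.execute(
        "SELECT id FROM users WHERE email = ?", (email.value,)
    ).fetchone()
    return UserID(row[0]) if row else None
```

Because find_user_id accepts an Email rather than a str, a caller cannot pass raw request data by accident, and even a value that slips past the wrapper's check is bound as data rather than interpolated into the query.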

Validating a request payload#

A canonical example: a POST /users handler that accepts a JSON body with email, display_name, and age, validates strictly, and either returns a typed object or a 400 Bad Request.

Pydantic v2 — Python
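```python
from fastapi import FastAPI, Request
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse
from pydantic import BaseModel, EmailStr, Field

app = FastAPI()


class CreateUserRequest(BaseModel):
    # extra="forbid" rejects unknown fields (anti-mass-assignment).
    model_config = {"extra": "forbid"}

    email: EmailStr  # needs the email-validator package
    display_name: str = Field(min_length=1, max_length=64, pattern=r"^[\w\s\-']+$")
    age: int = Field(ge=13, le=120, strict=True)  # COPPA floor; strict rejects "42"


# FastAPI answers validation failures with 422 by default; map them to 400.
@app.exception_handler(RequestValidationError)
async def reject_invalid(request: Request, exc: RequestValidationError):
    return JSONResponse(status_code=400, content={"detail": "Bad Request"})


@app.post("/users")
def create_user(req: CreateUserRequest):
    # req is typed and validated. Normalise once; downstream code trusts the result.
    normalised_email = req.email.strip().lower()
    # Use parameterised queries, never f"... WHERE email = '{normalised_email}'"
    return {"status": "created"}
```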

Whatever the language or framework, the handler does the same thing: parse, validate, reject early, pass typed objects downward. Every public API endpoint has roughly this shape.

Variants#

The categories of injection that input validation defends against, with the canonical fix in each case:

| Category | Example | Primary defence |
| --- | --- | --- |
| SQL injection | ' OR 1=1 -- in a username field. | Parameterised queries. Never concatenate strings into SQL. |
| NoSQL injection | {"$gt": ""} as a password value in MongoDB. | Validate that the value is a string. Reject any object shape where a scalar was expected. |
| Command injection | ; rm -rf / appended to a filename. | subprocess.run(["cmd", arg]) with an argument list; never shell=True. |
| Path traversal | ../../../etc/passwd as a filename. | Canonicalise the path; reject any resolved path that escapes the allowed root. |
| XSS reflection | <script> echoed into a response. | Output encoding at the rendering layer + a strict CSP. |
| XXE (XML) | <!ENTITY xxe SYSTEM "file:///etc/passwd">. | Disable external-entity resolution in the XML parser. |
| SSRF | A user-supplied URL the server fetches: http://169.254.169.254/.... | Allowlist destinations, reject private IP ranges, disable redirects on the fetcher. |
| Header injection | \r\n in a value used to construct an outgoing header. | Reject control characters in any value used in a header. |
| Mass assignment | A request that sets is_admin: true on a PATCH /users/me. | Strict schema: reject unknown fields. Never User.update(**request.json). |
| Type confusion | "42" instead of 42 in a numeric field. | Strict type validation; reject when type doesn’t match. |
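
Two of these defences are easy to get subtly wrong, so here is a hedged sketch of each using the standard library: resolving a user-supplied path against an allowed root, and rejecting non-public addresses before a server-side fetch. The function names are hypothetical. The address check assumes the caller has already resolved the hostname and then connects to that same address; otherwise a second DNS lookup can return something different.

```python
import ipaddress
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path under root; reject anything that escapes root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError("path escapes the allowed root")
    return candidate


def is_public_address(ip_text: str) -> bool:
    """True only for globally routable addresses; raises ValueError if not an IP."""
    ip = ipaddress.ip_address(ip_text)
    # is_global is False for loopback, link-local (169.254.0.0/16) and private ranges.
    return ip.is_global
```

Checking the resolved path rather than the raw string is what makes this robust: a filter that looks for ".." in the input misses absolute paths and symlinks, while the resolved-path comparison catches all three.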

Trade-offs#

What strict input validation buys you:

  • Closes the largest class of API vulnerabilities at the source. Most CVE-rated API bugs trace to a validation gap.
  • Cheap to add up front, expensive to retrofit. Validation belongs in the same commit as the endpoint.
  • Doubles as API documentation. A Pydantic / Zod / Protobuf schema is a contract that humans and machines can both read.
  • Catches bugs that aren’t security bugs. A type-confused "42" instead of 42 corrupts data the same way an attacker’s input would.

What it costs:

  • Friction with permissive clients. Strict validation rejects payloads that older clients send “loosely”. Roll out with logging-only first, then with strict rejection.
  • Schema-drift maintenance. When the schema and the code disagree, one of them is wrong: a stale schema silently rejects valid requests or lets through payloads the code no longer handles. CI should test the schema against representative payloads.
  • Performance at high QPS. Validation has a CPU cost — usually < 1% of handler time, occasionally meaningful for ultra-low-latency endpoints. Pre-compile regexes, reuse validator instances.
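
The logging-only rollout and the pre-compiled regex advice can be sketched together. Everything named here (admit, validate_order, the SKU format) is hypothetical; the enforce flag stands in for whatever feature-flag mechanism a service already has.

```python
import logging
import re
from typing import Callable

log = logging.getLogger("validation.rollout")

# Compiled once at import time, not on every request.
_SKU = re.compile(r"[A-Z]{3}-[0-9]{4}")


def validate_order(payload: dict) -> None:
    """Hypothetical validator: raises ValueError when the payload breaks the contract."""
    sku = payload.get("sku")
    if not isinstance(sku, str) or not _SKU.fullmatch(sku):
        raise ValueError("sku must look like ABC-1234")


def admit(payload: dict, validate: Callable[[dict], None], enforce: bool) -> bool:
    """Return True if the request may proceed to the handler."""
    try:
        validate(payload)
    except ValueError as exc:
        # Always record the failure so the rollout can be measured.
        log.warning("validation failed (enforce=%s): %s", enforce, exc)
        return not enforce  # report-only mode lets the request through
    return True
```

Running with enforce=False first shows which clients would break; once the warning rate is acceptably low, flipping the flag turns the same check into a hard rejection.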

Common pitfalls#

  • Denylist filtering. Trying to strip <script>, ' OR , .. from input. Always incomplete; bypasses appear monthly. Use allowlists.
  • Validating only at the gateway. The gateway can catch malformed JSON but does not know the business invariants. The handler must validate too.
  • Trusting Content-Type. A client can send Content-Type: application/json with arbitrary bytes. Parse strictly, do not trust the header alone.
  • No body-size cap. Without a limit on the bytes actually read, an attacker sends a 4GB JSON document and your process OOMs. Checking Content-Length alone is not enough, because chunked requests omit it. Cap the body at the smallest size that fits real traffic plus a margin.
  • Decoding without re-validating. Validation runs on the raw input; if the handler then base64-decodes or URL-decodes that input, the decoded value must be re-validated.
  • Ignoring header validation. Headers are inputs too. Authorization, X-Forwarded-For, Host — all can be attacker-controlled. Validate them like body fields.
  • Treating “internal” calls as trusted. Service-mesh requests are still untrusted by default — a compromised service can call another. mTLS + validation, not “we’re inside the VPC so it’s fine”.
  • Accepting unbounded arrays. tags: [...] should always have a maximum length. Without it, a single request can pin a CPU for minutes.
  • Missing rate-limit-as-validation. A request that passes type validation but is the 10,000th from the same key in a second is still an attack. Validation and rate limiting are complementary, not alternatives.
  • Leaking validation details in errors. “Field internal_user_token failed regex ^[a-f0-9]{64}$” tells an attacker the field name and format. Return generic 400 Bad Request to external clients; log the detail server-side.
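
Two of these pitfalls have short mechanical fixes, sketched here with the standard library. The names read_capped, BodyTooLarge, and client_error are hypothetical, as is the 64 KiB limit; most frameworks and gateways offer an equivalent setting that should be preferred where available.

```python
import logging
from typing import BinaryIO

log = logging.getLogger("api.validation")

MAX_BODY_BYTES = 64 * 1024  # assumed limit; size it from real traffic


class BodyTooLarge(ValueError):
    """The request body exceeded the configured cap."""


def read_capped(stream: BinaryIO, limit: int = MAX_BODY_BYTES) -> bytes:
    """Read at most `limit` bytes, counting what actually arrives."""
    chunks, total = [], 0
    while True:
        # Never ask for more than one byte past the limit.
        chunk = stream.read(min(8192, limit + 1 - total))
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise BodyTooLarge(f"body exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def client_error(exc: Exception, request_id: str):
    """Log the detail server-side; give the client a generic 400."""
    log.info("rejected request %s: %s", request_id, exc)
    return 400, {"error": "Bad Request", "request_id": request_id}
```

Returning the request ID lets support staff find the detailed log entry without exposing field names or patterns to the caller.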