Client-Side Error Monitoring — System Design

Use cases#

Client-side errors are the half of your stack you can’t ssh into. Three reasons to capture them:

Production bug discovery — a TypeError: undefined is not a function on a customer’s Safari/iOS happens every day; without client error monitoring, you’ll never know about it.
Release validation — a CDN deploy broke iOS 16 but not iOS 17? The error stream tells you within minutes — long before App Store reviews surface it.
Performance regressions — the same SDK that captures errors usually captures Core Web Vitals (LCP, INP, CLS), giving you UX regression detection paired with the offending build.

Contrast this with Server-Side Error Monitoring: the client is a different environment with very different constraints, namely an adversarial network, an untrusted runtime, source maps for symbolication, and a hard cap on payload size.

Functional requirements#

Capture uncaught JavaScript errors, unhandled Promise rejections, and native crashes (iOS/Android).
Capture explicit captureException(e) from app code.
Record event context: URL, browser, OS, viewport, locale, user (if identified), recent breadcrumbs.
Attach minified stack traces; resolve to source via uploaded source maps server-side.
Throttle, sample, and batch sends to avoid amplifying outages.
Survive page unloads via the sendBeacon API.

Non-functional requirements#

SDK size: under 30 KB gzipped is the practical limit before product teams start asking why error monitoring is making the page slower than the bugs it catches.
Runtime overhead: under 5 ms init, no detectable impact on Core Web Vitals.
Ingest throughput: orders of magnitude higher than server-side — every page view is a potential event source. Sentry’s docs cite ~3M errors/sec peak.
Reliability of capture: errors fire while the page is dying. Must use beacon transport and disk-buffer through to the next session if possible.

High-level design#

   browser                                ingest                           UI
  ─────────                              ────────                       ────────
  window.onerror ─┐
  unhandledrej. ──┼─> SDK ──> queue ──> sendBeacon ──> CDN edge ──> Kafka ──> processor
  React EB    ────┤   buffer           (HTTP POST)        accepts/rate-limits   │
  app code    ────┘   <30 KB           up to 64 KB                              │
                      session                                          source maps applied
                      replay (optional)                                fingerprint, group
                                                                       dashboard, alerts

The SDK lives in every browser session — millions of independent edge nodes you don’t control. The ingest API has to accept events from any IP, validate them against per-DSN rate limits, and feed a server-side processor that does the heavy lifting (symbolication, grouping, enrichment).
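
To make the per-DSN rate limiting at the ingest edge concrete, here is a minimal token-bucket sketch. The class name DsnRateLimiter, the capacity, and the refill rate are all assumptions for illustration; a real edge deployment would likely keep this state in a shared store rather than in process memory.

```javascript
// Hypothetical per-DSN token bucket. Each DSN gets `capacity` tokens that
// refill at `refillPerSec`; an event is accepted only if a token is available.
class DsnRateLimiter {
  constructor({ capacity = 100, refillPerSec = 10 } = {}) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.buckets = new Map(); // dsn -> { tokens, lastMs }
  }

  // Returns true to accept the event, false to reject it (e.g. with HTTP 429).
  allow(dsn, nowMs = Date.now()) {
    let bucket = this.buckets.get(dsn);
    if (!bucket) {
      bucket = { tokens: this.capacity, lastMs: nowMs };
      this.buckets.set(dsn, bucket);
    }
    // Refill in proportion to the time elapsed since the last call.
    const elapsedSec = Math.max(0, nowMs - bucket.lastMs) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSec * this.refillPerSec
    );
    bucket.lastMs = nowMs;
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

Rejecting early at the edge keeps a broken release for one project from starving the processor for everyone else; the SDK can treat a rejection as a signal to back off.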

Detailed design#

What to capture in the browser#

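window.addEventListener('error', e => {
   // e.error is null for cross-origin scripts loaded without CORS ("Script error."),
   // so fall back to the message text instead of capturing null.
   captureException(e.error ?? new Error(e.message), {
      type: 'uncaught',
      url: e.filename,
      line: e.lineno,
      col: e.colno,
   });
});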

window.addEventListener('unhandledrejection', e =>
   captureException(e.reason, { type: 'unhandled_promise' }));

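Frameworks expose their own hooks, which the SDK wraps: React has error boundaries, Vue has app.config.errorHandler and onErrorCaptured, and recent Svelte versions add <svelte:boundary>. For native mobile, the SDK installs uncaught-exception and signal handlers (SIGSEGV, SIGABRT).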
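
The error listener receives more shapes than a tidy Error object. As a sketch, the hypothetical helper normalizeErrorEvent below (not part of any real SDK) separates three cases: a real Error, a thrown non-Error value such as a string, and the opaque "Script error." that browsers report for cross-origin scripts loaded without CORS.

```javascript
// Hypothetical helper: convert an ErrorEvent-like object into a plain payload.
function normalizeErrorEvent(e) {
  const err = e.error;
  // Opaque cross-origin errors report an empty filename and 0 for line/col.
  const location = {
    url: e.filename || null,
    line: e.lineno || null,
    col: e.colno || null,
  };
  if (err instanceof Error) {
    return {
      type: 'uncaught',
      name: err.name,
      message: err.message,
      stack: err.stack || null,
      ...location,
    };
  }
  if (err !== null && err !== undefined) {
    // Something other than an Error was thrown, e.g. `throw "boom"`.
    return {
      type: 'uncaught',
      name: 'NonErrorThrown',
      message: String(err),
      stack: null,
      ...location,
    };
  }
  // No error object at all: typically a cross-origin script without CORS.
  return {
    type: 'uncaught',
    name: 'OpaqueScriptError',
    message: e.message || 'Script error.',
    stack: null,
    ...location,
  };
}
```

Tagging the opaque case explicitly makes it easy to count how many events are lost to missing crossorigin attributes or CORS headers on third-party scripts.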

Context to include:

User agent parsed into browser, OS, device class, viewport, locale.
Page state: URL, route, document title, scroll position.
Network: effective connection type (navigator.connection.effectiveType, where the browser implements it) — useful for “this only happens on 3G”.
Breadcrumbs: last 20-50 events — clicks, navigations, console logs, XHR/fetch calls.
Memory: performance.memory if available (Chromium only).
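
Breadcrumbs are usually held in a small bounded buffer so memory stays flat however long the session runs. A minimal sketch, assuming a hypothetical BreadcrumbBuffer class and a default of 30 entries (inside the 20-50 range above):

```javascript
// Hypothetical bounded breadcrumb buffer: keeps only the newest `maxSize` entries.
class BreadcrumbBuffer {
  constructor(maxSize = 30) {
    this.maxSize = maxSize;
    this.items = [];
  }

  add(category, message, timestamp = Date.now()) {
    this.items.push({ category, message, timestamp });
    // Evict the oldest entry once the cap is exceeded.
    if (this.items.length > this.maxSize) this.items.shift();
  }

  // Return a copy so later breadcrumbs do not mutate an event already built.
  snapshot() {
    return this.items.slice();
  }
}
```

The SDK would call add() from its click, navigation, console, and fetch instrumentation, and attach snapshot() to each captured event.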

Source maps and symbolication#

Modern frontends ship minified JS. A TypeError at a.b:1:1234 is useless without source maps. The pipeline:

Build emits app.<hash>.js and app.<hash>.js.map.
CI uploads the source maps to the error tracker keyed by release hash.
Map files are kept private — exposing them publicly leaks source code.
On ingest, the processor matches app.<hash>.js in the stack frame, fetches the map, resolves to original file/line.
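
The matching step can be sketched as follows. This covers only the lookup half of symbolication: parsing Chrome-style stack frames and deriving which map file to fetch. The storage key layout (release/basename.map) and the function names are assumptions; Firefox and Safari use a different frame format (fn@url:line:col) that this regex does not handle, and resolving the position inside the map needs a source-map decoding library, which is omitted here.

```javascript
// Matches Chrome-style frames:
//   "at fn (https://host/app.3f9c2a.js:1:1234)"  or  "at https://host/app.js:1:99"
const CHROME_FRAME = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?$/;

// Parse a stack string into frames; lines that are not frames are skipped.
function parseStack(stack) {
  return stack
    .split('\n')
    .map((line) => CHROME_FRAME.exec(line))
    .filter((m) => m !== null)
    .map(([, fn, file, line, col]) => ({
      fn: fn ?? null,
      file,
      line: Number(line),
      col: Number(col),
    }));
}

// Hypothetical storage layout: maps are stored as "<release>/<basename>.map".
function sourceMapKey(release, fileUrl) {
  const path = new URL(fileUrl).pathname;
  const basename = path.slice(path.lastIndexOf('/') + 1);
  return `${release}/${basename}.map`;
}
```

Keying by release as well as filename matters when a build tool reuses filenames across deploys: the wrong map resolves to plausible but incorrect source lines.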

The equivalents on other platforms: dSYM files for iOS, ProGuard/R8 mapping files for Android, and debug symbol files for Flutter.

Beacon transport#

Errors fire during page unloads — exactly when normal fetch requests get cancelled. The fix is the Beacon API:

navigator.sendBeacon('/ingest', JSON.stringify(event));

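Beacons are queued by the browser and sent in the background, so they survive page navigation and tab close. Delivery is best-effort rather than guaranteed: sendBeacon returns false when the browser cannot queue the payload, and a browser crash can still lose queued data. Browsers cap queued beacon data at roughly 64 KB, and that budget is shared by all in-flight beacons from the page; if your event is larger, truncate breadcrumbs.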

Fallback: hidden <img> with the payload encoded into the URL. Limited to a few KB but works everywhere.
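
The truncation rule can be sketched like this. The 60 KB budget (leaving headroom under the ~64 KB limit) and the names fitToBeacon and sendEvent are assumptions; the navigator object is passed in so the logic can be exercised outside a browser.

```javascript
const MAX_BEACON_BYTES = 60 * 1024; // assumed margin below the ~64 KB cap

function byteLength(str) {
  return new TextEncoder().encode(str).length; // UTF-8 size, not string length
}

// Drop the oldest breadcrumbs until the serialized event fits.
// Returns the JSON body, or null if it is too large even with no breadcrumbs.
function fitToBeacon(event, maxBytes = MAX_BEACON_BYTES) {
  const copy = { ...event, breadcrumbs: [...(event.breadcrumbs || [])] };
  let body = JSON.stringify(copy);
  while (byteLength(body) > maxBytes && copy.breadcrumbs.length > 0) {
    copy.breadcrumbs.shift();
    body = JSON.stringify(copy);
  }
  return byteLength(body) <= maxBytes ? body : null;
}

// Returns true only if the browser accepted the beacon into its queue.
function sendEvent(event, url, nav = globalThis.navigator) {
  const body = fitToBeacon(event);
  if (body === null) return false;
  if (nav && typeof nav.sendBeacon === 'function') {
    return nav.sendBeacon(url, body);
  }
  return false; // the caller can fall back to another transport here
}
```

Dropping the oldest breadcrumbs first keeps the actions closest to the error, which tend to be the most useful for debugging.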

Sampling#

Naive “send every error” creates a self-DDoS during a deployment that breaks every page view. Sampling strategies:

Session sampling — randomly include 10% of sessions; all errors from those sessions captured, none from the rest. Preserves per-session causality.
Event sampling — keep first N events per fingerprint per session.
Tier sampling — sample free-tier users at 1%, paid at 100%.
Performance sampling — capture all errors, but only sample 1% of regular transactions for Core Web Vitals data.
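
Session sampling and per-fingerprint event sampling can be sketched together. Hashing the session ID (rather than calling Math.random per event) makes the decision stable for the whole session, which is what preserves per-session causality. The FNV-1a hash, the function names, and the default cap of 5 are illustrative choices, not taken from any particular SDK.

```javascript
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Deterministic session sampling: the same session ID always gets the same answer.
function isSessionSampled(sessionId, rate) {
  return fnv1a(sessionId) / 0x100000000 < rate; // map the hash into [0, 1)
}

// Event sampling: allow only the first N events per fingerprint in a session.
class FingerprintCap {
  constructor(maxPerFingerprint = 5) {
    this.max = maxPerFingerprint;
    this.counts = new Map();
  }

  shouldSend(fingerprint) {
    const seen = this.counts.get(fingerprint) || 0;
    this.counts.set(fingerprint, seen + 1);
    return seen < this.max;
  }
}
```

An SDK would typically check isSessionSampled once at init and consult FingerprintCap on every captured event, so an error thrown in a render loop costs a handful of sends instead of thousands.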

PII scrubbing#

The browser sees credit cards, passwords, SSNs, addresses. Scrub at the SDK before serialization:

SDK.config({
   denyUrls: [/password/, /credit-card/, /\/checkout/],
   scrubFields: ['password', 'token', 'ssn', 'credit_card', 'email'],
   beforeSend: (event) => redactPII(event),
});

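Pattern-based scrubbing isn’t bulletproof, so add server-side scrubbing as a defense-in-depth layer. Under GDPR and CCPA, email addresses and device fingerprints count as personal data, so capturing them needs a lawful basis (often user consent) and appropriate disclosure; confirm the exact requirements for the jurisdictions you operate in.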
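
The config above leaves redactPII undefined. One possible shape for it is sketched below, assuming events are plain JSON-like objects without cycles. The field list mirrors scrubFields; the card-number pattern and the '[Filtered]' placeholder are assumptions, and the pattern is deliberately crude (it will also catch other long digit runs that appear inside strings).

```javascript
const SCRUB_FIELDS = new Set(['password', 'token', 'ssn', 'credit_card', 'email']);
// 13-19 digits, optionally separated by spaces or dashes (card-like numbers).
const CARD_LIKE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Recursively return a redacted copy; the input object is left untouched.
function redactPII(value, key = '') {
  if (SCRUB_FIELDS.has(key.toLowerCase())) return '[Filtered]';
  if (typeof value === 'string') return value.replace(CARD_LIKE, '[Filtered]');
  if (Array.isArray(value)) return value.map((item) => redactPII(item));
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, redactPII(v, k)])
    );
  }
  return value; // numbers, booleans, null, undefined pass through
}
```

Key-based filtering catches structured fields; the string pattern is a second net for values that leak into free text such as error messages or breadcrumb labels.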

Session replay#

Tools like LogRocket, FullStory, and Sentry’s Replay capture DOM mutations and play them back as a video-like reconstruction of the session. The privacy stakes are far higher than for plain error events: every form field, every modal, every visible PII element shows up in playback. Default masking on input elements is standard; explicit allow-lists for which fields can be captured plaintext are best practice.

Storage cost is also significant — ~50 KB/min of compressed session data per user is typical. Most teams sample at 1-5% of sessions.
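
A quick estimate shows why sampling matters. The traffic figures in the usage note (1M sessions/day, 5-minute average session, 2% sample rate) are hypothetical; only the ~50 KB/min figure comes from the text above.

```javascript
// Estimate daily replay storage in decimal gigabytes (1 GB = 1e6 KB).
function replayStorageGBPerDay({
  sessionsPerDay,
  avgSessionMinutes,
  sampleRate,
  kbPerMinute = 50,
}) {
  const sampledMinutes = sessionsPerDay * sampleRate * avgSessionMinutes;
  return (sampledMinutes * kbPerMinute) / 1e6;
}
```

With those hypothetical inputs, replayStorageGBPerDay({ sessionsPerDay: 1e6, avgSessionMinutes: 5, sampleRate: 0.02 }) comes to roughly 5 GB per day; recording every session instead would be about 250 GB per day.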

Mobile-specific concerns#

Native apps add:

Offline buffering — capture, write to disk, send when network returns; can be hours later.
App lifecycle events — capture state at backgrounding, foregrounding, low-memory warnings.
Out-of-process crashes — iOS / Android send the OS-level crash report on next app launch; the SDK reads it from a system directory.
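
The offline-buffering logic can be sketched as a bounded, persisted queue. A native SDK would write this in Swift or Kotlin against the file system; JavaScript is used here only to stay consistent with the other examples. The OfflineQueue class, the storage key, and the 100-event cap are assumptions, and storage is any object with localStorage-style getItem/setItem methods.

```javascript
// Hypothetical persisted queue: bounded so a long offline period cannot fill the disk.
class OfflineQueue {
  constructor(storage, { key = 'pending_events', maxEvents = 100 } = {}) {
    this.storage = storage;
    this.key = key;
    this.maxEvents = maxEvents;
  }

  load() {
    try {
      return JSON.parse(this.storage.getItem(this.key) || '[]');
    } catch {
      return []; // corrupted buffer: start over rather than crash the app
    }
  }

  save(events) {
    this.storage.setItem(this.key, JSON.stringify(events));
  }

  // Persist immediately; keep only the newest `maxEvents` entries.
  enqueue(event) {
    const events = this.load();
    events.push(event);
    this.save(events.slice(-this.maxEvents));
  }

  // Send in order; stop at the first failure and keep the rest for later.
  // `send` returns (or resolves to) true on success.
  // Simplification: assumes enqueue() does not trim the queue mid-flush.
  async flush(send) {
    let sent = 0;
    for (const event of this.load()) {
      let ok = false;
      try {
        ok = await send(event);
      } catch {
        ok = false;
      }
      if (!ok) break;
      sent += 1;
    }
    this.save(this.load().slice(sent));
    return sent;
  }
}
```

The app would call flush() when connectivity returns or on the next launch. Each event should carry its original capture timestamp, since delivery may happen hours later.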

Trade-offs#

Heavy SDK with replay + breadcrumbs + APM — debugging gold; reconstruct exactly what the user did. 60-150 KB SDK, measurable page-load cost, big bandwidth bill, privacy risk.

Minimal SDK (error capture only) — 10-20 KB, negligible cost, and a much smaller privacy surface. Errors arrive with a stack trace and basic context but no replay; harder cases stay mysterious.

Other axes:

Per-user identification vs anonymous — identifying users speeds support but adds GDPR/CCPA obligations. The right answer depends on the product and user consent flow.
Direct ingest vs CDN proxy — direct ingest is one fewer hop; CDN proxy lets the SDK send to a same-origin URL, bypassing ad-blockers (which routinely block Sentry / Bugsnag domains).
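Bundled vs standalone session replay — a recorder bundled with the error SDK shares its release tagging and event context; a standalone DOM recorder such as rrweb is portable across vendors but is one more script to ship and maintain.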

Real-world examples#

Sentry Browser SDK — open-source, ~30 KB; supports React, Vue, Svelte, Angular adapters; session replay add-on.
LogRocket — session replay first, error tracking second. Used heavily in B2B SaaS where individual customer issues warrant per-session investigation.
Datadog RUM (Real User Monitoring) — combines errors with Core Web Vitals and resource timing.
Bugsnag — strong mobile story; Android SDK handles native crashes via JNI bridges.
Crashlytics (Google) — free Android/iOS crash reporting; ~1B installs.
Microsoft Clarity — free session replay + heatmaps for ~100M websites.

Related building blocks#

Server-Side Error Monitoring — same problem on the backend, with very different constraints.
Distributed Logging — beacons feed the same downstream pipeline.
Content Delivery Network — proxying ingest through your own CDN bypasses ad-blockers.
Distributed Monitoring — RUM metrics flow into the same dashboards as backend metrics.