Distributed Logging

Log shipping, structured fields, aggregation, retention tiers, search-on-logs vs metrics.

Building Block Intermediate
7 min read
logging observability log-aggregation
Companies this resembles: Fluent Bit · Loki · Splunk · Elasticsearch · Vector

Use cases#

A distributed logging system collects, transports, indexes, stores, and queries log events from every service in your fleet. Distinct from Distributed Monitoring (metrics) in that logs preserve full per-event detail rather than pre-aggregating. The dominant use cases:

  • Production debugging — “what did request 87a3 do?” The log line is the ground truth.
  • Audit and compliance — security events, access logs, payment events; SOC 2, HIPAA, PCI require log retention with integrity.
  • Postmortem reconstruction — sequence of events leading to an outage. Stitching together logs across services is often the only way to figure out cascade failures.
  • Real-time alerting on patterns — alert if any service logs OutOfMemoryError 5 times in 1 min.
  • Analytics on application events — feeding logs into BigQuery / Snowflake for product analytics.

Functional requirements#

  • Each service emits structured (JSON, logfmt) or unstructured logs.
  • A collector tails files / stdout / journald per host and ships to a central pipeline.
  • The pipeline enriches (add host, region, service, version), filters (drop noisy lines), and routes to one or more sinks.
  • Storage layer holds logs for a retention window with searchable indexing.
  • Query layer lets engineers grep / filter / aggregate by service, time, severity, custom fields.

Non-functional requirements#

  • Ingest throughput: a thousand-host system emits 100k-1M log lines/sec. The pipeline must absorb without dropping.
  • Storage: at ~500 bytes/line and 1M lines/sec, that’s 43 TB/day raw. Compression and tiering bring it down 5-20×.
  • Query latency: ad-hoc query p95 under 10 s on a day’s worth of logs.
  • Reliability: must survive collector restart, broker outage, pipeline backpressure without losing logs. The on-disk buffer is non-negotiable.
  • End-to-end delay: production trail under 30 s from log line emit to searchable. Debug-friendly.

High-level design#

host pipeline storage / query
───────── ────────────── ─────────────────────
app stdout/file ─┐
syslog ─┼──> Fluent Bit / Vector / Filebeat (per host)
containers ─┘ │ tail, parse, enrich, buffer to disk
Kafka / Kinesis / Pulsar (durable transport)
Processors (parse, route, redact, sample)
┌───────────┼────────────┬──────────────┐
▼ ▼ ▼ ▼
Loki OpenSearch ClickHouse S3 (cold storage)
(hot tier) (hot tier) (analytics) (archive)
Grafana / Kibana / SQL

Three logical stages: collect (per-host agent), transport (durable queue), store + query (indexed backend + archive).

Detailed design#

Structured logging#

A log line carrying a free-form string is hard to query; a log line carrying structured fields is queryable forever:

{
"ts":"2024-05-16T10:00:00Z",
"level":"error",
"service":"checkout",
"trace_id":"abc123",
"user_id":42,
"duration_ms":342,
"route":"/api/v1/orders",
"msg":"failed to charge card",
"err":"card_declined"
}

Every modern logging library emits structured logs (zap, slog, winston, pino, structlog). The pipeline preserves the structure end-to-end so queries become service:checkout AND level:error AND err:card_declined.

Collection: tail and ship#

Per-host agents (Fluent Bit, Vector, Filebeat, Datadog Agent, Promtail) tail log files and / or stdout. Modern containers route stdout to a per-container file on the host; the agent tails them all. Kubernetes’ canonical pattern: a DaemonSet runs one agent per node.

Critical features:

  • On-disk buffer with checkpoint, so a pipeline outage doesn’t lose logs.
  • Backpressure — slow down or drop low-priority logs when downstream is congested.
  • At-least-once delivery to the transport — the application has already committed the event by writing it to disk.

Transport: durable queue#

Kafka, Kinesis, or Pulsar between the collector and the storage tier serve three purposes:

  • Buffering — absorbs ingestion spikes without overloading storage.
  • Multi-consumer fan-out — the same log stream feeds the hot search index, the cold S3 archive, and any number of downstream analytics jobs.
  • Replay — re-process the last hour of logs after fixing a parser bug.

For low-volume systems (small startup), the agent can ship directly to the backend; Kafka is justified once you’re in the 50k+ events/sec range.

Storage and indexing#

Two camps, paralleling the metrics-side debate:

Fully-indexed (Elasticsearch / OpenSearch / Splunk) — every field is searchable, aggregations are fast, query language is rich. Storage cost is 3-5× the raw log size; ingest tops out at ~100k events/sec/node.
Label-indexed (Loki, ClickHouse) — only a small set of labels (service, host, level) are indexed. Raw logs sit in object storage and are scanned by time-and-label. ~80% cost reduction; full-text search is a linear scan over a small window.

The Loki bet: most queries narrow by service and time first, so label-indexing wins. The OpenSearch / Splunk bet: free-form ad-hoc queries dominate, so full indexing pays for itself.

ClickHouse occupies a third position — columnar storage with native SQL — and is gaining ground (Cloudflare’s logging stack runs on it).

Retention tiering#

Hot (0-7 days) — indexed, fast queries. Loki, OpenSearch. SSDs.
Warm (7-30 days) — searchable but slower; bigger blocks, less compute.
Cold (30+ days) — archived raw logs in S3. Query via Athena / restore.
Frozen (compliance) — write-once, sealed, decades.

The 80/20: 95% of queries hit the last 24 hours. Spending hot-storage money on year-old logs is the most common cost mistake.

Sampling and filtering#

Not every log line is worth keeping forever. Strategies:

  • Drop noisy lines at the collector — health-check logs, debug-level lines in prod.
  • Sample by trace — keep all spans for a sampled trace; drop unsampled ones. Same heuristics as Distributed Monitoring traces.
  • Rate-limit per service — a runaway service that logs 100k lines/sec must not blow up the whole pipeline.
  • PII redaction — regex-strip credit cards, SSNs, emails. Defense-in-depth (SDK + pipeline + storage).

Correlation with traces and metrics#

Logs become exponentially more useful when correlated:

  • Trace ID injection — every log line in a request carries the trace_id. Click a slow trace → jump to its logs.
  • Service-name standardization — same label scheme across logs, metrics, traces.
  • Common dashboards — Grafana panels show metric, logs, and traces side-by-side.

This is the OpenTelemetry pitch and the modern observability default.

Trade-offs#

Other axes:

  • Push vs pull collection — push (agent → broker) is the dominant pattern; pull (Prometheus-style scrape) doesn’t really apply to logs since they’re discrete events, not gauges.
  • JSON vs binary log format — JSON is universal but verbose; binary (Protobuf, MessagePack) is 3-5× smaller. Most pipelines use JSON for portability and rely on compression.
  • Self-hosted vs SaaS — Loki + Grafana + S3 is cheap to run but operationally significant. Datadog / Splunk / Sumo Logic are turnkey but bill per-GB at high markup.
  • Append-only vs indexed-mutable — append-only is faster ingestion; indexed-mutable supports update / delete (rarely needed for logs except for GDPR right-to-erasure).

Real-world examples#

  • Loki + Grafana + Promtail — the open low-cost stack. Cloudflare, GitLab, and many cost-conscious teams run this.
  • OpenSearch / Elasticsearch + Kibana + Filebeat — the rich-query stack. AWS, Netflix Mantis, and most enterprise log analytics.
  • Splunk — the enterprise default for the last 20 years; powerful query language (SPL), expensive.
  • Datadog Logs — agent + hosted backend + correlation with metrics and APM. Heavy at YC and fintech shops.
  • Cloudflare — internally migrated to ClickHouse; ingests trillions of events/day across edge fleet.
  • Vector (Datadog open source) — the fastest collector in the field; written in Rust.
  • Honeycomb — collapses logs and traces into wide events; the unification thesis.
Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.