Sequencer

Globally-ordered unique IDs with causality. Snowflake, TrueTime, hybrid logical clocks.

Building Block Advanced
6 min read
distributed-systems ids ordering
Companies this resembles: Twitter Snowflake · Google Spanner · Instagram · MongoDB

Use cases#

A sequencer produces monotonically increasing, globally unique identifiers. Three pressing reasons distributed systems need one:

  • Primary keys at scale — auto-incrementing IDs require a coordinator; in a sharded system, every shard rolls its own and collides. A sequencer issues IDs that are unique without coordination.
  • Causal ordering — “did A happen before B?” answered without a single physical clock. Critical for replicated logs, conflict resolution, transaction ordering.
  • Time-ordered scans — IDs that sort by approximate creation time make timeline / feed / log queries efficient (range scans instead of secondary indexes).

The boundary case: snowflake IDs let you embed a timestamp in the ID, so SELECT * FROM tweets WHERE id > <yesterday> becomes a cheap range scan.

Functional requirements#

  • Produce unique 64-bit (or 128-bit) IDs at high throughput.
  • IDs are roughly monotonic — newer > older for typical comparisons.
  • Generation works during network partitions; no coordinator on the hot path.
  • Optionally encode useful metadata (timestamp, generator ID, type).

Non-functional requirements#

  • Latency: ID generation must be local — under 1 µs. Anything that requires a network round-trip is unusable in OLTP.
  • Throughput: 10k+ IDs/sec per generator; cluster-wide 100M+/sec.
  • Uniqueness: zero collisions, even under clock skew, restart, or worker reassignment.
  • Causality: if event A causally precedes B, id(A) < id(B) (or at least id(A) <= id(B) with HLC).

High-level design#

worker 1: [ 41-bit timestamp ms ][ 10-bit worker ID ][ 12-bit sequence ]
worker 2: [ 41-bit timestamp ms ][ 10-bit worker ID ][ 12-bit sequence ]
worker N: [ 41-bit timestamp ms ][ 10-bit worker ID ][ 12-bit sequence ]
▲ ▲ ▲
│ │ │
wall-clock assigned at per-ms counter
(epoch-relative) boot, unique resets each ms
every worker generates IDs locally with no coordination

Each worker has a unique ID (assigned via ZooKeeper, Consul, or static config). The timestamp gives rough ordering; the worker ID guarantees uniqueness across workers; the per-ms sequence breaks ties within a single millisecond.

Detailed design#

Snowflake ID layout (Twitter’s original)#

sign | timestamp (41 bits) | worker (5) | datacenter (5) | sequence (12)
1 | ~69 years from epoch | 32 dc | 32 worker | 4096/ms/worker
  • 41-bit timestamp is ms since a custom epoch. ~69 years before rollover.
  • 5-bit datacenter + 5-bit worker allows 32×32 = 1024 generator instances.
  • 12-bit sequence gives 4096 IDs per worker per millisecond — over 4M IDs/sec/worker.

When sequence overflows, busy-wait until the next millisecond.

class Snowflake {
long lastMs = 0;
int seq = 0;
int workerId;
long nextId() {
long now = currentMs();
if (now == lastMs) {
seq = (seq + 1) & 0xFFF;
if (seq == 0) now = tilNextMs(lastMs);
} else {
seq = 0;
}
lastMs = now;
return (now << 22) | (workerId << 12) | seq;
}
};

Clock skew is the central problem#

If a worker’s clock moves backwards (NTP step, VM pause, reboot), it might issue an ID smaller than one it already issued. Mitigations:

  • Refuse to issue IDs while now < lastMs (busy-wait, error after timeout).
  • Use monotonic clocks (CLOCK_MONOTONIC) for the elapsed component, with wall-clock only at startup.
  • Persist lastMs and refuse to start below it.

Hybrid Logical Clocks (HLC)#

When wall-clock isn’t trustworthy and Lamport clocks (pure counters) lose physical-time meaning, HLCs combine both:

HLC = (physical_ts, logical_counter)
on local event:
pt = max(physical_now, hlc.pt)
if pt == hlc.pt: hlc.lc += 1
else: hlc.pt = pt; hlc.lc = 0
on receive(msg.hlc):
pt = max(hlc.pt, msg.hlc.pt, physical_now)
if pt == hlc.pt == msg.hlc.pt: hlc.lc = max(hlc.lc, msg.hlc.lc) + 1
elif pt == hlc.pt: hlc.lc += 1
elif pt == msg.hlc.pt: hlc.lc = msg.hlc.lc + 1
else: hlc.lc = 0
hlc.pt = pt

HLCs preserve causality (any happens-before edge in the system gives a strictly increasing HLC) while staying close to wall-clock time. CockroachDB and YugabyteDB use them internally for MVCC timestamps.

Spanner’s TrueTime#

Google solved the problem with hardware: every datacenter has GPS receivers and atomic clocks; the API returns a time interval [earliest, latest] rather than a point. Spanner waits out the uncertainty bound (~7 ms) before committing, guaranteeing externally consistent ordering across the entire planet. Cost: GPS + atomic clocks in every DC. Beautiful, but Google-only for now.

Range allocation (cheap alternative)#

A coordinator assigns 1000-ID blocks to each worker. Workers issue IDs locally from their block; when exhausted, request a new block. Pros: no clock dependency. Cons: gaps in the ID space after worker death, coordinator on the cold path.

MongoDB’s ObjectId and Flickr’s “ticket server” both use variants of this.

UUIDv7#

A newer standard (RFC 9562) — 128-bit ID with a 48-bit ms timestamp prefix and 74 random bits. Sorts by time, no coordinator needed, no clock-skew collision risk (random bits dwarf workers). Drop-in for snowflake when you don’t care about a worker-ID dimension.

Trade-offs#

Snowflake / time-prefixed — 64-bit, sorts by time, embeds metadata, requires careful worker-ID assignment and clock discipline. The dominant choice for new systems.
UUIDv4 (random) — 128-bit, no coordinator, no clock, never collides in practice. Costs 2× index space and destroys time-locality on disk (every insert hits a random B-tree page).

Other axes:

  • Coordinated (Flickr ticket server) vs uncoordinated (Snowflake) — coordinated is simpler to reason about but has a tight failure mode (coordinator down → no new IDs). Uncoordinated is more available but harder to debug.
  • Strict total order vs causal order — Lamport / HLC give causal; TrueTime gives external; Snowflake gives “approximately monotonic”. Pick by what your system actually needs.
  • 64-bit vs 128-bit — 64-bit fits in registers, half the index space, sorts faster. 128-bit (UUID) is collision-proof under any clock skew, friendlier for systems that share IDs across organizational boundaries.

Real-world examples#

  • Twitter Snowflake — the canonical design (2010). Powers tweet IDs, DM IDs, and most Twitter primary keys.
  • Instagram — variant with (timestamp, shard_id, per_shard_sequence) — encodes which Postgres shard owns the row directly in the ID.
  • Discord — Snowflake IDs for messages, channels, users. Their time-bucketed Cassandra schema partitions on (channel_id, time_bucket_from_id).
  • Spanner — TrueTime-based commit timestamps; powers Google Ads, Gmail, Drive.
  • CockroachDB / YugabyteDB — HLC for MVCC; chosen so they can run on commodity hardware without atomic clocks.
  • MongoDBObjectId is (4-byte timestamp, 5-byte random, 3-byte counter) — 12 bytes, mostly snowflake-style.
Search ESC

Keyboard shortcuts

Shortcuts are disabled while typing in inputs.