AFS — Andrew File System

Whole-file caching, callbacks for invalidation, last-writer-wins, and how AFS achieved campus-scale before the cloud.

System Advanced
distributed-fs caching callbacks afs
Companies this resembles: Carnegie Mellon University

What it is

The Andrew File System (AFS) is a distributed file system designed at Carnegie Mellon in the 1980s to give every workstation on campus a uniform global view of files, with performance close to local disk and consistency strong enough that students wouldn’t notice they were sharing a server with thousands of others. Where NFS chose a stateless server and per-block client caching, AFS chose a stateful server with whole-file client caching and explicit callbacks to invalidate stale copies. The bet paid off: AFS scaled to tens of thousands of clients per server cluster at a time when NFS topped out at hundreds, and the callback model anticipated the server-driven invalidation that later systems, such as CDN purge pipelines, rely on.

The version that mattered in practice is AFSv2 — version 1 had a path-walking-per-call design that didn’t scale, and the team rewrote it. The result is the system that ran most of CMU through the late 80s and 90s, the OpenAFS code in production today, and the design pattern most distributed-cache engineers still reach for.

Architecture overview

AFS is a client-server design with three pieces:

              ┌──────────────────────┐
              │    Vice (servers)    │
              │  ┌────────────────┐  │
              │  │  file storage  │  │
              │  │  + callbacks   │  │
              │  └────────────────┘  │
              └──────────┬───────────┘
                         │ (TCP/UDP)
        ┌────────────────┼────────────────┐
        │                │                │
┌───────▼──────┐ ┌───────▼──────┐ ┌───────▼──────┐
│    Venus     │ │    Venus     │ │    Venus     │
│   (client    │ │   (client    │ │   (client    │
│   cache mgr) │ │   cache mgr) │ │   cache mgr) │
│              │ │              │ │              │
│   on-disk    │ │   on-disk    │ │   on-disk    │
│ chunk cache  │ │ chunk cache  │ │ chunk cache  │
└──────┬───────┘ └──────────────┘ └──────────────┘
┌──────▼───────┐
│ user process │
│ (open/read/  │
│  write/close)│
└──────────────┘

  • Vice: the server side. Stores files in volumes (collections of related files that can be moved between servers as a unit). Holds the authoritative copy and tracks which clients have cached which files.
  • Venus: the client-side cache manager. Sits behind the kernel’s file system interface: file system calls on AFS paths are intercepted in the kernel and handed to Venus. Holds a local disk cache (not just memory) of whole files; satisfies reads and writes from the cache when possible.
  • Volumes: the unit of replication, backup, and quota. A volume can be replicated read-only across multiple Vice servers, which is how AFS scales reads.

The whole system runs on a kerberised RPC channel — authentication is per-user, not per-host, which mattered enormously on shared workstations.

Protocol and statelessness

The crucial difference from NFS: AFS is stateful. Each Vice server tracks the set of files that each client has cached, along with a callback promise — a commitment to notify the client if any of those cached files change. The promise is a soft contract (it’s lost on server crash) but inside its lifetime it lets clients trust their cache without per-operation revalidation.

The interaction model:

  1. Client opens a file. Venus fetches the whole file from Vice and writes it to the local disk cache. Vice records the callback promise.
  2. Subsequent reads and writes go to the local cached copy — at local-disk speed, no network involved.
  3. On close, if the file was modified, Venus writes the whole file back to Vice. Vice updates the master copy and breaks the callback on every other client that had this file cached.
  4. Those other clients, on their next open, see that the callback is broken, refetch the file from Vice, and continue.

The protocol exchanges are coarse (entire files at open/close), but they are infrequent. With a warm cache, a client can go minutes between server interactions rather than seconds.
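
The four steps above can be sketched as a toy in-memory model. This is an illustration under stated assumptions, not the real AFS RPC interface: the class names reuse Vice and Venus from the text, but every method name, the dict standing in for the on-disk cache, and the example path are hypothetical.

```python
class Vice:
    """Toy server: authoritative copies plus a callback table."""

    def __init__(self):
        self.files = {}      # path -> bytes (master copy)
        self.callbacks = {}  # path -> set of clients holding a callback promise
        self.fetches = 0     # whole-file fetches served, for illustration

    def fetch(self, path, client):
        # Step 1: ship the whole file and record the callback promise.
        self.fetches += 1
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, data, writer):
        # Step 3: update the master copy, then break every other client's callback.
        self.files[path] = data
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.callbacks[path] = {writer}


class Venus:
    """Toy client cache manager; a dict stands in for the on-disk cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # path -> bytes
        self.valid = set()  # paths whose callback promise is still intact
        self.dirty = set()  # paths modified since open

    def open(self, path):
        # Steps 1 and 4: fetch only if there is no valid cached copy.
        if path not in self.valid:
            self.cache[path] = self.server.fetch(path, self)
            self.valid.add(path)

    def read(self, path):
        return self.cache[path]          # Step 2: served locally

    def write(self, path, data):
        self.cache[path] = data          # Step 2: local only until close
        self.dirty.add(path)

    def close(self, path):
        if path in self.dirty:           # Step 3: write back the whole file
            self.server.store(path, self.cache[path], self)
            self.dirty.discard(path)

    def break_callback(self, path):
        self.valid.discard(path)         # next open must refetch


vice = Vice()
path = "/afs/example/notes.txt"          # hypothetical path
vice.files[path] = b"v1"
a, b = Venus(vice), Venus(vice)

a.open(path); b.open(path)               # two fetches
b.close(path); b.open(path)              # cache hit: no new fetch
fetches_before_write = vice.fetches
a.write(path, b"v2"); a.close(path)      # store breaks b's callback
b.open(path)                             # b refetches
print(fetches_before_write, vice.fetches, b.read(path))
```

In this run the second open on client b costs no server traffic, and only the write-back from client a forces b to fetch again. That is the property the callback promise buys.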

Caching and consistency

AFS’s consistency model is session semantics: changes to a file become visible to other clients only when the writer closes the file. If two clients have the same file open and both write, the last one to close wins; the earlier writer’s changes are silently overwritten on the server.

Compared to NFS’s per-block “close-to-open” consistency with periodic revalidation, AFS is both less consistent (the writer’s intermediate writes are invisible until close) and more consistent (after close, every cache holding the old file is guaranteed-invalidated, not just possibly-revalidated).
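
A minimal sketch of session semantics, assuming a plain dict as the server and a hypothetical Session helper that copies the whole file at open and stores it blindly at close. It is meant only to show why the last close wins, not how a real client is structured.

```python
class Session:
    """One client's open file: a private whole-file copy until close."""

    def __init__(self, server, path):
        self.server, self.path = server, path
        self.buf = bytearray(server[path])        # whole-file copy taken at open

    def write_at(self, offset, data):
        self.buf[offset:offset + len(data)] = data  # invisible to others

    def close(self):
        # Whole-file store: overwrites whatever the server holds now.
        self.server[self.path] = bytes(self.buf)


server = {"report.txt": b"AAAA BBBB"}
alice = Session(server, "report.txt")
bob = Session(server, "report.txt")

alice.write_at(0, b"aaaa")       # Alice edits the first word
bob.write_at(5, b"bbbb")         # Bob edits the second word

alice.close()
after_alice = server["report.txt"]
bob.close()                      # Bob's copy never saw Alice's edit
after_bob = server["report.txt"]
print(after_alice, after_bob)
```

Even though the two edits touch different bytes, the final server copy contains only Bob's change, because each close replaces the whole file rather than merging.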

The cache is on disk, not just in memory. A 100MB on-disk cache could keep a user’s entire working set local across reboots: a Venus restart preserves the cache and just revalidates it against Vice.

Crash recovery

The trade-off for statefulness: recovery is harder. When Vice crashes, it loses the callback table — it no longer knows which clients have which files cached.

The protocol handles this cleanly:

  • After a Vice restart, clients learn that their callbacks have been lost, either because the server announces it or because each client finds out on its next operation.
  • Each Venus then revalidates every cached file by comparing its cached version number against the server’s before trusting its cache.
  • For most caches this is fast — most files haven’t changed — and the warm-up cost is bounded by cache size, not by request rate.

When a Venus crashes, the on-disk cache survives. After Venus restarts, it presents the cached files to Vice for validation; any file whose server copy has changed since it was cached is re-fetched.

The relationship between recovery cost and cache size means AFS deployments traditionally capped per-client caches at ~100MB-1GB and provisioned for periodic full-revalidation storms.
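
The revalidation pass can be sketched as a version comparison per cached file. This is a simplified illustration: the function name, the tuple layout of the cache, and the use of a single integer version per file are assumptions made for the example.

```python
def revalidate(cache, server_versions):
    """Split cached paths into (still_valid, stale) after callbacks are lost.

    cache:           path -> (cached_version, data)
    server_versions: path -> current version on the server
    """
    still_valid, stale = [], []
    for path, (cached_version, _data) in cache.items():
        if server_versions.get(path) == cached_version:
            still_valid.append(path)   # unchanged: keep the local copy
        else:
            stale.append(path)         # changed or deleted: refetch on next open
    return still_valid, stale


cache = {
    "a.txt": (3, b"..."),
    "b.txt": (7, b"..."),
    "c.txt": (1, b"..."),
}
server_versions = {"a.txt": 3, "b.txt": 8}   # c.txt no longer exists

still_valid, stale = revalidate(cache, server_versions)
print(still_valid, stale)
```

The loop does one check per cached file regardless of how busy the client is, which is why the warm-up cost tracks cache size rather than request rate.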

Operational characteristics

What made AFS deployable at campus scale:

  • Whole-volume migration. A volume could be moved between Vice servers transparently — clients with active callbacks were notified, and they re-resolved the volume location through the Volume Location Database. Storage rebalancing, server retirement, and capacity expansion all went through this mechanism.
  • Read-only replication. Stable volumes (binaries, course materials) were replicated across multiple Vice servers. Clients could read from any replica; writes to a read-only volume required a “release” operation that fanned the update out.
  • Per-user Kerberos auth. Every operation carried the user’s identity, not the workstation’s. ACLs were per-directory, not per-file (a small simplification with huge UX impact).
  • @sys directory substitution. Paths could include @sys, which the client substituted with its architecture string: bin/@sys/bash would resolve to bin/i386_linux26/bash on one machine and bin/sun4x_58/bash on another. One global namespace, multiple binary architectures.
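
The @sys substitution in the last bullet can be sketched as a per-component rewrite. The function name is hypothetical, and the sketch assumes substitution applies only when a whole path component is exactly @sys; a real client does this during lookup inside the cache manager.

```python
def resolve_sys(path: str, sysname: str) -> str:
    """Replace every path component that is exactly '@sys' with sysname."""
    return "/".join(sysname if part == "@sys" else part
                    for part in path.split("/"))


linux32 = resolve_sys("bin/@sys/bash", "i386_linux26")
linux64 = resolve_sys("bin/@sys/bash", "amd64_linux26")
untouched = resolve_sys("bin/tools@sys/bash", "i386_linux26")
print(linux32, linux64, untouched)
```

The same path string names a different directory on each architecture, so a single symlink or PATH entry works for every machine in the cell.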

In numbers: a single Vice server cluster could serve tens of thousands of clients (CMU operated one at this scale through the 90s) where comparable NFS deployments capped out around a hundred clients per server.

Trade-offs and gotchas

AFS strengths. Scales by orders of magnitude over NFS for read-heavy workloads. Per-user auth. On-disk cache survives reboots. Operational primitives (volumes, replication, migration) match a real campus operations team’s needs.
AFS weaknesses. Session-semantics consistency model isn’t POSIX. Write-back on close costs whole-file bandwidth even on a one-byte change. Recovery storms after server restart. The protocol assumed a trusted network — modern deployments require IPsec or VPN to bridge to it safely.

Recurring operational gotchas:

  • Whole-file writes are bad for huge files. Editing a 4GB log file means uploading 4GB on close. Production AFS deployments forbid this pattern.
  • Concurrent writers lose data silently. If two users have the same shared file open and both edit it, whoever closes first sees their changes vanish when the other closes. Users had to coordinate by convention to compensate.
  • Cache-poisoning by other users is impossible (callbacks are per-cache-copy) but cache exhaustion is real. Heavy users running find over /afs could push everyone else’s working set out of cache.
  • Volume operations are not free. Releasing a 50GB read-only volume to 10 replicas is a 500GB transfer that can saturate the storage network during business hours.
  • The auth model assumes Kerberos. OpenAFS retains the model but the operational burden of running a Kerberos realm is the dominant reason AFS adoption shrank.
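
The volume-release figure above is simple arithmetic, worked here with one extra assumption: a 1 Gbit/s link, chosen only for illustration. The helper names are hypothetical, and the calculation assumes a full copy goes to every replica, as the bullet does.

```python
GB = 10**9  # decimal gigabyte, in bytes


def release_transfer_bytes(volume_bytes: int, replicas: int) -> int:
    """Total bytes moved if every replica receives a full copy."""
    return volume_bytes * replicas


def transfer_seconds(total_bytes: int, link_bits_per_second: int) -> float:
    """Time to move total_bytes over a single link, ignoring overhead."""
    return total_bytes * 8 / link_bits_per_second


total = release_transfer_bytes(50 * GB, 10)
seconds = transfer_seconds(total, 10**9)   # assumed 1 Gbit/s link
print(total // GB, "GB,", seconds / 60, "minutes")
```

Under those assumptions the release moves 500GB and occupies the link for a little over an hour, which is why such operations tend to be scheduled outside business hours.
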

Where AFS's ideas live now

AFS itself is mostly a CMU artefact and an OpenAFS niche. But the pattern of whole-object caching with explicit, server-driven invalidation recurs across modern distributed caches: content-delivery networks (Cloudflare, Fastly, Akamai) push purge messages when origin content changes; lease-based designs swap the callback for a time-bounded promise that expires on its own instead of being revoked; HTTP’s If-None-Match + ETag is a polled version of the same idea. If you ever debug a cache-staleness incident, you’re working inside AFS’s design space.

Most of what AFS got right in the 80s is now received wisdom: cache aggressively, invalidate explicitly, treat the cache as on-disk rather than in-memory, separate the namespace location service from the data servers. The system that doesn’t scale today is the one without those properties.
