Event-Based Concurrency — Operating Systems

What it is#

Event-based concurrency is the main alternative to threads. A single thread runs an event loop: it asks the kernel “which of these file descriptors are ready?” via select / poll / epoll / kqueue, dispatches each ready descriptor to its registered handler, and goes back to waiting. Handlers must not block: they react, mutate state, register the next callback, and return. Because only one thread ever touches the loop’s state, there are no thread context switches between handlers, no locks, and no shared-state races.

The reason this approach beat threads for high-concurrency network servers (Nginx vs. Apache prefork, Node.js vs. one-thread-per-request frameworks) is that the per-connection cost collapses. A thread per connection reserves ~8 MB of stack address space (the common Linux default; only the pages actually touched become resident) and pays ~1 µs of context switch per yield. A connection in an event loop is a small struct in a hash map and a pointer in epoll’s interest list.

When to use it#

Event-based shines when:

Lots of concurrent connections, mostly idle. Chat servers, push notification fanout, HTTP/2 multiplexing, WebSockets. 100k idle connections cost ~5 MB in epoll; as threads with 8 MB stacks they would reserve ~800 GB of stack address space (100,000 × 8 MB), even though far less of it would be resident.
I/O-bound work. The handler does little CPU work per event — parse a header, look up a key, hand off to a backend. CPU isn’t the bottleneck; the kernel’s I/O readiness is.
You need strict latency control. Nothing preempts a handler in favor of another handler, so there is no scheduler jitter between them; each handler runs to completion. This only holds while every handler stays short.
The language ecosystem already assumes it. JavaScript, Python’s asyncio, Rust’s tokio, Go’s runtime (internally), Elixir’s BEAM all build on top of an event loop.

Reach for threads instead when:

Per-event work is CPU-heavy. One slow handler blocks the entire loop. Run the CPU work in a worker pool.
You depend on a blocking library (getaddrinfo, file I/O without io_uring). Use a thread pool to wrap it.
The mental model of synchronous code matters and the contention story is simple. Sometimes threads-with-coarse-locks is just easier.

How it works#

The bare event loop#

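#define _GNU_SOURCE          /* for accept4 */
#include <sys/epoll.h>
#include <sys/socket.h>

/* Assumes listenfd is already bound, listening, and non-blocking. */
int epfd = epoll_create1(0);
struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = listenfd };
epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev);

struct epoll_event events[128];
while (1) {
    int n = epoll_wait(epfd, events, 128, -1);   /* n == -1 (e.g. EINTR) skips the loop body */
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (fd == listenfd) {
            /* Edge-triggered: accept until the backlog is empty (accept4 fails with EAGAIN). */
            int c;
            while ((c = accept4(listenfd, NULL, NULL, SOCK_NONBLOCK)) != -1) {
                ev.events = EPOLLIN | EPOLLET;
                ev.data.fd = c;
                epoll_ctl(epfd, EPOLL_CTL_ADD, c, &ev);
            }
        } else {
            handle_ready_socket(fd);   /* must read until EAGAIN */
        }
    }
}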

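Single thread, no locks, one syscall per batch of events. Every fd is non-blocking; every read/write either transfers bytes, reports end-of-file or an error, or fails with EAGAIN. In edge-triggered mode you keep going until you see EAGAIN and then return to epoll_wait; no explicit re-arm is needed unless the fd was registered with EPOLLONESHOT.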
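
The loop above takes non-blocking descriptors for granted. A minimal sketch of how that might be arranged for a descriptor you did not create with SOCK_NONBLOCK, and of how the possible read outcomes can be told apart. The helper names set_nonblocking and try_read, and the -2 return code, are assumptions made for this example, not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch an existing fd to non-blocking mode.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: one read attempt that never blocks.
 * Returns bytes read (> 0), 0 on end-of-file, -1 on a real error,
 * or -2 when no data is available right now (EAGAIN / EWOULDBLOCK). */
ssize_t try_read(int fd, void *buf, size_t len) {
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);   /* retry if a signal interrupted the call */
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return -2;
    return n;
}
```

A handler would call try_read in a loop and return to the event loop as soon as it sees -2.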

Why `select` and `poll` lost#

select takes three bitmaps (read/write/except) indexed by fd number; the kernel scans every bit up to the highest fd on each call and overwrites the bitmaps with the results, so you rebuild and re-pass them every time. That is O(N) per call, where N is the highest fd, and fds are capped at FD_SETSIZE (1024 on most systems).

poll removes the size cap and uses an array of (fd, events) pairs but is still O(N) — the kernel scans the array every call.

epoll keeps the interest set inside the kernel; epoll_wait returns only the ready fds, so its cost is O(R), where R is the number of ready descriptors. For 100k connections with 50 events ready at any moment, that’s roughly 2000x fewer descriptors to examine per loop iteration (100,000 / 50).
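
To make the O(N) cost concrete, here is a small sketch of a poll-based readiness check. The function name collect_readable and its 64-entry limit are assumptions made for this illustration. Both the kernel (inside poll) and the caller (scanning revents) visit every entry, no matter how few descriptors are actually ready.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical helper: ask poll() about every fd in fds[0..n) and collect
 * the readable ones into ready[]. Returns the number collected, or -1 on error.
 * timeout_ms follows poll(): 0 returns immediately, -1 waits forever. */
int collect_readable(const int *fds, size_t n, int *ready, int timeout_ms) {
    struct pollfd pfds[64];
    if (n > 64) return -1;                 /* limit of this sketch, not of poll() */
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    /* The kernel examines all n entries during this call. */
    if (poll(pfds, (nfds_t)n, timeout_ms) == -1) return -1;

    int count = 0;
    for (size_t i = 0; i < n; i++) {       /* the caller scans all n entries too */
        if (pfds[i].revents & POLLIN) ready[count++] = pfds[i].fd;
    }
    return count;
}
```

With epoll the second loop would run only over the entries epoll_wait returned.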

Edge-triggered vs. level-triggered#

Level-triggered (LT) — epoll_wait returns the fd as long as it’s readable/writable. Same semantics as poll. Easy to use; can produce extra wakeups.
Edge-triggered (ET) — epoll_wait returns the fd only on the transition from “not ready” to “ready.” You must drain it completely (read until EAGAIN) before going back to epoll, or you’ll lose the wakeup.

ET is faster (fewer redundant wakeups) but easier to get wrong. Nginx and most high-performance servers use ET; Node.js uses LT under the hood (libuv) for safety.
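
The difference can be observed with a pipe. The following is a sketch of an experiment, not production code; the function name events_after_partial_read is made up for this example. It leaves one byte unread after the first wakeup and then asks epoll again without blocking.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: put 2 bytes in a pipe, take the first readiness
 * event, read only 1 byte, then poll epoll again with a zero timeout.
 * extra_flags is 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns the event count from the second epoll_wait, or -1 on setup failure. */
int events_after_partial_read(uint32_t extra_flags) {
    int p[2];
    if (pipe(p) == -1) return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) { close(p[0]); close(p[1]); return -1; }

    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = p[0] };
    struct epoll_event out;
    char byte;
    int second = -1;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "ab", 2) == 2 &&
        epoll_wait(epfd, &out, 1, 0) == 1 &&    /* first wakeup: data arrived */
        read(p[0], &byte, 1) == 1) {            /* leave one byte unread */
        second = epoll_wait(epfd, &out, 1, 0);  /* timeout 0: do not block */
    }
    close(epfd);
    close(p[0]);
    close(p[1]);
    return second;
}
```

Called with 0, the second epoll_wait reports the pipe again because a byte is still readable. Called with EPOLLET it reports nothing, which is the lost wakeup described above.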

State machines as callbacks#

The cost you pay for no threads is manual state management. A synchronous handler is:

ssize_t n = read(fd, buf, sizeof buf);
process(buf, n);
write(other_fd, buf, n);

The event-based version splits into multiple callbacks, each with its own state to save and restore:

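struct conn { int fd, other_fd; char buf[BUFSZ]; size_t off, pending; enum { READ, WRITE } phase; };

void on_event(struct conn* c) {
    if (c->phase == READ) {
        ssize_t n = read(c->fd, c->buf, sizeof c->buf);
        if (n <= 0) return;            /* EAGAIN, EOF, or error; a real handler tells these apart */
        process(c->buf, n);
        c->off = 0; c->pending = n; c->phase = WRITE;
        arm_for_write(c);
    } else {
        ssize_t w = write(c->other_fd, c->buf + c->off, c->pending);
        if (w < 0) return;             /* EAGAIN: stay armed for write and retry later */
        c->off += w; c->pending -= w;
        if (c->pending > 0) return;    /* partial write: the next event resumes at buf + off */
        c->phase = READ;
        arm_for_read(c);
    }
}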

What was three lines of synchronous code becomes a state machine. Languages with async/await (JS, C#, Rust) restore the synchronous shape by having the compiler or runtime generate the state machine for you.

Async I/O proper#

epoll is readiness notification, not async I/O — you’re told the fd is ready and then you do the read yourself. True async I/O lets you submit a read and get a completion later:

POSIX AIO (aio_read) — broken or simulated on most platforms.
Linux io_uring (2019+) — true submission/completion queues, batch syscalls, zero-copy, kernel-side buffer rings. The future of high-perf Linux I/O.
Windows IOCP — completion-port model, the canonical async I/O API on Windows.

For network I/O, epoll + non-blocking sockets is good enough. For disk I/O, only io_uring (or a thread pool calling synchronous read) gives real async semantics.

Multi-threaded event loops#

A single event loop scales to one CPU. To use N cores: run N event loops, each on its own thread, with their own epoll fd. Two layouts:

SO_REUSEPORT — each thread’s listen socket is bound to the same port; the kernel load-balances incoming connections across them. Used by Nginx, Envoy.
Acceptor + worker pool — one thread accepts and hands the connection to a worker via a queue. More work-stealing flexibility, more cross-thread synchronization.

Either way, shared state across loops is back to needing locks or message-passing. Most production systems try to keep each connection’s state on a single loop.

Variants#

Reactor pattern (epoll, kqueue)#

The pattern described above — wait for readiness, dispatch to handler. Embodied by libevent, libev, libuv, asio.

Proactor pattern (IOCP, io_uring)#

You submit the operation; the kernel notifies you when it’s complete. The handler runs with the bytes already in your buffer. Lower latency on Windows/io_uring; conceptually heavier.

Coroutines / fibers#

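A coroutine is a function that can suspend and resume. Stackful coroutines (Go goroutines, Lua coroutines) each carry their own small stack; stackless ones (Rust async tasks, C++20 coroutines) are compiled into state-machine objects that hold only the variables live across a suspension point. In Go, and in Rust runtimes such as Tokio, a scheduler multiplexes them onto a small pool of OS threads; Lua coroutines are resumed explicitly by the calling code. Either way the state-machine ugliness is hidden: your code looks synchronous, and the compiler or runtime inserts the suspension points.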

Single-loop with worker pool#

Event loop on the main thread for I/O; a thread pool for CPU-bound or blocking work. Node.js’s libuv ships this exact split: epoll (on Linux) for sockets, and a thread pool (4 threads by default) for file-system calls such as fs.readFile, dns.lookup, and some crypto operations.
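
The hand-off can be sketched with one worker thread and a notification pipe. This illustrates the idea only and is not how libuv is implemented. The names read_job, submit_read, and complete_read are hypothetical, and a real pool would reuse threads instead of creating one per job.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical job record: the loop fills in fd and notify_fd,
 * the worker fills in buf and result. */
struct read_job {
    int fd;            /* descriptor to read with a plain blocking read() */
    int notify_fd;     /* write end of a pipe the event loop is watching */
    char buf[64];
    ssize_t result;    /* read()'s return value, valid once notified */
};

/* Runs on the worker thread, where blocking is harmless. */
static void *read_worker(void *arg) {
    struct read_job *job = arg;
    job->result = read(job->fd, job->buf, sizeof job->buf);
    char done = 1;
    /* Wake the loop: its notify pipe becomes readable. */
    if (write(job->notify_fd, &done, 1) != 1) job->result = -1;
    return NULL;
}

/* Hand the job to a fresh thread. Returns 0 on success, like pthread_create. */
int submit_read(struct read_job *job, pthread_t *tid) {
    return pthread_create(tid, NULL, read_worker, job);
}

/* What the loop does once the notify pipe is readable:
 * consume the token, reap the thread, and collect the result. */
ssize_t complete_read(int notify_read_fd, struct read_job *job, pthread_t tid) {
    char token;
    if (read(notify_read_fd, &token, 1) != 1) return -1;
    pthread_join(tid, NULL);
    return job->result;
}
```

The read end of the notify pipe is registered in epoll like any socket, so a finished job arrives at the loop as an ordinary readiness event.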

Frameworks#

libevent / libev / libuv — C event-loop libraries.
Boost.Asio / C++ Networking TS — C++ async I/O.
Tokio / async-std — Rust async runtimes.
asyncio / Trio — Python.
Netty — JVM-based async network framework.

Trade-offs#

Event-based: tiny per-connection footprint, no thread context switches, no locks for per-connection state, predictable latency while handlers stay short. Cost: a single blocking call or CPU-heavy handler stalls the whole loop; debugging cross-callback state requires reconstructing the implicit state machine; one bug in one handler stalls every other connection.

Thread-per-connection: synchronous code, the call stack tells you what happened, blocking calls are fine, CPU-heavy work doesn’t stall other connections. Cost: up to 8 MB of stack address space reserved per thread, ~1 µs context switch, locks everywhere shared state appears, scheduler jitter under load, OS thread limits (on the order of ~32k on a default Linux configuration, and tunable).

Other tensions:

Programming model. Async/await makes event-based code read like threaded code at the cost of a compiler transformation. Without it, callback-passing is verbose.
Latency tail. Threads with preemption have predictable upper bounds (the scheduler will eventually run you). An event loop can have catastrophic tail latency if one handler runs long.
Composability with libraries. Any blocking library — most database drivers, most file APIs — needs an async wrapper. Forgetting to wrap is a classic Node.js bug.

Common pitfalls#

Blocking inside a handler. Calling synchronous getaddrinfo, reading a file with read instead of async, doing a 100ms regex — all stall every other connection. Move expensive work to a thread pool or a separate process.
Forgetting to drain in edge-triggered mode. ET reports a change in readiness, not the presence of data. If you read 1 KB and stop because your buffer is full, the bytes still queued in the kernel produce no further event; you hear about the fd again only when new data arrives, which may be never if the peer is waiting for your reply. Loop reads until EAGAIN.
Holding state across callbacks unsafely. “I’ll save this pointer for later” — but the connection closes and the struct is freed before the callback fires. Use refcounting or explicit lifecycle tracking.
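Mixing event loops across threads without care. The kernel lets different threads call epoll_ctl and epoll_wait on the same epoll fd, but your handlers and per-connection state get no protection from that: two threads waiting on one epoll fd can both be woken for the same descriptor, and one thread can close an fd that another is still handling. Use one loop per thread, or EPOLLONESHOT plus explicit coordination.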
Per-event allocation churn. Allocating a new struct per event can turn the allocator into the bottleneck at high event rates. Use pools (a freelist of recycled conn structs).
Slowloris and partial reads. A malicious client sends one byte every 30 seconds; your handler reads, returns, and re-arms. The fd ties up a slot forever. Add idle timeouts.
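Callbacks that re-enter the loop. The kernel does not forbid calling epoll_wait from within a handler, but most loops are not reentrant: the nested call can dispatch or free connections that the outer iteration still holds in its events array. Schedule work for the next iteration instead.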
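
The idle-timeout advice from the Slowloris item can be sketched as a periodic sweep. The struct idle_conn and the function sweep_idle are hypothetical names for this example. The sketch assumes last_progress is updated only when the peer completes something useful (a full request, say) and not on every byte; otherwise a one-byte-at-a-time client would keep resetting its own timer.

```c
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-connection record for this sketch. */
struct idle_conn {
    int fd;                /* -1 once the slot has been closed */
    time_t last_progress;  /* when the peer last completed useful work */
};

/* Close every connection that has made no progress for more than
 * max_idle seconds as of `now`. Returns how many were closed. */
size_t sweep_idle(struct idle_conn *conns, size_t n, time_t now, time_t max_idle) {
    size_t closed = 0;
    for (size_t i = 0; i < n; i++) {
        if (conns[i].fd == -1) continue;          /* slot already free */
        if (now - conns[i].last_progress > max_idle) {
            close(conns[i].fd);
            conns[i].fd = -1;                     /* mark the slot reusable */
            closed++;
        }
    }
    return closed;
}
```

One way to drive it is to give epoll_wait a finite timeout and call sweep_idle(conns, n, time(NULL), 30) once per loop iteration. A linear sweep is itself O(N), so large servers usually keep connections in a timer wheel or a min-heap ordered by deadline.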

Why doesn't Linux just expose async I/O for everything?

POSIX AIO has been on the books since 1993, but on Linux glibc implements it with user-space helper threads, which defeats the point, and the kernel’s native AIO interface (io_submit) is only reliably asynchronous for files opened with O_DIRECT. The kernel’s primary async surface was epoll (readiness, not completion), and disk I/O didn’t fit the readiness model (a disk is always “ready”; it’s the read itself that’s slow). io_uring (2019) finally provides true async submission/completion across sockets, files, and timers. It took 26 years because the API surface is huge and the security model is subtle (shared submission queues between userspace and kernel are an attack surface).

Threads and Shared State — the model event-based displaces.
POSIX Threads API — when you do reach for threads inside an event-based program.
Concurrency Bugs — Deadlock, Atomicity, Order — event-based replaces some bugs with others (state-machine bugs).
Locks and Spinlocks — what you avoid by single-threading, what you need across loops.
Condition Variables — the threaded predicate-wait, replaced by callbacks in event-based code.