Condition Variables

Wait / signal / broadcast, the always-use-while rule, the producer-consumer pattern, covering conditions.

What it is

A condition variable is the primitive a thread uses to wait for some predicate over shared state to become true, then to be woken by another thread that just made the predicate true. It always pairs with a mutex: the mutex protects the shared state, the CV is the rendezvous mechanism between waiters and signallers.

The API is three operations:

  • cond_wait(cv, mutex) — atomically releases the mutex and puts the calling thread on the CV’s wait queue. When woken, reacquires the mutex before returning.
  • cond_signal(cv) — wakes one waiter on the queue (if any).
  • cond_broadcast(cv) — wakes all waiters on the queue.

The atomic release-and-wait in cond_wait is the entire reason CVs exist. Without it, there’s a window between “I checked the predicate, it was false” and “I’m now asleep waiting”; a signaller running in that window would signal nothing and the waiter would sleep forever.
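
The three operations fit together roughly as in the sketch below. It assumes a single waiter and a hypothetical one-shot flag; the names ready, wait_until_ready, and mark_ready are illustrative and not part of any library.

```c
#include <pthread.h>
#include <stdbool.h>

// Hypothetical one-shot flag guarded by a mutex and paired with a CV.
static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;

// Waiter: block until another thread has set the flag.
void wait_until_ready(void) {
    pthread_mutex_lock(&ready_lock);
    while (!ready)                                  // re-check after every wakeup
        pthread_cond_wait(&ready_cv, &ready_lock);  // unlocks, sleeps, relocks
    pthread_mutex_unlock(&ready_lock);
}

// Signaller: make the predicate true under the lock, then wake one waiter.
void mark_ready(void) {
    pthread_mutex_lock(&ready_lock);
    ready = true;
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&ready_lock);
}
```

If mark_ready runs first, wait_until_ready sees the flag already set and never sleeps. With several waiters on the same flag, the signaller would need pthread_cond_broadcast instead.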

When to use it

Whenever a thread needs to wait for something to happen rather than for exclusive access to data. The two patterns are different:

  • “I need to read or write this shared queue” → mutex.
  • “I need the queue to become non-empty before I can read it” → mutex + CV.

Specific cases:

  • Bounded buffers / producer-consumer — consumers wait while empty, producers wait while full.
  • Thread-pool job queues — workers wait for the queue to gain a task.
  • Barriers — N threads each wait for the other N-1 to arrive (though pthread_barrier_t exists as a higher-level primitive).
  • Reader-writer locks — writers wait while readers are active and vice versa.
  • join-style synchronization — a parent waits for a child to finish a phase.
  • Event-driven state machines — a thread waits for a state transition to happen.

If you find yourself spinning in a loop checking a flag without yielding, you should be using a CV. If you find yourself using sleep(100) to “wait for the other thread to finish,” you should be using a CV.

How it works

The producer-consumer skeleton

The canonical example — bounded buffer with one CV per condition:

#include <pthread.h>

#define N 16
int buf[N];
int count = 0, head = 0, tail = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cv_full = PTHREAD_COND_INITIALIZER;  // producers wait here; signalled on "not full"
pthread_cond_t cv_empty = PTHREAD_COND_INITIALIZER; // consumers wait here; signalled on "not empty"

void produce(int x) {
    pthread_mutex_lock(&m);
    while (count == N)                   // wait while full
        pthread_cond_wait(&cv_full, &m);
    buf[tail] = x;
    tail = (tail + 1) % N;
    count++;
    pthread_cond_signal(&cv_empty);      // signal a waiting consumer
    pthread_mutex_unlock(&m);
}

int consume(void) {
    pthread_mutex_lock(&m);
    while (count == 0)                   // wait while empty
        pthread_cond_wait(&cv_empty, &m);
    int x = buf[head];
    head = (head + 1) % N;
    count--;
    pthread_cond_signal(&cv_full);       // signal a waiting producer
    pthread_mutex_unlock(&m);
    return x;
}

Two CVs — one per predicate (“not full” and “not empty”) — is the safest pattern. With one shared CV you’d have to broadcast instead of signal, because a signal might wake the wrong kind of waiter. The two-CV form lets signal target the correct queue.
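
For contrast, here is a minimal sketch of the single-CV variant, with hypothetical names (ring, put_item, get_item) and a small capacity chosen only for illustration. Because producers and consumers sleep on the same CV, every state change has to broadcast.

```c
#include <pthread.h>

#define CAP 4  // hypothetical capacity for the sketch
static int ring[CAP];
static int n_items = 0, rd = 0, wr = 0;
static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;
// One CV shared by both kinds of waiter.
static pthread_cond_t ring_changed = PTHREAD_COND_INITIALIZER;

void put_item(int x) {
    pthread_mutex_lock(&ring_lock);
    while (n_items == CAP)                       // wait while full
        pthread_cond_wait(&ring_changed, &ring_lock);
    ring[wr] = x;
    wr = (wr + 1) % CAP;
    n_items++;
    // A plain signal could wake another producer, so wake everyone.
    pthread_cond_broadcast(&ring_changed);
    pthread_mutex_unlock(&ring_lock);
}

int get_item(void) {
    pthread_mutex_lock(&ring_lock);
    while (n_items == 0)                         // wait while empty
        pthread_cond_wait(&ring_changed, &ring_lock);
    int x = ring[rd];
    rd = (rd + 1) % CAP;
    n_items--;
    // Same reasoning: a signal could wake another consumer.
    pthread_cond_broadcast(&ring_changed);
    pthread_mutex_unlock(&ring_lock);
    return x;
}
```

Each broadcast wakes every sleeper, and the while loops send the wrong kind of waiter straight back to sleep. It works, but costs more wakeups than the two-CV form.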

What cond_wait actually does

Under the hood, pthread_cond_wait(&cv, &m) is roughly:

1. Enqueue the calling thread on cv's wait queue.
2. Atomically release m and block.
3. (Sleep until signalled.)
4. On wakeup, reacquire m (may block here if another thread holds it).
5. Return.

The atomic release-and-block in step 2 is what makes the predicate-check-then-wait idiom race-free. Suppose a signaller could acquire the mutex, set the predicate, call cond_signal, and release the mutex between the waiter’s predicate check and its call to cond_wait: the signal would find nobody queued and be lost. That cannot happen, because the waiter holds the mutex from the predicate check until cond_wait releases it, and by then the waiter is already queued. The signaller cannot change the predicate any earlier.

Signal vs. broadcast

Use signal when:

  • All waiters are equivalent — any one can handle the wakeup.
  • The signal corresponds to exactly one unit of work (one item produced, one slot freed).

Use broadcast when:

  • Waiters are distinguished — different threads want different resources, but they share a CV.
  • The state change makes the predicate true for multiple waiters at once (e.g. a writer releasing a rwlock should wake all readers).
  • You can’t statically tell how many waiters should run.

The cost of broadcast is the thundering herd — all waiters wake, all race for the mutex, all but one go back to sleep. Cheap when N is small, painful when N is large.
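
A barrier is one case where a single state change makes the predicate true for every waiter, so broadcast is the natural fit. The sketch below uses hypothetical names (barrier_sketch and its functions) and a generation counter so the barrier can be reused; it is an illustration, not how pthread_barrier_t is necessarily implemented.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t all_here;
    int parties;          // threads that must arrive before anyone proceeds
    int waiting;          // arrivals so far in the current generation
    unsigned generation;  // bumped each time the barrier opens
} barrier_sketch;

void barrier_sketch_init(barrier_sketch *b, int parties) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_here, NULL);
    b->parties = parties;
    b->waiting = 0;
    b->generation = 0;
}

// Returns 1 for the thread that opened the barrier, 0 for the others.
int barrier_sketch_wait(barrier_sketch *b) {
    int opened = 0;
    pthread_mutex_lock(&b->lock);
    unsigned my_generation = b->generation;
    if (++b->waiting == b->parties) {
        // Last arrival: reset for reuse and release every waiter at once.
        b->waiting = 0;
        b->generation++;
        pthread_cond_broadcast(&b->all_here);
        opened = 1;
    } else {
        // Predicate: "the generation I arrived in has ended".
        while (b->generation == my_generation)
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
    return opened;
}
```

A signal here would release only one of the N-1 sleepers and strand the rest, which is why the last arrival broadcasts.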

Covering conditions

When you can’t easily tell which waiter to wake, you use a covering condition: broadcast on every relevant state change and let each waiter re-check its own predicate. The textbook example is an allocator with size-class waiters — a free(p) returning a 64-byte chunk doesn’t know whether the waiting threads want 16, 32, or 64 bytes, so it broadcasts and each waiter checks if its desired size is now available.
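
The allocator case can be sketched as follows. This version only tracks a byte count rather than real memory, and the names (bytes_free, pool_reserve, pool_release) are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

// Hypothetical pool accounting: only the number of free bytes is tracked.
static size_t bytes_free = 0;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_grew = PTHREAD_COND_INITIALIZER;

// Block until at least `want` bytes are free, then claim them.
void pool_reserve(size_t want) {
    pthread_mutex_lock(&pool_lock);
    while (bytes_free < want)   // each waiter checks its own requirement
        pthread_cond_wait(&pool_grew, &pool_lock);
    bytes_free -= want;
    pthread_mutex_unlock(&pool_lock);
}

// Return `n` bytes. The releaser cannot know which waiter is now
// satisfiable, so it wakes all of them and lets each one re-check.
void pool_release(size_t n) {
    pthread_mutex_lock(&pool_lock);
    bytes_free += n;
    pthread_cond_broadcast(&pool_grew);
    pthread_mutex_unlock(&pool_lock);
}
```

With pthread_cond_signal, a release of 16 bytes might wake a thread that wants 64 while a thread that wants 16 keeps sleeping. The broadcast covers every waiter at the price of extra wakeups.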

Variants

Mesa vs. Hoare semantics

There are two historical schools of CV semantics:

  • Hoare semantics: signal immediately transfers control (and the lock) to the woken waiter. The waiter knows the predicate is true at wakeup; no while loop needed. Pure, but requires a context switch on every signal.
  • Mesa semantics: signal just marks a waiter as runnable; the signaller keeps running and releases the lock when it pleases. The waiter must recheck the predicate (the world may have changed before it ran). All practical implementations (POSIX pthreads, Java, C++, Go) are Mesa.

Mesa is why you always re-check with while. It’s also why a signal is “best effort” — the waiter doesn’t see the predicate change instantly, only eventually.

cond_timedwait

The variant with a deadline. Returns ETIMEDOUT if the deadline passed without a signal. In pthreads the deadline is an absolute time (a struct timespec), not a duration. Useful for “wait for at most 5 seconds” patterns and as a hedge against bugs where a signal is missed (though that’s a code smell: fix the bug, don’t paper over it with a timeout).
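
A bounded wait might look like the sketch below. It assumes a CV with default attributes, which measures deadlines against CLOCK_REALTIME; the names ev_set, wait_event_for, and set_event are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  // exposes clock_gettime under strict -std modes
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t ev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ev_cv = PTHREAD_COND_INITIALIZER;
static bool ev_set = false;

// Wait up to `seconds` for the event. Returns true if it was set,
// false if the deadline passed first.
bool wait_event_for(int seconds) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  // absolute "now"
    deadline.tv_sec += seconds;                // turn it into a deadline

    pthread_mutex_lock(&ev_lock);
    int rc = 0;
    // Still a while loop: stop on the predicate or on timeout, not on
    // the mere fact of having been woken.
    while (!ev_set && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&ev_cv, &ev_lock, &deadline);
    bool result = ev_set;
    pthread_mutex_unlock(&ev_lock);
    return result;
}

void set_event(void) {
    pthread_mutex_lock(&ev_lock);
    ev_set = true;
    pthread_cond_broadcast(&ev_cv);
    pthread_mutex_unlock(&ev_lock);
}
```

The deadline is computed once, outside the loop, so repeated wakeups do not extend the total wait. The result comes from the predicate rather than the return code, so an event that lands right at the deadline still counts.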

Higher-level primitives built on CVs

  • Semaphores: a counter guarded by a mutex, with a CV for threads waiting on a non-zero count. Either OS-provided or hand-rolled.
  • Barriers: count arrivals, broadcast when all N are present.
  • Channels (Go): a typed bounded buffer with send/recv as the producer/consumer ops. The runtime implements them with a lock plus queues of waiting senders and receivers, which play the role of the two CVs.
  • Futures / promises: a single-shot CV; one writer fulfills, any number of readers wait.
  • Event objects (Win32): auto-reset and manual-reset events behave like CVs with a built-in flag.
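
As an illustration of the first item, a hand-rolled counting semaphore can be sketched with a mutex, a CV, and an integer. The type and function names (sem_sketch and friends) are hypothetical, chosen to avoid clashing with the POSIX sem_t API.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonzero;  // signalled whenever value is incremented
    int value;               // number of available permits
} sem_sketch;

void sem_sketch_init(sem_sketch *s, int initial) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = initial;
}

// Take one permit, blocking while none are available.
void sem_sketch_wait(sem_sketch *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

// Return one permit. One permit satisfies exactly one waiter,
// so signal is enough here.
void sem_sketch_post(sem_sketch *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

The predicate is simply "value > 0", which is why this one fits the signal case: every waiter is interchangeable and each post frees exactly one unit.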

Lock-free alternatives

Some patterns can use std::atomic with wait/notify_one (C++20, typically futex-backed on Linux) to avoid the explicit mutex. Implementations commonly spin briefly on the atomic before blocking. This is useful for low-contention paths where the mutex would dominate; otherwise it behaves much like a CV whose predicate is a single atomic value.

Trade-offs

  • Condition variables: flexible, fit any predicate, well-supported in every threading library. Cost: must pair with a mutex; the while-loop discipline is easy to forget; bugs (lost wakeup, missed broadcast) are subtle and hard to repro.
  • Semaphores: simpler API for resource-counting patterns; producer-consumer with semaphores is two sem_wait/sem_post pairs and no explicit predicate. Cost: only works when the predicate is “count > 0”; harder to extend to richer predicates; can’t atomically combine multiple counters.

Other tensions worth noting:

  • One CV vs. many. One CV per predicate is safer; broadcast on a shared CV is cheaper to write but expensive to run and bug-prone.
  • Signal vs. broadcast. signal is cheaper but only correct when waiters are interchangeable; broadcast always works but burns CPU on wasted wakeups.
  • Holding a lock across wait. cond_wait releases the mutex while sleeping, which is the only safe way; spinning on a flag while holding a lock would deadlock anyone trying to set it. CVs encode this correctly.
  • CV vs. channel. A Go-style channel hides the mutex-and-CV plumbing; for simple producer-consumer it is usually the simpler and safer choice. For richer predicates (“wait until queue has 5 items AND temperature drops below 60”), you still need a CV.
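
The compound predicate in the last item is the kind of thing a CV handles directly. A minimal sketch, with hypothetical state (queued, temperature) and hypothetical function names:

```c
#include <pthread.h>

// Hypothetical shared state: a queue length and a sensor reading.
static int queued = 0;
static int temperature = 100;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_changed = PTHREAD_COND_INITIALIZER;

// Block until 5 items are queued AND the temperature is below 60,
// then take the batch of 5.
void take_batch_when_cool(void) {
    pthread_mutex_lock(&state_lock);
    while (!(queued >= 5 && temperature < 60))
        pthread_cond_wait(&state_changed, &state_lock);
    queued -= 5;
    pthread_mutex_unlock(&state_lock);
}

// Every writer of either variable wakes the waiters so they can re-check.
void enqueue_one(void) {
    pthread_mutex_lock(&state_lock);
    queued++;
    pthread_cond_broadcast(&state_changed);
    pthread_mutex_unlock(&state_lock);
}

void set_temperature(int t) {
    pthread_mutex_lock(&state_lock);
    temperature = t;
    pthread_cond_broadcast(&state_changed);
    pthread_mutex_unlock(&state_lock);
}
```

Both variables sit under one mutex, so the combined check is atomic. That is the part a pair of independent semaphores or channels cannot express directly.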

Common pitfalls

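  • Using if instead of while. Under Mesa semantics a woken thread can find the predicate false again, either because another thread consumed the state change first or because the wakeup was spurious; with if it proceeds anyway and acts on stale state. The fix is mechanical: if (!pred) cond_wait(...) becomes while (!pred) cond_wait(...).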
  • Signalling without holding the mutex. POSIX permits it, and signalling just after unlock is fine, but the predicate itself must always be changed under the lock; a write made outside it can slip between a waiter’s check and its wait. Use lock, mutate, signal, unlock (or lock, mutate, unlock, signal; both are correct, the first is more conservative).
  • Signalling the wrong CV. Producer signals “not-empty”; consumer signals “not-full”. Reversing them produces “everyone waits forever, occasionally” bugs that take days to find.
  • Forgetting the predicate. “I’ll just wait once and assume” — no, the predicate may already be true when you arrive (no wait needed), or false again after wakeup (must re-wait). Always: check, wait, check, act.
  • Broadcasting when signalling would do. Cheap in microbenchmarks, expensive in production with hundreds of waiters. Use signal if any one waiter can take the wakeup.
  • Destroying a CV that has waiters. pthread_cond_destroy on a CV with someone in cond_wait is undefined behaviour. Make sure all waiters have left before tearing down.
  • CV without a mutex. Calling cond_wait with a mutex the caller does not hold, or having threads wait on one CV with different mutexes at the same time, is undefined behaviour for default POSIX mutexes. The CV and the mutex are a pair: the wait, the predicate update, and the re-check must all use the same lock.

What's a 'lost wakeup' bug, and how does the while loop prevent it?

A lost wakeup is when a signaller sets the predicate and signals, but no waiter sees the signal, usually because the waiter checked the predicate, found it false, and was about to call cond_wait when the signaller ran the whole sequence. Without the mutex protecting both the predicate check and the entry to cond_wait, the wakeup vanishes. Strictly, it is the mutex plus cond_wait’s atomic release-and-block that close that race; the while loop closes the related cases where the thread wakes spuriously, or wakes after another thread has already consumed the state change. Both come for free if you follow the canonical pattern.
