POSIX Threads API
pthread_create / join / mutex / cond — the canonical thread surface and the gotchas around using it correctly.
What it is#
pthreads (POSIX Threads, IEEE Std 1003.1c) is the C-level threading interface that every Unix-like system speaks. On Linux it’s implemented by glibc’s NPTL on top of the clone() syscall and the futex fast-userspace mutex; on macOS by libsystem on top of Mach threads. The surface fits on one slide: create / join a thread, lock / unlock a mutex, wait / signal a condition variable, post / wait on a semaphore.
Most higher-level threading APIs (C++ std::thread, Rust std::thread, Java native threads, Python’s threading module) sit on pthreads when they run on Linux or macOS. Knowing pthreads gives you the vocabulary to reason about all of them.
When to use it#
Reach for pthreads directly when:
- You’re writing C and the standard library’s <threads.h> (C11) isn’t available or portable enough.
- You’re writing a library that ships across multiple host languages and needs the lowest-common-denominator thread primitive.
- You’re debugging or profiling threading code and need to see the actual syscalls: gdb’s thread support, perf’s thread-id columns, and strace -f all speak the pthreads layer.
- You need a feature the higher-level wrapper hides: thread cancellation, custom stack placement, robust mutexes that survive owner death, real-time scheduling attributes.
For new C++ code, prefer std::thread and std::mutex — same semantics, RAII handles cleanup, type-safe. For Go or Rust async work, prefer the runtime’s primitives. pthreads is the floor, not the ceiling.
How it works#
Creating and joining threads#
#include <pthread.h>

void* worker(void* arg) {
    int id = *(int*)arg;
    // ... do work ...
    return (void*)(long)(id * 2);            // return value picked up by join
}

int main(void) {
    pthread_t t;
    int id = 42;
    pthread_create(&t, NULL, worker, &id);   // spawn
    void* ret;
    pthread_join(t, &ret);                   // wait + collect
    long result = (long)ret;
    return 0;
}

pthread_create allocates a stack (default 8 MB on Linux glibc, tunable via pthread_attr_setstacksize), wires up TLS, and issues a clone() with CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD, the flags that say “share everything except the stack and registers.” pthread_join blocks until the target thread returns and reclaims its resources.
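The create/join pattern extends naturally to several threads. The following is a minimal sketch, not a definitive implementation: NUM_WORKERS, double_worker, and run_workers are hypothetical names chosen for illustration. It gives each thread its own argument slot, checks every return code, and collects the results through pthread_join.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_WORKERS 4   /* hypothetical worker count for this sketch */

/* The result travels through the void* return slot, so intptr_t must be
 * able to hold an int. */
static_assert(sizeof(intptr_t) >= sizeof(int), "intptr_t too small for int");

/* Doubles the integer it is handed and returns it via the return slot. */
static void *double_worker(void *arg) {
    int id = *(int *)arg;
    return (void *)(intptr_t)(id * 2);
}

/* Spawns NUM_WORKERS threads, joins them, and returns the sum of their
 * results, or -1 if any pthreads call failed. */
long run_workers(void) {
    pthread_t threads[NUM_WORKERS];
    int ids[NUM_WORKERS];   /* one slot per thread; lives until all joins finish */
    int started = 0;
    int failed = 0;
    long sum = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i + 1;
        int rc = pthread_create(&threads[i], NULL, double_worker, &ids[i]);
        if (rc != 0) {   /* pthreads returns the error code directly */
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            failed = 1;
            break;
        }
        started++;
    }

    /* Join every thread that was actually started, even after a failure. */
    for (int i = 0; i < started; i++) {
        void *ret = NULL;
        int rc = pthread_join(threads[i], &ret);
        if (rc != 0) {
            fprintf(stderr, "pthread_join: %s\n", strerror(rc));
            failed = 1;
            continue;
        }
        sum += (long)(intptr_t)ret;
    }
    return failed ? -1 : sum;
}
```

Calling run_workers() from main and building with cc -pthread should yield 20 (2 + 4 + 6 + 8) when every thread starts. Giving each thread its own ids[i] slot avoids the classic bug of passing the address of the loop counter itself.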
Detached vs. joinable#
Every thread is born joinable — someone must eventually call pthread_join to reclaim its resources, or you leak a thread descriptor and ~8 MB of address space. If you don’t care about the return value or completion, mark the thread detached so cleanup happens automatically:
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&t, &attr, worker, NULL);
pthread_attr_destroy(&attr);
// no join needed; thread frees itself on exit

Mutexes#
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock(&m);
// critical section
pthread_mutex_unlock(&m);

pthread_mutex_destroy(&m);   // when done with the mutex itself

The static initializer above gives you a default mutex (non-recursive, no priority inheritance). For recursive locks, error-checking locks, or priority-inheritance locks, use pthread_mutexattr_settype / pthread_mutexattr_setprotocol. Don’t enable recursive locks “to be safe”; the need for one usually signals a design bug.
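To make the attribute route concrete, here is a small sketch of an error-checking mutex built with pthread_mutexattr_settype. The helper names (errorcheck_mutex_init, relock_result, unlock_unowned_result) are hypothetical, and the feature-test macro is an assumption about what a strict compiler mode may need to expose PTHREAD_MUTEX_ERRORCHECK.

```c
#define _POSIX_C_SOURCE 200809L   /* assumed necessary under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initializes *m as an error-checking mutex. Returns 0 or an errno value. */
int errorcheck_mutex_init(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);   /* the mutex keeps its own copy */
    return rc;
}

/* Locks the same mutex twice from one thread and returns the result of
 * the second lock attempt. */
int relock_result(void) {
    pthread_mutex_t m;
    if (errorcheck_mutex_init(&m) != 0)
        return -1;
    int first = pthread_mutex_lock(&m);
    assert(first == 0);                   /* a fresh mutex must lock cleanly */
    int second = pthread_mutex_lock(&m);  /* reported instead of deadlocking */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}

/* Unlocks a mutex the caller never locked and returns the result. */
int unlock_unowned_result(void) {
    pthread_mutex_t m;
    if (errorcheck_mutex_init(&m) != 0)
        return -1;
    int rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

With the error-checking type, the second lock is expected to come back as EDEADLK rather than hanging, and the stray unlock as EPERM. A default mutex makes no such promise, which is why the error-checking type is useful in debug builds.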
Condition variables#
pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

// waiter
pthread_mutex_lock(&m);
while (!ready)
    pthread_cond_wait(&cv, &m);
pthread_mutex_unlock(&m);

// signaller
pthread_mutex_lock(&m);
ready = 1;
pthread_cond_signal(&cv);
pthread_mutex_unlock(&m);

Always while, never if; see Condition Variables for the full discipline.
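The waiter/signaller fragments can be assembled into a complete handoff. This is a sketch under assumptions: struct mailbox, producer, and receive_one are hypothetical names, and the value 42 is arbitrary. The mailbox lives on the stack, so it is initialized with pthread_mutex_init and pthread_cond_init rather than the static initializers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox: `ready` is the predicate, guarded by `lock`. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool ready;
    int  value;
};

/* Signaller: publish the value, set the predicate, then signal. */
static void *producer(void *arg) {
    struct mailbox *mb = arg;
    pthread_mutex_lock(&mb->lock);
    mb->value = 42;
    mb->ready = true;
    pthread_cond_signal(&mb->cv);
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

/* Waiter: returns the value handed over by the producer, or -1 on error. */
int receive_one(void) {
    struct mailbox mb;
    if (pthread_mutex_init(&mb.lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&mb.cv, NULL) != 0) {
        pthread_mutex_destroy(&mb.lock);
        return -1;
    }
    mb.ready = false;
    mb.value = 0;

    pthread_t t;
    int result = -1;
    if (pthread_create(&t, NULL, producer, &mb) == 0) {
        pthread_mutex_lock(&mb.lock);
        while (!mb.ready)                 /* re-check after every wakeup */
            pthread_cond_wait(&mb.cv, &mb.lock);
        assert(mb.ready);                 /* predicate holds once the loop exits */
        result = mb.value;
        pthread_mutex_unlock(&mb.lock);
        pthread_join(t, NULL);
    }
    pthread_cond_destroy(&mb.cv);
    pthread_mutex_destroy(&mb.lock);
    return result;
}
```

The handoff works whichever thread runs first: if the producer signals before the waiter reaches pthread_cond_wait, the waiter sees ready already set and never sleeps. That is the reason the predicate, not the signal, carries the state.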
Other primitives#
- pthread_rwlock_*: readers-writer locks. Multiple readers OR one writer.
- pthread_barrier_*: wait for N threads to all arrive at a point.
- pthread_once: run an initializer exactly once across all threads.
- pthread_key_*: thread-local storage with destructors.
- sem_* (technically POSIX semaphores, not pthreads, but bundled in <semaphore.h>).
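As one illustration from the list above, pthread_once can be sketched as follows. The names (init_shared, caller, count_initializations, NUM_CALLERS) are hypothetical; the point is only that many threads racing into pthread_once run the initializer a single time.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_CALLERS 8   /* hypothetical number of racing threads */

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;   /* written only inside init_shared */

/* The one-time initializer; a real one would build shared state here. */
static void init_shared(void) {
    init_calls++;
}

static void *caller(void *arg) {
    (void)arg;
    pthread_once(&init_once, init_shared);
    /* On return from pthread_once the initializer has completed. */
    assert(init_calls == 1);
    return NULL;
}

/* Races NUM_CALLERS threads into pthread_once and returns how many times
 * the initializer ran. */
int count_initializations(void) {
    pthread_t threads[NUM_CALLERS];
    int started = 0;

    for (int i = 0; i < NUM_CALLERS; i++) {
        if (pthread_create(&threads[i], NULL, caller, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return init_calls;
}
```

However many threads arrive, count_initializations() is expected to report a single run, and threads that lose the race block inside pthread_once until the winner's initializer has finished.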
Variants#
Linux NPTL vs. older LinuxThreads#
LinuxThreads (glibc < 2.4, last meaningful in ~2003) was the original Linux pthreads — one process per thread, signal handling that didn’t match POSIX, no proper TLS. NPTL (Native POSIX Thread Library) replaced it: kernel-level threads via the clone() flags above, futex-based mutexes, real TLS, POSIX-compliant signals. You’re on NPTL today unless you’re maintaining something ancient.
macOS#
macOS implements pthreads on Mach threads. Most semantics match but a few differ — unnamed semaphores (sem_init) are stubbed out; use dispatch_semaphore_t from GCD, or named semaphores via sem_open. Thread cancellation is also less aggressive.
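The named-semaphore route might look like the sketch below. The semaphore name and the helper names (DEMO_SEM_NAME, poster, named_semaphore_handoff) are hypothetical, and the sketch assumes the platform provides sem_open. Note that the sem_* functions report errors as -1 plus errno, unlike the pthread_* calls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>       /* O_CREAT, O_EXCL */
#include <pthread.h>
#include <semaphore.h>

#define DEMO_SEM_NAME "/pthreads_demo_sem"   /* hypothetical name */

/* Posts the semaphore once so the waiting thread can proceed. */
static void *poster(void *arg) {
    int rc = sem_post((sem_t *)arg);
    assert(rc == 0);   /* posting a valid semaphore should not fail here */
    (void)rc;
    return NULL;
}

/* Creates a named semaphore, has a second thread post it, and waits for
 * that post. Returns 0 on success, -1 on failure. */
int named_semaphore_handoff(void) {
    sem_unlink(DEMO_SEM_NAME);   /* clear a stale name; failure is fine */

    sem_t *sem = sem_open(DEMO_SEM_NAME, O_CREAT | O_EXCL, 0600, 0);
    if (sem == SEM_FAILED)
        return -1;
    /* Remove the name right away; the semaphore itself stays usable
     * until sem_close. */
    sem_unlink(DEMO_SEM_NAME);

    pthread_t t;
    if (pthread_create(&t, NULL, poster, sem) != 0) {
        sem_close(sem);
        return -1;
    }

    int rc;
    do {
        rc = sem_wait(sem);              /* -1 + errno on failure */
    } while (rc == -1 && errno == EINTR);

    pthread_join(t, NULL);
    sem_close(sem);
    return rc == 0 ? 0 : -1;
}
```

Unlinking the name immediately after sem_open keeps the demo from leaving a system-wide object behind if the process dies early.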
Windows#
Windows doesn’t ship pthreads. You either use the Win32 API (CreateThread, CRITICAL_SECTION, CONDITION_VARIABLE) or a compatibility shim (pthreads-win32). Cross-platform code typically wraps both behind a thin layer.
Higher-level wrappers#
- C++ <thread>, <mutex>, <condition_variable>: typesafe, RAII-managed pthreads. Strictly better for new C++ code.
- OpenMP: pragma-driven thread pools for data-parallel loops. Sits on pthreads internally.
- TBB / oneTBB: Intel’s task-parallel library. Task graphs, work-stealing scheduler.
- Go runtime, Rust tokio, Java virtual threads: M:N scheduling on top of a small pool of OS threads.
Trade-offs#
Raw pthreads versus a language-level wrapper:
- Raw pthreads. Cleanup is manual: a missed pthread_mutex_destroy is a leak, and error returns are easy to ignore. The API predates C99: no generics, no closures, callbacks via void* casts.
- std::thread / Rust std::thread. RAII handles cleanup, types catch misuse, lambdas pass captured state cleanly. Cost: requires the language toolchain and standard library; debugging at the syscall layer still drops you back to the pthreads names.
Other tensions:
- One big lock vs. many small locks. Coarse locking is correct but throttles scale; fine-grained locking is fast but invites deadlock. Most codebases start coarse and split as profiling demands.
- pthread_mutex_t vs. pthread_spinlock_t. The mutex blocks the caller via the futex; the spinlock burns CPU. Spinlocks are rarely correct in user space; see Locks and Spinlocks.
- Detached vs. joinable. Detached is convenient for fire-and-forget workers; joinable is required when you need the return value or shutdown ordering.
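One common way to keep fine-grained locking from deadlocking is to always take locks in a fixed global order. The sketch below orders by address; it is an illustration under assumptions, not a method the text above prescribes, and struct account, transfer, and run_transfers are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define ROUNDS 10000   /* hypothetical number of transfers per thread */

struct account {
    pthread_mutex_t lock;
    long balance;       /* guarded by lock */
};

/* Moves `amount` from src to dst. Both locks are taken in address order,
 * so two opposite transfers can never each hold one lock and wait for
 * the other. */
void transfer(struct account *src, struct account *dst, long amount) {
    if (src == dst)
        return;
    struct account *first  = (uintptr_t)src < (uintptr_t)dst ? src : dst;
    struct account *second = (first == src) ? dst : src;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    src->balance -= amount;
    dst->balance += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

/* Statically allocated, so the static initializer is appropriate. */
static struct account acct_a = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static struct account acct_b = { PTHREAD_MUTEX_INITIALIZER, 1000 };

static void *a_to_b(void *arg) {
    (void)arg;
    for (int i = 0; i < ROUNDS; i++)
        transfer(&acct_a, &acct_b, 1);
    return NULL;
}

static void *b_to_a(void *arg) {
    (void)arg;
    for (int i = 0; i < ROUNDS; i++)
        transfer(&acct_b, &acct_a, 1);
    return NULL;
}

/* Runs opposite transfers concurrently and returns account A's final
 * balance, or -1 if a thread could not be started. */
long run_transfers(void) {
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, a_to_b, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, b_to_a, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    assert(acct_a.balance + acct_b.balance == 2000);   /* money is conserved */
    return acct_a.balance;
}
```

If transfer instead locked src first and dst second, the two threads could each grab one lock and wait forever on the other. With address ordering, both threads contend for the same lock first, and the equal and opposite transfers leave account A back at 1000.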
Common pitfalls#
- Ignoring return values. Every pthreads call returns 0 on success or a positive errno (not -1 + errno). pthread_mutex_lock returning EDEADLK and being ignored produces a “lock held by no one” mystery.
- Passing pointers to stack data into pthread_create. If the parent’s stack frame exits before the child reads arg, you get garbage or a crash. Either pass an integer cast to void*, or heap-allocate the arg and have the child free it.
- Forgetting pthread_join on joinable threads. Each leaks ~8 MB of address space plus its thread descriptor. At 8 MB apiece, a 32-bit process runs out of address space after only a few hundred leaked threads.
- Calling fork in a multi-threaded process. Only async-signal-safe functions are legal between fork and exec in the child. malloc, printf, almost anything interesting is unsafe. Use posix_spawn instead.
- Cancelling threads with pthread_cancel. Asynchronous cancellation is almost never what you want: locks held at the cancellation point are leaked. Prefer cooperative shutdown via a stop flag checked at safe points.
- Mixing pthread_mutex_t with signals. A signal delivered while a thread holds a mutex can leave it locked forever if the handler longjmps out. Block signals on threads that hold locks, or use signalfd / sigwait on a dedicated thread.
- Static initializer for a heap-allocated mutex. PTHREAD_MUTEX_INITIALIZER only works for statically allocated mutexes. For heap mutexes, call pthread_mutex_init(&m, NULL).
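Several of these pitfalls can be avoided together. The sketch below is one possible arrangement, with hypothetical names (struct worker_ctx, polling_worker, run_until_stopped): the thread argument is heap-allocated, its embedded mutex is set up with pthread_mutex_init, and shutdown is cooperative via an atomic stop flag instead of pthread_cancel. Because the joiner reads the result here, the joiner frees the context rather than the child.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-worker state, heap-allocated so it cannot vanish with
 * the creator's stack frame. */
struct worker_ctx {
    pthread_mutex_t lock;   /* heap mutex: initialized with pthread_mutex_init */
    atomic_bool stop;       /* cooperative shutdown flag */
    long iterations;        /* guarded by lock */
};

static void *polling_worker(void *arg) {
    struct worker_ctx *ctx = arg;
    /* The flag is checked only between units of work, when no lock is held. */
    while (!atomic_load(&ctx->stop)) {
        pthread_mutex_lock(&ctx->lock);
        ctx->iterations++;
        pthread_mutex_unlock(&ctx->lock);
        sched_yield();
    }
    return NULL;
}

/* Starts a worker, lets it make progress, asks it to stop, and returns
 * the number of iterations it completed, or -1 on error. */
long run_until_stopped(void) {
    struct worker_ctx *ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return -1;
    if (pthread_mutex_init(&ctx->lock, NULL) != 0) {
        free(ctx);
        return -1;
    }
    atomic_init(&ctx->stop, false);
    ctx->iterations = 0;

    pthread_t t;
    if (pthread_create(&t, NULL, polling_worker, ctx) != 0) {
        pthread_mutex_destroy(&ctx->lock);
        free(ctx);
        return -1;
    }

    /* Wait until the worker has done at least one unit of work. */
    for (;;) {
        pthread_mutex_lock(&ctx->lock);
        long seen = ctx->iterations;
        pthread_mutex_unlock(&ctx->lock);
        if (seen > 0)
            break;
        sched_yield();
    }

    atomic_store(&ctx->stop, true);   /* request shutdown */
    int rc = pthread_join(t, NULL);   /* worker exits at its next check */
    long total = ctx->iterations;     /* safe to read after the join */
    assert(rc != 0 || total > 0);

    pthread_mutex_destroy(&ctx->lock);
    free(ctx);
    return rc == 0 ? total : -1;
}
```

The worker never exits while holding the lock, which is the property asynchronous cancellation cannot guarantee.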
What is a futex and why does pthreads need one?
A futex (fast userspace mutex) is a Linux syscall whose two core operations, futex(FUTEX_WAIT) and futex(FUTEX_WAKE), let userspace implement locks without entering the kernel on the uncontended path. pthread_mutex_lock does an atomic compare-and-swap on a userspace word; only if the lock is contended does it call futex(FUTEX_WAIT) to sleep. Uncontended lock/unlock is a handful of instructions, no syscall. This is why glibc mutexes scale: the kernel only gets involved when threads actually have to wait. Mutexes before futexes (LinuxThreads, Solaris early threads) entered the kernel on every operation and were correspondingly slow.
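The fast-path/slow-path split can be sketched with a toy lock. This is Linux-only and is not glibc's implementation: it follows the well-known three-state design (0 unlocked, 1 locked, 2 locked with possible waiters), and the names futex_op, toy_lock, toy_unlock, and run_futex_counter are hypothetical.

```c
#define _GNU_SOURCE          /* for syscall() */
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

/* The kernel treats the futex word as a plain 32-bit int. */
static_assert(sizeof(atomic_int) == sizeof(int), "futex word must be int-sized");

/* 0 = unlocked, 1 = locked, 2 = locked and there may be sleepers. */
static atomic_int lock_word = 0;
static long counter = 0;   /* guarded by the toy lock */

static long futex_op(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void toy_lock(void) {
    int c = 0;
    /* Fast path: uncontended 0 -> 1, no syscall at all. */
    if (atomic_compare_exchange_strong(&lock_word, &c, 1))
        return;
    /* Slow path: advertise a waiter, then sleep while the word stays 2. */
    if (c != 2)
        c = atomic_exchange(&lock_word, 2);
    while (c != 0) {
        futex_op(&lock_word, FUTEX_WAIT, 2);
        c = atomic_exchange(&lock_word, 2);
    }
}

static void toy_unlock(void) {
    /* 1 -> 0 means nobody was waiting: no syscall. Otherwise wake one. */
    if (atomic_fetch_sub(&lock_word, 1) != 1) {
        atomic_store(&lock_word, 0);
        futex_op(&lock_word, FUTEX_WAKE, 1);
    }
}

static void *incrementer(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        toy_lock();
        counter++;
        toy_unlock();
    }
    return NULL;
}

/* Hammers the toy lock from several threads and returns the final count,
 * or -1 if a thread could not be started. */
long run_futex_counter(void) {
    pthread_t threads[NUM_THREADS];
    int started = 0;
    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, incrementer, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == NUM_THREADS ? counter : -1;
}
```

Running this under strace -f with a single thread should show no futex calls from the lock itself, while the contended run shows FUTEX_WAIT and FUTEX_WAKE only when threads collide. Production code should use pthread_mutex_t, which adds the fairness, robustness, and priority handling this sketch leaves out.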
Related building blocks#
- Threads and Shared State: the model pthreads exposes.
- Locks and Spinlocks: what pthread_mutex_t actually is.
- Condition Variables: the pthread_cond_* half.
- Semaphores: the sem_* cousin family.
- Concurrency Bugs — Deadlock, Atomicity, Order: how pthreads code goes wrong.