POSIX Threads API — Operating Systems

What it is#

pthreads (POSIX Threads, IEEE Std 1003.1c) is the C-level threading interface that every Unix-like system speaks. On Linux it’s implemented by glibc’s NPTL on top of the clone() syscall and futexes (fast userspace mutexes); on macOS by libpthread inside libSystem, on top of Mach threads. The surface fits on one slide: create / join a thread, lock / unlock a mutex, wait / signal a condition variable, post / wait on a semaphore.

Most higher-level threading APIs, including C++ std::thread, Rust std::thread, Java platform threads, and Python’s threading module, sit on pthreads when they run on Linux or macOS. Knowing pthreads gives you the vocabulary to reason about all of them.

When to use it#

Reach for pthreads directly when:

You’re writing C and the standard library’s <threads.h> (C11) isn’t available or portable enough.
You’re writing a library that ships across multiple host languages and needs the lowest-common-denominator thread primitive.
You’re debugging or profiling threading code: gdb’s thread support and perf’s thread-id columns report threads at the pthreads level, and strace -f shows the clone() and futex() calls underneath them.
You need a feature the higher-level wrapper hides — thread cancellation, custom stack placement, robust mutexes that survive owner death, real-time scheduling attributes.

For new C++ code, prefer std::thread and std::mutex — same semantics, RAII handles cleanup, type-safe. For Go or Rust async work, prefer the runtime’s primitives. pthreads is the floor, not the ceiling.

How it works#

Creating and joining threads#

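#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

void *worker(void *arg) {
    int id = *(int *)arg;               // safe here: main keeps id alive until the join
    // ... do work ...
    return (void *)(intptr_t)(id * 2);  // return value picked up by join
}

int main(void) {
    pthread_t t;
    int id = 42;
    int rc = pthread_create(&t, NULL, worker, &id);   // spawn
    if (rc != 0) {                                    // rc is the error number itself
        fprintf(stderr, "pthread_create failed: %d\n", rc);
        return 1;
    }
    void *ret;
    pthread_join(t, &ret);                            // wait + collect
    long result = (long)(intptr_t)ret;
    printf("result = %ld\n", result);
    return 0;
}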

pthread_create reserves a stack (8 MB of virtual address space by default on most Linux/glibc setups, taken from the RLIMIT_STACK soft limit and tunable via pthread_attr_setstacksize), wires up TLS, and issues a clone() with flags including CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD: the flags that say “share everything except the stack and registers.” pthread_join blocks until the target thread terminates, collects its return value, and reclaims its resources.
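As a rough sketch of the stack-size tuning mentioned above, the following creates one thread with a 1 MiB stack instead of the default. The function name spawn_with_small_stack and the 1 MiB figure are illustrative choices, not anything the API prescribes.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread body: nothing to do, the sketch only cares about the attributes. */
static void *noop(void *arg) {
    (void)arg;
    return NULL;
}

/* Hypothetical helper: spawns and joins one thread with a 1 MiB stack.
 * Stores the size recorded in the attribute object in *stack_size_out.
 * Returns 0 on success or the error number of the first failing call. */
int spawn_with_small_stack(size_t *stack_size_out) {
    pthread_attr_t attr;
    pthread_t t;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;

    /* The requested size must be at least PTHREAD_STACK_MIN. */
    rc = pthread_attr_setstacksize(&attr, 1024 * 1024);
    if (rc == 0) rc = pthread_attr_getstacksize(&attr, stack_size_out);
    if (rc == 0) rc = pthread_create(&t, &attr, noop, NULL);

    /* The attribute object is only read during pthread_create. */
    pthread_attr_destroy(&attr);
    if (rc != 0) return rc;

    return pthread_join(t, NULL);
}
```

Smaller stacks matter mostly when a program runs thousands of threads; a thread that recurses deeply or keeps large arrays on the stack needs the size raised, not lowered.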

Detached vs. joinable#

Threads are joinable by default: someone must eventually call pthread_join to reclaim the thread’s resources, or you leak a thread descriptor and ~8 MB of address space. If you don’t care about the return value or completion, create the thread detached (or call pthread_detach on it later) so cleanup happens automatically:

pthread_t t;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&t, &attr, worker, NULL);  // worker must not dereference its NULL arg
pthread_attr_destroy(&attr);
// no join needed; thread frees itself on exit

Mutexes#

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock(&m);
// critical section
pthread_mutex_unlock(&m);

pthread_mutex_destroy(&m);  // when done with the mutex itself

The static initializer above gives you a default mutex (non-recursive, no priority inheritance). For recursive locks, error-checking locks, or priority-inheritance locks, set up a pthread_mutexattr_t with pthread_mutexattr_settype / pthread_mutexattr_setprotocol and pass it to pthread_mutex_init. Don’t enable recursive locks “to be safe”: the need for one usually signals a design bug.
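A minimal sketch of the attribute route, assuming an error-checking mutex is what you want. The helper names init_errorcheck_mutex and relock_result are made up for illustration. Relocking from the owning thread would hang on a default mutex; here it reports EDEADLK instead.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initializes *m as an error-checking mutex.
 * Returns 0 on success or an error number. */
int init_errorcheck_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);  /* the mutex keeps its own copy of the settings */
    return rc;
}

/* Locks the same mutex twice from one thread and returns the result
 * of the second lock call, or -1 if setup failed. */
int relock_result(void) {
    pthread_mutex_t m;
    int second;

    if (init_errorcheck_mutex(&m) != 0) return -1;
    if (pthread_mutex_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }
    second = pthread_mutex_lock(&m);   /* caller already owns m */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

Error-checking mutexes cost a little extra per operation, so a common pattern is to enable them in debug builds only.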

Condition variables#

pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
int ready = 0;  // the predicate; only read or written with m held

// waiter
pthread_mutex_lock(&m);
while (!ready) pthread_cond_wait(&cv, &m);
pthread_mutex_unlock(&m);

// signaller
pthread_mutex_lock(&m);
ready = 1;
pthread_cond_signal(&cv);
pthread_mutex_unlock(&m);

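Always while, never if: pthread_cond_wait may return spuriously, and another thread can change the state between the signal and the wakeup, so the predicate must be rechecked after every return. See Condition Variables for the full discipline.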
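Put together as something compilable, the waiter and signaller might look like the sketch below. The struct handoff type and the function names are hypothetical; the mutex and condition variable are initialized with the _init calls because they live inside a struct on the stack.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: the predicate, the payload, and what guards them. */
struct handoff {
    pthread_mutex_t m;
    pthread_cond_t cv;
    int ready;
    int value;
};

/* Signaller: publish the value, set the predicate, then signal, all under the mutex. */
static void *producer(void *arg) {
    struct handoff *h = arg;
    pthread_mutex_lock(&h->m);
    h->value = 42;
    h->ready = 1;
    pthread_cond_signal(&h->cv);
    pthread_mutex_unlock(&h->m);
    return NULL;
}

/* Waiter: returns the value handed over by the producer, or -1 on setup failure. */
int wait_for_value(void) {
    struct handoff h;
    pthread_t t;
    int result;

    h.ready = 0;
    h.value = 0;
    if (pthread_mutex_init(&h.m, NULL) != 0) return -1;
    if (pthread_cond_init(&h.cv, NULL) != 0) {
        pthread_mutex_destroy(&h.m);
        return -1;
    }
    /* Passing &h is safe only because this function joins before returning. */
    if (pthread_create(&t, NULL, producer, &h) != 0) {
        pthread_cond_destroy(&h.cv);
        pthread_mutex_destroy(&h.m);
        return -1;
    }

    pthread_mutex_lock(&h.m);
    while (!h.ready)                     /* recheck the predicate after every wakeup */
        pthread_cond_wait(&h.cv, &h.m);  /* releases m while asleep, reacquires on return */
    result = h.value;
    pthread_mutex_unlock(&h.m);

    pthread_join(t, NULL);
    pthread_cond_destroy(&h.cv);
    pthread_mutex_destroy(&h.m);
    return result;
}
```

The loop also covers the case where the producer runs first: the waiter then finds ready already set and never sleeps.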

Other primitives#

pthread_rwlock_* — readers-writer locks. Multiple readers OR one writer.
pthread_barrier_* — wait for N threads to all arrive at a point.
pthread_once — run an initializer exactly once across all threads.
pthread_key_* — thread-local storage with destructors.
sem_* (technically POSIX semaphores, not pthreads; declared in <semaphore.h> rather than <pthread.h>).
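To illustrate one primitive from the list, here is a small pthread_once sketch: several threads race to initialize shared state, and the initializer still runs a single time. The names init_shared, use_shared, and count_initializations are invented for the example.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;
static int init_calls = 0;   /* counts how many times the initializer actually ran */

/* The one-time initializer. pthread_once guarantees a single execution. */
static void init_shared(void) {
    init_calls++;
}

static void *use_shared(void *arg) {
    (void)arg;
    pthread_once(&once, init_shared);  /* every thread calls it; only one runs init_shared */
    return NULL;
}

/* Hypothetical driver: starts up to 16 threads that all request initialization,
 * joins them, and returns the number of times the initializer ran. */
int count_initializations(int nthreads) {
    pthread_t threads[16];
    int i, started = 0;

    if (nthreads > 16) nthreads = 16;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[i], NULL, use_shared, NULL) != 0) break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);  /* join makes the threads' writes visible here */
    return init_calls;
}
```

Threads that lose the race block inside pthread_once until the winner's initializer has returned, so none of them can observe half-initialized state.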

Variants#

Linux NPTL vs. older LinuxThreads#

LinuxThreads (glibc < 2.4, last meaningful in ~2003) was the original Linux pthreads — one process per thread, signal handling that didn’t match POSIX, no proper TLS. NPTL (Native POSIX Thread Library) replaced it: kernel-level threads via the clone() flags above, futex-based mutexes, real TLS, POSIX-compliant signals. You’re on NPTL today unless you’re maintaining something ancient.

macOS#

macOS implements pthreads on Mach threads. Most semantics match but a few differ: unnamed semaphores (sem_init) are stubbed out and fail at run time, so use dispatch_semaphore_t from GCD, or named semaphores via sem_open. pthread_barrier_* and pthread_spinlock_t are not provided. Thread cancellation is also less aggressive.

Windows#

Windows doesn’t ship pthreads. You either use the Win32 API (CreateThread, CRITICAL_SECTION, CONDITION_VARIABLE) or a compatibility shim (pthreads-win32). Cross-platform code typically wraps both behind a thin layer.
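A thin layer of the kind described above could look roughly like this sketch for the mutex case. The xp_ prefix and every function name are hypothetical; the Win32 branch uses only the CRITICAL_SECTION calls, which return nothing, so the wrapper reports success unconditionally there.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef CRITICAL_SECTION xp_mutex;   /* hypothetical wrapper type */

static int xp_mutex_init(xp_mutex *m)    { InitializeCriticalSection(m); return 0; }
static int xp_mutex_lock(xp_mutex *m)    { EnterCriticalSection(m); return 0; }
static int xp_mutex_unlock(xp_mutex *m)  { LeaveCriticalSection(m); return 0; }
static int xp_mutex_destroy(xp_mutex *m) { DeleteCriticalSection(m); return 0; }
#else
#include <pthread.h>
typedef pthread_mutex_t xp_mutex;    /* hypothetical wrapper type */

static int xp_mutex_init(xp_mutex *m)    { return pthread_mutex_init(m, NULL); }
static int xp_mutex_lock(xp_mutex *m)    { return pthread_mutex_lock(m); }
static int xp_mutex_unlock(xp_mutex *m)  { return pthread_mutex_unlock(m); }
static int xp_mutex_destroy(xp_mutex *m) { return pthread_mutex_destroy(m); }
#endif

/* Exercises the wrapper once; returns 0 if every call succeeded. */
int xp_mutex_roundtrip(void) {
    xp_mutex m;
    int rc, drc;

    rc = xp_mutex_init(&m);
    if (rc != 0) return rc;

    rc = xp_mutex_lock(&m);
    if (rc == 0) rc = xp_mutex_unlock(&m);

    drc = xp_mutex_destroy(&m);
    return rc != 0 ? rc : drc;
}
```

Callers only ever see xp_mutex and the four functions, so the platform choice stays in one header. A real layer would add threads and condition variables the same way.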

Higher-level wrappers#

C++ <thread>, <mutex>, <condition_variable>: type-safe wrappers over pthreads with RAII lock guards. Usually the better choice for new C++ code, though a std::thread must still be joined or detached before it is destroyed.
OpenMP: pragma-driven thread pools for data-parallel loops. Common implementations sit on pthreads on Linux and macOS.
TBB / oneTBB: Intel’s task-parallel library. Task graphs, work-stealing scheduler.
Go runtime, Rust tokio, Java virtual threads: M:N scheduling on top of a small pool of OS threads.

Trade-offs#

Direct pthreads: minimal, portable, no language runtime needed. Cost: every resource (stack, mutex, CV) is manually managed; one forgotten pthread_mutex_destroy is a potential leak; error returns are easy to ignore. The API is plain C from the mid-1990s: no generics, no closures, and all arguments and results travel through void* casts.

C++ std::thread / Rust std::thread — RAII handles cleanup, types catch misuse, lambdas pass captured state cleanly. Cost: requires the language toolchain and standard library; debugging at the syscall layer still drops you back to the pthreads names.

Other tensions:

One big lock vs. many small locks. Coarse locking is correct but throttles scale; fine-grained locking is fast but invites deadlock. Most codebases start coarse and split as profiling demands.
pthread_mutex_t vs. pthread_spinlock_t. The mutex blocks the caller via the futex; the spinlock burns CPU. Spinlocks are rarely correct in user space — see Locks and Spinlocks.
Detached vs. joinable. Detached is convenient for fire-and-forget workers; joinable is required when you need the return value or shutdown ordering.

Common pitfalls#

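Ignoring return values. pthread_* calls return 0 on success or a positive error number directly (not -1 with errno set; the sem_* functions are the exception and do use -1 plus errno). For example, an error-checking mutex returns EDEADLK from pthread_mutex_lock when the caller already holds it. Ignore that, and the first unlock releases the mutex while the code still believes it holds it, producing a “lock held by no one” mystery.
Passing pointers to stack data into pthread_create. If the creating function returns, or reuses the variable (a loop counter, say), before the child reads arg, you get garbage or a crash. Either keep the data alive until you join, pass a small integer cast through intptr_t to void*, or heap-allocate the arg and have the child free it.
Forgetting pthread_join on joinable threads. Each one leaks its thread descriptor and ~8 MB of stack address space even after it has exited. A 32-bit process with about 3 GB of usable address space is exhausted after roughly 3072 MB / 8 MB = 384 such threads; a 64-bit process lasts far longer but can still run into per-process mapping limits.
Calling fork in a multi-threaded process. The child contains only the thread that called fork, yet inherits every mutex in whatever state it was in, so a lock held by some other thread (inside malloc or stdio, for instance) stays locked forever. Only async-signal-safe functions are legal between fork and exec in the child; malloc, printf, and almost anything interesting are unsafe. Prefer posix_spawn when all you need is to launch another program.
Cancelling threads with pthread_cancel. Cancellation is deferred by default and acts only at cancellation points such as read or pthread_cond_wait, but any lock held at that point stays locked unless a cleanup handler (pthread_cleanup_push) releases it. Asynchronous cancellation (PTHREAD_CANCEL_ASYNCHRONOUS) can strike anywhere and is almost never what you want. Prefer cooperative shutdown via a stop flag checked at safe points.
Mixing pthread_mutex_t with signals. pthread_mutex_lock is not async-signal-safe: a handler that tries to take a mutex the interrupted thread already holds deadlocks, and a handler that longjmps out can leave a mutex locked forever. Block signals on threads that hold locks, or handle them synchronously with sigwait (or Linux’s signalfd) on a dedicated thread.
Static initializer for a heap-allocated mutex. PTHREAD_MUTEX_INITIALIZER is an initializer meant for statically allocated mutexes; assigning it to memory you just malloc’d is not portable. For heap mutexes, call pthread_mutex_init on the pointer with NULL attributes, and pthread_mutex_destroy before freeing the memory.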
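The heap-allocated argument pattern from the list above can be sketched as follows. struct job, run_job, start_job, and sum_doubled_ids are hypothetical names; the point is the ownership hand-off, where the child frees what the parent allocated.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread argument. */
struct job {
    int id;
};

static void *run_job(void *arg) {
    struct job *j = arg;
    int id = j->id;
    free(j);                            /* the child owns arg once it is running */
    return (void *)(intptr_t)(id * 2);  /* small integer result carried in the pointer */
}

/* Starts one worker with its own heap copy of id.
 * Returns 0 on success or an error number. */
int start_job(pthread_t *t, int id) {
    struct job *j = malloc(sizeof *j);
    int rc;

    if (j == NULL) return ENOMEM;
    j->id = id;
    rc = pthread_create(t, NULL, run_job, j);
    if (rc != 0) free(j);               /* thread never started, parent still owns j */
    return rc;
}

/* Runs up to 8 jobs with ids 1..n and returns the sum of their results. */
long sum_doubled_ids(int n) {
    pthread_t threads[8];
    long sum = 0;
    int i, started = 0;

    if (n > 8) n = 8;
    for (i = 0; i < n; i++) {
        if (start_job(&threads[i], i + 1) != 0) break;
        started++;
    }
    for (i = 0; i < started; i++) {
        void *ret;
        if (pthread_join(threads[i], &ret) == 0)
            sum += (long)(intptr_t)ret;
    }
    return sum;
}
```

Passing &i from the loop instead would hand every thread a pointer to the same changing counter, which is exactly the bug this pattern avoids.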

What is a futex and why does pthreads need one?

A futex (fast userspace mutex) is a Linux syscall, futex(), whose FUTEX_WAIT and FUTEX_WAKE operations let userspace implement locks without entering the kernel on the uncontended path. pthread_mutex_lock does an atomic compare-and-swap on a userspace word; only if the lock is contended does it call futex(FUTEX_WAIT) to sleep, and unlock calls futex(FUTEX_WAKE) only when a waiter may exist. Uncontended lock/unlock is a handful of instructions, no syscall. This is why glibc mutexes scale: the kernel only gets involved when threads actually have to wait. Earlier designs handled contention more expensively; LinuxThreads, for example, parked and woke waiters with signals.
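To make the fast path and slow path concrete, here is a Linux-only toy mutex built directly on the futex syscall. It is a sketch of the general technique (the three-state design described in Ulrich Drepper's "Futexes Are Tricky"), not glibc's actual implementation; toy_mutex and the other names are invented for the example.

```c
#define _GNU_SOURCE              /* for the syscall() declaration */
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked and a waiter may be asleep. */
typedef struct { atomic_int state; } toy_mutex;

/* Sleeps only if *addr still equals expected; otherwise returns at once. */
static void futex_wait(atomic_int *addr, int expected) {
    syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static void futex_wake_one(atomic_int *addr) {
    syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

void toy_lock(toy_mutex *m) {
    int c = 0;
    /* Fast path: 0 -> 1 with one compare-and-swap, no syscall. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1)) return;

    /* Slow path: mark the lock contended, then sleep until it is free. */
    if (c != 2) c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_wait(&m->state, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void toy_unlock(toy_mutex *m) {
    /* If the old value was 1 nobody is waiting, so no syscall is needed. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex_wake_one(&m->state);
    }
}

static toy_mutex counter_lock;   /* static storage: zero-initialized, i.e. unlocked */
static long counter;

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        toy_lock(&counter_lock);
        counter++;
        toy_unlock(&counter_lock);
    }
    return NULL;
}

/* Four threads increment the counter 100000 times each under toy_mutex. */
long contended_count(void) {
    pthread_t t[4];
    int started = 0;

    counter = 0;
    for (int i = 0; i < 4; i++) {
        if (pthread_create(&t[i], NULL, bump, NULL) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(t[i], NULL);
    return counter;
}
```

The third state is what keeps unlock cheap: a lock word of 1 tells the releasing thread that nobody went to sleep, so it can skip FUTEX_WAKE entirely.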

Threads and Shared State — the model pthreads exposes.
Locks and Spinlocks — what pthread_mutex_t actually is.
Condition Variables — the pthread_cond_* half.
Semaphores — the sem_* cousin family.
Concurrency Bugs — Deadlock, Atomicity, Order — how pthreads code goes wrong.