Project — A Multi-Client UDP Chat App
A small but realistic project: many clients, one relay server, broadcast over UDP. The framing, the keep-alive, the cleanup.
What it is#
A relay chat application built on UDP datagrams. One central server binds a UDP port and maintains an in-memory map of {client_addr: nickname}. Many clients send datagrams to the server; the server rebroadcasts each message to every other connected client. There is no connection state in the protocol sense — UDP gives you no handshake, no FIN, no retransmits — so the project becomes an exercise in inventing exactly enough state on top of UDP to make a working chat: a join command, a keep-alive ping, an idle-timeout for cleanup, and line-oriented framing.
The point of doing it on UDP rather than TCP is that every piece of “connection” behaviour is something you have to build yourself. There is no accept(), no per-client socket, no FIN-detected disconnect. The server has one socket, calls recvfrom in a loop, and tells clients apart by their (host, port) tuple. This makes the moving parts unusually visible.
When to use it#
Build this when you want to:
- See how application-level state lives on top of a connectionless transport. Every “session”, “join”, “leave”, “is the peer still alive” decision is yours to design.
- Practise framing without TCP’s stream-of-bytes problem. UDP gives you whole datagrams (one sendto becomes one recvfrom), so framing collapses to “decide what one datagram means”.
- Compare with the TCP version. Build it twice, once on each transport, and the differences in code (and failure modes) are immediate.
- Prototype a small group-chat / lobby / presence service. Real systems use TCP or QUIC for chat, but the topology — many clients, one relay, broadcast-on-receive — is the same.
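The datagram-boundary point in the list above can be checked on loopback. The following is a minimal sketch, not part of the project code: the function name datagram_boundaries_demo is hypothetical, and it assumes a local loopback interface where two small datagrams are not lost.

```python
import socket


def datagram_boundaries_demo() -> list[bytes]:
    """Send two datagrams over loopback and return what each recvfrom yields."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(('127.0.0.1', 0))   # port 0: let the kernel pick a free port
    receiver.settimeout(2.0)          # fail rather than hang if a datagram is lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        target = receiver.getsockname()
        sender.sendto(b'first line\n', target)
        sender.sendto(b'second line\n', target)
        # Each recvfrom returns exactly one datagram, never a merged byte stream.
        return [receiver.recvfrom(4096)[0] for _ in range(2)]
    finally:
        sender.close()
        receiver.close()
```

With TCP the same two sends could arrive as a single read; here each payload comes back as its own unit, which is why the chat protocol needs no length prefix or delimiter scanning.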
Reach for TCP (or a higher-level framework) instead when:
- Messages must not be lost. UDP datagrams can vanish silently. A chat over UDP will sometimes drop lines.
- Messages must arrive in order. UDP makes no ordering guarantee. Two consecutive messages from the same client can arrive reversed.
- You need TLS. DTLS exists but is fiddly; if you want privacy, TCP + TLS is far easier.
How it works#
The protocol, in one paragraph#
Every message is a single UTF-8 line terminated by \n. The first message a client sends must be /join <nickname>. After that, plain text lines are chat messages, /ping is a keep-alive, and /leave signs off. The server broadcasts each chat line to every other registered client, prefixed with the sender’s nickname. If a client has not sent anything for 30 seconds, the server drops them and announces the departure to the remaining clients. Clients send /ping every 10 seconds so the idle timer doesn’t fire.
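To make the protocol concrete, here is a small sketch of how one datagram could be classified. The function parse_datagram and its return shape are hypothetical, introduced only for illustration; the rules (the /join prefix, the 32-character nickname cap, the 'anon' fallback, /ping and /leave) mirror what relay.py does in its handle method.

```python
def parse_datagram(data: bytes) -> tuple[str, str]:
    """Classify one datagram as (kind, argument).

    kind is one of 'join', 'ping', 'leave', 'chat' or 'ignore'.
    """
    try:
        line = data.decode('utf-8').rstrip('\n')
    except UnicodeDecodeError:
        return ('ignore', '')          # non-UTF-8 datagrams are dropped
    if not line:
        return ('ignore', '')          # empty lines carry no meaning
    if line.startswith('/join '):
        # Cap the nickname at 32 characters; fall back to 'anon' if blank.
        return ('join', line[6:].strip()[:32] or 'anon')
    if line == '/ping':
        return ('ping', '')
    if line == '/leave':
        return ('leave', '')
    return ('chat', line)              # anything else is a chat message
```

Keeping the classification in a pure function like this makes the protocol testable without opening a socket.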
ASCII view of the topology:
    client A ─────┐
                  │  UDP datagrams (port 9999)
    client B ─────┼──────────► server ──────► rebroadcast to all
                  │                           other registered clients
    client C ─────┘

The server#

    """relay.py: a UDP chat relay server.

    Run with: python relay.py
    Default port: 9999. Idle timeout: 30 s.
    """
    import socket
    import threading
    import time
    from dataclasses import dataclass

    HOST = '0.0.0.0'
    PORT = 9999
    IDLE_TIMEOUT = 30.0    # seconds with no traffic before we evict a client
    SWEEP_INTERVAL = 5.0   # how often the reaper thread runs


    @dataclass
    class Client:
        nickname: str
        last_seen: float


    class Relay:
        def __init__(self, host: str, port: int) -> None:
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            self.sock.bind((host, port))
            self.clients: dict[tuple[str, int], Client] = {}
            self.lock = threading.Lock()
            print(f'relay listening on udp/{port}')

        def broadcast(self, line: str, exclude: tuple[str, int] | None = None) -> None:
            payload = (line.rstrip('\n') + '\n').encode('utf-8')
            with self.lock:
                targets = [addr for addr in self.clients if addr != exclude]
            for addr in targets:
                try:
                    self.sock.sendto(payload, addr)
                except OSError:
                    pass  # client's socket may have died; reaper will clean up

        def handle(self, data: bytes, addr: tuple[str, int]) -> None:
            try:
                line = data.decode('utf-8').rstrip('\n')
            except UnicodeDecodeError:
                return  # ignore non-UTF-8 datagrams
            if not line:
                return
            now = time.monotonic()

            if line.startswith('/join '):
                nick = line[6:].strip()[:32] or 'anon'
                with self.lock:
                    self.clients[addr] = Client(nickname=nick, last_seen=now)
                self.broadcast(f'*** {nick} joined ***', exclude=addr)
                self.sock.sendto(f'*** welcome, {nick} ***\n'.encode(), addr)
                return

            with self.lock:
                client = self.clients.get(addr)
            if client is None:
                self.sock.sendto(b'*** please /join <nickname> first ***\n', addr)
                return

            client.last_seen = now
            if line == '/ping':
                self.sock.sendto(b'/pong\n', addr)
                return
            if line == '/leave':
                self.drop(addr, reason='left')
                return
            self.broadcast(f'<{client.nickname}> {line}', exclude=addr)

        def drop(self, addr: tuple[str, int], reason: str) -> None:
            with self.lock:
                client = self.clients.pop(addr, None)
            if client is not None:
                self.broadcast(f'*** {client.nickname} {reason} ***')

        def reaper(self) -> None:
            while True:
                time.sleep(SWEEP_INTERVAL)
                cutoff = time.monotonic() - IDLE_TIMEOUT
                with self.lock:
                    stale = [a for a, c in self.clients.items() if c.last_seen < cutoff]
                for addr in stale:
                    self.drop(addr, reason='timed out')

        def serve_forever(self) -> None:
            threading.Thread(target=self.reaper, daemon=True).start()
            while True:
                try:
                    data, addr = self.sock.recvfrom(4096)
                except OSError:
                    continue
                self.handle(data, addr)


    if __name__ == '__main__':
        Relay(HOST, PORT).serve_forever()

The relay is a single UDP socket, one worker thread for the reaper, and a dict guarded by a lock. There is no per-connection state in the kernel; every datagram is a fresh recvfrom. Everything that resembles a “session” lives in self.clients.
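The reaper's eviction decision is easier to reason about when separated from the thread and the clock. The sketch below is an illustration only: find_stale is a hypothetical helper (relay.py inlines this logic in its reaper method), and passing the current time as an argument is an assumption made so the rule can be checked with fixed numbers.

```python
from dataclasses import dataclass


@dataclass
class Client:
    nickname: str
    last_seen: float


def find_stale(
    clients: dict[tuple[str, int], Client],
    now: float,
    idle_timeout: float = 30.0,
) -> list[tuple[str, int]]:
    """Return the addresses whose last traffic is older than idle_timeout."""
    cutoff = now - idle_timeout
    # Strictly older than the cutoff, matching the relay's `last_seen < cutoff`.
    return [addr for addr, c in clients.items() if c.last_seen < cutoff]


table = {
    ('10.0.0.1', 5000): Client('alice', last_seen=100.0),
    ('10.0.0.2', 5001): Client('bob', last_seen=125.0),
}
stale_at_131 = find_stale(table, now=131.0)   # cutoff is 101.0: only alice is stale
```

Because the sweep runs every SWEEP_INTERVAL seconds rather than continuously, a silent client is actually evicted somewhere between 30 and 35 seconds after its last datagram.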
The client#

    """chat.py: a UDP chat client.

    Run with: python chat.py <nickname> [server_host] [server_port]
    """
    import socket
    import sys
    import threading
    import time

    PING_EVERY = 10.0  # seconds between keep-alives


    def recv_loop(sock: socket.socket) -> None:
        while True:
            try:
                data, _ = sock.recvfrom(4096)
            except OSError:
                return
            line = data.decode('utf-8', errors='replace').rstrip('\n')
            if line == '/pong':
                continue  # quiet keep-alive ack
            sys.stdout.write(f'\r{line}\n> ')
            sys.stdout.flush()


    def ping_loop(sock: socket.socket, server: tuple[str, int]) -> None:
        while True:
            time.sleep(PING_EVERY)
            try:
                sock.sendto(b'/ping\n', server)
            except OSError:
                return


    def main() -> None:
        nick = sys.argv[1] if len(sys.argv) > 1 else 'anon'
        host = sys.argv[2] if len(sys.argv) > 2 else '127.0.0.1'
        port = int(sys.argv[3]) if len(sys.argv) > 3 else 9999
        server = (host, port)

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(f'/join {nick}\n'.encode('utf-8'), server)

        threading.Thread(target=recv_loop, args=(sock,), daemon=True).start()
        threading.Thread(target=ping_loop, args=(sock, server), daemon=True).start()

        try:
            while True:
                line = input('> ')
                if not line:
                    continue
                sock.sendto((line + '\n').encode('utf-8'), server)
                if line == '/leave':
                    break
        except (EOFError, KeyboardInterrupt):
            sock.sendto(b'/leave\n', server)
        finally:
            sock.close()


    if __name__ == '__main__':
        main()

Run the server in one terminal and two or more clients in others:
    $ python relay.py
    relay listening on udp/9999

    $ python chat.py alice
    *** welcome, alice ***
    > hi everyone
    > <bob> hey alice

    $ python chat.py bob
    *** welcome, bob ***
    *** alice joined ***
    <alice> hi everyone
    > hey alice

Why each piece is the way it is#
- SO_REUSEADDR on the server. Lets you restart the relay without waiting on a port-reuse delay. Less critical than for TCP (no TIME_WAIT on UDP) but still nice during development.
- The lock around self.clients. The reaper thread reads and mutates the same dict as the main recvfrom loop. Without the lock you race on pop vs iter.
- rstrip('\n') everywhere. Robust to a missing trailing newline. Note that it strips only \n, so a client sending \r\n leaves a trailing \r in the line; rstrip('\r\n') would handle both. The server always emits \n-terminated lines.
- Broadcast excludes the sender. Otherwise every client sees its own messages echo back, which is confusing.
- /pong is silent on the client. It’s a keep-alive ack; users shouldn’t see it.
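One detail about the rstrip('\n') calls is worth checking directly: str.rstrip removes only the characters it is given, so a CRLF line ending leaves the carriage return behind. The helper name strip_line_ending below is hypothetical; the sketch only compares the two calls on three sample inputs.

```python
def strip_line_ending(line: str) -> str:
    """Strip any trailing mix of carriage returns and newlines."""
    return line.rstrip('\r\n')


samples = ['hi\n', 'hi\r\n', 'hi']

# What relay.py and chat.py do today: only '\n' is removed.
newline_only = [s.rstrip('\n') for s in samples]     # ['hi', 'hi\r', 'hi']

# Stripping both characters normalises all three inputs.
both = [strip_line_ending(s) for s in samples]       # ['hi', 'hi', 'hi']
```

If clients on other platforms or tools such as telnet-style senders are expected, swapping in the two-character form is a one-line change on each side.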
Variants#
- Authenticated join. Replace /join <nick> with /join <nick> <token>; server checks token against a static table. Still a toy, but closer to real systems.
- Channels / rooms. Add /join #room and /part #room; broadcast is restricted to clients in the same room.
- Direct messages. /msg <nick> hello: server looks up the addr by nickname and unicasts.
- Server-driven keep-alive. Instead of clients pinging, the server pings each silent client every 10 s and drops on three missed pongs. Lets the server detect death faster.
- Sequence numbers + dedup. Number every datagram per-sender; receiver drops duplicates and warns on gaps. Edges UDP closer to TCP’s in-order, no-duplicates behaviour, although without retransmission a lost datagram stays lost.
- Multicast variant. Skip the relay entirely: clients join a multicast group (e.g. 239.0.0.1:9999) and send to it. Works on a LAN; not generally over the public Internet.
- TLS-equivalent. DTLS via a third-party library (Python’s standard ssl module does not implement DTLS), or a pre-shared symmetric key + Fernet from cryptography. Out of scope here.
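As an illustration of the sequence-number variant, the receiver side could look roughly like the sketch below. The class SequenceTracker and its accept method are hypothetical names, and the policy shown (drop anything at or below the highest number seen, which also discards late reordered datagrams) is one assumed design among several.

```python
class SequenceTracker:
    """Track the highest sequence number seen from each sender."""

    def __init__(self) -> None:
        self.last_seq: dict[tuple[str, int], int] = {}

    def accept(self, sender: tuple[str, int], seq: int) -> tuple[bool, int]:
        """Return (deliver, gap).

        deliver is False for duplicates and for datagrams that arrive after a
        newer one; gap counts the sequence numbers skipped since the last
        delivered datagram.
        """
        last = self.last_seq.get(sender)
        if last is not None and seq <= last:
            return (False, 0)               # duplicate or late arrival: drop
        gap = 0 if last is None else seq - last - 1
        self.last_seq[sender] = seq
        return (True, gap)                  # deliver, reporting any gap
```

A caller would deliver the message when the first element is true and print a warning when the second is non-zero. Buffering out-of-order datagrams for later delivery is possible too, but it starts to rebuild TCP.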
Trade-offs#
With TCP, a recv that returns b'' tells you the peer left, and bytes arrive in order with retransmits. But you pay per-connection memory in the kernel, and framing (the byte stream) is a separate problem. Other tensions worth weighing:
- Idle timeout length. Short (10 s) detects departures fast but punishes laggy networks; long (60 s) is forgiving but presence info goes stale. 30 s with 10 s ping is a defensible middle.
- Broadcast cost. O(clients) per message on the server. Fine to a few hundred clients; for thousands, switch to a publish-subscribe design with rooms.
- Single relay vs federation. One server is a SPOF. Real chat networks (IRC, Matrix) federate; the project’s relay topology is deliberately the simplest case.
- In-memory state. Restart the relay and everyone has to rejoin. Persistence (SQLite, JSON file) is one extra evening of work.
Why use addr tuples as the client key instead of nicknames?
Two reasons. First, the relay must reply to whoever sent the datagram — and the only identity UDP gives you for free is (src_ip, src_port). Second, nicknames can collide; addr tuples are unique per kernel-assigned ephemeral port. Nickname is a display attribute, not the identity.
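A tiny sketch of the second reason, using made-up addresses from the 192.0.2.0/24 documentation range: two clients choose the same nickname, and only the address key keeps them apart. The helpers register and addrs_for are hypothetical and exist only for this illustration.

```python
Addr = tuple[str, int]


def register(clients: dict[Addr, str], addr: Addr, nickname: str) -> None:
    """Store the client under its (host, port) address, not its nickname."""
    clients[addr] = nickname


def addrs_for(clients: dict[Addr, str], nickname: str) -> list[Addr]:
    """A nickname lookup may match several clients."""
    return [addr for addr, nick in clients.items() if nick == nickname]


clients: dict[Addr, str] = {}
register(clients, ('192.0.2.10', 50001), 'sam')
register(clients, ('192.0.2.11', 50002), 'sam')   # same nickname, different peer
```

Keyed by nickname, the second join would silently overwrite the first; keyed by address, both entries survive and the relay can still reply to each one.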
Common pitfalls#
- Decoding without errors='replace'. A stray non-UTF-8 byte crashes the client’s recv loop and the user wonders why messages stopped. Either guard each decode with try/except or pass errors='replace'.
- Forgetting the lock. The reaper and the main loop both touch self.clients. Without the lock, you’ll see KeyError or RuntimeError: dictionary changed size during iteration in random places.
- Pinging at the same interval as the timeout. If ping is every 30 s and timeout is 30 s, a single dropped ping evicts the client. Ping at ~1/3 of the timeout window.
- Echoing messages to the sender. Easy to write for addr in clients: sendto(...) and forget the exclude. Annoying to debug because everything “works” except double-output.
- Reading from a single socket from two threads. Don’t. One thread does recvfrom, others do sendto. Concurrent recvfrom on the same socket is unspecified across platforms.
- No max-length on nicknames or messages. A 64 KB nickname will work but is hostile. Cap at sensible limits (32 chars nick, 1024 chars message).
- Assuming sendto always succeeds. It can raise on routing failure, ICMP-unreachable, or buffer full. Wrap broadcasts in try/except.
- Hiding errors with bare except. During development you want the traceback. Tighten exception types only once the protocol is stable.
- No graceful shutdown on the server. SIGINT during recvfrom raises KeyboardInterrupt; consider catching it and announcing *** server going down *** to all clients before exiting.
- Skipping the /join requirement. Without it, a stray datagram from a port-scanner registers a “client” with an empty nickname. The explicit /join keeps the table clean.
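Two of the pitfalls above (decoding and length caps) can be handled in one place. The helper below is a sketch under assumptions: the name clean_line and the constants MAX_NICK and MAX_MESSAGE are hypothetical, with values taken from the limits suggested in the list.

```python
MAX_NICK = 32        # characters, as suggested above
MAX_MESSAGE = 1024   # characters, as suggested above


def clean_line(data: bytes, limit: int = MAX_MESSAGE) -> str:
    """Decode a datagram defensively and cap its length.

    Invalid UTF-8 bytes become U+FFFD instead of raising, trailing line
    endings are removed, and the result is truncated to `limit` characters.
    """
    text = data.decode('utf-8', errors='replace')
    return text.rstrip('\r\n')[:limit]
```

Calling clean_line(data) for chat messages and clean_line(data, MAX_NICK) for nicknames keeps both rules in a single function, so neither the recv loop nor the join handler can forget them.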
Related building blocks#