Day 46: Building a Distributed Gateway Cluster

Jun 25, 2026

∙ Paid

Introduction

Imagine you’ve built a chat application that handles WebSocket connections. It works great on your laptop with 1,000 users. You deploy it to a powerful server with 32 CPU cores and 64GB of RAM, and it handles 100,000 concurrent users beautifully. You’re ready to launch, right?
Not quite.
There are three critical problems waiting to destroy your system in production:

Problem 1: The Single Point of Failure

Your server crashes. Maybe the process runs out of memory, maybe there’s a bug, maybe someone deploys bad code. Whatever the reason, all 100,000 users disconnect at once. They all try to reconnect within 10 seconds. But your server takes 45 seconds to boot up. You’ve just lost 45 seconds of uptime and created a “thundering herd” where everyone hammers your database to re-validate their sessions at the same time.

Problem 2: The Scaling Wall

You can’t just keep making your server bigger forever. Once you get past a certain size, you hit limits in the operating system itself. The kernel can only handle so many file descriptors, so much TCP buffer memory, so many interrupts. At 500,000 connections, even the context switching between processes becomes measurable overhead. At 1 million connections, the idle TCP buffers alone consume over 1GB of kernel memory.

Problem 3: Zero-Downtime Updates

You need to deploy a security patch. Your options are: (a) kill the server and disconnect everyone, or (b) start a second server and somehow move connections over. But wait - WebSocket connections are stateful. They hold session data in RAM. You can’t just “migrate” them like you can with stateless HTTP requests.

Spring Boot won’t save you here. Its embedded Tomcat server abstracts away the socket layer, making it impossible to implement proper connection draining. Its “graceful shutdown” feature waits for HTTP requests to complete, but WebSockets never “complete” - they’re persistent connections.

Today, we’re going to solve these problems by building a distributed Gateway cluster from scratch using pure Java.

The Core Problem: Stateful Service Coordination

The fundamental issue is this: stateful services can’t be treated like stateless HTTP servers.

When you scale a regular HTTP API horizontally, load balancers can simply round-robin requests across multiple servers. Any server can handle any request because requests are ephemeral - load the user’s data from the database, process it, send a response, and forget about it.

WebSocket Gateways are completely different:

User Alice connects → Gateway #2 stores her session in RAM
Alice sends a message → Must be routed to Gateway #2 (session affinity)
Gateway #2 crashes → Alice's session is GONE from all Gateways
Alice reconnects → Gets routed to Gateway #3 → Gateway #3 says "Who are you?"

This is called the Session Locality Problem. Solutions exist - sticky load balancing, session replication, shard-aware routing - but they all require cluster coordination:

Gateways need to know about each other’s existence
Gateways need to track which sessions are on which servers
Gateways need to detect failures quickly (before TCP times out)
Gateways need to handle membership changes without dropping connections

Kubernetes provides abstractions like Services, Endpoints, and Ingress controllers to handle this. But how do those actually work under the hood? Today we’ll implement the core primitives from scratch using Docker Compose.

Our Architecture: The Flux Cluster

We’re building a minimalist Gateway cluster with three main components:

Component 1: The Gateway Node (Java 21 + NIO)

Each Gateway instance does the following:

Runs a WebSocket server (from previous lessons)
Registers itself in a Service Registry on startup with its hostname, IP address, and current load
Exposes a /health HTTP endpoint for health checks
Listens for SIGTERM (Docker’s shutdown signal)
Drains connections over 30 seconds before exiting

Important Detail: Java running inside Docker doesn’t automatically detect memory limits. If you tell the JVM to use 4GB of heap (-Xmx4g) but Docker limits the container to 2GB, the JVM will try to allocate 4GB anyway. The Linux kernel will then send SIGKILL to terminate the process immediately - no graceful shutdown, just instant death.

The solution is to use this JVM flag:

-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0

This tells the JVM to read the cgroup memory limits from Docker and cap the heap at 75% of the container’s available memory.

Component 2: The Service Registry (Redis)

A simple key-value store that holds the cluster membership information:

gateway:nodes = {
  "gateway-1": {"ip": "172.18.0.3", "port": 8080, "load": 42000, "last_heartbeat": 1704812451},
  "gateway-2": {"ip": "172.18.0.4", "port": 8080, "load": 38000, "last_heartbeat": 1704812452},
  "gateway-3": {"ip": "172.18.0.5", "port": 8080, "load": 0, "last_heartbeat": 1704812450}
}

Each Gateway updates its last_heartbeat timestamp every 5 seconds. If a heartbeat is more than 15 seconds old, we consider that node dead.

Why Redis? It’s CP (consistent during network partitions) and has built-in TTL support for automatic expiration. In production systems, you might use etcd, Consul, or ZooKeeper instead. For this lesson, Redis is simpler and avoids introducing another complex system.

Component 3: The Load Balancer (Java NIO TCP Proxy)

A simple Layer 4 proxy that:

Listens on port 8080 (the public-facing port)
Reads cluster membership from Redis every 1 second
Routes incoming connections using Consistent Hashing based on the user’s ID
Proxies raw TCP bytes in both directions using NIO Selectors

Why Consistent Hashing? If you use round-robin load balancing, when Gateway #2 crashes, not only are existing sessions on Gateway #2 lost, but all future connections get redistributed differently. This breaks session affinity for users trying to reconnect. Consistent hashing ensures that hash(user_id) % N always routes to the same Gateway unless that specific Gateway dies, minimizing the number of disrupted users.