How Meta turned the Linux Kernel into a planet-scale Load Balancer. Part II
A deep architectural narrative on XDP, eBPF, stateless routing, and why hyperscale traffic outgrew proxies.
The Hidden Constraint Stateless Systems Cannot Escape
By the end of Part I, we had seen how Katran collapses load balancing into something almost “offensively simple”: a pure function executed at the earliest point in the kernel receive path.
A packet arrived via DMA into host memory. XDP intercepted it before socket allocation, before skb creation, before TCP state machines or retransmission queues even existed.
Katran read a fixed set of bytes, computed a hash over the 5-tuple, performed a constant-time lookup inside an eBPF map, rewrote the destination fields in place, fixed the checksums, and transmitted the packet.
No heap allocation.
No locks.
No connection tracking.
No memory proportional to traffic volume.
Routing had been reduced to deterministic evaluation over immutable input.
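The steps recapped above can be modeled in a few lines of user-space Python. This is only an illustrative sketch: the real data plane is an eBPF program, and the helper names (parse_five_tuple, pick_backend, make_packet), the CRC32 hash, and the plain-list lookup table are assumptions made for the example rather than Katran's code. It assumes a bare IPv4 packet carrying TCP or UDP, and a lookup table whose size stays fixed.

```python
import socket
import struct
import zlib

def parse_five_tuple(packet: bytes):
    """Read the 5-tuple from fixed offsets of a raw IPv4 + TCP/UDP packet."""
    ihl = (packet[0] & 0x0F) * 4            # IPv4 header length in bytes
    proto = packet[9]                        # 6 = TCP, 17 = UDP
    src_ip, dst_ip = packet[12:16], packet[16:20]
    src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    return (src_ip, dst_ip, src_port, dst_port, proto)

def pick_backend(packet: bytes, table):
    """Pure function: same packet bytes + same table -> same backend."""
    src_ip, dst_ip, sport, dport, proto = parse_five_tuple(packet)
    key = src_ip + dst_ip + struct.pack("!HHB", sport, dport, proto)
    # The table size is fixed; only its contents change with the fleet.
    return table[zlib.crc32(key) % len(table)]

def make_packet(src, dst, sport, dport):
    """Build a minimal 20-byte IPv4 header followed by the two port fields."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 24, 0, 0, 64, 6, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    return ip + struct.pack("!HH", sport, dport)

# Hypothetical addresses, used only for illustration.
table = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
pkt = make_packet("198.51.100.7", "203.0.113.10", 40000, 443)
print(parse_five_tuple(pkt)[2:], pick_backend(pkt, table))
```

Nothing in pick_backend survives the call: no allocation that outlives the packet, no shared mutable structure, no record of the decision.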
But this architectural purity comes with a constraint that stateful systems quietly absorb without thinking:
if the system refuses to remember past decisions, it must be able to reproduce them exactly, indefinitely, using only computation and configuration as inputs.
Katran doesn’t operate in a calm, static environment. Backends fail silently, networks partition, nodes drain for rolling upgrades, and autoscaling stretches and contracts continuously.
At Meta scale (tens of thousands of machines), these events happen constantly, while packets flow without pause.
Naive approaches such as modulo hashing cannot survive this churn: millions of flows would be reassigned at once, TLS sessions would break, caches would go cold, and tail latency would explode.
Katran survives by remembering nothing. Each packet recomputes its destination independently, deterministically, relying only on the current configuration and immutable header fields.
Continuity cannot come from memory; it must come from mathematics. Formally, the routing decision is a function:
backend = f(flow_identity, backend_set)
where flow_identity is the 5-tuple and backend_set reflects the current fleet as encoded in eBPF maps. Every backend change alters the function. A naive mapping over a changing set would instantly reshuffle nearly all flows, because Katran does not store previous decisions.
There is no per-flow state, no hash table of prior assignments, no flow table to synchronize. Stateful systems externalize continuity into memory; Katran externalizes it into math.
This is where consistent hashing becomes essential. It places backends and flows on a shared hash ring so that a small change in the backend set moves only a proportional fraction of traffic (roughly 1/N of flows when one of N backends is added or removed), preserving mapping continuity without storing anything.
Every packet independently recomputes its backend, yet the global distribution remains stable, predictable, and local.
In other words: continuity is no longer a property of memory; it is a property of the function itself.
Statelessness, determinism, and resilience are inseparable. This is why consistent hashing is not an optimization: it is the backbone of stateless load balancing at hyperscale.
Why modulo hashing fails under dynamic membership
At first glance, stateless routing appears trivial to implement. The packet already carries a stable identifier for its flow in the form of its 5-tuple: source and destination address, source and destination port, and protocol.
Applying a high-quality hash function such as Jenkins hash, MurmurHash, or Toeplitz hash produces a uniformly distributed 32-bit or 64-bit value.
Mapping that value into a backend index can be done using a simple modulo operation:
backend_index = hash(5-tuple) mod N
where N is the number of backends.
This approach satisfies nearly all of Katran’s architectural constraints. It is deterministic. It requires no per-flow state. It executes in constant time.
It is also trivial to implement in eBPF. It produces uniform load distribution assuming a well-behaved hash function.
And yet it fails catastrophically the moment the backend set changes.
The problem lies in the divisor.
Modulo arithmetic does not preserve ordering relationships when the modulus changes. When N changes to N+1, the mapping of hash values to indices changes globally.
A hash value that previously mapped to backend 2 may now map to backend 7. Another may map to backend 0. There is no locality preservation property.
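To see this concretely, consider a simplified example with N=4 backends. Hash values map into backend indices like this:
H:        0  1  2  3  4  5  6  7  8  9
H mod 4:  0  1  2  3  0  1  2  3  0  1
Now add a fifth backend, so N=5:
H:        0  1  2  3  4  5  6  7  8  9
H mod 5:  0  1  2  3  4  0  1  2  3  4
Over a full cycle of 20 consecutive hash values, only 0 through 3 keep their backend.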
For any given hash value H, the probability that:
H mod N ≠ H mod (N+1)
is, for uniformly distributed hash values:
N / (N+1)
With N=4, this means 80% of all flows are reassigned immediately when a single backend is added. At hyperscale, this is not a theoretical inconvenience. It is an operational disaster.
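The 80% figure is easy to check by brute force. The sketch below is an illustration, not part of Katran: remapped_fraction is a hypothetical helper, and a dense range of integers stands in for uniformly distributed hash values.

```python
def remapped_fraction(old_n: int, new_n: int, hashes) -> float:
    """Fraction of hash values whose backend index changes."""
    hashes = list(hashes)
    moved = sum(1 for h in hashes if h % old_n != h % new_n)
    return moved / len(hashes)

# range(20_000) covers whole periods of lcm(4, 5) = 20, so the count is exact.
grow = remapped_fraction(4, 5, range(20_000))
shrink = remapped_fraction(5, 4, range(20_000))
print(f"4 -> 5 backends: {grow:.0%} of flows move")    # prints 80%
print(f"5 -> 4 backends: {shrink:.0%} of flows move")  # prints 80%
```

Only hash values congruent to 0, 1, 2, or 3 modulo 20 keep their index, which is 4 out of every 20, in either direction.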
Every reassigned flow results in packets arriving at backend machines that have never seen that flow before. The receiving backend has no socket for the connection, so its TCP stack answers these mid-stream segments with a reset.
TLS sessions fail because cryptographic state does not exist on the new backend. HTTP/2 multiplexed streams break because connection context is lost. Clients initiate retransmissions, exponential backoff, and eventually full connection re-establishment.
This cascades upward.
Connection pools maintained by clients become invalid.
Application-layer caches lose locality.
Backend CPU cache warmth disappears.
NUMA locality is destroyed.
Kernel routing caches become irrelevant.
Traffic patterns that had settled into a steady state suddenly destabilize.
This creates transient overload conditions that amplify latency precisely when the system is undergoing change, which is the worst possible time for instability.
Even more dangerously, this reshuffling occurs regardless of whether the backend change is an addition, removal, or replacement. Removing a single backend causes the same global redistribution effect, forcing nearly all flows to move.
The system becomes topologically fragile. Minor topology changes trigger global traffic churn.
This is fundamentally incompatible with Katran’s stateless model.
Because Katran has no memory, it cannot preserve continuity explicitly. Therefore continuity must be preserved implicitly by the mapping function.
Modulo hashing does not provide this guarantee.
Katran required a routing function whose output changes minimally when the backend set changes. A function whose continuity properties are intrinsic, not emergent.
A function designed not merely for distribution, but for stability under mutation.
This is exactly what consistent hashing provides.
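To make the contrast with modulo hashing concrete, here is a minimal hash ring with virtual nodes. It is a generic textbook sketch under stated assumptions, not Katran's implementation: the HashRing class, the MD5-based h32 helper, the 100 virtual nodes per backend, and the flow strings are all invented for the example. The open-source Katran is documented as using a Maglev-style lookup table, which pursues the same minimal-disruption goal by different means.

```python
import bisect
import hashlib

def h32(data: str) -> int:
    """Deterministic 32-bit hash (MD5 is used only for convenience here)."""
    return int.from_bytes(hashlib.md5(data.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, backends, vnodes=100):
        # Each backend owns `vnodes` pseudo-random points on the ring.
        self.points = sorted(
            (h32(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self.keys = [p for p, _ in self.points]

    def lookup(self, flow: str) -> str:
        # First point clockwise from the flow's hash, wrapping at the end.
        idx = bisect.bisect_right(self.keys, h32(flow)) % len(self.points)
        return self.points[idx][1]

flows = [f"198.51.100.{i % 250}:{10000 + i}->203.0.113.10:443/tcp"
         for i in range(20000)]
old = HashRing(["b1", "b2", "b3", "b4"])
new = HashRing(["b1", "b2", "b3", "b4", "b5"])

moved = [f for f in flows if old.lookup(f) != new.lookup(f)]
print(f"{len(moved) / len(flows):.1%} of flows moved")
print("all moved flows landed on b5:",
      all(new.lookup(f) == "b5" for f in moved))
```

Adding b5 should move roughly one flow in five, and every flow that moves lands on the new backend: no flow is shuffled between the four backends that were already there. That locality is the property modulo hashing lacks.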
The missing half of the equation
Up to this point, Katran’s model appears almost suspiciously incomplete. A packet arrives, a deterministic hash selects a backend, the destination IP is rewritten, and the packet is forwarded.
No connection table is created.
No session state is remembered.
No record is kept anywhere.
The load balancer forgets the packet the instant it leaves the NIC. Which raises an immediate and deeply practical question:
how does the response find its way back?
Traditional load balancers solve this by remembering everything. They allocate connection entries, store source and destination tuples, and use that memory later to reverse the transformation.
Return traffic is not discovered: it is looked up. Every response packet must re-enter the load balancer, consult state, and be translated back into the client-visible address space.
Katran does something far more radical. It eliminates the need to remember in the first place.
This works because Katran operates in two distinct forwarding modes, each designed to preserve the illusion of a single virtual service IP while minimizing the amount of work Katran itself must perform.
Both modes rely on the same invariant: the load balancer’s job is not to own the connection, but merely to place the packet correctly at the beginning. Once placed, the network itself can do the rest.
This is the difference between supervising every conversation and simply introducing two parties who can speak directly. Katran prefers introductions.
The art of disappearing
Direct Server Return is the purer expression of Katran’s philosophy. In DSR mode, Katran modifies only the minimum amount of information required to steer the packet to the correct backend.
Specifically, it rewrites the destination MAC address so the frame reaches the selected server at layer 2, while leaving the IP layer logically consistent with the virtual service abstraction.
From the client’s perspective, the packet was sent to the VIP. From the backend’s perspective, the packet appears to be addressed to that same VIP, because the backend is explicitly configured to accept traffic destined for that address via a loopback interface.
This configuration is deliberate sleight of hand: multiple machines simultaneously claim ownership of the same IP, but only Katran determines which one actually receives each packet.
Once the backend processes the request, it responds directly to the client using normal IP routing. The response does not pass back through Katran. It does not need translation, correction, or approval.
The backend already knows the client’s source IP, and the VIP is already a valid source address on the backend.
Katran never sees the response. And that absence is the entire point.
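The packet path just described can be traced with a toy model. This is a sketch of the layer-2 DSR idea as the text presents it, with hypothetical MAC and IP addresses and invented function names (lb_forward, backend_reply); it models only the header fields that matter to the argument.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str

VIP = "203.0.113.10"
BACKEND_MACS = ["02:00:00:00:00:01", "02:00:00:00:00:02"]  # hypothetical

def lb_forward(frame: Frame, backend_index: int) -> Frame:
    """Load balancer: steer at layer 2 only; the IP header is untouched."""
    return replace(frame, dst_mac=BACKEND_MACS[backend_index])

def backend_reply(request: Frame, gateway_mac: str) -> Frame:
    """Backend: answer straight to the client, using the VIP as source."""
    assert request.dst_ip == VIP  # accepted because the VIP sits on loopback
    return Frame(dst_mac=gateway_mac, src_ip=VIP, dst_ip=request.src_ip)

request = Frame(dst_mac="02:00:00:00:00:aa", src_ip="198.51.100.7", dst_ip=VIP)
steered = lb_forward(request, backend_index=1)
reply = backend_reply(steered, gateway_mac="02:00:00:00:00:fe")
print(steered)
print(reply)
```

The reply is built without calling anything on the load balancer side, and the client sees a response whose source is the VIP it originally addressed.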
By removing itself from the return path, Katran cuts its packet processing load roughly in half, and sheds well over half of the bytes wherever responses are larger than requests. Every byte of outbound response traffic bypasses the load balancer entirely, freeing CPU cycles, memory bandwidth, PCIe bandwidth, and NIC queue capacity.
Latency improves because an entire network hop disappears. Throughput increases because Katran’s processing budget is now devoted exclusively to ingress traffic.
The load balancer becomes a one-way valve.
This has profound scaling implications. In traditional NAT-based load balancers, throughput is constrained by bidirectional processing capacity. Every request and every response must traverse the same CPU, the same queues, the same kernel structures.
In DSR mode, Katran’s throughput ceiling effectively doubles, not because the hardware changed, but because half the work vanished.
The most efficient packet is the one you never have to touch.
Of course, this requires backend cooperation. Each backend must be configured with the VIP on a loopback interface, must not answer ARP requests for that VIP, must relax reverse path filtering where it would otherwise drop these packets, and must ensure routing policies allow responses to exit with the VIP as the source address.
These adjustments sound invasive, but at Meta’s scale, where fleets are centrally managed and automatically provisioned, such configuration is routine infrastructure hygiene.
In exchange, Katran achieves something remarkable: it participates only in the part of the connection where its intelligence is actually required.
The rest of the time, it simply disappears.
The compatibility layer
Not every environment can support Direct Server Return. Legacy systems, asymmetric routing environments, and networks with strict topology constraints often require the load balancer to remain in the return path.
For these cases, Katran supports traditional NAT mode, where both inbound and outbound packets are rewritten to preserve the virtual service abstraction.
In NAT mode, Katran performs two symmetric transformations. On ingress, it rewrites the destination IP from the VIP to the backend’s real address. On egress, it performs the inverse transformation, rewriting the source IP from the backend’s address back to the VIP before forwarding the packet to the client.
From the outside, the illusion remains intact. The client believes it is communicating with a single logical endpoint. The backend receives routable packets addressed to itself. Katran acts as the translator between these two realities.
What makes Katran’s NAT implementation unusual is what it does not do.
It does not allocate connection objects. It does not maintain per-flow state. It does not synchronize session tables across nodes. Instead, it relies entirely on deterministic recomputation.
Because the consistent hashing function produces the same backend selection for every packet in a flow, Katran can reapply the same translation logic independently to each packet without remembering anything about previous packets.
Each packet carries sufficient information to rediscover its correct path. This preserves the defining properties of the system:
Routing decisions remain stateless, lock-free, and purely functional. There are no shared data structures that grow with connection volume.
Memory usage does not scale with traffic, and even CPU cost remains constant per packet. Failure recovery requires no state reconstruction because no state exists to reconstruct.
Even when performing NAT, Katran refuses to become stateful. Statelessness is not a feature. It is a constraint the system refuses to violate.
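The two symmetric rewrites can be sketched without any per-flow table. The code below is an illustration under explicit assumptions: every backend serves exactly one VIP, so the reverse mapping is static configuration; select_backend uses CRC32 and a modulo purely to keep the sketch short and stands in for whatever consistent-hash lookup is in use; and the dictionaries and function names are invented for the example.

```python
import zlib

VIP = "203.0.113.10"
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical addresses
# Static configuration, not per-flow state: which VIP each backend serves.
VIP_OF_BACKEND = {b: VIP for b in BACKENDS}

def select_backend(src_ip, src_port, dst_port, proto="tcp"):
    """Placeholder for the consistent-hash lookup described in the text."""
    key = f"{src_ip}:{src_port}->{VIP}:{dst_port}/{proto}".encode()
    return BACKENDS[zlib.crc32(key) % len(BACKENDS)]

def ingress(pkt: dict) -> dict:
    """Client -> VIP becomes client -> backend, chosen by recomputation."""
    backend = select_backend(pkt["src_ip"], pkt["src_port"], pkt["dst_port"])
    return {**pkt, "dst_ip": backend}

def egress(pkt: dict) -> dict:
    """Backend -> client becomes VIP -> client, via a config lookup."""
    return {**pkt, "src_ip": VIP_OF_BACKEND[pkt["src_ip"]]}

request = {"src_ip": "198.51.100.7", "src_port": 40000,
           "dst_ip": VIP, "dst_port": 443}
to_backend = ingress(request)
response = {"src_ip": to_backend["dst_ip"], "src_port": 443,
            "dst_ip": request["src_ip"], "dst_port": 40000}
to_client = egress(response)
print(to_backend["dst_ip"], "->", to_client["src_ip"])
```

Both directions depend only on the packet in hand and on configuration. Calling ingress twice on the same request yields the same backend, which is what lets every packet of a flow be translated independently.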
Where intelligence is allowed to exist
At first glance, Katran’s data plane appears almost aggressively unintelligent. It does not monitor backend health. It does not detect overload. It does not adapt dynamically. It does not even know what services exist in any meaningful sense.
It simply reads precomputed values from eBPF maps and executes deterministic transformations on packets.
This is not a limitation.
It is a deliberate architectural boundary.
All intelligence lives elsewhere.
The control plane operates entirely in user space, where complexity is cheap and mistakes are survivable. It continuously observes the backend fleet, performing health checks, capacity measurements, deployment coordination, and traffic engineering decisions.
It computes consistent hashing rings, assigns weights to reflect backend capacity, removes unhealthy nodes, and introduces new ones during scaling events.
Once computed, these decisions are materialized into eBPF maps inside the kernel. This is the only communication channel between intelligence and execution.
The kernel data plane never queries user space. It never performs system calls. It never waits for locks held by other threads. It never allocates dynamic memory. It never blocks on I/O.
It simply reads from maps that already exist in memory and executes instructions that have already been verified.
The relationship between control plane and data plane is asynchronous and one-directional. The control plane writes new configurations when necessary. The data plane continues executing the previous configuration until the new one appears.
There is no synchronization barrier. There is no pause in packet processing. There is no transitional state where routing becomes uncertain.
This separation produces one of Katran’s most important operational properties:
If the control plane crashes, traffic continues uninterrupted.
The kernel retains the last valid configuration indefinitely. Packets continue to be routed correctly. Services remain reachable. The system does not degrade gradually: it simply stops evolving until the control plane returns.
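A small model shows why a silent control plane is harmless. In the real system the table lives in eBPF maps written from user space; here, as an assumption for illustration, a Python object holding an immutable tuple plays that role, and the DataPlane class with its publish and route methods is hypothetical.

```python
class DataPlane:
    """Stand-in for the kernel side: reads a table, never waits for a writer."""

    def __init__(self, table):
        self._table = tuple(table)      # immutable snapshot

    def publish(self, table):
        # Control plane call: build the full table first, then swap the
        # reference in one step, so a reader sees the old table or the
        # new one, never a half-written mix.
        self._table = tuple(table)

    def route(self, flow_hash: int) -> str:
        table = self._table             # one read; stable for this packet
        return table[flow_hash % len(table)]

dp = DataPlane(["b1", "b2", "b3", "b4"])
before = dp.route(6)

# The control plane "crashes" here: nothing is published, nothing changes.
assert dp.route(6) == before

# The control plane returns and drains b3 by handing its slot to b4.
dp.publish(["b1", "b2", "b4", "b4"])
after = dp.route(6)
print(before, "->", after)   # prints "b3 -> b4"
```

The table size never changes, only the contents of one slot, so flows hashing to the other three slots are untouched by the update.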
The inverse failure is equally unremarkable.
If a Katran node fails entirely, upstream anycast routing automatically shifts traffic to other Katran nodes advertising the same VIP. Because no per-connection state exists, there is nothing to migrate.
New packets arrive at different load balancers, which recompute the same consistent hash decisions and forward traffic to the same backends.
There is no recovery phase. Recovery is instantaneous because nothing was lost.
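The claim that a surviving node reproduces the failed node's decisions follows from the routing function depending only on the flow and the backend set. The sketch below demonstrates that with rendezvous (highest-random-weight) hashing, chosen here only because it fits in a few lines; it is a stand-in technique, not the hash Katran uses, and the score and pick helpers are hypothetical.

```python
import hashlib

def score(flow: str, backend: str) -> int:
    digest = hashlib.sha256(f"{flow}|{backend}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def pick(flow: str, backends) -> str:
    """Rendezvous hashing: the backend with the highest score wins."""
    return max(backends, key=lambda b: score(flow, b))

# Two load balancer nodes hold the same configuration, in different order.
node_a = ["b1", "b2", "b3", "b4"]
node_b = ["b4", "b2", "b1", "b3"]

flows = [f"198.51.100.7:{port}->203.0.113.10:443/tcp"
         for port in range(40000, 40100)]
assert all(pick(f, node_a) == pick(f, node_b) for f in flows)
print("node B reproduces all", len(flows), "decisions of node A")
```

Node B never exchanged a byte with node A. Agreement comes from sharing the function and the configuration, not from sharing state.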
This is resilience not through redundancy, but through the absence of fragile state.
Watching decisions at line rate
Running routing logic inside the kernel would traditionally be an observability nightmare. Kernel code is opaque, difficult to instrument, and extremely dangerous to modify.
Debugging typically relies on indirect signals: application logs, sampled traces, or inferred metrics reconstructed from incomplete information.
eBPF changes this completely.
Because Katran is implemented as an eBPF program, it can export telemetry directly from the exact instruction path that processes each packet.
Every routing decision can increment counters, update histograms, or emit structured events into shared maps or perf buffers, all without leaving kernel space or introducing meaningful overhead.
This provides observability at the point of truth.
Operators can measure per-service packet rates exactly as they are forwarded. They can observe backend distribution and detect imbalance immediately. They can identify drops caused by invalid packets, resource exhaustion, or configuration transitions.
They can monitor NIC queue utilization and detect early signs of saturation before packet loss begins.
Nothing is inferred.
Nothing is sampled.
Nothing is reconstructed.
The measurements are produced by the same instructions that forward the packet.
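One common way to count on the packet path without locks is to give each CPU its own counter slot and sum the slots at read time; eBPF offers per-CPU map types such as BPF_MAP_TYPE_PERCPU_ARRAY for this pattern. The Python below only imitates the idea in user space. Whether Katran lays out its counters exactly this way is an assumption, and on_packet and read_totals are hypothetical names.

```python
from collections import Counter

NUM_CPUS = 4
# One private counter set per CPU, so the packet path never takes a lock.
per_cpu_hits = [Counter() for _ in range(NUM_CPUS)]

def on_packet(cpu: int, backend: str) -> None:
    """Forwarding path: count at the point where the decision is made."""
    per_cpu_hits[cpu][backend] += 1

def read_totals() -> Counter:
    """Reader side (user space in the real system): sum the per-CPU slots."""
    total = Counter()
    for hits in per_cpu_hits:
        total.update(hits)
    return total

# Simulate 1000 packets spread over CPUs and three backends.
for i in range(1000):
    on_packet(cpu=i % NUM_CPUS, backend=f"b{i % 3 + 1}")

totals = read_totals()
print(dict(totals))
```

The write side touches only memory owned by the current CPU. Aggregation cost is paid by the reader, off the packet path.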
Even more importantly, these programs can be updated dynamically. New instrumentation can be deployed without kernel recompilation, without system reboot, and without interrupting traffic.
The kernel effectively becomes a programmable execution environment, capable of evolving while the system remains live.
This represents a fundamental shift in how infrastructure behaves.
The kernel is no longer a static artifact frozen at boot time. It becomes a runtime, executing safe, verified programs that can be replaced as requirements evolve.
Katran is not merely a load balancer.
It is an example of what happens when packet processing stops being a fixed function of the operating system and becomes software again.
Statelessness as resilience
Katran’s architecture repeatedly demonstrates a central theme: statelessness is not merely a performance optimization; it is the source of resilience.
By eliminating memory, per-flow state, and mutable tables from the data plane, Katran transforms a potentially fragile, memory-bound system into a deterministic, local, lock-free computation.
Every section we’ve explored (return traffic handling, control plane separation, observability, and minimal kernel programs) converges on the same insight: the network does not need to remember.
The network needs only to compute, consistently, independently, and correctly.
In hyperscale systems, that subtle architectural shift, from storing what happened to recomputing what must happen, does more than improve speed.
It restores simplicity, predictability, and operational sanity to a layer of infrastructure that was traditionally overburdened with complexity it never required.
Katran does not just move packets faster. It redefines what moving packets efficiently even means. It vanishes into the network, leaving only math, determinism, and physics-aware computation behind.
And in doing so, it proves that sometimes the most radical engineering move is to refuse to remember anything at all.



