Fixing Intermittent Connection Delays and SNAT Conflicts in Kubernetes VXLAN Clusters
Solution Summary
Kubernetes clusters that use a VXLAN-based CNI plugin can experience intermittent connection delays and 63-second timeouts when packets are masqueraded twice (double NAT). The fix updates the iptables KUBE-POSTROUTING rules to treat the masquerade mark as a stateful toggle: an XOR operation clears the mark before VXLAN encapsulation, so the encapsulated packet is not masqueraded a second time and the resulting checksum failures are avoided.
The Problem
Clusters that use a VXLAN-based CNI can see connections hang intermittently and then time out after 63 seconds. The cause is a double-NAT bug in which the same packet is masqueraded twice.
Why does this happen?
The issue stems from a persistent firewall mark that triggers a redundant SNAT operation on encapsulated VXLAN packets. Because the packet mark persists through the kernel stack, the iptables KUBE-POSTROUTING chain applies masquerading twice, leading to checksum failures and packet drops.
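As a rough illustration of this reasoning, the shell sketch below models the mark match as a bitwise test and counts how often a packet would be masqueraded when nothing clears the mark. It assumes the masquerade mark is the 0x4000 bit used in this article's rules. The helper names are hypothetical, and the two loop iterations merely stand in for the chain being traversed before and after VXLAN encapsulation; this is a toy model, not kernel behavior.

```shell
#!/bin/sh
# Hypothetical helper: mimics `-m mark --mark VALUE/MASK`,
# which matches when (packet_mark & MASK) == VALUE.
mark_matches() {
    # $1 = packet mark, $2 = value, $3 = mask
    [ $(( $1 & $3 )) -eq $(( $2 )) ]
}

# Hypothetical helper: counts masquerade hits across two chain
# traversals (inner packet, then encapsulated packet) when the
# mark is never cleared between them.
count_masquerades_without_clear() {
    mark=$1
    count=0
    for pass in inner outer; do
        if mark_matches "$mark" 0x4000 0x4000; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

count_masquerades_without_clear 0x4000   # prints 2: marked packet is hit twice
count_masquerades_without_clear 0x0      # prints 0: unmarked packet is never hit
```

In this model a marked packet is matched on both traversals, which corresponds to the redundant SNAT described above.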
Code Example
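# Consume the masquerade mark with XOR so it cannot trigger a second SNAT.
# Rule 1: packets without the 0x4000 mark need no SNAT; leave the chain.
iptables -t nat -A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
# Rule 2: XOR clears the 0x4000 bit on marked packets.
iptables -t nat -A KUBE-POSTROUTING -m mark --mark 0x4000/0x4000 -j MARK --xor-mark 0x4000
# Rule 3: masquerade the packet; its mark is already cleared.
iptables -t nat -A KUBE-POSTROUTING -j MASQUERADE --random-fully
Step-by-Step Fix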
To resolve this, update the iptables logic to treat the masquerade mark as a stateful toggle. First, add a guard rule that returns immediately when the mark is absent, so unmarked packets are never masqueraded. Second, instead of leaving the mark set, apply an XOR operation that consumes the bit as part of the masquerade step. Third, masquerade the packet. Because the mark is cleared before the packet re-enters the stack during VXLAN encapsulation, the encapsulated packet hits the guard rule and is not masqueraded again.
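The steps above can be sketched as a small shell model of the three-rule chain. This is a simplified simulation under the assumption that the masquerade mark is the 0x4000 bit; the function name kube_postrouting is hypothetical and the script does not touch real iptables state.

```shell
#!/bin/sh
# Hypothetical model of the three KUBE-POSTROUTING rules.
# Prints the action taken and the packet mark after the chain.
kube_postrouting() {
    mark=$1
    # Rule 1: guard. Return immediately if the 0x4000 bit is absent.
    if [ $(( $mark & 0x4000 )) -eq 0 ]; then
        printf 'RETURN mark=0x%x\n' "$(( $mark ))"
        return
    fi
    # Rule 2: XOR consumes the 0x4000 bit and leaves other bits alone.
    mark=$(( $mark ^ 0x4000 ))
    # Rule 3: masquerade with the mark already cleared.
    printf 'MASQUERADE mark=0x%x\n' "$mark"
}

kube_postrouting 0x4000   # prints: MASQUERADE mark=0x0
kube_postrouting 0x0      # prints: RETURN mark=0x0
kube_postrouting 0x4001   # prints: MASQUERADE mark=0x1
```

Feeding the mark from the first call (0x0) into a second call shows the toggle behavior: the encapsulated packet takes the RETURN branch, so masquerading happens only once. On a live node, the installed rules can be listed with `iptables -t nat -S KUBE-POSTROUTING` (requires root).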