Fixing Premature Traffic Drops During Kubernetes Node Draining
Solution Summary
Over-aggressive predicate checks in the cloud controller can prematurely remove cordoned Kubernetes nodes from load balancer pools, causing dropped traffic. The solution is to decouple node scheduling status from load balancer backend management: rather than relying on a node's unschedulable status, administrators should apply the node.kubernetes.io/exclude-from-external-load-balancers label only when a node is fully prepared to stop receiving external traffic.
The Problem
When a node is cordoned for maintenance, traffic routed through a service load balancer can be dropped: the cloud controller removes the node from the backend pool as soon as it becomes unschedulable, before in-flight connections have had a chance to drain.
Why does this happen?
The issue is caused by an over-aggressive predicate check in the cloud controller that automatically removes cordoned nodes from load balancer pools. This terminates traffic to a node as soon as draining begins, rather than waiting for active connections to terminate gracefully.
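You can observe the two signals involved directly. A cordoned node has spec.unschedulable set to true, while load balancer membership should instead be governed by the exclusion label. A quick check (a sketch; <node-name> is a placeholder for an actual node name in your cluster):

```shell
# Is the node cordoned? Prints 'true' for a cordoned node.
kubectl get node <node-name> -o jsonpath='{.spec.unschedulable}'

# Does the node carry the load balancer exclusion label?
# Empty output means the node is still a load balancer backend candidate.
kubectl get node <node-name> \
  -o jsonpath='{.metadata.labels.node\.kubernetes\.io/exclude-from-external-load-balancers}'
```

The problem described above arises when the controller keys off the first signal (unschedulable) instead of the second (the explicit label).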
Code Example
kubectl label node <node-name> node.kubernetes.io/exclude-from-external-load-balancers=true
Step-by-Step Fix
To keep traffic flowing until a node is fully decommissioned, remove the reliance on the 'unschedulable' status for load balancer management. Instead, control load balancer participation explicitly: apply the 'node.kubernetes.io/exclude-from-external-load-balancers' label to a node only when it is ready to stop receiving external traffic.
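Putting the steps together, a decommissioning sequence might look like the following (a sketch, not a prescribed runbook; <node-name> is a placeholder, and the wait in step 2 depends on your load balancer's health check and connection draining settings):

```shell
# 1. Signal the cloud controller to remove the node from load balancer backends.
kubectl label node <node-name> node.kubernetes.io/exclude-from-external-load-balancers=true

# 2. Wait for the load balancer to mark the backend unhealthy and drain
#    existing connections before disrupting workloads on the node.

# 3. Only then cordon and drain the node as usual.
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

Because the exclusion label, not the cordon, drives load balancer membership here, step 3 no longer races against in-flight connections.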