Fixing Premature Traffic Drops During Kubernetes Node Draining
Solution Summary
Over-aggressive predicate checks in the cloud controller can prematurely remove cordoned Kubernetes nodes from load balancer pools, causing dropped traffic. The solution is to decouple node scheduling status from load balancer backend management: rather than relying on a node's unschedulable status, administrators should apply the node.kubernetes.io/exclude-from-external-load-balancers label only when a node is fully prepared to stop receiving external traffic.
The Problem
When a node is cordoned for maintenance, traffic routed through a service load balancer can be dropped: the cloud controller removes the node from the backend pool as soon as it becomes unschedulable, before in-flight connections have had a chance to drain.
Why does this happen?
The issue is caused by an over-aggressive predicate check in the cloud controller that automatically removes cordoned nodes from load balancer pools. This terminates traffic to a node as soon as draining begins, rather than waiting for active connections to terminate gracefully.
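You can observe the two signals involved directly. A cordoned node has spec.unschedulable set to true, while load balancer membership should instead be governed by the exclusion label. A quick check (a sketch; <node-name> is a placeholder for an actual node name in your cluster):

```shell
# Is the node cordoned? Prints 'true' for a cordoned node.
kubectl get node <node-name> -o jsonpath='{.spec.unschedulable}'

# Does the node carry the load balancer exclusion label?
# Empty output means the node is still a load balancer backend candidate.
kubectl get node <node-name> \
  -o jsonpath='{.metadata.labels.node\.kubernetes\.io/exclude-from-external-load-balancers}'
```

The problem described above arises when the controller keys off the first signal (unschedulable) instead of the second (the explicit label).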
Code Example
kubectl label node <node-name> node.kubernetes.io/exclude-from-external-load-balancers=true
Step-by-Step Fix
To keep traffic flowing until a node is fully decommissioned, remove the reliance on the 'unschedulable' status for load balancer management. Instead, control load balancer participation explicitly: apply the 'node.kubernetes.io/exclude-from-external-load-balancers' label to a node only when it is ready to stop receiving external traffic.
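Putting the steps together, a decommissioning sequence might look like the following (a sketch, not a prescribed runbook; <node-name> is a placeholder, and the wait in step 2 depends on your load balancer's health check and connection draining settings):

```shell
# 1. Signal the cloud controller to remove the node from load balancer backends.
kubectl label node <node-name> node.kubernetes.io/exclude-from-external-load-balancers=true

# 2. Wait for the load balancer to mark the backend unhealthy and drain
#    existing connections before disrupting workloads on the node.

# 3. Only then cordon and drain the node as usual.
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

Because the exclusion label, not the cordon, drives load balancer membership here, step 3 no longer races against in-flight connections.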