Kubernetes Nodes need occasional maintenance. You could be updating the Node’s kernel, resizing its compute resource in your cloud account, or replacing physical hardware components in a self-hosted installation.
Kubernetes cordons and drains are two mechanisms you can use to safely prepare for Node downtime. They allow workloads running on a target Node to be rescheduled onto others. You can then shut down the Node or remove it from your cluster without impacting service availability.
Applying a Node Cordon
Cordoning a Node marks it as unavailable to the Kubernetes scheduler. The Node will be ineligible to host any new Pods subsequently added to your cluster.
Use the kubectl cordon command to place a cordon around a named Node:
$ kubectl cordon node-1
node/node-1 cordoned
Existing Pods already running on the Node won’t be affected by the cordon. They’ll remain accessible and will still be hosted by the cordoned Node.
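To verify this, you can list the Pods that are still scheduled on the cordoned Node. The node-1 name follows the example above:

```shell
# List all Pods currently scheduled on node-1, across every namespace.
kubectl get pods --all-namespaces --field-selector spec.nodeName=node-1
```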
You can check which of your Nodes are currently cordoned with the kubectl get nodes command:
$ kubectl get nodes
NAME     STATUS                     ROLES                  AGE   VERSION
node-1   Ready,SchedulingDisabled   control-plane,master   26m   v1.23.3
Cordoned Nodes appear with the SchedulingDisabled status.
Draining a Node
The next step is to drain remaining Pods out of the Node. This procedure will evict the Pods so they’re rescheduled onto other Nodes in your cluster. Pods are allowed to gracefully terminate before they’re forcefully removed from the target Node.
Use kubectl drain to initiate a drain procedure. Specify the name of the Node you’re taking out for maintenance:
$ kubectl drain node-1
node/node-1 already cordoned
evicting pod kube-system/storage-provisioner
evicting pod default/nginx-7c658794b9-zszdd
evicting pod kube-system/coredns-64897985d-dp6lx
pod/storage-provisioner evicted
pod/nginx-7c658794b9-zszdd evicted
pod/coredns-64897985d-dp6lx evicted
node/node-1 evicted
The drain procedure first cordons the Node if you’ve not already placed one manually. It will then evict running Kubernetes workloads by safely rescheduling them to other Nodes in your cluster.
You can shut down or destroy the Node once the drain’s completed. You’ve freed the Node from its responsibilities to your cluster. The cordon provides an assurance that no new workloads have been scheduled since the drain completed.
Ignoring Pod Grace Periods
Drains can sometimes take a while to complete if your Pods have long grace periods. This might not be ideal when you need to urgently take a Node offline. Use the --grace-period flag to override Pod termination grace periods and force an immediate eviction:
$ kubectl drain node-1 --grace-period 0
This should be used with care – some workloads might not respond well if they’re stopped without being offered a chance to clean up.
Solving Drain Errors
Drains can sometimes result in an error depending on the types of Pod that exist in your cluster. Here are two common issues with their resolutions.
1. “Cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, or StatefulSet”
This message appears if the Node hosts Pods which aren’t managed by a controller. It refers to Pods that have been created as standalone objects, where they’re not part of a higher-level resource like a Deployment or ReplicaSet.
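As an illustration, a manifest like this creates exactly such a standalone Pod (the bare-pod name and image are hypothetical):

```yaml
# A "bare" Pod: no Deployment, ReplicaSet, or other controller owns it,
# so Kubernetes can't recreate it elsewhere after an eviction.
apiVersion: v1
kind: Pod
metadata:
  name: bare-pod
spec:
  containers:
    - name: web
      image: nginx:latest
```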
Kubernetes can’t automatically reschedule these “bare” Pods so evicting them will cause them to become unavailable. Either manually address these Pods before performing the drain or use the --force flag to permit their deletion:
$ kubectl drain node-1 --force
2. “Cannot Delete DaemonSet-managed Pods”
Pods that are part of daemon sets pose a challenge to evictions. DaemonSet controllers disregard the schedulable status of your Nodes. Deleting a Pod that’s part of a DaemonSet will cause it to immediately return, even if you’ve cordoned the Node. Drain operations consequently abort with an error to warn you about this behavior.
You can proceed with the eviction by adding the --ignore-daemonsets flag. This will evict everything else while leaving DaemonSet-managed Pods in place.
$ kubectl drain node-1 --ignore-daemonsets
You might need to use this flag even if you’ve not created any DaemonSets yourself. Internal components within the kube-system namespace could be using DaemonSet resources.
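If you’re unsure whether such components are present, you can list the DaemonSets in the kube-system namespace before starting the drain:

```shell
# Show any DaemonSets deployed by cluster components in kube-system.
kubectl get daemonsets --namespace kube-system
```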
Minimizing Downtime With Pod Disruption Budgets
Draining a Node doesn’t guarantee your workloads will remain accessible throughout. Your other Nodes will need time to honor scheduling requests and create new containers.
This can be particularly impactful if you’re draining multiple Nodes in a short space of time. Draining the first Node could reschedule its Pods onto the second Node, which is itself then deleted.
Pod disruption budgets are a mechanism for avoiding this situation. You can use them with Deployments, ReplicationControllers, ReplicaSets, and StatefulSets.
Objects that are targeted by a Pod disruption budget are guaranteed to have a specific number of accessible Pods at any given time. Kubernetes will block Node drains that would cause the number of available Pods to fall too low.
Here’s an example of a PodDisruptionBudget YAML object:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: demo-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: my-app
This policy requires that at least four running Pods with the app=my-app label be available at all times. Node drains that would leave fewer than four of these Pods available will be blocked.
The level of disruption allowed is expressed as either the minAvailable or the maxUnavailable field. Only one of these can exist in a single Pod Disruption Budget object. Each one accepts an absolute number of Pods or a percentage that’s relative to the total number of Pods at full availability:
minAvailable: 4 – Require at least four Pods to be available.
maxUnavailable: 50% – Allow up to half of the total number of Pods to be unavailable.
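For comparison, here’s a sketch of a similar budget expressed with maxUnavailable instead (the demo-pdb-percent name is hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: demo-pdb-percent
spec:
  # Allow at most half of the matching Pods to be unavailable at once.
  maxUnavailable: 50%
  selector:
    matchLabels:
      app: my-app
```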
Overriding Pod Disruption Budgets
Pod disruption budgets are a mechanism that provides protection for your workloads. They shouldn’t be overridden unless you must immediately shut down a Node. The --disable-eviction flag provides a way to achieve this.
$ kubectl drain node-1 --disable-eviction
This circumvents the regular Pod eviction process. Pods will be directly deleted instead, ignoring any applied disruption budgets.
Bringing Nodes Back Up
Once you’ve completed your maintenance, you can power the Node back up to reconnect it to your cluster. You must then remove the cordon you created to mark the Node as schedulable again:
$ kubectl uncordon node-1
node/node-1 uncordoned
Kubernetes will begin to allocate new workloads to the Node, returning it to active service.
Maintenance of Kubernetes Nodes shouldn’t be attempted until you’ve drained existing workloads and established a cordon. These measures help you avoid unexpected downtime when servicing actively used Nodes.
Basic drains are often adequate if you’ve got capacity in your cluster to immediately reschedule your workloads to other Nodes. Use Pod disruption budgets in situations where consistent availability must be guaranteed. They let you guard against unintentional downtime when multiple drains are commenced concurrently.