Removing a Worker Node from a Kubernetes Cluster

A Step-by-Step Guide to Safely Removing a Kubernetes Worker Node

Properly managing the lifecycle of your Kubernetes cluster is essential for maintaining stability, performance, and security. Whether you are performing hardware maintenance, scaling down your resources, or decommissioning a faulty machine, knowing how to gracefully remove a worker node is a critical skill for any cluster administrator.

Simply shutting down a node without following the correct procedure can lead to service disruptions, data loss, and an unhealthy cluster state. This guide provides a clear, step-by-step process for safely and effectively removing a worker node from your Kubernetes cluster, ensuring your applications remain available and your cluster stays healthy.

Why Graceful Node Removal is Crucial

When you abruptly remove a node, the Kubernetes control plane might not immediately recognize its absence, so the scheduler may continue attempting to place new pods on the missing node. More importantly, the pods that were running on that node are killed along with it rather than shut down gracefully, and it can take several minutes before the control plane marks the node as unreachable and reschedules them elsewhere. This can interrupt critical operations and, for workloads that are not backed by persistent storage or managed by a controller such as a StatefulSet, could lead to permanent data loss.

The correct process involves two main phases: draining the node to safely migrate its workloads and then deleting the node from the cluster’s records.


Step 1: Safely Evicting Workloads with kubectl drain

The first and most important step is to “drain” the node. The kubectl drain command is a powerful tool that performs two key actions in sequence:

  1. Cordoning: It marks the node as unschedulable, preventing the Kubernetes scheduler from placing any new pods on it. Cordoning can also be run on its own, as shown after this list.
  2. Eviction: It safely evicts all running pods from the node, honoring their termination grace periods. The ReplicaSets, Deployments, or StatefulSets managing these pods will then create replacement pods on other available nodes in the cluster.
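
If you only want to stop new pods from landing on a node, without evicting anything yet, the standalone cordon command covers that step by itself, and kubectl uncordon reverses it if you change your mind:

kubectl cordon <node-name>
kubectl uncordon <node-name>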

To begin the process, identify the name of the node you wish to remove by listing all nodes:

kubectl get nodes

Once you have the node’s name, you can execute the drain command.

kubectl drain <node-name>
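
If you would like to preview what the drain would do before making any changes, recent kubectl versions also accept a dry-run flag (shown here as an optional check, not a required step):

kubectl drain <node-name> --dry-run=client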

Handling Common drain Issues

In a real-world cluster, you will likely encounter pods that resist a simple drain command. This is usually due to pods that are not managed by a controller (like a Deployment) or pods managed by a DaemonSet.

  • DaemonSets: Pods managed by a DaemonSet are designed to run on every node and cannot be evicted. The drain command will fail by default to prevent you from accidentally disrupting these essential services. To proceed, you must use the --ignore-daemonsets flag. The DaemonSet pods on the target node are then simply skipped and left running; they are only removed when the node itself is deleted or shut down.

  • Pods with Local Storage (emptyDir): If pods are using local emptyDir volumes, the drain command will also fail by default to prevent potential data loss. If you are certain that the data in these volumes is temporary and can be discarded, you can override this protection with the --delete-emptydir-data flag.

A more complete and practical drain command often looks like this:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
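
If the node also hosts bare pods that are not managed by any controller (the other common cause of a failed drain mentioned above), kubectl will refuse to delete them by default because nothing will recreate them elsewhere. If you accept that those pods are simply gone once evicted, you can additionally pass the --force flag:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force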

Wait for the command to complete successfully. You can open another terminal and run kubectl get pods --all-namespaces -o wide to watch the pods from the drained node get terminated and recreated on other nodes.
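
For a more focused view, the built-in spec.nodeName field selector lets you list only the pods scheduled on the node being drained; once the drain finishes, this should return nothing apart from DaemonSet pods:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>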


Step 2: Removing the Node from the Control Plane

Once the node has been successfully drained, it is no longer running any workloads, but it is still registered as part of the cluster. The next step is to remove its object from the Kubernetes API server.

This is accomplished with the kubectl delete node command.

kubectl delete node <node-name>

After executing this command, running kubectl get nodes again will show that the node is no longer part of the cluster. The control plane will no longer consider it for scheduling or monitoring.

It is critical to understand that this command only removes the node from Kubernetes’ records. It does not log into the machine, stop the kubelet service, or shut down the server itself. That is the final, manual step.
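
If you want the machine to stop participating in the cluster entirely before it is decommissioned, you can also stop and disable the Kubernetes services on the worker node itself. The service names below assume a typical kubeadm installation using systemd and containerd; adjust them to match your setup:

# On the worker node being removed
sudo systemctl stop kubelet
sudo systemctl disable kubelet
sudo systemctl stop containerd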


Step 3: Decommissioning the Physical or Virtual Machine

With the node safely removed from the cluster’s control plane, you can now proceed with decommissioning the actual machine. The appropriate action depends on your infrastructure:

  • Cloud Environments (AWS, GCP, Azure): If the node is a cloud instance, you can now safely terminate it through your cloud provider’s console or CLI (see the example after this list).
  • On-Premises Virtual Machines: Shut down and delete the virtual machine using your hypervisor (e.g., VMware, Proxmox).
  • On-Premises Bare Metal: You can now safely power down the physical server for maintenance or removal.
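
For instance, on AWS the instance backing the node could be terminated directly from the CLI; the instance ID here is a placeholder you would replace with the ID of your actual worker instance:

aws ec2 terminate-instances --instance-ids <instance-id>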

If you plan to re-add the node to the cluster later after maintenance, it is highly recommended to clean up the Kubernetes components before shutting it down. If you used kubeadm to set up the cluster, you can run the following command on the worker node itself:

sudo kubeadm reset

This will revert the changes made by kubeadm init or kubeadm join, ensuring a clean state if the node is ever repurposed.
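
Note that kubeadm reset does not clean up everything: CNI configuration and any iptables or IPVS rules created by kube-proxy are left behind. If the machine will be reused, you may want to remove those manually as well; the commands below are typical examples and should be adjusted to your networking setup:

# Remove leftover CNI configuration
sudo rm -rf /etc/cni/net.d
# Flush iptables rules created by kube-proxy (skip if other services depend on them)
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
# Only if kube-proxy was running in IPVS mode
sudo ipvsadm --clear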

Best Practices for Node Removal

  • Verify Workload Health: Before starting the process, ensure your cluster has enough spare capacity to handle the workloads from the node you are removing. After draining, verify that all evicted pods have been successfully rescheduled and are in a Running state on other nodes (see the quick checks after this list).
  • Monitor Cluster Performance: Removing a node reduces your cluster’s overall capacity. Monitor CPU, memory, and disk pressure on the remaining nodes to ensure they are not overloaded.
  • Automate with Care: While this process can be automated, ensure your scripts include robust health checks and verification steps to prevent accidental service disruption.
  • Plan for Stateful Applications: If the node hosts stateful applications with persistent storage, pay extra attention to your storage solution to ensure data is safely detached and reattached to the new pods.
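
To back up the first two practices above, a couple of quick checks can confirm that workloads were rescheduled and that the remaining nodes are coping; note that kubectl top only works if the metrics-server add-on is installed in your cluster:

# Any pods stuck waiting for a place to run?
kubectl get pods --all-namespaces --field-selector status.phase=Pending
# Resource usage on the remaining nodes (requires metrics-server)
kubectl top nodes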

By following this disciplined drain-and-delete process, you can perform essential cluster maintenance with confidence, keeping disruption to your applications to a minimum and maintaining the integrity of your Kubernetes environment.

Source: https://kifarunix.com/remove-worker-node-from-kubernetes-cluster/
