K8s cluster management¶
Control plane management¶
1. Get the status of the control plane¶
# componentstatuses is deprecated but still reports the core components
kubectl get componentstatuses
# expected output
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
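# componentstatuses was removed from newer API versions; a hedged alternative is to
# query the components' own health endpoints. The secure ports below (10259 for the
# scheduler, 10257 for the controller-manager) are the upstream defaults and may
# require authentication depending on your configuration.
kubectl get --raw='/livez?verbose'            # kube-apiserver
curl -k https://localhost:10259/healthz       # kube-scheduler
curl -k https://localhost:10257/healthz       # kube-controller-manager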
# get the pod status of the kube-system components
kubectl get pods -n kube-system
# expected output
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6dd874f784-cmb99 1/1 Running 5768 (294d ago) 2y285d
calico-node-4bclg 1/1 Running 3 (606d ago) 2y285d
calico-node-c2sds 1/1 Running 5 (606d ago) 2y277d
calico-node-ccqqk 1/1 Running 5 (606d ago) 2y282d
calico-node-khcv2 1/1 Running 4 (606d ago) 2y285d
coredns-76b4fb4578-cbsr6 1/1 Running 35 (572d ago) 2y277d
coredns-76b4fb4578-wcrr9 1/1 Running 31 (572d ago) 2y277d
dns-autoscaler-7979fb6659-z9gkw 1/1 Running 3 (606d ago) 2y285d
kube-controller-manager-controlplane1 1/1 Running 63 (276d ago) 606d
kube-proxy-bc9dw 1/1 Running 0 606d
kube-proxy-hp947 1/1 Running 0 606d
kube-proxy-nfwxv 1/1 Running 0 606d
kube-proxy-xgqjc 1/1 Running 0 606d
kube-scheduler-controlplane1 1/1 Running 62 (276d ago) 606d
nginx-proxy-worker1 1/1 Running 7 (606d ago) 606d
nginx-proxy-worker2 1/1 Running 3 (606d ago) 606d
nginx-proxy-worker3 1/1 Running 5 (606d ago) 606d
node-custom-setup-bsxt8 1/1 Running 2 (606d ago) 2y220d
node-custom-setup-gp7tk 1/1 Running 1 (606d ago) 2y220d
node-custom-setup-vkf66 1/1 Running 1 (606d ago) 2y220d
nodelocaldns-9sg7w 1/1 Running 59 (572d ago) 2y281d
nodelocaldns-dklqj 1/1 Running 35 (573d ago) 2y277d
nodelocaldns-lfk5v 0/1 CrashLoopBackOff 101523 (245d ago) 606d
nodelocaldns-pz7h6 1/1 Running 57 (572d ago) 2y281d
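# the listing above includes a nodelocaldns pod stuck in CrashLoopBackOff; a quick
# way to surface only the pods whose STATUS column (field 3) is not Running:
kubectl get pods -n kube-system --no-headers | awk '$3 != "Running"'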
The static pod manifests of the control plane components are located in
/etc/kubernetes/manifests/
ls /etc/kubernetes/manifests/
# expected output
kube-apiserver.yaml
kube-scheduler.yaml
kube-controller-manager.yaml
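# the directory the kubelet watches is set by staticPodPath in its config file;
# on kubeadm-provisioned nodes this is usually /var/lib/kubelet/config.yaml
grep staticPodPath /var/lib/kubelet/config.yaml
# expected output
staticPodPath: /etc/kubernetes/manifests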
1.1 ETCD status¶
etcd runs as an external cluster in this setup; its endpoints are declared in
/etc/kubernetes/kubeadm-config.yaml:
etcd:
  external:
    endpoints:
      - https://<etcd-ip>:2379
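To check the health of the external etcd cluster directly, etcdctl can be pointed at the same endpoints; the certificate paths below are placeholders and depend on how the etcd cluster was provisioned.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://<etcd-ip>:2379 \
  --cacert=<path-to-etcd-ca.crt> \
  --cert=<path-to-client.crt> \
  --key=<path-to-client.key> \
  endpoint health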
1.2 API server status¶
curl -k https://localhost:6443/healthz
curl -k https://localhost:6443/readyz
curl -k https://localhost:6443/livez
# expected output
ok
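If one of the probes fails, the verbose variants break the result down into the individual checks:
curl -k "https://localhost:6443/readyz?verbose"
curl -k "https://localhost:6443/livez?verbose"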
Stop the k8s cluster¶
Step 1: Drain and cordon the worker nodes (optional, but the safe approach)
# Repeat for each worker node. Drain cordons the node first and then evicts its pods,
# so the explicit cordon afterwards is just a safeguard.
kubectl drain <worker-node-name> --ignore-daemonsets --delete-emptydir-data
kubectl cordon <worker-node-name>
This ensures that workloads are gracefully evicted and won't get rescheduled.
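To drain every worker in one pass, a small loop like the sketch below can be used; it assumes the workers do not carry the kubeadm node-role.kubernetes.io/control-plane label.
for node in $(kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o jsonpath='{.items[*].metadata.name}'); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done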
Step 2: Stop the control plane components (on the control plane node)
Move the static pod manifests out of the kubelet watch path; the kubelet will then tear down the corresponding pods.
The procedure below only works on a control plane deployed via kubeadm (i.e. one that runs as static pods).
sudo mkdir -p /etc/kubernetes/manifests.bak
sudo mv /etc/kubernetes/manifests/*.yaml /etc/kubernetes/manifests.bak/
# check that the control plane pods are terminating; note that kubectl stops
# responding as soon as kube-apiserver itself goes down
kubectl get pods -n kube-system
# if kubectl no longer answers, check directly against the container runtime;
# the commands differ slightly depending on which one you use
# for containerd
crictl ps -a | grep kube
# or for Docker
docker ps -a | grep kube
# Manual Kill (if you want to force stop)
# for containerd
sudo crictl ps | grep kube | awk '{print $1}' | xargs -r sudo crictl stop
# for docker
sudo docker ps | grep kube | awk '{print $1}' | xargs -r sudo docker stop
The kubelet detects that the manifest files are gone and terminates the related pods: kube-apiserver, kube-controller-manager and kube-scheduler.
Step 3: Stop kubelet and container runtime (on all nodes)
On all control plane and worker nodes:
# stop kubelet
sudo systemctl stop kubelet
# stop container runtime depending on your setup
# for containerd
sudo systemctl stop containerd
# or for Docker
sudo systemctl stop docker
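# optional sanity check: both services should now report "inactive"
# (replace containerd with docker if that is your runtime)
systemctl is-active kubelet containerd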
Restart the cluster¶
Step 1: Start kubelet and container runtime (on all nodes)
On all control plane and worker nodes:
# start container runtime depending on your setup
# for containerd
sudo systemctl start containerd
# or for Docker
sudo systemctl start docker
# start kubelet
sudo systemctl start kubelet
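# confirm the kubelet came up cleanly before restoring the control plane
# (assumes a systemd-based setup with journald)
systemctl is-active kubelet
sudo journalctl -u kubelet --since "5 min ago" --no-pager | tail -n 20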
Step 2: Restore control plane
On the control plane node, restore static manifests:
sudo mv /etc/kubernetes/manifests.bak/*.yaml /etc/kubernetes/manifests/
The kubelet re-reads the manifests and starts the control plane pods again; the cluster should become available within 30–60 seconds.
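Once the API server responds again, check the nodes and uncordon any workers that were cordoned before the shutdown so workloads can be scheduled on them again:
kubectl get nodes
# repeat for each cordoned worker
kubectl uncordon <worker-node-name>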
Destroy the cluster¶
On the control plane:
sudo kubeadm reset -f
# kubeadm reset stops and removes the Kubernetes state on this node:
# 1. Tears down the control plane static pods and removes the local etcd data
#    if etcd was running on this node
# 2. Deletes Kubernetes certificates, kubeconfig files, manifests, and state:
#    - /etc/kubernetes/admin.conf
#    - /etc/kubernetes/kubelet.conf
#    - /etc/kubernetes/controller-manager.conf
#    - /etc/kubernetes/scheduler.conf
#    - /etc/kubernetes/pki/*
#    - /etc/kubernetes/manifests/*
# 3. Attempts to remove:
#    - /var/lib/cni/
#    - /var/lib/kubelet/
# 4. It does NOT revert the iptables/IPVS rules created by kube-proxy and the CNI
#    plugin, and it leaves /etc/cni/net.d/ in place; flush the rules manually:
sudo iptables -F
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -X
# clean the credentials
sudo rm -rf ~/.kube
sudo systemctl stop kubelet
sudo systemctl stop containerd
# clean up the config and bin
sudo rm -rf /etc/kubernetes
sudo rm -rf /var/lib/etcd
sudo rm -rf /var/lib/kubelet
sudo rm -rf /etc/cni
sudo rm -rf /var/lib/cni
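# optional: verify that nothing is left behind (this should print nothing)
ls /etc/kubernetes /var/lib/etcd /var/lib/kubelet /etc/cni 2>/dev/null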
On the workers:
sudo kubeadm reset -f
sudo rm -rf /var/lib/kubelet /etc/kubernetes
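If the workers also ran a CNI plugin (Calico in the cluster shown above), its configuration and state can be removed there as well; the paths below are the common defaults.
sudo rm -rf /etc/cni /var/lib/cni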