K8s cluster management¶
Control plane management¶
1. Get the status of the control plane¶
# componentstatuses is deprecated but still reports the core components
kubectl get componentstatuses
# expected output
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
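# componentstatuses was removed from newer API versions; a hedged alternative is to
# query the components' own health endpoints. The secure ports below (10259 for the
# scheduler, 10257 for the controller-manager) are the upstream defaults and may
# require authentication depending on your configuration.
kubectl get --raw='/livez?verbose'            # kube-apiserver
curl -k https://localhost:10259/healthz       # kube-scheduler
curl -k https://localhost:10257/healthz       # kube-controller-manager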
# get the pod status of the kube-system components
kubectl get pods -n kube-system
# expected output
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6dd874f784-cmb99 1/1 Running 5768 (294d ago) 2y285d
calico-node-4bclg 1/1 Running 3 (606d ago) 2y285d
calico-node-c2sds 1/1 Running 5 (606d ago) 2y277d
calico-node-ccqqk 1/1 Running 5 (606d ago) 2y282d
calico-node-khcv2 1/1 Running 4 (606d ago) 2y285d
coredns-76b4fb4578-cbsr6 1/1 Running 35 (572d ago) 2y277d
coredns-76b4fb4578-wcrr9 1/1 Running 31 (572d ago) 2y277d
dns-autoscaler-7979fb6659-z9gkw 1/1 Running 3 (606d ago) 2y285d
kube-controller-manager-controlplane1 1/1 Running 63 (276d ago) 606d
kube-proxy-bc9dw 1/1 Running 0 606d
kube-proxy-hp947 1/1 Running 0 606d
kube-proxy-nfwxv 1/1 Running 0 606d
kube-proxy-xgqjc 1/1 Running 0 606d
kube-scheduler-controlplane1 1/1 Running 62 (276d ago) 606d
nginx-proxy-worker1 1/1 Running 7 (606d ago) 606d
nginx-proxy-worker2 1/1 Running 3 (606d ago) 606d
nginx-proxy-worker3 1/1 Running 5 (606d ago) 606d
node-custom-setup-bsxt8 1/1 Running 2 (606d ago) 2y220d
node-custom-setup-gp7tk 1/1 Running 1 (606d ago) 2y220d
node-custom-setup-vkf66 1/1 Running 1 (606d ago) 2y220d
nodelocaldns-9sg7w 1/1 Running 59 (572d ago) 2y281d
nodelocaldns-dklqj 1/1 Running 35 (573d ago) 2y277d
nodelocaldns-lfk5v 0/1 CrashLoopBackOff 101523 (245d ago) 606d
nodelocaldns-pz7h6 1/1 Running 57 (572d ago) 2y281d
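# the listing above includes a nodelocaldns pod stuck in CrashLoopBackOff; a quick
# way to surface only the pods whose STATUS column (field 3) is not Running:
kubectl get pods -n kube-system --no-headers | awk '$3 != "Running"'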
The static pod manifests of the control plane components are located in
/etc/kubernetes/manifests/
ls /etc/kubernetes/manifests/
# expected output
kube-apiserver.yaml
kube-scheduler.yaml
kube-controller-manager.yaml
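# the directory the kubelet watches is set by staticPodPath in its config file;
# on kubeadm-provisioned nodes this is usually /var/lib/kubelet/config.yaml
grep staticPodPath /var/lib/kubelet/config.yaml
# expected output
staticPodPath: /etc/kubernetes/manifests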
1.1 ETCD status¶
etcd runs as an external cluster in this setup; its endpoints are declared in
/etc/kubernetes/kubeadm-config.yaml:
etcd:
  external:
    endpoints:
      - https://<etcd-ip>:2379
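To check the health of the external etcd cluster directly, etcdctl can be pointed at the same endpoints; the certificate paths below are placeholders and depend on how the etcd cluster was provisioned.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://<etcd-ip>:2379 \
  --cacert=<path-to-etcd-ca.crt> \
  --cert=<path-to-client.crt> \
  --key=<path-to-client.key> \
  endpoint health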
1.2 API server status¶
curl -k https://localhost:6443/healthz
curl -k https://localhost:6443/readyz
curl -k https://localhost:6443/livez
# expected output
ok
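If one of the probes fails, the verbose variants break the result down into the individual checks:
curl -k "https://localhost:6443/readyz?verbose"
curl -k "https://localhost:6443/livez?verbose"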
Stop the k8s cluster¶
Step 1: Drain and cordon the worker nodes (optional, but the safe approach)
# Repeat for each worker node. Drain cordons the node first and then evicts its pods,
# so the explicit cordon afterwards is just a safeguard.
kubectl drain <worker-node-name> --ignore-daemonsets --delete-emptydir-data
kubectl cordon <worker-node-name>
This ensures that workloads are gracefully evicted and won't get rescheduled.
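To drain every worker in one pass, a small loop like the sketch below can be used; it assumes the workers do not carry the kubeadm node-role.kubernetes.io/control-plane label.
for node in $(kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o jsonpath='{.items[*].metadata.name}'); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done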
Step 2: Stop the control plane components (on the control plane node)
Move the static pod manifests out of the kubelet watch path; the kubelet will then tear down the corresponding pods.
The procedure below only works on a control plane deployed via kubeadm (i.e. one that runs as static pods).
sudo mkdir -p /etc/kubernetes/manifests.bak
sudo mv /etc/kubernetes/manifests/*.yaml /etc/kubernetes/manifests.bak/
# check that the control plane pods are terminating; note that kubectl stops
# responding as soon as kube-apiserver itself goes down
kubectl get pods -n kube-system
# if kubectl no longer answers, check directly against the container runtime;
# the commands differ slightly depending on which one you use
# for containerd
crictl ps -a | grep kube
# or for Docker
docker ps -a | grep kube
# Manual Kill (if you want to force stop)
# for containerd
sudo crictl ps | grep kube | awk '{print $1}' | xargs -r sudo crictl stop
# for docker
sudo docker ps | grep kube | awk '{print $1}' | xargs -r sudo docker stop
The kubelet detects that the manifest files are gone and terminates the related pods: kube-apiserver, kube-controller-manager and kube-scheduler.
Step 3: Stop kubelet and container runtime (on all nodes)
On all control plane and worker nodes:
# stop kubelet
sudo systemctl stop kubelet
# stop container runtime depending on your setup
# for containerd
sudo systemctl stop containerd
# or for Docker
sudo systemctl stop docker
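# optional sanity check: both services should now report "inactive"
# (replace containerd with docker if that is your runtime)
systemctl is-active kubelet containerd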
Restart the cluster¶
Step 1: Start kubelet and container runtime (on all nodes)
On all control plane and worker nodes:
# start container runtime depending on your setup
# for containerd
sudo systemctl start containerd
# or for Docker
sudo systemctl start docker
# start kubelet
sudo systemctl start kubelet
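# confirm the kubelet came up cleanly before restoring the control plane
# (assumes a systemd-based setup with journald)
systemctl is-active kubelet
sudo journalctl -u kubelet --since "5 min ago" --no-pager | tail -n 20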
Step 2: Restore control plane
On the control plane node, restore static manifests:
sudo mv /etc/kubernetes/manifests.bak/*.yaml /etc/kubernetes/manifests/
The kubelet re-reads the manifests and starts the control plane pods again; the cluster should become available within 30–60 seconds.
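Once the API server responds again, check the nodes and uncordon any workers that were cordoned before the shutdown so workloads can be scheduled on them again:
kubectl get nodes
# repeat for each cordoned worker
kubectl uncordon <worker-node-name>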
Destroy the cluster¶
On the control plane:
sudo kubeadm reset -f
# kubeadm reset stops and removes the Kubernetes state on this node:
# 1. Tears down the control plane static pods and removes the local etcd data
#    if etcd was running on this node
# 2. Deletes Kubernetes certificates, kubeconfig files, manifests, and state:
#    - /etc/kubernetes/admin.conf
#    - /etc/kubernetes/kubelet.conf
#    - /etc/kubernetes/controller-manager.conf
#    - /etc/kubernetes/scheduler.conf
#    - /etc/kubernetes/pki/*
#    - /etc/kubernetes/manifests/*
# 3. Attempts to remove:
#    - /var/lib/cni/
#    - /var/lib/kubelet/
# 4. It does NOT revert the iptables/IPVS rules created by kube-proxy and the CNI
#    plugin, and it leaves /etc/cni/net.d/ in place; flush the rules manually:
sudo iptables -F
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -X
# clean the credentials
sudo rm -rf ~/.kube
sudo systemctl stop kubelet
sudo systemctl stop containerd
# clean up the config and bin
sudo rm -rf /etc/kubernetes
sudo rm -rf /var/lib/etcd
sudo rm -rf /var/lib/kubelet
sudo rm -rf /etc/cni
sudo rm -rf /var/lib/cni
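# optional: verify that nothing is left behind (this should print nothing)
ls /etc/kubernetes /var/lib/etcd /var/lib/kubelet /etc/cni 2>/dev/null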
On the workers:
sudo kubeadm reset -f
sudo rm -rf /var/lib/kubelet /etc/kubernetes
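If the workers also ran a CNI plugin (Calico in the cluster shown above), its configuration and state can be removed there as well; the paths below are the common defaults.
sudo rm -rf /etc/cni /var/lib/cni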