Deploy a k8s cluster¶

This tutorial shows how to deploy a k8s cluster with internet access on Debian servers.

1. Prerequisites¶

Suppose we have three servers running Debian 11, each with the following resources:

- 4 CPU / vCPU
- 8 GB RAM
- 20 GB free disk space
- Sudo user with admin rights
- Stable Internet connectivity (optional)

1.1 Cluster setup¶

Master Node (k8s-master) – 10.50.5.67
Worker Node 1 (k8s-worker1) – 10.50.5.68
Worker Node 2 (k8s-worker2) – 10.50.5.69

1.2 Cluster Network config¶

To enable communication between the master node and the worker nodes, we need to set up hostnames.

1.2.1: Change server hostname¶

# run this on master node
sudo hostnamectl set-hostname k8s-master

# run this on worker1
sudo hostnamectl set-hostname k8s-worker1

# run this on worker2
sudo hostnamectl set-hostname k8s-worker2

1.2.2: List server hostnames of the cluster in the `/etc/hosts`¶

Add the following lines into /etc/hosts:

10.50.5.67       k8s-master
10.50.5.68       k8s-worker1
10.50.5.69       k8s-worker2
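The edit above can be scripted so that it is safe to re-run on every node. The sketch below is only an illustration: `add_host_entry` is a hypothetical helper name, not part of any tool, and it appends an entry only when the hostname is not mapped yet.

```shell
#!/usr/bin/env bash
# add_host_entry FILE IP NAME (hypothetical helper)
# Appends "IP NAME" to FILE unless NAME already appears as a hostname
# on a non-comment line of FILE.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  if awk -v n="$name" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file" 2>/dev/null; then
    return 0   # already present, nothing to do
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Example (run as root, because /etc/hosts is root-owned):
#   add_host_entry /etc/hosts 10.50.5.67 k8s-master
#   add_host_entry /etc/hosts 10.50.5.68 k8s-worker1
#   add_host_entry /etc/hosts 10.50.5.69 k8s-worker2
```

Because the function checks before appending, running it twice leaves a single line per hostname.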

1.2.3: Check connectivity¶

You can try to ping each worker node from the master and vice versa.

# from master
ping k8s-worker1
ping k8s-worker2

# from the worker
ping k8s-master

1.2.4 Enable IPv4 packet forwarding¶

By default, the Linux kernel does not allow IPv4 packets to be routed between interfaces. Most Kubernetes cluster networking implementations will change this setting (if needed), but some might expect the administrator to do it for them. (Some might also expect other sysctl parameters to be set, kernel modules to be loaded, etc.; consult the documentation for your specific network implementation.)

We will provide the complete procedure on how to set up this in docs/Container/Containerd/02.Install_config_containerd.md
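As a quick check before following that procedure, the current state of the flag can be read from procfs. This is a minimal sketch: `ip_forward_enabled` is a hypothetical helper name, and the commented sysctl lines assume the file name /etc/sysctl.d/k8s.conf, which is a common choice rather than a requirement.

```shell
#!/usr/bin/env bash
# ip_forward_enabled [FILE] (hypothetical helper)
# Succeeds when the kernel flag file contains 1. FILE defaults to the
# real procfs entry; the parameter only exists to make the check testable.
ip_forward_enabled() {
  local flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

# To enable forwarding persistently (run on every node):
#   echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/k8s.conf
#   sudo sysctl --system
#
# Then verify:
#   ip_forward_enabled && echo "IPv4 forwarding is on"
```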

1.2.5 Check your cgroup drivers¶

On Linux, control groups are used to constrain resources that are allocated to processes.

Both the kubelet and the underlying container runtime need to interface with control groups to enforce resource management for pods and containers and set resources such as cpu/memory requests and limits. To interface with control groups, the kubelet and the container runtime need to use a cgroup driver. It's critical that the kubelet and the container runtime use the same cgroup driver and are configured the same.

There are two cgroup drivers available:

cgroupfs
systemd

In our case, as we use Debian 11, the default cgroup version is cgroup v2 (which uses a unified hierarchy, improving resource delegation and security), and the default cgroup driver is systemd. You can check the cgroup version with the command below:

mount | grep cgroup

# expected output
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
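An alternative check, suggested in the Kubernetes documentation, looks at the filesystem type of /sys/fs/cgroup: `stat -fc %T` prints cgroup2fs on cgroup v2 and tmpfs on cgroup v1. The wrapper below is a sketch; `cgroup_version` is a hypothetical helper name and it assumes GNU `stat`.

```shell
#!/usr/bin/env bash
# cgroup_version [PATH] (hypothetical helper, assumes GNU stat)
# Prints v2, v1 or unknown depending on the filesystem type mounted at
# PATH (default: /sys/fs/cgroup).
cgroup_version() {
  local path="${1:-/sys/fs/cgroup}"
  case "$(stat -fc %T "$path" 2>/dev/null)" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}

# Example:
#   cgroup_version
```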

1.2.6 Disable firewalls¶

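If you have firewalls installed on your servers, the easiest way is to disable them.

sudo ufw disable

# AppArmor is a mandatory access control module, not a firewall; these commands also turn it off
sudo systemctl stop apparmor
sudo systemctl disable apparmor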

Alternatively, keep the firewall enabled and open only the ports that k8s needs:

# On Master node, run
$ sudo ufw allow 6443/tcp
$ sudo ufw allow 2379/tcp
$ sudo ufw allow 2380/tcp
$ sudo ufw allow 10250/tcp
# kube-controller-manager (10257) and kube-scheduler (10259); these replace the older ports 10251 and 10252
$ sudo ufw allow 10257/tcp
$ sudo ufw allow 10259/tcp
$ sudo ufw reload

# On Worker Nodes,
$ sudo ufw allow 10250/tcp
$ sudo ufw allow 30000:32767/tcp
$ sudo ufw reload

2. Install container runtime¶

Containers require a container runtime to run on the host machine. As a result, we must install a container runtime before deploying a k8s cluster.

Containerd is currently the industry-standard container runtime, so we must install it on all master and worker nodes.

Don't use the containerd package from the native apt repo. Use the containerd.io package instead.

The detailed installation guide is in docs/Container/Containerd/02.Install_config_containerd.md

3. Disable swap¶

For kubelet to work smoothly, it is recommended to disable swap. Run the following commands to turn off swap. This step can be skipped if your server does not have swap.

$ sudo swapoff -a
$ sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
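The sed expression above only matches fstab lines where `swap` is surrounded by spaces. As a hedged alternative, the sketch below also handles tab-separated entries and skips lines that are already commented out. `comment_swap_entries` is a hypothetical helper name; it prints the result instead of editing the file, so the output can be reviewed first.

```shell
#!/usr/bin/env bash
# comment_swap_entries FILE (hypothetical helper)
# Prints FILE with every active swap entry commented out. Lines that
# already start with '#' are left untouched.
comment_swap_entries() {
  sed -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Example: review the result, then replace the file as root.
#   comment_swap_entries /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
#   sudo cp /tmp/fstab.new /etc/fstab
```

After `sudo swapoff -a`, running `swapon --show` should print nothing.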

4. Install k8s packages¶

The k8s project publishes a new minor release roughly every four months, so the docs below may become obsolete. The official doc can be found here: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management

The docs below were tested on a Debian 11 server with k8s v1.33.

4.1 Setup k8s apt repository¶

You need to set up k8s apt repository on all nodes

# install required dependencies
sudo apt install gnupg gnupg2 curl software-properties-common apt-transport-https -y

# create the keyrings folder
sudo mkdir -p /etc/apt/keyrings

# add GPG key
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.33/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# allow unprivileged APT programs to read this keyring
sudo chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg 

# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
# add v1.33 k8s repo to source.list 
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

# helps tools such as command-not-found to work correctly
sudo chmod 644 /etc/apt/sources.list.d/kubernetes.list

In releases older than Debian 12 and Ubuntu 22.04, folder /etc/apt/keyrings does not exist by default, and it should be created before the curl command.
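Since the minor version appears in both the key URL and the repository line, it can help to build them from a single variable when moving to a newer release. The sketch below uses two hypothetical helper names, `k8s_key_url` and `k8s_repo_line`; the URL layout is copied from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that derive the pkgs.k8s.io URLs from one
# minor version string such as v1.33.
k8s_key_url() {
  printf 'https://pkgs.k8s.io/core:/stable:/%s/deb/Release.key\n' "$1"
}

k8s_repo_line() {
  printf 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/%s/deb/ /\n' "$1"
}

# Example:
#   K8S_MINOR=v1.33
#   curl -fsSL "$(k8s_key_url "$K8S_MINOR")" | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
#   k8s_repo_line "$K8S_MINOR" | sudo tee /etc/apt/sources.list.d/kubernetes.list
```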

4.2 Install kubelet, kubectl and kubeadm on all nodes¶

Run the following apt commands on all the nodes to install Kubernetes cluster components like kubelet, kubectl and Kubeadm.

sudo apt update

# you can check the available version before install
apt-cache madison kubeadm

# expected output
kubeadm | 1.33.2-1.1 | https://pkgs.k8s.io/core:/stable:/v1.33/deb  Packages
kubeadm | 1.33.1-1.1 | https://pkgs.k8s.io/core:/stable:/v1.33/deb  Packages
kubeadm | 1.33.0-1.1 | https://pkgs.k8s.io/core:/stable:/v1.33/deb  Packages

sudo apt install kubelet kubeadm kubectl -y

# check the installed version
apt list --installed kubeadm

# hold is used to mark a package as held back, which will prevent the package from being automatically installed, upgraded or removed.
sudo apt-mark hold kubelet kubeadm kubectl
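If you prefer to pin an explicit version rather than take whatever apt selects, the version column of the `apt-cache madison` output can be extracted with a small filter. This is a sketch: `latest_madison_version` is a hypothetical helper name, it relies on GNU `sort -V`, and the usage example assumes the three packages publish the same version string.

```shell
#!/usr/bin/env bash
# latest_madison_version (hypothetical helper, assumes GNU sort -V)
# Reads `apt-cache madison <package>` output on stdin and prints the
# highest version found in the second column.
latest_madison_version() {
  awk -F'|' '{ gsub(/ /, "", $2); print $2 }' | sort -V | tail -n 1
}

# Example:
#   v=$(apt-cache madison kubeadm | latest_madison_version)
#   sudo apt install -y kubelet="$v" kubeadm="$v" kubectl="$v"
```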

8. Create Kubernetes Cluster with Kubeadm¶

Now, we need to use kubeadm to create a k8s cluster.

8.1 Use custom root CA¶

By default, the k8s cluster will generate a root CA for TLS communication. If you want to use a custom root CA certificate, you need to do the following steps:

# Step1: put your custom CA files in:
# The ca.crt is a valid X.509 root certificate
/etc/kubernetes/pki/ca.crt
# The ca.key is the corresponding private key (PEM format, unencrypted)
/etc/kubernetes/pki/ca.key

# Step2: Permissions Check
chmod 600 /etc/kubernetes/pki/ca.key
chown root:root /etc/kubernetes/pki/ca.key

# step3: prepare the config (check 8.2)

# step4: init the cluster with the config file
kubeadm init --config kubeadm-config.yaml --upload-certs

# step5: Optional, if certs were already created with a different CA and you want to re-sign:
kubeadm certs renew all --config kubeadm-config.yaml

Do not pass --skip-phases or --certificate-key unless you manage every phase manually.
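Before running kubeadm init with a custom CA, it is worth confirming that ca.crt and ca.key actually belong together. The sketch below compares the public key embedded in the certificate with the one derived from the private key. `ca_pair_matches` is a hypothetical helper name, and the check assumes an unencrypted PEM key, as required in step 1.

```shell
#!/usr/bin/env bash
# ca_pair_matches CRT KEY (hypothetical helper)
# Succeeds when the public key in certificate CRT equals the public key
# derived from private key KEY (both PEM, key unencrypted).
ca_pair_matches() {
  local crt="$1" key="$2"
  local crt_pub key_pub
  crt_pub=$(openssl x509 -in "$crt" -noout -pubkey 2>/dev/null) || return 1
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 1
  [ -n "$crt_pub" ] && [ "$crt_pub" = "$key_pub" ]
}

# Example (as root, because ca.key is only readable by root):
#   ca_pair_matches /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
#     && echo "CA certificate and key match"
```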

8.2 Init the k8s control plane (master node)¶

Run the following command only on the master node. It initializes the master node as the control plane endpoint.

# short version
sudo kubeadm init

# long version
sudo kubeadm init --control-plane-endpoint=k8s-master --kubernetes-version v1.33.2

# with a custom config file; keeping all the configuration in one file makes it easier to version
sudo kubeadm init --config kubeadm-config.yaml --upload-certs

Below is an example of the kubeadm-config.yaml

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
# Configuration for the local kubelet.
nodeRegistration:
  # the hostname used for the node in the cluster.
  name: onyxia-master
  # The path to the container runtime interface socket (here: containerd).
  criSocket: unix:///run/containerd/containerd.sock
  # Forces kubelet to advertise the correct internal IP. Prevents wrong IP detection on multi-interface hosts.
  kubeletExtraArgs:
    node-ip: 10.50.5.67

localAPIEndpoint:
  # The IP address the kube-apiserver binds to on this node.
  advertiseAddress: 10.50.5.67
  # Port exposed for control-plane communication 
  bindPort: 6443

---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# Target Kubernetes version to install 
kubernetesVersion: v1.33.2
# DNS name or IP + port of the cluster’s load balancer or primary API server. In single-node, it's the master's IP.
controlPlaneEndpoint: "10.50.5.67:6443"
networking:
  # CIDR used by the CNI plugin 
  # REQUIRED to match Calico default
  podSubnet: 192.168.0.0/16       
  serviceSubnet: 10.96.0.0/12
  dnsDomain: cluster.local
apiServer:
  # Additional Subject Alternative Names added to the kube-apiserver certificate. Allows API to be reached via IP or DNS.
  certSANs:
    - "10.50.5.67"
    - "onyxia-master"
  # Configures the authorization model 
  extraArgs:
    authorization-mode: Node,RBAC

---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables

The config file has three main parts:

- InitConfiguration: controls the local node's behavior
- ClusterConfiguration: defines the global cluster settings
- KubeProxyConfiguration: defines kube-proxy behavior
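A multi-document YAML file is easy to break by dropping a `---` separator or a `kind:` line. As a quick sanity check, the sketch below lists the top-level `kind` of every document in the file. `list_config_kinds` is a hypothetical helper name; it only looks at unindented `kind:` lines and is not a YAML parser.

```shell
#!/usr/bin/env bash
# list_config_kinds FILE (hypothetical helper)
# Prints the value of every top-level "kind:" line in FILE, one per line.
list_config_kinds() {
  awk '/^kind:[[:space:]]/ { print $2 }' "$1"
}

# Example:
#   list_config_kinds kubeadm-config.yaml
```

For the example file above, the expected kinds are InitConfiguration, ClusterConfiguration and KubeProxyConfiguration. Recent kubeadm releases also ship `kubeadm config validate --config kubeadm-config.yaml` for a real schema check.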

I got a warning because the API version v1beta3 is outdated. kubeadm has a command to convert the config to v1beta4:

# Run this command to convert the config
kubeadm config migrate --old-config kubeadm-config.yaml --new-config new-config.yaml

Below is the generated new config in v1beta4

apiVersion: kubeadm.k8s.io/v1beta4
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: 56mm95.5r9xhf31zr9r9gpm
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.50.5.67
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  imagePullSerial: true
  kubeletExtraArgs:
  - name: node-ip
    value: 10.50.5.67
  name: onyxia-master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
  discovery: 5m0s
  etcdAPICall: 2m0s
  kubeletHealthCheck: 4m0s
  kubernetesAPICall: 1m0s
  tlsBootstrap: 5m0s
  upgradeManifests: 5m0s
---
apiServer:
  certSANs:
  - 10.50.5.67
  - onyxia-master
  extraArgs:
  - name: authorization-mode
    value: Node,RBAC
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 87600h0m0s
certificateValidityPeriod: 8760h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 10.50.5.67:6443
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: v1.33.2
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 10.96.0.0/12
proxy: {}
scheduler: {}

Now you have everything you need to run `sudo kubeadm init --config kubeadm-config.yaml --upload-certs`. If everything works well, you should see the output below:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

  kubeadm join k8s-master:6443 --token 48elby.xre538l1ytebqe7q \
        --discovery-token-ca-cert-hash sha256:e0cccf7851ec76248163359058ea8e9aad478daefe14180c82f881a5e433dbda \
        --control-plane

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8s-master:6443 --token 48elby.xre538l1ytebqe7q \
        --discovery-token-ca-cert-hash sha256:e0cccf7851ec76248163359058ea8e9aad478daefe14180c82f881a5e433dbda

The output contains three sets of commands:

- set up kubectl (the k8s client) to connect to the k8s cluster (as a regular user or as root)
- join any number of master nodes to the control plane
- join any number of worker nodes to the cluster

8.2.1 Setup kubectl¶

To start interacting with the cluster as a regular user, run the following commands on the master node:

$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run the following kubectl commands to get node and cluster information:

$ kubectl get nodes
$ kubectl cluster-info

8.3 Join worker node¶

If you forgot the token to join the cluster, you can generate a new one with the command below:

# you need to have k8s admin rights
kubeadm token create --print-join-command

# you should see below outputs
kubeadm join 10.50.5.67:6443 --token 03mxyl.eg12gb36v3ya2bcs --discovery-token-ca-cert-hash sha256:1b19519e76812d286a93413320499bfd7ac1e06f7bd795994e086d0d1d0e6661

# list existing token
kubeadm token list

# you should see below outputs
TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
03mxyl.eg12gb36v3ya2bcs   23h         2023-04-18T15:05:50Z   authentication,signing   <none>                                                     system:bootstrappers:kubeadm:default-node-token

Notice that the token has a TTL; here it expires in 23 hours.

The general form of the join command is shown below:

kubeadm join <api-server-ip:port> --token <token-value> \
--discovery-token-ca-cert-hash sha256:<hash value>

So we need three pieces of information:

- the k8s API server IP and port
- a valid token
- the token CA cert hash value

You can run the command below to get the API server IP and port:

kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}' && echo ""

You need to have root privilege to run kubeadm join
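The third piece, the CA cert hash, can be recomputed on the master node from the cluster CA certificate. The sketch below follows the openssl pipeline given in the kubeadm join documentation, with `openssl pkey` used in place of `openssl rsa` so that non-RSA keys also work. `ca_cert_hash` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# ca_cert_hash [CRT] (hypothetical helper)
# Prints "sha256:<hex>" for the public key of certificate CRT
# (default: the kubeadm CA at /etc/kubernetes/pki/ca.crt).
ca_cert_hash() {
  local crt="${1:-/etc/kubernetes/pki/ca.crt}"
  local pub hex
  # Fail early if the certificate cannot be read or parsed.
  pub=$(openssl x509 -in "$crt" -noout -pubkey 2>/dev/null) || return 1
  hex=$(printf '%s\n' "$pub" \
        | openssl pkey -pubin -outform DER 2>/dev/null \
        | openssl dgst -sha256 -hex \
        | sed 's/^.* //')
  [ -n "$hex" ] || return 1
  printf 'sha256:%s\n' "$hex"
}

# Example (on the master node):
#   ca_cert_hash
```

The printed value is what goes after `--discovery-token-ca-cert-hash` in the join command.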

8.4 Check your k8s status¶

# get available nodes
kubectl get nodes

# expected output
NAME            STATUS     ROLES           AGE     VERSION
onyxia-master   NotReady   control-plane   5m43s   v1.33.2
onyxia-w01      NotReady   <none>          20s     v1.33.2
onyxia-w02      NotReady   <none>          9s      v1.33.2

# get pods in the kube-system namespace
kubectl get pods -n kube-system

If the node status is NotReady, check the pod status in kube-system with `kubectl get pods -n kube-system`. If coredns is Pending, that is normal, because it requires a network addon to run. We will install one in section 10.
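When the cluster has more than a few nodes, it can be handy to list only the ones that are not ready yet. The filter below is a sketch that parses the default `kubectl get nodes` table; `not_ready_nodes` is a hypothetical helper name. It treats any status other than exactly `Ready` as not ready, which includes combined states such as `Ready,SchedulingDisabled`.

```shell
#!/usr/bin/env bash
# not_ready_nodes (hypothetical helper)
# Reads the default `kubectl get nodes` table on stdin and prints the
# name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example:
#   kubectl get nodes | not_ready_nodes
```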

9. Reset the cluster¶

https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/

10. Install Calico Pod Network Addon¶

To enable communication between nodes and services in k8s cluster, we need a network addon. In our case, we use Calico. The project page is here

On the master node, run the command below to install Calico. Here I chose v3.25.1, the latest version at the time of writing. You can visit the project page to get the latest version.

# general form
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/{calico-version}/manifests/calico.yaml

# example
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.1/manifests/calico.yaml

You need to check the compatibility between your k8s cluster version and the Calico version.
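To keep the Calico version in one place, the manifest URL can be built from a variable. This is a small sketch following the URL pattern shown above; `calico_manifest_url` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# calico_manifest_url VERSION (hypothetical helper)
# Prints the raw GitHub URL of the calico.yaml manifest for VERSION.
calico_manifest_url() {
  printf 'https://raw.githubusercontent.com/projectcalico/calico/%s/manifests/calico.yaml\n' "$1"
}

# Example:
#   CALICO_VERSION=v3.25.1
#   kubectl apply -f "$(calico_manifest_url "$CALICO_VERSION")"
```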

11. Test your k8s cluster¶

To test the Kubernetes cluster installation, let’s deploy an nginx-based application via a deployment. Run the commands below:

# get general status of your cluster
kubectl cluster-info

# create a deployment with nginx image
kubectl create deployment nginx-app --image=nginx --replicas 2

# create a service which uses the deployment
kubectl expose deployment nginx-app --name=nginx-web-svc --type NodePort --port 80 --target-port 80

# get the pod info of the deployment
kubectl get pods -o wide

NAME                         READY   STATUS    RESTARTS   AGE   IP               NODE          NOMINATED NODE   READINESS GATES
nginx-app-7df7b66fb5-b6lhk   1/1     Running   0          83m   192.168.194.68   k8s-worker1   <none>           <none>
nginx-app-7df7b66fb5-qtsw7   1/1     Running   0          83m   192.168.194.66   k8s-worker1   <none>           <none>

# get the info of the service, you need to get the node port
kubectl describe svc nginx-web-svc

# the output

Name:                     nginx-web-svc
Namespace:                default
Labels:                   app=nginx-app
Annotations:              <none>
Selector:                 app=nginx-app
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.103.208.99
IPs:                      10.103.208.99
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  31169/TCP
Endpoints:                <none>
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

# from the two commands above, you know that the pods run on `k8s-worker1` and the node port is 31169.
# You can try to access the nginx service with the below command.
# you need to modify the url and port based on the svc output
curl http://k8s-worker1:31169

# clean the cluster after test
kubectl delete deployment nginx-app
kubectl delete service nginx-web-svc

If you get an HTML response from the nginx server, your k8s cluster works.
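The node port changes every time the service is recreated, so a scripted test needs to read it rather than hard-code it. The sketch below extracts it from the `kubectl describe svc` output shown above; `node_port_from_describe` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# node_port_from_describe (hypothetical helper)
# Reads `kubectl describe svc <name>` output on stdin and prints the
# numeric port of the "NodePort:" line.
node_port_from_describe() {
  awk '$1 == "NodePort:" { split($NF, parts, "/"); print parts[1] }'
}

# Example:
#   port=$(kubectl describe svc nginx-web-svc | node_port_from_describe)
#   curl "http://k8s-worker1:${port}"
```

kubectl can also print the value directly with `kubectl get svc nginx-web-svc -o jsonpath='{.spec.ports[0].nodePort}'`.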

Appendix1: cgroup and systemd¶

Systemd organizes processes using cgroups to track and manage resource usage. Each systemd unit (like a service) runs in its own cgroup. For example, the nginx.service runs in /sys/fs/cgroup/system.slice/nginx.service/. This ensures services are isolated and can have specific resource limits.

Systemd provides commands to control resource usage dynamically:

# Limit CPU usage:
systemctl set-property nginx.service CPUQuota=50%

# Limit Memory usage:
systemctl set-property nginx.service MemoryMax=500M

These settings are applied via cgroup controllers in the background.

cgroupfs driver¶

The cgroupfs driver is the default cgroup driver in the kubelet. When the cgroupfs driver is used, the kubelet and the container runtime directly interface with the cgroup filesystem to configure cgroups.

The cgroupfs driver is not recommended when systemd is the init system because systemd expects a single cgroup manager on the system. Additionally, if you use cgroup v2, use the systemd cgroup driver instead of cgroupfs.

systemd cgroup driver¶

When systemd is chosen as the init system for a Linux distribution, the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager.

systemd has a tight integration with cgroups and allocates a cgroup per systemd unit. As a result, if you use systemd as the init system with the cgroupfs driver, the system gets two different cgroup managers.

Two cgroup managers result in two views of the available and in-use resources in the system. In some cases, nodes that are configured to use cgroupfs for the kubelet and container runtime, but use systemd for the rest of the processes become unstable under resource pressure.

The approach to mitigate this instability is to use systemd as the cgroup driver for the kubelet and the container runtime when systemd is the selected init system.

To set systemd as the cgroup driver, edit the KubeletConfiguration option of cgroupDriver and set it to systemd. For example:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
...
cgroupDriver: systemd

If you configure systemd as the cgroup driver for the kubelet, you must also configure systemd as the cgroup driver for the container runtime. Refer to the documentation for your container runtime for instructions. For example:

containerd
CRI-O

Caution: Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation. If the kubelet has created Pods using the semantics of one cgroup driver, changing the container runtime to another cgroup driver can cause errors when trying to re-create the Pod sandbox for such existing Pods. Restarting the kubelet may not solve such errors.

If you have automation that makes it feasible, replace the node with another using the updated configuration, or reinstall it using automation.

Appendix: Install containerd via default apt repo (not recommended)¶

You can find the official installation doc of containerd here

Configure containerd¶

If you have problems with containerd, check this 02.Install_config_containerd.md

Appendix: Kubeadm init without internet access¶

Get all images that you need to pull

kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c

You need to pull the images below into the local containerd cache:

- registry.k8s.io/coredns/coredns:v1.10.1
- registry.k8s.io/etcd:3.5.7-0
- registry.k8s.io/kube-apiserver:v1.27.0
- registry.k8s.io/kube-controller-manager:v1.27.0
- registry.k8s.io/kube-proxy:v1.27.0
- registry.k8s.io/kube-scheduler:v1.27.0
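One possible way to move these images to an offline node is to export them as tar files on a machine with internet access and import them on the target. The sketch below only prints the export commands so that they can be reviewed before running. `image_tar_name` and `print_export_commands` are hypothetical helper names, and the tar naming scheme is an arbitrary choice.

```shell
#!/usr/bin/env bash
# image_tar_name IMAGE (hypothetical helper)
# Turns an image reference into a flat file name by replacing '/' and ':'.
image_tar_name() {
  printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# print_export_commands (hypothetical helper)
# Reads image references on stdin, one per line, and prints one
# `ctr images export` command per image (dry run: nothing is executed).
print_export_commands() {
  local image
  while IFS= read -r image; do
    [ -n "$image" ] || continue
    printf 'ctr -n k8s.io images export %s %s\n' "$(image_tar_name "$image")" "$image"
  done
}

# Example, on a machine with internet access and containerd:
#   sudo kubeadm config images pull --kubernetes-version v1.33.2
#   kubeadm config images list --kubernetes-version v1.33.2 | print_export_commands
# Then copy the tar files to the offline node and run, for each file:
#   sudo ctr -n k8s.io images import <file>.tar
```

`kubeadm config images list` prints the exact image set for the requested version, which avoids maintaining the list by hand.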