Lab K208 - Kubernetes Cluster Administration

Defining Quotas

Create and switch to a new staging namespace.

kubectl config get-contexts

kubectl create namespace staging

kubectl config set-context --current --namespace=staging

kubectl config get-contexts
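
To print just the active namespace, you can also query the kubeconfig directly (a quick check, not part of the original lab steps):

kubectl config view --minify --output 'jsonpath={..namespace}'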

Define quota

file: staging-quota.yaml

apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging
  namespace: staging
spec:
  hard:
    requests.cpu: "0.5"
    requests.memory: 500Mi
    limits.cpu: "2"
    limits.memory: 2Gi
    count/deployments.apps: 1

kubectl get quota -n staging
kubectl apply -f staging-quota.yaml
kubectl get quota -n staging
kubectl describe quota
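
The same quota could also be created imperatively instead of from a file; a rough equivalent (a sketch, assuming kubectl create quota accepts all of these keys in --hard) would be:

kubectl create quota staging -n staging --hard=requests.cpu=0.5,requests.memory=500Mi,limits.cpu=2,limits.memory=2Gi,count/deployments.apps=1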

file: nginx-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: staging
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      name: nginx
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "500Mi"
            cpu: "500m"
          requests:
            memory: "200Mi"
            cpu: "200m"

kubectl apply -f nginx-deploy.yaml
kubectl describe quota -n staging
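
Note the arithmetic here: each replica requests 200m CPU and 200Mi of memory, so two replicas together consume 400m / 400Mi of the 500m / 500Mi request quota. That leaves too little headroom for a third replica, which is what the next step demonstrates.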


Let's now try to scale up the deployment and observe.

kubectl scale deploy nginx --replicas=4

kubectl get deploy

NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   2/4     2            2           2m55s

What happened?

  • Even though the deployment updated the number of desired replicas, only 2 are available.
  • The deployment asks its ReplicaSet to launch the new replicas. If you describe the ReplicaSet, it shows an error indicating the quota has been exceeded.

e.g.

# kubectl get rs
NAME               DESIRED   CURRENT   READY   AGE
nginx-56c479cd4f   4         2         2       5m4s

# kubectl describe rs nginx-56c479cd4f

  Warning  FailedCreate      34s (x5 over 73s)  replicaset-controller  (combined from similar events): Error creating: pods "nginx-56c479cd4f-kwf9h" is forbidden: exceeded quota: staging, requested: requests.cpu=200m,requests.memory=200Mi, used: requests.cpu=400m,requests.memory=400Mi, limited: requests.cpu=500m,requests.memory=500Mi
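
The failure also surfaces on the deployment itself; describing it may show a ReplicaFailure condition referencing the same quota error (exact output varies with the Kubernetes version):

kubectl describe deploy nginx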

You just configured a resource quota scoped to a namespace. Now switch back to the instavote namespace, or to the one you were using before starting this lab.

kubectl config set-context --current --namespace=instavote
kubectl config get-contexts
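
If you no longer need the staging experiment, you can optionally clean it up now; deleting the namespace removes the quota and the nginx deployment along with it:

kubectl delete namespace staging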

Node Maintenance

You could isolate a problematic node for further troubleshooting by cordoning it off. You could also drain it while preparing for maintenance.

Cordon a Node

kubectl get pods -o wide

NAME                      READY     STATUS    RESTARTS   AGE       IP             NODE
db-66496667c9-qggzd       1/1       Running   0          5h        10.233.74.74   node4
redis-5bf748dbcf-ckn65    1/1       Running   0          42m       10.233.71.26   node3
redis-5bf748dbcf-vxppx    1/1       Running   0          1h        10.233.74.79   node4
result-5c7569bcb7-4fptr   1/1       Running   0          5h        10.233.71.18   node3
result-5c7569bcb7-s4rdx   1/1       Running   0          5h        10.233.74.75   node4
vote-56bf599b9c-22lpw     1/1       Running   0          1h        10.233.74.80   node4
vote-56bf599b9c-4l6bc     1/1       Running   0          50m       10.233.74.83   node4
vote-56bf599b9c-bqsrq     1/1       Running   0          50m       10.233.74.82   node4
vote-56bf599b9c-xw7zc     1/1       Running   0          50m       10.233.74.81   node4
worker-6cc8dbd4f8-6bkfg   1/1       Running   0          39m       10.233.75.15   node2

Let's cordon one of the nodes and observe.

kubectl cordon node4
node/node4 cordoned

Observe the changes


$ kubectl get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP             NODE
db-66496667c9-qggzd       1/1       Running   0          5h        10.233.74.74   node4
redis-5bf748dbcf-ckn65    1/1       Running   0          43m       10.233.71.26   node3
redis-5bf748dbcf-vxppx    1/1       Running   0          1h        10.233.74.79   node4
result-5c7569bcb7-4fptr   1/1       Running   0          5h        10.233.71.18   node3
result-5c7569bcb7-s4rdx   1/1       Running   0          5h        10.233.74.75   node4
vote-56bf599b9c-22lpw     1/1       Running   0          1h        10.233.74.80   node4
vote-56bf599b9c-4l6bc     1/1       Running   0          51m       10.233.74.83   node4
vote-56bf599b9c-bqsrq     1/1       Running   0          51m       10.233.74.82   node4
vote-56bf599b9c-xw7zc     1/1       Running   0          51m       10.233.74.81   node4
worker-6cc8dbd4f8-6bkfg   1/1       Running   0          40m       10.233.75.15   node2



$ kubectl get nodes -o wide


NAME      STATUS                     ROLES         AGE       VERSION   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
node1     Ready                      master,node   1d        v1.10.4   <none>        Ubuntu 16.04.4 LTS   4.4.0-130-generic   docker://17.3.2
node2     Ready                      master,node   1d        v1.10.4   <none>        Ubuntu 16.04.4 LTS   4.4.0-124-generic   docker://17.3.2
node3     Ready                      node          1d        v1.10.4   <none>        Ubuntu 16.04.4 LTS   4.4.0-130-generic   docker://17.3.2
node4     Ready,SchedulingDisabled   node          1d        v1.10.4   <none>        Ubuntu 16.04.4 LTS   4.4.0-124-generic   docker://17.3.2
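
Cordoning simply sets the node's spec.unschedulable field; you can confirm it directly (a quick check, not part of the original lab steps):

kubectl get node node4 -o jsonpath='{.spec.unschedulable}'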


Now launch a new deployment and scale it.

kubectl create deployment cordontest --image=busybox
kubectl scale deploy cordontest --replicas=5

kubectl get pods -o wide

What happened?

  • New pods created by the deployment above do not get scheduled on the node that has been cordoned off, as the check below confirms.
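
To verify, list the pods placed on the cordoned node. The pods that were already there stay put (cordoning does not evict anything), but none of the new cordontest pods should appear. The field selector below assumes your cluster supports filtering pods by spec.nodeName:

kubectl get pods -o wide --field-selector spec.nodeName=node4

Note that the busybox pods may not remain in the Running state, since the image's default command exits immediately; for this exercise only the NODE column matters.
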
$ kubectl uncordon node4
node/node4 uncordoned


$ kubectl get nodes -o wide
NAME      STATUS    ROLES         AGE       VERSION   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
node1     Ready     master,node   1d        v1.10.4   <none>        Ubuntu 16.04.4 LTS   4.4.0-130-generic   docker://17.3.2
node2     Ready     master,node   1d        v1.10.4   <none>        Ubuntu 16.04.4 LTS   4.4.0-124-generic   docker://17.3.2
node3     Ready     node          1d        v1.10.4   <none>        Ubuntu 16.04.4 LTS   4.4.0-130-generic   docker://17.3.2
node4     Ready     node          1d        v1.10.4   <none>        Ubuntu 16.04.4 LTS   4.4.0-124-generic   docker://17.3.2

Delete the test deployment.

kubectl delete deploy cordontest

Drain a Node

Draining a node will not only mark it unschedulable but will also evict the pods already running on it. Use it with care.

$ kubectl drain node3
node/node3 cordoned
error: unable to drain node "node3", aborting command...

There are pending nodes to be drained:
 node3
error: pods with local storage (use --delete-local-data to override): kubernetes-dashboard-55fdfd74b4-jdgch; DaemonSet-managed pods (use --ignore-daemonsets to ignore): calico-node-4f8xc

Drain with options

kubectl drain node3 --delete-local-data --ignore-daemonsets

Observe the effect,

kubectl get pods -o wide
kubectl get nodes -o wide
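
After the drain, no pods from the current namespace should remain on node3; the workloads managed by deployments are recreated on the remaining schedulable nodes, and node3 reports Ready,SchedulingDisabled until it is uncordoned. You can confirm with the same field selector as before (assuming it is supported by your cluster):

kubectl get pods -o wide --field-selector spec.nodeName=node3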

To add the node back to the pool of schedulable nodes,

kubectl uncordon node3
node/node3 uncordoned

Summary

In this lab, we learnt how to limit resources by defining a per-namespace quota, as well as how to prepare nodes for maintenance by cordoning and draining them.