Analysis and Experiments
Setup Metrics Server
If you try to pull monitoring information using the following commands
kubectl top pod
kubectl top node
you will not see the data yet; instead, you will get an error message similar to
[output]
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
Even though the error mentions heapster, it has since been replaced by metrics server, which is the default now.
Deploy metrics server with the following commands:
cd ~
git clone https://github.com/schoolofdevops/metrics-server.git
kubectl apply -k metrics-server/manifests/overlays/release
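It can take a minute or so before metrics server starts reporting data. One way to wait for it, assuming the deployment created by these manifests is named metrics-server:
# wait until the metrics server deployment has rolled out completely
kubectl -n kube-system rollout status deploy/metrics-server --timeout=120s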
Validate
kubectl get deploy,pods -n kube-system --selector='k8s-app=metrics-server'
You could validate again with
kubectl top pod
kubectl top node
where the expected output should be similar to
kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kind-control-plane 123m 6% 688Mi 17%
kind-worker 39m 1% 498Mi 12%
kind-worker2 31m 1% 422Mi 10%
If you see similar output, monitoring has now been set up.
Deploy Prometheus and Grafana
Set up repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Install Prometheus and Grafana as
helm upgrade --install prom -n monitoring \
prometheus-community/kube-prometheus-stack \
--create-namespace \
--set grafana.service.type=NodePort \
--set grafana.service.nodePort=30400 \
--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
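Before moving on, you may want to confirm that the monitoring stack is running and fetch the Grafana admin password. The following is a sketch; it assumes the release name prom used above, in which case the chart typically stores the credentials in a secret named prom-grafana:
# list the Prometheus, Grafana and operator pods created by the chart
kubectl get pods -n monitoring
# read the Grafana admin password (assumes the <release>-grafana secret naming convention)
kubectl get secret -n monitoring prom-grafana -o jsonpath='{.data.admin-password}' | base64 -d; echo
Grafana itself should then be reachable on nodePort 30400 of any cluster node, as configured above; log in as the admin user (admin by default) with that password.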
Redeploy Nginx Ingress Controller
Redeploy the nginx ingress controller with helm, this time exposing its metrics so that they can be scraped/collected by Prometheus.
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace \
--set controller.metrics.enabled=true \
--set controller.metrics.serviceMonitor.enabled=true \
--set controller.metrics.serviceMonitor.additionalLabels.release="prometheus" \
--set controller.hostPort.enabled=true \
--set controller.hostPort.ports.http=80 \
--set controller.hostPort.ports.https=443 \
--set controller.service.type=NodePort \
--set-string controller.nodeSelector."kubernetes\.io/os"=linux \
--set-string controller.nodeSelector.ingress-ready="true"
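To confirm the metrics endpoint is exposed and will be picked up by Prometheus, you can check for the metrics service and the ServiceMonitor the chart creates (a quick sketch; exact resource names may vary slightly with chart version):
# look for the controller metrics service and ServiceMonitor in the ingress-nginx namespace
kubectl get svc,servicemonitor -n ingress-nginx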
Setup Grafana Dashboard for Nginx Ingress Controller
Now, log in to Grafana and import the custom dashboard for Nginx Ingress as follows:
- Left menu (hover over +) -> Dashboard
- Click "Import"
- Paste the JSON copied from https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/grafana/dashboards/nginx.json
- Click Import JSON
- Select the Prometheus data source
- Click "Import"
The dashboard may show very little data initially. However, if you see some metrics coming in, your setup with Nginx Ingress and Prometheus integration is working! You may pat yourself on the back at this time :)
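If the dashboard stays empty, you can query Prometheus directly to see whether the ingress metrics are being scraped. A minimal sketch, assuming the Prometheus service name prom-kube-prometheus-stack-prometheus (the same name used by the latency analysis later in this lab):
# forward the Prometheus API to your machine
kubectl port-forward -n monitoring svc/prom-kube-prometheus-stack-prometheus 9090:9090
# then, from another terminal, check that the ingress controller request counter exists
curl -sG http://localhost:9090/api/v1/query --data-urlencode 'query=nginx_ingress_controller_requests'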
Updated Rollout Configuration with Experiment and Analysis
File: prod/rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: vote
spec:
  replicas: 5
  strategy:
    blueGreen: null
    canary:
      canaryService: vote-preview
      stableService: vote
      steps:
        - setCanaryScale:
            replicas: 2
        - experiment:
            duration: 3m
            templates:
              - name: canary
                specRef: canary
                service:
                  name: experiment
            analyses:
              - name: fitness-test
                templateName: canary-fitness-test
        - setWeight: 20
        - pause:
            duration: 10s
        - setWeight: 40
        - pause:
            duration: 10s
        - setWeight: 60
        - analysis:
            templates:
              - templateName: loadtest
              - templateName: latency
        - setWeight: 80
        - pause:
            duration: 10s
        - setWeight: 100
      trafficRouting:
        nginx:
          stableIngress: vote
          additionalIngressAnnotations:
            canary-by-header: X-Canary
            canary-by-header-value: siege
Explanation
Rollout Configuration:
- The rollout strategy includes canary steps with set weights and pauses.
- The experiment step has a specified duration (e.g., 3 minutes), runs an experimental replicaset, and launches a fitness test to validate whether the new version looks okay.
- After 60% of the traffic has been shifted to the canary, a load test is launched along with an analysis from Prometheus to check whether the new version will perform well under load.
Analysis Templates:
- Define the templates for running the various tests and analyses.
- The loadtest container runs the load testing script against the canary service (vote-preview).
- The fitness-test job runs a test to validate whether the new version is fit for deployment.
- The latency analysis fetches latency metrics from Prometheus and checks whether the application responds within an acceptable time frame even under load.
How it Works
- At each setWeight step, traffic is gradually shifted to the canary version.
- The analysis step includes both the load test and the metric analysis.
- The experiment runs for 3 minutes, during which the fitness test is conducted.
- Simultaneously with the load test, the analysis template checks Prometheus metrics to ensure the canary is performing correctly.
- If the analysis detects errors beyond the acceptable threshold, the rollout will trigger a rollback.
- If the canary passes the load test and analysis, the rollout proceeds to the next step.
By configuring the experiment and analysis to run in parallel, you can ensure comprehensive testing and validation of the canary version, enabling automatic rollback if any issues are detected.
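Since the trafficRouting section of the rollout routes requests with the X-Canary: siege header to the canary, you can also steer a request onto the canary yourself while a rollout is in progress. A rough sketch, assuming vote.example.com resolves (for example via /etc/hosts) to the ingress node's IP:
# request served via the stable path
curl -s http://vote.example.com
# request forced onto the canary via the header configured in the rollout
curl -s -H "X-Canary: siege" http://vote.example.com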
Template for Load Testing
File: prod/loadtest-analysistemplate.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: loadtest
spec:
  metrics:
    - name: loadtest-vote
      provider:
        job:
          spec:
            template:
              spec:
                containers:
                  - name: siege
                    image: schoolofdevops/loadtest:v1
                    command:
                      - siege
                      - "--concurrent=2"
                      - "--benchmark"
                      - "--time=5m"
                      - "--header='X-Canary: siege'"
                      - "http://vote.example.com"
                restartPolicy: Never
                hostAliases:
                  - ip: "xx.xx.xx.xx"
                    hostnames:
                      - "vote.example.com"
            backoffLimit: 4
where,
* replace xx.xx.xx.xx with the internal IP address of a worker node, which you can find using
kubectl get nodes -o wide
[sample output]
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kind-control-plane Ready control-plane 2d23h v1.30.0 172.18.0.2 <none> Debian GNU/Linux 12 (bookworm) 6.8.0-31-generic containerd://1.7.15
kind-worker Ready <none> 2d23h v1.30.0 172.18.0.4 <none> Debian GNU/Linux 12 (bookworm) 6.8.0-31-generic containerd://1.7.15
kind-worker2 Ready <none> 2d23h v1.30.0 172.18.0.3 <none> Debian GNU/Linux 12 (bookworm) 6.8.0-31-generic containerd://1.7.15
From this output, you are going to use 172.18.0.4 in the configuration above.
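Rather than copying the address by hand, you could also pull it out with jsonpath; the node name kind-worker below matches the sample output above:
# print only the InternalIP address of the kind-worker node
kubectl get node kind-worker -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'; echo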
AnalysisTemplate for Prometheus Metrics
File: prod/latency-analysistemplate.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency
spec:
  metrics:
    - name: nginx-latency-ms
      initialDelay: 1m
      interval: 1m
      failureLimit: 2
      count: 4
      successCondition: result < 50.0
      failureCondition: result >= 50.0
      provider:
        prometheus:
          address: http://prom-kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
          query: |
            scalar(
              1000 * histogram_quantile(0.99,
                sum(
                  rate(
                    nginx_ingress_controller_request_duration_seconds_bucket{ingress="vote", exported_namespace="prod"}[1m]
                  )
                ) by (le)
              )
            )
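To sanity-check this query before the rollout evaluates it, you can run it against the Prometheus API yourself. A sketch, reusing the port-forward from the Grafana section; it only returns a meaningful value once the ingress has served some traffic for the vote ingress in the prod namespace:
# issue the same latency query the AnalysisTemplate will run
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=scalar(1000 * histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress="vote", exported_namespace="prod"}[1m])) by (le)))'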
Fitness Test for Canary
File: prod/fitness-analysistemplate.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: canary-fitness-test
spec:
  metrics:
    - name: canary-fitness
      interval: 30s
      count: 3
      successCondition: result == "true"
      failureLimit: 1
      provider:
        job:
          spec:
            template:
              spec:
                containers:
                  - name: fitness-test
                    image: curlimages/curl
                    command: ["/bin/sh", "-c"]
                    args:
                      - |
                        FITNESS_RESULT="false"
                        CANARY_SERVICE_URL="http://vote-preview"

                        # Perform the fitness test
                        RESPONSE=$(curl -s $CANARY_SERVICE_URL)

                        # Check if the response contains the expected string
                        if [[ "$RESPONSE" == *"Processed by container ID"* ]]; then
                          FITNESS_RESULT="true"
                        fi

                        # Return the fitness test result
                        echo $FITNESS_RESULT
                restartPolicy: Never
            backoffLimit: 1
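You can run the same check by hand from inside the cluster to make sure the expected string really appears in the application's response. A sketch, assuming the vote-preview service lives in the prod namespace:
# count how many times the expected marker appears in the canary's response
kubectl -n prod run fitness-check --rm -it --restart=Never --image=curlimages/curl --command -- \
  sh -c 'curl -s http://vote-preview | grep -c "Processed by container ID"'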
Update Kustomization for Prod
File: prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
  - ingress.yaml
  - fitness-analysistemplate.yaml
  - latency-analysistemplate.yaml
  - loadtest-analysistemplate.yaml
Apply
kustomize build prod
kubectl apply -k prod
Watch the rollout using
kubectl argo rollouts get rollout vote
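While the rollout progresses, the usual Argo Rollouts plugin commands apply as well; for example, you can watch it continuously, promote it past a paused step, or abort it to return traffic to the stable version:
# follow the rollout status continuously
kubectl argo rollouts get rollout vote --watch
# promote the rollout past the current step
kubectl argo rollouts promote vote
# abort the rollout and return traffic to the stable version
kubectl argo rollouts abort vote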