Analysis and Experiments

In this lab, we aim to explore advanced deployment strategies using ArgoCD and Prometheus. We will set up and configure ArgoCD to visualize rollouts, deploy a metrics server for monitoring, and integrate Prometheus and Grafana for sophisticated monitoring and analysis. Additionally, we will implement canary deployments with experiments and analysis to ensure the stability and performance of our applications before fully rolling them out to production. This comprehensive approach will help us achieve reliable and efficient continuous delivery in our Kubernetes environment.

Stage 0 - Prepwork

As part of the prep work, we will minimize the footprint of the application set by deploying only the vote app in the prod namespace, as well as install the Argo Rollouts extension so that we can visualize rollouts right from ArgoCD.

Minimize ApplicationSet Footprint

We will minimize the footprint of the application set by deploying only the vote app in the prod namespace. This avoids overloading our lab environment, which was created with KIND using minimal resources.

Delete the previous appset and clean up the application deployments using,

kubectl delete appset instavote -n argocd 

Reconfigure the appset to launch only the vote app in prod. You could use this configuration as a reference:

wget -c https://gist.githubusercontent.com/initcron/a5928b1fc4fad138045d88135faf0e44/raw/0e4bc43e3ee4c79c5c7075a03f34c01441fa6376/instavote-appset-vote-prod.yaml

kubectl apply -f instavote-appset-vote-prod.yaml

validate

kubectl get appset,app -n argocd    

Setup ArgoCD Extension for Rollout Visualization

To apply the ArgoCD server patch to install the Argo Rollouts extension, follow these steps:

Step 1: Save the Patch to a File

Create a new YAML file, e.g., argocd-rollouts-extension-patch.yaml, and add the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
  namespace: argocd
spec:
  template:
    spec:
      initContainers:
        - name: rollout-extension
          image: quay.io/argoprojlabs/argocd-extension-installer:v0.0.5
          env:
            - name: EXTENSION_URL
              value: 'https://github.com/argoproj-labs/rollout-extension/releases/download/v0.3.5/extension.tar'
          volumeMounts:
            - mountPath: /tmp/extensions/
              name: extensions
          securityContext:
            allowPrivilegeEscalation: false
      containers:
        - name: argocd-server
          volumeMounts:
            - mountPath: /tmp/extensions/
              name: extensions
      volumes:
        - name: extensions
          emptyDir: {}

Step 2: Apply the Patch

The rollout extension lets you view the progress of canary/blue-green deployments right from the ArgoCD UI.

Run the following command to apply the patch to the argocd-server deployment:

kubectl patch deployment argocd-server -n argocd --patch-file argocd-rollouts-extension-patch.yaml

Step 3: Restart the ArgoCD Server

After applying the patch, restart the ArgoCD server to ensure changes take effect:

kubectl rollout restart deployment argocd-server -n argocd

Step 4: Verify the Deployment

Check if the argocd-server deployment includes the new initContainers section:

kubectl get deployment argocd-server -n argocd -o yaml | grep -A 10 "initContainers:"

To validate, try accessing your ArgoCD UI, navigate to an application, and you should see the "Rollout" tab under the app details. 🚀
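
If the Rollout tab does not appear, check the logs of the extension installer init container (named rollout-extension in the patch above):

kubectl logs -n argocd deployment/argocd-server -c rollout-extension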


Stage 1 - Setup Monitoring with Ingress and Prometheus

Setup Metrics Server

It is important to have the metrics server running to get essential pod and node monitoring data.

If you try to pull monitoring information using the following commands

kubectl top pod

kubectl top node

it does not show the data; instead, it gives you an error message similar to

[output]

Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

Even though the error mentions heapster, it has been replaced by the metrics server by default now.

Deploy the metrics server with the following commands,

cd ~
git clone https://github.com/schoolofdevops/metrics-server.git
kubectl apply -k metrics-server/manifests/overlays/release

Validate

kubectl get deploy,pods -n kube-system --selector='k8s-app=metrics-server'

You could validate again with

kubectl top pod

kubectl top node

where expected output should be similar to,

kubectl top node

NAME                 CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
kind-control-plane   123m         6%     688Mi           17%
kind-worker          39m          1%     498Mi           12%
kind-worker2         31m          1%     422Mi           10%

If you see similar output, monitoring has now been set up.

Deploy Prometheus and Grafana

To further setup sophisticated monitoring, we will deploy Prometheus and Grafana.

Begin by setting up the helm repository

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Install Prometheus and Grafana as

helm upgrade --install prom -n monitoring \
  prometheus-community/kube-prometheus-stack \
  --create-namespace \
  --set grafana.service.type=NodePort \
  --set grafana.service.nodePort=30500 \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

Access Grafana on node port 30500 and log in using

User : admin
Pass: prom-operator
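
If the node port is not reachable from your machine (e.g., a KIND cluster without a port mapping for 30500), a port-forward is a handy fallback; the service name below assumes the default naming derived from the prom release:

kubectl port-forward -n monitoring svc/prom-grafana 3000:80

Then open http://localhost:3000 in your browser.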

Redeploy Nginx Ingress Controller

First, uninstall the nginx ingress controller as,

helm un ingress-nginx -n ingress-nginx

Redeploy the nginx ingress controller with helm, this time exposing its metrics, which can then be scraped by Prometheus. This will automatically integrate nginx with Prometheus.

helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.metrics.enabled=true \
  --set controller.metrics.serviceMonitor.enabled=true \
  --set controller.metrics.serviceMonitor.additionalLabels.release="prometheus" \
  --set controller.hostPort.enabled=true \
  --set controller.hostPort.ports.http=80 \
  --set controller.hostPort.ports.https=443 \
  --set controller.service.type=NodePort \
  --set-string controller.nodeSelector."kubernetes\.io/os"=linux \
  --set-string controller.nodeSelector.ingress-ready="true"
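
To verify the integration, check that the controller pods are up and that a ServiceMonitor object has been created for Prometheus to discover (the ServiceMonitor CRD is installed by the kube-prometheus-stack set up earlier):

kubectl get pods,svc -n ingress-nginx
kubectl get servicemonitor -n ingress-nginx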

Setup Grafana Dashboard for Nginx Ingress Controller

Now, log in to Grafana and import the custom dashboard for Nginx Ingress.

It may look similar to this, with possibly less data initially.

However, if you see some metrics coming in, your setup with Nginx Ingress and Prometheus integration is working! You may pat yourself on the back at this time :)

Add Local DNS to access the Vote App via Browser

Note: You have to make this change on your local laptop/desktop

As part of app deployment, you have created the ingress rules based on hostnames e.g. vote.example.com. You could validate this by running the following command

kubectl get ingress -n instavote-prod

In order for you to be able to access this app from the browser using the hostname, there has to be a DNS entry pointing to your nodes, which run nginx (exposed via nodePort).


  vote.example.com     -------+                        +----- vote:80
                              |     +-------------+    |  (namespace= prod)
                              |     |   ingress   |    |
                              +===> |   node:80   | ===+
                              |     +-------------+    |
                              |                        |  (namespace= xyz)
   xyz.example.com     -------+                        +----- xyz:80

To achieve this you need to either,

  • Create a DNS entry, provided you own the domain and have access to the DNS management console.
  • Create a local hosts file entry. On Unix systems it is the /etc/hosts file. On Windows it is at C:\Windows\System32\drivers\etc\hosts. You need admin access to edit this file.

For example, on Linux or macOS, you could edit it as,

sudo vim /etc/hosts

And add an entry such as,

xxx.xxx.xxx.xxx vote.example.com 

where,

  • xxx.xxx.xxx.xxx is the actual IP address of one of the nodes running nginx.

e.g. with a KIND based environment,

127.0.0.1 vote.example.com 

And then access the app urls using http://vote.example.com

Name Based Routing
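
Since the ingress routes purely on the Host header, you could also exercise the name based routing without any DNS or hosts entry, by setting the header explicitly against the node where nginx listens (127.0.0.1 in this KIND setup):

# Known hostname - routed to the vote service
curl -I -H "Host: vote.example.com" http://127.0.0.1/

# Unknown hostname - answered by the default backend (typically a 404)
curl -I -H "Host: xyz.example.com" http://127.0.0.1/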


Stage 2 - Full Fledged Canary Rollout

Configure Canary Rollout with Experiment and Analysis

In this section, we will configure the canary rollout with experiments and analysis using ArgoCD. We will set up a canary deployment strategy, define experiments for testing, and analyze the results to ensure the stability and performance of our applications before fully rolling them out to production.

Experiments and analysis are crucial components of the canary deployment strategy. Experiments involve running a temporary version of the application alongside the stable version to observe its behavior under real-world conditions. Analysis, on the other hand, involves collecting and evaluating metrics from these experiments to determine if the new version meets the desired performance and stability criteria. The purpose of these steps is to ensure that any issues are detected early, allowing for automatic rollbacks if necessary, thus minimizing the risk of deploying faulty updates to production.

Let's first update the canary rollout configuration to include experiments and analysis.

Since we are using helm charts to deploy our application, we will need to update the relevant files in the instavote-gitops repo.

File: instavote-gitops/charts/vote/env/prod.yaml

canary:
  steps:
    - setCanaryScale:
        replicas: 2
    - experiment:
        duration: 3m
        templates:
        - name: fitness
          specRef: canary
          service:
            name: experiment
        analyses:
          - name: fitness-test
            templateName: canary-fitness-test
    - setWeight: 20
    - pause:
        duration: 10s
    - setWeight: 40
    - pause:
        duration: 10s
    - setWeight: 60
    - analysis:
        templates:
        - templateName: loadtest
        - templateName: latency
    - setWeight: 80
    - pause:
        duration: 10s
    - setWeight: 100

Explanation

  • Rollout Configuration:

      • The rollout strategy includes canary steps with set weights and pauses.
      • The experiment step runs for a specified duration (3 minutes), launching an experimental ReplicaSet along with a fitness test to validate whether the new version looks okay.
      • After 60% of the traffic is shifted to the canary, a load test is launched, along with an analysis from Prometheus, to check whether the new version will perform well under load.

  • Analysis Templates:

      • Define the templates for running the various tests and analyses.
      • The loadtest container runs the load testing script against the canary service (vote-preview).
      • The fitness-test job runs a test to validate whether the new version is fit for deployment.
      • The latency analysis fetches latency metrics from Prometheus and checks whether the application responds within an acceptable time frame even under load.


How it Works

  • At each setWeight step, traffic is gradually shifted to the canary version.
  • The analysis step includes both the load test and the metric analysis.
  • The experiment runs for 3 minutes, during which the fitness test is conducted.
  • Simultaneously with the load test, the analysis template checks Prometheus metrics to ensure the canary is performing correctly.
  • If the analysis detects errors beyond the acceptable threshold, the rollout will trigger a rollback.
  • If the canary passes the load test and analysis, the rollout proceeds to the next step.

By configuring the experiment and analysis to run in parallel, you can ensure comprehensive testing and validation of the canary version, enabling automatic rollback if any issues are detected.
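
While a canary is paused or an analysis is in flight, you can also intervene manually using the Argo Rollouts kubectl plugin; for example, against the vote-prod rollout used in this lab:

# Promote the rollout past its current pause step (add --full to skip all remaining steps)
kubectl argo rollouts promote vote-prod -n instavote-prod

# Abort the update and shift traffic back to the stable version
kubectl argo rollouts abort vote-prod -n instavote-prod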

Let's now create the experiment and analysis templates.

1. Fitness Test Experiment for Canary

File: instavote-gitops/charts/vote/templates/fitness-analysistemplate.yaml

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: canary-fitness-test
spec:
  metrics:
  - name: canary-fitness
    interval: 30s
    count: 3
    successCondition: result == "true"
    failureLimit: 1
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: fitness-test
                image: curlimages/curl
                command: ["/bin/sh", "-c"]
                args:
                - |
                  FITNESS_RESULT="false"
                  CANARY_SERVICE_URL="http://vote-canary"

                  # Perform the fitness test
                  RESPONSE=$(curl -s $CANARY_SERVICE_URL)

                  # Check if the response contains the expected string
                  # (grep is used instead of [[ ]], since /bin/sh in this image is busybox ash)
                  if echo "$RESPONSE" | grep -q "Processed by container ID"; then
                    FITNESS_RESULT="true"
                  fi

                  # Report the result; the job provider judges success by exit code,
                  # so exit non-zero when the check fails
                  echo $FITNESS_RESULT
                  [ "$FITNESS_RESULT" = "true" ] || exit 1
              restartPolicy: Never
          backoffLimit: 1

where you replace CANARY_SERVICE_URL="http://vote-canary" with the actual URL of the canary service.
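
To sanity-check the probe logic before the rollout runs it for you, you could fire the same check from a throwaway pod; this assumes the canary service is reachable as vote-canary inside the instavote-prod namespace:

kubectl -n instavote-prod run fitness-check --rm -it --restart=Never \
  --image=curlimages/curl --command -- sh -c \
  'curl -s http://vote-canary | grep -q "Processed by container ID" && echo true || echo false'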

2. Analysis Template for Load Testing

File: instavote-gitops/charts/vote/templates/loadtest-analysistemplate.yaml

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: loadtest
spec:
  metrics:
  - name: loadtest-vote
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: siege
                image: schoolofdevops/loadtest:v1
                command:
                  - siege
                  - "--concurrent=2"
                  - "--benchmark"
                  - "--time=5m"
                  - "--header='X-Canary: siege'"
                  - "http://vote.example.com"
              restartPolicy: Never
              hostAliases:
              - ip: "xx.xx.xx.xx"
                hostnames:
                - "vote.example.com"
          backoffLimit: 4

where you replace xx.xx.xx.xx with the internal IP address of the kind-worker node which runs nginx. Find it using

kubectl get nodes -o wide

[sample output]

NAME                 STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION     CONTAINER-RUNTIME
kind-control-plane   Ready    control-plane   2d23h   v1.30.0   172.18.0.2    <none>        Debian GNU/Linux 12 (bookworm)   6.8.0-31-generic   containerd://1.7.15
kind-worker          Ready    <none>          2d23h   v1.30.0   172.18.0.4    <none>        Debian GNU/Linux 12 (bookworm)   6.8.0-31-generic   containerd://1.7.15
kind-worker2         Ready    <none>          2d23h   v1.30.0   172.18.0.3    <none>        Debian GNU/Linux 12 (bookworm)   6.8.0-31-generic   containerd://1.7.15

From this output, you are going to use 172.18.0.4 in the configuration above.
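
You could also extract just the internal IP with a jsonpath query, which is handy for scripting:

kubectl get node kind-worker \
  -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'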

3. AnalysisTemplate with Latency Metrics from Prometheus

File: instavote-gitops/charts/vote/templates/latency-analysistemplate.yaml

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency
spec:
  metrics:
  - name: nginx-latency-ms
    initialDelay: 1m
    interval: 1m
    failureLimit: 2
    count: 4
    successCondition: result < 50.0
    failureCondition: result >= 50.0
    provider:
      prometheus:
        address: http://prom-kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
        query: |
          scalar(
            1000 * histogram_quantile(0.99,
              sum(
                rate(
                  nginx_ingress_controller_request_duration_seconds_bucket{ingress="vote", exported_namespace="instavote-prod"}[1m]
                )
              ) by (le)
            )
          )
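
Before committing, you could verify that this query actually returns data, by port-forwarding to Prometheus and hitting its HTTP API; the service name matches the address used in the template above:

kubectl -n monitoring port-forward \
  svc/prom-kube-prometheus-stack-prometheus 9090:9090

# in another terminal, run the same query via the API
curl -s http://localhost:9090/api/v1/query --data-urlencode \
  'query=histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress="vote", exported_namespace="instavote-prod"}[1m])) by (le))'
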
Commit and push the Helm Chart

validate, from within the charts/vote directory,

helm template vote --values env/prod.yaml .

commit and push to git

git status
git add *
git status
git commit -am "added fitness test and prometheus analysis"
git push origin main

If you are mapping prod to release branch, raise a PR and merge the changes to release branch.
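
If you prefer the CLI over a PR, the merge could look like this (assuming a release branch already exists and prod tracks it):

git checkout release
git merge main
git push origin release
git checkout main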

watch the rollout using

kubectl argo rollouts list rollouts -A
watch kubectl argo rollouts get rollout vote-prod -n instavote-prod

More commands to watch the progressive canary:

kubectl get ar -A
kubectl get exp -A
kubectl get pods,jobs -n instavote-prod

You could also watch the rollout using the web UI.

Stage 3 - Cleaning Up

Once you are done with the lab, you can delete the components created during this lab using the following commands.

# Delete Appsets along with Applications
kubectl delete appset instavote -n argocd 
helm uninstall -n monitoring prom  

# From the path where you cloned the repo
kubectl delete -k metrics-server/manifests/overlays/release

# Uninstall Nginx Ingress Controller
helm uninstall -n ingress-nginx ingress-nginx

# Delete Argo Image Updater 
kubectl delete -n argocd -f https://raw.githubusercontent.com/argoproj-labs/argocd-image-updater/stable/manifests/install.yaml

Summary

In this lab, we explored advanced deployment strategies using ArgoCD and Prometheus. We set up and configured ArgoCD to visualize rollouts, deployed a metrics server for monitoring, and integrated Prometheus and Grafana for sophisticated monitoring and analysis. We implemented canary deployments with experiments and analysis to ensure the stability and performance of our applications before fully rolling them out to production. By leveraging these tools and techniques, we achieved reliable and efficient continuous delivery in our Kubernetes environment, ensuring that any issues are detected early and allowing for automatic rollbacks if necessary. This comprehensive approach helps minimize the risk of deploying faulty updates to production and ensures the overall stability and performance of our applications.