Writing Custom Operators with Go
In this lab, we will create a custom operator using Go to manage a custom resource called StaticWebsite. The StaticWebsite custom resource will be used to create a static website using Nginx. It will create a Deployment, Service, and PersistentVolumeClaim (PVC) to host the static website. The operator will watch for changes in the StaticWebsite custom resource and update the Deployment, Service, and PVC accordingly.
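The watch-and-update behavior described above is usually called a reconcile loop: the operator compares the objects a StaticWebsite should own with what already exists and acts on the difference. The following is a minimal sketch of that idea in plain Go, not the lab's controller code. The function names and the child object names are hypothetical, and a real operator works with typed Kubernetes objects through controller-runtime rather than strings.

```go
package main

import (
	"fmt"
	"sort"
)

// desiredChildren lists the child objects one StaticWebsite should own.
// The "Kind/name" strings are illustrative placeholders only.
func desiredChildren(name string) []string {
	return []string{
		"Deployment/" + name,
		"PersistentVolumeClaim/" + name,
		"Service/" + name,
	}
}

// reconcile compares the desired children with the objects that already
// exist and returns, in sorted order, the ones that still need creating.
func reconcile(name string, existing map[string]bool) []string {
	var missing []string
	for _, child := range desiredChildren(name) {
		if !existing[child] {
			missing = append(missing, child)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Pretend the cluster already holds the Deployment but nothing else.
	cluster := map[string]bool{"Deployment/sample-website": true}
	for _, obj := range reconcile("sample-website", cluster) {
		fmt.Println("create", obj)
	}
}
```

Running the same comparison again after the missing objects exist returns nothing to do, which is the property that makes reconciliation safe to repeat.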
Part 1: Setting up the Environment
Sample code used in this lab is available at: https://github.com/advk8s/static-website-operator
Setup Sample Code
cd ~
git clone https://github.com/advk8s/static-website-operator.git operator-sample
export SAMPLE_HOME="$(pwd)/operator-sample"
Step 1: Install Go
Install make
sudo apt install -y make
Refer to the official documentation to install Go on your operating system.
# For Linux 
wget -c https://go.dev/dl/go1.24.2.linux-amd64.tar.gz
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.24.2.linux-amd64.tar.gz 
export PATH=$PATH:/usr/local/go/bin 
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
go version 
Step 2: Install Operator SDK
Refer to the official documentation to install Operator SDK on your operating system.
# For Linux 
# Download Operator SDK
export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
export OS=$(uname | awk '{print tolower($0)}')
export OPERATOR_SDK_DL_URL=https://github.com/operator-framework/operator-sdk/releases/download/v1.39.2
curl -LO ${OPERATOR_SDK_DL_URL}/operator-sdk_${OS}_${ARCH}
# Install the binary
chmod +x operator-sdk_${OS}_${ARCH}
sudo mv operator-sdk_${OS}_${ARCH} /usr/local/bin/operator-sdk
# Verify installation
operator-sdk version
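The ARCH and OS one-liners above can be hard to read at a glance. As a rough illustration only, the sketch below expresses the same mapping in Go: x86_64 becomes amd64, aarch64 becomes arm64, anything else passes through unchanged, and the kernel name is lowercased. The function names normalizeArch and binaryName are hypothetical helpers, not part of Operator SDK.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeArch maps `uname -m` output to the architecture suffix used in
// the release file name, mirroring the shell case statement.
func normalizeArch(machine string) string {
	switch machine {
	case "x86_64":
		return "amd64"
	case "aarch64":
		return "arm64"
	default:
		return machine
	}
}

// binaryName builds the download file name from `uname` and `uname -m` output.
func binaryName(kernel, machine string) string {
	return fmt.Sprintf("operator-sdk_%s_%s", strings.ToLower(kernel), normalizeArch(machine))
}

func main() {
	fmt.Println(binaryName("Linux", "x86_64"))
	fmt.Println(binaryName("Linux", "aarch64"))
}
```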
Part 2: Creating a Custom Operator with Go
Step 1: Create a New Operator Project
Now let's create the operator project structure with a local module path:
# Create a directory for your operator
mkdir -p $HOME/static-website-operator
export PROJECT_HOME="$HOME/static-website-operator"
cd $HOME/static-website-operator
# Initialize a new operator project with local module path
operator-sdk init --domain example.com --repo static-website-operator
# Create an API and controller
operator-sdk create api --group websites --version v1alpha1 --kind StaticWebsite --resource --controller
Step 2: Define the Custom Resource Definition
cp $SAMPLE_HOME/api/v1alpha1/staticwebsite_types.go $PROJECT_HOME/api/v1alpha1/staticwebsite_types.go
Step 3: Update the Controller
cp $SAMPLE_HOME/controllers/staticwebsite_controller.go $PROJECT_HOME/internal/controller/staticwebsite_controller.go
Step 4: Update the Main.go File
cp $SAMPLE_HOME/main.go $PROJECT_HOME/cmd/main.go
Step 5: Generate the CRD Manifests and Code
# Check the CRD manifests before generation
ls config/crd
# Run code generation
make generate
# Update the CRD manifests
make manifests
Examine the generated manifests:
ls config/crd/bases/ 
You should see a new CRD manifest:
websites.example.com_staticwebsites.yaml
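The manifest file name follows from the values passed to operator-sdk earlier: the --group and --domain flags join into the API group websites.example.com, and the file is named after that group plus the plural resource name. The sketch below shows the string composition only; the helper names are hypothetical and the plural form is passed in rather than derived.

```go
package main

import "fmt"

// apiGroup joins the --group and --domain values into the full API group.
func apiGroup(group, domain string) string {
	return group + "." + domain
}

// apiVersion is the value that goes into a custom resource's apiVersion field.
func apiVersion(group, domain, version string) string {
	return apiGroup(group, domain) + "/" + version
}

// crdFileName is the manifest name seen under config/crd/bases/.
func crdFileName(group, domain, plural string) string {
	return apiGroup(group, domain) + "_" + plural + ".yaml"
}

func main() {
	fmt.Println(apiVersion("websites", "example.com", "v1alpha1"))
	fmt.Println(crdFileName("websites", "example.com", "staticwebsites"))
}
```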
Step 6: Build and Load the Operator Image to Kind
For local testing with kind, we'll build the operator image and load it directly:
# Build the image locally
make docker-build IMG=static-website-operator:v0.1.0
# Validate the image
docker image ls 
# Load the image into kind
kind load docker-image static-website-operator:v0.1.0
Step 7: Deploy the Operator to Your Kind Cluster
Deploy the operator to your kind cluster:
# Install the CRDs into the cluster
make install
# Deploy the operator with the local image
make deploy IMG=static-website-operator:v0.1.0
You can verify that the operator is running:
# Check the operator namespace
kubectl get pods -n static-website-operator-system
Step 8: Create a Sample StaticWebsite Resource
Create a sample custom resource at config/samples/websites_v1alpha1_staticwebsite.yaml:
apiVersion: websites.example.com/v1alpha1
kind: StaticWebsite
metadata:
  name: sample-website
spec:
  image: nginx:alpine
  replicas: 2
  resources:
    limits:
      cpu: "200m"
      memory: "256Mi"
    requests:
      cpu: "100m"
      memory: "128Mi"
  storage:
    size: "1Gi"
    storageClassName: "standard"  # Use the default storage class in your cluster
    mountPath: "/usr/share/nginx/html"
Open a new terminal and start watching for the resources that the operator will create:
watch kubectl get staticwebsites,deploy,svc,pv,pvc
Create a sample StaticWebsite custom resource using the YAML file we created above:
# Apply the sample resource
kubectl apply -f config/samples/websites_v1alpha1_staticwebsite.yaml
Step 9: Verify the Deployment
Check if the resources were created properly:
# Check the StaticWebsite resource
kubectl get staticwebsites
# Check the created resources
kubectl get deployments
kubectl get services
kubectl get pvc
# Check the status of the pods
kubectl get pods
# Check the logs of the operator
kubectl logs -n static-website-operator-system deployment/static-website-operator-controller-manager -c manager
Part 3: Adding New Features to the Operator
Let's now add two new features to the operator:
- Add two new shortnames to the StaticWebsite CRD, i.e., sw and sws.
- Add a new field, NodePort, to the StaticWebsite CRD. When it is set, the Service is exposed as a NodePort service on that port.
Step 1: Update the CRD
Update the CRD to include the new shortnames:
File: api/v1alpha1/staticwebsite_types.go
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
// +kubebuilder:printcolumn:name="Replicas",type="integer",JSONPath=".spec.replicas"
// +kubebuilder:printcolumn:name="Available",type="integer",JSONPath=".status.availableReplicas"
// +kubebuilder:printcolumn:name="URL",type="string",JSONPath=".status.url"
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
// +kubebuilder:resource:shortName=sw;sws
The last line above (the shortName marker) is the one you need to add.
Also update the CRD to accept the new field:
// StaticWebsiteSpec defines the desired state of StaticWebsite
type StaticWebsiteSpec struct {
    // Image is the container image for the static website
    // +kubebuilder:validation:Required
    Image string `json:"image"`
    // Replicas is the number of replicas to deploy
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:default=1
    Replicas int32 `json:"replicas,omitempty"`
    // Resources defines the resource requirements for the container
    // +optional
    Resources ResourceRequirements `json:"resources,omitempty"`
    // Storage defines the storage configuration for the website content
    // +optional
    Storage *StorageSpec `json:"storage,omitempty"`
    // NodePort specifies the NodePort to expose the service on
    // +optional
    // +kubebuilder:validation:Minimum=30000
    // +kubebuilder:validation:Maximum=32767
    NodePort *int32 `json:"nodePort,omitempty"`
}
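To make the effect of the json tag and the Minimum/Maximum markers on NodePort concrete, here is a small stand-alone sketch. It is an illustration under assumptions: websiteSpec is a hypothetical, trimmed-down mirror of StaticWebsiteSpec, and validateNodePort re-implements the range check in Go purely for demonstration. In the real cluster the range is enforced by the API server from the schema in the generated CRD, not by operator code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// websiteSpec mirrors a subset of StaticWebsiteSpec for illustration only.
type websiteSpec struct {
	Image    string `json:"image"`
	Replicas int32  `json:"replicas,omitempty"`
	NodePort *int32 `json:"nodePort,omitempty"`
}

// validateNodePort mimics the Minimum/Maximum markers: a nil pointer means
// the optional field was omitted; otherwise the value must be 30000-32767.
func validateNodePort(p *int32) error {
	if p == nil {
		return nil
	}
	if *p < 30000 || *p > 32767 {
		return fmt.Errorf("nodePort %d is outside 30000-32767", *p)
	}
	return nil
}

// parseSpec decodes a JSON spec and applies the range check.
func parseSpec(raw string) (websiteSpec, error) {
	var s websiteSpec
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return s, err
	}
	return s, validateNodePort(s.NodePort)
}

func main() {
	for _, raw := range []string{
		`{"image":"nginx:alpine","replicas":2}`,
		`{"image":"nginx:alpine","nodePort":30400}`,
		`{"image":"nginx:alpine","nodePort":8080}`,
	} {
		spec, err := parseSpec(raw)
		fmt.Println("nodePort set:", spec.NodePort != nil, "error:", err)
	}
}
```

Using a pointer (*int32) rather than a plain int32 is what lets the controller tell "field omitted" apart from an explicit value, which is why the controller code checks website.Spec.NodePort != nil.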
Update the controller to handle the new field:
File: internal/controller/staticwebsite_controller.go
// serviceForWebsite returns a website Service object
func (r *StaticWebsiteReconciler) serviceForWebsite(website *websitesv1alpha1.StaticWebsite) *corev1.Service {
        labels := map[string]string{
                "app":        "staticwebsite",
                "controller": website.Name,
        }
        serviceType := corev1.ServiceTypeClusterIP
        servicePort := corev1.ServicePort{
            Port:       80,
            TargetPort: intstr.IntOrString{Type: intstr.Int, IntVal: 80},
            Name:       "http",
        }
        logger := log.FromContext(context.Background())
        logger.Info("Creating service for website",
          "Name", website.Name,
          "HasNodePort", website.Spec.NodePort != nil)
        // If NodePort is specified, set the service type to NodePort and assign the NodePort
        if website.Spec.NodePort != nil {
            serviceType = corev1.ServiceTypeNodePort
            servicePort.NodePort = *website.Spec.NodePort
        }
        svc := &corev1.Service{
            ObjectMeta: metav1.ObjectMeta{
                Name:      website.Name,
                Namespace: website.Namespace,
            },
            Spec: corev1.ServiceSpec{
                Type:     serviceType,
                Selector: labels,
                Ports:    []corev1.ServicePort{servicePort},
            },
        }
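        // Set the owner reference for garbage collection and log any failure
        if err := controllerutil.SetControllerReference(website, svc, r.Scheme); err != nil {
            logger.Error(err, "Failed to set owner reference on Service")
        }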
        return svc
}
Also in the same controller, update the status URL in the Reconcile function to include NodePort information when applicable:
        // Update the status
        if found.Status.ReadyReplicas > 0 && website.Status.Phase != "Running" {
                website.Status.Phase = "Running"
                website.Status.AvailableReplicas = found.Status.ReadyReplicas
                // Update URL based on whether NodePort is used
                if website.Spec.NodePort != nil {
                    website.Status.URL = fmt.Sprintf("http://<node-ip>:%d", *website.Spec.NodePort)
                } else {
                    website.Status.URL = fmt.Sprintf("http://%s.%s.svc.cluster.local", website.Name, website.Namespace)
                }
                if err := r.Status().Update(ctx, website); err != nil {
                        logger.Error(err, "Failed to update StaticWebsite status")
                        return ctrl.Result{}, err
                }
        } else if found.Status.ReadyReplicas != website.Status.AvailableReplicas {
          // ...
          // rest of the code
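The URL branch in the snippet above can be isolated into a small pure function, which makes it easy to check without a cluster. This is a sketch only: statusURL is a hypothetical helper name that does not appear in the sample repository, and the format strings are copied from the snippet above.

```go
package main

import "fmt"

// statusURL returns the URL reported in the StaticWebsite status.
// With a NodePort it points at a node address placeholder; otherwise it
// uses the in-cluster DNS name of the Service.
func statusURL(name, namespace string, nodePort *int32) string {
	if nodePort != nil {
		return fmt.Sprintf("http://<node-ip>:%d", *nodePort)
	}
	return fmt.Sprintf("http://%s.%s.svc.cluster.local", name, namespace)
}

func main() {
	port := int32(30400)
	fmt.Println(statusURL("sample-website", "default", nil))
	fmt.Println(statusURL("sample-website", "default", &port))
}
```

Note that <node-ip> is a literal placeholder in the status field; the operator does not resolve an actual node address.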
Rebuild the controller image and redeploy the operator:
make generate
make manifests
make docker-build IMG=static-website-operator:v0.2.0
kind load docker-image static-website-operator:v0.2.0
make install
make deploy IMG=static-website-operator:v0.2.0
Validate the shortnames:
kubectl api-resources
kubectl get sw
kubectl get sws
Update the sample resource to include the new nodePort field:
File: config/samples/websites_v1alpha1_staticwebsite.yaml
apiVersion: websites.example.com/v1alpha1
kind: StaticWebsite
metadata:
  name: sample-website
spec:
  image: nginx:alpine
  replicas: 2
  resources:
    limits:
      cpu: "200m"
      memory: "256Mi"
    requests:
      cpu: "100m"
      memory: "128Mi"
  storage:
    size: "1Gi"
    storageClassName: "standard"  # Use the default storage class in your cluster
    mountPath: "/usr/share/nginx/html"
  nodePort: 30400
Delete the previous instance and create a new instance of the static website:
kubectl delete -f config/samples/websites_v1alpha1_staticwebsite.yaml
kubectl apply -f config/samples/websites_v1alpha1_staticwebsite.yaml
Validate that the Service is now of type NodePort:
kubectl get sw,deploy,svc,pvc
Cleaning up
When you are done with the example, you can delete the operator and its resources:
make undeploy
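Validate that the resources are gone. If the CRD has been removed along with the operator, kubectl will report the sw resource type as unknown; in that case, check the remaining types with kubectl get deploy,svc,pvc.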
kubectl get sw,deploy,svc,pvc