
Creating Custom Kubernetes Operators for Ollama

Streamline Your AI Workloads: How to Build a Custom Kubernetes Operator for Ollama

As large language models (LLMs) become central to modern applications, developers and DevOps teams face a new challenge: how to effectively manage these complex workloads within Kubernetes. While Kubernetes excels at orchestrating stateless containers, deploying and managing stateful, resource-intensive applications like Ollama requires a more sophisticated approach. Manually creating Deployments, Services, and ConfigMaps for each model is tedious, error-prone, and doesn’t scale.

This is where the Kubernetes Operator pattern provides a powerful solution. By creating a custom operator, you can extend the Kubernetes API to understand, deploy, and manage Ollama instances as if they were native resources. This guide will walk you through the concepts and steps required to build your own Ollama Kubernetes Operator, transforming a complex manual process into a simple, declarative one.

Why Build a Custom Operator for Ollama?

An operator is essentially an automated, application-specific controller. Instead of you telling Kubernetes how to run Ollama (by defining pods, services, etc.), you simply declare the desired state of your Ollama instance, and the operator handles the rest.

The key benefits include:

  • Declarative Management: Simply create a resource like apiVersion: "ai.my.domain/v1", kind: "Ollama", ... and let the operator configure everything.
  • Automation of Complex Tasks: The operator can automate the entire lifecycle, including deployment, configuration, scaling, and even updates.
  • Encapsulated Expertise: All the operational knowledge for running Ollama is encoded directly into the operator, making deployments consistent and repeatable for everyone on your team.
  • Enhanced Scalability: Easily spin up or tear down multiple, isolated Ollama instances for different teams or models without manual intervention.

The Core Components: CRDs and Controllers

At the heart of any operator are two key components:

  1. Custom Resource Definition (CRD): This extends the Kubernetes API, creating a new resource type that you define. In our case, we’ll create an Ollama resource. The CRD defines the spec (the desired state you provide, like which model to use and how many replicas) and the status (the observed, real-world state of the deployment).

  2. Controller: This is the logic that watches for changes to your custom resources and works to make the current state of the cluster match the desired state in the spec. This process is known as the reconciliation loop. When you create or update an Ollama resource, the controller’s reconciliation loop is triggered, and it will create, update, or delete the underlying Kubernetes objects (like Deployments and Services) as needed.

A Step-by-Step Guide to Building Your Ollama Operator

To build our operator, we’ll use the Operator SDK, a powerful framework that simplifies the development process by scaffolding the project and generating boilerplate code.

Prerequisites:

  • Go programming language
  • Docker or another container runtime
  • kubectl command-line tool
  • Access to a Kubernetes cluster (e.g., kind, Minikube, or a cloud provider)
  • The Operator SDK CLI

Step 1: Scaffold the Project

First, create a new project directory and initialize the operator project using the Operator SDK. These commands create the basic structure, including API definitions and the controller skeleton.

# Example commands
operator-sdk init --domain my.domain --repo github.com/your-user/ollama-operator
operator-sdk create api --group ai --version v1 --kind Ollama

This sets up the necessary files for your Ollama custom resource under the ai.my.domain/v1 API group.
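The most relevant generated files (the exact layout can vary slightly between Operator SDK versions) are roughly:

api/v1/ollama_types.go                     # Go types for the Ollama spec and status
internal/controller/ollama_controller.go   # Reconcile logic skeleton
config/crd/bases/                          # Generated CRD manifests
config/rbac/                               # RBAC rules the operator needs
Makefile                                   # Build, generate, and deploy targets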

Step 2: Define the Ollama API (CRD)

Next, you need to define the structure of your Ollama resource. Open the api/v1/ollama_types.go file. Here, you’ll define the fields for the spec and status.

A simple OllamaSpec might look like this:

// OllamaSpec defines the desired state of an Ollama instance
type OllamaSpec struct {
    // The name of the LLM model to run (e.g., "llama3")
    Model string `json:"model"`

    // The number of replicas for the Ollama deployment
    // +optional
    Replicas *int32 `json:"replicas,omitempty"`
}

The OllamaStatus can be used to report the readiness of the pods:

// OllamaStatus defines the observed state of Ollama
type OllamaStatus struct {
    // A list of the names of the pods running the model
    // +optional
    Pods []string `json:"pods,omitempty"`

    // Conditions describing the observed state of the Ollama deployment
    // +optional
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

After defining these types, run make manifests to regenerate the CRD manifest file in the config/crd/bases/ directory.
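You can also push basic validation and defaulting into the CRD itself by adding kubebuilder marker comments above the spec fields. The markers below are standard kubebuilder annotations, shown here as an illustrative extension of the spec rather than part of the scaffolded output; re-running make manifests bakes them into the generated CRD.

// OllamaSpec with validation and default markers (illustrative).
type OllamaSpec struct {
    // The name of the LLM model to run (e.g., "llama3")
    // +kubebuilder:validation:MinLength=1
    Model string `json:"model"`

    // The number of replicas for the Ollama deployment
    // +kubebuilder:validation:Minimum=0
    // +kubebuilder:default=1
    // +optional
    Replicas *int32 `json:"replicas,omitempty"`
}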

Step 3: Implement the Controller Logic

This is where the magic happens. In internal/controller/ollama_controller.go, you’ll implement the Reconcile method. This function is called every time an Ollama resource is created, updated, or deleted.

The logic follows a clear pattern (a condensed code sketch follows the list):

  1. Fetch the Ollama Instance: Load the custom resource that triggered the reconciliation event.
  2. Check for Existing Deployment: Query the cluster to see if a Deployment for this Ollama instance already exists.
  3. If Not Found, Create It: If no Deployment exists, build a new Deployment definition based on the spec from the Ollama resource (e.g., setting the container image, command to pull the specified model, and replica count). Then, create this Deployment in the cluster.
  4. If Found, Reconcile: If a Deployment already exists, compare its current state to the desired state in the spec. For example, if the replicas field in the Ollama resource has changed, update the existing Deployment to match the new count.
  5. Reconcile the Service: Ensure a Service exists to expose the Ollama Deployment within the cluster. If it doesn’t, create it.
  6. Update the Status: Finally, update the status field of the Ollama resource to reflect the current state, such as listing the running pods or reporting a “Ready” condition.
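Here is a condensed sketch of what that Reconcile method can look like. It assumes the api/v1 types from Step 2, the scaffolded OllamaReconciler (which embeds client.Client and holds a Scheme), and a hypothetical buildDeployment helper that turns an Ollama spec into an appsv1.Deployment; the Service and status steps are left as comments for brevity.

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    logf "sigs.k8s.io/controller-runtime/pkg/log"

    aiv1 "github.com/your-user/ollama-operator/api/v1"
)

func (r *OllamaReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := logf.FromContext(ctx)

    // 1. Fetch the Ollama instance that triggered this reconciliation.
    ollama := &aiv1.Ollama{}
    if err := r.Get(ctx, req.NamespacedName, ollama); err != nil {
        // The resource may have been deleted; nothing to do in that case.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Check whether a Deployment for this instance already exists.
    deployment := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{Name: ollama.Name, Namespace: ollama.Namespace}, deployment)
    if apierrors.IsNotFound(err) {
        // 3. Not found: build one from the spec and create it.
        desired := r.buildDeployment(ollama) // hypothetical helper: image, model pull command, replicas
        if err := ctrl.SetControllerReference(ollama, desired, r.Scheme); err != nil {
            return ctrl.Result{}, err
        }
        if err := r.Create(ctx, desired); err != nil {
            return ctrl.Result{}, err
        }
        log.Info("created Deployment", "name", desired.Name)
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        return ctrl.Result{}, err
    }

    // 4. Found: reconcile the replica count with the desired state in the spec.
    if ollama.Spec.Replicas != nil &&
        (deployment.Spec.Replicas == nil || *deployment.Spec.Replicas != *ollama.Spec.Replicas) {
        deployment.Spec.Replicas = ollama.Spec.Replicas
        if err := r.Update(ctx, deployment); err != nil {
            return ctrl.Result{}, err
        }
    }

    // 5. Reconcile the Service and 6. update ollama.Status here (omitted for brevity).
    return ctrl.Result{}, nil
}

Setting the controller reference is what lets Kubernetes garbage-collect the Deployment when the Ollama resource is deleted, and it ensures changes to the owned Deployment re-trigger reconciliation.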

Step 4: Build and Deploy the Operator

Once the logic is complete, you can build and deploy the operator to your cluster.

  1. Build and Push the Image: Use make to build the operator’s container image and push it to a container registry.

     make docker-build docker-push IMG="your-registry/ollama-operator:v0.0.1"

  2. Deploy to the Cluster: The make deploy command applies the necessary CRDs, RBAC rules (Roles and RoleBindings), and the Deployment for the operator itself.

     make deploy IMG="your-registry/ollama-operator:v0.0.1"
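With the Operator SDK defaults, make deploy installs the operator into a namespace named <project>-system. Assuming the project directory was named ollama-operator, you can confirm the controller pod is running with:

kubectl get pods -n ollama-operator-system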

Step 5: Test Your Operator

With the operator running, you can now manage Ollama declaratively. Create a simple YAML file named my-ollama-instance.yaml:

apiVersion: ai.my.domain/v1
kind: Ollama
metadata:
  name: llama3-instance
spec:
  model: "llama3"
  replicas: 1

Apply it to your cluster:

kubectl apply -f my-ollama-instance.yaml

Your operator will detect this new resource and automatically create a Deployment and Service to run the llama3 model. You can verify this by running kubectl get deployments and kubectl get pods. Changing the replicas in the YAML file and reapplying it will cause the operator to automatically scale the deployment up or down.
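You can also inspect or scale the instance directly from the command line by addressing the custom resource through its fully qualified name (by default the plural is the lowercased kind plus "s", i.e. ollamas.ai.my.domain):

kubectl get ollamas.ai.my.domain
kubectl patch ollamas.ai.my.domain llama3-instance --type merge -p '{"spec":{"replicas":2}}'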

Actionable Security and Best Practices

When building an operator for production, consider the following:

  • Resource Management: LLMs are resource-intensive. Always define CPU and memory requests and limits in the Deployment your operator creates (see the sketch after this list). This prevents a single model from starving other applications on the cluster.
  • Role-Based Access Control (RBAC): The operator requires permissions to manage Deployments, Services, and other resources. Ensure its ServiceAccount is granted only the necessary permissions, following the principle of least privilege.
  • Stateful Data: If you need to persist downloaded models between pod restarts, configure your operator to create and manage PersistentVolumeClaims (PVCs) for the Ollama pods.
  • Graceful Shutdown: Implement logic to handle the deletion of an Ollama resource gracefully, ensuring all associated components are cleaned up properly by using finalizers.
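For the resource-management point, here is a minimal sketch of how the container built inside the hypothetical buildDeployment helper could declare requests and limits. The ollama/ollama image is the public default, and the values are illustrative placeholders to tune per model and hardware.

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// Give the Ollama container explicit CPU/memory requests and limits so a
// single model cannot starve other workloads on the node (values illustrative).
container := corev1.Container{
    Name:  "ollama",
    Image: "ollama/ollama:latest",
    Resources: corev1.ResourceRequirements{
        Requests: corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("2"),
            corev1.ResourceMemory: resource.MustParse("8Gi"),
        },
        Limits: corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("4"),
            corev1.ResourceMemory: resource.MustParse("16Gi"),
        },
    },
}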

By investing the time to build a custom Kubernetes operator, you can create a robust, automated, and scalable platform for managing LLMs, empowering your entire organization to leverage AI more effectively.

Source: https://collabnix.com/building-custom-kubernetes-operators-for-ollama-4/
