
Module 2: Configure and Deploy to OpenShift

Now that you understand the project layout, it is time to configure the agent for a real LLM endpoint and deploy it to OpenShift. By the end of this module your agent will be running in a pod, reachable over HTTPS, and answering questions using a model served by vLLM or LlamaStack.

Set your model endpoint

Open agent.yaml and find the model: section. This is where you tell the agent which LLM to call. Every value supports ${VAR:-default} substitution, so the same file works for local development (use the defaults) and production (inject env vars via ConfigMap).

model:
  endpoint: ${MODEL_ENDPOINT:-http://vllm-predictor.model-ns.svc.cluster.local/v1}
  name: ${MODEL_NAME:-/mnt/models}
  temperature: 0.7
  max_tokens: 4096

The endpoint is an OpenAI-compatible /v1 URL. The name is whatever the inference server expects as the model identifier -- for vLLM this is typically the Hugging Face model ID or /mnt/models if the model is loaded from a local path.
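The ${VAR:-default} placeholders follow shell parameter-expansion semantics: the environment variable wins when it is set and non-empty, and the default after :- applies otherwise. You can see the behavior in any POSIX shell (the localhost URL here is purely illustrative):

```shell
# Unset: the default after ":-" is used
unset MODEL_ENDPOINT
echo "${MODEL_ENDPOINT:-http://localhost:8000/v1}"
# -> http://localhost:8000/v1

# Set (e.g. injected from a ConfigMap): the env var wins
export MODEL_ENDPOINT=http://vllm-predictor.model-ns.svc.cluster.local/v1
echo "${MODEL_ENDPOINT:-http://localhost:8000/v1}"
# -> http://vllm-predictor.model-ns.svc.cluster.local/v1
```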

Finding your model endpoint

If your model is deployed via OpenShift AI (RHOAI), the internal service URL follows the pattern:

http://<inference-service-name>-predictor.<namespace>.svc.cluster.local/v1

List all InferenceServices across namespaces to find yours:

oc get inferenceservice -A

Why OPENAI_API_KEY?

The OpenAI Python SDK requires an API key even when the endpoint performs no authentication, as is typical for an in-cluster vLLM server. Set OPENAI_API_KEY to any non-empty string (e.g. not-required) to satisfy the SDK. The agent's ConfigMap handles this for you in the Helm deploy step below.
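For local runs outside the cluster, export the same placeholder yourself; the value is arbitrary as long as it is non-empty:

```shell
# Satisfies the OpenAI SDK's key check; an unauthenticated vLLM
# endpoint never validates the value.
export OPENAI_API_KEY=not-required
```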

Set agent identity

The agent: section at the top of agent.yaml controls metadata that appears in logs and the /v1/agent-info endpoint. Update it to describe your agent:

agent:
  name: ${AGENT_NAME:-calculus-agent}
  description: "A math tutor agent that solves calculus problems step by step"
  version: 0.1.0

The name and description are surfaced by the /v1/agent-info REST endpoint, which is useful for service discovery when you have many agents running in a cluster.

Understanding the Helm chart

The chart/ directory contains a Helm chart that produces the Kubernetes resources your agent needs. Here is what each template creates:

  • deployment.yaml -- Deployment: the pod spec, container image, and env vars from the ConfigMap
  • service.yaml -- Service: a ClusterIP exposing port 8080 inside the cluster
  • configmap.yaml -- ConfigMap: env vars built from values.config entries
  • route.yaml -- Route (OpenShift): external HTTPS access with TLS edge termination

What is a Helm chart?

Helm is a package manager for Kubernetes. A chart is a collection of templated YAML files that produce Kubernetes resources when rendered. You override template variables at deploy time with --set key=value flags or a custom values file, and you can preview the rendered manifests without deploying anything by running helm template chart/. Think of it as docker-compose.yml but for Kubernetes.

Key values in values.yaml

Open chart/values.yaml to see the full set of knobs. The ones you will use most often:

image.repository and image.tag -- where Kubernetes pulls the container image from. When using the OpenShift internal registry, this is the image stream path (e.g. image-registry.openshift-image-registry.svc:5000/calculus-agent/calculus-agent).

config.* -- every key under config: becomes an environment variable in the ConfigMap. These are substituted into agent.yaml at runtime. For example, setting config.MODEL_ENDPOINT overrides the ${MODEL_ENDPOINT} placeholder.

route.enabled -- when true, the chart creates an OpenShift Route that gives your agent an external HTTPS URL. When false, the agent is only reachable inside the cluster via its Service.

resources -- CPU and memory requests/limits. The defaults (100m CPU, 256Mi memory) are reasonable because the agent is I/O-bound: it spends most of its time waiting for LLM responses over the network.
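Instead of passing a long string of --set flags at deploy time, you can collect overrides in your own values file and pass it with -f. A sketch (the file name, tag, and endpoint are illustrative; the keys mirror the ones described above):

```yaml
# my-values.yaml (hypothetical override file)
image:
  repository: image-registry.openshift-image-registry.svc:5000/calculus-agent/calculus-agent
  tag: v1
config:
  MODEL_ENDPOINT: http://vllm-predictor.model-ns.svc.cluster.local/v1
  MODEL_NAME: /mnt/models
  OPENAI_API_KEY: not-required
route:
  enabled: true
```

Deploy it with helm install calculus-agent chart/ -f my-values.yaml -n calculus-agent; values passed via -f merge with the chart defaults the same way --set values do.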

Build the container image

You need a container image before you can deploy. There are two approaches.

Option A: In-cluster build with a BuildConfig

A BuildConfig builds your image directly in the cluster's internal registry. There is no need to push images to an external registry, and the build runs on x86_64 regardless of your laptop's architecture.

# Create a binary BuildConfig that accepts source uploads
oc new-build --binary --name=calculus-agent --strategy=docker -n calculus-agent

# Tell it to use Containerfile instead of Dockerfile
oc patch bc/calculus-agent --type=json \
  -p '[{"op":"replace","path":"/spec/strategy/dockerStrategy/dockerfilePath","value":"Containerfile"}]' \
  -n calculus-agent

# Upload your source and start the build
oc start-build calculus-agent --from-dir=. --follow -n calculus-agent

What is a BuildConfig?

A BuildConfig is an OpenShift resource that tells the platform how to build a container image. With --binary, it accepts source code uploaded from your local machine and builds the image server-side. The resulting image is pushed to the cluster's internal registry automatically -- no external registry credentials needed.

The --follow flag streams build logs to your terminal. When the build completes you will see a line like Push successful followed by the internal image reference.

Option B: Local build + push

If you prefer to build locally (or need to push to an external registry like Quay), use the Makefile target:

make build IMAGE_NAME=calculus-agent IMAGE_TAG=v1
podman push calculus-agent:v1 quay.io/your-org/calculus-agent:v1

Architecture mismatch

make build passes --platform linux/amd64 automatically. If you build with raw podman build on an Apple Silicon Mac, you must include that flag yourself (for example, podman build --platform linux/amd64 -t calculus-agent:v1 .), or the image will be ARM64 and will not run on x86_64 OpenShift nodes.

Deploy with Helm

Deploy mechanisms in this tutorial

Different components use different deploy mechanisms:

  • Agent (this module): helm install / helm upgrade via the chart in chart/. The chart produces a Deployment, Service, ConfigMap, and Route.
  • MCP server (Module 3): ./deploy.sh <namespace>, which applies openshift.yaml (BuildConfig + Deployment + Service + Route).
  • Gateway and UI (Module 5): ./deploy.sh <namespace> for the initial deploy, plus helm upgrade when configuration changes.

The Makefile wraps these: make deploy PROJECT=<namespace> calls the right tool for each project type.

With the image built, deploy the agent:

# Get the internal registry path for the image we just built
IMAGE=$(oc get is calculus-agent -n calculus-agent -o jsonpath='{.status.dockerImageRepository}')

# Deploy the chart
helm install calculus-agent chart/ \
  --set image.repository=$IMAGE \
  --set image.tag=latest \
  --set image.pullPolicy=Always \
  --set config.MODEL_ENDPOINT=http://vllm-predictor.model-ns.svc.cluster.local/v1 \
  --set config.MODEL_NAME=/mnt/models \
  --set config.OPENAI_API_KEY=not-required \
  --set route.enabled=true \
  -n calculus-agent

Here is what each --set does:

  • image.repository -- points at the image stream in the internal registry. The oc get is command retrieves the full path.
  • image.tag -- latest tracks the most recent build. Pin to a specific tag for production.
  • image.pullPolicy=Always -- forces Kubernetes to pull the image on every pod restart, so you always get the latest build.
  • config.MODEL_ENDPOINT -- the /v1 URL of your vLLM or LlamaStack inference endpoint.
  • config.MODEL_NAME -- the model identifier your endpoint expects.
  • config.OPENAI_API_KEY -- satisfies the SDK requirement. Set to a real key only if your endpoint requires authentication.
  • route.enabled=true -- creates an OpenShift Route so you can reach the agent from outside the cluster.

What is an ImageStream?

An ImageStream is an OpenShift abstraction that tracks container images in the internal registry. When you build with a BuildConfig, the output image is tagged in an ImageStream. oc get is shows you the registry path that Kubernetes needs to pull the image.

Verify the deployment

Run through these checks to confirm everything is working.

# 1. Check pod status — you want Running with 1/1 ready
oc get pods -n calculus-agent -l app.kubernetes.io/instance=calculus-agent

# 2. Watch logs for startup messages
oc logs deployment/calculus-agent -n calculus-agent --tail=15

# 3. Get the external route URL
ROUTE=$(oc get route calculus-agent -n calculus-agent -o jsonpath='{.spec.host}')

# 4. Health check
curl -sk "https://$ROUTE/healthz"
# Expected: {"status":"ok"}

# 5. Agent info — confirms identity and model config
curl -sk "https://$ROUTE/v1/agent-info" | python -m json.tool

# 6. Send a real message
curl -sk "https://$ROUTE/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}'

If step 6 returns a JSON response with the model's answer, your agent is live.

When things go wrong

These are the most common issues and how to fix them.

ImagePullBackOff -- Kubernetes cannot pull the container image. The image repository path is usually wrong. Run oc get is -n calculus-agent to find the correct internal registry path, then helm upgrade with the corrected image.repository value.

CrashLoopBackOff -- the container starts and immediately crashes. Check logs with oc logs deployment/calculus-agent -n calculus-agent. Common causes: a missing Python dependency, a syntax error in agent.yaml, or a PermissionError on source files (see the warning below).

File permissions in containers

The UBI base image runs as a non-root user (UID 1001). If source files were copied into the image with owner-only permissions (600), the container process cannot read them. The Containerfile includes a chmod step to fix this, but if you have modified the Containerfile, verify that the permission fix is still in place.
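For reference, the permission fix in a Containerfile usually looks something like this (an illustrative sketch; your paths and base image may differ):

```dockerfile
COPY . /app
# Make sources world-readable so the non-root user (UID 1001) can load them,
# regardless of the permissions they had on the build machine
RUN chmod -R a+rX /app
USER 1001
```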

Route returns 503 -- the pod is not ready yet. Wait for the rollout to finish: oc rollout status deployment/calculus-agent -n calculus-agent. If the rollout is stuck, check pod logs.

Model returns errors -- if the health check passes but chat completions fail, the issue is usually the model endpoint. Verify the endpoint is reachable from inside the cluster:

oc exec deployment/calculus-agent -n calculus-agent -- \
  curl -s http://vllm-predictor.model-ns.svc.cluster.local/v1/models

Old image after rebuild -- OpenShift caches images. After building a new version, restart the deployment to pick it up: oc rollout restart deployment/calculus-agent -n calculus-agent.

Redeploying after changes

The development cycle for deployed agents is: edit code, rebuild the image, restart the deployment. Here is the sequence:

# 1. Rebuild the image in the cluster
oc start-build calculus-agent --from-dir=. --follow -n calculus-agent

# 2. Restart the deployment to pick up the new image
oc rollout restart deployment/calculus-agent -n calculus-agent

# 3. Wait for the new pod to become ready
oc rollout status deployment/calculus-agent -n calculus-agent

The Makefile provides a shortcut that wraps these steps:

make redeploy PROJECT=calculus-agent

Updating configuration without rebuilding

If you only need to change environment variables (model endpoint, log level, etc.), you do not need to rebuild the image. Run helm upgrade with the new --set config.* values. The ConfigMap checksum annotation in the Deployment template automatically triggers a rolling restart when the ConfigMap changes.

helm upgrade calculus-agent chart/ \
  --set config.MODEL_ENDPOINT=http://new-endpoint.svc.cluster.local/v1 \
  --reuse-values \
  -n calculus-agent
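The automatic restart relies on a common Helm idiom in the Deployment template, which typically looks like this (a sketch of the standard pattern, not necessarily this chart's exact wording):

```yaml
# chart/templates/deployment.yaml (excerpt, illustrative)
spec:
  template:
    metadata:
      annotations:
        # Hash the rendered ConfigMap; when its contents change, the pod
        # template changes, and Kubernetes performs a rolling restart
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
```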

What's next

Your agent is running in OpenShift and responding to requests. In Module 3, you'll build an MCP server that provides real calculus tools for the agent to use.