8. Production Hardening

Your full stack is deployed: agent, MCP server, gateway, UI, and code execution sandbox. Everything works in development. This final module covers what it takes to run the stack in production -- secrets management, FIPS compliance, authentication, security policy, resource limits, and monitoring.

FIPS compliance

Federal Information Processing Standards (FIPS 140-2/140-3) mandate the use of validated cryptographic modules. FIPS compliance is required for U.S. government workloads and many enterprise environments. If your OpenShift cluster runs with FIPS mode enabled, every container in the cluster must use FIPS-validated crypto.

Red Hat UBI base images are FIPS-capable out of the box. When the host kernel has fips=1 set, UBI's OpenSSL automatically restricts itself to FIPS-validated algorithms. No application-level configuration is needed.

To verify FIPS mode is active in a running pod:

oc exec deployment/calculus-agent -n calculus-agent -- \
  cat /proc/sys/crypto/fips_enabled

A return value of 1 means FIPS mode is active. A return value of 0 means the host kernel does not have FIPS enabled.

What breaks under FIPS

In Python, hashlib.md5() raises a ValueError unless called with usedforsecurity=False. TLS is restricted to AEAD cipher suites (AES-GCM, AES-CCM). If your agent calls an external endpoint that only offers legacy ciphers (CBC suites, RC4), the TLS handshake will fail. The fix belongs on the remote endpoint, not in your agent.
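For example, a non-security checksum keeps working under FIPS when the call is explicitly flagged as such (a minimal sketch; the usedforsecurity flag is available in Python 3.9+):

```python
import hashlib

# Under FIPS mode, hashlib.md5(data) raises ValueError because MD5 is not
# a FIPS-validated algorithm. Flagging the call as non-security use
# (e.g., a cache key or content fingerprint) is permitted.
def content_fingerprint(data: bytes) -> str:
    return hashlib.md5(data, usedforsecurity=False).hexdigest()

# Prefer a FIPS-validated algorithm whenever the hash has a security role.
def secure_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()
```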

The litellm migration

The framework originally used litellm as an LLM abstraction layer. Two problems forced a switch:

  1. FIPS incompatibility. litellm's dependency tree pulls in cryptographic libraries that are not FIPS-validated. On a FIPS-enabled cluster, these libraries either fail at import time or silently use non-compliant algorithms.

  2. Supply chain compromise. litellm versions 1.82.7 and 1.82.8 were compromised in a supply chain attack (March 2026). The malicious versions exfiltrated API keys to an external endpoint.

The fix was straightforward: replace litellm with the openai async SDK. vLLM, LlamaStack, llm-d, and most inference servers expose an OpenAI-compatible API, so litellm's abstraction layer was adding complexity without adding value. The result is a simpler dependency tree that is easier to audit, FIPS-compliant, and free of supply chain risk.

Never install litellm 1.82.7 or 1.82.8

These versions are compromised. If you encounter them in a lockfile or dependency tree, pin to >=1.83.0 or <=1.82.6.

The takeaway: fewer dependencies means a smaller attack surface. Prefer standard SDKs over abstraction layers when the abstraction doesn't carry its weight.

Secrets management

Production credentials must never appear in agent.yaml, prompts, or source code. OpenShift Secrets are the standard mechanism for injecting sensitive values at runtime.

Create a Secret

The openai SDK requires OPENAI_API_KEY to be set, even when calling unauthenticated endpoints like vLLM (set it to any non-empty string in that case). For endpoints that require real credentials, create a Secret:

oc create secret generic llm-credentials \
  --from-literal=OPENAI_API_KEY=sk-your-real-key-here \
  -n calculus-agent

Mount via Helm values

The Helm chart's env section supports secretKeyRef for injecting Secret values as environment variables. Add this to your values.yaml:

env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-credentials
        key: OPENAI_API_KEY

Then upgrade the release:

helm upgrade calculus-agent chart/ --reuse-values -n calculus-agent

The Deployment template injects the Secret value as an environment variable. The ${OPENAI_API_KEY} reference in agent.yaml picks it up at runtime through normal env var substitution.
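End to end, the chain is Secret → Deployment env var → agent.yaml substitution. A sketch of the agent.yaml side (the key names here are illustrative -- check your generated agent.yaml for the exact schema):

```yaml
model:
  endpoint: ${MODEL_ENDPOINT:-http://vllm.llm-serving.svc.cluster.local:8000/v1}
  api_key: ${OPENAI_API_KEY}
```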

Secrets vs ConfigMaps

Use ConfigMaps for non-sensitive configuration (MODEL_ENDPOINT, LOG_LEVEL). Use Secrets for credentials, API keys, and tokens. Note that Secrets are only base64-encoded at rest -- encoding, not encryption -- so they are truly protected at rest only if your cluster is configured with etcd encryption.
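Base64 is trivially reversible by anyone with read access to the Secret object, which is why etcd encryption and RBAC on Secrets matter. A quick illustration:

```python
import base64

# What a Secret's data field stores (illustrative value):
stored = base64.b64encode(b"sk-your-real-key-here").decode()

# Recoverable by anyone who can read the Secret -- this is encoding,
# not encryption:
recovered = base64.b64decode(stored).decode()
print(recovered)  # sk-your-real-key-here
```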

MCP server authentication

The calculus-helper MCP server includes JWT authentication support in src/core/auth.py. When enabled, the server validates a bearer token on every request before executing any tool.

Enable JWT auth on the MCP server

Set these environment variables on the MCP server deployment:

| Variable | Purpose |
| --- | --- |
| MCP_AUTH_JWT_ALG | Algorithm: RS256, HS256, etc. Auth is disabled if unset |
| MCP_AUTH_JWT_SECRET | Shared secret for HMAC algorithms |
| MCP_AUTH_JWT_JWKS_URI | JWKS endpoint URL (alternative to a static key) |
| MCP_AUTH_JWT_ISSUER | Expected iss claim in the token |
| MCP_AUTH_JWT_AUDIENCE | Expected aud claim in the token |

For HMAC-based auth (simplest to set up):

oc create secret generic mcp-auth \
  --from-literal=MCP_AUTH_JWT_SECRET=your-shared-secret \
  -n calculus-agent

Then add the env vars to the MCP server's Helm values:

env:
  - name: MCP_AUTH_JWT_ALG
    value: HS256
  - name: MCP_AUTH_JWT_SECRET
    valueFrom:
      secretKeyRef:
        name: mcp-auth
        key: MCP_AUTH_JWT_SECRET
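Under the hood, HS256 validation is an HMAC-SHA256 over the token's header and payload. A self-contained sketch of what the shared secret is doing -- the server itself presumably uses a JWT library, and sign_hs256/verify_hs256 are illustrative names, not its API:

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # JWT uses URL-safe base64 with padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_hs256(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"


def verify_hs256(token: str, secret: str) -> bool:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(b64url(expected), sig)


token = sign_hs256({"iss": "my-issuer", "aud": "mcp-server"}, "your-shared-secret")
assert verify_hs256(token, "your-shared-secret")
assert not verify_hs256(token, "wrong-secret")
```

A production validator also checks the iss, aud, and exp claims after the signature passes, which is what the remaining MCP_AUTH_JWT_* variables configure.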

Configure the agent for authenticated MCP

On the agent side, the MCP server entry in agent.yaml supports auth headers. The agent passes a bearer token when connecting:

mcp_servers:
  - url: ${MCP_CALCULUS_URL:-http://mcp-server.calculus-mcp.svc.cluster.local:8080/mcp/}
    auth:
      token: ${MCP_AUTH_TOKEN}

Store the token in a Secret and inject it the same way as OPENAI_API_KEY.

Production auth patterns

For production, prefer RS256 with a JWKS endpoint over shared secrets. This lets you rotate keys without redeploying. Set MCP_AUTH_JWT_JWKS_URI to your identity provider's JWKS URL (e.g., Keycloak or Red Hat SSO).

Security configuration

The security section in agent.yaml controls runtime security behavior:

security:
  mode: ${SECURITY_MODE:-enforce}
  tool_inspection:
    enabled: ${TOOL_INSPECTION_ENABLED:-true}

Enforce vs observe mode

| Mode | Behavior | Use when |
| --- | --- | --- |
| enforce | Blocks execution when a security finding is detected | Production |
| observe | Logs findings but allows execution to continue | Tuning and testing |

Start with observe when you first deploy to understand what the security layer flags. Once you've reviewed the findings and confirmed they're legitimate, switch to enforce.

Tool inspection

When tool_inspection.enabled is true, the framework validates tool inputs and outputs against their declared schemas before and after execution. This catches malformed tool calls from the LLM and unexpected return values from tool implementations.
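Conceptually, the inspection step is a schema check on both sides of the tool call. A simplified sketch -- validate_args and the schema shape are illustrative, not the framework's actual validation API:

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Return a list of findings; an empty list means the call passes."""
    findings = []
    for name, expected_type in schema.get("required", {}).items():
        if name not in args:
            findings.append(f"missing required argument: {name}")
        elif not isinstance(args[name], expected_type):
            findings.append(f"{name}: expected {expected_type.__name__}, "
                            f"got {type(args[name]).__name__}")
    return findings


# A declared schema for a derivative tool (illustrative):
schema = {"required": {"expression": str, "variable": str}}

print(validate_args({"expression": "x**2", "variable": "x"}, schema))  # []
print(validate_args({"expression": 42}, schema))
# ['expression: expected str, got int', 'missing required argument: variable']
```

In enforce mode a non-empty findings list would block the tool call; in observe mode it would only be logged.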

You can override the global mode per layer. For example, to enforce tool inspection but only observe guardrails while tuning them:

security:
  mode: enforce
  tool_inspection:
    enabled: true
  guardrails:
    mode: observe

Resource limits and scaling

Resource limits

The default Helm values set conservative resource limits because agents are I/O-bound -- they spend most of their time waiting for LLM and MCP responses:

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Adjust these based on your agent's actual usage. An agent that processes large context windows or runs heavy tool-result parsing may need more memory.

Horizontal scaling

The agent and MCP server scale independently. Add a HorizontalPodAutoscaler to scale the agent based on CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: calculus-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: calculus-agent
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Apply it with oc apply -f hpa.yaml -n calculus-agent. The same pattern works for the MCP server -- create a separate HPA targeting its Deployment.

Apply HPA after your final Helm upgrade

The HPA takes ownership of .spec.replicas once applied. Any subsequent helm upgrade will conflict with the HPA over the replica count. Apply the HPA as the last step, after all Helm configuration is finalized. If you need to run helm upgrade later, delete the HPA first, upgrade, then re-apply.

Why min 2 replicas?

A single replica means any pod restart causes downtime. Two replicas ensure one pod is always available during rolling updates and restarts.

Monitoring

Health and readiness probes

The agent exposes /healthz for liveness probes. The Helm chart includes probe definitions -- enable them in values.yaml:

probes:
  enabled: true

This configures Kubernetes to restart the pod if /healthz stops responding (liveness) and to stop routing traffic to it during startup (readiness).
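With probes enabled, the rendered Deployment contains something along these lines (a sketch -- the actual paths, port, and timings depend on the chart's defaults):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 2
  periodSeconds: 5
```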

Pod logs

The most immediate debugging tool. Watch logs in real time:

oc logs deployment/calculus-agent -n calculus-agent -f

Key log patterns to watch for:

| Pattern | Meaning |
| --- | --- |
| Uvicorn running on | Agent started successfully |
| Connected to MCP server | MCP connection established |
| Tool inspection finding | Security layer flagged a tool call |
| Retrying after error | Backoff triggered on a failed LLM call |
| Max iterations reached | Agent hit the loop ceiling -- check loop.max_iterations |

Set LOG_LEVEL to DEBUG temporarily when investigating issues, then return to INFO or WARNING for normal operation.

Route timeouts

OpenShift Routes have a default timeout of 30 seconds. LLM calls regularly exceed this, especially with large context windows. If you haven't already set this in Module 5, annotate the agent's Route:

oc annotate route calculus-gateway \
  haproxy.router.openshift.io/timeout=120s \
  -n calculus-agent --overwrite

Do the same for the agent Route if it is also directly exposed. The UI Route typically doesn't need a longer timeout since it serves static assets.

What's next

You've built and hardened a complete AI agent system across eight modules:

  1. Scaffolded an agent project and understood every file
  2. Configured the agent for a real LLM and deployed it to OpenShift
  3. Built an MCP server with calculus tools
  4. Wired the MCP tools into the agent
  5. Deployed a gateway and chat UI for browser-based interaction
  6. Added a code execution sandbox for numerical computation
  7. Extended the agent with AI-assisted slash commands
  8. Hardened the stack with secrets, authentication, security policy, and monitoring

The calculus-agent/ and calculus-helper/ directories in this repository serve as complete reference implementations. Use them as starting points for your own agents.

For deeper dives into specific topics, see the Reference pages: agent.yaml configuration, Helm chart anatomy, BaseAgent API, and MCP protocol details.