5. Gateway and UI¶

Your agent responds to curl requests, but real users need a chat interface. In this module you'll scaffold a Go gateway and a web UI, deploy both to OpenShift, and wire them together so users can talk to your calculus agent through a browser. You'll also learn why a gateway layer exists even when it starts as a simple proxy.

Architecture overview¶

Before scaffolding, here's how the pieces fit together:

Browser  ──▶  UI  ──▶  Gateway  ──▶  Agent  ──▶  MCP Server
  (chat)    (Route)    (Route)      (Service)     (Service)

Each component is its own Deployment, Service, and Route. The UI is a Go server that serves static assets and proxies API calls to the gateway. The gateway is a Go binary that proxies requests to the agent. The agent and MCP server are what you built in Modules 1--4.

Two environment variables control the wiring:

Variable	Set on	Points to
`BACKEND_URL`	Gateway	Agent's in-cluster Service URL
`API_URL`	UI	Gateway's in-cluster Service URL

Both use internal service URLs because both the gateway and UI server run inside the cluster. The UI server acts as a reverse proxy -- the browser never calls the gateway directly.

Scaffold the gateway¶

Navigate back to the parent directory (alongside calculus-agent/ and calculus-helper/), then scaffold:

cd ..
fips-agents create gateway calculus-gateway
cd calculus-gateway && ls

build/          CLAUDE.md       Containerfile   Makefile
README.md       chart/          cmd/            go.mod
internal/       planning/

The gateway is a structured Go project using only the standard library. The handlers live in internal/handler/, configuration in internal/config/, and the server entrypoint in cmd/server/main.go.

The proxy handler¶

The interesting piece is the ChatHandler in internal/handler/chat.go, which proxies /v1/chat/completions requests to the agent:

// internal/handler/chat.go -- proxies /v1/chat/completions to the agent
package handler

import (
    "bytes"
    "io"
    "net/http"

    "github.com/fips-agents/calculus-gateway/internal/proxy"
)

type ChatHandler struct {
    BackendURL string
    Client     *http.Client
}

func (h *ChatHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    body, _ := io.ReadAll(r.Body)
    defer r.Body.Close()

    // Forward to backend
    req, _ := http.NewRequest(http.MethodPost, h.BackendURL+"/v1/chat/completions", bytes.NewReader(body))
    req.Header.Set("Content-Type", "application/json")
    resp, err := h.Client.Do(req)
    if err != nil {
        http.Error(w, `{"error":"backend request failed"}`, http.StatusBadGateway)
        return
    }
    defer resp.Body.Close()

    // Relay response (supports both sync JSON and SSE streaming)
    for k, v := range resp.Header {
        w.Header()[k] = v
    }
    w.WriteHeader(resp.StatusCode)
    io.Copy(w, resp.Body)
}

The full implementation in the scaffold handles streaming and sync modes separately, with proper SSE relay for streaming responses (see internal/proxy/sse.go). The code shown here is a simplified view of the forwarding logic. The scaffold also includes health and readiness handlers (/healthz, /readyz), request logging middleware, and configuration loaded from environment variables.

Why not just expose the agent directly?¶

The gateway looks redundant right now. It proxies requests without modification. But it exists to be the future home of concerns that don't belong in the agent:

Authentication. Verify OAuth2/OIDC tokens before requests reach the agent. The agent stays focused on reasoning; it never sees unauthenticated traffic.

Rate limiting. Protect the LLM backend from runaway clients. A single middleware in the gateway applies limits across all agents behind it.

Multi-agent routing. When you have multiple agents (calculus, chemistry, physics), the gateway inspects the request and routes to the right backend. The UI doesn't need to know which agent handles which domain.

Request logging and metrics. Centralized access logs, latency histograms, and error rates -- all in one place, independent of agent implementation language.

Start thin, grow incrementally

Resist the urge to add middleware before you need it. Ship the proxy, get end-to-end working, then layer in auth and rate limiting when the requirements are clear.

Scaffold the UI¶

Navigate back to the parent directory, then scaffold:

cd ..
fips-agents create ui calculus-ui
cd calculus-ui && ls

CLAUDE.md       Containerfile   Makefile        README.md
chart/          cmd/            go.mod          planning/
static/

The UI is a Go server that does two things: it serves the chat interface as embedded static assets, and it acts as a reverse proxy for API calls to the gateway.

UI architecture

The scaffolded UI is a Go server that embeds static assets (HTML, JS, CSS) and acts as a reverse proxy for API calls. The browser talks only to the UI server -- no cross-origin requests needed. The chat interface includes SSE streaming, tool call visualization, and KaTeX math rendering.

The static/ directory contains the frontend: index.html, app.js, and style.css. These are embedded into the Go binary at build time using static/embed.go.

How the reverse proxy works¶

Instead of injecting the gateway URL into the frontend JavaScript, the UI server proxies API requests itself. The browser makes all calls to the same origin -- the UI route -- and the Go server forwards /v1/ requests to the gateway:

// From cmd/server/main.go -- the key proxy setup
proxy := &httputil.ReverseProxy{
    Director: func(r *http.Request) {
        r.URL.Scheme = backendURL.Scheme
        r.URL.Host = backendURL.Host
        r.Host = backendURL.Host
    },
    FlushInterval: -1, // flush immediately for SSE streaming
}

mux.Handle("/v1/", proxy)          // API calls proxied to backend
mux.Handle("/", http.FileServer(http.FS(static.Files()))) // static assets

The browser calls /v1/chat/completions on the same origin as the UI. The Go server proxies these requests to the backend. This avoids CORS issues entirely -- no cross-origin requests, no preflight checks. API_URL is set on the server via an environment variable (typically through the Helm chart's ConfigMap), not injected into the frontend JavaScript.

Deploy the gateway¶

Set $CTX to your kubeconfig context name

Module 5 passes --kube-context="$CTX" and --context="$CTX" on every helm and oc invocation. Set it once before you start so we don't quietly mutate the active project on a shared kubeconfig:

export CTX=$(oc config current-context)

Build and deploy the gateway to your namespace. BACKEND_URL must be set during the initial install -- the default points at a placeholder service that doesn't exist, so the readiness probe will fail and the deploy will hang if you skip it:

cd calculus-gateway
make build-openshift PROJECT=calculus-agent

# Resolve the image path from the ImageStream the build created
GW_IMAGE=$(oc get is calculus-gateway --context="$CTX" -n calculus-agent \
  -o jsonpath='{.status.dockerImageRepository}')

helm upgrade --install calculus-gateway chart/ \
  --set image.repository=$GW_IMAGE \
  --set config.BACKEND_URL=http://calculus-agent:8080 \
  -n calculus-agent --kube-context="$CTX"

oc rollout status deployment/calculus-gateway -n calculus-agent --context="$CTX"

make build-openshift creates a BuildConfig (if one doesn't already exist), uploads the source, and builds the image in the cluster. The oc get is command reads the internal registry path from the resulting ImageStream -- without --set image.repository, the chart defaults to a short name that the cluster cannot pull. The helm upgrade --install deploys the chart with the correct image and backend URL from the start.

In-cluster service DNS

http://calculus-agent:8080 uses Kubernetes short-name DNS. This works when the gateway and agent are in the same namespace. For cross-namespace communication, use the fully qualified form:

http://calculus-agent.<agent-namespace>.svc.cluster.local:8080

Verify the gateway is proxying correctly:

GW_ROUTE=$(oc get route calculus-gateway -n calculus-agent --context="$CTX" -o jsonpath='{.spec.host}')

# Health check (gateway's own endpoint)
curl -sk "https://$GW_ROUTE/healthz"
# {"status":"ok"}

# Proxied request to agent
curl -sk "https://$GW_ROUTE/v1/agent-info" | python -m json.tool

The /v1/agent-info response should match what you saw when hitting the agent directly in Module 4. The request traveled: your machine --> gateway route --> gateway pod --> agent service --> agent pod.

Deploy the UI¶

cd calculus-ui
make build-openshift PROJECT=calculus-agent

# Resolve the image path from the ImageStream the build created
UI_IMAGE=$(oc get is calculus-ui --context="$CTX" -n calculus-agent \
  -o jsonpath='{.status.dockerImageRepository}')

helm upgrade --install calculus-ui chart/ \
  --set image.repository=$UI_IMAGE \
  --set config.API_URL=http://calculus-gateway:8080 \
  -n calculus-agent --kube-context="$CTX"

oc rollout status deployment/calculus-ui -n calculus-agent --context="$CTX"

In-cluster URL, not external route

API_URL points to the gateway's in-cluster service URL because the UI server proxies API requests server-side. The browser never calls the gateway directly -- it sends requests to the UI origin, and the Go server forwards them.

Don't paste the gateway's HTTPS Route URL here

A common mistake is to set API_URL to the gateway's external route (https://calculus-gateway-...apps.example.com). That URL terminates TLS at the OpenShift router and invites CORS / TLS-trust failures from inside the pod, even though it appears to work from your laptop. API_URL must be the in-cluster Service URL (http://calculus-gateway:8080).

Get the UI route and open it in your browser:

UI_ROUTE=$(oc get route calculus-ui -n calculus-agent --context="$CTX" -o jsonpath='{.spec.host}')
echo "https://$UI_ROUTE"

Configure route timeouts¶

Agent responses can take 30 seconds or more, especially when the LLM chains multiple tool calls. OpenShift's default route timeout is 30 seconds -- right at the edge. You need to raise it on both the gateway and UI routes.

# Set 120-second timeout on the gateway route
oc annotate route calculus-gateway \
  haproxy.router.openshift.io/timeout=120s \
  --overwrite -n calculus-agent --context="$CTX"

# Set 120-second timeout on the UI route
oc annotate route calculus-ui \
  haproxy.router.openshift.io/timeout=120s \
  --overwrite -n calculus-agent --context="$CTX"

The silent 504

Without this annotation, complex queries that trigger multiple tool calls will return a 504 Gateway Timeout from HAProxy. The agent finishes the work successfully -- the client just never sees the response. This is one of the most common deployment surprises.

Test end-to-end¶

Open https://<UI_ROUTE> in your browser. You should see a chat interface. Type a calculus problem and watch the full chain execute:

You: What is the integral of sin(x)*cos(x)?

Here's what happens:

The browser sends a POST to the UI's /v1/chat/completions (same origin).
The UI server proxies the request to the gateway's in-cluster service.
The gateway proxies the request to the agent's in-cluster service.
The agent calls call_model(), which sends the conversation to the LLM.
The LLM decides to call the integrate tool.
The agent dispatches the tool call over MCP to the calculus-helper server.
The MCP server runs SymPy and returns the result.
The agent feeds the result back to the LLM, which formats the answer.
The response streams back through the gateway and UI server to the browser.

The answer should show something like: sin(x)^2 / 2 + C.

Try a few more to exercise the full stack:

"Differentiate ln(x^2 + 1)" -- tests the differentiate tool.
"Evaluate the integral of 1/x from 1 to e" -- tests definite integration.
"Find the second derivative of e^(2x)" -- tests higher-order derivatives.

If any request hangs or times out, check the route timeout annotation first. Then check pod logs for each component in the chain:

oc logs deployment/calculus-ui -n calculus-agent --context="$CTX" --tail=20
oc logs deployment/calculus-gateway -n calculus-agent --context="$CTX" --tail=20
oc logs deployment/calculus-agent -n calculus-agent --context="$CTX" --tail=20

Redeploying after changes¶

Each component follows a build-then-deploy cycle:

# Gateway
cd calculus-gateway
make build-openshift PROJECT=calculus-agent
make deploy PROJECT=calculus-agent

# UI
cd calculus-ui
make build-openshift PROJECT=calculus-agent
make deploy PROJECT=calculus-agent

You can redeploy any component independently. The gateway doesn't need to restart when you update the agent or MCP server -- it proxies to the service, which automatically routes to the new pod after a rollout.

Order of deployment

When deploying the full stack for the first time, deploy bottom-up: MCP server first, then agent, then gateway, then UI. Each layer needs the one below it to be running. For subsequent updates, deploy only the components that changed.

The gateway pattern¶

The three-tier architecture you've built (UI -- gateway -- agents) is a common pattern in production agent systems. Here's why each layer exists:

UI layer. Handles presentation, session state, user preferences, and proxies API calls to the gateway. It can be replaced (a mobile app, a CLI, a Slack bot) without touching anything behind the gateway.

Gateway layer. The single point of entry for all clients. It owns cross-cutting concerns: authentication, authorization, rate limiting, request logging, and routing. When you add a second agent (say, a physics solver), the gateway routes requests to the right backend based on the domain. Clients never talk directly to agents.

Agent layer. Pure reasoning. Each agent connects to its own set of MCP servers and tools. It exposes a standard /v1/chat/completions API. It doesn't know about auth tokens, rate limits, or which UI is calling it.

This separation means you can scale, secure, and update each layer independently. The gateway can enforce a 100-request-per-minute limit without the agent knowing. The agent can swap its LLM backend without the UI caring. The UI can be redesigned without touching any backend code.

What's next¶

You have a complete user-facing system: browser to UI to gateway to agent to MCP server. In Module 6, you'll add a code execution sandbox that lets the agent write and run Python -- giving it the ability to do numerical computation, generate data tables, and go beyond what symbolic tools alone can handle.