Agent Memory with MemoryHub¶
Throughout Modules 1--11, the calculus-agent ran with memory disabled
(NullMemoryClient). That was fine -- every request was self-contained, and
the agent didn't need to remember anything across sessions. A calculus tutor
that forgets everything between conversations is still useful, but it isn't
great -- it can't remember that a student prefers step-by-step solutions, that
they're working through a textbook chapter on Taylor series, or that they
asked about the same integral yesterday.
Cross-session memory matters when agents need to retain user preferences, prior results, or organizational knowledge. MemoryHub provides governed, scoped memory with semantic search: the agent writes facts into a shared store, and later retrieves them by meaning rather than exact match. The backend is a PostgreSQL + pgvector database fronted by an MCP server, so the agent talks to memory the same way it talks to any other tool.
This module covers two integration paths. Part 1 uses fips-agents -- what you've been building with since Module 1. Part 2 shows the same MemoryHub backend accessed through kagenti ADK's A2A extension system, which is useful context if your organization uses Kagenti (see Where to Go Next for background).
Prerequisites
- Modules 0--2 complete (working cluster, deployed calculus-agent)
- A MemoryHub instance -- either deployed to your cluster (this module walks through that) or accessible at a known URL
Part 1: fips-agents¶
This section adds MemoryHub to the calculus-agent you built in Modules 1--4. By the end, the agent will persist memories across sessions and retrieve them via semantic search.
Deploy MemoryHub¶
MemoryHub is an independent project that deploys its own namespace, database, and MCP endpoint. Clone the repository and install it:
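A sketch of the clone-and-install step; the repository URL is an assumption (substitute your organization's actual MemoryHub repo), but the `make install` target is the one described next:

```shell
# Hypothetical repository URL -- replace <org> with the real location
git clone https://github.com/<org>/memory-hub-mcp.git
cd memory-hub-mcp
make install
```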
The make install target creates the memory-hub-mcp namespace and deploys
four components: PostgreSQL with pgvector, the MemoryHub MCP server, an OAuth
service for credential management, and a Route for external access.
Verify the MCP endpoint is reachable:
```shell
curl -sf https://memory-hub-mcp-memory-hub-mcp.apps.<cluster>/mcp/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | python -m json.tool
```
Replace <cluster> with your cluster's apps domain. A successful response
lists MemoryHub's MCP tools (memory_search, memory_write, memory_list,
etc.).
Local development without a cluster deployment
If you already have a MemoryHub instance running elsewhere, you can skip
the deploy step entirely. Set MEMORYHUB_URL and MEMORYHUB_API_KEY as
environment variables pointing at the existing instance and proceed to the
wiring step.
Wire memory into calculus-agent¶
In your calculus-agent/ project, run the /add-memory slash command in
Claude Code. The slash command walks through the configuration changes
interactively:
- Creates or updates `.memoryhub.yaml` with `server_url` pointing at your MemoryHub MCP endpoint
- Updates `agent.yaml` to set `memory.backend: memoryhub` and `memory.config_path: .memoryhub.yaml`
- Updates the `Containerfile` to copy `.memoryhub.yaml` into the image
- Optionally adds MemoryHub as an MCP server so the LLM can call memory tools directly
The key configuration result is two files. In agent.yaml, add the memory:
block:
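Based on the settings the slash command applies (listed above), the block looks roughly like this:

```yaml
# agent.yaml (memory section)
memory:
  backend: memoryhub
  config_path: .memoryhub.yaml
```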
And the MemoryHub connection config:
```yaml
# .memoryhub.yaml (key fields)
server_url: https://memory-hub-mcp-memory-hub-mcp.apps.<cluster>/mcp/
memory_loading:
  mode: focused
  pattern: lazy
```
The mode: focused setting tells the agent to load only memories relevant to
the current conversation (via semantic search), rather than dumping the entire
memory store into context. The pattern: lazy setting defers memory loading
until the agent actually needs it, keeping cold-start times low.
Use memory in agent code¶
BaseAgent exposes the memory backend through self.memory. Two operations
cover the common case -- searching for relevant memories and writing new ones:
```python
# In your agent's step() method
results = await self.memory.search("calculus preferences", max_results=5)

await self.memory.write(
    content="User prefers step-by-step solutions",
    scope="user",
    weight=0.8,
)
```
The scope parameter controls visibility. MemoryHub uses a five-tier scoping
model:
| Scope | Visible to | Example |
|---|---|---|
| `user` | One user across all agents | "Prefers LaTeX notation" |
| `project` | All users in a project | "Project uses SI units" |
| `role` | All agents with a given role | "Calculus tutors show worked steps" |
| `organizational` | All agents in an org | "Company style guide rules" |
| `enterprise` | All agents everywhere | "Compliance: no PII in responses" |
The weight parameter (0.0--1.0) signals how important the memory is.
Higher-weight memories rank higher in search results and are less likely to be
pruned during curation.
Test it¶
Start the agent locally with memory enabled:
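Assuming the `make run` target used later in this section and an API-key setup, something like:

```shell
# MEMORYHUB_API_KEY is only needed if your instance requires authentication
export MEMORYHUB_API_KEY=<your-key>
make run
```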
Send a message that establishes a preference:
```shell
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "I prefer step-by-step solutions with LaTeX formatting."}]
  }' | python -m json.tool
```
Stop the agent (Ctrl+C), then restart it and ask a question that should
trigger memory retrieval:
```shell
make run

# In another terminal:
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is the integral of x^2?"}]
  }' | python -m json.tool
```
If memory is working, the response should reflect the stored preference -- step-by-step format with LaTeX. Verify directly against MemoryHub to confirm the memory was persisted:
```shell
curl -s https://memory-hub-mcp-memory-hub-mcp.apps.<cluster>/mcp/ \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "memory_search", "arguments": {"query": "preferences", "max_results": 5}},
    "id": 1
  }' | python -m json.tool
```
Curation rejection¶
MemoryHub doesn't accept every write blindly. A curation layer checks each incoming memory against existing content and configured rules before persisting it. If the write is too similar to an existing memory, or violates a curation rule, MemoryHub blocks it and returns a structured rejection.
Try it: write the same fact twice. The first write succeeds. The second triggers a near-duplicate rejection:
```python
# First write -- succeeds
await self.memory.write(
    content="User prefers step-by-step solutions",
    scope="user",
    weight=0.8,
)

# Second write -- blocked by curation
result = await self.memory.write(
    content="The user prefers step-by-step solutions",
    scope="user",
    weight=0.8,
)
```
The rejection response includes the reason, the similar memory's ID and similarity score, and a recommendation:
```json
{
  "blocked": true,
  "reason": "exact_duplicate",
  "detail": "Memory is 99% similar to existing memory b7ba6e8e-...",
  "nearest_score": 0.9869,
  "existing_memory_id": "b7ba6e8e-...",
  "recommendation": "update_existing"
}
```
The recommendation field tells the agent what to do instead -- in this case,
update the existing memory rather than creating a duplicate. The fips-agents
MemoryClient surfaces this as a structured return value; in kagenti ADK (Part
2), the equivalent is a MemoryRejectionError exception.
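As a sketch of how an agent might act on that structured return value, assuming the response shape shown above (what the agent does with the recommendation, such as updating the existing memory, depends on your MemoryClient's API and isn't covered here):

```python
def next_memory_action(result: dict) -> str:
    """Decide what to do with a write result from MemoryHub.

    Returns "stored" for an accepted write, otherwise the curation
    layer's recommendation (e.g. "update_existing").
    """
    if not result.get("blocked"):
        return "stored"
    # Follow the curation layer's recommendation instead of retrying blindly
    return result.get("recommendation", "skip")


rejection = {
    "blocked": True,
    "reason": "exact_duplicate",
    "nearest_score": 0.9869,
    "existing_memory_id": "b7ba6e8e-...",
    "recommendation": "update_existing",
}
print(next_memory_action(rejection))  # -> update_existing
print(next_memory_action({"id": "new-memory"}))  # -> stored
```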
This is the curation system working as designed. Without it, agents that write aggressively (common with LLM-driven memory) would fill the store with redundant entries, degrading search quality over time.
Deploy with memory enabled¶
Redeploy the agent to OpenShift:
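Assuming the same build-and-deploy flow as earlier modules (the Make targets and chart path here are assumptions; use whatever your project's Makefile provides):

```shell
# Rebuild and push the image, then roll out the chart with updated values
make build push
helm upgrade --install calculus-agent ./chart -n calculus-agent -f values.yaml
```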
The agent pod needs credentials to reach MemoryHub. There are two paths depending on your MemoryHub configuration:
API key (simple path): Create an OpenShift Secret with the key and reference it in your Helm values:
```shell
oc create secret generic memoryhub-creds \
  --from-literal=MEMORYHUB_API_KEY=<your-key> \
  -n calculus-agent
```
OAuth (production path): Create a Secret with the OAuth client credentials:
```shell
oc create secret generic memoryhub-creds \
  --from-literal=MEMORYHUB_AUTH_URL=https://auth.example.com/token \
  --from-literal=MEMORYHUB_CLIENT_ID=<client-id> \
  --from-literal=MEMORYHUB_CLIENT_SECRET=<client-secret> \
  -n calculus-agent
```
In both cases, update your Helm values to mount the Secret as environment variables in the agent pod.
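How that mount looks depends on your chart; with a standard Deployment template that supports `envFrom`, a sketch of the values addition (the key name is an assumption):

```yaml
# values.yaml (sketch -- the exact key depends on your chart's template)
envFrom:
  - secretRef:
      name: memoryhub-creds
```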
Silent fallback to NullMemoryClient
If the API key or OAuth credentials are missing or invalid, the agent does
not crash -- it silently falls back to NullMemoryClient and runs without
memory. This is safe but easy to miss. Check the pod logs for the string
memory disabled or NullMemoryClient after deployment to confirm memory
is actually active.
Part 2: The kagenti ADK approach¶
This section shows the same MemoryHub backend accessed through kagenti ADK's A2A extension system. You don't need to deploy anything new -- the MemoryHub instance from Part 1 is the same backend. The difference is in how credentials flow: fips-agents stores credentials agent-side (env vars or Secrets), while kagenti ADK pushes them client-side through a fulfillment pattern.
This is useful context if your organization uses Kagenti. See Where to Go Next for installation guidance and caveats -- this section covers only the MemoryHub integration, not the broader platform.
Install the dependency¶
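Per the comparison table later in this module, the integration ships as a pip extra:

```shell
pip install "kagenti-adk[memoryhub]"
```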
Server-side wiring¶
Kagenti ADK uses an Annotated injection pattern to declare that an agent
demands MemoryHub. The runtime resolves the demand at call time, injecting
a connected store instance:
```python
from typing import Annotated

from kagenti_adk.a2a.extensions import MemoryHubExtensionSpec
from kagenti_adk.server.store.memoryhub_memory_store import MemoryHubExtensionServer


@server.agent()
async def my_agent(
    input: Message,
    context: RunContext,
    memoryhub: Annotated[
        MemoryHubExtensionServer,
        MemoryHubExtensionSpec.single_demand(),
    ],
):
    store = memoryhub.store(context.context_id)
    results = await store.search("query", max_results=5)
    await store.create(
        "content",
        scope="project",
        project_id="my-project",
        weight=0.7,
    )
```
The single_demand() declaration tells the A2A framework that this agent
requires exactly one MemoryHub instance. The framework advertises this
requirement in the agent's A2A card, so clients know what to supply.
Client-side fulfillment¶
The client reads the agent's A2A card, discovers the MemoryHub demand, and supplies the URL and credentials in request metadata. This keeps secrets on the client side -- the server never stores them:
```python
from kagenti_adk.a2a.extensions import (
    MemoryHubExtensionClient,
    MemoryHubExtensionSpec,
    MemoryHubFulfillment,
)
from pydantic import SecretStr

spec = MemoryHubExtensionSpec.from_agent_card(agent_card)
metadata = MemoryHubExtensionClient(spec).fulfillment_metadata(
    memoryhub_fulfillments={
        "default": MemoryHubFulfillment(
            url="https://memoryhub.example.com/mcp/",
            api_key=SecretStr("..."),
        )
    }
)
```
The fulfillment metadata travels with the request. The server extracts it, connects to MemoryHub with the supplied credentials, and disposes of the connection when the request completes.
Environment variable fallback¶
For development and testing, kagenti ADK can resolve MemoryHub credentials from environment variables instead of client-side fulfillment:
| Variable | Purpose |
|---|---|
| `MEMORYHUB_URL` | MemoryHub MCP endpoint URL |
| `MEMORYHUB_API_KEY` | Static API key (simple path) |
| `MEMORYHUB_AUTH_URL` | OAuth token endpoint (OAuth path) |
| `MEMORYHUB_CLIENT_ID` | OAuth client ID |
| `MEMORYHUB_CLIENT_SECRET` | OAuth client secret |
When these variables are set, the server satisfies its own demand without requiring client-side fulfillment. This is convenient for local development but bypasses the security benefit of client-side credential management.
Comparison¶
Both approaches use the same MemoryHub backend and the same five-tier scoping model. The differences are in wiring and credential flow:
| Concern | fips-agents | kagenti ADK |
|---|---|---|
| Configuration | `agent.yaml` + `.memoryhub.yaml` | A2A extension declaration |
| Dependency | `memoryhub` (pip) | `kagenti-adk[memoryhub]` (pip) |
| Memory access | `self.memory.search()` / `.write()` | `store.search()` / `.create()` |
| Credential management | Agent-side (env vars or Secret) | Client-side (fulfillment metadata) |
| Backend | MemoryHub MCP | MemoryHub MCP |
| Scoping model | Same 5-tier scoping | Same 5-tier scoping |
| Curation rejection | Handled by `MemoryClient` | `MemoryRejectionError` exception |
The fips-agents path is simpler to set up and fits naturally into the
agent.yaml configuration model you've been using throughout the tutorial. The
kagenti ADK path is more flexible in multi-tenant environments where you don't
want agents storing credentials -- the client supplies them per-request.
What's next¶
The memory integration works with any agent that connects to MemoryHub -- the
backend is the same regardless of framework. If you're exploring multi-agent
architectures, the Where to Go Next page covers
Kagenti's broader platform capabilities including A2A communication, workload
identity, and MCP gateway routing. For fips-agents, the agent.yaml reference
documents additional memory backends
including SQLite and pgvector for environments where MemoryHub isn't available.