9. File Uploads and Document Processing¶
Your hardened stack handles chat, tool calls, code execution, and observability.
The next thing real users will ask for is the ability to attach a document
and have the agent read it. PDFs of regulations. Spreadsheets of measurements.
Slide decks of meeting notes. In this module you'll add the file-upload track
end-to-end: the agent's /v1/files endpoint, the streaming gateway proxy, the
drag-drop UI, Docling-based parsing, and a ClamAV sidecar for virus scanning.
What you'll build¶
Browser ──▶ UI ──────▶ Gateway ──────────▶ Agent
            (drag,     (size cap +         (POST /v1/files,
             paste,     MIME allowlist,     Docling parse,
             chip)      streaming proxy)    FileStore persist)
                                                │
                                                ▼
                                           ClamAV sidecar
                                           (localhost:8088)
The pipeline runs in this order on every upload: size cap → MIME sniff → virus scan → parse → persist. Each layer is independent — disabling the scanner doesn't change the parser, swapping the FileStore backend doesn't change the gateway. By the end of this module a user will be able to drop a PDF onto the chat input, watch a progress chip while it uploads, and ask questions whose answers come from the document's contents.
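The layer independence described above can be sketched as an ordered list of stages, where disabling a layer just means leaving it out of the list. This is a minimal illustration, not the agent server's code: `Stage`, `size_cap`, `pdf_only`, and `run_pipeline` are hypothetical names, and the two stages are toy stand-ins for the real size and MIME checks.

```python
from typing import Callable, Optional

# A stage returns None to accept, or (http_status, reason) to reject.
Stage = Callable[[bytes], Optional[tuple[int, str]]]


def size_cap(limit: int) -> Stage:
    """Toy size check: reject payloads longer than `limit` bytes."""
    return lambda data: (413, "file too large") if len(data) > limit else None


def pdf_only(data: bytes) -> Optional[tuple[int, str]]:
    """Toy MIME check: accept only payloads that start with the PDF marker."""
    return None if data.startswith(b"%PDF-") else (415, "type not allowed")


def run_pipeline(data: bytes, stages: list[tuple[str, Stage]]) -> tuple[int, list[str]]:
    """Run stages in order and stop at the first rejection.

    Returns the resulting status plus the names of the stages that ran,
    so it is easy to see that later stages never saw a rejected upload.
    """
    ran: list[str] = []
    for name, stage in stages:
        ran.append(name)
        verdict = stage(data)
        if verdict is not None:
            return verdict[0], ran
    return 200, ran


stages = [("size", size_cap(10)), ("mime", pdf_only)]
status, ran = run_pipeline(b"%PDF-1.4", stages)
```

Because each stage only sees the bytes and returns a verdict, removing the scanner or swapping the persistence step changes one list entry and nothing else.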
Use cases¶
| Use case | What the agent does |
|---|---|
| Document Q&A | "Summarise sections 3 and 4 of this regulation" |
| Report analysis | "Pull the action items from the attached meeting notes" |
| Data ingestion | "Validate this CSV against the schema and flag anomalies" |
| Compliance review | "Cross-check this contract against the policy library" |
In every case the agent's job is the same: receive an opaque file_id,
retrieve the extracted text, and reason over it. The agent server handles
everything between "user clicked attach" and "extracted text is ready".
Configuring the agent server¶
File uploads are off by default. Toggle them on with three changes to
agent.yaml:
server:
  storage:
    backend: ${STORAGE_BACKEND:-sqlite}       # files needs persistence
    sqlite_path: ${SQLITE_PATH:-./agent.db}
  files:
    enabled: ${FILES_ENABLED:-true}
    backend: ${FILES_BACKEND:-sqlite}         # sqlite | postgres | "" (Null)
    max_file_size_bytes: 52428800             # 50 MiB
    bytes_dir: ./files                        # PVC mount in production
    allowed_mime_types: []                    # empty defers to gateway
    scanner:
      url: "${FILES_SCANNER_URL:-}"           # ClamAV sidecar URL
      timeout_seconds: 30.0
      fail_mode: ${FILES_SCANNER_FAIL_MODE:-open}
The [files] extra pulls in Docling (text extraction) and
python-magic (content-based MIME sniffing). It adds ~5 GB to the
container image because of Docling's torch and transformers dependencies, so
keep it opt-in and leave it out of agents that don't ingest files.
Rebuild the container with make build after enabling the extra. The
endpoint surface comes online automatically:
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/files | Upload a file (multipart) |
| GET | /v1/files?session_id=<id> | List a session's files |
| GET | /v1/files/{file_id} | Fetch metadata for a single file |
| DELETE | /v1/files/{file_id} | Remove metadata + bytes |
A successful upload returns the JSON metadata record:
{
  "file_id": "file_abc123def456...",
  "filename": "report.pdf",
  "mime_type": "application/pdf",
  "size_bytes": 124567,
  "sha256": "9f2b...",
  "parse_status": "completed",
  "session_id": "s_42",
  "created_at": "2026-04-28T18:09:42+00:00"
}
Hold on to that file_id. Pass it back to /v1/chat/completions via the
file_ids array, and the agent server injects the file's extracted text into
the conversation before the LLM sees the user's message.
Here's what a request with file references looks like (you'll run this for real in the lab exercise below):
curl http://my-agent:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"messages": [{"role": "user", "content": "What are the action items?"}],
"file_ids": ["file_abc123def456..."]
}'
No tool call required. The injection is automatic.
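For callers that build this request programmatically, a small Python helper avoids the shell-quoting pitfalls of embedding JSON in a curl command. This is a sketch using only the standard library; `build_chat_request` is a hypothetical helper name, and the base URL is whatever your agent listens on.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, prompt: str, file_ids: list[str]
) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request that references files."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        # file_ids come from earlier POST /v1/files responses.
        "file_ids": file_ids,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8080", "What are the action items?", ["file_abc123"]
)
```

Passing `req` to `urllib.request.urlopen` sends it; the answer text then sits at `choices[0].message.content` in the decoded JSON response.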
Docling: format support and parser semantics¶
PlaintextParser handles text/*, application/json, and a small structured-
text allowlist (markdown, csv, yaml). Everything else routes to Docling,
which converts a wide range of formats to clean Markdown:
| Format | MIME types | What Docling extracts |
|---|---|---|
| PDF | application/pdf | Text, tables, headings, page structure |
| Office documents | application/vnd.openxmlformats-officedocument.* | Word, Excel, PowerPoint content |
| HTML | text/html | Stripped of nav/script, content preserved |
| Images | image/png, image/jpeg | OCR via the bundled OCR engine |
Parsing runs inline — the upload response only returns once
parse_status is completed (or failed). For large documents this can
take seconds. The agent server runs Docling under asyncio.to_thread so it
doesn't block the event loop, but if your typical upload is a 200-page PDF
you'll want background-queue parsing (still a future agent-server feature
at the time of writing).
parse_status transitions:
- pending: bytes uploaded, parsing not yet attempted
- processing: parse in flight
- completed: extracted_text is populated
- failed: parse_error is populated
- skipped: file type intentionally not parsed (binary, unknown)
A failed parse still persists the upload — the user gets a file_id they
can reference, and a tool can still inspect the raw bytes via
get_bytes(file_id).
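Client code that inspects parse_status may find it convenient to model the five values as an enum. The sketch below does that; the status names come from the list above, but the transition graph in `ALLOWED` is an assumption inferred from the descriptions and is not taken from the agent server's source.

```python
from enum import Enum


class ParseStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"


# Assumed transitions: a file starts pending, is either parsed or skipped,
# and a parse in flight ends in completed or failed.
ALLOWED: dict[ParseStatus, set[ParseStatus]] = {
    ParseStatus.PENDING: {ParseStatus.PROCESSING, ParseStatus.SKIPPED},
    ParseStatus.PROCESSING: {ParseStatus.COMPLETED, ParseStatus.FAILED},
    ParseStatus.COMPLETED: set(),
    ParseStatus.FAILED: set(),
    ParseStatus.SKIPPED: set(),
}


def can_transition(current: ParseStatus, new: ParseStatus) -> bool:
    """True when `new` is a legal next status under the assumed graph."""
    return new in ALLOWED[current]


def is_terminal(status: ParseStatus) -> bool:
    """Terminal statuses have no outgoing transitions."""
    return not ALLOWED[status]
```

Since parsing runs inline, an upload response should normally carry one of the terminal values; `ParseStatus(record["parse_status"])` converts the JSON string into the enum.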
Persistence: SQLite, Postgres, and bytes layout¶
The FileStore ABC has three concrete backends today:
- null: uploads accepted then discarded (smoke testing, demos).
- sqlite: metadata in SQLite, bytes on local FS sharded by file_id prefix. Single-replica only.
- postgres: metadata in Postgres, bytes still on local FS. Use when the agent runs alongside other Postgres-backed features (sessions, feedback, traces) so you only run one database.
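To make "sharded by file_id prefix" concrete, here is one way such a layout could be computed. The exact scheme the agent server uses is not shown in this module, so treat the two-character shard width, the stripping of the `file_` prefix, and the `sharded_path` name as assumptions made for illustration.

```python
from pathlib import PurePosixPath


def sharded_path(bytes_dir: str, file_id: str, width: int = 2) -> PurePosixPath:
    """Place a file under a subdirectory named after the start of its ID.

    Sharding keeps any single directory from accumulating every upload.
    The common "file_" prefix is skipped (assumption) so that shards are
    derived from the part of the ID that actually varies.
    """
    stem = file_id.removeprefix("file_")
    if len(stem) < width:
        raise ValueError(f"file_id too short to shard: {file_id!r}")
    return PurePosixPath(bytes_dir) / stem[:width] / file_id


path = sharded_path("/data/files", "file_abc123def456")
```

Whatever the real layout is, everything under bytes_dir lives on the local filesystem, which is why the sqlite backend is single-replica only.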
Both sqlite and postgres write bytes to bytes_dir, which must be a
PVC in production. Without one, every pod restart loses uploaded files.
The reference Helm values put a sane default in place:
# chart/values.yaml (production fragment)
config:
  STORAGE_BACKEND: postgres
  # DATABASE_URL is injected from a Secret (see Module 8). Use the
  # postgresql:// scheme; the actual credentials never live in values.yaml.
  FILES_ENABLED: "true"
  FILES_BACKEND: postgres
files:
  enabled: true
  backend: postgres
  # bytes_dir defaults to ./files inside the container; mount a PVC there.
S3-compatible bytes backend
The agent server supports an S3-compatible BytesStore backend
(bytes_backend.type: s3 in agent.yaml). Use it for multi-replica
deployments so pods don't need a shared ReadWriteMany PVC.
MinIO as the bytes target¶
The recommended S3-compatible setup is a MinIO deployment alongside the agent:
helm install minio bitnami/minio \
-n calculus-agent \
--set auth.rootUser=agent --set auth.rootPassword='<from-secret>' \
--set defaultBuckets=agent-files
Then point the agent at the MinIO service in agent.yaml:
files:
  bytes_backend:
    type: s3
    s3:
      endpoint: http://minio:9000
      bucket: agent-files
      access_key_secret: minio-credentials
MinIO is on-cluster, supports the S3 API, and works with the FIPS-validated TLS settings discussed in Module 8.
Security: defense in depth¶
User-uploaded files are an obvious attack surface. The stack runs three independent security layers, and the layered behavior is not an accident — each one catches a class of failure the others can miss.
Layer 1: Size cap (gateway and agent)¶
The gateway enforces GATEWAY_FILES_MAX_BYTES (default 25 MiB) by
inspecting Content-Length and falling back to a MaxBytesReader mid-
stream. Requests over the cap return 413 before any byte reaches the
agent. The agent then enforces a second cap via
server.files.max_file_size_bytes (default 50 MiB) — set the agent cap
higher than the gateway cap so honest uploads aren't double-rejected.
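The two-step check (declared length first, then a running count while streaming) can be sketched in Python as a generator. The gateway itself is not written this way; this is an analogue of the Content-Length plus MaxBytesReader pattern, and `iter_capped` and `PayloadTooLarge` are hypothetical names.

```python
import io
from typing import BinaryIO, Iterator, Optional


class PayloadTooLarge(Exception):
    """Would map to an HTTP 413 response in a real handler."""


def iter_capped(
    stream: BinaryIO,
    limit: int,
    content_length: Optional[int] = None,
    chunk_size: int = 64 * 1024,
) -> Iterator[bytes]:
    """Yield chunks from `stream`, raising once more than `limit` bytes are seen."""
    # Fast path: reject on the declared length before reading any body bytes.
    if content_length is not None and content_length > limit:
        raise PayloadTooLarge(f"declared {content_length} > {limit}")
    seen = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        seen += len(chunk)
        # Fallback: the header may be absent or wrong, so count real bytes.
        if seen > limit:
            raise PayloadTooLarge(f"read {seen} > {limit}")
        yield chunk


# Chunks are forwarded as they arrive; the generator itself holds no buffer.
forwarded = b"".join(iter_capped(io.BytesIO(b"hello"), limit=16))
```

The mid-stream check matters because Content-Length is client-supplied: a chunked or dishonest upload would sail past a header-only check.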
Layer 2: MIME validation¶
The gateway validates the multipart file part's declared Content-Type
against GATEWAY_FILES_ALLOWED_MIME (supports image/* wildcards). The
agent runs libmagic content sniffing on the bytes themselves and
re-validates against server.files.allowed_mime_types. A client cannot
rename evil.exe to report.pdf and lie about the MIME header — libmagic
reads magic bytes:
# fipsagents/server/files.py (abridged)
def detect_mime(data: bytes) -> str | None:
    # python-magic + libmagic; falls back to client claim with a warning
    # when libmagic isn't available.
    ...

# In the upload handler:
sniffed = detect_mime(data)
mime_type = sniffed or file.content_type or "application/octet-stream"
if files_cfg.allowed_mime_types and mime_type not in files_cfg.allowed_mime_types:
    raise HTTPException(status_code=415, detail=...)
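To see why content sniffing defeats a renamed file, the following self-contained sketch pairs a toy signature table with a wildcard-aware allowlist check. The `sniff` function is a stand-in for libmagic that knows only four signatures, and `mime_allowed` is a hypothetical helper showing how an `image/*` entry could be matched; neither is the project's implementation.

```python
from typing import Optional

# Toy stand-in for libmagic: a handful of well-known leading byte signatures.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"MZ", "application/x-dosexec"),
]


def sniff(data: bytes) -> Optional[str]:
    """Return a MIME type based on leading bytes, or None if unrecognised."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def mime_allowed(mime_type: str, allowlist: list[str]) -> bool:
    """Check a MIME type against an allowlist that may contain type/* entries."""
    if not allowlist:
        return True  # an empty list means this layer does not restrict
    mime = mime_type.split(";", 1)[0].strip().lower()
    for pattern in allowlist:
        pattern = pattern.strip().lower()
        if pattern.endswith("/*"):
            if mime.startswith(pattern[:-1]):  # "image/*" -> prefix "image/"
                return True
        elif mime == pattern:
            return True
    return False


# A Windows executable uploaded with a PDF Content-Type claim:
declared = "application/pdf"
effective = sniff(b"MZ\x90\x00") or declared or "application/octet-stream"
accepted = mime_allowed(effective, ["application/pdf", "image/*"])
```

The sniffed type takes precedence over the declared one, so the claim of `application/pdf` never reaches the allowlist check.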
Layer 3: Virus scanning (ClamAV sidecar)¶
For production deployments, run ClamAV as a sidecar container in the
same pod. The agent's HttpScanner POSTs each upload to a configurable URL
and expects a JSON response of {"infected": bool, "viruses": [...]}:
# chart/values.yaml: enable the sidecar
files:
  enabled: true
  virusScanner:
    enabled: true
    failMode: closed          # production: 503 the upload on scanner errors
    image:
      repository: clamav/clamav
      tag: stable
    persistence:
      enabled: true           # PVC for the signature database
      size: 2Gi
The Helm chart wires FILES_SCANNER_URL=http://localhost:8088/scan onto
the agent container automatically when both files.enabled and
files.virusScanner.enabled are true.
ClamAV requires an HTTP shim
The reference clamav/clamav:stable image ships clamd on TCP 3310 but
does not expose the {infected, viruses} JSON contract on its own.
Either build a custom image with a small FastAPI shim that wraps clamd,
or deploy a tiny adapter container alongside it. The agent server's
contract is documented at
packages/fipsagents/src/fipsagents/server/scanner.py.
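The core of such a shim is small: frame the upload for clamd's INSTREAM command and translate the one-line reply into the JSON shape the agent expects. The sketch below covers only those two pure functions and leaves out the socket and HTTP plumbing; `frame_instream` and `parse_clamd_reply` are hypothetical names, and the reply formats handled are the standard `stream: OK` and `stream: <signature> FOUND` lines.

```python
import struct


def frame_instream(data: bytes, chunk_size: int = 8192) -> bytes:
    """Encode `data` as a clamd INSTREAM request.

    Wire format: the null-terminated command, then chunks each prefixed
    with a 4-byte big-endian length, then a zero-length chunk to finish.
    """
    out = bytearray(b"zINSTREAM\0")
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        out += struct.pack("!I", len(chunk)) + chunk
    out += struct.pack("!I", 0)
    return bytes(out)


def parse_clamd_reply(reply: bytes) -> dict:
    """Translate a clamd reply line into {"infected": bool, "viruses": [...]}."""
    text = reply.rstrip(b"\0\n").decode("utf-8", "replace").strip()
    if text.endswith(" FOUND"):
        # e.g. "stream: Eicar-Signature FOUND"
        name = text[: -len(" FOUND")].split(": ", 1)[-1]
        return {"infected": True, "viruses": [name]}
    if text.endswith("OK"):
        return {"infected": False, "viruses": []}
    # Anything else (for example a size-limit ERROR) is a scanner failure.
    raise RuntimeError(f"clamd error: {text}")


request_bytes = frame_instream(b"abc", chunk_size=2)
```

A real shim would write `request_bytes` to a TCP socket on port 3310, read the reply, and return the parsed dict from its HTTP endpoint; raising on unexpected replies lets fail_mode decide what happens next.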
fail_mode controls behavior when the scanner is unreachable: open
(accept the upload, log a warning — fine for development) or closed
(return 503 — production-recommended). Set it to closed once you have
confidence in the sidecar's uptime.
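The interaction between the scanner response and fail_mode reduces to a small decision function. This sketch assumes a successful upload returns 200 and uses 422 for an infected file (the status the UI maps to a virus-scan rejection); `upload_status` is a hypothetical name, and `None` stands for "scanner unreachable or timed out".

```python
from typing import Optional


def upload_status(scan: Optional[dict], fail_mode: str = "open") -> int:
    """Map a scanner result and fail_mode to an HTTP status for the upload.

    `scan` is the parsed {"infected": bool, "viruses": [...]} response,
    or None when the scanner could not be reached.
    """
    if scan is None:
        if fail_mode == "closed":
            return 503  # refuse uploads we cannot scan
        if fail_mode == "open":
            return 200  # accept the upload; a real handler would log a warning
        raise ValueError(f"unknown fail_mode: {fail_mode!r}")
    return 422 if scan.get("infected") else 200


dev_result = upload_status(None, fail_mode="open")
prod_result = upload_status(None, fail_mode="closed")
```

Note that fail_mode only affects the unreachable case: an infected verdict is rejected in both modes.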
The gateway: streaming proxy, not just pass-through¶
The gateway-template's FilesUploadHandler is purpose-built for multipart:
it walks the inbound multipart synchronously up to the first file part,
validates its Content-Type against the allowlist, then streams the body
through to the agent via a multipart.Writer over an io.Pipe. The body
is never buffered in gateway memory.
This matters because the gateway is the customer-facing entry point and needs to fail fast on obviously-bad uploads. Validation happens before the upstream request fires, so a 415 response never costs a TCP connection to the agent. Configure via three env vars:
| Variable | Default | Description |
|---|---|---|
| GATEWAY_FILES_MAX_BYTES | 25m (25 MB) | Hard cap; supports k/m/g |
| GATEWAY_FILES_ALLOWED_MIME | unset | Comma-separated allowlist; image/* ok |
| GATEWAY_FILES_UPLOAD_TIMEOUT | 5m | Per-request deadline for backend POST |
The Helm chart's files block sets all three. Keep the gateway's cap at or
below the agent's cap and keep the two allowlists in sync, so a request the
gateway accepts never gets bounced by the agent on the next layer.
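When comparing the gateway's suffixed cap with the agent's plain byte count, it helps to convert both to integers. The parser below is a sketch that assumes k/m/g are binary multiples (so 25m equals 25 MiB); the module describes the default both as 25 MiB and as 25 MB, so check the gateway's actual behavior before relying on the exact figure. `parse_size` is a hypothetical helper.

```python
import re

# Assumption: suffixes are binary multiples (k = 1024, m = 1024**2, g = 1024**3).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Convert strings such as "25m", "512k", or "52428800" to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmg]?)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


gateway_cap = parse_size("25m")
agent_cap = parse_size("52428800")
gateway_fits = gateway_cap <= agent_cap
```

With the defaults from this module the gateway cap sits below the agent's 50 MiB cap under either interpretation of the suffix.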
The UI: drag, paste, and progress chips¶
The chat UI exposes file upload through three input affordances:
- Drag-and-drop — dragging files over the input shows a dashed-border drop zone overlay
- Paste — pasting a clipboard image or file from Finder triggers an upload alongside any typed message
- File picker — a paperclip button opens the OS file dialog
Each attachment renders as a chip with a real determinate progress bar
(via XMLHttpRequest's upload-progress events; fetch lacks them). The
chip stores {status: uploading | ready | failed, progress, file_id?,
error?}. Send is disabled while any chip is still uploading.
When the user submits, the UI snapshots readyFileIds() and includes them
on the chat-completion request body as file_ids: [...]. Failed chips
don't block sending — they have no file_id so they're effectively
no-ops.
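The chip rules above are easy to state as code. The UI itself is a browser application, so the following is only a Python model of the same logic; the field names mirror the chip shape described above, while `send_disabled` and `ready_file_ids` are illustrative function names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chip:
    status: str  # "uploading" | "ready" | "failed"
    progress: float = 0.0
    file_id: Optional[str] = None
    error: Optional[str] = None


def send_disabled(chips: list[Chip]) -> bool:
    """Send stays disabled while any upload is still in flight."""
    return any(chip.status == "uploading" for chip in chips)


def ready_file_ids(chips: list[Chip]) -> list[str]:
    """Collect file_ids for the request body; failed chips contribute nothing."""
    return [
        chip.file_id
        for chip in chips
        if chip.status == "ready" and chip.file_id is not None
    ]


chips = [
    Chip(status="ready", progress=1.0, file_id="file_a"),
    Chip(status="failed", error="File too large"),
]
ids = ready_file_ids(chips)
```

Taking the snapshot at submit time means a chip that finishes or fails afterwards cannot change the request that was already sent.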
Server-side errors surface directly on the chip:
| Status | Surfaced as |
|---|---|
| 413 | "File too large" |
| 415 | "File type not allowed" |
| 422 | "File rejected (virus scan)" |
| any other | JSON error field or "HTTP NNN" |
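The mapping in this table, including the fallback row, can be expressed as a short function. This is a Python rendering of the logic for clarity rather than the UI's own code; `chip_error` is a hypothetical name, and the fallback assumes the error body is a JSON object with a string `error` field.

```python
import json

_MESSAGES = {
    413: "File too large",
    415: "File type not allowed",
    422: "File rejected (virus scan)",
}


def chip_error(status: int, body: str = "") -> str:
    """Choose the text shown on a failed chip for a given HTTP response."""
    if status in _MESSAGES:
        return _MESSAGES[status]
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError, including empty bodies
        payload = None
    if isinstance(payload, dict) and isinstance(payload.get("error"), str):
        return payload["error"]
    return f"HTTP {status}"


message = chip_error(503, '{"error": "scanner unavailable"}')
```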
Configure the UI's pre-flight cap via two env vars surfaced through
/api/config:
# chart/values.yaml for the UI
files:
  maxBytes: "25m"                           # match the gateway
  allowedMime: "application/pdf,image/*"    # optional, defers to gateway when empty
Lab exercise: PDF Q&A on the calculus agent¶
Add a PDF Q&A capability to the calculus agent you've been building.
Step 1: Enable the upload track
Edit calculus-agent/agent.yaml:
server:
  storage:
    backend: sqlite
    sqlite_path: ./agent.db
  files:
    enabled: true
    backend: sqlite
    max_file_size_bytes: 52428800
    allowed_mime_types:
      - application/pdf
      - text/plain
Add the files extra to pyproject.toml under [project.optional-dependencies],
then reinstall. Before:
After:
Update the Containerfile so the container image installs the files extra.
Find the pip install line in the builder stage and change it:
Without this change, make build produces an image without Docling or
python-magic, and /v1/files will fail at runtime with an import error.
Then install locally. Make sure you are in the calculus-agent/ directory:
Step 2: Test the endpoint locally
Start the agent and wait for it to become healthy (backgrounding the process does not guarantee the HTTP port is open yet):
make run-local &
until curl -sf http://localhost:8080/healthz >/dev/null 2>&1; do sleep 1; done
echo "%PDF-1.4 stub" > /tmp/example.pdf
curl -F "file=@/tmp/example.pdf" http://localhost:8080/v1/files | jq
You should see a file_id and parse_status: "completed" (or skipped
for the stub).
Step 3: Try a real document
Use any PDF file you have available, or create a simple test file:
Upload the file and capture the returned file_id:
Verify the upload succeeded and the file ID was captured:
If this is empty, the upload failed -- check the curl output above for errors.
Now ask the agent a question referencing the uploaded file:
curl http://localhost:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-d "{
\"messages\": [{\"role\": \"user\", \"content\": \"Summarise this document in 3 bullets.\"}],
\"file_ids\": [\"$(cat /tmp/file_id)\"]
}" | jq -r '.choices[0].message.content'
The agent answers using the document's content. No tool call was needed -- the agent server injected the extracted text before the LLM saw the prompt.
Step 4: Verify the security controls
Try uploading something disallowed:
echo MZbinary > /tmp/evil.exe
curl -i -F "file=@/tmp/evil.exe;type=application/x-msdownload" http://localhost:8080/v1/files
You should see 415 Unsupported Media Type with a detail message
indicating the MIME type is not in the allowlist.
Try uploading something oversized:
dd if=/dev/zero of=/tmp/huge.pdf bs=1M count=100 2>/dev/null
curl -i -F "file=@/tmp/huge.pdf;type=application/pdf" http://localhost:8080/v1/files
You should see 413 Request Entity Too Large.
Step 5: Deploy with the gateway and UI
Multi-cluster safety
The remaining commands use $CTX for --kube-context and
--context flags. Set it to the name of your kubeconfig context
so you do not accidentally target the wrong cluster:
Once the agent is happy locally, redeploy. Make sure you're in the
calculus-agent/ directory:
oc start-build calculus-agent --from-dir=. --follow -n calculus-agent --context="$CTX"
oc rollout restart deployment/calculus-agent -n calculus-agent --context="$CTX"
Update the gateway and UI to enable file upload support. Navigate to the directory where you scaffolded these projects in Module 5:
helm upgrade calculus-gateway ../calculus-gateway/chart/ \
--kube-context="$CTX" -n calculus-gateway \
--reuse-values \
--set files.maxBytes=25m \
--set files.allowedMime="application/pdf,image/*"
helm upgrade calculus-ui ../calculus-ui/chart/ \
--kube-context="$CTX" -n calculus-ui \
--reuse-values \
--set files.maxBytes=25m
Chart paths
The ../calculus-gateway/chart/ and ../calculus-ui/chart/ paths
assume you're running from calculus-agent/ and that all projects
are sibling directories. Adjust the paths if your layout differs.
Open the UI, drag a PDF onto the chat input, watch the progress chip, then ask a question.
Verifying everything is wired up¶
Check that file uploads are enabled on the agent:
AGENT_ROUTE=$(oc get route calculus-agent --context="$CTX" -n calculus-agent -o jsonpath='{.spec.host}')
curl -sk "https://$AGENT_ROUTE/v1/agent-info" | jq '.server.files.enabled'
Check the gateway's configured caps:
Verify the UI exposes the config:
UI_ROUTE=$(oc get route calculus-ui --context="$CTX" -n calculus-ui -o jsonpath='{.spec.host}')
curl -sk "https://$UI_ROUTE/api/config" | jq
Round-trip test -- upload through the gateway and confirm the full path works. First, look up the gateway route:
GATEWAY_ROUTE=$(oc get route calculus-gateway --context="$CTX" -n calculus-gateway -o jsonpath='{.spec.host}')
echo "$GATEWAY_ROUTE"
Then upload a test file through it:
If any layer rejects, the error surfaces with a JSON message — read the status code first, then the body. The most common gotchas are gateway/agent allowlist mismatches and forgotten PVCs eating uploaded files on pod restart.
What's next¶
You now have file ingestion wired across the full stack. Some directions to take it further:
- Background-queue parsing for large documents (multi-hundred-page PDFs).
- Chunking + pgvector retrieval so the agent doesn't dump entire documents into context. Needs a fresh ADR before code.
- Custom parsers beyond Docling, for example the xml-analysis-framework for S1000D technical documentation, plugged in via the FileParser ABC.
- Per-tenant quotas so a single user can't monopolise the bytes PVC.
The patterns from this module — opt-in extras, layered security, streaming proxies, and explicit failure modes — apply equally to any new ingestion surface you bolt onto the agent. File uploads are the first; webhooks, message-queue ingestion, and scheduled pulls follow the same shape.