Uvicorn Scaling Analysis — titan-killer.service

Research only. Do NOT auto-ship. Dated 2026-04-22. Author: Claude Code research agent. Stephanie format: Problem / Impact / Solution options / Data / Need.

PROBLEM: Single-worker uvicorn is a throughput ceiling

titan-killer.service runs uvicorn app:app --host 0.0.0.0 --port 8765 with the default one worker process. Python’s GIL serialises CPU-bound work inside a single process, so any blocking DB call or CPU-bound handler stalls every other in-flight request on the box. Under current load that is fine; under pre-launch traffic (homepage + emergency + sewer camera indexed + ad spend ramp) it is the most likely first point of failure.
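
To make the stall concrete, here is a minimal sketch with hypothetical handlers (not taken from app.py): one blocking call parks the single event loop, and every other request on that process waits behind it.

# sketch: why one blocking call stalls a single-worker uvicorn process
# (hypothetical handlers; the real routes live under /opt/nexus/titan/api/)
import time
from fastapi import FastAPI

app = FastAPI()

@app.get("/slow")
async def slow():
    # a synchronous DB call or CPU-bound step here blocks the one event loop;
    # time.sleep stands in for it
    time.sleep(2)
    return {"ok": True}

@app.get("/fast")
async def fast():
    # while /slow is blocking, this cannot run either: both handlers share
    # the single worker's event loop
    return {"ok": True}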

The fix is not mechanical. app.py mounts 100+ routers, including api.websocket (/ws/dispatch) and api.sse_stream (/api/stream), both of which assume single-process, shared in-memory Python state. Flipping to --workers 2 without additional plumbing silently breaks real-time dispatch updates.

IMPACT: What breaks when you just add --workers 2

File | Module-level state | Breakage class | User-visible symptom
api/websocket.py L76 | connected_boards: List[WebSocket] = [] | CORRECTNESS | Dispatch board broadcast reaches ~50% of boards. The POST /api/dispatch/broadcast call runs in one worker; boards connected to the other worker never see the event. Silent 50% event loss.
api/sse_stream.py | per-client last_job_count, last_customer_count locals (poll every 3s) | DUPLICATED WORK | Each worker polls PG independently per connected SSE client: N workers × M clients × 20 queries/min. At 2 workers + 3 dashboards that is 120 qpm of redundant COUNT(*) and revenue SUMs. Not wrong, just wasteful.
api/customer_lookup.py L18 | _HCP_CACHE = {"loaded": False, "data": {}} (full HCP dump, loaded once per process) | MEMORY | ~5-15 MB of HCP JSON per worker on the first /api/customers/lookup hit. 2 workers = 2x load, 2x RSS. Functionally fine; it is a cold cache that eventually warms in both processes.
api/dns_list.py L22 | _HCP_DNS_CACHE with 300s TTL | CACHE MISS AMP | Each worker caches independently, so the cache miss rate doubles. Not a correctness bug.
api/prompt_regenerate.py L106 | _figma_cache = {} (5 min TTL) | CACHE MISS AMP | Figma API rate-limit risk doubles per worker. At 2 workers, a single Figma pull becomes 2 pulls worst case.
api/zeus_rag.py L165 | _embedding_cache = {} | CACHE MISS AMP | OpenAI embedding cost doubles worst case while warming. Warm state: identical.
~38 routers with DB_CONFIG, *_MAP, *_KEYWORDS, SEGMENTS, VALID_STAGES, etc. | read-only module constants | SAFE | No mutation = no divergence. Always worker-safe.
api/parallel_research.py L137 | _DISPATCH = {"perplexity": ..., "openai": ..., "claude": ...} | SAFE | Static function registry. Read-only.
api/plaid.py | tokens loaded via load_tokens() from file per request | SAFE | File-backed, re-read per call. Safe.

Summary: 1 correctness-critical module (websocket), 1 duplicated-work module (sse_stream), 4 cache-miss-amplification modules (harmless but wasteful), ~90 safe modules.
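
For reference, the shape of the one correctness-critical pattern, a minimal sketch reconstructed from the audit above (not the literal file contents):

# sketch: current single-process pattern in api/websocket.py (reconstructed, not verbatim)
from typing import List
from fastapi import APIRouter, WebSocket

router = APIRouter()
connected_boards: List[WebSocket] = []    # module-level: one list per worker process

@router.websocket("/ws/dispatch")
async def dispatch_websocket(ws: WebSocket):
    await ws.accept()
    connected_boards.append(ws)           # registered only in the worker that accepted it

@router.post("/api/dispatch/broadcast")
async def broadcast(event: dict):
    # loops over THIS worker's list only; boards held by the other worker
    # never receive the event -> the silent ~50% loss in the table
    for board in connected_boards:
        await board.send_json(event)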

SOLUTION: Migration paths — four options

OPTION A (recommended): Stay at 1 worker, harden the single process

Why: current traffic does not justify the complexity bill of any multi-worker fix. titan-killer peak CPU is <15% per systemctl status. We are nowhere near saturation.

What to do:

  • Add --loop uvloop --http httptools to ExecStart for +30% single-process throughput at zero risk.
  • Push the heavy DB-bound handlers (zeus search, GA4, financial_validator) to async-native DB access via asyncpg so the single event loop stops blocking on queries (see the sketch after this list).
  • Move long CPU-bound jobs (RAG embeddings, PDF generation) out of the request path into the existing systemd timers or /usr/local/bin/task_queue. Many already are.
  • Set explicit --limit-concurrency 200 and --backlog 4096 to fail fast instead of queueing forever.
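
A minimal sketch of the asyncpg pattern from the second bullet, with a hypothetical route body, DSN, and table names (the real handlers and DB_CONFIG live in the audited routers):

# sketch: async-native DB access with asyncpg (hypothetical DSN and table names)
import asyncpg
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def open_pool():
    # one pool per process; queries await on the socket instead of
    # blocking the event loop the way a sync driver call would
    app.state.pg = await asyncpg.create_pool(
        dsn="postgresql://titan:***@localhost/titan",   # placeholder DSN
        min_size=1, max_size=10,
    )

@app.get("/api/zeus/search")
async def zeus_search(q: str):
    async with app.state.pg.acquire() as conn:
        rows = await conn.fetch(
            "SELECT id, title FROM jobs WHERE title ILIKE $1", f"%{q}%"  # placeholder query
        )
    return [dict(r) for r in rows]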

Cost: 1 engineer-hour. Risk: near zero (tunables only).
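
The tunables themselves fit in a one-file systemd drop-in. A sketch, with the uvicorn path assumed (match it to the interpreter/venv the current ExecStart uses); uvloop and httptools must be installed in that environment first:

# /etc/systemd/system/titan-killer.service.d/override.conf  (sketch; binary path assumed)
[Service]
ExecStart=
ExecStart=/usr/local/bin/uvicorn app:app --host 0.0.0.0 --port 8765 \
    --loop uvloop --http httptools \
    --limit-concurrency 200 --backlog 4096

Apply with systemctl daemon-reload, then restart the unit.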

OPTION B: Gunicorn + UvicornWorker, 2 workers, with websocket rewrite

Prereq change: replace the connected_boards list in websocket.py with a Redis pub/sub fan-out:

# websocket.py rewrite (sketch)
import json

import redis.asyncio as aioredis
from fastapi import APIRouter, WebSocket, WebSocketDisconnect

router = APIRouter()
REDIS = aioredis.from_url("redis://localhost:6379")

@router.websocket("/ws/dispatch")
async def dispatch_websocket(ws: WebSocket):
    await ws.accept()
    pubsub = REDIS.pubsub()
    await pubsub.subscribe("dispatch:events")
    try:
        # relay every published dispatch event to this connected board
        async for msg in pubsub.listen():
            if msg["type"] == "message":
                await ws.send_text(msg["data"].decode())
    except WebSocketDisconnect:
        pass
    finally:
        await pubsub.unsubscribe("dispatch:events")

@router.post("/api/dispatch/broadcast")
async def broadcast(event: dict):
    # one publish; every worker's subscribed boards receive it
    await REDIS.publish("dispatch:events", json.dumps(event))
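
With this shape the HTTP broadcast can land on either worker: Redis fans the event out, each worker forwards it to the boards it holds, and the silent ~50% loss from the impact table goes away.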

ExecStart becomes: gunicorn app:app -k uvicorn.workers.UvicornWorker -w 2 -b 0.0.0.0:8765 --timeout 120. Redis is already installed on the VM (confirm with redis-cli ping).

Cost: 4-6 engineer-hours (rewrite + test dispatch board under 2 workers). Risk: medium — websocket regression would be silently partial. Requires Playwright two-client test before ship.

OPTION C: nginx sticky sessions on /ws/, 2 workers, no code change

nginx sits in front of uvicorn (port 80/443 → 8765). Add ip_hash on the upstream block so each client IP pins to a worker (in practice this means running each worker as its own uvicorn process on its own port, so nginx has two upstream entries to hash across). Broadcasts still lose 50% of events across workers, but each WebSocket client continues to see ITS worker’s state.

This only works if broadcasts originate from inside the same worker that owns the websocket. In our case, POST /api/dispatch/broadcast comes from an HTTP client (dashboard) that may land on a different worker than the websocket client → still drops 50% of events.

Verdict: sticky sessions alone do NOT fix this. Need Redis (option B) anyway.

OPTION D: Split websocket/SSE out into a dedicated single-worker service

Run two systemd units:

  • titan-killer.service: uvicorn app:app, 2 workers, serves everything EXCEPT /ws/* and /api/stream/*
  • titan-realtime.service: uvicorn app_realtime:app, 1 worker, serves only /ws/* + /api/stream/* on port 8766
  • nginx routes /ws/ and /api/stream/ to :8766, everything else to :8765

Cost: 2-3 hours (new app_realtime.py factoring, nginx config, systemd unit). Risk: low — isolates the fragile part. No Redis needed. Broadcasts to :8766 still work in a single process. HTTP routes get 2x parallelism on :8765.

Gotcha: the POST /api/dispatch/broadcast HTTP route is in websocket.py, so it must also live on :8766. nginx config handles that.
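
A sketch of that nginx split, assuming the existing server block currently proxies everything to 8765 (adjust to the real vhost file):

# nginx vhost sketch: realtime routes to :8766, everything else to :8765
location /ws/ {
    proxy_pass http://127.0.0.1:8766;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;            # keep long-lived websockets open
}

location /api/stream/ {
    proxy_pass http://127.0.0.1:8766;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_buffering off;                 # never buffer SSE
    proxy_read_timeout 3600s;
}

location /api/dispatch/broadcast {
    proxy_pass http://127.0.0.1:8766;    # the gotcha above: broadcast must hit the realtime worker
}

location / {
    proxy_pass http://127.0.0.1:8765;
}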

DATA: Current load + saturation headroom

Metric | Current | Source
titan-killer memory RSS | 144 MB | systemctl status titan-killer.service
titan-killer CPU (9 min uptime) | 12.5 CPU-sec ≈ 2.3% average | same
Requests per minute (steady state) | est. 20-40 rpm | journalctl -u titan-killer --since "1 hour ago" | grep "HTTP/1.1"
Longest recurring handler | /api/zeus/search ≈ 400-1200 ms | journal timing
Connected WebSockets (peak seen) | typically 0-2 (dispatch boards) | logged via dispatch/status
Single-worker concurrency ceiling | ~100 in-flight (uvicorn default backlog) | uvicorn defaults

Conclusion: we are operating at <3% of single-worker capacity. There is no throughput case for multi-worker today. The case IS for future-proofing and for not being surprised by a Slack attention-spike when the new location tree ships.

Router state audit executed 2026-04-22 10:30 CT:

grep -HnE "connected_|_CACHE|_cache\s*=|_registry\s*=|active_connections|subscribers" \
     /opt/nexus/titan/api/*.py | grep -v "def\s"

# 1 correctness risk:   websocket.py:76  connected_boards
# 5 cache-amp cases:    customer_lookup.py, dns_list.py, prompt_regenerate.py,
#                       zeus_rag.py, plus sse_stream.py poll state
# ~90 routers read-only: DB_CONFIG, *_MAP, *_KEYWORDS constants (safe)

NEED: Recommendation for Robert

Ship Option A (tunables only, stay 1 worker) NOW. Keep Option D in your back pocket for when traffic 10x’s.

  • Zero risk, 1 hour of work. Add --loop uvloop --http httptools --limit-concurrency 200.
  • Defer multi-worker until we see sustained CPU > 60% or p95 > 2s on /api/zeus/search. Neither is happening.
  • When we do need it: Option D (service split) beats Redis pub/sub. Lower risk, isolates the fragile websocket, no new infra dep.
  • Option B (Redis fan-out) is only the right answer if we end up with multiple VM nodes behind a load balancer. We’re one VM. Don’t pay that cost yet.

Gate before any multi-worker change: Rule 0 preflight + Rule 4 pre-commit check + Playwright two-client dispatch board test (open two browser windows, broadcast, verify both see the event). Log to MH before and after. This is NOT a 5-minute ship.
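
A sketch of that two-client gate using Playwright's sync API; the board URL, event payload, and selector below are placeholders, not the real dashboard markup:

# sketch: two-client dispatch board test (placeholder URL, payload, and selector)
import json
from playwright.sync_api import sync_playwright

BASE = "http://localhost:8765"

with sync_playwright() as p:
    browser = p.chromium.launch()
    # two isolated contexts = two independent browser windows / websocket clients
    board_a = browser.new_context().new_page()
    board_b = browser.new_context().new_page()
    board_a.goto(f"{BASE}/dispatch")      # placeholder board URL
    board_b.goto(f"{BASE}/dispatch")

    # fire the broadcast over HTTP, the same way the dashboard does
    board_a.request.post(
        f"{BASE}/api/dispatch/broadcast",
        data=json.dumps({"type": "test", "job_id": "playwright-check"}),
        headers={"Content-Type": "application/json"},
    )

    # both boards must render the event; if only one does, that is the multi-worker bug
    board_a.wait_for_selector("text=playwright-check", timeout=5000)
    board_b.wait_for_selector("text=playwright-check", timeout=5000)
    browser.close()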

Open question: Does Robert want me to proceed with Option A tunables now, or hold until after homepage + emergency page ship? My vote is hold. Don’t change anything in the service path during a live page build.

Logged to BSP_Master_Session_History.html as section id bsp-apr22-uvicorn-scaling-analysis. Source artifacts: /opt/nexus/titan/app.py, /opt/nexus/titan/api/websocket.py, /opt/nexus/titan/api/sse_stream.py, /etc/systemd/system/titan-killer.service.