Uvicorn Scaling Analysis — titan-killer.service

Research only. Do NOT auto-ship. Dated 2026-04-22. Author: Claude Code research agent. Stephanie format: Problem / Impact / Solution options / Data / Need.

PROBLEM: Single-worker uvicorn is a throughput ceiling

titan-killer.service runs uvicorn app:app --host 0.0.0.0 --port 8765 with the default one worker process. Python’s GIL serialises CPU-bound work inside a single process, so any blocking DB call or CPU-bound handler stalls every other in-flight request on the box. Under current load that is fine; under pre-launch traffic (homepage + emergency + sewer camera indexed + ad spend ramp) it is the most likely first point of failure.
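
To make the stall concrete, here is a minimal sketch with hypothetical handlers (not taken from app.py): one blocking call parks the single event loop, and every other request on that process waits behind it.

# sketch: why one blocking call stalls a single-worker uvicorn process
# (hypothetical handlers; the real routes live under /opt/nexus/titan/api/)
import time
from fastapi import FastAPI

app = FastAPI()

@app.get("/slow")
async def slow():
    # a synchronous DB call or CPU-bound step here blocks the one event loop;
    # time.sleep stands in for it
    time.sleep(2)
    return {"ok": True}

@app.get("/fast")
async def fast():
    # while /slow is blocking, this cannot run either: both handlers share
    # the single worker's event loop
    return {"ok": True}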

The fix is not mechanical. app.py mounts 100+ routers, including api.websocket (/ws/dispatch) and api.sse_stream (/api/stream), both of which assume single-process, shared in-memory Python state. Flipping to --workers 2 without additional plumbing silently breaks real-time dispatch updates.

IMPACT: What breaks when you just add --workers 2

File | Module-level state | Breakage class | User-visible symptom
api/websocket.py L76 | connected_boards: List[WebSocket] = [] | CORRECTNESS | Dispatch board broadcast reaches ~50% of boards. The POST /api/dispatch/broadcast call runs in one worker; boards connected to the other worker never see the event. Silent 50% event loss.
api/sse_stream.py | per-client last_job_count, last_customer_count locals (poll every 3s) | DUPLICATED WORK | Each worker polls PG independently per connected SSE client: N workers × M clients × 20 queries/min. At 2 workers + 3 dashboards that is 120 qpm of redundant COUNT(*) and revenue SUMs. Not wrong, just wasteful.
api/customer_lookup.py L18 | _HCP_CACHE = {"loaded": False, "data": {}} (full HCP dump, loaded once per process) | MEMORY | ~5-15 MB of HCP JSON per worker on the first /api/customers/lookup hit. 2 workers = 2x load, 2x RSS. Functionally fine; it is a cold cache that eventually warms in both processes.
api/dns_list.py L22 | _HCP_DNS_CACHE with 300s TTL | CACHE MISS AMP | Each worker caches independently, so the cache miss rate doubles. Not a correctness bug.
api/prompt_regenerate.py L106 | _figma_cache = {} (5 min TTL) | CACHE MISS AMP | Figma API rate-limit risk doubles per worker. At 2 workers, a single Figma pull becomes 2 pulls worst case.
api/zeus_rag.py L165 | _embedding_cache = {} | CACHE MISS AMP | OpenAI embedding cost doubles worst case while warming. Warm state: identical.
~38 routers with DB_CONFIG, *_MAP, *_KEYWORDS, SEGMENTS, VALID_STAGES, etc. | read-only module constants | SAFE | No mutation = no divergence. Always worker-safe.
api/parallel_research.py L137 | _DISPATCH = {"perplexity": ..., "openai": ..., "claude": ...} | SAFE | Static function registry. Read-only.
api/plaid.py | tokens loaded via load_tokens() from file per request | SAFE | File-backed, re-read per call. Safe.

Summary: 1 correctness-critical module (websocket), 1 duplicated-work module (sse_stream), 4 cache-miss-amplification modules (harmless but wasteful), ~90 safe modules.
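
For reference, the shape of the one correctness-critical pattern, a minimal sketch reconstructed from the audit above (not the literal file contents):

# sketch: current single-process pattern in api/websocket.py (reconstructed, not verbatim)
from typing import List
from fastapi import APIRouter, WebSocket

router = APIRouter()
connected_boards: List[WebSocket] = []    # module-level: one list per worker process

@router.websocket("/ws/dispatch")
async def dispatch_websocket(ws: WebSocket):
    await ws.accept()
    connected_boards.append(ws)           # registered only in the worker that accepted it

@router.post("/api/dispatch/broadcast")
async def broadcast(event: dict):
    # loops over THIS worker's list only; boards held by the other worker
    # never receive the event -> the silent ~50% loss in the table
    for board in connected_boards:
        await board.send_json(event)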

SOLUTION: Migration paths — four options

OPTION A (recommended): Stay at 1 worker, harden the single process

Why: current traffic does not justify the complexity bill of any multi-worker fix. titan-killer peak CPU is <15% per systemctl status. We are nowhere near saturation.

What to do:

  • Add --loop uvloop --http httptools to ExecStart for +30% single-process throughput at zero risk.
  • Push the heavy DB-bound handlers (zeus search, GA4, financial_validator) to async-native DB access via asyncpg so the single event loop stops blocking on queries (see the sketch after this list).
  • Move long CPU-bound jobs (RAG embeddings, PDF generation) out of the request path into the existing systemd timers or /usr/local/bin/task_queue. Many already are.
  • Set explicit --limit-concurrency 200 and --backlog 4096 to fail fast instead of queueing forever.
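
A minimal sketch of the asyncpg pattern from the second bullet, with a hypothetical route body, DSN, and table names (the real handlers and DB_CONFIG live in the audited routers):

# sketch: async-native DB access with asyncpg (hypothetical DSN and table names)
import asyncpg
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def open_pool():
    # one pool per process; queries await on the socket instead of
    # blocking the event loop the way a sync driver call would
    app.state.pg = await asyncpg.create_pool(
        dsn="postgresql://titan:***@localhost/titan",   # placeholder DSN
        min_size=1, max_size=10,
    )

@app.get("/api/zeus/search")
async def zeus_search(q: str):
    async with app.state.pg.acquire() as conn:
        rows = await conn.fetch(
            "SELECT id, title FROM jobs WHERE title ILIKE $1", f"%{q}%"  # placeholder query
        )
    return [dict(r) for r in rows]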

Cost: 1 engineer-hour. Risk: near zero (tunables only).
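
The tunables themselves fit in a one-file systemd drop-in. A sketch, with the uvicorn path assumed (match it to the interpreter/venv the current ExecStart uses); uvloop and httptools must be installed in that environment first:

# /etc/systemd/system/titan-killer.service.d/override.conf  (sketch; binary path assumed)
[Service]
ExecStart=
ExecStart=/usr/local/bin/uvicorn app:app --host 0.0.0.0 --port 8765 \
    --loop uvloop --http httptools \
    --limit-concurrency 200 --backlog 4096

Apply with systemctl daemon-reload, then restart the unit.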

OPTION B: Gunicorn + UvicornWorker, 2 workers, with websocket rewrite

Prereq change: replace the connected_boards list in websocket.py with a Redis pub/sub fan-out:

# websocket.py rewrite (sketch)
import json

import redis.asyncio as aioredis
from fastapi import APIRouter, WebSocket, WebSocketDisconnect

router = APIRouter()
REDIS = aioredis.from_url("redis://localhost:6379")

@router.websocket("/ws/dispatch")
async def dispatch_websocket(ws: WebSocket):
    await ws.accept()
    pubsub = REDIS.pubsub()
    await pubsub.subscribe("dispatch:events")
    try:
        # relay every published dispatch event to this connected board
        async for msg in pubsub.listen():
            if msg["type"] == "message":
                await ws.send_text(msg["data"].decode())
    except WebSocketDisconnect:
        pass
    finally:
        await pubsub.unsubscribe("dispatch:events")

@router.post("/api/dispatch/broadcast")
async def broadcast(event: dict):
    # one publish; every worker's subscribed boards receive it
    await REDIS.publish("dispatch:events", json.dumps(event))
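
With this shape the HTTP broadcast can land on either worker: Redis fans the event out, each worker forwards it to the boards it holds, and the silent ~50% loss from the impact table goes away.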

ExecStart becomes: gunicorn app:app -k uvicorn.workers.UvicornWorker -w 2 -b 0.0.0.0:8765 --timeout 120. Redis is already installed on the VM (confirm with redis-cli ping).

Cost: 4-6 engineer-hours (rewrite + test dispatch board under 2 workers). Risk: medium — websocket regression would be silently partial. Requires Playwright two-client test before ship.

OPTION C: nginx sticky sessions on /ws/, 2 workers, no code change

nginx sits in front of uvicorn (port 80/443 → 8765). Add ip_hash on the upstream block so each client IP pins to a worker (in practice this means running each worker as its own uvicorn process on its own port, so nginx has two upstream entries to hash across). Broadcasts still lose 50% of events across workers, but each WebSocket client continues to see ITS worker’s state.

This only works if broadcasts originate from inside the same worker that owns the websocket. In our case, POST /api/dispatch/broadcast comes from an HTTP client (dashboard) that may land on a different worker than the websocket client → still drops 50% of events.

Verdict: sticky sessions alone do NOT fix this. Need Redis (option B) anyway.

OPTION D: Split websocket/SSE out into a dedicated single-worker service

Run two systemd units:

  • titan-killer.service: uvicorn app:app, 2 workers, serves everything EXCEPT /ws/* and /api/stream/*
  • titan-realtime.service: uvicorn app_realtime:app, 1 worker, serves only /ws/* + /api/stream/* on port 8766
  • nginx routes /ws/ and /api/stream/ to :8766, everything else to :8765

Cost: 2-3 hours (new app_realtime.py factoring, nginx config, systemd unit). Risk: low — isolates the fragile part. No Redis needed. Broadcasts to :8766 still work in a single process. HTTP routes get 2x parallelism on :8765.

Gotcha: the POST /api/dispatch/broadcast HTTP route is in websocket.py, so it must also live on :8766. nginx config handles that.
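
A sketch of that nginx split, assuming the existing server block currently proxies everything to 8765 (adjust to the real vhost file):

# nginx vhost sketch: realtime routes to :8766, everything else to :8765
location /ws/ {
    proxy_pass http://127.0.0.1:8766;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;            # keep long-lived websockets open
}

location /api/stream/ {
    proxy_pass http://127.0.0.1:8766;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_buffering off;                 # never buffer SSE
    proxy_read_timeout 3600s;
}

location /api/dispatch/broadcast {
    proxy_pass http://127.0.0.1:8766;    # the gotcha above: broadcast must hit the realtime worker
}

location / {
    proxy_pass http://127.0.0.1:8765;
}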

DATA: Current load + saturation headroom

Metric | Current | Source
titan-killer memory RSS | 144 MB | systemctl status titan-killer.service
titan-killer CPU (9 min uptime) | 12.5 CPU-sec ≈ 2.3% average | same
Requests per minute (steady state) | est. 20-40 rpm | journalctl -u titan-killer --since "1 hour ago" | grep "HTTP/1.1"
Longest recurring handler | /api/zeus/search ≈ 400-1200 ms | journal timing
Connected WebSockets (peak seen) | typically 0-2 (dispatch boards) | logged via dispatch/status
Single-worker concurrency ceiling | ~100 in-flight (uvicorn default backlog) | uvicorn defaults

Conclusion: we are operating at <3% of single-worker capacity. There is no throughput case for multi-worker today. The case IS for future-proofing and for not being surprised by a Slack attention-spike when the new location tree ships.

Router state audit executed 2026-04-22 10:30 CT:

grep -HnE "connected_|_CACHE|_cache\s*=|_registry\s*=|active_connections|subscribers" \
     /opt/nexus/titan/api/*.py | grep -v "def\s"

# 1 correctness risk:   websocket.py:76  connected_boards
# 5 cache-amp cases:    customer_lookup.py, dns_list.py, prompt_regenerate.py,
#                       zeus_rag.py, plus sse_stream.py poll state
# ~90 routers read-only: DB_CONFIG, *_MAP, *_KEYWORDS constants (safe)

NEED: Recommendation for Robert

Ship Option A (tunables only, stay 1 worker) NOW. Keep Option D in your back pocket for when traffic 10x’s.

  • Zero risk, 1 hour of work. Add --loop uvloop --http httptools --limit-concurrency 200.
  • Defer multi-worker until we see sustained CPU > 60% or p95 > 2s on /api/zeus/search. Neither is happening.
  • When we do need it: Option D (service split) beats Redis pub/sub. Lower risk, isolates the fragile websocket, no new infra dep.
  • Option B (Redis fan-out) is only the right answer if we end up with multiple VM nodes behind a load balancer. We’re one VM. Don’t pay that cost yet.

Gate before any multi-worker change: Rule 0 preflight + Rule 4 pre-commit check + Playwright two-client dispatch board test (open two browser windows, broadcast, verify both see the event). Log to MH before and after. This is NOT a 5-minute ship.
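
A sketch of that two-client gate using Playwright's sync API; the board URL, event payload, and selector below are placeholders, not the real dashboard markup:

# sketch: two-client dispatch board test (placeholder URL, payload, and selector)
import json
from playwright.sync_api import sync_playwright

BASE = "http://localhost:8765"

with sync_playwright() as p:
    browser = p.chromium.launch()
    # two isolated contexts = two independent browser windows / websocket clients
    board_a = browser.new_context().new_page()
    board_b = browser.new_context().new_page()
    board_a.goto(f"{BASE}/dispatch")      # placeholder board URL
    board_b.goto(f"{BASE}/dispatch")

    # fire the broadcast over HTTP, the same way the dashboard does
    board_a.request.post(
        f"{BASE}/api/dispatch/broadcast",
        data=json.dumps({"type": "test", "job_id": "playwright-check"}),
        headers={"Content-Type": "application/json"},
    )

    # both boards must render the event; if only one does, that is the multi-worker bug
    board_a.wait_for_selector("text=playwright-check", timeout=5000)
    board_b.wait_for_selector("text=playwright-check", timeout=5000)
    browser.close()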

Open question: Does Robert want me to proceed with Option A tunables now, or hold until after homepage + emergency page ship? My vote is hold. Don’t change anything in the service path during a live page build.

Logged to BSP_Master_Session_History.html as section id bsp-apr22-uvicorn-scaling-analysis. Source artifacts: /opt/nexus/titan/app.py, /opt/nexus/titan/api/websocket.py, /opt/nexus/titan/api/sse_stream.py, /etc/systemd/system/titan-killer.service.