Research only. Do NOT auto-ship. Dated 2026-04-22. Author: Claude Code research agent. Stephanie format: Problem / Impact / Solution options / Data / Need.
titan-killer.service runs `uvicorn app:app --host 0.0.0.0 --port 8765` with the default single worker process. One process means one event loop, and Python’s GIL serialises CPU-bound work on top of that: a blocking DB call or CPU-heavy handler in an async route stalls every other in-flight request on the box. Under current load it’s fine; under pre-launch traffic (homepage + emergency + sewer camera indexed + ad spend ramp) it is the most likely hot-spot to fail first.
The fix is not mechanical. app.py mounts 100+ routers including api.websocket (/ws/dispatch) and api.sse_stream (/api/stream), both of which assume single-process shared Python state. Flipping to `--workers 2` without additional plumbing silently breaks real-time dispatch updates.
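To make the failure mode concrete, here is a minimal sketch (hypothetical endpoints, not titan-killer code) of how one blocking call inside an `async def` route freezes the whole worker; this is what the asyncpg/task-queue items in Option A below address:

```python
# Hypothetical repro, not titan-killer code: /blocking stalls /ping.
import time
from fastapi import FastAPI

app = FastAPI()

@app.get("/blocking")
async def blocking():
    # A sync DB call (e.g. psycopg2) inside an async route blocks the
    # event loop itself: no other coroutine runs for these 2 seconds.
    time.sleep(2)
    return {"ok": True}

@app.get("/ping")
async def ping():
    # Unresponsive while /blocking sleeps: the single worker's loop is held.
    return {"pong": True}
```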
What breaks at `--workers 2`:

| File | Module-level state | Breakage class | User-visible symptom |
|---|---|---|---|
| api/websocket.py L76 | `connected_boards: List[WebSocket] = []` | CORRECTNESS | Dispatch board broadcast reaches ~50% of boards. The POST /api/dispatch/broadcast call runs in one worker; boards connected to the other worker never see the event. Silent 50% event loss. |
| api/sse_stream.py | per-client `last_job_count`, `last_customer_count` locals (poll every 3s) | DUPLICATED WORK | Each worker polls PG independently per connected SSE client: N workers × M clients × 20 queries/min. At 2 workers + 3 dashboards = 120 qpm of redundant COUNT(*) and revenue SUMs. Not wrong, just wasteful. |
| api/customer_lookup.py L18 | `_HCP_CACHE = {"loaded": False, "data": {}}` (full HCP dump, loaded once per process) | MEMORY | ~5-15 MB HCP JSON per worker on first /api/customers/lookup hit. 2 workers = 2x load, 2x RSS. Functionally fine; it’s a cold cache that eventually warms in both processes. |
| api/dns_list.py L22 | `_HCP_DNS_CACHE` with 300s TTL | CACHE MISS AMP | Each worker caches independently → cache miss rate doubles. Not a correctness bug. |
| api/prompt_regenerate.py L106 | `_figma_cache = {}` (5min TTL) | CACHE MISS AMP | Figma API rate-limit risk doubles per worker. At 2 workers, a single Figma pull becomes 2 pulls worst case. |
| api/zeus_rag.py L165 | `_embedding_cache = {}` | CACHE MISS AMP | OpenAI embedding cost doubles worst-case on first warming. Warm state: identical. |
| ~38 routers with DB_CONFIG, *_MAP, *_KEYWORDS, SEGMENTS, VALID_STAGES etc. | Read-only module constants | SAFE | No mutation = no divergence. Always worker-safe. |
| api/parallel_research.py L137 | `_DISPATCH = {"perplexity": ..., "openai": ..., "claude": ...}` | SAFE | Static function registry. Read-only. |
| api/plaid.py | tokens loaded via load_tokens() from file per request | SAFE | File-backed, re-read per call. Safe. |
Summary: 1 correctness-critical module (websocket), 1 duplicated-work module (sse_stream), 4 cache-miss-amplification modules (harmless but wasteful), ~90 safe modules.
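For context on the CORRECTNESS row, this is the standard in-process fan-out pattern; a minimal sketch reconstructed from the audit findings (not a verbatim copy of websocket.py):

```python
# Sketch of the current single-process pattern (reconstructed, not verbatim).
# Works with 1 worker; with 2+, each process has its OWN connected_boards list.
from typing import List
from fastapi import APIRouter, WebSocket, WebSocketDisconnect

router = APIRouter()
connected_boards: List[WebSocket] = []   # module-level = per-process

@router.websocket("/ws/dispatch")
async def dispatch_websocket(ws: WebSocket):
    await ws.accept()
    connected_boards.append(ws)
    try:
        while True:
            await ws.receive_text()      # keep the socket open
    except WebSocketDisconnect:
        connected_boards.remove(ws)

@router.post("/api/dispatch/broadcast")
async def broadcast(event: dict):
    # Only reaches boards whose websocket landed on THIS worker.
    for ws in connected_boards:
        await ws.send_json(event)
```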
Option A: stay single-worker, tune the one process (recommended). Why: current traffic does not justify the complexity bill of any multi-worker fix. titan-killer peak CPU is <15% per systemctl status. We are nowhere near saturation.
What to do:
- Add `--loop uvloop --http httptools` to ExecStart for +30% single-process throughput at zero risk.
- Move blocking DB calls to asyncpg so the one event loop stops blocking.
- Offload long-running CPU-bound jobs to /usr/local/bin/task_queue. Many already are.
- Add `--limit-concurrency 200` and `--backlog 4096` to fail fast instead of queueing forever.

Cost: 1 engineer-hour. Risk: near zero (tunables only).
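A sketch of Option A as a systemd drop-in (the unit name comes from the source artifacts; the uvicorn binary path and drop-in filename are assumptions):

```ini
# /etc/systemd/system/titan-killer.service.d/tunables.conf (sketch)
[Service]
# Blank ExecStart= clears the stock command before re-declaring it.
ExecStart=
ExecStart=/usr/local/bin/uvicorn app:app --host 0.0.0.0 --port 8765 \
    --loop uvloop --http httptools \
    --limit-concurrency 200 --backlog 4096
```

Apply with `systemctl daemon-reload && systemctl restart titan-killer`. Requires uvloop and httptools installed in the service’s environment first (uvicorn errors out on `--loop uvloop` if the package is missing).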
Option B: Redis pub/sub fan-out, then 2 workers. Prereq change: replace the `connected_boards` list in websocket.py with a Redis pub/sub fan-out:
```python
# websocket.py rewrite (sketch): Redis pub/sub replaces the per-process
# connected_boards list, so every worker sees every broadcast.
import json

import redis.asyncio as aioredis
from fastapi import APIRouter, WebSocket, WebSocketDisconnect

router = APIRouter()
REDIS = aioredis.from_url("redis://localhost:6379")

@router.websocket("/ws/dispatch")
async def dispatch_websocket(ws: WebSocket):
    await ws.accept()
    pubsub = REDIS.pubsub()
    await pubsub.subscribe("dispatch:events")
    try:
        # listen() also yields subscribe confirmations; forward messages only.
        async for msg in pubsub.listen():
            if msg["type"] == "message":
                await ws.send_text(msg["data"].decode())
    except WebSocketDisconnect:
        pass
    finally:
        await pubsub.unsubscribe("dispatch:events")

@router.post("/api/dispatch/broadcast")
async def broadcast(event: dict):
    # publish() fans out to every subscribed worker, not just this process.
    await REDIS.publish("dispatch:events", json.dumps(event))
    return {"published": True}
```
ExecStart becomes: `gunicorn app:app -k uvicorn.workers.UvicornWorker -w 2 -b 0.0.0.0:8765 --timeout 120`. Redis is already installed on the VM (confirm with `redis-cli ping`).
Cost: 4-6 engineer-hours (rewrite + test dispatch board under 2 workers). Risk: medium — websocket regression would be silently partial. Requires Playwright two-client test before ship.
Option C: nginx sticky sessions on /ws/, 2 workers, no code change. nginx sits in front of uvicorn (port 80/443 → 8765). Add ip_hash on the upstream block so each client IP pins to a worker. Broadcasts still lose 50% of events across workers, but each WebSocket client continues to see ITS worker’s state.
This only works if broadcasts originate from inside the same worker that owns the websocket. In our case, POST /api/dispatch/broadcast comes from an HTTP client (dashboard) that may land on a different worker than the websocket client → still drops 50% of events.
Verdict: sticky sessions alone do NOT fix this. Need Redis (option B) anyway.
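For the record, the sticky-session config that was evaluated would look roughly like this (upstream name hypothetical; note ip_hash can only pin across distinct backends, so each worker would need its own port rather than shared-socket `--workers 2`):

```nginx
# Sketch only: evaluated and rejected above. Pins clients, does not fix fan-out.
upstream titan_workers {
    ip_hash;                    # same client IP always hits the same backend
    server 127.0.0.1:8765;      # worker 1
    server 127.0.0.1:8766;      # worker 2 (separate port per process)
}
```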
Option D: split realtime into its own single-worker service. Run two systemd units:
- `titan-killer.service`: uvicorn app:app, 2 workers, serves everything EXCEPT /ws/* and /api/stream/*
- `titan-realtime.service`: uvicorn app_realtime:app, 1 worker, serves only /ws/* + /api/stream/* on port 8766
- nginx routes /ws/ and /api/stream/ to :8766, everything else to :8765

Cost: 2-3 hours (new app_realtime.py factoring, nginx config, systemd unit). Risk: low — isolates the fragile part. No Redis needed. Broadcasts to :8766 still work in a single process. HTTP routes get 2x parallelism on :8765.
Gotcha: the POST /api/dispatch/broadcast HTTP route is in websocket.py, so it must also live on :8766. nginx config handles that.
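A sketch of the nginx split, including the broadcast-route gotcha (server block details omitted; upstream names are assumptions):

```nginx
# Sketch: route realtime paths to the single-worker service on :8766.
upstream titan_http     { server 127.0.0.1:8765; }   # 2 workers
upstream titan_realtime { server 127.0.0.1:8766; }   # 1 worker

server {
    # ...existing listen/server_name/TLS config...

    location /ws/ {
        proxy_pass http://titan_realtime;
        proxy_http_version 1.1;                       # required for WebSocket
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 1h;                        # long-lived sockets
    }
    location /api/stream/ {
        proxy_pass http://titan_realtime;
        proxy_buffering off;                          # SSE must stream
    }
    location = /api/dispatch/broadcast {              # the gotcha route
        proxy_pass http://titan_realtime;
    }
    location / {
        proxy_pass http://titan_http;
    }
}
```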
| Metric | Current | Source |
|---|---|---|
| titan-killer memory RSS | 144 MB | systemctl status titan-killer.service |
| titan-killer CPU (9min uptime) | 12.5 CPU-sec ≈ 2.3% average | same |
| Requests per minute (steady state) | est. 20-40 rpm | `journalctl -u titan-killer --since "1 hour ago" \| grep "HTTP/1.1"` |
| Longest recurring handler | /api/zeus/search ≈ 400-1200ms | journal timing |
| Connected WebSockets (peak seen) | typically 0-2 (dispatch boards) | logged via dispatch/status |
| Single-worker concurrency ceiling | ~100 in-flight (uvicorn default backlog) | uvicorn defaults |
Conclusion: we are operating at <3% of single-worker capacity. There is no throughput case for multi-worker today. The case IS for future-proofing + not being surprised by a Slack attention-spike when the new location tree ships.
Router state audit executed 2026-04-22 10:30 CT:
grep -HnE "connected_|_CACHE|_cache\s*=|_registry\s*=|active_connections|subscribers" \
/opt/nexus/titan/api/*.py | grep -v "def\s"
# 1 correctness risk: websocket.py:76 connected_boards
# 5 cache-amp cases: customer_lookup.py, dns_list.py, prompt_regenerate.py,
# zeus_rag.py, plus sse_stream.py poll state
# ~90 routers read-only: DB_CONFIG, *_MAP, *_KEYWORDS constants (safe)
Ship Option A (tunables only, stay 1 worker) NOW. Keep Option D in your back pocket for when traffic 10x’s.
- The only ExecStart change: `--loop uvloop --http httptools --limit-concurrency 200`.
- Trigger for revisiting multi-worker: CPU > 60% or p95 > 2s on /api/zeus/search. Neither is happening.
- Gate before any multi-worker change: Rule 0 preflight + Rule 4 pre-commit check + Playwright two-client dispatch board test (open two browser windows, broadcast, verify both see the event; sketched below). Log to MH before and after. This is NOT a 5-minute ship.
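A minimal sketch of that two-client gate, assuming the board lives at /dispatch and renders events into .dispatch-event nodes (both are placeholders, not confirmed paths):

```python
# Hypothetical Playwright gate: two boards, one broadcast, both must see it.
import json, urllib.request
from playwright.sync_api import sync_playwright, expect

BASE = "http://localhost:8765"   # assumption: hitting the service directly

with sync_playwright() as p:
    browser = p.chromium.launch()
    board_a, board_b = browser.new_page(), browser.new_page()
    board_a.goto(f"{BASE}/dispatch")   # placeholder board URL
    board_b.goto(f"{BASE}/dispatch")

    # Fire the broadcast over plain HTTP, as the dashboard would.
    req = urllib.request.Request(
        f"{BASE}/api/dispatch/broadcast",
        data=json.dumps({"type": "test", "msg": "two-client gate"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

    # The whole point: BOTH boards see the event, regardless of which
    # worker owns each websocket and which worker handled the POST.
    for board in (board_a, board_b):
        expect(board.locator(".dispatch-event").last).to_contain_text("two-client gate")
    browser.close()
```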
Open question: Does Robert want me to proceed with Option A tunables now, or hold until after homepage + emergency page ship? My vote is hold. Don’t change anything in the service path during a live page build.
Logged to BSP_Master_Session_History.html as section id bsp-apr22-uvicorn-scaling-analysis. Source artifacts: /opt/nexus/titan/app.py, /opt/nexus/titan/api/websocket.py, /opt/nexus/titan/api/sse_stream.py, /etc/systemd/system/titan-killer.service.