🧠 NEXUS EVOLUTION PROOF · DEEP-DIVE DEBRIEF

Why the Dashboard Says 🔴 CRITICAL and the Plumbers Don't

April 17, 2026 · 9:55 PM Central Time · For: Robert Dove · Status: Diagnosis complete, no code changed

🎯 The 60-Second Story

The dashboard's two 🔴 CRITICAL alerts are not a plumber problem. Techs didn't stop closing jobs.
On Apr 12, we quarantined nexus_titan_migration.py because it caused the Apr 3 phantom $6.4M fire. That was the right move.
But that script had four jobs: (1) insert ST jobs, (2) backfill invoice totals, (3) populate job_number, (4) write the st_jobs_cache.json file. Only job #1 was reassigned (to the 15-min daemon). The other three became orphans.
The two alerts are watching two of those orphans: the invoice backfill and the cache file. The monitors don't know the script was retired.
But under the monitors sits a chronic, real gap: invoice_total is populated on only 0% to 55% of each day's completed jobs across all of April, never higher. That is a real measurement problem, and it's why ST shows $13.7K/wk while Big Sale shows $226K/wk.
Two separate problems: one loud and cosmetic, one quiet and architectural. The quiet one is the one that matters.

① Patient Vitals · what's actually running vs. what's not

🟢 Actually Alive

Component	Last heartbeat
`titan_sync_daemon.py`	1 min ago ✅ (every 15m)
`titan_invoice_sync.py`	05:15 CT today ✅
titan-killer.service (API)	active ✅
zeus modules (6 of 7)	all 🟢
ST API auth	pulled 169 invoices today ✅
Postgres INSERTs into `titan.jobs`	last 1h 33m ago ✅

🔴 What the Monitors Yell About

Alert	Why it fires
Anomaly `zero_invoice` 75%	6 of 8 recent jobs have no invoice_total
Data freshness: ST Jobs (6d)	`st_jobs_cache.json` mtime is Apr 12

Both are downstream of the same Apr 12 decision. Neither reflects a tech problem.

② The Pipeline Map · live flow of one ST job from creation to dashboard

┌─────────────────────────────────────────────────────────────────────┐
│ 🏠 Customer calls · BSP schedules · Tech arrives · Job gets done    │
└──────────────────────────────┬──────────────────────────────────────┘
                               │ Writes into ServiceTitan
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│ 🏛️ ServiceTitan API (source of truth for operational data)          │
│    /jobs   /invoices   /customers   /estimates                      │
└──────────┬────────────────────┬────────────────────────┬────────────┘
           │                    │                        │
     every 15 min         05:15 CT daily          RETIRED Apr 12
           │                    │                        │
           ▼                    ▼                        ▼
┌───────────────────┐ ┌────────────────────┐ ┌─────────────────────┐
│ titan_sync_daemon │ │ titan_invoice_sync │ │nexus_titan_migration│
│                   │ │                    │ │ (quarantined; was   │
│ INSERT new jobs   │ │ UPDATE invoice_    │ │ the $6.4M phantom)  │
│ INSERT customers  │ │ total WHERE st_id  │ │                     │
│ INSERT estimates  │ │ matches            │ │ Used to also write: │
│                   │ │                    │ │ • invoice backfill  │
│ (no job_number,   │ │ 169 invoices/day   │ │ • job_number sync   │
│  no invoice_total,│ │ 86–115 updates/day │ │ • st_jobs_cache.json│
│  no scheduled_at) │ │                    │ │                     │
└─────────┬─────────┘ └──────────┬─────────┘ └──────────┬──────────┘
          │                      │                      │
          └──────────────────────┴──────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│ 🗄️ Postgres · bsp_analytics · titan.jobs (the live table)           │
│    11,831 jobs · last insert 1h 33m ago                             │
└────────────┬───────────────────────────────────────┬────────────────┘
             │                                       │
             ▼                                       ▼
┌──────────────────────┐                ┌──────────────────────────┐
│ 🧠 Dashboards + APIs │                │ 👮 Anomaly detector      │
│ (HCP, Stephanie,     │                │ reads zero_invoice       │
│ Big Sale, Audrey…)   │                │ on 8 recent jobs         │
│                      │                │ 🔴 fires at 75%          │
│ SHOWS: broken ST rev │                │                          │
│ $13.7K/wk            │                │ 👮 Session enforcer      │
│                      │                │ checks st_jobs_cache     │
│ Real revenue lives   │                │ mtime · 6d old           │
│ in Big Sale $226K/wk │                │ 🔴 STALE                 │
└──────────────────────┘                └──────────────────────────┘

③ The Chronic Gap · invoice_total populated rate, by day

Every row below counts completed jobs only. If tech work flows into invoices and invoices flow into invoice_total, the populated rate should be close to 100%. It is not, and it has not been all month.

Day	Completed	With $	Populated %
Apr 18 (partial)	1	0	0%
Apr 17	10	0	0%
Apr 16	10	0	0%
Apr 15	11	6	55%
Apr 14	19	2	11%
Apr 13	9	1	11%
Apr 10	9	3	33%
Apr 08	11	1	9%
Apr 04	4	2	50%
Apr 02	6	3	50%

Average across April: ~26% of completed jobs carry a non-zero invoice_total. Peak day: 55%. That means that for roughly 74% of completed jobs, our own ST mirror does not know what the job was worth.
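The per-day columns in the table above can be recomputed from raw rows with a few lines of Python. This is a minimal sketch, not the query the dashboard runs: the function name and the `(completed_on, invoice_total)` row shape are assumptions, and it treats both NULL and 0 as "not populated", following the zero-invoice definition used elsewhere in this debrief.

```python
from collections import defaultdict
from datetime import date

def populated_rate_by_day(jobs):
    """jobs: iterable of (completed_on: date, invoice_total: float | None).

    Returns {day: (completed, with_dollars, populated_pct)}.
    Hypothetical helper; the row shape is an assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # day -> [completed, with_dollars]
    for completed_on, invoice_total in jobs:
        counts[completed_on][0] += 1
        # NULL and 0 both count as "not populated".
        if invoice_total:
            counts[completed_on][1] += 1
    return {
        day: (done, paid, round(100 * paid / done))
        for day, (done, paid) in sorted(counts.items())
    }

# Illustrative rows shaped like the Apr 15 line: 11 completed, 6 with a total.
sample = [(date(2026, 4, 15), 250.0)] * 6 + [(date(2026, 4, 15), None)] * 5
print(populated_rate_by_day(sample))
```

Feeding it one day's rows at a time makes it easy to spot-check a single line of the table against the database.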

④ The Causal Chain · Apr 3 fire · Apr 12 treaty · Apr 17 alert

📅 Apr 3 04:07 UTC
▼
🔥 PHANTOM $6.4M discovered
nexus_titan_migration.py:249 INSERT missing created_at
10,461 jobs stamped same timestamp · scheduled_at spans 5 years

📅 Apr 3 → Apr 12
▼
🧯 Incident response · Evolution Protocols v1 published
29-file blast radius documented in BSP_Data_Trust_Evolution_v1.html

📅 Apr 12 (The Nexus Treaty)
▼
🔒 nexus_titan_migration.py → one_time_migrations/ + chmod -x
🔒 Postgres trigger guard added to titan.jobs (prevents bulk INSERT)
🔒 titan_sync_daemon.py takes over job INSERTs (every 15 min)
11,729 phantom rows quarantined · 292 → 128 timers
BUT three responsibilities were never reassigned:
❌ Invoice total backfill on older jobs
❌ job_number population
❌ st_jobs_cache.json daily write

📅 Apr 12 → Apr 17 (5 days)
▼
🕳️ Orphaned work piles up
Each day's completed jobs enter titan.jobs as skeletons and stay that way

📅 Apr 17 21:42 CT
▼
🚨 Evolution Proof fires 🔴🔴
zero_invoice CRITICAL: 6 of 8 is over the 70% threshold
ST Jobs 6d stale: cache file mtime is Apr 12 00:00

⑤ The Math · why "75%" is both technically correct and statistically loud

Small-sample noise check

Sample: 8 jobs · 6 zero-invoice · point estimate 75%. Wilson 95% confidence interval: 40.9% to 92.9%. With only 8 data points, the true rate could plausibly be anywhere from about 41% to about 93%. The 70% threshold sits inside that interval, so this sample alone cannot show that the true rate is above the threshold.

Metric	Value	Note
Sample	n = 8	jobs scheduled ≤ 7d
Zero-invoice	6	invoice_total is null or 0
Point rate	75%	fires at ≥ 70%
95% CI	41 – 93%	Wilson score

But the signal is real at larger N

Widen to all completed jobs Apr 1 through Apr 18: 124 jobs total, 100 of them zero-invoice. That's an 80.6% zero rate on n=124. Wilson 95% CI: 73% to 87%. Statistically robust. The detector picked the wrong window, but the underlying finding is real.
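Both intervals quoted in this section can be checked with the standard Wilson score formula. The sketch below is an independent recomputation, not the detector's code; the function name is illustrative and z = 1.96 is the usual approximation for a 95% interval.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# The two samples discussed above: the detector's window and all of April.
for zero, n in [(6, 8), (100, 124)]:
    lo, hi = wilson_interval(zero, n)
    print(f"{zero}/{n}: {zero / n:.1%} zero-invoice, 95% CI {lo:.1%} to {hi:.1%}")
```

The n=8 interval straddles the 70% threshold while the n=124 interval sits entirely above it, which is the whole argument of this section in two lines of output.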

Revenue implication

Evolution Proof reports $13,745/wk from ST and $226,703/wk from Big Sale. Ratio: $13,745 / $226,703 = 6.1%. If Big Sale is truth, ST is capturing only about 6% of real revenue. The 80.6% zero-invoice rate on our ST mirror and the 6% revenue capture ratio are the same story told two different ways.

⑥ Monitor vs Reality · lane diagram

Alert the dashboard shows	What it's measuring	What it means in reality
🔴 CRIT zero_invoice 75%	Last 8 jobs with scheduled_at in 7d	Technically correct. Label ("techs not closing jobs") is wrong. Real cause: invoice sync doesn't backfill older jobs.
🔴 STALE ST Jobs 6d	mtime of `st_jobs_cache.json`	File was written by the quarantined migration script. DB itself is fresh. Monitor watches a ghost.
💪 56/100 Muscle score	How much Nexus is acting on data	Downstream of the first two. If ST mirror is 74% empty, action engines have thin signal.
🟢 6 of 7 data fresh	ad_throttler, 3cx, st_enforce, ai_intake, anomaly_log, ads_audit	Correct. All six write-every-day files are under 6h old.

⑦ The Fix Menu · ten levers, ranked by impact × effort

#	Lever	Impact	Effort	Risk
A1	Have `titan_sync_daemon.py` also emit `st_jobs_cache.json` on each cycle	Silences ST Jobs stale alert permanently	~30 min	🟢 Low
A2	Drop `st_jobs_cache.json` from `DATA_SOURCES` in `nexus_session_enforcer_v2.py` and replace with a live DB freshness query	Silences the alert AND makes the freshness check accurate	~20 min	🟢 Low
B1	Raise `check_zero_invoice_rate()` minimum sample from any to `n ≥ 20`	Stops small-N false alarms without hiding real gap	~10 min	🟢 Low
B2	Also exclude jobs completed in last 24h (give invoice sync a chance)	Cuts the residual daily lag noise	~10 min	🟢 Low
C1	Widen invoice sync window from 7 days to 30 days	Should lift population rate substantially on older jobs	~10 min + one re-run	🟡 Medium
C2	Change invoice sync key from `modifiedOnOrAfter` to pull invoices for all open jobs in last 30d regardless of invoice modify date	Catches jobs whose invoice was created but never modified	~1 hr	🟡 Medium
C3	Hook invoice sync to ST webhook `invoice.updated` so it is event-driven not cron	Near-realtime population · strongest fix	~3 hr (webhook listener exists)	🟡 Medium
D1	Diagnostic: sample 10 recent zero-invoice jobs, call ST API directly, compare totals	Tells us how many are sync-miss vs. genuinely $0 (warranty / declined)	~20 min	🟢 Low
D2	Remove the four cron lines that point into `/purgatory/` and `/backups/` (monday_sync, st_data_fixer, etc.)	Cleans noise; no functional change	~10 min	🟢 Low
E1	Rename the `zero_invoice` alert label from "Techs not closing jobs" to "ST mirror invoice coverage gap"	Stops misleading anyone who glances at the dashboard	~2 min	🟢 Low
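Levers B1 and B2 amount to two guards in front of the existing threshold test. The sketch below shows one way they could combine. It is not the body of `check_zero_invoice_rate()`, whose real signature is not shown in this debrief; the function name, row shape, and return values are all assumptions, and only the 70% threshold, the n ≥ 20 floor, and the 24-hour grace period come from this document.

```python
from datetime import datetime, timedelta, timezone

MIN_SAMPLE = 20              # lever B1: refuse to alert on fewer jobs than this
GRACE = timedelta(hours=24)  # lever B2: give invoice sync a day to catch up
THRESHOLD = 0.70             # alert threshold quoted in this debrief

def zero_invoice_alert(jobs, now):
    """jobs: iterable of (completed_at: datetime, invoice_total: float | None).

    Returns (status, rate); status is "insufficient_sample", "ok" or "critical".
    Hypothetical helper, not the production detector.
    """
    # Drop jobs completed inside the grace period before counting anything.
    settled = [total for completed_at, total in jobs if now - completed_at >= GRACE]
    if len(settled) < MIN_SAMPLE:
        return "insufficient_sample", None
    rate = sum(1 for total in settled if not total) / len(settled)
    return ("critical" if rate >= THRESHOLD else "ok"), rate

now = datetime(2026, 4, 18, 2, 55, tzinfo=timezone.utc)
three_days_ago = now - timedelta(days=3)
# The 6-of-8 sample that fired the alert no longer trips it on its own.
print(zero_invoice_alert([(three_days_ago, None)] * 6 + [(three_days_ago, 120.0)] * 2, now))
```

Returning a distinct "insufficient_sample" status, rather than "ok", keeps the dashboard from implying the coverage gap has closed when the detector simply has too little data.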

⑧ My Recommendation · what to do Monday morning

🎯 Sequenced plan

Today (quiet the noise): Do A2 + B1 + B2 + E1. All four are < 45 minutes total and do not touch the data pipeline. Dashboard goes 🟢 without pretending problems away.
Diagnostic before fix (D1): Pick 10 zero-invoice jobs, pull their invoices from ST directly, find out the split between "sync missed it" and "job truly has no invoice yet". This decides whether C1 or C3 is the right fix.
The real fix (C1 first, C3 later): Widen invoice sync window to 30 days as a one-line change. Watch the population rate for 48 hours. If still below 70%, promote to C3 event-driven.
Cleanup (D2): Retire the dead cron lines. These have been erroring daily since Apr 12 and add noise to every log file.
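The D1 diagnostic is a three-way sort of the sampled jobs. A minimal sketch, assuming the ST totals have already been fetched by hand or by script into a dict: the function name, bucket names, and job IDs are illustrative, and no ServiceTitan API call is made here.

```python
def classify_zero_invoice_jobs(mirror_totals, st_totals):
    """Sort sampled zero-invoice jobs by what ServiceTitan itself reports.

    mirror_totals: {job_id: invoice_total or None} sampled from titan.jobs.
    st_totals:     {job_id: total} pulled directly from ST; a missing key or
                   None means ST has no invoice for that job.
    Hypothetical helper for the D1 diagnostic.
    """
    buckets = {"sync_miss": [], "genuinely_zero": [], "no_invoice_yet": []}
    for job_id, mirror_total in mirror_totals.items():
        if mirror_total:
            continue  # mirror already has a value; outside this diagnostic
        st_total = st_totals.get(job_id)
        if st_total is None:
            buckets["no_invoice_yet"].append(job_id)
        elif st_total == 0:
            buckets["genuinely_zero"].append(job_id)  # e.g. warranty / declined
        else:
            buckets["sync_miss"].append(job_id)       # ST knows, mirror does not
    return buckets

# Made-up job IDs and totals, purely to show the shape of the result.
mirror = {"J1": None, "J2": 0, "J3": None, "J4": 310.0}
st = {"J1": 185.0, "J2": 0, "J4": 310.0}
print(classify_zero_invoice_jobs(mirror, st))
```

A sample dominated by `sync_miss` points toward the sync-window fixes (C1, C2), while one dominated by `no_invoice_yet` suggests the lag is upstream of the mirror.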

What I did not do: I changed no code on the VM. All findings come from read-only queries. If you say "yes" to any of A–E, I will make the change and you will review it before I restart the service.

🧭 How to read the Evolution Proof from now on

When the proof says 🔴 CRITICAL, ask three questions in this order:

Does the DB itself say the data is stale? (Query MAX(updated_at), not a cache file.)
Is the sample size big enough to trust the percentage? (Under n=20, treat any rate as a hint, not a verdict.)
Is the ALERT LABEL telling you the cause, or just the symptom? Most of our labels describe symptoms.
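The first question can be answered from the table itself rather than from a file's mtime. The sketch below separates the query from the decision so the decision logic can be tested without a database. The SQL text assumes an `updated_at` column on `titan.jobs`, and the 6-hour tolerance is a placeholder rather than a value taken from `nexus_session_enforcer_v2.py`.

```python
from datetime import datetime, timedelta, timezone

# Assumed query: the table name comes from this debrief, the column name from
# question 1 above. The real schema may differ.
FRESHNESS_SQL = "SELECT MAX(updated_at) FROM titan.jobs"

def freshness_status(latest_row_at, now, max_age=timedelta(hours=6)):
    """Classify a table as fresh or stale from its newest row timestamp.

    latest_row_at: result of FRESHNESS_SQL (timezone-aware), or None if empty.
    max_age: hypothetical tolerance; tune per data source.
    """
    if latest_row_at is None:
        return "stale", None
    age = now - latest_row_at
    return ("fresh" if age <= max_age else "stale"), age

now = datetime(2026, 4, 18, 2, 55, tzinfo=timezone.utc)  # 9:55 PM CT on Apr 17
print(freshness_status(now - timedelta(hours=1, minutes=33), now))  # newest DB row
print(freshness_status(now - timedelta(days=6), now))               # cache-file age
```

Run against the two ages reported in this debrief, the same rule calls the database fresh and the cache file stale, which is exactly the disagreement the STALE alert is hiding.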

The dashboard is a thermometer, not a diagnosis. It is very good at telling you something is off. It is not very good at telling you what.

Generated 2026-04-17 · 9:55 PM Central Time · /api/titan-audit + read-only DB queries
Logged to Master History as bsp-apr17-nexus-pipeline-deep-dive