The 60-Second Version
For Kalen and Stephanie. Plain English. No jargon.
📝 Why This Document Exists
This document was created because Kalen was right that the numbers were wrong. It exists as three things at once: a record of what happened, a plan to fix it, and a set of standing protocols so it cannot happen again.
What Happened (Critical)
At 04:07 UTC on April 3, 2026, a single process injected 10,461 rows into titan.jobs. Every row had created_at stamped at the ingestion minute, while scheduled_at spanned November 30, 2020 through April 2, 2026. All rows carried lead_source = "Imported Default Campaign". Of those, 10,444 were marked job_status = completed, with invoice_total summing to $6,380,257.
Every downstream query that filtered by created_at BETWEEN '2026-04-03' AND '2026-04-04' saw six years of historical job data compressed into one fake minute. The math engine's weekly_revenue formula and /api/math/validate-revenue endpoint began returning the phantom number. The story API's revenue narrative inherited it. The Sacred HTML's revenue card inherited it. The causal validator's google_ads_to_revenue chain inherited it.
The Timeline
Proof of Injection
From apr3_phantom_sync_investigation.py output:
Why This Is A Pattern, Not A One-Off
After pinpointing the phantom to nexus_titan_migration.py line 249, a blast-radius search was run against /opt/nexus/titan and /opt/nexus/nexus/scripts for every INSERT into a titan table that does not explicitly set created_at. The grep returned 29 files with the same pattern, including two that write to titan.jobs (the phantom source) and seven that write to tables which are currently flagged as stale in the freshness scan.
This means the bug is not a one-off that can be fixed with a single patch. It is a codebase-wide convention where developers (including me, past me) relied on Postgres defaults for created_at instead of explicitly passing the authoritative timestamp from the source system. Every one of these writes is a future phantom waiting to happen.
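A rough reconstruction of that blast-radius scan, for the record. It assumes the INSERTs are plain SQL strings inside .py files; the regex is illustrative and will miss dynamically built queries, so treat it as a first pass, not proof of absence.

```python
import re
from pathlib import Path

# Flag any INSERT into a titan.* table whose column list omits created_at.
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+titan\.(\w+)\s*\(([^)]*)\)",
    re.IGNORECASE | re.DOTALL,
)

def scan(root: str) -> list[tuple[str, str]]:
    """Return (file, table) pairs for INSERTs that never set created_at."""
    hits = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for match in INSERT_RE.finditer(text):
            table, columns = match.group(1), match.group(2)
            if "created_at" not in columns:
                hits.append((str(path), f"titan.{table}"))
    return hits

for root in ("/opt/nexus/titan", "/opt/nexus/nexus/scripts"):
    for path, table in scan(root):
        print(f"{path}: INSERT without explicit created_at -> {table}")
```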
The Blast Radius Inventory
| Severity | File | Target Table | Why It Matters |
|---|---|---|---|
| Crit | nexus/scripts/nexus_titan_migration.py:249 | titan.jobs | Apr 3 phantom source. 10,461 rows injected. Must fix first. |
| Crit | titan/titan_sync_daemon.py:206 | titan.jobs | Second writer to titan.jobs. Runs as a daemon, not a one-shot. Every write stamps NOW(). |
| Crit | titan/titan_sync_daemon.py:273 | titan.estimates | Estimates carry sold_by_id attribution. Same bug class. |
| Crit | titan/api/portal.py:173 | titan.jobs | Third writer to titan.jobs. Customer portal path. |
| High | titan/api/estimates.py:141 | titan.estimates | Estimate creation endpoint. |
| High | titan/api/estimates.py:261 | titan.invoices | Invoice creation. Already a source of the 71% zero-invoice problem. |
| High | titan/api/threecx.py:98 | titan.call_logs | call_logs is STALE 437 hours. Explains why freshness broke. |
| High | titan/api/vapi_voice.py:157 | titan.voice_calls | Daniel AI call ingestion. |
| High | titan/api/vapi_voice.py:254 | titan.daniel_learning | Daniel RAG learning loop. |
| High | titan/api/sms.py:197 | titan.call_logs | Second writer to call_logs. Stale table. |
| High | titan/api/sms.py:257 | titan.call_logs | Third writer to call_logs. |
| High | titan/nexus_st_webhook_listener.py:529 | titan.gclid_captures | GCLID pipeline writes. Attribution bridge. |
| High | titan/nexus_st_webhook_listener.py:797 | titan.invoice_events | Invoice events drive the webhook to Google Ads. Same pattern. |
| High | titan/nexus_st_webhook_listener.py:854 | titan.google_ads_conversion_log | Conversion upload log. Same pattern. |
| Med | titan/nexus_gclid_bridge_v2.py:119 | titan.gclid_captures | GCLID bridge. |
| Med | titan/gclid_bridge.py:79 | titan.gclid_captures | Second GCLID bridge. |
| Med | titan/nexus_doctor.py:236 | titan.google_ads_conversion_log | Doctor repair path. |
| Med | titan/nexus_blog_scraper.py:237 | titan.knowledge_base | RAG ingestion. Lower severity because knowledge_base time filtering is less critical. |
| Med | titan/nexus_playbook_indexer.py:169 | titan.knowledge_base | Playbook RAG indexer. |
| Med | titan/nexus_email_rag_ingester.py:368 | titan.knowledge_base | Email RAG ingester. |
| Med | titan/nexus_daniel_rag_bridge.py:178 | titan.knowledge_gaps | RAG gap tracking. |
| Med | titan/nexus_metric_drift_detector.py:104 | titan.metric_history | Metric drift detection. |
| Med | titan/api/tasks.py:172 | titan.trust_audit | Trust audit insertion. |
| Med | titan/api/notifications.py:120 | titan.trust_audit | Second trust_audit writer. |
| Med | titan/api/xact_pricing.py:112 | titan.xact_estimates | Xactimate estimate creation. |
| Med | titan/api/xact_pricing.py:137 | titan.xact_estimate_items | Xactimate line items. |
| Med | titan/api/transcript_repo.py:204 | titan.call_transcripts | Transcript repository. |
| Med | titan/api/eta_relay.py:250 | titan.tech_relays | ETA relay system. |
| Med | titan/api/feedback_loop.py:18,42,45 | titan.citation_log, knowledge_base | Feedback loop writers. |
Test Methodology
Every formula registered in the math engine was hit with a curl against /api/math/compute?formula=NAME&inputs=A,B,C using realistic production numbers pulled from the trusted sources (QuickBooks P&L, Google Ads live API, Big Sale Tracker). The output was captured verbatim along with any anomaly warnings. The goal was to verify that the engine itself is mathematically sound, regardless of whether the inputs fed into it by other systems are contaminated.
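For reproducibility, here is a minimal sketch of that harness in Python rather than raw curl. It assumes /api/math/compute takes formula and inputs query parameters exactly as shown above; the base URL and the result/anomaly response fields are assumptions.

```python
import requests

BASE = "http://localhost:8000"  # assumed host; the real deployment may differ

# (formula, inputs) pairs pulled from the trusted sources, per the methodology above
CASES = [
    ("cpl", "11187,350"),                # Google Ads spend / lead count, 30d
    ("margin", "573182,230906,246124"),  # QB income, COGS, OpEx
    ("roas", "11187,170252"),            # the run that fired the anomaly
]

for formula, inputs in CASES:
    r = requests.get(f"{BASE}/api/math/compute",
                     params={"formula": formula, "inputs": inputs}, timeout=10)
    r.raise_for_status()
    body = r.json()
    # Capture the verbatim result plus any anomaly warning the engine attaches
    print(formula, body.get("result"), body.get("anomaly"))
```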
Formula Inventory Table
| Formula ID | Name | Status | Expected Range | Live Test Result |
|---|---|---|---|---|
| cpl | Cost Per Lead | Working | $5 - $500 | $31.96 ($11,187 / 350 leads, 30d) |
| roas | Return on Ad Spend | Working | 0.5 - 100 | 0.07 ANOMALY FIRED ($11,187 / $170,252) |
| booking_rate | Booking Rate | Formula OK | 10% - 100% | Untested, needs leads input |
| avg_ticket | Average Ticket | Contaminated | $100 - $25000 | Relies on ST revenue / completed jobs |
| weekly_revenue | Weekly Revenue | Broken | $10K - $200K | Lambda signature error: takes 1 arg, got 2 |
| close_rate | Close Rate | Formula OK | 5% - 80% | Untested, needs estimate data |
| margin | Operating Margin | Working | -50% to +60% | 16.77% ($573,182 income, $230,906 COGS, $246,124 OpEx from QB) |
| annualized_revenue | Annualized Revenue | Working | $500K - $10M | $3,659,385 ($70,180 per 7 days) |
| customer_ltv | Customer Lifetime Value | Formula OK | $500 - $50000 | Untested, needs retention data |
| revenue_per_tech | Revenue per Technician | Contaminated | $2K - $50K/wk | Relies on ST revenue as numerator |
Key Proof: The Math Engine Correctly Fired An Anomaly
When ROAS was computed with $11,187 of ad spend against $170,252 of QB 30-day revenue (which is all revenue, not ad-attributed), the engine returned ROAS = 0.07 and flagged "ANOMALY: Return on Ad Spend = 0.07 is outside expected range [0.5, 100]". The anomaly is correct. Note that 0.07 is 11,187 / 170,252, i.e. spend divided by revenue — the inverse of a true ROAS — so the operands arrived inverted; and even read at face value, 0.07 would mean 7 cents of revenue per dollar of ad spend, which would mean the ads are catastrophically failing. On top of that, QB's $170K is not the right numerator anyway, because it includes all revenue, not just ad-attributed revenue. The engine worked exactly as designed: the bounds check refused a garbage value. The bug is upstream: there is no "ad-attributed revenue" source wired into the causal validator yet.
Key Proof: Margin Formula Validates Profitability
When Margin was computed with QuickBooks actuals (income $573,182, COGS $230,906, OpEx $246,124), the engine returned Margin = 16.77%. This sits within the expected range of -50% to +60%, and it reconciles with the 13.9% net-income margin once Other Expenses ($16,410) are included. The math engine proves that the business is profitable, and the number is defensible against any ST-contaminated claim. This is the number Kalen can trust.
The Missing Guardrails: Zero Metric Bounds Registered
Although the math engine formulas themselves have bounds, the 28 metrics registered in metrics_registry.yaml have zero bounds entries. None of them will fire an out-of-range warning when the underlying data source drifts. This is why the phantom $6.4M figure flowed through without being caught: no metric said "hey, $6.4M for a two-tech shop is outside the expected $20K-$200K range." The permanent fix is to add an expected_range to every metric in the YAML.
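A sketch of what the fix could look like. The expected_range key, the bounds, and the check function below are proposals, not the current registry schema:

```python
# Proposed shape for metrics_registry.yaml entries (expected_range is the new key):
#
#   weekly_revenue:
#     source: big_sale_tracker
#     expected_range: [20000, 200000]   # $20K - $200K for a two-tech shop, 7d window
#
# A minimal check that would have caught the phantom $6.4M:

def check_bounds(name: str, value: float, registry: dict) -> list[str]:
    """Return anomaly messages for any metric outside its expected_range."""
    entry = registry.get(name, {})
    bounds = entry.get("expected_range")
    if bounds is None:
        return [f"{name}: no expected_range registered -- unguarded metric"]
    lo, hi = bounds
    if not (lo <= value <= hi):
        return [f"ANOMALY: {name} = {value:,.0f} is outside expected range [{lo:,}, {hi:,}]"]
    return []

registry = {"weekly_revenue": {"expected_range": [20_000, 200_000]}}
print(check_bounds("weekly_revenue", 6_451_383, registry))
# -> ['ANOMALY: weekly_revenue = 6,451,383 is outside expected range [20,000, 200,000]']
```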
Why Stale Tables Are Dangerous
A stale table in the warehouse is not just a missing update. It is a time bomb. When someone eventually runs a backfill or a re-sync to "fix" the staleness, the same created_at = NOW() bug will re-trigger, compressing months of history into a single minute and creating a new phantom. Every stale table in the list below represents a potential future Apr 3 04:07 UTC incident.
The stale tables also break the graph brain and the freshness heatmap. graph_nodes and graph_edges are themselves stale (283h and 193h respectively), which means the graph brain's view of the ecosystem is describing Nexus as it was 11 days ago. seo_keywords_full has 40,847 rows that have been locked for 9 days. kalen_review_criteria has been locked for 8 days, which means Kalen's review framework is not updating.
Stale Table Inventory (>24h since last update)
| Table | Rows | Age (hours) | Business Impact |
|---|---|---|---|
| call_logs | 352 | 437.9 | 3CX call ingestion broken. Demand signal lost. |
| price_book | 400 | 428.6 | ST price book not syncing. Xactimate baseline frozen. |
| locations | 5,755 | 428.5 | Customer address book frozen. |
| trust_audit | 5 | 420.5 | Trust audit log frozen 17 days. |
| dispatch_schedule | 7,831 | 420.1 | Dispatch board data stale 17 days. |
| payroll | 199 | 419.7 | Technician commission data frozen. |
| xact_pricebook | 62 | 373.4 | Xactimate pricing frozen 15 days. |
| call_transcripts | 34 | 341.8 | 3CX + Vapi transcripts not flowing. |
| graph_nodes | 54 | 283.7 | Graph brain describing an 11-day-old ecosystem. |
| competitive_intel | 34 | 228.4 | Competitor intelligence frozen 9 days. |
| metric_thresholds | 12 | 228.1 | The guardrails themselves are stale. |
| session_learning | 17 | 225.5 | Claude session learnings not flowing. |
| outcome_tracking | 13 | 223.9 | Action outcome feedback loop frozen. |
| seo_keywords_full | 40,847 | 221.4 | 40K SEO keywords locked. Biggest row count at risk. |
| competitor_profiles | 11 | 194.8 | Competitor tracking stale. |
| kalen_review_criteria | 25 | 194.6 | Kalen's review framework not updating. |
| graph_edges | 85 | 193.1 | Graph edge connections 8 days old. |
| ramp_merchant_categories | 32 | 150.5 | Ramp COGS categorization stale. |
| ramp_tech_mapping | 16 | 150.1 | Ramp-to-technician attribution stale. |
| competitor_reviews | 57 | 144.9 | Competitor review scraping frozen. |
| financial_reconciliation | 8 | 133.9 | The reconciliation log itself is stale. |
| predictive_alerts | 21 | 123.4 | Predictive alerting paused. |
| TOTAL | 55,870 | | |
The Three Numbers
| Source | 30-Day Revenue | Methodology | Trust Level |
|---|---|---|---|
| Big Sale Tracker | $134,384 | Sum of 4 most recent weekly totals from the #big-sale-alert Slack channel (Ashton daily log) | Primary SoT |
| QuickBooks P&L (pro-rata) | $170,252 | $573,182 YTD income divided by 101 days, multiplied by 30 | Primary SoT |
| ST Clean (quarantine-filtered) | $93,436 | titan.jobs LEFT JOIN quarantined_jobs, last 30d, completed only | Operational only |
| ST Dirty (no filter, phantom included) | $6,451,383 | titan.jobs without quarantine, last 30d created_at | Do not use |
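For reference, a sketch of the "ST Clean" query as it could be run from Python. Table and column names follow the queries quoted elsewhere in this document (titan.quarantined_jobs.job_id, jobs.id); the connection string is a placeholder.

```python
import psycopg2

SQL = """
SELECT COALESCE(SUM(j.invoice_total), 0) AS clean_30d_revenue
FROM titan.jobs j
LEFT JOIN titan.quarantined_jobs q ON q.job_id = j.id
WHERE q.job_id IS NULL                        -- drop the 10,461 phantom rows
  AND j.job_status = 'completed'
  AND j.created_at >= NOW() - INTERVAL '30 days'
"""

with psycopg2.connect("dbname=nexus") as conn:   # placeholder connection string
    with conn.cursor() as cur:
        cur.execute(SQL)
        print(cur.fetchone()[0])                 # should land near $93,436
```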
Why The Numbers Diverge
- Big Sale $134K vs QB $170K = $36K gap: QuickBooks includes receivables collected from prior-period work, cash tips, AR catch-up, and any non-sale income (e.g. tool rentals to partner shops). Big Sale Tracker only captures what Ashton records in the daily log, which is a strict "sold today" snapshot. A $9K/week gap is consistent with QB catching trailing collections that Big Sale misses.
- QB $170K vs Clean ST $93K = $77K gap: This is the real operational leak. The anomaly detector has been firing zero_invoice CRITICAL at a 76% rate since April 7. Nineteen out of every twenty-five completed jobs in ST are closed with a zero invoice_total, either because the invoice webhook never fires or because the tech never finalized the invoice in the ST app. This is Pipe 1 from the April 8 Battle Plan. Fix the invoice closure workflow and the $77K reappears.
- ST Dirty $6.4M: The phantom. 10,461 historical rows with created_at stamped at Apr 3 04:07 UTC. Already quarantined, do not use.
Reconciliation Verdict
The true 30-day revenue is somewhere between Big Sale's $134K (conservative floor) and QB's $170K (full cash recognition). The ST clean number of $93K is too low because of the zero-invoice closure problem, not because of the phantom. Report Big Sale as the "this week / this month" number and QB as the "profit and cash" number. Never report the ST number until Pipe 1 (invoice closure) is fixed.
Step 1: Fix nexus_titan_migration.py line 249
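A sketch of the fix. Only the change itself — adding created_at to the column list and passing the ST payload's createdOn — comes from this document; the surrounding column names and payload keys are reconstructed for illustration.

```python
def upsert_job(cur, job: dict) -> None:
    # BEFORE (the phantom source): created_at omitted, so Postgres stamped
    # DEFAULT NOW() -- six years of history compressed into one ingest minute.
    #   INSERT INTO titan.jobs (st_job_id, scheduled_at, job_status, invoice_total)
    #   VALUES (%s, %s, %s, %s)
    #
    # AFTER: created_at is explicit and carries ST's authoritative timestamp.
    cur.execute(
        """
        INSERT INTO titan.jobs
            (st_job_id, scheduled_at, job_status, invoice_total, created_at)
        VALUES (%s, %s, %s, %s, %s)
        """,
        (
            job["id"],
            job["scheduledOn"],
            job["status"],
            job["total"],
            job.get("createdOn"),  # ST's own creation time, never our ingest time
        ),
    )
```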
Step 2: Add warehouse_ingested_at column to titan.jobs
The schema change separates "when ST created this job" from "when our warehouse pulled it in." Both are useful, but they must not be conflated. After this migration, created_at always means ST's authoritative time and never defaults to NOW(). Every query that previously filtered by created_at will continue to work because the semantics are preserved.
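A sketch of the migration. The backfill choice (copying the old created_at stamp into warehouse_ingested_at once) and dropping the NOW() default are assumptions consistent with the semantics just described.

```python
import psycopg2

MIGRATION = """
ALTER TABLE titan.jobs
    ADD COLUMN IF NOT EXISTS warehouse_ingested_at timestamptz DEFAULT now();

-- Backfill: for existing rows the only ingest time we have is the old
-- created_at stamp, so copy it over once.
UPDATE titan.jobs
   SET warehouse_ingested_at = created_at
 WHERE warehouse_ingested_at IS NULL;

-- From here on, created_at must be set explicitly by every writer.
ALTER TABLE titan.jobs ALTER COLUMN created_at DROP DEFAULT;
"""

with psycopg2.connect("dbname=nexus") as conn:  # placeholder connection string
    with conn.cursor() as cur:
        cur.execute(MIGRATION)
```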
Step 3: Patch all 29 INSERT sites systematically
A single patch script (patch_created_at_omission.py) walks through every file in the blast radius inventory. For each file:
- Parse the INSERT statement column list.
- If `created_at` is missing, add it as the last column.
- Find the `VALUES` tuple and add a new `%s`.
- Find the Python tuple being passed to `execute()` and add the source-system timestamp (or `datetime.now(timezone.utc)` if no source-system time exists).
- Write a backup of the original file before modification.
- Print a diff summary per file for Robert's review.
- Never auto-deploy. Always stop at the diff stage.
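A dry-run sketch of what patch_created_at_omission.py could look like. It handles only the simple literal-SQL case and exists to illustrate the walk-parse-diff flow; the Python argument tuple still has to be edited by hand, as noted in the comments.

```python
import difflib
import re
import shutil
from pathlib import Path

INSERT_RE = re.compile(
    r"(INSERT\s+INTO\s+titan\.\w+\s*\()([^)]*)(\)\s*VALUES\s*\()([^)]*)(\))",
    re.IGNORECASE | re.DOTALL,
)

def patch_text(text: str) -> str:
    """Append created_at to the column list and a %s to VALUES where missing."""
    def fix(m: re.Match) -> str:
        head, cols, mid, vals, tail = m.groups()
        if "created_at" in cols:
            return m.group(0)                      # already explicit, leave alone
        return f"{head}{cols.rstrip()}, created_at{mid}{vals.rstrip()}, %s{tail}"
    return INSERT_RE.sub(fix, text)

def process(path: Path, apply: bool = False) -> None:
    old = path.read_text()
    new = patch_text(old)
    if new == old:
        return
    # Diff stage: print for Robert's review; never auto-deploy.
    for line in difflib.unified_diff(old.splitlines(), new.splitlines(),
                                     str(path), str(path) + " (patched)", lineterm=""):
        print(line)
    if apply:
        shutil.copy2(path, path.with_suffix(path.suffix + ".bak"))  # backup first
        path.write_text(new)
    # NOTE: the Python argument tuple still needs the source timestamp added
    # by hand -- that part cannot be patched safely with a regex.

for path in Path("/opt/nexus/titan").rglob("*.py"):
    process(path, apply=False)                     # dry-run by default
```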
Step 4: Add a pg trigger as a safety net
Even after the patch, a future developer could write a new INSERT and forget created_at. The safety net is a Postgres trigger that raises an exception if created_at would fall back to DEFAULT NOW() without being explicitly set.
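A sketch of that trigger. The NULL check assumes the Step 2 migration dropped the column default; the second check encodes the Apr 3 phantom signature (created_at stamped "right now" while scheduled_at is far in the past), which is what the Day 2 verification in the rollout plan below exercises. The one-minute and 30-day thresholds are assumptions.

```python
import psycopg2

TRIGGER = """
CREATE OR REPLACE FUNCTION titan.enforce_explicit_created_at()
RETURNS trigger AS $$
BEGIN
    -- Writer forgot created_at entirely (default was dropped in Step 2).
    IF NEW.created_at IS NULL THEN
        RAISE EXCEPTION
            'titan.jobs INSERT without explicit created_at (see Apr 3 phantom)';
    END IF;
    -- Phantom signature: created_at is "right now" but the job was scheduled
    -- long ago, i.e. an ingest-time stamp masquerading as a source timestamp.
    IF NEW.created_at > now() - interval '1 minute'
       AND NEW.scheduled_at < now() - interval '30 days' THEN
        RAISE EXCEPTION
            'created_at looks like an ingest-time default for a historical job';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_explicit_created_at ON titan.jobs;
CREATE TRIGGER trg_explicit_created_at
    BEFORE INSERT ON titan.jobs
    FOR EACH ROW EXECUTE FUNCTION titan.enforce_explicit_created_at();
"""

with psycopg2.connect("dbname=nexus") as conn:  # placeholder connection string
    with conn.cursor() as cur:
        cur.execute(TRIGGER)
```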
The Ten Protocols
- P-01: Never report revenue from ST as a primary number. Run `cat /opt/nexus/nexus/scripts/output/big_sale_tracker.json` and `cat /opt/nexus/nexus/scripts/output/qb_pl_latest.json` before SSHing anything to titan.jobs.
- P-02: Every INSERT into a titan table must explicitly set created_at.
- P-03: Before editing titan.jobs, query the graph cascade.
- P-04: Before any financial claim, run it through the math engine.
- P-05: Always query the Intelligence Stack before saying "we do not have X".
- P-06: Check the HTML playbooks before building a new one. They live in /opt/nexus/nexus/scripts/output/playbooks/.
- P-07: Use standalone .py files via scp, never inline SSH heredocs. Write the script in BrightSidePlumbing/scripts/, scp it to /tmp on the VM, then ssh and run.
- P-08: After any audit, write a memory entry so the next session inherits the finding.
- P-09: Revenue hierarchy is Big Sale, then QB, then Ramp, then ST counts.
- P-10: Load this HTML as a reference at session start.
Week 1 (Apr 12 - Apr 18): Code Fix and Schema Migration
- Day 1 (today): Fix `nexus_titan_migration.py` line 249. Add `created_at` to the INSERT column list and pass `job.get("createdOn")` from the ST payload. Commit with reference to this document.
- Day 1: Make the script non-executable (`chmod -x`) so it can never be run by accident again. Move it to `scripts/one_time_migrations/`.
- Day 2: Add the `warehouse_ingested_at` column to `titan.jobs`. Run the migration SQL from Step 2 above. Verify via `SELECT created_at, warehouse_ingested_at FROM titan.jobs WHERE id IN (SELECT job_id FROM titan.quarantined_jobs LIMIT 10)`.
- Day 2: Install the `enforce_explicit_created_at()` trigger. Verify it fires by attempting an INSERT with `created_at = NOW()` and `scheduled_at = 2020-01-01`. It should raise.
- Day 3: Build `patch_created_at_omission.py` to walk the 29-file blast radius and print the diffs. Run in dry-run mode. Robert reviews every diff before applying.
- Day 3: Apply the patch to the top 5 critical files (`titan_sync_daemon.py`, `api/portal.py`, `api/estimates.py`, `api/threecx.py`, `nexus_st_webhook_listener.py`). Restart affected services.
- Day 4: Add `expected_range` to every metric in `metrics_registry.yaml`. Use realistic bounds based on the last 90 days of data. Revenue metrics get [$20K, $200K] for 7d and [$80K, $800K] for 30d; spend metrics get [$500, $50K]; and so on.
- Day 5: Rewrite the math engine `weekly_revenue` formula to accept a pluggable source argument. The default source becomes Big Sale Tracker. ST is wired as a fallback only.
- Day 5: Rewrite the math engine `avg_ticket` and `revenue_per_tech` formulas to read from the same pluggable source layer.
avg_ticketandrevenue_per_techformulas to read from the same pluggable source layer. - Day 6: Fix the anomaly detector's zero_invoice check to page Ashton in Slack (not just log), since it has been firing CRITICAL for 5 days without escalation.
- Day 7: Write the memory entry and update MEMORY.md. Deploy this HTML to the playbooks directory.
Week 2 (Apr 19 - Apr 25): Reconnect Stale Tables and Ship Sacred v2
- Day 8: Investigate each stale table and restart its ingestion path. `graph_nodes` and `graph_edges` first, because the graph brain depends on them.
- Day 8: Fix the broken `nexus-dashboard-integrity.timer`, which last ran on Apr 2 and has been "active waiting" ever since.
- Day 9: Reconnect `call_logs` ingestion. Both 3CX and SMS paths write to it; the 437-hour staleness means both paths are broken.
- Day 9: Reconnect `price_book` and `xact_pricebook`. Kalen's estimate workflow depends on fresh pricing.
- Day 10: Rebuild Sacred HTML v2 with the new source hierarchy. Big Sale card on top, QB margin card, Ramp COGS card, Google Ads efficiency card, 3CX demand card, ST counts card at the bottom labeled "operational only".
- Day 11: Wire the causal validator `google_ads_to_revenue` chain to Big Sale Tracker as the revenue endpoint. Currently it terminates at an ST node.
- Day 12: Walk Kalen through this document. Show him the pinpoint, the fix, and the hierarchy. Get his signoff on Big Sale as the weekly SoT.
- Day 13: Ship the Sacred v2 draft to Stephanie. Keep v1 accessible for rollback.
- Day 14: Post-mortem. What would have caught the phantom on Apr 3? Were the existing protocols enough? Update this document with anything the post-mortem surfaces.
22 APIs Wired
| API | Files Touching It | Purpose |
|---|---|---|
| ServiceTitan | 203 | Primary ops system (broken as SoT for revenue, still used for counts) |
| Google Ads | 104 | Ad spend, clicks, CPC, real conversions (LIVE API, trustworthy) |
| GA4 / Analytics | 66 | Website behavior, conversions (no warehouse table yet) |
| 3CX | 62 | Inbound/outbound call data (titan.voice_calls) |
| Anthropic (Claude) | 59 | RAG responses, ask-v2 endpoint, auto-research |
| OpenAI | 54 | text-embedding-3-small for knowledge_base chunks |
| WordPress | 50 | Page content, Code Snippets, RankMath SEO |
| QuickBooks | 35 | P&L pull (manual until Plaid auth) |
| Gmail | 29 | Email monitor, RAG ingester |
| Slack | 26 | Big Sale tracker, win alerts |
| Vapi (Daniel AI) | 24 | Voice agent calls, RAG bridge |
| Cloudflare | 23 | CDN, WAF, Web Analytics |
| Semrush | 16 | Keyword research, competitor tracking |
| Google Tag Manager | 14 | Tag deployment, GCLID capture |
| Ramp | 13 | COGS, materials spend (titan.ramp_transactions) |
| Google Search Console | 13 | Organic rankings (no warehouse table yet) |
| ClickCease / CHEQ | 11 | Click fraud blocking |
| Telnyx | 11 | SMS, 10DLC registration (pending BSP EIN) |
| Meta Ads | 7 | Facebook budget automation |
| BirdEye / Reviews | 4 | Review aggregation |
| Google LSA | 3 | Local Service Ads (Tier 1-4 cities) |
Gaps (APIs Not Yet Wired)
- Ahrefs: Zero files. Robert has access but no ingestion path. Would add backlink intelligence and competitive SEO data.
- Plaid (for QuickBooks): Blocked on Stephanie authorization. Once wired, QB P&L refreshes automatically instead of on manual pull.
- GSC warehouse table: API is wired (13 files) but no `titan.gsc_*` table exists. Organic rankings are queried live, never cached.
- GA4 warehouse table: Same as GSC. API wired (66 files) but no warehouse table.
💥 Root Cause: Auto-Repair Agent
The nexus_repair_agent.py auto-repair timer injected a 3-line SLACK_ENABLED = False block into 19 Python files at the wrong indentation, causing an IndentationError cascade across 20 timers. The "self-healing" systems were the disease, not the cure. Error log counts: immune system 12,657, repair agent 13,799, sentinel 9,712.
Fix: All 19 files repaired (14 from .bak backups, 5 by removing injected blocks). Auto-repair agent quarantined to /opt/nexus-staging/quarantine/. Permanently disabled.
📜 The NEXUS Treaty (3 Gates)
- Gate 1: Git Mandate — `/opt/nexus/.git/` initialized. Pre-commit hook blocks syntax errors. 2 commits baseline the clean state.
- Gate 2: Truth Service — `nexus_truth_service.py` is the SINGLE source for all revenue numbers. Every script must import from it. Replaces the 8 competing scripts.
- Gate 3: Verification Gate — `verify_sacred_math.py` checks Sacred v2 against the Truth Service before deploy. Blocks phantom numbers, stale values, and unrendered templates. 14/14 checks passing.
🛠️ Infrastructure Deployed
- Staging: `/opt/nexus-staging/` with scripts, titan, test, and quarantine directories
- Deploy Gate: `deploy_gate.sh` — syntax + safety + diff review before production promotion
- ALCOA+ Test: `test_idempotency.py` — double-run verification per the Google SRE book (see the sketch after this list)
- Production Lock: 14 core scripts set to chmod 444 (read-only)
- Quarantine: 5 dangerous scripts isolated (auto_repair, immune_system, self_healer, homeostasis, auto_expansion)
- Null-Revenue Shield: `DataUnavailableError` + `validate_revenue()` + `validate_input()` in the Math Engine
- RAG Decontamination: 152 error-pattern chunks purged from pgvector (IndentationError, SyntaxError patterns)
- RUNBOOK: 238-line operations manual at `/opt/nexus/RUNBOOK.md`
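A sketch of the double-run check test_idempotency.py could perform. The target script path and the row-count probe are placeholders; the principle is that a second run of the same ingest must change nothing.

```python
import subprocess

import psycopg2

# Double-run verification: running the same ingest script twice must leave the
# warehouse in the same state, or the script is not idempotent (ALCOA+).
SCRIPT = "/opt/nexus-staging/scripts/some_ingest.py"   # hypothetical target
COUNT_SQL = "SELECT count(*) FROM titan.jobs"

def row_count() -> int:
    with psycopg2.connect("dbname=nexus_staging") as conn:  # staging, never prod
        with conn.cursor() as cur:
            cur.execute(COUNT_SQL)
            return cur.fetchone()[0]

subprocess.run(["python3", SCRIPT], check=True)   # first run: may insert rows
first = row_count()
subprocess.run(["python3", SCRIPT], check=True)   # second run: must be a no-op
second = row_count()

assert first == second, f"NOT idempotent: {first} rows -> {second} rows"
print(f"idempotent: {first} rows after both runs")
```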
📊 Before vs After
| Metric | Before (Apr 12 morning) | After (Apr 12 close) |
|---|---|---|
| Active timers | 292 | 59 |
| Failing timers | 20 | 0 |
| Syntax-broken files | 19 | 0 |
| Error logs/day | 654+ (sentinel alone) | Near zero |
| Git commits | 0 | 2 |
| Deploy gates | 0 | 3 |
| Revenue sources | 8 competing | 1 (Truth Service) |
| Staging environment | None | Created |
| Rollback capability | .bak files | git checkout |
| RAG contamination | 152 error chunks | 0 |
🔗 API Capability Registry (8/8 Active)
Every API gets a health ping before any script uses it. If a capability is LOST, the system alerts immediately instead of running with stale data.
| API | Status | Scope | Refresh Command |
|---|---|---|---|
| ServiceTitan | ACTIVE | Jobs, Campaigns, Customers (NOT revenue) | Refresh in ST Developer Portal |
| Google Ads | ACTIVE | GAQL queries, offline conversions, campaigns | google_oauth_setup.py |
| QuickBooks | ACTIVE | P&L reports (SSoT for profitability) | OAuth flow, Plaid from Stephanie |
| Ramp | ACTIVE | Transactions (COGS, materials) | Ramp dashboard |
| Gmail | ACTIVE | gmail.modify (read + label) | nexus_email_monitor.py --scan |
| OpenAI | ACTIVE | text-embedding-3-small for RAG | OpenAI dashboard |
| Slack | ACTIVE | #big-sale-alert, #nexus-alerts | Slack app settings |
| Vapi | ACTIVE | Daniel AI voice agent | Vapi dashboard |
File: /opt/nexus/nexus/scripts/output/api_capabilities.json | Script: /opt/nexus/titan/nexus_capabilities.py
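A sketch of how those health pings could be recorded. The JSON layout mirrors the file named above, but the schema, the per-API ping functions, and the alert hook are assumptions for illustration; the sketch writes to a local .example file so it never clobbers the real registry.

```python
import json
import time

REGISTRY = "/opt/nexus/nexus/scripts/output/api_capabilities.json"  # the real file

def ping_slack() -> bool:
    # Placeholder: a real check would call the API's cheapest authenticated
    # endpoint (e.g. an auth/whoami-style call) and return True on success.
    return True

CHECKS = {"slack": ping_slack}   # one entry per wired API

def run_checks() -> dict:
    results = {}
    for name, check in CHECKS.items():
        try:
            ok = check()
        except Exception:
            ok = False
        results[name] = {
            "status": "ACTIVE" if ok else "LOST",
            "checked_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        }
        if not ok:
            print(f"ALERT: capability LOST -> {name}")  # page instead of running stale
    return results

with open("api_capabilities.example.json", "w") as f:
    json.dump(run_checks(), f, indent=2)
```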