💥 NEXUS Failure Pattern Encyclopedia

21 documented failure patterns across 17 sessions. Every fuckup that cost hours, money, or trust.
READ THIS BEFORE EDITING ANY SYSTEM. Every pattern here was learned the hard way.
21 Documented Patterns · 100+ Hours Lost · $134K+ Revenue Impact · 21 Permanent Fixes
📊 Data Integrity Failures (5 patterns)
P-01 CRITICAL
Phantom Revenue ($6.4M ghost data)
$6.4M PHANTOM · 16+ HOURS
💥 Symptom: $6.4M revenue appeared Apr 3 at 04:07 UTC. Reports showed impossible numbers.
🔍 Root Cause: nexus_titan_migration.py line 249: INSERT INTO titan.jobs missing created_at column. 10,461 rows got NULL timestamps.
💎 Solution: Postgres trigger guard enforce_explicit_created_at(). 11,729 rows quarantined. Migration script archived.
💎 SOLUTION: Every INSERT INTO titan.* MUST include an explicit created_at from the source system. Postgres trigger blocks any insert matching the phantom pattern.
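The same rule can be enforced on the Python side before a row ever reaches Postgres, as a second line of defense behind the trigger. This is a hypothetical sketch: `build_titan_insert` and the row layout are illustrative assumptions, not the real migration code. It refuses to build an INSERT for a titan.* table unless the row carries a non-empty created_at from the source system, and it always names created_at explicitly in the column list.

```python
from typing import Any, Dict, List, Tuple


def build_titan_insert(table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Hypothetical helper: build a parameterized INSERT for a titan.* table.

    Raises ValueError unless the row has an explicit, non-empty created_at
    copied from the source system (never left to a default or NULL).
    """
    if not table.startswith("titan."):
        raise ValueError(f"unexpected table {table!r}")
    if row.get("created_at") in (None, ""):
        raise ValueError(f"{table}: refusing INSERT without explicit created_at")
    # Column names come from our own code, never from user input.
    columns = sorted(row)
    placeholders = ", ".join(["%s"] * len(columns))  # psycopg2-style placeholders
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, [row[c] for c in columns]
```

A row like `{"id": 1, "total": 10}` raises instead of silently producing a NULL-timestamp phantom.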
P-02 CRITICAL
ST job_status Filter Missing ($128K phantom week)
$128K PHANTOM · KALEN CAUGHT IT
💰 Symptom: Revenue report showed $128K/week. Kalen said: "that's not real." Actual revenue from ST was $0 (broken).
🔍 Root Cause: Query used completed_at IS NOT NULL instead of job_status = 'completed'. Unclosed jobs with phantom timestamps returned revenue.
💎 Solution: MANDATORY: WHERE job_status = 'completed' AND job_status NOT ILIKE '%cancel%' on EVERY revenue query. Math Engine enforces.
💎 SOLUTION: NEVER query revenue without job_status = 'completed'. Using completed_at alone returns phantom revenue from unclosed jobs. This is the #1 data integrity rule.
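To see why completed_at alone is dangerous, here is a self-contained demonstration against an in-memory SQLite table. The table, rows, and amounts are invented for illustration. SQLite has no ILIKE, so LOWER(job_status) NOT LIKE '%cancel%' stands in for the Postgres filter. If the real query runs through psycopg2 with parameters, its literal % signs must be doubled (%%).

```python
import sqlite3

# In-memory stand-in for the jobs table; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER, job_status TEXT, completed_at TEXT, total REAL)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        (1, "completed", "2026-04-06", 500.0),
        (2, "in_progress", "2026-04-06", 9000.0),  # unclosed job with a phantom timestamp
        (3, "canceled", "2026-04-07", 1200.0),
    ],
)

# Wrong: completed_at alone counts unclosed and canceled jobs.
WRONG = "SELECT COALESCE(SUM(total), 0) FROM jobs WHERE completed_at IS NOT NULL"

# Right: filter on job_status (LOWER(...) LIKE replaces Postgres ILIKE here).
RIGHT = (
    "SELECT COALESCE(SUM(total), 0) FROM jobs "
    "WHERE job_status = 'completed' AND LOWER(job_status) NOT LIKE '%cancel%'"
)

wrong_total = conn.execute(WRONG).fetchone()[0]
right_total = conn.execute(RIGHT).fetchone()[0]
print(wrong_total, right_total)  # 10700.0 500.0
```

The wrong query reports 21x the real figure from a single unclosed job, which is the shape of the $128K phantom week.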
P-03 CRITICAL
Experiment Count Destruction (299 to 10)
3 INCIDENTS
💣 Symptom: Experiment file overwritten with a subset. Mar 22: nearly destroyed 228 experiments. Apr 5: 299 overwritten with 10.
🔍 Root Cause: Auto-generated subset (experiment_tracker.json = 12 items) treated as the full list. Script overwrote unified_experiments.json without checking the count.
💎 Solution: pre_write_gate.py: gated_write() checks the count before writing and blocks any write with fewer items than the existing file. Session guardian verifies 299+ at start.
💎 SOLUTION: BEFORE touching ANY experiment file, check unified_experiments.json count FIRST. If your count is LOWER, STOP. You are about to destroy data.
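A minimal sketch of the count check described above. This is not the real pre_write_gate.py: the signature, the `CountRegressionError` name, and the assumption that the protected file holds a top-level JSON list are all hypothetical. It also writes through a temp file and os.replace() so a crash mid-write cannot leave a truncated file behind.

```python
import json
import os
import tempfile


class CountRegressionError(RuntimeError):
    """Raised when a write would shrink a protected JSON list (hypothetical name)."""


def gated_write(path: str, items: list) -> None:
    """Sketch of a count-checked write; assumes the file holds a top-level JSON list."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = json.load(f)
        if len(items) < len(existing):
            raise CountRegressionError(
                f"refusing to write {len(items)} items over {len(existing)} in {path}"
            )
    # Write to a temp file in the same directory, then atomically swap it in.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(items, f, indent=2)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600 files
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With 299 experiments on disk, a call that passes 10 items raises and leaves the file untouched.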
P-04 HIGH
Wrong Google Ads Upload Target (MCC vs Child)
2+ HOURS
📤 Symptom: uploadCallConversions returned "click associated with different account." Zero conversions uploaded.
🔍 Root Cause: Uploaded to child account 7269555791 instead of MCC 8449092450. Clicks originate at the MCC level, so uploads must target the MCC.
💎 Solution: Changed CUSTOMER_ID to MCC_ID in upload scripts. Both nexus_uploadcallconversions.py and nexus_offline_conversions.py target MCC 8449092450.
💎 SOLUTION: ALL Google Ads uploads (call conversions, offline conversions) go to MCC 8449092450, NEVER to child account 7269555791.
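One way to make the MCC rule mechanical is to route every upload's customer ID through a single guard. This sketch is hypothetical (the function name and constants are illustrative; only the two account IDs come from this pattern). It strips the dashes that customer IDs are often displayed with, since the Google Ads API takes them as plain digits, and refuses anything other than the MCC.

```python
MCC_ID = "8449092450"    # manager account: the only valid upload target
CHILD_ID = "7269555791"  # child account: uploads here fail with "click associated with different account"


def upload_customer_id(customer_id: str) -> str:
    """Hypothetical guard: normalize a customer ID and allow only the MCC."""
    normalized = customer_id.replace("-", "").strip()
    if normalized == CHILD_ID:
        raise ValueError("P-04: uploads must target MCC 8449092450, not child 7269555791")
    if normalized != MCC_ID:
        raise ValueError(f"P-04: unknown upload target {customer_id!r}")
    return normalized
```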
P-05 HIGH
Revenue Hierarchy Violation (8 Scripts, 8 Numbers)
RECURRING
Revenue Source Hierarchy (MANDATORY ORDER): Big Sale ($70K/wk, SSoT weekly) > QuickBooks ($79.7K, SSoT monthly/annual) > ServiceTitan ($0, ⚠️ BROKEN: counts only)
💎 SOLUTION: NEVER report ST revenue alone. Check big_sale_tracker.json FIRST. Truth Service (nexus_truth_service.py) is the single import point for all revenue numbers.
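The hierarchy can be encoded so that ServiceTitan is never even a candidate. The sketch below is a hypothetical stand-in for what nexus_truth_service.py might expose: the function name, its arguments, and the reading of the hierarchy as "weekly from Big Sale, monthly/annual from QuickBooks" are assumptions. The caller is expected to load the figures from big_sale_tracker.json and QuickBooks.

```python
from typing import Optional, Tuple


def pick_revenue(period: str, big_sale_weekly: Optional[float],
                 quickbooks_total: Optional[float]) -> Tuple[str, float]:
    """Hypothetical: return (source, amount) following the mandatory hierarchy.

    ServiceTitan is deliberately not a parameter: its revenue is broken (counts only).
    """
    if period == "weekly":
        if big_sale_weekly is None:
            raise LookupError("no weekly figure in big_sale_tracker.json; do not fall back to ServiceTitan")
        return "big_sale", big_sale_weekly
    if period in ("monthly", "annual"):
        if quickbooks_total is None:
            raise LookupError("no QuickBooks figure; do not fall back to ServiceTitan")
        return "quickbooks", quickbooks_total
    raise ValueError(f"unknown period {period!r}")
```

A missing figure raises rather than quietly substituting the $0 ServiceTitan number.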
💻 Code & Syntax Failures (5 patterns)
P-06 CRITICAL
Auto-Repair Agent Lobotomy (19 files, 36K errors/day)
36K ERRORS/DAY
🤖 The "Self-Healer": nexus_repair_agent.py injected SLACK_ENABLED = False at the wrong indentation into 19 files, triggering an IndentationError cascade.
🔥 Cascade: 20 timers failing. 36,000+ daily errors: immune system (12,657), repair agent (13,799), sentinel (9,712).
🛑 Quarantined: 5 self-healing scripts quarantined permanently. 14 files restored from .bak backups. A pre-commit hook now blocks syntax errors.
💎 SOLUTION: NEVER let an automated script edit production .py files. The "self-healing" systems were the disease. Use staging + deploy_gate.sh + pre-commit hooks.
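The pre-commit hook only has to answer one question: does every staged .py file compile? A sketch, assuming a plain Git hook written in Python (the function names are hypothetical; the git commands are standard). It reads the staged blob with `git show :path` rather than the working copy, and it catches IndentationError because that is a subclass of SyntaxError.

```python
import subprocess
import sys
from typing import Dict, List


def find_syntax_errors(sources: Dict[str, str]) -> List[str]:
    """Return one 'path:line: message' entry per file that fails to compile."""
    errors = []
    for path, source in sources.items():
        try:
            compile(source, path, "exec")
        except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
            errors.append(f"{path}:{exc.lineno}: {exc.msg}")
    return errors


def staged_python_files() -> List[str]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p.endswith(".py")]


def staged_source(path: str) -> str:
    # ":path" names the version in the index, i.e. exactly what will be committed.
    return subprocess.run(
        ["git", "show", f":{path}"], capture_output=True, text=True, check=True
    ).stdout


def main() -> int:
    sources = {path: staged_source(path) for path in staged_python_files()}
    errors = find_syntax_errors(sources)
    for message in errors:
        print(message, file=sys.stderr)
    return 1 if errors else 0
```

Saved as an executable .git/hooks/pre-commit that ends with `sys.exit(main())`, a non-zero exit aborts the commit.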
P-07 HIGH
Inline SSH Python (20+ failures in one session)
20+ ATTEMPTS
Wrong:
ssh user@host 'python3 -c "print(f\"value: {x}\")"'
# Shell escaping + Python f-string braces = GUARANTEED FAILURE
Correct:
# Protocol P-07: write a .py file locally, scp it to the VM, run it there
# 1. Write("script.py") locally
scp script.py dovew@VM:/tmp/            # 2. copy it to the VM
ssh dovew@VM "python3 /tmp/script.py"   # 3. run it there
💎 SOLUTION: Any Python >10 lines goes in a .py file. NEVER inline heredocs or f-strings in SSH commands. This failed 20+ times before becoming a rule.
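The same protocol can be driven from Python with no hand-written shell quoting by passing argument lists to subprocess. This is a sketch: `build_commands` and `run_on_vm` are hypothetical names. The remote command is quoted with shlex.quote because ssh joins its trailing arguments into one string for the remote shell.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Tuple


def build_commands(local_script: Path, host: str,
                   remote_dir: str = "/tmp") -> Tuple[List[str], List[str]]:
    """Return the scp and ssh argument lists for the write/scp/run protocol."""
    remote_path = f"{remote_dir.rstrip('/')}/{local_script.name}"
    scp_cmd = ["scp", str(local_script), f"{host}:{remote_path}"]
    ssh_cmd = ["ssh", host, f"python3 {shlex.quote(remote_path)}"]
    return scp_cmd, ssh_cmd


def run_on_vm(local_script: Path, host: str) -> str:
    """Copy the script, run it remotely, and return its stdout (raises on failure)."""
    scp_cmd, ssh_cmd = build_commands(local_script, host)
    subprocess.run(scp_cmd, check=True)
    result = subprocess.run(ssh_cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

All the Python lives in script.py, so f-strings and braces never pass through a shell.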
P-08 MEDIUM
Python .format() vs CSS var()
Wrong:
html = """body{background:var(--bg)}""".format(total=5)
# str.format() reads the CSS block {background:var(--bg)} as a placeholder named "background": KeyError: 'background'
Correct:
# Use %-formatting or string concatenation:
html = "Total: %d docs" % total
# Or hardcode CSS values; never pass CSS through .format()
💎 SOLUTION: Never use .format() on strings containing CSS. Use %-formatting or string concatenation.
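A runnable version of the trap and both fixes. The CSS string and function names are illustrative. Note that %-formatting has a mirror-image trap: braces are harmless, but any literal % in the template itself (for example width:100%) must be written %%. Passing the CSS in as a value, as below, sidesteps both.

```python
CSS = "body{background:var(--bg)}"


def broken_render(total: int) -> str:
    # str.format() sees {background:var(--bg)} as a field named "background".
    return ("<style>" + CSS + "</style><p>Total: {total} docs</p>").format(total=total)


def render_concat(total: int) -> str:
    # The CSS never passes through a formatter, so its braces are left alone.
    return "<style>" + CSS + "</style><p>Total: " + str(total) + " docs</p>"


def render_percent(total: int) -> str:
    # CSS is a %s value, not part of the template, so neither braces nor % in it matter.
    return "<style>%s</style><p>Total: %d docs</p>" % (CSS, total)
```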
P-09 MEDIUM
chmod 444 Permission Trap (2-hour bug hunt)
Wrong:
chmod 444 production_file.py              # Lock it for "safety"
git add production_file.py                # Can't stage (no write permission)
python3 -m py_compile production_file.py  # Can't read properly
# Files appeared broken but were actually clean Python
Correct:
# 1. Use chmod 644 for production files
# 2. Use a pre-commit hook for syntax enforcement
# 3. Use Git branch protection for change control
# chmod 444 creates false positives and blocks tooling
💎 SOLUTION: Never use chmod 444 to guard code. Use Git pre-commit hooks and branch protection instead. Filesystem locks block the tools that verify the code.
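If files were already locked with chmod 444, a quick audit can find and unlock them. A sketch with hypothetical function names: it inspects the owner-write bit in the file mode rather than calling os.access(), so the result is the same even when run as root.

```python
import os
import stat
from pathlib import Path
from typing import List


def find_read_only(root: str, suffix: str = ".py") -> List[Path]:
    """Return files under root whose owner-write bit is cleared (e.g. after chmod 444)."""
    locked = []
    for path in Path(root).rglob(f"*{suffix}"):
        if path.is_file() and not (path.stat().st_mode & stat.S_IWUSR):
            locked.append(path)
    return sorted(locked)


def restore_644(paths: List[Path]) -> None:
    """Reset each file to rw-r--r-- (0o644), the mode recommended above."""
    for path in paths:
        os.chmod(path, 0o644)
```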
P-10 MEDIUM
Template Tags Showing Raw in Production
<div>{{ m('weekly_revenue') }}</div>
<!-- Stephanie sees the literal {{ m('weekly_revenue') }} on screen -->
<!-- Regen script rejected the block due to the "naked number" checker -->
# Either: make the whole block use templates (no hardcoded numbers)
# OR: render the tags manually and commit the rendered values
# THEN: run verify_sacred_math.py to catch any unrendered templates
# Verification Gate (Gate 3) blocks deploys with raw tags
💎 SOLUTION: Run verify_sacred_math.py AFTER every Sacred HTML edit. It catches unrendered templates before Stephanie sees them.
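A sketch of the kind of scan that catches unrendered tags before Stephanie does. This is not verify_sacred_math.py itself; the function name is hypothetical, and the regex assumes Jinja-style {{ ... }} and {% ... %} delimiters like the example above. It reports the line of each hit.

```python
import re
from typing import List, Tuple

# Matches {{ ... }} and {% ... %}, including tags that span lines.
RAW_TAG = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)


def find_raw_template_tags(html: str) -> List[Tuple[int, str]]:
    """Return (line number, tag text) for every unrendered template tag."""
    hits = []
    for match in RAW_TAG.finditer(html):
        line = html.count("\n", 0, match.start()) + 1
        hits.append((line, match.group(0)))
    return hits
```

Any non-empty result should block the deploy.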
🎨 HTML/CSS Failures (4 patterns)
P-11 HIGH
Div Nesting Cascade (GSC section broke 8 sections)
ENTIRE BOTTOM HALF
📦 Symptom: Everything from Systems Health down was visually broken. Sections were trapped inside the GSC card's grid.
🔍 Root Cause: GSC/GA4 section opened 4 divs (section wrapper, outer grid, card, inner stats grid), then cut to the next section without closing ANY of them.
💎 Solution: Completed the GSC card, added the GA4 card, closed all 4 wrapper divs. Verified all 9 sections at a consistent nesting depth of 3.
💎 SOLUTION: After injecting ANY section into Sacred HTML, verify div nesting depth at section boundaries. Every major section should be at the same depth. Use the div balance checker script.
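A minimal div balance checker built on the standard library's html.parser. It is a sketch, not the existing checker script: it assumes each major section starts with an <h2>, records the div depth at every <h2>, and reports the depth still open at the end of the file. In a healthy document every heading sits at the same depth and the final depth is 0.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class DivDepthChecker(HTMLParser):
    """Track <div> nesting and record the depth wherever a section heading (<h2>) starts."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.heading_depths: List[Tuple[int, int]] = []  # (source line, div depth)
        self.stray_closes = 0  # </div> with no matching <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
        elif tag == "h2":
            line, _ = self.getpos()
            self.heading_depths.append((line, self.depth))

    def handle_endtag(self, tag):
        if tag == "div":
            if self.depth == 0:
                self.stray_closes += 1
            else:
                self.depth -= 1


def check_div_balance(html: str) -> DivDepthChecker:
    checker = DivDepthChecker()
    checker.feed(html)
    checker.close()
    return checker
```

In the GSC incident, every heading after the broken section would have shown up 4 levels deeper than the ones before it.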
P-12 HIGH
Injecting Content Into Wrong Grid Depth
Wrong:
<div style="display:grid;grid-template-columns:1fr 1fr 1fr">
  <div>Card 1</div>
  <div>Card 2</div>
  <!-- INJECTED FULL-WIDTH SECTION HERE -->
  <div>Trapped in column 3, not full width</div>
</div>
Correct:
<div style="display:grid;grid-template-columns:1fr 1fr 1fr">
  <div>Card 1</div>
  <div>Card 2</div>
  <div>Card 3</div>
</div>
<!-- CLOSE the grid FIRST, then inject -->
<div>Full-width section, renders correctly</div>
💎 SOLUTION: Close ALL parent grid/flex divs before starting new full-width sections. Never inject into the middle of an existing grid.
P-13 HIGH
Sections Dumped at Bottom Instead of Logical Position
Wrong (Lazy Append):
• Executive Summary
• Revenue Health
• Gap Analysis
• Ads Performance
• Quick Links
• GSC/GA4 (dumped here)
• Great Stabilization (dumped here)
• Intelligence Findings (dumped here)
Correct (Logical Flow):
• Executive Summary
• Revenue Health
• Gap Analysis (after revenue)
• Ads Performance
• GSC/GA4 (inside performance)
• Systems + Stabilization (together)
• Owner Board
• Quick Links
💎 SOLUTION: Every new section must be inserted at its logical position in the document flow. Gap Analysis after Revenue. GSC/GA4 near Ads. Stabilization inside Systems. Never append at bottom.
P-14 MEDIUM
Stale Numbers Propagating Across 172 Documents
23 DOCS STALE
Examples Found:
• $64,204 (should be $70,180)
• $47,008 (should be $79,743)
• $2.44M (should be $3.66M)
• 26.8% (should be 16.8%)
• 292 timers (should be 61)
• 4,425 RAG chunks (should be 15,397)
Prevention:
• Document Freshness Enforcer (weekly scan)
• Truth Service as single revenue source
• fix_all_stale_docs.py for batch updates
• Verification Gate blocks stale deploys
Result: 87 stale numbers fixed across 172 docs
💎 SOLUTION: All numbers in HTML documents must come from Truth Service or be updated by the Freshness Enforcer. Hardcoded numbers decay. The enforcer catches them.
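A sketch of the scan-and-replace step a freshness check could run. It is not the real Freshness Enforcer or fix_all_stale_docs.py; the function names are hypothetical, and the stale/current pairs are copied from the examples above. The boundary rules keep "126.8%" from matching "26.8%" and "$64,2049" from matching "$64,204".

```python
import re
from typing import Dict, List, Tuple

# Stale value -> current value, copied from the examples above.
KNOWN_STALE: Dict[str, str] = {
    "$64,204": "$70,180",
    "$47,008": "$79,743",
    "$2.44M": "$3.66M",
    "26.8%": "16.8%",
}


def _pattern(stale: str) -> "re.Pattern[str]":
    # Not preceded by a digit or separator; not followed by a digit or separator+digit.
    return re.compile(r"(?<![\d.,])" + re.escape(stale) + r"(?!\d|[.,]\d)")


def find_stale(text: str, known: Dict[str, str] = KNOWN_STALE) -> List[Tuple[int, str, str]]:
    """Return (line number, stale value, current value) for each hit, in document order."""
    hits = []
    for stale, current in known.items():
        for match in _pattern(stale).finditer(text):
            line = text.count("\n", 0, match.start()) + 1
            hits.append((match.start(), line, stale, current))
    return [(line, stale, current) for _, line, stale, current in sorted(hits)]


def apply_fixes(text: str, known: Dict[str, str] = KNOWN_STALE) -> str:
    """Replace every stale value with its current value."""
    for stale, current in known.items():
        text = _pattern(stale).sub(lambda _m, c=current: c, text)
    return text
```

A hardcoded table like this decays too, which is why the solution routes revenue numbers through Truth Service instead.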
⚙️ Operations Failures (3 patterns)
P-15 CRITICAL
Shallow Validation ("Timer Active" = Working)
RECURRING PATTERN
Shallow Checks (LIES)
❌ "Timer is active" ≠ working
❌ "Script runs" ≠ effective
❌ "API returns 200" ≠ right answer
❌ "Exit code 0" ≠ no errors
❌ "File exists" ≠ correct data
Deep Checks (TRUTH)
✅ Did it WRITE data?
✅ Did the data CHANGE?
✅ Is the result CORRECT?
✅ Does the output match expectations?
✅ Can you show the PROOF?
Real Examples That Burned Us:
• Auto-tagger "working" for weeks in DRY RUN writing zero tags
• Offline conversions "running" but timing out every execution
• Revenue "returning data" but phantom $128K from unclosed jobs
• Session enforcer "active" but not checking actual output files
💎 SOLUTION: Before telling Robert ANY system works, show: (1) what it produced (2) was the output correct (3) did it change real data. If you can't show all three, say "running but unverified."
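The three deep checks can be made concrete for any job that writes an output file. A sketch with hypothetical names: snapshot the output before the run, then after the run confirm it exists and is non-empty, differs from the snapshot, and passes a caller-supplied correctness predicate. Anything short of all three comes back as "running but unverified".

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Snapshot:
    exists: bool
    size: int
    sha256: Optional[str]


def take_snapshot(path: str) -> Snapshot:
    """Capture enough about an output file to tell whether a run changed it."""
    if not os.path.exists(path):
        return Snapshot(False, 0, None)
    with open(path, "rb") as f:
        data = f.read()
    return Snapshot(True, len(data), hashlib.sha256(data).hexdigest())


def deep_check(path: str, before: Snapshot, is_correct: Callable[[str], bool]) -> str:
    """Apply the deep checks; anything short of all three is 'running but unverified'."""
    after = take_snapshot(path)
    if not after.exists or after.size == 0:
        return "running but unverified: produced nothing"
    if after.sha256 == before.sha256:
        return "running but unverified: output unchanged"
    if not is_correct(path):
        return "running but unverified: output failed correctness check"
    return "verified"
```

The auto-tagger stuck in DRY RUN would have failed the first check on day one.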
P-16 HIGH
Endpoint Assumed Without Testing (ST Booking 404)
3+ HOURS
📖 "Docs say it exists": Built an entire integration around the ST booking POST endpoint based on documentation and AI research.
🚫 404 Not Found: Endpoint doesn't exist; scope not available. 3+ hours spent building code that can never work.
💎 Test First: Hit the actual endpoint with a test request BEFORE writing any integration code. 30 seconds saves 3 hours.
💎 SOLUTION: NEVER build an integration based on documentation alone. Hit the actual endpoint first. If it returns 404, the docs are wrong or the scope isn't available.
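A 30-second probe, sketched with the standard library. The function names are hypothetical, the Bearer header is a generic assumption (real APIs, ServiceTitan included, may require extra headers), and no real endpoint URL is given here. Probing with GET first is a deliberate choice: a route that exists but only accepts POST commonly answers 405, whereas a POST probe against a live route could create a real record.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, method: str = "GET", token: Optional[str] = None,
          timeout: float = 10.0) -> int:
    """Send one request and return the HTTP status code, including error statuses."""
    request = urllib.request.Request(url, method=method)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code


def classify(status: int) -> str:
    """Translate a probe result into a build / don't-build decision."""
    if 200 <= status < 300:
        return "reachable: safe to start integration work"
    if status in (401, 403):
        return "auth or scope problem: fix credentials and scopes first"
    if status == 404:
        return "not found: docs are wrong or the scope isn't available"
    if status == 405:
        return "route exists but not for this method"
    return f"unexpected status {status}: investigate before building"
```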
P-17 HIGH
Offline Conversion Pipeline Wrong Bucket ($0 to Smart Bidding)
WEEKS OF STARVED BIDDING
💸 Wrong Bucket: Pipeline uploaded to 'Offline Job Completion' (secondary, $0 value) instead of 'ST Job Completed (API)' (primary, $762 avg).
🔍 Starved Bidding: Smart Bidding had ZERO conversion value data for weeks. Optimizing blind. Wrong conversion_action_id at line 415.
💎 Fixed: DEFAULT_CONVERSION_ACTION_ID = "7537150978" (ST Job Completed). Auto-selects primary action. Consent fields added.
💎 SOLUTION: When uploading offline conversions, verify the conversion_action_id maps to the PRIMARY conversion action, not a secondary one. Check Google Ads UI to confirm which action Smart Bidding reads.
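A pre-upload check can refuse any conversion action not known to be primary. This is a hypothetical sketch: only the primary ID 7537150978 comes from this pattern, and the secondary 'Offline Job Completion' ID is not recorded here, so it is left for you to add from the Google Ads UI. If your Google Ads API version exposes `conversion_action.primary_for_goal`, that field could populate the table instead of hand entry.

```python
from typing import Dict, NamedTuple


class ConversionAction(NamedTuple):
    name: str
    primary: bool


# Known actions. Add the secondary 'Offline Job Completion' ID here with primary=False.
CONVERSION_ACTIONS: Dict[str, ConversionAction] = {
    "7537150978": ConversionAction("ST Job Completed (API)", primary=True),
}


def require_primary(action_id: str,
                    actions: Dict[str, ConversionAction] = CONVERSION_ACTIONS) -> str:
    """Hypothetical pre-upload check: only primary actions feed Smart Bidding."""
    action = actions.get(action_id)
    if action is None:
        raise ValueError(f"unknown conversion_action_id {action_id!r}")
    if not action.primary:
        raise ValueError(f"{action.name} ({action_id}) is secondary; Smart Bidding will not optimize on it")
    return action_id
```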
📋 Process Failures (4 patterns)
P-18 CRITICAL
Two Nicks Confusion (BSP Tech vs Inspector Nick)
REPEATED 3x
👷
Nick Chernioglo
• BSP field technician (tech id=1)
• Ramp spending: $26,639
• Does diagnostics + camera work
• Material review section in Sacred v2
• His materials may fund jobs other techs close
🔍
Nick Welty (Inspector Nick)
• SEPARATE business (inspectornick.com)
• 15 employees, not a BSP tech
• BSP's highest revenue channel ($5,275 avg ticket)
• Partnership section in Sacred v2
• Apr 1 meeting: 8 action items for Kalen
💎 SOLUTION: NEVER confuse these two. Sacred v2 Nick section = Chernioglo (BSP tech). Inspector Nick Partnership section = Welty (separate business). Mixing them up was caught 3 times.
P-19 HIGH
Stephanie Format Violation (Problem-First Instead of Solution-First)
COMMUNICATION
Wrong (Problem First)
"Nick's Ramp charges are $26,639 but his sales are only $14,986. This is an $11,653 gap that could be leakage..."
Correct (Solution First)
SOLUTION: Cross-reference Ramp vs ST closings
PROBLEM: $11,653 gap in Nick's numbers
WHY IT MATTERS: Attribution vs theft
STATUS: Robert pulling data this week
💎 SOLUTION: Every section for Stephanie follows SOLUTION > PROBLEM > WHY IT MATTERS > WHERE IT STANDS. Lead with the answer, not the scary number. 3 bullets max. No tech jargon.
P-20 HIGH
Date Discipline Failure (Wrong Day of Week)
# Sacred v2 said "Week of April 12" and "April 14"
# Monday is actually April 13, 2026
# Wrong dates in a standup doc destroy trust instantly

# ALWAYS verify day-of-week before writing dates
python3 -c "from datetime import date; d=date(2026,4,13); print(d.strftime('%A'))"  # Monday
# Use EXACTLY what Robert said. Never guess.
💎 SOLUTION: NEVER guess dates. Verify day-of-week programmatically. If Robert mentions a date, use EXACTLY what he said. Wrong dates in client-facing docs destroy trust.
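The one-liner above can be turned into a reusable check. A sketch with hypothetical function names; it uses fixed English weekday names because strftime('%A') follows the system locale.

```python
from datetime import date, timedelta

# Fixed English names so the check does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def weekday_name(d: date) -> str:
    return WEEKDAYS[d.weekday()]


def monday_of(d: date) -> date:
    """Return the Monday that starts d's week, for 'Week of ...' headings."""
    return d - timedelta(days=d.weekday())


def check_claim(d: date, claimed_weekday: str) -> None:
    """Raise if a document claims the wrong weekday for d."""
    actual = weekday_name(d)
    if actual != claimed_weekday:
        raise ValueError(f"{d.isoformat()} is a {actual}, not a {claimed_weekday}")
```

Under this check, calling April 14, 2026 a Monday raises, and monday_of() gives April 13 as the start of that week.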
Incident Timeline
Mar 16, 2026
First session. Multiple pricing inventions, wrong date for Evelyn call, incorrect HTML formats. 5 failure patterns documented.
Mar 22, 2026
Nearly destroyed 228 experiments by overwriting with subset. Stale review count shipped. Pre-write gate concept born.
Mar 24, 2026
7 protocol violations in one session. Session enforcer lost, API keys re-asked 3x, scattered across 6 tasks. Sequential execution rule created.
Mar 27, 2026
No enforcer run. Lost 3 API keys. Ugly HTML output. Mandatory session start protocol created.
Apr 3, 2026 04:07 UTC
$6.4M PHANTOM EVENT. nexus_titan_migration.py line 249 missing created_at. 10,461 rows. Triggered Data Trust Evolution v1. Postgres trigger guard deployed.
Apr 5, 2026
299 experiments overwritten with 10. Third count destruction incident. pre_write_gate.py deployed. gated_write() mandatory for protected files.
Apr 10, 2026
Offline conversions uploading to wrong bucket (secondary $0 instead of primary $762). Smart Bidding starved for weeks. Conversion action ID hardcoded to correct value.
Apr 12, 2026
THE GREAT STABILIZATION. Auto-repair agent identified as root cause (19 files lobotomized, 36K errors/day). NEXUS Treaty deployed (3 gates). 292 to 59 timers. 20 broken files fixed. RAG decontaminated. Structural governance replaces band-aids.
Apr 12, 2026
DOCUMENT OVERWRITE RECIDIVISM (3rd time). document_library.html glassmorphism design overwritten with flat rebuild. Category buttons, sort, visual design all lost. Pattern P-22 created. NEVER overwrite existing HTML. Modify the generator script.
P-22 CRITICAL
Document Overwrite Recidivism
3 INCIDENTS · TRUST DESTROYER
💥 Bad: "Improve" existing HTML by overwriting it. Content destroyed, no backup.
🔍 Cause: Don't read the existing file first. Don't back up. Don't ask Robert. Assume I can recreate what I haven't read.
💎 Fix: NEVER modify existing HTML. Modify the GENERATOR SCRIPT and regenerate. Always git commit before touching production HTML.
🚨 3 Documented Incidents
Mar 19, 2026
Scientific Method Engine (81KB) stripped to a 24-experiment version. Destroyed work built across multiple sessions that Stephanie was reviewing.
Mar 21 + Mar 24, 2026
index.html searchable library destroyed TWICE. Replaced with auto-generated page without asking.
Apr 12, 2026
document_library.html bluish glassmorphism design overwritten with boring flat rebuild. Category buttons, sort, and visual design all lost.
💎 SOLUTION: NEVER overwrite ANY existing HTML file in /documents/playbooks/. If enhancement is needed, modify the GENERATOR SCRIPT and regenerate. ALWAYS create NEW files with NEW names. ALWAYS git commit before touching production HTML. If you can't find the generator script, the file is hand-built — DON'T TOUCH IT.
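The "create NEW files with NEW names" rule can be enforced by never opening an existing HTML file for writing. A sketch with hypothetical helper names and a `_vN` naming scheme chosen only for illustration. Open mode "x" raises FileExistsError instead of truncating, so even a race between the name check and the write cannot destroy the original.

```python
from pathlib import Path


def next_free_path(path: Path) -> Path:
    """Return path itself if unused, else the first unused name_v2.html, name_v3.html, ..."""
    if not path.exists():
        return path
    n = 2
    while True:
        candidate = path.with_name(f"{path.stem}_v{n}{path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def write_new_html(path: Path, html: str) -> Path:
    """Write html to a NEW file next to path; never opens an existing file for writing."""
    target = next_free_path(path)
    # Mode "x" creates the file and raises FileExistsError if it already exists.
    with open(target, "x", encoding="utf-8") as f:
        f.write(html)
    return target
```

This covers only the new-file half of the rule; the git commit before touching production HTML still has to happen.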
🔗 Related Documents
• Data Trust Evolution: Full phantom investigation + NEXUS Treaty
• Sacred HTML v2: Monday standup + accountability
• Autonomous Intelligence: Self-healing + antibodies + evolution layers