NEXUS Failure Pattern Encyclopedia | Bright Side Plumbing

📊

Data Integrity Failures

5 PATTERNS

P-01 CRITICAL

Phantom Revenue ($6.4M ghost data)

$6.4M PHANTOM | 16+ HOURS

💥

Symptom

$6.4M revenue appeared Apr 3 at 04:07 UTC. Reports showed impossible numbers.

➡️

🔍

Root Cause

nexus_titan_migration.py line 249: INSERT INTO titan.jobs missing created_at column. 10,461 rows got NULL timestamps.

➡️

💎

Solution

Postgres trigger guard enforce_explicit_created_at(). 11,729 rows quarantined. Migration script archived.

💎 SOLUTION: Every INSERT INTO titan.* MUST include an explicit created_at from the source system. Postgres trigger blocks any insert matching the phantom pattern.
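The same rule can also be checked on the Python side before any INSERT runs, as a complement to the Postgres trigger. This is a minimal sketch: the helper name `require_explicit_created_at` and the one-dict-per-job row shape are assumptions for illustration, not code from nexus_titan_migration.py.

```python
from datetime import datetime, timezone


def require_explicit_created_at(rows):
    """Return rows unchanged if every row carries a real created_at; raise otherwise.

    Hypothetical helper: each row is assumed to be a dict built from the
    source system before it is handed to INSERT INTO titan.*.
    """
    bad = [i for i, row in enumerate(rows)
           if not isinstance(row.get("created_at"), datetime)]
    if bad:
        raise ValueError(
            f"{len(bad)} row(s) missing explicit created_at, first at index {bad[0]}"
        )
    return rows


# Illustrative row only; the values are made up.
good = [{"job_id": 1, "created_at": datetime(2026, 3, 30, 14, 5, tzinfo=timezone.utc)}]
require_explicit_created_at(good)  # passes silently
```

Calling it on a row without created_at raises ValueError, so the batch never reaches the database.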

P-02 CRITICAL

ST job_status Filter Missing ($128K phantom week)

$128K PHANTOM | KALEN CAUGHT IT

💰

Symptom

Revenue report showed $128K/week. Kalen said: "that's not real." Actual revenue from ST was $0 (broken).

➡️

🔍

Root Cause

Query used completed_at IS NOT NULL instead of job_status = 'completed'. Unclosed jobs with phantom timestamps returned revenue.

➡️

💎

Solution

MANDATORY: WHERE job_status = 'completed' AND job_status NOT ILIKE '%cancel%' on EVERY revenue query. Math Engine enforces.

💎 SOLUTION: NEVER query revenue without job_status = 'completed'. Using completed_at alone returns phantom revenue from unclosed jobs. This is the #1 data integrity rule.
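The difference between the two filters is easy to reproduce. The sketch below uses an in-memory SQLite table as a stand-in for the real Postgres data; the table layout and dollar amounts are invented for illustration, and the ILIKE clause is left out because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER, job_status TEXT, completed_at TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        (1, "completed", "2026-04-01", 500.0),
        (2, "in_progress", "2026-04-01", 900.0),  # unclosed job with a stray timestamp
        (3, "canceled", "2026-04-02", 300.0),     # canceled job with a stray timestamp
    ],
)

# Wrong: any row that has a timestamp counts as revenue.
phantom = conn.execute(
    "SELECT SUM(total) FROM jobs WHERE completed_at IS NOT NULL"
).fetchone()[0]

# Right: only rows whose status is actually 'completed'.
real = conn.execute(
    "SELECT SUM(total) FROM jobs WHERE job_status = 'completed'"
).fetchone()[0]

print(phantom, real)  # 1700.0 500.0
```

The timestamp-only filter more than triples the total in this toy data, which is the same failure shape as the $128K phantom week.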

P-03 CRITICAL

Experiment Count Destruction (299 to 10)

3 INCIDENTS

💣

Symptom

Experiment file overwritten with subset. Mar 22: nearly destroyed 228. Apr 5: 299 overwritten with 10.

➡️

🔍

Root Cause

Auto-generated subset (experiment_tracker.json = 12 items) treated as the full list. Script overwrote unified_experiments.json without checking count.

➡️

💎

Solution

pre_write_gate.py: gated_write() checks count before writing. Blocks any write with fewer items than existing file. Session guardian verifies 299+ at start.

💎 SOLUTION: BEFORE touching ANY experiment file, check unified_experiments.json count FIRST. If your count is LOWER, STOP. You are about to destroy data.
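A count gate of this kind can be sketched in a few lines. This is not the real gated_write() from pre_write_gate.py, whose details are not shown here; it assumes the protected file is a JSON list and only illustrates the "never write fewer items than exist" check.

```python
import json
import os
import tempfile


def gated_write(path, items):
    """Write items as JSON unless that would shrink an existing list.

    Hypothetical sketch; the real gated_write() may check more than the count.
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            existing = json.load(fh)
        if len(items) < len(existing):
            raise ValueError(
                f"refusing write: {len(items)} items would replace {len(existing)}"
            )
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(items, fh)


# Demo in a throwaway directory, not the real experiment file.
demo_path = os.path.join(tempfile.mkdtemp(), "unified_experiments.json")
gated_write(demo_path, list(range(299)))     # first write succeeds
try:
    gated_write(demo_path, list(range(10)))  # blocked: 10 < 299
    blocked = False
except ValueError:
    blocked = True
```

The check runs before the file is opened for writing, so a blocked call leaves the existing 299 items untouched.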

P-04 HIGH

Wrong Google Ads Upload Target (MCC vs Child)

2+ HOURS

📤

Symptom

uploadCallConversions returned "click associated with different account." Zero conversions uploaded.

➡️

🔍

Root Cause

Uploaded to child account 7269555791 instead of MCC 8449092450. Clicks originate at the MCC level, so uploads must target the MCC.

➡️

💎

Solution

Changed CUSTOMER_ID to MCC_ID in upload scripts. Both nexus_uploadcallconversions.py and nexus_offline_conversions.py target MCC 8449092450.

💎 SOLUTION: ALL Google Ads uploads (call conversions, offline conversions) go to MCC 8449092450, NEVER to child account 7269555791.

P-05 HIGH

Revenue Hierarchy Violation (8 Scripts, 8 Numbers)

RECURRING

Revenue Source Hierarchy (MANDATORY ORDER)

Big Sale

$70K/wk

SSoT weekly

QuickBooks

$79.7K

SSoT monthly/annual

ServiceTitan

BROKEN (counts only)

⚠️

💎 SOLUTION: NEVER report ST revenue alone. Check big_sale_tracker.json FIRST. Truth Service (nexus_truth_service.py) is the single import point for all revenue numbers.
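The hierarchy can be expressed as a lookup that never falls back to ServiceTitan. This is a sketch only: `SSOT_BY_PERIOD` and `revenue_for` are hypothetical names, the figures are the rounded ones shown above, and the real nexus_truth_service.py may be organized quite differently.

```python
# Which source is the single source of truth (SSoT) for each reporting period.
SSOT_BY_PERIOD = {
    "weekly": "big_sale",
    "monthly": "quickbooks",
    "annual": "quickbooks",
}


def revenue_for(period, figures):
    """Return (source, value) for a period, using only that period's SSoT.

    `figures` maps a source name to its reported number. ServiceTitan is
    never consulted for revenue, even if it is present in `figures`.
    """
    source = SSOT_BY_PERIOD[period]
    value = figures.get(source)
    if value is None:
        raise LookupError(
            f"no {source} figure for {period} revenue; do not fall back to ServiceTitan"
        )
    return source, value


figures = {"big_sale": 70_000, "quickbooks": 79_700, "servicetitan": 128_000}
print(revenue_for("weekly", figures))   # ('big_sale', 70000)
print(revenue_for("monthly", figures))  # ('quickbooks', 79700)
```

Raising when the SSoT figure is missing, rather than substituting another source, is what keeps eight scripts from producing eight numbers.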

💻

Code & Syntax Failures

5 PATTERNS

P-06 CRITICAL

Auto-Repair Agent Lobotomy (19 files, 36K errors/day)

36K ERRORS/DAY

🤖

The "Self-Healer"

nexus_repair_agent.py injected SLACK_ENABLED = False at wrong indentation into 19 files. IndentationError cascade.

➡️

🔥

Cascade

20 timers failing. 36,000+ daily errors: immune system (12,657), repair agent (13,799), sentinel (9,712).

➡️

🛑

Quarantined

5 self-healing scripts quarantined permanently. 14 files restored from .bak backups. Pre-commit hook blocks syntax errors.

💎 SOLUTION: NEVER let an automated script edit production .py files. The "self-healing" systems were the disease. Use staging + deploy_gate.sh + pre-commit hooks.
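The syntax gate in a pre-commit hook can be as small as the sketch below. The function name `syntax_errors` is an assumption, and the actual hook in use is not shown here; the point is that `ast.parse` catches a wrongly indented injected line without executing anything.

```python
import ast


def syntax_errors(sources):
    """Map each filename to its syntax error message; an empty dict means all clean.

    `sources` maps filename -> file contents. IndentationError is a subclass
    of SyntaxError, so wrongly indented injected lines are caught too.
    """
    errors = {}
    for name, text in sources.items():
        try:
            ast.parse(text, filename=name)
        except SyntaxError as exc:
            errors[name] = f"line {exc.lineno}: {exc.msg}"
    return errors


# A line injected at the wrong indentation, as in the repair-agent incident.
broken = "def run():\n    return 1\n        SLACK_ENABLED = False\n"
result = syntax_errors({"ok.py": "x = 1\n", "broken.py": broken})
print(result)
```

In a real hook the dict would be filled from the staged .py files, and a non-empty result would exit non-zero to block the commit.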

P-07 HIGH

Inline SSH Python (20+ failures in one session)

20+ ATTEMPTS

ssh user@host 'python3 -c "print(f\"value: {x}\")"'
# Shell escaping + Python f-string braces = GUARANTEED FAILURE

# Protocol P-07: Write .py file locally, scp to VM, run there
# 1. Write("script.py") locally
# 2. scp script.py dovew@VM:/tmp/
# 3. ssh dovew@VM "python3 /tmp/script.py"

💎 SOLUTION: Any Python >10 lines goes in a .py file. NEVER inline heredocs or f-strings in SSH commands. This failed 20+ times before becoming a rule.
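The copy-then-run protocol can be wrapped so that no Python source ever passes through shell quoting. A minimal sketch, assuming a hypothetical helper name `remote_run_commands`; it only builds the argument lists and does not contact any host.

```python
import posixpath
import shlex


def remote_run_commands(local_script, remote, remote_dir="/tmp"):
    """Return the scp and ssh argument lists for running a local script remotely.

    Only the remote file path is quoted for the remote shell; the script
    body itself travels as a file, so its quotes and braces are never parsed.
    """
    remote_path = posixpath.join(remote_dir, posixpath.basename(local_script))
    scp_cmd = ["scp", local_script, f"{remote}:{remote_path}"]
    ssh_cmd = ["ssh", remote, f"python3 {shlex.quote(remote_path)}"]
    return scp_cmd, ssh_cmd


scp_cmd, ssh_cmd = remote_run_commands("script.py", "dovew@VM")
print(scp_cmd)  # ['scp', 'script.py', 'dovew@VM:/tmp/script.py']
print(ssh_cmd)  # ['ssh', 'dovew@VM', 'python3 /tmp/script.py']
```

Each list can then be passed to `subprocess.run(cmd, check=True)`, which avoids a local shell entirely.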

P-08 MEDIUM

Python .format() vs CSS var()

html = """body{background:var(--bg)}""".format(total=5)
# Python parses the CSS braces {background:var(--bg)} as a format placeholder named "background". KeyError.

# Use % formatting or string concatenation:
html = "Total: %d docs" % total
# Or hardcode CSS values, never pass through .format()

💎 SOLUTION: Never use .format() on strings containing CSS. Use % formatting or string concatenation.
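The failure and two ways around it can be reproduced in a few lines. The variable names are illustrative only. Doubling the braces is a third option not listed above; it is standard str.format() behavior, but it is easy to miss a brace in a large stylesheet.

```python
css = "body{background:var(--bg)}"

# .format() parses the CSS braces as a replacement field named "background".
try:
    (css + " Total: {total} docs").format(total=5)
    failed_key = None
except KeyError as exc:
    failed_key = exc.args[0]

# Option 1: % formatting leaves braces alone (a literal % must be written %%).
via_percent = css + (" Total: %d docs" % 5)

# Option 2: keep .format() but double every literal CSS brace.
via_escaped = "body{{background:var(--bg)}} Total: {total} docs".format(total=5)

print(failed_key)                  # background
print(via_percent == via_escaped)  # True
```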

P-09 MEDIUM

chmod 444 Permission Trap (2-hour bug hunt)

chmod 444 production_file.py              # Lock it for "safety"
git add production_file.py                # Can't stage (no write permission)
python3 -m py_compile production_file.py  # Can't read properly
# Files appeared broken but were actually clean Python

# Correct approach:
# 1. Use chmod 644 for production files
# 2. Use pre-commit hook for syntax enforcement
# 3. Use Git branch protection for change control
# chmod 444 creates false positives and blocks tooling

💎 SOLUTION: Never use chmod 444 to guard code. Use Git pre-commit hooks and branch protection instead. Filesystem locks block the tools that verify the code.

P-10 MEDIUM

Template Tags Showing Raw in Production

<div>{{ m('weekly_revenue') }}</div>

# Either: make the whole block use templates (no hardcoded numbers)
# OR: render tags manually and commit the rendered values
# THEN: run verify_sacred_math.py to catch any unrendered templates
# Verification Gate (Gate 3) blocks deploys with raw tags

💎 SOLUTION: Run verify_sacred_math.py AFTER every Sacred HTML edit. It catches unrendered templates before Stephanie sees them.
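The core of such a check is a scan for leftover `{{ ... }}` tags in the rendered HTML. The sketch below is not verify_sacred_math.py itself, whose contents are not shown here; `find_raw_tags` is a hypothetical name used only to illustrate the idea.

```python
import re

# Matches any unrendered {{ ... }} tag, including ones spanning several lines.
RAW_TAG = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def find_raw_tags(html):
    """Return every unrendered template tag found in the HTML text."""
    return [match.group(0) for match in RAW_TAG.finditer(html)]


print(find_raw_tags("<div>{{ m('weekly_revenue') }}</div>"))
print(find_raw_tags("<div>$70K</div>"))  # []
```

A deploy gate would fail whenever the returned list is non-empty.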

🎨

HTML/CSS Failures

4 PATTERNS

P-11 HIGH

Div Nesting Cascade (GSC section broke 8 sections)

ENTIRE BOTTOM HALF

📦

Symptom

Everything from Systems Health down was visually broken. Sections trapped inside the GSC card's grid.

➡️

🔍

Root Cause

GSC/GA4 section opened 4 divs (section wrapper, outer grid, card, inner stats grid) then cut to next section without closing ANY of them.

➡️

💎

Solution

Completed GSC card, added GA4 card, closed all 4 wrapper divs. Verified all 9 sections at consistent nesting depth=3.

💎 SOLUTION: After injecting ANY section into Sacred HTML, verify div nesting depth at section boundaries. Every major section should be at the same depth. Use the div balance checker script.
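A div balance check can be sketched with the standard-library HTML parser. This is not the checker script referred to above; the class name is hypothetical, and treating each `<h2>` as a section boundary is an assumption about how the Sacred HTML marks its sections.

```python
from html.parser import HTMLParser


class DivDepthChecker(HTMLParser):
    """Track <div> nesting and record the depth at each section boundary tag."""

    def __init__(self, boundary_tag="h2"):
        super().__init__()
        self.boundary_tag = boundary_tag  # assumed marker for "a section starts here"
        self.depth = 0
        self.boundary_depths = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
        elif tag == self.boundary_tag:
            self.boundary_depths.append(self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1


def check_divs(html, boundary_tag="h2"):
    """Return (final div depth, list of depths seen at each boundary tag)."""
    checker = DivDepthChecker(boundary_tag)
    checker.feed(html)
    checker.close()
    return checker.depth, checker.boundary_depths


good = "<div><h2>A</h2></div><div><h2>B</h2></div>"
bad = "<div><h2>A</h2><div><div><h2>B</h2></div>"
print(check_divs(good))  # (0, [1, 1])
print(check_divs(bad))   # (2, [1, 3])
```

A final depth other than 0 means unclosed or extra divs, and unequal boundary depths mean a section is trapped inside an earlier one.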

P-12 HIGH

Injecting Content Into Wrong Grid Depth

<div style="grid-template-columns:1fr 1fr 1fr">
  <div>Card 1</div>
  <div>Card 2</div>
  <div>Trapped in column 3, not full width</div>
</div>

<div style="grid-template-columns:1fr 1fr 1fr">
  <div>Card 1</div>
  <div>Card 2</div>
  <div>Card 3</div>
</div>
<div>Full-width section, renders correctly</div>

💎 SOLUTION: Close ALL parent grid/flex divs before starting new full-width sections. Never inject into the middle of an existing grid.

P-13 HIGH

Sections Dumped at Bottom Instead of Logical Position

Wrong (Lazy Append)

Executive Summary
Revenue Health
Gap Analysis
Ads Performance
Quick Links
GSC/GA4 (dumped here)
Great Stabilization (dumped here)
Intelligence Findings (dumped here)

Correct (Logical Flow)

Executive Summary
Revenue Health
Gap Analysis (after revenue)
Ads Performance
GSC/GA4 (inside performance)
Systems + Stabilization (together)
Owner Board
Quick Links

💎 SOLUTION: Every new section must be inserted at its logical position in the document flow. Gap Analysis after Revenue. GSC/GA4 near Ads. Stabilization inside Systems. Never append at bottom.

P-14 MEDIUM

Stale Numbers Propagating Across 172 Documents

23 DOCS STALE

Examples Found

$64,204 (should be $70,180)
$47,008 (should be $79,743)
$2.44M (should be $3.66M)
26.8% (should be 16.8%)
292 timers (should be 61)
4,425 RAG chunks (should be 15,397)

Prevention

Document Freshness Enforcer (weekly scan)
Truth Service as single revenue source
fix_all_stale_docs.py for batch updates
Verification Gate blocks stale deploys
87 stale numbers fixed across 172 docs

💎 SOLUTION: All numbers in HTML documents must come from Truth Service or be updated by the Freshness Enforcer. Hardcoded numbers decay. The enforcer catches them.

⚙️

Operations Failures

3 PATTERNS

P-15 CRITICAL

Shallow Validation ("Timer Active" = Working)

RECURRING PATTERN

Shallow Checks (LIES)

❌ "Timer is active" ≠ working
❌ "Script runs" ≠ effective
❌ "API returns 200" ≠ right answer
❌ "Exit code 0" ≠ no errors
❌ "File exists" ≠ correct data

Deep Checks (TRUTH)

✅ Did it WRITE data?
✅ Did the data CHANGE?
✅ Is the result CORRECT?
✅ Does the output match expectations?
✅ Can you show the PROOF?

Real Examples That Burned Us:

• Auto-tagger "working" for weeks in DRY RUN, writing zero tags
• Offline conversions "running" but timing out every execution
• Revenue "returning data" but phantom $128K from unclosed jobs
• Session enforcer "active" but not checking actual output files

💎 SOLUTION: Before telling Robert ANY system works, show: (1) what it produced, (2) whether the output was correct, and (3) whether it changed real data. If you can't show all three, say "running but unverified."
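The three questions can be turned into a small check that compares an output file before and after a run. This is a sketch under assumptions: `file_digest` and `deep_check` are hypothetical names, the output is assumed to be a single text file, and the log line in the demo is invented to mimic the dry-run auto-tagger case.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 of a file, or None if it does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def deep_check(path, digest_before, is_correct):
    """Answer: did it produce output, did the data change, is the result correct?"""
    digest_after = file_digest(path)
    produced = digest_after is not None and os.path.getsize(path) > 0
    changed = produced and digest_after != digest_before
    correct = False
    if produced:
        with open(path, encoding="utf-8") as fh:
            correct = bool(is_correct(fh.read()))
    verified = produced and changed and correct
    return {
        "produced": produced,
        "changed": changed,
        "correct": correct,
        "status": "verified" if verified else "running but unverified",
    }


demo = os.path.join(tempfile.mkdtemp(), "tags.log")
before = file_digest(demo)                     # None: nothing written yet
with open(demo, "w", encoding="utf-8") as fh:
    fh.write("mode=DRY_RUN tags_written=0\n")  # the job "ran" but tagged nothing
report = deep_check(demo, before, lambda text: "tags_written=0" not in text)
print(report["status"])  # running but unverified
```

The job produced a file and the file changed, yet the status stays "running but unverified" because the content check fails.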

P-16 HIGH

Endpoint Assumed Without Testing (ST Booking 404)

3+ HOURS

📖

"Docs say it exists"

Built entire integration around ST booking POST endpoint based on documentation and AI research.

➡️

🚫

404 Not Found

Endpoint doesn't exist. Scope not available. 3+ hours building code that can never work.

➡️

💎

Test First

Hit the actual endpoint with a test request BEFORE writing any integration code. 30 seconds saves 3 hours.

💎 SOLUTION: NEVER build an integration based on documentation alone. Hit the actual endpoint first. If it returns 404, the docs are wrong or the scope isn't available.
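A probe of this kind needs only the standard library. The sketch below is hedged in several ways: `probe_endpoint` is a hypothetical name, the URL in the usage note is a placeholder rather than the real ST booking endpoint, and a real probe would also need whatever auth headers the API requires.

```python
import urllib.error
import urllib.request


def probe_endpoint(url, method="GET", opener=urllib.request.urlopen, timeout=10):
    """Send one request and return the HTTP status code, including error codes.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses arrive as exceptions; the code is what matters here.
        return exc.code
```

For example, `probe_endpoint("https://api.example.invalid/bookings", method="POST")` returning 404 would be the signal to stop before writing any integration code.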

P-17 HIGH

Offline Conversion Pipeline Wrong Bucket ($0 to Smart Bidding)

WEEKS OF STARVED BIDDING

💸

Wrong Bucket

Pipeline uploaded to 'Offline Job Completion' (secondary, $0 value) instead of 'ST Job Completed (API)' (primary, $762 avg).

➡️

🔍

Starved Bidding

Smart Bidding had ZERO conversion value data for weeks. Optimizing blind. Wrong conversion_action_id at line 415.

➡️

💎

Fixed

DEFAULT_CONVERSION_ACTION_ID = "7537150978" (ST Job Completed). Auto-selects primary action. Consent fields added.

💎 SOLUTION: When uploading offline conversions, verify the conversion_action_id maps to the PRIMARY conversion action, not a secondary one. Check Google Ads UI to confirm which action Smart Bidding reads.

📋

Process Failures

4 PATTERNS

P-18 CRITICAL

Two Nicks Confusion (BSP Tech vs Inspector Nick)

REPEATED 3x

👷

Nick Chernioglo

• BSP field technician (tech id=1)
• Ramp spending: $26,639
• Does diagnostics + camera work
• Material review section in Sacred v2
• His materials may fund jobs other techs close

🔍

Nick Welty (Inspector Nick)

• SEPARATE business (inspectornick.com)
• 15 employees, not a BSP tech
• BSP's highest revenue channel ($5,275 avg ticket)
• Partnership section in Sacred v2
• Apr 1 meeting: 8 action items for Kalen

💎 SOLUTION: NEVER confuse these two. Sacred v2 Nick section = Chernioglo (BSP tech). Inspector Nick Partnership section = Welty (separate business). Mixing them up was caught 3 times.

P-19 HIGH

Stephanie Format Violation (Problem-First Instead of Solution-First)

COMMUNICATION

Wrong (Problem First)

"Nick's Ramp charges are $26,639 but his sales are only $14,986. This is an $11,653 gap that could be leakage..."

Correct (Solution First)

SOLUTION: Cross-reference Ramp vs ST closings
PROBLEM: $11,653 gap in Nick's numbers
WHY IT MATTERS: Attribution vs theft
WHERE IT STANDS: Robert pulling data this week

💎 SOLUTION: Every section for Stephanie follows SOLUTION > PROBLEM > WHY IT MATTERS > WHERE IT STANDS. Lead with the answer, not the scary number. 3 bullets max. No tech jargon.

P-20 HIGH

Date Discipline Failure (Wrong Day of Week)

# Sacred v2 said "Week of April 12" and "April 14"
# Monday is actually April 13, 2026
# Wrong dates in a standup doc destroy trust instantly

# ALWAYS verify day-of-week before writing dates
# python3 -c "from datetime import date; d=date(2026,4,13); print(d.strftime('%A'))"  # Monday
# Use EXACTLY what Robert said. Never guess.

💎 SOLUTION: NEVER guess dates. Verify day-of-week programmatically. If Robert mentions a date, use EXACTLY what he said. Wrong dates in client-facing docs destroy trust.
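The day-of-week check can be wrapped in a helper that fails loudly when a date and its claimed weekday disagree. The function names are hypothetical. A fixed English name list is used instead of strftime so the result does not depend on the machine's locale.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_name(year, month, day):
    """Return the English weekday name for a calendar date."""
    return DAY_NAMES[date(year, month, day).weekday()]


def assert_weekday(year, month, day, expected):
    """Raise ValueError if the date does not fall on the expected weekday."""
    actual = weekday_name(year, month, day)
    if actual != expected:
        raise ValueError(
            f"{year}-{month:02d}-{day:02d} is a {actual}, not a {expected}"
        )
    return actual


print(weekday_name(2026, 4, 13))  # Monday
print(weekday_name(2026, 4, 12))  # Sunday
```

Running `assert_weekday(2026, 4, 12, "Monday")` raises, which is the check that would have caught the "Week of April 12" error.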

⏰

Incident Timeline

Mar 16, 2026

First session. Multiple pricing inventions, wrong date for Evelyn call, incorrect HTML formats. 5 failure patterns documented.

Mar 22, 2026

Nearly destroyed 228 experiments by overwriting with subset. Stale review count shipped. Pre-write gate concept born.

Mar 24, 2026

7 protocol violations in one session. Session enforcer lost, API keys re-asked 3x, work scattered across 6 tasks. Sequential execution rule created.

Mar 27, 2026

No enforcer run. Lost 3 API keys. Ugly HTML output. Mandatory session start protocol created.

Apr 3, 2026 04:07 UTC

$6.4M PHANTOM EVENT. nexus_titan_migration.py line 249 missing created_at. 10,461 rows. Triggered Data Trust Evolution v1. Postgres trigger guard deployed.

Apr 5, 2026

299 experiments overwritten with 10. Third count destruction incident. pre_write_gate.py deployed. gated_write() mandatory for protected files.

Apr 10, 2026

Offline conversions uploading to wrong bucket (secondary $0 instead of primary $762). Smart Bidding starved for weeks. Conversion action ID hardcoded to correct value.

Apr 12, 2026

THE GREAT STABILIZATION. Auto-repair agent identified as root cause (19 files lobotomized, 36K errors/day). NEXUS Treaty deployed (3 gates). 292 to 59 timers. 20 broken files fixed. RAG decontaminated. Structural governance replaces band-aids.

Apr 12, 2026

DOCUMENT OVERWRITE RECIDIVISM (3rd time). document_library.html glassmorphism design overwritten with flat rebuild. Category buttons, sort, visual design all lost. Pattern P-22 created. NEVER overwrite existing HTML. Modify the generator script.

P-22 CRITICAL

Document Overwrite Recidivism

3 INCIDENTS | TRUST DESTROYER

💥

Bad

"Improve" existing HTML by overwriting. Content destroyed, no backup.

➡️

🔍

Cause

Existing file not read first. No backup taken. Robert not asked. Assumed the content could be recreated without ever having read it.

➡️

💎

Fix

NEVER modify existing HTML. Modify the GENERATOR SCRIPT and regenerate. Always git commit before touching production HTML.

🚨 3 Documented Incidents

Mar 19, 2026

Scientific Method Engine (81KB) stripped to a 24-experiment version. Destroyed work built across multiple sessions that Stephanie was reviewing.

Mar 21 + Mar 24, 2026

index.html searchable library destroyed TWICE. Replaced with auto-generated page without asking.

Apr 12, 2026

document_library.html bluish glassmorphism design overwritten with boring flat rebuild. Category buttons, sort, and visual design all lost.

💎 SOLUTION: NEVER overwrite ANY existing HTML file in /documents/playbooks/. If enhancement is needed, modify the GENERATOR SCRIPT and regenerate. ALWAYS create NEW files with NEW names. ALWAYS git commit before touching production HTML. If you can't find the generator script, the file is hand-built — DON'T TOUCH IT.
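The "new files with new names" rule can be enforced at the point of writing. A minimal sketch, assuming a hypothetical helper `write_new_html`; it relies on Python's exclusive-creation mode, which refuses to open a file that already exists.

```python
import os
import tempfile


def write_new_html(path, html):
    """Create path with the given HTML, refusing to touch an existing file."""
    # Mode "x" makes open() raise FileExistsError instead of truncating.
    with open(path, "x", encoding="utf-8") as fh:
        fh.write(html)
    return path


# Demo in a throwaway directory, not /documents/playbooks/.
demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "document_library.html")
write_new_html(target, "<html>hand-built original</html>")
try:
    write_new_html(target, "<html>flat rebuild</html>")
    overwrote = True
except FileExistsError:
    overwrote = False
print(overwrote)  # False
```

The second call fails before any bytes are written, so the original content survives and the rebuild has to go to a new filename.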