5 fixes for the Nexus VM — 3 of 5 COMPLETE — $2/month total
Fix 1 of 5: Move secrets to Secret Manager (IN PROGRESS)
Problem: 61 API secrets + 52 config values + 13 token files (126 credentials total) sitting in /opt/nexus/nexus/config/.env. ServiceTitan, Google Ads, Facebook, Slack, Vapi, QuickBooks, Plaid, Gmail, Cloudflare, and 25+ more. If the VM is compromised, every API key is instantly exposed. Every system goes dark. $6.86M of customer data at risk.
Solution: Secrets fetched at runtime, never stored on disk. Audit trail on every access. Auto-rotation for expiring tokens. If the VM is hacked, the attacker gets nothing.
💡 Why it matters: These 126 credentials control access to $6.86M in customer data, $70K/week in revenue tracking, Daniel AI (65 calls/week), Google Ads ($500/day budget), and every Slack notification Ashton relies on. One breach = every system goes dark simultaneously.
ETA: This week. Robert migrates all .env vars to Secret Manager and updates the Python scripts to use the google-cloud-secret-manager library.
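A minimal sketch of that runtime-fetch pattern, using the standard google-cloud-secret-manager client; the project ID and secret name are illustrative placeholders, not our real values:

```python
# Sketch: fetch a credential at runtime instead of reading it from .env.
# "nexus-prod" and "servicetitan-api-key" are illustrative placeholders.
from google.cloud import secretmanager

_client = secretmanager.SecretManagerServiceClient()

def get_secret(secret_id: str, project_id: str = "nexus-prod") -> str:
    """Return the latest version of a secret; every access is audit-logged."""
    name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
    response = _client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

# Before: st_key = os.environ["SERVICETITAN_API_KEY"]   # read from .env on disk
# After:  st_key = get_secret("servicetitan-api-key")   # never touches disk
```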
Fix 2 of 5: Daily VM snapshots (IN PROGRESS)
Problem: 300+ Python files, a PostgreSQL database, 59 timers, and 126 credentials, all on one VM with no backups. On April 12, the auto-repair agent destroyed 19 files. Recovery took HOURS of manual rebuilding. With automated snapshots, it would have been a 5-minute rollback.
Solution: Automatic snapshot every day at 2 AM CT, keeping 7 days. One-click restore of the entire VM to any point in the last week. The Great Stabilization (Apr 12) would have been a 5-minute fix instead of hours.
💡 Why it matters: The VM is the brain of the entire operation. 300+ scripts, 65+ database tables, 379 experiments, every dashboard, every notification. If it goes down without a backup, Robert rebuilds for hours while Ashton gets zero lead notifications, Daniel AI goes silent, and Monday standup has no data.
ETA: Robert does this in the GCP Console (the VM's auth scope doesn't allow the CLI): Compute Engine → Snapshots → Create snapshot schedule. 2 minutes.
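For reference, the same schedule can be created programmatically from any machine whose credentials aren't scope-limited like the VM's. A sketch using the google-cloud-compute client; the project, region, and policy names are placeholders:

```python
# Sketch: daily 2 AM CT snapshot schedule with 7-day retention, via the
# Compute Engine API. Run from a workstation with full-scope credentials.
# "nexus-prod" and "us-central1" are illustrative placeholders.
from google.cloud import compute_v1

schedule = compute_v1.ResourcePolicy(
    name="nexus-daily-2am",
    snapshot_schedule_policy=compute_v1.ResourcePolicySnapshotSchedulePolicy(
        schedule=compute_v1.ResourcePolicySnapshotSchedulePolicySchedule(
            daily_schedule=compute_v1.ResourcePolicyDailyCycle(
                days_in_cycle=1,
                start_time="07:00",  # the API takes UTC; 07:00 UTC = 2 AM CDT
            )
        ),
        retention_policy=compute_v1.ResourcePolicySnapshotSchedulePolicyRetentionPolicy(
            max_retention_days=7  # keep 7 days, matching the plan above
        ),
    ),
)

client = compute_v1.ResourcePoliciesClient()
client.insert(
    project="nexus-prod", region="us-central1", resource_policy_resource=schedule
).result()  # the policy still has to be attached to the VM's boot disk afterward
```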
Fix 3 of 5: Full-stack monitoring (DONE Apr 17)
Problem: 379 experiments running on one VM, and only CPU usage was visible. Memory leaks and the disk filling up from a growing PostgreSQL database were invisible until a crash. The Paul Bertrand incident (Ashton missed 4 calls) could have been prevented if we had seen the system was under stress.
Solution: Google Ops Agent now monitors RAM, disk, and logs. The Nexus Health Worker checks RAM, disk, CPU load, PostgreSQL, the Titan API, and critical services every 15 minutes, with a Slack alert if anything crosses a threshold. Fixed Apr 17.
💡 Why it matters: 379 experiments + 59 timers + 385 API endpoints all share one VM. A single memory leak can cascade into API crashes → Ashton misses leads → customers lost. Now we see it coming and fix it BEFORE anyone notices.
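The health worker's internals aren't shown here, but its checks amount to roughly this sketch; the thresholds, Postgres DSN, and Slack webhook URL are placeholders, not the real config:

```python
# Sketch of the 15-minute health pass: RAM, disk, CPU load, PostgreSQL.
# Thresholds, the DSN, and the webhook URL are illustrative placeholders.
import shutil

import psutil    # pip install psutil
import psycopg2  # pip install psycopg2-binary
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # placeholder

def alert(msg: str) -> None:
    requests.post(SLACK_WEBHOOK, json={"text": f"🚨 {msg}"}, timeout=10)

def run_checks() -> None:
    ram = psutil.virtual_memory().percent
    if ram > 90:
        alert(f"RAM at {ram:.0f}%")
    total, used, _ = shutil.disk_usage("/")
    if used / total > 0.85:
        alert(f"Disk at {used / total:.0%} (growing PostgreSQL?)")
    if psutil.getloadavg()[0] > psutil.cpu_count():
        alert("CPU load exceeds core count")
    try:
        psycopg2.connect("dbname=nexus connect_timeout=5").close()
    except psycopg2.OperationalError as exc:
        alert(f"PostgreSQL unreachable: {exc}")

if __name__ == "__main__":
    run_checks()  # scheduled every 15 minutes (e.g. by a systemd timer)
```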
Fix 4 of 5: Static external IP (DONE Apr 17)
Problem: If the VM rebooted and got a new IP, every webhook (Slack, Daniel AI, ServiceTitan, Vapi) would break. Hours of rewiring each time. Separately, Google is building two billion-dollar data centers in the KC area, which means sub-2ms latency once a KC region opens.
Solution: The IP is now static and will not change on reboot. All webhooks, Slack integrations, and Daniel AI callbacks are safe. When Google opens a KC region, we migrate for 5x faster API calls. Verified Apr 17.
💡 Why it matters: Every webhook, every Slack bot, every Daniel AI callback points to this IP. A dynamic IP change on reboot = hours of rewiring 34 API connections. A static IP means the VM is always reachable at the same address. And once a KC data center opens, nexus_number_gate.py (the 12-source financial verification) runs its API calls 5x faster.
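A quick way to verify from the VM that the address survived a reboot, using the standard GCE metadata endpoint (EXPECTED_IP is a placeholder):

```python
# Sketch: confirm the external IP still matches the reserved static address.
# EXPECTED_IP is a placeholder (documentation range); the metadata path is
# the standard GCE endpoint for the first NIC's external IP.
import requests

EXPECTED_IP = "203.0.113.7"  # placeholder
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/network-interfaces/0/access-configs/0/external-ip"
)

current_ip = requests.get(
    METADATA_URL, headers={"Metadata-Flavor": "Google"}, timeout=5
).text

assert current_ip == EXPECTED_IP, f"IP drifted to {current_ip}: webhooks will break"
print(f"Static IP intact: {current_ip}")
```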
Fix 5 of 5: VM clock set to Central time (DONE Apr 17)
Problem: Paul Bertrand called at 7:05 AM CT, but the logs showed "12:05 PM" because the VM was on UTC. Every timestamp, every timer, every cron job was 5 hours off. Ashton couldn't trust the notification times, and every team member had to mentally convert UTC to Central.
Solution: Timezone set to America/Chicago (CDT, UTC-05:00). Zero-dollar alert adjusted to 6 AM CT. Weather engine to 6 AM/12 PM/6 PM CT. Service watchdog to 6 AM/6 PM CT. All future Daniel AI call logs show correct KC time. Fixed Apr 17.
💡 Why it matters: Paul Bertrand called 4 times at 7 AM CT. The notification showed "12:05 PM." Ashton didn't know if these were morning calls or afternoon calls. Every timer that said "fire at 11:00" was actually firing at 6 AM CT without anyone knowing. The entire team was operating on a clock that was 5 hours wrong.
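The bug in a few lines, using Python's stdlib zoneinfo (the date is illustrative):

```python
# Sketch: why "12:05 PM" in the logs was really a 7:05 AM Central call.
# The date is illustrative; zoneinfo is stdlib in Python 3.9+.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

utc_entry = datetime(2025, 4, 14, 12, 5, tzinfo=timezone.utc)  # what the log said
local = utc_entry.astimezone(ZoneInfo("America/Chicago"))      # what actually happened

print(utc_entry.strftime("%I:%M %p %Z"))  # 12:05 PM UTC  -> the misleading log line
print(local.strftime("%I:%M %p %Z"))      # 07:05 AM CDT  -> Paul's real call time
```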