🏗️ BSP Cloud Architecture Upgrade

5 fixes for the Nexus VM — 3 of 5 COMPLETE — $2/month total

✅ 3 FIXED
🔴 2 OPEN

📍 Current Architecture

🌐
WEBSITE
Hostinger
callbrightside.com
bricks.callbrightside.com
🧠
NEXUS VM
Google Cloud (Iowa → KC future)
93 API routers • 385 endpoints
PostgreSQL (65+ tables)
59 timers + health worker
7 intelligence stack APIs
⏰ America/Chicago (CDT)
📱
TEAM (KC)
Kalen • Stephanie
Ashton • Robert
Audrey • Techs
🔴 PROBLEM OPEN

136 credentials stored as plain text on disk

61 API secrets + 52 config values + 13 token files sitting in /opt/nexus/nexus/config/.env. ServiceTitan, Google Ads, Facebook, Slack, Vapi, QuickBooks, Plaid, Gmail, Cloudflare, and 25+ more. If the VM is compromised, every API key is instantly exposed. Every system goes dark. $6.86M of customer data at risk.

🔒
Google Secret
Manager
AES-256 vault
$0/month
📊 DATA
136 credentials exposed • $0 cost (free tier) • 2 hours to migrate • AES-256 encryption

Secrets fetched at runtime, never on disk. Audit trail on every access. Auto-rotation for expiring tokens. If VM is hacked, attacker gets nothing.

💡 Why it matters: These 136 credentials control access to $6.86M in customer data, $70K/week in revenue tracking, Daniel AI (65 calls/week), Google Ads ($500/day budget), and every Slack notification Ashton relies on. One breach = every system goes dark simultaneously.

ETA: This week. Robert migrates all .env vars to Secret Manager and updates the Python scripts to use the google-cloud-secret-manager library.
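A hedged sketch of the runtime-fetch pattern the migration targets. The `access_secret_version` call and resource-name format are the real google-cloud-secret-manager API; the `get_secret` wrapper, in-memory cache, and `GOOGLE_CLOUD_PROJECT` lookup are illustrative assumptions, not the production code:

```python
import os

_cache: dict[str, str] = {}

def _fetch_from_secret_manager(secret_id: str) -> str:
    """Fetch the latest version of one secret from Google Secret Manager."""
    from google.cloud import secretmanager  # lazy import; real API call below
    client = secretmanager.SecretManagerServiceClient()
    name = (f"projects/{os.environ['GOOGLE_CLOUD_PROJECT']}"
            f"/secrets/{secret_id}/versions/latest")
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

def get_secret(secret_id: str, fetcher=_fetch_from_secret_manager) -> str:
    """Return a secret, fetched at runtime and cached in memory only.

    Nothing is ever written to disk, so a compromised filesystem
    exposes no credentials.
    """
    if secret_id not in _cache:
        _cache[secret_id] = fetcher(secret_id)
    return _cache[secret_id]
```

Each script then swaps `os.environ["SLACK_TOKEN"]` for `get_secret("SLACK_TOKEN")`, and every access leaves an audit-log entry on the GCP side.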

🔴 PROBLEM OPEN

Zero automated backups on the VM

300+ Python files, PostgreSQL database, 59 timers, 136 credentials. On April 12, the auto-repair agent destroyed 19 files. Recovery took HOURS of manual rebuilding. With automated snapshots, it would have been a 5-minute rollback.

💾
GCP Daily
Snapshots
7-day retention
~$2/month
📊 DATA
0 backups currently • ~$2/month cost • 15 min to set up • 5-min restore time

Automatic snapshot every day at 2 AM CT. Keep 7 days. One-click restore of the entire VM to any point in the last week. The Great Stabilization (Apr 12) would have been a 5-minute fix instead of hours.

💡 Why it matters: The VM is the brain of the entire operation. 300+ scripts, 65+ database tables, 379 experiments, every dashboard, every notification. If it goes down without a backup, Robert rebuilds for hours while Ashton gets zero lead notifications, Daniel AI goes silent, and Monday standup has no data.

ETA: Robert does this in the GCP Console (the VM's auth scope doesn't allow gcloud from the CLI): Compute Engine → Snapshots → Create Schedule. 2 minutes.
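For reference, the equivalent setup from a workstation where gcloud is authorized (a sketch: the policy name, disk name `nexus-vm-disk`, and zone are assumptions; 07:00 UTC is 2 AM CDT):

```shell
# Daily snapshot at 2 AM CT (07:00 UTC during CDT), keep 7 days
gcloud compute resource-policies create snapshot-schedule nexus-daily \
    --region=us-central1 \
    --daily-schedule \
    --start-time=07:00 \
    --max-retention-days=7

# Attach the schedule to the VM's boot disk
gcloud compute disks add-resource-policies nexus-vm-disk \
    --resource-policies=nexus-daily \
    --zone=us-central1-a
```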

✅ FIXED COMPLETE

Was blind to RAM and disk usage

379 experiments running on one VM. Only CPU was visible. Memory leaks, disk filling up from growing PostgreSQL — all invisible until a crash. The Paul Bertrand incident (Ashton missed 4 calls) could have been prevented if we had seen the system was under stress.

📊
Ops Agent +
Health Worker
$0/month
LIVE ✅
📊 DATA
✅ Ops Agent: active • ✅ Health Worker: every 15 min • RAM: 31% • Disk: 74.5%

Google Ops Agent monitors RAM + disk + logs. Nexus Health Worker checks RAM, disk, CPU load, PostgreSQL, Titan API, and critical services every 15 minutes. Slack alert if anything crosses a threshold. Fixed Apr 17.

💡 Why it matters: 379 experiments + 59 timers + 385 API endpoints all share one VM. A single memory leak can cascade into API crashes → Ashton misses leads → customers lost. Now we see it coming and fix it BEFORE anyone notices.
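The worker's threshold logic can be sketched with the standard library alone. The threshold values and alert wording here are assumptions, not the real worker's configuration:

```python
import os
import shutil

# Assumed alert thresholds -- the real worker's values may differ.
DISK_ALERT_PCT = 85.0
LOAD_ALERT_PER_CPU = 1.5

def disk_used_pct(path: str = "/") -> float:
    """Percent of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def evaluate(disk_pct: float, load_1m: float, cpu_count: int) -> list[str]:
    """Pure threshold check: returns the alert lines to post to Slack."""
    alerts = []
    if disk_pct > DISK_ALERT_PCT:
        alerts.append(f"Disk at {disk_pct:.1f}% (threshold {DISK_ALERT_PCT}%)")
    if load_1m > LOAD_ALERT_PER_CPU * cpu_count:
        alerts.append(f"Load {load_1m:.2f} high for {cpu_count} CPUs")
    return alerts

if __name__ == "__main__":
    # Run once per 15-minute timer tick; the real worker posts to Slack.
    for line in evaluate(disk_used_pct(), os.getloadavg()[0], os.cpu_count()):
        print(line)
```

Keeping `evaluate` a pure function makes the alert logic trivially testable, separate from how the metrics are gathered.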

✅ FIXED COMPLETE

Dynamic IP risked breaking all webhooks

If the VM rebooted and got a new IP, every webhook (Slack, Daniel AI, ST, Vapi) would break, costing hours of rewiring each time. Separately, Google is building two billion-dollar data centers in KC, which will mean sub-2ms latency once a KC region opens.

📍
Static IP
Verified
34.55.179.122
PERMANENT ✅
📊 DATA
✅ IP: 34.55.179.122 (permanent) • ✅ Type: ONE_TO_ONE_NAT • ~10ms to KC (Iowa) • ~2ms future (KC region)

The IP is static and will not change on reboot. All webhooks, Slack integrations, and Daniel AI callbacks are safe. When Google opens a KC region, migrating the VM there will cut API latency roughly 5x (~10ms → ~2ms). Verified Apr 17.

💡 Why it matters: Every webhook, every Slack bot, every Daniel AI callback points to this IP. A dynamic IP change on reboot = hours of rewiring 34 API connections. A static IP means the VM is always reachable at the same address, forever. Once a KC region opens, nexus_number_gate.py (12-source financial verification) will run ~5x faster.
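A minimal sketch of an audit that catches a stale callback before a webhook silently dies. The `WEBHOOKS` registry and URLs are hypothetical; the real callback URLs live in each integration's config:

```python
from urllib.parse import urlparse

STATIC_IP = "34.55.179.122"

# Hypothetical registry of callback URLs -- illustrative only.
WEBHOOKS = {
    "slack": f"http://{STATIC_IP}:8000/webhooks/slack",
    "vapi": f"http://{STATIC_IP}:8000/webhooks/vapi",
}

def stale_webhooks(webhooks: dict, expected_ip: str = STATIC_IP) -> list:
    """Return integration names whose callback URL no longer points at the VM."""
    return [name for name, url in webhooks.items()
            if urlparse(url).hostname != expected_ip]
```

Running this across all 34 connections turns "hours of rewiring" into a one-line report of exactly which callbacks drifted.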

✅ FIXED COMPLETE

VM was on UTC — 5 hours off from Kansas City

Paul Bertrand called at 7:05 AM CT but the logs showed "12:05 PM." Every timestamp, every timer, every cron job was 5 hours wrong. Ashton couldn't trust the notification times. Every team member had to mentally convert UTC to Central.

🕒
America/
Chicago
CDT (-0500)
LIVE ✅
📊 DATA
✅ Timezone: CDT • ✅ 3 timers adjusted • ✅ All logs in KC time

Set to America/Chicago (CDT, -0500). Zero-dollar alert adjusted to 6 AM CT, weather engine to 6 AM / 12 PM / 6 PM CT, service watchdog to 6 AM / 6 PM CT. All future Daniel AI call logs show correct KC time. Fixed Apr 17.

💡 Why it matters: Paul Bertrand called 4 times at 7 AM CT. The notification showed "12:05 PM." Ashton didn't know if these were morning calls or afternoon calls. Every timer that said "fire at 11:00" was actually firing at 6 AM CT without anyone knowing. The entire team was operating on a clock that was 5 hours wrong.
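The Paul Bertrand mismatch is exactly what timezone-aware timestamps prevent. A minimal stdlib sketch of rendering a UTC log time in KC time (the date is illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

KC = ZoneInfo("America/Chicago")

# The instant the UTC-clocked VM logged as "12:05 PM":
utc_call = datetime(2025, 4, 12, 12, 5, tzinfo=timezone.utc)

# The same instant in Kansas City time -- CDT is UTC-5 in April:
kc_call = utc_call.astimezone(KC)
print(kc_call.strftime("%I:%M %p %Z"))  # prints 07:05 AM CDT
```

Storing timestamps timezone-aware and converting only at display time means the 5-hour skew can never silently reappear.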

🔗 What Connects to What (34 external APIs in ↔ 385 Nexus endpoints out)

📥 34 APIs Feeding In

🧠
NEXUS
93 routers
385 endpoints
65+ DB tables
59 timers
7 intel APIs

📤 Outputs Powering

$2/month
Total cost of all 5 fixes
3 complete • 2 remaining (Secret Manager + Snapshots) • Enterprise security • Disaster recovery • Crash prevention