pi-mono coding-agent — Weaponized for BSP

Extracting operational weapons from Mario Zechner's Pi coding-agent and committing to applying each to BSP work. Goal: stop generating slop. Move from agent-as-firehose to agent-as-discipline.
Generated 2026-04-15 04:33 UTC · Repo: github.com/badlogic/pi-mono

1. What Pi is + why Mario built it

Pi is a TypeScript coding agent shipped as @mariozechner/pi-coding-agent, with roughly 21 docs files. Its author, Mario Zechner, also wrote the "Slowing the fuck down" post Robert sent earlier. Mario is not an AI hater; he built Pi because he uses agentic coding daily and wanted a tool that doesn't enable the slop patterns he sees in the industry.

The core philosophy (verbatim from the README + docs)

Why this matters for BSP

I'm currently the agent in this Robert/Kalen/Stephanie context. My tools, my discipline, my context handling — all directly comparable to what Pi codifies. Where Pi ships a clean primitive, I'm shipping ad-hoc snippets. Where Pi has explicit hooks for safety gates, I have informal "stop and ask" rules I sometimes forget. Where Pi forces session branching, I forward-patch and accumulate drift. Pi is a working blueprint of the agent I should be acting like.

2. The 12 weapons extracted

Weapon 1 — Minimal default tool set

Pi: Default tools = read / write / edit / bash. Anything else (grep, find, ls, custom) is opt-in. Resists the temptation to ship tooling-as-feature.
Apply to BSP: I keep creating diagnostic Code Snippets every time I need to inspect something. 8 diagnostic snippets are still active on bricks staging that should be deleted. Mario's discipline: use the four primitives + grep/find. If I need to inspect Bricks state, write a one-shot bash command, not a permanent REST endpoint.
Commit: No new diagnostic snippets unless they'll be used 5+ times. For one-off inspection, use curl + jq or a temp Python script that gets deleted after the run.

Weapon 2 — No built-in plan mode (write plans to files)

Pi: "No plan mode. Write plans to files for model review." Avoids the trap of plans living in the agent's head where they decay between turns.
Apply to BSP: My plans live in chat conversations or get half-encoded in Master History after the fact. Pi's discipline: every multi-turn job starts with a written plan in a file, agent re-reads it each turn, updates it as it works.
Commit: For any job spanning more than 3 turns, create /opt/nexus/nexus/scripts/output/plans/<job-name>.md at the start. Re-read on every turn. Update at the end. Master History entry references the plan file.
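
A minimal sketch of the plan-file habit, assuming a plain skeleton of my own choosing (the "Steps" and "Notes" headings are hypothetical, not something Pi prescribes). The helper names open_plan and append_note are also illustrative. The idea is that the same call both creates the file on turn one and re-reads it on every later turn.

```python
from pathlib import Path


def open_plan(plans_dir: Path, job_name: str) -> str:
    """Create the plan file on first use; return its current contents on every call."""
    path = plans_dir / f"{job_name}.md"
    if not path.exists():
        plans_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical skeleton; the real layout is whatever the job needs.
        path.write_text(f"Plan: {job_name}\n\nSteps:\n\nNotes:\n")
    return path.read_text()


def append_note(plans_dir: Path, job_name: str, note: str) -> None:
    """Record a decision or progress note at the end of the plan file."""
    with (plans_dir / f"{job_name}.md").open("a") as handle:
        handle.write(f"- {note}\n")
```

Calling open_plan at the top of each turn makes the re-read step mechanical instead of something I have to remember.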

Weapon 3 — No integrated TODO (use external TODO.md)

Pi: Built-in to-dos confuse the model. Use external TODO.md files instead. Same reasoning as plan mode.
Apply to BSP: I use the in-memory task system (TaskCreate/TaskUpdate). Tasks die at session end. Mario's discipline: persist task state in a file the next session can re-read.
Commit: Maintain /opt/nexus/nexus/scripts/output/BSP_OPEN_BACKLOG.md. Add items as discovered. Mark complete with checkboxes. Read at session start, write at session end. Single source of truth across sessions.
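
A small sketch of how the backlog file could be maintained, assuming items are stored as "- [ ] text" and "- [x] text" checkbox lines (that line format and the function names are my assumptions). It is meant as an illustration of read-at-start, write-at-end, not a finished tool.

```python
from pathlib import Path

OPEN, DONE = "- [ ] ", "- [x] "  # assumed checkbox prefixes, both 6 characters


def add_item(path: Path, text: str) -> None:
    """Append an open item unless the same text is already tracked (open or done)."""
    lines = path.read_text().splitlines() if path.exists() else []
    tracked = {line[len(OPEN):] for line in lines if line.startswith((OPEN, DONE))}
    if text in tracked:
        return
    lines.append(OPEN + text)
    path.write_text("\n".join(lines) + "\n")


def complete_item(path: Path, text: str) -> bool:
    """Tick the checkbox of an open item; return False if no such open item exists."""
    lines = path.read_text().splitlines()
    for i, line in enumerate(lines):
        if line == OPEN + text:
            lines[i] = DONE + text
            path.write_text("\n".join(lines) + "\n")
            return True
    return False


def open_items(path: Path) -> list[str]:
    """Return the text of every unchecked item, in file order."""
    if not path.exists():
        return []
    return [line[len(OPEN):] for line in path.read_text().splitlines() if line.startswith(OPEN)]
```

At session start, open_items gives the list to re-read; at session end, add_item and complete_item write back what changed.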

Weapon 4 — Skills with progressive disclosure

Pi: Skill names + descriptions load at startup (cheap). Full SKILL.md instructions only load when invoked via /skill:name. Keeps system prompt focused.
Apply to BSP: My CLAUDE.md is ~50K tokens loaded every turn. Most of it isn't relevant to most turns. Pi's discipline: split CLAUDE.md into named skills, each with frontmatter (name + 1-line desc), full instructions only on demand.
Commit: Refactor CLAUDE.md into /opt/nexus/nexus/skills/ with 1 SKILL.md per discipline (verification-gate, ship-checklist, session-protocol, hcp-naming, bricks-build, etc.). Loader stays in main CLAUDE.md as a 1-line index.
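
To make progressive disclosure concrete, here is a hedged sketch of a loader, assuming each SKILL.md starts with a "---" delimited block of "key: value" lines holding name and description. That frontmatter layout, the one-directory-per-skill convention, and the function names are assumptions for illustration; this is not Pi's loader.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---' delimited 'key: value' frontmatter from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


def build_index(skills_dir: Path) -> str:
    """Cheap startup index: one 'name: description' line per skill."""
    rows = []
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        meta, _ = split_frontmatter(skill_file.read_text())
        name = meta.get("name", skill_file.parent.name)
        rows.append(f"{name}: {meta.get('description', '')}")
    return "\n".join(rows)


def load_skill(skills_dir: Path, name: str) -> str:
    """Load the full instructions only when the skill is actually invoked."""
    _, body = split_frontmatter((skills_dir / name / "SKILL.md").read_text())
    return body.strip()
```

Only the output of build_index would sit in the always-loaded prompt; load_skill pays the token cost of a discipline only on the turns that need it.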

Weapon 5 — Session branching (tree of attempts)

Pi: Sessions are trees. /tree visualizes branches. /fork creates a new branch from any point. Multiple parallel approaches coexist in one session file.
Apply to BSP: When I hit a fork ("patch the existing tree or rebuild from Figma?"), I pick one and forward-patch. If wrong, I patch over the patch. Mario's discipline: branch and try both; pick the winner; archive the loser with a "why this failed" note.
Commit: When facing an architectural fork, write both options to a plan file. Build the smaller one as a probe. If it works, expand. If not, the plan file documents why so the next session doesn't repeat.

Weapon 6 — Tree-navigation lets you go back to the last good state

Pi: Anywhere in a session you can navigate back via /tree and resume from there. Not "git revert" — the previous state is still in the session ready to fork from.
Apply to BSP: When I broke the footer with overzealous CSS, I spent a turn reverting. With Pi's tree, I'd jump back to the pre-CSS state and try a different approach — no manual revert. BSP analog: snapshot postmeta + WP file state at every "stable point" so I can restore.
Commit: Before any risky edit (theme files, postmeta, Google Ads), save a JSON snapshot to /opt/nexus/nexus/scripts/output/snapshots/<timestamp>-<target>.json. If the change breaks, restore in one command instead of debugging backwards.
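
A sketch of the snapshot and restore-point idea, assuming the state to protect has already been fetched into a JSON-serializable dict (how postmeta or theme files are read is out of scope here). The function names and the compact timestamp format are my own choices.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(snap_dir: Path, target: str, state: dict) -> Path:
    """Write state to <timestamp>-<target>.json and return the file path."""
    snap_dir.mkdir(parents=True, exist_ok=True)
    # Hyphen-free UTC stamp so the first '-' always separates stamp from target.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = snap_dir / f"{stamp}-{target}.json"
    path.write_text(json.dumps(state, indent=2, sort_keys=True))
    return path


def latest_snapshot(snap_dir: Path, target: str) -> dict:
    """Return the most recent snapshot for target; raise if none exists."""
    matches = sorted(
        p for p in snap_dir.glob("*.json")
        if p.name.split("-", 1)[-1] == f"{target}.json"
    )
    if not matches:
        raise FileNotFoundError(f"no snapshot for {target!r} in {snap_dir}")
    return json.loads(matches[-1].read_text())
```

Because the stamp sorts lexicographically, the last match is the newest stable point; writing that dict back to the site is the one-command restore.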

Weapon 7 — Compaction with structured summary (goal / constraints / progress / decisions / next-steps)

Pi: Auto-compacts at contextTokens > contextWindow - 16384. Summary captures: goal, constraints, progress, key decisions, next steps, critical context, plus tracked file lists. Newest 20K tokens preserved unsummarized.
Apply to BSP: My Master History entries are mostly proof tables + prose. They lack the goal/constraints/progress/decisions/next-steps structure. That structure is what makes a compacted summary actually resumable.
Commit: Master History entries adopt the Pi-summary structure: Goal (what was attempted), Constraints (what couldn't be done), Progress (what shipped), Key decisions (architectural choices made), Next steps (what's next session). The prose proof-tables stay but live under those headers.
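
The two mechanics above can be sketched briefly: the compaction trigger quoted from Pi (contextTokens > contextWindow - 16384) and a Master History entry that refuses to render unless all five headers are filled. The function names and the plain-text layout are assumptions of mine; only the threshold and the five section names come from the text above.

```python
RESERVE_TOKENS = 16384  # reserve quoted in the Pi threshold above

SECTIONS = ("Goal", "Constraints", "Progress", "Key decisions", "Next steps")


def should_compact(context_tokens: int, context_window: int,
                   reserve: int = RESERVE_TOKENS) -> bool:
    """True once the context has grown into the reserved tail of the window."""
    return context_tokens > context_window - reserve


def render_entry(title: str, sections: dict[str, list[str]]) -> str:
    """Render an entry; raise ValueError if any required section is empty."""
    missing = [name for name in SECTIONS if not sections.get(name)]
    if missing:
        raise ValueError(f"entry is missing sections: {', '.join(missing)}")
    out = [title]
    for name in SECTIONS:
        out.append("")
        out.append(f"{name}:")
        out.extend(f"  - {item}" for item in sections[name])
    return "\n".join(out)
```

For a 200,000-token window the trigger fires above 183,616 tokens. Making the renderer fail on an empty "Next steps" is the point: an entry that cannot say what comes next is not resumable.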

Weapon 8 — Permission gates as explicit hooks

Pi: The tool_call hook can BLOCK tool execution before it runs and mutate input arguments. session_before_* hooks gate compact / fork / tree / switch.
Apply to BSP: Risky actions (Cloudflare purge of all, Google Ads budget changes, theme file overwrite, Code Snippet delete) need explicit confirmation gates. Right now I just do them and report. Mario's discipline: gate first, act second.
Commit: Maintain a RISKY_ACTIONS.md with every action that requires explicit Robert confirmation. Examples: Google Ads campaign mutations, Cloudflare purge_everything, theme file rewrites, snippet deletes, postmeta full-replace. Before any of these, ask. No batched risky-action runs without sign-off.
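
A hedged sketch of "gate first, act second", assuming RISKY_ACTIONS.md lists one action identifier per "- " bullet. The identifiers shown in the usage note and the names parse_risky_actions, gate and ConfirmationRequired are hypothetical. This mirrors the blocking idea of Pi's tool_call hook but does not use Pi's API.

```python
from typing import Callable


class ConfirmationRequired(Exception):
    """Raised when a risky action was attempted without sign-off."""


def parse_risky_actions(markdown: str) -> set[str]:
    """Collect action identifiers from '- ' bullet lines."""
    return {line[2:].strip() for line in markdown.splitlines() if line.startswith("- ")}


def gate(action: str, risky: set[str], confirm: Callable[[str], bool]) -> None:
    """Block a risky action unless confirm(action) returns True."""
    if action in risky and not confirm(action):
        raise ConfirmationRequired(f"'{action}' needs explicit confirmation")
```

In use, every mutating script would call something like gate("cloudflare.purge_everything", risky, confirm) before touching anything, where confirm asks Robert and returns his answer. Actions not on the list pass straight through.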

Weapon 9 — Pre-tool input mutation (validate before execute)

Pi: The tool_call hook can mutate input args before tool execution. So a permission gate can also rewrite (e.g., add a dry-run flag to a destructive bash command).
Apply to BSP: When the weather budget mutator was about to push sewer to $300 (regression), there should have been a pre-write check that compares against the standing-decision floor ($500) and aborts if violated. Pre-tool guards are cheap.
Commit: For weather_budget_mutator.py and similar autonomous mutators, add a pre-write sanity check that reads STANDING_DECISIONS.md and aborts/logs if the proposed value violates a documented decision.
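
A minimal sketch of the pre-write guard, assuming STANDING_DECISIONS.md records budget floors as bullets of the form "- sewer >= $500". That line format, the function names, and the exception are assumptions; the real file may be structured differently.

```python
import re

# Assumed floor syntax: "- <campaign> >= $<amount>"
FLOOR_LINE = re.compile(r"^-\s*(?P<name>[\w-]+)\s*>=\s*\$?(?P<amount>\d+(?:\.\d+)?)\s*$")


class StandingDecisionViolation(Exception):
    """Raised when a proposed value would break a documented decision."""


def parse_floors(markdown: str) -> dict[str, float]:
    """Extract campaign budget floors; lines in other formats are ignored."""
    floors: dict[str, float] = {}
    for line in markdown.splitlines():
        match = FLOOR_LINE.match(line)
        if match:
            floors[match["name"]] = float(match["amount"])
    return floors


def check_budget(campaign: str, proposed: float, floors: dict[str, float]) -> None:
    """Abort the write if the proposed budget falls below the standing floor."""
    floor = floors.get(campaign)
    if floor is not None and proposed < floor:
        raise StandingDecisionViolation(
            f"{campaign}: proposed ${proposed:g} is below the standing floor ${floor:g}"
        )
```

With this in place, the $300 sewer push would have raised before any write, since 300 is below the documented floor of 500.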

Weapon 10 — Post-tool result modification (middleware chain)

Pi: The tool_result hook can modify output after execution, chained middleware-style. So one extension can sanitize output, another can log, another can transform.
Apply to BSP: When I curl the live page after a deploy, I should pipe the result through a middleware that runs the spec audit + screenshot + diff automatically. Right now those are separate manual steps I sometimes skip.
Commit: Bash wrapper deploy_and_verify.sh that runs deploy → cache purge → curl → audit_v2.py → screenshot → diff_images.py as one chain. Single command, full QA, no skip-the-verify.
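
The chain could be sketched as below. It is written in Python rather than Bash only to keep the examples in this doc in one language, and it assumes each stage is an external command whose non-zero exit code means failure. The run_chain name and the (name, argv) step shape are illustrative; the actual arguments for audit_v2.py and diff_images.py are not specified here.

```python
import subprocess


def run_chain(steps: list[tuple[str, list[str]]]) -> list[str]:
    """Run (name, argv) steps in order and stop at the first non-zero exit.

    Returns the names of the completed steps. Raises RuntimeError naming the
    failed step, so later steps (and any "I'll verify later") never run.
    """
    completed: list[str] = []
    for name, argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(
                f"step '{name}' failed with exit code {result.returncode}: "
                f"{result.stderr.strip()}"
            )
        completed.append(name)
    return completed
```

The steps list would hold deploy, cache purge, curl, audit, screenshot and diff in that order. Because a failure raises, a deploy only reports success when every verification stage has passed.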

Weapon 11 — Prompt templates with /name expansion

Pi: Markdown files in a templates directory. Filename = command. /review expands review.md. $1, $@ for arguments.
Apply to BSP: The Service Page SOP I logged in Battle Plan is text. It's not invocable. Pi's discipline: it should be a template I can invoke with arguments (e.g., /build-service-page emergency-plumbing expands the SOP with the page name substituted in).
Commit: Convert the Service Page SOP into /opt/nexus/nexus/templates/build-service-page.md with $1 = post slug, $2 = Figma node ID, $3 = service name. Next session invokes /build-service-page sewer-repair-kc 602:9 sewer-repair.
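
A hedged sketch of the expansion step, assuming $1, $2, ... are 1-based positional arguments and $@ is all arguments joined by spaces. How Pi itself treats a missing argument is not covered above, so raising an error here is my assumption; the function names are also illustrative.

```python
import re
import shlex

PLACEHOLDER = re.compile(r"\$(\d+|@)")


def parse_invocation(line: str) -> tuple[str, list[str]]:
    """Split '/name arg1 arg2' into the template name and its arguments."""
    parts = shlex.split(line)
    if not parts or not parts[0].startswith("/"):
        raise ValueError("invocation must start with /<template-name>")
    return parts[0][1:], parts[1:]


def expand(template: str, args: list[str]) -> str:
    """Substitute $1..$N and $@ in the template text."""
    def substitute(match: re.Match) -> str:
        token = match.group(1)
        if token == "@":
            return " ".join(args)
        index = int(token)
        if not 1 <= index <= len(args):
            raise ValueError(f"template needs argument ${index}, got {len(args)} argument(s)")
        return args[index - 1]

    return PLACEHOLDER.sub(substitute, template)
```

The template name returned by parse_invocation maps to a file such as build-service-page.md; its contents are passed to expand along with the arguments.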

Weapon 12 — Tool delegation to remote (SSH / containers as first-class)

Pi: Tool operations like user_bash + custom tool definitions can delegate read/write/bash to remote systems (SSH, containers). The agent doesn't care if the file is local or remote.
Apply to BSP: I'm already doing this — every Bash call goes via SSH to the VM. But I do it ad-hoc per command. Pi formalizes it as a delegate pattern. I should standardize.
Commit: Build a small wrapper vm.py that exposes vm.read(path), vm.write(path, content), vm.bash(cmd) over SSH so my Python scripts don't reinvent the SSH plumbing every time.
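
One possible shape for vm.py, assuming key-based SSH access is already configured and the ssh client is on PATH. The class name VM and the injectable run parameter are my own choices (the latter only so the wrapper can be exercised without a real host). Paths are quoted with shlex.quote because ssh hands the command string to the remote shell.

```python
import shlex
import subprocess
from typing import Callable, Optional


class VM:
    """Thin read/write/bash wrapper over a single SSH host."""

    def __init__(self, host: str,
                 run: Callable[..., subprocess.CompletedProcess] = subprocess.run):
        self.host = host
        self._run = run

    def bash(self, cmd: str, stdin: Optional[str] = None) -> str:
        """Run cmd in the remote shell and return stdout; raise on non-zero exit."""
        result = self._run(
            ["ssh", self.host, cmd],
            input=stdin, capture_output=True, text=True, check=True,
        )
        return result.stdout

    def read(self, path: str) -> str:
        """Return the contents of a remote file."""
        return self.bash(f"cat -- {shlex.quote(path)}")

    def write(self, path: str, content: str) -> None:
        """Overwrite a remote file with content sent over stdin."""
        self.bash(f"cat > {shlex.quote(path)}", stdin=content)
```

Scripts would then create one VM object with the host alias and call vm.read, vm.write and vm.bash, instead of each script assembling its own ssh command lines.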

3. Anti-patterns I'm running right now that Pi explicitly designed against

| Pi explicitly avoids | I'm doing this anyway | Cost |
| --- | --- | --- |
| Sub-agents | I spawned parallel research agents in earlier sessions (Walker v1/v2/v3) that produced conflicting outputs | Each Walker produced a different "Audrey-faithful" answer, none correct, accumulated drift |
| Built-in to-dos | Use TaskCreate/Update, which dies at session end | Backlog leaks across sessions because no persistent file tracks open items |
| Plan mode in-head | Plans live in chat, decay between turns | Robert had to re-explain the menu spec multiple times because it wasn't in a file I re-read |
| Permission popups (use extensions) | I act on risky things then report (e.g., pushed sewer to $500 without a "confirm?" gate) | Robert is the post-hoc gate instead of the pre-hoc gate; reversal cost on Bricks footer was ~30 min |
| Background bash (use tmux) | I run background bash, sometimes lose output, sometimes don't notice failure | Diagnostic ambiguity; have to re-run to confirm |
| Batteries-included tooling | 67 Code Snippets accumulated, 48 active, 8 diagnostic should be deleted | Cluttered surface, slow to grok, Mario's "merchants of complexity" trap |

The deepest mismatch: Pi treats every interaction as a tree branch you can fork from. I treat my session as a forward-only stack: patch on patch on patch. That's exactly the "compounding booboos with delayed pain" Mario warned about in the Slowing Down post.

4. Specific tools to build immediately

| Tool | Maps to Pi weapon | Effort |
| --- | --- | --- |
| /opt/nexus/nexus/scripts/output/BSP_OPEN_BACKLOG.md: single source of truth task file | Weapon 3 (no integrated TODO) | 30 min |
| /opt/nexus/nexus/scripts/output/STANDING_DECISIONS.md: declared invariants (sewer = $500, BBQs = paused, etc.) | Weapon 9 (pre-tool guards) | 15 min |
| /opt/nexus/nexus/scripts/output/RISKY_ACTIONS.md: actions requiring confirmation | Weapon 8 (permission gates) | 15 min |
| deploy_and_verify.sh: atomic deploy + cache purge + audit + screenshot + diff chain | Weapon 10 (post-tool middleware) | 1 hour |
| Convert Service Page SOP → build-service-page.md template invocable by command | Weapon 11 (prompt templates) | 30 min |
| vm.py: VM SSH wrapper exposing read/write/bash | Weapon 12 (tool delegation) | 1 hour |
| Postmeta + theme-file snapshotter for safe-restore points | Weapon 6 (tree-navigation) | 1 hour |
| CLAUDE.md split into skills/<name>/SKILL.md with progressive loading | Weapon 4 (skills) | 3 hours (large refactor) |

Total: ~7-8 hours of tooling work to bring my discipline up to Pi's defaults. Each one prevents a class of slop I've already shipped.

5. Concrete operational commitments (effective immediately)

  1. No new diagnostic snippets unless they'll be used 5+ times. One-off probes go in temp Python files that get deleted.
  2. Every multi-turn job (>3 turns) starts with a plan file. Re-read on every turn. Update at end. Master History entry references the plan file path.
  3. BSP_OPEN_BACKLOG.md is the single source of truth for open work. Read at session start. Updated at session end. TaskCreate/Update are session-local convenience only.
  4. Risky actions (Google Ads mutations, Cloudflare purge_everything, theme file overwrite, snippet delete, postmeta full-replace) require explicit confirmation from Robert before execution. Logged in RISKY_ACTIONS.md.
  5. Pre-tool guards on autonomous mutators. weather_budget_mutator and similar agents must read STANDING_DECISIONS.md before mutation and abort if violating a standing decision.
  6. Master History entries adopt the Pi-summary structure: Goal / Constraints / Progress / Key decisions / Next steps. Proof tables nest under those headers.
  7. Architectural forks get branched, not patched. When facing "patch vs rebuild?", write both options to a plan file. Build the smaller probe. Pick winner. Archive loser with a "why this failed" note in the plan.
  8. Snapshot postmeta + theme files before risky writes so restore is one command, not a debug session.
  9. Deploy is one chained command — apply + cache purge + audit + screenshot + diff. No "I'll verify in the next turn."

6. How to use this doc when reviewing my future work

If you (Robert, Kalen, Stephanie) review something I shipped and feel the "this is sloppy" vibe, the diagnostic isn't "is the code wrong?" — it's "which Pi weapon did I skip?" Use this table:

| Symptom you see | Weapon I skipped |
| --- | --- |
| Diagnostic snippets accumulating, surface getting cluttered | Weapon 1 (minimal default tools) |
| I forget what we decided 2 turns ago | Weapon 2 (write plans to files) + Weapon 3 (external TODO) |
| Same lesson surfaces multiple times across sessions | Weapon 4 (skills with progressive disclosure): the lesson isn't loaded as a skill yet |
| I patch over a broken thing instead of trying a fresh approach | Weapon 5 (session branching) + Weapon 6 (tree-navigation) |
| Master History entries don't tell you what to do next session | Weapon 7 (structured compaction) |
| I did a risky thing without asking | Weapon 8 (permission gates) |
| An autonomous mutator regressed a standing decision | Weapon 9 (pre-tool guards) |
| I shipped without doing the verify pipeline | Weapon 10 (post-tool middleware) |
| SOPs / recipes can't be invoked by name with arguments | Weapon 11 (prompt templates) |
| Repeated SSH plumbing in every Python script | Weapon 12 (tool delegation wrapper) |

The framing change Mario forces: the agent isn't a productivity firehose. It's a system you build to enforce discipline. Each weapon is friction that prevents a specific class of slop. Friction = comprehension. Friction = the agent staying useful past the demo.

Source: github.com/badlogic/pi-mono/tree/main/packages/coding-agent · 21 docs files audited · Read companion BSP Bricks Codebase Documentation for the practical artifacts this session shipped.