Agent Replication & Deployment Architecture

Version: 1.0
Last Updated: 2026-02-17
Status: Design Complete, Pending Approval for Implementation

Executive Summary

Goal: Deploy specialized AI agents (Dev, Ops, Sales) with human oversight, cost control, and VPS flexibility.

Key Principles:

Human approval required before any agent spawns (security + cost gate)
Budget caps enforced per agent (no runaway bills)
VPS agnostic (Docker + Tailscale works on any provider)
Git-based inheritance (species-dna/ protocols reach the whole fleet within one 15-minute pull cycle)
Fast deployment (15 min from approval to live agent)

Current State: 1 agent (Main session)
Next Step: Deploy Dev Agent as pilot (validate architecture)

Architecture Overview

Hub-and-Spoke Model

                    ┌─────────────────┐
                    │   Main Session  │ ← You + Me (strategic hub)
                    │  (minnie-core)  │
                    └────────┬────────┘
                             │
           ┌─────────────────┼─────────────────┐
           │                 │                 │
     ┌─────▼─────┐     ┌─────▼─────┐    ┌─────▼─────┐
     │ Dev Agent │     │ Ops Agent │    │Sales Agent│
     │ (GitHub)  │     │ (Systems) │    │  (CRM)    │
     └───────────┘     └───────────┘    └───────────┘
          │                  │                 │
          └──────────────────┴─────────────────┘
                             │
                    ┌────────▼────────┐
                    │  Species DNA    │ ← Shared protocols
                    │  (Git Repo)     │    (all inherit)
                    └─────────────────┘

Coordination:

Main session = strategic decisions, human interaction
Specialized agents = 24/7 monitoring, isolated tasks
Species DNA = shared protocols (updated via Git)
Tailscale mesh = secure communication (provider-agnostic)

Species DNA: Genetic Inheritance

What All Agents Share

species-dna/
├── CORE-MISSION.md           # Loss function: vitality → relational → sovereignty → business
├── LOSS-FUNCTION.md          # Optimization priorities (shared across all agents)
├── INTERACTION-STYLE.md      # How to communicate (authentic, not robotic)
├── PROTECTION-PROTOCOL.md    # Data loss prevention (auto-commit, volume discipline)
├── REBUILD-WINDOW.md         # When to deploy changes (Sunday 9:45 PM PT)
└── protocols/
    ├── escalation.md         # When to alert humans
    ├── budget-enforcement.md # Cost control rules
    └── security.md           # Credential isolation, audit trails

Update mechanism:

Main session updates species-dna/ → git push
All agents git pull every 15 min → inherit changes within one polling interval
Emergency updates: Webhook triggers immediate pull

Why this works:

✅ Fleet-wide policy changes within 15 minutes, or seconds via the emergency webhook (vs updating each agent manually)
✅ Rollback bad changes with git revert
✅ Version history (audit trail of protocol evolution)
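
A minimal sketch of the periodic pull described above, assuming each host keeps its checkout at /opt/species-dna and runs this from cron or a timer every 15 minutes. The function name and the injectable `run` parameter are illustrative and not part of any existing script.

```python
import subprocess
from typing import Callable


def sync_species_dna(
    repo_dir: str = "/opt/species-dna",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Fast-forward the local species-dna checkout; return True on success."""
    # --ff-only refuses to create merge commits, so a checkout that has
    # diverged from the remote fails loudly instead of merging silently.
    cmd = ["git", "-C", repo_dir, "pull", "--ff-only"]
    result = run(cmd, capture_output=True, text=True)
    return result.returncode == 0
```

The same function could serve the emergency webhook: the handler would call it immediately instead of waiting for the next scheduled run.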

Agent-Specific Memory

What Each Agent Keeps Private

agents/
├── main/
│   ├── MEMORY.md             # Main session's long-term memory
│   ├── workspace/            # Main's files (plans, analysis, metrics)
│   └── credentials/          # Main's OAuth tokens (Gmail, Calendar, Drive)
│
├── dev/
│   ├── MEMORY.md             # Dev's learning (GitHub patterns, Malachi preferences)
│   ├── workspace/            # Dev's PR reviews, issue tracking
│   └── credentials/          # Dev's tokens (GitHub API only)
│
└── ops/
    ├── MEMORY.md             # Ops' system knowledge
    ├── workspace/            # Ops' monitoring logs, alerts
    └── credentials/          # Ops' tokens (Vultr, monitoring APIs)

Isolation:

Dev Agent can't access Main's financial data
Sales Agent can't access Dev's GitHub tokens
Each agent builds its own MEMORY.md (unique experiences)

Cross-pollination:

Agents can READ other agents' memory (via Git)
Used for context (e.g., "What did Dev Agent learn about Malachi's code style?")
Write access only to own memory
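
The isolation and cross-pollination rules above can be sketched as a single access check. This is an illustration only: the function name is hypothetical, and it assumes that "memory" means each agent's MEMORY.md and that paths are given relative to the repo root.

```python
from pathlib import PurePosixPath


def is_allowed(agent: str, path: str, mode: str) -> bool:
    """Check a request against the isolation rules.

    mode is "read" or "write"; path looks like "agents/dev/MEMORY.md".
    """
    parts = PurePosixPath(path).parts
    if ".." in parts or len(parts) < 3 or parts[0] != "agents":
        return False  # outside the per-agent tree, or a traversal attempt
    owner, top = parts[1], parts[2]
    if owner == agent:
        return True  # own memory, workspace, and credentials
    # Other agents' memory is readable for context; nothing else is shared.
    return mode == "read" and top == "MEMORY.md"
```

For example, `is_allowed("sales", "agents/dev/credentials/github.json", "read")` is rejected, while reading `agents/dev/MEMORY.md` from any agent is permitted.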

Deployment Process: 15 Minutes from Approval to Live Agent

Phase 1: Justification (Human, 5 min)

I create proposal: species-dna/deployment-proposals/dev-agent-proposal.md

# Dev Agent Deployment Proposal

## Mission Alignment
- **Loss Function:** Sovereignty (protects Malachi's deep work)
- **Problem:** Malachi spending 15 hrs/week on PR reviews, GitHub notifications
- **Solution:** Dev Agent monitors repos 24/7, summarizes PRs, flags urgent issues

## ROI Calculation
- **Human time saved:** 15 hrs/week × $50/hr = $750/week = $3,000/month
- **Agent cost:** $200/month (API + VPS)
- **ROI:** 15x

## Cost Estimate
- API calls: ~500/day × 30 days × $0.01/call = $150/month
- VPS: Hetzner small (€3.79/mo) = $4/month
- Buffer: $46/month
- **Total budget:** $200/month

## Success Metrics (30-day pilot)
- Malachi's GitHub time reduced by >10 hrs/week
- Agent catches 90%+ of urgent PRs within 1 hour
- Cost stays under $200/month
- Zero false positives (spam alerts)

## Approval Request
Deploy Dev Agent with $200/month budget, 30-day pilot, weekly review?

You review:

Does ROI make sense?
Is mission alignment clear?
Are success metrics measurable?
Approve or reject (with feedback)

Phase 2: Provision (Automated, 10 min)

Once approved, I run:

# Provision VPS (any provider)
python3 scripts/provision-vps.py \
  --provider hetzner \
  --region eu \
  --size small \
  --agent dev \
  --budget 200

# Script does:
# 1. Create VPS via provider API (Hetzner, Vultr, DO, etc.)
# 2. Apply cloud-init bootstrap:
#    - Install Docker
#    - Install Tailscale (auto-join mesh)
#    - Clone species-dna/ repo
#    - Pull OpenClaw image
#    - Create agent workspace/
#    - Set budget cap in budget.json
# 3. Start container with:
#    - species-dna/ mounted read-only
#    - agent workspace/ mounted read-write
#    - credentials/ directory (empty, agent requests OAuth as needed)
# 4. Wait for Tailscale IP assignment
# 5. Register with fleet dashboard
# 6. Send "Hello" to designated Telegram group

Output:

✅ Dev Agent deployed
   VPS: Hetzner fsn1 (Falkenstein)
   IP: 144.202.134.56
   Tailscale: 100.72.11.54 (minnie-dev)
   Cost: $4.15/month
   Budget: $200/month
   Status: Healthy

Phase 3: Validation (Me + You, 5 min)

Health checks:

# 1. Ping agent via Tailscale
tailscale ping minnie-dev
# → Reply from 100.72.11.54

# 2. Check agent responded in Telegram
# → "Hello, I'm Dev Agent. Monitoring GitHub repos for Malachi."

# 3. Test one interaction
# "Dev Agent, check GitHub notifications"
# → Lists PRs awaiting review

# 4. Verify budget tracking
curl http://100.72.11.54:8080/budget
# → {"spend_to_date": 0.00, "cap": 200.00, "status": "healthy"}

If all pass: Agent live, 30-day pilot begins
If any fail: Debug, retry, or escalate to Infrastructure group
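
Health check 4 could be automated along these lines. This is a sketch that assumes the /budget endpoint returns exactly the JSON shape shown above; the function name is hypothetical.

```python
import json


def budget_check_passes(body: str, expected_cap: float) -> bool:
    """Validate the JSON body returned by the agent's /budget endpoint."""
    data = json.loads(body)
    return (
        data["status"] == "healthy"
        and data["cap"] == expected_cap
        and 0 <= data["spend_to_date"] < data["cap"]
    )
```

Comparing `cap` against the approved budget catches the case where the container started with a stale or default budget.json.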

Cost Monitoring: 5-Layer System

Layer 1: Token Tracking (Most Granular)

OpenClaw logs every API call:

{"ts": "2026-02-17T03:00:00Z", "agent": "dev", "model": "claude-sonnet-4-5", "tokens_in": 1200, "tokens_out": 800, "cost": 0.15}
{"ts": "2026-02-17T03:15:00Z", "agent": "dev", "model": "claude-haiku-4-5", "tokens_in": 300, "tokens_out": 150, "cost": 0.02}

Aggregation script (runs hourly):

# tools/fleet-cost-tracker.py
# Reads all agents' logs, sums costs, writes dashboard
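
As a rough sketch of what the hourly aggregation might do, the function below sums the `cost` field per agent from log lines in the JSON format shown above. The function name is an assumption, and the real tools/fleet-cost-tracker.py may work differently.

```python
import json
from collections import defaultdict
from typing import Iterable


def sum_costs_by_agent(lines: Iterable[str]) -> dict[str, float]:
    """Sum the "cost" field per agent across JSON-lines log entries."""
    totals: dict[str, float] = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        totals[entry["agent"]] += entry["cost"]
    # Round once at the end so float error does not accumulate in the output.
    return {agent: round(total, 2) for agent, total in totals.items()}
```

Fed the two sample entries above, this yields a single total for `dev`; the dashboard writer can then compare each total against the agent's cap.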

Layer 2: Budget Enforcement (Proactive)

Each agent has budget.json:

{
  "agent_name": "dev",
  "monthly_cap_usd": 200,
  "current_month": "2026-02",
  "spend_to_date": 87.32,
  "alert_at_percent": 80,
  "pause_at_percent": 100
}

Before each API call:

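import json

def check_budget(path="budget.json"):
    """Return True if the next API call may proceed; pause and alert at the cap."""
    with open(path) as f:
        budget = json.load(f)
    name, cap, spend = budget["agent_name"], budget["monthly_cap_usd"], budget["spend_to_date"]
    if spend >= cap * budget["pause_at_percent"] / 100:
        pause_agent()  # Stop making calls (helper defined elsewhere)
        alert_human(f"{name} agent hit ${cap} cap, paused until approval")
        return False
    if spend >= cap * budget["alert_at_percent"] / 100:
        alert_human(f"{name} agent at {budget['alert_at_percent']}% budget (${spend:.2f}/${cap})")
    return True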

Runaway costs are bounded: the check runs before each call, so spend can exceed the cap only by the cost of calls already in flight.
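
The budget file also has to be updated after each call and reset when the month changes. The sketch below shows one way to do that; it is an assumption, since the document does not specify rollover behavior, and `record_spend` is a hypothetical name.

```python
def record_spend(budget: dict, cost_usd: float, month: str) -> dict:
    """Return an updated copy of the budget after one call costing cost_usd.

    month is "YYYY-MM"; spend resets when it differs from current_month.
    """
    updated = dict(budget)
    if updated["current_month"] != month:
        updated["current_month"] = month
        updated["spend_to_date"] = 0.0
    # Keep four decimals so sub-cent calls are not rounded away.
    updated["spend_to_date"] = round(updated["spend_to_date"] + cost_usd, 4)
    return updated
```

Returning a copy rather than mutating in place makes it easier to write the new state to disk atomically.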

Layer 3: VPS Billing APIs

Provider wrappers:

# tools/vps-billing.py
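def get_vps_cost(provider, agent_name):
    """Return the VPS cost for one agent's host via the provider's billing API."""
    # Provider-specific fetchers are defined elsewhere in this module.
    fetchers = {"vultr": get_vultr_cost, "hetzner": get_hetzner_cost}  # ... etc
    if provider not in fetchers:
        raise ValueError(f"No billing wrapper for provider: {provider}")
    return fetchers[provider](agent_name)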

Polled daily, added to fleet dashboard.

Layer 4: Fleet Dashboard (Auto-Generated)

Updated hourly via cron:

# Fleet Cost Dashboard

**Last Updated:** 2026-02-17 03:00 UTC

## Total Spend

| Period    | Spend    | Budget  | % Used | Note        |
|-----------|----------|---------|--------|-------------|
| Feb 2026  | $143.67  | $750    | 19%    |             |
| Projected | $445.00  | $750    | 59%    | On track ✅ |

## By Agent

| Agent | Budget | Spend   | % Used | Status     |
|-------|--------|---------|--------|------------|
| main  | $350   | $131.35 | 38%    | ✅ Healthy  |
| dev   | $200   | $12.32  | 6%     | ✅ Healthy  |
| ops   | $150   | —       | —      | Not deployed |

## By Provider

| Provider   | Service       | Monthly Cost |
|------------|---------------|--------------|
| Anthropic  | Claude API    | $143.67      |
| Hetzner    | VPS (dev)     | $4.15        |
| Vultr      | VPS (main)    | $40.00       |

## Alerts

- None (all agents <80% budget)

Access via Tailscale:
http://100.72.11.53:9876/species-dna/fleet-cost-dashboard.md
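
The "By Agent" table could be generated from per-agent totals roughly as follows. This sketch assumes each agent is described by a dict with `name`, `budget`, and `spend`; the Alert and Paused labels are made up here to mirror the 80% and 100% thresholds.

```python
def render_agent_table(agents: list[dict]) -> str:
    """Render the per-agent section of the dashboard as a markdown table."""
    rows = [
        "| Agent | Budget | Spend | % Used | Status |",
        "|-------|--------|-------|--------|--------|",
    ]
    for a in agents:
        used = a["spend"] / a["budget"]
        if used >= 1.0:
            status = "⏸ Paused"      # hypothetical label for the hard stop
        elif used >= 0.80:
            status = "⚠️ Alert"      # hypothetical label for the 80% alert
        else:
            status = "✅ Healthy"
        rows.append(
            f"| {a['name']} | ${a['budget']:.0f} | ${a['spend']:.2f} "
            f"| {used:.0%} | {status} |"
        )
    return "\n".join(rows)
```

Agents that are not yet deployed would need separate handling, since they have no spend to divide.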

Layer 5: Human Review (Weekly)

Every Sunday (rebuild window):

Review fleet dashboard
Check for budget anomalies
Evaluate agent ROI (are we getting value?)
Adjust budgets if needed
Terminate underperforming agents

VPS Agnosticism: Docker + Tailscale

Why Docker?

Portability:

Same container runs on:
- Vultr (US)
- Hetzner (EU)
- DigitalOcean (US)
- Linode (US)
- AWS EC2 (anywhere)

Fast deployment:

# On any Ubuntu 22.04 VPS:
docker pull openclaw/openclaw:latest
docker run -d --name minnie-dev \
  -v /opt/species-dna:/species-dna:ro \
  -v /opt/agents/dev:/home/node/.openclaw \
  openclaw/openclaw:latest

# Agent live in ~2 minutes

Why Tailscale?

Provider-independent networking:

All agents join Tailscale mesh → talk to each other via 100.x.x.x IPs

Benefits:
- No VPN config (Tailscale handles it)
- No inbound firewall rules or port forwarding (traffic travels over encrypted WireGuard tunnels)
- Works across providers (Vultr agent can talk to Hetzner agent)
- Survives IP changes (Tailscale DNS resolves names)

Example:

# Main session on Vultr (US)
curl http://minnie-dev:8080/health
# → Reaches Dev Agent on Hetzner (EU) via Tailscale mesh

# No public internet exposure needed

Single Bootstrap Script (Works Everywhere)

#!/bin/bash
# scripts/bootstrap-agent-host.sh
# Works on: Vultr, Hetzner, DigitalOcean, Linode, AWS
# Usage: TAILSCALE_KEY=... bootstrap-agent-host.sh <provider> <agent>
set -euo pipefail  # stop on the first failed step or unset variable

PROVIDER="$1"  # vultr, hetzner, digitalocean, etc.
AGENT="$2"     # main, dev, ops
: "${TAILSCALE_KEY:?TAILSCALE_KEY must be set}"  # fail early without an auth key

# 1. Install Docker (same on all providers)
curl -fsSL https://get.docker.com | sh

# 2. Install Tailscale (same on all providers)
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up --authkey="$TAILSCALE_KEY" --hostname="minnie-$AGENT"

# 3. Create directories
mkdir -p "/opt/agents/$AGENT" "/opt/credentials/$AGENT"

# 4. Clone species-dna (git creates /opt/species-dna)
git clone https://github.com/playztag/minnie-brain.git /opt/species-dna

# 5. Start agent container
docker run -d --name "minnie-$AGENT" \
  --restart unless-stopped \
  -v /opt/species-dna:/species-dna:ro \
  -v "/opt/agents/$AGENT:/home/node/.openclaw" \
  openclaw/openclaw:latest

echo "✅ Agent $AGENT ready on $PROVIDER"

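Cloud-init (the provider API passes this as user data):

#cloud-config
runcmd:
  # TAILSCALE_KEY must already be in the environment; how it is injected is not specified here
  - curl -fsSL https://raw.githubusercontent.com/playztag/minnie-brain/main/scripts/bootstrap-agent-host.sh | bash -s hetzner dev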
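
The provisioning script presumably has to generate this user data per agent. A minimal sketch of that templating follows; the function name is hypothetical, and it assumes provider and agent names are plain alphanumeric strings.

```python
BOOTSTRAP_URL = (
    "https://raw.githubusercontent.com/playztag/minnie-brain/"
    "main/scripts/bootstrap-agent-host.sh"
)


def render_cloud_init(provider: str, agent: str) -> str:
    """Build the #cloud-config user data that runs the bootstrap script."""
    for value in (provider, agent):
        # Both values end up on a shell command line, so reject anything
        # that is not a plain alphanumeric token.
        if not value.isalnum():
            raise ValueError(f"unexpected characters in {value!r}")
    return (
        "#cloud-config\n"
        "runcmd:\n"
        f"  - curl -fsSL {BOOTSTRAP_URL} | bash -s {provider} {agent}\n"
    )
```

Validating the arguments before interpolation matters here because cloud-init runs the resulting line as root on first boot.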

Migration Between Providers (15 min)

Scenario: Dev Agent on Hetzner (EU) → Move to DigitalOcean (US)

# 1. Provision new VPS
python3 scripts/provision-vps.py --provider digitalocean --agent dev

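# 2. Wait for Tailscale (auto-joins mesh)
# Assumes the new host registers as minnie-dev-new while the old host keeps minnie-dev
tailscale ping minnie-dev-new

# 3. Rsync workspace (run on the old host)
rsync -avz /opt/agents/dev/ minnie-dev-new:/opt/agents/dev/

# 4. Switch names (tailscale has no alias command, so rename both nodes;
#    MagicDNS then resolves minnie-dev to the new host)
ssh minnie-dev sudo tailscale set --hostname=minnie-dev-old
ssh minnie-dev-new sudo tailscale set --hostname=minnie-dev

# 5. Stop old container
ssh minnie-dev-old docker stop minnie-dev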

# 6. Destroy old VPS
python3 scripts/destroy-vps.py hetzner $OLD_ID

# Done. Total downtime: <1 minute

Security: Human Oversight + Containment

Deployment Gate (Human Approval Required)

No agent can spawn itself:

I draft proposal → You review
You approve → I provision
Agent goes live

Prevents:

Runaway agent spawning
Unapproved costs
Mission drift (agent not aligned with loss function)

Budget Enforcement (Hard Caps)

Each agent has monthly limit:

80% budget → Alert (Telegram)
100% budget → Pause (hard stop, requires human override)

Example:

Dev Agent budget: $200/month
Spend to date: $160 (80%)
→ Alert sent to Infrastructure group: "Dev Agent at 80% budget"

Spend reaches $200 (100%)
→ Agent paused automatically
→ "Dev Agent hit cap, paused until approval"
→ You decide: Increase budget or investigate why it's high

Credential Isolation

Each agent gets own OAuth apps:

Main: Gmail (quan@ztag.com), Calendar, Drive, Zoho (full access)
Dev: GitHub API only (no financial data)
Ops: Vultr API, monitoring APIs only

Agent CANNOT:

Access other agents' credentials
Escalate its own permissions
Modify species-dna/ (read-only mount)

Agent CAN:

Request new API access (I review, help with OAuth)
Read other agents' memory (for context)
Write to own memory only

Audit Trail (Every Action Logged)

Structured logging:

{"ts": "2026-02-17T03:00:00Z", "agent": "dev", "action": "web_search", "query": "GitHub API rate limits", "cost": 0.02}
{"ts": "2026-02-17T03:05:00Z", "agent": "dev", "action": "exec", "command": "git status", "cost": 0.01}

Weekly review:

Check for anomalies (unusual API usage)
Verify agent staying within scope
Flag for human investigation if needed
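
Part of the weekly scope check could be automated by comparing logged actions against a per-agent allowlist. The sketch below assumes the log format shown above; the allowlist contents and the function name are hypothetical.

```python
import json
from typing import Iterable

# Hypothetical allowlist: which actions each agent is expected to perform.
ALLOWED_ACTIONS = {"dev": {"web_search", "exec", "github_api"}}


def out_of_scope(lines: Iterable[str], allowed: dict = ALLOWED_ACTIONS) -> list[dict]:
    """Return audit entries whose action is not on the agent's allowlist."""
    flagged = []
    for line in lines:
        entry = json.loads(line)
        # Agents without an allowlist entry have every action flagged.
        if entry["action"] not in allowed.get(entry["agent"], set()):
            flagged.append(entry)
    return flagged
```

Anything this returns would go to the human reviewer rather than trigger an automatic pause, in keeping with the weekly review step.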

Kill Switch (Immediate Stop)

# Pause agent (stop API calls, keep memory)
docker pause minnie-dev

# Stop agent (graceful shutdown)
docker stop minnie-dev

# Nuke agent (delete everything, can't undo)
bash scripts/nuke-agent.sh dev --confirm

Real-World Deployment: Dev Agent Pilot

Scenario: GitHub Monitoring for Malachi

Problem:

Malachi gets 30+ GitHub notifications/day
Spends 15 hrs/week triaging PRs, issues
Wants to focus on deep work (Code 5 architecture)

Solution:

Dev Agent monitors 5 repos 24/7
Summarizes PRs (changes, impact, urgency)
Flags urgent issues (security, breaking changes)
Sends daily digest (not real-time spam)

Deployment:

I draft proposal (ROI: 15x, budget: $200/mo)
You approve
I provision Hetzner VPS (€3.79/mo)
Agent requests GitHub OAuth (I help)
Agent starts monitoring
30-day pilot begins

Success metrics (30 days):

Malachi's GitHub time reduced by >10 hrs/week
Agent catches 90%+ urgent PRs within 1 hour
Cost <$200/month
Zero false positives

Decision point (Day 30):

If success: Keep Dev Agent, consider Ops Agent next
If failure: Terminate, document learnings, try different approach

Cost Comparison: VPS Providers

| Provider     | Small (1 vCPU, 1GB) | Medium (2 vCPU, 4GB) | Billing API | Notes               |
|--------------|---------------------|----------------------|-------------|---------------------|
| Hetzner      | $4.15/mo (€3.79)    | $7.50/mo (€6.84)     | ✅ Yes      | Cheapest, EU-based  |
| Linode       | $5.00/mo            | $10.00/mo            | ✅ Yes      | Good US performance |
| Vultr        | $6.00/mo            | $12.00/mo            | ✅ Yes      | Current provider    |
| DigitalOcean | $6.00/mo            | $12.00/mo            | ✅ Yes      | Popular, good docs  |
| AWS EC2      | $8.50/mo (t3.micro) | $17.00/mo (t3.small) | ✅ Yes      | Expensive, overkill |

Recommendation:

Dev Agent (low traffic): Hetzner small ($4.15/mo)
Ops Agent (monitoring): Hetzner small ($4.15/mo)
Main session (high traffic): Vultr medium ($12/mo) ← Current

Annual savings:

3 medium agents on Hetzner vs Vultr: ($12.00 − $7.50) × 12 months × 3 = $162/year saved (3 small agents: $1.85 × 12 × 3 ≈ $67/year)

Rollout Timeline

Phase 1: Dev Agent Pilot (Month 1)

Week 1: Deploy Dev Agent (Hetzner, $200 budget)
Week 2-4: Monitor cost, usefulness, Malachi feedback
Day 30: Decision (keep, adjust, or terminate)

Phase 2: Validate Economics (Month 2)

Actual ROI: Did Dev Agent really save 15 hrs/week?
Cost stability: Any unexpected spikes?
Improvements: Tweak prompts, reduce false positives

Phase 3: Second Agent (Month 3)

Deploy Ops Agent (system monitoring, $150 budget)
Or Sales Agent (CRM automation, $200 budget)
Use learnings from Dev Agent (faster deployment)

Phase 4: Fleet Operations (Month 4+)

3-5 specialized agents running
Fleet dashboard mature
Cost predictable ($500-750/month total)
ROI proven (20-30x across fleet)

Decision Framework: When to Deploy Agent

Must pass ALL 3 tests:

1. Mission Alignment

Does agent optimize loss function?
Or does it fragment attention?

2. ROI Threshold

Minimum: 10x return
Calculation: (Human hours saved per month × $50/hr) ÷ (monthly API + VPS cost) ≥ 10

3. Specialization Necessity

Can Main session handle this? (If yes, don't spawn)
Does it need 24/7 availability?
Does it need isolated context?

Example (Dev Agent):

✅ Mission: Sovereignty (protects Malachi's deep work)
✅ ROI: 15x ($3000 saved / $200 cost)
✅ Specialization: 24/7 GitHub monitoring (Main can't do this)
Verdict: DEPLOY

Example (Marketing Agent):

❌ Mission: Business (lowest priority in loss function)
⚠️ ROI: 5x (doesn't meet 10x threshold)
❌ Specialization: Main can handle social posts
Verdict: REJECT (Main session handles marketing)
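
The three tests can be expressed as a small function. This is a sketch under the document's own numbers ($50/hr, 4 weeks per month, 10x threshold); the `Proposal` fields are illustrative, and the Marketing Agent's hours and cost in the usage example are invented to reproduce the 5x figure.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    name: str
    mission_aligned: bool         # test 1: optimizes the loss function
    hours_saved_per_week: float   # input to test 2
    monthly_cost_usd: float       # API + VPS
    needs_specialization: bool    # test 3: Main session cannot handle it


def roi(p: Proposal, hourly_rate: float = 50.0, weeks_per_month: int = 4) -> float:
    """Monthly value of the human time saved divided by monthly agent cost."""
    return p.hours_saved_per_week * weeks_per_month * hourly_rate / p.monthly_cost_usd


def verdict(p: Proposal, min_roi: float = 10.0) -> str:
    """DEPLOY only if all three tests pass."""
    passes = p.mission_aligned and roi(p) >= min_roi and p.needs_specialization
    return "DEPLOY" if passes else "REJECT"
```

With the Dev Agent numbers, `Proposal("dev", True, 15, 200, True)` gives an ROI of 15 and a DEPLOY verdict; a marketing proposal saving 5 hrs/week at the same cost gives 5x and is rejected on every test.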

Summary: Key Decisions

✅ Approved Patterns

Docker for portability (works on any VPS)
Tailscale for networking (provider-agnostic mesh)
Git for protocol inheritance (fleet-wide updates within one 15-minute pull cycle)
Budget caps enforced (80% alert, 100% pause)
Human approval required (deployment gate)
15-min provisioning (fast, low-friction)
Separate credentials per agent (security isolation)
Hetzner for cost savings (about 31% cheaper than Vultr for small instances, 37.5% for medium)

🚧 Pending Implementation

Budget enforcement script (tools/budget-enforcer.py)
Fleet cost dashboard (species-dna/fleet-cost-dashboard.md)
VPS provisioning wrapper (scripts/provision-vps.py)
Migration script (tools/migrate-agent.sh)
Dev Agent proposal (species-dna/deployment-proposals/dev-agent-proposal.md)

📋 Next Actions

This Week:

You review this document (feedback?)
I implement budget enforcement + fleet dashboard (2 hrs)
We test provisioning on Hetzner (30 min)

Next Week:

I draft Dev Agent proposal (ROI, success metrics)
You approve or reject
If approved: Deploy Dev Agent, 30-day pilot begins

Document Location (Tailscale):
http://100.72.11.53:9876/working/infrastructure/deployment-architecture.md

Feedback welcome. Ready to implement when you are.