Carmee Agent Deployment - First Round Learning Exercise

Status: Ready for Infrastructure & Tech discussion
Owner: Quan + Minnie + Infrastructure & Tech group
Date: February 17, 2026


Context

Goal: Deploy Carmee assistant agent as first operational deployment, use as learning exercise to validate architecture and gather feedback for hardening.

Quan's Directive:

"Let's use the Carmee deployment as the first round and gather feedback to see how we can harden the deployment later."

Strategic Rationale:


Current Status ✅

All Operational Tools Working:

All auto-refresh mechanisms are host-level, so they survive container rebuilds.


Carmee Agent Mission

Problem

Solution

Agentic assistant that:

Target: 10 min/inquiry (5-10 min saved vs. the current 15-20 min workflow)


Architectural Context

Three Scaling Traps Identified (Earlier Feedback)

1. ❌ Segment Routing Before Authority Gate

2. ❌ Manual SSH Fleet Updates

3. ❌ Documented But Not Enforced

When to Fix These?

Two approaches:

Option A: Fix Before Deployment (Safer)

Option B: Deploy First, Harden Later (Faster, More Learning)

Quan's Choice: Option B (deploy first, gather feedback)


Deployment Plan

Phase 0: Shadow Mode Validation (This Week: Feb 17-22)

Goal: Prove agentic engine beats custom GPT before Carmee integration

Location: Main session (not separate agent yet)

Steps:

  1. Pull 5-10 sample inquiries Carmee handled
  2. Generate shadow drafts:
    • Pull CRM context (deal stage, contact history, notes)
    • Pull Drive context (meeting notes, proposals)
    • Pull Gmail context (email thread history)
    • Classify pathway (1-7)
    • Generate draft reply
    • Provide guidance notes
    • Suggest next steps
  3. Compare vs. custom GPT output
  4. Quan reviews quality:
    • Pathway accuracy (correct classification?)
    • Draft quality (1-5 rating)
    • Context relevance (pulled right data?)
    • Time saved estimate (vs. current workflow)

Success Criteria:

Validation Metric: 10 shadow tests across pathways 1-4 before going live.

Logging: All test results in working/ops/carmee-agent-shadow-testing.md
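The logging step could be a small append-only helper; the entry fields mirror the review criteria in step 4 (pathway, draft quality, context relevance, time saved). The function name and exact entry layout are assumptions, not an agreed format.

```python
from datetime import date
from pathlib import Path

LOG_PATH = Path("working/ops/carmee-agent-shadow-testing.md")

def log_shadow_test(inquiry_id: str, pathway: int, quality: int,
                    context_relevant: bool, minutes_saved: int) -> str:
    """Append one shadow-test result to the log and return the entry text."""
    if not 1 <= pathway <= 7:
        raise ValueError("pathway must be 1-7")
    if not 1 <= quality <= 5:
        raise ValueError("draft quality is rated on a 1-5 scale")
    entry = (
        f"\n## {date.today().isoformat()} - Inquiry {inquiry_id}\n"
        f"- Pathway: {pathway}\n"
        f"- Draft quality (1-5): {quality}\n"
        f"- Context relevant: {'yes' if context_relevant else 'no'}\n"
        f"- Estimated time saved: {minutes_saved} min\n"
    )
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with LOG_PATH.open("a") as f:
        f.write(entry)
    return entry
```

One entry per inquiry keeps the 10-test validation run auditable in a single file.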


Phase 1: Parallel Cliq Integration (Week 2-3: Feb 25-Mar 2)

If shadow mode validates (90%+ accuracy, 4.0+ quality):

Steps:

  1. Set up Zoho Cliq bot integration (OAuth, webhook receiver)
  2. Deploy in parallel with custom GPT:
    • Carmee sees both outputs (A/B comparison)
    • Carmee chooses which to use (not forced)
    • Track preference, edit rate, time saved
  3. Blind testing (if possible): Carmee doesn't know which is agent vs. GPT
  4. Measure:
    • Draft quality (Carmee's subjective rating)
    • Time saved (per inquiry)
    • Edit rate (how much does Carmee modify before sending?)
    • Preference (does Carmee prefer agent output?)
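The measurements in step 4 can be aggregated with a small sketch like the one below; the record shape (which fields are captured per inquiry) is an assumption, since no schema is agreed yet.

```python
from dataclasses import dataclass

@dataclass
class ABResult:
    """One inquiry from parallel testing (field names are illustrative)."""
    chose_agent: bool     # did Carmee pick the agent draft over the custom GPT?
    chars_in_draft: int   # length of the chosen draft
    chars_edited: int     # characters Carmee changed before sending
    minutes_saved: float  # vs. the 15-20 min baseline workflow

def summarize(results: list[ABResult]) -> dict:
    """Aggregate the Phase 1 metrics: preference rate, edit rate, time saved."""
    n = len(results)
    return {
        "preference_rate": sum(r.chose_agent for r in results) / n,
        "avg_edit_rate": sum(r.chars_edited / r.chars_in_draft
                             for r in results) / n,
        "avg_minutes_saved": sum(r.minutes_saved for r in results) / n,
    }
```

Edit rate here is a character-level proxy; Carmee's subjective 1-5 rating stays a separate, manual input.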

Location: Still Main session (not separate VPS yet)

Success Criteria:


Phase 2: Dedicated Agent (Week 4+: Mar 5+)

If Cliq integration succeeds:

Provision dedicated VPS:

Agent configuration:

Capabilities:

Learning loop:

Expansion:


Phase 3: Hardening & Scale (Month 2+: Mar 15+)

After Carmee agent proven (30-day pilot successful):

Implement governance hardening:

  1. Governance watchdog (systemd timer):

    • governance-sync.service - Self-pull governance from Git
    • governance-sync.timer - Run every 60-90 sec
    • Atomic symlink switch (no downtime)
    • Auto-restart pathway-api after governance update
  2. Golden Image with baked watchdog:

    • systemctl enable governance-sync.timer during image build
    • Impossible to spawn agent without watchdog
    • All future agents inherit autonomous governance sync
  3. Test autonomous governance sync:

    • git push governance change
    • Verify fleet converges within 90 sec without manual intervention
    • No SSH fan-out, no orchestration script
  4. Fix segment routing (gate-bounded specialization):

    • Authority verification BEFORE pathway specialization
    • No Pilot language leaks to unauthorized inquiries
  5. Customer-context repo tier (if needed for multi-customer agents):

    • Per-customer private repos
    • Boot order: substrate → ops-context (global) → customer-context (override)
    • Prevents cross-customer governance leakage
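The atomic symlink switch in step 1 could look like the sketch below. The `current` link name and directory layout are illustrative; the key property is that `os.replace()` is an atomic rename on POSIX, so pathway-api never observes a missing or half-updated governance tree while the timer swaps checkouts.

```python
import os
from pathlib import Path

def switch_current(base: Path, new_release: Path) -> None:
    """Atomically repoint base/'current' at new_release.

    A temporary symlink is created first, then os.replace() renames it over
    the live link in one atomic step. Intended to run from the
    governance-sync systemd timer after a successful git pull; all paths
    and names here are assumptions, not the real layout.
    """
    tmp = base / "current.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(new_release, tmp)
    os.replace(tmp, base / "current")
```

After the switch, the timer unit would restart pathway-api so it picks up the new governance tree.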

Deploy additional agents:

All from hardened foundation (validated substrate, proven patterns).


Infrastructure Work Required

Immediate (for Shadow Mode - This Week)

Near-term (for Cliq Integration - Week 2-3)

Mid-term (for Dedicated Agent - Week 4)

Long-term (for Hardened Substrate - Month 2+)


Decision Points for Infrastructure & Tech Group

1. Shadow Mode Deployment Location

Options:

Recommendation: A (Main session) - faster, no provisioning overhead for shadow testing


2. Governance Watchdog Timing

Options:

Recommendation: B (after validation) per Quan's directive - deploy first, harden later


3. VPS Provider for Carmee Agent

Options:

Recommendation: A (Hetzner) - significant cost savings, Docker/Tailscale work on both


4. Budget Cap for Carmee Agent

Proposed: $150/month

Question: Is $150 appropriate or should it be higher/lower?
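The guardrail proposed later in this doc (80% alert, 100% pause) could be as simple as the sketch below; the function name and return values are assumptions, not an agreed interface.

```python
def budget_action(month_to_date_spend: float, cap: float = 150.0) -> str:
    """Map month-to-date spend (API calls + VPS) to an action.

    Thresholds follow the proposed guardrail: alert at 80% of the cap,
    pause the agent at 100%. Everything here is illustrative.
    """
    if month_to_date_spend >= cap:
        return "pause"   # stop the agent until the budget is reviewed
    if month_to_date_spend >= 0.8 * cap:
        return "alert"   # notify, keep running
    return "ok"
```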


5. Fleet Dashboard Priority

Options:

Recommendation: B (after validation) - Main session has basic tracking, dashboard can wait


6. Learning Loop Implementation

How should Carmee agent learn from edits?

Recommendation: Start with A (manual), graduate to C (hybrid) after patterns emerge


Success Metrics (30-Day Pilot)

Carmee agent must demonstrate:

Metric | Target | Measurement
------ | ------ | -----------
Pathway classification accuracy | ≥90% | Quan's assessment during shadow mode
Draft quality rating | ≥4.0 avg | Carmee's subjective rating (1-5 scale)
Time saved per inquiry | ≥10 min | Before: 15-20 min; after: 5-10 min
Cost | <$150/month | API calls + VPS cost tracked
Security | Zero incidents | CRM data exposure, unauthorized API calls
Learning loop | Adapting over time | Edit rate decreases week-over-week
Uptime | >99% | Agent responds within 10 sec of inquiry
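The pass/fail gate implied by the table can be expressed as one check. Thresholds come straight from the table; the parameter names and return shape are assumptions for illustration.

```python
def pilot_passes(accuracy: float, quality: float, minutes_saved: float,
                 monthly_cost: float, security_incidents: int,
                 edit_rate_trending_down: bool, uptime: float) -> dict:
    """Evaluate the 30-day pilot against the success-metric table.

    Returns per-metric booleans plus an overall verdict. Thresholds are
    taken from the table above; field names are illustrative.
    """
    checks = {
        "pathway_accuracy": accuracy >= 0.90,
        "draft_quality": quality >= 4.0,
        "time_saved": minutes_saved >= 10,
        "cost": monthly_cost < 150,
        "security": security_incidents == 0,
        "learning_loop": edit_rate_trending_down,
        "uptime": uptime > 0.99,
    }
    checks["all_pass"] = all(checks.values())
    return checks
```

A single failing metric blocks the "all pass" path, which matches the branch below: any failure means iterate rather than proceed to hardening.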

If all pass:

If any fail:


Risk Mitigation

Risk: Agent output worse than custom GPT

Risk: CRM API rate limits or access issues

Risk: Carmee resists new tool

Risk: Klansys feels sidelined

Risk: Budget overrun

Risk: Security incident (CRM data exposure)


Open Questions

  1. Shadow mode deployment: Main session (A) or test VPS (B)?

    • Leaning: A (Main session) - simplest for shadow testing
  2. Governance watchdog timing: Before (A) or after (B) Carmee validation?

    • Leaning: B (after) per Quan's directive
  3. VPS provider: Hetzner (A) or Vultr (B)?

    • Leaning: A (Hetzner) - 37% cost savings
  4. Budget cap: $150/month appropriate?

    • Leaning: Yes, with 80% alert / 100% pause
  5. Fleet dashboard: Before (A) or after (B) dedicated agent?

    • Leaning: B (after) - not blocking for shadow mode
  6. Learning loop: Manual (A), automated (B), or hybrid (C)?

    • Leaning: A (manual) initially, graduate to C (hybrid)
  7. Customer-context repo tier: Needed for Carmee or defer?

    • Leaning: Defer - Carmee serves one customer (ZTAG), not needed yet

Next Steps

This Week (Feb 17-22)

  1. Infrastructure & Tech group discussion (this thread)

    • Decide: Shadow mode location
    • Decide: Governance watchdog timing
    • Decide: VPS provider
    • Decide: Budget cap
    • Decide: Fleet dashboard priority
  2. Pull sample inquiries (Quan)

    • 5-10 real examples Carmee handled
    • Variety of pathways (1-4)
    • Include actual CRM context (deal IDs, contact names)
  3. Generate shadow drafts (Minnie)

    • Pull CRM data for each inquiry
    • Classify pathway
    • Generate draft + guidance notes
    • Log results in working/ops/carmee-agent-shadow-testing.md
  4. Review quality (Quan)

    • Assess pathway accuracy
    • Rate draft quality (1-5)
    • Estimate time saved
    • Decide: proceed to Cliq integration or iterate

Week 2-3 (Feb 25-Mar 2) - If Shadow Mode Validates

  1. Set up Zoho Cliq bot integration
  2. Deploy parallel testing (agent + custom GPT)
  3. Carmee A/B comparison
  4. Measure preference, edit rate, time saved

Week 4+ (Mar 5+) - If Cliq Integration Succeeds

  1. Provision Hetzner VPS
  2. Deploy Carmee agent
  3. 30-day pilot begins
  4. Weekly cost/quality review

Month 2+ (Mar 15+) - After Pilot Validates

  1. Implement governance hardening
  2. Snapshot Golden Image
  3. Deploy additional agents (Dev, Ops)
  4. Scale fleet

Document Metadata


Quan: Ready to resume conversation in Infrastructure & Tech group to finalize deployment plan.