← Back to Index

Carmee Agent Deployment - First Round Learning Exercise

Status: Ready for Infrastructure & Tech discussion
Owner: Quan + Minnie + Infrastructure & Tech group
Date: February 17, 2026

Context

Goal: Deploy Carmee assistant agent as first operational deployment, use as learning exercise to validate architecture and gather feedback for hardening.

Quan's Directive:

"Let's use the Carmee deployment as the first round and gather feedback to see how we can harden the deployment later."

Strategic Rationale:

Perfect architecture without deployment = theoretical value
Imperfect deployment with learning loop = real feedback → better hardening
Carmee's workflow is bounded (sales drafts, CRM context) - lower risk than multi-domain agents
Unknown unknowns only emerge from production usage

Current Status ✅

All Operational Tools Working:

✅ Gmail OAuth (3 accounts, auto-refresh hourly)
✅ Google Calendar (multi-calendar integration)
✅ Google Drive (meeting notes search)
✅ Zoho CRM (just authorized Feb 17, deals/contacts/notes READ access, validated with live data)
✅ Weather API (wttr.in)
✅ Package tracking (UPS API)
✅ Auto-commit (hourly workspace backup)
✅ Tailscale mesh (VPS accessible at 100.72.11.53)

All auto-refresh mechanisms host-level (survive container rebuilds).

Carmee Agent Mission

Problem

Carmee manually drafts sales replies in Zoho Cliq
Uses custom GPT (Klansys-built) with manual CRM context copy/paste
Rigid programmatic routing (brittle on edge cases)
Manual workflow: Cliq → ChatGPT → Zoho CRM → back to Cliq
Time per inquiry: 15-20 minutes (with context switching)

Solution

Agentic assistant that:

Pulls live Zoho CRM data (deals, contacts, notes)
Classifies inquiry pathway (1-7) with confidence scores
Generates context-enriched draft replies
Provides guidance notes (decision cycle, emphasis points, stakeholder type)
Suggests next steps (follow-up timing, escalation triggers)

Target: 10 min/inquiry (5-10 min saved vs. current workflow)

Architectural Context

Three Scaling Traps Identified (Earlier Feedback)

1. ❌ Segment Routing Before Authority Gate

Problem: If inquiry mentions "grant" OR from "*.k12.ca.us" → route to grant-specialized K-12 agent
Issue: Site-level inquiry (no purchasing authority) gets routed to agent that uses Pilot language/rollout framing before authority verification
Fix needed: Gate → Allowed Pathways → Segment Specialization (authority bounds specialization)

2. ❌ Manual SSH Fleet Updates

Problem: git push → SSH fan-out → restart agents manually
Issue: At N=30, you become "restart daemon" orchestrating deployments
Fix needed: Autonomous governance watchdog (systemd timer, self-pull governance, atomic symlink switch, auto-restart)

3. ❌ Documented But Not Enforced

Problem: "Watchdog pattern documented" but not baked into Golden Image
Issue: One forgotten timer = silent governance fork at scale
Fix needed: Bake governance-sync.timer into substrate (impossible to spawn without it)

When to Fix These?

Two approaches:

Option A: Fix Before Deployment (Safer)

Implement governance watchdog now
Bake into Golden Image
Deploy Carmee from hardened foundation
Pros: No technical debt, scales cleanly
Cons: Delays deployment, theoretical fixes (not validated against real usage)

Option B: Deploy First, Harden Later (Faster, More Learning)

Deploy Carmee with current tooling
Observe friction in production
Fix scaling traps based on actual pain points
Pros: Real feedback informs better fixes, faster to value
Cons: Some rework later, but bounded (Carmee is single agent)

Quan's Choice: Option B (deploy first, gather feedback)

Deployment Plan

Phase 0: Shadow Mode Validation (This Week: Feb 17-22)

Goal: Prove agentic engine beats custom GPT before Carmee integration

Location: Main session (not separate agent yet)

Steps:

Pull 5-10 sample inquiries Carmee handled
Generate shadow drafts:
- Pull CRM context (deal stage, contact history, notes)
- Pull Drive context (meeting notes, proposals)
- Pull Gmail context (email thread history)
- Classify pathway (1-7)
- Generate draft reply
- Provide guidance notes
- Suggest next steps
Compare vs. custom GPT output
Quan reviews quality:
- Pathway accuracy (correct classification?)
- Draft quality (1-5 rating)
- Context relevance (pulled right data?)
- Time saved estimate (vs. current workflow)

Success Criteria:

Pathway classification: 90%+ accuracy (9/10 correct)
Draft quality: 4.0+ avg rating
Context relevance: 4.0+ avg rating
Time saved: ≥10 min/inquiry

Validation Metric: 10 shadow tests across pathways 1-4 before going live.

Logging: All test results in working/ops/carmee-agent-shadow-testing.md

Phase 1: Parallel Cliq Integration (Week 2-3: Feb 25-Mar 2)

If shadow mode validates (90%+ accuracy, 4.0+ quality):

Steps:

Set up Zoho Cliq bot integration (OAuth, webhook receiver)
Deploy in parallel with custom GPT:
- Carmee sees both outputs (A/B comparison)
- Carmee chooses which to use (not forced)
- Track preference, edit rate, time saved
Blind testing (if possible): Carmee doesn't know which is agent vs. GPT
Measure:
- Draft quality (Carmee's subjective rating)
- Time saved (per inquiry)
- Edit rate (how much does Carmee modify before sending?)
- Preference (does Carmee prefer agent output?)

Location: Still Main session (not separate VPS yet)

Success Criteria:

Carmee prefers agent output >70% of time
Time saved: 10+ min/inquiry
Edit rate: <30% of draft modified
Quality rating: 4.0+ avg

Phase 2: Dedicated Agent (Week 4+: Mar 5+)

If Cliq integration succeeds:

Provision dedicated VPS:

Provider: Hetzner (€3.79/mo) vs. Vultr ($6/mo)
- Recommendation: Hetzner (37% savings, EU-based)
Size: Small (1 vCPU, 1GB RAM)
Region: EU (Frankfurt) or US (depends on CRM latency)

Agent configuration:

Name: minnie-carmee
Tailscale hostname: minnie-carmee (100.72.11.x)
Workspace: /opt/agents/carmee/
Credentials: Zoho CRM (READ only), Zoho Cliq bot
Budget cap: $150/month
- 80% ($120) → Alert
- 100% ($150) → Pause (hard stop)

Capabilities:

Zoho CRM API (deals, contacts, notes)
Zoho Cliq bot (receive inquiries, post drafts)
Google Drive (meeting notes, optional)
Gmail (email context, optional)

Learning loop:

Track Carmee's edits before sending
Adapt style over time (less generic, more Carmee-like)
Log patterns (common modifications, preferred phrasing)

Expansion:

Start with pathways 1-4 (grant, nonprofit, city, camps)
Expand to pathways 5-7 (operator, pilot, professional) after validation

Phase 3: Hardening & Scale (Month 2+: Mar 15+)

After Carmee agent proven (30-day pilot successful):

Implement governance hardening:

Governance watchdog (systemd timer):
- governance-sync.service - Self-pull governance from Git
- governance-sync.timer - Run every 60-90 sec
- Atomic symlink switch (no downtime)
- Auto-restart pathway-api after governance update
Golden Image with baked watchdog:
- systemctl enable governance-sync.timer during image build
- Impossible to spawn agent without watchdog
- All future agents inherit autonomous governance sync
Test autonomous governance sync:
- git push governance change
- Verify fleet converges within 90 sec without manual intervention
- No SSH fan-out, no orchestration script
Fix segment routing (gate-bounded specialization):
- Authority verification BEFORE pathway specialization
- No Pilot language leaks to unauthorized inquiries
Customer-context repo tier (if needed for multi-customer agents):
- Per-customer private repos
- Boot order: substrate → ops-context (global) → customer-context (override)
- Prevents cross-customer governance leakage

Deploy additional agents:

Dev Agent (GitHub monitoring, Malachi support)
Ops Agent (system monitoring, alerts)
Sales Agent (CRM automation, pipeline tracking)

All from hardened foundation (validated substrate, proven patterns).

Infrastructure Work Required

Immediate (for Shadow Mode - This Week)

Zoho CRM OAuth (done Feb 17)
Test CRM API access (validated, pulling live deals)
Document sample inquiry testing protocol
Create logging format for draft comparisons
Pull 5-10 sample inquiries from Carmee's history

Near-term (for Cliq Integration - Week 2-3)

Zoho Cliq bot setup:
- OAuth credentials (Cliq app registration)
- Webhook receiver (POST /inquiry → draft generation)
- Bot message formatting (Cliq markdown)
Draft formatting (Cliq vs. email markdown differences)
Error handling:
- CRM API failures (timeout, rate limit, invalid token)
- Cliq webhook failures (retry logic, dead letter queue)
Testing harness (send test inquiry → verify draft → measure latency)

Mid-term (for Dedicated Agent - Week 4)

Hetzner VPS provisioning:
- Provision script (tools/provision-vps.py --provider hetzner)
- Cloud-init bootstrap (Docker, Tailscale, agent workspace)
- Cost tracking (Hetzner billing API)
Tailscale auto-join:
- Pre-authorized authkey (ephemeral or reusable?)
- Hostname registration (minnie-carmee)
Budget enforcement:
- tools/budget-enforcer.py (check before each API call)
- Budget tracking file (budget.json)
- Alert mechanism (Telegram to Infrastructure group)
Fleet cost dashboard:
- Aggregate logs from all agents
- Generate markdown dashboard
- Accessible via Tailscale (http://100.72.11.53:9876/fleet-cost-dashboard.md)

Long-term (for Hardened Substrate - Month 2+)

Governance watchdog:
- infra/boot/governance-sync.service
- infra/boot/governance-sync.timer
- systemctl enable during image build
Golden Image snapshot:
- Docker image with baked watchdog
- Boot-time governance validation
- Push to registry (Docker Hub or private)
Customer-context repo tier:
- Per-customer repos (if multi-customer agents needed)
- Boot-order injection
- Credential isolation

Decision Points for Infrastructure & Tech Group

1. Shadow Mode Deployment Location

Options:

A. Main session (simplest, no new infra)
B. Test VPS (isolate Carmee work from Main)

Recommendation: A (Main session) - faster, no provisioning overhead for shadow testing

2. Governance Watchdog Timing

Options:

A. Before Carmee dedicated agent (safer, scales cleanly)
B. After Carmee validates (faster, informed by real usage)

Recommendation: B (after validation) per Quan's directive - deploy first, harden later

3. VPS Provider for Carmee Agent

Options:

A. Hetzner (€3.79/mo, EU-based, 37% cheaper)
B. Vultr ($6/mo, current provider, known quantity)

Recommendation: A (Hetzner) - significant cost savings, Docker/Tailscale work on both

4. Budget Cap for Carmee Agent

Proposed: $150/month

Lower than Dev Agent ($200) due to narrower scope
Based on estimated API usage:
- ~300 inquiries/month (Carmee's current volume)
- ~1500 tokens/inquiry (CRM pull + draft generation)
- Claude Sonnet 4.5: ~$0.30/inquiry
- Total: ~$90/month API + $50 buffer
80% alert at $120
100% pause at $150

Question: Is $150 appropriate or should it be higher/lower?

5. Fleet Dashboard Priority

Options:

A. Build before first dedicated agent (better visibility from start)
B. Build after Carmee validates (faster to deployment)

Recommendation: B (after validation) - Main session has basic tracking, dashboard can wait

6. Learning Loop Implementation

How should Carmee agent learn from edits?

Option A: Manual review (weekly, Quan reviews edit logs)
Option B: Automated pattern detection (LLM analyzes edits, suggests style adjustments)
Option C: Hybrid (automated pattern detection + weekly human review)

Recommendation: Start with A (manual), graduate to C (hybrid) after patterns emerge

Success Metrics (30-Day Pilot)

Carmee agent must demonstrate:

Metric	Target	Measurement
Pathway classification accuracy	≥90%	Quan's assessment during shadow mode
Draft quality rating	≥4.0 avg	Carmee's subjective rating (1-5 scale)
Time saved per inquiry	≥10 min	Before: 15-20 min, After: 5-10 min
Cost	<$150/month	API calls + VPS cost tracked
Security	Zero incidents	CRM data exposure, unauthorized API calls
Learning loop	Adapting over time	Edit rate decreases week-over-week
Uptime	>99%	Agent responds within 10 sec of inquiry

If all pass:

Proceed to Dev Agent, Ops Agent (validated architecture)
Implement governance hardening
Scale to multi-agent fleet

If any fail:

Document learnings
Iterate fixes
Test again (fail fast, learn faster)

Risk Mitigation

Risk: Agent output worse than custom GPT

Mitigation: Shadow mode catches this before Carmee sees it
Fallback: Keep custom GPT, iterate on agent, test again
Indicator: Draft quality rating <4.0 or Carmee prefers GPT >70% of time

Risk: CRM API rate limits or access issues

Mitigation: Cache deal data, handle API errors gracefully
Fallback: Manual inquiry forwarding (no CRM pull) for testing
Indicator: CRM API failures >5% of requests

Risk: Carmee resists new tool

Mitigation: Prove quality first in shadow mode, then parallel A/B
Adoption: Carmee chooses which output to use (not forced)
Indicator: Carmee rarely uses agent output even when quality is good

Risk: Klansys feels sidelined

Mitigation: Frame as "building on her foundation," minimal involvement until validated
Communication: Update her once agent is proven, acknowledge her custom GPT work
Indicator: Klansys expresses concerns or disengagement

Risk: Budget overrun

Mitigation: Hard cap at 100% ($150/month), pause agent automatically
Monitoring: Weekly cost review, alert at 80% ($120)
Indicator: Cost trajectory exceeds $150/month projection

Risk: Security incident (CRM data exposure)

Mitigation: READ-only CRM access, audit trail, credential isolation
Monitoring: Log all CRM API calls, weekly security review
Indicator: Unauthorized data access, credential leakage

Open Questions

Shadow mode deployment: Main session (A) or test VPS (B)?
- Leaning: A (Main session) - simplest for shadow testing
Governance watchdog timing: Before (A) or after (B) Carmee validation?
- Leaning: B (after) per Quan's directive
VPS provider: Hetzner (A) or Vultr (B)?
- Leaning: A (Hetzner) - 37% cost savings
Budget cap: $150/month appropriate?
- Leaning: Yes, with 80% alert / 100% pause
Fleet dashboard: Before (A) or after (B) dedicated agent?
- Leaning: B (after) - not blocking for shadow mode
Learning loop: Manual (A), automated (B), or hybrid (C)?
- Leaning: A (manual) initially, graduate to C (hybrid)
Customer-context repo tier: Needed for Carmee or defer?
- Leaning: Defer - Carmee serves one customer (ZTAG), not needed yet

Next Steps

This Week (Feb 17-22)

Infrastructure & Tech group discussion (this thread)
- Decide: Shadow mode location
- Decide: Governance watchdog timing
- Decide: VPS provider
- Decide: Budget cap
- Decide: Fleet dashboard priority
Pull sample inquiries (Quan)
- 5-10 real examples Carmee handled
- Variety of pathways (1-4)
- Include actual CRM context (deal IDs, contact names)
Generate shadow drafts (Minnie)
- Pull CRM data for each inquiry
- Classify pathway
- Generate draft + guidance notes
- Log results in working/ops/carmee-agent-shadow-testing.md
Review quality (Quan)
- Assess pathway accuracy
- Rate draft quality (1-5)
- Estimate time saved
- Decide: proceed to Cliq integration or iterate

Week 2-3 (Feb 25-Mar 2) - If Shadow Mode Validates

Set up Zoho Cliq bot integration
Deploy parallel testing (agent + custom GPT)
Carmee A/B comparison
Measure preference, edit rate, time saved

Week 4+ (Mar 5+) - If Cliq Integration Succeeds

Provision Hetzner VPS
Deploy Carmee agent
30-day pilot begins
Weekly cost/quality review

Month 2+ (Mar 15+) - After Pilot Validates

Implement governance hardening
Snapshot Golden Image
Deploy additional agents (Dev, Ops)
Scale fleet

Document Metadata

Created: February 17, 2026, 8:50 PM UTC
Owner: Infrastructure & Tech group + Quan
Status: Awaiting discussion and decisions
Next Review: After shadow mode completes (est. Feb 22)
Related Docs:
- working/ops/carmee-agent-shadow-testing.md (test logging)
- working/infrastructure/deployment-architecture.md (agent replication)
- analysis/project-minnie-explainer.md (external context)
- analysis/pathways-governance-integration-feb16.md (pathway classification)

Quan: Ready to resume conversation in Infrastructure & Tech group to finalize deployment plan.

Carmee Agent Deployment - First Round Learning Exercise

Context

Current Status ✅

Carmee Agent Mission

Problem

Solution

Architectural Context

Three Scaling Traps Identified (Earlier Feedback)

When to Fix These?

Deployment Plan

Phase 0: Shadow Mode Validation (This Week: Feb 17-22)

Phase 1: Parallel Cliq Integration (Week 2-3: Feb 25-Mar 2)

Phase 2: Dedicated Agent (Week 4+: Mar 5+)

Phase 3: Hardening & Scale (Month 2+: Mar 15+)

Infrastructure Work Required

Immediate (for Shadow Mode - This Week)

Near-term (for Cliq Integration - Week 2-3)

Mid-term (for Dedicated Agent - Week 4)

Long-term (for Hardened Substrate - Month 2+)

Decision Points for Infrastructure & Tech Group

1. Shadow Mode Deployment Location

2. Governance Watchdog Timing

3. VPS Provider for Carmee Agent

4. Budget Cap for Carmee Agent

5. Fleet Dashboard Priority

6. Learning Loop Implementation

Success Metrics (30-Day Pilot)

Risk Mitigation

Risk: Agent output worse than custom GPT

Risk: CRM API rate limits or access issues

Risk: Carmee resists new tool

Risk: Klansys feels sidelined

Risk: Budget overrun

Risk: Security incident (CRM data exposure)

Open Questions

Next Steps

This Week (Feb 17-22)

Week 2-3 (Feb 25-Mar 2) - If Shadow Mode Validates

Week 4+ (Mar 5+) - If Cliq Integration Succeeds

Month 2+ (Mar 15+) - After Pilot Validates

Document Metadata

Share Document Externally