Carmee Agent Deployment - First Round Learning Exercise
Status: Ready for Infrastructure & Tech discussion
Owner: Quan + Minnie + Infrastructure & Tech group
Date: February 17, 2026
Context
Goal: Deploy the Carmee assistant agent as our first operational deployment and use it as a learning exercise to validate the architecture and gather feedback for hardening.
Quan's Directive:
"Let's use the Carmee deployment as the first round and gather feedback to see how we can harden the deployment later."
Strategic Rationale:
- Perfect architecture without deployment = theoretical value
- Imperfect deployment with learning loop = real feedback → better hardening
- Carmee's workflow is bounded (sales drafts, CRM context) - lower risk than multi-domain agents
- Unknown unknowns only emerge from production usage
Current Status ✅
All Operational Tools Working:
- ✅ Gmail OAuth (3 accounts, auto-refresh hourly)
- ✅ Google Calendar (multi-calendar integration)
- ✅ Google Drive (meeting notes search)
- ✅ Zoho CRM (just authorized Feb 17, deals/contacts/notes READ access, validated with live data)
- ✅ Weather API (wttr.in)
- ✅ Package tracking (UPS API)
- ✅ Auto-commit (hourly workspace backup)
- ✅ Tailscale mesh (VPS accessible at 100.72.11.53)
All auto-refresh mechanisms run at the host level (they survive container rebuilds).
Carmee Agent Mission
Problem
- Carmee manually drafts sales replies in Zoho Cliq
- Uses custom GPT (Klansys-built) with manual CRM context copy/paste
- Rigid programmatic routing (brittle on edge cases)
- Manual workflow: Cliq → ChatGPT → Zoho CRM → back to Cliq
- Time per inquiry: 15-20 minutes (with context switching)
Solution
Agentic assistant that:
- Pulls live Zoho CRM data (deals, contacts, notes)
- Classifies inquiry pathway (1-7) with confidence scores
- Generates context-enriched draft replies
- Provides guidance notes (decision cycle, emphasis points, stakeholder type)
- Suggests next steps (follow-up timing, escalation triggers)
Target: 10 min/inquiry (5-10 min saved vs. current workflow)
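A minimal sketch of this per-inquiry pipeline, in Python. Everything here is hypothetical scaffolding - `pull_crm_context`, `classify_pathway`, and `generate_draft` are stand-ins for the real Zoho CRM and LLM calls, not names from the actual codebase:

```python
from dataclasses import dataclass

@dataclass
class AgentReply:
    pathway: int        # 1-7 inquiry classification
    confidence: float   # classifier confidence, 0.0-1.0
    draft: str          # context-enriched reply draft
    guidance: list      # decision cycle, emphasis points, stakeholder type
    next_steps: list    # follow-up timing, escalation triggers

# --- Placeholder integrations (hypothetical; real versions wrap Zoho CRM and the LLM) ---
def pull_crm_context(contact_email):
    return {"deal_stage": "unknown", "notes": []}   # stand-in for a live Zoho CRM pull

def classify_pathway(inquiry_text, crm):
    return 1, 0.5                                   # stand-in for an LLM classification call

def generate_draft(inquiry_text, crm, pathway):
    return "draft reply...", ["guidance note"], ["follow up in 3 days"]

def handle_inquiry(inquiry_text: str, contact_email: str) -> AgentReply:
    crm = pull_crm_context(contact_email)                      # deals, contacts, notes
    pathway, confidence = classify_pathway(inquiry_text, crm)  # pathways 1-7 + confidence
    draft, guidance, next_steps = generate_draft(inquiry_text, crm, pathway)
    return AgentReply(pathway, confidence, draft, guidance, next_steps)
```

The dataclass simply mirrors the outputs listed above (pathway + confidence, draft, guidance notes, next steps), so shadow-mode logging can serialize one object per inquiry.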
Architectural Context
Three Scaling Traps Identified (Earlier Feedback)
1. ❌ Segment Routing Before Authority Gate
- Problem: if an inquiry mentions "grant" OR comes from "*.k12.ca.us", route it to the grant-specialized K-12 agent
- Issue: a site-level inquiry (no purchasing authority) gets routed to an agent that uses Pilot language/rollout framing before authority is verified
- Fix needed: Gate → Allowed Pathways → Segment Specialization (authority bounds specialization; see the first sketch below)
2. ❌ Manual SSH Fleet Updates
- Problem: git push → SSH fan-out → restart agents manually
- Issue: at N=30, you become the "restart daemon" orchestrating deployments
- Fix needed: autonomous governance watchdog (systemd timer, self-pull governance, atomic symlink switch, auto-restart; see the second sketch below)
3. ❌ Documented But Not Enforced
- Problem: "watchdog pattern documented" but not baked into the Golden Image
- Issue: one forgotten timer = a silent governance fork at scale
- Fix needed: bake governance-sync.timer into the substrate (impossible to spawn an agent without it)
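First sketch - the gate-bounded routing order from trap 1. Function names and the authority/pathway mapping are illustrative, not from the real router; the point is only that authority verification runs before any segment-specialized framing is selected:

```python
# Hypothetical authority gate: runs BEFORE segment specialization,
# so Pilot/rollout framing is never selected for unverified inquiries.

SEGMENT_AGENTS = {
    "k12_grant": "grant-specialized K-12 agent",  # uses Pilot/rollout framing
    "general": "general sales agent",             # authority-neutral language
}

def verify_authority(inquiry):
    # Stand-in check; a real gate would consult CRM role/title data.
    return inquiry.get("role") in {"director", "superintendent", "procurement"}

def allowed_pathways(has_authority):
    # Stand-in mapping: authority bounds which pathways are even eligible.
    return range(1, 8) if has_authority else range(1, 5)

def route(inquiry):
    has_authority = verify_authority(inquiry)      # 1. Gate
    pathways = allowed_pathways(has_authority)     # 2. Allowed pathways
    # 3. Segment specialization, only inside the authority bound:
    if has_authority and ("grant" in inquiry["text"]
                          or inquiry["domain"].endswith(".k12.ca.us")):
        return SEGMENT_AGENTS["k12_grant"], pathways
    return SEGMENT_AGENTS["general"], pathways

# A site-level inquiry (no purchasing authority) never reaches the K-12 specialist:
print(route({"text": "grant funding?", "domain": "school.k12.ca.us", "role": "teacher"}))
```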
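Second sketch - the watchdog body from traps 2 and 3, i.e. what `governance-sync.service` could invoke on each timer tick. Paths and repo layout are illustrative; `pathway-api` is the consumer named later in Phase 3:

```python
#!/usr/bin/env python3
"""Governance self-sync, meant to be run by governance-sync.timer every 60-90 s.
Paths, repo layout, and directory names are illustrative, not the real substrate's."""
import os
import subprocess

REPO = "/opt/governance/repo"          # local clone of the governance repo
RELEASES = "/opt/governance/releases"  # one immutable checkout per commit
ACTIVE = "/opt/governance/active"      # symlink the agents actually read
SERVICE = "pathway-api"                # consumer restarted after an update

def head(repo):
    return subprocess.run(["git", "-C", repo, "rev-parse", "HEAD"],
                          capture_output=True, text=True, check=True).stdout.strip()

def sync():
    before = head(REPO)
    subprocess.run(["git", "-C", REPO, "pull", "--ff-only"], check=True)  # self-pull
    after = head(REPO)
    if after == before:
        return  # no governance change; nothing to do

    # Materialize the new commit as its own checkout...
    release = os.path.join(RELEASES, after)
    if not os.path.exists(release):
        subprocess.run(["git", "-C", REPO, "worktree", "add", "--detach",
                        release, after], check=True)

    # ...then switch atomically: build a temp symlink and rename() it into place,
    # so readers always see either the old tree or the new one, never neither.
    tmp = ACTIVE + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release, tmp)
    os.replace(tmp, ACTIVE)

    subprocess.run(["systemctl", "restart", SERVICE], check=True)  # auto-restart

if __name__ == "__main__":
    sync()
```

Baking `systemctl enable governance-sync.timer` into the Golden Image (trap 3) is then a one-line image-build step rather than a per-agent SSH task.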
When to Fix These?
Two approaches:
Option A: Fix Before Deployment (Safer)
- Implement governance watchdog now
- Bake into Golden Image
- Deploy Carmee from hardened foundation
- Pros: No technical debt, scales cleanly
- Cons: Delays deployment, theoretical fixes (not validated against real usage)
Option B: Deploy First, Harden Later (Faster, More Learning)
- Deploy Carmee with current tooling
- Observe friction in production
- Fix scaling traps based on actual pain points
- Pros: Real feedback informs better fixes, faster to value
- Cons: Some rework later, but bounded (Carmee is single agent)
Quan's Choice: Option B (deploy first, gather feedback)
Deployment Plan
Phase 0: Shadow Mode Validation (This Week: Feb 17-22)
Goal: Prove agentic engine beats custom GPT before Carmee integration
Location: Main session (not separate agent yet)
Steps:
- Pull 5-10 sample inquiries Carmee handled
- Generate shadow drafts:
  - Pull CRM context (deal stage, contact history, notes)
  - Pull Drive context (meeting notes, proposals)
  - Pull Gmail context (email thread history)
  - Classify pathway (1-7)
  - Generate draft reply
  - Provide guidance notes
  - Suggest next steps
- Compare vs. custom GPT output
- Quan reviews quality:
  - Pathway accuracy (correct classification?)
  - Draft quality (1-5 rating)
  - Context relevance (pulled the right data?)
  - Time saved estimate (vs. current workflow)
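As a logging sketch (reusing the hypothetical `handle_inquiry`/`AgentReply` scaffolding from the Solution section above), each shadow test could append one entry to the results file, leaving the review fields for Quan to fill in by hand:

```python
from datetime import date

def log_shadow_test(inquiry_id, agent_reply, gpt_output,
                    log_path="working/ops/carmee-agent-shadow-testing.md"):
    # One markdown entry per shadow test; review fields stay TBD until Quan scores them.
    entry = f"""
## Shadow test: {inquiry_id} ({date.today()})
- Pathway (agent): {agent_reply.pathway} (confidence {agent_reply.confidence:.2f})
- Agent draft: {agent_reply.draft}
- Custom GPT draft: {gpt_output}
- Pathway accuracy (Quan): TBD
- Draft quality 1-5 (Quan): TBD
- Time saved estimate: TBD
"""
    with open(log_path, "a") as f:
        f.write(entry)
```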
Success Criteria:
- Validation metric: 10 shadow tests across pathways 1-4 before going live
- Logging: all test results in working/ops/carmee-agent-shadow-testing.md
Phase 1: Parallel Cliq Integration (Week 2-3: Feb 25-Mar 2)
If shadow mode validates (90%+ accuracy, 4.0+ quality):
Steps:
- Set up Zoho Cliq bot integration (OAuth, webhook receiver)
- Deploy in parallel with the custom GPT:
  - Carmee sees both outputs (A/B comparison)
  - Carmee chooses which to use (not forced)
  - Track preference, edit rate, time saved
- Blind testing (if possible): Carmee doesn't know which output is agent vs. GPT
- Measure:
  - Draft quality (Carmee's subjective rating)
  - Time saved (per inquiry)
  - Edit rate (how much does Carmee modify before sending?)
  - Preference (does Carmee prefer the agent output?)
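A minimal sketch of the webhook side, standard library only - the real Zoho Cliq bot API (payload shape, reply format, OAuth) would replace the placeholders, and the neutral A/B labels keep the blind-testing option open:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CliqWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.loads(body)            # placeholder: real Cliq payloads differ
        inquiry = payload.get("text", "")

        agent_draft = f"[agent] draft for: {inquiry}"  # stand-in for handle_inquiry(...)
        gpt_draft = f"[gpt] draft for: {inquiry}"      # stand-in for the custom GPT output

        # Post BOTH drafts back so Carmee can A/B compare and choose (not forced).
        reply = {"drafts": {"A": agent_draft, "B": gpt_draft}}  # blind labels A/B
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CliqWebhook).serve_forever()
```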
Location: Still Main session (not separate VPS yet)
Success Criteria:
- Carmee prefers the agent output in the A/B comparison on a majority of inquiries
- Draft quality rating holds at ≥4.0
- Measurable time saved per inquiry and a manageable edit rate
Phase 2: Dedicated Agent (Week 4+: Mar 5+)
If Cliq integration succeeds:
Provision dedicated VPS:
- Provider: Hetzner (€3.79/mo) vs. Vultr ($6/mo)
- Recommendation: Hetzner (37% savings, EU-based)
- Size: Small (1 vCPU, 1GB RAM)
- Region: EU (Frankfurt) or US (depends on CRM latency)
Agent configuration:
- Name: minnie-carmee
- Tailscale hostname: minnie-carmee (100.72.11.x)
- Workspace: /opt/agents/carmee/
- Credentials: Zoho CRM (READ only), Zoho Cliq bot
- Budget cap: $150/month
  - 80% ($120) → Alert
  - 100% ($150) → Pause (hard stop)
Capabilities:
- Zoho CRM API (deals, contacts, notes)
- Zoho Cliq bot (receive inquiries, post drafts)
- Google Drive (meeting notes, optional)
- Gmail (email context, optional)
Learning loop:
- Track Carmee's edits before sending
- Adapt style over time (less generic, more Carmee-like)
- Log patterns (common modifications, preferred phrasing)
Expansion:
- Start with pathways 1-4 (grant, nonprofit, city, camps)
- Expand to pathways 5-7 (operator, pilot, professional) after validation
Phase 3: Hardening & Scale (Month 2+: Mar 15+)
After Carmee agent proven (30-day pilot successful):
Implement governance hardening:
Governance watchdog (systemd timer):
- governance-sync.service - self-pulls governance from Git
- governance-sync.timer - runs every 60-90 sec
- Atomic symlink switch (no downtime)
- Auto-restart pathway-api after a governance update
Golden Image with baked watchdog:
- systemctl enable governance-sync.timer during image build
- Impossible to spawn an agent without the watchdog
- All future agents inherit autonomous governance sync
Test autonomous governance sync:
- git push a governance change
- Verify the fleet converges within 90 sec without manual intervention
- No SSH fan-out, no orchestration script
Fix segment routing (gate-bounded specialization):
- Authority verification BEFORE pathway specialization
- No Pilot language leaks to unauthorized inquiries
Customer-context repo tier (if needed for multi-customer agents; see the config-merge sketch at the end of this phase):
- Per-customer private repos
- Boot order: substrate → ops-context (global) → customer-context (override)
- Prevents cross-customer governance leakage
Deploy additional agents:
- Dev Agent (GitHub monitoring, Malachi support)
- Ops Agent (system monitoring, alerts)
- Sales Agent (CRM automation, pipeline tracking)
All from hardened foundation (validated substrate, proven patterns).
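For the customer-context repo tier above, the boot order can reduce to a layered config merge where later layers override earlier ones. A sketch with illustrative paths (each layer would be a checkout of its own repo):

```python
import json
from pathlib import Path

# Boot order: substrate → ops-context (global) → customer-context (override).
# Paths are illustrative, not the real substrate layout.
LAYERS = [
    Path("/opt/agents/substrate/config.json"),
    Path("/opt/agents/ops-context/config.json"),
    Path("/opt/agents/customer-context/config.json"),  # absent on single-customer agents
]

def load_config():
    merged = {}
    for layer in LAYERS:
        if layer.exists():                                    # customer layer is optional
            merged.update(json.loads(layer.read_text()))      # later layer wins on conflicts
    return merged
```

Because each agent mounts only its own customer-context checkout, cross-customer governance can't leak between agents.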
Infrastructure Work Required
Immediate (for Shadow Mode - This Week):
- Nothing new: the Main session already has the CRM, Drive, Gmail, and Calendar access shadow testing needs
Near-term (for Cliq Integration - Week 2-3):
- Zoho Cliq bot integration (OAuth, webhook receiver)
Mid-term (for Dedicated Agent - Week 4):
- Hetzner VPS provisioning, Tailscale enrollment, credential isolation, budget cap enforcement
Long-term (for Hardened Substrate - Month 2+):
- Governance watchdog, Golden Image with baked governance-sync.timer, fleet dashboard
Decision Points for Infrastructure & Tech Group
1. Shadow Mode Deployment Location
Options:
- A. Main session (simplest, no new infra)
- B. Test VPS (isolate Carmee work from Main)
Recommendation: A (Main session) - faster, no provisioning overhead for shadow testing
2. Governance Watchdog Timing
Options:
- A. Before Carmee dedicated agent (safer, scales cleanly)
- B. After Carmee validates (faster, informed by real usage)
Recommendation: B (after validation) per Quan's directive - deploy first, harden later
3. VPS Provider for Carmee Agent
Options:
- A. Hetzner (€3.79/mo, EU-based, 37% cheaper)
- B. Vultr ($6/mo, current provider, known quantity)
Recommendation: A (Hetzner) - significant cost savings, Docker/Tailscale work on both
4. Budget Cap for Carmee Agent
Proposed: $150/month
- Lower than the Dev Agent cap ($200) due to narrower scope
- Based on estimated API usage:
  - ~300 inquiries/month (Carmee's current volume)
  - ~1500 tokens/inquiry (CRM pull + draft generation)
  - Claude Sonnet 4.5: ~$0.30/inquiry
  - Total: ~$90/month API + $50 buffer
- 80% alert at $120
- 100% pause at $150
Question: Is $150 appropriate, or should it be higher/lower?
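Sanity check on the math: ~300 inquiries × ~$0.30 ≈ $90/month of API spend, plus the $50 buffer, lands around $140 - under the proposed cap. A minimal sketch of the enforcement, with stand-ins for spend tracking, alerting, and pausing:

```python
BUDGET_CAP = 150.00    # $/month hard stop (100% → pause)
ALERT_FRACTION = 0.80  # 80% of cap → alert ($120)

def month_to_date_spend():
    return 0.0  # stand-in: the real version would sum API usage + VPS cost

def send_alert(message):
    print(f"ALERT: {message}")       # stand-in: real version posts to the ops channel

def pause_agent():
    print("PAUSED: budget cap hit")  # stand-in: real version stops the agent service

def check_budget(spend=None):
    spend = month_to_date_spend() if spend is None else spend
    if spend >= BUDGET_CAP:
        pause_agent()                                   # hard stop at $150
    elif spend >= ALERT_FRACTION * BUDGET_CAP:
        send_alert(f"Carmee agent at ${spend:.2f} of ${BUDGET_CAP:.2f}")  # $120 alert

check_budget(125.0)  # → alert
check_budget(150.0)  # → pause
```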
5. Fleet Dashboard Priority
Options:
- A. Build before first dedicated agent (better visibility from start)
- B. Build after Carmee validates (faster to deployment)
Recommendation: B (after validation) - Main session has basic tracking, dashboard can wait
6. Learning Loop Implementation
How should Carmee agent learn from edits?
- Option A: Manual review (weekly, Quan reviews edit logs)
- Option B: Automated pattern detection (LLM analyzes edits, suggests style adjustments)
- Option C: Hybrid (automated pattern detection + weekly human review)
Recommendation: Start with A (manual), graduate to C (hybrid) after patterns emerge
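Whichever option wins, the edit rate itself is cheap to compute mechanically even under option A - for example with `difflib` from the Python standard library:

```python
import difflib

def edit_rate(agent_draft: str, sent_version: str) -> float:
    """Fraction of the draft Carmee changed before sending (0.0 = sent as-is)."""
    similarity = difflib.SequenceMatcher(None, agent_draft, sent_version).ratio()
    return 1.0 - similarity

# Weekly review: average this over the week's inquiries and check that it
# decreases week-over-week (the learning-loop success metric below).
print(edit_rate("Thanks for your interest in a pilot.",
                "Thanks for your interest in a district pilot."))
```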
Success Metrics (30-Day Pilot)
Carmee agent must demonstrate:
| Metric | Target | Measurement |
|--------|--------|-------------|
| Pathway classification accuracy | ≥90% | Quan's assessment during shadow mode |
| Draft quality rating | ≥4.0 avg | Carmee's subjective rating (1-5 scale) |
| Time saved per inquiry | ≥10 min | Before: 15-20 min; after: 5-10 min |
| Cost | <$150/month | API calls + VPS cost tracked |
| Security | Zero incidents | CRM data exposure, unauthorized API calls |
| Learning loop | Adapting over time | Edit rate decreases week-over-week |
| Uptime | >99% | Agent responds within 10 sec of inquiry |
If all pass:
- Proceed to Dev Agent, Ops Agent (validated architecture)
- Implement governance hardening
- Scale to multi-agent fleet
If any fail:
- Document learnings
- Iterate fixes
- Test again (fail fast, learn faster)
Risk Mitigation
Risk: Agent output worse than custom GPT
- Mitigation: Shadow mode catches this before Carmee sees it
- Fallback: Keep custom GPT, iterate on agent, test again
- Indicator: Draft quality rating <4.0 or Carmee prefers GPT >70% of time
Risk: CRM API rate limits or access issues
- Mitigation: Cache deal data, handle API errors gracefully
- Fallback: Manual inquiry forwarding (no CRM pull) for testing
- Indicator: CRM API failures >5% of requests
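A sketch of the caching mitigation: a small in-process TTL cache with a stale-on-error fallback, so a rate-limited or briefly failing CRM API degrades drafts gracefully instead of blocking them. `fetch_deal_from_crm` is a stand-in for the real Zoho call:

```python
import time

_cache = {}        # deal_id -> (fetched_at, data)
TTL_SECONDS = 600  # serve cached deal data for up to 10 minutes

def fetch_deal_from_crm(deal_id):
    raise RuntimeError("stand-in for the real Zoho CRM call")

def get_deal(deal_id):
    now = time.time()
    hit = _cache.get(deal_id)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]                        # fresh cache hit: no API call
    try:
        data = fetch_deal_from_crm(deal_id)  # hypothetical Zoho CRM wrapper
        _cache[deal_id] = (now, data)
        return data
    except Exception:
        if hit:
            return hit[1]                    # API failed: serve stale data rather than nothing
        raise                                # no cached copy either: surface the error
```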
Risk: Carmee resists new tool
- Mitigation: Prove quality first in shadow mode, then parallel A/B
- Adoption: Carmee chooses which output to use (not forced)
- Indicator: Carmee rarely uses agent output even when quality is good
Risk: Klansys feels sidelined
- Mitigation: Frame as "building on her foundation," minimal involvement until validated
- Communication: Update her once agent is proven, acknowledge her custom GPT work
- Indicator: Klansys expresses concerns or disengagement
Risk: Budget overrun
- Mitigation: Hard cap at 100% ($150/month), pause agent automatically
- Monitoring: Weekly cost review, alert at 80% ($120)
- Indicator: Cost trajectory exceeds $150/month projection
Risk: Security incident (CRM data exposure)
- Mitigation: READ-only CRM access, audit trail, credential isolation
- Monitoring: Log all CRM API calls, weekly security review
- Indicator: Unauthorized data access, credential leakage
Open Questions
Shadow mode deployment: Main session (A) or test VPS (B)?
- Leaning: A (Main session) - simplest for shadow testing
Governance watchdog timing: Before (A) or after (B) Carmee validation?
- Leaning: B (after) per Quan's directive
VPS provider: Hetzner (A) or Vultr (B)?
- Leaning: A (Hetzner) - 37% cost savings
Budget cap: $150/month appropriate?
- Leaning: Yes, with 80% alert / 100% pause
Fleet dashboard: Before (A) or after (B) dedicated agent?
- Leaning: B (after) - not blocking for shadow mode
Learning loop: Manual (A), automated (B), or hybrid (C)?
- Leaning: A (manual) initially, graduate to C (hybrid)
Customer-context repo tier: Needed for Carmee or defer?
- Leaning: Defer - Carmee serves one customer (ZTAG), not needed yet
Next Steps
This Week (Feb 17-22)
Infrastructure & Tech group discussion (this thread)
- Decide: Shadow mode location
- Decide: Governance watchdog timing
- Decide: VPS provider
- Decide: Budget cap
- Decide: Fleet dashboard priority
Pull sample inquiries (Quan)
- 5-10 real examples Carmee handled
- Variety of pathways (1-4)
- Include actual CRM context (deal IDs, contact names)
Generate shadow drafts (Minnie)
- Pull CRM data for each inquiry
- Classify pathway
- Generate draft + guidance notes
- Log results in working/ops/carmee-agent-shadow-testing.md
Review quality (Quan)
- Assess pathway accuracy
- Rate draft quality (1-5)
- Estimate time saved
- Decide: proceed to Cliq integration or iterate
Week 2-3 (Feb 25-Mar 2) - If Shadow Mode Validates
- Set up Zoho Cliq bot integration
- Deploy parallel testing (agent + custom GPT)
- Carmee A/B comparison
- Measure preference, edit rate, time saved
Week 4+ (Mar 5+) - If Cliq Integration Succeeds
- Provision Hetzner VPS
- Deploy Carmee agent
- 30-day pilot begins
- Weekly cost/quality review
Month 2+ (Mar 15+) - After Pilot Validates
- Implement governance hardening
- Snapshot Golden Image
- Deploy additional agents (Dev, Ops)
- Scale fleet
Document Metadata
- Created: February 17, 2026, 8:50 PM UTC
- Owner: Infrastructure & Tech group + Quan
- Status: Awaiting discussion and decisions
- Next Review: After shadow mode completes (est. Feb 22)
- Related Docs:
working/ops/carmee-agent-shadow-testing.md (test logging)
working/infrastructure/deployment-architecture.md (agent replication)
analysis/project-minnie-explainer.md (external context)
analysis/pathways-governance-integration-feb16.md (pathway classification)
Quan: Ready to resume conversation in Infrastructure & Tech group to finalize deployment plan.