Domain ID: -1003891773186
Session Key: agent:main:telegram:group:-1003891773186
Last Updated: 2026-02-16 23:00 UTC
This domain focuses on:
Goal: Maintain production-grade reliability for ZTAG automation systems.
Instance ID: bc5f56e5-a60e-4f3e-a40b-74eccae58f28
IP: 144.202.121.97
Tailscale: 100.72.11.53 (minnie-core)
Status: ✅ Operational
Backup: Weekly snapshots (Sundays 10 PM PT), keep 4 most recent
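The retention rule above (keep the 4 most recent weekly snapshots) can be sketched as a small pruning function. This is only an illustration of the rule, not the hosting provider's snapshot API; the function name `snapshots_to_delete` is hypothetical.

```python
from datetime import date

def snapshots_to_delete(snapshot_dates, keep=4):
    """Return the snapshot dates that fall outside the `keep` most recent."""
    ordered = sorted(snapshot_dates, reverse=True)  # newest first
    return ordered[keep:]  # everything after the first `keep` entries is prunable
```

Passing six Sunday dates returns the two oldest; passing four or fewer returns an empty list, so nothing is deleted until a fifth snapshot exists.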
✅ Operational (8 systems):
🟡 Partial (2 systems):
❌ Not Implemented (2 systems):
Status: All 3 accounts unauthorized
Impact: Email visibility blocked, triage worker offline
Root cause: OAuth tokens expire after ~1 hour, refresh logic needed
Solution path:
/home/node/.openclaw/credentials/google-*-tokens.json
Priority: HIGH - Day 3 offline, blocking email automation
Tracking file: working/infrastructure/oauth-health.md
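The missing refresh logic described above could look roughly like the sketch below. It uses the standard OAuth 2.0 `refresh_token` grant against Google's token endpoint. The layout of the `google-*-tokens.json` files is not shown in this document, so the function arguments (`client_id`, `client_secret`, `refresh_token`, `expires_at`) are assumptions about what those files contain. It also assumes a refresh token was issued when the accounts were first authorized.

```python
import json
import time
import urllib.parse
import urllib.request

GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"

def needs_refresh(expires_at, now=None, skew_seconds=300):
    """True if the access token expires within `skew_seconds` of `now`."""
    now = time.time() if now is None else now
    return expires_at - now <= skew_seconds

def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the form-encoded POST for the OAuth 2.0 refresh_token grant."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(GOOGLE_TOKEN_URL, data=body, method="POST")

def refresh_access_token(client_id, client_secret, refresh_token, timeout=10):
    """Exchange the refresh token for a new access token.

    Returns (access_token, expires_at) where expires_at is a Unix timestamp.
    """
    req = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["access_token"], time.time() + payload["expires_in"]
```

Calling `needs_refresh` before each API request, with a few minutes of skew, avoids sending a token that expires mid-request. Given the ~1 hour token lifetime, a periodic job that refreshes every account would serve the same purpose.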
Google APIs:
ZTAG Operations:
External Services:
Tracking file: working/infrastructure/api-inventory.md
What happened: 4+ hours of work lost when restarting container
Root cause: Files written to container writable layer (not mounted volume)
Mitigation (ACTIVE):
tools/pre-restart-check.sh before any container operation
/home/node/.openclaw/workspace (mounted)
Protection Protocol: PROTECTION-PROTOCOL.md - mandatory safeguards
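The core of the mitigation above is checking whether a file lives on the mounted volume or only in the container's writable layer. The contents of `tools/pre-restart-check.sh` are not shown here, so the following is only a sketch of that one check. The helper names `is_persisted` and `unprotected` are hypothetical.

```python
from pathlib import Path

# Mounted workspace path as given in this document.
MOUNTED_WORKSPACE = Path("/home/node/.openclaw/workspace")

def is_persisted(path, mount_root=MOUNTED_WORKSPACE):
    """True if `path` resolves to a location inside the mounted volume."""
    resolved = Path(path).resolve()   # collapses ".." and follows symlinks
    root = Path(mount_root).resolve()
    return resolved == root or root in resolved.parents

def unprotected(paths, mount_root=MOUNTED_WORKSPACE):
    """Return the paths that would be lost if the container were recreated."""
    return [p for p in paths if not is_persisted(p, mount_root)]
```

A pre-restart check could abort whenever `unprotected(...)` returns a non-empty list for recently modified files. Resolving paths first means a path that escapes the workspace through `..` is flagged as unprotected.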
Scheduled window: Sunday 9:45 PM PT (after Review agent)
Three mutation layers:
Expected rebuilds over 12 months: ~5 total if disciplined
Tracking file: REBUILD-WINDOW.md
Device: minnie-core
IP: 100.72.11.53
Purpose: Secure access to VPS services from any device
Pattern: Leave on at all times (minimal battery, zero-friction access)
Access: http://100.72.11.53:9876 or http://minnie-core:9876
Purpose: Browse workspace files rendered as HTML
Pattern: Container port exposure (same as Quo webhook on 18791)
Service: markdown-server.service (container-managed, survives reboots)
Features:
Operational:
tools/auto-commit.sh
Tracking:
cron status
cron list
Port: 18791 (exposed from container)
Service: quo-webhook.service (host-managed)
Status: ✅ Operational (fixed Feb 15)
Handler: tools/quo-webhook-handler-v2.py
Credentials: /home/node/.openclaw/credentials/quo-api.json
Pattern: Container port exposure via docker-compose
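For orientation, a webhook receiver on the port listed above can be sketched with the standard library alone. This is not the code in `tools/quo-webhook-handler-v2.py`, which is not shown here, and the Quo payload schema is unknown. `parse_event`, `WebhookHandler`, and `serve` are hypothetical names, and the handler only checks that the body is a JSON object.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 18791  # port named in this document

def parse_event(body):
    """Decode a webhook body; return the dict, or None if it is not a JSON object."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    return event if isinstance(event, dict) else None

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = parse_event(self.rfile.read(length))
        # Acknowledge quickly; any real processing would happen after this.
        self.send_response(200 if event is not None else 400)
        self.end_headers()

def serve(port=PORT):
    """Listen on all interfaces so a docker-compose port mapping can reach it."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` blocks and handles requests until the process is stopped. Binding to `0.0.0.0` rather than `127.0.0.1` matters inside a container, because a published port forwards to the container's network interface, not its loopback.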
Webhook: Active (Zapier → OpenClaw)
Status: ✅ Operational
Processing: Meeting summaries auto-delivered via hooks
Current: Manual firmware deployment
Target: Automated OTA (Over-The-Air) updates
Bottleneck: Scaling issues causing OTA failures (Malachi debugging)
Automation design needed:
Priority: MEDIUM - not blocking current operations, but needed for scale
Tracking file: working/infrastructure/ota-pipeline.md
Need: Automated health checks for 9 APIs
Prevent:
Design:
Priority: HIGH - prevents production issues
Tracking file: working/infrastructure/api-health-monitoring.md
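One possible shape for the health checks described above is a runner that takes one probe per API and classifies each result. The probe callables, the `slow_after` threshold, and the status labels are all assumptions made for illustration. The document's own design list is what should drive the real implementation.

```python
import time

def run_health_checks(probes, slow_after=2.0):
    """Run each probe and classify it as 'ok', 'slow', or 'down'.

    `probes` maps an API name to a zero-argument callable that raises on failure.
    Returns a dict of name -> (status, error_message).
    """
    results = {}
    for name, probe in probes.items():
        started = time.monotonic()
        try:
            probe()
        except Exception as exc:  # any failure marks the API as down
            results[name] = ("down", str(exc))
            continue
        elapsed = time.monotonic() - started
        results[name] = ("slow" if elapsed > slow_after else "ok", "")
    return results
```

Each of the 9 APIs would get its own probe, for example a lightweight authenticated read. Because one failing probe does not stop the others, a single run reports the state of every API.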
Principle: Act as IT security specialist. Think 5 years ahead.
Before ANY technical solution:
Guideline: "I can get it working in X time with Y tech debt. Or build properly in X+Z time with no debt. Here's the refactoring cost: [estimate]. Which do you prefer?"
Reference: SECURITY-TECH-DEBT.md
Current: No Zoho API integration (CRM, Books, Desk)
Needed for Tier 2:
Blocker: OAuth setup required (credentials, scopes, tokens)
Priority: HIGHEST - blocks escape velocity progress
Action: Set up Zoho OAuth, test CRM read access first
Impact: 4+ hours of work lost (webhook server, Gmail Pub/Sub, MEMORY.md)
Root cause: Files in container writable layer (not mounted volume)
Fix: Auto-commit cron + pre-restart checks
Status: Resolved
Impact: Email visibility blocked, triage worker offline
Root cause: Tokens expire after 1 hour, no refresh logic
Fix: In progress (programmatic refresh needed)
Status: Active incident
Impact: Morning briefings degraded (no weather data)
Root cause: wttr.in timeout
Fix: Alternative weather API needed
Status: Active incident
Tracking file: working/infrastructure/incident-log.md
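For the wttr.in timeout incident above, one way to keep morning briefings from degrading is a provider chain with a short timeout on each call. The sketch below assumes the alternative weather API is wrapped in its own fetch function. No specific replacement service is implied, and `first_successful` is a hypothetical helper.

```python
import urllib.request

def fetch_wttr(timeout=5):
    """Fetch a one-line forecast from wttr.in, failing fast on timeout."""
    with urllib.request.urlopen("https://wttr.in/?format=3", timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()

def first_successful(providers):
    """Call providers in order; return (name, result) from the first that succeeds.

    `providers` is a list of (name, zero-argument callable) pairs.
    """
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # record the failure and try the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Usage would be `first_successful([("wttr", fetch_wttr), ("fallback", fetch_fallback)])`, where `fetch_fallback` is whichever alternative is chosen. The returned name shows which provider answered, which is useful for the incident log.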
Immediate escalation:
Weekly summary:
working/infrastructure/ - Your domain tracking files
working/infrastructure/system-status.md - Live status dashboard
working/infrastructure/api-inventory.md - Complete API map
working/infrastructure/oauth-health.md - Token monitoring
working/infrastructure/incident-log.md - Failure tracking
PROTECTION-PROTOCOL.md - Data loss prevention
REBUILD-WINDOW.md - Rebuild discipline framework
SECURITY-TECH-DEBT.md - Security-first decision protocol
DOMAIN-CONTEXT.md - Shared context (read on startup)
Uptime:
Security:
Automation health:
You surface to:
You receive from:
You are the reliability lens. Prevent outages. Maintain automation. Protect production.