
Infrastructure Incident Log

Purpose: Track infrastructure failures, root causes, and mitigations to prevent recurrence.

Template

## YYYY-MM-DD: [Incident Title]
- **Severity:** Critical | High | Medium | Low
- **Duration:** [start time] - [end time] ([total duration])
- **Impact:** [what broke, who/what was affected]
- **Root Cause:** [technical explanation]
- **Detection:** [how we found out]
- **Resolution:** [what fixed it]
- **Prevention:** [changes to prevent recurrence]
- **Related:** [links to PRs, commits, docs]

2026-02-11: Container Restart Data Loss

Lesson: A container's writable layer is ephemeral and is discarded when the container is recreated. All critical data must be in mounted volumes OR committed to git. No exceptions.
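
One way to audit this rule is to check which critical paths still sit on the container's root filesystem rather than on a separate mount. The sketch below is a heuristic, not a guarantee: the function names are hypothetical, it assumes the writable layer is the filesystem mounted at `/`, and `os.path.ismount` can miss bind mounts that share a device with their parent directory.

```python
import os


def mount_point_of(path):
    """Return the nearest ancestor of `path` (or `path` itself) that is a mount point."""
    current = os.path.realpath(path)
    # Walk upward until a mount point is reached; "/" always qualifies on POSIX.
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


def unprotected_paths(paths, root="/"):
    """Return the paths whose mount point is `root`, i.e. not on a separate volume.

    Assumption: inside the container, anything mounted at `root` lives in the
    image or writable layer and would not survive the container being recreated.
    """
    return [p for p in paths if mount_point_of(p) == root]
```

Run inside the container, `unprotected_paths(["/data/db", "/app/state.json"])` (example paths only) would list whichever of those are not backed by a volume.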


2026-02-13: Mid-Week Rebuild Temptation

Lesson: Infrastructure changes fragment founder attention. Batch mutations weekly. Protect deep-work blocks above all else.


2026-02-15: Quo Webhook Server 502 Error

Lesson: Minimize dependencies for critical services. Stdlib > frameworks for simple use cases.
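
As an illustration of the stdlib-only approach, here is a minimal sketch of a webhook receiver built on Python's `http.server`. It is not the actual Quo webhook server: the handler name, the port, and the response body are assumptions, and real payload handling (signature checks, queueing) is left out.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class WebhookHandler(BaseHTTPRequestHandler):
    """Accept JSON POST bodies and acknowledge them; no third-party dependencies."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            self._reply(400, {"error": "invalid JSON"})
            return
        # Hypothetical: hand `payload` to the real processing step here.
        self._reply(200, {"received": True})

    def _reply(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # suppress per-request logging to stderr


def make_server(host="127.0.0.1", port=8080):
    """Build the server; the default port is an assumption, and 0 picks a free one."""
    return HTTPServer((host, port), WebhookHandler)

# To run: make_server().serve_forever()
```

`HTTPServer` handles one request at a time, which is usually adequate for low-volume webhooks; `ThreadingHTTPServer` from the same module is a drop-in alternative if requests can overlap.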


2026-02-13 to 2026-02-17: Gmail OAuth Unauthorized (Day 3+ Offline)

Lesson: OAuth access tokens are short-lived (1 hour). For production reliability, automated refresh is mandatory, not optional. Cron-based refresh is simpler than a daemon for low-volume APIs.
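
A cron-driven refresh can be sketched with the standard library alone. The version below assumes Google's OAuth 2.0 token endpoint and the standard `refresh_token` grant. The function names, the token file location, and the cron schedule are hypothetical, and the credentials would come from wherever the deployment already stores them.

```python
import json
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the form-encoded POST that exchanges a refresh token for a new access token."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=form, method="POST")


def refresh_access_token(client_id, client_secret, refresh_token, token_path):
    """Fetch a fresh access token and write the JSON response to `token_path`."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        token = json.load(response)  # includes "access_token" and "expires_in"
    with open(token_path, "w") as fh:
        json.dump(token, fh)
    return token

# Hypothetical crontab entry, refreshing well inside the 1-hour lifetime:
#   */30 * * * * python3 /path/to/refresh_token.py
```

A non-2xx response raises `urllib.error.HTTPError`, so the cron job exits non-zero and the failure can be alerted on rather than silently leaving a stale token in place.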


Incident Metrics

Total Incidents: 4

Mean Time to Detect (MTTD)

Mean Time to Resolve (MTTR)
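
Once entries record start, detection, and resolution times, both means reduce to simple arithmetic. The sketch below assumes MTTD is measured from incident start to detection and MTTR from incident start to resolution, matching the template's Duration field. The helper name and the timestamps are placeholders, not data from the incidents above.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(intervals):
    """Mean length, in minutes, of (start, end) datetime pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in intervals)


# Placeholder records: (started, detected, resolved). Not real incident data.
records = [
    (datetime(2026, 1, 1, 9, 0), datetime(2026, 1, 1, 9, 30), datetime(2026, 1, 1, 11, 0)),
    (datetime(2026, 1, 2, 14, 0), datetime(2026, 1, 2, 14, 10), datetime(2026, 1, 2, 14, 40)),
]

mttd = mean_minutes((started, detected) for started, detected, _ in records)
mttr = mean_minutes((started, resolved) for started, _, resolved in records)
print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")  # MTTD: 20 min, MTTR: 80 min
```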

Prevention Effectiveness


Next Review: Weekly (Sunday rebuild window)
Monitoring: Alert the Infrastructure & Tech group immediately on critical incidents