Objective: Auto-generate FreeRTOS task scaffolding; Malachi approves structure
Timeline: Week 1-2
Owner: [Pick one senior dev + one tool specialist]
You are generating FreeRTOS task scaffolding for ZTAG Code 5.
Task Name: [e.g., "ButtonHandler"]
Priority: [e.g., "5 (medium)"]
Dependencies: [e.g., "Reads GPIO pin 12, posts event to queue"]
Timeout Behavior: [e.g., "Watchdog: 1 second timeout"]
Generate:
1. Task function (FreeRTOS conventions)
2. Unit test skeleton (test framework: [Unity/Catch2])
3. Architecture comment explaining assumptions
4. TODO list for implementation
Constraints:
- Must match ZTAG Code 5 style guide
- Comments explain WHY, not WHAT
- No functionality—only scaffold
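For reference, a scaffold generated from the example inputs above might look like the following sketch (task function only; the unit test skeleton and TODO list would accompany it). The names and values here (`ButtonHandler`, `xButtonEventQueue`, the stack size, the ESP-IDF-style include paths) are hypothetical placeholders, not existing Code 5 symbols:

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/queue.h"

/* WHY: button handling is isolated in its own task so GPIO polling and
 * debouncing never block the game loop; events cross a queue instead. */
#define BUTTON_GPIO_PIN         12
#define BUTTON_TASK_PRIORITY    5
#define BUTTON_TASK_STACK       2048   /* placeholder; size after profiling */
#define BUTTON_POLL_PERIOD_MS   10

/* Assumption: the event-routing module owns and creates this queue. */
extern QueueHandle_t xButtonEventQueue;

static void vButtonHandlerTask(void *pvParameters)
{
    (void)pvParameters;

    for (;;)
    {
        /* TODO: read GPIO pin 12 (debounced) */
        /* TODO: on press, post an event to xButtonEventQueue (non-blocking) */
        /* TODO: service the watchdog; work must finish within the 1 s timeout */
        vTaskDelay(pdMS_TO_TICKS(BUTTON_POLL_PERIOD_MS));
    }
}

void vButtonHandlerInit(void)
{
    /* WHY: created at init (not lazily) so the watchdog can monitor the task
     * from boot onward. */
    xTaskCreate(vButtonHandlerTask, "ButtonHandler", BUTTON_TASK_STACK,
                NULL, BUTTON_TASK_PRIORITY, NULL);
}
```

The comments explain why the structure exists and the TODOs mark where the junior dev adds logic, in line with the constraints above.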
- Present the generated scaffold to Malachi for structural approval
- Junior dev fills in the logic
- Code review as usual
Objective: AI flags untested code paths; Malachi decides which tests matter
Timeline: Week 2-3
Owner: [QA lead + tool specialist]
Run code coverage on the existing Code 5 module:

    # Using gcov via gcovr (or a similar tool) -- adjust to the Code 5 build setup
    make test-coverage
    gcovr -r . > coverage.txt
Create prompt for Claude Code:
We have a C/FreeRTOS codebase (ZTAG Code 5) with partial test coverage.
Analyze this coverage report and identify:
1. Code paths with <50% coverage (priority high)
2. Edge cases that aren't tested (e.g., timeouts, errors)
3. Risky untested scenarios (e.g., concurrent access, boundary conditions)
4. Suggested test cases to close gaps
For each gap, explain why it matters and what test would cover it.
[Paste coverage report]
- Generate report
- Malachi reviews & decides
- Junior dev implements approved tests
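As a concrete illustration, an approved gap (say, an untested timeout path) might turn into a Unity test like the sketch below; the module `mqtt_client.h` and its types are hypothetical placeholders, not actual Code 5 code:

```c
#include "unity.h"
#include "mqtt_client.h"   /* hypothetical Code 5 module under test */

void setUp(void)    { /* reset stubs/fakes before each test */ }
void tearDown(void) { /* nothing to clean up in this sketch */ }

/* Coverage gap flagged by the analysis: the connect() timeout branch
 * is never exercised by existing tests. */
void test_connect_returns_timeout_when_broker_unreachable(void)
{
    /* Arrange: unreachable broker and a short timeout (placeholder values). */
    mqtt_config_t cfg = { .broker_uri = "mqtt://10.0.0.254", .timeout_ms = 100 };

    /* Act */
    mqtt_err_t err = mqtt_client_connect(&cfg);

    /* Assert: the timeout path must return an error, not hang or crash. */
    TEST_ASSERT_EQUAL(MQTT_ERR_TIMEOUT, err);
}

int main(void)
{
    UNITY_BEGIN();
    RUN_TEST(test_connect_returns_timeout_when_broker_unreachable);
    return UNITY_END();
}
```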
Objective: AI flags where code and docs diverge; Malachi decides if docs or code is wrong
Timeline: Week 3-4
Owner: [Documentation lead + tool specialist]
Gather comparison data:
- Recent PRs/commits on the develop branch (last 2 weeks)
Create prompt for Claude Code:
I'm checking if code and documentation stay in sync.
Recent changes (from PRs):
[List recent changes: "PR #123: Changed MQTT reconnect timeout from 5s to 10s"]
FDD/Documentation:
[Paste relevant sections of design doc]
Identify:
1. Changes that SHOULD have updated docs but didn't
2. Docs that SHOULD be updated but weren't
3. Assumptions in code that aren't documented
4. Discrepancies that indicate potential bugs
For each, flag as:
- "Code needs to be changed" (docs are correct)
- "Docs need to be updated" (code is correct)
- "Needs investigation" (unclear which is right)
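A finding from this check, using the MQTT reconnect example above, could look roughly like the snippet below; the constant name and the FDD section it cites are hypothetical:

```c
/* mqtt_client.c -- as merged in "PR #123: Changed MQTT reconnect timeout from 5s to 10s" */
#define MQTT_RECONNECT_TIMEOUT_MS   10000   /* was 5000 */

/* Flag: "Docs need to be updated" -- the FDD networking section still states a
 * 5 s reconnect timeout, so the code is taken as correct and the doc is stale. */
```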
- Generate report with recommendations
- Malachi reviews & decides
- Assign doc updates to team
Meeting with Malachi:
Agenda:
1. Present Phase 1 results
- Time saved (boilerplate, test analysis, doc sync)
- Quality metrics (regressions: 0?)
- Team feedback (was this helpful?)
2. Malachi's assessment
- "Does this add value?"
- "Any concerns?"
- "Ready to expand to Phase 2?"
3. Decision
- GO: Proceed to Phase 2 (expand tools to PR review, refactoring)
- HOLD: Refine Phase 1 tasks, stay longer
- NO-GO: Tools aren't working; pivot strategy
Success = Malachi says: "This freed up time on routine work. Let's expand it."
What it does: Before Malachi manually reviews a PR, automated checks run first.
How to set up:
Create GitHub Actions workflow (runs on every PR):

    name: Automated Code Review
    on: [pull_request]
    jobs:
      review:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - name: Run agentic review
            run: |
              # Call Claude Code via API with PR diff + context
              # Generate report as GitHub comment
Claude Code prompt:
Review this PR for Code 5.
Code context:
- Main branch: [paste git diff]
- Related tests: [paste test changes]
- Module changes affect: [list impacted modules]
Check for:
1. Style violations (ZTAG conventions)
2. Uncovered code paths (new logic without tests?)
3. Edge cases (what if timeout? concurrent access?)
4. Similar past bugs (patterns we've seen before)
5. Assumptions that should be documented
Format as:
- ✅ Good: [specific praise]
- ⚠️ Check: [something to verify]
- ❌ Fix: [must address before merge]
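To make check #3 concrete, here is the kind of pattern the automated pass might surface as a "⚠️ Check" item. The snippet and names are hypothetical, not actual Code 5 code:

```c
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"

/* Hypothetical names from the PR under review. */
typedef struct { int pin; int pressed; } button_event_t;
extern QueueHandle_t xButtonEventQueue;
extern void game_state_apply_event(const button_event_t *evt);

void vProcessNextButtonEvent(void)
{
    button_event_t evt;

    /* ⚠️ Check: portMAX_DELAY blocks forever if the producer task stalls.
     * Is that intended, and is the "no event ever arrives" path tested? */
    if (xQueueReceive(xButtonEventQueue, &evt, portMAX_DELAY) == pdPASS)
    {
        game_state_apply_event(&evt);
    }
}
```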
- Report is posted as a PR comment (automated, no human needed)
- Malachi runs his review, starting from the AI's flags
- Junior dev addresses feedback
Key: Malachi still reviews; the AI just does the legwork
What it does: AI suggests improvements to a module; Malachi decides if worth doing
Example: "MQTT module has coupling with game state. Could decouple via event queue."
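As a sketch of what that proposal could mean in practice (illustrative only; all names are hypothetical), the MQTT module would publish neutral events instead of calling game-state functions directly:

```c
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"

/* A neutral event type both modules can agree on. */
typedef enum { EVT_MQTT_CONNECTED, EVT_MQTT_DISCONNECTED, EVT_MQTT_MESSAGE } event_type_t;

typedef struct {
    event_type_t type;
    const void  *payload;   /* assumption: producer owns it until consumed */
} system_event_t;

extern QueueHandle_t xSystemEventQueue;

/* MQTT module: no knowledge of game state; it only reports what happened.
 * The game task drains xSystemEventQueue on its own schedule. */
void mqtt_on_disconnect(void)
{
    system_event_t evt = { .type = EVT_MQTT_DISCONNECTED, .payload = NULL };
    (void)xQueueSend(xSystemEventQueue, &evt, 0);   /* non-blocking */
}
```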
How to set up:
Select a refactoring candidate (pick a module Malachi mentioned improving)
Create Cline task:
Analyze this Code 5 module for refactoring opportunities.
[Paste module code]
Current architecture:
- [How it's used]
- [Known issues/limitations]
Suggest:
1. Three potential improvements (in order of impact)
2. For each: benefit, risk, estimated effort
3. Pro/con of each approach
Don't implement—just analyze.
- Generate analysis document
- Malachi reviews & decides
- If approved: team implements with review
What it does: When a customer reports a bug, AI pre-analyzes logs
Example: "Device hangs after 15 min" → AI checks for timeout patterns, memory leaks, watchdog resets
How to set up:
Gather field data:
Create Claude Code prompt:
A customer reported: "[bug description]"
Device logs:
[Paste relevant logs]
Timeline:
[When did it happen? How to reproduce?]
Respond with:
Possible causes (in order of likelihood):
1. [Root cause 1]: [why likely]
2. [Root cause 2]: [why likely]
Tests to run to narrow down:
- [Test 1]: [what would this tell us?]
- [Test 2]
If it's [cause X], the fix is [suggestion].
- Generate diagnostic report
- Malachi reviews & prioritizes
Benefit: Faster bug resolution (pre-analysis saves investigation time)
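The pre-analysis is only as good as the device logs it receives. One way to make hang and leak patterns visible in field logs is a lightweight diagnostic heartbeat like this sketch (hypothetical task, standard FreeRTOS calls, assuming they are enabled in the build config):

```c
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

/* Logs uptime, free heap, and this task's stack headroom once a minute, so a
 * report like "device hangs after 15 min" comes with a visible trend line.
 * Create it at low priority during init with xTaskCreate(). */
void vDiagHeartbeatTask(void *pvParameters)
{
    (void)pvParameters;
    for (;;)
    {
        printf("[diag] uptime_s=%lu free_heap=%u stack_min=%u\n",
               (unsigned long)(xTaskGetTickCount() / configTICK_RATE_HZ),
               (unsigned)xPortGetFreeHeapSize(),
               (unsigned)uxTaskGetStackHighWaterMark(NULL));
        vTaskDelay(pdMS_TO_TICKS(60U * 1000U));
    }
}
```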
What: When a new game is added, auto-generate a unit test template
Benefit: Ensures consistency; the junior dev focuses on assertions, not boilerplate
Setup: Similar to Task 1A, but for game-specific test patterns
What: When the ESP-IDF version is bumped, AI flags deprecated APIs
Benefit: Proactive (catch incompatibilities before they break builds)
Setup: GitHub Actions + Claude Code analyzing codebase against new API docs
"I've been looking at our bottlenecks. You're blocking on:
- Code review (every PR waits for your availability)
- Documentation sync (manually tracking changes)
- Architecture decisions (juniors need your input)
That's not sustainable at scale. I want to explore tools that handle tedious work so you focus on decisions only you can make.
Would you be willing to try a structured 6-week pilot?"
Him: "Will this actually help or create more work?"
You: "Fair question. Let's start with 3 small tasks—boilerplate generation, test gap analysis, doc sync checking. We measure: time saved, regressions, and whether you think it adds value. If not, we stop. Deal?"
Him: "What if the AI generates broken code?"
You: "Same risk as a junior. You review before it ships. AI just handles the legwork faster. Phase 1 succeeds only if zero regressions. If we break something, we know it's not ready."
Him: "I don't trust AI with our code."
You: "I don't trust AI either. That's why you're in control. It suggests, you decide. You gate it. You turn it off if it doesn't work."
Him: "OK, what do you need from me?"
You: "For Phase 1:
- Review 3 boilerplate scaffolds I generate (15 min each)
- Approve list of test gaps we should cover (30 min)
- Flag doc-code discrepancies we catch (30 min)
Total: ~3 hours over 4 weeks. In return, you get back roughly 5-10 hours of routine work. You focus on the hard decisions."
Him: "What tools are we using?"
You: "Claude Code (explains the why, good for architecture) + Cline (task-based workflows) + GitHub Actions (automated checks). You can customize the rules."
Every Monday, 10 min sync with Malachi:
Week 1: "How did the boilerplate generation feel? Anything you'd change?"
Week 2: "Did the test gap analysis flag real gaps? False positives?"
Week 3: "Is doc-code sync checking finding real discrepancies?"
Week 4: "Before we wrap Phase 1, what's your gut feeling? Worth expanding?"
Document feedback → Feed into Phase 2 setup
| Activity (per instance) | Before AI | After AI | Savings |
|---|---|---|---|
| Create FreeRTOS scaffold | 60 min | 30 min (gen) + 15 min (review) | 15 min |
| Analyze test coverage | 60 min | 10 min (gen) + 30 min (Malachi decide) | 20 min |
| Check doc-code sync | 45 min | 15 min (gen) + 20 min (Malachi decide) | 10 min |
| Total per month | ~900 min | ~600 min | ~300 min (5 hrs) |
If Phase 1 fails (regressions, poor quality, Malachi says "no"), here's the pivot:
When Phase 1 wins happen, share them:
"We auto-generated 5 FreeRTOS scaffolds using Claude Code.
Saved 2 hours on boilerplate. Zero quality issues.
Malachi approved the process. Rolling into Phase 2."
This builds team confidence. When Malachi sees tools working, he'll advocate for them.
Week 1: Task 1A setup (boilerplate generation)
Week 2: Task 1B setup (test coverage), 1A feedback to Malachi
Week 3: Task 1C setup (doc sync), 1B feedback to Malachi
Week 4: Phase 1 review with Malachi, decide on Phase 2
Weeks 5-8: Phase 2 (PR review assistant, refactoring proposals, field issues)
Weeks 9-12: Phase 3 (test scaffolding, dependency checker)
Good luck! 🚀