Date: 2026-02-16
Context: Analysis of what went wrong in Jan 2025 vs what's possible now
Goal: Maximize AI leverage without repeating past mistakes or destroying developer agency
The Jan 2025 AI experiment failed not because "AI is shit at coding" but because of workflow mistakes, broken down in the failure table below.
What's different now: agentic iteration, self-correction, tool use, and persistent project context (see the capability comparison table).
The path forward: Invert the loop. Human defines constraints → AI plans → Human approves → AI implements with iteration → Human validates. Developer understanding preserved. AI handles tedium.
From the Jan 14, 2025 meeting, Quan proposed:
"Treat the documentation as your code, not the code. The code should be basically treated as opaque as a binary. You shouldn't have to look at it anymore."
| Problem | What Happened | Root Cause |
|---|---|---|
| Dynamic Memory | AI used shared_ptr despite spec saying no dynamic allocation | No embedded constraints in prompt context |
| Spec Gaps | Implementation revealed missing details | AI couldn't ask clarifying questions |
| Merge Hell | Ferenc's manual fixes conflicted with AI code | No iteration workflow - generate once |
| No Understanding | Team couldn't debug the code | "Opaque binary" philosophy |
| 6 Month Cleanup | Devs spent half a year fixing | All of the above |
Source: Claude Code Overview, How Claude Code Works
| Capability | Jan 2025 | Feb 2026 |
|---|---|---|
| Agentic Iteration | No - generate once | Yes - iterate until task done |
| Self-Correction | No | Yes - reads errors, fixes |
| Multi-Agent | No | Yes - subagents in parallel |
| Tool Use | No | Yes - terminal, build, test |
| Project Context | Lost every session | CLAUDE.md persists constraints |
| Plan Mode | No | Yes - Research → Clarify → Plan → Build |
Source: OpenAI Codex 5.3, Codex Review 2026
Key advancement: the first model instrumental in its own creation; it debugged its own training, managed deployment, and diagnosed test results.
"GPT-5.3-Codex can take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT-5.3-Codex while it's working, without losing context."
Source: Cursor AI Guide 2026, Best AI Coding Agents 2026
"Agent mode is a fully autonomous coding agent that plans, writes, tests, and debugs without hand-holding."
Key: Multiple agents working in parallel - one refactoring, one fixing tests, one doing UI polish.
Source: Embedder, ESP32 Development 2026
"Memory is now allocated once, during initialization — not continuously at runtime."
This is EXACTLY what Code5 needs, and what AI got wrong in Jan 2025.
Source: Writing a Good CLAUDE.md, Claude.md Research
The AI had no persistent context about Code5's constraints. Every generation started fresh, unaware of the memory rules, the framework, or the architecture. A CLAUDE.md file embeds that context persistently; for example:
```markdown
# Code5 ZTagger Firmware - Project Context

## CRITICAL CONSTRAINTS (READ FIRST)

**Memory Management:**
- **NO dynamic memory allocation at runtime** (no shared_ptr, no new/malloc after init)
- ALL memory allocated once during initialization
- Static allocation patterns ONLY
- Heap fragmentation is FATAL for long-running embedded

**Real-Time Requirements:**
- Deterministic timing required
- No blocking calls in critical paths
- All long-running work in FreeRTOS tasks
- ISR handlers must be minimal

**Framework:**
- ESP-IDF ONLY (no Arduino abstractions)
- FreeRTOS task model
- ESP-NOW for wireless (reliability > speed)

## Architecture Overview

Game Manager (core)
├── Task Manager (FreeRTOS)
├── Peripheral Tasks
│   ├── Display (LVGL)
│   ├── Haptics (motor control)
│   ├── Sound (synthesizer)
│   └── Light Bar (RGB)
├── Game Logic (state machines)
└── Communication (ESP-NOW, OTA, WiFi)

## Before Generating Code

1. **ALWAYS** ask clarifying questions if spec is ambiguous
2. **ALWAYS** confirm memory allocation strategy before implementing
3. **ALWAYS** include unit test with implementation
4. **NEVER** use dynamic allocation without explicit approval

## Testing Requirements

- Unit tests run in simulator before hardware
- BDD tests validate behavior
- Test must pass before PR

## Code Style

- Snake_case for functions and variables
- UPPER_CASE for constants and macros
- Prefix with module name (e.g., game_manager_, display_)
```
| Jan 2025 Failure | CLAUDE.md Prevention |
|---|---|
| Dynamic memory | "NO dynamic memory" in CRITICAL section |
| Spec gaps | "ALWAYS ask clarifying questions" |
| No tests | "ALWAYS include unit test" |
| Wrong framework | "ESP-IDF ONLY" |
| No understanding | Architecture overview provides context |
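To make the CRITICAL constraints concrete, here is a minimal sketch of what "allocate once at init, keep the ISR minimal, do the work in a FreeRTOS task" looks like in practice. It assumes ESP-IDF with static FreeRTOS allocation enabled; the module name, event struct, and the commented-out game_manager_register_hit() call are hypothetical.

```c
// Hypothetical sketch of the static-allocation + minimal-ISR pattern (ESP-IDF assumed).
#include <stdint.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/queue.h"
#include "esp_attr.h"

#define HIT_EVENT_QUEUE_LEN 16          // UPPER_CASE for constants

typedef struct {
    uint32_t timestamp_us;
    uint8_t  sensor_id;
} hit_event_t;

// All storage is static: no new/malloc/shared_ptr anywhere after init.
static StaticQueue_t hit_queue_ctrl;
static uint8_t       hit_queue_storage[HIT_EVENT_QUEUE_LEN * sizeof(hit_event_t)];
static QueueHandle_t hit_queue;

static StaticTask_t  hit_task_ctrl;
static StackType_t   hit_task_stack[4096];

// ISR stays minimal: capture the event, push it to a queue, return.
static void IRAM_ATTR hit_sensor_isr(void *arg)
{
    hit_event_t evt = { .timestamp_us = 0 /* capture elided */,
                        .sensor_id = (uint8_t)(uintptr_t)arg };
    BaseType_t woken = pdFALSE;
    xQueueSendFromISR(hit_queue, &evt, &woken);
    if (woken) {
        portYIELD_FROM_ISR();
    }
}

// Long-running work happens in a FreeRTOS task, never in the ISR.
static void hit_task(void *arg)
{
    (void)arg;
    hit_event_t evt;
    for (;;) {
        if (xQueueReceive(hit_queue, &evt, portMAX_DELAY) == pdTRUE) {
            // game_manager_register_hit(evt.sensor_id, evt.timestamp_us);  // hypothetical
        }
    }
}

// The only place anything is "allocated" - called once at startup.
void hit_module_init(void)
{
    hit_queue = xQueueCreateStatic(HIT_EVENT_QUEUE_LEN, sizeof(hit_event_t),
                                   hit_queue_storage, &hit_queue_ctrl);
    xTaskCreateStatic(hit_task, "hit_task",
                      sizeof(hit_task_stack) / sizeof(StackType_t),
                      NULL, 5, hit_task_stack, &hit_task_ctrl);
}
```

The specific code matters less than the shape: every queue, buffer, and task stack exists at compile time, so nothing can fragment the heap at runtime.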
Profile:
AI Integration Strategy:
| Use AI For | Why It Works for Malachi |
|---|---|
| CI/CD automation | His own idea, pure tooling |
| Test generation | Tests validate tests, low risk |
| Code review assist | Second opinion, he has final say |
| Documentation | Low risk, high leverage |
| Boilerplate | Boring stuff, well-defined |
Workflow That Preserves His Agency:
```
Malachi defines requirements
↓
AI generates PLAN (not code)
↓
Malachi reviews/modifies plan
↓
AI implements with iteration
↓
Malachi reviews final code
↓
Malachi decides merge
```
Key Principle: He stays the architect. AI is his tool, not a replacement.
Entry Point: His own CI/CD idea. Let him own it.
Profile:
AI Integration Strategy:
| Use AI For | Why It Works for Ryan |
|---|---|
| Full agent mode | He already validates output |
| LVGL iteration | Render → test → fix loop |
| Cross-referencing | His existing multi-tool approach |
| Bug hunting | Claude's strength he identified |
Workflow:
```
Ryan defines feature/fix
↓
Agent implements + tests
↓
Agent runs tests, iterates
↓
Ryan reviews final result
↓
Malachi approves for merge
```
Expansion Opportunity: Let agents do full iteration loops on display rendering where output is visually verifiable.
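A concrete example of a bounded, visually verifiable task to hand an agent: a score readout on the display. A minimal sketch assuming LVGL v8; the display_ functions and the label layout are hypothetical.

```c
// Hypothetical sketch of a small, visually verifiable display task (LVGL v8 assumed).
#include <stdio.h>
#include <stdint.h>
#include "lvgl.h"

static lv_obj_t *score_label;   // widget created once at init

// Called once during display init: create the label on the active screen.
void display_score_init(void)
{
    score_label = lv_label_create(lv_scr_act());
    lv_obj_align(score_label, LV_ALIGN_TOP_MID, 0, 8);
    lv_label_set_text(score_label, "SCORE 0");
}

// Called from game logic whenever the score changes.
void display_score_update(uint32_t score)
{
    char buf[16];
    snprintf(buf, sizeof(buf), "SCORE %u", (unsigned int)score);
    lv_label_set_text(score_label, buf);
}
```

Whether the agent got it right is settled by looking at the physical screen - exactly the human validation gate this plan relies on.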
Profile:
AI Integration Strategy:
| Use AI For | Why It Works for Juniors |
|---|---|
| Test writing | Learn by seeing tests written |
| Documentation | Low risk practice |
| Boilerplate | 10x velocity on boring stuff |
| Explanation | AI explains code to them |
| Pair programming | Agent as "senior dev pair" |
Workflow That Builds Understanding:
```
Malachi gives explicit requirement
↓
Junior prompts AI with requirement
↓
AI explains plan, asks questions
↓
Junior validates understanding
↓
AI implements + explains
↓
Junior learns from explanation
↓
Junior validates with Malachi
```
Key Principle: Agents accelerate THEIR learning, not replace them.
Entry Point: Have one junior use Cursor for test writing on a bounded task.
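For example, a first bounded task could be "write unit tests for the hit-scoring function." A minimal sketch assuming ESP-IDF's Unity test component; game_score_for_hit() and its scoring rules are hypothetical.

```c
// Hypothetical sketch of a bounded test-writing task (ESP-IDF Unity component assumed).
#include "unity.h"

// Hypothetical function under test: points awarded for a hit in a given distance band.
int game_score_for_hit(int distance_band);

TEST_CASE("close-range hit scores more than long-range hit", "[game_logic]")
{
    TEST_ASSERT_GREATER_THAN(game_score_for_hit(3), game_score_for_hit(1));
}

TEST_CASE("invalid distance band scores zero", "[game_logic]")
{
    TEST_ASSERT_EQUAL_INT(0, game_score_for_hit(-1));
}
```

The junior reviews what the agent wrote against the spec and learns the module's contract in the process.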
| Task | Risk Level | Benefit | Owner |
|---|---|---|---|
| CI/CD Pipeline | Very Low | Automation, testable | Malachi |
| Test Generation | Low | More coverage | Everyone |
| Documentation | Very Low | Time savings | Juniors |
| Boilerplate Code | Low | Velocity | Everyone |
| Code Formatting | None | Consistency | Automated |
| Task | Risk Level | Mitigation | Owner |
|---|---|---|---|
| Bug Triage | Medium | Human validates fix | Malachi/Ryan |
| Feature Implementation | Medium | Plan mode + review | Ryan/Juniors |
| Refactoring | Medium | Test coverage first | Ryan |
| Code Review Assist | Medium | Human has final say | Malachi |
| Task | Risk Level | When OK | Owner |
|---|---|---|---|
| Game Logic | High | Only with full test coverage | Team |
| State Machines | High | After Malachi validates design | Malachi |
| ESP-NOW Protocol | High | Only for boilerplate | Malachi |
| Task | Why Not | Who Owns |
|---|---|---|
| Architecture Decisions | Requires deep system knowledge | Malachi |
| Memory Strategy | ESP32-specific, determinism | Malachi |
| Unsupervised Generation | The "opaque binary" mistake | NO ONE |
| Critical Path Without Review | Too risky | Malachi |
Before any AI integration, ask:
"Will the developer understand this code well enough to debug it at 2am when production is down?"
If no → Don't use AI for it, or restructure the approach.
The Jan 2025 loop:
```
Spec (Markdown)
↓
AI Generates Code
↓
Hope It Works
↓
Manual Fix When Broken
↓
6 Months Cleanup
```
The Feb 2026 loop:
```
CLAUDE.md (Constraints)
↓
Human Defines Task
↓
AI Outputs PLAN
↓
Human Reviews Plan
↓
AI Implements + Tests
↓
AI Runs Tests, Iterates
↓
Human Reviews Final
↓
Human Decides Merge
```
| Aspect | Jan 2025 | Feb 2026 |
|---|---|---|
| Constraints | None | CLAUDE.md |
| Planning | None | Plan mode first |
| Iteration | Manual | Autonomous |
| Testing | Afterthought | In the loop |
| Understanding | Sacrificed | Preserved |
| Human Role | Fix AI mess | Validate AI work |
The ESP32/embedded context creates constraints that actually PROTECT against AI overreach:
| Pure Software | Embedded (Code5) |
|---|---|
| Deploy instantly | Flash to hardware |
| Test in CI | Test on physical device |
| Memory is cheap | Memory is precious |
| Latency tolerant | Real-time critical |
| Can hot-patch | Brick risk is real |
| Users can refresh | Devices in field |
1. Hardware Validation Cannot Be Faked
AI can generate code that compiles. It cannot see the display render, feel a haptic pattern, hear the sound output, or play a round on physical taggers.
This is why Malachi's acceptance testing vision is perfect:
"You would use a live feed camera to hear and watch the output on a ztagger device"
Physical validation = human judgment required. AI assists, humans validate.
2. Determinism Requirements
From meetings, Malachi emphasized:
"Mission-critical determinism requirements"
AI-generated code in Jan 2025 used shared_ptr → non-deterministic allocation → embedded disaster.
Rule: Any AI-generated code for Code5 must pass determinism review. This is a HUMAN judgment.
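Parts of that review can still be backed by measurement. A minimal sketch, assuming ESP-IDF's esp_timer_get_time() and a hypothetical game_manager_tick() with a hypothetical 2 ms budget, of an on-device timing check that makes violations visible:

```c
// Hypothetical sketch: measure worst-case tick time against a budget (ESP-IDF assumed).
#include <inttypes.h>
#include "esp_timer.h"
#include "esp_log.h"

#define GAME_TICK_BUDGET_US 2000        // hypothetical real-time budget for one tick

static const char *TAG = "determinism_check";

// Hypothetical entry point for one game-logic step.
extern void game_manager_tick(void);

void determinism_check_run(int iterations)
{
    int64_t worst_us = 0;
    for (int i = 0; i < iterations; i++) {
        int64_t start = esp_timer_get_time();
        game_manager_tick();
        int64_t elapsed = esp_timer_get_time() - start;
        if (elapsed > worst_us) {
            worst_us = elapsed;
        }
    }
    if (worst_us > GAME_TICK_BUDGET_US) {
        ESP_LOGW(TAG, "worst tick %" PRId64 " us exceeds %d us budget",
                 worst_us, GAME_TICK_BUDGET_US);
    } else {
        ESP_LOGI(TAG, "worst tick %" PRId64 " us within %d us budget",
                 worst_us, GAME_TICK_BUDGET_US);
    }
}
```

A human still decides whether the budget is right and whether the run covered the worst case; the tool only surfaces violations.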
3. Field Deployment Reality
Pure software: Bad code → user refreshes → fixed
Embedded: Bad OTA → bricked devices → customer disaster
From meetings (Feb 2026):
"OTA fails with 24+ devices"
This kind of bug requires physical testing with actual devices. AI can help debug logs, but humans must validate on hardware.
4. The 24-Tagger Problem
ESP-NOW at scale is not something AI has training data for. From Malachi's interview:
"Most ESP-NOW demos are only good for demos... if you've got 20 taggers running around..."
Domain expertise > AI capability for edge cases like this.
The constraints actually make AI safer to use:
| Constraint | Why It Helps |
|---|---|
| Must compile for ESP32 | Immediate feedback loop |
| Must run on device | Physical validation gate |
| Memory limits | Forces AI to write efficient code |
| Real-time requirements | Can measure timing, catch violations |
| Test harness exists | 13-15 devices can validate |
High Value, Hardware-Safe:
| Use Case | Why Safe for Embedded |
|---|---|
| Unit tests | Run in simulator first |
| Build scripts | Tooling, not runtime |
| Documentation | No hardware risk |
| Log analysis | Read-only, diagnostic |
| CI/CD | Automation of existing flows |
Requires Physical Validation:
| Use Case | Validation Method |
|---|---|
| Display code | Must see on actual screen |
| Haptic patterns | Must feel on device |
| Game logic | Must play the game |
| Multi-device | Must test with 20+ taggers |
| OTA | Must flash and verify |
The validation pipeline:
```
AI generates code
↓
Compiles for ESP32? (automated gate)
↓
Unit tests pass in simulator? (automated gate)
↓
Manual flash to device
↓
HUMAN validates on hardware
↓
Multi-device test
↓
HUMAN approves for production
```
Key: The hardware gates FORCE human involvement. This is a feature, not a bug.
Pure software companies face a risk: AI could theoretically replace their entire stack.
Code5/ZTAG has physical reality: devices in the field, hardware that must be flashed and validated, and a game that has to feel right in players' hands.
AI cannot replace the human judgment of "does this feel like a good game?"
This means you shouldn't fight the embedded constraints - use them. The physicality of Code5 means AI integration is inherently bounded: you can't over-adopt AI, because hardware validation will catch mistakes.
The Jan 2025 failure taught us that AI without constraints and iteration creates debt, not value.
The Feb 2026 opportunity is that AI WITH constraints and iteration can create leverage without destroying understanding.
The difference, as a rough heuristic:
AI Value = (Capability × Constraints × Iteration) / Supervision Required
Jan 2025: Low capability × No constraints × No iteration, with zero supervision up front and six months of cleanup afterward = Negative value
Feb 2026: High capability × CLAUDE.md × Agentic loops, with human validation built into the loop = High value