
Code5 AI Integration Strategy: Learning from Jan 2025, Applying Feb 2026 Capabilities

Date: 2026-02-16
Context: Analysis of what went wrong in Jan 2025 vs what's possible now
Goal: Maximize AI leverage without repeating past mistakes or destroying developer agency


Executive Summary

The Jan 2025 AI experiment failed not because "AI is shit at coding" but because of workflow mistakes: no persistent project constraints, no planning step, no iteration loop, no tests in the loop, and no tool integration (detailed in Part 1).

What's different now: agentic tooling (Claude Code, GPT-5.3-Codex, Cursor Agent Mode) can plan before coding, run builds and tests, iterate until the task is done, and carry project constraints across sessions via CLAUDE.md (detailed in Part 2).

The path forward: invert the loop. Human defines constraints → AI plans → Human approves → AI implements with iteration → Human validates. Developer understanding is preserved; AI handles the tedium.


Part 1: Why Jan 2025 Failed (Root Cause Analysis)

The Approach

From Jan 14, 2025 meeting, Quan proposed:

"Treat the documentation as your code, not the code. The code should be basically treated as opaque as a binary. You shouldn't have to look at it anymore."

The Failures

| Problem | What Happened | Root Cause |
|---|---|---|
| Dynamic Memory | AI used shared_ptr despite spec saying no dynamic allocation | No embedded constraints in prompt context |
| Spec Gaps | Implementation revealed missing details | AI couldn't ask clarifying questions |
| Merge Hell | Ferenc's manual fixes conflicted with AI code | No iteration workflow - generate once |
| No Understanding | Team couldn't debug the code | "Opaque binary" philosophy |
| 6 Month Cleanup | Devs spent half a year fixing | All of the above |
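
To make the dynamic-memory failure concrete, here is a hedged C++ sketch; it is not the actual Jan 2025 code, and TagEvent and EventPool are hypothetical names. It contrasts the shared_ptr pattern the AI reached for with the init-time, statically allocated pattern the spec demanded.

```cpp
// Illustrative reconstruction only - not the actual Jan 2025 code.
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>

struct TagEvent {                 // hypothetical event type
    uint8_t  shooter_id;
    uint8_t  target_id;
    uint32_t timestamp_ms;
};

// What the AI generated (violates the spec): every event is a runtime
// heap allocation, which fragments the heap on a long-running device.
std::shared_ptr<TagEvent> make_event_dynamic() {
    return std::make_shared<TagEvent>();   // heap allocation after init
}

// What the spec required: a fixed pool reserved once at init time;
// runtime code hands out pre-allocated slots and never touches the heap.
class EventPool {
public:
    void init() { next_ = 0; }             // called once during initialization

    TagEvent* acquire() {                   // no allocation at runtime
        return (next_ < events_.size()) ? &events_[next_++] : nullptr;
    }

private:
    std::array<TagEvent, 64> events_{};     // static storage, fixed capacity
    std::size_t next_ = 0;
};
```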

The Missing Pieces

  1. No CLAUDE.md equivalent - AI had no persistent project constraints
  2. No Plan Mode - AI generated immediately without clarification
  3. No Iteration Loop - Generate → hope it works → manual fix
  4. No Testing in Loop - BDD tests were afterthought
  5. No Tool Integration - Couldn't run ESP-IDF build and iterate

Part 2: What's Different Now (Feb 2026 Capabilities)

Claude Code Capabilities

Source: Claude Code Overview, How Claude Code Works

| Capability | Jan 2025 | Feb 2026 |
|---|---|---|
| Agentic Iteration | No - generate once | Yes - iterate until task done |
| Self-Correction | No | Yes - reads errors, fixes |
| Multi-Agent | No | Yes - subagents in parallel |
| Tool Use | No | Yes - terminal, build, test |
| Project Context | Lost every session | CLAUDE.md persists constraints |
| Plan Mode | No | Yes - Research → Clarify → Plan → Build |

GPT-5.3-Codex Capabilities

Source: OpenAI Codex 5.3, Codex Review 2026

Key advancement: First model that was instrumental in creating itself - debugged its own training, managed deployment, diagnosed test results.

"GPT-5.3-Codex can take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT-5.3-Codex while it's working, without losing context."

Cursor Agent Mode

Source: Cursor AI Guide 2026, Best AI Coding Agents 2026

"Agent mode is a fully autonomous coding agent that plans, writes, tests, and debugs without hand-holding."

Key: Multiple agents working in parallel - one refactoring, one fixing tests, one doing UI polish.

Embedded-Specific Tools

Source: Embedder, ESP32 Development 2026

"Memory is now allocated once, during initialization — not continuously at runtime."

This is EXACTLY what Code5 needs, and what AI got wrong in Jan 2025.
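
As a minimal sketch of what "allocated once, during initialization" looks like on this stack (assuming ESP-IDF's FreeRTOS with static allocation support enabled in the config; the task, queue, and sizes are illustrative):

```cpp
// Minimal sketch, assuming ESP-IDF FreeRTOS with static allocation support.
// Names, sizes, and the display task are illustrative.
#include <cstdint>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/queue.h"

// Buffers reserved in static storage at link time - never malloc'd.
static StackType_t   display_stack[4096];
static StaticTask_t  display_task_tcb;
static uint8_t       event_queue_storage[32 * sizeof(uint32_t)];
static StaticQueue_t event_queue_buf;
static QueueHandle_t event_queue;

static void display_task(void *arg) {
    uint32_t event;
    for (;;) {
        if (xQueueReceive(event_queue, &event, portMAX_DELAY) == pdTRUE) {
            // render the event on the display (omitted)
        }
    }
}

// Called once from app_main(); after this, nothing allocates at runtime.
void system_init(void) {
    event_queue = xQueueCreateStatic(32, sizeof(uint32_t),
                                     event_queue_storage, &event_queue_buf);
    xTaskCreateStatic(display_task, "display",
                      sizeof(display_stack),  // stack depth: bytes on ESP-IDF
                      nullptr, 5, display_stack, &display_task_tcb);
}
```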


Part 3: The CLAUDE.md Solution

Source: Writing a Good CLAUDE.md, Claude.md Research

What Was Missing in Jan 2025

The AI had no persistent context about Code5's constraints. Every generation started fresh, without knowing the no-dynamic-allocation rule, the real-time requirements, the ESP-IDF/FreeRTOS framework choice, or the overall architecture.

Proposed CLAUDE.md for Code5

# Code5 ZTagger Firmware - Project Context

## CRITICAL CONSTRAINTS (READ FIRST)

**Memory Management:**
- **NO dynamic memory allocation at runtime** (no shared_ptr, no new/malloc after init)
- ALL memory allocated once during initialization
- Static allocation patterns ONLY
- Heap fragmentation is FATAL for long-running embedded

**Real-Time Requirements:**
- Deterministic timing required
- No blocking calls in critical paths
- All long-running work in FreeRTOS tasks
- ISR handlers must be minimal

**Framework:**
- ESP-IDF ONLY (no Arduino abstractions)
- FreeRTOS task model
- ESP-NOW for wireless (reliability > speed)

## Architecture Overview

Game Manager (core)
├── Task Manager (FreeRTOS)
├── Peripheral Tasks
│ ├── Display (LVGL)
│ ├── Haptics (motor control)
│ ├── Sound (synthesizer)
│ └── Light Bar (RGB)
├── Game Logic (state machines)
└── Communication (ESP-NOW, OTA, WiFi)


## Before Generating Code

1. **ALWAYS** ask clarifying questions if spec is ambiguous
2. **ALWAYS** confirm memory allocation strategy before implementing
3. **ALWAYS** include unit test with implementation
4. **NEVER** use dynamic allocation without explicit approval

## Testing Requirements

- Unit tests run in simulator before hardware
- BDD tests validate behavior
- Test must pass before PR

## Code Style

- snake_case for functions and variables
- UPPER_CASE for constants and macros
- Prefix with module name (e.g., game_manager_, display_)
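
To show how code written under these constraints might look, here is a hypothetical sketch (not existing Code5 code; ir_receiver_ is an invented module name): the ISR stays minimal and defers all work to a FreeRTOS task, following the naming conventions above.

```cpp
// Hypothetical sketch following the constraints above; not existing Code5 code.
#include "esp_attr.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static TaskHandle_t ir_receiver_task_handle = nullptr;  // set during init

// ISR stays minimal: no allocation, no blocking, just wake the task.
static void IRAM_ATTR ir_receiver_isr(void *arg) {
    BaseType_t higher_prio_woken = pdFALSE;
    vTaskNotifyGiveFromISR(ir_receiver_task_handle, &higher_prio_woken);
    portYIELD_FROM_ISR(higher_prio_woken);
}

// All real work happens in a FreeRTOS task, off the critical path.
static void ir_receiver_task(void *arg) {
    for (;;) {
        ulTaskNotifyTake(pdTRUE, portMAX_DELAY);  // block until the ISR fires
        // decode the IR frame and update game state (omitted)
    }
}
```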

Why This Prevents Jan 2025 Mistakes

| Jan 2025 Failure | CLAUDE.md Prevention |
|---|---|
| Dynamic memory | "NO dynamic memory" in CRITICAL section |
| Spec gaps | "ALWAYS ask clarifying questions" |
| No tests | "ALWAYS include unit test" |
| Wrong framework | "ESP-IDF ONLY" |
| No understanding | Architecture overview provides context |

Part 4: Impedance Matching to Team

Malachi Burke

Profile:

AI Integration Strategy:

| Use AI For | Why It Works for Malachi |
|---|---|
| CI/CD automation | His own idea, pure tooling |
| Test generation | Tests validate tests, low risk |
| Code review assist | Second opinion, he has final say |
| Documentation | Low risk, high leverage |
| Boilerplate | Boring stuff, well-defined |

Workflow That Preserves His Agency:

Malachi defines requirements
    ↓
AI generates PLAN (not code)
    ↓
Malachi reviews/modifies plan
    ↓
AI implements with iteration
    ↓
Malachi reviews final code
    ↓
Malachi decides merge

Key Principle: He stays the architect. AI is his tool, not his replacement.

Entry Point: His own CI/CD idea. Let him own it.


Ryan Summers

Profile:

AI Integration Strategy:

| Use AI For | Why It Works for Ryan |
|---|---|
| Full agent mode | He already validates output |
| LVGL iteration | Render → test → fix loop |
| Cross-referencing | His existing multi-tool approach |
| Bug hunting | Claude's strength he identified |

Workflow:

Ryan defines feature/fix
    ↓
Agent implements + tests
    ↓
Agent runs tests, iterates
    ↓
Ryan reviews final result
    ↓
Malachi approves for merge

Expansion Opportunity: Let agents do full iteration loops on display rendering where output is visually verifiable.


UTF Labs Juniors (Shan, Basim, Faisal)

Profile:

AI Integration Strategy:

| Use AI For | Why It Works for Juniors |
|---|---|
| Test writing | Learn by seeing tests written |
| Documentation | Low risk practice |
| Boilerplate | 10x velocity on boring stuff |
| Explanation | AI explains code to them |
| Pair programming | Agent as "senior dev pair" |

Workflow That Builds Understanding:

Malachi gives explicit requirement
    ↓
Junior prompts AI with requirement
    ↓
AI explains plan, asks questions
    ↓
Junior validates understanding
    ↓
AI implements + explains
    ↓
Junior learns from explanation
    ↓
Junior validates with Malachi

Key Principle: Agents accelerate THEIR learning, not replace them.

Entry Point: Have one junior use Cursor for test writing on a bounded task.


Part 5: What TO Use AI For (High ROI, Low Risk)

Tier 1: Safe and High Value

| Task | Risk Level | Benefit | Owner |
|---|---|---|---|
| CI/CD Pipeline | Very Low | Automation, testable | Malachi |
| Test Generation | Low | More coverage | Everyone |
| Documentation | Very Low | Time savings | Juniors |
| Boilerplate Code | Low | Velocity | Everyone |
| Code Formatting | None | Consistency | Automated |

Tier 2: Medium Risk, High Value (With Review)

| Task | Risk Level | Mitigation | Owner |
|---|---|---|---|
| Bug Triage | Medium | Human validates fix | Malachi/Ryan |
| Feature Implementation | Medium | Plan mode + review | Ryan/Juniors |
| Refactoring | Medium | Test coverage first | Ryan |
| Code Review Assist | Medium | Human has final say | Malachi |

Tier 3: Use Carefully

| Task | Risk Level | When OK | Owner |
|---|---|---|---|
| Game Logic | High | Only with full test coverage | Team |
| State Machines | High | After Malachi validates design | Malachi |
| ESP-NOW Protocol | High | Only for boilerplate | Malachi |

Part 6: What NOT to Use AI For (Repeat of Jan 2025)

Hard No

| Task | Why Not | Who Owns |
|---|---|---|
| Architecture Decisions | Requires deep system knowledge | Malachi |
| Memory Strategy | ESP32-specific, determinism | Malachi |
| Unsupervised Generation | The "opaque binary" mistake | NO ONE |
| Critical Path Without Review | Too risky | Malachi |

The "Opaque Binary" Test

Before any AI integration, ask:

"Will the developer understand this code well enough to debug it at 2am when production is down?"

If no → Don't use AI for it, or restructure approach.


Part 7: Workflow Transformation

Jan 2025 Workflow (Failed)

Spec (Markdown)
    ↓
AI Generates Code
    ↓
Hope It Works
    ↓
Manual Fix When Broken
    ↓
6 Months Cleanup

Feb 2026 Workflow (Proposed)

CLAUDE.md (Constraints)
    ↓
Human Defines Task
    ↓
AI Outputs PLAN
    ↓
Human Reviews Plan
    ↓
AI Implements + Tests
    ↓
AI Runs Tests, Iterates
    ↓
Human Reviews Final
    ↓
Human Decides Merge

Key Differences

| Aspect | Jan 2025 | Feb 2026 |
|---|---|---|
| Constraints | None | CLAUDE.md |
| Planning | None | Plan mode first |
| Iteration | Manual | Autonomous |
| Testing | Afterthought | In the loop |
| Understanding | Sacrificed | Preserved |
| Human Role | Fix AI mess | Validate AI work |

Part 8: Implementation Timeline

Week 1-2: Foundation

Week 3-4: First Wins

Month 2: Expansion

Month 3+: Mature Integration


Part 9: Success Metrics

Quality Metrics (Must Not Degrade)

Velocity Metrics (Should Improve)

Understanding Metrics (Must Preserve)


Part 10: The Embedded Reality (Hardware as Moat)

Why Code5 is Different from Pure Software

The ESP32/embedded context creates constraints that actually PROTECT against AI overreach:

| Pure Software | Embedded (Code5) |
|---|---|
| Deploy instantly | Flash to hardware |
| Test in CI | Test on physical device |
| Memory is cheap | Memory is precious |
| Latency tolerant | Real-time critical |
| Can hot-patch | Brick risk is real |
| Users can refresh | Devices in field |

The Physicality Moat

1. Hardware Validation Cannot Be Faked

AI can generate code that compiles. It cannot see the display output, feel a haptic pattern, or judge whether the game is fun to play on real hardware.

This is why Malachi's acceptance testing vision is perfect:

"You would use a live feed camera to hear and watch the output on a ztagger device"

Physical validation = human judgment required. AI assists, humans validate.

2. Determinism Requirements

From meetings, Malachi emphasized:

"Mission-critical determinism requirements"

AI-generated code in Jan 2025 used shared_ptr → non-deterministic allocation → embedded disaster.

Rule: Any AI-generated code for Code5 must pass determinism review. This is a HUMAN judgment.

3. Field Deployment Reality

Pure software: Bad code → user refreshes → fixed
Embedded: Bad OTA → bricked devices → customer disaster

From meetings (Feb 2026):

"OTA fails with 24+ devices"

This kind of bug requires physical testing with actual devices. AI can help analyze the logs, but humans must validate on hardware.

4. The 24-Tagger Problem

ESP-NOW at scale is not something AI has training data for. From Malachi's interview:

"Most ESP-NOW demos are only good for demos... if you've got 20 taggers running around..."

Domain expertise > AI capability for edge cases like this.

Where Embedded Constraints HELP AI Integration

The constraints actually make AI safer to use:

| Constraint | Why It Helps |
|---|---|
| Must compile for ESP32 | Immediate feedback loop |
| Must run on device | Physical validation gate |
| Memory limits | Forces AI to write efficient code |
| Real-time requirements | Can measure timing, catch violations |
| Test harness exists | 13-15 devices can validate |

Embedded-Specific AI Use Cases

High Value, Hardware-Safe:

| Use Case | Why Safe for Embedded |
|---|---|
| Unit tests | Run in simulator first |
| Build scripts | Tooling, not runtime |
| Documentation | No hardware risk |
| Log analysis | Read-only, diagnostic |
| CI/CD | Automation of existing flows |
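
A sketch of the "run in simulator first" idea: a host-buildable unit test for a hypothetical, pure game-logic function. The function, its scoring rules, and its name are invented for illustration; the point is that pure logic can be tested on a developer machine before anything is flashed.

```cpp
// Host-runnable unit test sketch - hypothetical module and rules.
// Builds with any C++ compiler; no ESP-IDF or hardware required.
#include <cassert>
#include <cstdint>

// Hypothetical rule: a hit scores 10 points, or 25 if the shooter is on a
// streak of 3 or more. Pure function, so it is trivially testable off-device.
static uint32_t game_logic_score_hit(uint32_t current_streak) {
    return (current_streak >= 3) ? 25u : 10u;
}

int main() {
    assert(game_logic_score_hit(0) == 10);   // no streak: base score
    assert(game_logic_score_hit(2) == 10);   // streak too short
    assert(game_logic_score_hit(3) == 25);   // streak bonus kicks in
    assert(game_logic_score_hit(7) == 25);   // bonus does not keep growing
    return 0;                                // all assertions passed
}
```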

Requires Physical Validation:

| Use Case | Validation Method |
|---|---|
| Display code | Must see on actual screen |
| Haptic patterns | Must feel on device |
| Game logic | Must play the game |
| Multi-device | Must test with 20+ taggers |
| OTA | Must flash and verify |

The Embedded AI Workflow

AI generates code
    ↓
Compiles for ESP32? (automated gate)
    ↓
Unit tests pass in simulator? (automated gate)
    ↓
Manual flash to device
    ↓
HUMAN validates on hardware
    ↓
Multi-device test
    ↓
HUMAN approves for production

Key: The hardware gates FORCE human involvement. This is a feature, not a bug.

Why This Is Your Moat

Pure software companies face a risk: AI could theoretically replace their entire stack.

Code5/ZTAG is anchored in physical reality: real devices in players' hands, deployed in the field.

AI cannot replace the human judgment of "does this feel like a good game?"

This means:

  1. AI can accelerate the tedious parts
  2. Humans remain essential for validation
  3. Developer agency is structurally protected
  4. Malachi's expertise is irreplaceable

Implication for Strategy

Don't fight the embedded constraints; use them.

The physicality of Code5 means AI integration is inherently bounded. You can't over-adopt AI because hardware validation will catch mistakes.


Part 11: The Key Insight

The Jan 2025 failure taught us that AI without constraints and iteration creates debt, not value.

The Feb 2026 opportunity is that AI WITH constraints and iteration can create leverage without destroying understanding.

The difference is captured in the formula below.

The Formula

AI Value = (Capability × Constraints × Iteration) / Supervision Required

Jan 2025: Low capability × No constraints × No iteration, with supervision deferred until it became six months of cleanup = Negative value
Feb 2026: High capability × CLAUDE.md constraints × Agentic loops, with supervision reduced to planned human review gates = High value

Appendix: Sources

AI Capabilities

CLAUDE.md Best Practices

Embedded AI

OpenClaw Interview Insights