autocoder/CLAUDE.md
Auto 357083dbae feat: decouple regression testing agents from coding agents
Major refactoring of the parallel orchestrator to run regression testing
agents independently from coding agents. This improves system reliability
and provides better control over testing behavior.

Key changes:

Database & MCP Layer:
- Add testing_in_progress and last_tested_at columns to Feature model
- Add feature_claim_for_testing() for atomic test claim with retry
- Add feature_release_testing() to release claims after testing
- Refactor claim functions to iterative loops (no recursion)
- Add OperationalError retry handling for transient DB errors
- Reduce MAX_CLAIM_RETRIES from 10 to 5

Orchestrator:
- Decouple testing agent lifecycle from coding agents
- Add _maintain_testing_agents() for continuous testing maintenance
- Fix TOCTOU race in _spawn_testing_agent() - hold lock during spawn
- Add _cleanup_stale_testing_locks() with 30-min timeout
- Fix log ordering - start_session() before stale flag cleanup
- Add stale testing_in_progress cleanup on startup

Dead Code Removal:
- Remove count_testing_in_concurrency from entire stack (12+ files)
- Remove ineffective with_for_update() from features router

API & UI:
- Pass testing_agent_ratio via CLI to orchestrator
- Update testing prompt template to use new claim/release tools
- Rename UI label to "Regression Agents" with clearer description
- Add process_utils.py for cross-platform process tree management

Testing agents now:
- Run continuously as long as passing features exist
- Can re-test features multiple times to catch regressions
- Are controlled by fixed count (0-3) via testing_agent_ratio setting
- Have atomic claiming to prevent concurrent testing of same feature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 15:22:48 +02:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is an autonomous coding agent system with a React-based UI. It uses the Claude Agent SDK to build complete applications over multiple sessions, following a two-agent pattern:

  1. Initializer Agent - First session reads an app spec and creates features in a SQLite database
  2. Coding Agent - Subsequent sessions implement features one by one, marking them as passing

Commands

# Windows - launches CLI menu
start.bat

# macOS/Linux
./start.sh

# Launch Web UI (serves pre-built React app)
start_ui.bat      # Windows
./start_ui.sh     # macOS/Linux

Python Backend (Manual)

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run the main CLI launcher
python start.py

# Run agent directly for a project (use absolute path or registered name)
python autonomous_agent_demo.py --project-dir C:/Projects/my-app
python autonomous_agent_demo.py --project-dir my-app  # if registered

# YOLO mode: rapid prototyping without browser testing
python autonomous_agent_demo.py --project-dir my-app --yolo

# Parallel mode: run multiple agents concurrently (1-5 agents)
python autonomous_agent_demo.py --project-dir my-app --parallel --max-concurrency 3

YOLO Mode (Rapid Prototyping)

YOLO mode skips regression and browser testing for faster feature iteration:

# CLI
python autonomous_agent_demo.py --project-dir my-app --yolo

# UI: Toggle the lightning bolt button before starting the agent

What's different in YOLO mode:

  • No regression testing (skips feature_get_for_regression)
  • No Playwright MCP server (browser automation disabled)
  • Features marked passing after lint/type-check succeeds
  • Faster iteration for prototyping

What's the same:

  • Lint and type-check still run to verify code compiles
  • Feature MCP server for tracking progress
  • All other development tools available

When to use: Early prototyping when you want to quickly scaffold features without verification overhead. Switch back to standard mode for production-quality development.

React UI (in ui/ directory)

cd ui
npm install
npm run dev      # Development server (hot reload)
npm run build    # Production build (required for start_ui.bat)
npm run lint     # Run ESLint

Note: The start_ui.bat and start_ui.sh scripts serve the pre-built UI from ui/dist/. After making UI changes, run npm run build in the ui/ directory.

Architecture

Core Python Modules

  • start.py - CLI launcher with project creation/selection menu
  • autonomous_agent_demo.py - Entry point for running the agent
  • agent.py - Agent session loop using Claude Agent SDK
  • client.py - ClaudeSDKClient configuration with security hooks and MCP servers
  • security.py - Bash command validation against the ALLOWED_COMMANDS allowlist
  • prompts.py - Prompt template loading with project-specific fallback
  • progress.py - Progress tracking, database queries, webhook notifications
  • registry.py - Project registry for mapping names to paths (cross-platform)
  • parallel_orchestrator.py - Concurrent agent execution with dependency-aware scheduling
  • api/dependency_resolver.py - Cycle detection (Kahn's algorithm + DFS) and dependency validation
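
To illustrate the cycle detection mentioned for api/dependency_resolver.py, here is a minimal sketch of Kahn's algorithm only (the DFS part is not shown). The function name find_cycle_nodes and the dict-based input shape are assumptions made for this example, not the module's actual interface.

```python
from collections import deque
from typing import Dict, List, Set


def find_cycle_nodes(dependencies: Dict[int, List[int]]) -> Set[int]:
    """Return feature ids that Kahn's algorithm cannot order.

    `dependencies` maps a feature id to the ids it depends on
    (hypothetical shape). A non-empty result means at least one cycle
    exists; the result also includes features blocked behind a cycle.
    """
    # Collect every id that appears as a key or as a dependency.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # in_degree[n] counts the dependencies of n that are not yet ordered.
    in_degree = {n: 0 for n in nodes}
    dependents: Dict[int, List[int]] = {n: [] for n in nodes}
    for feature, deps in dependencies.items():
        for dep in set(deps):
            in_degree[feature] += 1
            dependents[dep].append(feature)

    # Repeatedly "complete" features whose dependencies are all ordered.
    ready = deque(n for n in nodes if in_degree[n] == 0)
    while ready:
        current = ready.popleft()
        for nxt in dependents[current]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    # Anything still waiting is on a cycle or depends on one.
    return {n for n in nodes if in_degree[n] > 0}
```

An empty result means the dependency graph can be fully ordered; a validator could reject a new dependency whenever adding it makes the result non-empty.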

Project Registry

Projects can be stored in any directory. The registry maps project names to paths using SQLite:

  • All platforms: ~/.autocoder/registry.db

The registry uses:

  • SQLite database with SQLAlchemy ORM
  • POSIX path format (forward slashes) for cross-platform compatibility
  • SQLite's built-in transaction handling for concurrency safety
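
The name-to-path mapping can be sketched as follows. This is an illustration under stated assumptions, not the code in registry.py: it uses the standard-library sqlite3 module instead of SQLAlchemy to stay self-contained, and the table name projects, its columns, and the helper names are hypothetical.

```python
import sqlite3
from pathlib import PureWindowsPath
from typing import Optional


def to_posix(path: str) -> str:
    """Normalize a path string to forward slashes before storing it."""
    # PureWindowsPath accepts both "\\" and "/" as separators.
    return PureWindowsPath(path).as_posix()


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the registry database and create the (hypothetical) table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        "name TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return conn


def register(conn: sqlite3.Connection, name: str, path: str) -> None:
    """Insert or update a project; the with-block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO projects (name, path) VALUES (?, ?)",
            (name, to_posix(path)),
        )


def resolve(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Return the stored POSIX-style path for a name, or None."""
    row = conn.execute(
        "SELECT path FROM projects WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

Storing one normalized form means a path registered on Windows with backslashes resolves to the same string as the forward-slash form.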

Server API (server/)

The FastAPI server provides REST endpoints for the UI:

  • server/routers/projects.py - Project CRUD with registry integration
  • server/routers/features.py - Feature management
  • server/routers/agent.py - Agent control (start/stop/pause/resume)
  • server/routers/filesystem.py - Filesystem browser API with security controls
  • server/routers/spec_creation.py - WebSocket for interactive spec creation

Feature Management

Features are stored in SQLite (features.db) via SQLAlchemy. The agent interacts with features through an MCP server:

  • mcp_server/feature_mcp.py - MCP server exposing feature management tools
  • api/database.py - SQLAlchemy models (Feature table with priority, category, name, description, steps, passes, dependencies)

MCP tools available to the agent:

  • feature_get_stats - Progress statistics
  • feature_get_next - Get highest-priority pending feature (respects dependencies)
  • feature_claim_next - Atomically claim next available feature (for parallel mode)
  • feature_get_for_regression - Random passing features for regression testing
  • feature_mark_passing - Mark feature complete
  • feature_skip - Move feature to end of queue
  • feature_create_bulk - Initialize all features (used by initializer)
  • feature_add_dependency - Add dependency between features (with cycle detection)
  • feature_remove_dependency - Remove a dependency
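
One way an atomic claim such as feature_claim_next could work is a conditional UPDATE that only succeeds if the row is still unclaimed. The sketch below is an assumption-laden illustration, not the MCP server's code: it uses sqlite3 directly, assumes an in_progress column, assumes that a lower priority value means "claim first", and ignores dependencies for brevity.

```python
import sqlite3
from typing import Optional

MAX_CLAIM_RETRIES = 5  # retry budget when another agent wins the race


def claim_next(conn: sqlite3.Connection) -> Optional[int]:
    """Claim one pending feature and return its id, or None if none is left."""
    for _ in range(MAX_CLAIM_RETRIES):
        row = conn.execute(
            "SELECT id FROM features "
            "WHERE passes = 0 AND in_progress = 0 "
            "ORDER BY priority, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to claim
        with conn:  # commit on success, roll back on error
            cursor = conn.execute(
                # The extra in_progress = 0 guard makes the claim atomic:
                # only one concurrent caller can flip the flag.
                "UPDATE features SET in_progress = 1 "
                "WHERE id = ? AND in_progress = 0",
                (row[0],),
            )
        if cursor.rowcount == 1:
            return row[0]
        # Another agent claimed this row first; loop and try the next one.
    return None
```

The loop is iterative rather than recursive, so a burst of lost races cannot grow the call stack.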

React UI (ui/)

  • Tech stack: React 18, TypeScript, TanStack Query, Tailwind CSS v4, Radix UI, dagre (graph layout)
  • src/App.tsx - Main app with project selection, kanban board, agent controls
  • src/hooks/useWebSocket.ts - Real-time updates via WebSocket (progress, agent status, logs, agent updates)
  • src/hooks/useProjects.ts - React Query hooks for API calls
  • src/lib/api.ts - REST API client
  • src/lib/types.ts - TypeScript type definitions

Key components:

  • AgentMissionControl.tsx - Dashboard showing active agents with mascots (Spark, Fizz, Octo, Hoot, Buzz)
  • DependencyGraph.tsx - Interactive node graph visualization with dagre layout
  • CelebrationOverlay.tsx - Confetti animation on feature completion
  • FolderBrowser.tsx - Server-side filesystem browser for project folder selection

Keyboard shortcuts (press ? for help):

  • D - Toggle debug panel
  • G - Toggle Kanban/Graph view
  • N - Add new feature
  • A - Toggle AI assistant
  • , - Open settings

Project Structure for Generated Apps

Projects can be stored in any directory (registered in ~/.autocoder/registry.db). Each project contains:

  • prompts/app_spec.txt - Application specification (XML format)
  • prompts/initializer_prompt.md - First session prompt
  • prompts/coding_prompt.md - Continuation session prompt
  • features.db - SQLite database with feature test cases
  • .agent.lock - Lock file to prevent multiple agent instances

Security Model

Defense-in-depth approach configured in client.py:

  1. OS-level sandbox for bash commands
  2. Filesystem restricted to project directory only
  3. Bash commands validated against ALLOWED_COMMANDS in security.py
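
The third layer can be pictured with the sketch below. It is a simplified illustration, not the logic in security.py: the allowlist contents are made up, the function name is hypothetical, and real validation would need to handle more shell syntax than this example does.

```python
import shlex

# Hypothetical allowlist contents, for illustration only.
ALLOWED_COMMANDS = {"ls", "cat", "npm", "git"}

# Characters that make up shell operators such as ";", "|", "&&", "||".
OPERATOR_CHARS = set(";|&")


def is_command_allowed(command: str) -> bool:
    """Return True only if every command in the line is on the allowlist."""
    # Reject command substitution outright instead of trying to parse it.
    if "$(" in command or "`" in command:
        return False

    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        return False  # e.g. an unterminated quote

    expect_command = True  # the next word starts a new command
    seen_command = False
    for token in tokens:
        if token and set(token) <= OPERATOR_CHARS:
            expect_command = True
            continue
        if expect_command:
            if token not in ALLOWED_COMMANDS:
                return False
            seen_command = True
            expect_command = False
    return seen_command
```

Checking the first word after every operator, rather than only the first word of the line, is what stops a chained command like `ls && rm -rf /` from slipping through.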

Claude Code Integration

  • .claude/commands/create-spec.md - /create-spec slash command for interactive spec creation
  • .claude/skills/frontend-design/SKILL.md - Skill for distinctive UI design
  • .claude/templates/ - Prompt templates copied to new projects

Key Patterns

Prompt Loading Fallback Chain

  1. Project-specific: {project_dir}/prompts/{name}.md
  2. Base template: .claude/templates/{name}.template.md
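
The fallback chain above can be sketched as a short lookup. The function name load_prompt and its parameters are assumptions for this example; the two candidate paths mirror the chain as listed.

```python
from pathlib import Path


def load_prompt(name: str, project_dir: Path, templates_dir: Path) -> str:
    """Return the project-specific prompt if present, else the base template."""
    candidates = [
        project_dir / "prompts" / f"{name}.md",   # 1. project-specific
        templates_dir / f"{name}.template.md",    # 2. base template
    ]
    for path in candidates:
        if path.is_file():
            return path.read_text(encoding="utf-8")
    raise FileNotFoundError(f"No prompt named {name!r} in {candidates}")
```

For example, `load_prompt("coding_prompt", Path("my-app"), Path(".claude/templates"))` would prefer my-app/prompts/coding_prompt.md and fall back to .claude/templates/coding_prompt.template.md.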

Agent Session Flow

  1. Check if features.db has features (determines initializer vs coding agent)
  2. Create ClaudeSDKClient with security settings
  3. Send prompt and stream response
  4. Auto-continue with 3-second delay between sessions
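
Step 1 of this flow could look roughly like the sketch below. It assumes a table named features inside features.db and uses hypothetical helper names; the real check in agent.py may differ.

```python
import sqlite3
from pathlib import Path


def count_features(db_path: Path) -> int:
    """Return the number of rows in the features table, or 0 if unavailable."""
    if not db_path.exists():
        return 0  # avoid creating an empty database as a side effect
    conn = sqlite3.connect(str(db_path))
    try:
        return conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
    except sqlite3.OperationalError:
        return 0  # table not created yet
    finally:
        conn.close()


def choose_agent(project_dir: Path) -> str:
    """Pick the initializer for an empty project, the coding agent otherwise."""
    if count_features(project_dir / "features.db") > 0:
        return "coding"
    return "initializer"
```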

Real-time UI Updates

The UI receives updates via WebSocket (/ws/projects/{project_name}):

  • progress - Test pass counts (passing, in_progress, total)
  • agent_status - Running/paused/stopped/crashed
  • log - Agent output lines with optional featureId/agentIndex for attribution
  • feature_update - Feature status changes
  • agent_update - Multi-agent state updates (thinking/working/testing/success/error) with mascot names
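
As a rough picture of two of these messages, the sketch below serializes a progress and a log message as JSON. The "type" discriminator key and the "line" key are assumptions made for illustration; only the field names passing, in_progress, total, featureId, and agentIndex come from the list above.

```python
import json
from typing import Any, Dict, Optional


def progress_message(passing: int, in_progress: int, total: int) -> str:
    """Build a progress message carrying the three counters."""
    return json.dumps({
        "type": "progress",  # assumed discriminator key
        "passing": passing,
        "in_progress": in_progress,
        "total": total,
    })


def log_message(
    line: str,
    feature_id: Optional[int] = None,
    agent_index: Optional[int] = None,
) -> str:
    """Build a log message; attribution fields are included only when known."""
    payload: Dict[str, Any] = {"type": "log", "line": line}
    if feature_id is not None:
        payload["featureId"] = feature_id
    if agent_index is not None:
        payload["agentIndex"] = agent_index
    return json.dumps(payload)
```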

Parallel Mode

When running with --parallel, the orchestrator:

  1. Spawns multiple Claude agents as subprocesses (up to --max-concurrency)
  2. Each agent claims features atomically via feature_claim_next
  3. Features blocked by unmet dependencies are skipped
  4. Browser contexts are isolated per agent using the --isolated flag
  5. AgentTracker parses output and emits agent_update messages for the UI
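
Step 3 (skipping features with unmet dependencies) can be sketched as a filter over the feature list. The dict keys follow the Feature columns named earlier (priority, passes, dependencies); the function name and the "lower priority value first" ordering are assumptions for this example.

```python
from typing import Any, Dict, List


def ready_features(features: List[Dict[str, Any]]) -> List[int]:
    """Return ids of pending features whose dependencies all pass."""
    passing = {f["id"] for f in features if f["passes"]}
    ready = [
        f for f in features
        if not f["passes"] and set(f["dependencies"]) <= passing
    ]
    # Assumed ordering: lower priority value is scheduled first.
    ready.sort(key=lambda f: (f["priority"], f["id"]))
    return [f["id"] for f in ready]
```

A feature that depends on a pending feature is left out of the result until that dependency is marked passing.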

Process Limits (Parallel Mode)

The orchestrator enforces strict bounds on concurrent processes:

  • MAX_PARALLEL_AGENTS = 5 - Maximum concurrent coding agents
  • MAX_TOTAL_AGENTS = 10 - Hard limit on total agents (coding + testing)
  • Testing agents are capped at max_concurrency (same as coding agents)

Expected process count during normal operation:

  • 1 orchestrator process
  • Up to 5 coding agents
  • Up to 5 testing agents
  • Total: never exceeds 11 Python processes
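
The bounds above amount to a small check before each spawn. The sketch below is one possible reading of those rules; the function name and its parameters are hypothetical, and the orchestrator's actual checks may be structured differently.

```python
MAX_PARALLEL_AGENTS = 5   # maximum concurrent coding agents
MAX_TOTAL_AGENTS = 10     # hard limit on coding + testing agents


def can_spawn_testing_agent(coding: int, testing: int, max_concurrency: int) -> bool:
    """Return True if one more testing agent fits within both caps."""
    testing_cap = min(max_concurrency, MAX_PARALLEL_AGENTS)
    if testing >= testing_cap:
        return False  # at max testing agents
    if coding + testing >= MAX_TOTAL_AGENTS:
        return False  # at max total agents
    return True


# Worst case: 1 orchestrator + 5 coding agents + 5 testing agents.
WORST_CASE_PROCESSES = 1 + MAX_PARALLEL_AGENTS + MAX_PARALLEL_AGENTS
```

With both caps at their maximum, the worst case works out to 1 + 5 + 5 = 11 processes, which matches the total listed above.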

Stress Test Verification:

# Windows - verify process bounds
# 1. Note baseline count
tasklist | findstr python | find /c /v ""

# 2. Start parallel agent (max concurrency)
python autonomous_agent_demo.py --project-dir test --parallel --max-concurrency 5

# 3. During run - should NEVER exceed baseline + 11
tasklist | findstr python | find /c /v ""

# 4. After stop via UI - should return to baseline
tasklist | findstr python | find /c /v ""

# macOS/Linux - verify process bounds
# 1. Note baseline count
pgrep python | wc -l

# 2. Start parallel agent
python autonomous_agent_demo.py --project-dir test --parallel --max-concurrency 5

# 3. During run - should NEVER exceed baseline + 11
pgrep python | wc -l

# 4. After stop - should return to baseline
pgrep python | wc -l

Log Verification:

# Check spawn vs completion balance
grep "Started testing agent" orchestrator_debug.log | wc -l
grep "Testing agent.*\(completed\|failed\)" orchestrator_debug.log | wc -l

# Watch for cap enforcement messages
grep "at max testing agents\|At max total agents" orchestrator_debug.log
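
The spawn-versus-completion check can also be done in Python, which avoids differences between grep implementations. This is a sketch: the function name is hypothetical, and the patterns are taken from the grep commands above (lines containing "Started testing agent", and lines containing "Testing agent" followed by "completed" or "failed").

```python
import re
from typing import Iterable, Tuple

STARTED = re.compile(r"Started testing agent")
FINISHED = re.compile(r"Testing agent.*(completed|failed)")


def spawn_balance(lines: Iterable[str]) -> Tuple[int, int]:
    """Count testing agents started and finished in the given log lines."""
    started = finished = 0
    for line in lines:
        if STARTED.search(line):
            started += 1
        elif FINISHED.search(line):
            finished += 1
    return started, finished
```

To run it against the log, pass an open file: `with open("orchestrator_debug.log") as log: print(spawn_balance(log))`. After a clean stop the two counts should match; a persistent gap suggests testing agents that never reported completion.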

Design System

The UI uses a neobrutalism design with Tailwind CSS v4:

  • CSS variables defined in ui/src/styles/globals.css via @theme directive
  • Custom animations: animate-slide-in, animate-pulse-neo, animate-shimmer
  • Color tokens: --color-neo-pending (yellow), --color-neo-progress (cyan), --color-neo-done (green)