mirror of https://github.com/leonvanzyl/autocoder.git synced 2026-01-30 06:12:06 +00:00


Auto 357083dbae feat: decouple regression testing agents from coding agents

Major refactoring of the parallel orchestrator to run regression testing
agents independently from coding agents. This improves system reliability
and provides better control over testing behavior.

Key changes:

Database & MCP Layer:
- Add testing_in_progress and last_tested_at columns to Feature model
- Add feature_claim_for_testing() for atomic test claim with retry
- Add feature_release_testing() to release claims after testing
- Refactor claim functions to iterative loops (no recursion)
- Add OperationalError retry handling for transient DB errors
- Reduce MAX_CLAIM_RETRIES from 10 to 5

Orchestrator:
- Decouple testing agent lifecycle from coding agents
- Add _maintain_testing_agents() for continuous testing maintenance
- Fix TOCTOU race in _spawn_testing_agent() - hold lock during spawn
- Add _cleanup_stale_testing_locks() with 30-min timeout
- Fix log ordering - start_session() before stale flag cleanup
- Add stale testing_in_progress cleanup on startup

Dead Code Removal:
- Remove count_testing_in_concurrency from entire stack (12+ files)
- Remove ineffective with_for_update() from features router

API & UI:
- Pass testing_agent_ratio via CLI to orchestrator
- Update testing prompt template to use new claim/release tools
- Rename UI label to "Regression Agents" with clearer description
- Add process_utils.py for cross-platform process tree management

Testing agents now:
- Run continuously as long as passing features exist
- Can re-test features multiple times to catch regressions
- Are controlled by fixed count (0-3) via testing_agent_ratio setting
- Have atomic claiming to prevent concurrent testing of same feature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-22 15:22:48 +02:00

10 KiB


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is an autonomous coding agent system with a React-based UI. It uses the Claude Agent SDK to build complete applications over multiple sessions using a two-agent pattern:

Initializer Agent - First session reads an app spec and creates features in a SQLite database
Coding Agent - Subsequent sessions implement features one by one, marking them as passing

Commands

Quick Start (Recommended)

# Windows - launches CLI menu
start.bat

# macOS/Linux
./start.sh

# Launch Web UI (serves pre-built React app)
start_ui.bat      # Windows
./start_ui.sh     # macOS/Linux

Python Backend (Manual)

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run the main CLI launcher
python start.py

# Run agent directly for a project (use absolute path or registered name)
python autonomous_agent_demo.py --project-dir C:/Projects/my-app
python autonomous_agent_demo.py --project-dir my-app  # if registered

# YOLO mode: rapid prototyping without browser testing
python autonomous_agent_demo.py --project-dir my-app --yolo

# Parallel mode: run multiple agents concurrently (1-5 agents)
python autonomous_agent_demo.py --project-dir my-app --parallel --max-concurrency 3

YOLO Mode (Rapid Prototyping)

YOLO mode skips regression and browser testing for faster feature iteration (lint and type-check still run):

# CLI
python autonomous_agent_demo.py --project-dir my-app --yolo

# UI: Toggle the lightning bolt button before starting the agent

What's different in YOLO mode:

No regression testing (skips feature_get_for_regression)
No Playwright MCP server (browser automation disabled)
Features marked passing after lint/type-check succeeds
Faster iteration for prototyping

What's the same:

Lint and type-check still run to verify code compiles
Feature MCP server for tracking progress
All other development tools available

When to use: Early prototyping when you want to quickly scaffold features without verification overhead. Switch back to standard mode for production-quality development.

React UI (in ui/ directory)

cd ui
npm install
npm run dev      # Development server (hot reload)
npm run build    # Production build (required for start_ui.bat)
npm run lint     # Run ESLint

Note: The start_ui.bat script serves the pre-built UI from ui/dist/. After making UI changes, run npm run build in the ui/ directory.

Architecture

Core Python Modules

start.py - CLI launcher with project creation/selection menu
autonomous_agent_demo.py - Entry point for running the agent
agent.py - Agent session loop using Claude Agent SDK
client.py - ClaudeSDKClient configuration with security hooks and MCP servers
security.py - Bash command validation against the ALLOWED_COMMANDS allowlist
prompts.py - Prompt template loading with project-specific fallback
progress.py - Progress tracking, database queries, webhook notifications
registry.py - Project registry for mapping names to paths (cross-platform)
parallel_orchestrator.py - Concurrent agent execution with dependency-aware scheduling
api/dependency_resolver.py - Cycle detection (Kahn's algorithm + DFS) and dependency validation
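
The cycle detection mentioned for api/dependency_resolver.py can be illustrated with a minimal sketch of Kahn's algorithm. The function name has_cycle and the shape of the dependencies mapping are assumptions made for this example, not the module's actual API, and the DFS half is not shown.

```python
from collections import deque


def has_cycle(dependencies: dict[int, list[int]]) -> bool:
    """Return True if the dependency graph contains a cycle (Kahn's algorithm).

    `dependencies` maps a feature id to the ids it depends on (hypothetical shape).
    """
    # Collect every node, including ids that only appear as a dependency.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # in_degree[n] counts the dependencies of n that are not yet processed.
    in_degree = {n: 0 for n in nodes}
    dependents: dict[int, list[int]] = {n: [] for n in nodes}
    for feature, deps in dependencies.items():
        for dep in set(deps):
            in_degree[feature] += 1
            dependents[dep].append(feature)

    # Repeatedly remove nodes that have no unprocessed dependencies.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in dependents[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    # Any node never removed is on a cycle or depends on one.
    return visited != len(nodes)
```

A check like this would typically run before a new dependency edge is saved, so that an edge closing a loop can be rejected.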

Project Registry

Projects can be stored in any directory. The registry maps project names to paths using SQLite:

All platforms: ~/.autocoder/registry.db

The registry uses:

SQLite database with SQLAlchemy ORM
POSIX path format (forward slashes) for cross-platform compatibility
SQLite's built-in transaction handling for concurrency safety
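
One way to get the forward-slash format is sketched below. The helper name to_registry_path is hypothetical, and the registry may normalize paths differently.

```python
from pathlib import PureWindowsPath


def to_registry_path(raw: str) -> str:
    """Normalize a path string to forward slashes for storage (illustrative only)."""
    # PureWindowsPath parses both "\" and "/" separators without touching the
    # filesystem, so the same call works for Windows and POSIX style input.
    # Caveat: a POSIX file name that contains a literal backslash would be altered.
    return PureWindowsPath(raw).as_posix()
```

For example, to_registry_path(r"C:\Projects\my-app") gives "C:/Projects/my-app", and a path that already uses forward slashes is returned unchanged.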

Server API (server/)

The FastAPI server provides REST endpoints for the UI:

server/routers/projects.py - Project CRUD with registry integration
server/routers/features.py - Feature management
server/routers/agent.py - Agent control (start/stop/pause/resume)
server/routers/filesystem.py - Filesystem browser API with security controls
server/routers/spec_creation.py - WebSocket for interactive spec creation

Feature Management

Features are stored in SQLite (features.db) via SQLAlchemy. The agent interacts with features through an MCP server:

mcp_server/feature_mcp.py - MCP server exposing feature management tools
api/database.py - SQLAlchemy models (Feature table with priority, category, name, description, steps, passes, dependencies)

MCP tools available to the agent:

feature_get_stats - Progress statistics
feature_get_next - Get highest-priority pending feature (respects dependencies)
feature_claim_next - Atomically claim next available feature (for parallel mode)
feature_get_for_regression - Random passing features for regression testing
feature_mark_passing - Mark feature complete
feature_skip - Move feature to end of queue
feature_create_bulk - Initialize all features (used by initializer)
feature_add_dependency - Add dependency between features (with cycle detection)
feature_remove_dependency - Remove a dependency
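
The selection rule behind feature_get_next (highest-priority pending feature whose dependencies are all passing) might look roughly like the sketch below. The in-memory Feature dataclass is a stand-in for the SQLAlchemy model, and treating a lower priority number as higher priority is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Feature:
    # Simplified stand-in for the real model; only the fields needed here.
    id: int
    priority: int  # assumption: lower number means higher priority
    passes: bool = False
    dependencies: list[int] = field(default_factory=list)


def get_next(features: list[Feature]) -> Optional[Feature]:
    """Pick the highest-priority pending feature whose dependencies all pass."""
    passing = {f.id for f in features if f.passes}
    ready = [
        f
        for f in features
        if not f.passes and all(dep in passing for dep in f.dependencies)
    ]
    # min() with default=None returns None when nothing is ready.
    return min(ready, key=lambda f: f.priority, default=None)
```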

React UI (ui/)

Tech stack: React 18, TypeScript, TanStack Query, Tailwind CSS v4, Radix UI, dagre (graph layout)
src/App.tsx - Main app with project selection, kanban board, agent controls
src/hooks/useWebSocket.ts - Real-time updates via WebSocket (progress, agent status, logs, agent updates)
src/hooks/useProjects.ts - React Query hooks for API calls
src/lib/api.ts - REST API client
src/lib/types.ts - TypeScript type definitions

Key components:

AgentMissionControl.tsx - Dashboard showing active agents with mascots (Spark, Fizz, Octo, Hoot, Buzz)
DependencyGraph.tsx - Interactive node graph visualization with dagre layout
CelebrationOverlay.tsx - Confetti animation on feature completion
FolderBrowser.tsx - Server-side filesystem browser for project folder selection

Keyboard shortcuts (press ? for help):

D - Toggle debug panel
G - Toggle Kanban/Graph view
N - Add new feature
A - Toggle AI assistant
, - Open settings

Project Structure for Generated Apps

Projects can be stored in any directory (registered in ~/.autocoder/registry.db). Each project contains:

prompts/app_spec.txt - Application specification (XML format)
prompts/initializer_prompt.md - First session prompt
prompts/coding_prompt.md - Continuation session prompt
features.db - SQLite database with feature test cases
.agent.lock - Lock file to prevent multiple agent instances
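
A lock file such as .agent.lock is commonly implemented with an exclusive create, as in the sketch below. This is an illustration of the general technique, not the project's actual locking code; acquire_lock and release_lock are hypothetical names, and writing the PID into the file is an assumption.

```python
import os
from pathlib import Path

LOCK_NAME = ".agent.lock"


def acquire_lock(project_dir: Path) -> bool:
    """Try to create the lock file; return False if another instance holds it."""
    lock_path = project_dir / LOCK_NAME
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so checking for
        # the lock and creating it happen in one step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(project_dir: Path) -> None:
    """Remove the lock file if it is present."""
    (project_dir / LOCK_NAME).unlink(missing_ok=True)
```

A stale lock left behind by a crashed process is not handled here.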

Security Model

Defense-in-depth approach configured in client.py:

OS-level sandbox for bash commands
Filesystem restricted to project directory only
Bash commands validated against ALLOWED_COMMANDS in security.py
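
The allowlist check can be pictured with the deliberately conservative sketch below. The contents of ALLOWED_COMMANDS here are made up for illustration, and is_allowed and command_names are hypothetical names; the real rules live in security.py and may be stricter or parse commands differently.

```python
import shlex

# Hypothetical allowlist for illustration; the real set is defined in security.py.
ALLOWED_COMMANDS = {"ls", "cat", "git", "npm"}
_CONTROL_CHARS = set("();|&")


def command_names(command: str) -> list[str]:
    """Return the first word of every command in a pipeline or command list."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    names = []
    expect_command = True
    for token in lexer:
        if set(token) <= _CONTROL_CHARS:
            # Operators such as && | ; start a new command.
            expect_command = True
        elif expect_command:
            names.append(token)
            expect_command = False
    return names


def is_allowed(command: str) -> bool:
    """Allow a command line only if every command in it is on the allowlist."""
    # Reject command substitution outright instead of trying to parse it.
    if "`" in command or "$(" in command:
        return False
    try:
        names = command_names(command)
    except ValueError:  # for example, an unterminated quote
        return False
    return bool(names) and all(name in ALLOWED_COMMANDS for name in names)
```

The sketch errs on the side of rejecting: anything it cannot parse confidently, including an environment-variable prefix or a path-qualified binary, is refused.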

Claude Code Integration

.claude/commands/create-spec.md - /create-spec slash command for interactive spec creation
.claude/skills/frontend-design/SKILL.md - Skill for distinctive UI design
.claude/templates/ - Prompt templates copied to new projects

Key Patterns

Prompt Loading Fallback Chain

Project-specific: {project_dir}/prompts/{name}.md
Base template: .claude/templates/{name}.template.md
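
The two-step fallback above could be sketched as follows. The function name resolve_prompt and its parameters are assumptions, and the actual loader in prompts.py may differ.

```python
from pathlib import Path


def resolve_prompt(name: str, project_dir: Path, templates_dir: Path) -> Path:
    """Return the first existing prompt file, preferring the project-specific one."""
    candidates = [
        project_dir / "prompts" / f"{name}.md",      # project-specific override
        templates_dir / f"{name}.template.md",       # base template
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No prompt found for {name!r}; tried {candidates}")
```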

Agent Session Flow

Check if features.db has features (determines initializer vs coding agent)
Create ClaudeSDKClient with security settings
Send prompt and stream response
Auto-continue with 3-second delay between sessions
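
The first step of the flow (deciding between the initializer and the coding agent) might be approximated as below. The table name "features" and the helper name needs_initializer are assumptions; the real check goes through the SQLAlchemy model rather than raw SQL.

```python
import sqlite3
from pathlib import Path


def needs_initializer(db_path: Path, table: str = "features") -> bool:
    """Return True when the database is missing, has no table, or has no rows."""
    if not table.isidentifier():
        raise ValueError(f"Unexpected table name: {table!r}")
    if not db_path.exists():
        return True
    conn = sqlite3.connect(db_path)
    try:
        found = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if found is None:
            return True
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return count == 0
    finally:
        conn.close()
```

When this returns True the initializer prompt would be used; otherwise the session continues with the coding prompt.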

Real-time UI Updates

The UI receives updates via WebSocket (/ws/projects/{project_name}):

progress - Test pass counts (passing, in_progress, total)
agent_status - Running/paused/stopped/crashed
log - Agent output lines with optional featureId/agentIndex for attribution
feature_update - Feature status changes
agent_update - Multi-agent state updates (thinking/working/testing/success/error) with mascot names

Parallel Mode

When running with --parallel, the orchestrator:

Spawns multiple Claude agents as subprocesses (up to --max-concurrency)
Each agent claims features atomically via feature_claim_next
Features blocked by unmet dependencies are skipped
Browser contexts are isolated per agent using the --isolated flag
AgentTracker parses agent output and emits agent_update messages for the UI
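
An atomic claim of the kind feature_claim_next provides can be sketched with a guarded UPDATE, as below. The schema (features table with priority, passes, in_progress columns), the retry default, and the function name claim_next are assumptions, and dependency checks are left out for brevity.

```python
import sqlite3
from typing import Optional


def claim_next(conn: sqlite3.Connection, max_retries: int = 5) -> Optional[int]:
    """Claim the next pending feature, or return None if nothing is available."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT id FROM features "
            "WHERE passes = 0 AND in_progress = 0 "
            "ORDER BY priority, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause re-checks in_progress, so only one agent can win
        # the claim even if several selected the same row.
        cursor = conn.execute(
            "UPDATE features SET in_progress = 1 WHERE id = ? AND in_progress = 0",
            (row[0],),
        )
        conn.commit()
        if cursor.rowcount == 1:
            return row[0]
        # Another agent claimed it first; loop and try the next candidate.
    return None
```

The loop is iterative rather than recursive, so a burst of lost races cannot grow the call stack.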

Process Limits (Parallel Mode)

The orchestrator enforces strict bounds on concurrent processes:

MAX_PARALLEL_AGENTS = 5 - Maximum concurrent coding agents
MAX_TOTAL_AGENTS = 10 - Hard limit on total agents (coding + testing)
Testing agents are capped at max_concurrency (same as coding agents)

Expected process count during normal operation:

1 orchestrator process
Up to 5 coding agents
Up to 5 testing agents
Total: never exceeds 11 Python processes (1 orchestrator + MAX_TOTAL_AGENTS = 10 agents)
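
The arithmetic behind these bounds can be written out as a small sketch. The two constants come from the list above; can_spawn and worst_case_processes are hypothetical helpers, not the orchestrator's actual functions.

```python
MAX_PARALLEL_AGENTS = 5
MAX_TOTAL_AGENTS = 10


def can_spawn(kind: str, coding: int, testing: int, max_concurrency: int) -> bool:
    """Check both the per-kind cap and the hard total cap before spawning."""
    per_kind_cap = min(max_concurrency, MAX_PARALLEL_AGENTS)
    if coding + testing >= MAX_TOTAL_AGENTS:
        return False
    running = coding if kind == "coding" else testing
    return running < per_kind_cap


def worst_case_processes(max_concurrency: int) -> int:
    """Orchestrator plus the most agents that can run at once."""
    per_kind_cap = min(max_concurrency, MAX_PARALLEL_AGENTS)
    agents = min(2 * per_kind_cap, MAX_TOTAL_AGENTS)
    return 1 + agents
```

With --max-concurrency 5 this gives 1 + 5 + 5 = 11, which is the bound used in the stress test below.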

Stress Test Verification:

# Windows - verify process bounds
# 1. Note baseline count
tasklist | findstr python | find /c /v ""

# 2. Start parallel agent (max concurrency)
python autonomous_agent_demo.py --project-dir test --parallel --max-concurrency 5

# 3. During run - should NEVER exceed baseline + 11
tasklist | findstr python | find /c /v ""

# 4. After stop via UI - should return to baseline
tasklist | findstr python | find /c /v ""

# macOS/Linux - verify process bounds
# (pgrep piped to wc -l is used because the macOS pgrep has no -c count option)
# 1. Note baseline count
pgrep python | wc -l

# 2. Start parallel agent
python autonomous_agent_demo.py --project-dir test --parallel --max-concurrency 5

# 3. During run - should NEVER exceed baseline + 11
pgrep python | wc -l

# 4. After stop - should return to baseline
pgrep python | wc -l

Log Verification:

# Check spawn vs completion balance
grep "Started testing agent" orchestrator_debug.log | wc -l
grep -E "Testing agent.*(completed|failed)" orchestrator_debug.log | wc -l

# Watch for cap enforcement messages
grep "at max testing agents\|At max total agents" orchestrator_debug.log
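
The same spawn versus completion check can be done in Python, for example on platforms without grep. This is a sketch: the patterns reuse the phrases from the commands above, the alternation is grouped so that only testing-agent lines count as finished, and spawn_balance is a hypothetical helper.

```python
import re
from typing import Iterable

STARTED = re.compile(r"Started testing agent")
FINISHED = re.compile(r"Testing agent.*(completed|failed)")


def spawn_balance(lines: Iterable[str]) -> tuple[int, int]:
    """Return (started, finished) counts for testing agents in the log lines."""
    started = 0
    finished = 0
    for line in lines:
        if STARTED.search(line):
            started += 1
        elif FINISHED.search(line):
            finished += 1
    return started, finished
```

To run it against a real log, pass an open file object, for example spawn_balance(open("orchestrator_debug.log", encoding="utf-8")). While agents are still running, started can be larger than finished; after a clean stop the two counts should match.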

Design System

The UI uses a neobrutalism design with Tailwind CSS v4:

CSS variables defined in ui/src/styles/globals.css via @theme directive
Custom animations: animate-slide-in, animate-pulse-neo, animate-shimmer
Color tokens: --color-neo-pending (yellow), --color-neo-progress (cyan), --color-neo-done (green)
