Files
autocoder/CLAUDE.md
2026-02-01 17:34:07 +02:00

20 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Prerequisites

  • Python 3.11+
  • Node.js 20+ (for UI development)
  • Claude Code CLI

Project Overview

This is an autonomous coding agent system with a React-based UI. It uses the Claude Agent SDK to build complete applications over multiple sessions using a two-agent pattern:

  1. Initializer Agent - First session reads an app spec and creates features in a SQLite database
  2. Coding Agent - Subsequent sessions implement features one by one, marking them as passing

Commands

# Windows - launches CLI menu
start.bat

# macOS/Linux
./start.sh

# Launch Web UI (serves pre-built React app)
start_ui.bat      # Windows
./start_ui.sh     # macOS/Linux

Python Backend (Manual)

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run the main CLI launcher
python start.py

# Run agent directly for a project (use absolute path or registered name)
python autonomous_agent_demo.py --project-dir C:/Projects/my-app
python autonomous_agent_demo.py --project-dir my-app  # if registered

# YOLO mode: rapid prototyping without browser testing
python autonomous_agent_demo.py --project-dir my-app --yolo

# Parallel mode: run multiple agents concurrently (1-5 agents)
python autonomous_agent_demo.py --project-dir my-app --parallel --max-concurrency 3

# Batch mode: implement multiple features per agent session (1-3)
python autonomous_agent_demo.py --project-dir my-app --batch-size 3

# Batch specific features by ID
python autonomous_agent_demo.py --project-dir my-app --batch-features 1,2,3

YOLO Mode (Rapid Prototyping)

YOLO mode skips all testing for faster feature iteration:

# CLI
python autonomous_agent_demo.py --project-dir my-app --yolo

# UI: Toggle the lightning bolt button before starting the agent

What's different in YOLO mode:

  • No regression testing
  • No Playwright MCP server (browser automation disabled)
  • Features marked passing after lint/type-check succeeds
  • Faster iteration for prototyping

What's the same:

  • Lint and type-check still run to verify code compiles
  • Feature MCP server for tracking progress
  • All other development tools available

When to use: Early prototyping when you want to quickly scaffold features without verification overhead. Switch back to standard mode for production-quality development.

React UI (in ui/ directory)

cd ui
npm install
npm run dev      # Development server (hot reload)
npm run build    # Production build (required for start_ui.bat)
npm run lint     # Run ESLint

Note: The start_ui.bat script serves the pre-built UI from ui/dist/. After making UI changes, run npm run build in the ui/ directory.

Testing

Python

ruff check .                          # Lint
mypy .                                # Type check
python test_security.py               # Security unit tests (12 tests)
python test_security_integration.py   # Integration tests (9 tests)
python -m pytest test_client.py       # Client tests (20 tests)
python -m pytest test_dependency_resolver.py  # Dependency resolver tests (12 tests)
python -m pytest test_rate_limit_utils.py     # Rate limit tests (22 tests)

React UI

cd ui
npm run lint          # ESLint
npm run build         # Type check + build (Vite 7)
npm run test:e2e      # Playwright end-to-end tests
npm run test:e2e:ui   # Playwright tests with UI

CI/CD

GitHub Actions (.github/workflows/ci.yml) runs on push/PR to master:

  • Python job: ruff lint + security tests
  • UI job: ESLint + TypeScript build

Code Quality

Configuration in pyproject.toml:

  • ruff: Line length 120, Python 3.11 target
  • mypy: Strict return type checking, ignores missing imports

Architecture

Core Python Modules

  • start.py - CLI launcher with project creation/selection menu
  • autonomous_agent_demo.py - Entry point for running the agent (supports --yolo, --parallel, --batch-size, --batch-features)
  • autocoder_paths.py - Central path resolution with dual-path backward compatibility and migration
  • agent.py - Agent session loop using Claude Agent SDK
  • client.py - ClaudeSDKClient configuration with security hooks, MCP servers, and Vertex AI support
  • security.py - Bash command allowlist validation (ALLOWED_COMMANDS whitelist)
  • prompts.py - Prompt template loading with project-specific fallback and batch feature prompts
  • progress.py - Progress tracking, database queries, webhook notifications
  • registry.py - Project registry for mapping names to paths (cross-platform), global settings model
  • parallel_orchestrator.py - Concurrent agent execution with dependency-aware scheduling
  • auth.py - Authentication error detection for Claude CLI
  • env_constants.py - Shared environment variable constants (API_ENV_VARS) used by client.py and chat sessions
  • rate_limit_utils.py - Rate limit detection, retry parsing, exponential backoff with jitter
  • api/database.py - SQLAlchemy models (Feature, Schedule, ScheduleOverride)
  • api/dependency_resolver.py - Cycle detection (Kahn's algorithm + DFS) and dependency validation
  • api/migration.py - JSON-to-SQLite migration utility

Project Registry

Projects can be stored in any directory. The registry maps project names to paths using SQLite:

  • All platforms: ~/.autocoder/registry.db

The registry uses:

  • SQLite database with SQLAlchemy ORM
  • POSIX path format (forward slashes) for cross-platform compatibility
  • SQLite's built-in transaction handling for concurrency safety

Server API (server/)

The FastAPI server provides REST and WebSocket endpoints for the UI:

Routers (server/routers/):

  • projects.py - Project CRUD with registry integration
  • features.py - Feature management
  • agent.py - Agent control (start/stop/pause/resume)
  • filesystem.py - Filesystem browser API with security controls
  • spec_creation.py - WebSocket for interactive spec creation
  • expand_project.py - Interactive project expansion via natural language
  • assistant_chat.py - Read-only project assistant chat (WebSocket/REST)
  • terminal.py - Interactive terminal I/O with PTY support (WebSocket bidirectional)
  • devserver.py - Dev server control (start/stop) and config
  • schedules.py - CRUD for time-based agent scheduling
  • settings.py - Global settings management (model selection, YOLO, batch size, headless browser)

Services (server/services/):

  • process_manager.py - Agent process lifecycle management
  • project_config.py - Project type detection and dev command management
  • terminal_manager.py - Terminal session management with PTY (pywinpty on Windows)
  • scheduler_service.py - APScheduler-based automated agent scheduling
  • dev_server_manager.py - Dev server lifecycle management
  • assistant_chat_session.py / assistant_database.py - Assistant chat sessions with SQLite persistence
  • spec_chat_session.py - Spec creation chat sessions
  • expand_chat_session.py - Expand project chat sessions
  • chat_constants.py - Shared constants for chat services

Utilities (server/utils/):

  • process_utils.py - Process management utilities
  • project_helpers.py - Project path resolution helpers
  • validation.py - Project name validation

Feature Management

Features are stored in SQLite (features.db) via SQLAlchemy. The agent interacts with features through an MCP server:

  • mcp_server/feature_mcp.py - MCP server exposing feature management tools
  • api/database.py - SQLAlchemy models (Feature table with priority, category, name, description, steps, passes, dependencies)

MCP tools available to the agent:

  • feature_get_stats - Progress statistics
  • feature_get_by_id - Get a single feature by ID
  • feature_get_summary - Get summary of all features
  • feature_get_ready - Get features ready to work on (dependencies met)
  • feature_get_blocked - Get features blocked by unmet dependencies
  • feature_get_graph - Get full dependency graph
  • feature_claim_and_get - Atomically claim next available feature (for parallel mode)
  • feature_mark_in_progress - Mark feature as in progress
  • feature_mark_passing - Mark feature complete
  • feature_mark_failing - Mark feature as failing
  • feature_skip - Move feature to end of queue
  • feature_clear_in_progress - Clear in-progress status
  • feature_create_bulk - Initialize all features (used by initializer)
  • feature_create - Create a single feature
  • feature_add_dependency - Add dependency between features (with cycle detection)
  • feature_remove_dependency - Remove a dependency
  • feature_set_dependencies - Set all dependencies for a feature at once

React UI (ui/)

  • Tech stack: React 19, TypeScript, Vite 7, TanStack Query, Tailwind CSS v4, Radix UI, dagre (graph layout), xterm.js (terminal)
  • src/App.tsx - Main app with project selection, kanban board, agent controls
  • src/hooks/useWebSocket.ts - Real-time updates via WebSocket (progress, agent status, logs, agent updates)
  • src/hooks/useProjects.ts - React Query hooks for API calls
  • src/lib/api.ts - REST API client
  • src/lib/types.ts - TypeScript type definitions

Key components:

  • AgentMissionControl.tsx - Dashboard showing active agents with mascots (Spark, Fizz, Octo, Hoot, Buzz)
  • DependencyGraph.tsx - Interactive node graph visualization with dagre layout
  • CelebrationOverlay.tsx - Confetti animation on feature completion
  • FolderBrowser.tsx - Server-side filesystem browser for project folder selection
  • Terminal.tsx / TerminalTabs.tsx - xterm.js-based multi-tab terminal
  • AssistantPanel.tsx / AssistantChat.tsx - AI assistant for project Q&A
  • ExpandProjectModal.tsx / ExpandProjectChat.tsx - Add features via natural language
  • DevServerControl.tsx - Dev server start/stop control
  • ScheduleModal.tsx - Schedule management UI
  • SettingsModal.tsx - Global settings panel

Keyboard shortcuts (press ? for help):

  • D - Toggle debug panel
  • G - Toggle Kanban/Graph view
  • N - Add new feature
  • A - Toggle AI assistant
  • , - Open settings

Project Structure for Generated Apps

Projects can be stored in any directory (registered in ~/.autocoder/registry.db). Each project contains:

  • .autocoder/prompts/app_spec.txt - Application specification (XML format)
  • .autocoder/prompts/initializer_prompt.md - First session prompt
  • .autocoder/prompts/coding_prompt.md - Continuation session prompt
  • .autocoder/features.db - SQLite database with feature test cases
  • .autocoder/.agent.lock - Lock file to prevent multiple agent instances
  • .autocoder/allowed_commands.yaml - Project-specific bash command allowlist (optional)
  • .autocoder/.gitignore - Ignores runtime files
  • CLAUDE.md - Stays at project root (SDK convention)
  • app_spec.txt - Root copy for agent template compatibility

Legacy projects with files at root level (e.g., features.db, prompts/) are auto-migrated to .autocoder/ on next agent start. Dual-path resolution ensures old and new layouts work transparently.

Security Model

Defense-in-depth approach configured in client.py:

  1. OS-level sandbox for bash commands
  2. Filesystem restricted to project directory only
  3. Bash commands validated using hierarchical allowlist system

Extra Read Paths (Cross-Project File Access)

The agent can optionally read files from directories outside the project folder via the EXTRA_READ_PATHS environment variable. This enables referencing documentation, shared libraries, or other projects.

Configuration:

# Single path
EXTRA_READ_PATHS=/Users/me/docs

# Multiple paths (comma-separated)
EXTRA_READ_PATHS=/Users/me/docs,/opt/shared-libs,/Volumes/Data/reference

Security Controls:

All paths are validated before being granted read access:

  • Must be absolute paths (not relative)
  • Must exist and be directories
  • Paths are canonicalized via Path.resolve() to prevent .. traversal attacks
  • Sensitive directories are blocked (see blocklist below)
  • Only Read, Glob, and Grep operations are allowed (no Write/Edit)

Blocked Sensitive Directories:

The following directories (relative to home) are always blocked:

  • .ssh, .aws, .azure, .kube - Cloud/SSH credentials
  • .gnupg, .gpg, .password-store - Encryption keys
  • .docker, .config/gcloud - Container/cloud configs
  • .npmrc, .pypirc, .netrc - Package manager credentials

Per-Project Allowed Commands

The agent's bash command access is controlled through a hierarchical configuration system:

Command Hierarchy (highest to lowest priority):

  1. Hardcoded Blocklist (security.py) - NEVER allowed (dd, sudo, shutdown, etc.)
  2. Org Blocklist (~/.autocoder/config.yaml) - Cannot be overridden by projects
  3. Org Allowlist (~/.autocoder/config.yaml) - Available to all projects
  4. Global Allowlist (security.py) - Default commands (npm, git, curl, etc.)
  5. Project Allowlist (.autocoder/allowed_commands.yaml) - Project-specific commands

Project Configuration:

Each project can define custom allowed commands in .autocoder/allowed_commands.yaml:

version: 1
commands:
  # Exact command names
  - name: swift
    description: Swift compiler

  # Prefix wildcards (matches swiftc, swiftlint, swiftformat)
  - name: swift*
    description: All Swift development tools

  # Local project scripts
  - name: ./scripts/build.sh
    description: Project build script

Organization Configuration:

System administrators can set org-wide policies in ~/.autocoder/config.yaml:

version: 1

# Commands available to ALL projects
allowed_commands:
  - name: jq
    description: JSON processor

# Commands blocked across ALL projects (cannot be overridden)
blocked_commands:
  - aws        # Prevent accidental cloud operations
  - kubectl    # Block production deployments

Pattern Matching:

  • Exact: swift matches only swift
  • Wildcard: swift* matches swift, swiftc, swiftlint, etc.
  • Scripts: ./scripts/build.sh matches the script by name from any directory

Limits:

  • Maximum 100 commands per project config
  • Blocklisted commands (sudo, dd, shutdown, etc.) can NEVER be allowed
  • Org-level blocked commands cannot be overridden by project configs

Files:

  • security.py - Command validation logic and hardcoded blocklist
  • test_security.py - Unit tests for security system
  • test_security_integration.py - Integration tests with real hooks
  • examples/project_allowed_commands.yaml - Project config example (all commented by default)
  • examples/org_config.yaml - Org config example (all commented by default)
  • examples/README.md - Comprehensive guide with use cases, testing, and troubleshooting

Vertex AI Configuration (Optional)

Run coding agents via Google Cloud Vertex AI:

  1. Install and authenticate gcloud CLI: gcloud auth application-default login
  2. Configure .env:
    CLAUDE_CODE_USE_VERTEX=1
    CLOUD_ML_REGION=us-east5
    ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
    ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-5@20251101
    ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-5@20250929
    ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-3-5-haiku@20241022
    

Note: Use @ instead of - in model names for Vertex AI.

Ollama Local Models (Optional)

Run coding agents using local models via Ollama v0.14.0+:

  1. Install Ollama: https://ollama.com
  2. Start Ollama: ollama serve
  3. Pull a coding model: ollama pull qwen3-coder
  4. Configure .env:
    ANTHROPIC_BASE_URL=http://localhost:11434
    ANTHROPIC_AUTH_TOKEN=ollama
    API_TIMEOUT_MS=3000000
    ANTHROPIC_DEFAULT_SONNET_MODEL=qwen3-coder
    ANTHROPIC_DEFAULT_OPUS_MODEL=qwen3-coder
    ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen3-coder
    
  5. Run autocoder normally - it will use your local Ollama models

Recommended coding models:

  • qwen3-coder - Good balance of speed and capability
  • deepseek-coder-v2 - Strong coding performance
  • codellama - Meta's code-focused model

Model tier mapping:

  • Use the same model for all tiers, or map different models per capability level
  • Larger models (70B+) work best for Opus tier
  • Smaller models (7B-20B) work well for Haiku tier

Known limitations:

  • Smaller context windows than Claude (model-dependent)
  • Extended context beta disabled (not supported by Ollama)
  • Performance depends on local hardware (GPU recommended)

Claude Code Integration

Slash commands (.claude/commands/):

  • /create-spec - Interactive spec creation for new projects
  • /expand-project - Expand existing project with new features
  • /gsd-to-autocoder-spec - Convert GSD codebase mapping to app_spec.txt
  • /check-code - Run lint and type-check for code quality
  • /checkpoint - Create comprehensive checkpoint commit
  • /review-pr - Review pull requests

Custom agents (.claude/agents/):

  • coder.md - Elite software architect agent for code implementation (Opus)
  • code-review.md - Code review agent for quality/security/performance analysis (Opus)
  • deep-dive.md - Technical investigator for deep analysis and debugging (Opus)

Skills (.claude/skills/):

  • frontend-design - Distinctive, production-grade UI design
  • gsd-to-autocoder-spec - Convert GSD codebase mapping to Autocoder app_spec format

Other:

  • .claude/templates/ - Prompt templates copied to new projects
  • examples/ - Configuration examples and documentation for security settings

Key Patterns

Prompt Loading Fallback Chain

  1. Project-specific: {project_dir}/.autocoder/prompts/{name}.md (or legacy {project_dir}/prompts/{name}.md)
  2. Base template: .claude/templates/{name}.template.md

Agent Session Flow

  1. Check if .autocoder/features.db has features (determines initializer vs coding agent)
  2. Create ClaudeSDKClient with security settings
  3. Send prompt and stream response
  4. Auto-continue with 3-second delay between sessions

Real-time UI Updates

The UI receives updates via WebSocket (/ws/projects/{project_name}):

  • progress - Test pass counts (passing, in_progress, total)
  • agent_status - Running/paused/stopped/crashed
  • log - Agent output lines with optional featureId/agentIndex for attribution
  • feature_update - Feature status changes
  • agent_update - Multi-agent state updates (thinking/working/testing/success/error) with mascot names

Parallel Mode

When running with --parallel, the orchestrator:

  1. Spawns multiple Claude agents as subprocesses (up to --max-concurrency)
  2. Each agent claims features atomically via feature_claim_and_get
  3. Features blocked by unmet dependencies are skipped
  4. Browser contexts are isolated per agent using --isolated flag
  5. AgentTracker parses output and emits agent_update messages for UI

Process Limits (Parallel Mode)

The orchestrator enforces strict bounds on concurrent processes:

  • MAX_PARALLEL_AGENTS = 5 - Maximum concurrent coding agents
  • MAX_TOTAL_AGENTS = 10 - Hard limit on total agents (coding + testing)
  • Testing agents are capped at max_concurrency (same as coding agents)
  • Total process count never exceeds 11 Python processes (1 orchestrator + 5 coding + 5 testing)

Multi-Feature Batching

Agents can implement multiple features per session using --batch-size (1-3, default: 3):

  • --batch-size N - Max features per coding agent batch
  • --testing-batch-size N - Features per testing batch (1-5, default: 3)
  • --batch-features 1,2,3 - Specific feature IDs for batch implementation
  • --testing-batch-features 1,2,3 - Specific feature IDs for batch regression testing
  • prompts.py provides get_batch_feature_prompt() for multi-feature prompt generation
  • Configurable in UI via settings panel

Design System

The UI uses a neobrutalism design with Tailwind CSS v4:

  • CSS variables defined in ui/src/styles/globals.css via @theme directive
  • Custom animations: animate-slide-in, animate-pulse-neo, animate-shimmer
  • Color tokens: --color-neo-pending (yellow), --color-neo-progress (cyan), --color-neo-done (green)