# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Prerequisites

- Python 3.11+
- Node.js 20+ (for UI development)
- Claude Code CLI

## Project Overview

This is an autonomous coding agent system with a React-based UI. It uses the Claude Agent SDK to build complete applications over multiple sessions using a two-agent pattern:

1. **Initializer Agent** - First session reads an app spec and creates features in a SQLite database
2. **Coding Agent** - Subsequent sessions implement features one by one, marking them as passing

## Commands

### Quick Start (Recommended)

```bash
# Windows - launches CLI menu
start.bat

# macOS/Linux
./start.sh

# Launch Web UI (serves pre-built React app)
start_ui.bat      # Windows
./start_ui.sh     # macOS/Linux
```

### Python Backend (Manual)

```bash
# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate      # Windows
source venv/bin/activate   # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run the main CLI launcher
python start.py

# Run agent directly for a project (use absolute path or registered name)
python autonomous_agent_demo.py --project-dir C:/Projects/my-app
python autonomous_agent_demo.py --project-dir my-app  # if registered

# YOLO mode: rapid prototyping without browser testing
python autonomous_agent_demo.py --project-dir my-app --yolo

# Parallel mode: run multiple agents concurrently (1-5 agents)
python autonomous_agent_demo.py --project-dir my-app --parallel --max-concurrency 3

# Batch mode: implement multiple features per agent session (1-3)
python autonomous_agent_demo.py --project-dir my-app --batch-size 3

# Batch specific features by ID
python autonomous_agent_demo.py --project-dir my-app --batch-features 1,2,3
```

### YOLO Mode (Rapid Prototyping)

YOLO mode skips regression and browser testing for faster feature iteration:

```bash
# CLI
python autonomous_agent_demo.py --project-dir my-app --yolo

# UI: Toggle the lightning bolt button before starting the agent
```

**What's different in YOLO mode:**
- No regression testing
- No Playwright MCP server (browser automation disabled)
- Features marked passing after lint/type-check succeeds
- Faster iteration for prototyping

**What's the same:**
- Lint and type-check still run to verify code compiles
- Feature MCP server for tracking progress
- All other development tools available

**When to use:** Early prototyping when you want to quickly scaffold features without verification overhead. Switch back to standard mode for production-quality development.

### React UI (in ui/ directory)

```bash
cd ui
npm install
npm run dev     # Development server (hot reload)
npm run build   # Production build (required for start_ui.bat)
npm run lint    # Run ESLint
```

**Note:** The `start_ui.bat` script serves the pre-built UI from `ui/dist/`. After making UI changes, run `npm run build` in the `ui/` directory.

## Testing

### Python

```bash
ruff check .                                  # Lint
mypy .                                        # Type check
python test_security.py                       # Security unit tests (12 tests)
python test_security_integration.py           # Integration tests (9 tests)
python -m pytest test_client.py               # Client tests (20 tests)
python -m pytest test_dependency_resolver.py  # Dependency resolver tests (12 tests)
python -m pytest test_rate_limit_utils.py     # Rate limit tests (22 tests)
```

### React UI

```bash
cd ui
npm run lint          # ESLint
npm run build         # Type check + build (Vite 7)
npm run test:e2e      # Playwright end-to-end tests
npm run test:e2e:ui   # Playwright tests with UI
```

### CI/CD

GitHub Actions (`.github/workflows/ci.yml`) runs on push/PR to master:
- **Python job**: ruff lint + security tests
- **UI job**: ESLint + TypeScript build

### Code Quality

Configuration in `pyproject.toml`:
- ruff: Line length 120, Python 3.11 target
- mypy: Strict return type checking, ignores missing imports

## Architecture

### Core Python Modules

- `start.py` - CLI launcher with project creation/selection menu
- `autonomous_agent_demo.py` - Entry point for running
the agent (supports `--yolo`, `--parallel`, `--batch-size`, `--batch-features`)
- `autocoder_paths.py` - Central path resolution with dual-path backward compatibility and migration
- `agent.py` - Agent session loop using Claude Agent SDK
- `client.py` - ClaudeSDKClient configuration with security hooks, MCP servers, and Vertex AI support
- `security.py` - Bash command allowlist validation (ALLOWED_COMMANDS whitelist)
- `prompts.py` - Prompt template loading with project-specific fallback and batch feature prompts
- `progress.py` - Progress tracking, database queries, webhook notifications
- `registry.py` - Project registry for mapping names to paths (cross-platform), global settings model
- `parallel_orchestrator.py` - Concurrent agent execution with dependency-aware scheduling
- `auth.py` - Authentication error detection for Claude CLI
- `env_constants.py` - Shared environment variable constants (API_ENV_VARS) used by client.py and chat sessions
- `rate_limit_utils.py` - Rate limit detection, retry parsing, exponential backoff with jitter
- `api/database.py` - SQLAlchemy models (Feature, Schedule, ScheduleOverride)
- `api/dependency_resolver.py` - Cycle detection (Kahn's algorithm + DFS) and dependency validation
- `api/migration.py` - JSON-to-SQLite migration utility

### Project Registry

Projects can be stored in any directory.
The registry maps project names to paths using SQLite:
- **All platforms**: `~/.autocoder/registry.db`

The registry uses:
- SQLite database with SQLAlchemy ORM
- POSIX path format (forward slashes) for cross-platform compatibility
- SQLite's built-in transaction handling for concurrency safety

### Server API (server/)

The FastAPI server provides REST and WebSocket endpoints for the UI:

**Routers** (`server/routers/`):
- `projects.py` - Project CRUD with registry integration
- `features.py` - Feature management
- `agent.py` - Agent control (start/stop/pause/resume)
- `filesystem.py` - Filesystem browser API with security controls
- `spec_creation.py` - WebSocket for interactive spec creation
- `expand_project.py` - Interactive project expansion via natural language
- `assistant_chat.py` - Read-only project assistant chat (WebSocket/REST)
- `terminal.py` - Interactive terminal I/O with PTY support (WebSocket bidirectional)
- `devserver.py` - Dev server control (start/stop) and config
- `schedules.py` - CRUD for time-based agent scheduling
- `settings.py` - Global settings management (model selection, YOLO, batch size, headless browser)

**Services** (`server/services/`):
- `process_manager.py` - Agent process lifecycle management
- `project_config.py` - Project type detection and dev command management
- `terminal_manager.py` - Terminal session management with PTY (`pywinpty` on Windows)
- `scheduler_service.py` - APScheduler-based automated agent scheduling
- `dev_server_manager.py` - Dev server lifecycle management
- `assistant_chat_session.py` / `assistant_database.py` - Assistant chat sessions with SQLite persistence
- `spec_chat_session.py` - Spec creation chat sessions
- `expand_chat_session.py` - Expand project chat sessions
- `chat_constants.py` - Shared constants for chat services

**Utilities** (`server/utils/`):
- `process_utils.py` - Process management utilities
- `project_helpers.py` - Project path resolution helpers
- `validation.py` - Project name
validation

### Feature Management

Features are stored in SQLite (`features.db`) via SQLAlchemy. The agent interacts with features through an MCP server:

- `mcp_server/feature_mcp.py` - MCP server exposing feature management tools
- `api/database.py` - SQLAlchemy models (Feature table with priority, category, name, description, steps, passes, dependencies)

MCP tools available to the agent:
- `feature_get_stats` - Progress statistics
- `feature_get_by_id` - Get a single feature by ID
- `feature_get_summary` - Get summary of all features
- `feature_get_ready` - Get features ready to work on (dependencies met)
- `feature_get_blocked` - Get features blocked by unmet dependencies
- `feature_get_graph` - Get full dependency graph
- `feature_claim_and_get` - Atomically claim next available feature (for parallel mode)
- `feature_mark_in_progress` - Mark feature as in progress
- `feature_mark_passing` - Mark feature complete
- `feature_mark_failing` - Mark feature as failing
- `feature_skip` - Move feature to end of queue
- `feature_clear_in_progress` - Clear in-progress status
- `feature_create_bulk` - Initialize all features (used by initializer)
- `feature_create` - Create a single feature
- `feature_add_dependency` - Add dependency between features (with cycle detection)
- `feature_remove_dependency` - Remove a dependency
- `feature_set_dependencies` - Set all dependencies for a feature at once

### React UI (ui/)

- Tech stack: React 19, TypeScript, Vite 7, TanStack Query, Tailwind CSS v4, Radix UI, dagre (graph layout), xterm.js (terminal)
- `src/App.tsx` - Main app with project selection, kanban board, agent controls
- `src/hooks/useWebSocket.ts` - Real-time updates via WebSocket (progress, agent status, logs, agent updates)
- `src/hooks/useProjects.ts` - React Query hooks for API calls
- `src/lib/api.ts` - REST API client
- `src/lib/types.ts` - TypeScript type definitions

Key components:
- `AgentMissionControl.tsx` - Dashboard showing active agents with mascots (Spark,
Fizz, Octo, Hoot, Buzz)
- `DependencyGraph.tsx` - Interactive node graph visualization with dagre layout
- `CelebrationOverlay.tsx` - Confetti animation on feature completion
- `FolderBrowser.tsx` - Server-side filesystem browser for project folder selection
- `Terminal.tsx` / `TerminalTabs.tsx` - xterm.js-based multi-tab terminal
- `AssistantPanel.tsx` / `AssistantChat.tsx` - AI assistant for project Q&A
- `ExpandProjectModal.tsx` / `ExpandProjectChat.tsx` - Add features via natural language
- `DevServerControl.tsx` - Dev server start/stop control
- `ScheduleModal.tsx` - Schedule management UI
- `SettingsModal.tsx` - Global settings panel

Keyboard shortcuts (press `?` for help):
- `D` - Toggle debug panel
- `G` - Toggle Kanban/Graph view
- `N` - Add new feature
- `A` - Toggle AI assistant
- `,` - Open settings

### Project Structure for Generated Apps

Projects can be stored in any directory (registered in `~/.autocoder/registry.db`). Each project contains:
- `.autocoder/prompts/app_spec.txt` - Application specification (XML format)
- `.autocoder/prompts/initializer_prompt.md` - First session prompt
- `.autocoder/prompts/coding_prompt.md` - Continuation session prompt
- `.autocoder/features.db` - SQLite database with feature test cases
- `.autocoder/.agent.lock` - Lock file to prevent multiple agent instances
- `.autocoder/allowed_commands.yaml` - Project-specific bash command allowlist (optional)
- `.autocoder/.gitignore` - Ignores runtime files
- `CLAUDE.md` - Stays at project root (SDK convention)
- `app_spec.txt` - Root copy for agent template compatibility

Legacy projects with files at root level (e.g., `features.db`, `prompts/`) are auto-migrated to `.autocoder/` on next agent start. Dual-path resolution ensures old and new layouts work transparently.

### Security Model

Defense-in-depth approach configured in `client.py`:
1. OS-level sandbox for bash commands
2. Filesystem restricted to project directory only
3. Bash commands validated using a hierarchical allowlist system

#### Extra Read Paths (Cross-Project File Access)

The agent can optionally read files from directories outside the project folder via the `EXTRA_READ_PATHS` environment variable. This enables referencing documentation, shared libraries, or other projects.

**Configuration:**

```bash
# Single path
EXTRA_READ_PATHS=/Users/me/docs

# Multiple paths (comma-separated)
EXTRA_READ_PATHS=/Users/me/docs,/opt/shared-libs,/Volumes/Data/reference
```

**Security Controls:**

All paths are validated before being granted read access:
- Must be absolute paths (not relative)
- Must exist and be directories
- Paths are canonicalized via `Path.resolve()` to prevent `..` traversal attacks
- Sensitive directories are blocked (see blocklist below)
- Only Read, Glob, and Grep operations are allowed (no Write/Edit)

**Blocked Sensitive Directories:**

The following directories (relative to home) are always blocked:
- `.ssh`, `.aws`, `.azure`, `.kube` - Cloud/SSH credentials
- `.gnupg`, `.gpg`, `.password-store` - Encryption keys
- `.docker`, `.config/gcloud` - Container/cloud configs
- `.npmrc`, `.pypirc`, `.netrc` - Package manager credentials

#### Per-Project Allowed Commands

The agent's bash command access is controlled through a hierarchical configuration system:

**Command Hierarchy (highest to lowest priority):**
1. **Hardcoded Blocklist** (`security.py`) - NEVER allowed (dd, sudo, shutdown, etc.)
2. **Org Blocklist** (`~/.autocoder/config.yaml`) - Cannot be overridden by projects
3. **Org Allowlist** (`~/.autocoder/config.yaml`) - Available to all projects
4. **Global Allowlist** (`security.py`) - Default commands (npm, git, curl, etc.)
5. **Project Allowlist** (`.autocoder/allowed_commands.yaml`) - Project-specific commands

**Project Configuration:**

Each project can define custom allowed commands in `.autocoder/allowed_commands.yaml`:

```yaml
version: 1
commands:
  # Exact command names
  - name: swift
    description: Swift compiler

  # Prefix wildcards (matches swiftc, swiftlint, swiftformat)
  - name: swift*
    description: All Swift development tools

  # Local project scripts
  - name: ./scripts/build.sh
    description: Project build script
```

**Organization Configuration:**

System administrators can set org-wide policies in `~/.autocoder/config.yaml`:

```yaml
version: 1

# Commands available to ALL projects
allowed_commands:
  - name: jq
    description: JSON processor

# Commands blocked across ALL projects (cannot be overridden)
blocked_commands:
  - aws      # Prevent accidental cloud operations
  - kubectl  # Block production deployments
```

**Pattern Matching:**
- Exact: `swift` matches only `swift`
- Wildcard: `swift*` matches `swift`, `swiftc`, `swiftlint`, etc.
- Scripts: `./scripts/build.sh` matches the script by name from any directory

**Limits:**
- Maximum 100 commands per project config
- Blocklisted commands (sudo, dd, shutdown, etc.) can NEVER be allowed
- Org-level blocked commands cannot be overridden by project configs

**Files:**
- `security.py` - Command validation logic and hardcoded blocklist
- `test_security.py` - Unit tests for security system
- `test_security_integration.py` - Integration tests with real hooks
- `examples/project_allowed_commands.yaml` - Project config example (all commented by default)
- `examples/org_config.yaml` - Org config example (all commented by default)
- `examples/README.md` - Comprehensive guide with use cases, testing, and troubleshooting

### Vertex AI Configuration (Optional)

Run coding agents via Google Cloud Vertex AI:

1. Install and authenticate gcloud CLI: `gcloud auth application-default login`
2. Configure `.env`:
   ```
   CLAUDE_CODE_USE_VERTEX=1
   CLOUD_ML_REGION=us-east5
   ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
   ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-5@20251101
   ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-5@20250929
   ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-3-5-haiku@20241022
   ```

**Note:** Vertex AI model names use `@` instead of `-` before the date suffix (e.g., `claude-opus-4-5@20251101`).

### Ollama Local Models (Optional)

Run coding agents using local models via Ollama v0.14.0+:

1. Install Ollama: https://ollama.com
2. Start Ollama: `ollama serve`
3. Pull a coding model: `ollama pull qwen3-coder`
4. Configure `.env`:
   ```
   ANTHROPIC_BASE_URL=http://localhost:11434
   ANTHROPIC_AUTH_TOKEN=ollama
   API_TIMEOUT_MS=3000000
   ANTHROPIC_DEFAULT_SONNET_MODEL=qwen3-coder
   ANTHROPIC_DEFAULT_OPUS_MODEL=qwen3-coder
   ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen3-coder
   ```
5. Run autocoder normally - it will use your local Ollama models

**Recommended coding models:**
- `qwen3-coder` - Good balance of speed and capability
- `deepseek-coder-v2` - Strong coding performance
- `codellama` - Meta's code-focused model

**Model tier mapping:**
- Use the same model for all tiers, or map different models per capability level
- Larger models (70B+) work best for Opus tier
- Smaller models (7B-20B) work well for Haiku tier

**Known limitations:**
- Smaller context windows than Claude (model-dependent)
- Extended context beta disabled (not supported by Ollama)
- Performance depends on local hardware (GPU recommended)

## Claude Code Integration

**Slash commands** (`.claude/commands/`):
- `/create-spec` - Interactive spec creation for new projects
- `/expand-project` - Expand existing project with new features
- `/gsd-to-autocoder-spec` - Convert GSD codebase mapping to app_spec.txt
- `/check-code` - Run lint and type-check for code quality
- `/checkpoint` - Create comprehensive checkpoint commit
- `/review-pr` - Review pull requests

**Custom agents** (`.claude/agents/`):
- `coder.md` - Elite software architect agent for code implementation (Opus)
- `code-review.md` - Code review agent for quality/security/performance analysis (Opus)
- `deep-dive.md` - Technical investigator for deep analysis and debugging (Opus)

**Skills** (`.claude/skills/`):
- `frontend-design` - Distinctive, production-grade UI design
- `gsd-to-autocoder-spec` - Convert GSD codebase mapping to Autocoder app_spec format

**Other:**
- `.claude/templates/` - Prompt templates copied to new projects
- `examples/` - Configuration examples and documentation for security settings

## Key Patterns

### Prompt Loading Fallback Chain

1. Project-specific: `{project_dir}/.autocoder/prompts/{name}.md` (or legacy `{project_dir}/prompts/{name}.md`)
2. Base template: `.claude/templates/{name}.template.md`

### Agent Session Flow

1. Check if `.autocoder/features.db` has features (determines initializer vs coding agent)
2. Create ClaudeSDKClient with security settings
3. Send prompt and stream response
4. Auto-continue with 3-second delay between sessions

### Real-time UI Updates

The UI receives updates via WebSocket (`/ws/projects/{project_name}`):
- `progress` - Test pass counts (passing, in_progress, total)
- `agent_status` - Running/paused/stopped/crashed
- `log` - Agent output lines with optional featureId/agentIndex for attribution
- `feature_update` - Feature status changes
- `agent_update` - Multi-agent state updates (thinking/working/testing/success/error) with mascot names

### Parallel Mode

When running with `--parallel`, the orchestrator:
1. Spawns multiple Claude agents as subprocesses (up to `--max-concurrency`)
2. Each agent claims features atomically via `feature_claim_and_get`
3. Features blocked by unmet dependencies are skipped
4. Browser contexts are isolated per agent using the `--isolated` flag
5. AgentTracker parses output and emits `agent_update` messages for the UI

### Process Limits (Parallel Mode)

The orchestrator enforces strict bounds on concurrent processes:
- `MAX_PARALLEL_AGENTS = 5` - Maximum concurrent coding agents
- `MAX_TOTAL_AGENTS = 10` - Hard limit on total agents (coding + testing)
- Testing agents are capped at `max_concurrency` (same as coding agents)
- Total process count never exceeds 11 Python processes (1 orchestrator + 5 coding + 5 testing)

### Multi-Feature Batching

Agents can implement multiple features per session using `--batch-size` (1-3, default: 3):
- `--batch-size N` - Max features per coding agent batch
- `--testing-batch-size N` - Features per testing batch (1-5, default: 3)
- `--batch-features 1,2,3` - Specific feature IDs for batch implementation
- `--testing-batch-features 1,2,3` - Specific feature IDs for batch regression testing
- `prompts.py` provides `get_batch_feature_prompt()` for multi-feature prompt generation
- Configurable in UI via settings panel

### Design System

The UI uses a **neobrutalism** design with Tailwind CSS v4:
- CSS variables defined in `ui/src/styles/globals.css` via `@theme` directive
- Custom animations: `animate-slide-in`, `animate-pulse-neo`, `animate-shimmer`
- Color tokens: `--color-neo-pending` (yellow), `--color-neo-progress` (cyan), `--color-neo-done` (green)
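
### Example: Process Limit Arithmetic

The bounds listed under Process Limits (Parallel Mode) can be checked with a short calculation. The sketch below is illustrative only: the two constants and their values come from this document, but the helper names `clamp_concurrency` and `worst_case_processes` are hypothetical and are not taken from `parallel_orchestrator.py`. It assumes the requested concurrency is clamped to the 1-5 range and that testing agents are bounded both by `max_concurrency` and by the remaining total-agent budget.

```python
# Illustrative sketch; helper names are assumptions, not the orchestrator's real API.
MAX_PARALLEL_AGENTS = 5   # value stated in this document
MAX_TOTAL_AGENTS = 10     # value stated in this document


def clamp_concurrency(requested: int) -> int:
    """Clamp a requested --max-concurrency value into 1..MAX_PARALLEL_AGENTS."""
    return max(1, min(requested, MAX_PARALLEL_AGENTS))


def worst_case_processes(requested: int) -> int:
    """Upper bound on Python processes: 1 orchestrator + coding + testing agents."""
    coding = clamp_concurrency(requested)
    # Testing agents are capped at max_concurrency and by the total-agent limit.
    testing = min(coding, MAX_TOTAL_AGENTS - coding)
    return 1 + coding + testing


if __name__ == "__main__":
    for requested in (1, 3, 5, 8):
        print(requested, clamp_concurrency(requested), worst_case_processes(requested))
```

With the maximum concurrency of 5, the bound works out to 1 + 5 + 5 = 11, matching the stated ceiling; larger requests are clamped to the same result.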