Files
automaker/reference
Cody Seibert 2822cdfc32 Update app_spec.txt and coding_prompt.md for improved clarity and consistency
- Updated references to `app_spec.txt` and `feature_list.json` in app_spec.txt to include the correct path.
- Enhanced coding_prompt.md by incorporating testing utilities for better test management and readability.
- Added new utility functions in tests/utils.ts to streamline test interactions.

This commit aims to improve documentation accuracy and maintainability of testing practices.
2025-12-09 00:45:34 -05:00
..
2025-12-07 16:43:26 -05:00
2025-12-07 16:43:26 -05:00
2025-12-07 16:43:26 -05:00
2025-12-07 16:43:26 -05:00
2025-12-07 16:43:26 -05:00
2025-12-07 16:43:26 -05:00
2025-12-07 16:43:26 -05:00

Autonomous Coding Agent Demo

A minimal harness demonstrating long-running autonomous coding with the Claude Agent SDK. This demo implements a two-agent pattern (initializer + coding agent) that can build complete applications over multiple sessions.

Prerequisites

Required: Install the latest versions of both Claude Code and the Claude Agent SDK:

# Install Claude Code CLI (latest version required)
npm install -g @anthropic-ai/claude-code

# Install Python dependencies
pip install -r requirements.txt

Verify your installations:

claude --version  # Should be latest version
pip show claude-code-sdk  # Check SDK is installed

API Key: Set your Anthropic API key:

export ANTHROPIC_API_KEY='your-api-key-here'

Quick Start

python autonomous_agent_demo.py --project-dir ./my_project

For testing with limited iterations:

python autonomous_agent_demo.py --project-dir ./my_project --max-iterations 3

Important Timing Expectations

Warning: This demo takes a long time to run!

  • First session (initialization): The agent generates a feature_list.json with 200 test cases. This takes several minutes and may appear to hang - this is normal. The agent is writing out all the features.

  • Subsequent sessions: Each coding iteration can take 5-15 minutes depending on complexity.

  • Full app: Building all 200 features typically requires many hours of total runtime across multiple sessions.

Tip: The 200 features parameter in the prompts is designed for comprehensive coverage. If you want faster demos, you can modify prompts/initializer_prompt.md to reduce the feature count (e.g., 20-50 features for a quicker demo).

How It Works

Two-Agent Pattern

  1. Initializer Agent (Session 1): Reads app_spec.txt, creates feature_list.json with 200 test cases, sets up project structure, and initializes git.

  2. Coding Agent (Sessions 2+): Picks up where the previous session left off, implements features one by one, and marks them as passing in feature_list.json.

Session Management

  • Each session runs with a fresh context window
  • Progress is persisted via feature_list.json and git commits
  • The agent auto-continues between sessions (3 second delay)
  • Press Ctrl+C to pause; run the same command to resume

Security Model

This demo uses a defense-in-depth security approach (see security.py and client.py):

  1. OS-level Sandbox: Bash commands run in an isolated environment
  2. Filesystem Restrictions: File operations restricted to the project directory only
  3. Bash Allowlist: Only specific commands are permitted:
    • File inspection: ls, cat, head, tail, wc, grep
    • Node.js: npm, node
    • Version control: git
    • Process management: ps, lsof, sleep, pkill (dev processes only)

Commands not in the allowlist are blocked by the security hook.

Project Structure

autonomous-coding/
├── autonomous_agent_demo.py  # Main entry point
├── agent.py                  # Agent session logic
├── client.py                 # Claude SDK client configuration
├── security.py               # Bash command allowlist and validation
├── progress.py               # Progress tracking utilities
├── prompts.py                # Prompt loading utilities
├── prompts/
│   ├── app_spec.txt          # Application specification
│   ├── initializer_prompt.md # First session prompt
│   └── coding_prompt.md      # Continuation session prompt
└── requirements.txt          # Python dependencies

Generated Project Structure

After running, your project directory will contain:

my_project/
├── .automaker/feature_list.json         # Test cases (source of truth)
├── app_spec.txt              # Copied specification
├── init.sh                   # Environment setup script
├── claude-progress.txt       # Session progress notes
├── .claude_settings.json     # Security settings
└── [application files]       # Generated application code

Running the Generated Application

After the agent completes (or pauses), you can run the generated application:

cd generations/my_project

# Run the setup script created by the agent
./init.sh

# Or manually (typical for Node.js apps):
npm install
npm run dev

The application will typically be available at http://localhost:3007 or similar (check the agent's output or init.sh for the exact URL).

Command Line Options

Option Description Default
--project-dir Directory for the project ./autonomous_demo_project
--max-iterations Max agent iterations Unlimited
--model Claude model to use claude-sonnet-4-5-20250929

Customization

Changing the Application

Edit prompts/app_spec.txt to specify a different application to build.

Adjusting Feature Count

Edit prompts/initializer_prompt.md and change the "200 features" requirement to a smaller number for faster demos.

Modifying Allowed Commands

Edit security.py to add or remove commands from ALLOWED_COMMANDS.

Troubleshooting

"Appears to hang on first run" This is normal. The initializer agent is generating 200 detailed test cases, which takes significant time. Watch for [Tool: ...] output to confirm the agent is working.

"Command blocked by security hook" The agent tried to run a command not in the allowlist. This is the security system working as intended. If needed, add the command to ALLOWED_COMMANDS in security.py.

"API key not set" Ensure ANTHROPIC_API_KEY is exported in your shell environment.

License

Internal Anthropic use.