# Autonomous Coding Agent Demo

A minimal harness demonstrating long-running autonomous coding with the Claude Agent SDK. This demo implements a two-agent pattern (initializer + coding agent) that can build complete applications over multiple sessions.

## Prerequisites

**Required:** Install the latest versions of both Claude Code and the Claude Agent SDK:

```bash
# Install Claude Code CLI (latest version required)
npm install -g @anthropic-ai/claude-code

# Install Python dependencies
pip install -r requirements.txt
```

Verify your installations:
```bash
claude --version  # Should be latest version
pip show claude-code-sdk  # Check SDK is installed
```

**API Key:** Set your Anthropic API key:
```bash
export ANTHROPIC_API_KEY='your-api-key-here'
```

## Quick Start

```bash
python autonomous_agent_demo.py --project-dir ./my_project
```

For testing with limited iterations:
```bash
python autonomous_agent_demo.py --project-dir ./my_project --max-iterations 3
```

## Important Timing Expectations

> **Warning: This demo takes a long time to run!**

- **First session (initialization):** The agent generates a `feature_list.json` with 200 test cases. This takes several minutes and may appear to hang; this is normal, because the agent is writing out all the features.

- **Subsequent sessions:** Each coding iteration can take **5-15 minutes** depending on complexity.

- **Full app:** Building all 200 features typically requires **many hours** of total runtime across multiple sessions.

**Tip:** The 200-feature target in the prompts is designed for comprehensive coverage. For a faster demo, edit `prompts/initializer_prompt.md` to reduce the feature count (e.g., 20-50 features).

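As a rough sanity check on the figures above, the sketch below multiplies the feature count by the quoted per-iteration range. It assumes one feature per coding iteration, which the demo does not guarantee, and it ignores the initializer session. `estimate_hours` is a hypothetical helper, not part of the demo.

```python
# Back-of-the-envelope runtime estimate.
# Assumption (not stated by the demo): each coding iteration completes one feature.
FEATURES = 200
MIN_MINUTES, MAX_MINUTES = 5, 15  # per-iteration range quoted in this README


def estimate_hours(features: int, minutes_per_iteration: float) -> float:
    """Convert a feature count and a per-iteration duration into total hours."""
    return features * minutes_per_iteration / 60


low = estimate_hours(FEATURES, MIN_MINUTES)
high = estimate_hours(FEATURES, MAX_MINUTES)
print(f"{FEATURES} features: roughly {low:.1f} to {high:.1f} hours")
```

Under the same assumption, a 20-feature run works out to roughly 1.7 to 5 hours.
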
## How It Works

### Two-Agent Pattern

1. **Initializer Agent (Session 1):** Reads `app_spec.txt`, creates `feature_list.json` with 200 test cases, sets up project structure, and initializes git.

2. **Coding Agent (Sessions 2+):** Picks up where the previous session left off, implements features one by one, and marks them as passing in `feature_list.json`.

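Because `feature_list.json` carries the state between the two agents, progress can be read straight from it. The sketch below is illustrative only: it assumes the file is a JSON array of objects with a boolean `passes` field and a `description` field, which this README does not specify, and `count_passing` is a hypothetical helper rather than a function from `progress.py`.

```python
import json
import tempfile
from pathlib import Path


def count_passing(feature_file: Path) -> tuple[int, int]:
    """Return (passing, total) for a feature list file.

    Assumes a JSON array of objects with a boolean "passes" field;
    that schema is an assumption made for this sketch.
    """
    features = json.loads(feature_file.read_text())
    passing = sum(1 for feature in features if feature.get("passes") is True)
    return passing, len(features)


# Demonstrate against a throwaway file with the assumed shape.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "feature_list.json"
    path.write_text(json.dumps([
        {"description": "User can log in", "passes": True},
        {"description": "User can log out", "passes": False},
        {"description": "User can reset password", "passes": False},
    ]))
    passing, total = count_passing(path)
    print(f"{passing}/{total} features passing")  # prints "1/3 features passing"
```
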
### Session Management

- Each session runs with a fresh context window
- Progress is persisted via `feature_list.json` and git commits
- The agent auto-continues between sessions (3-second delay)
- Press `Ctrl+C` to pause; run the same command to resume

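The loop described above can be sketched roughly as follows. This is not the code in `agent.py`: `choose_agent`, `run_sessions`, and the `run_session` callback are hypothetical names, and using the presence of `feature_list.json` to detect the first session is an assumption. The stand-in session replaces the real SDK call so the sketch runs on its own.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional

AUTO_CONTINUE_DELAY_SECONDS = 3  # delay quoted in the list above


def choose_agent(project_dir: Path) -> str:
    """Pick the initializer on a fresh project, the coding agent otherwise."""
    # Assumption: an existing feature_list.json means initialization already ran.
    return "coding" if (project_dir / "feature_list.json").exists() else "initializer"


def run_sessions(
    project_dir: Path,
    run_session: Callable[[str], None],
    max_iterations: Optional[int] = None,
    delay: float = AUTO_CONTINUE_DELAY_SECONDS,
) -> int:
    """Run sessions back to back until the limit is hit or the user presses Ctrl+C."""
    completed = 0
    try:
        while max_iterations is None or completed < max_iterations:
            if completed:
                time.sleep(delay)  # pause before auto-continuing
            run_session(choose_agent(project_dir))  # each call starts with fresh context
            completed += 1
    except KeyboardInterrupt:
        print("Paused. Run the same command to resume.")
    return completed


# Stand-in session: records which agent ran and writes the state file.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    seen = []

    def fake_session(agent: str) -> None:
        seen.append(agent)
        (project / "feature_list.json").write_text("[]")

    run_sessions(project, fake_session, max_iterations=3, delay=0)
    print(seen)  # ['initializer', 'coding', 'coding']
```

Since all state lives on disk, resuming after `Ctrl+C` needs no special handling in this sketch: the next run sees the existing file and continues with the coding agent.
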
## Security Model

This demo uses a defense-in-depth security approach (see `security.py` and `client.py`):

1. **OS-level Sandbox:** Bash commands run in an isolated environment
2. **Filesystem Restrictions:** File operations are restricted to the project directory only
3. **Bash Allowlist:** Only specific commands are permitted:
   - File inspection: `ls`, `cat`, `head`, `tail`, `wc`, `grep`
   - Node.js: `npm`, `node`
   - Version control: `git`
   - Process management: `ps`, `lsof`, `sleep`, `pkill` (dev processes only)

Commands not in the allowlist are blocked by the security hook.

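To make the allowlist idea concrete, here is a minimal sketch of how a command line might be checked. It is not the validation in `security.py`: only the `ALLOWED_COMMANDS` name and the command list come from this README, while `commands_in` and `is_allowed` are hypothetical helpers. The sketch fails closed on anything it cannot parse and does not implement the extra `pkill` restriction.

```python
import os
import shlex

# Command names taken from the allowlist in this README.
ALLOWED_COMMANDS = {
    "ls", "cat", "head", "tail", "wc", "grep",
    "npm", "node", "git",
    "ps", "lsof", "sleep", "pkill",
}


def commands_in(command_line: str) -> list[str]:
    """Return the program name that starts each pipeline or chain segment."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    commands, expect_command = [], True
    for token in lexer:
        if all(ch in "|&;" for ch in token):
            # Operators such as |, &&, || and ; start a new segment.
            # Quoted operators look the same here, which errs toward blocking.
            expect_command = True
        elif expect_command:
            commands.append(os.path.basename(token))
            expect_command = False
    return commands


def is_allowed(command_line: str) -> bool:
    """Allow a command line only if every segment starts with an allowlisted program."""
    if "`" in command_line or "$(" in command_line:
        return False  # command substitution is not analysed, so reject it
    try:
        commands = commands_in(command_line)
    except ValueError:
        return False  # unbalanced quotes and similar parse errors
    return bool(commands) and all(name in ALLOWED_COMMANDS for name in commands)


print(is_allowed("ls -la | grep src"))        # True
print(is_allowed("git status && rm -rf /"))   # False
print(is_allowed("curl http://example.com"))  # False
```

Checking every segment rather than only the first word matters because an allowlisted command can otherwise be used as a prefix for a blocked one, as in the second example.
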
## Project Structure

```
autonomous-coding/
├── autonomous_agent_demo.py  # Main entry point
├── agent.py                  # Agent session logic
├── client.py                 # Claude SDK client configuration
├── security.py               # Bash command allowlist and validation
├── progress.py               # Progress tracking utilities
├── prompts.py                # Prompt loading utilities
├── prompts/
│   ├── app_spec.txt          # Application specification
│   ├── initializer_prompt.md # First session prompt
│   └── coding_prompt.md      # Continuation session prompt
└── requirements.txt          # Python dependencies
```

## Generated Project Structure

After running, your project directory will contain:

```
my_project/
├── feature_list.json         # Test cases (source of truth)
├── app_spec.txt              # Copied specification
├── init.sh                   # Environment setup script
├── claude-progress.txt       # Session progress notes
├── .claude_settings.json     # Security settings
└── [application files]       # Generated application code
```

## Running the Generated Application

After the agent completes (or pauses), you can run the generated application:

```bash
cd generations/my_project

# Run the setup script created by the agent
./init.sh

# Or manually (typical for Node.js apps):
npm install
npm run dev
```

The application will typically be available at `http://localhost:3000` or similar (check the agent's output or `init.sh` for the exact URL).

## Command Line Options

| Option | Description | Default |
|--------|-------------|---------|
| `--project-dir` | Directory for the project | `./autonomous_demo_project` |
| `--max-iterations` | Max agent iterations | Unlimited |
| `--model` | Claude model to use | `claude-sonnet-4-5-20250929` |

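For readers who want to see how these options might map onto a parser, here is a small `argparse` sketch built from the table. The flag names and defaults come from the table; `build_parser`, the help strings, and the choice of `None` to represent "Unlimited" are assumptions, not the code in `autonomous_agent_demo.py`.

```python
import argparse
from pathlib import Path

DEFAULT_MODEL = "claude-sonnet-4-5-20250929"  # default listed in the table


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options table (illustrative only)."""
    parser = argparse.ArgumentParser(description="Autonomous coding agent demo")
    parser.add_argument(
        "--project-dir",
        type=Path,
        default=Path("./autonomous_demo_project"),
        help="Directory for the project",
    )
    parser.add_argument(
        "--max-iterations",
        type=int,
        default=None,  # None stands in for "Unlimited" in this sketch
        help="Max agent iterations (default: unlimited)",
    )
    parser.add_argument("--model", default=DEFAULT_MODEL, help="Claude model to use")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["--project-dir", "./my_project", "--max-iterations", "3"])
print(args.project_dir, args.max_iterations, args.model)
```
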
## Customization

### Changing the Application

Edit `prompts/app_spec.txt` to specify a different application to build.

### Adjusting Feature Count

Edit `prompts/initializer_prompt.md` and change the "200 features" requirement to a smaller number for faster demos.

### Modifying Allowed Commands

Edit `security.py` to add or remove commands from `ALLOWED_COMMANDS`.

## Troubleshooting

**"Appears to hang on first run"**
This is normal. The initializer agent is generating 200 detailed test cases, which takes significant time. Watch for `[Tool: ...]` output to confirm the agent is working.

**"Command blocked by security hook"**
The agent tried to run a command not in the allowlist. This is the security system working as intended. If needed, add the command to `ALLOWED_COMMANDS` in `security.py`.

**"API key not set"**
Ensure `ANTHROPIC_API_KEY` is exported in your shell environment.

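If it is unclear whether the key is visible to Python, a quick check like the one below can help. `api_key_is_set` is a hypothetical helper written for this README, not something the demo provides; it only confirms that the variable is non-empty, not that the key is valid.

```python
import os
from typing import Mapping


def api_key_is_set(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when ANTHROPIC_API_KEY is present and not blank."""
    return bool(environ.get("ANTHROPIC_API_KEY", "").strip())


if api_key_is_set():
    print("ANTHROPIC_API_KEY is set")
else:
    print("ANTHROPIC_API_KEY is missing; export it in this shell and retry")
```
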
## License

Internal Anthropic use.