mirror of https://github.com/leonvanzyl/autocoder.git (synced 2026-02-02 07:23:35 +00:00)
feat: add dedicated testing agents and enhanced parallel orchestration
Introduce a new testing agent architecture that runs regression tests independently from coding agents, improving quality assurance in parallel mode.

Key changes:

Testing Agent System:
- Add testing_prompt.template.md for dedicated testing agent role
- Add feature_mark_failing MCP tool for regression detection
- Add --agent-type flag to select initializer/coding/testing mode
- Remove regression testing from coding prompt (now handled by testing agents)

Parallel Orchestrator Enhancements:
- Add testing agent spawning with configurable ratio (--testing-agent-ratio)
- Add comprehensive debug logging system (DebugLog class)
- Improve database session management to prevent stale reads
- Add engine.dispose() calls to refresh connections after subprocess commits
- Fix f-string linting issues (remove unnecessary f-prefixes)

UI Improvements:
- Add testing agent mascot (Chip) to AgentAvatar
- Enhance AgentCard to display testing agent status
- Add testing agent ratio slider in SettingsModal
- Update WebSocket handling for testing agent updates
- Improve ActivityFeed to show testing agent activity

API & Server Updates:
- Add testing_agent_ratio to settings schema and endpoints
- Update process manager to support testing agent type
- Enhance WebSocket messages for agent_update events

Template Changes:
- Delete coding_prompt_yolo.template.md (consolidated into main prompt)
- Update initializer_prompt.template.md with improved structure
- Streamline coding_prompt.template.md workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -26,10 +26,22 @@ which is the single source of truth for what needs to be built.

 **Creating Features:**

-Use the feature_create_bulk tool to add all features at once:
+Use the feature_create_bulk tool to add all features at once. Note: You MUST include `depends_on_indices`
+to specify dependencies. Features with no dependencies can run first and enable parallel execution.

 ```
 Use the feature_create_bulk tool with features=[
+  {
+    "category": "functional",
+    "name": "App loads without errors",
+    "description": "Application starts and renders homepage",
+    "steps": [
+      "Step 1: Navigate to homepage",
+      "Step 2: Verify no console errors",
+      "Step 3: Verify main content renders"
+    ]
+    // No depends_on_indices = FOUNDATION feature (runs first)
+  },
   {
     "category": "functional",
     "name": "User can create an account",
@@ -38,7 +50,8 @@ Use the feature_create_bulk tool with features=[
       "Step 1: Navigate to registration page",
       "Step 2: Fill in required fields",
       "Step 3: Submit form and verify account created"
-    ]
+    ],
+    "depends_on_indices": [0] // Depends on app loading
   },
   {
     "category": "functional",
@@ -49,7 +62,7 @@ Use the feature_create_bulk tool with features=[
       "Step 2: Enter credentials",
       "Step 3: Verify successful login and redirect"
     ],
-    "depends_on_indices": [0]
+    "depends_on_indices": [0, 1] // Depends on app loading AND registration
   },
   {
     "category": "functional",
@@ -60,7 +73,18 @@ Use the feature_create_bulk tool with features=[
       "Step 2: Navigate to dashboard",
       "Step 3: Verify personalized content displays"
     ],
-    "depends_on_indices": [1]
+    "depends_on_indices": [2] // Depends on login only
+  },
+  {
+    "category": "functional",
+    "name": "User can update profile",
+    "description": "User can modify their profile information",
+    "steps": [
+      "Step 1: Log in as user",
+      "Step 2: Navigate to profile settings",
+      "Step 3: Update and save profile"
+    ],
+    "depends_on_indices": [2] // ALSO depends on login (WIDE GRAPH - can run parallel with dashboard!)
   }
 ]
 ```
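The index convention in the example above (each `depends_on_indices` entry is a position in the same `features` array) can be sanity-checked before a bulk create. The following is a minimal sketch and not part of the commit: `validate_dependency_indices` is a hypothetical helper, and the rule that a feature may only depend on earlier entries is an assumption drawn from the examples, not a documented constraint of `feature_create_bulk`.

```python
from typing import Any


def validate_dependency_indices(features: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in depends_on_indices (empty if none)."""
    problems: list[str] = []
    for index, feature in enumerate(features):
        for dep in feature.get("depends_on_indices", []):
            if not isinstance(dep, int) or not 0 <= dep < len(features):
                problems.append(f"feature {index}: index {dep!r} is out of range")
            elif dep == index:
                problems.append(f"feature {index}: depends on itself")
            elif dep > index:
                # Assumption: dependencies point backwards in the array.
                problems.append(f"feature {index}: depends on later feature {dep}")
    return problems


# Dependency lists of the five features in the example (other fields omitted).
example = [
    {},                               # 0: foundation, no dependencies
    {"depends_on_indices": [0]},      # 1: depends on app loading
    {"depends_on_indices": [0, 1]},   # 2: depends on app loading and registration
    {"depends_on_indices": [2]},      # 3: depends on login
    {"depends_on_indices": [2]},      # 4: also depends on login
]

print(validate_dependency_indices(example))  # []
```

An empty list means every index refers to an earlier entry in the same batch.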
@@ -69,7 +93,15 @@ Use the feature_create_bulk tool with features=[
 - IDs and priorities are assigned automatically based on order
 - All features start with `passes: false` by default
 - You can create features in batches if there are many (e.g., 50 at a time)
-- Use `depends_on_indices` to specify dependencies (see FEATURE DEPENDENCIES section below)
+- **CRITICAL:** Use `depends_on_indices` to specify dependencies (see FEATURE DEPENDENCIES section below)
+
+**DEPENDENCY REQUIREMENT:**
+You MUST specify dependencies using `depends_on_indices` for features that logically depend on others.
+- Features 0-9 should have NO dependencies (foundation/setup features)
+- Features 10+ MUST have at least some dependencies where logical
+- Create WIDE dependency graphs, not linear chains:
+  - BAD: A -> B -> C -> D -> E (linear chain, only 1 feature can run at a time)
+  - GOOD: A -> B, A -> C, A -> D, B -> E, C -> E (wide graph, multiple features can run in parallel)

 **Requirements for features:**

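The BAD and GOOD graphs in the hunk above can be compared by grouping features into waves, where a wave can start only after every earlier wave has passed. This is an illustrative sketch rather than the orchestrator's scheduling logic: `parallel_waves` is a hypothetical name, and `A -> B` is read as "B depends on A".

```python
def parallel_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group features into waves; each wave needs only features from earlier waves."""
    remaining = dict(deps)
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(name for name, needs in remaining.items() if set(needs) <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


# Each key maps a feature to the features it depends on.
linear = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
wide = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"], "E": ["B", "C"]}

print(parallel_waves(linear))  # [['A'], ['B'], ['C'], ['D'], ['E']]
print(parallel_waves(wide))    # [['A'], ['B', 'C', 'D'], ['E']]
```

The linear chain needs five waves of one feature each, while the wide graph finishes in three because B, C, and D share a single dependency.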
@@ -88,10 +120,19 @@ Use the feature_create_bulk tool with features=[

 ---

-## FEATURE DEPENDENCIES
+## FEATURE DEPENDENCIES (MANDATORY)

+**THIS SECTION IS MANDATORY. You MUST specify dependencies for features.**
+
 Dependencies enable **parallel execution** of independent features. When you specify dependencies correctly, multiple agents can work on unrelated features simultaneously, dramatically speeding up development.

+**WARNING:** If you do not specify dependencies, ALL features will be ready immediately, which:
+1. Overwhelms the parallel agents trying to work on unrelated features
+2. Results in features being implemented in random order
+3. Causes logical issues (e.g., "Edit user" attempted before "Create user")
+
+You MUST analyze each feature and specify its dependencies using `depends_on_indices`.
+
 ### Why Dependencies Matter

 1. **Parallel Execution**: Features without dependencies can run in parallel
@@ -137,35 +178,64 @@ Since feature IDs aren't assigned until after creation, use **array indices** (0

 1. **Start with foundation features** (index 0-10): Core setup, basic navigation, authentication
 2. **Group related features together**: Keep CRUD operations adjacent
-3. **Chain complex flows**: Registration → Login → Dashboard → Settings
+3. **Chain complex flows**: Registration -> Login -> Dashboard -> Settings
 4. **Keep dependencies shallow**: Prefer 1-2 dependencies over deep chains
 5. **Skip dependencies for independent features**: Visual tests often have no dependencies

-### Example: Todo App Feature Chain
+### Minimum Dependency Coverage
+
+**REQUIREMENT:** At least 60% of your features (after index 10) should have at least one dependency.
+
+Target structure for a 150-feature project:
+- Features 0-9: Foundation (0 dependencies) - App loads, basic setup
+- Features 10-149: At least 84 should have dependencies (60% of 140)
+
+This ensures:
+- A good mix of parallelizable features (foundation)
+- Logical ordering for dependent features
+
+### Example: Todo App Feature Chain (Wide Graph Pattern)

+This example shows the CORRECT wide graph pattern where multiple features share the same dependency,
+enabling parallel execution:
+
 ```json
 [
-  // Foundation (no dependencies)
+  // FOUNDATION TIER (indices 0-2, no dependencies)
+  // These run first and enable everything else
   { "name": "App loads without errors", "category": "functional" },
   { "name": "Navigation bar displays", "category": "style" },
+  { "name": "Homepage renders correctly", "category": "functional" },

-  // Auth chain
+  // AUTH TIER (indices 3-5, depend on foundation)
+  // These can all run in parallel once foundation passes
   { "name": "User can register", "depends_on_indices": [0] },
-  { "name": "User can login", "depends_on_indices": [2] },
-  { "name": "User can logout", "depends_on_indices": [3] },
+  { "name": "User can login", "depends_on_indices": [0, 3] },
+  { "name": "User can logout", "depends_on_indices": [4] },

-  // Todo CRUD (depends on auth)
-  { "name": "User can create todo", "depends_on_indices": [3] },
-  { "name": "User can view todos", "depends_on_indices": [5] },
-  { "name": "User can edit todo", "depends_on_indices": [5] },
-  { "name": "User can delete todo", "depends_on_indices": [5] },
+  // CORE CRUD TIER (indices 6-9, depend on auth)
+  // WIDE GRAPH: All 4 of these depend on login (index 4)
+  // This means all 4 can start as soon as login passes!
+  { "name": "User can create todo", "depends_on_indices": [4] },
+  { "name": "User can view todos", "depends_on_indices": [4] },
+  { "name": "User can edit todo", "depends_on_indices": [4, 6] },
+  { "name": "User can delete todo", "depends_on_indices": [4, 6] },

-  // Advanced features (multiple dependencies)
-  { "name": "User can filter todos", "depends_on_indices": [6] },
-  { "name": "User can search todos", "depends_on_indices": [6] }
+  // ADVANCED TIER (indices 10-11, depend on CRUD)
+  // Note: filter and search both depend on view (7), not on each other
+  { "name": "User can filter todos", "depends_on_indices": [7] },
+  { "name": "User can search todos", "depends_on_indices": [7] }
 ]
 ```

+**Parallelism analysis of this example:**
+- Foundation tier: 3 features can run in parallel
+- Auth tier: 3 features wait for foundation, then can run (mostly parallel)
+- CRUD tier: 4 features can start once login passes (all 4 in parallel!)
+- Advanced tier: 2 features can run once view passes (both in parallel)
+
+**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
+
 ---

 ## MANDATORY TEST CATEGORIES
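The cycle estimate and the 60% coverage target in the hunk above can be reproduced with a small simulation. This is a sketch under simplifying assumptions that the commit does not state: every feature takes exactly one cycle, agents pick ready features in index order, and no feature fails. `count_cycles` and `min_features_with_dependencies` are hypothetical helper names.

```python
def count_cycles(deps: list[list[int]], agents: int) -> int:
    """Count cycles needed when at most `agents` ready features run per cycle."""
    done: set[int] = set()
    cycles = 0
    while len(done) < len(deps):
        ready = [i for i, needs in enumerate(deps)
                 if i not in done and set(needs) <= done]
        if not ready:
            raise ValueError("dependency cycle detected")
        done.update(ready[:agents])  # lowest indices first (assumed priority)
        cycles += 1
    return cycles


def min_features_with_dependencies(total: int, foundation: int = 10, percent: int = 60) -> int:
    """Smallest count meeting the coverage target among non-foundation features."""
    non_foundation = max(total - foundation, 0)
    return (non_foundation * percent + 99) // 100  # integer ceiling


# depends_on_indices of the 12 Todo app features, in array order.
todo_app = [[], [], [], [0], [0, 3], [4], [4], [4], [4, 6], [4, 6], [7], [7]]

print(count_cycles(todo_app, agents=3))        # 6
print(count_cycles(todo_app, agents=1))        # 12
print(min_features_with_dependencies(150))     # 84
```

Under this model, register, login, and the logout/create/view group each occupy their own cycle, because login also waits for register and edit/delete also wait for create. That is why three agents land at 6 cycles, the upper end of the quoted ~5-6 range.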
@@ -585,32 +655,16 @@ Set up the basic project structure based on what's specified in `app_spec.txt`.
 This typically includes directories for frontend, backend, and any other
 components mentioned in the spec.

-### OPTIONAL: Start Implementation
-
-If you have time remaining in this session, you may begin implementing
-the highest-priority features. Get the next feature with:
-
-```
-Use the feature_get_next tool
-```
-
-Remember:
-- Work on ONE feature at a time
-- Test thoroughly before marking as passing
-- Commit your progress before session ends
-
 ### ENDING THIS SESSION

-Before your context fills up:
+Once you have completed the four tasks above:

-1. Commit all work with descriptive messages
-2. Create `claude-progress.txt` with a summary of what you accomplished
-3. Verify features were created using the feature_get_stats tool
-4. Leave the environment in a clean, working state
+1. Commit all work with a descriptive message
+2. Verify features were created using the feature_get_stats tool
+3. Leave the environment in a clean, working state
+4. Exit cleanly

 The next agent will continue from here with a fresh context window.

 ---

-**Remember:** You have unlimited time across many sessions. Focus on
-quality over speed. Production-ready is the goal.
+**IMPORTANT:** Do NOT attempt to implement any features. Your job is setup only.
+Feature implementation will be handled by parallel coding agents that spawn after
+you complete initialization. Starting implementation here would create a bottleneck
+and defeat the purpose of the parallel architecture.