feat: add dedicated testing agents and enhanced parallel orchestration

Introduce a new testing agent architecture that runs regression tests independently from coding agents, improving quality assurance in parallel mode. Key changes: Testing Agent System: - Add testing_prompt.template.md for dedicated testing agent role - Add feature_mark_failing MCP tool for regression detection - Add --agent-type flag to select initializer/coding/testing mode - Remove regression testing from coding prompt (now handled by testing agents) Parallel Orchestrator Enhancements: - Add testing agent spawning with configurable ratio (--testing-agent-ratio) - Add comprehensive debug logging system (DebugLog class) - Improve database session management to prevent stale reads - Add engine.dispose() calls to refresh connections after subprocess commits - Fix f-string linting issues (remove unnecessary f-prefixes) UI Improvements: - Add testing agent mascot (Chip) to AgentAvatar - Enhance AgentCard to display testing agent status - Add testing agent ratio slider in SettingsModal - Update WebSocket handling for testing agent updates - Improve ActivityFeed to show testing agent activity API & Server Updates: - Add testing_agent_ratio to settings schema and endpoints - Update process manager to support testing agent type - Enhance WebSocket messages for agent_update events Template Changes: - Delete coding_prompt_yolo.template.md (consolidated into main prompt) - Update initializer_prompt.template.md with improved structure - Streamline coding_prompt.template.md workflow Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-19 11:53:09 +00:00 · 2026-01-18 13:49:50 +02:00
parent 5f786078fa
commit 13128361b0
27 changed files with 1885 additions and 536 deletions
--- a/.claude/templates/initializer_prompt.template.md
+++ b/.claude/templates/initializer_prompt.template.md
@@ -26,10 +26,22 @@ which is the single source of truth for what needs to be built.

 **Creating Features:**

-Use the feature_create_bulk tool to add all features at once:
+Use the feature_create_bulk tool to add all features at once. Note: You MUST include `depends_on_indices`
+to specify dependencies. Features with no dependencies can run first and enable parallel execution.

 ```
 Use the feature_create_bulk tool with features=[
+  {
+    "category": "functional",
+    "name": "App loads without errors",
+    "description": "Application starts and renders homepage",
+    "steps": [
+      "Step 1: Navigate to homepage",
+      "Step 2: Verify no console errors",
+      "Step 3: Verify main content renders"
+    ]
+    // No depends_on_indices = FOUNDATION feature (runs first)
+  },
  {
    "category": "functional",
    "name": "User can create an account",
@@ -38,7 +50,8 @@ Use the feature_create_bulk tool with features=[
      "Step 1: Navigate to registration page",
      "Step 2: Fill in required fields",
      "Step 3: Submit form and verify account created"
-    ]
+    ],
+    "depends_on_indices": [0]  // Depends on app loading
  },
  {
    "category": "functional",
@@ -49,7 +62,7 @@ Use the feature_create_bulk tool with features=[
      "Step 2: Enter credentials",
      "Step 3: Verify successful login and redirect"
    ],
-    "depends_on_indices": [0]
+    "depends_on_indices": [0, 1]  // Depends on app loading AND registration
  },
  {
    "category": "functional",
@@ -60,7 +73,18 @@ Use the feature_create_bulk tool with features=[
      "Step 2: Navigate to dashboard",
      "Step 3: Verify personalized content displays"
    ],
-    "depends_on_indices": [1]
+    "depends_on_indices": [2]  // Depends on login only
+  },
+  {
+    "category": "functional",
+    "name": "User can update profile",
+    "description": "User can modify their profile information",
+    "steps": [
+      "Step 1: Log in as user",
+      "Step 2: Navigate to profile settings",
+      "Step 3: Update and save profile"
+    ],
+    "depends_on_indices": [2]  // ALSO depends on login (WIDE GRAPH - can run parallel with dashboard!)
  }
 ]
 ```
@@ -69,7 +93,15 @@ Use the feature_create_bulk tool with features=[
 - IDs and priorities are assigned automatically based on order
 - All features start with `passes: false` by default
 - You can create features in batches if there are many (e.g., 50 at a time)
- Use `depends_on_indices` to specify dependencies (see FEATURE DEPENDENCIES section below)
+- **CRITICAL:** Use `depends_on_indices` to specify dependencies (see FEATURE DEPENDENCIES section below)
+
+**DEPENDENCY REQUIREMENT:**
+You MUST specify dependencies using `depends_on_indices` for features that logically depend on others.
+- Features 0-9 should have NO dependencies (foundation/setup features)
+- Features 10+ MUST have at least some dependencies where logical
+- Create WIDE dependency graphs, not linear chains:
+  - BAD:  A -> B -> C -> D -> E (linear chain, only 1 feature can run at a time)
+  - GOOD: A -> B, A -> C, A -> D, B -> E, C -> E (wide graph, multiple features can run in parallel)

 **Requirements for features:**

@@ -88,10 +120,19 @@ Use the feature_create_bulk tool with features=[

 ---

-## FEATURE DEPENDENCIES
+## FEATURE DEPENDENCIES (MANDATORY)
+
+**THIS SECTION IS MANDATORY. You MUST specify dependencies for features.**

 Dependencies enable **parallel execution** of independent features. When you specify dependencies correctly, multiple agents can work on unrelated features simultaneously, dramatically speeding up development.

+**WARNING:** If you do not specify dependencies, ALL features will be ready immediately, which:
+1. Overwhelms the parallel agents trying to work on unrelated features
+2. Results in features being implemented in random order
+3. Causes logical issues (e.g., "Edit user" attempted before "Create user")
+
+You MUST analyze each feature and specify its dependencies using `depends_on_indices`.
+
 ### Why Dependencies Matter

 1. **Parallel Execution**: Features without dependencies can run in parallel
@@ -137,35 +178,64 @@ Since feature IDs aren't assigned until after creation, use **array indices** (0

 1. **Start with foundation features** (index 0-10): Core setup, basic navigation, authentication
 2. **Group related features together**: Keep CRUD operations adjacent
-3. **Chain complex flows**: Registration → Login → Dashboard → Settings
+3. **Chain complex flows**: Registration -> Login -> Dashboard -> Settings
 4. **Keep dependencies shallow**: Prefer 1-2 dependencies over deep chains
 5. **Skip dependencies for independent features**: Visual tests often have no dependencies

-### Example: Todo App Feature Chain
+### Minimum Dependency Coverage
+
+**REQUIREMENT:** At least 60% of your features (after index 10) should have at least one dependency.
+
+Target structure for a 150-feature project:
+- Features 0-9: Foundation (0 dependencies) - App loads, basic setup
+- Features 10-149: At least 84 should have dependencies (60% of 140)
+
+This ensures:
+- A good mix of parallelizable features (foundation)
+- Logical ordering for dependent features
+
+### Example: Todo App Feature Chain (Wide Graph Pattern)
+
+This example shows the CORRECT wide graph pattern where multiple features share the same dependency,
+enabling parallel execution:

 ```json
 [
-  // Foundation (no dependencies)
+  // FOUNDATION TIER (indices 0-2, no dependencies)
+  // These run first and enable everything else
  { "name": "App loads without errors", "category": "functional" },
  { "name": "Navigation bar displays", "category": "style" },
+  { "name": "Homepage renders correctly", "category": "functional" },

-  // Auth chain
+  // AUTH TIER (indices 3-5, depend on foundation)
+  // These can all run in parallel once foundation passes
  { "name": "User can register", "depends_on_indices": [0] },
-  { "name": "User can login", "depends_on_indices": [2] },
-  { "name": "User can logout", "depends_on_indices": [3] },
+  { "name": "User can login", "depends_on_indices": [0, 3] },
+  { "name": "User can logout", "depends_on_indices": [4] },

-  // Todo CRUD (depends on auth)
-  { "name": "User can create todo", "depends_on_indices": [3] },
-  { "name": "User can view todos", "depends_on_indices": [5] },
-  { "name": "User can edit todo", "depends_on_indices": [5] },
-  { "name": "User can delete todo", "depends_on_indices": [5] },
+  // CORE CRUD TIER (indices 6-9, depend on auth)
+  // WIDE GRAPH: All 4 of these depend on login (index 4)
+  // This means all 4 can start as soon as login passes!
+  { "name": "User can create todo", "depends_on_indices": [4] },
+  { "name": "User can view todos", "depends_on_indices": [4] },
+  { "name": "User can edit todo", "depends_on_indices": [4, 6] },
+  { "name": "User can delete todo", "depends_on_indices": [4, 6] },

-  // Advanced features (multiple dependencies)
-  { "name": "User can filter todos", "depends_on_indices": [6] },
-  { "name": "User can search todos", "depends_on_indices": [6] }
+  // ADVANCED TIER (indices 10-11, depend on CRUD)
+  // Note: filter and search both depend on view (7), not on each other
+  { "name": "User can filter todos", "depends_on_indices": [7] },
+  { "name": "User can search todos", "depends_on_indices": [7] }
 ]
 ```

+**Parallelism analysis of this example:**
+- Foundation tier: 3 features can run in parallel
+- Auth tier: 3 features wait for foundation, then can run (mostly parallel)
+- CRUD tier: 4 features can start once login passes (all 4 in parallel!)
+- Advanced tier: 2 features can run once view passes (both in parallel)
+
+**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
+
 ---

 ## MANDATORY TEST CATEGORIES
@@ -585,32 +655,16 @@ Set up the basic project structure based on what's specified in `app_spec.txt`.
 This typically includes directories for frontend, backend, and any other
 components mentioned in the spec.

-### OPTIONAL: Start Implementation
-
-If you have time remaining in this session, you may begin implementing
-the highest-priority features. Get the next feature with:
-
-```
-Use the feature_get_next tool
-```
-
-Remember:
- Work on ONE feature at a time
- Test thoroughly before marking as passing
- Commit your progress before session ends
-
 ### ENDING THIS SESSION

-Before your context fills up:
+Once you have completed the four tasks above:

-1. Commit all work with descriptive messages
-2. Create `claude-progress.txt` with a summary of what you accomplished
-3. Verify features were created using the feature_get_stats tool
-4. Leave the environment in a clean, working state
+1. Commit all work with a descriptive message
+2. Verify features were created using the feature_get_stats tool
+3. Leave the environment in a clean, working state
+4. Exit cleanly

-The next agent will continue from here with a fresh context window.
-
---
-
-**Remember:** You have unlimited time across many sessions. Focus on
-quality over speed. Production-ready is the goal.
+**IMPORTANT:** Do NOT attempt to implement any features. Your job is setup only.
+Feature implementation will be handled by parallel coding agents that spawn after
+you complete initialization. Starting implementation here would create a bottleneck
+and defeat the purpose of the parallel architecture.