feat: decouple regression testing agents from coding agents

Major refactoring of the parallel orchestrator to run regression testing
agents independently of coding agents. Atomic claiming makes concurrent
testing safe, and a fixed agent count gives direct control over how much
regression testing runs.

Key changes:

Database & MCP Layer:
- Add testing_in_progress and last_tested_at columns to Feature model
- Add feature_claim_for_testing() for atomic test claim with retry
- Add feature_release_testing() to release claims after testing
- Refactor claim functions to iterative loops (no recursion)
- Add OperationalError retry handling for transient DB errors
- Reduce MAX_CLAIM_RETRIES from 10 to 5

Orchestrator:
- Decouple testing agent lifecycle from coding agents
- Add _maintain_testing_agents() for continuous testing maintenance
- Fix TOCTOU race in _spawn_testing_agent() - hold lock during spawn
- Add _cleanup_stale_testing_locks() with 30-min timeout
- Fix log ordering - start_session() before stale flag cleanup
- Add stale testing_in_progress cleanup on startup

Dead Code Removal:
- Remove count_testing_in_concurrency from the entire stack (12+ files)
- Remove ineffective with_for_update() from features router

API & UI:
- Pass testing_agent_ratio via CLI to orchestrator
- Update testing prompt template to use new claim/release tools
- Rename UI label to "Regression Agents" with clearer description
- Add process_utils.py for cross-platform process tree management
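A cross-platform process-tree kill along the lines of `process_utils.py` can be sketched with stdlib tools only: process groups on POSIX, `taskkill /T` on Windows. Function names here are illustrative, not the module's actual API:

```python
import os
import signal
import subprocess
import sys


def spawn_in_group(cmd: list[str]) -> subprocess.Popen:
    """Start a child so that it and its descendants can be killed as a unit."""
    if sys.platform == "win32":
        return subprocess.Popen(cmd, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP)
    return subprocess.Popen(cmd, start_new_session=True)  # new process group on POSIX


def kill_process_tree(proc: subprocess.Popen) -> None:
    """Terminate a process and everything it spawned."""
    if sys.platform == "win32":
        # taskkill /T walks the child tree; /F forces termination
        subprocess.run(["taskkill", "/F", "/T", "/PID", str(proc.pid)],
                       capture_output=True)
    else:
        try:
            os.killpg(os.getpgid(proc.pid), signal.SIGTERM)  # signal whole group
        except ProcessLookupError:
            pass  # already gone
    proc.wait(timeout=10)
```

Signaling the group (rather than just the PID) is what catches grandchildren, such as browsers launched by an agent.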

Testing agents now:
- Run continuously as long as passing features exist
- Can re-test features multiple times to catch regressions
- Are controlled by a fixed count (0-3) via the testing_agent_ratio setting
- Have atomic claiming to prevent concurrent testing of same feature
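The fixed-count control described above reduces to a small reconciliation step that the orchestrator can run each maintenance tick (hedged sketch; function names are hypothetical):

```python
def desired_testing_agents(setting: int, passing_features: int) -> int:
    """How many regression testers should be alive right now."""
    if passing_features == 0:
        return 0  # nothing eligible to regression-test yet
    return max(0, min(3, setting))  # fixed count, clamped to the 0-3 range


def reconcile(current: int, setting: int, passing_features: int) -> int:
    """Agents to spawn (positive) or retire (negative) this tick."""
    return desired_testing_agents(setting, passing_features) - current
```

Because the target depends only on the setting and on whether passing features exist, testers keep running (and re-testing) for as long as there is anything to regress.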

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Auto · 2026-01-22 15:22:48 +02:00
commit 357083dbae · parent 29c6b252a9
20 changed files with 841 additions and 382 deletions


@@ -40,15 +40,19 @@ chmod +x init.sh
 Otherwise, start servers manually.
-### STEP 3: GET A FEATURE TO TEST
+### STEP 3: CLAIM A FEATURE TO TEST
-Request ONE passing feature for regression testing:
+Atomically claim ONE passing feature for regression testing:
 ```
-Use the feature_get_for_regression tool with limit=1
+Use the feature_claim_for_testing tool
 ```
-This returns a random feature that is currently marked as passing. Your job is to verify it still works.
+This atomically claims a random passing feature that:
+- Is not being worked on by coding agents
+- Is not already being tested by another testing agent
+**CRITICAL:** You MUST call `feature_release_testing` when done, regardless of pass/fail.
 ### STEP 4: VERIFY THE FEATURE
@@ -83,9 +87,12 @@ Use browser automation tools:
 #### If the feature PASSES:
-The feature still works correctly. Simply confirm this and end your session:
+The feature still works correctly. Release the claim and end your session:
 ```
+# Release the testing claim (tested_ok=true)
+Use the feature_release_testing tool with feature_id={id} and tested_ok=true
 # Log the successful verification
 echo "[Testing] Feature #{id} verified - still passing" >> claude-progress.txt
 ```
@@ -120,7 +127,13 @@ A regression has been introduced. You MUST fix it:
 Use the feature_mark_passing tool with feature_id={id}
 ```
-6. **Commit the fix:**
+6. **Release the testing claim:**
+```
+Use the feature_release_testing tool with feature_id={id} and tested_ok=false
+```
+Note: tested_ok=false because we found a regression (even though we fixed it).
+7. **Commit the fix:**
 ```bash
 git add .
 git commit -m "Fix regression in [feature name]
@@ -144,7 +157,9 @@ echo "[Testing] Session complete - verified/fixed feature #{id}" >> claude-progr
 ### Feature Management
 - `feature_get_stats` - Get progress overview (passing/in_progress/total counts)
-- `feature_get_for_regression` - Get a random passing feature to test
+- `feature_claim_for_testing` - **USE THIS** - Atomically claim a feature for testing
+- `feature_release_testing` - **REQUIRED** - Release claim after testing (pass tested_ok=true/false)
+- `feature_get_for_regression` - (Legacy) Get random passing features without claiming
 - `feature_mark_failing` - Mark a feature as failing (when you find a regression)
 - `feature_mark_passing` - Mark a feature as passing (after fixing a regression)
@@ -176,12 +191,18 @@ All interaction tools have **built-in auto-wait** - no manual timeouts needed.
 - Visual appearance correct
 - API calls succeed
+**CRITICAL - Always release your claim:**
+- Call `feature_release_testing` when done, whether pass or fail
+- Pass `tested_ok=true` if the feature passed
+- Pass `tested_ok=false` if you found a regression
 **If you find a regression:**
 1. Mark the feature as failing immediately
 2. Fix the issue
 3. Verify the fix with browser automation
 4. Mark as passing only after thorough verification
-5. Commit the fix
+5. Release the testing claim with `tested_ok=false`
+6. Commit the fix
 **You have one iteration.** Focus on testing ONE feature thoroughly.
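The claim → verify → always-release contract the prompt enforces is essentially a try/finally around the test. An illustrative sketch with a stand-in `tools` object (not the agent's actual runtime):

```python
def run_regression_session(tools) -> None:
    """One testing iteration: claim, verify, always release."""
    feature = tools.feature_claim_for_testing()
    if feature is None:
        return  # nothing eligible; end the session
    tested_ok = False
    try:
        tested_ok = tools.verify_in_browser(feature["id"])
        if not tested_ok:
            tools.feature_mark_failing(feature_id=feature["id"])
            # ...fix, re-verify with browser automation, then feature_mark_passing...
    finally:
        # Release unconditionally, as the prompt requires --
        # tested_ok=False whenever a regression was found, even if fixed.
        tools.feature_release_testing(feature_id=feature["id"],
                                      tested_ok=tested_ok)
```

The `finally` block is what guarantees the claim is released even if verification raises, so a crashed session never wedges a feature in `testing_in_progress` until the 30-minute stale sweep.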