feat: decouple regression testing agents from coding agents

Major refactoring of the parallel orchestrator to run regression testing
agents independently from coding agents. This improves system reliability
and provides better control over testing behavior.

Key changes:

Database & MCP Layer:
- Add testing_in_progress and last_tested_at columns to Feature model
- Add feature_claim_for_testing() for atomic test claim with retry
- Add feature_release_testing() to release claims after testing
- Refactor claim functions to iterative loops (no recursion)
- Add OperationalError retry handling for transient DB errors
- Reduce MAX_CLAIM_RETRIES from 10 to 5

Orchestrator:
- Decouple testing agent lifecycle from coding agents
- Add _maintain_testing_agents() for continuous testing maintenance
- Fix TOCTOU race in _spawn_testing_agent() - hold lock during spawn
- Add _cleanup_stale_testing_locks() with 30-min timeout
- Fix log ordering - start_session() before stale flag cleanup
- Add stale testing_in_progress cleanup on startup

Dead Code Removal:
- Remove count_testing_in_concurrency from entire stack (12+ files)
- Remove ineffective with_for_update() from features router

API & UI:
- Pass testing_agent_ratio via CLI to orchestrator
- Update testing prompt template to use new claim/release tools
- Rename UI label to "Regression Agents" with clearer description
- Add process_utils.py for cross-platform process tree management

Testing agents now:
- Run continuously as long as passing features exist
- Can re-test features multiple times to catch regressions
- Are controlled by fixed count (0-3) via testing_agent_ratio setting
- Have atomic claiming to prevent concurrent testing of same feature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Auto
2026-01-22 15:22:48 +02:00
parent 29c6b252a9
commit 357083dbae
20 changed files with 841 additions and 382 deletions

View File

@@ -215,6 +215,62 @@ When running with `--parallel`, the orchestrator:
4. Browser contexts are isolated per agent using `--isolated` flag
5. AgentTracker parses output and emits `agent_update` messages for UI
### Process Limits (Parallel Mode)
The orchestrator enforces strict bounds on concurrent processes:
- `MAX_PARALLEL_AGENTS = 5` - Maximum concurrent coding agents
- `MAX_TOTAL_AGENTS = 10` - Hard limit on total agents (coding + testing)
- Testing agents are capped at `max_concurrency` (same as coding agents)
**Expected process count during normal operation:**
- 1 orchestrator process
- Up to 5 coding agents
- Up to 5 testing agents
- Total: never exceeds 11 Python processes
**Stress Test Verification:**
```bash
# Windows - verify process bounds
# 1. Note baseline count
tasklist | findstr python | find /c /v ""
# 2. Start parallel agent (max concurrency)
python autonomous_agent_demo.py --project-dir test --parallel --max-concurrency 5
# 3. During run - should NEVER exceed baseline + 11
tasklist | findstr python | find /c /v ""
# 4. After stop via UI - should return to baseline
tasklist | findstr python | find /c /v ""
```
```bash
# macOS/Linux - verify process bounds
# 1. Note baseline count
pgrep -c python
# 2. Start parallel agent
python autonomous_agent_demo.py --project-dir test --parallel --max-concurrency 5
# 3. During run - should NEVER exceed baseline + 11
pgrep -c python
# 4. After stop - should return to baseline
pgrep -c python
```
**Log Verification:**
```bash
# Check spawn vs completion balance
grep "Started testing agent" orchestrator_debug.log | wc -l
grep "Testing agent.*completed\|failed" orchestrator_debug.log | wc -l
# Watch for cap enforcement messages
grep "at max testing agents\|At max total agents" orchestrator_debug.log
```
### Design System
The UI uses a **neobrutalism** design with Tailwind CSS v4: