feat: add dedicated testing agents and enhanced parallel orchestration

Introduce a new testing agent architecture that runs regression tests
independently of the coding agents, improving quality assurance in
parallel mode.

Key changes:

Testing Agent System:
- Add testing_prompt.template.md for dedicated testing agent role
- Add feature_mark_failing MCP tool for regression detection
- Add --agent-type flag to select initializer/coding/testing mode (see the sketch after this list)
- Remove regression testing from coding prompt (now handled by testing agents)
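
A minimal sketch of how the --agent-type selection could be wired up (the
flag name and template filenames come from this commit; the argparse
dispatch and variable names are illustrative assumptions, not the actual
implementation):

```python
import argparse

# Hypothetical mapping from agent type to its prompt template; the
# template filenames follow the files touched in this commit.
PROMPT_TEMPLATES = {
    "initializer": "initializer_prompt.template.md",
    "coding": "coding_prompt.template.md",
    "testing": "testing_prompt.template.md",
}

parser = argparse.ArgumentParser()
parser.add_argument(
    "--agent-type",
    choices=("initializer", "coding", "testing"),
    default="coding",
    help="Which agent role to run (selects the prompt template).",
)
args = parser.parse_args()
prompt_template = PROMPT_TEMPLATES[args.agent_type]
```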

Parallel Orchestrator Enhancements:
- Add testing agent spawning with configurable ratio (--testing-agent-ratio)
- Add comprehensive debug logging system (DebugLog class)
- Improve database session management to prevent stale reads
- Add engine.dispose() calls to refresh connections after subprocess commits (see the sketch after this list)
- Fix f-string linting issues (remove unnecessary f-prefixes)
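
A hedged sketch of the stale-read workaround described above, assuming a
SQLAlchemy engine shared by the orchestrator while agent subprocesses
commit through their own connections (the engine URL, function names, and
subprocess invocation are illustrative):

```python
import subprocess

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///features.db")  # illustrative URL
SessionLocal = sessionmaker(bind=engine)

def run_agent_then_refresh(cmd: list[str]) -> None:
    # The agent subprocess commits to the database over its own connection.
    subprocess.run(cmd, check=True)
    # Drop pooled connections so the next orchestrator session sees the
    # subprocess's commits instead of a stale snapshot on an old connection.
    engine.dispose()

def fresh_session():
    # Sessions created after dispose() check out brand-new connections.
    return SessionLocal()
```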

UI Improvements:
- Add testing agent mascot (Chip) to AgentAvatar
- Enhance AgentCard to display testing agent status
- Add testing agent ratio slider in SettingsModal
- Update WebSocket handling for testing agent updates
- Improve ActivityFeed to show testing agent activity

API & Server Updates:
- Add testing_agent_ratio to settings schema and endpoints (see the sketch after this list)
- Update process manager to support testing agent type
- Enhance WebSocket messages for agent_update events
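
A minimal sketch of the settings addition, assuming a Pydantic-style
settings model (the field name comes from this commit; the bounds,
default, and model name are assumptions):

```python
from pydantic import BaseModel, Field

class OrchestratorSettings(BaseModel):
    # Fraction of spawned agents dedicated to testing; e.g. 0.25 means
    # roughly one testing agent per three coding agents. Bounds and the
    # default value are illustrative assumptions.
    testing_agent_ratio: float = Field(default=0.25, ge=0.0, le=1.0)
```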

Template Changes:
- Delete coding_prompt_yolo.template.md (consolidated into main prompt)
- Update initializer_prompt.template.md with improved structure
- Streamline coding_prompt.template.md workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in: Auto
2026-01-18 13:49:50 +02:00
parent 5f786078fa
commit 13128361b0
27 changed files with 1885 additions and 536 deletions

@@ -11,6 +11,7 @@ Tools:
 - feature_get_next: Get next feature to implement
 - feature_get_for_regression: Get random passing features for testing
 - feature_mark_passing: Mark a feature as passing
+- feature_mark_failing: Mark a feature as failing (regression detected)
 - feature_skip: Skip a feature (move to end of queue)
 - feature_mark_in_progress: Mark a feature as in-progress
 - feature_clear_in_progress: Clear in-progress status
@@ -358,7 +359,8 @@ def feature_get_for_regression(
 ) -> str:
     """Get random passing features for regression testing.
-    Returns a random selection of features that are currently passing.
+    Returns a random selection of features that are currently passing
+    and NOT currently in progress (to avoid conflicts with coding agents).
     Use this to verify that previously implemented features still work
     after making changes.
@@ -373,6 +375,7 @@ def feature_get_for_regression(
         features = (
             session.query(Feature)
             .filter(Feature.passes == True)
+            .filter(Feature.in_progress == False)  # Avoid conflicts with coding agents
             .order_by(func.random())
             .limit(limit)
             .all()
@@ -418,6 +421,48 @@ def feature_mark_passing(
         session.close()
+@mcp.tool()
+def feature_mark_failing(
+    feature_id: Annotated[int, Field(description="The ID of the feature to mark as failing", ge=1)]
+) -> str:
+    """Mark a feature as failing after finding a regression.
+    Updates the feature's passes field to false and clears the in_progress flag.
+    Use this when a testing agent discovers that a previously-passing feature
+    no longer works correctly (regression detected).
+    After marking as failing, you should:
+    1. Investigate the root cause
+    2. Fix the regression
+    3. Verify the fix
+    4. Call feature_mark_passing once fixed
+    Args:
+        feature_id: The ID of the feature to mark as failing
+    Returns:
+        JSON with the updated feature details, or error if not found.
+    """
+    session = get_session()
+    try:
+        feature = session.query(Feature).filter(Feature.id == feature_id).first()
+        if feature is None:
+            return json.dumps({"error": f"Feature with ID {feature_id} not found"})
+        feature.passes = False
+        feature.in_progress = False
+        session.commit()
+        session.refresh(feature)
+        return json.dumps({
+            "message": f"Feature #{feature_id} marked as failing - regression detected",
+            "feature": feature.to_dict()
+        }, indent=2)
+    finally:
+        session.close()
 @mcp.tool()
 def feature_skip(
     feature_id: Annotated[int, Field(description="The ID of the feature to skip", ge=1)]