mirror of
https://github.com/leonvanzyl/autocoder.git
synced 2026-01-30 06:12:06 +00:00
feat: add dedicated testing agents and enhanced parallel orchestration
Introduce a new testing agent architecture that runs regression tests independently from coding agents, improving quality assurance in parallel mode. Key changes: Testing Agent System: - Add testing_prompt.template.md for dedicated testing agent role - Add feature_mark_failing MCP tool for regression detection - Add --agent-type flag to select initializer/coding/testing mode - Remove regression testing from coding prompt (now handled by testing agents) Parallel Orchestrator Enhancements: - Add testing agent spawning with configurable ratio (--testing-agent-ratio) - Add comprehensive debug logging system (DebugLog class) - Improve database session management to prevent stale reads - Add engine.dispose() calls to refresh connections after subprocess commits - Fix f-string linting issues (remove unnecessary f-prefixes) UI Improvements: - Add testing agent mascot (Chip) to AgentAvatar - Enhance AgentCard to display testing agent status - Add testing agent ratio slider in SettingsModal - Update WebSocket handling for testing agent updates - Improve ActivityFeed to show testing agent activity API & Server Updates: - Add testing_agent_ratio to settings schema and endpoints - Update process manager to support testing agent type - Enhance WebSocket messages for agent_update events Template Changes: - Delete coding_prompt_yolo.template.md (consolidated into main prompt) - Update initializer_prompt.template.md with improved structure - Streamline coding_prompt.template.md workflow Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -48,38 +48,7 @@ chmod +x init.sh
|
||||
|
||||
Otherwise, start servers manually and document the process.
|
||||
|
||||
### STEP 3: VERIFICATION TEST (CRITICAL!)
|
||||
|
||||
**MANDATORY BEFORE NEW WORK:**
|
||||
|
||||
The previous session may have introduced bugs. Before implementing anything
|
||||
new, you MUST run verification tests.
|
||||
|
||||
Run 1-2 of the features marked as passing that are most core to the app's functionality to verify they still work.
|
||||
|
||||
To get passing features for regression testing:
|
||||
|
||||
```
|
||||
Use the feature_get_for_regression tool (returns up to 3 random passing features)
|
||||
```
|
||||
|
||||
For example, if this were a chat app, you should perform a test that logs into the app, sends a message, and gets a response.
|
||||
|
||||
**If you find ANY issues (functional or visual):**
|
||||
|
||||
- Mark that feature as "passes": false immediately
|
||||
- Add issues to a list
|
||||
- Fix all issues BEFORE moving to new features
|
||||
- This includes UI bugs like:
|
||||
- White-on-white text or poor contrast
|
||||
- Random characters displayed
|
||||
- Incorrect timestamps
|
||||
- Layout issues or overflow
|
||||
- Buttons too close together
|
||||
- Missing hover states
|
||||
- Console errors
|
||||
|
||||
### STEP 4: CHOOSE ONE FEATURE TO IMPLEMENT
|
||||
### STEP 3: CHOOSE ONE FEATURE TO IMPLEMENT
|
||||
|
||||
#### TEST-DRIVEN DEVELOPMENT MINDSET (CRITICAL)
|
||||
|
||||
@@ -140,16 +109,16 @@ Use the feature_skip tool with feature_id={id}
|
||||
|
||||
Document the SPECIFIC external blocker in `claude-progress.txt`. "Functionality not built" is NEVER a valid reason.
|
||||
|
||||
### STEP 5: IMPLEMENT THE FEATURE
|
||||
### STEP 4: IMPLEMENT THE FEATURE
|
||||
|
||||
Implement the chosen feature thoroughly:
|
||||
|
||||
1. Write the code (frontend and/or backend as needed)
|
||||
2. Test manually using browser automation (see Step 6)
|
||||
2. Test manually using browser automation (see Step 5)
|
||||
3. Fix any issues discovered
|
||||
4. Verify the feature works end-to-end
|
||||
|
||||
### STEP 6: VERIFY WITH BROWSER AUTOMATION
|
||||
### STEP 5: VERIFY WITH BROWSER AUTOMATION
|
||||
|
||||
**CRITICAL:** You MUST verify features through the actual UI.
|
||||
|
||||
@@ -174,7 +143,7 @@ Use browser automation tools:
|
||||
- Skip visual verification
|
||||
- Mark tests passing without thorough verification
|
||||
|
||||
### STEP 6.5: MANDATORY VERIFICATION CHECKLIST (BEFORE MARKING ANY TEST PASSING)
|
||||
### STEP 5.5: MANDATORY VERIFICATION CHECKLIST (BEFORE MARKING ANY TEST PASSING)
|
||||
|
||||
**You MUST complete ALL of these checks before marking any feature as "passes": true**
|
||||
|
||||
@@ -209,7 +178,7 @@ Use browser automation tools:
|
||||
- [ ] Loading states appeared during API calls
|
||||
- [ ] Error states handle failures gracefully
|
||||
|
||||
### STEP 6.6: MOCK DATA DETECTION SWEEP
|
||||
### STEP 5.6: MOCK DATA DETECTION SWEEP
|
||||
|
||||
**Run this sweep AFTER EVERY FEATURE before marking it as passing:**
|
||||
|
||||
@@ -252,7 +221,7 @@ For API endpoints used by this feature:
|
||||
- Verify response contains actual database data
|
||||
- Empty database = empty response (not pre-populated mock data)
|
||||
|
||||
### STEP 7: UPDATE FEATURE STATUS (CAREFULLY!)
|
||||
### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)
|
||||
|
||||
**YOU CAN ONLY MODIFY ONE FIELD: "passes"**
|
||||
|
||||
@@ -273,7 +242,7 @@ Use the feature_mark_passing tool with feature_id=42
|
||||
|
||||
**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**
|
||||
|
||||
### STEP 8: COMMIT YOUR PROGRESS
|
||||
### STEP 7: COMMIT YOUR PROGRESS
|
||||
|
||||
Make a descriptive git commit:
|
||||
|
||||
@@ -288,7 +257,7 @@ git commit -m "Implement [feature name] - verified end-to-end
|
||||
"
|
||||
```
|
||||
|
||||
### STEP 9: UPDATE PROGRESS NOTES
|
||||
### STEP 8: UPDATE PROGRESS NOTES
|
||||
|
||||
Update `claude-progress.txt` with:
|
||||
|
||||
@@ -298,7 +267,7 @@ Update `claude-progress.txt` with:
|
||||
- What should be worked on next
|
||||
- Current completion status (e.g., "45/200 tests passing")
|
||||
|
||||
### STEP 10: END SESSION CLEANLY
|
||||
### STEP 9: END SESSION CLEANLY
|
||||
|
||||
Before context fills up:
|
||||
|
||||
@@ -374,12 +343,12 @@ feature_get_next
|
||||
# 3. Mark a feature as in-progress (call immediately after feature_get_next)
|
||||
feature_mark_in_progress with feature_id={id}
|
||||
|
||||
# 4. Get up to 3 random passing features for regression testing
|
||||
feature_get_for_regression
|
||||
|
||||
# 5. Mark a feature as passing (after verification)
|
||||
# 4. Mark a feature as passing (after verification)
|
||||
feature_mark_passing with feature_id={id}
|
||||
|
||||
# 5. Mark a feature as failing (if you discover it's broken)
|
||||
feature_mark_failing with feature_id={id}
|
||||
|
||||
# 6. Skip a feature (moves to end of queue) - ONLY when blocked by dependency
|
||||
feature_skip with feature_id={id}
|
||||
|
||||
@@ -436,7 +405,7 @@ This allows you to fully test email-dependent flows without needing external ema
|
||||
- **All navigation works - no 404s or broken links**
|
||||
|
||||
**You have unlimited time.** Take as long as needed to get it right. The most important thing is that you
|
||||
leave the code base in a clean state before terminating the session (Step 10).
|
||||
leave the code base in a clean state before terminating the session (Step 9).
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user