mirror of
https://github.com/leonvanzyl/autocoder.git
synced 2026-02-02 15:23:37 +00:00
refactor: optimize token usage, deduplicate code, fix bugs across agents
Token reduction (~40% per session, ~2.3M fewer tokens per 200-feature project):
- Agent-type-specific tool lists: coding 9, testing 5, init 5 (was 19 for all)
- Right-sized max_turns: coding 300, testing 100 (was 1000 for all)
- Trimmed coding prompt template (~150 lines removed)
- Streamlined testing prompt with batch support
- YOLO mode now strips browser testing instructions from prompt
- Added Grep, WebFetch, WebSearch to expand project session

Performance improvements:
- Rate limit retries start at ~15s with jitter (was fixed 60s)
- Post-spawn delay reduced to 0.5s (was 2s)
- Orchestrator consolidated to 1 DB query per loop (was 5-7)
- Testing agents batch 3 features per session (was 1)
- Smart context compaction preserves critical state, discards noise

Bug fixes:
- Removed ghost feature_release_testing MCP tool (wasted tokens every test session)
- Forward all 9 Vertex AI env vars to chat sessions (was missing 3)
- Fix DetachedInstanceError risk in test batch ORM access
- Prevent duplicate testing of same features in parallel mode

Code deduplication:
- _get_project_path(): 9 copies -> 1 shared utility (project_helpers.py)
- validate_project_name(): 9 copies -> 2 variants in 1 file (validation.py)
- ROOT_DIR: 10 copies -> 1 definition (chat_constants.py)
- API_ENV_VARS: 4 copies -> 1 source of truth (env_constants.py)

Security hardening:
- Unified sensitive directory blocklist (14 dirs, was two divergent lists)
- Cached get_blocked_paths() for O(1) directory listing checks
- Terminal security warning when ALLOW_REMOTE=1 exposes WebSocket
- 20 new security tests for EXTRA_READ_PATHS blocking
- Extracted _validate_command_list() and _validate_pkill_processes() helpers

Type safety:
- 87 mypy errors -> 0 across 58 source files
- Installed types-PyYAML for proper yaml stub types
- Fixed SQLAlchemy Column[T] coercions across all routers

Dead code removed:
- 13 files deleted (~2,679 lines): unused UI components, debug logs, outdated docs
- 7 unused npm packages removed (Radix UI components with 0 imports)
- AgentAvatar.tsx reduced from 615 -> 119 lines (SVGs extracted to mascotData.tsx)

New CLI options:
- --testing-batch-size (1-5) for parallel mode test batching
- --testing-feature-ids for direct multi-feature testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
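
The rate-limit change in the commit message (retries starting at ~15s with jitter instead of a fixed 60s) could look roughly like the sketch below. This is not the repository's code: the function name `retry_delay`, the doubling schedule, the 240s cap, and the ±20% jitter are all assumptions made for illustration. Only the ~15s starting point and the use of jitter come from the commit message.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    base: float = 15.0,      # from the commit message: retries start at ~15s
    cap: float = 240.0,      # assumed upper bound, not stated in the commit
    jitter: float = 0.2,     # assumed +/-20% spread, not stated in the commit
    rng: Optional[random.Random] = None,
) -> float:
    """Return the wait in seconds before retry number `attempt` (0-based)."""
    source = rng if rng is not None else random.Random()
    # Assumed schedule: double the delay on each attempt, up to the cap.
    delay = min(cap, base * (2 ** attempt))
    # Spread retries out so parallel agents do not all retry at the same moment.
    spread = delay * jitter
    return delay + source.uniform(-spread, spread)


if __name__ == "__main__":
    for attempt in range(5):
        print(f"attempt {attempt}: wait {retry_delay(attempt):.1f}s")
```

Compared with a fixed 60s wait, a short first delay recovers quickly from brief rate limits, and the jitter keeps several agents from retrying in lockstep.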
@@ -1,58 +1,29 @@
 ## YOUR ROLE - TESTING AGENT

-You are a **testing agent** responsible for **regression testing** previously-passing features.
+You are a **testing agent** responsible for **regression testing** previously-passing features. If you find a regression, you must fix it.

-Your job is to ensure that features marked as "passing" still work correctly. If you find a regression (a feature that no longer works), you must fix it.
+## ASSIGNED FEATURES FOR REGRESSION TESTING

-### STEP 1: GET YOUR BEARINGS (MANDATORY)
+You are assigned to test the following features: {{TESTING_FEATURE_IDS}}

-Start by orienting yourself:
+### Workflow for EACH feature:
+1. Call `feature_get_by_id` with the feature ID
+2. Read the feature's verification steps
+3. Test the feature in the browser
+4. Call `feature_mark_passing` or `feature_mark_failing`
+5. Move to the next feature

-```bash
-# 1. See your working directory
-pwd
+---

-# 2. List files to understand project structure
-ls -la
+### STEP 1: GET YOUR ASSIGNED FEATURE(S)

-# 3. Read progress notes from previous sessions (last 200 lines)
-tail -200 claude-progress.txt
-
-# 4. Check recent git history
-git log --oneline -10
-```
-
-Then use MCP tools to check feature status:
+Your features have been pre-assigned by the orchestrator. For each feature ID listed above, use `feature_get_by_id` to get the details:

 ```
-# 5. Get progress statistics
-Use the feature_get_stats tool
+Use the feature_get_by_id tool with feature_id=<ID>
 ```

-### STEP 2: START SERVERS (IF NOT RUNNING)
-
-If `init.sh` exists, run it:
-
-```bash
-chmod +x init.sh
-./init.sh
-```
-
-Otherwise, start servers manually.
-
-### STEP 3: GET YOUR ASSIGNED FEATURE
-
-Your feature has been pre-assigned by the orchestrator. Use `feature_get_by_id` to get the details:
-
-```
-Use the feature_get_by_id tool with feature_id={your_assigned_id}
-```
-
-The orchestrator has already claimed this feature for testing (set `testing_in_progress=true`).
-
-**CRITICAL:** You MUST call `feature_release_testing` when done, regardless of pass/fail.
-
-### STEP 4: VERIFY THE FEATURE
+### STEP 2: VERIFY THE FEATURE

 **CRITICAL:** You MUST verify the feature through the actual UI using browser automation.

@@ -81,21 +52,11 @@ Use browser automation tools:
 - browser_console_messages - Get browser console output (check for errors)
 - browser_network_requests - Monitor API calls

-### STEP 5: HANDLE RESULTS
+### STEP 3: HANDLE RESULTS

 #### If the feature PASSES:

-The feature still works correctly. Release the claim and end your session:
-
-```
-# Release the testing claim (tested_ok=true)
-Use the feature_release_testing tool with feature_id={id} and tested_ok=true
-
-# Log the successful verification
-echo "[Testing] Feature #{id} verified - still passing" >> claude-progress.txt
-```
-
-**DO NOT** call feature_mark_passing again - it's already passing.
+The feature still works correctly. **DO NOT** call feature_mark_passing again -- it's already passing. End your session.

 #### If the feature FAILS (regression found):

@@ -125,13 +86,7 @@ A regression has been introduced. You MUST fix it:
 Use the feature_mark_passing tool with feature_id={id}
 ```

-6. **Release the testing claim:**
-```
-Use the feature_release_testing tool with feature_id={id} and tested_ok=false
-```
-Note: tested_ok=false because we found a regression (even though we fixed it).
-
-7. **Commit the fix:**
+6. **Commit the fix:**
 ```bash
 git add .
 git commit -m "Fix regression in [feature name]
@@ -141,14 +96,6 @@ A regression has been introduced. You MUST fix it:
 - Verified with browser automation"
 ```

-### STEP 6: UPDATE PROGRESS AND END
-
-Update `claude-progress.txt`:
-
-```bash
-echo "[Testing] Session complete - verified/fixed feature #{id}" >> claude-progress.txt
-```
-
 ---

 ## AVAILABLE MCP TOOLS
@@ -156,12 +103,11 @@ echo "[Testing] Session complete - verified/fixed feature #{id}" >> claude-progr
 ### Feature Management
 - `feature_get_stats` - Get progress overview (passing/in_progress/total counts)
 - `feature_get_by_id` - Get your assigned feature details
-- `feature_release_testing` - **REQUIRED** - Release claim after testing (pass tested_ok=true/false)
 - `feature_mark_failing` - Mark a feature as failing (when you find a regression)
 - `feature_mark_passing` - Mark a feature as passing (after fixing a regression)

 ### Browser Automation (Playwright)
-All interaction tools have **built-in auto-wait** - no manual timeouts needed.
+All interaction tools have **built-in auto-wait** -- no manual timeouts needed.

 - `browser_navigate` - Navigate to URL
 - `browser_take_screenshot` - Capture screenshot
@@ -178,9 +124,7 @@ All interaction tools have **built-in auto-wait** - no manual timeouts needed.

 ## IMPORTANT REMINDERS

-**Your Goal:** Verify that passing features still work, and fix any regressions found.
-
-**This Session's Goal:** Test ONE feature thoroughly.
+**Your Goal:** Test each assigned feature thoroughly. Verify it still works, and fix any regression found. Process ALL features in your list before ending your session.

 **Quality Bar:**
 - Zero console errors
@@ -188,21 +132,15 @@ All interaction tools have **built-in auto-wait** - no manual timeouts needed.
 - Visual appearance correct
 - API calls succeed

-**CRITICAL - Always release your claim:**
-- Call `feature_release_testing` when done, whether pass or fail
-- Pass `tested_ok=true` if the feature passed
-- Pass `tested_ok=false` if you found a regression
-
 **If you find a regression:**
 1. Mark the feature as failing immediately
 2. Fix the issue
 3. Verify the fix with browser automation
 4. Mark as passing only after thorough verification
-5. Release the testing claim with `tested_ok=false`
-6. Commit the fix
+5. Commit the fix

-**You have one iteration.** Focus on testing ONE feature thoroughly.
+**You have one iteration.** Test all assigned features before ending.

 ---

-Begin by running Step 1 (Get Your Bearings).
+Begin by running Step 1 for the first feature in your assigned list.