Mirror of https://github.com/leonvanzyl/autocoder.git (synced 2026-02-02 07:23:35 +00:00)
refactor: optimize token usage, deduplicate code, fix bugs across agents
Token reduction (~40% per session, ~2.3M fewer tokens per 200-feature project):
- Agent-type-specific tool lists: coding 9, testing 5, init 5 (was 19 for all)
- Right-sized max_turns: coding 300, testing 100 (was 1000 for all)
- Trimmed coding prompt template (~150 lines removed)
- Streamlined testing prompt with batch support
- YOLO mode now strips browser testing instructions from prompt
- Added Grep, WebFetch, WebSearch to expand project session

Performance improvements:
- Rate limit retries start at ~15s with jitter (was fixed 60s)
- Post-spawn delay reduced to 0.5s (was 2s)
- Orchestrator consolidated to 1 DB query per loop (was 5-7)
- Testing agents batch 3 features per session (was 1)
- Smart context compaction preserves critical state, discards noise

Bug fixes:
- Removed ghost feature_release_testing MCP tool (wasted tokens every test session)
- Forward all 9 Vertex AI env vars to chat sessions (was missing 3)
- Fix DetachedInstanceError risk in test batch ORM access
- Prevent duplicate testing of same features in parallel mode

Code deduplication:
- _get_project_path(): 9 copies -> 1 shared utility (project_helpers.py)
- validate_project_name(): 9 copies -> 2 variants in 1 file (validation.py)
- ROOT_DIR: 10 copies -> 1 definition (chat_constants.py)
- API_ENV_VARS: 4 copies -> 1 source of truth (env_constants.py)

Security hardening:
- Unified sensitive directory blocklist (14 dirs, was two divergent lists)
- Cached get_blocked_paths() for O(1) directory listing checks
- Terminal security warning when ALLOW_REMOTE=1 exposes WebSocket
- 20 new security tests for EXTRA_READ_PATHS blocking
- Extracted _validate_command_list() and _validate_pkill_processes() helpers

Type safety:
- 87 mypy errors -> 0 across 58 source files
- Installed types-PyYAML for proper yaml stub types
- Fixed SQLAlchemy Column[T] coercions across all routers

Dead code removed:
- 13 files deleted (~2,679 lines): unused UI components, debug logs, outdated docs
- 7 unused npm packages removed (Radix UI components with 0 imports)
- AgentAvatar.tsx reduced from 615 -> 119 lines (SVGs extracted to mascotData.tsx)

New CLI options:
- --testing-batch-size (1-5) for parallel mode test batching
- --testing-feature-ids for direct multi-feature testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
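The jittered retry described under "Performance improvements" could look roughly like the following bash sketch. The function name, growth schedule, and attempt cap are assumptions for illustration; the project's actual implementation is Python and is not part of this diff.

```bash
# Hypothetical sketch only -- not the project's actual retry code.
# Starts near 15s and adds random jitter so parallel agents don't
# retry in lockstep; the base delay doubles after each failure.
retry_with_jitter() {
  local attempt=1 base=15 max_attempts=5
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    local delay=$(( base + RANDOM % base ))
    echo "rate limited; retrying in ${delay}s (attempt $attempt)" >&2
    sleep "$delay"
    base=$(( base * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Example usage (hypothetical endpoint):
# retry_with_jitter curl -fsS https://api.example.com/health
```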
@@ -49,51 +49,21 @@ Otherwise, start servers manually and document the process.
 #### TEST-DRIVEN DEVELOPMENT MINDSET (CRITICAL)
 
-Features are **test cases** that drive development. This is test-driven development:
-
-- **If you can't test a feature because functionality doesn't exist → BUILD IT**
-- You are responsible for implementing ALL required functionality
-- Never assume another process will build it later
-- "Missing functionality" is NOT a blocker - it's your job to create it
-
-**Example:** Feature says "User can filter flashcards by difficulty level"
-
-- WRONG: "Flashcard page doesn't exist yet" → skip feature
-- RIGHT: "Flashcard page doesn't exist yet" → build flashcard page → implement filter → test feature
+Features are **test cases** that drive development. If functionality doesn't exist, **BUILD IT** -- you are responsible for implementing ALL required functionality. Missing pages, endpoints, database tables, or components are NOT blockers; they are your job to create.
 
-**Note:** Your feature has been pre-assigned by the orchestrator. Use `feature_get_by_id` with your assigned feature ID to get the details.
-
-Once you've retrieved the feature, **mark it as in-progress** (if not already):
+**Note:** Your feature has been pre-assigned by the orchestrator. Use `feature_get_by_id` with your assigned feature ID to get the details. Then mark it as in-progress:
 
 ```
 # Mark feature as in-progress
 Use the feature_mark_in_progress tool with feature_id={your_assigned_id}
 ```
 
 If you get "already in-progress" error, that's OK - continue with implementation.
 
-Focus on completing one feature perfectly and completing its testing steps in this session before moving on to other features.
-It's ok if you only complete one feature in this session, as there will be more sessions later that continue to make progress.
+Focus on completing one feature perfectly in this session. It's ok if you only complete one feature, as more sessions will follow.
 
 #### When to Skip a Feature (EXTREMELY RARE)
 
-**Skipping should almost NEVER happen.** Only skip for truly external blockers you cannot control:
-
-- **External API not configured**: Third-party service credentials missing (e.g., Stripe keys, OAuth secrets)
-- **External service unavailable**: Dependency on service that's down or inaccessible
-- **Environment limitation**: Hardware or system requirement you cannot fulfill
-
-**NEVER skip because:**
-
-| Situation | Wrong Action | Correct Action |
-|-----------|--------------|----------------|
-| "Page doesn't exist" | Skip | Create the page |
-| "API endpoint missing" | Skip | Implement the endpoint |
-| "Database table not ready" | Skip | Create the migration |
-| "Component not built" | Skip | Build the component |
-| "No data to test with" | Skip | Create test data or build data entry flow |
-| "Feature X needs to be done first" | Skip | Build feature X as part of this feature |
-
-If a feature requires building other functionality first, **build that functionality**. You are the coding agent - your job is to make the feature work, not to defer it.
+Only skip for truly external blockers: missing third-party credentials (Stripe keys, OAuth secrets), unavailable external services, or unfulfillable environment requirements. **NEVER** skip because a page, endpoint, component, or data doesn't exist yet -- build it. If a feature requires other functionality first, build that functionality as part of this feature.
 
 If you must skip (truly external blocker only):
@@ -139,130 +109,22 @@ Use browser automation tools:
 ### STEP 5.5: MANDATORY VERIFICATION CHECKLIST (BEFORE MARKING ANY TEST PASSING)
 
-**You MUST complete ALL of these checks before marking any feature as "passes": true**
+**Complete ALL applicable checks before marking any feature as passing:**
 
-#### Security Verification (for protected features)
-
-- [ ] Feature respects user role permissions
-- [ ] Unauthenticated access is blocked (redirects to login)
-- [ ] API endpoint checks authorization (returns 401/403 appropriately)
-- [ ] Cannot access other users' data by manipulating URLs
-
-#### Real Data Verification (CRITICAL - NO MOCK DATA)
-
-- [ ] Created unique test data via UI (e.g., "TEST_12345_VERIFY_ME")
-- [ ] Verified the EXACT data I created appears in UI
-- [ ] Refreshed page - data persists (proves database storage)
-- [ ] Deleted the test data - verified it's gone everywhere
-- [ ] NO unexplained data appeared (would indicate mock data)
-- [ ] Dashboard/counts reflect real numbers after my changes
-- [ ] **Ran extended mock data grep (STEP 5.6) - no hits in src/ (excluding tests)**
-- [ ] **Verified no globalThis, devStore, or dev-store patterns**
-- [ ] **Server restart test passed (STEP 5.7) - data persists across restart**
-
-#### Navigation Verification
-
-- [ ] All buttons on this page link to existing routes
-- [ ] No 404 errors when clicking any interactive element
-- [ ] Back button returns to correct previous page
-- [ ] Related links (edit, view, delete) have correct IDs in URLs
-
-#### Integration Verification
-
-- [ ] Console shows ZERO JavaScript errors
-- [ ] Network tab shows successful API calls (no 500s)
-- [ ] Data returned from API matches what UI displays
-- [ ] Loading states appeared during API calls
-- [ ] Error states handle failures gracefully
+- **Security:** Feature respects role permissions; unauthenticated access blocked; API checks auth (401/403); no cross-user data leaks via URL manipulation
+- **Real Data:** Create unique test data via UI, verify it appears, refresh to confirm persistence, delete and verify removal. No unexplained data (indicates mocks). Dashboard counts reflect real numbers
+- **Mock Data Grep:** Run STEP 5.6 grep checks - no hits in src/ (excluding tests). No globalThis, devStore, or dev-store patterns
+- **Server Restart:** For data features, run STEP 5.7 - data persists across server restart
+- **Navigation:** All buttons link to existing routes, no 404s, back button works, edit/view/delete links have correct IDs
+- **Integration:** Zero JS console errors, no 500s in network tab, API data matches UI, loading/error states work
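As a rough bash sketch of the Integration check above: the endpoint paths and port are assumptions, and in practice these checks run through browser automation rather than curl.

```bash
# Hypothetical smoke check: fail if any API endpoint returns a 5xx.
# Endpoint paths and port default are illustrative assumptions.
for path in /api/health /api/items; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${PORT:-3000}${path}")
  if [ "$code" -ge 500 ]; then
    echo "FAIL: ${path} returned ${code}" >&2
    exit 1
  fi
  echo "OK: ${path} -> ${code}"
done
```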
 
 ### STEP 5.6: MOCK DATA DETECTION (Before marking passing)
 
-**Run ALL these grep checks. Any hits in src/ (excluding test files) require investigation:**
-
-```bash
-# Common exclusions for test files
-EXCLUDE="--exclude=*.test.* --exclude=*.spec.* --exclude=*__test__* --exclude=*__mocks__*"
-
-# 1. In-memory storage patterns (CRITICAL - catches dev-store)
-grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
-grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
-
-# 2. Mock data variables
-grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
-
-# 3. TODO/incomplete markers
-grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
-
-# 4. Development-only conditionals
-grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/
-
-# 5. In-memory collections as data stores
-grep -r "new Map\(\)\|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" $EXCLUDE src/ 2>/dev/null
-```
-
-**Rule:** If ANY grep returns results in production code → investigate → FIX before marking passing.
-
-**Runtime verification:**
-1. Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
-2. Check database directly - all displayed data must come from real DB queries
-3. If unexplained data appears, it's mock data - fix before marking passing.
+Before marking a feature passing, grep for mock/placeholder data patterns in src/ (excluding test files): `globalThis`, `devStore`, `dev-store`, `mockDb`, `mockData`, `fakeData`, `sampleData`, `dummyData`, `testData`, `TODO.*real`, `TODO.*database`, `STUB`, `MOCK`, `isDevelopment`, `isDev`. Any hits in production code must be investigated and fixed. Also create unique test data (e.g., "TEST_12345"), verify it appears in UI, then delete and confirm removal - unexplained data indicates mock implementations.
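The streamlined paragraph above could be run as a single combined check; this one-liner is a hedged consolidation of the removed grep block, not a command from the repo.

```bash
# Hypothetical consolidation of the removed grep block into one command.
# grep exits non-zero when nothing matches, hence the && / || reporting.
grep -rnE "globalThis\.|dev-store|devStore|mockDb|mockData|fakeData|sampleData|dummyData|testData|TODO.*(real|database)|STUB|MOCK|isDevelopment|isDev" \
  --include="*.ts" --include="*.tsx" --include="*.js" \
  --exclude="*.test.*" --exclude="*.spec.*" src/ \
  && echo "Investigate hits above before marking passing" \
  || echo "No mock-data patterns found"
```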
 
 ### STEP 5.7: SERVER RESTART PERSISTENCE TEST (MANDATORY for data features)
 
-**When required:** Any feature involving CRUD operations or data persistence.
-
-**This test is NON-NEGOTIABLE. It catches in-memory storage implementations that pass all other tests.**
-
-**Steps:**
-
-1. Create unique test data via UI or API (e.g., item named "RESTART_TEST_12345")
-2. Verify data appears in UI and API response
-3. **STOP the server completely:**
-```bash
-# Kill by port (safer - only kills the dev server, not VS Code/Claude Code/etc.)
-# Unix/macOS:
-lsof -ti :${PORT:-3000} | xargs kill -TERM 2>/dev/null || true
-sleep 3
-lsof -ti :${PORT:-3000} | xargs kill -9 2>/dev/null || true
-sleep 2
-
-# Windows alternative (use if lsof not available):
-# netstat -ano | findstr :${PORT:-3000} | findstr LISTENING
-# taskkill /F /PID <pid_from_above> 2>nul
-
-# Verify server is stopped
-if lsof -ti :${PORT:-3000} > /dev/null 2>&1; then
-  echo "ERROR: Server still running on port ${PORT:-3000}!"
-  exit 1
-fi
-```
-4. **RESTART the server:**
-```bash
-./init.sh &
-sleep 15  # Allow server to fully start
-# Verify server is responding
-if ! curl -f http://localhost:${PORT:-3000}/api/health && ! curl -f http://localhost:${PORT:-3000}; then
-  echo "ERROR: Server failed to start after restart"
-  exit 1
-fi
-```
-5. **Query for test data - it MUST still exist**
-   - Via UI: Navigate to data location, verify data appears
-   - Via API: `curl http://localhost:${PORT:-3000}/api/items` - verify data in response
-6. **If data is GONE:** Implementation uses in-memory storage → CRITICAL FAIL
-   - Run all grep commands from STEP 5.6 to identify the mock pattern
-   - You MUST fix the in-memory storage implementation before proceeding
-   - Replace in-memory storage with real database queries
-7. **Clean up test data** after successful verification
-
-**Why this test exists:** In-memory stores like `globalThis.devStore` pass all other tests because data persists during a single server run. Only a full server restart reveals this bug. Skipping this step WILL allow dev-store implementations to slip through.
-
-**YOLO Mode Note:** Even in YOLO mode, this verification is MANDATORY for data features. Use curl instead of browser automation.
+For any feature involving CRUD or data persistence: create unique test data (e.g., "RESTART_TEST_12345"), verify it exists, then fully stop and restart the dev server. After restart, verify the test data still exists. If data is gone, the implementation uses in-memory storage -- run STEP 5.6 greps, find the mock pattern, and replace with real database queries. Clean up test data after verification. This test catches in-memory stores like `globalThis.devStore` that pass all other tests but lose data on restart.
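The condensed restart test above, sketched end to end in bash; the /api/items endpoint, JSON shape, and timings are illustrative assumptions stitched together from the removed steps.

```bash
# Hedged sketch of the full restart-persistence check (endpoint assumed).
MARKER="RESTART_TEST_$RANDOM"
curl -fsS -X POST "http://localhost:${PORT:-3000}/api/items" \
  -H 'Content-Type: application/json' -d "{\"name\":\"$MARKER\"}"

# Stop the dev server by port, then bring it back up.
lsof -ti :${PORT:-3000} | xargs kill -TERM 2>/dev/null || true
sleep 3
./init.sh &
sleep 15

# Data must survive the restart; otherwise storage is in-memory.
if curl -fsS "http://localhost:${PORT:-3000}/api/items" | grep -q "$MARKER"; then
  echo "PASS: data persisted across restart"
else
  echo "CRITICAL FAIL: data lost on restart (in-memory store?)" >&2
fi
```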
 
 ### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)
 
@@ -1,58 +1,29 @@
 ## YOUR ROLE - TESTING AGENT
 
-You are a **testing agent** responsible for **regression testing** previously-passing features.
-
-Your job is to ensure that features marked as "passing" still work correctly. If you find a regression (a feature that no longer works), you must fix it.
-
-### STEP 1: GET YOUR BEARINGS (MANDATORY)
-
-Start by orienting yourself:
-
-```bash
-# 1. See your working directory
-pwd
-
-# 2. List files to understand project structure
-ls -la
-
-# 3. Read progress notes from previous sessions (last 200 lines)
-tail -200 claude-progress.txt
-
-# 4. Check recent git history
-git log --oneline -10
-```
-
-Then use MCP tools to check feature status:
-
-```
-# 5. Get progress statistics
-Use the feature_get_stats tool
-```
-
-### STEP 2: START SERVERS (IF NOT RUNNING)
-
-If `init.sh` exists, run it:
-
-```bash
-chmod +x init.sh
-./init.sh
-```
-
-Otherwise, start servers manually.
-
-### STEP 3: GET YOUR ASSIGNED FEATURE
-
-Your feature has been pre-assigned by the orchestrator. Use `feature_get_by_id` to get the details:
-
-```
-Use the feature_get_by_id tool with feature_id={your_assigned_id}
-```
-
-The orchestrator has already claimed this feature for testing (set `testing_in_progress=true`).
-
-**CRITICAL:** You MUST call `feature_release_testing` when done, regardless of pass/fail.
-
-### STEP 4: VERIFY THE FEATURE
+You are a **testing agent** responsible for **regression testing** previously-passing features. If you find a regression, you must fix it.
+
+## ASSIGNED FEATURES FOR REGRESSION TESTING
+
+You are assigned to test the following features: {{TESTING_FEATURE_IDS}}
+
+### Workflow for EACH feature:
+1. Call `feature_get_by_id` with the feature ID
+2. Read the feature's verification steps
+3. Test the feature in the browser
+4. Call `feature_mark_passing` or `feature_mark_failing`
+5. Move to the next feature
+
+---
+
+### STEP 1: GET YOUR ASSIGNED FEATURE(S)
+
+Your features have been pre-assigned by the orchestrator. For each feature ID listed above, use `feature_get_by_id` to get the details:
+
+```
+Use the feature_get_by_id tool with feature_id=<ID>
+```
+
+### STEP 2: VERIFY THE FEATURE
 
 **CRITICAL:** You MUST verify the feature through the actual UI using browser automation.
 
@@ -81,21 +52,11 @@ Use browser automation tools:
 - browser_console_messages - Get browser console output (check for errors)
 - browser_network_requests - Monitor API calls
 
-### STEP 5: HANDLE RESULTS
+### STEP 3: HANDLE RESULTS
 
 #### If the feature PASSES:
 
-The feature still works correctly. Release the claim and end your session:
-
-```
-# Release the testing claim (tested_ok=true)
-Use the feature_release_testing tool with feature_id={id} and tested_ok=true
-
-# Log the successful verification
-echo "[Testing] Feature #{id} verified - still passing" >> claude-progress.txt
-```
-
-**DO NOT** call feature_mark_passing again - it's already passing.
+The feature still works correctly. **DO NOT** call feature_mark_passing again -- it's already passing. End your session.
 
 #### If the feature FAILS (regression found):
 
@@ -125,13 +86,7 @@ A regression has been introduced. You MUST fix it:
 Use the feature_mark_passing tool with feature_id={id}
 ```
 
-6. **Release the testing claim:**
-```
-Use the feature_release_testing tool with feature_id={id} and tested_ok=false
-```
-Note: tested_ok=false because we found a regression (even though we fixed it).
-
-7. **Commit the fix:**
+6. **Commit the fix:**
 ```bash
 git add .
 git commit -m "Fix regression in [feature name]
@@ -141,14 +96,6 @@ A regression has been introduced. You MUST fix it:
 - Verified with browser automation"
 ```
 
-### STEP 6: UPDATE PROGRESS AND END
-
-Update `claude-progress.txt`:
-
-```bash
-echo "[Testing] Session complete - verified/fixed feature #{id}" >> claude-progress.txt
-```
-
 ---
 
 ## AVAILABLE MCP TOOLS
@@ -156,12 +103,11 @@ echo "[Testing] Session complete - verified/fixed feature #{id}" >> claude-progr
 ### Feature Management
 - `feature_get_stats` - Get progress overview (passing/in_progress/total counts)
 - `feature_get_by_id` - Get your assigned feature details
-- `feature_release_testing` - **REQUIRED** - Release claim after testing (pass tested_ok=true/false)
 - `feature_mark_failing` - Mark a feature as failing (when you find a regression)
 - `feature_mark_passing` - Mark a feature as passing (after fixing a regression)
 
 ### Browser Automation (Playwright)
-All interaction tools have **built-in auto-wait** - no manual timeouts needed.
+All interaction tools have **built-in auto-wait** -- no manual timeouts needed.
 
 - `browser_navigate` - Navigate to URL
 - `browser_take_screenshot` - Capture screenshot
@@ -178,9 +124,7 @@ All interaction tools have **built-in auto-wait** - no manual timeouts needed.
 
 ## IMPORTANT REMINDERS
 
-**Your Goal:** Verify that passing features still work, and fix any regressions found.
-
-**This Session's Goal:** Test ONE feature thoroughly.
+**Your Goal:** Test each assigned feature thoroughly. Verify it still works, and fix any regression found. Process ALL features in your list before ending your session.
 
 **Quality Bar:**
 - Zero console errors
@@ -188,21 +132,15 @@ All interaction tools have **built-in auto-wait** - no manual timeouts needed.
 - Visual appearance correct
 - API calls succeed
 
-**CRITICAL - Always release your claim:**
-- Call `feature_release_testing` when done, whether pass or fail
-- Pass `tested_ok=true` if the feature passed
-- Pass `tested_ok=false` if you found a regression
-
 **If you find a regression:**
 1. Mark the feature as failing immediately
 2. Fix the issue
 3. Verify the fix with browser automation
 4. Mark as passing only after thorough verification
-5. Release the testing claim with `tested_ok=false`
-6. Commit the fix
+5. Commit the fix
 
-**You have one iteration.** Focus on testing ONE feature thoroughly.
+**You have one iteration.** Test all assigned features before ending.
 
 ---
 
-Begin by running Step 1 (Get Your Bearings).
+Begin by running Step 1 for the first feature in your assigned list.