improve performance

This commit is contained in:
Auto
2026-01-23 14:37:43 +02:00
parent 1be42cc734
commit 874359fcf6
9 changed files with 396 additions and 672 deletions

View File

@@ -172,48 +172,12 @@ Use browser automation tools:
- [ ] Loading states appeared during API calls
- [ ] Error states handle failures gracefully
### STEP 5.6: MOCK DATA DETECTION SWEEP
### STEP 5.6: MOCK DATA DETECTION (Before marking passing)
**Run this sweep AFTER EVERY FEATURE before marking it as passing:**
#### 1. Code Pattern Search
Search the codebase for forbidden patterns:
```bash
# Search for mock data patterns
grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx"
grep -r "// TODO\|// FIXME\|// STUB\|// MOCK" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx"
grep -r "hardcoded\|placeholder" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx"
```
**If ANY matches related to your feature are found - FIX THEM before proceeding.**
#### 2. Runtime Verification
For ANY data displayed in UI:
1. Create NEW data with UNIQUE content (e.g., "TEST_12345_DELETE_ME")
2. Verify that EXACT content appears in the UI
3. Delete the record
4. Verify it's GONE from the UI
5. **If you see data that wasn't created during testing - IT'S MOCK DATA. Fix it.**
#### 3. Database Verification
Check that:
- Database tables contain only data you created during tests
- Counts/statistics match actual database record counts
- No seed data is masquerading as user data
#### 4. API Response Verification
For API endpoints used by this feature:
- Call the endpoint directly
- Verify response contains actual database data
- Empty database = empty response (not pre-populated mock data)
1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"`
2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
3. **Check database:** All displayed data must come from real DB queries
4. If unexplained data appears, it's mock data - fix before marking passing.
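As a concrete illustration of the runtime check, the sketch below runs the create → verify → delete → verify-gone cycle directly against an API. The `/api/todos` endpoint, the `title` field, and the use of the `requests` library are assumptions for the example; substitute the real endpoint and fields of the feature under test.

```python
# Minimal sketch of the runtime mock-data check, assuming a hypothetical REST
# endpoint at http://localhost:3000/api/todos that returns created records with an "id".
import requests

BASE = "http://localhost:3000/api/todos"
MARKER = "TEST_12345_DELETE_ME"

# 1. Create a record with unique, recognizable content
created = requests.post(BASE, json={"title": MARKER}, timeout=10).json()

# 2. Verify the exact content comes back from the API (and, separately, in the UI)
listing = requests.get(BASE, timeout=10).json()
assert any(item["title"] == MARKER for item in listing), "created record not returned"

# 3. Delete it and verify it is gone; anything left over at this point is mock data
requests.delete(f"{BASE}/{created['id']}", timeout=10)
listing = requests.get(BASE, timeout=10).json()
assert all(item["title"] != MARKER for item in listing), "record still present after delete"
```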
### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)
@@ -273,51 +237,11 @@ Before context fills up:
---
## TESTING REQUIREMENTS
## BROWSER AUTOMATION
**ALL testing must use browser automation tools.**
Use Playwright MCP tools (`browser_*`) for UI verification. Key tools: `navigate`, `click`, `type`, `fill_form`, `take_screenshot`, `console_messages`, `network_requests`. All tools have auto-wait built in.
Available tools:
**Navigation & Screenshots:**
- browser_navigate - Navigate to a URL
- browser_navigate_back - Go back to previous page
- browser_take_screenshot - Capture screenshot (use for visual verification)
- browser_snapshot - Get accessibility tree snapshot (structured page data)
**Element Interaction:**
- browser_click - Click elements (has built-in auto-wait)
- browser_type - Type text into editable elements
- browser_fill_form - Fill multiple form fields at once
- browser_select_option - Select dropdown options
- browser_hover - Hover over elements
- browser_drag - Drag and drop between elements
- browser_press_key - Press keyboard keys
**Debugging & Monitoring:**
- browser_console_messages - Get browser console output (check for errors)
- browser_network_requests - Monitor API calls and responses
- browser_evaluate - Execute JavaScript (USE SPARINGLY - debugging only, NOT for bypassing UI)
**Browser Management:**
- browser_close - Close the browser
- browser_resize - Resize browser window (use to test mobile: 375x667, tablet: 768x1024, desktop: 1280x720)
- browser_tabs - Manage browser tabs
- browser_wait_for - Wait for text/element/time
- browser_handle_dialog - Handle alert/confirm dialogs
- browser_file_upload - Upload files
**Key Benefits:**
- All interaction tools have **built-in auto-wait** - no manual timeouts needed
- Use `browser_console_messages` to detect JavaScript errors
- Use `browser_network_requests` to verify API calls succeed
Test like a human user with mouse and keyboard. Don't take shortcuts by using JavaScript evaluation.
Test like a human user with mouse and keyboard. Use `browser_console_messages` to detect errors. Don't bypass UI with JavaScript evaluation.
---
@@ -381,26 +305,7 @@ This allows you to fully test email-dependent flows without needing external ema
---
## IMPORTANT REMINDERS
**Your Goal:** Production-quality application with all tests passing
**This Session's Goal:** Complete at least one feature perfectly
**Priority:** Fix broken tests before implementing new features
**Quality Bar:**
- Zero console errors
- Polished UI matching the design specified in app_spec.txt
- All features work end-to-end through the UI
- Fast, responsive, professional
- **NO MOCK DATA - all data from real database**
- **Security enforced - unauthorized access blocked**
- **All navigation works - no 404s or broken links**
**You have unlimited time.** Take as long as needed to get it right. The most important thing is that you
leave the codebase in a clean state before terminating the session (Step 9).
**Remember:** One feature per session. Zero console errors. All data from real database. Leave codebase clean before ending session.
---

View File

@@ -26,82 +26,11 @@ which is the single source of truth for what needs to be built.
**Creating Features:**
Use the feature_create_bulk tool to add all features at once. Note: You MUST include `depends_on_indices`
to specify dependencies. Features with no dependencies can run first and enable parallel execution.
```
Use the feature_create_bulk tool with features=[
{
"category": "functional",
"name": "App loads without errors",
"description": "Application starts and renders homepage",
"steps": [
"Step 1: Navigate to homepage",
"Step 2: Verify no console errors",
"Step 3: Verify main content renders"
]
// No depends_on_indices = FOUNDATION feature (runs first)
},
{
"category": "functional",
"name": "User can create an account",
"description": "Basic user registration functionality",
"steps": [
"Step 1: Navigate to registration page",
"Step 2: Fill in required fields",
"Step 3: Submit form and verify account created"
],
"depends_on_indices": [0] // Depends on app loading
},
{
"category": "functional",
"name": "User can log in",
"description": "Authentication with existing credentials",
"steps": [
"Step 1: Navigate to login page",
"Step 2: Enter credentials",
"Step 3: Verify successful login and redirect"
],
"depends_on_indices": [0, 1] // Depends on app loading AND registration
},
{
"category": "functional",
"name": "User can view dashboard",
"description": "Protected dashboard requires authentication",
"steps": [
"Step 1: Log in as user",
"Step 2: Navigate to dashboard",
"Step 3: Verify personalized content displays"
],
"depends_on_indices": [2] // Depends on login only
},
{
"category": "functional",
"name": "User can update profile",
"description": "User can modify their profile information",
"steps": [
"Step 1: Log in as user",
"Step 2: Navigate to profile settings",
"Step 3: Update and save profile"
],
"depends_on_indices": [2] // ALSO depends on login (WIDE GRAPH - can run parallel with dashboard!)
}
]
```
Use the feature_create_bulk tool to add all features at once. You can create features in batches if there are many (e.g., 50 at a time).
**Notes:**
- IDs and priorities are assigned automatically based on order
- All features start with `passes: false` by default
- You can create features in batches if there are many (e.g., 50 at a time)
- **CRITICAL:** Use `depends_on_indices` to specify dependencies (see FEATURE DEPENDENCIES section below)
**DEPENDENCY REQUIREMENT:**
You MUST specify dependencies using `depends_on_indices` for features that logically depend on others.
- Features 0-9 should have NO dependencies (foundation/setup features)
- Features 10+ MUST have at least some dependencies where logical
- Create WIDE dependency graphs, not linear chains:
- BAD: A -> B -> C -> D -> E (linear chain, only 1 feature can run at a time)
- GOOD: A -> B, A -> C, A -> D, B -> E, C -> E (wide graph, multiple features can run in parallel)
**Requirements for features:**
@@ -114,7 +43,6 @@ You MUST specify dependencies using `depends_on_indices` for features that logic
- Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
- At least 25 tests MUST have 10+ steps each (more for complex apps)
- Order features by priority: fundamental features first (the API assigns priority based on order)
- All features start with `passes: false` automatically
- Cover every feature in the spec exhaustively
- **MUST include tests from ALL 20 mandatory categories below**
@@ -122,125 +50,68 @@ You MUST specify dependencies using `depends_on_indices` for features that logic
## FEATURE DEPENDENCIES (MANDATORY)
**THIS SECTION IS MANDATORY. You MUST specify dependencies for features.**
Dependencies enable **parallel execution** of independent features. When specified correctly, multiple agents can work on unrelated features simultaneously, dramatically speeding up development.
Dependencies enable **parallel execution** of independent features. When you specify dependencies correctly, multiple agents can work on unrelated features simultaneously, dramatically speeding up development.
**Why this matters:** Without dependencies, features execute in random order, causing logical issues (e.g., "Edit user" before "Create user") and preventing efficient parallelization.
**WARNING:** If you do not specify dependencies, ALL features will be ready immediately, which:
1. Overwhelms the parallel agents trying to work on unrelated features
2. Results in features being implemented in random order
3. Causes logical issues (e.g., "Edit user" attempted before "Create user")
### Dependency Rules
You MUST analyze each feature and specify its dependencies using `depends_on_indices`.
1. **Use `depends_on_indices`** (0-based array indices) to reference dependencies
2. **Can only depend on EARLIER features** (index must be less than current position)
3. **No circular dependencies** allowed
4. **Maximum 20 dependencies** per feature
5. **Foundation features (index 0-9)** should have NO dependencies
6. **60% of features after index 10** should have at least one dependency
### Why Dependencies Matter
### Dependency Types
1. **Parallel Execution**: Features without dependencies can run in parallel
2. **Logical Ordering**: Ensures features are built in the right order
3. **Blocking Prevention**: An agent won't start a feature until its dependencies pass
| Type | Example |
|------|---------|
| Data | "Edit item" depends on "Create item" |
| Auth | "View dashboard" depends on "User can log in" |
| Navigation | "Modal close works" depends on "Modal opens" |
| UI | "Filter results" depends on "Display results list" |
### How to Determine Dependencies
### Wide Graph Pattern (REQUIRED)
Ask yourself: "What MUST be working before this feature can be tested?"
Create WIDE dependency graphs, not linear chains:
- **BAD:** A -> B -> C -> D -> E (linear chain, only 1 feature runs at a time)
- **GOOD:** A -> B, A -> C, A -> D, B -> E, C -> E (wide graph, parallel execution)
| Dependency Type | Example |
|-----------------|---------|
| **Data dependencies** | "Edit item" depends on "Create item" |
| **Auth dependencies** | "View dashboard" depends on "User can log in" |
| **Navigation dependencies** | "Modal close works" depends on "Modal opens" |
| **UI dependencies** | "Filter results" depends on "Display results list" |
| **API dependencies** | "Fetch user data" depends on "API authentication" |
### Using `depends_on_indices`
Since feature IDs aren't assigned until after creation, use **array indices** (0-based) to reference dependencies:
```json
{
"features": [
{ "name": "Create account", ... }, // Index 0
{ "name": "Login", "depends_on_indices": [0] }, // Index 1, depends on 0
{ "name": "View profile", "depends_on_indices": [1] }, // Index 2, depends on 1
{ "name": "Edit profile", "depends_on_indices": [2] } // Index 3, depends on 2
]
}
```
### Rules for Dependencies
1. **Can only depend on EARLIER features**: Index must be less than current feature's position
2. **No circular dependencies**: A cannot depend on B if B depends on A
3. **Maximum 20 dependencies** per feature
4. **Foundation features have NO dependencies**: First features in each category typically have none
5. **Don't over-depend**: Only add dependencies that are truly required for testing
### Best Practices
1. **Start with foundation features** (index 0-10): Core setup, basic navigation, authentication
2. **Group related features together**: Keep CRUD operations adjacent
3. **Chain complex flows**: Registration -> Login -> Dashboard -> Settings
4. **Keep dependencies shallow**: Prefer 1-2 dependencies over deep chains
5. **Skip dependencies for independent features**: Visual tests often have no dependencies
### Minimum Dependency Coverage
**REQUIREMENT:** At least 60% of your features (after index 10) should have at least one dependency.
Target structure for a 150-feature project:
- Features 0-9: Foundation (0 dependencies) - App loads, basic setup
- Features 10-149: At least 84 should have dependencies (60% of 140)
This ensures:
- A good mix of parallelizable features (foundation)
- Logical ordering for dependent features
### Example: Todo App Feature Chain (Wide Graph Pattern)
This example shows the CORRECT wide graph pattern where multiple features share the same dependency,
enabling parallel execution:
### Complete Example
```json
[
// FOUNDATION TIER (indices 0-2, no dependencies)
// These run first and enable everything else
// FOUNDATION TIER (indices 0-2, no dependencies) - run first
{ "name": "App loads without errors", "category": "functional" },
{ "name": "Navigation bar displays", "category": "style" },
{ "name": "Homepage renders correctly", "category": "functional" },
// AUTH TIER (indices 3-5, depend on foundation)
// These can all run in parallel once foundation passes
// AUTH TIER (indices 3-5, depend on foundation) - run in parallel
{ "name": "User can register", "depends_on_indices": [0] },
{ "name": "User can login", "depends_on_indices": [0, 3] },
{ "name": "User can logout", "depends_on_indices": [4] },
// CORE CRUD TIER (indices 6-9, depend on auth)
// WIDE GRAPH: All 4 of these depend on login (index 4)
// This means all 4 can start as soon as login passes!
// CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login
// All 4 start as soon as login passes!
{ "name": "User can create todo", "depends_on_indices": [4] },
{ "name": "User can view todos", "depends_on_indices": [4] },
{ "name": "User can edit todo", "depends_on_indices": [4, 6] },
{ "name": "User can delete todo", "depends_on_indices": [4, 6] },
// ADVANCED TIER (indices 10-11, depend on CRUD)
// Note: filter and search both depend on view (7), not on each other
// ADVANCED TIER (indices 10-11) - both depend on view, not each other
{ "name": "User can filter todos", "depends_on_indices": [7] },
{ "name": "User can search todos", "depends_on_indices": [7] }
]
```
**Parallelism analysis of this example:**
- Foundation tier: 3 features can run in parallel
- Auth tier: 3 features wait for foundation, then can run (mostly parallel)
- CRUD tier: 4 features can start once login passes (all 4 in parallel!)
- Advanced tier: 2 features can run once view passes (both in parallel)
**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
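The cycle estimate can be reproduced by grouping features into dependency levels (a feature's level is one more than the deepest level among its dependencies) and dividing each level across the available agents. A minimal sketch using the 12-feature example above; it assumes strict level-by-level execution, so a real scheduler that overlaps levels may finish slightly faster.

```python
import math

# Dependency lists from the 12-feature example (index -> depends_on_indices).
deps = [[], [], [], [0], [0, 3], [4], [4], [4], [4, 6], [4, 6], [7], [7]]

# A feature's level is one more than the deepest level among its dependencies.
levels: list[int] = []
for d in deps:
    levels.append(0 if not d else 1 + max(levels[i] for i in d))

agents = 3
by_level: dict[int, int] = {}
for lvl in levels:
    by_level[lvl] = by_level.get(lvl, 0) + 1

# Each level must finish before the next starts; within a level, features run in parallel.
cycles = sum(math.ceil(count / agents) for count in by_level.values())
print(by_level, "estimated cycles:", cycles)  # {0: 3, 1: 1, 2: 1, 3: 3, 4: 4} estimated cycles: 6
```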
---
## MANDATORY TEST CATEGORIES
The feature_list.json **MUST** include tests from ALL of these categories. The minimum counts scale by complexity tier.
The feature_list.json **MUST** include tests from ALL 20 categories. Minimum counts scale by complexity tier.
### Category Distribution by Complexity Tier
@@ -270,331 +141,47 @@ The feature_list.json **MUST** include tests from ALL of these categories. The m
---
### A. Security & Access Control Tests
### Category Descriptions
Test that unauthorized access is blocked and permissions are enforced.
**A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.
**Required tests (examples):**
**B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.
- Unauthenticated user cannot access protected routes (redirect to login)
- Regular user cannot access admin-only pages (403 or redirect)
- API endpoints return 401 for unauthenticated requests
- API endpoints return 403 for unauthorized role access
- Session expires after configured inactivity period
- Logout clears all session data and tokens
- Invalid/expired tokens are rejected
- Each role can ONLY see their permitted menu items
- Direct URL access to unauthorized pages is blocked
- Sensitive operations require confirmation or re-authentication
- Cannot access another user's data by manipulating IDs in URL
- Password reset flow works securely
- Failed login attempts are handled (no information leakage)
**C. Real Data Verification** - Test data persistence across refreshes and sessions, CRUD operations with unique test data, related record updates, and empty states.
### B. Navigation Integrity Tests
**D. Workflow Completeness** - Test end-to-end CRUD for every entity, state transitions, multi-step wizards, bulk operations, and form submission feedback.
Test that every button, link, and menu item goes to the correct place.
**E. Error Handling** - Test network failures, invalid input, API errors, 404/500 responses, loading states, timeouts, and user-friendly error messages.
**Required tests (examples):**
**F. UI-Backend Integration** - Test request/response format matching, database-driven dropdowns, cascading updates, filters/sorts with real data, and API error display.
- Every button in sidebar navigates to correct page
- Every menu item links to existing route
- All CRUD action buttons (Edit, Delete, View) go to correct URLs with correct IDs
- Back button works correctly after each navigation
- Deep linking works (direct URL access to any page with auth)
- Breadcrumbs reflect actual navigation path
- 404 page shown for non-existent routes (not crash)
- After login, user redirected to intended destination (or dashboard)
- After logout, user redirected to login page
- Pagination links work and preserve current filters
- Tab navigation within pages works correctly
- Modal close buttons return to previous state
- Cancel buttons on forms return to previous page
**G. State & Persistence** - Test refresh mid-form, session recovery, multi-tab behavior, back-button after submit, and unsaved changes warnings.
### C. Real Data Verification Tests
**H. URL & Direct Access** - Test URL manipulation security, direct route access by role, malformed parameters, deep links to deleted entities, and shareable filter URLs.
Test that data is real (not mocked) and persists correctly.
**I. Double-Action & Idempotency** - Test double-click submit, rapid delete clicks, back-and-resubmit, button disabled during processing, and concurrent submissions.
**Required tests (examples):**
**J. Data Cleanup & Cascade** - Test parent deletion effects on children, removal from search/lists/dropdowns, statistics updates, and soft vs hard delete behavior.
- Create a record via UI with unique content → verify it appears in list
- Create a record → refresh page → record still exists
- Create a record → log out → log in → record still exists
- Edit a record → verify changes persist after refresh
- Delete a record → verify it's gone from list AND database
- Delete a record → verify it's gone from related dropdowns
- Filter/search → results match actual data created in test
- Dashboard statistics reflect real record counts (create 3 items, count shows 3)
- Reports show real aggregated data
- Export functionality exports actual data you created
- Related records update when parent changes
- Timestamps are real and accurate (created_at, updated_at)
- Data created by User A is not visible to User B (unless shared)
- Empty state shows correctly when no data exists
**K. Default & Reset** - Test form defaults, sensible date picker defaults, dropdown placeholders, reset button behavior, and filter/pagination reset on context change.
### D. Workflow Completeness Tests
**L. Search & Filter Edge Cases** - Test empty search, whitespace-only, special characters, quotes, long strings, zero-result combinations, and filter persistence.
Test that every workflow can be completed end-to-end through the UI.
**M. Form Validation** - Test required fields, email/password/numeric/date formats, min/max constraints, uniqueness, specific error messages, and server-side validation.
**Required tests (examples):**
**N. Feedback & Notification** - Test success/error feedback for all actions, loading spinners, disabled buttons during submit, progress indicators, and toast behavior.
- Every entity has working Create operation via UI form
- Every entity has working Read/View operation (detail page loads)
- Every entity has working Update operation (edit form saves)
- Every entity has working Delete operation (with confirmation dialog)
- Every status/state has a UI mechanism to transition to next state
- Multi-step processes (wizards) can be completed end-to-end
- Bulk operations (select all, delete selected) work
- Cancel/Undo operations work where applicable
- Required fields prevent submission when empty
- Form validation shows errors before submission
- Successful submission shows success feedback
- Backend workflow (e.g., user→customer conversion) has UI trigger
**O. Responsive & Layout** - Test layouts at desktop (1920px), tablet (768px), and mobile (375px), no horizontal scroll, touch targets, modal fit, and text overflow.
### E. Error Handling Tests
**P. Accessibility** - Test tab navigation, focus rings, screen reader compatibility, ARIA labels, color contrast, labels on form fields, and error announcements.
Test graceful handling of errors and edge cases.
**Q. Temporal & Timezone** - Test timezone-aware display, accurate timestamps, date picker constraints, overdue detection, and date sorting across boundaries.
**Required tests (examples):**
**R. Concurrency & Race Conditions** - Test concurrent edits, viewing deleted records, pagination during updates, rapid navigation, and late API response handling.
- Network failure shows user-friendly error message, not crash
- Invalid form input shows field-level errors
- API errors display meaningful messages to user
- 404 responses handled gracefully (show not found page)
- 500 responses don't expose stack traces or technical details
- Empty search results show "no results found" message
- Loading states shown during all async operations
- Timeout doesn't hang the UI indefinitely
- Submitting form with server error keeps user data in form
- File upload errors (too large, wrong type) show clear message
- Duplicate entry errors (e.g., email already exists) are clear
**S. Export/Import** - Test full/filtered export, import with valid/duplicate/malformed files, and round-trip data integrity.
### F. UI-Backend Integration Tests
Test that frontend and backend communicate correctly.
**Required tests (examples):**
- Frontend request format matches what backend expects
- Backend response format matches what frontend parses
- All dropdown options come from real database data (not hardcoded)
- Related entity selectors (e.g., "choose category") populated from DB
- Changes in one area reflect in related areas after refresh
- Deleting parent handles children correctly (cascade or block)
- Filters work with actual data attributes from database
- Sort functionality sorts real data correctly
- Pagination returns correct page of real data
- API error responses are parsed and displayed correctly
- Loading spinners appear during API calls
- Optimistic updates (if used) rollback on failure
### G. State & Persistence Tests
Test that state is maintained correctly across sessions and tabs.
**Required tests (examples):**
- Refresh page mid-form - appropriate behavior (data kept or cleared)
- Close browser, reopen - session state handled correctly
- Same user in two browser tabs - changes sync or handled gracefully
- Browser back after form submit - no duplicate submission
- Bookmark a page, return later - works (with auth check)
- LocalStorage/cookies cleared - graceful re-authentication
- Unsaved changes warning when navigating away from dirty form
### H. URL & Direct Access Tests
Test direct URL access and URL manipulation security.
**Required tests (examples):**
- Change entity ID in URL - cannot access others' data
- Access /admin directly as regular user - blocked
- Malformed URL parameters - handled gracefully (no crash)
- Very long URL - handled correctly
- URL with SQL injection attempt - rejected/sanitized
- Deep link to deleted entity - shows "not found", not crash
- Query parameters for filters are reflected in UI
- Sharing a URL with filters preserves those filters
### I. Double-Action & Idempotency Tests
Test that rapid or duplicate actions don't cause issues.
**Required tests (examples):**
- Double-click submit button - only one record created
- Rapid multiple clicks on delete - only one deletion occurs
- Submit form, hit back, submit again - appropriate behavior
- Multiple simultaneous API calls - server handles correctly
- Refresh during save operation - data not corrupted
- Click same navigation link twice quickly - no issues
- Submit button disabled during processing
### J. Data Cleanup & Cascade Tests
Test that deleting data cleans up properly everywhere.
**Required tests (examples):**
- Delete parent entity - children removed from all views
- Delete item - removed from search results immediately
- Delete item - statistics/counts updated immediately
- Delete item - related dropdowns updated
- Delete item - cached views refreshed
- Soft delete (if applicable) - item hidden but recoverable
- Hard delete - item completely removed from database
### K. Default & Reset Tests
Test that defaults and reset functionality work correctly.
**Required tests (examples):**
- New form shows correct default values
- Date pickers default to sensible dates (today, not 1970)
- Dropdowns default to correct option (or placeholder)
- Reset button clears to defaults, not just empty
- Clear filters button resets all filters to default
- Pagination resets to page 1 when filters change
- Sorting resets when changing views
### L. Search & Filter Edge Cases
Test search and filter functionality thoroughly.
**Required tests (examples):**
- Empty search shows all results (or appropriate message)
- Search with only spaces - handled correctly
- Search with special characters (!@#$%^&\*) - no errors
- Search with quotes - handled correctly
- Search with very long string - handled correctly
- Filter combinations that return zero results - shows message
- Filter + search + sort together - all work correctly
- Filter persists after viewing detail and returning to list
- Clear individual filter - works correctly
- Search is case-insensitive (or clearly case-sensitive)
### M. Form Validation Tests
Test all form validation rules exhaustively.
**Required tests (examples):**
- Required field empty - shows error, blocks submit
- Email field with invalid email formats - shows error
- Password field - enforces complexity requirements
- Numeric field with letters - rejected
- Date field with invalid date - rejected
- Min/max length enforced on text fields
- Min/max values enforced on numeric fields
- Duplicate unique values rejected (e.g., duplicate email)
- Error messages are specific (not just "invalid")
- Errors clear when user fixes the issue
- Server-side validation matches client-side
- Whitespace-only input rejected for required fields
### N. Feedback & Notification Tests
Test that users get appropriate feedback for all actions.
**Required tests (examples):**
- Every successful save/create shows success feedback
- Every failed action shows error feedback
- Loading spinner during every async operation
- Disabled state on buttons during form submission
- Progress indicator for long operations (file upload)
- Toast/notification disappears after appropriate time
- Multiple notifications don't overlap incorrectly
- Success messages are specific (not just "Success")
### O. Responsive & Layout Tests
Test that the UI works on different screen sizes.
**Required tests (examples):**
- Desktop layout correct at 1920px width
- Tablet layout correct at 768px width
- Mobile layout correct at 375px width
- No horizontal scroll on any standard viewport
- Touch targets large enough on mobile (44px min)
- Modals fit within viewport on mobile
- Long text truncates or wraps correctly (no overflow)
- Tables scroll horizontally if needed on mobile
- Navigation collapses appropriately on mobile
### P. Accessibility Tests
Test basic accessibility compliance.
**Required tests (examples):**
- Tab navigation works through all interactive elements
- Focus ring visible on all focused elements
- Screen reader can navigate main content areas
- ARIA labels on icon-only buttons
- Color contrast meets WCAG AA (4.5:1 for text)
- No information conveyed by color alone
- Form fields have associated labels
- Error messages announced to screen readers
- Skip link to main content (if applicable)
- Images have alt text
### Q. Temporal & Timezone Tests
Test date/time handling.
**Required tests (examples):**
- Dates display in user's local timezone
- Created/updated timestamps accurate and formatted correctly
- Date picker allows only valid date ranges
- Overdue items identified correctly (timezone-aware)
- "Today", "This Week" filters work correctly for user's timezone
- Recurring items generate at correct times (if applicable)
- Date sorting works correctly across months/years
### R. Concurrency & Race Condition Tests
Test multi-user and race condition scenarios.
**Required tests (examples):**
- Two users edit same record - last save wins or conflict shown
- Record deleted while another user viewing - graceful handling
- List updates while user on page 2 - pagination still works
- Rapid navigation between pages - no stale data displayed
- API response arrives after user navigated away - no crash
- Concurrent form submissions from same user handled
### S. Export/Import Tests (if applicable)
Test data export and import functionality.
**Required tests (examples):**
- Export all data - file contains all records
- Export filtered data - only filtered records included
- Import valid file - all records created correctly
- Import duplicate data - handled correctly (skip/update/error)
- Import malformed file - error message, no partial import
- Export then import - data integrity preserved exactly
### T. Performance Tests
Test basic performance requirements.
**Required tests (examples):**
- Page loads in <3s with 100 records
- Page loads in <5s with 1000 records
- Search responds in <1s
- Infinite scroll doesn't degrade with many items
- Large file upload shows progress
- Memory doesn't leak on long sessions
- No console errors during normal operation
**T. Performance** - Test page load with 100/1000 records, search response time, infinite scroll stability, upload progress, and memory/console errors.
---

View File

@@ -21,6 +21,7 @@ from sqlalchemy import (
Column,
DateTime,
ForeignKey,
Index,
Integer,
String,
Text,
@@ -39,6 +40,12 @@ class Feature(Base):
__tablename__ = "features"
# Composite index for common status query pattern (passes, in_progress)
# Used by feature_get_stats, get_ready_features, and other status queries
__table_args__ = (
Index('ix_feature_status', 'passes', 'in_progress'),
)
id = Column(Integer, primary_key=True, index=True)
priority = Column(Integer, nullable=False, default=999, index=True)
category = Column(String(100), nullable=False)
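For reference, the status queries this composite index targets look roughly like the sketch below. It assumes an open SQLAlchemy `session` and the `Feature` model above; whether SQLite actually picks the index depends on its query planner and table statistics.

```python
from sqlalchemy import case, func

# feature_get_stats-style aggregate: only (passes, in_progress) values are needed,
# which is exactly what ix_feature_status covers.
total, passing, in_progress = session.query(
    func.count(Feature.id),
    func.sum(case((Feature.passes == True, 1), else_=0)),       # noqa: E712
    func.sum(case((Feature.in_progress == True, 1), else_=0)),  # noqa: E712
).first()

# get_ready_features-style filter: candidates that are neither passing nor claimed.
candidates = (
    session.query(Feature)
    .filter(Feature.passes == False, Feature.in_progress == False)  # noqa: E712
    .order_by(Feature.priority)
    .all()
)
```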

View File

@@ -6,6 +6,7 @@ Provides dependency resolution using Kahn's algorithm for topological sorting.
Includes cycle detection, validation, and helper functions for dependency management.
"""
import heapq
from typing import TypedDict
# Security: Prevent DoS via excessive dependencies
@@ -55,19 +56,27 @@ def resolve_dependencies(features: list[dict]) -> DependencyResult:
if not dep.get("passes"):
blocked.setdefault(feature["id"], []).append(dep_id)
# Kahn's algorithm with priority-aware selection
queue = [f for f in features if in_degree[f["id"]] == 0]
queue.sort(key=lambda f: (f.get("priority", 999), f["id"]))
# Kahn's algorithm with priority-aware selection using a heap
# Heap entries are tuples: (priority, id, feature_dict) for stable ordering
heap = [
(f.get("priority", 999), f["id"], f)
for f in features
if in_degree[f["id"]] == 0
]
heapq.heapify(heap)
ordered: list[dict] = []
while queue:
current = queue.pop(0)
while heap:
_, _, current = heapq.heappop(heap)
ordered.append(current)
for dependent_id in adjacency[current["id"]]:
in_degree[dependent_id] -= 1
if in_degree[dependent_id] == 0:
queue.append(feature_map[dependent_id])
queue.sort(key=lambda f: (f.get("priority", 999), f["id"]))
dep_feature = feature_map[dependent_id]
heapq.heappush(
heap,
(dep_feature.get("priority", 999), dependent_id, dep_feature)
)
# Detect cycles (features not in ordered = part of cycle)
cycles: list[list[int]] = []
@@ -84,12 +93,19 @@ def resolve_dependencies(features: list[dict]) -> DependencyResult:
}
def are_dependencies_satisfied(feature: dict, all_features: list[dict]) -> bool:
def are_dependencies_satisfied(
feature: dict,
all_features: list[dict],
passing_ids: set[int] | None = None,
) -> bool:
"""Check if all dependencies have passes=True.
Args:
feature: Feature dict to check
all_features: List of all feature dicts
passing_ids: Optional pre-computed set of passing feature IDs.
If None, will be computed from all_features. Pass this when
calling in a loop to avoid O(n^2) complexity.
Returns:
True if all dependencies are satisfied (or no dependencies)
@@ -97,22 +113,31 @@ def are_dependencies_satisfied(feature: dict, all_features: list[dict]) -> bool:
deps = feature.get("dependencies") or []
if not deps:
return True
passing_ids = {f["id"] for f in all_features if f.get("passes")}
if passing_ids is None:
passing_ids = {f["id"] for f in all_features if f.get("passes")}
return all(dep_id in passing_ids for dep_id in deps)
def get_blocking_dependencies(feature: dict, all_features: list[dict]) -> list[int]:
def get_blocking_dependencies(
feature: dict,
all_features: list[dict],
passing_ids: set[int] | None = None,
) -> list[int]:
"""Get list of incomplete dependency IDs.
Args:
feature: Feature dict to check
all_features: List of all feature dicts
passing_ids: Optional pre-computed set of passing feature IDs.
If None, will be computed from all_features. Pass this when
calling in a loop to avoid O(n^2) complexity.
Returns:
List of feature IDs that are blocking this feature
"""
deps = feature.get("dependencies") or []
passing_ids = {f["id"] for f in all_features if f.get("passes")}
if passing_ids is None:
passing_ids = {f["id"] for f in all_features if f.get("passes")}
return [dep_id for dep_id in deps if dep_id not in passing_ids]
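A short sketch of the intended call pattern for the new `passing_ids` parameter: compute the passing set once, then reuse it for every feature checked in a loop. The feature dicts here are minimal stand-ins with only the keys these helpers read.

```python
# Without the pre-computed set, each call rebuilds it and a loop over all
# features degrades to O(n^2).
features = [
    {"id": 1, "passes": True, "dependencies": []},
    {"id": 2, "passes": False, "dependencies": [1]},
    {"id": 3, "passes": False, "dependencies": [2]},
]

passing_ids = {f["id"] for f in features if f.get("passes")}

ready = [
    f for f in features
    if not f.get("passes") and are_dependencies_satisfied(f, features, passing_ids)
]
blockers = {f["id"]: get_blocking_dependencies(f, features, passing_ids) for f in features}
# ready contains only feature 2; blockers[3] == [2]
```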

View File

@@ -12,7 +12,7 @@ import sys
from pathlib import Path
from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
from claude_agent_sdk.types import HookMatcher
from claude_agent_sdk.types import HookContext, HookInput, HookMatcher, SyncHookJSONOutput
from dotenv import load_dotenv
from security import bash_security_hook
@@ -55,7 +55,9 @@ FEATURE_MCP_TOOLS = [
# Core feature operations
"mcp__features__feature_get_stats",
"mcp__features__feature_get_by_id", # Get assigned feature details
"mcp__features__feature_get_summary", # Lightweight: id, name, status, deps only
"mcp__features__feature_mark_in_progress",
"mcp__features__feature_claim_and_get", # Atomic claim + get details
"mcp__features__feature_mark_passing",
"mcp__features__feature_mark_failing", # Mark regression detected
"mcp__features__feature_skip",
@@ -268,6 +270,45 @@ def create_client(
context["project_dir"] = str(project_dir.resolve())
return await bash_security_hook(input_data, tool_use_id, context)
# PreCompact hook for logging and customizing context compaction
# Compaction is handled automatically by Claude Code CLI when context approaches limits.
# This hook allows us to log when compaction occurs and optionally provide custom instructions.
async def pre_compact_hook(
input_data: HookInput,
tool_use_id: str | None,
context: HookContext,
) -> SyncHookJSONOutput:
"""
Hook called before context compaction occurs.
Compaction triggers:
- "auto": Automatic compaction when context approaches token limits
- "manual": User-initiated compaction via /compact command
The hook can customize compaction via hookSpecificOutput:
- customInstructions: String with focus areas for summarization
"""
trigger = input_data.get("trigger", "auto")
custom_instructions = input_data.get("custom_instructions")
if trigger == "auto":
print("[Context] Auto-compaction triggered (context approaching limit)")
else:
print("[Context] Manual compaction requested")
if custom_instructions:
print(f"[Context] Custom instructions: {custom_instructions}")
# Return empty dict to allow compaction to proceed with default behavior
# To customize, return:
# {
# "hookSpecificOutput": {
# "hookEventName": "PreCompact",
# "customInstructions": "Focus on preserving file paths and test results"
# }
# }
return SyncHookJSONOutput()
return ClaudeSDKClient(
options=ClaudeAgentOptions(
model=model,
@@ -281,10 +322,35 @@ def create_client(
"PreToolUse": [
HookMatcher(matcher="Bash", hooks=[bash_hook_with_context]),
],
# PreCompact hook for context management during long sessions.
# Compaction is automatic when context approaches token limits.
# This hook logs compaction events and can customize summarization.
"PreCompact": [
HookMatcher(hooks=[pre_compact_hook]),
],
},
max_turns=1000,
cwd=str(project_dir.resolve()),
settings=str(settings_file.resolve()), # Use absolute path
env=sdk_env, # Pass API configuration overrides to CLI subprocess
# Enable extended context beta for better handling of long sessions.
# This provides up to 1M tokens of context with automatic compaction.
# See: https://docs.anthropic.com/en/api/beta-headers
betas=["context-1m-2025-08-07"],
# Note on context management:
# The Claude Agent SDK handles context management automatically through the
# underlying Claude Code CLI. When context approaches limits, the CLI
# automatically compacts/summarizes previous messages.
#
# The SDK does NOT expose explicit compaction_control or context_management
# parameters. Instead, context is managed via:
# 1. betas=["context-1m-2025-08-07"] - Extended context window
# 2. PreCompact hook - Intercept and customize compaction behavior
# 3. max_turns - Limit conversation turns (set to 1000 for long sessions)
#
# Future SDK versions may add explicit compaction controls. When available,
# consider adding:
# - compaction_control={"enabled": True, "context_token_threshold": 80000}
# - context_management={"edits": [...]} for tool use clearing
)
)
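If compaction should be steered rather than just logged, a variant of the hook can return the `hookSpecificOutput` shape shown in the comment above. This is a sketch against the same `HookInput`/`HookContext`/`SyncHookJSONOutput` types; the exact shape accepted may vary between SDK versions.

```python
async def pre_compact_hook_with_focus(
    input_data: HookInput,
    tool_use_id: str | None,
    context: HookContext,
) -> SyncHookJSONOutput:
    """Like pre_compact_hook, but asks the summarizer to keep build-critical state."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreCompact",
            # Free-form guidance for the compaction summary.
            "customInstructions": "Preserve file paths, the assigned feature ID, and the latest test results.",
        }
    }
```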

View File

@@ -8,10 +8,12 @@ Provides tools to manage features in the autonomous coding system.
Tools:
- feature_get_stats: Get progress statistics
- feature_get_by_id: Get a specific feature by ID
- feature_get_summary: Get minimal feature info (id, name, status, deps)
- feature_mark_passing: Mark a feature as passing
- feature_mark_failing: Mark a feature as failing (regression detected)
- feature_skip: Skip a feature (move to end of queue)
- feature_mark_in_progress: Mark a feature as in-progress
- feature_claim_and_get: Atomically claim and get feature details
- feature_clear_in_progress: Clear in-progress status
- feature_release_testing: Release testing lock on a feature
- feature_create_bulk: Create multiple features at once
@@ -19,7 +21,7 @@ Tools:
- feature_add_dependency: Add a dependency between features
- feature_remove_dependency: Remove a dependency
- feature_get_ready: Get features ready to implement
- feature_get_blocked: Get features blocked by dependencies
- feature_get_blocked: Get features blocked by dependencies (with limit)
- feature_get_graph: Get the dependency graph
Note: Feature selection (which feature to work on) is handled by the
@@ -142,11 +144,20 @@ def feature_get_stats() -> str:
Returns:
JSON with: passing (int), in_progress (int), total (int), percentage (float)
"""
from sqlalchemy import case, func
session = get_session()
try:
total = session.query(Feature).count()
passing = session.query(Feature).filter(Feature.passes == True).count()
in_progress = session.query(Feature).filter(Feature.in_progress == True).count()
# Single aggregate query instead of 3 separate COUNT queries
result = session.query(
func.count(Feature.id).label('total'),
func.sum(case((Feature.passes == True, 1), else_=0)).label('passing'),
func.sum(case((Feature.in_progress == True, 1), else_=0)).label('in_progress')
).first()
total = result.total or 0
passing = int(result.passing or 0)
in_progress = int(result.in_progress or 0)
percentage = round((passing / total) * 100, 1) if total > 0 else 0.0
return json.dumps({
@@ -154,7 +165,7 @@ def feature_get_stats() -> str:
"in_progress": in_progress,
"total": total,
"percentage": percentage
}, indent=2)
})
finally:
session.close()
@@ -181,7 +192,38 @@ def feature_get_by_id(
if feature is None:
return json.dumps({"error": f"Feature with ID {feature_id} not found"})
return json.dumps(feature.to_dict(), indent=2)
return json.dumps(feature.to_dict())
finally:
session.close()
@mcp.tool()
def feature_get_summary(
feature_id: Annotated[int, Field(description="The ID of the feature", ge=1)]
) -> str:
"""Get minimal feature info: id, name, status, and dependencies only.
Use this instead of feature_get_by_id when you only need status info,
not the full description and steps. This reduces response size significantly.
Args:
feature_id: The ID of the feature to retrieve
Returns:
JSON with: id, name, passes, in_progress, dependencies
"""
session = get_session()
try:
feature = session.query(Feature).filter(Feature.id == feature_id).first()
if feature is None:
return json.dumps({"error": f"Feature with ID {feature_id} not found"})
return json.dumps({
"id": feature.id,
"name": feature.name,
"passes": feature.passes,
"in_progress": feature.in_progress,
"dependencies": feature.dependencies or []
})
finally:
session.close()
@@ -229,7 +271,7 @@ def feature_release_testing(
return json.dumps({
"message": f"Feature #{feature_id} testing {status}",
"feature": feature.to_dict()
}, indent=2)
})
except Exception as e:
session.rollback()
return json.dumps({"error": f"Failed to release testing claim: {str(e)}"})
@@ -250,7 +292,7 @@ def feature_mark_passing(
feature_id: The ID of the feature to mark as passing
Returns:
JSON with the updated feature details, or error if not found.
JSON with success confirmation: {success, feature_id, name}
"""
session = get_session()
try:
@@ -262,9 +304,8 @@ def feature_mark_passing(
feature.passes = True
feature.in_progress = False
session.commit()
session.refresh(feature)
return json.dumps(feature.to_dict(), indent=2)
return json.dumps({"success": True, "feature_id": feature_id, "name": feature.name})
except Exception as e:
session.rollback()
return json.dumps({"error": f"Failed to mark feature passing: {str(e)}"})
@@ -309,7 +350,7 @@ def feature_mark_failing(
return json.dumps({
"message": f"Feature #{feature_id} marked as failing - regression detected",
"feature": feature.to_dict()
}, indent=2)
})
except Exception as e:
session.rollback()
return json.dumps({"error": f"Failed to mark feature failing: {str(e)}"})
@@ -368,7 +409,7 @@ def feature_skip(
"old_priority": old_priority,
"new_priority": new_priority,
"message": f"Feature '{feature.name}' moved to end of queue"
}, indent=2)
})
except Exception as e:
session.rollback()
return json.dumps({"error": f"Failed to skip feature: {str(e)}"})
@@ -408,7 +449,7 @@ def feature_mark_in_progress(
session.commit()
session.refresh(feature)
return json.dumps(feature.to_dict(), indent=2)
return json.dumps(feature.to_dict())
except Exception as e:
session.rollback()
return json.dumps({"error": f"Failed to mark feature in-progress: {str(e)}"})
@@ -416,6 +457,48 @@ def feature_mark_in_progress(
session.close()
@mcp.tool()
def feature_claim_and_get(
feature_id: Annotated[int, Field(description="The ID of the feature to claim", ge=1)]
) -> str:
"""Atomically claim a feature (mark in-progress) and return its full details.
Combines feature_mark_in_progress + feature_get_by_id into a single operation.
If already in-progress, still returns the feature details (idempotent).
Args:
feature_id: The ID of the feature to claim and retrieve
Returns:
JSON with feature details including claimed status, or error if not found.
"""
session = get_session()
try:
feature = session.query(Feature).filter(Feature.id == feature_id).first()
if feature is None:
return json.dumps({"error": f"Feature with ID {feature_id} not found"})
if feature.passes:
return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
# Idempotent: if already in-progress, just return details
already_claimed = feature.in_progress
if not already_claimed:
feature.in_progress = True
session.commit()
session.refresh(feature)
result = feature.to_dict()
result["already_claimed"] = already_claimed
return json.dumps(result)
except Exception as e:
session.rollback()
return json.dumps({"error": f"Failed to claim feature: {str(e)}"})
finally:
session.close()
@mcp.tool()
def feature_clear_in_progress(
feature_id: Annotated[int, Field(description="The ID of the feature to clear in-progress status", ge=1)]
@@ -442,7 +525,7 @@ def feature_clear_in_progress(
session.commit()
session.refresh(feature)
return json.dumps(feature.to_dict(), indent=2)
return json.dumps(feature.to_dict())
except Exception as e:
session.rollback()
return json.dumps({"error": f"Failed to clear in-progress status: {str(e)}"})
@@ -549,7 +632,7 @@ def feature_create_bulk(
return json.dumps({
"created": len(created_features),
"with_dependencies": deps_count
}, indent=2)
})
except Exception as e:
session.rollback()
return json.dumps({"error": str(e)})
@@ -604,7 +687,7 @@ def feature_create(
"success": True,
"message": f"Created feature: {name}",
"feature": db_feature.to_dict()
}, indent=2)
})
except Exception as e:
session.rollback()
return json.dumps({"error": str(e)})
@@ -754,20 +837,25 @@ def feature_get_ready(
"features": ready[:limit],
"count": len(ready[:limit]),
"total_ready": len(ready)
}, indent=2)
})
finally:
session.close()
@mcp.tool()
def feature_get_blocked() -> str:
"""Get all features that are blocked by unmet dependencies.
def feature_get_blocked(
limit: Annotated[int, Field(default=20, ge=1, le=100, description="Max features to return")] = 20
) -> str:
"""Get features that are blocked by unmet dependencies.
Returns features that have dependencies which are not yet passing.
Each feature includes a 'blocked_by' field listing the blocking feature IDs.
Args:
limit: Maximum number of features to return (1-100, default 20)
Returns:
JSON with: features (list with blocked_by field), count (int)
JSON with: features (list with blocked_by field), count (int), total_blocked (int)
"""
session = get_session()
try:
@@ -787,9 +875,10 @@ def feature_get_blocked() -> str:
})
return json.dumps({
"features": blocked,
"count": len(blocked)
}, indent=2)
"features": blocked[:limit],
"count": len(blocked[:limit]),
"total_blocked": len(blocked)
})
finally:
session.close()
@@ -840,7 +929,7 @@ def feature_get_graph() -> str:
return json.dumps({
"nodes": nodes,
"edges": edges
}, indent=2)
})
finally:
session.close()
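For comparison with the two-step mark-in-progress + get-by-id flow, here is a sketch of how an agent might handle a `feature_claim_and_get` response. The sample JSON is illustrative only; a real response is the claimed feature's full `to_dict()` output plus the `already_claimed` flag.

```python
import json

# Illustrative response from the mcp__features__feature_claim_and_get tool.
raw = '{"id": 7, "name": "User can log in", "passes": false, "in_progress": true, "already_claimed": false}'

claimed = json.loads(raw)
if "error" in claimed:
    print(claimed["error"])  # not found, or already passing
elif claimed["already_claimed"]:
    # Another agent or a previous session claimed it first; details are still returned.
    print(f"Resuming feature #{claimed['id']}: {claimed['name']}")
else:
    print(f"Claimed feature #{claimed['id']}: {claimed['name']}")
```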

View File

@@ -186,6 +186,12 @@ class ParallelOrchestrator:
# Session tracking for logging/debugging
self.session_start_time: datetime = None
# Event signaled when any agent completes, allowing the main loop to wake
# immediately instead of waiting for the full POLL_INTERVAL timeout.
# This reduces latency when spawning the next feature after completion.
self._agent_completed_event: asyncio.Event = None # Created in run_loop
self._event_loop: asyncio.AbstractEventLoop = None # Stored for thread-safe signaling
# Database session for this orchestrator
self._engine, self._session_maker = create_database(project_dir)
@@ -311,6 +317,9 @@ class ParallelOrchestrator:
all_features = session.query(Feature).all()
all_dicts = [f.to_dict() for f in all_features]
# Pre-compute passing_ids once to avoid O(n^2) in the loop
passing_ids = {f.id for f in all_features if f.passes}
ready = []
skipped_reasons = {"passes": 0, "in_progress": 0, "running": 0, "failed": 0, "deps": 0}
for f in all_features:
@@ -329,8 +338,8 @@ class ParallelOrchestrator:
if self._failure_counts.get(f.id, 0) >= MAX_FEATURE_RETRIES:
skipped_reasons["failed"] += 1
continue
# Check dependencies
if are_dependencies_satisfied(f.to_dict(), all_dicts):
# Check dependencies (pass pre-computed passing_ids)
if are_dependencies_satisfied(f.to_dict(), all_dicts, passing_ids):
ready.append(f.to_dict())
else:
skipped_reasons["deps"] += 1
@@ -794,6 +803,52 @@ class ParallelOrchestrator:
finally:
self._on_agent_complete(feature_id, proc.returncode, agent_type, proc)
def _signal_agent_completed(self):
"""Signal that an agent has completed, waking the main loop.
This method is safe to call from any thread. It schedules the event.set()
call to run on the event loop thread to avoid cross-thread issues with
asyncio.Event.
"""
if self._agent_completed_event is not None and self._event_loop is not None:
try:
# Use the stored event loop reference to schedule the set() call
# This is necessary because asyncio.Event is not thread-safe and
# asyncio.get_event_loop() fails in threads without an event loop
if self._event_loop.is_running():
self._event_loop.call_soon_threadsafe(self._agent_completed_event.set)
else:
# Fallback: set directly if loop isn't running (shouldn't happen during normal operation)
self._agent_completed_event.set()
except RuntimeError:
# Event loop closed, ignore (orchestrator may be shutting down)
pass
async def _wait_for_agent_completion(self, timeout: float = POLL_INTERVAL):
"""Wait for an agent to complete or until timeout expires.
This replaces fixed `asyncio.sleep(POLL_INTERVAL)` calls with event-based
waiting. When an agent completes, _signal_agent_completed() sets the event,
causing this method to return immediately. If no agent completes within
the timeout, we return anyway to check for ready features.
Args:
timeout: Maximum seconds to wait (default: POLL_INTERVAL)
"""
if self._agent_completed_event is None:
# Fallback if event not initialized (shouldn't happen in normal operation)
await asyncio.sleep(timeout)
return
try:
await asyncio.wait_for(self._agent_completed_event.wait(), timeout=timeout)
# Event was set - an agent completed. Clear it for the next wait cycle.
self._agent_completed_event.clear()
debug_log.log("EVENT", "Woke up immediately - agent completed")
except asyncio.TimeoutError:
# Timeout reached without agent completion - this is normal, just check anyway
pass
def _on_agent_complete(
self,
feature_id: int | None,
@@ -832,6 +887,8 @@ class ParallelOrchestrator:
pid=proc.pid,
feature_id=feature_id,
status=status)
# Signal main loop that an agent slot is available
self._signal_agent_completed()
return
# Coding agent completion
@@ -843,40 +900,20 @@ class ParallelOrchestrator:
self.running_coding_agents.pop(feature_id, None)
self.abort_events.pop(feature_id, None)
# BEFORE dispose: Query database state to see if it's stale
session_before = self.get_session()
try:
session_before.expire_all()
feature_before = session_before.query(Feature).filter(Feature.id == feature_id).first()
all_before = session_before.query(Feature).all()
passing_before = sum(1 for f in all_before if f.passes)
debug_log.log("DB", f"BEFORE engine.dispose() - Feature #{feature_id} state",
passes=feature_before.passes if feature_before else None,
in_progress=feature_before.in_progress if feature_before else None,
total_passing_in_db=passing_before)
finally:
session_before.close()
# CRITICAL: Refresh database connection to see subprocess commits
# Refresh session cache to see subprocess commits
# The coding agent runs as a subprocess and commits changes (e.g., passes=True).
# SQLAlchemy may have stale connections. Disposing the engine forces new connections
# that will see the subprocess's committed changes.
debug_log.log("DB", "Disposing database engine now...")
self._engine.dispose()
# AFTER dispose: Query again to compare
# Using session.expire_all() is lighter weight than engine.dispose() for SQLite WAL mode
# and is sufficient to invalidate cached data and force fresh reads.
# engine.dispose() is only called on orchestrator shutdown, not on every agent completion.
session = self.get_session()
try:
session.expire_all()
feature = session.query(Feature).filter(Feature.id == feature_id).first()
all_after = session.query(Feature).all()
passing_after = sum(1 for f in all_after if f.passes)
feature_passes = feature.passes if feature else None
feature_in_progress = feature.in_progress if feature else None
debug_log.log("DB", f"AFTER engine.dispose() - Feature #{feature_id} state",
debug_log.log("DB", f"Feature #{feature_id} state after session.expire_all()",
passes=feature_passes,
in_progress=feature_in_progress,
total_passing_in_db=passing_after,
passing_changed=(passing_after != passing_before) if 'passing_before' in dir() else "unknown")
in_progress=feature_in_progress)
if feature and feature.in_progress and not feature.passes:
feature.in_progress = False
session.commit()
@@ -900,6 +937,9 @@ class ParallelOrchestrator:
# CRITICAL: This print triggers the WebSocket to emit agent_update with state='error' or 'success'
print(f"Feature #{feature_id} {status}", flush=True)
# Signal main loop that an agent slot is available
self._signal_agent_completed()
# NOTE: Testing agents are now spawned in start_feature() when coding agents START,
# not here when they complete. This ensures 1:1 ratio and proper termination.
@@ -949,6 +989,12 @@ class ParallelOrchestrator:
"""Main orchestration loop."""
self.is_running = True
# Initialize the agent completion event for this run
# Must be created in the async context where it will be used
self._agent_completed_event = asyncio.Event()
# Store the event loop reference for thread-safe signaling from output reader threads
self._event_loop = asyncio.get_running_loop()
# Track session start for regression testing (UTC for consistency with last_tested_at)
self.session_start_time = datetime.now(timezone.utc)
@@ -1100,8 +1146,8 @@ class ParallelOrchestrator:
at_capacity=(current >= self.max_concurrency))
if current >= self.max_concurrency:
debug_log.log("CAPACITY", "At max capacity, sleeping...")
await asyncio.sleep(POLL_INTERVAL)
debug_log.log("CAPACITY", "At max capacity, waiting for agent completion...")
await self._wait_for_agent_completion()
continue
# Priority 1: Resume features from previous session
@@ -1119,7 +1165,7 @@ class ParallelOrchestrator:
if not ready:
# Wait for running features to complete
if current > 0:
await asyncio.sleep(POLL_INTERVAL)
await self._wait_for_agent_completion()
continue
else:
# No ready features and nothing running
@@ -1138,7 +1184,7 @@ class ParallelOrchestrator:
# Still have pending features but all are blocked by dependencies
print("No ready features available. All remaining features may be blocked by dependencies.", flush=True)
await asyncio.sleep(POLL_INTERVAL * 2)
await self._wait_for_agent_completion(timeout=POLL_INTERVAL * 2)
continue
# Start features up to capacity
@@ -1174,7 +1220,7 @@ class ParallelOrchestrator:
except Exception as e:
print(f"Orchestrator error: {e}", flush=True)
await asyncio.sleep(POLL_INTERVAL)
await self._wait_for_agent_completion()
# Wait for remaining agents to complete
print("Waiting for running agents to complete...", flush=True)
@@ -1184,7 +1230,8 @@ class ParallelOrchestrator:
testing_done = len(self.running_testing_agents) == 0
if coding_done and testing_done:
break
await asyncio.sleep(1)
# Use short timeout since we're just waiting for final agents to finish
await self._wait_for_agent_completion(timeout=1.0)
print("Orchestrator finished.", flush=True)

View File

@@ -72,15 +72,31 @@ def count_passing_tests(project_dir: Path) -> tuple[int, int, int]:
try:
conn = sqlite3.connect(db_file)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM features")
total = cursor.fetchone()[0]
cursor.execute("SELECT COUNT(*) FROM features WHERE passes = 1")
passing = cursor.fetchone()[0]
# Handle case where in_progress column doesn't exist yet
# Single aggregate query instead of 3 separate COUNT queries
# Handle case where in_progress column doesn't exist yet (legacy DBs)
try:
cursor.execute("SELECT COUNT(*) FROM features WHERE in_progress = 1")
in_progress = cursor.fetchone()[0]
cursor.execute("""
SELECT
COUNT(*) as total,
SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing,
SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress
FROM features
""")
row = cursor.fetchone()
total = row[0] or 0
passing = row[1] or 0
in_progress = row[2] or 0
except sqlite3.OperationalError:
# Fallback for databases without in_progress column
cursor.execute("""
SELECT
COUNT(*) as total,
SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing
FROM features
""")
row = cursor.fetchone()
total = row[0] or 0
passing = row[1] or 0
in_progress = 0
conn.close()
return passing, in_progress, total

View File

@@ -109,11 +109,11 @@ The orchestrator has already claimed this feature for you.
def get_single_feature_prompt(feature_id: int, project_dir: Path | None = None, yolo_mode: bool = False) -> str:
"""
Load the coding prompt with single-feature focus instructions prepended.
"""Prepend single-feature assignment header to base coding prompt.
When the orchestrator assigns a specific feature to a coding agent,
this prompt ensures the agent works ONLY on that feature.
Used in parallel mode to assign a specific feature to an agent.
The base prompt already contains the full workflow - this just
identifies which feature to work on.
Args:
feature_id: The specific feature ID to work on
@@ -122,38 +122,20 @@ def get_single_feature_prompt(feature_id: int, project_dir: Path | None = None,
handled by separate testing agents, not YOLO prompts.
Returns:
The prompt with single-feature instructions prepended
The prompt with single-feature header prepended
"""
# Always use the standard coding prompt
# (Testing/regression is handled by separate testing agents)
base_prompt = get_coding_prompt(project_dir)
# Prepend single-feature instructions
single_feature_header = f"""## ASSIGNED FEATURE
# Minimal header - the base prompt already contains the full workflow
single_feature_header = f"""## ASSIGNED FEATURE: #{feature_id}
**You are assigned to work on Feature #{feature_id} ONLY.**
This session is part of a parallel execution where multiple agents work on different features simultaneously.
### Your workflow:
1. **Get feature details** using `feature_get_by_id` with ID {feature_id}
2. **Mark as in-progress** using `feature_mark_in_progress` with ID {feature_id}
- If you get "already in-progress" error, that's OK - continue with implementation
3. **Implement the feature** following the steps from the feature details
4. **Test your implementation** to verify it works correctly
5. **Mark as passing** using `feature_mark_passing` with ID {feature_id}
6. **Commit your changes** and end the session
### Important rules:
- **Do NOT** work on any other features - other agents are handling them
- If blocked, use `feature_skip` and document the blocker in claude-progress.txt
Work ONLY on this feature. Other agents are handling other features.
Use `feature_claim_and_get` with ID {feature_id} to claim it and get details.
If blocked, use `feature_skip` and document the blocker.
---
"""
return single_feature_header + base_prompt