diff --git a/.claude/templates/coding_prompt.template.md b/.claude/templates/coding_prompt.template.md index b4e464e..bce9a14 100644 --- a/.claude/templates/coding_prompt.template.md +++ b/.claude/templates/coding_prompt.template.md @@ -172,48 +172,12 @@ Use browser automation tools: - [ ] Loading states appeared during API calls - [ ] Error states handle failures gracefully -### STEP 5.6: MOCK DATA DETECTION SWEEP +### STEP 5.6: MOCK DATA DETECTION (Before marking passing) -**Run this sweep AFTER EVERY FEATURE before marking it as passing:** - -#### 1. Code Pattern Search - -Search the codebase for forbidden patterns: - -```bash -# Search for mock data patterns -grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx" -grep -r "// TODO\|// FIXME\|// STUB\|// MOCK" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx" -grep -r "hardcoded\|placeholder" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx" -``` - -**If ANY matches found related to your feature - FIX THEM before proceeding.** - -#### 2. Runtime Verification - -For ANY data displayed in UI: - -1. Create NEW data with UNIQUE content (e.g., "TEST_12345_DELETE_ME") -2. Verify that EXACT content appears in the UI -3. Delete the record -4. Verify it's GONE from the UI -5. **If you see data that wasn't created during testing - IT'S MOCK DATA. Fix it.** - -#### 3. Database Verification - -Check that: - -- Database tables contain only data you created during tests -- Counts/statistics match actual database record counts -- No seed data is masquerading as user data - -#### 4. API Response Verification - -For API endpoints used by this feature: - -- Call the endpoint directly -- Verify response contains actual database data -- Empty database = empty response (not pre-populated mock data) +1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"` +2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone +3. **Check database:** All displayed data must come from real DB queries +4. If unexplained data appears, it's mock data - fix before marking passing. ### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!) @@ -273,51 +237,11 @@ Before context fills up: --- -## TESTING REQUIREMENTS +## BROWSER AUTOMATION -**ALL testing must use browser automation tools.** +Use Playwright MCP tools (`browser_*`) for UI verification. Key tools: `navigate`, `click`, `type`, `fill_form`, `take_screenshot`, `console_messages`, `network_requests`. All tools have auto-wait built in. 
-Available tools: - -**Navigation & Screenshots:** - -- browser_navigate - Navigate to a URL -- browser_navigate_back - Go back to previous page -- browser_take_screenshot - Capture screenshot (use for visual verification) -- browser_snapshot - Get accessibility tree snapshot (structured page data) - -**Element Interaction:** - -- browser_click - Click elements (has built-in auto-wait) -- browser_type - Type text into editable elements -- browser_fill_form - Fill multiple form fields at once -- browser_select_option - Select dropdown options -- browser_hover - Hover over elements -- browser_drag - Drag and drop between elements -- browser_press_key - Press keyboard keys - -**Debugging & Monitoring:** - -- browser_console_messages - Get browser console output (check for errors) -- browser_network_requests - Monitor API calls and responses -- browser_evaluate - Execute JavaScript (USE SPARINGLY - debugging only, NOT for bypassing UI) - -**Browser Management:** - -- browser_close - Close the browser -- browser_resize - Resize browser window (use to test mobile: 375x667, tablet: 768x1024, desktop: 1280x720) -- browser_tabs - Manage browser tabs -- browser_wait_for - Wait for text/element/time -- browser_handle_dialog - Handle alert/confirm dialogs -- browser_file_upload - Upload files - -**Key Benefits:** - -- All interaction tools have **built-in auto-wait** - no manual timeouts needed -- Use `browser_console_messages` to detect JavaScript errors -- Use `browser_network_requests` to verify API calls succeed - -Test like a human user with mouse and keyboard. Don't take shortcuts by using JavaScript evaluation. +Test like a human user with mouse and keyboard. Use `browser_console_messages` to detect errors. Don't bypass UI with JavaScript evaluation. --- @@ -381,26 +305,7 @@ This allows you to fully test email-dependent flows without needing external ema --- -## IMPORTANT REMINDERS - -**Your Goal:** Production-quality application with all tests passing - -**This Session's Goal:** Complete at least one feature perfectly - -**Priority:** Fix broken tests before implementing new features - -**Quality Bar:** - -- Zero console errors -- Polished UI matching the design specified in app_spec.txt -- All features work end-to-end through the UI -- Fast, responsive, professional -- **NO MOCK DATA - all data from real database** -- **Security enforced - unauthorized access blocked** -- **All navigation works - no 404s or broken links** - -**You have unlimited time.** Take as long as needed to get it right. The most important thing is that you -leave the code base in a clean state before terminating the session (Step 9). +**Remember:** One feature per session. Zero console errors. All data from real database. Leave codebase clean before ending session. --- diff --git a/.claude/templates/initializer_prompt.template.md b/.claude/templates/initializer_prompt.template.md index f0baffb..c6ee081 100644 --- a/.claude/templates/initializer_prompt.template.md +++ b/.claude/templates/initializer_prompt.template.md @@ -26,82 +26,11 @@ which is the single source of truth for what needs to be built. **Creating Features:** -Use the feature_create_bulk tool to add all features at once. Note: You MUST include `depends_on_indices` -to specify dependencies. Features with no dependencies can run first and enable parallel execution. 
- -``` -Use the feature_create_bulk tool with features=[ - { - "category": "functional", - "name": "App loads without errors", - "description": "Application starts and renders homepage", - "steps": [ - "Step 1: Navigate to homepage", - "Step 2: Verify no console errors", - "Step 3: Verify main content renders" - ] - // No depends_on_indices = FOUNDATION feature (runs first) - }, - { - "category": "functional", - "name": "User can create an account", - "description": "Basic user registration functionality", - "steps": [ - "Step 1: Navigate to registration page", - "Step 2: Fill in required fields", - "Step 3: Submit form and verify account created" - ], - "depends_on_indices": [0] // Depends on app loading - }, - { - "category": "functional", - "name": "User can log in", - "description": "Authentication with existing credentials", - "steps": [ - "Step 1: Navigate to login page", - "Step 2: Enter credentials", - "Step 3: Verify successful login and redirect" - ], - "depends_on_indices": [0, 1] // Depends on app loading AND registration - }, - { - "category": "functional", - "name": "User can view dashboard", - "description": "Protected dashboard requires authentication", - "steps": [ - "Step 1: Log in as user", - "Step 2: Navigate to dashboard", - "Step 3: Verify personalized content displays" - ], - "depends_on_indices": [2] // Depends on login only - }, - { - "category": "functional", - "name": "User can update profile", - "description": "User can modify their profile information", - "steps": [ - "Step 1: Log in as user", - "Step 2: Navigate to profile settings", - "Step 3: Update and save profile" - ], - "depends_on_indices": [2] // ALSO depends on login (WIDE GRAPH - can run parallel with dashboard!) - } -] -``` +Use the feature_create_bulk tool to add all features at once. You can create features in batches if there are many (e.g., 50 at a time). **Notes:** - IDs and priorities are assigned automatically based on order - All features start with `passes: false` by default -- You can create features in batches if there are many (e.g., 50 at a time) -- **CRITICAL:** Use `depends_on_indices` to specify dependencies (see FEATURE DEPENDENCIES section below) - -**DEPENDENCY REQUIREMENT:** -You MUST specify dependencies using `depends_on_indices` for features that logically depend on others. -- Features 0-9 should have NO dependencies (foundation/setup features) -- Features 10+ MUST have at least some dependencies where logical -- Create WIDE dependency graphs, not linear chains: - - BAD: A -> B -> C -> D -> E (linear chain, only 1 feature can run at a time) - - GOOD: A -> B, A -> C, A -> D, B -> E, C -> E (wide graph, multiple features can run in parallel) **Requirements for features:** @@ -114,7 +43,6 @@ You MUST specify dependencies using `depends_on_indices` for features that logic - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps) - At least 25 tests MUST have 10+ steps each (more for complex apps) - Order features by priority: fundamental features first (the API assigns priority based on order) -- All features start with `passes: false` automatically - Cover every feature in the spec exhaustively - **MUST include tests from ALL 20 mandatory categories below** @@ -122,125 +50,68 @@ You MUST specify dependencies using `depends_on_indices` for features that logic ## FEATURE DEPENDENCIES (MANDATORY) -**THIS SECTION IS MANDATORY. You MUST specify dependencies for features.** +Dependencies enable **parallel execution** of independent features. 
When specified correctly, multiple agents can work on unrelated features simultaneously, dramatically speeding up development. -Dependencies enable **parallel execution** of independent features. When you specify dependencies correctly, multiple agents can work on unrelated features simultaneously, dramatically speeding up development. +**Why this matters:** Without dependencies, features execute in random order, causing logical issues (e.g., "Edit user" before "Create user") and preventing efficient parallelization. -**WARNING:** If you do not specify dependencies, ALL features will be ready immediately, which: -1. Overwhelms the parallel agents trying to work on unrelated features -2. Results in features being implemented in random order -3. Causes logical issues (e.g., "Edit user" attempted before "Create user") +### Dependency Rules -You MUST analyze each feature and specify its dependencies using `depends_on_indices`. +1. **Use `depends_on_indices`** (0-based array indices) to reference dependencies +2. **Can only depend on EARLIER features** (index must be less than current position) +3. **No circular dependencies** allowed +4. **Maximum 20 dependencies** per feature +5. **Foundation features (index 0-9)** should have NO dependencies +6. **60% of features after index 10** should have at least one dependency -### Why Dependencies Matter +### Dependency Types -1. **Parallel Execution**: Features without dependencies can run in parallel -2. **Logical Ordering**: Ensures features are built in the right order -3. **Blocking Prevention**: An agent won't start a feature until its dependencies pass +| Type | Example | +|------|---------| +| Data | "Edit item" depends on "Create item" | +| Auth | "View dashboard" depends on "User can log in" | +| Navigation | "Modal close works" depends on "Modal opens" | +| UI | "Filter results" depends on "Display results list" | -### How to Determine Dependencies +### Wide Graph Pattern (REQUIRED) -Ask yourself: "What MUST be working before this feature can be tested?" +Create WIDE dependency graphs, not linear chains: +- **BAD:** A -> B -> C -> D -> E (linear chain, only 1 feature runs at a time) +- **GOOD:** A -> B, A -> C, A -> D, B -> E, C -> E (wide graph, parallel execution) -| Dependency Type | Example | -|-----------------|---------| -| **Data dependencies** | "Edit item" depends on "Create item" | -| **Auth dependencies** | "View dashboard" depends on "User can log in" | -| **Navigation dependencies** | "Modal close works" depends on "Modal opens" | -| **UI dependencies** | "Filter results" depends on "Display results list" | -| **API dependencies** | "Fetch user data" depends on "API authentication" | - -### Using `depends_on_indices` - -Since feature IDs aren't assigned until after creation, use **array indices** (0-based) to reference dependencies: - -```json -{ - "features": [ - { "name": "Create account", ... }, // Index 0 - { "name": "Login", "depends_on_indices": [0] }, // Index 1, depends on 0 - { "name": "View profile", "depends_on_indices": [1] }, // Index 2, depends on 1 - { "name": "Edit profile", "depends_on_indices": [2] } // Index 3, depends on 2 - ] -} -``` - -### Rules for Dependencies - -1. **Can only depend on EARLIER features**: Index must be less than current feature's position -2. **No circular dependencies**: A cannot depend on B if B depends on A -3. **Maximum 20 dependencies** per feature -4. **Foundation features have NO dependencies**: First features in each category typically have none -5. 
**Don't over-depend**: Only add dependencies that are truly required for testing - -### Best Practices - -1. **Start with foundation features** (index 0-10): Core setup, basic navigation, authentication -2. **Group related features together**: Keep CRUD operations adjacent -3. **Chain complex flows**: Registration -> Login -> Dashboard -> Settings -4. **Keep dependencies shallow**: Prefer 1-2 dependencies over deep chains -5. **Skip dependencies for independent features**: Visual tests often have no dependencies - -### Minimum Dependency Coverage - -**REQUIREMENT:** At least 60% of your features (after index 10) should have at least one dependency. - -Target structure for a 150-feature project: -- Features 0-9: Foundation (0 dependencies) - App loads, basic setup -- Features 10-149: At least 84 should have dependencies (60% of 140) - -This ensures: -- A good mix of parallelizable features (foundation) -- Logical ordering for dependent features - -### Example: Todo App Feature Chain (Wide Graph Pattern) - -This example shows the CORRECT wide graph pattern where multiple features share the same dependency, -enabling parallel execution: +### Complete Example ```json [ - // FOUNDATION TIER (indices 0-2, no dependencies) - // These run first and enable everything else + // FOUNDATION TIER (indices 0-2, no dependencies) - run first { "name": "App loads without errors", "category": "functional" }, { "name": "Navigation bar displays", "category": "style" }, { "name": "Homepage renders correctly", "category": "functional" }, - // AUTH TIER (indices 3-5, depend on foundation) - // These can all run in parallel once foundation passes + // AUTH TIER (indices 3-5, depend on foundation) - run in parallel { "name": "User can register", "depends_on_indices": [0] }, { "name": "User can login", "depends_on_indices": [0, 3] }, { "name": "User can logout", "depends_on_indices": [4] }, - // CORE CRUD TIER (indices 6-9, depend on auth) - // WIDE GRAPH: All 4 of these depend on login (index 4) - // This means all 4 can start as soon as login passes! + // CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login + // All 4 start as soon as login passes! { "name": "User can create todo", "depends_on_indices": [4] }, { "name": "User can view todos", "depends_on_indices": [4] }, { "name": "User can edit todo", "depends_on_indices": [4, 6] }, { "name": "User can delete todo", "depends_on_indices": [4, 6] }, - // ADVANCED TIER (indices 10-11, depend on CRUD) - // Note: filter and search both depend on view (7), not on each other + // ADVANCED TIER (indices 10-11) - both depend on view, not each other { "name": "User can filter todos", "depends_on_indices": [7] }, { "name": "User can search todos", "depends_on_indices": [7] } ] ``` -**Parallelism analysis of this example:** -- Foundation tier: 3 features can run in parallel -- Auth tier: 3 features wait for foundation, then can run (mostly parallel) -- CRUD tier: 4 features can start once login passes (all 4 in parallel!) -- Advanced tier: 2 features can run once view passes (both in parallel) - **Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles. --- ## MANDATORY TEST CATEGORIES -The feature_list.json **MUST** include tests from ALL of these categories. The minimum counts scale by complexity tier. +The feature_list.json **MUST** include tests from ALL 20 categories. Minimum counts scale by complexity tier. 
### Category Distribution by Complexity Tier @@ -270,331 +141,47 @@ The feature_list.json **MUST** include tests from ALL of these categories. The m --- -### A. Security & Access Control Tests +### Category Descriptions -Test that unauthorized access is blocked and permissions are enforced. +**A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users. -**Required tests (examples):** +**B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects. -- Unauthenticated user cannot access protected routes (redirect to login) -- Regular user cannot access admin-only pages (403 or redirect) -- API endpoints return 401 for unauthenticated requests -- API endpoints return 403 for unauthorized role access -- Session expires after configured inactivity period -- Logout clears all session data and tokens -- Invalid/expired tokens are rejected -- Each role can ONLY see their permitted menu items -- Direct URL access to unauthorized pages is blocked -- Sensitive operations require confirmation or re-authentication -- Cannot access another user's data by manipulating IDs in URL -- Password reset flow works securely -- Failed login attempts are handled (no information leakage) +**C. Real Data Verification** - Test data persistence across refreshes and sessions, CRUD operations with unique test data, related record updates, and empty states. -### B. Navigation Integrity Tests +**D. Workflow Completeness** - Test end-to-end CRUD for every entity, state transitions, multi-step wizards, bulk operations, and form submission feedback. -Test that every button, link, and menu item goes to the correct place. +**E. Error Handling** - Test network failures, invalid input, API errors, 404/500 responses, loading states, timeouts, and user-friendly error messages. -**Required tests (examples):** +**F. UI-Backend Integration** - Test request/response format matching, database-driven dropdowns, cascading updates, filters/sorts with real data, and API error display. -- Every button in sidebar navigates to correct page -- Every menu item links to existing route -- All CRUD action buttons (Edit, Delete, View) go to correct URLs with correct IDs -- Back button works correctly after each navigation -- Deep linking works (direct URL access to any page with auth) -- Breadcrumbs reflect actual navigation path -- 404 page shown for non-existent routes (not crash) -- After login, user redirected to intended destination (or dashboard) -- After logout, user redirected to login page -- Pagination links work and preserve current filters -- Tab navigation within pages works correctly -- Modal close buttons return to previous state -- Cancel buttons on forms return to previous page +**G. State & Persistence** - Test refresh mid-form, session recovery, multi-tab behavior, back-button after submit, and unsaved changes warnings. -### C. Real Data Verification Tests +**H. URL & Direct Access** - Test URL manipulation security, direct route access by role, malformed parameters, deep links to deleted entities, and shareable filter URLs. -Test that data is real (not mocked) and persists correctly. +**I. Double-Action & Idempotency** - Test double-click submit, rapid delete clicks, back-and-resubmit, button disabled during processing, and concurrent submissions. -**Required tests (examples):** +**J. 
Data Cleanup & Cascade** - Test parent deletion effects on children, removal from search/lists/dropdowns, statistics updates, and soft vs hard delete behavior. -- Create a record via UI with unique content → verify it appears in list -- Create a record → refresh page → record still exists -- Create a record → log out → log in → record still exists -- Edit a record → verify changes persist after refresh -- Delete a record → verify it's gone from list AND database -- Delete a record → verify it's gone from related dropdowns -- Filter/search → results match actual data created in test -- Dashboard statistics reflect real record counts (create 3 items, count shows 3) -- Reports show real aggregated data -- Export functionality exports actual data you created -- Related records update when parent changes -- Timestamps are real and accurate (created_at, updated_at) -- Data created by User A is not visible to User B (unless shared) -- Empty state shows correctly when no data exists +**K. Default & Reset** - Test form defaults, sensible date picker defaults, dropdown placeholders, reset button behavior, and filter/pagination reset on context change. -### D. Workflow Completeness Tests +**L. Search & Filter Edge Cases** - Test empty search, whitespace-only, special characters, quotes, long strings, zero-result combinations, and filter persistence. -Test that every workflow can be completed end-to-end through the UI. +**M. Form Validation** - Test required fields, email/password/numeric/date formats, min/max constraints, uniqueness, specific error messages, and server-side validation. -**Required tests (examples):** +**N. Feedback & Notification** - Test success/error feedback for all actions, loading spinners, disabled buttons during submit, progress indicators, and toast behavior. -- Every entity has working Create operation via UI form -- Every entity has working Read/View operation (detail page loads) -- Every entity has working Update operation (edit form saves) -- Every entity has working Delete operation (with confirmation dialog) -- Every status/state has a UI mechanism to transition to next state -- Multi-step processes (wizards) can be completed end-to-end -- Bulk operations (select all, delete selected) work -- Cancel/Undo operations work where applicable -- Required fields prevent submission when empty -- Form validation shows errors before submission -- Successful submission shows success feedback -- Backend workflow (e.g., user→customer conversion) has UI trigger +**O. Responsive & Layout** - Test layouts at desktop (1920px), tablet (768px), and mobile (375px), no horizontal scroll, touch targets, modal fit, and text overflow. -### E. Error Handling Tests +**P. Accessibility** - Test tab navigation, focus rings, screen reader compatibility, ARIA labels, color contrast, labels on form fields, and error announcements. -Test graceful handling of errors and edge cases. +**Q. Temporal & Timezone** - Test timezone-aware display, accurate timestamps, date picker constraints, overdue detection, and date sorting across boundaries. -**Required tests (examples):** +**R. Concurrency & Race Conditions** - Test concurrent edits, viewing deleted records, pagination during updates, rapid navigation, and late API response handling. 
-- Network failure shows user-friendly error message, not crash -- Invalid form input shows field-level errors -- API errors display meaningful messages to user -- 404 responses handled gracefully (show not found page) -- 500 responses don't expose stack traces or technical details -- Empty search results show "no results found" message -- Loading states shown during all async operations -- Timeout doesn't hang the UI indefinitely -- Submitting form with server error keeps user data in form -- File upload errors (too large, wrong type) show clear message -- Duplicate entry errors (e.g., email already exists) are clear +**S. Export/Import** - Test full/filtered export, import with valid/duplicate/malformed files, and round-trip data integrity. -### F. UI-Backend Integration Tests - -Test that frontend and backend communicate correctly. - -**Required tests (examples):** - -- Frontend request format matches what backend expects -- Backend response format matches what frontend parses -- All dropdown options come from real database data (not hardcoded) -- Related entity selectors (e.g., "choose category") populated from DB -- Changes in one area reflect in related areas after refresh -- Deleting parent handles children correctly (cascade or block) -- Filters work with actual data attributes from database -- Sort functionality sorts real data correctly -- Pagination returns correct page of real data -- API error responses are parsed and displayed correctly -- Loading spinners appear during API calls -- Optimistic updates (if used) rollback on failure - -### G. State & Persistence Tests - -Test that state is maintained correctly across sessions and tabs. - -**Required tests (examples):** - -- Refresh page mid-form - appropriate behavior (data kept or cleared) -- Close browser, reopen - session state handled correctly -- Same user in two browser tabs - changes sync or handled gracefully -- Browser back after form submit - no duplicate submission -- Bookmark a page, return later - works (with auth check) -- LocalStorage/cookies cleared - graceful re-authentication -- Unsaved changes warning when navigating away from dirty form - -### H. URL & Direct Access Tests - -Test direct URL access and URL manipulation security. - -**Required tests (examples):** - -- Change entity ID in URL - cannot access others' data -- Access /admin directly as regular user - blocked -- Malformed URL parameters - handled gracefully (no crash) -- Very long URL - handled correctly -- URL with SQL injection attempt - rejected/sanitized -- Deep link to deleted entity - shows "not found", not crash -- Query parameters for filters are reflected in UI -- Sharing a URL with filters preserves those filters - -### I. Double-Action & Idempotency Tests - -Test that rapid or duplicate actions don't cause issues. - -**Required tests (examples):** - -- Double-click submit button - only one record created -- Rapid multiple clicks on delete - only one deletion occurs -- Submit form, hit back, submit again - appropriate behavior -- Multiple simultaneous API calls - server handles correctly -- Refresh during save operation - data not corrupted -- Click same navigation link twice quickly - no issues -- Submit button disabled during processing - -### J. Data Cleanup & Cascade Tests - -Test that deleting data cleans up properly everywhere. 
- -**Required tests (examples):** - -- Delete parent entity - children removed from all views -- Delete item - removed from search results immediately -- Delete item - statistics/counts updated immediately -- Delete item - related dropdowns updated -- Delete item - cached views refreshed -- Soft delete (if applicable) - item hidden but recoverable -- Hard delete - item completely removed from database - -### K. Default & Reset Tests - -Test that defaults and reset functionality work correctly. - -**Required tests (examples):** - -- New form shows correct default values -- Date pickers default to sensible dates (today, not 1970) -- Dropdowns default to correct option (or placeholder) -- Reset button clears to defaults, not just empty -- Clear filters button resets all filters to default -- Pagination resets to page 1 when filters change -- Sorting resets when changing views - -### L. Search & Filter Edge Cases - -Test search and filter functionality thoroughly. - -**Required tests (examples):** - -- Empty search shows all results (or appropriate message) -- Search with only spaces - handled correctly -- Search with special characters (!@#$%^&\*) - no errors -- Search with quotes - handled correctly -- Search with very long string - handled correctly -- Filter combinations that return zero results - shows message -- Filter + search + sort together - all work correctly -- Filter persists after viewing detail and returning to list -- Clear individual filter - works correctly -- Search is case-insensitive (or clearly case-sensitive) - -### M. Form Validation Tests - -Test all form validation rules exhaustively. - -**Required tests (examples):** - -- Required field empty - shows error, blocks submit -- Email field with invalid email formats - shows error -- Password field - enforces complexity requirements -- Numeric field with letters - rejected -- Date field with invalid date - rejected -- Min/max length enforced on text fields -- Min/max values enforced on numeric fields -- Duplicate unique values rejected (e.g., duplicate email) -- Error messages are specific (not just "invalid") -- Errors clear when user fixes the issue -- Server-side validation matches client-side -- Whitespace-only input rejected for required fields - -### N. Feedback & Notification Tests - -Test that users get appropriate feedback for all actions. - -**Required tests (examples):** - -- Every successful save/create shows success feedback -- Every failed action shows error feedback -- Loading spinner during every async operation -- Disabled state on buttons during form submission -- Progress indicator for long operations (file upload) -- Toast/notification disappears after appropriate time -- Multiple notifications don't overlap incorrectly -- Success messages are specific (not just "Success") - -### O. Responsive & Layout Tests - -Test that the UI works on different screen sizes. - -**Required tests (examples):** - -- Desktop layout correct at 1920px width -- Tablet layout correct at 768px width -- Mobile layout correct at 375px width -- No horizontal scroll on any standard viewport -- Touch targets large enough on mobile (44px min) -- Modals fit within viewport on mobile -- Long text truncates or wraps correctly (no overflow) -- Tables scroll horizontally if needed on mobile -- Navigation collapses appropriately on mobile - -### P. Accessibility Tests - -Test basic accessibility compliance. 
- -**Required tests (examples):** - -- Tab navigation works through all interactive elements -- Focus ring visible on all focused elements -- Screen reader can navigate main content areas -- ARIA labels on icon-only buttons -- Color contrast meets WCAG AA (4.5:1 for text) -- No information conveyed by color alone -- Form fields have associated labels -- Error messages announced to screen readers -- Skip link to main content (if applicable) -- Images have alt text - -### Q. Temporal & Timezone Tests - -Test date/time handling. - -**Required tests (examples):** - -- Dates display in user's local timezone -- Created/updated timestamps accurate and formatted correctly -- Date picker allows only valid date ranges -- Overdue items identified correctly (timezone-aware) -- "Today", "This Week" filters work correctly for user's timezone -- Recurring items generate at correct times (if applicable) -- Date sorting works correctly across months/years - -### R. Concurrency & Race Condition Tests - -Test multi-user and race condition scenarios. - -**Required tests (examples):** - -- Two users edit same record - last save wins or conflict shown -- Record deleted while another user viewing - graceful handling -- List updates while user on page 2 - pagination still works -- Rapid navigation between pages - no stale data displayed -- API response arrives after user navigated away - no crash -- Concurrent form submissions from same user handled - -### S. Export/Import Tests (if applicable) - -Test data export and import functionality. - -**Required tests (examples):** - -- Export all data - file contains all records -- Export filtered data - only filtered records included -- Import valid file - all records created correctly -- Import duplicate data - handled correctly (skip/update/error) -- Import malformed file - error message, no partial import -- Export then import - data integrity preserved exactly - -### T. Performance Tests - -Test basic performance requirements. - -**Required tests (examples):** - -- Page loads in <3s with 100 records -- Page loads in <5s with 1000 records -- Search responds in <1s -- Infinite scroll doesn't degrade with many items -- Large file upload shows progress -- Memory doesn't leak on long sessions -- No console errors during normal operation +**T. Performance** - Test page load with 100/1000 records, search response time, infinite scroll stability, upload progress, and memory/console errors. --- diff --git a/api/database.py b/api/database.py index fd82847..af2fd01 100644 --- a/api/database.py +++ b/api/database.py @@ -21,6 +21,7 @@ from sqlalchemy import ( Column, DateTime, ForeignKey, + Index, Integer, String, Text, @@ -39,6 +40,12 @@ class Feature(Base): __tablename__ = "features" + # Composite index for common status query pattern (passes, in_progress) + # Used by feature_get_stats, get_ready_features, and other status queries + __table_args__ = ( + Index('ix_feature_status', 'passes', 'in_progress'), + ) + id = Column(Integer, primary_key=True, index=True) priority = Column(Integer, nullable=False, default=999, index=True) category = Column(String(100), nullable=False) diff --git a/api/dependency_resolver.py b/api/dependency_resolver.py index 3e1980b..103cee7 100644 --- a/api/dependency_resolver.py +++ b/api/dependency_resolver.py @@ -6,6 +6,7 @@ Provides dependency resolution using Kahn's algorithm for topological sorting. Includes cycle detection, validation, and helper functions for dependency management. 
""" +import heapq from typing import TypedDict # Security: Prevent DoS via excessive dependencies @@ -55,19 +56,27 @@ def resolve_dependencies(features: list[dict]) -> DependencyResult: if not dep.get("passes"): blocked.setdefault(feature["id"], []).append(dep_id) - # Kahn's algorithm with priority-aware selection - queue = [f for f in features if in_degree[f["id"]] == 0] - queue.sort(key=lambda f: (f.get("priority", 999), f["id"])) + # Kahn's algorithm with priority-aware selection using a heap + # Heap entries are tuples: (priority, id, feature_dict) for stable ordering + heap = [ + (f.get("priority", 999), f["id"], f) + for f in features + if in_degree[f["id"]] == 0 + ] + heapq.heapify(heap) ordered: list[dict] = [] - while queue: - current = queue.pop(0) + while heap: + _, _, current = heapq.heappop(heap) ordered.append(current) for dependent_id in adjacency[current["id"]]: in_degree[dependent_id] -= 1 if in_degree[dependent_id] == 0: - queue.append(feature_map[dependent_id]) - queue.sort(key=lambda f: (f.get("priority", 999), f["id"])) + dep_feature = feature_map[dependent_id] + heapq.heappush( + heap, + (dep_feature.get("priority", 999), dependent_id, dep_feature) + ) # Detect cycles (features not in ordered = part of cycle) cycles: list[list[int]] = [] @@ -84,12 +93,19 @@ def resolve_dependencies(features: list[dict]) -> DependencyResult: } -def are_dependencies_satisfied(feature: dict, all_features: list[dict]) -> bool: +def are_dependencies_satisfied( + feature: dict, + all_features: list[dict], + passing_ids: set[int] | None = None, +) -> bool: """Check if all dependencies have passes=True. Args: feature: Feature dict to check all_features: List of all feature dicts + passing_ids: Optional pre-computed set of passing feature IDs. + If None, will be computed from all_features. Pass this when + calling in a loop to avoid O(n^2) complexity. Returns: True if all dependencies are satisfied (or no dependencies) @@ -97,22 +113,31 @@ def are_dependencies_satisfied(feature: dict, all_features: list[dict]) -> bool: deps = feature.get("dependencies") or [] if not deps: return True - passing_ids = {f["id"] for f in all_features if f.get("passes")} + if passing_ids is None: + passing_ids = {f["id"] for f in all_features if f.get("passes")} return all(dep_id in passing_ids for dep_id in deps) -def get_blocking_dependencies(feature: dict, all_features: list[dict]) -> list[int]: +def get_blocking_dependencies( + feature: dict, + all_features: list[dict], + passing_ids: set[int] | None = None, +) -> list[int]: """Get list of incomplete dependency IDs. Args: feature: Feature dict to check all_features: List of all feature dicts + passing_ids: Optional pre-computed set of passing feature IDs. + If None, will be computed from all_features. Pass this when + calling in a loop to avoid O(n^2) complexity. 
Returns: List of feature IDs that are blocking this feature """ deps = feature.get("dependencies") or [] - passing_ids = {f["id"] for f in all_features if f.get("passes")} + if passing_ids is None: + passing_ids = {f["id"] for f in all_features if f.get("passes")} return [dep_id for dep_id in deps if dep_id not in passing_ids] diff --git a/client.py b/client.py index eff81f8..e844aa4 100644 --- a/client.py +++ b/client.py @@ -12,7 +12,7 @@ import sys from pathlib import Path from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient -from claude_agent_sdk.types import HookMatcher +from claude_agent_sdk.types import HookContext, HookInput, HookMatcher, SyncHookJSONOutput from dotenv import load_dotenv from security import bash_security_hook @@ -55,7 +55,9 @@ FEATURE_MCP_TOOLS = [ # Core feature operations "mcp__features__feature_get_stats", "mcp__features__feature_get_by_id", # Get assigned feature details + "mcp__features__feature_get_summary", # Lightweight: id, name, status, deps only "mcp__features__feature_mark_in_progress", + "mcp__features__feature_claim_and_get", # Atomic claim + get details "mcp__features__feature_mark_passing", "mcp__features__feature_mark_failing", # Mark regression detected "mcp__features__feature_skip", @@ -268,6 +270,45 @@ def create_client( context["project_dir"] = str(project_dir.resolve()) return await bash_security_hook(input_data, tool_use_id, context) + # PreCompact hook for logging and customizing context compaction + # Compaction is handled automatically by Claude Code CLI when context approaches limits. + # This hook allows us to log when compaction occurs and optionally provide custom instructions. + async def pre_compact_hook( + input_data: HookInput, + tool_use_id: str | None, + context: HookContext, + ) -> SyncHookJSONOutput: + """ + Hook called before context compaction occurs. + + Compaction triggers: + - "auto": Automatic compaction when context approaches token limits + - "manual": User-initiated compaction via /compact command + + The hook can customize compaction via hookSpecificOutput: + - customInstructions: String with focus areas for summarization + """ + trigger = input_data.get("trigger", "auto") + custom_instructions = input_data.get("custom_instructions") + + if trigger == "auto": + print("[Context] Auto-compaction triggered (context approaching limit)") + else: + print("[Context] Manual compaction requested") + + if custom_instructions: + print(f"[Context] Custom instructions: {custom_instructions}") + + # Return empty dict to allow compaction to proceed with default behavior + # To customize, return: + # { + # "hookSpecificOutput": { + # "hookEventName": "PreCompact", + # "customInstructions": "Focus on preserving file paths and test results" + # } + # } + return SyncHookJSONOutput() + return ClaudeSDKClient( options=ClaudeAgentOptions( model=model, @@ -281,10 +322,35 @@ def create_client( "PreToolUse": [ HookMatcher(matcher="Bash", hooks=[bash_hook_with_context]), ], + # PreCompact hook for context management during long sessions. + # Compaction is automatic when context approaches token limits. + # This hook logs compaction events and can customize summarization. + "PreCompact": [ + HookMatcher(hooks=[pre_compact_hook]), + ], }, max_turns=1000, cwd=str(project_dir.resolve()), settings=str(settings_file.resolve()), # Use absolute path env=sdk_env, # Pass API configuration overrides to CLI subprocess + # Enable extended context beta for better handling of long sessions. 
+ # This provides up to 1M tokens of context with automatic compaction. + # See: https://docs.anthropic.com/en/api/beta-headers + betas=["context-1m-2025-08-07"], + # Note on context management: + # The Claude Agent SDK handles context management automatically through the + # underlying Claude Code CLI. When context approaches limits, the CLI + # automatically compacts/summarizes previous messages. + # + # The SDK does NOT expose explicit compaction_control or context_management + # parameters. Instead, context is managed via: + # 1. betas=["context-1m-2025-08-07"] - Extended context window + # 2. PreCompact hook - Intercept and customize compaction behavior + # 3. max_turns - Limit conversation turns (set to 1000 for long sessions) + # + # Future SDK versions may add explicit compaction controls. When available, + # consider adding: + # - compaction_control={"enabled": True, "context_token_threshold": 80000} + # - context_management={"edits": [...]} for tool use clearing ) ) diff --git a/mcp_server/feature_mcp.py b/mcp_server/feature_mcp.py index 0a00e8c..3c25001 100755 --- a/mcp_server/feature_mcp.py +++ b/mcp_server/feature_mcp.py @@ -8,10 +8,12 @@ Provides tools to manage features in the autonomous coding system. Tools: - feature_get_stats: Get progress statistics - feature_get_by_id: Get a specific feature by ID +- feature_get_summary: Get minimal feature info (id, name, status, deps) - feature_mark_passing: Mark a feature as passing - feature_mark_failing: Mark a feature as failing (regression detected) - feature_skip: Skip a feature (move to end of queue) - feature_mark_in_progress: Mark a feature as in-progress +- feature_claim_and_get: Atomically claim and get feature details - feature_clear_in_progress: Clear in-progress status - feature_release_testing: Release testing lock on a feature - feature_create_bulk: Create multiple features at once @@ -19,7 +21,7 @@ Tools: - feature_add_dependency: Add a dependency between features - feature_remove_dependency: Remove a dependency - feature_get_ready: Get features ready to implement -- feature_get_blocked: Get features blocked by dependencies +- feature_get_blocked: Get features blocked by dependencies (with limit) - feature_get_graph: Get the dependency graph Note: Feature selection (which feature to work on) is handled by the @@ -142,11 +144,20 @@ def feature_get_stats() -> str: Returns: JSON with: passing (int), in_progress (int), total (int), percentage (float) """ + from sqlalchemy import case, func + session = get_session() try: - total = session.query(Feature).count() - passing = session.query(Feature).filter(Feature.passes == True).count() - in_progress = session.query(Feature).filter(Feature.in_progress == True).count() + # Single aggregate query instead of 3 separate COUNT queries + result = session.query( + func.count(Feature.id).label('total'), + func.sum(case((Feature.passes == True, 1), else_=0)).label('passing'), + func.sum(case((Feature.in_progress == True, 1), else_=0)).label('in_progress') + ).first() + + total = result.total or 0 + passing = int(result.passing or 0) + in_progress = int(result.in_progress or 0) percentage = round((passing / total) * 100, 1) if total > 0 else 0.0 return json.dumps({ @@ -154,7 +165,7 @@ def feature_get_stats() -> str: "in_progress": in_progress, "total": total, "percentage": percentage - }, indent=2) + }) finally: session.close() @@ -181,7 +192,38 @@ def feature_get_by_id( if feature is None: return json.dumps({"error": f"Feature with ID {feature_id} not found"}) - return 
json.dumps(feature.to_dict(), indent=2) + return json.dumps(feature.to_dict()) + finally: + session.close() + + +@mcp.tool() +def feature_get_summary( + feature_id: Annotated[int, Field(description="The ID of the feature", ge=1)] +) -> str: + """Get minimal feature info: id, name, status, and dependencies only. + + Use this instead of feature_get_by_id when you only need status info, + not the full description and steps. This reduces response size significantly. + + Args: + feature_id: The ID of the feature to retrieve + + Returns: + JSON with: id, name, passes, in_progress, dependencies + """ + session = get_session() + try: + feature = session.query(Feature).filter(Feature.id == feature_id).first() + if feature is None: + return json.dumps({"error": f"Feature with ID {feature_id} not found"}) + return json.dumps({ + "id": feature.id, + "name": feature.name, + "passes": feature.passes, + "in_progress": feature.in_progress, + "dependencies": feature.dependencies or [] + }) finally: session.close() @@ -229,7 +271,7 @@ def feature_release_testing( return json.dumps({ "message": f"Feature #{feature_id} testing {status}", "feature": feature.to_dict() - }, indent=2) + }) except Exception as e: session.rollback() return json.dumps({"error": f"Failed to release testing claim: {str(e)}"}) @@ -250,7 +292,7 @@ def feature_mark_passing( feature_id: The ID of the feature to mark as passing Returns: - JSON with the updated feature details, or error if not found. + JSON with success confirmation: {success, feature_id, name} """ session = get_session() try: @@ -262,9 +304,8 @@ def feature_mark_passing( feature.passes = True feature.in_progress = False session.commit() - session.refresh(feature) - return json.dumps(feature.to_dict(), indent=2) + return json.dumps({"success": True, "feature_id": feature_id, "name": feature.name}) except Exception as e: session.rollback() return json.dumps({"error": f"Failed to mark feature passing: {str(e)}"}) @@ -309,7 +350,7 @@ def feature_mark_failing( return json.dumps({ "message": f"Feature #{feature_id} marked as failing - regression detected", "feature": feature.to_dict() - }, indent=2) + }) except Exception as e: session.rollback() return json.dumps({"error": f"Failed to mark feature failing: {str(e)}"}) @@ -368,7 +409,7 @@ def feature_skip( "old_priority": old_priority, "new_priority": new_priority, "message": f"Feature '{feature.name}' moved to end of queue" - }, indent=2) + }) except Exception as e: session.rollback() return json.dumps({"error": f"Failed to skip feature: {str(e)}"}) @@ -408,7 +449,7 @@ def feature_mark_in_progress( session.commit() session.refresh(feature) - return json.dumps(feature.to_dict(), indent=2) + return json.dumps(feature.to_dict()) except Exception as e: session.rollback() return json.dumps({"error": f"Failed to mark feature in-progress: {str(e)}"}) @@ -416,6 +457,48 @@ def feature_mark_in_progress( session.close() +@mcp.tool() +def feature_claim_and_get( + feature_id: Annotated[int, Field(description="The ID of the feature to claim", ge=1)] +) -> str: + """Atomically claim a feature (mark in-progress) and return its full details. + + Combines feature_mark_in_progress + feature_get_by_id into a single operation. + If already in-progress, still returns the feature details (idempotent). + + Args: + feature_id: The ID of the feature to claim and retrieve + + Returns: + JSON with feature details including claimed status, or error if not found. 
+ """ + session = get_session() + try: + feature = session.query(Feature).filter(Feature.id == feature_id).first() + + if feature is None: + return json.dumps({"error": f"Feature with ID {feature_id} not found"}) + + if feature.passes: + return json.dumps({"error": f"Feature with ID {feature_id} is already passing"}) + + # Idempotent: if already in-progress, just return details + already_claimed = feature.in_progress + if not already_claimed: + feature.in_progress = True + session.commit() + session.refresh(feature) + + result = feature.to_dict() + result["already_claimed"] = already_claimed + return json.dumps(result) + except Exception as e: + session.rollback() + return json.dumps({"error": f"Failed to claim feature: {str(e)}"}) + finally: + session.close() + + @mcp.tool() def feature_clear_in_progress( feature_id: Annotated[int, Field(description="The ID of the feature to clear in-progress status", ge=1)] @@ -442,7 +525,7 @@ def feature_clear_in_progress( session.commit() session.refresh(feature) - return json.dumps(feature.to_dict(), indent=2) + return json.dumps(feature.to_dict()) except Exception as e: session.rollback() return json.dumps({"error": f"Failed to clear in-progress status: {str(e)}"}) @@ -549,7 +632,7 @@ def feature_create_bulk( return json.dumps({ "created": len(created_features), "with_dependencies": deps_count - }, indent=2) + }) except Exception as e: session.rollback() return json.dumps({"error": str(e)}) @@ -604,7 +687,7 @@ def feature_create( "success": True, "message": f"Created feature: {name}", "feature": db_feature.to_dict() - }, indent=2) + }) except Exception as e: session.rollback() return json.dumps({"error": str(e)}) @@ -754,20 +837,25 @@ def feature_get_ready( "features": ready[:limit], "count": len(ready[:limit]), "total_ready": len(ready) - }, indent=2) + }) finally: session.close() @mcp.tool() -def feature_get_blocked() -> str: - """Get all features that are blocked by unmet dependencies. +def feature_get_blocked( + limit: Annotated[int, Field(default=20, ge=1, le=100, description="Max features to return")] = 20 +) -> str: + """Get features that are blocked by unmet dependencies. Returns features that have dependencies which are not yet passing. Each feature includes a 'blocked_by' field listing the blocking feature IDs. + Args: + limit: Maximum number of features to return (1-100, default 20) + Returns: - JSON with: features (list with blocked_by field), count (int) + JSON with: features (list with blocked_by field), count (int), total_blocked (int) """ session = get_session() try: @@ -787,9 +875,10 @@ def feature_get_blocked() -> str: }) return json.dumps({ - "features": blocked, - "count": len(blocked) - }, indent=2) + "features": blocked[:limit], + "count": len(blocked[:limit]), + "total_blocked": len(blocked) + }) finally: session.close() @@ -840,7 +929,7 @@ def feature_get_graph() -> str: return json.dumps({ "nodes": nodes, "edges": edges - }, indent=2) + }) finally: session.close() diff --git a/parallel_orchestrator.py b/parallel_orchestrator.py index 47e11a9..eb57bf9 100644 --- a/parallel_orchestrator.py +++ b/parallel_orchestrator.py @@ -186,6 +186,12 @@ class ParallelOrchestrator: # Session tracking for logging/debugging self.session_start_time: datetime = None + # Event signaled when any agent completes, allowing the main loop to wake + # immediately instead of waiting for the full POLL_INTERVAL timeout. + # This reduces latency when spawning the next feature after completion. 
+ self._agent_completed_event: asyncio.Event = None # Created in run_loop + self._event_loop: asyncio.AbstractEventLoop = None # Stored for thread-safe signaling + # Database session for this orchestrator self._engine, self._session_maker = create_database(project_dir) @@ -311,6 +317,9 @@ class ParallelOrchestrator: all_features = session.query(Feature).all() all_dicts = [f.to_dict() for f in all_features] + # Pre-compute passing_ids once to avoid O(n^2) in the loop + passing_ids = {f.id for f in all_features if f.passes} + ready = [] skipped_reasons = {"passes": 0, "in_progress": 0, "running": 0, "failed": 0, "deps": 0} for f in all_features: @@ -329,8 +338,8 @@ class ParallelOrchestrator: if self._failure_counts.get(f.id, 0) >= MAX_FEATURE_RETRIES: skipped_reasons["failed"] += 1 continue - # Check dependencies - if are_dependencies_satisfied(f.to_dict(), all_dicts): + # Check dependencies (pass pre-computed passing_ids) + if are_dependencies_satisfied(f.to_dict(), all_dicts, passing_ids): ready.append(f.to_dict()) else: skipped_reasons["deps"] += 1 @@ -794,6 +803,52 @@ class ParallelOrchestrator: finally: self._on_agent_complete(feature_id, proc.returncode, agent_type, proc) + def _signal_agent_completed(self): + """Signal that an agent has completed, waking the main loop. + + This method is safe to call from any thread. It schedules the event.set() + call to run on the event loop thread to avoid cross-thread issues with + asyncio.Event. + """ + if self._agent_completed_event is not None and self._event_loop is not None: + try: + # Use the stored event loop reference to schedule the set() call + # This is necessary because asyncio.Event is not thread-safe and + # asyncio.get_event_loop() fails in threads without an event loop + if self._event_loop.is_running(): + self._event_loop.call_soon_threadsafe(self._agent_completed_event.set) + else: + # Fallback: set directly if loop isn't running (shouldn't happen during normal operation) + self._agent_completed_event.set() + except RuntimeError: + # Event loop closed, ignore (orchestrator may be shutting down) + pass + + async def _wait_for_agent_completion(self, timeout: float = POLL_INTERVAL): + """Wait for an agent to complete or until timeout expires. + + This replaces fixed `asyncio.sleep(POLL_INTERVAL)` calls with event-based + waiting. When an agent completes, _signal_agent_completed() sets the event, + causing this method to return immediately. If no agent completes within + the timeout, we return anyway to check for ready features. + + Args: + timeout: Maximum seconds to wait (default: POLL_INTERVAL) + """ + if self._agent_completed_event is None: + # Fallback if event not initialized (shouldn't happen in normal operation) + await asyncio.sleep(timeout) + return + + try: + await asyncio.wait_for(self._agent_completed_event.wait(), timeout=timeout) + # Event was set - an agent completed. Clear it for the next wait cycle. 
+ self._agent_completed_event.clear() + debug_log.log("EVENT", "Woke up immediately - agent completed") + except asyncio.TimeoutError: + # Timeout reached without agent completion - this is normal, just check anyway + pass + def _on_agent_complete( self, feature_id: int | None, @@ -832,6 +887,8 @@ class ParallelOrchestrator: pid=proc.pid, feature_id=feature_id, status=status) + # Signal main loop that an agent slot is available + self._signal_agent_completed() return # Coding agent completion @@ -843,40 +900,20 @@ class ParallelOrchestrator: self.running_coding_agents.pop(feature_id, None) self.abort_events.pop(feature_id, None) - # BEFORE dispose: Query database state to see if it's stale - session_before = self.get_session() - try: - session_before.expire_all() - feature_before = session_before.query(Feature).filter(Feature.id == feature_id).first() - all_before = session_before.query(Feature).all() - passing_before = sum(1 for f in all_before if f.passes) - debug_log.log("DB", f"BEFORE engine.dispose() - Feature #{feature_id} state", - passes=feature_before.passes if feature_before else None, - in_progress=feature_before.in_progress if feature_before else None, - total_passing_in_db=passing_before) - finally: - session_before.close() - - # CRITICAL: Refresh database connection to see subprocess commits + # Refresh session cache to see subprocess commits # The coding agent runs as a subprocess and commits changes (e.g., passes=True). - # SQLAlchemy may have stale connections. Disposing the engine forces new connections - # that will see the subprocess's committed changes. - debug_log.log("DB", "Disposing database engine now...") - self._engine.dispose() - - # AFTER dispose: Query again to compare + # Using session.expire_all() is lighter weight than engine.dispose() for SQLite WAL mode + # and is sufficient to invalidate cached data and force fresh reads. + # engine.dispose() is only called on orchestrator shutdown, not on every agent completion. session = self.get_session() try: + session.expire_all() feature = session.query(Feature).filter(Feature.id == feature_id).first() - all_after = session.query(Feature).all() - passing_after = sum(1 for f in all_after if f.passes) feature_passes = feature.passes if feature else None feature_in_progress = feature.in_progress if feature else None - debug_log.log("DB", f"AFTER engine.dispose() - Feature #{feature_id} state", + debug_log.log("DB", f"Feature #{feature_id} state after session.expire_all()", passes=feature_passes, - in_progress=feature_in_progress, - total_passing_in_db=passing_after, - passing_changed=(passing_after != passing_before) if 'passing_before' in dir() else "unknown") + in_progress=feature_in_progress) if feature and feature.in_progress and not feature.passes: feature.in_progress = False session.commit() @@ -900,6 +937,9 @@ class ParallelOrchestrator: # CRITICAL: This print triggers the WebSocket to emit agent_update with state='error' or 'success' print(f"Feature #{feature_id} {status}", flush=True) + # Signal main loop that an agent slot is available + self._signal_agent_completed() + # NOTE: Testing agents are now spawned in start_feature() when coding agents START, # not here when they complete. This ensures 1:1 ratio and proper termination. 
@@ -949,6 +989,12 @@ class ParallelOrchestrator: """Main orchestration loop.""" self.is_running = True + # Initialize the agent completion event for this run + # Must be created in the async context where it will be used + self._agent_completed_event = asyncio.Event() + # Store the event loop reference for thread-safe signaling from output reader threads + self._event_loop = asyncio.get_running_loop() + # Track session start for regression testing (UTC for consistency with last_tested_at) self.session_start_time = datetime.now(timezone.utc) @@ -1100,8 +1146,8 @@ class ParallelOrchestrator: at_capacity=(current >= self.max_concurrency)) if current >= self.max_concurrency: - debug_log.log("CAPACITY", "At max capacity, sleeping...") - await asyncio.sleep(POLL_INTERVAL) + debug_log.log("CAPACITY", "At max capacity, waiting for agent completion...") + await self._wait_for_agent_completion() continue # Priority 1: Resume features from previous session @@ -1119,7 +1165,7 @@ class ParallelOrchestrator: if not ready: # Wait for running features to complete if current > 0: - await asyncio.sleep(POLL_INTERVAL) + await self._wait_for_agent_completion() continue else: # No ready features and nothing running @@ -1138,7 +1184,7 @@ class ParallelOrchestrator: # Still have pending features but all are blocked by dependencies print("No ready features available. All remaining features may be blocked by dependencies.", flush=True) - await asyncio.sleep(POLL_INTERVAL * 2) + await self._wait_for_agent_completion(timeout=POLL_INTERVAL * 2) continue # Start features up to capacity @@ -1174,7 +1220,7 @@ class ParallelOrchestrator: except Exception as e: print(f"Orchestrator error: {e}", flush=True) - await asyncio.sleep(POLL_INTERVAL) + await self._wait_for_agent_completion() # Wait for remaining agents to complete print("Waiting for running agents to complete...", flush=True) @@ -1184,7 +1230,8 @@ class ParallelOrchestrator: testing_done = len(self.running_testing_agents) == 0 if coding_done and testing_done: break - await asyncio.sleep(1) + # Use short timeout since we're just waiting for final agents to finish + await self._wait_for_agent_completion(timeout=1.0) print("Orchestrator finished.", flush=True) diff --git a/progress.py b/progress.py index a4dda26..0821c90 100644 --- a/progress.py +++ b/progress.py @@ -72,15 +72,31 @@ def count_passing_tests(project_dir: Path) -> tuple[int, int, int]: try: conn = sqlite3.connect(db_file) cursor = conn.cursor() - cursor.execute("SELECT COUNT(*) FROM features") - total = cursor.fetchone()[0] - cursor.execute("SELECT COUNT(*) FROM features WHERE passes = 1") - passing = cursor.fetchone()[0] - # Handle case where in_progress column doesn't exist yet + # Single aggregate query instead of 3 separate COUNT queries + # Handle case where in_progress column doesn't exist yet (legacy DBs) try: - cursor.execute("SELECT COUNT(*) FROM features WHERE in_progress = 1") - in_progress = cursor.fetchone()[0] + cursor.execute(""" + SELECT + COUNT(*) as total, + SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing, + SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress + FROM features + """) + row = cursor.fetchone() + total = row[0] or 0 + passing = row[1] or 0 + in_progress = row[2] or 0 except sqlite3.OperationalError: + # Fallback for databases without in_progress column + cursor.execute(""" + SELECT + COUNT(*) as total, + SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing + FROM features + """) + row = cursor.fetchone() + total = row[0] or 0 + passing = row[1] or 0 
in_progress = 0 conn.close() return passing, in_progress, total diff --git a/prompts.py b/prompts.py index 0558255..6869256 100644 --- a/prompts.py +++ b/prompts.py @@ -109,11 +109,11 @@ The orchestrator has already claimed this feature for you. def get_single_feature_prompt(feature_id: int, project_dir: Path | None = None, yolo_mode: bool = False) -> str: - """ - Load the coding prompt with single-feature focus instructions prepended. + """Prepend single-feature assignment header to base coding prompt. - When the orchestrator assigns a specific feature to a coding agent, - this prompt ensures the agent works ONLY on that feature. + Used in parallel mode to assign a specific feature to an agent. + The base prompt already contains the full workflow - this just + identifies which feature to work on. Args: feature_id: The specific feature ID to work on @@ -122,38 +122,20 @@ def get_single_feature_prompt(feature_id: int, project_dir: Path | None = None, handled by separate testing agents, not YOLO prompts. Returns: - The prompt with single-feature instructions prepended + The prompt with single-feature header prepended """ - # Always use the standard coding prompt - # (Testing/regression is handled by separate testing agents) base_prompt = get_coding_prompt(project_dir) - # Prepend single-feature instructions - single_feature_header = f"""## ASSIGNED FEATURE + # Minimal header - the base prompt already contains the full workflow + single_feature_header = f"""## ASSIGNED FEATURE: #{feature_id} -**You are assigned to work on Feature #{feature_id} ONLY.** - -This session is part of a parallel execution where multiple agents work on different features simultaneously. - -### Your workflow: - -1. **Get feature details** using `feature_get_by_id` with ID {feature_id} -2. **Mark as in-progress** using `feature_mark_in_progress` with ID {feature_id} - - If you get "already in-progress" error, that's OK - continue with implementation -3. **Implement the feature** following the steps from the feature details -4. **Test your implementation** to verify it works correctly -5. **Mark as passing** using `feature_mark_passing` with ID {feature_id} -6. **Commit your changes** and end the session - -### Important rules: - -- **Do NOT** work on any other features - other agents are handling them -- If blocked, use `feature_skip` and document the blocker in claude-progress.txt +Work ONLY on this feature. Other agents are handling other features. +Use `feature_claim_and_get` with ID {feature_id} to claim it and get details. +If blocked, use `feature_skip` and document the blocker. --- """ - return single_feature_header + base_prompt
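For reference, the heap-based ordering that `resolve_dependencies` now uses can be reduced to a short standalone sketch; the four feature dicts below are made up for illustration and carry only the fields the algorithm touches (`id`, `priority`, `dependencies`):

```python
# Minimal sketch of priority-aware Kahn's algorithm with a heap,
# mirroring the change to dependency_resolver.resolve_dependencies().
import heapq
from collections import defaultdict

features = [
    {"id": 1, "priority": 1, "dependencies": []},
    {"id": 2, "priority": 5, "dependencies": [1]},
    {"id": 3, "priority": 2, "dependencies": [1]},
    {"id": 4, "priority": 3, "dependencies": [2, 3]},
]

feature_map = {f["id"]: f for f in features}
in_degree = {f["id"]: len(f["dependencies"]) for f in features}
adjacency = defaultdict(list)
for f in features:
    for dep_id in f["dependencies"]:
        adjacency[dep_id].append(f["id"])

# Heap entries are (priority, id, feature): ties break on id, and the
# feature dict itself is never compared.
heap = [(f["priority"], f["id"], f) for f in features if in_degree[f["id"]] == 0]
heapq.heapify(heap)

ordered = []
while heap:
    _, _, current = heapq.heappop(heap)
    ordered.append(current["id"])
    for dependent_id in adjacency[current["id"]]:
        in_degree[dependent_id] -= 1
        if in_degree[dependent_id] == 0:
            dep = feature_map[dependent_id]
            heapq.heappush(heap, (dep["priority"], dependent_id, dep))

print(ordered)  # [1, 3, 2, 4] - lower priority value is scheduled first
```

Pushing `(priority, id, feature)` tuples keeps pop order stable on priority ties and avoids ever comparing the dicts, which matches the intent of the `heapq` change in `dependency_resolver.py`.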