mirror of
https://github.com/leonvanzyl/autocoder.git
synced 2026-01-30 06:12:06 +00:00
improve performance
@@ -172,48 +172,12 @@ Use browser automation tools:
 - [ ] Loading states appeared during API calls
 - [ ] Error states handle failures gracefully

-### STEP 5.6: MOCK DATA DETECTION SWEEP
-
-**Run this sweep AFTER EVERY FEATURE before marking it as passing:**
-
-#### 1. Code Pattern Search
-
-Search the codebase for forbidden patterns:
-
-```bash
-# Search for mock data patterns
-grep -r "mockData\|fakeData\|sampleData\|dummyData\|testData" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx"
-grep -r "// TODO\|// FIXME\|// STUB\|// MOCK" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx"
-grep -r "hardcoded\|placeholder" --include="*.js" --include="*.ts" --include="*.jsx" --include="*.tsx"
-```
-
-**If ANY matches found related to your feature - FIX THEM before proceeding.**
-
-#### 2. Runtime Verification
-
-For ANY data displayed in UI:
-
-1. Create NEW data with UNIQUE content (e.g., "TEST_12345_DELETE_ME")
-2. Verify that EXACT content appears in the UI
-3. Delete the record
-4. Verify it's GONE from the UI
-5. **If you see data that wasn't created during testing - IT'S MOCK DATA. Fix it.**
-
-#### 3. Database Verification
-
-Check that:
-
-- Database tables contain only data you created during tests
-- Counts/statistics match actual database record counts
-- No seed data is masquerading as user data
-
-#### 4. API Response Verification
-
-For API endpoints used by this feature:
-
-- Call the endpoint directly
-- Verify response contains actual database data
-- Empty database = empty response (not pre-populated mock data)
+### STEP 5.6: MOCK DATA DETECTION (Before marking passing)
+
+1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"`
+2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
+3. **Check database:** All displayed data must come from real DB queries
+4. If unexplained data appears, it's mock data - fix before marking passing.

 ### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)
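The code-pattern sweep kept in the new step 1 can also be scripted instead of run as raw grep. A minimal sketch (patterns and extensions taken from the hunk above; the `sweep` helper itself is hypothetical, not part of the repo):

```python
import re
from pathlib import Path

# Forbidden patterns and extensions from the mock-data sweep above.
PATTERNS = re.compile(
    r"mockData|fakeData|sampleData|dummyData|testData|"
    r"// TODO|// FIXME|// STUB|// MOCK|hardcoded|placeholder"
)
EXTENSIONS = {".js", ".ts", ".jsx", ".tsx"}

def sweep(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every forbidden-pattern hit under root."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in EXTENSIONS:
            for n, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if PATTERNS.search(line):
                    hits.append((str(path), n, line.strip()))
    return hits
```

An empty return value is the "sweep passed" signal; any hit related to the feature under test must be fixed before the feature is marked passing.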
@@ -273,51 +237,11 @@ Before context fills up:
 ---

-## TESTING REQUIREMENTS
-
-**ALL testing must use browser automation tools.**
-
-Available tools:
-
-**Navigation & Screenshots:**
-
-- browser_navigate - Navigate to a URL
-- browser_navigate_back - Go back to previous page
-- browser_take_screenshot - Capture screenshot (use for visual verification)
-- browser_snapshot - Get accessibility tree snapshot (structured page data)
-
-**Element Interaction:**
-
-- browser_click - Click elements (has built-in auto-wait)
-- browser_type - Type text into editable elements
-- browser_fill_form - Fill multiple form fields at once
-- browser_select_option - Select dropdown options
-- browser_hover - Hover over elements
-- browser_drag - Drag and drop between elements
-- browser_press_key - Press keyboard keys
-
-**Debugging & Monitoring:**
-
-- browser_console_messages - Get browser console output (check for errors)
-- browser_network_requests - Monitor API calls and responses
-- browser_evaluate - Execute JavaScript (USE SPARINGLY - debugging only, NOT for bypassing UI)
-
-**Browser Management:**
-
-- browser_close - Close the browser
-- browser_resize - Resize browser window (use to test mobile: 375x667, tablet: 768x1024, desktop: 1280x720)
-- browser_tabs - Manage browser tabs
-- browser_wait_for - Wait for text/element/time
-- browser_handle_dialog - Handle alert/confirm dialogs
-- browser_file_upload - Upload files
-
-**Key Benefits:**
-
-- All interaction tools have **built-in auto-wait** - no manual timeouts needed
-- Use `browser_console_messages` to detect JavaScript errors
-- Use `browser_network_requests` to verify API calls succeed
-
-Test like a human user with mouse and keyboard. Don't take shortcuts by using JavaScript evaluation.
+## BROWSER AUTOMATION
+
+Use Playwright MCP tools (`browser_*`) for UI verification. Key tools: `navigate`, `click`, `type`, `fill_form`, `take_screenshot`, `console_messages`, `network_requests`. All tools have auto-wait built in.
+
+Test like a human user with mouse and keyboard. Use `browser_console_messages` to detect errors. Don't bypass UI with JavaScript evaluation.

 ---
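The intended tool flow can be sketched end to end. The tool names below come from the hunk above; the `call` harness is purely hypothetical (it just records invocations so the flow is runnable here - in practice these are MCP tool calls):

```python
# Hypothetical harness: records tool calls so the flow below can execute.
calls = []

def call(tool: str, **args):
    calls.append((tool, args))
    # Simulate a clean console; real MCP tools return richer payloads.
    return {"errors": []} if tool == "browser_console_messages" else {}

# Verify a login form like a human user: navigate, fill, click, then check console.
call("browser_navigate", url="http://localhost:3000/login")
call("browser_fill_form", fields={"email": "test@example.com", "password": "secret"})
call("browser_click", element="Login button")
messages = call("browser_console_messages")
assert messages["errors"] == [], "zero console errors required"
call("browser_take_screenshot", filename="login-result.png")
```

Note the flow never calls `browser_evaluate`: the point of the rewritten section is that every interaction goes through the UI, with the console check as the error gate.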
@@ -381,26 +305,7 @@ This allows you to fully test email-dependent flows without needing external ema
 ---

-## IMPORTANT REMINDERS
-
-**Your Goal:** Production-quality application with all tests passing
-
-**This Session's Goal:** Complete at least one feature perfectly
-
-**Priority:** Fix broken tests before implementing new features
-
-**Quality Bar:**
-
-- Zero console errors
-- Polished UI matching the design specified in app_spec.txt
-- All features work end-to-end through the UI
-- Fast, responsive, professional
-- **NO MOCK DATA - all data from real database**
-- **Security enforced - unauthorized access blocked**
-- **All navigation works - no 404s or broken links**
-
-**You have unlimited time.** Take as long as needed to get it right. The most important thing is that you
-leave the code base in a clean state before terminating the session (Step 9).
+**Remember:** One feature per session. Zero console errors. All data from real database. Leave codebase clean before ending session.

 ---
@@ -26,82 +26,11 @@ which is the single source of truth for what needs to be built.
 **Creating Features:**

-Use the feature_create_bulk tool to add all features at once. Note: You MUST include `depends_on_indices`
-to specify dependencies. Features with no dependencies can run first and enable parallel execution.
-
-```
-Use the feature_create_bulk tool with features=[
-  {
-    "category": "functional",
-    "name": "App loads without errors",
-    "description": "Application starts and renders homepage",
-    "steps": [
-      "Step 1: Navigate to homepage",
-      "Step 2: Verify no console errors",
-      "Step 3: Verify main content renders"
-    ]
-    // No depends_on_indices = FOUNDATION feature (runs first)
-  },
-  {
-    "category": "functional",
-    "name": "User can create an account",
-    "description": "Basic user registration functionality",
-    "steps": [
-      "Step 1: Navigate to registration page",
-      "Step 2: Fill in required fields",
-      "Step 3: Submit form and verify account created"
-    ],
-    "depends_on_indices": [0]  // Depends on app loading
-  },
-  {
-    "category": "functional",
-    "name": "User can log in",
-    "description": "Authentication with existing credentials",
-    "steps": [
-      "Step 1: Navigate to login page",
-      "Step 2: Enter credentials",
-      "Step 3: Verify successful login and redirect"
-    ],
-    "depends_on_indices": [0, 1]  // Depends on app loading AND registration
-  },
-  {
-    "category": "functional",
-    "name": "User can view dashboard",
-    "description": "Protected dashboard requires authentication",
-    "steps": [
-      "Step 1: Log in as user",
-      "Step 2: Navigate to dashboard",
-      "Step 3: Verify personalized content displays"
-    ],
-    "depends_on_indices": [2]  // Depends on login only
-  },
-  {
-    "category": "functional",
-    "name": "User can update profile",
-    "description": "User can modify their profile information",
-    "steps": [
-      "Step 1: Log in as user",
-      "Step 2: Navigate to profile settings",
-      "Step 3: Update and save profile"
-    ],
-    "depends_on_indices": [2]  // ALSO depends on login (WIDE GRAPH - can run parallel with dashboard!)
-  }
-]
-```
+Use the feature_create_bulk tool to add all features at once. You can create features in batches if there are many (e.g., 50 at a time).

 **Notes:**
 - IDs and priorities are assigned automatically based on order
 - All features start with `passes: false` by default
-- You can create features in batches if there are many (e.g., 50 at a time)
-- **CRITICAL:** Use `depends_on_indices` to specify dependencies (see FEATURE DEPENDENCIES section below)
-
-**DEPENDENCY REQUIREMENT:**
-You MUST specify dependencies using `depends_on_indices` for features that logically depend on others.
-- Features 0-9 should have NO dependencies (foundation/setup features)
-- Features 10+ MUST have at least some dependencies where logical
-- Create WIDE dependency graphs, not linear chains:
-  - BAD: A -> B -> C -> D -> E (linear chain, only 1 feature can run at a time)
-  - GOOD: A -> B, A -> C, A -> D, B -> E, C -> E (wide graph, multiple features can run in parallel)

 **Requirements for features:**
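The batching advice the new wording adds ("50 at a time") can be sketched mechanically. The `feature_create_bulk` stub below is a hypothetical stand-in for the real MCP tool; the point is that submitting batches in list order preserves the auto-assigned priorities:

```python
# Hypothetical stand-in for the feature_create_bulk MCP tool.
def feature_create_bulk(features):
    return {"created": len(features)}

def create_in_batches(features, batch_size=50):
    """Submit features in order, so auto-assigned priorities match list order."""
    created = 0
    for i in range(0, len(features), batch_size):
        created += feature_create_bulk(features[i:i + batch_size])["created"]
    return created
```

Because `depends_on_indices` values are positions in the overall list, batches must be sent strictly in order; reordering batches would silently point dependencies at the wrong features.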
@@ -114,7 +43,6 @@ You MUST specify dependencies using `depends_on_indices` for features that logic
 - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
 - At least 25 tests MUST have 10+ steps each (more for complex apps)
 - Order features by priority: fundamental features first (the API assigns priority based on order)
-- All features start with `passes: false` automatically
 - Cover every feature in the spec exhaustively
 - **MUST include tests from ALL 20 mandatory categories below**
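The requirements in the hunk above are easy to spot-check before submitting a feature list. A rough sketch, assuming each feature dict carries a hypothetical `test_category` key holding its mandatory-category letter (the real field name is not shown in this diff):

```python
def check_requirements(features):
    """Spot-check the prompt's feature-list requirements; return a list of problems."""
    comprehensive = [f for f in features if len(f.get("steps", [])) >= 10]
    problems = []
    if len(comprehensive) < 25:
        problems.append(f"only {len(comprehensive)} tests have 10+ steps (need 25)")
    # Mandatory categories A-T from the MANDATORY TEST CATEGORIES section.
    missing = set("ABCDEFGHIJKLMNOPQRST") - {f.get("test_category") for f in features}
    if missing:
        problems.append(f"missing mandatory categories: {sorted(missing)}")
    return problems
```

An empty result means the two machine-checkable requirements hold; exhaustive spec coverage still needs human review.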
@@ -122,125 +50,68 @@ You MUST specify dependencies using `depends_on_indices` for features that logic
 ## FEATURE DEPENDENCIES (MANDATORY)

-**THIS SECTION IS MANDATORY. You MUST specify dependencies for features.**
-
-Dependencies enable **parallel execution** of independent features. When you specify dependencies correctly, multiple agents can work on unrelated features simultaneously, dramatically speeding up development.
-
-**WARNING:** If you do not specify dependencies, ALL features will be ready immediately, which:
-1. Overwhelms the parallel agents trying to work on unrelated features
-2. Results in features being implemented in random order
-3. Causes logical issues (e.g., "Edit user" attempted before "Create user")
-
-You MUST analyze each feature and specify its dependencies using `depends_on_indices`.
-
-### Why Dependencies Matter
-
-1. **Parallel Execution**: Features without dependencies can run in parallel
-2. **Logical Ordering**: Ensures features are built in the right order
-3. **Blocking Prevention**: An agent won't start a feature until its dependencies pass
-
-### How to Determine Dependencies
-
-Ask yourself: "What MUST be working before this feature can be tested?"
-
-| Dependency Type | Example |
-|-----------------|---------|
-| **Data dependencies** | "Edit item" depends on "Create item" |
-| **Auth dependencies** | "View dashboard" depends on "User can log in" |
-| **Navigation dependencies** | "Modal close works" depends on "Modal opens" |
-| **UI dependencies** | "Filter results" depends on "Display results list" |
-| **API dependencies** | "Fetch user data" depends on "API authentication" |
-
-### Using `depends_on_indices`
-
-Since feature IDs aren't assigned until after creation, use **array indices** (0-based) to reference dependencies:
-
-```json
-{
-  "features": [
-    { "name": "Create account", ... },                      // Index 0
-    { "name": "Login", "depends_on_indices": [0] },         // Index 1, depends on 0
-    { "name": "View profile", "depends_on_indices": [1] },  // Index 2, depends on 1
-    { "name": "Edit profile", "depends_on_indices": [2] }   // Index 3, depends on 2
-  ]
-}
-```
-
-### Rules for Dependencies
-
-1. **Can only depend on EARLIER features**: Index must be less than current feature's position
-2. **No circular dependencies**: A cannot depend on B if B depends on A
-3. **Maximum 20 dependencies** per feature
-4. **Foundation features have NO dependencies**: First features in each category typically have none
-5. **Don't over-depend**: Only add dependencies that are truly required for testing
-
-### Best Practices
-
-1. **Start with foundation features** (index 0-10): Core setup, basic navigation, authentication
-2. **Group related features together**: Keep CRUD operations adjacent
-3. **Chain complex flows**: Registration -> Login -> Dashboard -> Settings
-4. **Keep dependencies shallow**: Prefer 1-2 dependencies over deep chains
-5. **Skip dependencies for independent features**: Visual tests often have no dependencies
-
-### Minimum Dependency Coverage
-
-**REQUIREMENT:** At least 60% of your features (after index 10) should have at least one dependency.
-
-Target structure for a 150-feature project:
-- Features 0-9: Foundation (0 dependencies) - App loads, basic setup
-- Features 10-149: At least 84 should have dependencies (60% of 140)
-
-This ensures:
-- A good mix of parallelizable features (foundation)
-- Logical ordering for dependent features
-
-### Example: Todo App Feature Chain (Wide Graph Pattern)
-
-This example shows the CORRECT wide graph pattern where multiple features share the same dependency,
-enabling parallel execution:
+Dependencies enable **parallel execution** of independent features. When specified correctly, multiple agents can work on unrelated features simultaneously, dramatically speeding up development.
+
+**Why this matters:** Without dependencies, features execute in random order, causing logical issues (e.g., "Edit user" before "Create user") and preventing efficient parallelization.
+
+### Dependency Rules
+
+1. **Use `depends_on_indices`** (0-based array indices) to reference dependencies
+2. **Can only depend on EARLIER features** (index must be less than current position)
+3. **No circular dependencies** allowed
+4. **Maximum 20 dependencies** per feature
+5. **Foundation features (index 0-9)** should have NO dependencies
+6. **60% of features after index 10** should have at least one dependency
+
+### Dependency Types
+
+| Type | Example |
+|------|---------|
+| Data | "Edit item" depends on "Create item" |
+| Auth | "View dashboard" depends on "User can log in" |
+| Navigation | "Modal close works" depends on "Modal opens" |
+| UI | "Filter results" depends on "Display results list" |
+
+### Wide Graph Pattern (REQUIRED)
+
+Create WIDE dependency graphs, not linear chains:
+- **BAD:** A -> B -> C -> D -> E (linear chain, only 1 feature runs at a time)
+- **GOOD:** A -> B, A -> C, A -> D, B -> E, C -> E (wide graph, parallel execution)
+
+### Complete Example

 ```json
 [
-  // FOUNDATION TIER (indices 0-2, no dependencies)
-  // These run first and enable everything else
+  // FOUNDATION TIER (indices 0-2, no dependencies) - run first
   { "name": "App loads without errors", "category": "functional" },
   { "name": "Navigation bar displays", "category": "style" },
   { "name": "Homepage renders correctly", "category": "functional" },

-  // AUTH TIER (indices 3-5, depend on foundation)
-  // These can all run in parallel once foundation passes
+  // AUTH TIER (indices 3-5, depend on foundation) - run in parallel
   { "name": "User can register", "depends_on_indices": [0] },
   { "name": "User can login", "depends_on_indices": [0, 3] },
   { "name": "User can logout", "depends_on_indices": [4] },

-  // CORE CRUD TIER (indices 6-9, depend on auth)
-  // WIDE GRAPH: All 4 of these depend on login (index 4)
-  // This means all 4 can start as soon as login passes!
+  // CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login
+  // All 4 start as soon as login passes!
   { "name": "User can create todo", "depends_on_indices": [4] },
   { "name": "User can view todos", "depends_on_indices": [4] },
   { "name": "User can edit todo", "depends_on_indices": [4, 6] },
   { "name": "User can delete todo", "depends_on_indices": [4, 6] },

-  // ADVANCED TIER (indices 10-11, depend on CRUD)
-  // Note: filter and search both depend on view (7), not on each other
+  // ADVANCED TIER (indices 10-11) - both depend on view, not each other
   { "name": "User can filter todos", "depends_on_indices": [7] },
   { "name": "User can search todos", "depends_on_indices": [7] }
 ]
 ```

-**Parallelism analysis of this example:**
-- Foundation tier: 3 features can run in parallel
-- Auth tier: 3 features wait for foundation, then can run (mostly parallel)
-- CRUD tier: 4 features can start once login passes (all 4 in parallel!)
-- Advanced tier: 2 features can run once view passes (both in parallel)
-
 **Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.

 ---

 ## MANDATORY TEST CATEGORIES

-The feature_list.json **MUST** include tests from ALL of these categories. The minimum counts scale by complexity tier.
+The feature_list.json **MUST** include tests from ALL 20 categories. Minimum counts scale by complexity tier.

 ### Category Distribution by Complexity Tier
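The "~5-6 cycles" claim for the todo example can be checked mechanically. A sketch that tiers the example's indices into execution waves (dependency lists transcribed from the JSON above; the `waves` helper is illustrative, not part of the tooling):

```python
# The 12-feature example, reduced to its dependency lists (index -> deps).
deps = {0: [], 1: [], 2: [], 3: [0], 4: [0, 3], 5: [4],
        6: [4], 7: [4], 8: [4, 6], 9: [4, 6], 10: [7], 11: [7]}

def waves(deps):
    """Group features into waves: each runs one wave after its latest dependency."""
    wave = {}
    for i in sorted(deps):  # valid lists only reference earlier indices, so one pass works
        wave[i] = max((wave[d] + 1 for d in deps[i]), default=0)
    tiers = {}
    for i, w in wave.items():
        tiers.setdefault(w, []).append(i)
    return [tiers[w] for w in sorted(tiers)]

tiers = waves(deps)  # [[0, 1, 2], [3], [4], [5, 6, 7], [8, 9, 10, 11]]
```

Five waves with sizes 3, 1, 1, 3, 4: with 3 parallel agents that is 1+1+1+1+2 = 6 cycles, matching the stated ~5-6; a linear chain of the same 12 features would need 12.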
@@ -270,331 +141,47 @@ The feature_list.json **MUST** include tests from ALL of these categories. The m
 ---

-### A. Security & Access Control Tests
-
-Test that unauthorized access is blocked and permissions are enforced.
-
-**Required tests (examples):**
-
-- Unauthenticated user cannot access protected routes (redirect to login)
+### Category Descriptions
+
+**A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.
+
+**B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.
+
+**C. Real Data Verification** - Test data persistence across refreshes and sessions, CRUD operations with unique test data, related record updates, and empty states.
-- Regular user cannot access admin-only pages (403 or redirect)
-- API endpoints return 401 for unauthenticated requests
-- API endpoints return 403 for unauthorized role access
-- Session expires after configured inactivity period
-- Logout clears all session data and tokens
-- Invalid/expired tokens are rejected
-- Each role can ONLY see their permitted menu items
-- Direct URL access to unauthorized pages is blocked
-- Sensitive operations require confirmation or re-authentication
-- Cannot access another user's data by manipulating IDs in URL
-- Password reset flow works securely
-- Failed login attempts are handled (no information leakage)
-
-### B. Navigation Integrity Tests
-
-Test that every button, link, and menu item goes to the correct place.
-
-**Required tests (examples):**
-
-- Every button in sidebar navigates to correct page
+**D. Workflow Completeness** - Test end-to-end CRUD for every entity, state transitions, multi-step wizards, bulk operations, and form submission feedback.
+
+**E. Error Handling** - Test network failures, invalid input, API errors, 404/500 responses, loading states, timeouts, and user-friendly error messages.
+
+**F. UI-Backend Integration** - Test request/response format matching, database-driven dropdowns, cascading updates, filters/sorts with real data, and API error display.
+
+**G. State & Persistence** - Test refresh mid-form, session recovery, multi-tab behavior, back-button after submit, and unsaved changes warnings.
-- Every menu item links to existing route
-- All CRUD action buttons (Edit, Delete, View) go to correct URLs with correct IDs
-- Back button works correctly after each navigation
-- Deep linking works (direct URL access to any page with auth)
-- Breadcrumbs reflect actual navigation path
-- 404 page shown for non-existent routes (not crash)
-- After login, user redirected to intended destination (or dashboard)
-- After logout, user redirected to login page
-- Pagination links work and preserve current filters
-- Tab navigation within pages works correctly
-- Modal close buttons return to previous state
-- Cancel buttons on forms return to previous page
-
-### C. Real Data Verification Tests
-
-Test that data is real (not mocked) and persists correctly.
-
-**Required tests (examples):**
-
-- Create a record via UI with unique content → verify it appears in list
+**H. URL & Direct Access** - Test URL manipulation security, direct route access by role, malformed parameters, deep links to deleted entities, and shareable filter URLs.
+
+**I. Double-Action & Idempotency** - Test double-click submit, rapid delete clicks, back-and-resubmit, button disabled during processing, and concurrent submissions.
+
+**J. Data Cleanup & Cascade** - Test parent deletion effects on children, removal from search/lists/dropdowns, statistics updates, and soft vs hard delete behavior.
+
+**K. Default & Reset** - Test form defaults, sensible date picker defaults, dropdown placeholders, reset button behavior, and filter/pagination reset on context change.
-- Create a record → refresh page → record still exists
-- Create a record → log out → log in → record still exists
-- Edit a record → verify changes persist after refresh
-- Delete a record → verify it's gone from list AND database
-- Delete a record → verify it's gone from related dropdowns
-- Filter/search → results match actual data created in test
-- Dashboard statistics reflect real record counts (create 3 items, count shows 3)
-- Reports show real aggregated data
-- Export functionality exports actual data you created
-- Related records update when parent changes
-- Timestamps are real and accurate (created_at, updated_at)
-- Data created by User A is not visible to User B (unless shared)
-- Empty state shows correctly when no data exists
-
-### D. Workflow Completeness Tests
-
-Test that every workflow can be completed end-to-end through the UI.
-
-**Required tests (examples):**
-
-- Every entity has working Create operation via UI form
+**L. Search & Filter Edge Cases** - Test empty search, whitespace-only, special characters, quotes, long strings, zero-result combinations, and filter persistence.
+
+**M. Form Validation** - Test required fields, email/password/numeric/date formats, min/max constraints, uniqueness, specific error messages, and server-side validation.
+
+**N. Feedback & Notification** - Test success/error feedback for all actions, loading spinners, disabled buttons during submit, progress indicators, and toast behavior.
+
+**O. Responsive & Layout** - Test layouts at desktop (1920px), tablet (768px), and mobile (375px), no horizontal scroll, touch targets, modal fit, and text overflow.
-- Every entity has working Read/View operation (detail page loads)
-- Every entity has working Update operation (edit form saves)
-- Every entity has working Delete operation (with confirmation dialog)
-- Every status/state has a UI mechanism to transition to next state
-- Multi-step processes (wizards) can be completed end-to-end
-- Bulk operations (select all, delete selected) work
-- Cancel/Undo operations work where applicable
-- Required fields prevent submission when empty
-- Form validation shows errors before submission
-- Successful submission shows success feedback
-- Backend workflow (e.g., user→customer conversion) has UI trigger
-
-### E. Error Handling Tests
-
-Test graceful handling of errors and edge cases.
-
-**Required tests (examples):**
-
-- Network failure shows user-friendly error message, not crash
+**P. Accessibility** - Test tab navigation, focus rings, screen reader compatibility, ARIA labels, color contrast, labels on form fields, and error announcements.
+
+**Q. Temporal & Timezone** - Test timezone-aware display, accurate timestamps, date picker constraints, overdue detection, and date sorting across boundaries.
+
+**R. Concurrency & Race Conditions** - Test concurrent edits, viewing deleted records, pagination during updates, rapid navigation, and late API response handling.
+
+**S. Export/Import** - Test full/filtered export, import with valid/duplicate/malformed files, and round-trip data integrity.
-- Invalid form input shows field-level errors
-- API errors display meaningful messages to user
-- 404 responses handled gracefully (show not found page)
-- 500 responses don't expose stack traces or technical details
-- Empty search results show "no results found" message
-- Loading states shown during all async operations
-- Timeout doesn't hang the UI indefinitely
-- Submitting form with server error keeps user data in form
-- File upload errors (too large, wrong type) show clear message
-- Duplicate entry errors (e.g., email already exists) are clear
-
-### F. UI-Backend Integration Tests
+**T. Performance** - Test page load with 100/1000 records, search response time, infinite scroll stability, upload progress, and memory/console errors.
-Test that frontend and backend communicate correctly.
-
-**Required tests (examples):**
-
-- Frontend request format matches what backend expects
-- Backend response format matches what frontend parses
-- All dropdown options come from real database data (not hardcoded)
-- Related entity selectors (e.g., "choose category") populated from DB
-- Changes in one area reflect in related areas after refresh
-- Deleting parent handles children correctly (cascade or block)
-- Filters work with actual data attributes from database
-- Sort functionality sorts real data correctly
-- Pagination returns correct page of real data
-- API error responses are parsed and displayed correctly
-- Loading spinners appear during API calls
-- Optimistic updates (if used) rollback on failure
-
-### G. State & Persistence Tests
-
-Test that state is maintained correctly across sessions and tabs.
-
-**Required tests (examples):**
-
-- Refresh page mid-form - appropriate behavior (data kept or cleared)
-- Close browser, reopen - session state handled correctly
-- Same user in two browser tabs - changes sync or handled gracefully
-- Browser back after form submit - no duplicate submission
-- Bookmark a page, return later - works (with auth check)
-- LocalStorage/cookies cleared - graceful re-authentication
-- Unsaved changes warning when navigating away from dirty form
-
-### H. URL & Direct Access Tests
-
-Test direct URL access and URL manipulation security.
-
-**Required tests (examples):**
-
-- Change entity ID in URL - cannot access others' data
-- Access /admin directly as regular user - blocked
-- Malformed URL parameters - handled gracefully (no crash)
-- Very long URL - handled correctly
-- URL with SQL injection attempt - rejected/sanitized
-- Deep link to deleted entity - shows "not found", not crash
-- Query parameters for filters are reflected in UI
-- Sharing a URL with filters preserves those filters
-
-### I. Double-Action & Idempotency Tests
-
-Test that rapid or duplicate actions don't cause issues.
-
-**Required tests (examples):**
-
-- Double-click submit button - only one record created
-- Rapid multiple clicks on delete - only one deletion occurs
-- Submit form, hit back, submit again - appropriate behavior
-- Multiple simultaneous API calls - server handles correctly
-- Refresh during save operation - data not corrupted
-- Click same navigation link twice quickly - no issues
-- Submit button disabled during processing
-
-### J. Data Cleanup & Cascade Tests
-
-Test that deleting data cleans up properly everywhere.
-
-**Required tests (examples):**
-
-- Delete parent entity - children removed from all views
-- Delete item - removed from search results immediately
-- Delete item - statistics/counts updated immediately
-- Delete item - related dropdowns updated
-- Delete item - cached views refreshed
-- Soft delete (if applicable) - item hidden but recoverable
-- Hard delete - item completely removed from database
-
-### K. Default & Reset Tests
-
-Test that defaults and reset functionality work correctly.
-
-**Required tests (examples):**
-
-- New form shows correct default values
|
|
||||||
- Date pickers default to sensible dates (today, not 1970)
|
|
||||||
- Dropdowns default to correct option (or placeholder)
|
|
||||||
- Reset button clears to defaults, not just empty
|
|
||||||
- Clear filters button resets all filters to default
|
|
||||||
- Pagination resets to page 1 when filters change
|
|
||||||
- Sorting resets when changing views
|
|
||||||
|
|
||||||
### L. Search & Filter Edge Cases
|
|
||||||
|
|
||||||
Test search and filter functionality thoroughly.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Empty search shows all results (or appropriate message)
|
|
||||||
- Search with only spaces - handled correctly
|
|
||||||
- Search with special characters (!@#$%^&\*) - no errors
|
|
||||||
- Search with quotes - handled correctly
|
|
||||||
- Search with very long string - handled correctly
|
|
||||||
- Filter combinations that return zero results - shows message
|
|
||||||
- Filter + search + sort together - all work correctly
|
|
||||||
- Filter persists after viewing detail and returning to list
|
|
||||||
- Clear individual filter - works correctly
|
|
||||||
- Search is case-insensitive (or clearly case-sensitive)
|
|
||||||
|
|
||||||
### M. Form Validation Tests
|
|
||||||
|
|
||||||
Test all form validation rules exhaustively.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Required field empty - shows error, blocks submit
|
|
||||||
- Email field with invalid email formats - shows error
|
|
||||||
- Password field - enforces complexity requirements
|
|
||||||
- Numeric field with letters - rejected
|
|
||||||
- Date field with invalid date - rejected
|
|
||||||
- Min/max length enforced on text fields
|
|
||||||
- Min/max values enforced on numeric fields
|
|
||||||
- Duplicate unique values rejected (e.g., duplicate email)
|
|
||||||
- Error messages are specific (not just "invalid")
|
|
||||||
- Errors clear when user fixes the issue
|
|
||||||
- Server-side validation matches client-side
|
|
||||||
- Whitespace-only input rejected for required fields
|
|
||||||
|
|
||||||
### N. Feedback & Notification Tests
|
|
||||||
|
|
||||||
Test that users get appropriate feedback for all actions.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Every successful save/create shows success feedback
|
|
||||||
- Every failed action shows error feedback
|
|
||||||
- Loading spinner during every async operation
|
|
||||||
- Disabled state on buttons during form submission
|
|
||||||
- Progress indicator for long operations (file upload)
|
|
||||||
- Toast/notification disappears after appropriate time
|
|
||||||
- Multiple notifications don't overlap incorrectly
|
|
||||||
- Success messages are specific (not just "Success")
|
|
||||||
|
|
||||||
### O. Responsive & Layout Tests
|
|
||||||
|
|
||||||
Test that the UI works on different screen sizes.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Desktop layout correct at 1920px width
|
|
||||||
- Tablet layout correct at 768px width
|
|
||||||
- Mobile layout correct at 375px width
|
|
||||||
- No horizontal scroll on any standard viewport
|
|
||||||
- Touch targets large enough on mobile (44px min)
|
|
||||||
- Modals fit within viewport on mobile
|
|
||||||
- Long text truncates or wraps correctly (no overflow)
|
|
||||||
- Tables scroll horizontally if needed on mobile
|
|
||||||
- Navigation collapses appropriately on mobile
|
|
||||||
|
|
||||||
### P. Accessibility Tests
|
|
||||||
|
|
||||||
Test basic accessibility compliance.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Tab navigation works through all interactive elements
|
|
||||||
- Focus ring visible on all focused elements
|
|
||||||
- Screen reader can navigate main content areas
|
|
||||||
- ARIA labels on icon-only buttons
|
|
||||||
- Color contrast meets WCAG AA (4.5:1 for text)
|
|
||||||
- No information conveyed by color alone
|
|
||||||
- Form fields have associated labels
|
|
||||||
- Error messages announced to screen readers
|
|
||||||
- Skip link to main content (if applicable)
|
|
||||||
- Images have alt text
|
|
||||||
|
|
||||||
### Q. Temporal & Timezone Tests
|
|
||||||
|
|
||||||
Test date/time handling.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Dates display in user's local timezone
|
|
||||||
- Created/updated timestamps accurate and formatted correctly
|
|
||||||
- Date picker allows only valid date ranges
|
|
||||||
- Overdue items identified correctly (timezone-aware)
|
|
||||||
- "Today", "This Week" filters work correctly for user's timezone
|
|
||||||
- Recurring items generate at correct times (if applicable)
|
|
||||||
- Date sorting works correctly across months/years
|
|
||||||
|
|
||||||
### R. Concurrency & Race Condition Tests
|
|
||||||
|
|
||||||
Test multi-user and race condition scenarios.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Two users edit same record - last save wins or conflict shown
|
|
||||||
- Record deleted while another user viewing - graceful handling
|
|
||||||
- List updates while user on page 2 - pagination still works
|
|
||||||
- Rapid navigation between pages - no stale data displayed
|
|
||||||
- API response arrives after user navigated away - no crash
|
|
||||||
- Concurrent form submissions from same user handled
|
|
||||||
|
|
||||||
### S. Export/Import Tests (if applicable)
|
|
||||||
|
|
||||||
Test data export and import functionality.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Export all data - file contains all records
|
|
||||||
- Export filtered data - only filtered records included
|
|
||||||
- Import valid file - all records created correctly
|
|
||||||
- Import duplicate data - handled correctly (skip/update/error)
|
|
||||||
- Import malformed file - error message, no partial import
|
|
||||||
- Export then import - data integrity preserved exactly
|
|
||||||
|
|
||||||
### T. Performance Tests
|
|
||||||
|
|
||||||
Test basic performance requirements.
|
|
||||||
|
|
||||||
**Required tests (examples):**
|
|
||||||
|
|
||||||
- Page loads in <3s with 100 records
|
|
||||||
- Page loads in <5s with 1000 records
|
|
||||||
- Search responds in <1s
|
|
||||||
- Infinite scroll doesn't degrade with many items
|
|
||||||
- Large file upload shows progress
|
|
||||||
- Memory doesn't leak on long sessions
|
|
||||||
- No console errors during normal operation
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -21,6 +21,7 @@ from sqlalchemy import (
     Column,
     DateTime,
     ForeignKey,
+    Index,
     Integer,
     String,
     Text,
@@ -39,6 +40,12 @@ class Feature(Base):

     __tablename__ = "features"

+    # Composite index for common status query pattern (passes, in_progress)
+    # Used by feature_get_stats, get_ready_features, and other status queries
+    __table_args__ = (
+        Index('ix_feature_status', 'passes', 'in_progress'),
+    )
+
     id = Column(Integer, primary_key=True, index=True)
     priority = Column(Integer, nullable=False, default=999, index=True)
     category = Column(String(100), nullable=False)
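The composite `ix_feature_status` index added above targets queries that filter on `passes` and `in_progress` together. A minimal sketch of the equivalent DDL and a query it serves, using stdlib sqlite3 instead of the project's SQLAlchemy models (the table shape and sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE features (
        id INTEGER PRIMARY KEY,
        passes INTEGER NOT NULL DEFAULT 0,
        in_progress INTEGER NOT NULL DEFAULT 0
    )
""")
# Equivalent of Index('ix_feature_status', 'passes', 'in_progress')
conn.execute("CREATE INDEX ix_feature_status ON features (passes, in_progress)")
conn.executemany(
    "INSERT INTO features (passes, in_progress) VALUES (?, ?)",
    [(1, 0), (0, 1), (0, 0), (1, 0)],
)

# Status queries like the ones in feature_get_stats can typically be answered
# from the index alone (a covering index), without reading table rows.
count = conn.execute(
    "SELECT COUNT(*) FROM features WHERE passes = 0 AND in_progress = 1"
).fetchone()[0]
print(count)  # → 1
```

Because the index leads with `passes`, it also serves queries that filter on `passes` alone; a query filtering only on `in_progress` would still need its own index.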
@@ -6,6 +6,7 @@ Provides dependency resolution using Kahn's algorithm for topological sorting.
 Includes cycle detection, validation, and helper functions for dependency management.
 """

+import heapq
 from typing import TypedDict

 # Security: Prevent DoS via excessive dependencies
@@ -55,19 +56,27 @@ def resolve_dependencies(features: list[dict]) -> DependencyResult:
             if not dep.get("passes"):
                 blocked.setdefault(feature["id"], []).append(dep_id)

-    # Kahn's algorithm with priority-aware selection
-    queue = [f for f in features if in_degree[f["id"]] == 0]
-    queue.sort(key=lambda f: (f.get("priority", 999), f["id"]))
+    # Kahn's algorithm with priority-aware selection using a heap
+    # Heap entries are tuples: (priority, id, feature_dict) for stable ordering
+    heap = [
+        (f.get("priority", 999), f["id"], f)
+        for f in features
+        if in_degree[f["id"]] == 0
+    ]
+    heapq.heapify(heap)
     ordered: list[dict] = []

-    while queue:
-        current = queue.pop(0)
+    while heap:
+        _, _, current = heapq.heappop(heap)
         ordered.append(current)
         for dependent_id in adjacency[current["id"]]:
             in_degree[dependent_id] -= 1
             if in_degree[dependent_id] == 0:
-                queue.append(feature_map[dependent_id])
-                queue.sort(key=lambda f: (f.get("priority", 999), f["id"]))
+                dep_feature = feature_map[dependent_id]
+                heapq.heappush(
+                    heap,
+                    (dep_feature.get("priority", 999), dependent_id, dep_feature)
+                )

     # Detect cycles (features not in ordered = part of cycle)
     cycles: list[list[int]] = []
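The hunk above replaces a list that is re-sorted after every insertion with a min-heap, so each "pick the next ready feature" step drops from O(n log n) to O(log n). A self-contained sketch of the same pattern (the feature dict fields mirror the diff; the sample data is invented):

```python
import heapq
from collections import defaultdict

def topo_order(features: list[dict]) -> list[dict]:
    """Priority-aware Kahn's algorithm: always emit the ready feature
    with the smallest (priority, id) tuple."""
    feature_map = {f["id"]: f for f in features}
    in_degree = {f["id"]: 0 for f in features}
    adjacency = defaultdict(list)
    for f in features:
        for dep_id in f.get("dependencies", []):
            adjacency[dep_id].append(f["id"])
            in_degree[f["id"]] += 1

    # Heap entries are (priority, id, feature) so ties break on id and the
    # dicts themselves are never compared
    heap = [(f.get("priority", 999), f["id"], f)
            for f in features if in_degree[f["id"]] == 0]
    heapq.heapify(heap)

    ordered = []
    while heap:
        _, _, current = heapq.heappop(heap)
        ordered.append(current)
        for dep_id in adjacency[current["id"]]:
            in_degree[dep_id] -= 1
            if in_degree[dep_id] == 0:
                dep = feature_map[dep_id]
                heapq.heappush(heap, (dep.get("priority", 999), dep_id, dep))
    return ordered

features = [
    {"id": 1, "priority": 2, "dependencies": []},
    {"id": 2, "priority": 1, "dependencies": []},
    {"id": 3, "priority": 1, "dependencies": [1]},
]
print([f["id"] for f in topo_order(features)])  # → [2, 1, 3]
```

Note the id in the tuple: it both stabilizes ordering and prevents Python from ever comparing two feature dicts when priorities tie.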
@@ -84,12 +93,19 @@ def resolve_dependencies(features: list[dict]) -> DependencyResult:
     }


-def are_dependencies_satisfied(feature: dict, all_features: list[dict]) -> bool:
+def are_dependencies_satisfied(
+    feature: dict,
+    all_features: list[dict],
+    passing_ids: set[int] | None = None,
+) -> bool:
     """Check if all dependencies have passes=True.

     Args:
         feature: Feature dict to check
         all_features: List of all feature dicts
+        passing_ids: Optional pre-computed set of passing feature IDs.
+            If None, will be computed from all_features. Pass this when
+            calling in a loop to avoid O(n^2) complexity.

     Returns:
         True if all dependencies are satisfied (or no dependencies)
@@ -97,22 +113,31 @@ def are_dependencies_satisfied(feature: dict, all_features: list[dict]) -> bool:
     deps = feature.get("dependencies") or []
     if not deps:
         return True
-    passing_ids = {f["id"] for f in all_features if f.get("passes")}
+    if passing_ids is None:
+        passing_ids = {f["id"] for f in all_features if f.get("passes")}
     return all(dep_id in passing_ids for dep_id in deps)


-def get_blocking_dependencies(feature: dict, all_features: list[dict]) -> list[int]:
+def get_blocking_dependencies(
+    feature: dict,
+    all_features: list[dict],
+    passing_ids: set[int] | None = None,
+) -> list[int]:
     """Get list of incomplete dependency IDs.

     Args:
         feature: Feature dict to check
         all_features: List of all feature dicts
+        passing_ids: Optional pre-computed set of passing feature IDs.
+            If None, will be computed from all_features. Pass this when
+            calling in a loop to avoid O(n^2) complexity.

     Returns:
         List of feature IDs that are blocking this feature
     """
     deps = feature.get("dependencies") or []
-    passing_ids = {f["id"] for f in all_features if f.get("passes")}
+    if passing_ids is None:
+        passing_ids = {f["id"] for f in all_features if f.get("passes")}
     return [dep_id for dep_id in deps if dep_id not in passing_ids]
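The optional `passing_ids` parameter exists so callers can hoist the set construction out of a loop: building the set once is O(n), while rebuilding it inside a check over every feature is O(n²). A rough sketch of the intended call pattern (the function body is a simplified stand-in for the one in the diff):

```python
def are_dependencies_satisfied(feature, all_features, passing_ids=None):
    deps = feature.get("dependencies") or []
    if not deps:
        return True
    if passing_ids is None:  # fallback keeps the old one-shot behavior
        passing_ids = {f["id"] for f in all_features if f.get("passes")}
    return all(dep_id in passing_ids for dep_id in deps)

all_features = [
    {"id": 1, "passes": True, "dependencies": []},
    {"id": 2, "passes": False, "dependencies": [1]},
    {"id": 3, "passes": False, "dependencies": [1, 2]},
]

# Compute the passing set once, then reuse it for every check in the loop
passing = {f["id"] for f in all_features if f.get("passes")}
ready = [f["id"] for f in all_features
         if not f["passes"] and are_dependencies_satisfied(f, all_features, passing)]
print(ready)  # → [2]
```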
client.py
@@ -12,7 +12,7 @@ import sys
 from pathlib import Path

 from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
-from claude_agent_sdk.types import HookMatcher
+from claude_agent_sdk.types import HookContext, HookInput, HookMatcher, SyncHookJSONOutput
 from dotenv import load_dotenv

 from security import bash_security_hook
@@ -55,7 +55,9 @@ FEATURE_MCP_TOOLS = [
     # Core feature operations
     "mcp__features__feature_get_stats",
     "mcp__features__feature_get_by_id",  # Get assigned feature details
+    "mcp__features__feature_get_summary",  # Lightweight: id, name, status, deps only
     "mcp__features__feature_mark_in_progress",
+    "mcp__features__feature_claim_and_get",  # Atomic claim + get details
     "mcp__features__feature_mark_passing",
     "mcp__features__feature_mark_failing",  # Mark regression detected
     "mcp__features__feature_skip",
@@ -268,6 +270,45 @@ def create_client(
         context["project_dir"] = str(project_dir.resolve())
         return await bash_security_hook(input_data, tool_use_id, context)

+    # PreCompact hook for logging and customizing context compaction
+    # Compaction is handled automatically by Claude Code CLI when context approaches limits.
+    # This hook allows us to log when compaction occurs and optionally provide custom instructions.
+    async def pre_compact_hook(
+        input_data: HookInput,
+        tool_use_id: str | None,
+        context: HookContext,
+    ) -> SyncHookJSONOutput:
+        """
+        Hook called before context compaction occurs.
+
+        Compaction triggers:
+        - "auto": Automatic compaction when context approaches token limits
+        - "manual": User-initiated compaction via /compact command
+
+        The hook can customize compaction via hookSpecificOutput:
+        - customInstructions: String with focus areas for summarization
+        """
+        trigger = input_data.get("trigger", "auto")
+        custom_instructions = input_data.get("custom_instructions")
+
+        if trigger == "auto":
+            print("[Context] Auto-compaction triggered (context approaching limit)")
+        else:
+            print("[Context] Manual compaction requested")
+
+        if custom_instructions:
+            print(f"[Context] Custom instructions: {custom_instructions}")
+
+        # Return empty dict to allow compaction to proceed with default behavior
+        # To customize, return:
+        # {
+        #     "hookSpecificOutput": {
+        #         "hookEventName": "PreCompact",
+        #         "customInstructions": "Focus on preserving file paths and test results"
+        #     }
+        # }
+        return SyncHookJSONOutput()
+
     return ClaudeSDKClient(
         options=ClaudeAgentOptions(
             model=model,
@@ -281,10 +322,35 @@ def create_client(
                 "PreToolUse": [
                     HookMatcher(matcher="Bash", hooks=[bash_hook_with_context]),
                 ],
+                # PreCompact hook for context management during long sessions.
+                # Compaction is automatic when context approaches token limits.
+                # This hook logs compaction events and can customize summarization.
+                "PreCompact": [
+                    HookMatcher(hooks=[pre_compact_hook]),
+                ],
             },
             max_turns=1000,
             cwd=str(project_dir.resolve()),
             settings=str(settings_file.resolve()),  # Use absolute path
             env=sdk_env,  # Pass API configuration overrides to CLI subprocess
+            # Enable extended context beta for better handling of long sessions.
+            # This provides up to 1M tokens of context with automatic compaction.
+            # See: https://docs.anthropic.com/en/api/beta-headers
+            betas=["context-1m-2025-08-07"],
+            # Note on context management:
+            # The Claude Agent SDK handles context management automatically through the
+            # underlying Claude Code CLI. When context approaches limits, the CLI
+            # automatically compacts/summarizes previous messages.
+            #
+            # The SDK does NOT expose explicit compaction_control or context_management
+            # parameters. Instead, context is managed via:
+            # 1. betas=["context-1m-2025-08-07"] - Extended context window
+            # 2. PreCompact hook - Intercept and customize compaction behavior
+            # 3. max_turns - Limit conversation turns (set to 1000 for long sessions)
+            #
+            # Future SDK versions may add explicit compaction controls. When available,
+            # consider adding:
+            # - compaction_control={"enabled": True, "context_token_threshold": 80000}
+            # - context_management={"edits": [...]} for tool use clearing
         )
     )
@@ -8,10 +8,12 @@ Provides tools to manage features in the autonomous coding system.
 Tools:
 - feature_get_stats: Get progress statistics
 - feature_get_by_id: Get a specific feature by ID
+- feature_get_summary: Get minimal feature info (id, name, status, deps)
 - feature_mark_passing: Mark a feature as passing
 - feature_mark_failing: Mark a feature as failing (regression detected)
 - feature_skip: Skip a feature (move to end of queue)
 - feature_mark_in_progress: Mark a feature as in-progress
+- feature_claim_and_get: Atomically claim and get feature details
 - feature_clear_in_progress: Clear in-progress status
 - feature_release_testing: Release testing lock on a feature
 - feature_create_bulk: Create multiple features at once
@@ -19,7 +21,7 @@ Tools:
 - feature_add_dependency: Add a dependency between features
 - feature_remove_dependency: Remove a dependency
 - feature_get_ready: Get features ready to implement
-- feature_get_blocked: Get features blocked by dependencies
+- feature_get_blocked: Get features blocked by dependencies (with limit)
 - feature_get_graph: Get the dependency graph

 Note: Feature selection (which feature to work on) is handled by the
@@ -142,11 +144,20 @@ def feature_get_stats() -> str:
     Returns:
         JSON with: passing (int), in_progress (int), total (int), percentage (float)
     """
+    from sqlalchemy import case, func
+
     session = get_session()
     try:
-        total = session.query(Feature).count()
-        passing = session.query(Feature).filter(Feature.passes == True).count()
-        in_progress = session.query(Feature).filter(Feature.in_progress == True).count()
+        # Single aggregate query instead of 3 separate COUNT queries
+        result = session.query(
+            func.count(Feature.id).label('total'),
+            func.sum(case((Feature.passes == True, 1), else_=0)).label('passing'),
+            func.sum(case((Feature.in_progress == True, 1), else_=0)).label('in_progress')
+        ).first()
+
+        total = result.total or 0
+        passing = int(result.passing or 0)
+        in_progress = int(result.in_progress or 0)
         percentage = round((passing / total) * 100, 1) if total > 0 else 0.0

         return json.dumps({
@@ -154,7 +165,7 @@ def feature_get_stats() -> str:
             "in_progress": in_progress,
             "total": total,
             "percentage": percentage
-        }, indent=2)
+        })
     finally:
         session.close()

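The rewritten `feature_get_stats` collapses three COUNT round-trips into one table scan. The SQL that SQLAlchemy's `func.sum(case(...))` construct effectively emits can be sketched with stdlib sqlite3 (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (id INTEGER PRIMARY KEY, passes INTEGER, in_progress INTEGER)"
)
conn.executemany(
    "INSERT INTO features (passes, in_progress) VALUES (?, ?)",
    [(1, 0), (1, 0), (0, 1), (0, 0)],
)

# One scan computes all three statistics instead of three separate COUNTs
total, passing, in_progress = conn.execute("""
    SELECT COUNT(id),
           SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END)
    FROM features
""").fetchone()

percentage = round((passing / total) * 100, 1) if total > 0 else 0.0
print(total, passing, in_progress, percentage)  # → 4 2 1 50.0
```

The `or 0` / `int(...)` coercions in the diff matter because `SUM` over an empty table returns NULL (Python `None`), not 0.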
@@ -181,7 +192,38 @@ def feature_get_by_id(
|
|||||||
if feature is None:
|
if feature is None:
|
||||||
return json.dumps({"error": f"Feature with ID {feature_id} not found"})
|
return json.dumps({"error": f"Feature with ID {feature_id} not found"})
|
||||||
|
|
||||||
return json.dumps(feature.to_dict(), indent=2)
|
return json.dumps(feature.to_dict())
|
||||||
|
finally:
|
||||||
|
session.close()
|
||||||
|
|
||||||
|
|
||||||
|
@mcp.tool()
|
||||||
|
def feature_get_summary(
|
||||||
|
feature_id: Annotated[int, Field(description="The ID of the feature", ge=1)]
|
||||||
|
) -> str:
|
||||||
|
"""Get minimal feature info: id, name, status, and dependencies only.
|
||||||
|
|
||||||
|
Use this instead of feature_get_by_id when you only need status info,
|
||||||
|
not the full description and steps. This reduces response size significantly.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
feature_id: The ID of the feature to retrieve
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
JSON with: id, name, passes, in_progress, dependencies
|
||||||
|
"""
|
||||||
|
session = get_session()
|
||||||
|
try:
|
||||||
|
feature = session.query(Feature).filter(Feature.id == feature_id).first()
|
||||||
|
if feature is None:
|
||||||
|
return json.dumps({"error": f"Feature with ID {feature_id} not found"})
|
||||||
|
return json.dumps({
|
||||||
|
"id": feature.id,
|
||||||
|
"name": feature.name,
|
||||||
|
"passes": feature.passes,
|
||||||
|
"in_progress": feature.in_progress,
|
||||||
|
"dependencies": feature.dependencies or []
|
||||||
|
})
|
||||||
finally:
|
finally:
|
||||||
session.close()
|
session.close()
|
||||||
|
|
||||||
@@ -229,7 +271,7 @@ def feature_release_testing(
|
|||||||
return json.dumps({
|
return json.dumps({
|
||||||
"message": f"Feature #{feature_id} testing {status}",
|
"message": f"Feature #{feature_id} testing {status}",
|
||||||
"feature": feature.to_dict()
|
"feature": feature.to_dict()
|
||||||
}, indent=2)
|
})
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
session.rollback()
|
session.rollback()
|
||||||
return json.dumps({"error": f"Failed to release testing claim: {str(e)}"})
|
return json.dumps({"error": f"Failed to release testing claim: {str(e)}"})
|
||||||
@@ -250,7 +292,7 @@ def feature_mark_passing(
|
|||||||
feature_id: The ID of the feature to mark as passing
|
feature_id: The ID of the feature to mark as passing
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
JSON with the updated feature details, or error if not found.
|
JSON with success confirmation: {success, feature_id, name}
|
||||||
"""
|
"""
|
||||||
session = get_session()
|
session = get_session()
|
||||||
try:
|
try:
|
||||||
@@ -262,9 +304,8 @@ def feature_mark_passing(
|
|||||||
feature.passes = True
|
feature.passes = True
|
||||||
feature.in_progress = False
|
feature.in_progress = False
|
||||||
session.commit()
|
session.commit()
|
||||||
session.refresh(feature)
|
|
||||||
|
|
||||||
return json.dumps(feature.to_dict(), indent=2)
|
return json.dumps({"success": True, "feature_id": feature_id, "name": feature.name})
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
session.rollback()
|
session.rollback()
|
||||||
return json.dumps({"error": f"Failed to mark feature passing: {str(e)}"})
|
return json.dumps({"error": f"Failed to mark feature passing: {str(e)}"})
|
||||||
@@ -309,7 +350,7 @@ def feature_mark_failing(
|
|||||||
return json.dumps({
|
return json.dumps({
|
||||||
"message": f"Feature #{feature_id} marked as failing - regression detected",
|
"message": f"Feature #{feature_id} marked as failing - regression detected",
|
||||||
"feature": feature.to_dict()
|
"feature": feature.to_dict()
|
||||||
}, indent=2)
|
})
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
session.rollback()
|
session.rollback()
|
||||||
return json.dumps({"error": f"Failed to mark feature failing: {str(e)}"})
|
return json.dumps({"error": f"Failed to mark feature failing: {str(e)}"})
|
||||||
@@ -368,7 +409,7 @@ def feature_skip(
|
|||||||
"old_priority": old_priority,
|
"old_priority": old_priority,
|
||||||
"new_priority": new_priority,
|
"new_priority": new_priority,
|
||||||
"message": f"Feature '{feature.name}' moved to end of queue"
|
"message": f"Feature '{feature.name}' moved to end of queue"
|
||||||
}, indent=2)
|
})
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
session.rollback()
|
session.rollback()
|
||||||
return json.dumps({"error": f"Failed to skip feature: {str(e)}"})
|
return json.dumps({"error": f"Failed to skip feature: {str(e)}"})
|
||||||
@@ -408,7 +449,7 @@ def feature_mark_in_progress(
|
|||||||
session.commit()
|
session.commit()
|
||||||
session.refresh(feature)
|
session.refresh(feature)
|
||||||
|
|
||||||
return json.dumps(feature.to_dict(), indent=2)
|
return json.dumps(feature.to_dict())
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
session.rollback()
|
session.rollback()
|
||||||
return json.dumps({"error": f"Failed to mark feature in-progress: {str(e)}"})
|
return json.dumps({"error": f"Failed to mark feature in-progress: {str(e)}"})
|
||||||
@@ -416,6 +457,48 @@ def feature_mark_in_progress(
|
|||||||
session.close()
|
session.close()
|
||||||
|
|
||||||
|
|
||||||
|
```diff
+@mcp.tool()
+def feature_claim_and_get(
+    feature_id: Annotated[int, Field(description="The ID of the feature to claim", ge=1)]
+) -> str:
+    """Atomically claim a feature (mark in-progress) and return its full details.
+
+    Combines feature_mark_in_progress + feature_get_by_id into a single operation.
+    If already in-progress, still returns the feature details (idempotent).
+
+    Args:
+        feature_id: The ID of the feature to claim and retrieve
+
+    Returns:
+        JSON with feature details including claimed status, or error if not found.
+    """
+    session = get_session()
+    try:
+        feature = session.query(Feature).filter(Feature.id == feature_id).first()
+
+        if feature is None:
+            return json.dumps({"error": f"Feature with ID {feature_id} not found"})
+
+        if feature.passes:
+            return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
+
+        # Idempotent: if already in-progress, just return details
+        already_claimed = feature.in_progress
+        if not already_claimed:
+            feature.in_progress = True
+            session.commit()
+            session.refresh(feature)
+
+        result = feature.to_dict()
+        result["already_claimed"] = already_claimed
+        return json.dumps(result)
+    except Exception as e:
+        session.rollback()
+        return json.dumps({"error": f"Failed to claim feature: {str(e)}"})
+    finally:
+        session.close()
+
+
 @mcp.tool()
 def feature_clear_in_progress(
     feature_id: Annotated[int, Field(description="The ID of the feature to clear in-progress status", ge=1)]
```
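The new `feature_claim_and_get` tool folds claim and fetch into one call so a feature cannot be handed to two agents between "mark in-progress" and "get details". A minimal sketch of the idempotent-claim semantics, using an in-memory dict as a hypothetical stand-in for the real SQLAlchemy model:

```python
import json

# Hypothetical in-memory stand-in for the Feature table.
features = {7: {"id": 7, "name": "login", "passes": False, "in_progress": False}}

def claim_and_get(feature_id: int) -> str:
    """Claim a feature and return its details in one step (idempotent)."""
    feature = features.get(feature_id)
    if feature is None:
        return json.dumps({"error": f"Feature with ID {feature_id} not found"})
    if feature["passes"]:
        return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
    already_claimed = feature["in_progress"]
    if not already_claimed:
        feature["in_progress"] = True  # single commit point in the real tool
    result = dict(feature)
    result["already_claimed"] = already_claimed
    return json.dumps(result)

first = json.loads(claim_and_get(7))
second = json.loads(claim_and_get(7))  # re-claim is a no-op, not an error
print(first["already_claimed"], second["already_claimed"])  # False True
```

The `already_claimed` flag lets the orchestrator distinguish a fresh claim from a resumed one without a second round trip.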
```diff
@@ -442,7 +525,7 @@ def feature_clear_in_progress(
         session.commit()
         session.refresh(feature)

-        return json.dumps(feature.to_dict(), indent=2)
+        return json.dumps(feature.to_dict())
     except Exception as e:
         session.rollback()
         return json.dumps({"error": f"Failed to clear in-progress status: {str(e)}"})
```
```diff
@@ -549,7 +632,7 @@ def feature_create_bulk(
         return json.dumps({
             "created": len(created_features),
             "with_dependencies": deps_count
-        }, indent=2)
+        })
     except Exception as e:
         session.rollback()
         return json.dumps({"error": str(e)})
```
```diff
@@ -604,7 +687,7 @@ def feature_create(
             "success": True,
             "message": f"Created feature: {name}",
             "feature": db_feature.to_dict()
-        }, indent=2)
+        })
     except Exception as e:
         session.rollback()
         return json.dumps({"error": str(e)})
```
```diff
@@ -754,20 +837,25 @@ def feature_get_ready(
             "features": ready[:limit],
             "count": len(ready[:limit]),
             "total_ready": len(ready)
-        }, indent=2)
+        })
     finally:
         session.close()


 @mcp.tool()
-def feature_get_blocked() -> str:
-    """Get all features that are blocked by unmet dependencies.
+def feature_get_blocked(
+    limit: Annotated[int, Field(default=20, ge=1, le=100, description="Max features to return")] = 20
+) -> str:
+    """Get features that are blocked by unmet dependencies.

     Returns features that have dependencies which are not yet passing.
     Each feature includes a 'blocked_by' field listing the blocking feature IDs.

+    Args:
+        limit: Maximum number of features to return (1-100, default 20)
+
     Returns:
-        JSON with: features (list with blocked_by field), count (int)
+        JSON with: features (list with blocked_by field), count (int), total_blocked (int)
     """
     session = get_session()
     try:
```
```diff
@@ -787,9 +875,10 @@ def feature_get_blocked() -> str:
             })

         return json.dumps({
-            "features": blocked,
-            "count": len(blocked)
-        }, indent=2)
+            "features": blocked[:limit],
+            "count": len(blocked[:limit]),
+            "total_blocked": len(blocked)
+        })
     finally:
         session.close()
```
```diff
@@ -840,7 +929,7 @@ def feature_get_graph() -> str:
         return json.dumps({
             "nodes": nodes,
             "edges": edges
-        }, indent=2)
+        })
     finally:
         session.close()
```
```diff
@@ -186,6 +186,12 @@ class ParallelOrchestrator:
         # Session tracking for logging/debugging
         self.session_start_time: datetime = None

+        # Event signaled when any agent completes, allowing the main loop to wake
+        # immediately instead of waiting for the full POLL_INTERVAL timeout.
+        # This reduces latency when spawning the next feature after completion.
+        self._agent_completed_event: asyncio.Event = None  # Created in run_loop
+        self._event_loop: asyncio.AbstractEventLoop = None  # Stored for thread-safe signaling
+
         # Database session for this orchestrator
         self._engine, self._session_maker = create_database(project_dir)
```
```diff
@@ -311,6 +317,9 @@ class ParallelOrchestrator:
         all_features = session.query(Feature).all()
         all_dicts = [f.to_dict() for f in all_features]

+        # Pre-compute passing_ids once to avoid O(n^2) in the loop
+        passing_ids = {f.id for f in all_features if f.passes}
+
         ready = []
         skipped_reasons = {"passes": 0, "in_progress": 0, "running": 0, "failed": 0, "deps": 0}
         for f in all_features:
```
```diff
@@ -329,8 +338,8 @@ class ParallelOrchestrator:
             if self._failure_counts.get(f.id, 0) >= MAX_FEATURE_RETRIES:
                 skipped_reasons["failed"] += 1
                 continue
-            # Check dependencies
-            if are_dependencies_satisfied(f.to_dict(), all_dicts):
+            # Check dependencies (pass pre-computed passing_ids)
+            if are_dependencies_satisfied(f.to_dict(), all_dicts, passing_ids):
                 ready.append(f.to_dict())
             else:
                 skipped_reasons["deps"] += 1
```
```diff
@@ -794,6 +803,52 @@ class ParallelOrchestrator:
         finally:
             self._on_agent_complete(feature_id, proc.returncode, agent_type, proc)

+    def _signal_agent_completed(self):
+        """Signal that an agent has completed, waking the main loop.
+
+        This method is safe to call from any thread. It schedules the event.set()
+        call to run on the event loop thread to avoid cross-thread issues with
+        asyncio.Event.
+        """
+        if self._agent_completed_event is not None and self._event_loop is not None:
+            try:
+                # Use the stored event loop reference to schedule the set() call
+                # This is necessary because asyncio.Event is not thread-safe and
+                # asyncio.get_event_loop() fails in threads without an event loop
+                if self._event_loop.is_running():
+                    self._event_loop.call_soon_threadsafe(self._agent_completed_event.set)
+                else:
+                    # Fallback: set directly if loop isn't running (shouldn't happen during normal operation)
+                    self._agent_completed_event.set()
+            except RuntimeError:
+                # Event loop closed, ignore (orchestrator may be shutting down)
+                pass
+
+    async def _wait_for_agent_completion(self, timeout: float = POLL_INTERVAL):
+        """Wait for an agent to complete or until timeout expires.
+
+        This replaces fixed `asyncio.sleep(POLL_INTERVAL)` calls with event-based
+        waiting. When an agent completes, _signal_agent_completed() sets the event,
+        causing this method to return immediately. If no agent completes within
+        the timeout, we return anyway to check for ready features.
+
+        Args:
+            timeout: Maximum seconds to wait (default: POLL_INTERVAL)
+        """
+        if self._agent_completed_event is None:
+            # Fallback if event not initialized (shouldn't happen in normal operation)
+            await asyncio.sleep(timeout)
+            return
+
+        try:
+            await asyncio.wait_for(self._agent_completed_event.wait(), timeout=timeout)
+            # Event was set - an agent completed. Clear it for the next wait cycle.
+            self._agent_completed_event.clear()
+            debug_log.log("EVENT", "Woke up immediately - agent completed")
+        except asyncio.TimeoutError:
+            # Timeout reached without agent completion - this is normal, just check anyway
+            pass
+
     def _on_agent_complete(
         self,
         feature_id: int | None,
```
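The two helpers above implement a wake-on-completion pattern: a worker thread hands `event.set()` to the loop thread via `call_soon_threadsafe`, and the main loop waits with a timeout instead of sleeping a fixed poll interval. A self-contained sketch of the pattern (timings are illustrative stand-ins for POLL_INTERVAL and agent runtime):

```python
import asyncio
import threading
import time

async def main() -> float:
    event = asyncio.Event()
    loop = asyncio.get_running_loop()

    def worker():
        # Simulates an agent finishing on a background thread.
        time.sleep(0.05)
        # asyncio.Event is not thread-safe: hand the set() to the loop thread.
        loop.call_soon_threadsafe(event.set)

    threading.Thread(target=worker).start()

    start = time.monotonic()
    try:
        # 5s stands in for POLL_INTERVAL; we wake as soon as the event fires.
        await asyncio.wait_for(event.wait(), timeout=5.0)
        event.clear()  # ready for the next wait cycle
    except asyncio.TimeoutError:
        pass  # timeout is the normal "nothing finished yet" path
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"woke after {elapsed:.2f}s")  # well under the 5s timeout
```

This is why the commit can replace every `asyncio.sleep(POLL_INTERVAL)` in the loop: the worst case is unchanged (timeout), but the common case wakes immediately.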
```diff
@@ -832,6 +887,8 @@ class ParallelOrchestrator:
                           pid=proc.pid,
                           feature_id=feature_id,
                           status=status)
+            # Signal main loop that an agent slot is available
+            self._signal_agent_completed()
             return

         # Coding agent completion
```
```diff
@@ -843,40 +900,20 @@ class ParallelOrchestrator:
         self.running_coding_agents.pop(feature_id, None)
         self.abort_events.pop(feature_id, None)

-        # BEFORE dispose: Query database state to see if it's stale
-        session_before = self.get_session()
-        try:
-            session_before.expire_all()
-            feature_before = session_before.query(Feature).filter(Feature.id == feature_id).first()
-            all_before = session_before.query(Feature).all()
-            passing_before = sum(1 for f in all_before if f.passes)
-            debug_log.log("DB", f"BEFORE engine.dispose() - Feature #{feature_id} state",
-                          passes=feature_before.passes if feature_before else None,
-                          in_progress=feature_before.in_progress if feature_before else None,
-                          total_passing_in_db=passing_before)
-        finally:
-            session_before.close()
-
-        # CRITICAL: Refresh database connection to see subprocess commits
+        # Refresh session cache to see subprocess commits
         # The coding agent runs as a subprocess and commits changes (e.g., passes=True).
-        # SQLAlchemy may have stale connections. Disposing the engine forces new connections
-        # that will see the subprocess's committed changes.
-        debug_log.log("DB", "Disposing database engine now...")
-        self._engine.dispose()
-
-        # AFTER dispose: Query again to compare
+        # Using session.expire_all() is lighter weight than engine.dispose() for SQLite WAL mode
+        # and is sufficient to invalidate cached data and force fresh reads.
+        # engine.dispose() is only called on orchestrator shutdown, not on every agent completion.
         session = self.get_session()
         try:
+            session.expire_all()
             feature = session.query(Feature).filter(Feature.id == feature_id).first()
-            all_after = session.query(Feature).all()
-            passing_after = sum(1 for f in all_after if f.passes)
             feature_passes = feature.passes if feature else None
             feature_in_progress = feature.in_progress if feature else None
-            debug_log.log("DB", f"AFTER engine.dispose() - Feature #{feature_id} state",
-                          passes=feature_passes,
-                          in_progress=feature_in_progress,
-                          total_passing_in_db=passing_after,
-                          passing_changed=(passing_after != passing_before) if 'passing_before' in dir() else "unknown")
+            debug_log.log("DB", f"Feature #{feature_id} state after session.expire_all()",
+                          passes=feature_passes,
+                          in_progress=feature_in_progress)
             if feature and feature.in_progress and not feature.passes:
                 feature.in_progress = False
                 session.commit()
```
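The switch from `engine.dispose()` to `session.expire_all()` leans on SQLite making another process's committed writes visible to an existing connection on its next read, so only the ORM's cached objects need invalidating. A stdlib-only sketch of that visibility guarantee, with two plain `sqlite3` connections standing in for the orchestrator and the agent subprocess (no SQLAlchemy involved):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "features.db")

orchestrator = sqlite3.connect(path)
orchestrator.execute("PRAGMA journal_mode=WAL")
orchestrator.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, passes INTEGER)")
orchestrator.execute("INSERT INTO features VALUES (1, 0)")
orchestrator.commit()

# A second connection plays the agent subprocess committing passes=True.
agent = sqlite3.connect(path)
agent.execute("UPDATE features SET passes = 1 WHERE id = 1")
agent.commit()
agent.close()

# The orchestrator's existing connection sees the commit on its next read -
# no need to tear down the connection pool, just drop cached ORM state.
passes = orchestrator.execute("SELECT passes FROM features WHERE id = 1").fetchone()[0]
print(passes)  # 1
orchestrator.close()
```

The caveat the real code must respect: a connection holding an open read transaction keeps its snapshot, which is exactly why `expire_all()` (forcing fresh SELECTs in a new transaction) is the right lever.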
```diff
@@ -900,6 +937,9 @@ class ParallelOrchestrator:
         # CRITICAL: This print triggers the WebSocket to emit agent_update with state='error' or 'success'
         print(f"Feature #{feature_id} {status}", flush=True)

+        # Signal main loop that an agent slot is available
+        self._signal_agent_completed()
+
         # NOTE: Testing agents are now spawned in start_feature() when coding agents START,
         # not here when they complete. This ensures 1:1 ratio and proper termination.
```
```diff
@@ -949,6 +989,12 @@ class ParallelOrchestrator:
         """Main orchestration loop."""
        self.is_running = True

+        # Initialize the agent completion event for this run
+        # Must be created in the async context where it will be used
+        self._agent_completed_event = asyncio.Event()
+        # Store the event loop reference for thread-safe signaling from output reader threads
+        self._event_loop = asyncio.get_running_loop()
+
         # Track session start for regression testing (UTC for consistency with last_tested_at)
         self.session_start_time = datetime.now(timezone.utc)
```
```diff
@@ -1100,8 +1146,8 @@ class ParallelOrchestrator:
                           at_capacity=(current >= self.max_concurrency))

             if current >= self.max_concurrency:
-                debug_log.log("CAPACITY", "At max capacity, sleeping...")
-                await asyncio.sleep(POLL_INTERVAL)
+                debug_log.log("CAPACITY", "At max capacity, waiting for agent completion...")
+                await self._wait_for_agent_completion()
                 continue

             # Priority 1: Resume features from previous session
```
```diff
@@ -1119,7 +1165,7 @@ class ParallelOrchestrator:
             if not ready:
                 # Wait for running features to complete
                 if current > 0:
-                    await asyncio.sleep(POLL_INTERVAL)
+                    await self._wait_for_agent_completion()
                     continue
                 else:
                     # No ready features and nothing running
```
```diff
@@ -1138,7 +1184,7 @@ class ParallelOrchestrator:

                 # Still have pending features but all are blocked by dependencies
                 print("No ready features available. All remaining features may be blocked by dependencies.", flush=True)
-                await asyncio.sleep(POLL_INTERVAL * 2)
+                await self._wait_for_agent_completion(timeout=POLL_INTERVAL * 2)
                 continue

             # Start features up to capacity
```
```diff
@@ -1174,7 +1220,7 @@ class ParallelOrchestrator:

         except Exception as e:
             print(f"Orchestrator error: {e}", flush=True)
-            await asyncio.sleep(POLL_INTERVAL)
+            await self._wait_for_agent_completion()

         # Wait for remaining agents to complete
         print("Waiting for running agents to complete...", flush=True)
@@ -1184,7 +1230,8 @@ class ParallelOrchestrator:
             testing_done = len(self.running_testing_agents) == 0
             if coding_done and testing_done:
                 break
-            await asyncio.sleep(1)
+            # Use short timeout since we're just waiting for final agents to finish
+            await self._wait_for_agent_completion(timeout=1.0)

         print("Orchestrator finished.", flush=True)
```
progress.py (30 changed lines):
```diff
@@ -72,15 +72,31 @@ def count_passing_tests(project_dir: Path) -> tuple[int, int, int]:
     try:
         conn = sqlite3.connect(db_file)
         cursor = conn.cursor()
-        cursor.execute("SELECT COUNT(*) FROM features")
-        total = cursor.fetchone()[0]
-        cursor.execute("SELECT COUNT(*) FROM features WHERE passes = 1")
-        passing = cursor.fetchone()[0]
-        # Handle case where in_progress column doesn't exist yet
+        # Single aggregate query instead of 3 separate COUNT queries
+        # Handle case where in_progress column doesn't exist yet (legacy DBs)
         try:
-            cursor.execute("SELECT COUNT(*) FROM features WHERE in_progress = 1")
-            in_progress = cursor.fetchone()[0]
+            cursor.execute("""
+                SELECT
+                    COUNT(*) as total,
+                    SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing,
+                    SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress
+                FROM features
+            """)
+            row = cursor.fetchone()
+            total = row[0] or 0
+            passing = row[1] or 0
+            in_progress = row[2] or 0
         except sqlite3.OperationalError:
+            # Fallback for databases without in_progress column
+            cursor.execute("""
+                SELECT
+                    COUNT(*) as total,
+                    SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing
+                FROM features
+            """)
+            row = cursor.fetchone()
+            total = row[0] or 0
+            passing = row[1] or 0
             in_progress = 0
         conn.close()
         return passing, in_progress, total
```
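The rewritten `count_passing_tests` replaces three round trips with one table scan. A runnable sketch of the same aggregate query against an in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, passes INTEGER, in_progress INTEGER)")
conn.executemany("INSERT INTO features (passes, in_progress) VALUES (?, ?)",
                 [(1, 0), (1, 0), (0, 1), (0, 0)])

# One scan computes all three counters; SUM(CASE ...) returns NULL on an
# empty table, hence the `or 0` guards in the real code.
row = conn.execute("""
    SELECT
        COUNT(*) AS total,
        SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) AS passing,
        SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) AS in_progress
    FROM features
""").fetchone()
total, passing, in_progress = (row[0] or 0, row[1] or 0, row[2] or 0)
print(passing, in_progress, total)  # 2 1 4
conn.close()
```

Since this function is polled for progress reporting, collapsing three queries into one also cuts the file-lock traffic against the agents writing to the same database.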
prompts.py (38 changed lines):
```diff
@@ -109,11 +109,11 @@ The orchestrator has already claimed this feature for you.


 def get_single_feature_prompt(feature_id: int, project_dir: Path | None = None, yolo_mode: bool = False) -> str:
-    """
-    Load the coding prompt with single-feature focus instructions prepended.
-
-    When the orchestrator assigns a specific feature to a coding agent,
-    this prompt ensures the agent works ONLY on that feature.
+    """Prepend single-feature assignment header to base coding prompt.
+
+    Used in parallel mode to assign a specific feature to an agent.
+    The base prompt already contains the full workflow - this just
+    identifies which feature to work on.

     Args:
         feature_id: The specific feature ID to work on
```
```diff
@@ -122,38 +122,20 @@ def get_single_feature_prompt(feature_id: int, project_dir: Path | None = None,
         handled by separate testing agents, not YOLO prompts.

     Returns:
-        The prompt with single-feature instructions prepended
+        The prompt with single-feature header prepended
     """
-    # Always use the standard coding prompt
-    # (Testing/regression is handled by separate testing agents)
     base_prompt = get_coding_prompt(project_dir)

-    # Prepend single-feature instructions
-    single_feature_header = f"""## ASSIGNED FEATURE
-
-**You are assigned to work on Feature #{feature_id} ONLY.**
-
-This session is part of a parallel execution where multiple agents work on different features simultaneously.
-
-### Your workflow:
-
-1. **Get feature details** using `feature_get_by_id` with ID {feature_id}
-2. **Mark as in-progress** using `feature_mark_in_progress` with ID {feature_id}
-   - If you get "already in-progress" error, that's OK - continue with implementation
-3. **Implement the feature** following the steps from the feature details
-4. **Test your implementation** to verify it works correctly
-5. **Mark as passing** using `feature_mark_passing` with ID {feature_id}
-6. **Commit your changes** and end the session
-
-### Important rules:
-
-- **Do NOT** work on any other features - other agents are handling them
-- If blocked, use `feature_skip` and document the blocker in claude-progress.txt
-
+    # Minimal header - the base prompt already contains the full workflow
+    single_feature_header = f"""## ASSIGNED FEATURE: #{feature_id}
+
+Work ONLY on this feature. Other agents are handling other features.
+Use `feature_claim_and_get` with ID {feature_id} to claim it and get details.
+If blocked, use `feature_skip` and document the blocker.
+
 ---

 """

     return single_feature_header + base_prompt
```