The parallel orchestrator was exiting prematurely with "All features
complete!" while pending features remained. The orchestrator's SQLAlchemy
session was serving cached objects and did not see commits made by agent
subprocesses.
Changes:
- Add session.expire_all() to get_resumable_features() to force fresh reads
- Add session.expire_all() to get_ready_features() to force fresh reads
- Add session.expire_all() to get_all_complete() to force fresh reads
- Add defensive retry logic in run_loop(): when no features are ready and
nothing is running, force a fresh check before declaring the run blocked
- Add debug logging to get_all_complete() and get_ready_features() to
track passing/pending/in_progress counts for easier diagnosis
The root cause was cross-process database visibility: when an agent
subprocess committed feature completion, the orchestrator's session
had cached the old state and didn't see the update.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove unused imports and organize import statements to pass ruff
linting checks:
- mcp_server/feature_mcp.py: Remove unused imports (are_dependencies_satisfied,
get_blocking_dependencies) and alphabetize import block
- parallel_orchestrator.py: Remove unused imports (time, Awaitable) and
add blank lines between import groups per PEP 8
- server/routers/features.py: Alphabetize imports in dependency resolver
These changes were identified by running `ruff check .` and auto-fixed
with the `--fix` flag.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Critical fixes:
- Lock file TOCTOU race condition: Use atomic O_CREAT|O_EXCL for lock creation
- PID reuse vulnerability on Windows: Store PID:CREATE_TIME in lock file to
detect when a different process has reused the same PID
- WAL mode on network drives: Detect network paths (UNC, mapped drives, NFS,
CIFS) and fall back to DELETE journal mode to prevent corruption
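As an illustration of the lock file fixes above, here is a minimal sketch of atomic lock creation with a PID:CREATE_TIME payload. The function names and the lock file layout are assumptions, not the actual process_manager.py code; the caller supplies the process creation time (for example from psutil), so the sketch stays within the standard library.

```python
import os
import tempfile


def try_acquire_lock(lock_path: str, pid: int, create_time: float) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    # O_CREAT | O_EXCL makes "check and create" a single step, which closes
    # the window between testing for the file and writing it (TOCTOU).
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    try:
        fd = os.open(lock_path, flags, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{pid}:{create_time}")
    return True


def read_lock(lock_path: str) -> tuple[int, float]:
    """Parse the PID:CREATE_TIME payload written by try_acquire_lock()."""
    with open(lock_path) as handle:
        pid_text, _, time_text = handle.read().strip().partition(":")
    return int(pid_text), float(time_text)


def is_same_process(lock_path: str, pid: int, create_time: float) -> bool:
    """True only if both PID and creation time match the lock's owner.

    A matching PID with a different creation time means the PID was reused
    by an unrelated process, so the lock should be treated as stale.
    """
    locked_pid, locked_time = read_lock(lock_path)
    return locked_pid == pid and abs(locked_time - create_time) < 1.0
```

The one-second tolerance is an arbitrary choice for this sketch; it allows for rounding when the creation time is written to and read back from the file.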
High priority fixes:
- JSON migration now preserves dependencies field during legacy migration
- Process tree termination on Windows: Use psutil to kill child processes
recursively to prevent orphaned browser instances
- Retry backoff jitter: Add random 30% jitter to prevent synchronized retries
under high contention with 5 concurrent agents
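The retry jitter can be sketched as follows. This is an illustrative version only: the base delay, the cap, and the reading of "30% jitter" as up to 30% added on top of the exponential delay are assumptions, and `backoff_delay` is a hypothetical name.

```python
import random


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  jitter: float = 0.3, rng=random) -> float:
    """Exponential backoff with random jitter, in seconds.

    attempt 0 -> about base, attempt 1 -> about 2 * base, and so on,
    capped at `cap`. Each delay is then stretched by a random factor in
    [1.0, 1.0 + jitter] so concurrent agents do not retry in lockstep.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay * (1.0 + rng.uniform(0.0, jitter))
```

Without the random factor, agents that collide once tend to collide again on every subsequent retry, because they all compute the same delay.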
Files changed:
- server/services/process_manager.py: Atomic lock creation, PID+create_time
- api/database.py: Network filesystem detection for WAL mode fallback
- api/migration.py: Add dependencies field to JSON migration
- parallel_orchestrator.py: _kill_process_tree helper function
- mcp_server/feature_mcp.py: Add jitter to exponential backoff
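The WAL fallback in api/database.py could look roughly like the sketch below. It only recognizes UNC-style paths; detecting mapped drives and NFS/CIFS mounts is platform-specific and left out. Both function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def choose_journal_mode(db_path: str) -> str:
    """Pick DELETE for UNC-style network paths, WAL otherwise.

    Simplified assumption: only paths starting with \\\\ or // count as
    network paths here. Real detection needs OS-specific checks.
    """
    is_unc = db_path.startswith("\\\\") or db_path.startswith("//")
    return "DELETE" if is_unc else "WAL"


def open_database(db_path: str) -> tuple[sqlite3.Connection, str]:
    """Open the database and apply the chosen journal mode.

    Returns the connection and the mode SQLite reports as active, which
    can differ from the requested one if the filesystem rejects WAL.
    """
    conn = sqlite3.connect(db_path)
    wanted = choose_journal_mode(db_path)
    active = conn.execute(f"PRAGMA journal_mode={wanted}").fetchone()[0]
    return conn, active.lower()
```

WAL relies on shared memory between processes on the same host, which is why it is unsafe on network filesystems and a rollback journal is the conservative choice there.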
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Add per-agent log viewer with copy-to-clipboard functionality
  - New AgentLogEntry type for structured log entries
  - Logs stored per-agent in WebSocket state (up to 500 entries)
  - Log modal rendered via React Portal to avoid overflow issues
  - Click log icon on agent card to view full activity history
- Fix agents getting stuck in "failed" state
  - Wrap client context manager in try/except (agent.py)
  - Remove failed agents from UI on error state (useWebSocket.ts)
  - Handle permanently failed features in get_all_complete()
- Add friendlier agent state labels
  - "Hit an issue" → "Trying plan B..."
  - "Retrying..." → "Being persistent..."
  - Softer colors (yellow/orange instead of red)
- Add scheduling scores for smarter feature ordering
  - compute_scheduling_scores() in dependency_resolver.py
  - Features that unblock others get higher priority
- Update CLAUDE.md with parallel mode documentation
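One way to express "features that unblock others get higher priority" is to score each feature by how many other features depend on it, directly or transitively. The sketch below is an assumption about the idea, not the actual compute_scheduling_scores() implementation; the feature dict shape (`id`, `dependencies`) and the function name are hypothetical.

```python
from collections import defaultdict


def sketch_scheduling_scores(features: list[dict]) -> dict[int, int]:
    """Score each feature by its number of transitive dependents."""
    # Invert the dependency edges: dependents[x] = features that need x.
    dependents = defaultdict(set)
    for feature in features:
        for dep in feature.get("dependencies", []):
            dependents[dep].add(feature["id"])

    scores = {}
    for feature in features:
        seen = set()
        stack = [feature["id"]]
        # Iterative traversal over everything this feature unblocks.
        while stack:
            current = stack.pop()
            for child in dependents.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        scores[feature["id"]] = len(seen)
    return scores


features = [
    {"id": 1, "dependencies": []},
    {"id": 2, "dependencies": [1]},
    {"id": 3, "dependencies": [1]},
    {"id": 4, "dependencies": [2]},
]
scores = sketch_scheduling_scores(features)
# Highest score first, ties broken by id.
order = sorted(scores, key=lambda fid: (-scores[fid], fid))
```

Here feature 1 unblocks three others and is scheduled first, while features 3 and 4 unblock nothing and go last.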
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major feature implementation for parallel agent execution with dependency-aware
scheduling and an engaging multi-agent UI experience.
Backend Changes:
- Add parallel_orchestrator.py for concurrent feature processing
- Add api/dependency_resolver.py with cycle detection (Kahn's algorithm + DFS)
- Add atomic feature_claim_next() with retry limit and exponential backoff
- Fix circular dependency check arguments in 4 locations
- Add AgentTracker class for parsing agent output and emitting updates
- Add browser isolation with --isolated flag for Playwright MCP
- Extend WebSocket protocol with agent_update messages and log attribution
- Add WSAgentUpdateMessage schema with agent states and mascot names
- Fix WSProgressMessage to include in_progress field
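For reference, cycle detection with Kahn's algorithm can be sketched as below. This covers only the Kahn's algorithm half mentioned above, not the DFS part, and the input shape (a mapping from feature id to the ids it depends on) is an assumption rather than the actual api/dependency_resolver.py interface.

```python
from collections import deque


def has_cycle(dependencies: dict[int, list[int]]) -> bool:
    """Return True if the dependency graph contains a cycle."""
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # in_degree[f] counts the unmet dependencies of feature f.
    in_degree = {node: 0 for node in nodes}
    dependents = {node: [] for node in nodes}
    for feature, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(feature)
            in_degree[feature] += 1

    # Repeatedly "complete" features whose dependencies are all met.
    queue = deque(node for node, degree in in_degree.items() if degree == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in dependents[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Any feature never reached is part of, or blocked by, a cycle.
    return visited != len(nodes)
```

If the graph is acyclic, the order in which nodes leave the queue is also a valid execution order, which is convenient for a scheduler.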
New UI Components:
- AgentMissionControl: Dashboard showing active agents with a collapsible activity feed
- AgentCard: Individual agent status with avatar and thought bubble
- AgentAvatar: SVG mascots (Spark, Fizz, Octo, Hoot, Buzz) with animations
- ActivityFeed: Recent activity stream with stable keys (no flickering)
- CelebrationOverlay: Confetti animation with click/Escape dismiss
- DependencyGraph: Interactive node graph visualization with dagre layout
- DependencyBadge: Visual indicator for feature dependencies
- ViewToggle: Switch between Kanban and Graph views
- KeyboardShortcutsHelp: Help overlay accessible via ? key
UI/UX Improvements:
- Celebration queue system to handle rapid success messages
- Accessibility attributes on AgentAvatar (role, aria-label, aria-live)
- Collapsible Recent Activity section with persisted preference
- Agent count display in header
- Keyboard shortcut G to toggle Kanban/Graph view
- Real-time thought bubbles and state animations
Bug Fixes:
- Fix circular dependency validation (swapped source/target arguments)
- Add MAX_CLAIM_RETRIES=10 to prevent stack overflow under contention
- Fix THOUGHT_PATTERNS to match actual [Tool: name] format
- Fix ActivityFeed key prop to prevent re-renders on new items
- Add featureId/agentIndex to log messages for proper attribution
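The MAX_CLAIM_RETRIES fix amounts to replacing unbounded retries with a bounded loop. A minimal sketch, assuming a hypothetical `try_claim` callable that returns a feature id on success and None when another agent won the race (the backoff delay between attempts is omitted here):

```python
from typing import Callable, Optional

MAX_CLAIM_RETRIES = 10  # value taken from the commit message


def claim_next(try_claim: Callable[[], Optional[int]],
               max_retries: int = MAX_CLAIM_RETRIES) -> Optional[int]:
    """Try to claim a feature, giving up after max_retries attempts.

    A loop keeps the stack depth constant, whereas retrying by calling
    the function recursively grows the stack with every failed attempt.
    """
    for _ in range(max_retries):
        feature_id = try_claim()
        if feature_id is not None:
            return feature_id
    return None
```

Returning None after the limit lets the caller decide whether to wait, report contention, or exit, instead of failing deep inside the claim logic.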
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>