Introduce a new testing agent architecture that runs regression tests
independently from coding agents, improving quality assurance in
parallel mode.
Key changes:
Testing Agent System:
- Add testing_prompt.template.md for dedicated testing agent role
- Add feature_mark_failing MCP tool for regression detection
- Add --agent-type flag to select initializer/coding/testing mode
- Remove regression testing from coding prompt (now handled by testing agents)
Parallel Orchestrator Enhancements:
- Add testing agent spawning with configurable ratio (--testing-agent-ratio)
- Add comprehensive debug logging system (DebugLog class)
- Improve database session management to prevent stale reads
- Add engine.dispose() calls to refresh connections after subprocess commits
- Fix f-string linting issues (remove unnecessary f-prefixes)
UI Improvements:
- Add testing agent mascot (Chip) to AgentAvatar
- Enhance AgentCard to display testing agent status
- Add testing agent ratio slider in SettingsModal
- Update WebSocket handling for testing agent updates
- Improve ActivityFeed to show testing agent activity
API & Server Updates:
- Add testing_agent_ratio to settings schema and endpoints
- Update process manager to support testing agent type
- Enhance WebSocket messages for agent_update events
Template Changes:
- Delete coding_prompt_yolo.template.md (consolidated into main prompt)
- Update initializer_prompt.template.md with improved structure
- Streamline coding_prompt.template.md workflow
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Add per-agent log viewer with copy-to-clipboard functionality
- New AgentLogEntry type for structured log entries
- Logs stored per-agent in WebSocket state (up to 500 entries)
- Log modal rendered via React Portal to avoid overflow issues
- Click log icon on agent card to view full activity history
- Fix agents getting stuck in "failed" state
- Wrap client context manager in try/except (agent.py)
- Remove failed agents from UI on error state (useWebSocket.ts)
- Handle permanently failed features in get_all_complete()
- Add friendlier agent state labels
- "Hit an issue" → "Trying plan B..."
- "Retrying..." → "Being persistent..."
- Softer colors (yellow/orange instead of red)
- Add scheduling scores for smarter feature ordering
- compute_scheduling_scores() in dependency_resolver.py
- Features that unblock others get higher priority
- Update CLAUDE.md with parallel mode documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major feature implementation for parallel agent execution with dependency-aware
scheduling and an engaging multi-agent UI experience.
Backend Changes:
- Add parallel_orchestrator.py for concurrent feature processing
- Add api/dependency_resolver.py with cycle detection (Kahn's algorithm + DFS)
- Add atomic feature_claim_next() with retry limit and exponential backoff
- Fix circular dependency check arguments in 4 locations
- Add AgentTracker class for parsing agent output and emitting updates
- Add browser isolation with --isolated flag for Playwright MCP
- Extend WebSocket protocol with agent_update messages and log attribution
- Add WSAgentUpdateMessage schema with agent states and mascot names
- Fix WSProgressMessage to include in_progress field
New UI Components:
- AgentMissionControl: Dashboard showing active agents with collapsible activity
- AgentCard: Individual agent status with avatar and thought bubble
- AgentAvatar: SVG mascots (Spark, Fizz, Octo, Hoot, Buzz) with animations
- ActivityFeed: Recent activity stream with stable keys (no flickering)
- CelebrationOverlay: Confetti animation with click/Escape dismiss
- DependencyGraph: Interactive node graph visualization with dagre layout
- DependencyBadge: Visual indicator for feature dependencies
- ViewToggle: Switch between Kanban and Graph views
- KeyboardShortcutsHelp: Help overlay accessible via ? key
UI/UX Improvements:
- Celebration queue system to handle rapid success messages
- Accessibility attributes on AgentAvatar (role, aria-label, aria-live)
- Collapsible Recent Activity section with persisted preference
- Agent count display in header
- Keyboard shortcut G to toggle Kanban/Graph view
- Real-time thought bubbles and state animations
Bug Fixes:
- Fix circular dependency validation (swapped source/target arguments)
- Add MAX_CLAIM_RETRIES=10 to prevent stack overflow under contention
- Fix THOUGHT_PATTERNS to match actual [Tool: name] format
- Fix ActivityFeed key prop to prevent re-renders on new items
- Add featureId/agentIndex to log messages for proper attribution
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Applied Fixes
More flexible string matching: Changed from response.lower().strip().startswith("limit reached") to "limit reached" in response.lower() to handle cases where the message has prefix text or variations in whitespace.
Improved regex pattern: Updated to r"(?i)\bresets(?:\s+at)?\s+(\d+)(?::(\d+))?\s*(am|pm)\s*\(([^)]+)\)" which now handles:
Optional "at" after "resets" (e.g., "resets at 3pm" or "resets 3pm")
Flexible whitespace around components
Word boundaries to prevent partial matches
Timezone sanitization: Added .strip() to tz_name = match.group(4).strip() to remove any leading/trailing whitespace that could cause ZoneInfo() to fail.
Safety clamp: Added delay_seconds = min(delta.total_seconds(), 24 * 60 * 60) to ensure the delay never exceeds 24 hours, preventing the agent from being stuck waiting for extremely long periods.
- Add CI workflow with Python (ruff lint, security tests) and UI (ESLint, TypeScript, build) jobs
- Add ruff, mypy, pytest to requirements.txt
- Add pyproject.toml with ruff configuration
- Fix import sorting across Python files (ruff --fix)
- Fix test_security.py expectations to match actual security policy
- Remove invalid 'eof' command from ALLOWED_COMMANDS
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a new YOLO (You Only Live Once) mode that skips all browser testing
and regression tests for faster feature iteration during prototyping.
Changes made:
**Core YOLO Mode Implementation:**
- Add --yolo CLI flag to autonomous_agent_demo.py
- Update agent.py to accept yolo_mode parameter and select appropriate prompt
- Modify client.py to conditionally include Playwright MCP server (excluded in YOLO mode)
- Add coding_prompt_yolo.template.md with static analysis only verification
- Add get_coding_prompt_yolo() to prompts.py
**Server/API Updates:**
- Add AgentStartRequest schema with yolo_mode field
- Update AgentStatus to include yolo_mode
- Modify process_manager.py to pass --yolo flag to subprocess
- Update agent router to accept yolo_mode in start request
**UI Updates:**
- Add YOLO toggle button (lightning bolt icon) in AgentControl
- Show YOLO mode indicator when agent is running in YOLO mode
- Add useAgentStatus hook to track current mode
- Update startAgent API to accept yoloMode parameter
- Add YOLO toggle in SpecCreationChat completion flow
**Spec Creation Improvements:**
- Fix create-spec.md to properly replace [FEATURE_COUNT] placeholder
- Add REQUIRED FEATURE COUNT section to initializer_prompt.template.md
- Fix spec_chat_session.py to create security settings file for Claude SDK
- Delete app_spec.txt before spec creation to allow fresh creation
**Documentation:**
- Add YOLO mode section to CLAUDE.md with usage examples
- Add checkpoint.md slash command for creating detailed commits
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>