Commit Graph

23 Commits

Author SHA1 Message Date
Leon van Zyl
9f67d7ffe4 Merge pull request #95 from cabana8471-arch/fix/infrastructure-features-mock-prevention
fix: Prevent mock data implementations with infrastructure features
2026-01-29 11:00:13 +02:00
cabana8471
4cec4e63a4 fix: standardize tier naming from 'Complex' to 'Advanced' for consistency
Per CodeRabbit review - aligns with create-spec.md terminology.
2026-01-29 08:35:17 +01:00
cabana8471
d47028d97a fix: add shlex fallback parser and heredoc warning
- Add _extract_primary_command() fallback when shlex.split() fails on complex nested quotes (e.g., docker exec with PHP)

- Returns primary command instead of empty list, allowing valid commands to proceed

- Add heredoc warning to coding prompt - sandbox blocks /tmp access for here documents

- All 162 security tests pass
2026-01-29 08:04:01 +01:00
cabana8471
11cefec85b fix: add test file exclusions to mock data grep checks
The comment said "excluding test files" but the grep commands didn't
actually exclude them. Added common test file exclusion patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 06:40:34 +01:00
cabana8471
d652b18587 fix: add language tags to fenced code blocks per CodeRabbit/markdownlint
Added 'text' language identifier to all fenced code blocks in the
Infrastructure Feature Descriptions section to satisfy MD040.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 07:00:30 +01:00
cabana8471
03504b3c1a fix: use port-based process killing for cross-platform safety
Addresses reviewer feedback:
1. Windows Compatibility: Added Windows alternative using netstat/taskkill
2. Safer Process Killing: Changed from `pkill -f "node"` to port-based
   killing (`lsof -ti :$PORT`) to avoid killing unrelated Node processes
   like VS Code, Claude Code, or other development tools

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 22:26:24 +01:00
cabana8471
d1233ad104 fix: Expand Map/Set grep search to entire src/ directory
- Changed grep for "new Map()/new Set()" to search all of src/
- Previously only searched src/lib/, src/store/, src/data/
- Now consistent with other grep patterns that search entire src/
- Applied to both coding_prompt and initializer_prompt templates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 21:29:24 +01:00
cabana8471
cd9f5b76cf fix: Address Leon's review - safer process killing and cross-platform support
Changes:
- Replace pkill -f "node" with port-based killing (lsof -ti :PORT)
  - Safer: only kills dev server, not VS Code/Claude Code/other Node apps
  - More specific: targets exact port instead of all Node processes
- Add Windows alternative commands (commented, for reference)
- Use ${PORT:-3000} variable instead of hardcoded port 3000
- Update health check and API verification to use PORT variable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 20:49:35 +01:00
cabana8471
95b0dfac83 fix: Health check now fails script on server startup failure
Changed from warning-only to proper error handling:
- if server doesn't respond after restart, exit with error
- prevents false negatives when server fails to start

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 20:11:06 +01:00
cabana8471
e756486515 fix: Address remaining CodeRabbit feedback
- Escape parentheses in grep patterns: new Map\(\) and new Set\(\)
- Add --include="*.js" to all grep commands for complete coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 11:50:35 +01:00
cabana8471
dae16c3cca fix: Address CodeRabbit review feedback
- Fix math error in category totals (155→165, 255→265)
- Fix example JSON to include [0,1,2,3,4] dependencies for all features
- Add more robust server shutdown (SIGTERM then SIGKILL)
- Add health check after server restart
- Align grep patterns between templates (add .js, testData, TODO/STUB/MOCK)
- Add package.json check for mock backend libraries
- Reference STEP 5.6 instead of duplicating grep commands

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 11:43:54 +01:00
cabana8471
8e23fee094 fix: Prevent mock data implementations with infrastructure features
Problem:
The coding agent can implement in-memory storage (e.g., `dev-store.ts` with
`globalThis`) instead of a real database. These implementations pass all tests
because data persists during runtime, but data is lost on server restart.

This is a root cause for #68 - agent "passes" features that don't actually work.

Solution:
1. Add 5 mandatory Infrastructure Features (indices 0-4) that run first:
   - Feature 0: Database connection established
   - Feature 1: Database schema applied correctly
   - Feature 2: Data persists across server restart (CRITICAL)
   - Feature 3: No mock data patterns in codebase
   - Feature 4: Backend API queries real database

2. Add STEP 5.7: Server Restart Persistence Test to coding prompt:
   - Create test data, stop server, restart, verify data still exists

3. Extend grep patterns for mock detection in STEP 5.6:
   - globalThis., devStore, dev-store, mockData, fakeData
   - TODO.*real, STUB, MOCK, new Map() as data stores

Changes:
- .claude/templates/initializer_prompt.template.md - Infrastructure features
- .claude/templates/coding_prompt.template.md - STEP 5.6/5.7 enhancements
- .claude/commands/create-spec.md - Phase 3b database question

Backwards Compatible:
- Works with YOLO mode (uses bash/grep, not browser automation)
- Stateless apps can skip database features via create-spec question

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 08:01:30 +01:00
Auto
874359fcf6 improve performance 2026-01-23 14:37:43 +02:00
Auto
b00eef5eca refactor: orchestrator pre-selects features for all agents
Replace agent-initiated feature selection with orchestrator pre-selection
for both coding and testing agents. This ensures Mission Control displays
correct feature numbers for testing agents (previously showed "Feature #0").

Key changes:

MCP Server (mcp_server/feature_mcp.py):
- Add feature_get_by_id tool for agents to fetch assigned feature details
- Remove obsolete tools: feature_get_next, feature_claim_next,
  feature_claim_for_testing, feature_get_for_regression
- Remove helper functions and unused imports (text, OperationalError, func)

Orchestrator (parallel_orchestrator.py):
- Change running_testing_agents from list to dict[int, Popen]
- Add claim_feature_for_testing() with random selection
- Add release_testing_claim() method
- Pass --testing-feature-id to spawned testing agents
- Use unified [Feature #X] output format for both agent types

Agent Entry Points:
- autonomous_agent_demo.py: Add --testing-feature-id CLI argument
- agent.py: Pass testing_feature_id to get_testing_prompt()

Prompt Templates:
- coding_prompt.template.md: Update to use feature_get_by_id
- testing_prompt.template.md: Update workflow for pre-assigned features
- prompts.py: Update pre-claimed headers for both agent types

WebSocket (server/websocket.py):
- Simplify tracking with unified [Feature #X] pattern
- Remove testing-specific parsing code

Assistant (server/services/assistant_chat_session.py):
- Update help text with current available tools

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 16:24:48 +02:00
Auto
357083dbae feat: decouple regression testing agents from coding agents
Major refactoring of the parallel orchestrator to run regression testing
agents independently from coding agents. This improves system reliability
and provides better control over testing behavior.

Key changes:

Database & MCP Layer:
- Add testing_in_progress and last_tested_at columns to Feature model
- Add feature_claim_for_testing() for atomic test claim with retry
- Add feature_release_testing() to release claims after testing
- Refactor claim functions to iterative loops (no recursion)
- Add OperationalError retry handling for transient DB errors
- Reduce MAX_CLAIM_RETRIES from 10 to 5

Orchestrator:
- Decouple testing agent lifecycle from coding agents
- Add _maintain_testing_agents() for continuous testing maintenance
- Fix TOCTOU race in _spawn_testing_agent() - hold lock during spawn
- Add _cleanup_stale_testing_locks() with 30-min timeout
- Fix log ordering - start_session() before stale flag cleanup
- Add stale testing_in_progress cleanup on startup

Dead Code Removal:
- Remove count_testing_in_concurrency from entire stack (12+ files)
- Remove ineffective with_for_update() from features router

API & UI:
- Pass testing_agent_ratio via CLI to orchestrator
- Update testing prompt template to use new claim/release tools
- Rename UI label to "Regression Agents" with clearer description
- Add process_utils.py for cross-platform process tree management

Testing agents now:
- Run continuously as long as passing features exist
- Can re-test features multiple times to catch regressions
- Are controlled by fixed count (0-3) via testing_agent_ratio setting
- Have atomic claiming to prevent concurrent testing of same feature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 15:22:48 +02:00
Auto
13128361b0 feat: add dedicated testing agents and enhanced parallel orchestration
Introduce a new testing agent architecture that runs regression tests
independently from coding agents, improving quality assurance in
parallel mode.

Key changes:

Testing Agent System:
- Add testing_prompt.template.md for dedicated testing agent role
- Add feature_mark_failing MCP tool for regression detection
- Add --agent-type flag to select initializer/coding/testing mode
- Remove regression testing from coding prompt (now handled by testing agents)

Parallel Orchestrator Enhancements:
- Add testing agent spawning with configurable ratio (--testing-agent-ratio)
- Add comprehensive debug logging system (DebugLog class)
- Improve database session management to prevent stale reads
- Add engine.dispose() calls to refresh connections after subprocess commits
- Fix f-string linting issues (remove unnecessary f-prefixes)

UI Improvements:
- Add testing agent mascot (Chip) to AgentAvatar
- Enhance AgentCard to display testing agent status
- Add testing agent ratio slider in SettingsModal
- Update WebSocket handling for testing agent updates
- Improve ActivityFeed to show testing agent activity

API & Server Updates:
- Add testing_agent_ratio to settings schema and endpoints
- Update process manager to support testing agent type
- Enhance WebSocket messages for agent_update events

Template Changes:
- Delete coding_prompt_yolo.template.md (consolidated into main prompt)
- Update initializer_prompt.template.md with improved structure
- Streamline coding_prompt.template.md workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 13:49:50 +02:00
Auto
85f6940a54 feat: add concurrent agents with dependency system and delightful UI
Major feature implementation for parallel agent execution with dependency-aware
scheduling and an engaging multi-agent UI experience.

Backend Changes:
- Add parallel_orchestrator.py for concurrent feature processing
- Add api/dependency_resolver.py with cycle detection (Kahn's algorithm + DFS)
- Add atomic feature_claim_next() with retry limit and exponential backoff
- Fix circular dependency check arguments in 4 locations
- Add AgentTracker class for parsing agent output and emitting updates
- Add browser isolation with --isolated flag for Playwright MCP
- Extend WebSocket protocol with agent_update messages and log attribution
- Add WSAgentUpdateMessage schema with agent states and mascot names
- Fix WSProgressMessage to include in_progress field

New UI Components:
- AgentMissionControl: Dashboard showing active agents with collapsible activity
- AgentCard: Individual agent status with avatar and thought bubble
- AgentAvatar: SVG mascots (Spark, Fizz, Octo, Hoot, Buzz) with animations
- ActivityFeed: Recent activity stream with stable keys (no flickering)
- CelebrationOverlay: Confetti animation with click/Escape dismiss
- DependencyGraph: Interactive node graph visualization with dagre layout
- DependencyBadge: Visual indicator for feature dependencies
- ViewToggle: Switch between Kanban and Graph views
- KeyboardShortcutsHelp: Help overlay accessible via ? key

UI/UX Improvements:
- Celebration queue system to handle rapid success messages
- Accessibility attributes on AgentAvatar (role, aria-label, aria-live)
- Collapsible Recent Activity section with persisted preference
- Agent count display in header
- Keyboard shortcut G to toggle Kanban/Graph view
- Real-time thought bubbles and state animations

Bug Fixes:
- Fix circular dependency validation (swapped source/target arguments)
- Add MAX_CLAIM_RETRIES=10 to prevent stack overflow under contention
- Fix THOUGHT_PATTERNS to match actual [Tool: name] format
- Fix ActivityFeed key prop to prevent re-renders on new items
- Add featureId/agentIndex to log messages for proper attribution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 12:59:42 +02:00
Auto
117ca89f08 feat: add configurable CLI command and UI improvements
Add support for alternative CLI commands via CLI_COMMAND environment
variable, allowing users to use CLIs other than 'claude' (e.g., 'glm').
This change affects all server services and the main CLI launcher.

Key changes:

- Configurable CLI command via CLI_COMMAND env var (defaults to 'claude')
- Configurable Playwright headless mode via PLAYWRIGHT_HEADLESS env var
- Pin claude-agent-sdk version to <0.2.0 for stability
- Use tail -500 for progress notes to avoid context overflow
- Add project delete functionality with confirmation dialog
- Replace single-line input with resizable textarea in spec chat
- Add coder agent configuration for code implementation tasks
- Ignore issues/ directory in git

Files modified:
- client.py: CLI command and Playwright headless configuration
- server/main.py, server/services/*: CLI command configuration
- start.py: CLI command configuration and error messages
- .env.example: Document new environment variables
- .gitignore: Ignore issues/ directory
- requirements.txt: Pin SDK version
- .claude/templates/*: Use tail -500 for progress notes
- ui/src/components/ProjectSelector.tsx: Add delete button
- ui/src/components/SpecCreationChat.tsx: Auto-resizing textarea
- ui/src/components/ConfirmDialog.tsx: New reusable dialog
- .claude/agents/coder.md: New coder agent configuration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 13:19:49 +02:00
Auto
05607b310a feat: Add YOLO mode for rapid prototyping without browser testing
Add a new YOLO (You Only Live Once) mode that skips all browser testing
and regression tests for faster feature iteration during prototyping.

Changes made:

**Core YOLO Mode Implementation:**
- Add --yolo CLI flag to autonomous_agent_demo.py
- Update agent.py to accept yolo_mode parameter and select appropriate prompt
- Modify client.py to conditionally include Playwright MCP server (excluded in YOLO mode)
- Add coding_prompt_yolo.template.md with static analysis only verification
- Add get_coding_prompt_yolo() to prompts.py

**Server/API Updates:**
- Add AgentStartRequest schema with yolo_mode field
- Update AgentStatus to include yolo_mode
- Modify process_manager.py to pass --yolo flag to subprocess
- Update agent router to accept yolo_mode in start request

**UI Updates:**
- Add YOLO toggle button (lightning bolt icon) in AgentControl
- Show YOLO mode indicator when agent is running in YOLO mode
- Add useAgentStatus hook to track current mode
- Update startAgent API to accept yoloMode parameter
- Add YOLO toggle in SpecCreationChat completion flow

**Spec Creation Improvements:**
- Fix create-spec.md to properly replace [FEATURE_COUNT] placeholder
- Add REQUIRED FEATURE COUNT section to initializer_prompt.template.md
- Fix spec_chat_session.py to create security settings file for Claude SDK
- Delete app_spec.txt before spec creation to allow fresh creation

**Documentation:**
- Add YOLO mode section to CLAUDE.md with usage examples
- Add checkpoint.md slash command for creating detailed commits

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 08:36:58 +02:00
Auto
981d452134 fix skipping rules 2025-12-31 14:15:04 +02:00
Auto
f1cabd2d49 remove browser code tool 2025-12-30 21:37:23 +02:00
Auto
f180e1933d Add in-progress status tracking for features
Implements feature locking to prevent multiple agent sessions from working
on the same feature simultaneously. This is essential for parallel agent
execution.

Database changes:
- Add `in_progress` boolean column to Feature model
- Add migration function to handle existing databases

MCP Server tools:
- Add `feature_mark_in_progress` - lock feature when starting work
- Add `feature_clear_in_progress` - unlock feature when abandoning
- Update `feature_get_next` to skip in-progress features
- Update `feature_get_stats` to include in_progress count
- Update `feature_mark_passing` and `feature_skip` to clear in_progress

Backend updates:
- Update progress.py to track and display in_progress count
- Update features router to properly categorize in-progress features
- Update WebSocket to broadcast in_progress in progress updates
- Add in_progress to FeatureResponse schema

Frontend updates:
- Add in_progress to TypeScript types (Feature, ProjectStats, WSProgressMessage)
- Update useWebSocket hook to track in_progress state

Prompt template:
- Add instructions for agents to mark features in-progress immediately
- Document new MCP tools in allowed tools section

Also fixes spec_chat_session.py to use absolute project path instead of
relative path for consistency with CLI behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 19:00:49 +02:00
Auto
dd7c1ddd82 init 2025-12-30 11:13:18 +02:00