Set unique PLAYWRIGHT_CLI_SESSION environment variable for each spawned
agent subprocess to prevent concurrent agents from sharing a single
browser instance and interfering with each other's navigation.
- _spawn_coding_agent: session named "coding-{feature_id}"
- _spawn_coding_agent_batch: session named "coding-{primary_id}"
- _spawn_testing_agent: session named "testing-{counter}" using an
incrementing counter (since multiple testing agents can test
overlapping features, feature ID alone isn't sufficient)
Previously, after migrating from Playwright MCP to CLI, all parallel
agents shared the default browser session, causing them to navigate
away from each other's pages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add feature_get_ready, feature_get_blocked, and feature_get_graph to
CODING_AGENT_TOOLS, TESTING_AGENT_TOOLS, and INITIALIZER_AGENT_TOOLS.
These read-only tools were available on the MCP server but blocked by
the allowed_tools lists, causing "blocked/not allowed" errors when
agents tried to query project state.
Fix SettingsModal custom base URL input:
- Remove fallback to current settings value when saving, so empty input
is not silently replaced with the existing URL
- Remove .trim() on the input value to prevent cursor jumping while typing
- Fix "Change" button pre-fill using empty string instead of space
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major changes across 21 files (755 additions, 196 deletions):
Browser Automation Migration:
- Add versioned project migration system (prompts.py) with content-based
detection and section-level regex replacement for coding/testing prompts
- Migrate STEP 5 (browser verification) and BROWSER AUTOMATION sections
in coding prompt template to use playwright-cli commands
- Migrate STEP 2 and AVAILABLE TOOLS sections in testing prompt template
- Migration auto-runs at agent startup (autonomous_agent_demo.py), copies
playwright-cli skill, scaffolds .playwright/cli.config.json, updates
.gitignore, and stamps .migration_version file
- Add playwright-cli command validation to security allowlist (security.py)
with tests for allowed subcommands and blocked eval/run-code
Headless Browser Setting Fix:
- Add _apply_playwright_headless() to process_manager.py that reads/updates
.playwright/cli.config.json before agent subprocess launch
- Remove dead PLAYWRIGHT_HEADLESS env var that was never consumed
- Settings UI toggle now correctly controls visible browser window
Playwright CLI Auto-Install:
- Add ensurePlaywrightCli() to lib/cli.js for npm global entry point
- Add playwright-cli detection + npm install to start.bat, start.sh,
start_ui.bat, start_ui.sh for all startup paths
Other Improvements:
- Add project folder path tooltip to ProjectSelector.tsx dropdown items
- Remove legacy Playwright MCP server configuration from client.py
- Update CLAUDE.md with playwright-cli skill documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Azure Anthropic (Claude)" to API_PROVIDERS in registry.py
with ANTHROPIC_API_KEY auth (required for Claude CLI to route
through custom base URL instead of default Anthropic endpoint)
- Add Azure env var template to .env.example
- Show Base URL input field for Azure provider in Settings UI
with "Configured" state and Azure-specific placeholder
- Widen Settings modal for better readability with long URLs
- Add Azure endpoint detection and "Azure Mode" log label
- Rename misleading "GLM Mode" fallback label to "Alternative API"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address three issues reported after overnight AutoForge runs:
1. ~193GB of .node files in %TEMP% from V8 compile caching
2. Stale npm artifact folders on drive root when %TEMP% fills up
3. PNG screenshot files left in project root by Playwright
Changes:
- Widen .node cleanup glob from ".78912*.node" to ".[0-9a-f]*.node"
to match all V8 compile cache hex prefixes
- Add "node-compile-cache" directory to temp cleanup patterns
- Set NODE_COMPILE_CACHE="" in all subprocess environments (client.py,
parallel_orchestrator.py, process_manager.py) to disable V8 compile
caching at the source
- Add cleanup_project_screenshots() to remove stale .png files from
project directories (feature*-*.png, screenshot-*.png, step-*.png)
- Run cleanup_stale_temp() at server startup in lifespan()
- Add _run_inter_session_cleanup() to orchestrator, called after each
agent completes (both coding and testing paths)
- Update coding and testing prompt templates to instruct agents to use
inline (base64) screenshots only, never saving files to disk
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a draggable resize handle on the left edge of the AI assistant
panel, allowing users to adjust the panel width by clicking and
dragging. Width is persisted to localStorage across sessions.
- Drag handle with hover highlight (border -> primary color)
- Min width 300px, max width 90vw
- Width saved to localStorage under 'assistant-panel-width'
- Cursor changes to col-resize and text selection disabled during drag
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Redesign the header from a single overflowing row into a clean two-row
layout that prevents content from overlapping the logo and bleeding
outside the navbar on smaller screens.
Row 1: Logo + project selector + spacer + mode badges + utility icons
Row 2: Agent controls + dev server + spacer + settings + reset
(only rendered when a project is selected, with a subtle border divider)
Changes:
- App.tsx: Split header into two logical rows with flex spacers for
right-alignment; hide title text below md breakpoint; move mode
badges (Ollama/GLM) to row 1 with sm:hidden for small screens
- ProjectSelector: Responsive min-width (140px mobile, 200px desktop);
truncate long project names instead of pushing icons off-screen
- AgentControl: Responsive gap (gap-2 mobile, gap-4 desktop)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tooltip fixes (PR #177 follow-up):
- Remove duplicate title attr on Settings button that caused double-tooltip
- Restore keyboard shortcut hints in tooltip text: Settings (,), Reset (R)
- Clean up spurious peer markers in package-lock.json
Dev server config dialog:
- Add DevServerConfigDialog component for custom dev commands
- Open config dialog automatically when start fails with "no dev command"
- Add useDevServerConfig/useUpdateDevServerConfig hooks
- Add updateDevServerConfig API function
- Add config gear button next to dev server start
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add interactive multiple-choice question support to the project assistant,
allowing it to present clickable options when clarification is needed.
Backend changes:
- Add ask_user MCP tool to feature_mcp.py with input validation
- Add mcp__features__ask_user to assistant allowed tools list
- Intercept ask_user tool calls in _query_claude() to yield question messages
- Add answer WebSocket message handler in assistant_chat router
- Document ask_user tool in assistant system prompt
Frontend changes:
- Add AssistantChatQuestionMessage type and update server message union
- Add currentQuestions state and sendAnswer() to useAssistantChat hook
- Handle question WebSocket messages by attaching to last assistant message
- Render QuestionOptions component between messages and input area
- Disable text input while structured questions are active
Flow: Claude calls ask_user → backend intercepts → WebSocket question message →
frontend renders QuestionOptions → user clicks options → answer sent back →
Claude receives formatted answer and continues conversation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the custom BOLD_REGEX parser in ChatMessage.tsx with
react-markdown + remark-gfm for proper rendering of headers, tables,
lists, code blocks, blockquotes, links, and horizontal rules in all
chat UIs (AssistantChat, SpecCreationChat, ExpandProjectChat).
Changes:
- Add react-markdown and remark-gfm dependencies
- Add vendor-markdown chunk to Vite manual chunks for code splitting
- Add .chat-prose CSS class with styles for all markdown elements
- Add .chat-prose-user modifier for contrast on primary-colored bubbles
- Replace line-splitting + regex logic with ReactMarkdown component
- Links open in new tabs via custom component override
- System messages remain plain text (unchanged)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When users configured GLM/Ollama/Kimi via the Settings UI, agents still
used Claude because conflicting env vars leaked through subprocess env.
Root cause: get_effective_sdk_env() set ANTHROPIC_AUTH_TOKEN for GLM but
didn't clear ANTHROPIC_API_KEY, which leaked from os.environ. The CLI
prioritized the wrong credential.
Changes:
- registry.py: Clear conflicting auth vars (API_KEY vs AUTH_TOKEN) and
Vertex AI vars when building env for alternative providers
- client.py: Replace manual os.getenv() loop with get_effective_sdk_env()
so agent SDK reads provider settings from the database
- autonomous_agent_demo.py: Apply UI-configured provider settings to
process env so CLI-launched agents also respect Settings UI config
- start.py: Pass --model from settings when launching agent subprocess
- server/schemas.py: Allow non-Claude model names when an alternative
provider is configured (prevents 422 errors for glm-4.7, etc.)
- .env.example: Document env vars for GLM, Ollama, and Kimi providers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove legacy env-var-based provider/mode detection that caused misleading
UI badges (e.g., GLM badge showing when Settings was set to Claude).
Key changes:
- Remove _is_glm_mode() and _is_ollama_mode() env-var sniffing functions
from server/routers/settings.py; derive glm_mode/ollama_mode purely from
the api_provider setting
- Remove `import os` from settings router (no longer needed)
- Update schema comments to reflect settings-based derivation
- Remove "(configured via .env)" from badge tooltips in App.tsx
- Remove Kimi/GLM/Ollama/Playwright-headless sections from .env.example;
add note pointing to Settings UI
- Update CLAUDE.md and README.md documentation to reference Settings UI
for alternative provider configuration
- Update model IDs from claude-opus-4-5-20251101 to claude-opus-4-6
across registry, client, chat sessions, tests, and UI defaults
- Add LEGACY_MODEL_MAP with auto-migration in get_all_settings()
- Show model ID subtitle in SettingsModal model selector
- Add Vertex passthrough test for claude-opus-4-6 (no date suffix)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix model selection regression: _get_settings_defaults() now checks
api_model (set by new provider UI) before falling back to legacy
model setting, ensuring Claude model selection works end-to-end
- Add input validation for provider settings: api_base_url must start
with http:// or https:// (max 500 chars), api_auth_token max 500
chars, api_model max 200 chars
- Fix terminal.py misleading import alias: replace
is_valid_project_name aliased as validate_project_name with direct
is_valid_project_name import across all 5 call sites
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
assistant_chat.py and spec_creation.py imported is_valid_project_name
(returns bool) aliased as validate_project_name. When used as
`project_name = validate_project_name(project_name)`, the project name
was replaced with True, causing "Project not found in registry" errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ensures features stuck from a previous crash are reset before
launching a new agent, not just on stop/crash going forward.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The provider refactor moved env building to get_effective_sdk_env(),
making these imports unused. Fixes ruff F401 lint errors in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
API Provider Selection:
- Add provider switcher in Settings modal (Claude, Kimi, GLM, Ollama, Custom)
- Auth tokens stored locally only (registry.db), never returned by API
- get_effective_sdk_env() builds provider-specific env vars for agent subprocess
- All chat sessions (spec, expand, assistant) use provider settings
- Backward compatible: defaults to Claude, env vars still work as override
Fix Stuck Features:
- Add _cleanup_stale_features() to process_manager.py
- Reset in_progress features when agent stops, crashes, or fails healthcheck
- Prevents features from being permanently stuck after rate limit crashes
- Uses separate SQLAlchemy engine to avoid session conflicts with subprocess
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All WebSocket endpoints now call websocket.accept() before any
validation checks. Previously, closing the connection before accepting
caused Starlette to return an opaque HTTP 403 instead of a meaningful
error message.
Changes:
- Server: Accept WebSocket first, then send JSON error + close with
4xxx code if validation fails (expand, spec, assistant, terminal,
main project WS)
- Server: ConnectionManager.connect() no longer calls accept() to
avoid double-accept
- UI: Gate expand button and keyboard shortcut on hasSpec
- UI: Skip WebSocket reconnection on application error codes (4000-4999)
- UI: Update keyboard shortcuts help text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 5 WebSocket endpoints (expand, spec, assistant, terminal, project)
were closing the connection before calling accept() when validation
failed. Starlette converts pre-accept close into an HTTP 403, giving
clients no meaningful error information.
Server changes:
- Move websocket.accept() before all validation checks in every WS handler
- Send JSON error message before closing so clients get actionable errors
- Fix validate_project_name usage (raises HTTPException, not returns bool)
- ConnectionManager.connect() no longer calls accept() (caller's job)
Client changes:
- All 3 WS hooks (useWebSocket, useExpandChat, useSpecChat) skip
reconnection on 4xxx close codes (application errors won't self-resolve)
- Gate expand button, keyboard shortcut, and modal on hasSpec
- Add hasSpec to useEffect dependency array to prevent stale closure
- Update keyboard shortcuts help text for E key context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR #158 added temp_cleanup.py and its import in autonomous_agent_demo.py
but did not include the file in the package.json "files" array. This
caused ModuleNotFoundError for npm installations since the module was
missing from the published tarball.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address security gaps and improve validation in the dev server command
execution path introduced by PR #153:
Security fixes (critical):
- Add missing shell metacharacters to dangerous_ops blocklist: single &
(Windows cmd.exe command separator), >, <, ^, %, \n, \r
- The single & gap was a confirmed RCE bypass on Windows where .cmd
files are always executed via cmd.exe even with shell=False (CPython
limitation documented in issue #77696)
- Apply validate_custom_command_strict at /start endpoint for
defense-in-depth against config file tampering
Validation improvements:
- Fix uvicorn --flag=value syntax (split on = before comparing)
- Expand Python support: Django (manage.py), Flask, custom .py scripts
- Add runners: flask, poetry, cargo, go, npx
- Expand npm script allowlist: serve, develop, server, preview
- Reorder PATCH /config validation to run strict check first (fail fast)
- Extract constants: ALLOWED_NPM_SCRIPTS, ALLOWED_PYTHON_MODULES,
BLOCKED_SHELLS for reuse and testability
Cleanup:
- Remove unused security.py imports from dev_server_manager.py
- Fix deprecated datetime.utcnow() -> datetime.now(timezone.utc)
- Remove unnecessary _remove_lock() in exception handlers where lock
was never created (Popen failure path)
Tests:
- Add test_devserver_security.py with 78 tests covering valid commands,
blocked shells, blocked commands, injection attempts, dangerous_ops
blocking, and constant verification
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Problem:
When AutoForge runs agents that use Playwright for browser testing or
mongodb-memory-server for database tests, temporary files accumulate in
the system temp folder (%TEMP% on Windows, /tmp on Linux/macOS). These
files are never cleaned up automatically and can consume hundreds of GB
over time.
Affected temp items:
- playwright_firefoxdev_profile-* (browser profiles)
- playwright-artifacts-* (test artifacts)
- playwright-transform-cache
- mongodb-memory-server* (MongoDB binaries)
- ng-* (Angular CLI temp)
- scoped_dir* (Chrome/Chromium temp)
- .78912*.node (Node.js native module cache, ~7MB each)
- claude-*-cwd (Claude CLI working directory files)
- mat-debug-*.log (Material/Angular debug logs)
Solution:
- New temp_cleanup.py module with cleanup_stale_temp() function
- Called at Maestro (orchestrator) startup in autonomous_agent_demo.py
- Only deletes files/folders older than 1 hour (safe for running processes)
- Runs every time the Play button is clicked or agent auto-restarts
- Reports cleanup stats: dirs deleted, files deleted, MB freed
Why cleanup at Maestro startup:
- Reliable hook point (runs on every agent start, including auto-restart
after rate limits which happens every ~5 hours)
- No need for background timers or scheduled tasks
- Cleanup happens before new temp files are created
Testing:
- Tested on Windows with 958 items in temp folder
- Successfully cleaned 45 dirs, 758 files, freed 415 MB
- Files younger than 1 hour correctly preserved
Closes#155
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>