Add feature_get_ready, feature_get_blocked, and feature_get_graph to
CODING_AGENT_TOOLS, TESTING_AGENT_TOOLS, and INITIALIZER_AGENT_TOOLS.
These read-only tools were available on the MCP server but blocked by
the allowed_tools lists, causing "blocked/not allowed" errors when
agents tried to query project state.
Fix SettingsModal custom base URL input:
- Remove fallback to current settings value when saving, so empty input
is not silently replaced with the existing URL
- Remove .trim() on the input value to prevent cursor jumping while typing
- Fix "Change" button pre-fill to use an empty string instead of a space
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major changes across 21 files (755 additions, 196 deletions):
Browser Automation Migration:
- Add versioned project migration system (prompts.py) with content-based
detection and section-level regex replacement for coding/testing prompts
- Migrate STEP 5 (browser verification) and BROWSER AUTOMATION sections
in coding prompt template to use playwright-cli commands
- Migrate STEP 2 and AVAILABLE TOOLS sections in testing prompt template
- Migration auto-runs at agent startup (autonomous_agent_demo.py), copies
playwright-cli skill, scaffolds .playwright/cli.config.json, updates
.gitignore, and stamps .migration_version file
- Add playwright-cli command validation to security allowlist (security.py)
with tests for allowed subcommands and blocked eval/run-code
Headless Browser Setting Fix:
- Add _apply_playwright_headless() to process_manager.py that reads/updates
.playwright/cli.config.json before agent subprocess launch
- Remove dead PLAYWRIGHT_HEADLESS env var that was never consumed
- Settings UI toggle now correctly controls visible browser window
Playwright CLI Auto-Install:
- Add ensurePlaywrightCli() to lib/cli.js for npm global entry point
- Add playwright-cli detection + npm install to start.bat, start.sh,
start_ui.bat, start_ui.sh for all startup paths
Other Improvements:
- Add project folder path tooltip to ProjectSelector.tsx dropdown items
- Remove legacy Playwright MCP server configuration from client.py
- Update CLAUDE.md with playwright-cli skill documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Azure Anthropic (Claude)" to API_PROVIDERS in registry.py
with ANTHROPIC_API_KEY auth (required for Claude CLI to route
through custom base URL instead of default Anthropic endpoint)
- Add Azure env var template to .env.example
- Show Base URL input field for Azure provider in Settings UI
with "Configured" state and Azure-specific placeholder
- Widen Settings modal for better readability with long URLs
- Add Azure endpoint detection and "Azure Mode" log label
- Rename misleading "GLM Mode" fallback label to "Alternative API"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address three issues reported after overnight AutoForge runs:
1. ~193GB of .node files in %TEMP% from V8 compile caching
2. Stale npm artifact folders on drive root when %TEMP% fills up
3. PNG screenshot files left in project root by Playwright
Changes:
- Widen .node cleanup glob from ".78912*.node" to ".[0-9a-f]*.node"
to match all V8 compile cache hex prefixes
- Add "node-compile-cache" directory to temp cleanup patterns
- Set NODE_COMPILE_CACHE="" in all subprocess environments (client.py,
parallel_orchestrator.py, process_manager.py) to disable V8 compile
caching at the source
- Add cleanup_project_screenshots() to remove stale .png files from
project directories (feature*-*.png, screenshot-*.png, step-*.png)
- Run cleanup_stale_temp() at server startup in lifespan()
- Add _run_inter_session_cleanup() to orchestrator, called after each
agent completes (both coding and testing paths)
- Update coding and testing prompt templates to instruct agents to use
inline (base64) screenshots only, never saving files to disk
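
A minimal sketch of why the widened glob matters, using Python's fnmatch
on made-up file names. The helper names (matching, subprocess_env) are
illustrative and are not the functions in temp_cleanup.py.

```python
import os
from fnmatch import fnmatchcase

OLD_PATTERN = ".78912*.node"      # original glob: one fixed prefix
NEW_PATTERN = ".[0-9a-f]*.node"   # widened glob: any leading hex digit


def matching(pattern, names):
    """Return the names that match the glob (fnmatch does not special-case leading dots)."""
    return [name for name in names if fnmatchcase(name, pattern)]


def subprocess_env(base=None):
    """Copy the environment and blank NODE_COMPILE_CACHE, as the commit describes."""
    env = dict(os.environ if base is None else base)
    env["NODE_COMPILE_CACHE"] = ""
    return env


# Hypothetical file names for illustration only.
names = [".78912abc.node", ".3fa9c1d2.node", ".b01d.node", "addon.node"]
old_hits = matching(OLD_PATTERN, names)   # only the ".78912" prefix
new_hits = matching(NEW_PATTERN, names)   # every dot-prefixed hex name
```

The old pattern only catches one prefix, so cache files with any other
hex prefix were never deleted.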
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a draggable resize handle on the left edge of the AI assistant
panel, allowing users to adjust the panel width by clicking and
dragging. Width is persisted to localStorage across sessions.
- Drag handle with hover highlight (border -> primary color)
- Min width 300px, max width 90vw
- Width saved to localStorage under 'assistant-panel-width'
- Cursor changes to col-resize and text selection disabled during drag
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Redesign the header from a single overflowing row into a clean two-row
layout that prevents content from overlapping the logo and bleeding
outside the navbar on smaller screens.
Row 1: Logo + project selector + spacer + mode badges + utility icons
Row 2: Agent controls + dev server + spacer + settings + reset
(only rendered when a project is selected, with a subtle border divider)
Changes:
- App.tsx: Split header into two logical rows with flex spacers for
right-alignment; hide title text below md breakpoint; move mode
badges (Ollama/GLM) to row 1 with sm:hidden for small screens
- ProjectSelector: Responsive min-width (140px mobile, 200px desktop);
truncate long project names instead of pushing icons off-screen
- AgentControl: Responsive gap (gap-2 mobile, gap-4 desktop)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tooltip fixes (PR #177 follow-up):
- Remove duplicate title attr on Settings button that caused double-tooltip
- Restore keyboard shortcut hints in tooltip text: Settings (,), Reset (R)
- Clean up spurious peer markers in package-lock.json
Dev server config dialog:
- Add DevServerConfigDialog component for custom dev commands
- Open config dialog automatically when start fails with "no dev command"
- Add useDevServerConfig/useUpdateDevServerConfig hooks
- Add updateDevServerConfig API function
- Add config gear button next to dev server start
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add interactive multiple-choice question support to the project assistant,
allowing it to present clickable options when clarification is needed.
Backend changes:
- Add ask_user MCP tool to feature_mcp.py with input validation
- Add mcp__features__ask_user to assistant allowed tools list
- Intercept ask_user tool calls in _query_claude() to yield question messages
- Add answer WebSocket message handler in assistant_chat router
- Document ask_user tool in assistant system prompt
Frontend changes:
- Add AssistantChatQuestionMessage type and update server message union
- Add currentQuestions state and sendAnswer() to useAssistantChat hook
- Handle question WebSocket messages by attaching to last assistant message
- Render QuestionOptions component between messages and input area
- Disable text input while structured questions are active
Flow: Claude calls ask_user → backend intercepts → WebSocket question message →
frontend renders QuestionOptions → user clicks options → answer sent back →
Claude receives formatted answer and continues conversation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the custom BOLD_REGEX parser in ChatMessage.tsx with
react-markdown + remark-gfm for proper rendering of headers, tables,
lists, code blocks, blockquotes, links, and horizontal rules in all
chat UIs (AssistantChat, SpecCreationChat, ExpandProjectChat).
Changes:
- Add react-markdown and remark-gfm dependencies
- Add vendor-markdown chunk to Vite manual chunks for code splitting
- Add .chat-prose CSS class with styles for all markdown elements
- Add .chat-prose-user modifier for contrast on primary-colored bubbles
- Replace line-splitting + regex logic with ReactMarkdown component
- Links open in new tabs via custom component override
- System messages remain plain text (unchanged)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When users configured GLM/Ollama/Kimi via the Settings UI, agents still
used Claude because conflicting env vars leaked through subprocess env.
Root cause: get_effective_sdk_env() set ANTHROPIC_AUTH_TOKEN for GLM but
didn't clear ANTHROPIC_API_KEY, which leaked from os.environ. The CLI
prioritized the wrong credential.
Changes:
- registry.py: Clear conflicting auth vars (API_KEY vs AUTH_TOKEN) and
Vertex AI vars when building env for alternative providers
- client.py: Replace manual os.getenv() loop with get_effective_sdk_env()
so agent SDK reads provider settings from the database
- autonomous_agent_demo.py: Apply UI-configured provider settings to
process env so CLI-launched agents also respect Settings UI config
- start.py: Pass --model from settings when launching agent subprocess
- server/schemas.py: Allow non-Claude model names when an alternative
provider is configured (prevents 422 errors for glm-4.7, etc.)
- .env.example: Document env vars for GLM, Ollama, and Kimi providers
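
The credential conflict can be sketched roughly as follows. This is an
illustration of the idea only: build_provider_env is a hypothetical
helper, not the real get_effective_sdk_env(), and the Vertex AI
variables are left out.

```python
def build_provider_env(base_env, provider_vars):
    """Return a copy of base_env with provider vars applied and the
    conflicting Anthropic credential removed."""
    env = dict(base_env)          # never mutate the caller's mapping
    env.update(provider_vars)
    if "ANTHROPIC_AUTH_TOKEN" in provider_vars:
        # Provider uses a bearer token: drop any inherited API key.
        env.pop("ANTHROPIC_API_KEY", None)
    elif "ANTHROPIC_API_KEY" in provider_vars:
        # Provider uses an API key: drop any inherited bearer token.
        env.pop("ANTHROPIC_AUTH_TOKEN", None)
    return env


# Placeholder values; the inherited key stands in for what leaked from os.environ.
inherited = {"ANTHROPIC_API_KEY": "sk-inherited", "PATH": "/usr/bin"}
glm_env = build_provider_env(inherited, {"ANTHROPIC_AUTH_TOKEN": "glm-token"})
```

Without the pop, both credentials reach the subprocess and the CLI picks
the inherited one.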
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove legacy env-var-based provider/mode detection that caused misleading
UI badges (e.g., GLM badge showing when Settings was set to Claude).
Key changes:
- Remove _is_glm_mode() and _is_ollama_mode() env-var sniffing functions
from server/routers/settings.py; derive glm_mode/ollama_mode purely from
the api_provider setting
- Remove `import os` from settings router (no longer needed)
- Update schema comments to reflect settings-based derivation
- Remove "(configured via .env)" from badge tooltips in App.tsx
- Remove Kimi/GLM/Ollama/Playwright-headless sections from .env.example;
add note pointing to Settings UI
- Update CLAUDE.md and README.md documentation to reference Settings UI
for alternative provider configuration
- Update model IDs from claude-opus-4-5-20251101 to claude-opus-4-6
across registry, client, chat sessions, tests, and UI defaults
- Add LEGACY_MODEL_MAP with auto-migration in get_all_settings()
- Show model ID subtitle in SettingsModal model selector
- Add Vertex passthrough test for claude-opus-4-6 (no date suffix)
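
A rough sketch of how a legacy-model map with auto-migration could work.
The single mapping entry comes from the model ID change above; the
function name and the setting keys ("model", "api_model") are
assumptions, and the real get_all_settings() may differ.

```python
LEGACY_MODEL_MAP = {
    "claude-opus-4-5-20251101": "claude-opus-4-6",
}


def migrate_model_settings(settings, keys=("model", "api_model")):
    """Return a copy of settings with legacy model IDs replaced.

    The key names are assumed for illustration.
    """
    migrated = dict(settings)
    for key in keys:
        value = migrated.get(key)
        if value in LEGACY_MODEL_MAP:
            migrated[key] = LEGACY_MODEL_MAP[value]
    return migrated


stored = {"model": "claude-opus-4-5-20251101", "api_provider": "claude"}
current = migrate_model_settings(stored)
```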
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix model selection regression: _get_settings_defaults() now checks
api_model (set by new provider UI) before falling back to legacy
model setting, ensuring Claude model selection works end-to-end
- Add input validation for provider settings: api_base_url must start
with http:// or https:// (max 500 chars), api_auth_token max 500
chars, api_model max 200 chars
- Fix terminal.py misleading import alias: replace
is_valid_project_name aliased as validate_project_name with direct
is_valid_project_name import across all 5 call sites
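
The provider-setting limits above could be checked along these lines.
This is a plain-Python sketch with a hypothetical function name; the
project's actual schema validation is not shown in this message.

```python
MAX_LENGTHS = {"api_base_url": 500, "api_auth_token": 500, "api_model": 200}


def validate_provider_settings(api_base_url=None, api_auth_token=None, api_model=None):
    """Return a list of error strings; an empty list means the input is valid."""
    errors = []
    if api_base_url is not None and not api_base_url.startswith(("http://", "https://")):
        errors.append("api_base_url must start with http:// or https://")
    values = {
        "api_base_url": api_base_url,
        "api_auth_token": api_auth_token,
        "api_model": api_model,
    }
    for name, value in values.items():
        if value is not None and len(value) > MAX_LENGTHS[name]:
            errors.append(f"{name} must be at most {MAX_LENGTHS[name]} characters")
    return errors
```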
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
assistant_chat.py and spec_creation.py imported is_valid_project_name
(returns bool) aliased as validate_project_name. When used as
`project_name = validate_project_name(project_name)`, the project name
was replaced with True, causing "Project not found in registry" errors.
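
The failure mode can be reproduced in a few lines. The validator body
below is a stand-in with an assumed regex; only the alias pattern
mirrors the bug described above.

```python
import re


def is_valid_project_name(name):
    """Stand-in validator: returns a bool, never the name itself."""
    return bool(re.fullmatch(r"[A-Za-z0-9_-]{1,50}", name))


# Misleading alias: the name suggests it returns a validated string.
validate_project_name = is_valid_project_name

project_name = "my-app"
clobbered = validate_project_name(project_name)  # a bool, not "my-app"


def checked_name(name):
    """Use the bool as a guard and keep the original string."""
    if not is_valid_project_name(name):
        raise ValueError(f"invalid project name: {name!r}")
    return name
```

Assigning the result back to project_name is what turned the registry
lookup key into True.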
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ensures features stuck from a previous crash are reset before
launching a new agent, not just on stop/crash going forward.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The provider refactor moved env building to get_effective_sdk_env(),
making these imports unused. Fixes ruff F401 lint errors in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
API Provider Selection:
- Add provider switcher in Settings modal (Claude, Kimi, GLM, Ollama, Custom)
- Auth tokens stored locally only (registry.db), never returned by API
- get_effective_sdk_env() builds provider-specific env vars for agent subprocess
- All chat sessions (spec, expand, assistant) use provider settings
- Backward compatible: defaults to Claude, env vars still work as override
Fix Stuck Features:
- Add _cleanup_stale_features() to process_manager.py
- Reset in_progress features when agent stops, crashes, or fails healthcheck
- Prevents features from being permanently stuck after rate limit crashes
- Uses separate SQLAlchemy engine to avoid session conflicts with subprocess
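
A minimal sketch of the reset step, using the stdlib sqlite3 module in
place of the separate SQLAlchemy engine the commit uses. The table and
column names (features, in_progress) are assumptions, not the real
schema.

```python
import sqlite3


def reset_stale_features(conn):
    """Clear the in_progress flag on every feature and return how many rows changed."""
    cursor = conn.execute("UPDATE features SET in_progress = 0 WHERE in_progress = 1")
    conn.commit()
    return cursor.rowcount


# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT, in_progress INTEGER)")
conn.executemany(
    "INSERT INTO features (name, in_progress) VALUES (?, ?)",
    [("login", 1), ("search", 0), ("export", 1)],
)
reset_count = reset_stale_features(conn)
remaining = conn.execute("SELECT COUNT(*) FROM features WHERE in_progress = 1").fetchone()[0]
```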
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All WebSocket endpoints now call websocket.accept() before any
validation checks. Previously, closing the connection before accepting
caused Starlette to return an opaque HTTP 403 instead of a meaningful
error message.
Changes:
- Server: Accept WebSocket first, then send JSON error + close with
4xxx code if validation fails (expand, spec, assistant, terminal,
main project WS)
- Server: ConnectionManager.connect() no longer calls accept() to
avoid double-accept
- UI: Gate expand button and keyboard shortcut on hasSpec
- UI: Skip WebSocket reconnection on application error codes (4000-4999)
- UI: Update keyboard shortcuts help text
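
The accept-first ordering can be sketched without a server. FakeWebSocket
is a stand-in that records calls to the three coroutine methods the
handler needs (accept, send_json, close); the close code 4004, the error
payload shape, and the function names are assumptions.

```python
import asyncio


class FakeWebSocket:
    """Records calls so the ordering can be inspected."""

    def __init__(self):
        self.events = []

    async def accept(self):
        self.events.append(("accept",))

    async def send_json(self, data):
        self.events.append(("send_json", data))

    async def close(self, code=1000):
        self.events.append(("close", code))


async def handle(ws, project_name, known_projects):
    """Accept first, then validate, so the client sees a real error."""
    await ws.accept()
    if project_name not in known_projects:
        await ws.send_json({"type": "error", "content": "Project not found"})
        await ws.close(code=4004)  # assumed code in the 4000-4999 range
        return False
    return True


def should_reconnect(close_code):
    """Client-side rule: application errors (4000-4999) will not self-resolve."""
    return not 4000 <= close_code <= 4999


ws = FakeWebSocket()
ok = asyncio.run(handle(ws, "missing", {"demo"}))
```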
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 5 WebSocket endpoints (expand, spec, assistant, terminal, project)
were closing the connection before calling accept() when validation
failed. Starlette converts pre-accept close into an HTTP 403, giving
clients no meaningful error information.
Server changes:
- Move websocket.accept() before all validation checks in every WS handler
- Send JSON error message before closing so clients get actionable errors
- Fix validate_project_name usage (it raises HTTPException rather than
returning a bool)
- ConnectionManager.connect() no longer calls accept() (caller's job)
Client changes:
- All 3 WS hooks (useWebSocket, useExpandChat, useSpecChat) skip
reconnection on 4xxx close codes (application errors won't self-resolve)
- Gate expand button, keyboard shortcut, and modal on hasSpec
- Add hasSpec to useEffect dependency array to prevent stale closure
- Update keyboard shortcuts help text for E key context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR #158 added temp_cleanup.py and its import in autonomous_agent_demo.py
but did not include the file in the package.json "files" array. This
caused ModuleNotFoundError for npm installations since the module was
missing from the published tarball.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address security gaps and improve validation in the dev server command
execution path introduced by PR #153:
Security fixes (critical):
- Add missing shell metacharacters to dangerous_ops blocklist: single &
(Windows cmd.exe command separator), >, <, ^, %, \n, \r
- The single & gap was a confirmed RCE bypass on Windows where .cmd
files are always executed via cmd.exe even with shell=False (CPython
limitation documented in issue #77696)
- Apply validate_custom_command_strict at /start endpoint for
defense-in-depth against config file tampering
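
A sketch of the substring check for the newly added characters. The
tuple holds only the additions listed above, not the full dangerous_ops
blocklist, and find_dangerous_ops is a hypothetical helper name.

```python
# Only the characters this commit adds; the pre-existing entries are omitted.
ADDED_DANGEROUS_OPS = ("&", ">", "<", "^", "%", "\n", "\r")


def find_dangerous_ops(command):
    """Return the blocked characters present in the command, in blocklist order."""
    return [op for op in ADDED_DANGEROUS_OPS if op in command]


# Hypothetical commands for illustration.
chained = find_dangerous_ops("npm run dev & calc.exe")
redirected = find_dangerous_ops("echo %PATH% > out.txt")
clean = find_dangerous_ops("npm run dev")
```

A non-empty result means the command should be rejected before it
reaches Popen.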
Validation improvements:
- Fix uvicorn --flag=value syntax (split on = before comparing)
- Expand Python support: Django (manage.py), Flask, custom .py scripts
- Add runners: flask, poetry, cargo, go, npx
- Expand npm script allowlist: serve, develop, server, preview
- Reorder PATCH /config validation to run strict check first (fail fast)
- Extract constants: ALLOWED_NPM_SCRIPTS, ALLOWED_PYTHON_MODULES,
BLOCKED_SHELLS for reuse and testability
Cleanup:
- Remove unused security.py imports from dev_server_manager.py
- Fix deprecated datetime.utcnow() -> datetime.now(timezone.utc)
- Remove unnecessary _remove_lock() in exception handlers where lock
was never created (Popen failure path)
Tests:
- Add test_devserver_security.py with 78 tests covering valid commands,
blocked shells, blocked commands, injection attempts, dangerous_ops
blocking, and constant verification
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Problem:
When AutoForge runs agents that use Playwright for browser testing or
mongodb-memory-server for database tests, temporary files accumulate in
the system temp folder (%TEMP% on Windows, /tmp on Linux/macOS). These
files are never cleaned up automatically and can consume hundreds of GB
over time.
Affected temp items:
- playwright_firefoxdev_profile-* (browser profiles)
- playwright-artifacts-* (test artifacts)
- playwright-transform-cache
- mongodb-memory-server* (MongoDB binaries)
- ng-* (Angular CLI temp)
- scoped_dir* (Chrome/Chromium temp)
- .78912*.node (Node.js native module cache, ~7MB each)
- claude-*-cwd (Claude CLI working directory files)
- mat-debug-*.log (Material/Angular debug logs)
Solution:
- New temp_cleanup.py module with cleanup_stale_temp() function
- Called at Maestro (orchestrator) startup in autonomous_agent_demo.py
- Only deletes files/folders older than 1 hour (safe for running processes)
- Runs every time the Play button is clicked or agent auto-restarts
- Reports cleanup stats: dirs deleted, files deleted, MB freed
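
A simplified sketch of the age-gated cleanup. It uses a subset of the
patterns listed above and is not the actual temp_cleanup.py; the
function name, signature, and stats keys are assumptions.

```python
import fnmatch
import shutil
import time
from pathlib import Path

STALE_PATTERNS = ("playwright-artifacts-*", "mongodb-memory-server*", "scoped_dir*")
MAX_AGE_SECONDS = 3600  # only touch entries older than 1 hour


def cleanup_stale(temp_dir, patterns=STALE_PATTERNS, max_age=MAX_AGE_SECONDS, now=None):
    """Delete matching top-level entries older than max_age and return stats."""
    now = time.time() if now is None else now
    stats = {"dirs": 0, "files": 0, "bytes": 0}
    for entry in list(Path(temp_dir).iterdir()):
        if not any(fnmatch.fnmatchcase(entry.name, p) for p in patterns):
            continue
        try:
            # Age is judged by the top-level entry's own mtime.
            if now - entry.stat().st_mtime < max_age:
                continue
            if entry.is_dir():
                size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
                shutil.rmtree(entry)
                stats["dirs"] += 1
            else:
                size = entry.stat().st_size
                entry.unlink()
                stats["files"] += 1
            stats["bytes"] += size
        except OSError:
            # Entry is locked or vanished mid-scan: skip it.
            continue
    return stats
```

Skipping on OSError keeps one locked file from aborting the whole pass.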
Why cleanup at Maestro startup:
- Reliable hook point (runs on every agent start, including auto-restart
after rate limits which happens every ~5 hours)
- No need for background timers or scheduled tasks
- Cleanup happens before new temp files are created
Testing:
- Tested on Windows with 958 items in temp folder
- Successfully cleaned 45 dirs, 758 files, freed 415 MB
- Files younger than 1 hour correctly preserved
Closes #155
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>