When users configured GLM/Ollama/Kimi via the Settings UI, agents still
used Claude because conflicting env vars leaked through subprocess env.
Root cause: get_effective_sdk_env() set ANTHROPIC_AUTH_TOKEN for GLM but
didn't clear ANTHROPIC_API_KEY, which leaked from os.environ. The CLI
prioritized the wrong credential.
Changes:
- registry.py: Clear conflicting auth vars (API_KEY vs AUTH_TOKEN) and
Vertex AI vars when building env for alternative providers
- client.py: Replace manual os.getenv() loop with get_effective_sdk_env()
so agent SDK reads provider settings from the database
- autonomous_agent_demo.py: Apply UI-configured provider settings to
process env so CLI-launched agents also respect Settings UI config
- start.py: Pass --model from settings when launching agent subprocess
- server/schemas.py: Allow non-Claude model names when an alternative
provider is configured (prevents 422 errors for glm-4.7, etc.)
- .env.example: Document env vars for GLM, Ollama, and Kimi providers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove legacy env-var-based provider/mode detection that caused misleading
UI badges (e.g., GLM badge showing when Settings was set to Claude).
Key changes:
- Remove _is_glm_mode() and _is_ollama_mode() env-var sniffing functions
from server/routers/settings.py; derive glm_mode/ollama_mode purely from
the api_provider setting
- Remove `import os` from settings router (no longer needed)
- Update schema comments to reflect settings-based derivation
- Remove "(configured via .env)" from badge tooltips in App.tsx
- Remove Kimi/GLM/Ollama/Playwright-headless sections from .env.example;
add note pointing to Settings UI
- Update CLAUDE.md and README.md documentation to reference Settings UI
for alternative provider configuration
- Update model IDs from claude-opus-4-5-20251101 to claude-opus-4-6
across registry, client, chat sessions, tests, and UI defaults
- Add LEGACY_MODEL_MAP with auto-migration in get_all_settings()
- Show model ID subtitle in SettingsModal model selector
- Add Vertex passthrough test for claude-opus-4-6 (no date suffix)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix model selection regression: _get_settings_defaults() now checks
api_model (set by new provider UI) before falling back to legacy
model setting, ensuring Claude model selection works end-to-end
- Add input validation for provider settings: api_base_url must start
with http:// or https:// (max 500 chars), api_auth_token max 500
chars, api_model max 200 chars
- Fix terminal.py misleading import alias: replace
is_valid_project_name aliased as validate_project_name with direct
is_valid_project_name import across all 5 call sites
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
assistant_chat.py and spec_creation.py imported is_valid_project_name
(returns bool) aliased as validate_project_name. When used as
`project_name = validate_project_name(project_name)`, the project name
was replaced with True, causing "Project not found in registry" errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ensures features stuck from a previous crash are reset before
launching a new agent, not just on stop/crash going forward.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The provider refactor moved env building to get_effective_sdk_env(),
making these imports unused. Fixes ruff F401 lint errors in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
API Provider Selection:
- Add provider switcher in Settings modal (Claude, Kimi, GLM, Ollama, Custom)
- Auth tokens stored locally only (registry.db), never returned by API
- get_effective_sdk_env() builds provider-specific env vars for agent subprocess
- All chat sessions (spec, expand, assistant) use provider settings
- Backward compatible: defaults to Claude, env vars still work as override
Fix Stuck Features:
- Add _cleanup_stale_features() to process_manager.py
- Reset in_progress features when agent stops, crashes, or fails healthcheck
- Prevents features from being permanently stuck after rate limit crashes
- Uses separate SQLAlchemy engine to avoid session conflicts with subprocess
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All WebSocket endpoints now call websocket.accept() before any
validation checks. Previously, closing the connection before accepting
caused Starlette to return an opaque HTTP 403 instead of a meaningful
error message.
Changes:
- Server: Accept WebSocket first, then send JSON error + close with
4xxx code if validation fails (expand, spec, assistant, terminal,
main project WS)
- Server: ConnectionManager.connect() no longer calls accept() to
avoid double-accept
- UI: Gate expand button and keyboard shortcut on hasSpec
- UI: Skip WebSocket reconnection on application error codes (4000-4999)
- UI: Update keyboard shortcuts help text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 5 WebSocket endpoints (expand, spec, assistant, terminal, project)
were closing the connection before calling accept() when validation
failed. Starlette converts pre-accept close into an HTTP 403, giving
clients no meaningful error information.
Server changes:
- Move websocket.accept() before all validation checks in every WS handler
- Send JSON error message before closing so clients get actionable errors
- Fix validate_project_name usage (raises HTTPException, not returns bool)
- ConnectionManager.connect() no longer calls accept() (caller's job)
Client changes:
- All 3 WS hooks (useWebSocket, useExpandChat, useSpecChat) skip
reconnection on 4xxx close codes (application errors won't self-resolve)
- Gate expand button, keyboard shortcut, and modal on hasSpec
- Add hasSpec to useEffect dependency array to prevent stale closure
- Update keyboard shortcuts help text for E key context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR #158 added temp_cleanup.py and its import in autonomous_agent_demo.py
but did not include the file in the package.json "files" array. This
caused ModuleNotFoundError for npm installations since the module was
missing from the published tarball.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address security gaps and improve validation in the dev server command
execution path introduced by PR #153:
Security fixes (critical):
- Add missing shell metacharacters to dangerous_ops blocklist: single &
(Windows cmd.exe command separator), >, <, ^, %, \n, \r
- The single & gap was a confirmed RCE bypass on Windows where .cmd
files are always executed via cmd.exe even with shell=False (CPython
limitation documented in issue #77696)
- Apply validate_custom_command_strict at /start endpoint for
defense-in-depth against config file tampering
Validation improvements:
- Fix uvicorn --flag=value syntax (split on = before comparing)
- Expand Python support: Django (manage.py), Flask, custom .py scripts
- Add runners: flask, poetry, cargo, go, npx
- Expand npm script allowlist: serve, develop, server, preview
- Reorder PATCH /config validation to run strict check first (fail fast)
- Extract constants: ALLOWED_NPM_SCRIPTS, ALLOWED_PYTHON_MODULES,
BLOCKED_SHELLS for reuse and testability
Cleanup:
- Remove unused security.py imports from dev_server_manager.py
- Fix deprecated datetime.utcnow() -> datetime.now(timezone.utc)
- Remove unnecessary _remove_lock() in exception handlers where lock
was never created (Popen failure path)
Tests:
- Add test_devserver_security.py with 78 tests covering valid commands,
blocked shells, blocked commands, injection attempts, dangerous_ops
blocking, and constant verification
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Problem:
When AutoForge runs agents that use Playwright for browser testing or
mongodb-memory-server for database tests, temporary files accumulate in
the system temp folder (%TEMP% on Windows, /tmp on Linux/macOS). These
files are never cleaned up automatically and can consume hundreds of GB
over time.
Affected temp items:
- playwright_firefoxdev_profile-* (browser profiles)
- playwright-artifacts-* (test artifacts)
- playwright-transform-cache
- mongodb-memory-server* (MongoDB binaries)
- ng-* (Angular CLI temp)
- scoped_dir* (Chrome/Chromium temp)
- .78912*.node (Node.js native module cache, ~7MB each)
- claude-*-cwd (Claude CLI working directory files)
- mat-debug-*.log (Material/Angular debug logs)
Solution:
- New temp_cleanup.py module with cleanup_stale_temp() function
- Called at Maestro (orchestrator) startup in autonomous_agent_demo.py
- Only deletes files/folders older than 1 hour (safe for running processes)
- Runs every time the Play button is clicked or agent auto-restarts
- Reports cleanup stats: dirs deleted, files deleted, MB freed
Why cleanup at Maestro startup:
- Reliable hook point (runs on every agent start, including auto-restart
after rate limits which happens every ~5 hours)
- No need for background timers or scheduled tasks
- Cleanup happens before new temp files are created
Testing:
- Tested on Windows with 958 items in temp folder
- Successfully cleaned 45 dirs, 758 files, freed 415 MB
- Files younger than 1 hour correctly preserved
Closes#155
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove embedded documentation system (18 files) from main UI:
- Delete ui/src/components/docs/ (DocsPage, DocsContent, DocsSidebar,
DocsSearch, docsData, and all 13 section components)
- Delete ui/src/hooks/useHashRoute.ts (only used for docs routing)
- Simplify ui/src/main.tsx: remove Router component, render App directly
inside QueryClientProvider (no more hash-based routing)
- Update docs button in App.tsx header to open https://autoforge.cc in
a new tab instead of navigating to #/docs hash route
- Add logo to header
- Add temp-docs/ to .gitignore
- Update CLAUDE.md with current architecture documentation
The documentation has been extracted into a separate repository and
deployed as a standalone Vite + React site at https://autoforge.cc.
This reduces the main UI bundle and decouples docs from app releases.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a Node.js CLI wrapper that allows installing AutoForge globally via
`npm install -g autoforge-ai` and running it with a single `autoforge`
command. The CLI handles Python detection, venv management, config
loading, and uvicorn server lifecycle automatically.
New files:
- package.json: npm package config with bin entry, files whitelist,
and prepublishOnly script that builds the UI
- bin/autoforge.js: thin entry point that imports lib/cli.js
- lib/cli.js: main CLI module (~790 lines) with cross-platform Python
3.11+ detection, composite venv marker for smart invalidation
(requirements hash + Python version + path), .env config management
at ~/.autoforge/.env, server startup with PID file and port detection,
and signal handling with process tree cleanup
- requirements-prod.txt: runtime-only deps (excludes ruff, mypy, pytest)
- .npmignore: excludes dev files, tests, __pycache__, UI source
Modified files:
- ui/package.json: rename to autoforge-ui to avoid confusion with root
- .gitignore: add *.tgz for npm pack output
- README.md: add npm install as primary quick start method, document
CLI commands, add Ollama/Vertex AI config sections, new troubleshooting
entries for Python/venv issues
- GettingStarted.tsx: add Installation, Quick Start, and CLI Commands
sections to in-app documentation with command reference table
- docsData.ts: add installation and cli-commands sidebar entries
Published as autoforge-ai@0.1.0 on npm.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add max-h-[calc(100vh-2rem)] and overflow-y-auto to the shared
DialogContent component so modals scroll vertically when their
content exceeds the viewport height (e.g., Settings modal when
browser is zoomed in).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Complete project rebrand from AutoCoder to AutoForge, touching 62 files
across Python backend, FastAPI server, React UI, documentation, config,
and CI/CD.
Key changes:
- Rename autocoder_paths.py -> autoforge_paths.py with backward-compat
migration from .autocoder/ -> .autoforge/ directories
- Update registry.py to migrate ~/.autocoder/ -> ~/.autoforge/ global
config directory with fallback support
- Update security.py with fallback reads from legacy .autocoder/ paths
- Rename .claude/commands and skills from gsd-to-autocoder-spec to
gsd-to-autoforge-spec
- Update all Python modules: client, prompts, progress, agent,
orchestrator, server routers and services
- Update React UI: package.json name, index.html title, localStorage
keys, all documentation sections, component references
- Update start scripts (bat/sh/py), examples, and .env.example
- Update CLAUDE.md and README.md with new branding and paths
- Update test files for new .autoforge/ directory structure
- Transfer git remote from leonvanzyl/autocoder to
AutoForgeAI/autoforge
Backward compatibility preserved: legacy .autocoder/ directories are
auto-detected and migrated on next agent start. Config fallback chain
checks both new and old paths.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The parallel orchestrator was using AUTOCODER_ROOT as the working
directory when spawning coding, batch, and testing agent subprocesses.
This caused the Claude Code CLI to create .claude/ and .claude_worktrees/
directories in the autocoder installation folder instead of the project
directory, scattering output files across multiple locations.
Changed all 3 subprocess spawn sites (coding agent, batch agent, testing
agent) to use self.project_dir as cwd, matching the behavior of the
server's process_manager.py. The subprocess commands already use absolute
paths to autonomous_agent_demo.py, so Python imports are unaffected.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The output reader was stamping every line with the primary feature ID
(e.g., [Feature #24]) even after the agent claimed a new feature in the
batch. Now parses feature_claim_and_get calls in the output stream and
switches the prefix to the newly claimed feature ID, so logs correctly
show [Feature #30] once the agent moves on.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable the orchestrator to assign 1-3 features per coding agent subprocess,
selected via dependency chain extension + same-category fill. This reduces
cold-start overhead and leverages shared context across related features.
Orchestrator (parallel_orchestrator.py):
- Add batch tracking: _batch_features and _feature_to_primary data structures
- Add build_feature_batches() with dependency chain + category fill algorithm
- Add start_feature_batch() and _spawn_coding_agent_batch() methods
- Update _on_agent_complete() for batch cleanup across all features
- Update stop_feature() with _feature_to_primary lookup
- Update get_ready_features() to exclude all batch feature IDs
- Update main loop to build batches then spawn per available slot
CLI and agent layer:
- Add --feature-ids (comma-separated) and --batch-size CLI args
- Add feature_ids parameter to run_autonomous_agent() with batch prompt selection
- Add get_batch_feature_prompt() with sequential workflow instructions
WebSocket layer (server/websocket.py):
- Add BATCH_CODING_AGENT_START_PATTERN and BATCH_FEATURES_COMPLETE_PATTERN
- Add _handle_batch_agent_start() and _handle_batch_agent_complete() methods
- Add featureIds field to all agent_update messages
- Track current_feature_id updates as agent moves through batch
Frontend (React UI):
- Add featureIds to ActiveAgent and WSAgentUpdateMessage types
- Update KanbanColumn and DependencyGraph agent-feature maps for batch
- Update AgentCard to show "Batch: #X, #Y, #Z" with active feature highlight
- Add "Features per Agent" segmented control (1-3) in SettingsModal
Settings integration (full stack):
- Add batch_size to schemas, settings router, agent router, process manager
- Default batch_size=3, user-configurable 1-3 via settings UI
- batch_size=1 is functionally identical to pre-batching behavior
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Redesign ProgressDashboard from tall stacked layout to compact inline:
title/badge left, passing/total right, progress bar with percentage below
- Absorb AgentThought functionality directly into ProgressDashboard,
showing the agent's current thought below the progress bar
- Remove standalone AgentThought usage from App.tsx (component now unused)
- Pass logs/agentStatus to ProgressDashboard in single-agent mode only
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace the PLAYWRIGHT_HEADLESS environment variable with a global
setting toggle in the Settings modal. The setting is persisted in the
registry DB and injected as an env var into agent subprocesses, so
client.py reads it unchanged.
Backend:
- Add playwright_headless field to SettingsResponse/SettingsUpdate schemas
- Read/write the setting in settings router via existing _parse_bool helper
- Pass playwright_headless from agent router through to process manager
- Inject PLAYWRIGHT_HEADLESS env var into subprocess environment
Frontend:
- Add playwright_headless to Settings/SettingsUpdate TypeScript types
- Add "Headless Browser" Switch toggle below YOLO mode in SettingsModal
- Add default value to DEFAULT_SETTINGS in useProjects
Also fix CSS build warning: change @import url("tw-animate-css") to bare
@import "tw-animate-css" so Tailwind v4 inlines it during compilation
instead of leaving it for Vite/Lightning CSS post-processing.
Remove stale summary.md from previous refactoring session.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add centralized path resolution module (autocoder_paths.py) that
consolidates all autocoder-generated file paths behind a dual-path
strategy: check .autocoder/X first, fall back to root-level X for
backward compatibility, default to .autocoder/X for new projects.
Key changes:
- New autocoder_paths.py with dual-path resolution for features.db,
assistant.db, lock files, settings, prompts dir, and progress cache
- migrate_project_layout() safely moves old-layout projects to new
layout with SQLite WAL flush and integrity verification
- Updated 22 files to delegate path construction to autocoder_paths
- Reset/delete logic cleans both old and new file locations
- Orphan lock cleanup checks both locations per project
- Migration called automatically at agent start in autonomous_agent_demo.py
- Updated markdown commands/skills to reference .autocoder/prompts/
- CLAUDE.md documentation updated with new project structure
Files at project root that remain unchanged:
- CLAUDE.md (Claude SDK reads from cwd via setting_sources=["project"])
- app_spec.txt root copy (agent templates reference it via cat)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Narrow `\boverloaded\b` regex to require server/api/system context,
preventing false positives when Claude discusses method/operator
overloading in OOP code (C++, Java, C#, etc.)
- Restore 24-hour cap for absolute reset-time delays instead of 1-hour
clamp, avoiding unnecessary retry loops when rate limits reset hours
in the future
- Add test for Retry-After: 0 returning 0 (regression lock for the
`is not None` fix)
- Add false positive tests for "overloaded" in programming context
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The onComplete handler in the empty Kanban spec creation flow only
logged errors to console.error, leaving users with no feedback when
the agent failed to start. This wires up the SpecCreationChat
component's built-in error UI (spinner, error banner, retry button)
via initializerStatus, initializerError, and onRetryInitializer props.
Changes:
- Add InitializerStatus type and specInitializerStatus/Error state
- Set status to 'starting' before startAgent call (shows spinner)
- On error, keep spec chat open so the error banner is visible
- On success, close chat and refresh queries (same as before)
- Wire up onRetryInitializer to reset state to idle
- Reset initializer status on cancel/exit for clean re-entry
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>