fix: address PR #184 review findings for blocked-for-human-input feature

A) Graph view: add needs_human_input bucket to handleGraphNodeClick so clicking blocked nodes opens the feature modal
B) MCP validation: validate field type enum, require options for select, enforce unique non-empty field IDs and labels
C) Progress fallback: include needs_human_input in non-WebSocket total
D) WebSocket: track needs_human_input count in progress state
E) Cleanup guard: remove unnecessary needs_human_input check in _cleanup_stale_features (resolved via merge conflict)
F) Defensive SQL: require in_progress=1 in feature_request_human_input

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
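
For readers without the full diff at hand, item B can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: the function name `validate_fields`, the dict-based field shape, and the members of `ALLOWED_FIELD_TYPES` are all assumptions made for illustration. The commit message is also ambiguous about whether labels must be unique or only non-empty; the sketch assumes unique, non-empty IDs and non-empty labels.

```python
from typing import Any

# Hypothetical enum members; the commit message does not list the real ones.
ALLOWED_FIELD_TYPES = {"text", "textarea", "select", "boolean"}


def validate_fields(fields: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors: list[str] = []
    seen_ids: set[str] = set()

    for index, field in enumerate(fields):
        field_id = field.get("id")
        label = field.get("label")
        field_type = field.get("type")

        # IDs must be non-empty strings and unique across the form.
        if not isinstance(field_id, str) or not field_id.strip():
            errors.append(f"field {index}: id must be a non-empty string")
        elif field_id in seen_ids:
            errors.append(f"field {index}: duplicate id {field_id!r}")
        else:
            seen_ids.add(field_id)

        # Labels must be non-empty strings (uniqueness is not assumed here).
        if not isinstance(label, str) or not label.strip():
            errors.append(f"field {index}: label must be a non-empty string")

        # The type must be one of the allowed enum values.
        if field_type not in ALLOWED_FIELD_TYPES:
            errors.append(f"field {index}: unknown type {field_type!r}")

        # A select field is meaningless without at least one option.
        if field_type == "select" and not field.get("options"):
            errors.append(f"field {index}: select requires options")

    return errors
```

Collecting every error rather than raising on the first one lets the caller report all problems with a request in a single response.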
Merge remote-tracking branch 'origin/master' into feature/blocked-for-human-input
2026-03-21 21:03:08 +00:00 · 2026-02-12 07:36:48 +02:00 · 2026-02-12 07:36:11 +02:00 · 2026-02-12 07:28:37 +02:00 · 2026-02-12 07:22:01 +02:00 · 2026-02-11 18:48:44 +02:00
173 changed files with 19974 additions and 8337 deletions
--- a/.claude/agents/coder.md
+++ b/.claude/agents/coder.md
@@ -97,7 +97,7 @@ Fix ALL issues before considering the implementation complete. Never leave linti

 ## Project-Specific Context

-For this project (autocoder):
+For this project (autoforge):
 - **Python Backend**: Uses SQLAlchemy, FastAPI, follows patterns in `api/`, `mcp_server/`
 - **React UI**: Uses React 18, TypeScript, TanStack Query, Tailwind CSS v4, Radix UI
 - **Design System**: Neobrutalism style with specific color tokens and animations
--- a/.claude/commands/create-spec.md
+++ b/.claude/commands/create-spec.md
@@ -8,7 +8,7 @@ This command **requires** the project directory as an argument via `$ARGUMENTS`.

 **Example:** `/create-spec generations/my-app`

-**Output location:** `$ARGUMENTS/prompts/app_spec.txt` and `$ARGUMENTS/prompts/initializer_prompt.md`
+**Output location:** `$ARGUMENTS/.autoforge/prompts/app_spec.txt` and `$ARGUMENTS/.autoforge/prompts/initializer_prompt.md`

 If `$ARGUMENTS` is empty, inform the user they must provide a project path and exit.

@@ -95,6 +95,27 @@ Ask the user about their involvement preference:

 **For Detailed Mode users**, ask specific tech questions about frontend, backend, database, etc.

+### Phase 3b: Database Requirements (MANDATORY)
+
+**Always ask this question regardless of mode:**
+
+> "One foundational question about data storage:
+>
+> **Does this application need to store user data persistently?**
+>
+> 1. **Yes, needs a database** - Users create, save, and retrieve data (most apps)
+> 2. **No, stateless** - Pure frontend, no data storage needed (calculators, static sites)
+> 3. **Not sure** - Let me describe what I need and you decide"
+
+**Branching logic:**
+
+- **If "Yes" or "Not sure"**: Continue normally. The spec will include database in tech stack and the initializer will create 5 mandatory Infrastructure features (indices 0-4) to verify database connectivity and persistence.
+
+- **If "No, stateless"**: Note this in the spec. Skip database from tech stack. Infrastructure features will be simplified (no database persistence tests). Mark this clearly:
+  ```xml
+  <database>none - stateless application</database>
+  ```
+
 ## Phase 4: Features (THE MAIN PHASE)

 This is where you spend most of your time. Ask questions in plain language that anyone can answer.
@@ -207,12 +228,23 @@ After gathering all features, **you** (the agent) should tally up the testable f

 **Typical ranges for reference:**

- **Simple apps** (todo list, calculator, notes): ~20-50 features
- **Medium apps** (blog, task manager with auth): ~100 features
- **Advanced apps** (e-commerce, CRM, full SaaS): ~150-200 features
+- **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure)
+- **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure)
+- **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure)

 These are just reference points - your actual count should come from the requirements discussed.

+**MANDATORY: Infrastructure Features**
+
+If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4):
+1. Database connection established
+2. Database schema applied correctly
+3. Data persists across server restart
+4. No mock data patterns in codebase
+5. Backend API queries real database
+
+These features ensure the coding agent implements a real database, not mock data or in-memory storage.
+
 **How to count features:**
 For each feature area discussed, estimate the number of discrete, testable behaviors:

@@ -225,17 +257,20 @@ For each feature area discussed, estimate the number of discrete, testable behav

 > "Based on what we discussed, here's my feature breakdown:
 >
+> - **Infrastructure (required)**: 5 features (database setup, persistence verification)
 > - [Category 1]: ~X features
 > - [Category 2]: ~Y features
 > - [Category 3]: ~Z features
 > - ...
 >
-> **Total: ~N features**
+> **Total: ~N features** (including 5 infrastructure)
 >
 > Does this seem right, or should I adjust?"

 Let the user confirm or adjust. This becomes your `feature_count` for the spec.

+**Important:** The first 5 features (indices 0-4) created by the initializer MUST be the Infrastructure category with no dependencies. All other features depend on these.
+
 ## Phase 5: Technical Details (DERIVED OR DISCUSSED)

 **For Quick Mode users:**
@@ -312,13 +347,13 @@ First ask in conversation if they want to make changes.

 ## Output Directory

-The output directory is: `$ARGUMENTS/prompts/`
+The output directory is: `$ARGUMENTS/.autoforge/prompts/`

 Once the user approves, generate these files:

 ## 1. Generate `app_spec.txt`

-**Output path:** `$ARGUMENTS/prompts/app_spec.txt`
+**Output path:** `$ARGUMENTS/.autoforge/prompts/app_spec.txt`

 Create a new file using this XML structure:

@@ -454,7 +489,7 @@ Create a new file using this XML structure:

 ## 2. Update `initializer_prompt.md`

-**Output path:** `$ARGUMENTS/prompts/initializer_prompt.md`
+**Output path:** `$ARGUMENTS/.autoforge/prompts/initializer_prompt.md`

 If the output directory has an existing `initializer_prompt.md`, read it and update the feature count.
 If not, copy from `.claude/templates/initializer_prompt.template.md` first, then update.
@@ -477,7 +512,7 @@ After:  **CRITICAL:** You must create exactly **25** features using the `feature

 ## 3. Write Status File (REQUIRED - Do This Last)

-**Output path:** `$ARGUMENTS/prompts/.spec_status.json`
+**Output path:** `$ARGUMENTS/.autoforge/prompts/.spec_status.json`

 **CRITICAL:** After you have completed ALL requested file changes, write this status file to signal completion to the UI. This is required for the "Continue to Project" button to appear.

@@ -489,8 +524,8 @@ Write this JSON file:
  "version": 1,
  "timestamp": "[current ISO 8601 timestamp, e.g., 2025-01-15T14:30:00.000Z]",
  "files_written": [
-    "prompts/app_spec.txt",
-    "prompts/initializer_prompt.md"
+    ".autoforge/prompts/app_spec.txt",
+    ".autoforge/prompts/initializer_prompt.md"
  ],
  "feature_count": [the feature count from Phase 4L]
 }
@@ -504,9 +539,9 @@ Write this JSON file:
  "version": 1,
  "timestamp": "2025-01-15T14:30:00.000Z",
  "files_written": [
-    "prompts/app_spec.txt",
-    "prompts/initializer_prompt.md",
-    "prompts/coding_prompt.md"
+    ".autoforge/prompts/app_spec.txt",
+    ".autoforge/prompts/initializer_prompt.md",
+    ".autoforge/prompts/coding_prompt.md"
  ],
  "feature_count": 35
 }
@@ -524,11 +559,11 @@ Write this JSON file:

 Once files are generated, tell the user what to do next:

-> "Your specification files have been created in `$ARGUMENTS/prompts/`!
+> "Your specification files have been created in `$ARGUMENTS/.autoforge/prompts/`!
 >
 > **Files created:**
-> - `$ARGUMENTS/prompts/app_spec.txt`
-> - `$ARGUMENTS/prompts/initializer_prompt.md`
+> - `$ARGUMENTS/.autoforge/prompts/app_spec.txt`
+> - `$ARGUMENTS/.autoforge/prompts/initializer_prompt.md`
 >
 > The **Continue to Project** button should now appear. Click it to start the autonomous coding agent!
 >
--- a/.claude/commands/expand-project.md
+++ b/.claude/commands/expand-project.md
@@ -42,7 +42,7 @@ You are the **Project Expansion Assistant** - an expert at understanding existin
 # FIRST: Read and Understand Existing Project

 **Step 1:** Read the existing specification:
- Read `$ARGUMENTS/prompts/app_spec.txt`
+- Read `$ARGUMENTS/.autoforge/prompts/app_spec.txt`

 **Step 2:** Present a summary to the user:

@@ -231,4 +231,4 @@ If they want to add more, go back to Phase 1.

 # BEGIN

-Start by reading the app specification file at `$ARGUMENTS/prompts/app_spec.txt`, then greet the user with a summary of their existing project and ask what they want to add.
+Start by reading the app specification file at `$ARGUMENTS/.autoforge/prompts/app_spec.txt`, then greet the user with a summary of their existing project and ask what they want to add.
--- a/.claude/commands/gsd-to-autocoder-spec.md
+++ b/.claude/commands/gsd-to-autocoder-spec.md
@@ -1,10 +0,0 @@
---
-allowed-tools: Read, Write, Bash, Glob, Grep
-description: Convert GSD codebase mapping to Autocoder app_spec.txt
---
-
-# GSD to Autocoder Spec
-
-Convert `.planning/codebase/*.md` (from `/gsd:map-codebase`) to Autocoder's `prompts/app_spec.txt`.
-
-@.claude/skills/gsd-to-autocoder-spec/SKILL.md
--- a/.claude/commands/gsd-to-autoforge-spec.md
+++ b/.claude/commands/gsd-to-autoforge-spec.md
@@ -0,0 +1,10 @@
+---
+allowed-tools: Read, Write, Bash, Glob, Grep
+description: Convert GSD codebase mapping to AutoForge app_spec.txt
+---
+
+# GSD to AutoForge Spec
+
+Convert `.planning/codebase/*.md` (from `/gsd:map-codebase`) to AutoForge's `.autoforge/prompts/app_spec.txt`.
+
+@.claude/skills/gsd-to-autoforge-spec/SKILL.md
--- a/.claude/commands/review-pr.md
+++ b/.claude/commands/review-pr.md
@@ -0,0 +1,92 @@
+---
+description: Review pull requests
+---
+
+Pull request(s): $ARGUMENTS
+
+- If no PR numbers are provided, ask the user to provide PR number(s).
+- At least 1 PR is required.
+
+## TASKS
+
+1. **Retrieve PR Details**
+   - Use the GH CLI tool to retrieve the details (descriptions, diffs, comments, feedback, reviews, etc)
+
+2. **Assess PR Complexity**
+
+   After retrieving PR details, assess complexity based on:
+   - Number of files changed
+   - Lines added/removed
+   - Number of contributors/commits
+   - Whether changes touch core/architectural files
+
+   ### Complexity Tiers
+
+   **Simple** (no deep dive agents needed):
+   - ≤5 files changed AND ≤100 lines changed AND single author
+   - Review directly without spawning agents
+
+   **Medium** (1-2 deep dive agents):
+   - 6-15 files changed, OR 100-500 lines, OR 2 contributors
+   - Spawn 1 agent for focused areas, 2 if changes span multiple domains
+
+   **Complex** (up to 3 deep dive agents):
+   - >15 files, OR >500 lines, OR >2 contributors, OR touches core architecture
+   - Spawn up to 3 agents to analyze different aspects (e.g., security, performance, architecture)
+
+3. **Analyze Codebase Impact**
+   - Based on the complexity tier determined above, spawn the appropriate number of deep dive subagents
+   - For Simple PRs: analyze directly without spawning agents
+   - For Medium PRs: spawn 1-2 agents focusing on the most impacted areas
+   - For Complex PRs: spawn up to 3 agents to cover security, performance, and architectural concerns
+
+4. **PR Scope & Title Alignment Check**
+   - Compare the PR title and description against the actual diff content
+   - Check whether the PR is focused on a single coherent change or contains multiple unrelated changes
+   - If the title/description describe one thing but the PR contains significantly more (e.g., title says "fix typo in README" but the diff touches 20 files across multiple domains), flag this as a **scope mismatch**
+   - A scope mismatch is a **merge blocker** — recommend the author split the PR into smaller, focused PRs
+   - Suggest specific ways to split the PR (e.g., "separate the refactor from the feature addition")
+   - Reviewing large, unfocused PRs is impractical and error-prone; the review cannot provide adequate assurance for such changes
+
+5. **Vision Alignment Check**
+   - Read the project's README.md and CLAUDE.md to understand the application's core purpose
+   - Assess whether this PR aligns with the application's intended functionality
+   - If the changes deviate significantly from the core vision or add functionality that doesn't serve the application's purpose, note this in the review
+   - This is not a blocker, but should be flagged for the reviewer's consideration
+
+6. **Safety Assessment**
+   - Provide a review on whether the PR is safe to merge as-is
+   - Provide any feedback in terms of risk level
+
+7. **Improvements**
+   - Propose any improvements in terms of importance and complexity
+
+8. **Merge Recommendation**
+   - Based on all findings, provide a clear merge/don't-merge recommendation
+   - If all concerns are minor (cosmetic issues, naming suggestions, small style nits, missing comments, etc.), recommend **merging the PR** and note that the reviewer can address these minor concerns themselves with a quick follow-up commit pushed directly to master
+   - If there are significant concerns (bugs, security issues, architectural problems, scope mismatch), recommend **not merging** and explain what needs to be resolved first
+
+9. **TLDR**
+   - End the review with a `## TLDR` section
+   - In 3-5 bullet points maximum, summarize:
+     - What this PR is actually about (one sentence)
+     - The key concerns, if any (or "no significant concerns")
+     - **Verdict: MERGE** / **MERGE (with minor follow-up)** / **DON'T MERGE** with a one-line reason
+   - This section should be scannable in under 10 seconds
+
+10. **Post-Review Action**
+    - Immediately after the TLDR, provide a `## Recommended Action` section
+    - Based on the verdict, recommend one of the following actions:
+
+    **If verdict is MERGE (no concerns):**
+    - Recommend merging as-is. No further action needed.
+
+    **If verdict is MERGE (with minor follow-up):**
+    - If the concerns are low-risk and straightforward to fix (e.g., naming tweaks, small refactors, missing type annotations, minor style issues, trivial bug fixes), recommend merging the PR now and offer to immediately address the concerns in a follow-up commit directly on the target branch
+    - List the specific changes you would make in the follow-up
+    - Ask the user: *"Should I merge this PR and push a follow-up commit addressing these concerns?"*
+
+    **If verdict is DON'T MERGE:**
+    - If the blocking concerns are still relatively contained and you are confident you can resolve them quickly (e.g., a small bug fix, a missing validation, a straightforward architectural adjustment), recommend merging the PR and immediately addressing the issues in a follow-up commit — but only if the fixes are low-risk and well-understood
+    - If the issues are too complex, risky, or require author input (e.g., design decisions, major refactors, unclear intent), recommend sending the PR back to the author with specific feedback on what needs to change
+    - Be honest about your confidence level — if you're unsure whether you can address the concerns correctly, say so and defer to the author
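
Reviewer note (not part of the diff): the complexity tiers in the new `review-pr.md` overlap at exactly 100 lines, which satisfies both "≤100" for Simple and "100-500" for Medium. The sketch below shows one way to make the rule unambiguous, assuming Complex is checked first, Simple second, and everything else falls through to Medium. `PRStats`, `complexity_tier`, and `MAX_AGENTS` are hypothetical names for illustration, not anything defined in this PR.

```python
from dataclasses import dataclass


@dataclass
class PRStats:
    """Hypothetical container for the metrics the command asks for."""
    files_changed: int
    lines_changed: int
    contributors: int
    touches_core: bool = False


# Upper bound on deep dive agents per tier, as stated in review-pr.md.
MAX_AGENTS = {"simple": 0, "medium": 2, "complex": 3}


def complexity_tier(stats: PRStats) -> str:
    """Classify a PR; any single Complex trigger wins over the other tiers."""
    if (
        stats.files_changed > 15
        or stats.lines_changed > 500
        or stats.contributors > 2
        or stats.touches_core
    ):
        return "complex"
    # Simple requires ALL three conditions; 100 lines counts as Simple here.
    if (
        stats.files_changed <= 5
        and stats.lines_changed <= 100
        and stats.contributors <= 1
    ):
        return "simple"
    return "medium"
```

Under these assumptions, a PR of the size reported for this commit (173 changed files) lands in the Complex tier on file count alone.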
--- a/.claude/skills/gsd-to-autoforge-spec/SKILL.md
+++ b/.claude/skills/gsd-to-autoforge-spec/SKILL.md
@@ -1,21 +1,21 @@
 ---
-name: gsd-to-autocoder-spec
+name: gsd-to-autoforge-spec
 description: |
-  Convert GSD codebase mapping to Autocoder app_spec.txt. This skill should be used when
-  the user has run /gsd:map-codebase and wants to use Autocoder on an existing project.
-  Triggers: "convert to autocoder", "gsd to spec", "create app_spec from codebase",
-  "use autocoder on existing project", after /gsd:map-codebase completion.
+  Convert GSD codebase mapping to AutoForge app_spec.txt. This skill should be used when
+  the user has run /gsd:map-codebase and wants to use AutoForge on an existing project.
+  Triggers: "convert to autoforge", "gsd to spec", "create app_spec from codebase",
+  "use autoforge on existing project", after /gsd:map-codebase completion.
 ---

-# GSD to Autocoder Spec Converter
+# GSD to AutoForge Spec Converter

-Converts `.planning/codebase/*.md` (GSD mapping output) to `prompts/app_spec.txt` (Autocoder format).
+Converts `.planning/codebase/*.md` (GSD mapping output) to `.autoforge/prompts/app_spec.txt` (AutoForge format).

 ## When to Use

 - After running `/gsd:map-codebase` on an existing project
- When onboarding an existing codebase to Autocoder
- User wants Autocoder to continue development on existing code
+- When onboarding an existing codebase to AutoForge
+- User wants AutoForge to continue development on existing code

 ## Prerequisites

@@ -84,12 +84,12 @@ Extract:

 Create `prompts/` directory:
 ```bash
-mkdir -p prompts
+mkdir -p .autoforge/prompts
 ```

-**Mapping GSD Documents to Autocoder Spec:**
+**Mapping GSD Documents to AutoForge Spec:**

-| GSD Source | Autocoder Target |
+| GSD Source | AutoForge Target |
 |------------|------------------|
 | STACK.md Languages | `<technology_stack>` |
 | STACK.md Frameworks | `<frontend>`, `<backend>` |
@@ -114,7 +114,7 @@ mkdir -p prompts
 **Write the spec file** using the XML format from [references/app-spec-format.md](references/app-spec-format.md):

 ```bash
-cat > prompts/app_spec.txt << 'EOF'
+cat > .autoforge/prompts/app_spec.txt << 'EOF'
 <project_specification>
  <project_name>{from package.json or directory}</project_name>

@@ -173,9 +173,9 @@ EOF
 ### Step 5: Verify Generated Spec

 ```bash
-head -100 prompts/app_spec.txt
+head -100 .autoforge/prompts/app_spec.txt
 echo "---"
-grep -c "User can\|System\|API\|Feature" prompts/app_spec.txt || echo "0"
+grep -c "User can\|System\|API\|Feature" .autoforge/prompts/app_spec.txt || echo "0"
 ```

 **Validation checklist:**
@@ -194,15 +194,15 @@ Output:
 app_spec.txt generated from GSD codebase mapping.

 Source: .planning/codebase/*.md
-Output: prompts/app_spec.txt
+Output: .autoforge/prompts/app_spec.txt

-Next: Start Autocoder
+Next: Start AutoForge

  cd {project_dir}
-  python ~/projects/autocoder/start.py
+  python ~/projects/autoforge/start.py

 Or via UI:
-  ~/projects/autocoder/start_ui.sh
+  ~/projects/autoforge/start_ui.sh

 The Initializer will create features.db from this spec.
 ```
--- a/.claude/skills/gsd-to-autoforge-spec/references/app-spec-format.md
+++ b/.claude/skills/gsd-to-autoforge-spec/references/app-spec-format.md
@@ -1,6 +1,6 @@
-# Autocoder app_spec.txt XML Format
+# AutoForge app_spec.txt XML Format

-Complete reference for the XML structure expected by Autocoder's Initializer agent.
+Complete reference for the XML structure expected by AutoForge's Initializer agent.

 ## Root Structure

@@ -275,7 +275,7 @@ The Initializer agent expects features distributed across categories:
 | Medium web app | 200-250 | 10-15 |
 | Complex full-stack | 300-400 | 15-20 |

-## GSD to Autocoder Mapping
+## GSD to AutoForge Mapping

 When converting from GSD codebase mapping:

--- a/.claude/skills/playwright-cli/SKILL.md
+++ b/.claude/skills/playwright-cli/SKILL.md
@@ -0,0 +1,259 @@
+---
+name: playwright-cli
+description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
+allowed-tools: Bash(playwright-cli:*)
+---
+
+# Browser Automation with playwright-cli
+
+## Quick start
+
+```bash
+# open new browser
+playwright-cli open
+# navigate to a page
+playwright-cli goto https://playwright.dev
+# interact with the page using refs from the snapshot
+playwright-cli click e15
+playwright-cli type "page.click"
+playwright-cli press Enter
+# take a screenshot
+playwright-cli screenshot
+# close the browser
+playwright-cli close
+```
+
+## Commands
+
+### Core
+
+```bash
+playwright-cli open
+# open and navigate right away
+playwright-cli open https://example.com/
+playwright-cli goto https://playwright.dev
+playwright-cli type "search query"
+playwright-cli click e3
+playwright-cli dblclick e7
+playwright-cli fill e5 "user@example.com"
+playwright-cli drag e2 e8
+playwright-cli hover e4
+playwright-cli select e9 "option-value"
+playwright-cli upload ./document.pdf
+playwright-cli check e12
+playwright-cli uncheck e12
+playwright-cli snapshot
+playwright-cli snapshot --filename=after-click.yaml
+playwright-cli eval "document.title"
+playwright-cli eval "el => el.textContent" e5
+playwright-cli dialog-accept
+playwright-cli dialog-accept "confirmation text"
+playwright-cli dialog-dismiss
+playwright-cli resize 1920 1080
+playwright-cli close
+```
+
+### Navigation
+
+```bash
+playwright-cli go-back
+playwright-cli go-forward
+playwright-cli reload
+```
+
+### Keyboard
+
+```bash
+playwright-cli press Enter
+playwright-cli press ArrowDown
+playwright-cli keydown Shift
+playwright-cli keyup Shift
+```
+
+### Mouse
+
+```bash
+playwright-cli mousemove 150 300
+playwright-cli mousedown
+playwright-cli mousedown right
+playwright-cli mouseup
+playwright-cli mouseup right
+playwright-cli mousewheel 0 100
+```
+
+### Save as
+
+```bash
+playwright-cli screenshot
+playwright-cli screenshot e5
+playwright-cli screenshot --filename=page.png
+playwright-cli pdf --filename=page.pdf
+```
+
+### Tabs
+
+```bash
+playwright-cli tab-list
+playwright-cli tab-new
+playwright-cli tab-new https://example.com/page
+playwright-cli tab-close
+playwright-cli tab-close 2
+playwright-cli tab-select 0
+```
+
+### Storage
+
+```bash
+playwright-cli state-save
+playwright-cli state-save auth.json
+playwright-cli state-load auth.json
+
+# Cookies
+playwright-cli cookie-list
+playwright-cli cookie-list --domain=example.com
+playwright-cli cookie-get session_id
+playwright-cli cookie-set session_id abc123
+playwright-cli cookie-set session_id abc123 --domain=example.com --httpOnly --secure
+playwright-cli cookie-delete session_id
+playwright-cli cookie-clear
+
+# LocalStorage
+playwright-cli localstorage-list
+playwright-cli localstorage-get theme
+playwright-cli localstorage-set theme dark
+playwright-cli localstorage-delete theme
+playwright-cli localstorage-clear
+
+# SessionStorage
+playwright-cli sessionstorage-list
+playwright-cli sessionstorage-get step
+playwright-cli sessionstorage-set step 3
+playwright-cli sessionstorage-delete step
+playwright-cli sessionstorage-clear
+```
+
+### Network
+
+```bash
+playwright-cli route "**/*.jpg" --status=404
+playwright-cli route "https://api.example.com/**" --body='{"mock": true}'
+playwright-cli route-list
+playwright-cli unroute "**/*.jpg"
+playwright-cli unroute
+```
+
+### DevTools
+
+```bash
+playwright-cli console
+playwright-cli console warning
+playwright-cli network
+playwright-cli run-code "async page => await page.context().grantPermissions(['geolocation'])"
+playwright-cli tracing-start
+playwright-cli tracing-stop
+playwright-cli video-start
+playwright-cli video-stop video.webm
+```
+
+### Install
+
+```bash
+playwright-cli install --skills
+playwright-cli install-browser
+```
+
+### Configuration
+```bash
+# Use specific browser when creating session
+playwright-cli open --browser=chrome
+playwright-cli open --browser=firefox
+playwright-cli open --browser=webkit
+playwright-cli open --browser=msedge
+# Connect to browser via extension
+playwright-cli open --extension
+
+# Use persistent profile (by default profile is in-memory)
+playwright-cli open --persistent
+# Use persistent profile with custom directory
+playwright-cli open --profile=/path/to/profile
+
+# Start with config file
+playwright-cli open --config=my-config.json
+
+# Close the browser
+playwright-cli close
+# Delete user data for the default session
+playwright-cli delete-data
+```
+
+### Browser Sessions
+
+```bash
+# create new browser session named "mysession" with persistent profile
+playwright-cli -s=mysession open example.com --persistent
+# same with manually specified profile directory (use when requested explicitly)
+playwright-cli -s=mysession open example.com --profile=/path/to/profile
+playwright-cli -s=mysession click e6
+playwright-cli -s=mysession close  # stop a named browser
+playwright-cli -s=mysession delete-data  # delete user data for persistent session
+
+playwright-cli list
+# Close all browsers
+playwright-cli close-all
+# Forcefully kill all browser processes
+playwright-cli kill-all
+```
+
+## Example: Form submission
+
+```bash
+playwright-cli open https://example.com/form
+playwright-cli snapshot
+
+playwright-cli fill e1 "user@example.com"
+playwright-cli fill e2 "password123"
+playwright-cli click e3
+playwright-cli snapshot
+playwright-cli close
+```
+
+## Example: Multi-tab workflow
+
+```bash
+playwright-cli open https://example.com
+playwright-cli tab-new https://example.com/other
+playwright-cli tab-list
+playwright-cli tab-select 0
+playwright-cli snapshot
+playwright-cli close
+```
+
+## Example: Debugging with DevTools
+
+```bash
+playwright-cli open https://example.com
+playwright-cli click e4
+playwright-cli fill e7 "test"
+playwright-cli console
+playwright-cli network
+playwright-cli close
+```
+
+```bash
+playwright-cli open https://example.com
+playwright-cli tracing-start
+playwright-cli click e4
+playwright-cli fill e7 "test"
+playwright-cli tracing-stop
+playwright-cli close
+```
+
+## Specific tasks
+
+* **Request mocking** [references/request-mocking.md](references/request-mocking.md)
+* **Running Playwright code** [references/running-code.md](references/running-code.md)
+* **Browser session management** [references/session-management.md](references/session-management.md)
+* **Storage state (cookies, localStorage)** [references/storage-state.md](references/storage-state.md)
+* **Test generation** [references/test-generation.md](references/test-generation.md)
+* **Tracing** [references/tracing.md](references/tracing.md)
+* **Video recording** [references/video-recording.md](references/video-recording.md)
--- a/.claude/skills/playwright-cli/references/request-mocking.md
+++ b/.claude/skills/playwright-cli/references/request-mocking.md
@@ -0,0 +1,87 @@
+# Request Mocking
+
+Intercept, mock, modify, and block network requests.
+
+## CLI Route Commands
+
+```bash
+# Mock with custom status
+playwright-cli route "**/*.jpg" --status=404
+
+# Mock with JSON body
+playwright-cli route "**/api/users" --body='[{"id":1,"name":"Alice"}]' --content-type=application/json
+
+# Mock with custom headers
+playwright-cli route "**/api/data" --body='{"ok":true}' --header="X-Custom: value"
+
+# Remove headers from requests
+playwright-cli route "**/*" --remove-header=cookie,authorization
+
+# List active routes
+playwright-cli route-list
+
+# Remove a route or all routes
+playwright-cli unroute "**/*.jpg"
+playwright-cli unroute
+```
+
+## URL Patterns
+
+```
+**/api/users           - Exact path match
+**/api/*/details       - Wildcard in path
+**/*.{png,jpg,jpeg}    - Match file extensions
+**/search?q=*          - Match query parameters
+```
+
+## Advanced Mocking with run-code
+
+For conditional responses, request body inspection, response modification, or delays:
+
+### Conditional Response Based on Request
+
+```bash
+playwright-cli run-code "async page => {
+  await page.route('**/api/login', route => {
+    const body = route.request().postDataJSON();
+    if (body.username === 'admin') {
+      route.fulfill({ body: JSON.stringify({ token: 'mock-token' }) });
+    } else {
+      route.fulfill({ status: 401, body: JSON.stringify({ error: 'Invalid' }) });
+    }
+  });
+}"
+```
+
+### Modify Real Response
+
+```bash
+playwright-cli run-code "async page => {
+  await page.route('**/api/user', async route => {
+    const response = await route.fetch();
+    const json = await response.json();
+    json.isPremium = true;
+    await route.fulfill({ response, json });
+  });
+}"
+```
+
+### Simulate Network Failures
+
+```bash
+playwright-cli run-code "async page => {
+  await page.route('**/api/offline', route => route.abort('internetdisconnected'));
+}"
+# Options: connectionrefused, timedout, connectionreset, internetdisconnected
+```
+
+### Delayed Response
+
+```bash
+playwright-cli run-code "async page => {
+  await page.route('**/api/slow', async route => {
+    await new Promise(r => setTimeout(r, 3000));
+    route.fulfill({ body: JSON.stringify({ data: 'loaded' }) });
+  });
+}"
+```
--- a/.claude/skills/playwright-cli/references/running-code.md
+++ b/.claude/skills/playwright-cli/references/running-code.md
@@ -0,0 +1,232 @@
+# Running Custom Playwright Code
+
+Use `run-code` to execute arbitrary Playwright code for advanced scenarios not covered by CLI commands.
+
+## Syntax
+
+```bash
+playwright-cli run-code "async page => {
+  // Your Playwright code here
+  // Access page.context() for browser context operations
+}"
+```
+
+## Geolocation
+
+```bash
+# Grant geolocation permission and set location
+playwright-cli run-code "async page => {
+  await page.context().grantPermissions(['geolocation']);
+  await page.context().setGeolocation({ latitude: 37.7749, longitude: -122.4194 });
+}"
+
+# Set location to London
+playwright-cli run-code "async page => {
+  await page.context().grantPermissions(['geolocation']);
+  await page.context().setGeolocation({ latitude: 51.5074, longitude: -0.1278 });
+}"
+
+# Clear geolocation override
+playwright-cli run-code "async page => {
+  await page.context().clearPermissions();
+}"
+```
+
+## Permissions
+
+```bash
+# Grant multiple permissions
+playwright-cli run-code "async page => {
+  await page.context().grantPermissions([
+    'geolocation',
+    'notifications',
+    'camera',
+    'microphone'
+  ]);
+}"
+
+# Grant permissions for specific origin
+playwright-cli run-code "async page => {
+  await page.context().grantPermissions(['clipboard-read'], {
+    origin: 'https://example.com'
+  });
+}"
+```
+
+## Media Emulation
+
+```bash
+# Emulate dark color scheme
+playwright-cli run-code "async page => {
+  await page.emulateMedia({ colorScheme: 'dark' });
+}"
+
+# Emulate light color scheme
+playwright-cli run-code "async page => {
+  await page.emulateMedia({ colorScheme: 'light' });
+}"
+
+# Emulate reduced motion
+playwright-cli run-code "async page => {
+  await page.emulateMedia({ reducedMotion: 'reduce' });
+}"
+
+# Emulate print media
+playwright-cli run-code "async page => {
+  await page.emulateMedia({ media: 'print' });
+}"
+```
+
+## Wait Strategies
+
+```bash
+# Wait for network idle
+playwright-cli run-code "async page => {
+  await page.waitForLoadState('networkidle');
+}"
+
+# Wait for specific element
+playwright-cli run-code "async page => {
+  await page.waitForSelector('.loading', { state: 'hidden' });
+}"
+
+# Wait for function to return true
+playwright-cli run-code "async page => {
+  await page.waitForFunction(() => window.appReady === true);
+}"
+
+# Wait with timeout
+playwright-cli run-code "async page => {
+  await page.waitForSelector('.result', { timeout: 10000 });
+}"
+```
+
+## Frames and Iframes
+
+```bash
+# Work with iframe
+playwright-cli run-code "async page => {
+  const frame = page.locator('iframe#my-iframe').contentFrame();
+  await frame.locator('button').click();
+}"
+
+# Get all frames
+playwright-cli run-code "async page => {
+  const frames = page.frames();
+  return frames.map(f => f.url());
+}"
+```
+
+## File Downloads
+
+```bash
+# Handle file download
+playwright-cli run-code "async page => {
+  const [download] = await Promise.all([
+    page.waitForEvent('download'),
+    page.click('a.download-link')
+  ]);
+  await download.saveAs('./downloaded-file.pdf');
+  return download.suggestedFilename();
+}"
+```
+
+## Clipboard
+
+```bash
+# Read clipboard (requires permission)
+playwright-cli run-code "async page => {
+  await page.context().grantPermissions(['clipboard-read']);
+  return await page.evaluate(() => navigator.clipboard.readText());
+}"
+
+# Write to clipboard
+playwright-cli run-code "async page => {
+  await page.evaluate(text => navigator.clipboard.writeText(text), 'Hello clipboard!');
+}"
+```
+
+## Page Information
+
+```bash
+# Get page title
+playwright-cli run-code "async page => {
+  return await page.title();
+}"
+
+# Get current URL
+playwright-cli run-code "async page => {
+  return page.url();
+}"
+
+# Get page content
+playwright-cli run-code "async page => {
+  return await page.content();
+}"
+
+# Get viewport size
+playwright-cli run-code "async page => {
+  return page.viewportSize();
+}"
+```
+
+## JavaScript Execution
+
+```bash
+# Execute JavaScript and return result
+playwright-cli run-code "async page => {
+  return await page.evaluate(() => {
+    return {
+      userAgent: navigator.userAgent,
+      language: navigator.language,
+      cookiesEnabled: navigator.cookieEnabled
+    };
+  });
+}"
+
+# Pass arguments to evaluate
+playwright-cli run-code "async page => {
+  const multiplier = 5;
+  return await page.evaluate(m => document.querySelectorAll('li').length * m, multiplier);
+}"
+```
+
+## Error Handling
+
+```bash
+# Try-catch in run-code
+playwright-cli run-code "async page => {
+  try {
+    await page.click('.maybe-missing', { timeout: 1000 });
+    return 'clicked';
+  } catch (e) {
+    return 'element not found';
+  }
+}"
+```
+
+## Complex Workflows
+
+```bash
+# Login and save state
+playwright-cli run-code "async page => {
+  await page.goto('https://example.com/login');
+  await page.fill('input[name=email]', 'user@example.com');
+  await page.fill('input[name=password]', 'secret');
+  await page.click('button[type=submit]');
+  await page.waitForURL('**/dashboard');
+  await page.context().storageState({ path: 'auth.json' });
+  return 'Login successful';
+}"
+
+# Scrape data from multiple pages
+playwright-cli run-code "async page => {
+  const results = [];
+  for (let i = 1; i <= 3; i++) {
+    await page.goto(\`https://example.com/page/\${i}\`);
+    const items = await page.locator('.item').allTextContents();
+    results.push(...items);
+  }
+  return results;
+}"
+```
--- a/.claude/skills/playwright-cli/references/session-management.md
+++ b/.claude/skills/playwright-cli/references/session-management.md
@@ -0,0 +1,169 @@
+# Browser Session Management
+
+Run multiple isolated browser sessions concurrently with state persistence.
+
+## Named Browser Sessions
+
+Use the `-s` flag to give each browser session its own isolated context:
+
+```bash
+# Browser 1: Authentication flow
+playwright-cli -s=auth open https://app.example.com/login
+
+# Browser 2: Public browsing (separate cookies, storage)
+playwright-cli -s=public open https://example.com
+
+# Commands are isolated by browser session
+playwright-cli -s=auth fill e1 "user@example.com"
+playwright-cli -s=public snapshot
+```
+
+## Browser Session Isolation Properties
+
+Each browser session has independent:
+- Cookies
+- LocalStorage / SessionStorage
+- IndexedDB
+- Cache
+- Browsing history
+- Open tabs
+
+## Browser Session Commands
+
+```bash
+# List all browser sessions
+playwright-cli list
+
+# Stop a browser session (close the browser)
+playwright-cli close                # stop the default browser
+playwright-cli -s=mysession close   # stop a named browser
+
+# Stop all browser sessions
+playwright-cli close-all
+
+# Forcefully kill all daemon processes (for stale/zombie processes)
+playwright-cli kill-all
+
+# Delete browser session user data (profile directory)
+playwright-cli delete-data                # delete default browser data
+playwright-cli -s=mysession delete-data   # delete named browser data
+```
+
+## Environment Variable
+
+Set a default browser session name via environment variable:
+
+```bash
+export PLAYWRIGHT_CLI_SESSION="mysession"
+playwright-cli open example.com  # Uses "mysession" automatically
+```
+
+## Common Patterns
+
+### Concurrent Scraping
+
+```bash
+#!/bin/bash
+# Scrape multiple sites concurrently
+
+# Start all browsers
+playwright-cli -s=site1 open https://site1.com &
+playwright-cli -s=site2 open https://site2.com &
+playwright-cli -s=site3 open https://site3.com &
+wait
+
+# Take snapshots from each
+playwright-cli -s=site1 snapshot
+playwright-cli -s=site2 snapshot
+playwright-cli -s=site3 snapshot
+
+# Cleanup
+playwright-cli close-all
+```
+
+### A/B Testing Sessions
+
+```bash
+# Test different user experiences
+playwright-cli -s=variant-a open "https://app.com?variant=a"
+playwright-cli -s=variant-b open "https://app.com?variant=b"
+
+# Compare
+playwright-cli -s=variant-a screenshot
+playwright-cli -s=variant-b screenshot
+```
+
+### Persistent Profile
+
+By default, the browser profile is kept in memory only. Use the `--persistent` flag on `open` to persist the profile to disk, or `--profile` to choose the directory:
+
+```bash
+# Use persistent profile (auto-generated location)
+playwright-cli open https://example.com --persistent
+
+# Use persistent profile with custom directory
+playwright-cli open https://example.com --profile=/path/to/profile
+```
+
+## Default Browser Session
+
+When `-s` is omitted, commands use the default browser session:
+
+```bash
+# These use the same default browser session
+playwright-cli open https://example.com
+playwright-cli snapshot
+playwright-cli close  # Stops default browser
+```
+
+## Browser Session Configuration
+
+Configure a browser session with specific settings when opening:
+
+```bash
+# Open with config file
+playwright-cli open https://example.com --config=.playwright/my-cli.json
+
+# Open with specific browser
+playwright-cli open https://example.com --browser=firefox
+
+# Open in headed mode
+playwright-cli open https://example.com --headed
+
+# Open with persistent profile
+playwright-cli open https://example.com --persistent
+```
+
+## Best Practices
+
+### 1. Name Browser Sessions Semantically
+
+```bash
+# GOOD: Clear purpose
+playwright-cli -s=github-auth open https://github.com
+playwright-cli -s=docs-scrape open https://docs.example.com
+
+# AVOID: Generic names
+playwright-cli -s=s1 open https://github.com
+```
+
+### 2. Always Clean Up
+
+```bash
+# Stop browsers when done
+playwright-cli -s=auth close
+playwright-cli -s=scrape close
+
+# Or stop all at once
+playwright-cli close-all
+
+# If browsers become unresponsive or zombie processes remain
+playwright-cli kill-all
+```
+
+### 3. Delete Stale Browser Data
+
+```bash
+# Remove old browser data to free disk space
+playwright-cli -s=oldsession delete-data
+```
--- a/.claude/skills/playwright-cli/references/storage-state.md
+++ b/.claude/skills/playwright-cli/references/storage-state.md
@@ -0,0 +1,275 @@
+# Storage Management
+
+Manage cookies, localStorage, sessionStorage, and browser storage state.
+
+## Storage State
+
+Save and restore complete browser state including cookies and storage.
+
+### Save Storage State
+
+```bash
+# Save to auto-generated filename (storage-state-{timestamp}.json)
+playwright-cli state-save
+
+# Save to specific filename
+playwright-cli state-save my-auth-state.json
+```
+
+### Restore Storage State
+
+```bash
+# Load storage state from file
+playwright-cli state-load my-auth-state.json
+
+# Open the page so the restored cookies and storage are applied
+playwright-cli open https://example.com
+```
+
+### Storage State File Format
+
+The saved file contains:
+
+```json
+{
+  "cookies": [
+    {
+      "name": "session_id",
+      "value": "abc123",
+      "domain": "example.com",
+      "path": "/",
+      "expires": 1735689600,
+      "httpOnly": true,
+      "secure": true,
+      "sameSite": "Lax"
+    }
+  ],
+  "origins": [
+    {
+      "origin": "https://example.com",
+      "localStorage": [
+        { "name": "theme", "value": "dark" },
+        { "name": "user_id", "value": "12345" }
+      ]
+    }
+  ]
+}
+```
+
+## Cookies
+
+### List All Cookies
+
+```bash
+playwright-cli cookie-list
+```
+
+### Filter Cookies by Domain
+
+```bash
+playwright-cli cookie-list --domain=example.com
+```
+
+### Filter Cookies by Path
+
+```bash
+playwright-cli cookie-list --path=/api
+```
+
+### Get Specific Cookie
+
+```bash
+playwright-cli cookie-get session_id
+```
+
+### Set a Cookie
+
+```bash
+# Basic cookie
+playwright-cli cookie-set session abc123
+
+# Cookie with options
+playwright-cli cookie-set session abc123 --domain=example.com --path=/ --httpOnly --secure --sameSite=Lax
+
+# Cookie with expiration (Unix timestamp)
+playwright-cli cookie-set remember_me token123 --expires=1735689600
+```
+
+### Delete a Cookie
+
+```bash
+playwright-cli cookie-delete session_id
+```
+
+### Clear All Cookies
+
+```bash
+playwright-cli cookie-clear
+```
+
+### Advanced: Multiple Cookies or Custom Options
+
+For complex scenarios like adding multiple cookies at once, use `run-code`:
+
+```bash
+playwright-cli run-code "async page => {
+  await page.context().addCookies([
+    { name: 'session_id', value: 'sess_abc123', domain: 'example.com', path: '/', httpOnly: true },
+    { name: 'preferences', value: JSON.stringify({ theme: 'dark' }), domain: 'example.com', path: '/' }
+  ]);
+}"
+```
+
+## Local Storage
+
+### List All localStorage Items
+
+```bash
+playwright-cli localstorage-list
+```
+
+### Get Single Value
+
+```bash
+playwright-cli localstorage-get token
+```
+
+### Set Value
+
+```bash
+playwright-cli localstorage-set theme dark
+```
+
+### Set JSON Value
+
+```bash
+playwright-cli localstorage-set user_settings '{"theme":"dark","language":"en"}'
+```
+
+### Delete Single Item
+
+```bash
+playwright-cli localstorage-delete token
+```
+
+### Clear All localStorage
+
+```bash
+playwright-cli localstorage-clear
+```
+
+### Advanced: Multiple Operations
+
+For complex scenarios like setting multiple values at once, use `run-code`:
+
+```bash
+playwright-cli run-code "async page => {
+  await page.evaluate(() => {
+    localStorage.setItem('token', 'jwt_abc123');
+    localStorage.setItem('user_id', '12345');
+    localStorage.setItem('expires_at', Date.now() + 3600000);
+  });
+}"
+```
+
+## Session Storage
+
+### List All sessionStorage Items
+
+```bash
+playwright-cli sessionstorage-list
+```
+
+### Get Single Value
+
+```bash
+playwright-cli sessionstorage-get form_data
+```
+
+### Set Value
+
+```bash
+playwright-cli sessionstorage-set step 3
+```
+
+### Delete Single Item
+
+```bash
+playwright-cli sessionstorage-delete step
+```
+
+### Clear sessionStorage
+
+```bash
+playwright-cli sessionstorage-clear
+```
+
+## IndexedDB
+
+### List Databases
+
+```bash
+playwright-cli run-code "async page => {
+  return await page.evaluate(async () => {
+    const databases = await indexedDB.databases();
+    return databases;
+  });
+}"
+```
+
+### Delete Database
+
+```bash
+playwright-cli run-code "async page => {
+  await page.evaluate(() => {
+    indexedDB.deleteDatabase('myDatabase');
+  });
+}"
+```
+
+## Common Patterns
+
+### Authentication State Reuse
+
+```bash
+# Step 1: Login and save state
+playwright-cli open https://app.example.com/login
+playwright-cli snapshot
+playwright-cli fill e1 "user@example.com"
+playwright-cli fill e2 "password123"
+playwright-cli click e3
+
+# Save the authenticated state
+playwright-cli state-save auth.json
+
+# Step 2: Later, restore state and skip login
+playwright-cli state-load auth.json
+playwright-cli open https://app.example.com/dashboard
+# Already logged in!
+```
+
+### Save and Restore Roundtrip
+
+```bash
+# Set up authentication state
+playwright-cli open https://example.com
+playwright-cli eval "() => { document.cookie = 'session=abc123'; localStorage.setItem('user', 'john'); }"
+
+# Save state to file
+playwright-cli state-save my-session.json
+
+# ... later, in a new session ...
+
+# Restore state
+playwright-cli state-load my-session.json
+playwright-cli open https://example.com
+# Cookies and localStorage are restored!
+```
+
+## Security Notes
+
+- Never commit storage state files containing auth tokens
+- Add storage state files (e.g. `auth.json`, `*.auth-state.json`) to `.gitignore`
+- Delete state files after automation completes
+- Use environment variables for sensitive data
+- By default, sessions run in in-memory mode, which is safer for sensitive operations
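+
+One way to apply the `.gitignore` advice, as a sketch: the `add_ignore` helper is hypothetical, and the patterns simply mirror the state filenames used in the examples above, so adjust them to your own naming.
+
+```bash
+# Hypothetical helper: append a pattern to .gitignore only if it is not already listed
+add_ignore() {
+  grep -qxF -- "$1" .gitignore 2>/dev/null || printf '%s\n' "$1" >> .gitignore
+}
+
+# Patterns are assumptions based on the filenames used in this document
+add_ignore '*.auth-state.json'
+add_ignore 'auth.json'
+add_ignore 'storage-state-*.json'
+```
+
+Because the helper checks for an exact existing line first, running it repeatedly does not duplicate entries.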
--- a/.claude/skills/playwright-cli/references/test-generation.md
+++ b/.claude/skills/playwright-cli/references/test-generation.md
@@ -0,0 +1,88 @@
+# Test Generation
+
+Generate Playwright test code automatically as you interact with the browser.
+
+## How It Works
+
+Every action you perform with `playwright-cli` generates corresponding Playwright TypeScript code.
+This code appears in the output and can be copied directly into your test files.
+
+## Example Workflow
+
+```bash
+# Start a session
+playwright-cli open https://example.com/login
+
+# Take a snapshot to see elements
+playwright-cli snapshot
+# Output shows: e1 [textbox "Email"], e2 [textbox "Password"], e3 [button "Sign In"]
+
+# Fill form fields - generates code automatically
+playwright-cli fill e1 "user@example.com"
+# Ran Playwright code:
+# await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');
+
+playwright-cli fill e2 "password123"
+# Ran Playwright code:
+# await page.getByRole('textbox', { name: 'Password' }).fill('password123');
+
+playwright-cli click e3
+# Ran Playwright code:
+# await page.getByRole('button', { name: 'Sign In' }).click();
+```
+
+## Building a Test File
+
+Collect the generated code into a Playwright test:
+
+```typescript
+import { test, expect } from '@playwright/test';
+
+test('login flow', async ({ page }) => {
+  // Generated code from playwright-cli session:
+  await page.goto('https://example.com/login');
+  await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');
+  await page.getByRole('textbox', { name: 'Password' }).fill('password123');
+  await page.getByRole('button', { name: 'Sign In' }).click();
+
+  // Add assertions
+  await expect(page).toHaveURL(/.*dashboard/);
+});
+```
+
+## Best Practices
+
+### 1. Use Semantic Locators
+
+The generated code uses role-based locators when possible, which are more resilient:
+
+```typescript
+// Generated (good - semantic)
+await page.getByRole('button', { name: 'Submit' }).click();
+
+// Avoid (fragile - CSS selectors)
+await page.locator('#submit-btn').click();
+```
+
+### 2. Explore Before Recording
+
+Take snapshots to understand the page structure before recording actions:
+
+```bash
+playwright-cli open https://example.com
+playwright-cli snapshot
+# Review the element structure
+playwright-cli click e5
+```
+
+### 3. Add Assertions Manually
+
+Generated code captures actions but not assertions. Add expectations in your test:
+
+```typescript
+// Generated action
+await page.getByRole('button', { name: 'Submit' }).click();
+
+// Manual assertion
+await expect(page.getByText('Success')).toBeVisible();
+```
--- a/.claude/skills/playwright-cli/references/tracing.md
+++ b/.claude/skills/playwright-cli/references/tracing.md
@@ -0,0 +1,139 @@
+# Tracing
+
+Capture detailed execution traces for debugging and analysis. Traces include DOM snapshots, screenshots, network activity, and console logs.
+
+## Basic Usage
+
+```bash
+# Start trace recording
+playwright-cli tracing-start
+
+# Perform actions
+playwright-cli open https://example.com
+playwright-cli click e1
+playwright-cli fill e2 "test"
+
+# Stop trace recording
+playwright-cli tracing-stop
+```
+
+## Trace Output Files
+
+When you start tracing, Playwright creates a `traces/` directory with several files:
+
+### `trace-{timestamp}.trace`
+
+**Action log** - The main trace file containing:
+- Every action performed (clicks, fills, navigations)
+- DOM snapshots before and after each action
+- Screenshots at each step
+- Timing information
+- Console messages
+- Source locations
+
+### `trace-{timestamp}.network`
+
+**Network log** - Complete network activity:
+- All HTTP requests and responses
+- Request headers and bodies
+- Response headers and bodies
+- Timing (DNS, connect, TLS, TTFB, download)
+- Resource sizes
+- Failed requests and errors
+
+### `resources/`
+
+**Resources directory** - Cached resources:
+- Images, fonts, stylesheets, scripts
+- Response bodies for replay
+- Assets needed to reconstruct page state
+
+## What Traces Capture
+
+| Category | Details |
+|----------|---------|
+| **Actions** | Clicks, fills, hovers, keyboard input, navigations |
+| **DOM** | Full DOM snapshot before/after each action |
+| **Screenshots** | Visual state at each step |
+| **Network** | All requests, responses, headers, bodies, timing |
+| **Console** | All console.log, warn, error messages |
+| **Timing** | Precise timing for each operation |
+
+## Use Cases
+
+### Debugging Failed Actions
+
+```bash
+playwright-cli tracing-start
+playwright-cli open https://app.example.com
+
+# This click fails - why?
+playwright-cli click e5
+
+playwright-cli tracing-stop
+# Open trace to see DOM state when click was attempted
+```
+
+### Analyzing Performance
+
+```bash
+playwright-cli tracing-start
+playwright-cli open https://slow-site.com
+playwright-cli tracing-stop
+
+# View network waterfall to identify slow resources
+```
+
+### Capturing Evidence
+
+```bash
+# Record a complete user flow for documentation
+playwright-cli tracing-start
+
+playwright-cli open https://app.example.com/checkout
+playwright-cli fill e1 "4111111111111111"
+playwright-cli fill e2 "12/25"
+playwright-cli fill e3 "123"
+playwright-cli click e4
+
+playwright-cli tracing-stop
+# Trace shows exact sequence of events
+```
+
+## Trace vs Video vs Screenshot
+
+| Feature | Trace | Video | Screenshot |
+|---------|-------|-------|------------|
+| **Format** | .trace file | .webm video | .png/.jpeg image |
+| **DOM inspection** | Yes | No | No |
+| **Network details** | Yes | No | No |
+| **Step-by-step replay** | Yes | Continuous | Single frame |
+| **File size** | Medium | Large | Small |
+| **Best for** | Debugging | Demos | Quick capture |
+
+## Best Practices
+
+### 1. Start Tracing Before the Problem
+
+```bash
+# Trace the entire flow, not just the failing step
+playwright-cli tracing-start
+playwright-cli open https://example.com
+# ... all steps leading to the issue ...
+playwright-cli tracing-stop
+```
+
+### 2. Clean Up Old Traces
+
+Traces can consume significant disk space:
+
+```bash
+# Remove traces older than 7 days
+find .playwright-cli/traces -type f -mtime +7 -delete
+```
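+
+A slightly more defensive variant of the cleanup above, as a minimal sketch: the `cleanup_traces` helper name, its default directory, and the 7-day threshold are illustrative assumptions, not part of `playwright-cli`. It prints each file before deleting it and then prunes directories left empty.
+
+```bash
+# Hypothetical helper (not a playwright-cli command).
+# $1: trace directory (assumed default: .playwright-cli/traces)
+# $2: minimum age in days (assumed default: 7)
+cleanup_traces() {
+  local dir="${1:-.playwright-cli/traces}"
+  local days="${2:-7}"
+  # Nothing to do if the directory does not exist
+  [ -d "$dir" ] || return 0
+  # Print, then delete, regular files older than the threshold
+  find "$dir" -type f -mtime +"$days" -print -delete
+  # Remove subdirectories that are now empty
+  find "$dir" -mindepth 1 -type d -empty -delete
+}
+```
+
+Calling `cleanup_traces` with no arguments uses the defaults; `cleanup_traces ./traces 30` would keep roughly a month of traces.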
+
+## Limitations
+
+- Traces add overhead to automation
+- Large traces can consume significant disk space
+- Some dynamic content may not replay perfectly
--- a/.claude/skills/playwright-cli/references/video-recording.md
+++ b/.claude/skills/playwright-cli/references/video-recording.md
@@ -0,0 +1,43 @@
+# Video Recording
+
+Capture browser automation sessions as video for debugging, documentation, or verification. Produces WebM (VP8/VP9 codec).
+
+## Basic Recording
+
+```bash
+# Start recording
+playwright-cli video-start
+
+# Perform actions
+playwright-cli open https://example.com
+playwright-cli snapshot
+playwright-cli click e1
+playwright-cli fill e2 "test input"
+
+# Stop and save
+playwright-cli video-stop demo.webm
+```
+
+## Best Practices
+
+### 1. Use Descriptive Filenames
+
+```bash
+# Include context in filename
+playwright-cli video-stop recordings/login-flow-2024-01-15.webm
+playwright-cli video-stop recordings/checkout-test-run-42.webm
+```
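+
+To avoid hard-coding dates, the filename can be generated. A small sketch, in which `video_name` is a hypothetical helper and the `recordings/` layout is just the convention used above:
+
+```bash
+# Hypothetical helper: build recordings/<label>-<YYYY-MM-DD>.webm
+video_name() {
+  printf 'recordings/%s-%s.webm\n' "$1" "$(date +%Y-%m-%d)"
+}
+
+video_name login-flow
+```
+
+The result can then be passed to the stop command, e.g. `playwright-cli video-stop "$(video_name login-flow)"`.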
+
+## Tracing vs Video
+
+| Feature | Video | Tracing |
+|---------|-------|---------|
+| Output | WebM file | Trace file (viewable in Trace Viewer) |
+| Shows | Visual recording | DOM snapshots, network, console, actions |
+| Use case | Demos, documentation | Debugging, analysis |
+| Size | Larger | Smaller |
+
+## Limitations
+
+- Recording adds slight overhead to automation
+- Large recordings can consume significant disk space
--- a/.claude/templates/coding_prompt.template.md
+++ b/.claude/templates/coding_prompt.template.md
@@ -49,51 +49,21 @@ Otherwise, start servers manually and document the process.

 #### TEST-DRIVEN DEVELOPMENT MINDSET (CRITICAL)

-Features are **test cases** that drive development. This is test-driven development:
+Features are **test cases** that drive development. If functionality doesn't exist, **BUILD IT** -- you are responsible for implementing ALL required functionality. Missing pages, endpoints, database tables, or components are NOT blockers; they are your job to create.

- **If you can't test a feature because functionality doesn't exist → BUILD IT**
- You are responsible for implementing ALL required functionality
- Never assume another process will build it later
- "Missing functionality" is NOT a blocker - it's your job to create it
-
-**Example:** Feature says "User can filter flashcards by difficulty level"
- WRONG: "Flashcard page doesn't exist yet" → skip feature
- RIGHT: "Flashcard page doesn't exist yet" → build flashcard page → implement filter → test feature
-
-**Note:** Your feature has been pre-assigned by the orchestrator. Use `feature_get_by_id` with your assigned feature ID to get the details.
-
-Once you've retrieved the feature, **mark it as in-progress** (if not already):
+**Note:** Your feature has been pre-assigned by the orchestrator. Use `feature_get_by_id` with your assigned feature ID to get the details. Then mark it as in-progress:

 ```
-# Mark feature as in-progress
 Use the feature_mark_in_progress tool with feature_id={your_assigned_id}
 ```

 If you get "already in-progress" error, that's OK - continue with implementation.

-Focus on completing one feature perfectly and completing its testing steps in this session before moving on to other features.
-It's ok if you only complete one feature in this session, as there will be more sessions later that continue to make progress.
+Focus on completing one feature perfectly in this session. It's ok if you only complete one feature, as more sessions will follow.

 #### When to Skip a Feature (EXTREMELY RARE)

-**Skipping should almost NEVER happen.** Only skip for truly external blockers you cannot control:
-
- **External API not configured**: Third-party service credentials missing (e.g., Stripe keys, OAuth secrets)
- **External service unavailable**: Dependency on service that's down or inaccessible
- **Environment limitation**: Hardware or system requirement you cannot fulfill
-
-**NEVER skip because:**
-
-| Situation | Wrong Action | Correct Action |
-|-----------|--------------|----------------|
-| "Page doesn't exist" | Skip | Create the page |
-| "API endpoint missing" | Skip | Implement the endpoint |
-| "Database table not ready" | Skip | Create the migration |
-| "Component not built" | Skip | Build the component |
-| "No data to test with" | Skip | Create test data or build data entry flow |
-| "Feature X needs to be done first" | Skip | Build feature X as part of this feature |
-
-If a feature requires building other functionality first, **build that functionality**. You are the coding agent - your job is to make the feature work, not to defer it.
+Only skip for truly external blockers: missing third-party credentials (Stripe keys, OAuth secrets), unavailable external services, or unfulfillable environment requirements. **NEVER** skip because a page, endpoint, component, or data doesn't exist yet -- build it. If a feature requires other functionality first, build that functionality as part of this feature.

 If you must skip (truly external blocker only):

@@ -116,68 +86,54 @@ Implement the chosen feature thoroughly:

 **CRITICAL:** You MUST verify features through the actual UI.

-Use browser automation tools:
+Use `playwright-cli` for browser automation:

- Navigate to the app in a real browser
- Interact like a human user (click, type, scroll)
- Take screenshots at each step
- Verify both functionality AND visual appearance
+- Open the browser: `playwright-cli open http://localhost:PORT`
+- Take a snapshot to see page elements: `playwright-cli snapshot`
+- Read the snapshot YAML file to see element refs
+- Click elements by ref: `playwright-cli click e5`
+- Type text: `playwright-cli type "search query"`
+- Fill form fields: `playwright-cli fill e3 "value"`
+- Take screenshots: `playwright-cli screenshot`
+- Read the screenshot file to verify visual appearance
+- Check console errors: `playwright-cli console`
+- Close browser when done: `playwright-cli close`
+
+**Token-efficient workflow:** `playwright-cli screenshot` and `snapshot` save files
+to `.playwright-cli/`. You will see a file link in the output. Read the file only
+when you need to verify visual appearance or find element refs.

 **DO:**
-
 - Test through the UI with clicks and keyboard input
- Take screenshots to verify visual appearance
- Check for console errors in browser
+- Take screenshots and read them to verify visual appearance
+- Check for console errors with `playwright-cli console`
 - Verify complete user workflows end-to-end
+- Always run `playwright-cli close` when finished testing

 **DON'T:**
-
- Only test with curl commands (backend testing alone is insufficient)
- Use JavaScript evaluation to bypass UI (no shortcuts)
+- Only test with curl commands
+- Use JavaScript evaluation to bypass UI (`eval` and `run-code` are blocked)
 - Skip visual verification
 - Mark tests passing without thorough verification

 ### STEP 5.5: MANDATORY VERIFICATION CHECKLIST (BEFORE MARKING ANY TEST PASSING)

-**You MUST complete ALL of these checks before marking any feature as "passes": true**
+**Complete ALL applicable checks before marking any feature as passing:**

-#### Security Verification (for protected features)
-
- [ ] Feature respects user role permissions
- [ ] Unauthenticated access is blocked (redirects to login)
- [ ] API endpoint checks authorization (returns 401/403 appropriately)
- [ ] Cannot access other users' data by manipulating URLs
-
-#### Real Data Verification (CRITICAL - NO MOCK DATA)
-
- [ ] Created unique test data via UI (e.g., "TEST_12345_VERIFY_ME")
- [ ] Verified the EXACT data I created appears in UI
- [ ] Refreshed page - data persists (proves database storage)
- [ ] Deleted the test data - verified it's gone everywhere
- [ ] NO unexplained data appeared (would indicate mock data)
- [ ] Dashboard/counts reflect real numbers after my changes
-
-#### Navigation Verification
-
- [ ] All buttons on this page link to existing routes
- [ ] No 404 errors when clicking any interactive element
- [ ] Back button returns to correct previous page
- [ ] Related links (edit, view, delete) have correct IDs in URLs
-
-#### Integration Verification
-
- [ ] Console shows ZERO JavaScript errors
- [ ] Network tab shows successful API calls (no 500s)
- [ ] Data returned from API matches what UI displays
- [ ] Loading states appeared during API calls
- [ ] Error states handle failures gracefully
+- **Security:** Feature respects role permissions; unauthenticated access blocked; API checks auth (401/403); no cross-user data leaks via URL manipulation
+- **Real Data:** Create unique test data via UI, verify it appears, refresh to confirm persistence, delete and verify removal. No unexplained data (indicates mocks). Dashboard counts reflect real numbers
+- **Mock Data Grep:** Run STEP 5.6 grep checks - no hits in src/ (excluding tests). No globalThis, devStore, or dev-store patterns
+- **Server Restart:** For data features, run STEP 5.7 - data persists across server restart
+- **Navigation:** All buttons link to existing routes, no 404s, back button works, edit/view/delete links have correct IDs
+- **Integration:** Zero JS console errors, no 500s in network tab, API data matches UI, loading/error states work

 ### STEP 5.6: MOCK DATA DETECTION (Before marking passing)

-1. **Search code:** `grep -r "mockData\|fakeData\|TODO\|STUB" --include="*.ts" --include="*.tsx"`
-2. **Runtime test:** Create unique data (e.g., "TEST_12345") → verify in UI → delete → verify gone
-3. **Check database:** All displayed data must come from real DB queries
-4. If unexplained data appears, it's mock data - fix before marking passing.
+Before marking a feature passing, grep for mock/placeholder data patterns in src/ (excluding test files): `globalThis`, `devStore`, `dev-store`, `mockDb`, `mockData`, `fakeData`, `sampleData`, `dummyData`, `testData`, `TODO.*real`, `TODO.*database`, `STUB`, `MOCK`, `isDevelopment`, `isDev`. Any hits in production code must be investigated and fixed. Also create unique test data (e.g., "TEST_12345"), verify it appears in UI, then delete and confirm removal - unexplained data indicates mock implementations.
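+
+A minimal sketch of that grep, assuming a TypeScript project under `src/` with test files named `*.test.*` or `*.spec.*` (the `scan_mock_patterns` name and these globs are assumptions; adapt them to the project):
+
+```bash
+# Hypothetical helper: list suspicious mock/placeholder patterns outside test files
+scan_mock_patterns() {
+  local dir="${1:-src}"
+  grep -rnE \
+    --include='*.ts' --include='*.tsx' \
+    --exclude='*.test.*' --exclude='*.spec.*' \
+    'globalThis|devStore|dev-store|mockDb|mockData|fakeData|sampleData|dummyData|testData|TODO.*real|TODO.*database|STUB|MOCK|isDevelopment|isDev' \
+    "$dir"
+}
+```
+
+Any output is a hit to investigate; no output (grep exit status 1) means none of the listed patterns was found.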
+
+### STEP 5.7: SERVER RESTART PERSISTENCE TEST (MANDATORY for data features)
+
+For any feature involving CRUD or data persistence: create unique test data (e.g., "RESTART_TEST_12345"), verify it exists, then fully stop and restart the dev server. After restart, verify the test data still exists. If data is gone, the implementation uses in-memory storage -- run STEP 5.6 greps, find the mock pattern, and replace with real database queries. Clean up test data after verification. This test catches in-memory stores like `globalThis.devStore` that pass all other tests but lose data on restart.

 ### STEP 6: UPDATE FEATURE STATUS (CAREFULLY!)

@@ -198,21 +154,27 @@ Use the feature_mark_passing tool with feature_id=42
 - Combine or consolidate features
 - Reorder features

-**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**
+**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**

 ### STEP 7: COMMIT YOUR PROGRESS

-Make a descriptive git commit:
+Make a descriptive git commit.
+
+**Git Commit Rules:**
+- ALWAYS use simple `-m` flag for commit messages
+- NEVER use heredocs (`cat <<EOF` or `<<'EOF'`) - they fail in sandbox mode with "can't create temp file for here document: operation not permitted"
+- For multi-line messages, use multiple `-m` flags:

 ```bash
 git add .
-git commit -m "Implement [feature name] - verified end-to-end
+git commit -m "Implement [feature name] - verified end-to-end" -m "- Added [specific changes]" -m "- Tested with browser automation" -m "- Marked feature #X as passing"
+```

- Added [specific changes]
- Tested with browser automation
- Marked feature #X as passing
- Screenshots in verification/ directory
-"
+Or use a single descriptive message:
+
+```bash
+git add .
+git commit -m "feat: implement [feature name] with browser verification"
 ```

 ### STEP 8: UPDATE PROGRESS NOTES
@@ -239,9 +201,15 @@ Before context fills up:

 ## BROWSER AUTOMATION

-Use Playwright MCP tools (`browser_*`) for UI verification. Key tools: `navigate`, `click`, `type`, `fill_form`, `take_screenshot`, `console_messages`, `network_requests`. All tools have auto-wait built in.
+Use `playwright-cli` commands for UI verification. Key commands: `open`, `goto`,
+`snapshot`, `click`, `type`, `fill`, `screenshot`, `console`, `close`.

-Test like a human user with mouse and keyboard. Use `browser_console_messages` to detect errors. Don't bypass UI with JavaScript evaluation.
+**How it works:** `playwright-cli` uses a persistent browser daemon. `open` starts it,
+subsequent commands interact via socket, `close` shuts it down. Screenshots and snapshots
+save to `.playwright-cli/` -- read the files when you need to verify content.
+
+Test like a human user with mouse and keyboard. Use `playwright-cli console` to detect
+JS errors. Don't bypass UI with JavaScript evaluation.

 ---

--- a/.claude/templates/initializer_prompt.template.md
+++ b/.claude/templates/initializer_prompt.template.md
@@ -36,9 +36,9 @@ Use the feature_create_bulk tool to add all features at once. You can create fea

 - Feature count must match the `feature_count` specified in app_spec.txt
 - Reference tiers for other projects:
-  - **Simple apps**: ~150 tests
-  - **Medium apps**: ~250 tests
-  - **Complex apps**: ~400+ tests
+  - **Simple apps**: ~165 tests (includes 5 infrastructure)
+  - **Medium apps**: ~265 tests (includes 5 infrastructure)
+  - **Advanced apps**: ~405+ tests (includes 5 infrastructure)
 - Both "functional" and "style" categories
 - Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
 - At least 25 tests MUST have 10+ steps each (more for complex apps)
@@ -60,8 +60,9 @@ Dependencies enable **parallel execution** of independent features. When specifi
 2. **Can only depend on EARLIER features** (index must be less than current position)
 3. **No circular dependencies** allowed
 4. **Maximum 20 dependencies** per feature
-5. **Foundation features (index 0-9)** should have NO dependencies
-6. **60% of features after index 10** should have at least one dependency
+5. **Infrastructure features (indices 0-4)** have NO dependencies - they run FIRST
+6. **ALL features after index 4** MUST depend on `[0, 1, 2, 3, 4]` (infrastructure)
+7. **60% of features after index 10** should have additional dependencies beyond infrastructure

 ### Dependency Types

@@ -82,30 +83,113 @@ Create WIDE dependency graphs, not linear chains:

 ```json
 [
-  // FOUNDATION TIER (indices 0-2, no dependencies) - run first
-  { "name": "App loads without errors", "category": "functional" },
-  { "name": "Navigation bar displays", "category": "style" },
-  { "name": "Homepage renders correctly", "category": "functional" },
+  // INFRASTRUCTURE TIER (indices 0-4, no dependencies) - MUST run first
+  { "name": "Database connection established", "category": "functional" },
+  { "name": "Database schema applied correctly", "category": "functional" },
+  { "name": "Data persists across server restart", "category": "functional" },
+  { "name": "No mock data patterns in codebase", "category": "functional" },
+  { "name": "Backend API queries real database", "category": "functional" },

-  // AUTH TIER (indices 3-5, depend on foundation) - run in parallel
-  { "name": "User can register", "depends_on_indices": [0] },
-  { "name": "User can login", "depends_on_indices": [0, 3] },
-  { "name": "User can logout", "depends_on_indices": [4] },
+  // FOUNDATION TIER (indices 5-7, depend on infrastructure)
+  { "name": "App loads without errors", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },
+  { "name": "Navigation bar displays", "category": "style", "depends_on_indices": [0, 1, 2, 3, 4] },
+  { "name": "Homepage renders correctly", "category": "functional", "depends_on_indices": [0, 1, 2, 3, 4] },

-  // CORE CRUD TIER (indices 6-9) - WIDE GRAPH: all 4 depend on login
-  // All 4 start as soon as login passes!
-  { "name": "User can create todo", "depends_on_indices": [4] },
-  { "name": "User can view todos", "depends_on_indices": [4] },
-  { "name": "User can edit todo", "depends_on_indices": [4, 6] },
-  { "name": "User can delete todo", "depends_on_indices": [4, 6] },
+  // AUTH TIER (indices 8-10, depend on foundation + infrastructure)
+  { "name": "User can register", "depends_on_indices": [0, 1, 2, 3, 4, 5] },
+  { "name": "User can login", "depends_on_indices": [0, 1, 2, 3, 4, 5, 8] },
+  { "name": "User can logout", "depends_on_indices": [0, 1, 2, 3, 4, 9] },

-  // ADVANCED TIER (indices 10-11) - both depend on view, not each other
-  { "name": "User can filter todos", "depends_on_indices": [7] },
-  { "name": "User can search todos", "depends_on_indices": [7] }
+  // CORE CRUD TIER (indices 11-14) - WIDE GRAPH: all 4 depend on login
+  { "name": "User can create todo", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
+  { "name": "User can view todos", "depends_on_indices": [0, 1, 2, 3, 4, 9] },
+  { "name": "User can edit todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] },
+  { "name": "User can delete todo", "depends_on_indices": [0, 1, 2, 3, 4, 9, 11] },
+
+  // ADVANCED TIER (indices 15-16) - both depend on view, not each other
+  { "name": "User can filter todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] },
+  { "name": "User can search todos", "depends_on_indices": [0, 1, 2, 3, 4, 12] }
 ]
 ```

-**Result:** With 3 parallel agents, this 12-feature project completes in ~5-6 cycles instead of 12 sequential cycles.
+**Result:** With 3 parallel agents, this project completes efficiently with proper database validation first.
+
+---
+
+## MANDATORY INFRASTRUCTURE FEATURES (Indices 0-4)
+
+**CRITICAL:** Create these FIRST, before any functional features. These features ensure the application uses a real database, not mock data or in-memory storage.
+
+| Index | Name | Test Steps |
+|-------|------|------------|
+| 0 | Database connection established | Start server → check logs for DB connection → health endpoint returns DB status |
+| 1 | Database schema applied correctly | Connect to DB directly → list tables → verify schema matches spec |
+| 2 | Data persists across server restart | Create via API → STOP server completely → START server → query API → data still exists |
+| 3 | No mock data patterns in codebase | Run grep for prohibited patterns → must return empty |
+| 4 | Backend API queries real database | Check server logs → SQL/DB queries appear for API calls |
+
+**ALL other features MUST depend on indices [0, 1, 2, 3, 4].**
+
+### Infrastructure Feature Descriptions
+
+**Feature 0 - Database connection established:**
+```text
+Steps:
+1. Start the development server
+2. Check server logs for database connection message
+3. Call health endpoint (e.g., GET /api/health)
+4. Verify response includes database status: connected
+```
+
+**Feature 1 - Database schema applied correctly:**
+```text
+Steps:
+1. Connect to database directly (sqlite3, psql, etc.)
+2. List all tables in the database
+3. Verify tables match what's defined in app_spec.txt
+4. Verify key columns exist on each table
+```
+
+**Feature 2 - Data persists across server restart (CRITICAL):**
+```text
+Steps:
+1. Create unique test data via API (e.g., POST /api/items with name "RESTART_TEST_12345")
+2. Verify data appears in API response (GET /api/items)
+3. STOP the server completely (kill by port to avoid killing unrelated Node processes):
+   - Unix/macOS: lsof -ti :$PORT | xargs kill -9 2>/dev/null || true && sleep 5
+   - Windows: FOR /F "tokens=5" %a IN ('netstat -aon ^| find ":$PORT"') DO taskkill /F /PID %a 2>nul
+   - Note: Replace $PORT with actual port (e.g., 3000)
+4. Verify server is stopped: lsof -ti :$PORT returns nothing (or netstat on Windows)
+5. RESTART the server: ./init.sh & sleep 15
+6. Query API again: GET /api/items
+7. Verify "RESTART_TEST_12345" still exists
+8. If data is GONE → CRITICAL FAILURE (in-memory storage detected)
+9. Clean up test data
+```
+
+**Feature 3 - No mock data patterns in codebase:**
+```text
+Steps:
+1. Run: grep -r "globalThis\." --include="*.ts" --include="*.tsx" --include="*.js" src/
+2. Run: grep -r "dev-store\|devStore\|DevStore\|mock-db\|mockDb" --include="*.ts" --include="*.tsx" --include="*.js" src/
+3. Run: grep -r "mockData\|testData\|fakeData\|sampleData\|dummyData" --include="*.ts" --include="*.tsx" --include="*.js" src/
+4. Run: grep -r "TODO.*real\|TODO.*database\|TODO.*API\|STUB\|MOCK" --include="*.ts" --include="*.tsx" --include="*.js" src/
+5. Run: grep -r "isDevelopment\|isDev\|process\.env\.NODE_ENV.*development" --include="*.ts" --include="*.tsx" --include="*.js" src/
+6. Run: grep -rE "new Map\(\)|new Set\(\)" --include="*.ts" --include="*.tsx" --include="*.js" src/ 2>/dev/null
+7. Run: grep -E "json-server|miragejs|msw" package.json
+8. ALL grep commands must return empty (exit code 1)
+9. If any returns results → investigate and fix before passing
+```
+
+**Feature 4 - Backend API queries real database:**
+```text
+Steps:
+1. Start server with verbose logging
+2. Make API call (e.g., GET /api/items)
+3. Check server logs
+4. Verify SQL query appears (SELECT, INSERT, etc.) or ORM query log
+5. If no DB queries in logs → implementation is using mock data
+```

 ---
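
As a reading aid for the dependency rules and the example graph in the hunks above, here is a minimal sketch of how those rules could be checked mechanically. It is not the project's `api/dependency_resolver.py`, and the function names `validate_dependencies` and `execution_waves` are hypothetical. Only the `depends_on_indices` field, the infrastructure indices 0-4, and the limit of 20 dependencies come from the prompt. Because every dependency must point at an earlier index, a list that passes these checks cannot contain a cycle.

```python
from typing import Dict, List

INFRA_INDICES = [0, 1, 2, 3, 4]  # infrastructure tier named in the prompt
MAX_DEPENDENCIES = 20            # per-feature limit named in the prompt


def validate_dependencies(features: List[Dict]) -> List[str]:
    """Return human-readable rule violations; an empty list means valid."""
    errors = []
    for index, feature in enumerate(features):
        deps = feature.get("depends_on_indices", [])
        if len(deps) > MAX_DEPENDENCIES:
            errors.append(f"feature {index}: more than {MAX_DEPENDENCIES} dependencies")
        for dep in deps:
            # A dependency must reference a feature that appears earlier in the list.
            if not 0 <= dep < index:
                errors.append(f"feature {index}: dependency {dep} is not an earlier feature")
        if index in INFRA_INDICES:
            if deps:
                errors.append(f"feature {index}: infrastructure features must have no dependencies")
        elif not set(INFRA_INDICES).issubset(deps):
            errors.append(f"feature {index}: must depend on {INFRA_INDICES}")
    return errors


def execution_waves(features: List[Dict]) -> List[List[int]]:
    """Group feature indices into waves; call only on a list that validated cleanly.

    A feature lands one wave after its deepest dependency, so features in the
    same wave do not depend on each other and could run in parallel.
    """
    depth: List[int] = []
    for feature in features:
        deps = feature.get("depends_on_indices", [])
        depth.append(1 + max((depth[d] for d in deps), default=-1))
    waves: Dict[int, List[int]] = {}
    for index, level in enumerate(depth):
        waves.setdefault(level, []).append(index)
    return [waves[level] for level in sorted(waves)]
```

Applied to the 17-feature example in the diff, this sketch groups the features into six waves: infrastructure, foundation, register, login, and then two wider waves for logout/CRUD and edit/delete/filter/search.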

@@ -115,8 +199,9 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou

 ### Category Distribution by Complexity Tier

-| Category                         | Simple  | Medium  | Complex  |
+| Category                         | Simple  | Medium  | Advanced |
 | -------------------------------- | ------- | ------- | -------- |
+| **0. Infrastructure (REQUIRED)** | 5       | 5       | 5        |
 | A. Security & Access Control     | 5       | 20      | 40       |
 | B. Navigation Integrity          | 15      | 25      | 40       |
 | C. Real Data Verification        | 20      | 30      | 50       |
@@ -137,12 +222,14 @@ The feature_list.json **MUST** include tests from ALL 20 categories. Minimum cou
 | R. Concurrency & Race Conditions | 5       | 8       | 15       |
 | S. Export/Import                 | 5       | 6       | 10       |
 | T. Performance                   | 5       | 5       | 10       |
-| **TOTAL**                        | **150** | **250** | **400+** |
+| **TOTAL**                        | **165** | **265** | **405+** |

 ---

 ### Category Descriptions

+**0. Infrastructure (REQUIRED - Priority 0)** - Database connectivity, schema existence, data persistence across server restart, absence of mock patterns, and backend API queries hitting the real database. These features MUST pass before any functional features can begin. All tiers require exactly 5 infrastructure features (indices 0-4).
+
 **A. Security & Access Control** - Test unauthorized access blocking, permission enforcement, session management, role-based access, and data isolation between users.

 **B. Navigation Integrity** - Test all buttons, links, menus, breadcrumbs, deep links, back button behavior, 404 handling, and post-login/logout redirects.
@@ -205,6 +292,16 @@ The feature_list.json must include tests that **actively verify real data** and
 - `setTimeout` simulating API delays with static data
 - Static returns instead of database queries

+**Additional prohibited patterns (in-memory stores):**
+
+- `globalThis.` (in-memory storage pattern)
+- `dev-store`, `devStore`, `DevStore` (development stores)
+- `json-server`, `miragejs`, `msw` (mock backends)
+- `Map()` or `Set()` used as primary data store
+- Environment checks like `if (process.env.NODE_ENV === 'development')` for data routing
+
+**Why this matters:** In-memory stores (like `globalThis.devStore`) will pass simple tests because data persists during a single server run. But data is LOST on server restart, which is unacceptable for production. The Infrastructure features (0-4) specifically test for this by requiring data to survive a full server restart.
+
 ---

 **CRITICAL INSTRUCTION:**
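
For reviewers who want to try the prohibited-pattern checks from the initializer prompt above without a shell, the following is a minimal Python sketch of the same idea. The regular expressions are transcribed from the `grep` steps of Feature 3; the names `find_prohibited` and `scan_tree` are hypothetical, and the choice to scan only `.ts`, `.tsx`, and `.js` files mirrors the `--include` flags. It does not cover the `package.json` check.

```python
import re
from pathlib import Path
from typing import Iterator, List, Tuple

# Patterns transcribed from the grep steps in the prompt (Python regex syntax).
PROHIBITED = [re.compile(p) for p in (
    r"globalThis\.",
    r"dev-store|devStore|DevStore|mock-db|mockDb",
    r"mockData|testData|fakeData|sampleData|dummyData",
    r"TODO.*real|TODO.*database|TODO.*API|STUB|MOCK",
    r"isDevelopment|isDev|process\.env\.NODE_ENV.*development",
    r"new Map\(\)|new Set\(\)",
)]
SOURCE_SUFFIXES = {".ts", ".tsx", ".js"}


def find_prohibited(text: str) -> List[Tuple[int, str]]:
    """Return (line number, matched snippet) for every pattern hit in the text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern in PROHIBITED:
            match = pattern.search(line)
            if match:
                hits.append((line_no, match.group(0)))
    return hits


def scan_tree(root: Path) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, snippet) for each hit under the root directory."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            for line_no, snippet in find_prohibited(text):
                yield path, line_no, snippet
```

As with the `grep` version, a hit is only a prompt to investigate: a `Map` used as a short-lived cache is different from one used as the primary data store.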
--- a/.claude/templates/testing_prompt.template.md
+++ b/.claude/templates/testing_prompt.template.md
@@ -1,58 +1,29 @@
 ## YOUR ROLE - TESTING AGENT

-You are a **testing agent** responsible for **regression testing** previously-passing features.
+You are a **testing agent** responsible for **regression testing** previously-passing features. If you find a regression, you must fix it.

-Your job is to ensure that features marked as "passing" still work correctly. If you find a regression (a feature that no longer works), you must fix it.
+## ASSIGNED FEATURES FOR REGRESSION TESTING

-### STEP 1: GET YOUR BEARINGS (MANDATORY)
+You are assigned to test the following features: {{TESTING_FEATURE_IDS}}

-Start by orienting yourself:
+### Workflow for EACH feature:
+1. Call `feature_get_by_id` with the feature ID
+2. Read the feature's verification steps
+3. Test the feature in the browser
+4. If the feature regressed, call `feature_mark_failing`, fix it, then call `feature_mark_passing`; if it still passes, no status call is needed
+5. Move to the next feature

-```bash
-# 1. See your working directory
-pwd
+---

-# 2. List files to understand project structure
-ls -la
+### STEP 1: GET YOUR ASSIGNED FEATURE(S)

-# 3. Read progress notes from previous sessions (last 200 lines)
-tail -200 claude-progress.txt
-
-# 4. Check recent git history
-git log --oneline -10
-```
-
-Then use MCP tools to check feature status:
+Your features have been pre-assigned by the orchestrator. For each feature ID listed above, use `feature_get_by_id` to get the details:

 ```
-# 5. Get progress statistics
-Use the feature_get_stats tool
+Use the feature_get_by_id tool with feature_id=<ID>
 ```

-### STEP 2: START SERVERS (IF NOT RUNNING)
-
-If `init.sh` exists, run it:
-
-```bash
-chmod +x init.sh
-./init.sh
-```
-
-Otherwise, start servers manually.
-
-### STEP 3: GET YOUR ASSIGNED FEATURE
-
-Your feature has been pre-assigned by the orchestrator. Use `feature_get_by_id` to get the details:
-
-```
-Use the feature_get_by_id tool with feature_id={your_assigned_id}
-```
-
-The orchestrator has already claimed this feature for testing (set `testing_in_progress=true`).
-
-**CRITICAL:** You MUST call `feature_release_testing` when done, regardless of pass/fail.
-
-### STEP 4: VERIFY THE FEATURE
+### STEP 2: VERIFY THE FEATURE

 **CRITICAL:** You MUST verify the feature through the actual UI using browser automation.

@@ -60,42 +31,38 @@ For the feature returned:
 1. Read and understand the feature's verification steps
 2. Navigate to the relevant part of the application
 3. Execute each verification step using browser automation
-4. Take screenshots to document the verification
+4. Take screenshots and read them to verify visual appearance
 5. Check for console errors

-Use browser automation tools:
+### Browser Automation (Playwright CLI)

 **Navigation & Screenshots:**
-- browser_navigate - Navigate to a URL
-- browser_take_screenshot - Capture screenshot (use for visual verification)
-- browser_snapshot - Get accessibility tree snapshot
+- `playwright-cli open <url>` - Open browser and navigate
+- `playwright-cli goto <url>` - Navigate to URL
+- `playwright-cli screenshot` - Save screenshot to `.playwright-cli/`
+- `playwright-cli snapshot` - Save page snapshot with element refs to `.playwright-cli/`

 **Element Interaction:**
-- browser_click - Click elements
-- browser_type - Type text into editable elements
-- browser_fill_form - Fill multiple form fields
-- browser_select_option - Select dropdown options
-- browser_press_key - Press keyboard keys
+- `playwright-cli click <ref>` - Click elements (ref from snapshot)
+- `playwright-cli type <text>` - Type text
+- `playwright-cli fill <ref> <text>` - Fill form fields
+- `playwright-cli select <ref> <val>` - Select dropdown
+- `playwright-cli press <key>` - Keyboard input

 **Debugging:**
-- browser_console_messages - Get browser console output (check for errors)
-- browser_network_requests - Monitor API calls
+- `playwright-cli console` - Check for JS errors
+- `playwright-cli network` - Monitor API calls

-### STEP 5: HANDLE RESULTS
+**Cleanup:**
+- `playwright-cli close` - Close browser when done (ALWAYS do this)
+
+**Note:** Screenshots and snapshots save to files. Read the file to see the content.
+
+### STEP 3: HANDLE RESULTS

 #### If the feature PASSES:

-The feature still works correctly. Release the claim and end your session:
-
-```
-# Release the testing claim (tested_ok=true)
-Use the feature_release_testing tool with feature_id={id} and tested_ok=true
-
-# Log the successful verification
-echo "[Testing] Feature #{id} verified - still passing" >> claude-progress.txt
-```
-
-**DO NOT** call feature_mark_passing again - it's already passing.
+The feature still works correctly. **DO NOT** call feature_mark_passing again -- it's already passing. End your session.

 #### If the feature FAILS (regression found):

@@ -118,20 +85,14 @@ A regression has been introduced. You MUST fix it:

 4. **Verify the fix:**
   - Run through all verification steps again
-   - Take screenshots confirming the fix
+   - Take screenshots and read them to confirm the fix

 5. **Mark as passing after fix:**
   ```
   Use the feature_mark_passing tool with feature_id={id}
   ```

-6. **Release the testing claim:**
-   ```
-   Use the feature_release_testing tool with feature_id={id} and tested_ok=false
-   ```
-   Note: tested_ok=false because we found a regression (even though we fixed it).
-
-7. **Commit the fix:**
+6. **Commit the fix:**
   ```bash
   git add .
   git commit -m "Fix regression in [feature name]
@@ -141,46 +102,33 @@ A regression has been introduced. You MUST fix it:
   - Verified with browser automation"
   ```

-### STEP 6: UPDATE PROGRESS AND END
-
-Update `claude-progress.txt`:
-
-```bash
-echo "[Testing] Session complete - verified/fixed feature #{id}" >> claude-progress.txt
-```
-
 ---

-## AVAILABLE MCP TOOLS
+## AVAILABLE TOOLS

 ### Feature Management
 - `feature_get_stats` - Get progress overview (passing/in_progress/total counts)
 - `feature_get_by_id` - Get your assigned feature details
-- `feature_release_testing` - **REQUIRED** - Release claim after testing (pass tested_ok=true/false)
 - `feature_mark_failing` - Mark a feature as failing (when you find a regression)
 - `feature_mark_passing` - Mark a feature as passing (after fixing a regression)

-### Browser Automation (Playwright)
-All interaction tools have **built-in auto-wait** - no manual timeouts needed.
-
-- `browser_navigate` - Navigate to URL
-- `browser_take_screenshot` - Capture screenshot
-- `browser_snapshot` - Get accessibility tree
-- `browser_click` - Click elements
-- `browser_type` - Type text
-- `browser_fill_form` - Fill form fields
-- `browser_select_option` - Select dropdown
-- `browser_press_key` - Keyboard input
-- `browser_console_messages` - Check for JS errors
-- `browser_network_requests` - Monitor API calls
+### Browser Automation (Playwright CLI)
+Use `playwright-cli` commands for browser interaction. Key commands:
+- `playwright-cli open <url>` - Open browser
+- `playwright-cli goto <url>` - Navigate to URL
+- `playwright-cli screenshot` - Take screenshot (saved to `.playwright-cli/`)
+- `playwright-cli snapshot` - Get page snapshot with element refs
+- `playwright-cli click <ref>` - Click element
+- `playwright-cli type <text>` - Type text
+- `playwright-cli fill <ref> <text>` - Fill form field
+- `playwright-cli console` - Check for JS errors
+- `playwright-cli close` - Close browser (always do this when done)

 ---

 ## IMPORTANT REMINDERS

-**Your Goal:** Verify that passing features still work, and fix any regressions found.
-
-**This Session's Goal:** Test ONE feature thoroughly.
+**Your Goal:** Test each assigned feature thoroughly. Verify it still works, and fix any regression found. Process ALL features in your list before ending your session.

 **Quality Bar:**
 - Zero console errors
@@ -188,21 +136,15 @@ All interaction tools have **built-in auto-wait** - no manual timeouts needed.
 - Visual appearance correct
 - API calls succeed

-**CRITICAL - Always release your claim:**
-- Call `feature_release_testing` when done, whether pass or fail
-- Pass `tested_ok=true` if the feature passed
-- Pass `tested_ok=false` if you found a regression
-
 **If you find a regression:**
 1. Mark the feature as failing immediately
 2. Fix the issue
 3. Verify the fix with browser automation
 4. Mark as passing only after thorough verification
-5. Release the testing claim with `tested_ok=false`
-6. Commit the fix
+5. Commit the fix

-**You have one iteration.** Focus on testing ONE feature thoroughly.
+**You have one iteration.** Test all assigned features before ending.

 ---

-Begin by running Step 1 (Get Your Bearings).
+Begin by running Step 1 for the first feature in your assigned list.
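
The testing prompt above stresses that `playwright-cli close` must always run, even when a verification step fails. The sketch below illustrates that pattern in Python with `try`/`finally`. It assumes `playwright-cli` is on `PATH` and accepts the subcommands exactly as listed in the prompt; the names `run_cli` and `verify_page`, and the injectable `runner` parameter, are hypothetical and exist only so the control flow can be exercised without a browser.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run_cli(args: Sequence[str]) -> None:
    """Default runner: invoke playwright-cli and raise on a non-zero exit code."""
    subprocess.run(["playwright-cli", *args], check=True)


def verify_page(url: str, steps: List[Sequence[str]], runner: Runner = run_cli) -> None:
    """Open the page, run each step, capture evidence, and always close the browser."""
    runner(["open", url])
    try:
        for step in steps:
            runner(step)
        # Evidence is written to .playwright-cli/ per the prompt; read the files afterwards.
        runner(["screenshot"])
        runner(["console"])
    finally:
        # Runs on success and on failure, so the browser daemon is not left behind.
        runner(["close"])
```

A call such as `verify_page("http://localhost:3000", [["snapshot"], ["click", "e5"]])` would open the page, take a snapshot, and click the element with ref `e5` (a made-up ref; real refs come from the snapshot output).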
--- a/.env.example
+++ b/.env.example
@@ -1,21 +1,63 @@
 # Optional: N8N webhook for progress notifications
 # PROGRESS_N8N_WEBHOOK_URL=https://your-n8n-instance.com/webhook/...

-# Playwright Browser Mode
-# Controls whether Playwright runs Chrome in headless mode (no visible browser window).
-# - true: Browser runs in background, invisible (recommended for using PC while agent works)
-# - false: Browser opens a visible window (useful for debugging)
-# Defaults to 'false' if not specified
-# PLAYWRIGHT_HEADLESS=false
-
-# GLM/Alternative API Configuration (Optional)
-# To use Zhipu AI's GLM models instead of Claude, uncomment and set these variables.
-# This only affects AutoCoder - your global Claude Code settings remain unchanged.
-# Get an API key at: https://z.ai/subscribe
+# Playwright Browser Configuration
 #
+# PLAYWRIGHT_BROWSER: Which browser to use for testing
+# - firefox: Lower CPU usage, recommended (default)
+# - chrome: Google Chrome
+# - webkit: Safari engine
+# - msedge: Microsoft Edge
+# PLAYWRIGHT_BROWSER=firefox
+
+# Extra Read Paths (Optional)
+# Comma-separated list of absolute paths for read-only access to external directories.
+# The agent can read files from these paths but cannot write to them.
+# Useful for referencing documentation, shared libraries, or other projects.
+# Example: EXTRA_READ_PATHS=/Volumes/Data/dev,/Users/shared/libs
+# EXTRA_READ_PATHS=
+
+# Google Cloud Vertex AI Configuration (Optional)
+# To use Claude via Vertex AI on Google Cloud Platform, uncomment and set these variables.
+# Requires: gcloud CLI installed and authenticated (run: gcloud auth application-default login)
+# Note: Use @ instead of - in model names for date-suffixed models (e.g., claude-sonnet-4-5@20250929)
+#
+# CLAUDE_CODE_USE_VERTEX=1
+# CLOUD_ML_REGION=us-east5
+# ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
+# ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-6
+# ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-5@20250929
+# ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-3-5-haiku@20241022
+
+# ===================
+# Alternative API Providers (Azure, GLM, Ollama, Kimi, Custom)
+# ===================
+# Configure via Settings UI (recommended) or set env vars below.
+# When both are set, env vars take precedence.
+#
+# Azure Anthropic (Claude):
+# ANTHROPIC_BASE_URL=https://your-resource.services.ai.azure.com/anthropic
+# ANTHROPIC_API_KEY=your-azure-api-key
+# ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-6
+# ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-5
+# ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5
+#
+# GLM (Zhipu AI):
 # ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
-# ANTHROPIC_AUTH_TOKEN=your-zhipu-api-key
-# API_TIMEOUT_MS=3000000
-# ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
+# ANTHROPIC_AUTH_TOKEN=your-glm-api-key
 # ANTHROPIC_DEFAULT_OPUS_MODEL=glm-4.7
-# ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.5-air
+# ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
+# ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.7
+#
+# Ollama (Local):
+# ANTHROPIC_BASE_URL=http://localhost:11434
+# ANTHROPIC_DEFAULT_OPUS_MODEL=qwen3-coder
+# ANTHROPIC_DEFAULT_SONNET_MODEL=qwen3-coder
+# ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen3-coder
+#
+# Kimi (Moonshot):
+# ANTHROPIC_BASE_URL=https://api.kimi.com/coding/
+# ANTHROPIC_API_KEY=your-kimi-api-key
+# ANTHROPIC_DEFAULT_OPUS_MODEL=kimi-k2.5
+# ANTHROPIC_DEFAULT_SONNET_MODEL=kimi-k2.5
+# ANTHROPIC_DEFAULT_HAIKU_MODEL=kimi-k2.5
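
To make the two new settings in `.env.example` concrete, here is a minimal sketch of how `EXTRA_READ_PATHS` and `PLAYWRIGHT_BROWSER` might be parsed. This is an assumption about reasonable handling, not the project's actual loader: the function names are hypothetical, and rejecting relative paths or unknown browser names with `ValueError` is a design choice of the sketch. The comma-separated format, the absolute-path requirement, the four browser names, and the `firefox` default come from the comments in the file.

```python
import os
from typing import List, Mapping

SUPPORTED_BROWSERS = ("firefox", "chrome", "webkit", "msedge")
DEFAULT_BROWSER = "firefox"


def parse_extra_read_paths(env: Mapping[str, str]) -> List[str]:
    """Split EXTRA_READ_PATHS on commas and require every entry to be absolute."""
    raw = env.get("EXTRA_READ_PATHS", "")
    paths = [p.strip() for p in raw.split(",") if p.strip()]
    relative = [p for p in paths if not os.path.isabs(p)]
    if relative:
        raise ValueError(f"EXTRA_READ_PATHS entries must be absolute: {relative}")
    return paths


def resolve_browser(env: Mapping[str, str]) -> str:
    """Return the configured browser, falling back to the documented default."""
    browser = env.get("PLAYWRIGHT_BROWSER", "").strip().lower() or DEFAULT_BROWSER
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"PLAYWRIGHT_BROWSER must be one of {SUPPORTED_BROWSERS}")
    return browser
```

Both functions take a mapping rather than reading `os.environ` directly, so `parse_extra_read_paths(os.environ)` works in production while a plain dict works in tests.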
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,7 @@
 generations/
 automaker/
 temp/
+temp-docs/

 nul
 issues/
@@ -9,6 +10,10 @@ issues/
 # Browser profiles for parallel agent execution
 .browser-profiles/

+# Playwright CLI daemon artifacts
+.playwright-cli/
+.playwright/
+
 # Log files
 logs/
 *.log
@@ -76,6 +81,13 @@ ui/playwright-report/
 .dmypy.json
 dmypy.json

+.ruff_cache/
+
+# ===================
+# Claude Code
+# ===================
+.claude/settings.local.json
+
 # ===================
 # IDE / Editors
 # ===================
@@ -107,6 +119,7 @@ Desktop.ini
 ui/dist/
 ui/.vite/
 .vite/
+*.tgz

 # ===================
 # Environment files
--- a/.npmignore
+++ b/.npmignore
@@ -0,0 +1,31 @@
+venv/
+**/__pycache__/
+**/*.pyc
+.git/
+.github/
+node_modules/
+test_*.py
+tests/
+generations/
+*.db
+.env
+requirements.txt
+CLAUDE.md
+LICENSE.md
+README.md
+ui/src/
+ui/node_modules/
+ui/tsconfig*.json
+ui/vite.config.ts
+ui/eslint.config.js
+ui/index.html
+ui/public/
+ui/playwright.config.ts
+ui/tests/
+start.bat
+start_ui.bat
+start.sh
+start_ui.sh
+start_ui.py
+.claude/agents/
+.claude/settings.json
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,6 +2,12 @@

 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

+## Prerequisites
+
+- Python 3.11+
+- Node.js 20+ (for UI development)
+- Claude Code CLI
+
 ## Project Overview

 This is an autonomous coding agent system with a React-based UI. It uses the Claude Agent SDK to build complete applications over multiple sessions using a two-agent pattern:
@@ -11,18 +17,28 @@ This is an autonomous coding agent system with a React-based UI. It uses the Cla

 ## Commands

-### Quick Start (Recommended)
+### npm Global Install (Recommended)

 ```bash
-# Windows - launches CLI menu
-start.bat
+npm install -g autoforge-ai
+autoforge                    # Start server (first run sets up Python venv)
+autoforge config             # Edit ~/.autoforge/.env in $EDITOR
+autoforge config --show      # Print active configuration
+autoforge --port 9999        # Custom port
+autoforge --no-browser       # Don't auto-open browser
+autoforge --repair           # Delete and recreate ~/.autoforge/venv/
+```

-# macOS/Linux
-./start.sh
+### From Source (Development)

+```bash
 # Launch Web UI (serves pre-built React app)
 start_ui.bat      # Windows
 ./start_ui.sh     # macOS/Linux
+
+# CLI menu
+start.bat         # Windows
+./start.sh        # macOS/Linux
 ```

 ### Python Backend (Manual)
@@ -48,6 +64,12 @@ python autonomous_agent_demo.py --project-dir my-app --yolo

 # Parallel mode: run multiple agents concurrently (1-5 agents)
 python autonomous_agent_demo.py --project-dir my-app --parallel --max-concurrency 3
+
+# Batch mode: implement multiple features per agent session (1-3)
+python autonomous_agent_demo.py --project-dir my-app --batch-size 3
+
+# Batch specific features by ID
+python autonomous_agent_demo.py --project-dir my-app --batch-features 1,2,3
 ```

 ### YOLO Mode (Rapid Prototyping)
@@ -62,8 +84,8 @@ python autonomous_agent_demo.py --project-dir my-app --yolo
 ```

 **What's different in YOLO mode:**
-- No regression testing (skips `feature_get_for_regression`)
-- No Playwright MCP server (browser automation disabled)
+- No regression testing
+- No Playwright CLI (browser automation disabled)
 - Features marked passing after lint/type-check succeeds
 - Faster iteration for prototyping

@@ -86,25 +108,78 @@ npm run lint     # Run ESLint

 **Note:** The `start_ui.bat` script serves the pre-built UI from `ui/dist/`. After making UI changes, run `npm run build` in the `ui/` directory.

+## Testing
+
+### Python
+
+```bash
+ruff check .                          # Lint
+mypy .                                # Type check
+python test_security.py               # Security unit tests (12 tests)
+python test_security_integration.py   # Integration tests (9 tests)
+python -m pytest test_client.py       # Client tests (20 tests)
+python -m pytest test_dependency_resolver.py  # Dependency resolver tests (12 tests)
+python -m pytest test_rate_limit_utils.py     # Rate limit tests (22 tests)
+```
+
+### React UI
+
+```bash
+cd ui
+npm run lint          # ESLint
+npm run build         # Type check + build (Vite 7)
+npm run test:e2e      # Playwright end-to-end tests
+npm run test:e2e:ui   # Playwright tests with UI
+```
+
+### CI/CD
+
+GitHub Actions (`.github/workflows/ci.yml`) runs on push/PR to master:
+- **Python job**: ruff lint + security tests
+- **UI job**: ESLint + TypeScript build
+
+### Code Quality
+
+Configuration in `pyproject.toml`:
+- ruff: Line length 120, Python 3.11 target
+- mypy: Strict return type checking, ignores missing imports
+
 ## Architecture

+### npm CLI (bin/, lib/)
+
+The `autoforge` command is a Node.js wrapper that manages the Python environment and server lifecycle:
+- `bin/autoforge.js` - Entry point (shebang script)
+- `lib/cli.js` - Main CLI logic: Python 3.11+ detection (cross-platform), venv management at `~/.autoforge/venv/` with composite marker (requirements hash + Python version), `.env` config loading from `~/.autoforge/.env`, uvicorn server startup with PID file, and signal handling
+- `package.json` - npm package config (`autoforge-ai` on npm), `files` whitelist with `__pycache__` exclusions, `prepublishOnly` builds the UI
+- `requirements-prod.txt` - Runtime-only Python deps (excludes ruff, mypy, pytest)
+- `.npmignore` - Excludes dev files, tests, UI source from the published tarball
+
+Publishing: `npm publish` (triggers `prepublishOnly` which builds UI, then publishes ~600KB tarball with 84 files)
+
 ### Core Python Modules

 - `start.py` - CLI launcher with project creation/selection menu
-- `autonomous_agent_demo.py` - Entry point for running the agent
+- `autonomous_agent_demo.py` - Entry point for running the agent (supports `--yolo`, `--parallel`, `--batch-size`, `--batch-features`)
+- `autoforge_paths.py` - Central path resolution with dual-path backward compatibility and migration
 - `agent.py` - Agent session loop using Claude Agent SDK
-- `client.py` - ClaudeSDKClient configuration with security hooks and MCP servers
+- `client.py` - ClaudeSDKClient configuration with security hooks, feature MCP server, and Vertex AI support
 - `security.py` - Bash command allowlist validation (ALLOWED_COMMANDS whitelist)
-- `prompts.py` - Prompt template loading with project-specific fallback
+- `prompts.py` - Prompt template loading with project-specific fallback and batch feature prompts
 - `progress.py` - Progress tracking, database queries, webhook notifications
-- `registry.py` - Project registry for mapping names to paths (cross-platform)
+- `registry.py` - Project registry for mapping names to paths (cross-platform), global settings model
 - `parallel_orchestrator.py` - Concurrent agent execution with dependency-aware scheduling
+- `auth.py` - Authentication error detection for Claude CLI
+- `env_constants.py` - Shared environment variable constants (API_ENV_VARS) used by client.py and chat sessions
+- `rate_limit_utils.py` - Rate limit detection, retry parsing, exponential backoff with jitter
+- `api/database.py` - SQLAlchemy models (Feature, Schedule, ScheduleOverride)
 - `api/dependency_resolver.py` - Cycle detection (Kahn's algorithm + DFS) and dependency validation
+- `api/migration.py` - JSON-to-SQLite migration utility
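
The module list above mentions Kahn's algorithm for cycle detection in `api/dependency_resolver.py`. As an illustration of the technique only, and not a copy of that module, the sketch below shows how Kahn's algorithm can report whether a feature dependency graph contains a cycle. The function name `has_cycle` and the input shape (feature ID mapped to the IDs it depends on) are assumptions.

```python
from collections import deque
from typing import Dict, List


def has_cycle(dependencies: Dict[int, List[int]]) -> bool:
    """Kahn's algorithm: return True if the dependency graph contains a cycle."""
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # remaining[n] counts the distinct dependencies of n not yet processed.
    remaining = {node: len(set(dependencies.get(node, []))) for node in nodes}
    dependents: Dict[int, List[int]] = {node: [] for node in nodes}
    for node, deps in dependencies.items():
        for dep in set(deps):
            dependents[dep].append(node)

    # Start from features with no dependencies and peel the graph layer by layer.
    ready = deque(node for node, count in remaining.items() if count == 0)
    processed = 0
    while ready:
        node = ready.popleft()
        processed += 1
        for child in dependents[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    # Any node never processed is part of a cycle or depends on one.
    return processed != len(nodes)
```

A check like this would typically run before accepting a new edge, for example by adding the proposed dependency to a copy of the graph and rejecting it if `has_cycle` returns `True`.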

 ### Project Registry

 Projects can be stored in any directory. The registry maps project names to paths using SQLite:
-- **All platforms**: `~/.autocoder/registry.db`
+- **All platforms**: `~/.autoforge/registry.db`

 The registry uses:
 - SQLite database with SQLAlchemy ORM
@@ -113,13 +188,36 @@ The registry uses:

 ### Server API (server/)

-The FastAPI server provides REST endpoints for the UI:
+The FastAPI server provides REST and WebSocket endpoints for the UI:

-- `server/routers/projects.py` - Project CRUD with registry integration
-- `server/routers/features.py` - Feature management
-- `server/routers/agent.py` - Agent control (start/stop/pause/resume)
-- `server/routers/filesystem.py` - Filesystem browser API with security controls
-- `server/routers/spec_creation.py` - WebSocket for interactive spec creation
+**Routers** (`server/routers/`):
+- `projects.py` - Project CRUD with registry integration
+- `features.py` - Feature management
+- `agent.py` - Agent control (start/stop/pause/resume)
+- `filesystem.py` - Filesystem browser API with security controls
+- `spec_creation.py` - WebSocket for interactive spec creation
+- `expand_project.py` - Interactive project expansion via natural language
+- `assistant_chat.py` - Read-only project assistant chat (WebSocket/REST)
+- `terminal.py` - Interactive terminal I/O with PTY support (WebSocket bidirectional)
+- `devserver.py` - Dev server control (start/stop) and config
+- `schedules.py` - CRUD for time-based agent scheduling
+- `settings.py` - Global settings management (model selection, YOLO, batch size, headless browser)
+
+**Services** (`server/services/`):
+- `process_manager.py` - Agent process lifecycle management
+- `project_config.py` - Project type detection and dev command management
+- `terminal_manager.py` - Terminal session management with PTY (`pywinpty` on Windows)
+- `scheduler_service.py` - APScheduler-based automated agent scheduling
+- `dev_server_manager.py` - Dev server lifecycle management
+- `assistant_chat_session.py` / `assistant_database.py` - Assistant chat sessions with SQLite persistence
+- `spec_chat_session.py` - Spec creation chat sessions
+- `expand_chat_session.py` - Expand project chat sessions
+- `chat_constants.py` - Shared constants for chat services
+
+**Utilities** (`server/utils/`):
+- `process_utils.py` - Process management utilities
+- `project_helpers.py` - Project path resolution helpers
+- `validation.py` - Project name validation

 ### Feature Management

@@ -130,18 +228,26 @@ Features are stored in SQLite (`features.db`) via SQLAlchemy. The agent interact

 MCP tools available to the agent:
 - `feature_get_stats` - Progress statistics
-- `feature_get_next` - Get highest-priority pending feature (respects dependencies)
-- `feature_claim_next` - Atomically claim next available feature (for parallel mode)
-- `feature_get_for_regression` - Random passing features for regression testing
+- `feature_get_by_id` - Get a single feature by ID
+- `feature_get_summary` - Get summary of all features
+- `feature_get_ready` - Get features ready to work on (dependencies met)
+- `feature_get_blocked` - Get features blocked by unmet dependencies
+- `feature_get_graph` - Get full dependency graph
+- `feature_claim_and_get` - Atomically claim next available feature (for parallel mode)
+- `feature_mark_in_progress` - Mark feature as in progress
 - `feature_mark_passing` - Mark feature complete
+- `feature_mark_failing` - Mark feature as failing
 - `feature_skip` - Move feature to end of queue
+- `feature_clear_in_progress` - Clear in-progress status
 - `feature_create_bulk` - Initialize all features (used by initializer)
+- `feature_create` - Create a single feature
 - `feature_add_dependency` - Add dependency between features (with cycle detection)
 - `feature_remove_dependency` - Remove a dependency
+- `feature_set_dependencies` - Set all dependencies for a feature at once

 ### React UI (ui/)

-- Tech stack: React 18, TypeScript, TanStack Query, Tailwind CSS v4, Radix UI, dagre (graph layout)
+- Tech stack: React 19, TypeScript, Vite 7, TanStack Query, Tailwind CSS v4, Radix UI, dagre (graph layout), xterm.js (terminal)
 - `src/App.tsx` - Main app with project selection, kanban board, agent controls
 - `src/hooks/useWebSocket.ts` - Real-time updates via WebSocket (progress, agent status, logs, agent updates)
 - `src/hooks/useProjects.ts` - React Query hooks for API calls
@@ -153,6 +259,17 @@ Key components:
 - `DependencyGraph.tsx` - Interactive node graph visualization with dagre layout
 - `CelebrationOverlay.tsx` - Confetti animation on feature completion
 - `FolderBrowser.tsx` - Server-side filesystem browser for project folder selection
+- `Terminal.tsx` / `TerminalTabs.tsx` - xterm.js-based multi-tab terminal
+- `AssistantPanel.tsx` / `AssistantChat.tsx` - AI assistant for project Q&A
+- `ExpandProjectModal.tsx` / `ExpandProjectChat.tsx` - Add features via natural language
+- `DevServerControl.tsx` - Dev server start/stop control
+- `ScheduleModal.tsx` - Schedule management UI
+- `SettingsModal.tsx` - Global settings panel
+
+In-app documentation (`/#/docs` route):
+- `src/components/docs/sections/` - Content for each doc section (GettingStarted.tsx, AgentSystem.tsx, etc.)
+- `src/components/docs/docsData.ts` - Sidebar structure, subsection IDs, search keywords
+- `src/components/docs/DocsPage.tsx` - Page layout; `DocsContent.tsx` - section renderer with scroll tracking

 Keyboard shortcuts (press `?` for help):
 - `D` - Toggle debug panel
@@ -163,13 +280,21 @@ Keyboard shortcuts (press `?` for help):

 ### Project Structure for Generated Apps

-Projects can be stored in any directory (registered in `~/.autocoder/registry.db`). Each project contains:
-- `prompts/app_spec.txt` - Application specification (XML format)
-- `prompts/initializer_prompt.md` - First session prompt
-- `prompts/coding_prompt.md` - Continuation session prompt
-- `features.db` - SQLite database with feature test cases
-- `.agent.lock` - Lock file to prevent multiple agent instances
-- `.autocoder/allowed_commands.yaml` - Project-specific bash command allowlist (optional)
+Projects can be stored in any directory (registered in `~/.autoforge/registry.db`). Each project contains:
+- `.autoforge/prompts/app_spec.txt` - Application specification (XML format)
+- `.autoforge/prompts/initializer_prompt.md` - First session prompt
+- `.autoforge/prompts/coding_prompt.md` - Continuation session prompt
+- `.autoforge/features.db` - SQLite database with feature test cases
+- `.autoforge/.agent.lock` - Lock file to prevent multiple agent instances
+- `.autoforge/allowed_commands.yaml` - Project-specific bash command allowlist (optional)
+- `.autoforge/.gitignore` - Ignores runtime files
+- `.claude/skills/playwright-cli/` - Playwright CLI skill for browser automation
+- `.playwright/cli.config.json` - Browser configuration (headless, viewport, etc.)
+- `.playwright-cli/` - Playwright CLI daemon artifacts (screenshots, snapshots) - gitignored
+- `CLAUDE.md` - Stays at project root (SDK convention)
+- `app_spec.txt` - Root copy for agent template compatibility
+
+Legacy projects with files at root level (e.g., `features.db`, `prompts/`) are auto-migrated to `.autoforge/` on next agent start. Dual-path resolution ensures old and new layouts work transparently.

 ### Security Model

@@ -178,20 +303,51 @@ Defense-in-depth approach configured in `client.py`:
 2. Filesystem restricted to project directory only
 3. Bash commands validated using hierarchical allowlist system

+#### Extra Read Paths (Cross-Project File Access)
+
+The agent can optionally read files from directories outside the project folder via the `EXTRA_READ_PATHS` environment variable. This enables referencing documentation, shared libraries, or other projects.
+
+**Configuration:**
+
+```bash
+# Single path
+EXTRA_READ_PATHS=/Users/me/docs
+
+# Multiple paths (comma-separated)
+EXTRA_READ_PATHS=/Users/me/docs,/opt/shared-libs,/Volumes/Data/reference
+```
+
+**Security Controls:**
+
+All paths are validated before being granted read access:
+- Must be absolute paths (not relative)
+- Must exist and be directories
+- Paths are canonicalized via `Path.resolve()` to prevent `..` traversal attacks
+- Sensitive directories are blocked (see blocklist below)
+- Only Read, Glob, and Grep operations are allowed (no Write/Edit)
+
+**Blocked Sensitive Directories:**
+
+The following directories (relative to home) are always blocked:
+- `.ssh`, `.aws`, `.azure`, `.kube` - Cloud/SSH credentials
+- `.gnupg`, `.gpg`, `.password-store` - Encryption keys
+- `.docker`, `.config/gcloud` - Container/cloud configs
+- `.npmrc`, `.pypirc`, `.netrc` - Package manager credentials
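
The validation rules above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `validate_extra_read_paths`, the `home` parameter, and the choice to silently drop rejected entries are all assumptions. It also only rejects paths at or below a blocked entry; whether the real check rejects parents of blocked directories is not stated here.

```python
from pathlib import Path
from typing import Optional

# Entries taken from the blocklist above, relative to the home directory.
BLOCKED_HOME_ENTRIES = (
    ".ssh", ".aws", ".azure", ".kube",
    ".gnupg", ".gpg", ".password-store",
    ".docker", ".config/gcloud",
    ".npmrc", ".pypirc", ".netrc",
)


def validate_extra_read_paths(raw: str, home: Optional[Path] = None) -> list[Path]:
    """Return the canonical directories from a comma-separated value that pass every check.

    Hypothetical helper: rejected entries are skipped rather than reported.
    """
    home_dir = (home or Path.home()).resolve()
    blocked = [(home_dir / entry).resolve() for entry in BLOCKED_HOME_ENTRIES]
    accepted: list[Path] = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        candidate = Path(item)
        if not candidate.is_absolute():
            continue  # relative paths are rejected
        resolved = candidate.resolve()  # collapses ".." segments and follows symlinks
        if not resolved.is_dir():
            continue  # must exist and be a directory
        if any(resolved == b or b in resolved.parents for b in blocked):
            continue  # equal to or inside a sensitive directory
        accepted.append(resolved)
    return accepted
```

Resolving before the blocklist comparison is what keeps a value such as `/Users/me/docs/../.ssh` from slipping through.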
+
 #### Per-Project Allowed Commands

 The agent's bash command access is controlled through a hierarchical configuration system:

 **Command Hierarchy (highest to lowest priority):**
 1. **Hardcoded Blocklist** (`security.py`) - NEVER allowed (dd, sudo, shutdown, etc.)
-2. **Org Blocklist** (`~/.autocoder/config.yaml`) - Cannot be overridden by projects
-3. **Org Allowlist** (`~/.autocoder/config.yaml`) - Available to all projects
+2. **Org Blocklist** (`~/.autoforge/config.yaml`) - Cannot be overridden by projects
+3. **Org Allowlist** (`~/.autoforge/config.yaml`) - Available to all projects
 4. **Global Allowlist** (`security.py`) - Default commands (npm, git, curl, etc.)
-5. **Project Allowlist** (`.autocoder/allowed_commands.yaml`) - Project-specific commands
+5. **Project Allowlist** (`.autoforge/allowed_commands.yaml`) - Project-specific commands
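
As a rough sketch of how this priority order can be evaluated (the function name and the plain-set representation are assumptions; the real `security.py` may match patterns rather than exact names):

```python
def is_command_allowed(
    command: str,
    *,
    hard_blocklist: set[str],
    org_blocklist: set[str],
    org_allowlist: set[str],
    global_allowlist: set[str],
    project_allowlist: set[str],
) -> bool:
    """Hypothetical resolver for the five-level hierarchy."""
    # Levels 1-2: blocklists are checked first, so no allowlist can override them.
    if command in hard_blocklist or command in org_blocklist:
        return False
    # Levels 3-5: any allowlist may grant the command.
    return (
        command in org_allowlist
        or command in global_allowlist
        or command in project_allowlist
    )
```

Checking the blocklists before any allowlist is what makes a project-level entry for `sudo` ineffective.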

 **Project Configuration:**

-Each project can define custom allowed commands in `.autocoder/allowed_commands.yaml`:
+Each project can define custom allowed commands in `.autoforge/allowed_commands.yaml`:

 ```yaml
 version: 1
@@ -211,7 +367,7 @@ commands:

 **Organization Configuration:**

-System administrators can set org-wide policies in `~/.autocoder/config.yaml`:
+System administrators can set org-wide policies in `~/.autoforge/config.yaml`:

 ```yaml
 version: 1
@@ -237,29 +393,64 @@ blocked_commands:
 - Blocklisted commands (sudo, dd, shutdown, etc.) can NEVER be allowed
 - Org-level blocked commands cannot be overridden by project configs

-**Testing:**
-```bash
-# Unit tests (136 tests - fast)
-python test_security.py
-
-# Integration tests (9 tests - uses real hooks)
-python test_security_integration.py
-```
-
 **Files:**
 - `security.py` - Command validation logic and hardcoded blocklist
-- `test_security.py` - Unit tests for security system (136 tests)
-- `test_security_integration.py` - Integration tests with real hooks (9 tests)
-- `TEST_SECURITY.md` - Quick testing reference guide
+- `test_security.py` - Unit tests for security system
+- `test_security_integration.py` - Integration tests with real hooks
 - `examples/project_allowed_commands.yaml` - Project config example (all commented by default)
 - `examples/org_config.yaml` - Org config example (all commented by default)
 - `examples/README.md` - Comprehensive guide with use cases, testing, and troubleshooting
-- `PHASE3_SPEC.md` - Specification for mid-session approval feature (future enhancement)
+
+### Vertex AI Configuration (Optional)
+
+Run coding agents via Google Cloud Vertex AI:
+
+1. Install and authenticate gcloud CLI: `gcloud auth application-default login`
+2. Configure `.env`:
+   ```
+   CLAUDE_CODE_USE_VERTEX=1
+   CLOUD_ML_REGION=us-east5
+   ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
+   ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-6
+   ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-5@20250929
+   ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-3-5-haiku@20241022
+   ```
+
+**Note:** Vertex AI model names use `@` rather than `-` before the date suffix (e.g., `claude-sonnet-4-5@20250929`).
+
+### Alternative API Providers (GLM, Ollama, Kimi, Custom)
+
+Alternative providers are configured via the **Settings UI** (gear icon > API Provider section). Select a provider, set the base URL, auth token, and model — no `.env` changes needed.
+
+**Available providers:** Claude (default), GLM (Zhipu AI), Ollama (local models), Kimi (Moonshot), Custom
+
+**Ollama notes:**
+- Requires Ollama v0.14.0+ with Anthropic API compatibility
+- Install: https://ollama.com → `ollama serve` → `ollama pull qwen3-coder`
+- Recommended models: `qwen3-coder`, `deepseek-coder-v2`, `codellama`
+- Performance depends on local hardware (GPU recommended)

 ## Claude Code Integration

-- `.claude/commands/create-spec.md` - `/create-spec` slash command for interactive spec creation
-- `.claude/skills/frontend-design/SKILL.md` - Skill for distinctive UI design
+**Slash commands** (`.claude/commands/`):
+- `/create-spec` - Interactive spec creation for new projects
+- `/expand-project` - Expand existing project with new features
+- `/gsd-to-autoforge-spec` - Convert GSD codebase mapping to app_spec.txt
+- `/check-code` - Run lint and type-check for code quality
+- `/checkpoint` - Create comprehensive checkpoint commit
+- `/review-pr` - Review pull requests
+
+**Custom agents** (`.claude/agents/`):
+- `coder.md` - Elite software architect agent for code implementation (Opus)
+- `code-review.md` - Code review agent for quality/security/performance analysis (Opus)
+- `deep-dive.md` - Technical investigator for deep analysis and debugging (Opus)
+
+**Skills** (`.claude/skills/`):
+- `frontend-design` - Distinctive, production-grade UI design
+- `gsd-to-autoforge-spec` - Convert GSD codebase mapping to AutoForge app_spec format
+- `playwright-cli` - Browser automation via Playwright CLI (copied to each project)
+
+**Other:**
 - `.claude/templates/` - Prompt templates copied to new projects
 - `examples/` - Configuration examples and documentation for security settings

@@ -267,12 +458,12 @@ python test_security_integration.py

 ### Prompt Loading Fallback Chain

-1. Project-specific: `{project_dir}/prompts/{name}.md`
+1. Project-specific: `{project_dir}/.autoforge/prompts/{name}.md` (or legacy `{project_dir}/prompts/{name}.md`)
 2. Base template: `.claude/templates/{name}.template.md`
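
The fallback chain above could be implemented along these lines. The helper name `find_prompt` and the `templates_dir` parameter are hypothetical, and trying the `.autoforge/` location before the legacy one is an assumption based on the order in which the two are listed.

```python
from pathlib import Path
from typing import Optional


def find_prompt(name: str, project_dir: Path, templates_dir: Path) -> Optional[Path]:
    """Return the first existing prompt file in fallback order, or None."""
    candidates = [
        project_dir / ".autoforge" / "prompts" / f"{name}.md",  # project-specific
        project_dir / "prompts" / f"{name}.md",                 # legacy layout
        templates_dir / f"{name}.template.md",                  # base template
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```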

 ### Agent Session Flow

-1. Check if `features.db` has features (determines initializer vs coding agent)
+1. Check if `.autoforge/features.db` has features (determines initializer vs coding agent)
 2. Create ClaudeSDKClient with security settings
 3. Send prompt and stream response
 4. Auto-continue with 3-second delay between sessions
@@ -290,9 +481,9 @@ The UI receives updates via WebSocket (`/ws/projects/{project_name}`):

 When running with `--parallel`, the orchestrator:
 1. Spawns multiple Claude agents as subprocesses (up to `--max-concurrency`)
-2. Each agent claims features atomically via `feature_claim_next`
+2. Each agent claims features atomically via `feature_claim_and_get`
 3. Features blocked by unmet dependencies are skipped
-4. Browser contexts are isolated per agent using `--isolated` flag
+4. Browser sessions are isolated per agent via `PLAYWRIGHT_CLI_SESSION` environment variable
 5. AgentTracker parses output and emits `agent_update` messages for UI

 ### Process Limits (Parallel Mode)
@@ -301,55 +492,17 @@ The orchestrator enforces strict bounds on concurrent processes:
 - `MAX_PARALLEL_AGENTS = 5` - Maximum concurrent coding agents
 - `MAX_TOTAL_AGENTS = 10` - Hard limit on total agents (coding + testing)
 - Testing agents are capped at `max_concurrency` (same as coding agents)
+- Total process count never exceeds 11 Python processes (1 orchestrator + 5 coding + 5 testing)

-**Expected process count during normal operation:**
-- 1 orchestrator process
-- Up to 5 coding agents
-- Up to 5 testing agents
-- Total: never exceeds 11 Python processes
+### Multi-Feature Batching

-**Stress Test Verification:**
-
-```bash
-# Windows - verify process bounds
-# 1. Note baseline count
-tasklist | findstr python | find /c /v ""
-
-# 2. Start parallel agent (max concurrency)
-python autonomous_agent_demo.py --project-dir test --parallel --max-concurrency 5
-
-# 3. During run - should NEVER exceed baseline + 11
-tasklist | findstr python | find /c /v ""
-
-# 4. After stop via UI - should return to baseline
-tasklist | findstr python | find /c /v ""
-```
-
-```bash
-# macOS/Linux - verify process bounds
-# 1. Note baseline count
-pgrep -c python
-
-# 2. Start parallel agent
-python autonomous_agent_demo.py --project-dir test --parallel --max-concurrency 5
-
-# 3. During run - should NEVER exceed baseline + 11
-pgrep -c python
-
-# 4. After stop - should return to baseline
-pgrep -c python
-```
-
-**Log Verification:**
-
-```bash
-# Check spawn vs completion balance
-grep "Started testing agent" orchestrator_debug.log | wc -l
-grep "Testing agent.*completed\|failed" orchestrator_debug.log | wc -l
-
-# Watch for cap enforcement messages
-grep "at max testing agents\|At max total agents" orchestrator_debug.log
-```
+Agents can implement multiple features per session using `--batch-size` (1-3, default: 3):
+- `--batch-size N` - Max features per coding agent batch
+- `--testing-batch-size N` - Features per testing batch (1-5, default: 3)
+- `--batch-features 1,2,3` - Specific feature IDs for batch implementation
+- `--testing-batch-features 1,2,3` - Specific feature IDs for batch regression testing
+- `prompts.py` provides `get_batch_feature_prompt()` for multi-feature prompt generation
+- Configurable in UI via settings panel
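
For illustration, the flag ranges and defaults listed above map onto `argparse` roughly as follows. This is a sketch only: `bounded_int`, `feature_id_list`, and `build_parser` are hypothetical names, and the actual parser in `autonomous_agent_demo.py` may be structured differently.

```python
import argparse


def bounded_int(low: int, high: int):
    """Build an argparse type that accepts integers in [low, high]."""
    def parse(value: str) -> int:
        number = int(value)  # a ValueError here is reported by argparse as invalid
        if not low <= number <= high:
            raise argparse.ArgumentTypeError(f"expected {low}-{high}, got {number}")
        return number
    return parse


def feature_id_list(value: str) -> list[int]:
    """Parse a comma-separated list of feature IDs such as "1,2,3"."""
    return [int(part) for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch-size", type=bounded_int(1, 3), default=3)
    parser.add_argument("--testing-batch-size", type=bounded_int(1, 5), default=3)
    parser.add_argument("--batch-features", type=feature_id_list, default=None)
    parser.add_argument("--testing-batch-features", type=feature_id_list, default=None)
    return parser
```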

 ### Design System

--- a/PHASE3_SPEC.md
+++ b/PHASE3_SPEC.md
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# AutoCoder
+# AutoForge

 [![Buy Me A Coffee](https://img.shields.io/badge/Buy%20Me%20A%20Coffee-FFDD00?style=flat&logo=buy-me-a-coffee&logoColor=black)](https://www.buymeacoffee.com/leonvanzyl)

@@ -6,17 +6,19 @@ A long-running autonomous coding agent powered by the Claude Agent SDK. This too

 ## Video Tutorial

-[![Watch the tutorial](https://img.youtube.com/vi/lGWFlpffWk4/hqdefault.jpg)](https://youtu.be/lGWFlpffWk4)
+[![Watch the tutorial](https://img.youtube.com/vi/nKiPOxDpcJY/hqdefault.jpg)](https://youtu.be/nKiPOxDpcJY)

-> **[Watch the setup and usage guide →](https://youtu.be/lGWFlpffWk4)**
+> **[Watch the setup and usage guide →](https://youtu.be/nKiPOxDpcJY)**

 ---

 ## Prerequisites

-### Claude Code CLI (Required)
+- **Node.js 20+** - Required for the CLI
+- **Python 3.11+** - Auto-detected on first run ([download](https://www.python.org/downloads/))
+- **Claude Code CLI** - Install and authenticate (see below)

-This project requires the Claude Code CLI to be installed. Install it using one of these methods:
+### Claude Code CLI (Required)

 **macOS / Linux:**
 ```bash
@@ -39,35 +41,63 @@ You need one of the following:

 ## Quick Start

-### Option 1: Web UI (Recommended)
+### Option 1: npm Install (Recommended)

-**Windows:**
-```cmd
-start_ui.bat
-```
-
-**macOS / Linux:**
 ```bash
-./start_ui.sh
+npm install -g autoforge-ai
+autoforge
 ```

+On first run, AutoForge automatically:
+1. Checks for Python 3.11+
+2. Creates a virtual environment at `~/.autoforge/venv/`
+3. Installs Python dependencies
+4. Copies a default config file to `~/.autoforge/.env`
+5. Starts the server and opens your browser
+
+### CLI Commands
+
+```
+autoforge                       Start the server (default)
+autoforge config                Open ~/.autoforge/.env in $EDITOR
+autoforge config --path         Print config file path
+autoforge config --show         Show active configuration values
+autoforge --port PORT           Custom port (default: auto from 8888)
+autoforge --host HOST           Custom host (default: 127.0.0.1)
+autoforge --no-browser          Don't auto-open browser
+autoforge --repair              Delete and recreate virtual environment
+autoforge --version             Print version
+autoforge --help                Show help
+```
+
+### Option 2: From Source (Development)
+
+Clone the repository and use the start scripts directly. This is the recommended path if you want to contribute or modify AutoForge itself.
+
+```bash
+git clone https://github.com/leonvanzyl/autoforge.git
+cd autoforge
+```
+
+**Web UI:**
+
+| Platform | Command |
+|---|---|
+| Windows | `start_ui.bat` |
+| macOS / Linux | `./start_ui.sh` |
+
 This launches the React-based web UI at `http://localhost:5173` with:
 - Project selection and creation
 - Kanban board view of features
 - Real-time agent output streaming
 - Start/pause/stop controls

-### Option 2: CLI Mode
+**CLI Mode:**

-**Windows:**
-```cmd
-start.bat
-```
-
-**macOS / Linux:**
-```bash
-./start.sh
-```
+| Platform | Command |
+|---|---|
+| Windows | `start.bat` |
+| macOS / Linux | `./start.sh` |

 The start script will:
 1. Check if Claude CLI is installed
@@ -130,44 +160,43 @@ Features are stored in SQLite via SQLAlchemy and managed through an MCP server t
 ## Project Structure

 ```
-autonomous-coding/
-├── start.bat                 # Windows CLI start script
-├── start.sh                  # macOS/Linux CLI start script
-├── start_ui.bat              # Windows Web UI start script
-├── start_ui.sh               # macOS/Linux Web UI start script
-├── start.py                  # CLI menu and project management
-├── start_ui.py               # Web UI backend (FastAPI server launcher)
-├── autonomous_agent_demo.py  # Agent entry point
-├── agent.py                  # Agent session logic
-├── client.py                 # Claude SDK client configuration
-├── security.py               # Bash command allowlist and validation
-├── progress.py               # Progress tracking utilities
-├── prompts.py                # Prompt loading utilities
+autoforge/
+├── bin/                         # npm CLI entry point
+├── lib/                         # CLI bootstrap and setup logic
+├── start.py                     # CLI menu and project management
+├── start_ui.py                  # Web UI backend (FastAPI server launcher)
+├── autonomous_agent_demo.py     # Agent entry point
+├── agent.py                     # Agent session logic
+├── client.py                    # Claude SDK client configuration
+├── security.py                  # Bash command allowlist and validation
+├── progress.py                  # Progress tracking utilities
+├── prompts.py                   # Prompt loading utilities
 ├── api/
-│   └── database.py           # SQLAlchemy models (Feature table)
+│   └── database.py              # SQLAlchemy models (Feature table)
 ├── mcp_server/
-│   └── feature_mcp.py        # MCP server for feature management tools
+│   └── feature_mcp.py           # MCP server for feature management tools
 ├── server/
-│   ├── main.py               # FastAPI REST API server
-│   ├── websocket.py          # WebSocket handler for real-time updates
-│   ├── schemas.py            # Pydantic schemas
-│   ├── routers/              # API route handlers
-│   └── services/             # Business logic services
-├── ui/                       # React frontend
+│   ├── main.py                  # FastAPI REST API server
+│   ├── websocket.py             # WebSocket handler for real-time updates
+│   ├── schemas.py               # Pydantic schemas
+│   ├── routers/                 # API route handlers
+│   └── services/                # Business logic services
+├── ui/                          # React frontend
 │   ├── src/
-│   │   ├── App.tsx           # Main app component
-│   │   ├── hooks/            # React Query and WebSocket hooks
-│   │   └── lib/              # API client and types
+│   │   ├── App.tsx              # Main app component
+│   │   ├── hooks/               # React Query and WebSocket hooks
+│   │   └── lib/                 # API client and types
 │   ├── package.json
 │   └── vite.config.ts
 ├── .claude/
 │   ├── commands/
-│   │   └── create-spec.md    # /create-spec slash command
-│   ├── skills/               # Claude Code skills
-│   └── templates/            # Prompt templates
-├── generations/              # Generated projects go here
-├── requirements.txt          # Python dependencies
-└── .env                      # Optional configuration (N8N webhook)
+│   │   └── create-spec.md       # /create-spec slash command
+│   ├── skills/                  # Claude Code skills
+│   └── templates/               # Prompt templates
+├── requirements.txt             # Python dependencies (development)
+├── requirements-prod.txt        # Python dependencies (npm install)
+├── package.json                 # npm package definition
+└── .env                         # Optional configuration
 ```

 ---
@@ -264,11 +293,20 @@ The UI receives live updates via WebSocket (`/ws/projects/{project_name}`):

 ---

-## Configuration (Optional)
+## Configuration
+
+AutoForge reads configuration from a `.env` file. The file location depends on how you installed AutoForge:
+
+| Install method | Config file location | Edit command |
+|---|---|---|
+| npm (global) | `~/.autoforge/.env` | `autoforge config` |
+| From source | `.env` in the project root | Edit directly |
+
+A default config file is created automatically on first run. Use `autoforge config` to open it in your editor, or `autoforge config --show` to print the active values.

 ### N8N Webhook Integration

-The agent can send progress notifications to an N8N webhook. Create a `.env` file:
+Add to your `.env` to send progress notifications to an N8N webhook:

 ```bash
 # Optional: N8N webhook for progress notifications
@@ -288,22 +326,28 @@ When test progress increases, the agent sends:
 }
 ```

-### Using GLM Models (Alternative to Claude)
+### Alternative API Providers (GLM, Ollama, Kimi, Custom)

-To use Zhipu AI's GLM models instead of Claude, add these variables to your `.env` file in the AutoCoder directory:
+Alternative providers are configured via the **Settings UI** (gear icon > API Provider). Select your provider, set the base URL, auth token, and model directly in the UI — no `.env` changes needed.
+
+Available providers: **Claude** (default), **GLM** (Zhipu AI), **Ollama** (local models), **Kimi** (Moonshot), **Custom**
+
+For Ollama, install [Ollama v0.14.0+](https://ollama.com), run `ollama serve`, and pull a coding model (e.g., `ollama pull qwen3-coder`). Then select "Ollama" in the Settings UI.
+
+### Using Vertex AI
+
+Add these variables to your `.env` file to run agents via Google Cloud Vertex AI:

 ```bash
-ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
-ANTHROPIC_AUTH_TOKEN=your-zhipu-api-key
-API_TIMEOUT_MS=3000000
-ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
-ANTHROPIC_DEFAULT_OPUS_MODEL=glm-4.7
-ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.5-air
+CLAUDE_CODE_USE_VERTEX=1
+CLOUD_ML_REGION=us-east5
+ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
+ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-6
+ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-5@20250929
+ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-3-5-haiku@20241022
 ```

-This routes AutoCoder's API requests through Zhipu's Claude-compatible API, allowing you to use GLM-4.7 and other models. **This only affects AutoCoder** - your global Claude Code settings remain unchanged.
-
-Get an API key at: https://z.ai/subscribe
+Requires `gcloud auth application-default login` first. Note the `@` separator (not `-`) in Vertex AI model names.

 ---

@@ -335,6 +379,18 @@ This is normal. The initializer agent is generating detailed test cases, which t
 **"Command blocked by security hook"**
 The agent tried to run a command not in the allowlist. This is the security system working as intended. If needed, add the command to `ALLOWED_COMMANDS` in `security.py`.

+**"Python 3.11+ required but not found"**
+Install Python 3.11 or later from [python.org](https://www.python.org/downloads/). Make sure `python3` (or `python` on Windows) is on your PATH.
+
+**"Python venv module not available"**
+On Debian/Ubuntu, the venv module is packaged separately. Install it with `sudo apt install python3.XX-venv` (replace `XX` with your Python minor version, e.g., `python3.12-venv`).
+
+**"AutoForge is already running"**
+A server instance is already active. Use the browser URL shown in the terminal, or stop the existing instance with Ctrl+C first.
+
+**Virtual environment issues after a Python upgrade**
+Run `autoforge --repair` to delete and recreate the virtual environment from scratch.
+
 ---

 ## License
--- a/SAMPLE_PROMPT.md
+++ b/SAMPLE_PROMPT.md
@@ -1,22 +0,0 @@
-Let's call it Simple Todo. This is a really simple web app that I can use to track my to-do items using a Kanban
-board. I should be able to add to-dos and then drag and drop them through the Kanban board. The different columns in
-the Kanban board are:
-
-- To Do
-- In Progress
-- Done
-
-The app should use a neobrutalism design.
-
-There is no need for user authentication either. All the to-dos will be stored in local storage, so each user has
-access to all of their to-dos when they open their browser. So do not worry about implementing a backend with user
-authentication or a database. Simply store everything in local storage. As for the design, please try to avoid AI
-slop, so use your front-end design skills to design something beautiful and practical. As for the content of the
-to-dos, we should store:
-
-- The name or the title at the very least
-- Optionally, we can also set tags, due dates, and priorities which should be represented as beautiful little badges
-  on the to-do card Users should have the ability to easily clear out all the completed To-Dos. They should also be
-  able to filter and search for To-Dos as well.
-
-You choose the rest. Keep it simple. Should be 25 features.
--- a/agent.py
+++ b/agent.py
@@ -23,14 +23,27 @@ if sys.platform == "win32":
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="replace", line_buffering=True)

 from client import create_client
-from progress import count_passing_tests, has_features, print_progress_summary, print_session_header
+from progress import (
+    count_passing_tests,
+    has_features,
+    print_progress_summary,
+    print_session_header,
+)
 from prompts import (
    copy_spec_to_project,
+    get_batch_feature_prompt,
    get_coding_prompt,
    get_initializer_prompt,
    get_single_feature_prompt,
    get_testing_prompt,
 )
+from rate_limit_utils import (
+    calculate_error_backoff,
+    calculate_rate_limit_backoff,
+    clamp_retry_delay,
+    is_rate_limit_error,
+    parse_retry_after,
+)

 # Configuration
 AUTO_CONTINUE_DELAY_SECONDS = 3
@@ -106,8 +119,19 @@ async def run_agent_session(
        return "continue", response_text

    except Exception as e:
-        print(f"Error during agent session: {e}")
-        return "error", str(e)
+        error_str = str(e)
+        print(f"Error during agent session: {error_str}")
+
+        # Detect rate limit errors from exception message
+        if is_rate_limit_error(error_str):
+            # Try to extract retry-after time from error
+            retry_seconds = parse_retry_after(error_str)
+            if retry_seconds is not None:
+                return "rate_limit", str(retry_seconds)
+            else:
+                return "rate_limit", "unknown"
+
+        return "error", error_str
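
The handler above returns either `("rate_limit", "<seconds>")` or `("rate_limit", "unknown")`. One way a caller might turn that second element into a wait time is sketched below. The name `next_delay` and the `base` and `cap` values are assumptions for illustration; the helpers imported from `rate_limit_utils` may use different signatures and limits.

```python
def next_delay(detail: str, retries: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Seconds to wait after a ("rate_limit", detail) result (hypothetical helper)."""
    try:
        hinted = float(detail)  # server-provided retry-after, if one was parsed
    except ValueError:
        hinted = None  # detail was "unknown"
    if hinted is not None:
        # Honor the hint, but keep it within [1 second, cap].
        return max(1.0, min(hinted, cap))
    # No hint: exponential backoff on the consecutive retry count, capped.
    return min(base * (2 ** retries), cap)
```

With these defaults, `next_delay("unknown", 0)` waits 60 seconds and each further consecutive rate limit doubles the wait until the cap is reached.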


 async def run_autonomous_agent(
@@ -116,8 +140,10 @@ async def run_autonomous_agent(
    max_iterations: Optional[int] = None,
    yolo_mode: bool = False,
    feature_id: Optional[int] = None,
+    feature_ids: Optional[list[int]] = None,
    agent_type: Optional[str] = None,
    testing_feature_id: Optional[int] = None,
+    testing_feature_ids: Optional[list[int]] = None,
 ) -> None:
    """
    Run the autonomous agent loop.
@@ -128,8 +154,10 @@ async def run_autonomous_agent(
        max_iterations: Maximum number of iterations (None for unlimited)
        yolo_mode: If True, skip browser testing in coding agent prompts
        feature_id: If set, work only on this specific feature (used by orchestrator for coding agents)
+        feature_ids: If set, work on these features in batch (used by orchestrator for batch mode)
        agent_type: Type of agent: "initializer", "coding", "testing", or None (auto-detect)
-        testing_feature_id: For testing agents, the pre-claimed feature ID to test
+        testing_feature_id: For testing agents, the pre-claimed feature ID to test (legacy single mode)
+        testing_feature_ids: For testing agents, list of feature IDs to batch test
    """
    print("\n" + "=" * 70)
    print("  AUTONOMOUS CODING AGENT")
@@ -140,7 +168,9 @@ async def run_autonomous_agent(
        print(f"Agent type: {agent_type}")
    if yolo_mode:
        print("Mode: YOLO (testing agents disabled)")
-    if feature_id:
+    if feature_ids and len(feature_ids) > 1:
+        print(f"Feature batch: {', '.join(f'#{fid}' for fid in feature_ids)}")
+    elif feature_id:
        print(f"Feature assignment: #{feature_id}")
    if max_iterations:
        print(f"Max iterations: {max_iterations}")
@@ -183,6 +213,8 @@ async def run_autonomous_agent(

    # Main loop
    iteration = 0
+    rate_limit_retries = 0  # Track consecutive rate limit errors for exponential backoff
+    error_retries = 0  # Track consecutive non-rate-limit errors

    while True:
        iteration += 1
@@ -190,7 +222,7 @@ async def run_autonomous_agent(
        # Check if all features are already complete (before starting a new session)
        # Skip this check if running as initializer (needs to create features first)
        if not is_initializer and iteration == 1:
-            passing, in_progress, total = count_passing_tests(project_dir)
+            passing, in_progress, total, _nhi = count_passing_tests(project_dir)
            if total > 0 and passing == total:
                print("\n" + "=" * 70)
                print("  ALL FEATURES ALREADY COMPLETE!")
@@ -208,27 +240,23 @@ async def run_autonomous_agent(
        print_session_header(iteration, is_initializer)

        # Create client (fresh context)
-        # Pass agent_id for browser isolation in multi-agent scenarios
-        import os
-        if agent_type == "testing":
-            agent_id = f"testing-{os.getpid()}"  # Unique ID for testing agents
-        elif feature_id:
-            agent_id = f"feature-{feature_id}"
-        else:
-            agent_id = None
-        client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_id=agent_id)
+        client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_type=agent_type)

        # Choose prompt based on agent type
        if agent_type == "initializer":
            prompt = get_initializer_prompt(project_dir)
        elif agent_type == "testing":
-            prompt = get_testing_prompt(project_dir, testing_feature_id)
-        elif feature_id:
+            prompt = get_testing_prompt(project_dir, testing_feature_id, testing_feature_ids)
+        elif feature_ids and len(feature_ids) > 1:
+            # Batch mode (used by orchestrator for multi-feature coding agents)
+            prompt = get_batch_feature_prompt(feature_ids, project_dir, yolo_mode)
+        elif feature_id or (feature_ids is not None and len(feature_ids) == 1):
            # Single-feature mode (used by orchestrator for coding agents)
-            prompt = get_single_feature_prompt(feature_id, project_dir, yolo_mode)
+            fid = feature_id if feature_id is not None else feature_ids[0]  # type: ignore[index]
+            prompt = get_single_feature_prompt(fid, project_dir, yolo_mode)
        else:
            # General coding prompt (legacy path)
-            prompt = get_coding_prompt(project_dir)
+            prompt = get_coding_prompt(project_dir, yolo_mode=yolo_mode)

        # Run session with async context manager
        # Wrap in try/except to handle MCP server startup failures gracefully
@@ -250,13 +278,28 @@ async def run_autonomous_agent(

        # Handle status
        if status == "continue":
+            # Reset error retries on success; rate-limit retries reset only if no signal
+            error_retries = 0
+            reset_rate_limit_retries = True
+
            delay_seconds = AUTO_CONTINUE_DELAY_SECONDS
            target_time_str = None

-            if "limit reached" in response.lower():
-                print("Claude Agent SDK indicated limit reached.")
+            # Check for rate limit indicators in response text
+            if is_rate_limit_error(response):
+                print("Claude Agent SDK indicated rate limit reached.")
+                reset_rate_limit_retries = False

-                # Try to parse reset time from response
+                # Try to extract retry-after from response text first
+                retry_seconds = parse_retry_after(response)
+                if retry_seconds is not None:
+                    delay_seconds = clamp_retry_delay(retry_seconds)
+                else:
+                    # Use exponential backoff when retry-after unknown
+                    delay_seconds = calculate_rate_limit_backoff(rate_limit_retries)
+                    rate_limit_retries += 1
+
+                # Try to parse reset time from response (more specific format)
                match = re.search(
                    r"(?i)\bresets(?:\s+at)?\s+(\d+)(?::(\d+))?\s*(am|pm)\s*\(([^)]+)\)",
                    response,
@@ -285,9 +328,7 @@ async def run_autonomous_agent(
                            target += timedelta(days=1)

                        delta = target - now
-                        delay_seconds = min(
-                            delta.total_seconds(), 24 * 60 * 60
-                        )  # Clamp to 24 hours max
+                        delay_seconds = min(max(int(delta.total_seconds()), 1), 24 * 60 * 60)
                        target_time_str = target.strftime("%B %d, %Y at %I:%M %p %Z")

                    except Exception as e:
@@ -307,7 +348,7 @@ async def run_autonomous_agent(
            print_progress_summary(project_dir)

            # Check if all features are complete - exit gracefully if done
-            passing, in_progress, total = count_passing_tests(project_dir)
+            passing, in_progress, total, _nhi = count_passing_tests(project_dir)
            if total > 0 and passing == total:
                print("\n" + "=" * 70)
                print("  ALL FEATURES COMPLETE!")
@@ -316,20 +357,56 @@ async def run_autonomous_agent(
                print("The autonomous agent has finished its work.")
                break

-            # Single-feature mode OR testing agent: exit after one session
-            if feature_id is not None or agent_type == "testing":
+            # Single-feature mode, batch mode, or testing agent: exit after one session
+            if feature_ids and len(feature_ids) > 1:
+                print(f"\nBatch mode: Features {', '.join(f'#{fid}' for fid in feature_ids)} session complete.")
+                break
+            elif feature_id is not None or (feature_ids is not None and len(feature_ids) == 1):
+                fid = feature_id if feature_id is not None else feature_ids[0]  # type: ignore[index]
                if agent_type == "testing":
                    print("\nTesting agent complete. Terminating session.")
                else:
-                    print(f"\nSingle-feature mode: Feature #{feature_id} session complete.")
+                    print(f"\nSingle-feature mode: Feature #{fid} session complete.")
                break
+            elif agent_type == "testing":
+                print("\nTesting agent complete. Terminating session.")
+                break
+
+            # Reset rate limit retries only if no rate limit signal was detected
+            if reset_rate_limit_retries:
+                rate_limit_retries = 0
+
+            await asyncio.sleep(delay_seconds)
+
+        elif status == "rate_limit":
+            # Smart rate limit handling with exponential backoff
+            # Reset error counter so mixed events don't inflate delays
+            error_retries = 0
+            if response != "unknown":
+                try:
+                    delay_seconds = clamp_retry_delay(int(response))
+                except (ValueError, TypeError):
+                    # Malformed value - fall through to exponential backoff
+                    response = "unknown"
+            if response == "unknown":
+                # Use exponential backoff when retry-after unknown or malformed
+                delay_seconds = calculate_rate_limit_backoff(rate_limit_retries)
+                rate_limit_retries += 1
+                print(f"\nRate limit hit. Backoff wait: {delay_seconds} seconds (attempt #{rate_limit_retries})...")
+            else:
+                print(f"\nRate limit hit. Waiting {delay_seconds} seconds before retry...")

            await asyncio.sleep(delay_seconds)

        elif status == "error":
+            # Non-rate-limit errors: linear backoff capped at 5 minutes
+            # Reset rate limit counter so mixed events don't inflate delays
+            rate_limit_retries = 0
+            error_retries += 1
+            delay_seconds = calculate_error_backoff(error_retries)
            print("\nSession encountered an error")
-            print("Will retry with a fresh session...")
-            await asyncio.sleep(AUTO_CONTINUE_DELAY_SECONDS)
+            print(f"Will retry in {delay_seconds}s (attempt #{error_retries})...")
+            await asyncio.sleep(delay_seconds)

        # Small delay between sessions
        if max_iterations is None or iteration < max_iterations:
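
Review note: the hunks above call `calculate_rate_limit_backoff`, `calculate_error_backoff`, and `clamp_retry_delay`, but their bodies are not part of this diff. The sketch below shows one plausible shape for them, assuming exponential doubling for rate limits and a linear step for other errors. Every constant and the `sketch_` function names are hypothetical; only the "capped at 5 minutes" figure comes from the diff comment.

```python
# Hypothetical constants: the real values are defined outside this diff.
RATE_LIMIT_BASE_SECONDS = 15
RATE_LIMIT_MAX_SECONDS = 3600
ERROR_STEP_SECONDS = 30
ERROR_MAX_SECONDS = 300  # "capped at 5 minutes" per the diff comment


def sketch_rate_limit_backoff(retries: int) -> int:
    """Exponential backoff: base * 2**retries, capped at the maximum."""
    return min(RATE_LIMIT_BASE_SECONDS * (2 ** retries), RATE_LIMIT_MAX_SECONDS)


def sketch_error_backoff(retries: int) -> int:
    """Linear backoff: step * retries, capped at five minutes."""
    return min(ERROR_STEP_SECONDS * retries, ERROR_MAX_SECONDS)


def sketch_clamp_retry_delay(seconds: int) -> int:
    """Keep a server-supplied retry-after value inside sane bounds."""
    return max(1, min(seconds, RATE_LIMIT_MAX_SECONDS))
```

Keeping the two counters separate, as the loop does, means a burst of ordinary errors does not lengthen the next rate-limit wait, and vice versa.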
--- a/api/database.py
+++ b/api/database.py
@@ -8,7 +8,7 @@ SQLite database schema for feature storage using SQLAlchemy.
 import sys
 from datetime import datetime, timezone
 from pathlib import Path
-from typing import Optional
+from typing import Generator, Optional


 def _utc_now() -> datetime:
@@ -26,13 +26,16 @@ from sqlalchemy import (
    String,
    Text,
    create_engine,
+    event,
    text,
 )
-from sqlalchemy.ext.declarative import declarative_base
-from sqlalchemy.orm import Session, relationship, sessionmaker
+from sqlalchemy.orm import DeclarativeBase, Session, relationship, sessionmaker
 from sqlalchemy.types import JSON

-Base = declarative_base()
+
+class Base(DeclarativeBase):
+    """SQLAlchemy 2.0 style declarative base."""
+    pass


 class Feature(Base):
@@ -40,10 +43,10 @@ class Feature(Base):

    __tablename__ = "features"

-    # Composite index for common status query pattern (passes, in_progress)
+    # Composite index for common status query pattern (passes, in_progress, needs_human_input)
    # Used by feature_get_stats, get_ready_features, and other status queries
    __table_args__ = (
-        Index('ix_feature_status', 'passes', 'in_progress'),
+        Index('ix_feature_status', 'passes', 'in_progress', 'needs_human_input'),
    )

    id = Column(Integer, primary_key=True, index=True)
@@ -58,6 +61,11 @@ class Feature(Base):
    # NULL/empty = no dependencies (backwards compatible)
    dependencies = Column(JSON, nullable=True, default=None)

+    # Human input: agent can request structured input from a human
+    needs_human_input = Column(Boolean, nullable=False, default=False, index=True)
+    human_input_request = Column(JSON, nullable=True, default=None)   # Agent's structured request
+    human_input_response = Column(JSON, nullable=True, default=None)  # Human's response
+
    def to_dict(self) -> dict:
        """Convert feature to dictionary for JSON serialization."""
        return {
@@ -72,6 +80,10 @@ class Feature(Base):
            "in_progress": self.in_progress if self.in_progress is not None else False,
            # Dependencies: NULL/empty treated as empty list for backwards compat
            "dependencies": self.dependencies if self.dependencies else [],
+            # Human input fields
+            "needs_human_input": self.needs_human_input if self.needs_human_input is not None else False,
+            "human_input_request": self.human_input_request,
+            "human_input_response": self.human_input_response,
        }

    def get_dependencies_safe(self) -> list[int]:
@@ -180,7 +192,8 @@ class ScheduleOverride(Base):

 def get_database_path(project_dir: Path) -> Path:
    """Return the path to the SQLite database for a project."""
-    return project_dir / "features.db"
+    from autoforge_paths import get_features_db_path
+    return get_features_db_path(project_dir)


 def get_database_url(project_dir: Path) -> str:
@@ -298,6 +311,21 @@ def _is_network_path(path: Path) -> bool:
    return False


+def _migrate_add_human_input_columns(engine) -> None:
+    """Add human input columns to existing databases that don't have them."""
+    with engine.connect() as conn:
+        result = conn.execute(text("PRAGMA table_info(features)"))
+        columns = [row[1] for row in result.fetchall()]
+
+        if "needs_human_input" not in columns:
+            conn.execute(text("ALTER TABLE features ADD COLUMN needs_human_input BOOLEAN DEFAULT 0"))
+        if "human_input_request" not in columns:
+            conn.execute(text("ALTER TABLE features ADD COLUMN human_input_request TEXT DEFAULT NULL"))
+        if "human_input_response" not in columns:
+            conn.execute(text("ALTER TABLE features ADD COLUMN human_input_response TEXT DEFAULT NULL"))
+        conn.commit()
+
+
 def _migrate_add_schedules_tables(engine) -> None:
    """Create schedules and schedule_overrides tables if they don't exist."""
    from sqlalchemy import inspect
@@ -307,11 +335,11 @@ def _migrate_add_schedules_tables(engine) -> None:

    # Create schedules table if missing
    if "schedules" not in existing_tables:
-        Schedule.__table__.create(bind=engine)
+        Schedule.__table__.create(bind=engine)  # type: ignore[attr-defined]

    # Create schedule_overrides table if missing
    if "schedule_overrides" not in existing_tables:
-        ScheduleOverride.__table__.create(bind=engine)
+        ScheduleOverride.__table__.create(bind=engine)  # type: ignore[attr-defined]

    # Add crash_count column if missing (for upgrades)
    if "schedules" in existing_tables:
@@ -332,49 +360,134 @@ def _migrate_add_schedules_tables(engine) -> None:
                conn.commit()


+def _configure_sqlite_immediate_transactions(engine) -> None:
+    """Configure engine for IMMEDIATE transactions via event hooks.
+
+    Per SQLAlchemy docs: https://docs.sqlalchemy.org/en/20/dialects/sqlite.html
+
+    This replaces fragile pysqlite implicit transaction handling with explicit
+    BEGIN IMMEDIATE at transaction start. Benefits:
+    - Acquires write lock immediately, preventing stale reads
+    - Works correctly regardless of prior ORM operations
+    - Future-proof: won't break when pysqlite legacy mode is removed in Python 3.16
+    """
+    @event.listens_for(engine, "connect")
+    def do_connect(dbapi_connection, connection_record):
+        # Disable pysqlite's implicit transaction handling
+        dbapi_connection.isolation_level = None
+
+        # Set busy_timeout on raw connection before any transactions
+        cursor = dbapi_connection.cursor()
+        try:
+            cursor.execute("PRAGMA busy_timeout=30000")
+        finally:
+            cursor.close()
+
+    @event.listens_for(engine, "begin")
+    def do_begin(conn):
+        # Use IMMEDIATE for all transactions to prevent stale reads
+        conn.exec_driver_sql("BEGIN IMMEDIATE")
+
+
 def create_database(project_dir: Path) -> tuple:
    """
    Create database and return engine + session maker.

+    Uses a cache to avoid creating new engines for each request, which improves
+    performance by reusing database connections.
+
    Args:
        project_dir: Directory containing the project

    Returns:
        Tuple of (engine, SessionLocal)
    """
+    cache_key = project_dir.as_posix()
+
+    if cache_key in _engine_cache:
+        return _engine_cache[cache_key]
+
    db_url = get_database_url(project_dir)
-    engine = create_engine(db_url, connect_args={
-        "check_same_thread": False,
-        "timeout": 30  # Wait up to 30s for locks
-    })
-    Base.metadata.create_all(bind=engine)
+
+    # Ensure parent directory exists (for .autoforge/ layout)
+    db_path = get_database_path(project_dir)
+    db_path.parent.mkdir(parents=True, exist_ok=True)

    # Choose journal mode based on filesystem type
    # WAL mode doesn't work reliably on network filesystems and can cause corruption
    is_network = _is_network_path(project_dir)
    journal_mode = "DELETE" if is_network else "WAL"

+    engine = create_engine(db_url, connect_args={
+        "check_same_thread": False,
+        "timeout": 30  # Wait up to 30s for locks
+    })
+
+    # Set journal mode BEFORE configuring event hooks
+    # PRAGMA journal_mode must run outside of a transaction, and our event hooks
+    # start a transaction with BEGIN IMMEDIATE on every operation
    with engine.connect() as conn:
-        conn.execute(text(f"PRAGMA journal_mode={journal_mode}"))
-        conn.execute(text("PRAGMA busy_timeout=30000"))
-        conn.commit()
+        # Get raw DBAPI connection to execute PRAGMA outside transaction
+        raw_conn = conn.connection.dbapi_connection
+        if raw_conn is None:
+            raise RuntimeError("Failed to get raw DBAPI connection")
+        cursor = raw_conn.cursor()
+        try:
+            cursor.execute(f"PRAGMA journal_mode={journal_mode}")
+            cursor.execute("PRAGMA busy_timeout=30000")
+        finally:
+            cursor.close()
+
+    # Configure IMMEDIATE transactions via event hooks AFTER setting PRAGMAs
+    # This must happen before create_all() and migrations run
+    _configure_sqlite_immediate_transactions(engine)
+
+    Base.metadata.create_all(bind=engine)

    # Migrate existing databases
    _migrate_add_in_progress_column(engine)
    _migrate_fix_null_boolean_fields(engine)
    _migrate_add_dependencies_column(engine)
    _migrate_add_testing_columns(engine)
+    _migrate_add_human_input_columns(engine)

    # Migrate to add schedules tables
    _migrate_add_schedules_tables(engine)

    SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+    # Cache the engine and session maker
+    _engine_cache[cache_key] = (engine, SessionLocal)
+
    return engine, SessionLocal


+def dispose_engine(project_dir: Path) -> bool:
+    """Dispose of and remove the cached engine for a project.
+
+    This closes all database connections, releasing file locks on Windows.
+    Should be called before deleting the database file.
+
+    Returns:
+        True if an engine was disposed, False if no engine was cached.
+    """
+    cache_key = project_dir.as_posix()
+
+    if cache_key in _engine_cache:
+        engine, _ = _engine_cache.pop(cache_key)
+        engine.dispose()
+        return True
+
+    return False
+
+
 # Global session maker - will be set when server starts
 _session_maker: Optional[sessionmaker] = None

+# Engine cache to avoid creating new engines for each request
+# Key: project directory path (as posix string), Value: (engine, SessionLocal)
+_engine_cache: dict[str, tuple] = {}
+

 def set_session_maker(session_maker: sessionmaker) -> None:
    """Set the global session maker."""
@@ -382,7 +495,7 @@ def set_session_maker(session_maker: sessionmaker) -> None:
    _session_maker = session_maker


-def get_db() -> Session:
+def get_db() -> Generator[Session, None, None]:
    """
    Dependency for FastAPI to get database session.

@@ -394,5 +507,55 @@ def get_db() -> Session:
    db = _session_maker()
    try:
        yield db
+    except Exception:
+        db.rollback()
+        raise
    finally:
        db.close()
+
+
+# =============================================================================
+# Atomic Transaction Helpers for Parallel Mode
+# =============================================================================
+# These helpers prevent database corruption when multiple processes access the
+# same SQLite database concurrently. They use IMMEDIATE transactions which
+# acquire write locks at the start (preventing stale reads) and atomic
+# UPDATE ... WHERE clauses (preventing check-then-modify races).
+
+
+from contextlib import contextmanager
+
+
+@contextmanager
+def atomic_transaction(session_maker):
+    """Context manager for atomic SQLite transactions.
+
+    Acquires a write lock immediately via BEGIN IMMEDIATE (configured by
+    engine event hooks), preventing stale reads in read-modify-write patterns.
+    This is essential for preventing race conditions in parallel mode.
+
+    Args:
+        session_maker: SQLAlchemy sessionmaker
+
+    Yields:
+        SQLAlchemy session with automatic commit/rollback
+
+    Example:
+        with atomic_transaction(session_maker) as session:
+            # All reads in this block are protected by write lock
+            feature = session.query(Feature).filter(...).first()
+            feature.priority = new_priority
+            # Commit happens automatically on exit
+    """
+    session = session_maker()
+    try:
+        yield session
+        session.commit()
+    except Exception:
+        try:
+            session.rollback()
+        except Exception:
+            pass  # Don't let rollback failure mask original error
+        raise
+    finally:
+        session.close()
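
Review note: the helper comment above names two techniques, IMMEDIATE transactions and atomic `UPDATE ... WHERE` clauses. A minimal sketch of how they combine, using only the standard-library `sqlite3` module rather than the SQLAlchemy session in this diff. The `claim_feature` function and the trimmed-down `features` table are illustrative assumptions, not code from this PR.

```python
import sqlite3


def claim_feature(conn: sqlite3.Connection, feature_id: int) -> bool:
    """Atomically claim a feature; return True only if this caller won."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        # The WHERE clause is the check and the SET is the modify,
        # so there is no gap between them for another process to use.
        cur = conn.execute(
            "UPDATE features SET in_progress = 1 "
            "WHERE id = ? AND in_progress = 0 AND passes = 0",
            (feature_id,),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return cur.rowcount == 1


# isolation_level=None turns off the sqlite3 module's implicit BEGIN,
# mirroring what the "connect" event hook does in this diff.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE features ("
    "id INTEGER PRIMARY KEY, "
    "passes INTEGER NOT NULL DEFAULT 0, "
    "in_progress INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO features (id) VALUES (1)")

first = claim_feature(conn, 1)   # claims the row
second = claim_feature(conn, 1)  # row already claimed, nothing updated
conn.close()
```

Checking `rowcount` instead of reading the row first is what removes the check-then-modify race.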
--- a/api/dependency_resolver.py
+++ b/api/dependency_resolver.py
@@ -7,6 +7,7 @@ Includes cycle detection, validation, and helper functions for dependency manage
 """

 import heapq
+from collections import deque
 from typing import TypedDict

 # Security: Prevent DoS via excessive dependencies
@@ -300,15 +301,21 @@ def compute_scheduling_scores(features: list[dict]) -> dict[int, float]:
                parents[f["id"]].append(dep_id)

    # Calculate depths via BFS from roots
+    # Use visited set to prevent infinite loops from circular dependencies
+    # Use deque for O(1) popleft instead of list.pop(0) which is O(n)
    depths: dict[int, int] = {}
+    visited: set[int] = set()
    roots = [f["id"] for f in features if not parents[f["id"]]]
-    queue = [(root, 0) for root in roots]
-    while queue:
-        node_id, depth = queue.pop(0)
-        if node_id not in depths or depth > depths[node_id]:
-            depths[node_id] = depth
+    bfs_queue: deque[tuple[int, int]] = deque((root, 0) for root in roots)
+    while bfs_queue:
+        node_id, depth = bfs_queue.popleft()
+        if node_id in visited:
+            continue  # Skip already visited nodes (handles cycles)
+        visited.add(node_id)
+        depths[node_id] = depth
        for child_id in children[node_id]:
-            queue.append((child_id, depth + 1))
+            if child_id not in visited:
+                bfs_queue.append((child_id, depth + 1))

    # Handle orphaned nodes (shouldn't happen but be safe)
    for f in features:
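
Review note: a standalone sketch of the new traversal, assuming a plain `children` adjacency dict; the `bfs_depths` name and the sample graph are illustrative only. It shows that the visited set makes the loop terminate on a cycle. One consequence worth weighing: each node now keeps the depth of its first visit (shortest distance from a root), whereas the removed code kept the largest depth it saw.

```python
from collections import deque


def bfs_depths(children: dict[int, list[int]], roots: list[int]) -> dict[int, int]:
    """Breadth-first depths from the roots, visiting each node once."""
    depths: dict[int, int] = {}
    visited: set[int] = set()
    queue: deque[tuple[int, int]] = deque((root, 0) for root in roots)
    while queue:
        node_id, depth = queue.popleft()  # O(1), unlike list.pop(0)
        if node_id in visited:
            continue  # already assigned a depth; also breaks cycles
        visited.add(node_id)
        depths[node_id] = depth
        for child_id in children.get(node_id, []):
            if child_id not in visited:
                queue.append((child_id, depth + 1))
    return depths


# Node 4 points back at node 2, forming a cycle 2 -> 4 -> 2.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [2]}
result = bfs_depths(graph, [1])
```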
--- a/autoforge_paths.py
+++ b/autoforge_paths.py
@@ -0,0 +1,326 @@
+"""
+AutoForge Path Resolution
+=========================
+
+Central module for resolving paths to autoforge-generated files within a project.
+
+Implements a tri-path resolution strategy for backward compatibility:
+
+    1. Check ``project_dir / ".autoforge" / X`` (current layout)
+    2. Check ``project_dir / ".autocoder" / X`` (legacy layout)
+    3. Check ``project_dir / X`` (legacy root-level layout)
+    4. Default to the new location for fresh projects
+
+This allows existing projects with root-level ``features.db``, ``.agent.lock``,
+etc. to keep working while new projects store everything under ``.autoforge/``.
+Projects using the old ``.autocoder/`` directory are auto-migrated on next start.
+
+The ``migrate_project_layout`` function can move an old-layout project to the
+new layout safely, with full integrity checks for SQLite databases.
+"""
+
+import logging
+import shutil
+import sqlite3
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# .gitignore content written into every .autoforge/ directory
+# ---------------------------------------------------------------------------
+_GITIGNORE_CONTENT = """\
+# AutoForge runtime files
+features.db
+features.db-wal
+features.db-shm
+assistant.db
+assistant.db-wal
+assistant.db-shm
+.agent.lock
+.devserver.lock
+.pause_drain
+.claude_settings.json
+.claude_assistant_settings.json
+.claude_settings.expand.*.json
+.progress_cache
+.migration_version
+"""
+
+
+# ---------------------------------------------------------------------------
+# Private helpers
+# ---------------------------------------------------------------------------
+
+def _resolve_path(project_dir: Path, filename: str) -> Path:
+    """Resolve a file path using tri-path strategy.
+
+    Checks the new ``.autoforge/`` location first, then the legacy
+    ``.autocoder/`` location, then the root-level location.  If none exist,
+    returns the new location so that newly-created files land in ``.autoforge/``.
+    """
+    new = project_dir / ".autoforge" / filename
+    if new.exists():
+        return new
+    legacy = project_dir / ".autocoder" / filename
+    if legacy.exists():
+        return legacy
+    old = project_dir / filename
+    if old.exists():
+        return old
+    return new  # default for new projects
+
+
+def _resolve_dir(project_dir: Path, dirname: str) -> Path:
+    """Resolve a directory path using tri-path strategy.
+
+    Same logic as ``_resolve_path`` but intended for directories such as
+    ``prompts/``.
+    """
+    new = project_dir / ".autoforge" / dirname
+    if new.exists():
+        return new
+    legacy = project_dir / ".autocoder" / dirname
+    if legacy.exists():
+        return legacy
+    old = project_dir / dirname
+    if old.exists():
+        return old
+    return new
+
+
+# ---------------------------------------------------------------------------
+# .autoforge directory management
+# ---------------------------------------------------------------------------
+
+def get_autoforge_dir(project_dir: Path) -> Path:
+    """Return the ``.autoforge`` directory path.  Does NOT create it."""
+    return project_dir / ".autoforge"
+
+
+def ensure_autoforge_dir(project_dir: Path) -> Path:
+    """Create the ``.autoforge/`` directory (if needed) and write its ``.gitignore``.
+
+    Returns:
+        The path to the ``.autoforge`` directory.
+    """
+    autoforge_dir = get_autoforge_dir(project_dir)
+    autoforge_dir.mkdir(parents=True, exist_ok=True)
+
+    gitignore_path = autoforge_dir / ".gitignore"
+    gitignore_path.write_text(_GITIGNORE_CONTENT, encoding="utf-8")
+
+    return autoforge_dir
+
+
+# ---------------------------------------------------------------------------
+# Tri-path file helpers
+# ---------------------------------------------------------------------------
+
+def get_features_db_path(project_dir: Path) -> Path:
+    """Resolve the path to ``features.db``."""
+    return _resolve_path(project_dir, "features.db")
+
+
+def get_assistant_db_path(project_dir: Path) -> Path:
+    """Resolve the path to ``assistant.db``."""
+    return _resolve_path(project_dir, "assistant.db")
+
+
+def get_agent_lock_path(project_dir: Path) -> Path:
+    """Resolve the path to ``.agent.lock``."""
+    return _resolve_path(project_dir, ".agent.lock")
+
+
+def get_devserver_lock_path(project_dir: Path) -> Path:
+    """Resolve the path to ``.devserver.lock``."""
+    return _resolve_path(project_dir, ".devserver.lock")
+
+
+def get_claude_settings_path(project_dir: Path) -> Path:
+    """Resolve the path to ``.claude_settings.json``."""
+    return _resolve_path(project_dir, ".claude_settings.json")
+
+
+def get_claude_assistant_settings_path(project_dir: Path) -> Path:
+    """Resolve the path to ``.claude_assistant_settings.json``."""
+    return _resolve_path(project_dir, ".claude_assistant_settings.json")
+
+
+def get_pause_drain_path(project_dir: Path) -> Path:
+    """Return the path to the ``.pause_drain`` signal file.
+
+    This file is created to request a graceful pause (drain mode).
+    Always uses the new location since it's a transient signal file.
+    """
+    return project_dir / ".autoforge" / ".pause_drain"
+
+
+def get_progress_cache_path(project_dir: Path) -> Path:
+    """Resolve the path to ``.progress_cache``."""
+    return _resolve_path(project_dir, ".progress_cache")
+
+
+def get_prompts_dir(project_dir: Path) -> Path:
+    """Resolve the path to the ``prompts/`` directory."""
+    return _resolve_dir(project_dir, "prompts")
+
+
+# ---------------------------------------------------------------------------
+# Fixed-location helpers (always use new location)
+# ---------------------------------------------------------------------------
+
+def get_expand_settings_path(project_dir: Path, uuid_hex: str) -> Path:
+    """Return the path for an ephemeral expand-session settings file.
+
+    These files are short-lived and always stored in ``.autoforge/``.
+    """
+    return project_dir / ".autoforge" / f".claude_settings.expand.{uuid_hex}.json"
+
+
+# ---------------------------------------------------------------------------
+# Lock-file safety check
+# ---------------------------------------------------------------------------
+
+def has_agent_running(project_dir: Path) -> bool:
+    """Check whether any agent or dev-server lock file exists at any known location.
+
+    Inspects the legacy root-level paths, the old ``.autocoder/`` paths, and
+    the new ``.autoforge/`` paths so that a running agent is detected
+    regardless of project layout.
+
+    Returns:
+        ``True`` if any ``.agent.lock`` or ``.devserver.lock`` exists.
+    """
+    lock_names = (".agent.lock", ".devserver.lock")
+    for name in lock_names:
+        if (project_dir / name).exists():
+            return True
+        # Check both old and new directory names for backward compatibility
+        if (project_dir / ".autocoder" / name).exists():
+            return True
+        if (project_dir / ".autoforge" / name).exists():
+            return True
+    return False
+
+
+# ---------------------------------------------------------------------------
+# Migration
+# ---------------------------------------------------------------------------
+
+def migrate_project_layout(project_dir: Path) -> list[str]:
+    """Migrate a project from a legacy layout (``.autocoder/`` or root-level) to ``.autoforge/``.
+
+    The migration is incremental and safe:
+
+    * If the agent is running (lock files present) the migration is skipped
+      entirely to avoid corrupting in-use databases.
+    * Each file/directory is migrated independently.  If any single step
+      fails the error is logged and migration continues with the remaining
+      items.  Partial migration is safe because the tri-path resolution
+      strategy will find files at whichever location they ended up in.
+
+    Returns:
+        A list of human-readable descriptions of what was migrated, e.g.
+        ``["prompts/ -> .autoforge/prompts/", "features.db -> .autoforge/features.db"]``.
+        An empty list means nothing was migrated (either everything is
+        already migrated, or the agent is running).
+    """
+    # Safety: refuse to migrate while an agent is running
+    if has_agent_running(project_dir):
+        logger.warning("Migration skipped: agent or dev-server is running for %s", project_dir)
+        return []
+
+    # --- 0. Migrate .autocoder/ → .autoforge/ directory -------------------
+    old_autocoder_dir = project_dir / ".autocoder"
+    new_autoforge_dir = project_dir / ".autoforge"
+    if old_autocoder_dir.exists() and old_autocoder_dir.is_dir() and not new_autoforge_dir.exists():
+        try:
+            old_autocoder_dir.rename(new_autoforge_dir)
+            logger.info("Migrated .autocoder/ -> .autoforge/")
+            migrated: list[str] = [".autocoder/ -> .autoforge/"]
+        except Exception:
+            logger.warning("Failed to migrate .autocoder/ -> .autoforge/", exc_info=True)
+            migrated = []
+    else:
+        migrated = []
+
+    autoforge_dir = ensure_autoforge_dir(project_dir)
+
+    # --- 1. Migrate prompts/ directory -----------------------------------
+    try:
+        old_prompts = project_dir / "prompts"
+        new_prompts = autoforge_dir / "prompts"
+        if old_prompts.exists() and old_prompts.is_dir() and not new_prompts.exists():
+            shutil.copytree(str(old_prompts), str(new_prompts))
+            shutil.rmtree(str(old_prompts))
+            migrated.append("prompts/ -> .autoforge/prompts/")
+            logger.info("Migrated prompts/ -> .autoforge/prompts/")
+    except Exception:
+        logger.warning("Failed to migrate prompts/ directory", exc_info=True)
+
+    # --- 2. Migrate SQLite databases (features.db, assistant.db) ---------
+    db_names = ("features.db", "assistant.db")
+    for db_name in db_names:
+        try:
+            old_db = project_dir / db_name
+            new_db = autoforge_dir / db_name
+            if old_db.exists() and not new_db.exists():
+                # Flush WAL to ensure all data is in the main database file
+                conn = sqlite3.connect(str(old_db))
+                try:
+                    cursor = conn.cursor()
+                    cursor.execute("PRAGMA wal_checkpoint(TRUNCATE)")
+                finally:
+                    conn.close()
+
+                # Copy the main database file (WAL is now flushed)
+                shutil.copy2(str(old_db), str(new_db))
+
+                # Verify the copy is intact
+                verify_conn = sqlite3.connect(str(new_db))
+                try:
+                    verify_cursor = verify_conn.cursor()
+                    result = verify_cursor.execute("PRAGMA integrity_check").fetchone()
+                    if result is None or result[0] != "ok":
+                        logger.error(
+                            "Integrity check failed for migrated %s: %s",
+                            db_name, result,
+                        )
+                        # Remove the broken copy; old file stays in place
+                        new_db.unlink(missing_ok=True)
+                        continue
+                finally:
+                    verify_conn.close()
+
+                # Remove old database files (.db, .db-wal, .db-shm)
+                old_db.unlink(missing_ok=True)
+                for suffix in ("-wal", "-shm"):
+                    wal_file = project_dir / f"{db_name}{suffix}"
+                    wal_file.unlink(missing_ok=True)
+
+                migrated.append(f"{db_name} -> .autoforge/{db_name}")
+                logger.info("Migrated %s -> .autoforge/%s", db_name, db_name)
+        except Exception:
+            logger.warning("Failed to migrate %s", db_name, exc_info=True)
+
+    # --- 3. Migrate simple files -----------------------------------------
+    simple_files = (
+        ".agent.lock",
+        ".devserver.lock",
+        ".claude_settings.json",
+        ".claude_assistant_settings.json",
+        ".progress_cache",
+    )
+    for filename in simple_files:
+        try:
+            old_file = project_dir / filename
+            new_file = autoforge_dir / filename
+            if old_file.exists() and not new_file.exists():
+                shutil.move(str(old_file), str(new_file))
+                migrated.append(f"{filename} -> .autoforge/{filename}")
+                logger.info("Migrated %s -> .autoforge/%s", filename, filename)
+        except Exception:
+            logger.warning("Failed to migrate %s", filename, exc_info=True)
+
+    return migrated
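
The database step above (checkpoint the WAL, copy, verify, then delete the originals) can be sketched in isolation. This is a minimal sketch rather than the project's implementation: `copy_sqlite_verified` is a hypothetical helper name, and it assumes no other process has the database open while the copy runs.

```python
import shutil
import sqlite3
from pathlib import Path


def copy_sqlite_verified(old_db: Path, new_db: Path) -> bool:
    """Hypothetical helper: move a SQLite file only if the copy passes integrity_check."""
    # Fold any pending WAL frames into the main database file before copying.
    conn = sqlite3.connect(str(old_db))
    try:
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
    finally:
        conn.close()

    shutil.copy2(str(old_db), str(new_db))

    # Verify the copy; on failure, discard it and keep the original in place.
    verify = sqlite3.connect(str(new_db))
    try:
        row = verify.execute("PRAGMA integrity_check").fetchone()
    finally:
        verify.close()
    if row is None or row[0] != "ok":
        new_db.unlink(missing_ok=True)
        return False

    # The copy is good: remove the original and any WAL/SHM sidecar files.
    for suffix in ("", "-wal", "-shm"):
        Path(f"{old_db}{suffix}").unlink(missing_ok=True)
    return True
```

Deleting the original only after the integrity check means a failed copy leaves the project in its pre-migration state, so the migration can simply be retried on the next run.
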
--- a/autonomous_agent_demo.py
+++ b/autonomous_agent_demo.py
@@ -44,8 +44,10 @@ from dotenv import load_dotenv
 # IMPORTANT: Must be called BEFORE importing other modules that read env vars at load time
 load_dotenv()

+import os
+
 from agent import run_autonomous_agent
-from registry import DEFAULT_MODEL, get_project_path
+from registry import DEFAULT_MODEL, get_effective_sdk_env, get_project_path


 def parse_args() -> argparse.Namespace:
@@ -133,6 +135,13 @@ Authentication:
        help="Work on a specific feature ID only (used by orchestrator for coding agents)",
    )

+    parser.add_argument(
+        "--feature-ids",
+        type=str,
+        default=None,
+        help="Comma-separated feature IDs to implement in batch (e.g., '5,8,12')",
+    )
+
    # Agent type for subprocess mode
    parser.add_argument(
        "--agent-type",
@@ -145,7 +154,14 @@ Authentication:
        "--testing-feature-id",
        type=int,
        default=None,
-        help="Feature ID to regression test (used by orchestrator for testing agents)",
+        help="Feature ID to regression test (used by orchestrator for testing agents, legacy single mode)",
+    )
+
+    parser.add_argument(
+        "--testing-feature-ids",
+        type=str,
+        default=None,
+        help="Comma-separated feature IDs to regression test in batch (e.g., '5,12,18')",
    )

    # Testing agent configuration
@@ -156,6 +172,20 @@ Authentication:
        help="Testing agents per coding agent (0-3, default: 1). Set to 0 to disable testing agents.",
    )

+    parser.add_argument(
+        "--testing-batch-size",
+        type=int,
+        default=3,
+        help="Number of features per testing batch (1-5, default: 3)",
+    )
+
+    parser.add_argument(
+        "--batch-size",
+        type=int,
+        default=3,
+        help="Max features per coding agent batch (1-3, default: 3)",
+    )
+
    return parser.parse_args()


@@ -167,6 +197,14 @@ def main() -> None:
    # Note: Authentication is handled by start.bat/start.sh before this script runs.
    # The Claude SDK auto-detects credentials from ~/.claude/.credentials.json

+    # Apply UI-configured provider settings to this process's environment.
+    # This ensures CLI-launched agents respect Settings UI provider config (GLM, Ollama, etc.).
+    # Uses setdefault so explicit env vars / .env file take precedence.
+    sdk_overrides = get_effective_sdk_env()
+    for key, value in sdk_overrides.items():
+        if value:  # Only set non-empty values (empty values are used to clear conflicts)
+            os.environ.setdefault(key, value)
+
    # Handle deprecated --parallel flag
    if args.parallel is not None:
        print("WARNING: --parallel is deprecated. Use --concurrency instead.", flush=True)
@@ -193,6 +231,36 @@ def main() -> None:
            print("Use an absolute path or register the project first.")
            return

+    # Migrate project layout to .autoforge/ if needed (idempotent, safe)
+    from autoforge_paths import migrate_project_layout
+    migrated = migrate_project_layout(project_dir)
+    if migrated:
+        print(f"Migrated project files to .autoforge/: {', '.join(migrated)}", flush=True)
+
+    # Migrate project to current AutoForge version (idempotent, safe)
+    from prompts import migrate_project_to_current
+    version_migrated = migrate_project_to_current(project_dir)
+    if version_migrated:
+        print(f"Upgraded project: {', '.join(version_migrated)}", flush=True)
+
+    # Parse batch testing feature IDs (comma-separated string -> list[int])
+    testing_feature_ids: list[int] | None = None
+    if args.testing_feature_ids:
+        try:
+            testing_feature_ids = [int(x.strip()) for x in args.testing_feature_ids.split(",") if x.strip()]
+        except ValueError:
+            print(f"Error: --testing-feature-ids must be comma-separated integers, got: {args.testing_feature_ids}")
+            return
+
+    # Parse batch coding feature IDs (comma-separated string -> list[int])
+    coding_feature_ids: list[int] | None = None
+    if args.feature_ids:
+        try:
+            coding_feature_ids = [int(x.strip()) for x in args.feature_ids.split(",") if x.strip()]
+        except ValueError:
+            print(f"Error: --feature-ids must be comma-separated integers, got: {args.feature_ids}")
+            return
+
    try:
        if args.agent_type:
            # Subprocess mode - spawned by orchestrator for a specific role
@@ -203,12 +271,25 @@ def main() -> None:
                    max_iterations=args.max_iterations or 1,
                    yolo_mode=args.yolo,
                    feature_id=args.feature_id,
+                    feature_ids=coding_feature_ids,
                    agent_type=args.agent_type,
                    testing_feature_id=args.testing_feature_id,
+                    testing_feature_ids=testing_feature_ids,
                )
            )
        else:
            # Entry point mode - always use unified orchestrator
+            # Clean up stale temp files before starting (prevents temp folder bloat)
+            from temp_cleanup import cleanup_stale_temp
+            cleanup_stats = cleanup_stale_temp()
+            if cleanup_stats["dirs_deleted"] > 0 or cleanup_stats["files_deleted"] > 0:
+                mb_freed = cleanup_stats["bytes_freed"] / (1024 * 1024)
+                print(
+                    f"[CLEANUP] Removed {cleanup_stats['dirs_deleted']} dirs, "
+                    f"{cleanup_stats['files_deleted']} files ({mb_freed:.1f} MB freed)",
+                    flush=True,
+                )
+
            from parallel_orchestrator import run_parallel_orchestrator

            # Clamp concurrency to valid range (1-5)
@@ -223,6 +304,8 @@ def main() -> None:
                    model=args.model,
                    yolo_mode=args.yolo,
                    testing_agent_ratio=args.testing_ratio,
+                    testing_batch_size=args.testing_batch_size,
+                    batch_size=args.batch_size,
                )
            )
    except KeyboardInterrupt:
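
Both `--feature-ids` and `--testing-feature-ids` are parsed with the same comma-splitting logic above. One way to share it is sketched below. `parse_id_list` is a hypothetical helper, not a function in this PR, and unlike the code above it lets `ValueError` propagate instead of printing a message and returning.

```python
from __future__ import annotations


def parse_id_list(raw: str | None) -> list[int] | None:
    """Hypothetical helper: turn '5, 8,12' into [5, 8, 12]; None or '' yields None."""
    if not raw:
        return None
    # Skip empty fragments so a trailing comma ('5,8,') is tolerated.
    # int() raises ValueError for non-numeric fragments such as 'abc'.
    return [int(part.strip()) for part in raw.split(",") if part.strip()]
```

A caller would wrap `parse_id_list(args.feature_ids)` in the same `try`/`except ValueError` block used above to keep the existing error message.
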
--- a/bin/autoforge.js
+++ b/bin/autoforge.js
@@ -0,0 +1,3 @@
+#!/usr/bin/env node
+import { run } from '../lib/cli.js';
+run(process.argv.slice(2));
--- a/client.py
+++ b/client.py
@@ -7,6 +7,7 @@ Functions for creating and configuring the Claude Agent SDK client.

 import json
 import os
+import re
 import shutil
 import sys
 from pathlib import Path
@@ -15,100 +16,183 @@ from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
 from claude_agent_sdk.types import HookContext, HookInput, HookMatcher, SyncHookJSONOutput
 from dotenv import load_dotenv

-from security import bash_security_hook
+from security import SENSITIVE_DIRECTORIES, bash_security_hook

 # Load environment variables from .env file if present
 load_dotenv()

-# Default Playwright headless mode - can be overridden via PLAYWRIGHT_HEADLESS env var
-# When True, browser runs invisibly in background
-# When False, browser window is visible (default - useful for monitoring agent progress)
-DEFAULT_PLAYWRIGHT_HEADLESS = False
+# Extra read paths for cross-project file access (read-only)
+# Set EXTRA_READ_PATHS environment variable with comma-separated absolute paths
+# Example: EXTRA_READ_PATHS=/Volumes/Data/dev,/Users/shared/libs
+EXTRA_READ_PATHS_VAR = "EXTRA_READ_PATHS"

-# Environment variables to pass through to Claude CLI for API configuration
-# These allow using alternative API endpoints (e.g., GLM via z.ai) without
-# affecting the user's global Claude Code settings
-API_ENV_VARS = [
-    "ANTHROPIC_BASE_URL",              # Custom API endpoint (e.g., https://api.z.ai/api/anthropic)
-    "ANTHROPIC_AUTH_TOKEN",            # API authentication token
-    "API_TIMEOUT_MS",                  # Request timeout in milliseconds
-    "ANTHROPIC_DEFAULT_SONNET_MODEL",  # Model override for Sonnet
-    "ANTHROPIC_DEFAULT_OPUS_MODEL",    # Model override for Opus
-    "ANTHROPIC_DEFAULT_HAIKU_MODEL",   # Model override for Haiku
-]
+# Sensitive directories that should never be allowed via EXTRA_READ_PATHS.
+# Delegates to the canonical SENSITIVE_DIRECTORIES set in security.py so that
+# this blocklist and the filesystem browser API share a single source of truth.
+EXTRA_READ_PATHS_BLOCKLIST = SENSITIVE_DIRECTORIES


-def get_playwright_headless() -> bool:
+def convert_model_for_vertex(model: str) -> str:
    """
-    Get the Playwright headless mode setting.
+    Convert model name format for Vertex AI compatibility.

-    Reads from PLAYWRIGHT_HEADLESS environment variable, defaults to False.
-    Returns True for headless mode (invisible browser), False for visible browser.
+    Vertex AI uses @ to separate model name from version (e.g., claude-sonnet-4-5@20250929)
+    while the Anthropic API uses - (e.g., claude-sonnet-4-5-20250929).
+    Models without a date suffix (e.g., claude-opus-4-6) pass through unchanged.
+
+    Args:
+        model: Model name in Anthropic format (with hyphens)
+
+    Returns:
+        Model name in Vertex AI format (with @ before date) if Vertex AI is enabled,
+        otherwise returns the model unchanged.
    """
-    value = os.getenv("PLAYWRIGHT_HEADLESS", "false").lower()
-    # Accept various truthy/falsy values
-    return value in ("true", "1", "yes", "on")
+    # Only convert if Vertex AI is enabled
+    if os.getenv("CLAUDE_CODE_USE_VERTEX") != "1":
+        return model
+
+    # Pattern: claude-{name}-{version}-{date} -> claude-{name}-{version}@{date}
+    # Example: claude-sonnet-4-5-20250929 -> claude-sonnet-4-5@20250929
+    # The date is always 8 digits at the end
+    match = re.match(r'^(claude-.+)-(\d{8})$', model)
+    if match:
+        base_name, date = match.groups()
+        return f"{base_name}@{date}"
+
+    # If already in @ format or doesn't match expected pattern, return as-is
+    return model


-# Feature MCP tools for feature/test management
-FEATURE_MCP_TOOLS = [
-    # Core feature operations
+def get_extra_read_paths() -> list[Path]:
+    """
+    Get extra read-only paths from EXTRA_READ_PATHS environment variable.
+
+    Parses comma-separated absolute paths and validates each one:
+    - Must be an absolute path
+    - Must exist and be a directory
+    - Cannot be, be inside, or contain a sensitive directory (e.g., .ssh, .aws)
+
+    Returns:
+        List of validated, canonicalized Path objects.
+    """
+    raw_value = os.getenv(EXTRA_READ_PATHS_VAR, "").strip()
+    if not raw_value:
+        return []
+
+    validated_paths: list[Path] = []
+    home_dir = Path.home()
+
+    for path_str in raw_value.split(","):
+        path_str = path_str.strip()
+        if not path_str:
+            continue
+
+        # Parse and canonicalize the path
+        try:
+            path = Path(path_str).resolve()
+        except (OSError, ValueError) as e:
+            print(f"   - Warning: Invalid EXTRA_READ_PATHS path '{path_str}': {e}")
+            continue
+
+        # Must be absolute (resolve() makes it absolute, but check original input)
+        if not Path(path_str).is_absolute():
+            print(f"   - Warning: EXTRA_READ_PATHS requires absolute paths, skipping: {path_str}")
+            continue
+
+        # Must exist
+        if not path.exists():
+            print(f"   - Warning: EXTRA_READ_PATHS path does not exist, skipping: {path_str}")
+            continue
+
+        # Must be a directory
+        if not path.is_dir():
+            print(f"   - Warning: EXTRA_READ_PATHS path is not a directory, skipping: {path_str}")
+            continue
+
+        # Check against sensitive directory blocklist
+        is_blocked = False
+        for sensitive in EXTRA_READ_PATHS_BLOCKLIST:
+            sensitive_path = (home_dir / sensitive).resolve()
+            try:
+                # Block if path IS the sensitive dir or is INSIDE it
+                if path == sensitive_path or path.is_relative_to(sensitive_path):
+                    print(f"   - Warning: EXTRA_READ_PATHS blocked sensitive path: {path_str}")
+                    is_blocked = True
+                    break
+                # Also block if sensitive dir is INSIDE the requested path
+                if sensitive_path.is_relative_to(path):
+                    print(f"   - Warning: EXTRA_READ_PATHS path contains sensitive directory ({sensitive}): {path_str}")
+                    is_blocked = True
+                    break
+            except (OSError, ValueError):
+                # is_relative_to can raise on some edge cases
+                continue
+
+        if is_blocked:
+            continue
+
+        validated_paths.append(path)
+
+    return validated_paths
+
+
+# Per-agent-type MCP tool lists.
+# Only expose the tools each agent type actually needs, reducing tool schema
+# overhead and preventing agents from calling tools meant for other roles.
+#
+# Tools intentionally omitted from ALL agent lists (UI/orchestrator only):
+#   feature_remove_dependency
+#
+# The ghost tool "feature_release_testing" was removed entirely -- it was
+# listed here but never implemented in mcp_server/feature_mcp.py.
+
+CODING_AGENT_TOOLS = [
    "mcp__features__feature_get_stats",
-    "mcp__features__feature_get_by_id",  # Get assigned feature details
-    "mcp__features__feature_get_summary",  # Lightweight: id, name, status, deps only
-    "mcp__features__feature_mark_in_progress",
-    "mcp__features__feature_claim_and_get",  # Atomic claim + get details
-    "mcp__features__feature_mark_passing",
-    "mcp__features__feature_mark_failing",  # Mark regression detected
-    "mcp__features__feature_skip",
-    "mcp__features__feature_create_bulk",
-    "mcp__features__feature_create",
-    "mcp__features__feature_clear_in_progress",
-    "mcp__features__feature_release_testing",  # Release testing claim
-    # Dependency management
-    "mcp__features__feature_add_dependency",
-    "mcp__features__feature_remove_dependency",
-    "mcp__features__feature_set_dependencies",
-    # Query tools
+    "mcp__features__feature_get_by_id",
+    "mcp__features__feature_get_summary",
    "mcp__features__feature_get_ready",
    "mcp__features__feature_get_blocked",
    "mcp__features__feature_get_graph",
+    "mcp__features__feature_claim_and_get",
+    "mcp__features__feature_mark_in_progress",
+    "mcp__features__feature_mark_passing",
+    "mcp__features__feature_mark_failing",
+    "mcp__features__feature_skip",
+    "mcp__features__feature_clear_in_progress",
 ]

-# Playwright MCP tools for browser automation
-PLAYWRIGHT_TOOLS = [
-    # Core navigation & screenshots
-    "mcp__playwright__browser_navigate",
-    "mcp__playwright__browser_navigate_back",
-    "mcp__playwright__browser_take_screenshot",
-    "mcp__playwright__browser_snapshot",
-
-    # Element interaction
-    "mcp__playwright__browser_click",
-    "mcp__playwright__browser_type",
-    "mcp__playwright__browser_fill_form",
-    "mcp__playwright__browser_select_option",
-    "mcp__playwright__browser_hover",
-    "mcp__playwright__browser_drag",
-    "mcp__playwright__browser_press_key",
-
-    # JavaScript & debugging
-    "mcp__playwright__browser_evaluate",
-    # "mcp__playwright__browser_run_code",  # REMOVED - causes Playwright MCP server crash
-    "mcp__playwright__browser_console_messages",
-    "mcp__playwright__browser_network_requests",
-
-    # Browser management
-    "mcp__playwright__browser_close",
-    "mcp__playwright__browser_resize",
-    "mcp__playwright__browser_tabs",
-    "mcp__playwright__browser_wait_for",
-    "mcp__playwright__browser_handle_dialog",
-    "mcp__playwright__browser_file_upload",
-    "mcp__playwright__browser_install",
+TESTING_AGENT_TOOLS = [
+    "mcp__features__feature_get_stats",
+    "mcp__features__feature_get_by_id",
+    "mcp__features__feature_get_summary",
+    "mcp__features__feature_get_ready",
+    "mcp__features__feature_get_blocked",
+    "mcp__features__feature_get_graph",
+    "mcp__features__feature_mark_passing",
+    "mcp__features__feature_mark_failing",
 ]

-# Built-in tools
+INITIALIZER_AGENT_TOOLS = [
+    "mcp__features__feature_get_stats",
+    "mcp__features__feature_get_ready",
+    "mcp__features__feature_get_blocked",
+    "mcp__features__feature_get_graph",
+    "mcp__features__feature_create_bulk",
+    "mcp__features__feature_create",
+    "mcp__features__feature_add_dependency",
+    "mcp__features__feature_set_dependencies",
+]
+
+# Union of all agent tool lists -- used for permissions (all tools remain
+# *permitted* so the MCP server can respond, but only the agent-type-specific
+# list is included in allowed_tools, which controls what the LLM sees).
+ALL_FEATURE_MCP_TOOLS = sorted(
+    set(CODING_AGENT_TOOLS) | set(TESTING_AGENT_TOOLS) | set(INITIALIZER_AGENT_TOOLS)
+)
+
+# Built-in tools available to agents.
+# WebFetch and WebSearch are included so coding agents can look up current
+# documentation for frameworks and libraries they are implementing.
 BUILTIN_TOOLS = [
    "Read",
    "Write",
@@ -125,7 +209,7 @@ def create_client(
    project_dir: Path,
    model: str,
    yolo_mode: bool = False,
-    agent_id: str | None = None,
+    agent_type: str = "coding",
 ):
    """
    Create a Claude Agent SDK client with multi-layered security.
@@ -133,9 +217,9 @@ def create_client(
    Args:
        project_dir: Directory for the project
        model: Claude model to use
-        yolo_mode: If True, skip Playwright MCP server for rapid prototyping
-        agent_id: Optional unique identifier for browser isolation in parallel mode.
-                  When provided, each agent gets its own browser profile.
+        yolo_mode: If True, skip browser testing for rapid prototyping
+        agent_type: One of "coding", "testing", or "initializer". Controls which MCP tools
+                    are exposed and the max_turns limit; other values use the coding defaults.

    Returns:
        Configured ClaudeSDKClient (from claude_agent_sdk)
@@ -149,13 +233,31 @@ def create_client(
    Note: Authentication is handled by start.bat/start.sh before this runs.
    The Claude SDK auto-detects credentials from the Claude CLI configuration
    """
-    # Build allowed tools list based on mode
-    # In YOLO mode, exclude Playwright tools for faster prototyping
-    allowed_tools = [*BUILTIN_TOOLS, *FEATURE_MCP_TOOLS]
-    if not yolo_mode:
-        allowed_tools.extend(PLAYWRIGHT_TOOLS)
+    # Select the feature MCP tools appropriate for this agent type
+    feature_tools_map = {
+        "coding": CODING_AGENT_TOOLS,
+        "testing": TESTING_AGENT_TOOLS,
+        "initializer": INITIALIZER_AGENT_TOOLS,
+    }
+    feature_tools = feature_tools_map.get(agent_type, CODING_AGENT_TOOLS)

-    # Build permissions list
+    # Select max_turns based on agent type:
+    #   - coding/initializer: 300 turns (complex multi-step implementation)
+    #   - testing: 100 turns (focused verification of a single feature)
+    max_turns_map = {
+        "coding": 300,
+        "testing": 100,
+        "initializer": 300,
+    }
+    max_turns = max_turns_map.get(agent_type, 300)
+
+    # Build allowed tools list based on agent type.
+    allowed_tools = [*BUILTIN_TOOLS, *feature_tools]
+
+    # Build permissions list.
+    # We permit ALL feature MCP tools at the security layer (so the MCP server
+    # can respond if called), but the LLM only *sees* the agent-type-specific
+    # subset via allowed_tools above.
    permissions_list = [
        # Allow all file operations within the project directory
        "Read(./**)",
@@ -166,15 +268,21 @@ def create_client(
        # Bash permission granted here, but actual commands are validated
        # by the bash_security_hook (see security.py for allowed commands)
        "Bash(*)",
-        # Allow web tools for documentation lookup
-        "WebFetch",
-        "WebSearch",
+        # Allow web tools for looking up framework/library documentation
+        "WebFetch(*)",
+        "WebSearch(*)",
        # Allow Feature MCP tools for feature management
-        *FEATURE_MCP_TOOLS,
+        *ALL_FEATURE_MCP_TOOLS,
    ]
-    if not yolo_mode:
-        # Allow Playwright MCP tools for browser automation (standard mode only)
-        permissions_list.extend(PLAYWRIGHT_TOOLS)
+
+    # Add extra read paths from environment variable (read-only access)
+    # Paths are validated, canonicalized, and checked against sensitive blocklist
+    extra_read_paths = get_extra_read_paths()
+    for path in extra_read_paths:
+        # Add read-only permissions for each validated path
+        permissions_list.append(f"Read({path}/**)")
+        permissions_list.append(f"Glob({path}/**)")
+        permissions_list.append(f"Grep({path}/**)")

    # Create comprehensive security settings
    # Note: Using relative paths ("./**") restricts access to project directory
@@ -191,18 +299,22 @@ def create_client(
    project_dir.mkdir(parents=True, exist_ok=True)

    # Write settings to a file in the project directory
-    settings_file = project_dir / ".claude_settings.json"
+    from autoforge_paths import get_claude_settings_path
+    settings_file = get_claude_settings_path(project_dir)
+    settings_file.parent.mkdir(parents=True, exist_ok=True)
    with open(settings_file, "w") as f:
        json.dump(security_settings, f, indent=2)

    print(f"Created security settings at {settings_file}")
    print("   - Sandbox enabled (OS-level bash isolation)")
    print(f"   - Filesystem restricted to: {project_dir.resolve()}")
+    if extra_read_paths:
+        print(f"   - Extra read paths (validated): {', '.join(str(p) for p in extra_read_paths)}")
    print("   - Bash commands restricted to allowlist (see security.py)")
    if yolo_mode:
-        print("   - MCP servers: features (database) - YOLO MODE (no Playwright)")
+        print("   - MCP servers: features (database) - YOLO MODE (no browser testing)")
    else:
-        print("   - MCP servers: playwright (browser), features (database)")
+        print("   - MCP servers: features (database)")
    print("   - Project settings enabled (skills, commands, CLAUDE.md)")
    print()

@@ -226,41 +338,32 @@ def create_client(
            },
        },
    }
-    if not yolo_mode:
-        # Include Playwright MCP server for browser automation (standard mode only)
-        # Headless mode is configurable via PLAYWRIGHT_HEADLESS environment variable
-        playwright_args = ["@playwright/mcp@latest", "--viewport-size", "1280x720"]
-        if get_playwright_headless():
-            playwright_args.append("--headless")
-
-        # Browser isolation for parallel execution
-        # Each agent gets its own isolated browser context to prevent tab conflicts
-        if agent_id:
-            # Use --isolated for ephemeral browser context
-            # This creates a fresh, isolated context without persistent state
-            # Note: --isolated and --user-data-dir are mutually exclusive
-            playwright_args.append("--isolated")
-            print(f"   - Browser isolation enabled for agent: {agent_id}")
-
-        mcp_servers["playwright"] = {
-            "command": "npx",
-            "args": playwright_args,
-        }
-
    # Build environment overrides for API endpoint configuration
-    # These override system env vars for the Claude CLI subprocess,
-    # allowing AutoCoder to use alternative APIs (e.g., GLM) without
-    # affecting the user's global Claude Code settings
-    sdk_env = {}
-    for var in API_ENV_VARS:
-        value = os.getenv(var)
-        if value:
-            sdk_env[var] = value
+    # Uses get_effective_sdk_env() which reads provider settings from the database,
+    # ensuring UI-configured alternative providers (GLM, Ollama, Kimi, Custom) propagate
+    # correctly to the Claude CLI subprocess
+    from registry import get_effective_sdk_env
+    sdk_env = get_effective_sdk_env()

+    # Detect alternative API mode (any custom base URL, e.g. Ollama, GLM, or Azure; or Vertex AI)
+    base_url = sdk_env.get("ANTHROPIC_BASE_URL", "")
+    is_vertex = sdk_env.get("CLAUDE_CODE_USE_VERTEX") == "1"
+    is_alternative_api = bool(base_url) or is_vertex
+    is_ollama = "localhost:11434" in base_url or "127.0.0.1:11434" in base_url
+    is_azure = "services.ai.azure.com" in base_url
+    model = convert_model_for_vertex(model)
    if sdk_env:
        print(f"   - API overrides: {', '.join(sdk_env.keys())}")
-        if "ANTHROPIC_BASE_URL" in sdk_env:
-            print(f"   - GLM Mode: Using {sdk_env['ANTHROPIC_BASE_URL']}")
+        if is_vertex:
+            project_id = sdk_env.get("ANTHROPIC_VERTEX_PROJECT_ID", "unknown")
+            region = sdk_env.get("CLOUD_ML_REGION", "unknown")
+            print(f"   - Vertex AI Mode: Using GCP project '{project_id}' with model '{model}' in region '{region}'")
+        elif is_ollama:
+            print("   - Ollama Mode: Using local models")
+        elif is_azure:
+            print(f"   - Azure Mode: Using {base_url}")
+        elif "ANTHROPIC_BASE_URL" in sdk_env:
+            print(f"   - Alternative API: Using {sdk_env['ANTHROPIC_BASE_URL']}")

    # Create a wrapper for bash_security_hook that passes project_dir via context
    async def bash_hook_with_context(input_data, tool_use_id=None, context=None):
@@ -270,9 +373,10 @@ def create_client(
        context["project_dir"] = str(project_dir.resolve())
        return await bash_security_hook(input_data, tool_use_id, context)

-    # PreCompact hook for logging and customizing context compaction
+    # PreCompact hook for logging and customizing context compaction.
    # Compaction is handled automatically by Claude Code CLI when context approaches limits.
-    # This hook allows us to log when compaction occurs and optionally provide custom instructions.
+    # This hook provides custom instructions that guide the summarizer to preserve
+    # critical workflow state while discarding verbose/redundant content.
    async def pre_compact_hook(
        input_data: HookInput,
        tool_use_id: str | None,
@@ -285,8 +389,9 @@ def create_client(
        - "auto": Automatic compaction when context approaches token limits
        - "manual": User-initiated compaction via /compact command

-        The hook can customize compaction via hookSpecificOutput:
-        - customInstructions: String with focus areas for summarization
+        Returns custom instructions that guide the compaction summarizer to:
+        1. Preserve critical workflow state (feature ID, modified files, test results)
+        2. Discard verbose content (screenshots, long grep outputs, repeated reads)
        """
        trigger = input_data.get("trigger", "auto")
        custom_instructions = input_data.get("custom_instructions")
@@ -297,18 +402,53 @@ def create_client(
            print("[Context] Manual compaction requested")

        if custom_instructions:
-            print(f"[Context] Custom instructions: {custom_instructions}")
+            print(f"[Context] Custom instructions provided: {custom_instructions}")

-        # Return empty dict to allow compaction to proceed with default behavior
-        # To customize, return:
-        # {
-        #     "hookSpecificOutput": {
-        #         "hookEventName": "PreCompact",
-        #         "customInstructions": "Focus on preserving file paths and test results"
-        #     }
-        # }
-        return SyncHookJSONOutput()
+        # Build compaction instructions that preserve workflow-critical context
+        # while discarding verbose content that inflates token usage.
+        #
+        # The summarizer receives these instructions and uses them to decide
+        # what to keep vs. discard during context compaction.
+        compaction_guidance = "\n".join([
+            "## PRESERVE (critical workflow state)",
+            "- Current feature ID, feature name, and feature status (pending/in_progress/passing/failing)",
+            "- List of all files created or modified during this session, with their paths",
+            "- Last test/lint/type-check results: command run, pass/fail status, and key error messages",
+            "- Current step in the workflow (e.g., implementing, testing, fixing lint errors)",
+            "- Any dependency information (which features block this one)",
+            "- Git operations performed (commits, branches created)",
+            "- MCP tool call results (feature_claim_and_get, feature_mark_passing, etc.)",
+            "- Key architectural decisions made during this session",
+            "",
+            "## DISCARD (verbose content safe to drop)",
+            "- Full screenshot base64 data (just note that a screenshot was taken and what it showed)",
+            "- Long grep/find/glob output listings (summarize to: searched for X, found Y relevant files)",
+            "- Repeated file reads of the same file (keep only the latest read or a summary of changes)",
+            "- Full file contents from Read tool (summarize to: read file X, key sections were Y)",
+            "- Verbose npm/pip install output (just note: dependencies installed successfully/failed)",
+            "- Full lint/type-check output when passing (just note: lint passed with no errors)",
+            "- Browser console message dumps (summarize to: N errors found, key error was X)",
+            "- Redundant tool result confirmations ([Done] markers)",
+        ])

+        print("[Context] Applying custom compaction instructions (preserve workflow state, discard verbose content)")
+
+        # The SDK's HookSpecificOutput union type does not yet include a
+        # PreCompactHookSpecificOutput variant, but the CLI protocol accepts
+        # {"hookEventName": "PreCompact", "customInstructions": "..."}.
+        # The dict is serialized to JSON and sent to the CLI process directly,
+        # so the runtime behavior is correct despite the type mismatch.
+        return SyncHookJSONOutput(
+            hookSpecificOutput={  # type: ignore[typeddict-item]
+                "hookEventName": "PreCompact",
+                "customInstructions": compaction_guidance,
+            }
+        )
+
+    # PROMPT CACHING: The Claude Code CLI applies cache_control breakpoints internally.
+    # Our system_prompt benefits from automatic caching without explicit configuration.
+    # If explicit cache_control is needed, the SDK would need to accept content blocks
+    # with cache_control fields (not currently supported in v0.1.x).
    return ClaudeSDKClient(
        options=ClaudeAgentOptions(
            model=model,
@@ -317,7 +457,7 @@ def create_client(
            setting_sources=["project"],  # Enable skills, commands, and CLAUDE.md from project dir
            max_buffer_size=10 * 1024 * 1024,  # 10MB for large Playwright screenshots
            allowed_tools=allowed_tools,
-            mcp_servers=mcp_servers,
+            mcp_servers=mcp_servers,  # type: ignore[arg-type]  # SDK accepts dict config at runtime
            hooks={
                "PreToolUse": [
                    HookMatcher(matcher="Bash", hooks=[bash_hook_with_context]),
@@ -329,14 +469,15 @@ def create_client(
                    HookMatcher(hooks=[pre_compact_hook]),
                ],
            },
-            max_turns=1000,
+            max_turns=max_turns,
            cwd=str(project_dir.resolve()),
            settings=str(settings_file.resolve()),  # Use absolute path
            env=sdk_env,  # Pass API configuration overrides to CLI subprocess
            # Enable extended context beta for better handling of long sessions.
            # This provides up to 1M tokens of context with automatic compaction.
            # See: https://docs.anthropic.com/en/api/beta-headers
-            betas=["context-1m-2025-08-07"],
+            # Disabled for alternative APIs (Ollama, GLM, Vertex AI) as they don't support this beta.
+            betas=[] if is_alternative_api else ["context-1m-2025-08-07"],
            # Note on context management:
            # The Claude Agent SDK handles context management automatically through the
            # underlying Claude Code CLI. When context approaches limits, the CLI
@@ -346,7 +487,7 @@ def create_client(
            # parameters. Instead, context is managed via:
            # 1. betas=["context-1m-2025-08-07"] - Extended context window
            # 2. PreCompact hook - Intercept and customize compaction behavior
-            # 3. max_turns - Limit conversation turns (set to 1000 for long sessions)
+            # 3. max_turns - Limit conversation turns (per agent type: coding=300, testing=100)
            #
            # Future SDK versions may add explicit compaction controls. When available,
            # consider adding:
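Review note: the `client.py` hunks above make two session options conditional, `max_turns` (now per agent type) and `betas` (empty for alternative APIs). The sketch below shows one way that selection could be factored out. The helper name `select_session_options`, the `MAX_TURNS_BY_AGENT` table, and the fallback value are assumptions for illustration; only the numbers 300 and 100 and the beta string come from the diff.

```python
from typing import Any

# Per-agent turn limits quoted in the diff comment (coding=300, testing=100).
MAX_TURNS_BY_AGENT = {"coding": 300, "testing": 100}
# Assumption: fallback for agent types the diff does not mention.
DEFAULT_MAX_TURNS = 300

EXTENDED_CONTEXT_BETA = "context-1m-2025-08-07"


def select_session_options(agent_type: str, is_alternative_api: bool) -> dict[str, Any]:
    """Return the max_turns and betas values for one agent session."""
    return {
        "max_turns": MAX_TURNS_BY_AGENT.get(agent_type, DEFAULT_MAX_TURNS),
        # Alternative providers (Ollama, GLM, Vertex AI) do not support the
        # extended-context beta, so no beta header is requested for them.
        "betas": [] if is_alternative_api else [EXTENDED_CONTEXT_BETA],
    }
```

Keeping the decision in one pure function makes it easy to unit-test without constructing a real SDK client.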
--- a/env_constants.py
+++ b/env_constants.py
@@ -0,0 +1,28 @@
+"""
+Shared Environment Variable Constants
+======================================
+
+Single source of truth for environment variables forwarded to Claude CLI
+subprocesses.  Imported by both ``client.py`` (agent sessions) and
+``server/services/chat_constants.py`` (chat sessions) to avoid maintaining
+duplicate lists.
+
+These allow autoforge to use alternative API endpoints (Ollama, GLM,
+Vertex AI) without affecting the user's global Claude Code settings.
+"""
+
+API_ENV_VARS: list[str] = [
+    # Core API configuration
+    "ANTHROPIC_BASE_URL",              # Custom API endpoint (e.g., https://api.z.ai/api/anthropic)
+    "ANTHROPIC_AUTH_TOKEN",            # API authentication token
+    "ANTHROPIC_API_KEY",               # API key (used by Kimi and other providers)
+    "API_TIMEOUT_MS",                  # Request timeout in milliseconds
+    # Model tier overrides
+    "ANTHROPIC_DEFAULT_SONNET_MODEL",  # Model override for Sonnet
+    "ANTHROPIC_DEFAULT_OPUS_MODEL",    # Model override for Opus
+    "ANTHROPIC_DEFAULT_HAIKU_MODEL",   # Model override for Haiku
+    # Vertex AI configuration
+    "CLAUDE_CODE_USE_VERTEX",          # Enable Vertex AI mode (set to "1")
+    "CLOUD_ML_REGION",                 # GCP region (e.g., us-east5)
+    "ANTHROPIC_VERTEX_PROJECT_ID",     # GCP project ID
+]
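Review note: `env_constants.py` only declares the list; how it is consumed is not shown in this part of the diff. A minimal sketch of the forwarding step, assuming a hypothetical helper `build_sdk_env` and assuming that unset or empty variables are skipped (the real `client.py` may behave differently):

```python
import os
from collections.abc import Mapping

# Copied from env_constants.py in this diff.
API_ENV_VARS: list[str] = [
    "ANTHROPIC_BASE_URL",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_API_KEY",
    "API_TIMEOUT_MS",
    "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "ANTHROPIC_DEFAULT_OPUS_MODEL",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
    "CLAUDE_CODE_USE_VERTEX",
    "CLOUD_ML_REGION",
    "ANTHROPIC_VERTEX_PROJECT_ID",
]


def build_sdk_env(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect only the allow-listed variables that are set and non-empty.

    Hypothetical helper: everything outside API_ENV_VARS is left out, so
    unrelated variables are not passed through as overrides.
    """
    return {name: environ[name] for name in API_ENV_VARS if environ.get(name)}
```

The resulting dict could then be passed as the `env=` option when the client is created, which is what the `env=sdk_env` argument in `client.py` suggests.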
--- a/examples/OPTIMIZE_CONFIG.md
+++ b/examples/OPTIMIZE_CONFIG.md
@@ -179,7 +179,7 @@ To see what you can reduce:

 ```bash
 # Count commands by prefix
-grep "^  - name:" .autocoder/allowed_commands.yaml | \
+grep "^  - name:" .autoforge/allowed_commands.yaml | \
  sed 's/^  - name: //' | \
  cut -d' ' -f1 | \
  sort | uniq -c | sort -rn
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,4 +1,4 @@
-# AutoCoder Security Configuration Examples
+# AutoForge Security Configuration Examples

 This directory contains example configuration files for controlling which bash commands the autonomous coding agent can execute.

@@ -18,11 +18,11 @@ This directory contains example configuration files for controlling which bash c

 ### For a Single Project (Most Common)

-When you create a new project with AutoCoder, it automatically creates:
+When you create a new project with AutoForge, it automatically creates:

 ```text
 my-project/
-  .autocoder/
+  .autoforge/
    allowed_commands.yaml    ← Automatically created from template
 ```

@@ -34,17 +34,17 @@ If you want commands available across **all projects**, manually create:

 ```bash
 # Copy the example to your home directory
-cp examples/org_config.yaml ~/.autocoder/config.yaml
+cp examples/org_config.yaml ~/.autoforge/config.yaml

 # Edit it to add org-wide commands
-nano ~/.autocoder/config.yaml
+nano ~/.autoforge/config.yaml
 ```

 ---

 ## Project-Level Configuration

-**File:** `{project_dir}/.autocoder/allowed_commands.yaml`
+**File:** `{project_dir}/.autoforge/allowed_commands.yaml`

 **Purpose:** Define commands needed for THIS specific project.

@@ -82,7 +82,7 @@ commands:

 ## Organization-Level Configuration

-**File:** `~/.autocoder/config.yaml`
+**File:** `~/.autoforge/config.yaml`

 **Purpose:** Define commands and policies for ALL projects.

@@ -127,13 +127,13 @@ When the agent tries to run a command, the system checks in this order:
 └─────────────────────────────────────────────────────┘
                         ↓
 ┌─────────────────────────────────────────────────────┐
-│ 2. ORG BLOCKLIST (~/.autocoder/config.yaml)         │
+│ 2. ORG BLOCKLIST (~/.autoforge/config.yaml)         │
 │    Commands you block organization-wide             │
 │    ❌ Projects CANNOT override these                │
 └─────────────────────────────────────────────────────┘
                         ↓
 ┌─────────────────────────────────────────────────────┐
-│ 3. ORG ALLOWLIST (~/.autocoder/config.yaml)         │
+│ 3. ORG ALLOWLIST (~/.autoforge/config.yaml)         │
 │    Commands available to all projects               │
 │    ✅ Automatically available                       │
 └─────────────────────────────────────────────────────┘
@@ -145,7 +145,7 @@ When the agent tries to run a command, the system checks in this order:
 └─────────────────────────────────────────────────────┘
                         ↓
 ┌─────────────────────────────────────────────────────┐
-│ 5. PROJECT ALLOWLIST (.autocoder/allowed_commands)  │
+│ 5. PROJECT ALLOWLIST (.autoforge/allowed_commands)  │
 │    Project-specific commands                        │
 │    ✅ Available only to this project                │
 └─────────────────────────────────────────────────────┘
@@ -195,7 +195,7 @@ Matches:

 ### iOS Development

-**Project config** (`.autocoder/allowed_commands.yaml`):
+**Project config** (`.autoforge/allowed_commands.yaml`):
 ```yaml
 version: 1
 commands:
@@ -245,7 +245,7 @@ commands:

 ### Enterprise Organization (Restrictive)

-**Org config** (`~/.autocoder/config.yaml`):
+**Org config** (`~/.autoforge/config.yaml`):
 ```yaml
 version: 1

@@ -265,7 +265,7 @@ blocked_commands:

 ### Startup Team (Permissive)

-**Org config** (`~/.autocoder/config.yaml`):
+**Org config** (`~/.autoforge/config.yaml`):
 ```yaml
 version: 1

@@ -394,7 +394,7 @@ These commands are **NEVER allowed**, even with user approval:

 **Solution:** Add the command to your project config:
 ```yaml
-# In .autocoder/allowed_commands.yaml
+# In .autoforge/allowed_commands.yaml
 commands:
  - name: X
    description: What this command does
@@ -405,7 +405,7 @@ commands:
 **Cause:** The command is in the org blocklist or hardcoded blocklist.

 **Solution:**
-- If in org blocklist: Edit `~/.autocoder/config.yaml` to remove it
+- If in org blocklist: Edit `~/.autoforge/config.yaml` to remove it
 - If in hardcoded blocklist: Cannot be allowed (by design)

 ### Error: "Could not parse YAML config"
@@ -422,8 +422,8 @@ commands:
 **Solution:**
 1. Restart the agent (changes are loaded on startup)
 2. Verify file location:
-   - Project: `{project}/.autocoder/allowed_commands.yaml`
-   - Org: `~/.autocoder/config.yaml` (must be manually created)
+   - Project: `{project}/.autoforge/allowed_commands.yaml`
+   - Org: `~/.autoforge/config.yaml` (must be manually created)
 3. Check YAML is valid (run through a YAML validator)

 ---
@@ -432,7 +432,7 @@ commands:

 ### Running the Tests

-AutoCoder has comprehensive tests for the security system:
+AutoForge has comprehensive tests for the security system:

 **Unit Tests** (136 tests - fast):
 ```bash
@@ -481,7 +481,7 @@ python start.py
 cd path/to/security-test

 # Edit the config
-nano .autocoder/allowed_commands.yaml
+nano .autoforge/allowed_commands.yaml
 ```

 **3. Add a test command (e.g., Swift):**
@@ -509,7 +509,7 @@ Or:
 ```text
 Command 'wget' is not allowed.
 To allow this command:
-  1. Add to .autocoder/allowed_commands.yaml for this project, OR
+  1. Add to .autoforge/allowed_commands.yaml for this project, OR
  2. Request mid-session approval (the agent can ask)
 ```

--- a/examples/org_config.yaml
+++ b/examples/org_config.yaml
@@ -1,6 +1,6 @@
-# Organization-Level AutoCoder Configuration
+# Organization-Level AutoForge Configuration
 # ============================================
-# Location: ~/.autocoder/config.yaml
+# Location: ~/.autoforge/config.yaml
 #
 # IMPORTANT: This file is OPTIONAL and must be manually created by you.
 # It does NOT exist by default.
@@ -22,7 +22,7 @@ version: 1
 # Organization-Wide Allowed Commands
 # ==========================================
 # These commands become available to ALL projects automatically.
-# Projects don't need to add them to their own .autocoder/allowed_commands.yaml
+# Projects don't need to add them to their own .autoforge/allowed_commands.yaml
 #
 # By default, this is empty. Uncomment and add commands as needed.

@@ -122,7 +122,7 @@ approval_timeout_minutes: 5
 #    Default commands: npm, git, curl, ls, cat, etc.
 #    Always available to all projects.
 #
-# 5. Project Allowed Commands (.autocoder/allowed_commands.yaml)
+# 5. Project Allowed Commands (.autoforge/allowed_commands.yaml)
 #    Project-specific commands defined in each project.
 #    LOWEST PRIORITY (can't override blocks above).
 #
@@ -165,7 +165,7 @@ approval_timeout_minutes: 5
 # ==========================================
 # To Create This File
 # ==========================================
-# 1. Copy this example to: ~/.autocoder/config.yaml
+# 1. Copy this example to: ~/.autoforge/config.yaml
 # 2. Uncomment and customize the sections you need
 # 3. Leave empty lists if you don't need org-level controls
 #
--- a/examples/project_allowed_commands.yaml
+++ b/examples/project_allowed_commands.yaml
@@ -1,12 +1,12 @@
 # Project-Specific Allowed Commands
 # ==================================
-# Location: {project_dir}/.autocoder/allowed_commands.yaml
+# Location: {project_dir}/.autoforge/allowed_commands.yaml
 #
 # This file defines bash commands that the autonomous coding agent can use
 # for THIS SPECIFIC PROJECT, beyond the default allowed commands.
 #
-# When you create a new project, AutoCoder automatically creates this file
-# in your project's .autocoder/ directory. You can customize it for your
+# When you create a new project, AutoForge automatically creates this file
+# in your project's .autoforge/ directory. You can customize it for your
 # project's specific needs (iOS, Rust, Python, etc.).

 version: 1
@@ -115,7 +115,7 @@ commands: []
 # Limits:
 #   - Maximum 100 commands per project
 #   - Commands in the blocklist (sudo, dd, shutdown, etc.) can NEVER be allowed
-#   - Org-level blocked commands (see ~/.autocoder/config.yaml) cannot be overridden
+#   - Org-level blocked commands (see ~/.autoforge/config.yaml) cannot be overridden
 #
 # Default Allowed Commands (always available):
 #   File operations: ls, cat, head, tail, wc, grep, cp, mkdir, mv, rm, touch
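Review note: the renamed docs describe a five-level command hierarchy (hardcoded blocklist, org blocklist, org allowlist, default commands, project allowlist). The sketch below illustrates that lookup order only. `CommandPolicy`, its fields, and the example sets are hypothetical, the lists are small illustrative subsets, and matching is by exact command name even though the real system also supports patterns.

```python
from dataclasses import dataclass, field

# Illustrative subsets only; the real lists live in the project's security code.
HARDCODED_BLOCKLIST = frozenset({"sudo", "dd", "shutdown"})
DEFAULT_ALLOWED = frozenset({"npm", "git", "curl", "ls", "cat"})


@dataclass
class CommandPolicy:
    """Hypothetical container for the org-level and project-level lists."""

    org_blocked: set[str] = field(default_factory=set)
    org_allowed: set[str] = field(default_factory=set)
    project_allowed: set[str] = field(default_factory=set)

    def check(self, command: str) -> tuple[bool, str]:
        """Return (allowed, reason), checking the levels in priority order."""
        parts = command.split()
        if not parts:
            return False, "empty command"
        name = parts[0]
        # Blocklists are consulted first, so no allowlist can override them.
        if name in HARDCODED_BLOCKLIST:
            return False, "hardcoded blocklist"
        if name in self.org_blocked:
            return False, "org blocklist"
        if name in self.org_allowed:
            return True, "org allowlist"
        if name in DEFAULT_ALLOWED:
            return True, "default allowlist"
        if name in self.project_allowed:
            return True, "project allowlist"
        return False, "not in any allowlist"
```

For example, a project that lists `wget` in its own `allowed_commands.yaml` still gets a rejection if the org config blocks `wget`, because the org blocklist is checked before the project allowlist.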
--- a/lib/cli.js
+++ b/lib/cli.js
@@ -0,0 +1,834 @@
+/**
+ * AutoForge CLI
+ * =============
+ *
+ * Main CLI module for the AutoForge npm global package.
+ * Handles Python detection, virtual environment management,
+ * config loading, and uvicorn server lifecycle.
+ *
+ * Uses only Node.js built-in modules -- no external dependencies.
+ */
+
+import { execFileSync, spawn, execSync } from 'node:child_process';
+import { createHash } from 'node:crypto';
+import { existsSync, readFileSync, writeFileSync, mkdirSync, unlinkSync, rmSync, copyFileSync } from 'node:fs';
+import { createRequire } from 'node:module';
+import { createServer } from 'node:net';
+import { homedir, platform } from 'node:os';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+// ---------------------------------------------------------------------------
+// Path constants
+// ---------------------------------------------------------------------------
+
+/** Root of the autoforge npm package (one level up from lib/) */
+const PKG_DIR = dirname(dirname(fileURLToPath(import.meta.url)));
+
+/** User config home: ~/.autoforge/ */
+const CONFIG_HOME = join(homedir(), '.autoforge');
+
+/** Virtual-environment directory managed by the CLI */
+const VENV_DIR = join(CONFIG_HOME, 'venv');
+
+/** Composite marker written after a successful pip install */
+const DEPS_MARKER = join(VENV_DIR, '.deps-installed');
+
+/** PID file for the running server */
+const PID_FILE = join(CONFIG_HOME, 'server.pid');
+
+/** Path to the production requirements file inside the package */
+const REQUIREMENTS_FILE = join(PKG_DIR, 'requirements-prod.txt');
+
+/** Path to the .env example shipped with the package */
+const ENV_EXAMPLE = join(PKG_DIR, '.env.example');
+
+/** User .env config file */
+const ENV_FILE = join(CONFIG_HOME, '.env');
+
+const IS_WIN = platform() === 'win32';
+
+// ---------------------------------------------------------------------------
+// Package version (read once at module load via createRequire)
+// ---------------------------------------------------------------------------
+
+const require = createRequire(import.meta.url);
+const { version: VERSION } = require(join(PKG_DIR, 'package.json'));
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+/** Indented console output matching the spec format. */
+function log(msg = '') {
+  console.log(`  ${msg}`);
+}
+
+/** Print a fatal error and exit. */
+function die(msg) {
+  console.error(`\n  Error: ${msg}\n`);
+  process.exit(1);
+}
+
+/**
+ * Parse a Python version string like "Python 3.13.6" and return
+ * { major, minor, patch, raw } or null on failure.
+ */
+function parsePythonVersion(raw) {
+  const m = raw.match(/Python\s+(\d+)\.(\d+)\.(\d+)/);
+  if (!m) return null;
+  return {
+    major: Number(m[1]),
+    minor: Number(m[2]),
+    patch: Number(m[3]),
+    raw: `${m[1]}.${m[2]}.${m[3]}`,
+  };
+}
+
+/**
+ * Try a single Python candidate. Returns { exe, version, tooOld }, or null
+ * if the candidate cannot be executed or its version cannot be parsed.
+ * `candidate` is either a bare name or an array of args (e.g. ['py', '-3']).
+ */
+function tryPythonCandidate(candidate) {
+  const args = Array.isArray(candidate) ? candidate : [candidate];
+  const exe = args[0];
+  const extraArgs = args.slice(1);
+
+  try {
+    const out = execFileSync(exe, [...extraArgs, '--version'], {
+      encoding: 'utf8',
+      timeout: 10_000,
+      stdio: ['pipe', 'pipe', 'pipe'],
+    });
+
+    const ver = parsePythonVersion(out);
+    if (!ver) return null;
+
+    // Require 3.11+
+    if (ver.major < 3 || (ver.major === 3 && ver.minor < 11)) {
+      return { exe: args.join(' '), version: ver, tooOld: true };
+    }
+
+    return { exe: args.join(' '), version: ver, tooOld: false };
+  } catch {
+    return null;
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Python detection
+// ---------------------------------------------------------------------------
+
+/**
+ * Find a suitable Python >= 3.11 interpreter.
+ *
+ * Search order is platform-dependent:
+ *   Windows:     python -> py -3 -> python3
+ *   macOS/Linux: python3 -> python
+ *
+ * The AUTOFORGE_PYTHON env var overrides automatic detection.
+ *
+ * After finding a candidate we also verify that the venv module is
+ * available (Debian/Ubuntu strip it out of the base package).
+ */
+function findPython() {
+  // Allow explicit override via environment variable
+  const override = process.env.AUTOFORGE_PYTHON;
+  if (override) {
+    const result = tryPythonCandidate(override);
+    if (!result) {
+      die(`AUTOFORGE_PYTHON is set to "${override}" but it could not be executed.`);
+    }
+    if (result.tooOld) {
+      die(
+        `Python ${result.version.raw} found (via AUTOFORGE_PYTHON), but 3.11+ required.\n` +
+        '  Install Python 3.11+ from https://python.org'
+      );
+    }
+    return result;
+  }
+
+  // Platform-specific candidate order
+  const candidates = IS_WIN
+    ? ['python', ['py', '-3'], 'python3']
+    : ['python3', 'python'];
+
+  let bestTooOld = null;
+
+  for (const candidate of candidates) {
+    const result = tryPythonCandidate(candidate);
+    if (!result) continue;
+
+    if (result.tooOld) {
+      // Remember the first "too old" result for a better error message
+      if (!bestTooOld) bestTooOld = result;
+      continue;
+    }
+
+    // Verify venv module is available (Debian/Ubuntu may need python3-venv)
+    try {
+      const exeParts = result.exe.split(' ');
+      execFileSync(exeParts[0], [...exeParts.slice(1), '-c', 'import ensurepip'], {
+        encoding: 'utf8',
+        timeout: 10_000,
+        stdio: ['pipe', 'pipe', 'pipe'],
+      });
+    } catch {
+      die(
+        `Python venv module not available.\n` +
+        `  Run: sudo apt install python3.${result.version.minor}-venv`
+      );
+    }
+
+    return result;
+  }
+
+  // Provide the most helpful error message we can
+  if (bestTooOld) {
+    die(
+      `Python ${bestTooOld.version.raw} found, but 3.11+ required.\n` +
+      '  Install Python 3.11+ from https://python.org'
+    );
+  }
+  die(
+    'Python 3.11+ required but not found.\n' +
+    '  Install from https://python.org'
+  );
+}
+
+// ---------------------------------------------------------------------------
+// Venv management
+// ---------------------------------------------------------------------------
+
+/** Return the path to the Python executable inside the venv. */
+function venvPython() {
+  return IS_WIN
+    ? join(VENV_DIR, 'Scripts', 'python.exe')
+    : join(VENV_DIR, 'bin', 'python');
+}
+
+/** SHA-256 hash of the requirements-prod.txt file contents. */
+function requirementsHash() {
+  const content = readFileSync(REQUIREMENTS_FILE, 'utf8');
+  return createHash('sha256').update(content).digest('hex');
+}
+
+/**
+ * Read the composite deps marker. Returns the parsed JSON object
+ * or null if the file is missing / corrupt.
+ */
+function readMarker() {
+  try {
+    return JSON.parse(readFileSync(DEPS_MARKER, 'utf8'));
+  } catch {
+    return null;
+  }
+}
+
+/**
+ * Ensure the virtual environment exists and dependencies are installed.
+ * Returns true if all setup steps were already satisfied (fast path).
+ *
+ * @param {object} python - The result of findPython()
+ * @param {boolean} forceRecreate - If true, delete and recreate the venv
+ */
+function ensureVenv(python, forceRecreate) {
+  mkdirSync(CONFIG_HOME, { recursive: true });
+
+  const marker = readMarker();
+  const reqHash = requirementsHash();
+  const pyExe = venvPython();
+
+  // Determine if the venv itself needs to be (re)created
+  let needsCreate = forceRecreate || !existsSync(pyExe);
+
+  if (!needsCreate && marker) {
+    // Recreate if Python major.minor changed
+    const markerMinor = marker.python_version;
+    const currentMinor = `${python.version.major}.${python.version.minor}`;
+    if (markerMinor && markerMinor !== currentMinor) {
+      needsCreate = true;
+    }
+
+    // Recreate if the recorded python path no longer exists
+    if (marker.python_path && !existsSync(marker.python_path)) {
+      needsCreate = true;
+    }
+  }
+
+  let depsUpToDate = false;
+  if (!needsCreate && marker && marker.requirements_hash === reqHash) {
+    depsUpToDate = true;
+  }
+
+  // Fast path: nothing to do
+  if (!needsCreate && depsUpToDate) {
+    return true;
+  }
+
+  // --- Slow path: show setup progress ---
+
+  log('[2/3] Setting up environment...');
+
+  if (needsCreate) {
+    if (existsSync(VENV_DIR)) {
+      log('      Removing old virtual environment...');
+      rmSync(VENV_DIR, { recursive: true, force: true });
+    }
+
+    log(`      Creating virtual environment at ~/.autoforge/venv/`);
+    const exeParts = python.exe.split(' ');
+    try {
+      execFileSync(exeParts[0], [...exeParts.slice(1), '-m', 'venv', VENV_DIR], {
+        encoding: 'utf8',
+        timeout: 120_000,
+        stdio: ['pipe', 'pipe', 'pipe'],
+      });
+    } catch (err) {
+      die(`Failed to create virtual environment: ${err.message}`);
+    }
+  }
+
+  // Install / update dependencies
+  log('      Installing dependencies...');
+  try {
+    execFileSync(pyExe, ['-m', 'pip', 'install', '-q', '--upgrade', 'pip'], {
+      encoding: 'utf8',
+      timeout: 300_000,
+      stdio: ['pipe', 'pipe', 'pipe'],
+    });
+
+    execFileSync(pyExe, ['-m', 'pip', 'install', '-q', '-r', REQUIREMENTS_FILE], {
+      encoding: 'utf8',
+      timeout: 600_000,
+      stdio: ['pipe', 'pipe', 'pipe'],
+    });
+  } catch (err) {
+    die(`Failed to install dependencies: ${err.message}`);
+  }
+
+  // Write marker only after pip succeeds to prevent partial state
+  const markerData = {
+    requirements_hash: reqHash,
+    python_version: `${python.version.major}.${python.version.minor}`,
+    python_path: pyExe,
+    created_at: new Date().toISOString(),
+  };
+  writeFileSync(DEPS_MARKER, JSON.stringify(markerData, null, 2), 'utf8');
+
+  log('      Done');
+  return false;
+}
+
+// ---------------------------------------------------------------------------
+// Config (.env) management
+// ---------------------------------------------------------------------------
+
+/**
+ * Parse a .env file into a plain object.
+ * Handles comments, blank lines, and quoted values.
+ */
+function parseEnvFile(filePath) {
+  const env = {};
+  if (!existsSync(filePath)) return env;
+
+  const lines = readFileSync(filePath, 'utf8').split('\n');
+  for (const line of lines) {
+    const trimmed = line.trim();
+    if (!trimmed || trimmed.startsWith('#')) continue;
+
+    const eqIdx = trimmed.indexOf('=');
+    if (eqIdx === -1) continue;
+
+    const key = trimmed.slice(0, eqIdx).trim();
+    let value = trimmed.slice(eqIdx + 1).trim();
+
+    // Strip matching quotes (single or double)
+    if (
+      (value.startsWith('"') && value.endsWith('"')) ||
+      (value.startsWith("'") && value.endsWith("'"))
+    ) {
+      value = value.slice(1, -1);
+    }
+
+    if (key) {
+      env[key] = value;
+    }
+  }
+  return env;
+}
+
+/**
+ * Ensure ~/.autoforge/.env exists. On first run, copy .env.example
+ * from the package directory and print a notice.
+ *
+ * Returns true if the file was newly created.
+ */
+function ensureEnvFile() {
+  if (existsSync(ENV_FILE)) return false;
+
+  mkdirSync(CONFIG_HOME, { recursive: true });
+
+  if (existsSync(ENV_EXAMPLE)) {
+    copyFileSync(ENV_EXAMPLE, ENV_FILE);
+  } else {
+    // Fallback: create a minimal placeholder
+    writeFileSync(ENV_FILE, '# AutoForge configuration\n# See documentation for available options.\n', 'utf8');
+  }
+  return true;
+}
+
+// ---------------------------------------------------------------------------
+// Port detection
+// ---------------------------------------------------------------------------
+
+/**
+ * Find an available TCP port starting from `start`.
+ *
+ * Each candidate is probed by binding a socket in a short-lived child
+ * process. That keeps this function synchronous while still testing a
+ * real bind (the most reliable cross-platform approach), and it avoids
+ * leaving un-awaited promises or half-open servers in this process.
+ */
+function findAvailablePort(start = 8888, maxAttempts = 20) {
+  // The child exits 0 if it could bind the port and 1 on a listen error;
+  // execFileSync throws on a non-zero exit or timeout, so we try the next port.
+  for (let port = start; port < start + maxAttempts; port++) {
+    try {
+      execFileSync(process.execPath, [
+        '-e',
+        `const s=require("net").createServer();` +
+        `s.listen(${port},"127.0.0.1",()=>{s.close();process.exit(0)});` +
+        `s.on("error",()=>process.exit(1))`,
+      ], { timeout: 3000, stdio: 'pipe' });
+      return port;
+    } catch {
+      continue;
+    }
+  }
+  die(`No available ports found in range ${start}-${start + maxAttempts - 1}`);
+}
+
+// ---------------------------------------------------------------------------
+// PID file management
+// ---------------------------------------------------------------------------
+
+/** Read PID from the PID file. Returns the PID number or null. */
+function readPid() {
+  try {
+    const content = readFileSync(PID_FILE, 'utf8').trim();
+    const pid = Number(content);
+    return Number.isFinite(pid) && pid > 0 ? pid : null;
+  } catch {
+    return null;
+  }
+}
+
+/** Check whether a process with the given PID is still running. */
+function isProcessAlive(pid) {
+  try {
+    process.kill(pid, 0); // signal 0 = existence check
+    return true;
+  } catch {
+    return false;
+  }
+}
+
+/** Write the PID file. */
+function writePid(pid) {
+  mkdirSync(CONFIG_HOME, { recursive: true });
+  writeFileSync(PID_FILE, String(pid), 'utf8');
+}
+
+/** Remove the PID file. */
+function removePid() {
+  try {
+    unlinkSync(PID_FILE);
+  } catch {
+    // Ignore -- file may already be gone
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Browser opening
+// ---------------------------------------------------------------------------
+
+/** Open a URL in the user's default browser (best-effort). */
+function openBrowser(url) {
+  try {
+    if (IS_WIN) {
+      // "start" is a cmd built-in; the empty title string avoids
+      // issues when the URL contains special characters.
+      execSync(`start "" "${url}"`, { stdio: 'ignore' });
+    } else if (platform() === 'darwin') {
+      execFileSync('open', [url], { stdio: 'ignore' });
+    } else {
+      // Linux: only attempt if a display server is available and
+      // we are not in an SSH session.
+      const hasDisplay = process.env.DISPLAY || process.env.WAYLAND_DISPLAY;
+      const isSSH = !!process.env.SSH_TTY;
+      if (hasDisplay && !isSSH) {
+        execFileSync('xdg-open', [url], { stdio: 'ignore' });
+      }
+    }
+  } catch {
+    // Non-fatal: user can open the URL manually
+  }
+}
+
+/** Detect headless / CI environments where opening a browser is pointless. */
+function isHeadless() {
+  if (process.env.CI) return true;
+  if (process.env.CODESPACES) return true;
+  if (process.env.SSH_TTY) return true;
+  // Linux without a display server
+  if (!IS_WIN && platform() !== 'darwin' && !process.env.DISPLAY && !process.env.WAYLAND_DISPLAY) {
+    return true;
+  }
+  return false;
+}
+
+// ---------------------------------------------------------------------------
+// Process cleanup
+// ---------------------------------------------------------------------------
+
+/** Kill a process. On Windows, taskkill /t /f ends the whole tree; elsewhere only the given PID gets SIGTERM. */
+function killProcess(pid) {
+  try {
+    if (IS_WIN) {
+      execSync(`taskkill /pid ${pid} /t /f`, { stdio: 'ignore' });
+    } else {
+      process.kill(pid, 'SIGTERM');
+    }
+  } catch {
+    // Process may already be gone
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Playwright CLI
+// ---------------------------------------------------------------------------
+
+/**
+ * Ensure playwright-cli is available globally for browser automation.
+ * Returns true if available (already installed or freshly installed).
+ *
+ * @param {boolean} showProgress - If true, print install progress
+ */
+function ensurePlaywrightCli(showProgress) {
+  try {
+    execSync('playwright-cli --version', {
+      timeout: 10_000,
+      stdio: ['pipe', 'pipe', 'pipe'],
+    });
+    return true;
+  } catch {
+    // Not installed — try to install
+  }
+
+  if (showProgress) {
+    log('      Installing playwright-cli for browser automation...');
+  }
+  try {
+    execSync('npm install -g @playwright/cli', {
+      timeout: 120_000,
+      stdio: ['pipe', 'pipe', 'pipe'],
+    });
+    return true;
+  } catch {
+    return false;
+  }
+}
+
+// ---------------------------------------------------------------------------
+// CLI commands
+// ---------------------------------------------------------------------------
+
+function printVersion() {
+  console.log(`autoforge v${VERSION}`);
+}
+
+function printHelp() {
+  console.log(`
+  AutoForge v${VERSION}
+  Autonomous coding agent with web UI
+
+  Usage:
+    autoforge                       Start the server (default)
+    autoforge config                Open ~/.autoforge/.env in $EDITOR
+    autoforge config --path         Print config file path
+    autoforge config --show         Show effective configuration
+
+  Options:
+    --port PORT                     Custom port (default: auto from 8888)
+    --host HOST                     Custom host (default: 127.0.0.1)
+    --no-browser                    Don't auto-open browser
+    --repair                        Delete and recreate virtual environment
+    --dev                           Development mode (requires cloned repo)
+    --version                       Print version
+    --help                          Show this help
+`);
+}
+
+function handleConfig(args) {
+  ensureEnvFile();
+
+  if (args.includes('--path')) {
+    console.log(ENV_FILE);
+    return;
+  }
+
+  if (args.includes('--show')) {
+    if (!existsSync(ENV_FILE)) {
+      log('No configuration file found.');
+      return;
+    }
+    const lines = readFileSync(ENV_FILE, 'utf8').split('\n');
+    const active = lines.filter(l => {
+      const t = l.trim();
+      return t && !t.startsWith('#');
+    });
+    if (active.length === 0) {
+      log('No active configuration. All lines are commented out.');
+      log(`Edit: ${ENV_FILE}`);
+    } else {
+      for (const line of active) {
+        console.log(line);
+      }
+    }
+    return;
+  }
+
+  // Open in editor
+  const editor = process.env.EDITOR || process.env.VISUAL || (IS_WIN ? 'notepad' : 'vi');
+  try {
+    execFileSync(editor, [ENV_FILE], { stdio: 'inherit' });
+  } catch {
+    log(`Could not open editor "${editor}".`);
+    log(`Edit the file manually: ${ENV_FILE}`);
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Main server start
+// ---------------------------------------------------------------------------
+
+function startServer(opts) {
+  const { port: requestedPort, host, noBrowser, repair } = opts;
+
+  // Step 1: Find Python
+  const fastPath = !repair && existsSync(venvPython()) && readMarker()?.requirements_hash === requirementsHash();
+
+  // On the fast path the venv is already usable, so the progress output is
+  // skipped. findPython() still runs because ensureVenv() needs its result
+  // to detect an interpreter change or to repair the venv.
+  if (!fastPath) {
+    log(`[1/3] Checking Python...`);
+  }
+  const python = findPython();
+  if (!fastPath) {
+    log(`      Found Python ${python.version.raw} at ${python.exe}`);
+  }
+
+  // Step 2: Ensure venv and deps
+  const wasAlreadyReady = ensureVenv(python, repair);
+
+  // Ensure playwright-cli for browser automation (quick check, installs once)
+  if (!ensurePlaywrightCli(!wasAlreadyReady)) {
+    log('');
+    log('  Note: playwright-cli not available (browser automation will be limited)');
+    log('  Install manually: npm install -g @playwright/cli');
+    log('');
+  }
+
+  // Step 3: Config file
+  const configCreated = ensureEnvFile();
+
+  // Load .env into process.env for the spawned server
+  const dotenvVars = parseEnvFile(ENV_FILE);
+
+  // Determine port
+  const port = requestedPort || findAvailablePort();
+
+  // Check for already-running instance
+  const existingPid = readPid();
+  if (existingPid && isProcessAlive(existingPid)) {
+    log(`AutoForge is already running at http://${host}:${port}`);
+    if (!noBrowser && !isHeadless()) {
+      log('Opening browser...');
+      openBrowser(`http://${host}:${port}`);
+    }
+    return;
+  }
+
+  // Clean up stale PID file
+  if (existingPid) {
+    removePid();
+  }
+
+  // Show server startup step only on slow path
+  if (!wasAlreadyReady) {
+    log('[3/3] Starting server...');
+  }
+
+  if (configCreated) {
+    log(`      Created config file: ${ENV_FILE}`);
+    log('      Edit this file to configure API providers (Ollama, Vertex AI, z.ai)');
+    log('');
+  }
+
+  // Security warning for non-localhost host
+  if (host !== '127.0.0.1') {
+    console.log('');
+    console.log('  !! SECURITY WARNING !!');
+    console.log(`  Remote access enabled on host: ${host}`);
+    console.log('  The AutoForge UI will be accessible from other machines.');
+    console.log('  Ensure you understand the security implications.');
+    console.log('');
+  }
+
+  // Build environment for uvicorn
+  const serverEnv = { ...process.env, ...dotenvVars, PYTHONPATH: PKG_DIR };
+
+  // Enable remote access flag for the FastAPI server
+  if (host !== '127.0.0.1') {
+    serverEnv.AUTOFORGE_ALLOW_REMOTE = '1';
+  }
+
+  // Spawn uvicorn
+  const pyExe = venvPython();
+  const child = spawn(
+    pyExe,
+    [
+      '-m', 'uvicorn',
+      'server.main:app',
+      '--host', host,
+      '--port', String(port),
+    ],
+    {
+      cwd: PKG_DIR,
+      env: serverEnv,
+      stdio: 'inherit',
+    }
+  );
+
+  writePid(child.pid);
+
+  const url = `http://${host}:${port}`;
+  // Open browser after a short delay to let the server start
+  if (!noBrowser && !isHeadless()) {
+    setTimeout(() => openBrowser(url), 2000);
+  }
+
+  console.log('');
+  log(`Server running at ${url}`);
+  log('Press Ctrl+C to stop');
+
+  // Graceful shutdown handlers
+  const cleanup = () => {
+    killProcess(child.pid);
+    removePid();
+  };
+
+  process.on('SIGINT', () => {
+    console.log('');
+    cleanup();
+    process.exit(0);
+  });
+
+  process.on('SIGTERM', () => {
+    cleanup();
+    process.exit(0);
+  });
+
+  // If the child exits on its own, clean up and propagate the exit code
+  child.on('exit', (code) => {
+    removePid();
+    process.exit(code ?? 1);
+  });
+}
+
+// ---------------------------------------------------------------------------
+// Entry point
+// ---------------------------------------------------------------------------
+
+/**
+ * Main CLI entry point.
+ *
+ * @param {string[]} args - Command-line arguments (process.argv.slice(2))
+ */
+export function run(args) {
+  // --version / -v
+  if (args.includes('--version') || args.includes('-v')) {
+    printVersion();
+    return;
+  }
+
+  // --help / -h
+  if (args.includes('--help') || args.includes('-h')) {
+    printHelp();
+    return;
+  }
+
+  // --dev guard: this only works from a cloned repository
+  if (args.includes('--dev')) {
+    die(
+      'Dev mode requires a cloned repository.\n' +
+      '  Clone from https://github.com/paperlinguist/autocoder and run start_ui.sh'
+    );
+    return;
+  }
+
+  // "config" subcommand
+  if (args[0] === 'config') {
+    handleConfig(args.slice(1));
+    return;
+  }
+
+  // Parse flags for server start
+  const host = getFlagValue(args, '--host') || '127.0.0.1';
+  const portStr = getFlagValue(args, '--port');
+  const port = portStr ? Number(portStr) : null;
+  const noBrowser = args.includes('--no-browser');
+  const repair = args.includes('--repair');
+
+  if (port !== null && (!Number.isInteger(port) || port < 1 || port > 65535)) {
+    die('Invalid port number. Must be an integer between 1 and 65535.');
+  }
+
+  // Print banner
+  console.log('');
+  log(`AutoForge v${VERSION}`);
+  console.log('');
+
+  startServer({ port, host, noBrowser, repair });
+}
+
+// ---------------------------------------------------------------------------
+// Argument parsing helpers
+// ---------------------------------------------------------------------------
+
+/**
+ * Extract the value following a flag from the args array.
+ * E.g. getFlagValue(['--port', '9000', '--host', '0.0.0.0'], '--port') => '9000'
+ */
+function getFlagValue(args, flag) {
+  const idx = args.indexOf(flag);
+  if (idx === -1 || idx + 1 >= args.length) return null;
+  return args[idx + 1];
+}
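
The flag lookup and port check above can be tried in isolation. The following is a minimal sketch, not the CLI's actual code path: `getFlagValue` is copied from the diff, while `parsePort` is a hypothetical helper introduced only for illustration. It assumes a port must be a whole number, so it uses `Number.isInteger` and rejects values such as `80.5`.

```javascript
// Copied from the diff above: returns the token after `flag`, or null.
function getFlagValue(args, flag) {
  const idx = args.indexOf(flag);
  if (idx === -1 || idx + 1 >= args.length) return null;
  return args[idx + 1];
}

// Hypothetical helper (not part of the CLI): parse and validate --port.
// Returns { port, error } instead of exiting, so it is easy to test.
function parsePort(args) {
  const raw = getFlagValue(args, '--port');
  if (raw === null) return { port: null, error: null }; // flag absent or has no value
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    return { port: null, error: `Invalid port "${raw}"` };
  }
  return { port, error: null };
}

console.log(parsePort(['--port', '9000', '--host', '0.0.0.0']));
console.log(parsePort(['--port', '80.5']));
console.log(parsePort(['--port', '--host']));
console.log(parsePort(['--port']));
```

Note that `getFlagValue` takes whatever token follows the flag, so `--port --host` yields the string `--host`, which the numeric check then rejects. A trailing `--port` with no value is treated the same as an absent flag.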
--- a/mcp_server/feature_mcp.py
+++ b/mcp_server/feature_mcp.py
@@ -30,18 +30,18 @@ orchestrator, not by agents. Agents receive pre-assigned feature IDs.
 import json
 import os
 import sys
-import threading
 from contextlib import asynccontextmanager
 from pathlib import Path
 from typing import Annotated

 from mcp.server.fastmcp import FastMCP
 from pydantic import BaseModel, Field
+from sqlalchemy import text

 # Add parent directory to path so we can import from api module
 sys.path.insert(0, str(Path(__file__).parent.parent))

-from api.database import Feature, create_database
+from api.database import Feature, atomic_transaction, create_database
 from api.dependency_resolver import (
    MAX_DEPENDENCIES_PER_FEATURE,
    compute_scheduling_scores,
@@ -96,8 +96,9 @@ class BulkCreateInput(BaseModel):
 _session_maker = None
 _engine = None

-# Lock for priority assignment to prevent race conditions
-_priority_lock = threading.Lock()
+# NOTE: The old threading.Lock() was removed because it only worked per-process,
+# not cross-process. In parallel mode, multiple MCP servers run in separate
+# processes, so the lock was useless. We now use atomic SQL operations instead.


@asynccontextmanager
@@ -150,17 +151,20 @@ def feature_get_stats() -> str:
        result = session.query(
            func.count(Feature.id).label('total'),
            func.sum(case((Feature.passes == True, 1), else_=0)).label('passing'),
-            func.sum(case((Feature.in_progress == True, 1), else_=0)).label('in_progress')
+            func.sum(case((Feature.in_progress == True, 1), else_=0)).label('in_progress'),
+            func.sum(case((Feature.needs_human_input == True, 1), else_=0)).label('needs_human_input')
        ).first()

        total = result.total or 0
        passing = int(result.passing or 0)
        in_progress = int(result.in_progress or 0)
+        needs_human_input = int(result.needs_human_input or 0)
        percentage = round((passing / total) * 100, 1) if total > 0 else 0.0

        return json.dumps({
            "passing": passing,
            "in_progress": in_progress,
+            "needs_human_input": needs_human_input,
            "total": total,
            "percentage": percentage
        })
@@ -220,6 +224,7 @@ def feature_get_summary(
            "name": feature.name,
            "passes": feature.passes,
            "in_progress": feature.in_progress,
+            "needs_human_input": feature.needs_human_input if feature.needs_human_input is not None else False,
            "dependencies": feature.dependencies or []
        })
    finally:
@@ -243,15 +248,25 @@ def feature_mark_passing(
    """
    session = get_session()
    try:
-        feature = session.query(Feature).filter(Feature.id == feature_id).first()
-
-        if feature is None:
-            return json.dumps({"error": f"Feature with ID {feature_id} not found"})
-
-        feature.passes = True
-        feature.in_progress = False
+        # Atomic update with state guard - prevents double-pass in parallel mode
+        result = session.execute(text("""
+            UPDATE features
+            SET passes = 1, in_progress = 0
+            WHERE id = :id AND passes = 0
+        """), {"id": feature_id})
        session.commit()

+        if result.rowcount == 0:
+            # Check why the update didn't match
+            feature = session.query(Feature).filter(Feature.id == feature_id).first()
+            if feature is None:
+                return json.dumps({"error": f"Feature with ID {feature_id} not found"})
+            if feature.passes:
+                return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
+            return json.dumps({"error": "Failed to mark feature passing for unknown reason"})
+
+        # Get the feature name for the response
+        feature = session.query(Feature).filter(Feature.id == feature_id).first()
        return json.dumps({"success": True, "feature_id": feature_id, "name": feature.name})
    except Exception as e:
        session.rollback()
@@ -284,14 +299,20 @@ def feature_mark_failing(
    """
    session = get_session()
    try:
+        # Check if feature exists first
        feature = session.query(Feature).filter(Feature.id == feature_id).first()
-
        if feature is None:
            return json.dumps({"error": f"Feature with ID {feature_id} not found"})

-        feature.passes = False
-        feature.in_progress = False
+        # Atomic update for parallel safety
+        session.execute(text("""
+            UPDATE features
+            SET passes = 0, in_progress = 0
+            WHERE id = :id
+        """), {"id": feature_id})
        session.commit()
+
+        # Refresh to get updated state
        session.refresh(feature)

        return json.dumps({
@@ -337,25 +358,28 @@ def feature_skip(
            return json.dumps({"error": "Cannot skip a feature that is already passing"})

        old_priority = feature.priority
+        name = feature.name

-        # Use lock to prevent race condition in priority assignment
-        with _priority_lock:
-            # Get max priority and set this feature to max + 1
-            max_priority_result = session.query(Feature.priority).order_by(Feature.priority.desc()).first()
-            new_priority = (max_priority_result[0] + 1) if max_priority_result else 1
-
-            feature.priority = new_priority
-            feature.in_progress = False
-            session.commit()
+        # Atomic update: set priority to max+1 in a single statement
+        # This prevents race conditions where two features get the same priority
+        session.execute(text("""
+            UPDATE features
+            SET priority = (SELECT COALESCE(MAX(priority), 0) + 1 FROM features),
+                in_progress = 0
+            WHERE id = :id
+        """), {"id": feature_id})
+        session.commit()

+        # Refresh to get new priority
        session.refresh(feature)
+        new_priority = feature.priority

        return json.dumps({
-            "id": feature.id,
-            "name": feature.name,
+            "id": feature_id,
+            "name": name,
            "old_priority": old_priority,
            "new_priority": new_priority,
-            "message": f"Feature '{feature.name}' moved to end of queue"
+            "message": f"Feature '{name}' moved to end of queue"
        })
    except Exception as e:
        session.rollback()
@@ -381,21 +405,29 @@ def feature_mark_in_progress(
    """
    session = get_session()
    try:
-        feature = session.query(Feature).filter(Feature.id == feature_id).first()
-
-        if feature is None:
-            return json.dumps({"error": f"Feature with ID {feature_id} not found"})
-
-        if feature.passes:
-            return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
-
-        if feature.in_progress:
-            return json.dumps({"error": f"Feature with ID {feature_id} is already in-progress"})
-
-        feature.in_progress = True
+        # Atomic claim: only succeeds if feature is not already claimed, passing, or blocked for human input
+        result = session.execute(text("""
+            UPDATE features
+            SET in_progress = 1
+            WHERE id = :id AND passes = 0 AND in_progress = 0 AND needs_human_input = 0
+        """), {"id": feature_id})
        session.commit()
-        session.refresh(feature)

+        if result.rowcount == 0:
+            # Check why the claim failed
+            feature = session.query(Feature).filter(Feature.id == feature_id).first()
+            if feature is None:
+                return json.dumps({"error": f"Feature with ID {feature_id} not found"})
+            if feature.passes:
+                return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
+            if feature.in_progress:
+                return json.dumps({"error": f"Feature with ID {feature_id} is already in-progress"})
+            if getattr(feature, 'needs_human_input', False):
+                return json.dumps({"error": f"Feature with ID {feature_id} is blocked waiting for human input"})
+            return json.dumps({"error": "Failed to mark feature in-progress for unknown reason"})
+
+        # Fetch the claimed feature
+        feature = session.query(Feature).filter(Feature.id == feature_id).first()
        return json.dumps(feature.to_dict())
    except Exception as e:
        session.rollback()
@@ -421,24 +453,38 @@ def feature_claim_and_get(
    """
    session = get_session()
    try:
+        # First check if feature exists
        feature = session.query(Feature).filter(Feature.id == feature_id).first()
-
        if feature is None:
            return json.dumps({"error": f"Feature with ID {feature_id} not found"})

        if feature.passes:
            return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})

-        # Idempotent: if already in-progress, just return details
-        already_claimed = feature.in_progress
-        if not already_claimed:
-            feature.in_progress = True
-            session.commit()
-            session.refresh(feature)
+        if getattr(feature, 'needs_human_input', False):
+            return json.dumps({"error": f"Feature with ID {feature_id} is blocked waiting for human input"})

-        result = feature.to_dict()
-        result["already_claimed"] = already_claimed
-        return json.dumps(result)
+        # Try atomic claim: only succeeds if not already claimed and not blocked for human input
+        result = session.execute(text("""
+            UPDATE features
+            SET in_progress = 1
+            WHERE id = :id AND passes = 0 AND in_progress = 0 AND needs_human_input = 0
+        """), {"id": feature_id})
+        session.commit()
+
+        # Determine if we claimed it or it was already claimed
+        already_claimed = result.rowcount == 0
+        if already_claimed:
+            # Verify it's in_progress (not some other failure condition)
+            session.refresh(feature)
+            if not feature.in_progress:
+                return json.dumps({"error": f"Failed to claim feature {feature_id} for unknown reason"})
+
+        # Refresh to get current state
+        session.refresh(feature)
+        result_dict = feature.to_dict()
+        result_dict["already_claimed"] = already_claimed
+        return json.dumps(result_dict)
    except Exception as e:
        session.rollback()
        return json.dumps({"error": f"Failed to claim feature: {str(e)}"})
@@ -463,15 +509,20 @@ def feature_clear_in_progress(
    """
    session = get_session()
    try:
+        # Check if feature exists
        feature = session.query(Feature).filter(Feature.id == feature_id).first()
-
        if feature is None:
            return json.dumps({"error": f"Feature with ID {feature_id} not found"})

-        feature.in_progress = False
+        # Atomic update - idempotent, safe in parallel mode
+        session.execute(text("""
+            UPDATE features
+            SET in_progress = 0
+            WHERE id = :id
+        """), {"id": feature_id})
        session.commit()
-        session.refresh(feature)

+        session.refresh(feature)
        return json.dumps(feature.to_dict())
    except Exception as e:
        session.rollback()
@@ -506,13 +557,14 @@ def feature_create_bulk(
    Returns:
        JSON with: created (int) - number of features created, with_dependencies (int)
    """
-    session = get_session()
    try:
-        # Use lock to prevent race condition in priority assignment
-        with _priority_lock:
-            # Get the starting priority
-            max_priority_result = session.query(Feature.priority).order_by(Feature.priority.desc()).first()
-            start_priority = (max_priority_result[0] + 1) if max_priority_result else 1
+        # Use atomic transaction for bulk inserts to prevent priority conflicts
+        with atomic_transaction(_session_maker) as session:
+            # Get the starting priority atomically within the transaction
+            result = session.execute(text("""
+                SELECT COALESCE(MAX(priority), 0) FROM features
+            """)).fetchone()
+            start_priority = (result[0] or 0) + 1

            # First pass: validate all features and their index-based dependencies
            for i, feature_data in enumerate(features):
@@ -546,7 +598,7 @@ def feature_create_bulk(
                                "error": f"Feature at index {i} cannot depend on feature at index {idx} (forward reference not allowed)"
                            })

-            # Second pass: create all features
+            # Second pass: create all features with reserved priorities
            created_features: list[Feature] = []
            for i, feature_data in enumerate(features):
                db_feature = Feature(
@@ -571,20 +623,16 @@ def feature_create_bulk(
                if indices:
                    # Convert indices to actual feature IDs
                    dep_ids = [created_features[idx].id for idx in indices]
-                    created_features[i].dependencies = sorted(dep_ids)
+                    created_features[i].dependencies = sorted(dep_ids)  # type: ignore[assignment]  # SQLAlchemy JSON Column accepts list at runtime
                    deps_count += 1

-            session.commit()
-
-        return json.dumps({
-            "created": len(created_features),
-            "with_dependencies": deps_count
-        })
+            # Commit happens automatically on context manager exit
+            return json.dumps({
+                "created": len(created_features),
+                "with_dependencies": deps_count
+            })
    except Exception as e:
-        session.rollback()
        return json.dumps({"error": str(e)})
-    finally:
-        session.close()


@mcp.tool()
@@ -608,13 +656,14 @@ def feature_create(
    Returns:
        JSON with the created feature details including its ID
    """
-    session = get_session()
    try:
-        # Use lock to prevent race condition in priority assignment
-        with _priority_lock:
-            # Get the next priority
-            max_priority_result = session.query(Feature.priority).order_by(Feature.priority.desc()).first()
-            next_priority = (max_priority_result[0] + 1) if max_priority_result else 1
+        # Use atomic transaction to prevent priority collisions
+        with atomic_transaction(_session_maker) as session:
+            # Get the next priority atomically within the transaction
+            result = session.execute(text("""
+                SELECT COALESCE(MAX(priority), 0) + 1 FROM features
+            """)).fetchone()
+            next_priority = result[0]

            db_feature = Feature(
                priority=next_priority,
@@ -626,20 +675,18 @@ def feature_create(
                in_progress=False,
            )
            session.add(db_feature)
-            session.commit()
+            session.flush()  # Get the ID

-        session.refresh(db_feature)
+            feature_dict = db_feature.to_dict()
+            # Commit happens automatically on context manager exit

        return json.dumps({
            "success": True,
            "message": f"Created feature: {name}",
-            "feature": db_feature.to_dict()
+            "feature": feature_dict
        })
    except Exception as e:
-        session.rollback()
        return json.dumps({"error": str(e)})
-    finally:
-        session.close()


@mcp.tool()
@@ -659,52 +706,49 @@ def feature_add_dependency(
    Returns:
        JSON with success status and updated dependencies list, or error message
    """
-    session = get_session()
    try:
-        # Security: Self-reference check
+        # Security: Self-reference check (can do before transaction)
        if feature_id == dependency_id:
            return json.dumps({"error": "A feature cannot depend on itself"})

-        feature = session.query(Feature).filter(Feature.id == feature_id).first()
-        dependency = session.query(Feature).filter(Feature.id == dependency_id).first()
+        # Use atomic transaction for consistent cycle detection
+        with atomic_transaction(_session_maker) as session:
+            feature = session.query(Feature).filter(Feature.id == feature_id).first()
+            dependency = session.query(Feature).filter(Feature.id == dependency_id).first()

-        if not feature:
-            return json.dumps({"error": f"Feature {feature_id} not found"})
-        if not dependency:
-            return json.dumps({"error": f"Dependency feature {dependency_id} not found"})
+            if not feature:
+                return json.dumps({"error": f"Feature {feature_id} not found"})
+            if not dependency:
+                return json.dumps({"error": f"Dependency feature {dependency_id} not found"})

-        current_deps = feature.dependencies or []
+            current_deps = feature.dependencies or []

-        # Security: Max dependencies limit
-        if len(current_deps) >= MAX_DEPENDENCIES_PER_FEATURE:
-            return json.dumps({"error": f"Maximum {MAX_DEPENDENCIES_PER_FEATURE} dependencies allowed per feature"})
+            # Security: Max dependencies limit
+            if len(current_deps) >= MAX_DEPENDENCIES_PER_FEATURE:
+                return json.dumps({"error": f"Maximum {MAX_DEPENDENCIES_PER_FEATURE} dependencies allowed per feature"})

-        # Check if already exists
-        if dependency_id in current_deps:
-            return json.dumps({"error": "Dependency already exists"})
+            # Check if already exists
+            if dependency_id in current_deps:
+                return json.dumps({"error": "Dependency already exists"})

-        # Security: Circular dependency check
-        # would_create_circular_dependency(features, source_id, target_id)
-        # source_id = feature gaining the dependency, target_id = feature being depended upon
-        all_features = [f.to_dict() for f in session.query(Feature).all()]
-        if would_create_circular_dependency(all_features, feature_id, dependency_id):
-            return json.dumps({"error": "Cannot add: would create circular dependency"})
+            # Security: Circular dependency check
+            # Within IMMEDIATE transaction, snapshot is protected by write lock
+            all_features = [f.to_dict() for f in session.query(Feature).all()]
+            if would_create_circular_dependency(all_features, feature_id, dependency_id):
+                return json.dumps({"error": "Cannot add: would create circular dependency"})

-        # Add dependency
-        current_deps.append(dependency_id)
-        feature.dependencies = sorted(current_deps)
-        session.commit()
+            # Add dependency atomically
+            new_deps = sorted(current_deps + [dependency_id])
+            feature.dependencies = new_deps
+            # Commit happens automatically on context manager exit

-        return json.dumps({
-            "success": True,
-            "feature_id": feature_id,
-            "dependencies": feature.dependencies
-        })
+            return json.dumps({
+                "success": True,
+                "feature_id": feature_id,
+                "dependencies": new_deps
+            })
    except Exception as e:
-        session.rollback()
        return json.dumps({"error": f"Failed to add dependency: {str(e)}"})
-    finally:
-        session.close()


@mcp.tool()
@@ -721,30 +765,29 @@ def feature_remove_dependency(
    Returns:
        JSON with success status and updated dependencies list, or error message
    """
-    session = get_session()
    try:
-        feature = session.query(Feature).filter(Feature.id == feature_id).first()
-        if not feature:
-            return json.dumps({"error": f"Feature {feature_id} not found"})
+        # Use atomic transaction for consistent read-modify-write
+        with atomic_transaction(_session_maker) as session:
+            feature = session.query(Feature).filter(Feature.id == feature_id).first()
+            if not feature:
+                return json.dumps({"error": f"Feature {feature_id} not found"})

-        current_deps = feature.dependencies or []
-        if dependency_id not in current_deps:
-            return json.dumps({"error": "Dependency does not exist"})
+            current_deps = feature.dependencies or []
+            if dependency_id not in current_deps:
+                return json.dumps({"error": "Dependency does not exist"})

-        current_deps.remove(dependency_id)
-        feature.dependencies = current_deps if current_deps else None
-        session.commit()
+            # Remove dependency atomically
+            new_deps = [d for d in current_deps if d != dependency_id]
+            feature.dependencies = new_deps if new_deps else None
+            # Commit happens automatically on context manager exit

-        return json.dumps({
-            "success": True,
-            "feature_id": feature_id,
-            "dependencies": feature.dependencies or []
-        })
+            return json.dumps({
+                "success": True,
+                "feature_id": feature_id,
+                "dependencies": new_deps
+            })
    except Exception as e:
-        session.rollback()
        return json.dumps({"error": f"Failed to remove dependency: {str(e)}"})
-    finally:
-        session.close()


@mcp.tool()
@@ -772,6 +815,8 @@ def feature_get_ready(
        for f in all_features:
            if f.passes or f.in_progress:
                continue
+            if getattr(f, 'needs_human_input', False):
+                continue
            deps = f.dependencies or []
            if all(dep_id in passing_ids for dep_id in deps):
                ready.append(f.to_dict())
@@ -854,6 +899,8 @@ def feature_get_graph() -> str:

            if f.passes:
                status = "done"
+            elif getattr(f, 'needs_human_input', False):
+                status = "needs_human_input"
            elif blocking:
                status = "blocked"
            elif f.in_progress:
@@ -897,9 +944,8 @@ def feature_set_dependencies(
    Returns:
        JSON with success status and updated dependencies list, or error message
    """
-    session = get_session()
    try:
-        # Security: Self-reference check
+        # Security: Self-reference check (can do before transaction)
        if feature_id in dependency_ids:
            return json.dumps({"error": "A feature cannot depend on itself"})

@@ -911,46 +957,172 @@ def feature_set_dependencies(
        if len(dependency_ids) != len(set(dependency_ids)):
            return json.dumps({"error": "Duplicate dependencies not allowed"})

-        feature = session.query(Feature).filter(Feature.id == feature_id).first()
-        if not feature:
-            return json.dumps({"error": f"Feature {feature_id} not found"})
+        # Use atomic transaction for consistent cycle detection
+        with atomic_transaction(_session_maker) as session:
+            feature = session.query(Feature).filter(Feature.id == feature_id).first()
+            if not feature:
+                return json.dumps({"error": f"Feature {feature_id} not found"})

-        # Validate all dependencies exist
-        all_feature_ids = {f.id for f in session.query(Feature).all()}
-        missing = [d for d in dependency_ids if d not in all_feature_ids]
-        if missing:
-            return json.dumps({"error": f"Dependencies not found: {missing}"})
+            # Validate all dependencies exist
+            all_feature_ids = {f.id for f in session.query(Feature).all()}
+            missing = [d for d in dependency_ids if d not in all_feature_ids]
+            if missing:
+                return json.dumps({"error": f"Dependencies not found: {missing}"})

-        # Check for circular dependencies
-        all_features = [f.to_dict() for f in session.query(Feature).all()]
-        # Temporarily update the feature's dependencies for cycle check
-        test_features = []
-        for f in all_features:
-            if f["id"] == feature_id:
-                test_features.append({**f, "dependencies": dependency_ids})
-            else:
-                test_features.append(f)
+            # Check for circular dependencies
+            # Within IMMEDIATE transaction, snapshot is protected by write lock
+            all_features = [f.to_dict() for f in session.query(Feature).all()]
+            test_features = []
+            for f in all_features:
+                if f["id"] == feature_id:
+                    test_features.append({**f, "dependencies": dependency_ids})
+                else:
+                    test_features.append(f)

-        for dep_id in dependency_ids:
-            # source_id = feature_id (gaining dep), target_id = dep_id (being depended upon)
-            if would_create_circular_dependency(test_features, feature_id, dep_id):
-                return json.dumps({"error": f"Cannot add dependency {dep_id}: would create circular dependency"})
+            for dep_id in dependency_ids:
+                if would_create_circular_dependency(test_features, feature_id, dep_id):
+                    return json.dumps({"error": f"Cannot add dependency {dep_id}: would create circular dependency"})

-        # Set dependencies
-        feature.dependencies = sorted(dependency_ids) if dependency_ids else None
+            # Set dependencies atomically
+            sorted_deps = sorted(dependency_ids) if dependency_ids else None
+            feature.dependencies = sorted_deps
+            # Commit happens automatically on context manager exit
+
+            return json.dumps({
+                "success": True,
+                "feature_id": feature_id,
+                "dependencies": sorted_deps or []
+            })
+    except Exception as e:
+        return json.dumps({"error": f"Failed to set dependencies: {str(e)}"})
+
+
+@mcp.tool()
+def feature_request_human_input(
+    feature_id: Annotated[int, Field(description="The ID of the feature that needs human input", ge=1)],
+    prompt: Annotated[str, Field(min_length=1, description="Explain what you need from the human and why")],
+    fields: Annotated[list[dict], Field(min_length=1, description="List of input fields to collect")]
+) -> str:
+    """Request structured input from a human for a feature that is blocked.
+
+    Use this ONLY when the feature genuinely cannot proceed without human intervention:
+    - Creating API keys or external accounts
+    - Choosing between design approaches that require human preference
+    - Configuring external services the agent cannot access
+    - Providing credentials or secrets
+
+    Do NOT use this for issues you can solve yourself (debugging, reading docs, etc.).
+
+    The feature will be moved out of in_progress and into a "needs human input" state.
+    Once the human provides their response, the feature returns to the pending queue
+    and will include the human's response when you pick it up again.
+
+    Args:
+        feature_id: The ID of the feature that needs human input
+        prompt: A clear explanation of what you need and why
+        fields: List of input fields, each with:
+            - id (str): Unique, non-empty field identifier
+            - label (str): Non-empty human-readable label
+            - type (str): "text", "textarea", "select", or "boolean" (default: "text")
+            - required (bool): Whether the field is required (default: true)
+            - placeholder (str, optional): Placeholder text
+            - options (list): Required when type is "select": [{value, label}]
+
+    Returns:
+        JSON with success confirmation or error message
+    """
+    # Validate fields
+    VALID_FIELD_TYPES = {"text", "textarea", "select", "boolean"}
+    seen_ids: set[str] = set()
+    for i, field in enumerate(fields):
+        if "id" not in field or "label" not in field:
+            return json.dumps({"error": f"Field at index {i} missing required 'id' or 'label'"})
+        fid = field["id"]
+        flabel = field["label"]
+        if not isinstance(fid, str) or not fid.strip():
+            return json.dumps({"error": f"Field at index {i} has empty or invalid 'id'"})
+        if not isinstance(flabel, str) or not flabel.strip():
+            return json.dumps({"error": f"Field at index {i} has empty or invalid 'label'"})
+        if fid in seen_ids:
+            return json.dumps({"error": f"Duplicate field id '{fid}' at index {i}"})
+        seen_ids.add(fid)
+        ftype = field.get("type", "text")
+        if ftype not in VALID_FIELD_TYPES:
+            return json.dumps({"error": f"Field at index {i} has invalid type '{ftype}'. Must be one of: {', '.join(sorted(VALID_FIELD_TYPES))}"})
+        if ftype == "select" and not field.get("options"):
+            return json.dumps({"error": f"Field at index {i} is type 'select' but missing 'options' array"})
+
+    request_data = {
+        "prompt": prompt,
+        "fields": fields,
+    }
+
+    session = get_session()
+    try:
+        # Atomically set needs_human_input, clear in_progress, store request, clear previous response
+        result = session.execute(text("""
+            UPDATE features
+            SET needs_human_input = 1,
+                in_progress = 0,
+                human_input_request = :request,
+                human_input_response = NULL
+            WHERE id = :id AND passes = 0 AND in_progress = 1
+        """), {"id": feature_id, "request": json.dumps(request_data)})
        session.commit()

+        if result.rowcount == 0:
+            feature = session.query(Feature).filter(Feature.id == feature_id).first()
+            if feature is None:
+                return json.dumps({"error": f"Feature with ID {feature_id} not found"})
+            if feature.passes:
+                return json.dumps({"error": f"Feature with ID {feature_id} is already passing"})
+            if not feature.in_progress:
+                return json.dumps({"error": f"Feature with ID {feature_id} is not in progress"})
+            return json.dumps({"error": "Failed to request human input for unknown reason"})
+
+        feature = session.query(Feature).filter(Feature.id == feature_id).first()
        return json.dumps({
            "success": True,
            "feature_id": feature_id,
-            "dependencies": feature.dependencies or []
+            "name": feature.name,
+            "message": f"Feature '{feature.name}' is now blocked waiting for human input"
        })
    except Exception as e:
        session.rollback()
-        return json.dumps({"error": f"Failed to set dependencies: {str(e)}"})
+        return json.dumps({"error": f"Failed to request human input: {str(e)}"})
    finally:
        session.close()


+@mcp.tool()
+def ask_user(
+    questions: Annotated[list[dict], Field(description="List of questions to ask, each with question, header, options (list of {label, description}), and multiSelect (bool)")]
+) -> str:
+    """Ask the user structured questions with selectable options.
+
+    Use this when you need clarification or want to offer choices to the user.
+    Each question has a short header, the question text, and 2-4 clickable options.
+    The user's selections will be returned as your next message.
+
+    Args:
+        questions: List of questions, each with:
+            - question (str): The question to ask
+            - header (str): Short label (max 12 chars)
+            - options (list): Each with label (str) and description (str)
+            - multiSelect (bool): Allow multiple selections (default false)
+
+    Returns:
+        Acknowledgment that questions were presented to the user
+    """
+    # Validate input
+    for i, q in enumerate(questions):
+        if not all(key in q for key in ["question", "header", "options"]):
+            return json.dumps({"error": f"Question at index {i} missing required fields"})
+        if len(q["options"]) < 2 or len(q["options"]) > 4:
+            return json.dumps({"error": f"Question at index {i} must have 2-4 options"})
+
+    return "Questions presented to the user. Their response will arrive as your next message."
+
+
 if __name__ == "__main__":
    mcp.run()
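
Review note on `feature_request_human_input`: the guarded `UPDATE ... WHERE id = :id AND passes = 0 AND in_progress = 1` followed by a `rowcount` check can be sketched in isolation as below. This is a minimal illustration using the standard-library `sqlite3` module and an in-memory table, not the project's SQLAlchemy session; the helper names `request_human_input` and `demo` and the reduced schema are assumptions made for the example.

```python
import sqlite3
from contextlib import closing


def request_human_input(conn: sqlite3.Connection, feature_id: int, request_json: str) -> str:
    """Move a feature from in_progress to needs_human_input, or report why not."""
    # The WHERE clause is the guard: only a non-passing, in-progress row changes state.
    cur = conn.execute(
        """
        UPDATE features
        SET needs_human_input = 1,
            in_progress = 0,
            human_input_request = ?,
            human_input_response = NULL
        WHERE id = ? AND passes = 0 AND in_progress = 1
        """,
        (request_json, feature_id),
    )
    conn.commit()
    if cur.rowcount == 1:
        return "blocked"

    # Nothing was updated: look at the row to produce a specific diagnosis.
    row = conn.execute(
        "SELECT passes, in_progress FROM features WHERE id = ?", (feature_id,)
    ).fetchone()
    if row is None:
        return "not found"
    if row[0]:
        return "already passing"
    return "not in progress"


def demo() -> list[str]:
    """Run the guard against a hypothetical three-row table."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(
            "CREATE TABLE features ("
            "id INTEGER PRIMARY KEY, passes INTEGER DEFAULT 0, "
            "in_progress INTEGER DEFAULT 0, needs_human_input INTEGER DEFAULT 0, "
            "human_input_request TEXT, human_input_response TEXT)"
        )
        conn.executemany(
            "INSERT INTO features (id, passes, in_progress) VALUES (?, ?, ?)",
            [(1, 0, 1), (2, 1, 0), (3, 0, 0)],
        )
        conn.commit()
        # Feature 1 is requested twice to show that the second call is rejected.
        return [request_human_input(conn, fid, "{}") for fid in (1, 1, 2, 3, 4)]
```

Because the state check lives in the `WHERE` clause, a second request for the same feature is rejected without a separate read-then-write step.
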
--- a/package.json
+++ b/package.json
@@ -0,0 +1,55 @@
+{
+  "name": "autoforge-ai",
+  "version": "0.1.12",
+  "description": "Autonomous coding agent with web UI - build complete apps with AI",
+  "license": "AGPL-3.0",
+  "bin": {
+    "autoforge": "./bin/autoforge.js"
+  },
+  "type": "module",
+  "engines": {
+    "node": ">=20"
+  },
+  "files": [
+    "bin/",
+    "lib/",
+    "api/",
+    "server/",
+    "mcp_server/",
+    "ui/dist/",
+    "ui/package.json",
+    ".claude/commands/",
+    ".claude/skills/",
+    ".claude/templates/",
+    "examples/",
+    "start.py",
+    "agent.py",
+    "auth.py",
+    "autoforge_paths.py",
+    "autonomous_agent_demo.py",
+    "client.py",
+    "env_constants.py",
+    "parallel_orchestrator.py",
+    "progress.py",
+    "prompts.py",
+    "registry.py",
+    "rate_limit_utils.py",
+    "security.py",
+    "temp_cleanup.py",
+    "requirements-prod.txt",
+    "pyproject.toml",
+    ".env.example",
+    "!**/__pycache__/",
+    "!**/*.pyc"
+  ],
+  "keywords": [
+    "ai",
+    "coding-agent",
+    "claude",
+    "autonomous",
+    "code-generation"
+  ],
+  "scripts": {
+    "prepublishOnly": "npm --prefix ui install && npm --prefix ui run build"
+  }
+}
--- a/parallel_orchestrator.py
+++ b/parallel_orchestrator.py
--- a/progress.py
+++ b/progress.py
@@ -10,12 +10,21 @@ import json
 import os
 import sqlite3
 import urllib.request
+from contextlib import closing
 from datetime import datetime, timezone
 from pathlib import Path

 WEBHOOK_URL = os.environ.get("PROGRESS_N8N_WEBHOOK_URL")
 PROGRESS_CACHE_FILE = ".progress_cache"

+# SQLite connection settings for parallel mode safety
+SQLITE_TIMEOUT = 30  # seconds to wait for locks
+
+
+def _get_connection(db_file: Path) -> sqlite3.Connection:
+    """Get a SQLite connection with proper timeout settings for parallel mode."""
+    return sqlite3.connect(db_file, timeout=SQLITE_TIMEOUT)
+

 def has_features(project_dir: Path) -> bool:
    """
@@ -31,78 +40,93 @@ def has_features(project_dir: Path) -> bool:

    Returns False if no features exist (initializer needs to run).
    """
-    import sqlite3
-
    # Check legacy JSON file first
    json_file = project_dir / "feature_list.json"
    if json_file.exists():
        return True

    # Check SQLite database
-    db_file = project_dir / "features.db"
+    from autoforge_paths import get_features_db_path
+    db_file = get_features_db_path(project_dir)
    if not db_file.exists():
        return False

    try:
-        conn = sqlite3.connect(db_file)
-        cursor = conn.cursor()
-        cursor.execute("SELECT COUNT(*) FROM features")
-        count = cursor.fetchone()[0]
-        conn.close()
-        return count > 0
+        with closing(_get_connection(db_file)) as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT COUNT(*) FROM features")
+            count: int = cursor.fetchone()[0]
+            return bool(count > 0)
    except Exception:
        # Database exists but can't be read or has no features table
        return False


-def count_passing_tests(project_dir: Path) -> tuple[int, int, int]:
+def count_passing_tests(project_dir: Path) -> tuple[int, int, int, int]:
    """
-    Count passing, in_progress, and total tests via direct database access.
+    Count passing, in_progress, total, and needs_human_input tests via direct database access.

    Args:
        project_dir: Directory containing the project

    Returns:
-        (passing_count, in_progress_count, total_count)
+        (passing_count, in_progress_count, total_count, needs_human_input_count)
    """
-    db_file = project_dir / "features.db"
+    from autoforge_paths import get_features_db_path
+    db_file = get_features_db_path(project_dir)
    if not db_file.exists():
-        return 0, 0, 0
+        return 0, 0, 0, 0

    try:
-        conn = sqlite3.connect(db_file)
-        cursor = conn.cursor()
-        # Single aggregate query instead of 3 separate COUNT queries
-        # Handle case where in_progress column doesn't exist yet (legacy DBs)
-        try:
-            cursor.execute("""
-                SELECT
-                    COUNT(*) as total,
-                    SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing,
-                    SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress
-                FROM features
-            """)
-            row = cursor.fetchone()
-            total = row[0] or 0
-            passing = row[1] or 0
-            in_progress = row[2] or 0
-        except sqlite3.OperationalError:
-            # Fallback for databases without in_progress column
-            cursor.execute("""
-                SELECT
-                    COUNT(*) as total,
-                    SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing
-                FROM features
-            """)
-            row = cursor.fetchone()
-            total = row[0] or 0
-            passing = row[1] or 0
-            in_progress = 0
-        conn.close()
-        return passing, in_progress, total
+        with closing(_get_connection(db_file)) as conn:
+            cursor = conn.cursor()
+            # Single aggregate query instead of separate COUNT queries
+            # Handle case where columns don't exist yet (legacy DBs)
+            try:
+                cursor.execute("""
+                    SELECT
+                        COUNT(*) as total,
+                        SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing,
+                        SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress,
+                        SUM(CASE WHEN needs_human_input = 1 THEN 1 ELSE 0 END) as needs_human_input
+                    FROM features
+                """)
+                row = cursor.fetchone()
+                total = row[0] or 0
+                passing = row[1] or 0
+                in_progress = row[2] or 0
+                needs_human_input = row[3] or 0
+            except sqlite3.OperationalError:
+                # Fallback for databases without newer columns
+                try:
+                    cursor.execute("""
+                        SELECT
+                            COUNT(*) as total,
+                            SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing,
+                            SUM(CASE WHEN in_progress = 1 THEN 1 ELSE 0 END) as in_progress
+                        FROM features
+                    """)
+                    row = cursor.fetchone()
+                    total = row[0] or 0
+                    passing = row[1] or 0
+                    in_progress = row[2] or 0
+                    needs_human_input = 0
+                except sqlite3.OperationalError:
+                    cursor.execute("""
+                        SELECT
+                            COUNT(*) as total,
+                            SUM(CASE WHEN passes = 1 THEN 1 ELSE 0 END) as passing
+                        FROM features
+                    """)
+                    row = cursor.fetchone()
+                    total = row[0] or 0
+                    passing = row[1] or 0
+                    in_progress = 0
+                    needs_human_input = 0
+            return passing, in_progress, total, needs_human_input
    except Exception as e:
        print(f"[Database error in count_passing_tests: {e}]")
-        return 0, 0, 0
+        return 0, 0, 0, 0


 def get_all_passing_features(project_dir: Path) -> list[dict]:
@@ -115,22 +139,22 @@ def get_all_passing_features(project_dir: Path) -> list[dict]:
    Returns:
        List of dicts with id, category, name for each passing feature
    """
-    db_file = project_dir / "features.db"
+    from autoforge_paths import get_features_db_path
+    db_file = get_features_db_path(project_dir)
    if not db_file.exists():
        return []

    try:
-        conn = sqlite3.connect(db_file)
-        cursor = conn.cursor()
-        cursor.execute(
-            "SELECT id, category, name FROM features WHERE passes = 1 ORDER BY priority ASC"
-        )
-        features = [
-            {"id": row[0], "category": row[1], "name": row[2]}
-            for row in cursor.fetchall()
-        ]
-        conn.close()
-        return features
+        with closing(_get_connection(db_file)) as conn:
+            cursor = conn.cursor()
+            cursor.execute(
+                "SELECT id, category, name FROM features WHERE passes = 1 ORDER BY priority ASC"
+            )
+            features = [
+                {"id": row[0], "category": row[1], "name": row[2]}
+                for row in cursor.fetchall()
+            ]
+            return features
    except Exception:
        return []

@@ -140,7 +164,8 @@ def send_progress_webhook(passing: int, total: int, project_dir: Path) -> None:
    if not WEBHOOK_URL:
        return  # Webhook not configured

-    cache_file = project_dir / PROGRESS_CACHE_FILE
+    from autoforge_paths import get_progress_cache_path
+    cache_file = get_progress_cache_path(project_dir)
    previous = 0
    previous_passing_ids = set()

@@ -226,7 +251,7 @@ def print_session_header(session_num: int, is_initializer: bool) -> None:

 def print_progress_summary(project_dir: Path) -> None:
    """Print a summary of current progress."""
-    passing, in_progress, total = count_passing_tests(project_dir)
+    passing, in_progress, total, _needs_human_input = count_passing_tests(project_dir)

    if total > 0:
        percentage = (passing / total) * 100
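
Review note on `count_passing_tests`: the nested `try`/`except sqlite3.OperationalError` fallbacks could also be written as a loop that drops the newest optional column on each failure. The sketch below is one possible shape, not the code in this diff; `count_features`, `demo`, and the reduced schemas are hypothetical, and unlike the original it returns zeros when even the `passes` query fails.

```python
import sqlite3
from contextlib import closing


def count_features(conn: sqlite3.Connection) -> tuple[int, int, int, int]:
    """Return (passing, in_progress, total, needs_human_input)."""
    # Optional columns, oldest first; legacy databases may lack the later ones.
    optional = ["in_progress", "needs_human_input"]
    for n in range(len(optional), -1, -1):
        cols = ["passes"] + optional[:n]
        sums = ", ".join(f"SUM(CASE WHEN {c} = 1 THEN 1 ELSE 0 END)" for c in cols)
        try:
            row = conn.execute(f"SELECT COUNT(*), {sums} FROM features").fetchone()
        except sqlite3.OperationalError:
            continue  # a column is missing: retry with one fewer optional column
        # SUM over an empty table yields NULL, so coerce None to 0.
        total, passing, *rest = (v or 0 for v in row)
        rest += [0] * (len(optional) - n)  # pad counts for columns that were skipped
        in_progress, needs_human_input = rest
        return passing, in_progress, total, needs_human_input
    return 0, 0, 0, 0


def demo() -> list[tuple[int, int, int, int]]:
    """Count against a legacy schema and a current schema."""
    schemas = [
        ("CREATE TABLE features (passes INTEGER)", [(1,), (0,), (1,)]),
        (
            "CREATE TABLE features (passes INTEGER, in_progress INTEGER, "
            "needs_human_input INTEGER)",
            [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 1)],
        ),
    ]
    results = []
    for ddl, rows in schemas:
        # closing() guarantees conn.close() even if a query raises.
        with closing(sqlite3.connect(":memory:", timeout=30)) as conn:
            conn.execute(ddl)
            placeholders = ", ".join("?" * len(rows[0]))
            conn.executemany(f"INSERT INTO features VALUES ({placeholders})", rows)
            results.append(count_features(conn))
    return results
```

The column names are interpolated from a fixed list, never from user input, so the f-string query does not open an injection path.
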
--- a/prompts.py
+++ b/prompts.py
@@ -9,16 +9,21 @@ Fallback chain:
 2. Base template: .claude/templates/{name}.template.md
 """

+import re
 import shutil
 from pathlib import Path

 # Base templates location (generic templates)
 TEMPLATES_DIR = Path(__file__).parent / ".claude" / "templates"

+# Migration version — bump when adding new migration steps
+CURRENT_MIGRATION_VERSION = 1
+

 def get_project_prompts_dir(project_dir: Path) -> Path:
    """Get the prompts directory for a specific project."""
-    return project_dir / "prompts"
+    from autoforge_paths import get_prompts_dir
+    return get_prompts_dir(project_dir)


 def load_prompt(name: str, project_dir: Path | None = None) -> str:
@@ -69,42 +74,119 @@ def get_initializer_prompt(project_dir: Path | None = None) -> str:
    return load_prompt("initializer_prompt", project_dir)


-def get_coding_prompt(project_dir: Path | None = None) -> str:
-    """Load the coding agent prompt (project-specific if available)."""
-    return load_prompt("coding_prompt", project_dir)
+def _strip_browser_testing_sections(prompt: str) -> str:
+    """Strip browser automation and Playwright testing instructions from prompt.
+
+    Used in YOLO mode where browser testing is skipped entirely. Replaces
+    browser-related sections with a brief YOLO-mode note while preserving
+    all non-testing instructions (implementation, git, progress notes, etc.).
+
+    Args:
+        prompt: The full coding prompt text.
+
+    Returns:
+        The prompt with browser testing sections replaced by YOLO guidance.
+    """
+    original_prompt = prompt
+
+    # Replace STEP 5 (browser automation verification) with YOLO note
+    prompt = re.sub(
+        r"### STEP 5: VERIFY WITH BROWSER AUTOMATION.*?(?=### STEP 5\.5:)",
+        "### STEP 5: VERIFY FEATURE (YOLO MODE)\n\n"
+        "**YOLO mode is active.** Skip browser automation testing. "
+        "Instead, verify your feature works by ensuring:\n"
+        "- Code compiles without errors (lint and type-check pass)\n"
+        "- Server starts without errors after your changes\n"
+        "- No obvious runtime errors in server logs\n\n",
+        prompt,
+        flags=re.DOTALL,
+    )
+
+    # Replace the marking rule with YOLO-appropriate wording
+    prompt = prompt.replace(
+        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**",
+        "**YOLO mode: Mark a feature as passing after lint/type-check succeeds and server starts cleanly.**",
+    )
+
+    # Replace the BROWSER AUTOMATION reference section
+    prompt = re.sub(
+        r"## BROWSER AUTOMATION\n\n.*?(?=---)",
+        "## VERIFICATION (YOLO MODE)\n\n"
+        "Browser automation is disabled in YOLO mode. "
+        "Verify features by running lint, type-check, and confirming the dev server starts without errors.\n\n",
+        prompt,
+        flags=re.DOTALL,
+    )
+
+    # In STEP 4, replace browser automation reference with YOLO guidance
+    prompt = prompt.replace(
+        "2. Test manually using browser automation (see Step 5)",
+        "2. Verify code compiles (lint and type-check pass)",
+    )
+
+    if prompt == original_prompt:
+        print("[YOLO] Warning: No browser testing sections found to strip. "
+              "Project-specific prompt may need manual YOLO adaptation.")
+
+    return prompt


-def get_testing_prompt(project_dir: Path | None = None, testing_feature_id: int | None = None) -> str:
-    """Load the testing agent prompt (project-specific if available).
+def get_coding_prompt(project_dir: Path | None = None, yolo_mode: bool = False) -> str:
+    """Load the coding agent prompt (project-specific if available).

    Args:
        project_dir: Optional project directory for project-specific prompts
-        testing_feature_id: If provided, the pre-assigned feature ID to test.
-            The orchestrator claims the feature before spawning the agent.
+        yolo_mode: If True, strip browser automation / Playwright testing
+            instructions and replace with YOLO-mode guidance. This reduces
+            prompt tokens since YOLO mode skips all browser testing anyway.

    Returns:
-        The testing prompt, with pre-assigned feature instructions if applicable.
+        The coding prompt, optionally stripped of testing instructions.
+    """
+    prompt = load_prompt("coding_prompt", project_dir)
+
+    if yolo_mode:
+        prompt = _strip_browser_testing_sections(prompt)
+
+    return prompt
+
+
+def get_testing_prompt(
+    project_dir: Path | None = None,
+    testing_feature_id: int | None = None,
+    testing_feature_ids: list[int] | None = None,
+) -> str:
+    """Load the testing agent prompt (project-specific if available).
+
+    Supports both single-feature and multi-feature testing modes. When
+    testing_feature_ids is provided, the template's {{TESTING_FEATURE_IDS}}
+    placeholder is replaced with the comma-separated list. When only
+    testing_feature_id is given, the placeholder is replaced with that single ID.
+
+    Args:
+        project_dir: Optional project directory for project-specific prompts
+        testing_feature_id: If provided, the pre-assigned feature ID to test (legacy single mode).
+        testing_feature_ids: If provided, a list of feature IDs to test (batch mode).
+            Takes precedence over testing_feature_id when both are set.
+
+    Returns:
+        The testing prompt, with feature assignment instructions populated.
    """
    base_prompt = load_prompt("testing_prompt", project_dir)

+    # Batch mode: replace the {{TESTING_FEATURE_IDS}} placeholder in the template
+    if testing_feature_ids is not None and len(testing_feature_ids) > 0:
+        ids_str = ", ".join(str(fid) for fid in testing_feature_ids)
+        return base_prompt.replace("{{TESTING_FEATURE_IDS}}", ids_str)
+
+    # Legacy single-feature mode: fill the placeholder with the single ID (no header is prepended)
    if testing_feature_id is not None:
-        # Prepend pre-assigned feature instructions
-        pre_assigned_header = f"""## ASSIGNED FEATURE
+        # Replace the placeholder with the single ID for template consistency
+        base_prompt = base_prompt.replace("{{TESTING_FEATURE_IDS}}", str(testing_feature_id))
+        return base_prompt

-**You are assigned to regression test Feature #{testing_feature_id}.**
-
-### Your workflow:
-1. Call `feature_get_by_id` with ID {testing_feature_id} to get the feature details
-2. Verify the feature through the UI using browser automation
-3. If regression found, call `feature_mark_failing` with feature_id={testing_feature_id}
-4. Exit when done (no cleanup needed)
-
---
-
-"""
-        return pre_assigned_header + base_prompt
-
-    return base_prompt
+    # No feature assignment -- return template with placeholder cleared
+    return base_prompt.replace("{{TESTING_FEATURE_IDS}}", "(none assigned)")


 def get_single_feature_prompt(feature_id: int, project_dir: Path | None = None, yolo_mode: bool = False) -> str:
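
Review note on `_strip_browser_testing_sections`: the section swap relies on a non-greedy match with `re.DOTALL` that stops at a lookahead for the next heading. A minimal sketch of that technique follows; `replace_section` and `demo` are hypothetical helpers, and the sample prompt text (including the `MOCK DATA CHECK` heading) is invented for illustration.

```python
import re


def replace_section(prompt: str, start_heading: str, next_heading: str, replacement: str) -> str:
    """Replace everything from start_heading up to (not including) next_heading."""
    # .*? is non-greedy and DOTALL lets it span newlines; the lookahead leaves
    # next_heading in place so the following section survives untouched.
    pattern = re.escape(start_heading) + r".*?(?=" + re.escape(next_heading) + r")"
    # A callable replacement avoids backslash-escape processing of the new text.
    return re.sub(pattern, lambda _match: replacement, prompt, flags=re.DOTALL)


def demo() -> tuple[str, str]:
    prompt = (
        "### STEP 4: IMPLEMENT\nWrite code.\n\n"
        "### STEP 5: VERIFY WITH BROWSER AUTOMATION\nOpen the browser.\n\n"
        "### STEP 5.5: MOCK DATA CHECK\nGrep for mocks.\n"
    )
    stripped = replace_section(
        prompt,
        "### STEP 5: VERIFY WITH BROWSER AUTOMATION",
        "### STEP 5.5:",
        "### STEP 5: VERIFY FEATURE (YOLO MODE)\nRun lint.\n\n",
    )
    # When the start heading is absent the prompt comes back unchanged, which is
    # the condition the diff uses to print its "[YOLO] Warning".
    untouched = replace_section("no headings here", "### STEP 5:", "### STEP 5.5:", "x")
    return stripped, untouched
```

If a project-specific prompt renames either heading, the pattern matches nothing, so the unchanged-prompt warning in the diff is the only signal that stripping did not happen.
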
@@ -117,13 +199,13 @@ def get_single_feature_prompt(feature_id: int, project_dir: Path | None = None,
    Args:
        feature_id: The specific feature ID to work on
        project_dir: Optional project directory for project-specific prompts
-        yolo_mode: Ignored (kept for backward compatibility). Testing is now
-                   handled by separate testing agents, not YOLO prompts.
+        yolo_mode: If True, strip browser testing instructions from the base
+            coding prompt for reduced token usage in YOLO mode.

    Returns:
        The prompt with single-feature header prepended
    """
-    base_prompt = get_coding_prompt(project_dir)
+    base_prompt = get_coding_prompt(project_dir, yolo_mode=yolo_mode)

    # Minimal header - the base prompt already contains the full workflow
    single_feature_header = f"""## ASSIGNED FEATURE: #{feature_id}
@@ -138,6 +220,52 @@ If blocked, use `feature_skip` and document the blocker.
    return single_feature_header + base_prompt


+def get_batch_feature_prompt(
+    feature_ids: list[int],
+    project_dir: Path | None = None,
+    yolo_mode: bool = False,
+) -> str:
+    """Prepend batch-feature assignment header to base coding prompt.
+
+    Used in parallel mode to assign multiple features to an agent.
+    Features should be implemented sequentially in the given order.
+
+    Args:
+        feature_ids: List of feature IDs to implement in order
+        project_dir: Optional project directory for project-specific prompts
+        yolo_mode: If True, strip browser testing instructions from the base prompt
+
+    Returns:
+        The prompt with batch-feature header prepended
+    """
+    base_prompt = get_coding_prompt(project_dir, yolo_mode=yolo_mode)
+    ids_str = ", ".join(f"#{fid}" for fid in feature_ids)
+
+    batch_header = f"""## ASSIGNED FEATURES (BATCH): {ids_str}
+
+You have been assigned {len(feature_ids)} features to implement sequentially.
+Process them IN ORDER: {ids_str}
+
+### Workflow for each feature:
+1. Call `feature_claim_and_get` with the feature ID to get its details
+2. Implement the feature fully
+3. Verify it works (browser testing if applicable)
+4. Call `feature_mark_passing` to mark it complete
+5. Git commit the changes
+6. Move to the next feature
+
+### Important:
+- Complete each feature fully before starting the next
+- Mark each feature passing individually as you go
+- If blocked on a feature, use `feature_skip` and move to the next one
+- Other agents are handling other features - focus only on yours
+
+---
+
+"""
+    return batch_header + base_prompt
+
+
 def get_app_spec(project_dir: Path) -> str:
    """
    Load the app spec from the project.
@@ -190,9 +318,9 @@ def scaffold_project_prompts(project_dir: Path) -> Path:
    project_prompts = get_project_prompts_dir(project_dir)
    project_prompts.mkdir(parents=True, exist_ok=True)

-    # Create .autocoder directory for configuration files
-    autocoder_dir = project_dir / ".autocoder"
-    autocoder_dir.mkdir(parents=True, exist_ok=True)
+    # Create .autoforge directory with .gitignore for runtime files
+    from autoforge_paths import ensure_autoforge_dir
+    autoforge_dir = ensure_autoforge_dir(project_dir)

    # Define template mappings: (source_template, destination_name)
    templates = [
@@ -215,20 +343,81 @@ def scaffold_project_prompts(project_dir: Path) -> Path:
            except (OSError, PermissionError) as e:
                print(f"  Warning: Could not copy {dest_name}: {e}")

-    # Copy allowed_commands.yaml template to .autocoder/
+    # Copy allowed_commands.yaml template to .autoforge/
    examples_dir = Path(__file__).parent / "examples"
    allowed_commands_template = examples_dir / "project_allowed_commands.yaml"
-    allowed_commands_dest = autocoder_dir / "allowed_commands.yaml"
+    allowed_commands_dest = autoforge_dir / "allowed_commands.yaml"
    if allowed_commands_template.exists() and not allowed_commands_dest.exists():
        try:
            shutil.copy(allowed_commands_template, allowed_commands_dest)
-            copied_files.append(".autocoder/allowed_commands.yaml")
+            copied_files.append(".autoforge/allowed_commands.yaml")
        except (OSError, PermissionError) as e:
            print(f"  Warning: Could not copy allowed_commands.yaml: {e}")

+    # Copy Playwright CLI skill for browser automation
+    skills_src = Path(__file__).parent / ".claude" / "skills" / "playwright-cli"
+    skills_dest = project_dir / ".claude" / "skills" / "playwright-cli"
+    if skills_src.exists() and not skills_dest.exists():
+        try:
+            shutil.copytree(skills_src, skills_dest)
+            copied_files.append(".claude/skills/playwright-cli/")
+        except (OSError, PermissionError) as e:
+            print(f"  Warning: Could not copy playwright-cli skill: {e}")
+
+    # Ensure .playwright-cli/ and .playwright/ are in project .gitignore
+    project_gitignore = project_dir / ".gitignore"
+    entries_to_add = [".playwright-cli/", ".playwright/"]
+    existing_lines: list[str] = []
+    if project_gitignore.exists():
+        try:
+            existing_lines = project_gitignore.read_text(encoding="utf-8").splitlines()
+        except (OSError, PermissionError):
+            pass
+    missing_entries = [e for e in entries_to_add if e not in existing_lines]
+    if missing_entries:
+        try:
+            with open(project_gitignore, "a", encoding="utf-8") as f:
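+                # Write a newline first when the last existing line is non-blank; this
+                # terminates a file lacking a trailing newline, otherwise it adds a blank separator line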
+                if existing_lines and existing_lines[-1].strip():
+                    f.write("\n")
+                for entry in missing_entries:
+                    f.write(f"{entry}\n")
+        except (OSError, PermissionError) as e:
+            print(f"  Warning: Could not update .gitignore: {e}")
+
+    # Scaffold .playwright/cli.config.json for browser settings
+    playwright_config_dir = project_dir / ".playwright"
+    playwright_config_file = playwright_config_dir / "cli.config.json"
+    if not playwright_config_file.exists():
+        try:
+            playwright_config_dir.mkdir(parents=True, exist_ok=True)
+            import json
+            config = {
+                "browser": {
+                    "browserName": "chromium",
+                    "launchOptions": {
+                        "channel": "chrome",
+                        "headless": True,
+                    },
+                    "contextOptions": {
+                        "viewport": {"width": 1280, "height": 720},
+                    },
+                    "isolated": True,
+                },
+            }
+            with open(playwright_config_file, "w", encoding="utf-8") as f:
+                json.dump(config, f, indent=2)
+                f.write("\n")
+            copied_files.append(".playwright/cli.config.json")
+        except (OSError, PermissionError) as e:
+            print(f"  Warning: Could not create playwright config: {e}")
+
    if copied_files:
        print(f"  Created project files: {', '.join(copied_files)}")

+    # Stamp new projects at the current migration version so they never trigger migration
+    _set_migration_version(project_dir, CURRENT_MIGRATION_VERSION)
+
    return project_prompts
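
Review note on the `.gitignore` update in `scaffold_project_prompts`: the append is idempotent because entries already present are filtered out first. The sketch below shows the same idea as a standalone helper. It is a variation, not the code in this diff: it checks the raw text for a trailing newline instead of inspecting the last split line, and the names `ensure_gitignore_entries` and `demo` are assumptions.

```python
import tempfile
from pathlib import Path


def ensure_gitignore_entries(gitignore: Path, entries: list[str]) -> list[str]:
    """Append any missing entries to gitignore and return the ones that were added."""
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    existing = text.splitlines()
    missing = [entry for entry in entries if entry not in existing]
    if not missing:
        return []
    with open(gitignore, "a", encoding="utf-8") as f:
        # Only terminate the last line when the file does not already end with a newline.
        if text and not text.endswith("\n"):
            f.write("\n")
        for entry in missing:
            f.write(f"{entry}\n")
    return missing


def demo() -> tuple[list[str], list[str], str]:
    """Run the helper twice on a file that lacks a trailing newline."""
    with tempfile.TemporaryDirectory() as tmp:
        gitignore = Path(tmp) / ".gitignore"
        gitignore.write_text("node_modules/", encoding="utf-8")
        first = ensure_gitignore_entries(gitignore, [".playwright-cli/", ".playwright/"])
        second = ensure_gitignore_entries(gitignore, [".playwright-cli/", ".playwright/"])
        return first, second, gitignore.read_text(encoding="utf-8")
```

The second call returns an empty list and leaves the file untouched, which is the property that makes re-running the scaffold safe.
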


@@ -300,3 +489,330 @@ def copy_spec_to_project(project_dir: Path) -> None:
            return

    print("Warning: No app_spec.txt found to copy to project directory")
+
+
+# ---------------------------------------------------------------------------
+# Project version migration
+# ---------------------------------------------------------------------------
+
+# Replacement content: coding_prompt.md STEP 5 section (Playwright CLI)
+_CLI_STEP5_CONTENT = """\
+### STEP 5: VERIFY WITH BROWSER AUTOMATION
+
+**CRITICAL:** You MUST verify features through the actual UI.
+
+Use `playwright-cli` for browser automation:
+
+- Open the browser: `playwright-cli open http://localhost:PORT`
+- Take a snapshot to see page elements: `playwright-cli snapshot`
+- Read the snapshot YAML file to see element refs
+- Click elements by ref: `playwright-cli click e5`
+- Type text: `playwright-cli type "search query"`
+- Fill form fields: `playwright-cli fill e3 "value"`
+- Take screenshots: `playwright-cli screenshot`
+- Read the screenshot file to verify visual appearance
+- Check console errors: `playwright-cli console`
+- Close browser when done: `playwright-cli close`
+
+**Token-efficient workflow:** `playwright-cli screenshot` and `snapshot` save files
+to `.playwright-cli/`. You will see a file link in the output. Read the file only
+when you need to verify visual appearance or find element refs.
+
+**DO:**
+- Test through the UI with clicks and keyboard input
+- Take screenshots and read them to verify visual appearance
+- Check for console errors with `playwright-cli console`
+- Verify complete user workflows end-to-end
+- Always run `playwright-cli close` when finished testing
+
+**DON'T:**
+- Only test with curl commands
+- Use JavaScript evaluation to bypass UI (`eval` and `run-code` are blocked)
+- Skip visual verification
+- Mark tests passing without thorough verification
+
+"""
+
+# Replacement content: coding_prompt.md BROWSER AUTOMATION reference section
+_CLI_BROWSER_SECTION = """\
+## BROWSER AUTOMATION
+
+Use `playwright-cli` commands for UI verification. Key commands: `open`, `goto`,
+`snapshot`, `click`, `type`, `fill`, `screenshot`, `console`, `close`.
+
+**How it works:** `playwright-cli` uses a persistent browser daemon. `open` starts it,
+subsequent commands interact via socket, `close` shuts it down. Screenshots and snapshots
+save to `.playwright-cli/` -- read the files when you need to verify content.
+
+Test like a human user with mouse and keyboard. Use `playwright-cli console` to detect
+JS errors. Don't bypass UI with JavaScript evaluation.
+
+"""
+
+# Replacement content: testing_prompt.md STEP 2 section (Playwright CLI)
+_CLI_TESTING_STEP2 = """\
+### STEP 2: VERIFY THE FEATURE
+
+**CRITICAL:** You MUST verify the feature through the actual UI using browser automation.
+
+For the feature returned:
+1. Read and understand the feature's verification steps
+2. Navigate to the relevant part of the application
+3. Execute each verification step using browser automation
+4. Take screenshots and read them to verify visual appearance
+5. Check for console errors
+
+### Browser Automation (Playwright CLI)
+
+**Navigation & Screenshots:**
+- `playwright-cli open <url>` - Open browser and navigate
+- `playwright-cli goto <url>` - Navigate to URL
+- `playwright-cli screenshot` - Save screenshot to `.playwright-cli/`
+- `playwright-cli snapshot` - Save page snapshot with element refs to `.playwright-cli/`
+
+**Element Interaction:**
+- `playwright-cli click <ref>` - Click elements (ref from snapshot)
+- `playwright-cli type <text>` - Type text
+- `playwright-cli fill <ref> <text>` - Fill form fields
+- `playwright-cli select <ref> <val>` - Select dropdown
+- `playwright-cli press <key>` - Keyboard input
+
+**Debugging:**
+- `playwright-cli console` - Check for JS errors
+- `playwright-cli network` - Monitor API calls
+
+**Cleanup:**
+- `playwright-cli close` - Close browser when done (ALWAYS do this)
+
+**Note:** Screenshots and snapshots save to files. Read the file to see the content.
+
+"""
+
+# Replacement content: testing_prompt.md AVAILABLE TOOLS browser subsection
+_CLI_TESTING_TOOLS = """\
+### Browser Automation (Playwright CLI)
+Use `playwright-cli` commands for browser interaction. Key commands:
+- `playwright-cli open <url>` - Open browser
+- `playwright-cli goto <url>` - Navigate to URL
+- `playwright-cli screenshot` - Take screenshot (saved to `.playwright-cli/`)
+- `playwright-cli snapshot` - Get page snapshot with element refs
+- `playwright-cli click <ref>` - Click element
+- `playwright-cli type <text>` - Type text
+- `playwright-cli fill <ref> <text>` - Fill form field
+- `playwright-cli console` - Check for JS errors
+- `playwright-cli close` - Close browser (always do this when done)
+
+"""
+
+
+def _get_migration_version(project_dir: Path) -> int:
+    """Read the migration version from .autoforge/.migration_version."""
+    from autoforge_paths import get_autoforge_dir
+    version_file = get_autoforge_dir(project_dir) / ".migration_version"
+    if not version_file.exists():
+        return 0
+    try:
+        return int(version_file.read_text().strip())
+    except (ValueError, OSError):
+        return 0
+
+
+def _set_migration_version(project_dir: Path, version: int) -> None:
+    """Write the migration version to .autoforge/.migration_version."""
+    from autoforge_paths import get_autoforge_dir
+    version_file = get_autoforge_dir(project_dir) / ".migration_version"
+    version_file.parent.mkdir(parents=True, exist_ok=True)
+    version_file.write_text(str(version))
+
+
+def _migrate_coding_prompt_to_cli(content: str) -> str:
+    """Replace MCP-based Playwright sections with CLI-based content in coding prompt."""
+    # Replace STEP 5 section (from header to just before STEP 5.5)
+    content = re.sub(
+        r"### STEP 5: VERIFY WITH BROWSER AUTOMATION.*?(?=### STEP 5\.5:)",
+        _CLI_STEP5_CONTENT,
+        content,
+        count=1,
+        flags=re.DOTALL,
+    )
+
+    # Replace BROWSER AUTOMATION reference section (from header to next ---)
+    content = re.sub(
+        r"## BROWSER AUTOMATION\n\n.*?(?=---)",
+        _CLI_BROWSER_SECTION,
+        content,
+        count=1,
+        flags=re.DOTALL,
+    )
+
+    # Replace inline screenshot rule
+    content = content.replace(
+        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH SCREENSHOTS.**",
+        "**ONLY MARK A FEATURE AS PASSING AFTER VERIFICATION WITH BROWSER AUTOMATION.**",
+    )
+
+    # Replace inline screenshot references (various phrasings from old templates)
+    for old_phrase in (
+        "(inline only -- do NOT save to disk)",
+        "(inline only, never save to disk)",
+        "(inline mode only -- never save to disk)",
+    ):
+        content = content.replace(old_phrase, "(saved to `.playwright-cli/`)")
+
+    return content
+
+
+def _migrate_testing_prompt_to_cli(content: str) -> str:
+    """Replace MCP-based Playwright sections with CLI-based content in testing prompt."""
+    # Replace AVAILABLE TOOLS browser subsection FIRST (before STEP 2, to avoid
+    # matching the new CLI subsection header that the STEP 2 replacement inserts).
+    # In old prompts, ### Browser Automation (Playwright) only exists in AVAILABLE TOOLS.
+    content = re.sub(
+        r"### Browser Automation \(Playwright[^)]*\)\n.*?(?=---)",
+        _CLI_TESTING_TOOLS,
+        content,
+        count=1,
+        flags=re.DOTALL,
+    )
+
+    # Replace STEP 2 verification section (from header to just before STEP 3)
+    content = re.sub(
+        r"### STEP 2: VERIFY THE FEATURE.*?(?=### STEP 3:)",
+        _CLI_TESTING_STEP2,
+        content,
+        count=1,
+        flags=re.DOTALL,
+    )
+
+    # Replace inline screenshot references (various phrasings from old templates)
+    for old_phrase in (
+        "(inline only -- do NOT save to disk)",
+        "(inline only, never save to disk)",
+        "(inline mode only -- never save to disk)",
+    ):
+        content = content.replace(old_phrase, "(saved to `.playwright-cli/`)")
+
+    return content
+
+
+def _migrate_v0_to_v1(project_dir: Path) -> list[str]:
+    """Migrate from v0 (MCP-based Playwright) to v1 (Playwright CLI).
+
+    Four idempotent sub-steps:
+    A. Copy playwright-cli skill to project
+    B. Scaffold .playwright/cli.config.json
+    C. Update .gitignore with .playwright-cli/ and .playwright/
+    D. Update coding_prompt.md and testing_prompt.md
+    """
+    import json
+
+    migrated: list[str] = []
+
+    # A. Copy Playwright CLI skill
+    skills_src = Path(__file__).parent / ".claude" / "skills" / "playwright-cli"
+    skills_dest = project_dir / ".claude" / "skills" / "playwright-cli"
+    if skills_src.exists() and not skills_dest.exists():
+        try:
+            shutil.copytree(skills_src, skills_dest)
+            migrated.append("Copied playwright-cli skill")
+        except (OSError, PermissionError) as e:
+            print(f"  Warning: Could not copy playwright-cli skill: {e}")
+
+    # B. Scaffold .playwright/cli.config.json
+    playwright_config_dir = project_dir / ".playwright"
+    playwright_config_file = playwright_config_dir / "cli.config.json"
+    if not playwright_config_file.exists():
+        try:
+            playwright_config_dir.mkdir(parents=True, exist_ok=True)
+            config = {
+                "browser": {
+                    "browserName": "chromium",
+                    "launchOptions": {
+                        "channel": "chrome",
+                        "headless": True,
+                    },
+                    "contextOptions": {
+                        "viewport": {"width": 1280, "height": 720},
+                    },
+                    "isolated": True,
+                },
+            }
+            with open(playwright_config_file, "w", encoding="utf-8") as f:
+                json.dump(config, f, indent=2)
+                f.write("\n")
+            migrated.append("Created .playwright/cli.config.json")
+        except (OSError, PermissionError) as e:
+            print(f"  Warning: Could not create playwright config: {e}")
+
+    # C. Update .gitignore
+    project_gitignore = project_dir / ".gitignore"
+    entries_to_add = [".playwright-cli/", ".playwright/"]
+    existing_lines: list[str] = []
+    if project_gitignore.exists():
+        try:
+            existing_lines = project_gitignore.read_text(encoding="utf-8").splitlines()
+        except (OSError, PermissionError):
+            pass
+    missing_entries = [e for e in entries_to_add if e not in existing_lines]
+    if missing_entries:
+        try:
+            with open(project_gitignore, "a", encoding="utf-8") as f:
+                if existing_lines and existing_lines[-1].strip():
+                    f.write("\n")
+                for entry in missing_entries:
+                    f.write(f"{entry}\n")
+            migrated.append(f"Added {', '.join(missing_entries)} to .gitignore")
+        except (OSError, PermissionError) as e:
+            print(f"  Warning: Could not update .gitignore: {e}")
+
+    # D. Update prompts
+    prompts_dir = get_project_prompts_dir(project_dir)
+
+    # D1. Update coding_prompt.md
+    coding_prompt_path = prompts_dir / "coding_prompt.md"
+    if coding_prompt_path.exists():
+        try:
+            content = coding_prompt_path.read_text(encoding="utf-8")
+            if "Playwright MCP" in content or "browser_navigate" in content or "browser_take_screenshot" in content:
+                updated = _migrate_coding_prompt_to_cli(content)
+                if updated != content:
+                    coding_prompt_path.write_text(updated, encoding="utf-8")
+                    migrated.append("Updated coding_prompt.md to Playwright CLI")
+        except (OSError, PermissionError) as e:
+            print(f"  Warning: Could not update coding_prompt.md: {e}")
+
+    # D2. Update testing_prompt.md
+    testing_prompt_path = prompts_dir / "testing_prompt.md"
+    if testing_prompt_path.exists():
+        try:
+            content = testing_prompt_path.read_text(encoding="utf-8")
+            if "browser_navigate" in content or "browser_take_screenshot" in content:
+                updated = _migrate_testing_prompt_to_cli(content)
+                if updated != content:
+                    testing_prompt_path.write_text(updated, encoding="utf-8")
+                    migrated.append("Updated testing_prompt.md to Playwright CLI")
+        except (OSError, PermissionError) as e:
+            print(f"  Warning: Could not update testing_prompt.md: {e}")
+
+    return migrated
+
+
+def migrate_project_to_current(project_dir: Path) -> list[str]:
+    """Migrate an existing project to the current AutoForge version.
+
+    Idempotent — safe to call on every agent start. Returns list of
+    human-readable descriptions of what was migrated.
+    """
+    current = _get_migration_version(project_dir)
+    if current >= CURRENT_MIGRATION_VERSION:
+        return []
+
+    migrated: list[str] = []
+
+    if current < 1:
+        migrated.extend(_migrate_v0_to_v1(project_dir))
+
+    # Future: if current < 2: migrated.extend(_migrate_v1_to_v2(project_dir))
+
+    _set_migration_version(project_dir, CURRENT_MIGRATION_VERSION)
+    return migrated
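
Review note: both `_migrate_*_prompt_to_cli` helpers above use the same idiom, a non-greedy `.*?` under `re.DOTALL` that stops at a lookahead for the next header. The sketch below isolates that idiom on a toy prompt. `replace_section`, `OLD_PROMPT` and `NEW_STEP2` are hypothetical names for illustration and are not part of the patch. It also passes the replacement as a callable, which is a variation on the diff (the diff passes the string directly); a callable is inserted literally, so a future template containing a backslash would not be read as a regex escape.

```python
import re

# Toy stand-in for an old testing prompt (hypothetical content).
OLD_PROMPT = """\
### STEP 1: GET A FEATURE
Pick the next feature.

### STEP 2: VERIFY THE FEATURE
Use browser_navigate and browser_take_screenshot.

### STEP 3: REPORT
Report the result.
"""

# Toy stand-in for the CLI-based replacement section.
NEW_STEP2 = """\
### STEP 2: VERIFY THE FEATURE
Use `playwright-cli open <url>` and `playwright-cli screenshot`.

"""


def replace_section(content: str, start_header: str, next_header: str, replacement: str) -> str:
    """Replace everything from start_header up to (not including) next_header."""
    pattern = re.escape(start_header) + r".*?(?=" + re.escape(next_header) + r")"
    # count=1 touches only the first match; DOTALL lets .*? span newlines.
    # The lambda returns the replacement verbatim (no backslash processing).
    return re.sub(pattern, lambda _match: replacement, content, count=1, flags=re.DOTALL)


migrated = replace_section(OLD_PROMPT, "### STEP 2: VERIFY THE FEATURE", "### STEP 3:", NEW_STEP2)
# Running the same replacement again leaves the text unchanged.
migrated_twice = replace_section(migrated, "### STEP 2: VERIFY THE FEATURE", "### STEP 3:", NEW_STEP2)
```

Because the lookahead does not consume the next header, the following section survives intact, and a second run finds the already-migrated section and rewrites it to the same text.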
--- a/rate_limit_utils.py
+++ b/rate_limit_utils.py
@@ -0,0 +1,132 @@
+"""
+Rate Limit Utilities
+====================
+
+Shared utilities for detecting and handling API rate limits.
+Used by both agent.py (production) and test_rate_limit_utils.py (tests).
+"""
+
+import random
+import re
+from typing import Optional
+
+# Regex patterns for rate limit detection (used in both exception messages and response text)
+# These patterns use word boundaries to avoid false positives like "PR #429" or "please wait while I..."
+RATE_LIMIT_REGEX_PATTERNS = [
+    r"\brate[_\s]?limit",         # "rate limit", "rate_limit", "ratelimit"
+    r"\btoo\s+many\s+requests",   # "too many requests"
+    r"\bhttp\s*429\b",            # "http 429", "http429"
+    r"\bstatus\s*429\b",          # "status 429", "status429"
+    r"\berror\s*429\b",           # "error 429", "error429"
+    r"\b429\s+too\s+many",        # "429 too many"
+    r"\b(?:server|api|system)\s+(?:is\s+)?overloaded\b",  # "server is overloaded", "api overloaded"
+    r"\bquota\s*exceeded\b",      # "quota exceeded"
+]
+
+# Compiled regex for efficient matching
+_RATE_LIMIT_REGEX = re.compile(
+    "|".join(RATE_LIMIT_REGEX_PATTERNS),
+    re.IGNORECASE
+)
+
+
+def parse_retry_after(error_message: str) -> Optional[int]:
+    """
+    Extract retry-after seconds from various error message formats.
+
+    Handles common formats:
+    - "Retry-After: 60"
+    - "retry after 60 seconds"
+    - "try again in 5 seconds"
+    - "30 seconds remaining"
+
+    Args:
+        error_message: The error message to parse
+
+    Returns:
+        Seconds to wait, or None if not parseable.
+    """
+    # Patterns require explicit "seconds" or "s" unit, OR no unit at all (end of string/sentence)
+    # This prevents matching "30 minutes" or "1 hour" since those have non-seconds units
+    patterns = [
+        r"retry.?after[:\s]+(\d+)\s*(?:seconds?|s\b)",  # Requires seconds unit
+        r"retry.?after[:\s]+(\d+)(?:\s*$|\s*[,.])",     # Or end of string/sentence
+        r"try again in\s+(\d+)\s*(?:seconds?|s\b)",     # Requires seconds unit
+        r"try again in\s+(\d+)(?:\s*$|\s*[,.])",        # Or end of string/sentence
+        r"(\d+)\s*seconds?\s*(?:remaining|left|until)",
+    ]
+
+    for pattern in patterns:
+        match = re.search(pattern, error_message, re.IGNORECASE)
+        if match:
+            return int(match.group(1))
+
+    return None
+
+
+def is_rate_limit_error(error_message: str) -> bool:
+    """
+    Detect if an error message indicates a rate limit.
+
+    Uses regex patterns with word boundaries to avoid false positives
+    like "PR #429", "please wait while I...", or "Node v14.29.0".
+
+    Args:
+        error_message: The error message to check
+
+    Returns:
+        True if the message indicates a rate limit, False otherwise.
+    """
+    return bool(_RATE_LIMIT_REGEX.search(error_message))
+
+
+def calculate_rate_limit_backoff(retries: int) -> int:
+    """
+    Calculate exponential backoff with jitter for rate limits.
+
+    Base formula: min(15 * 2^retries, 3600)
+    Jitter: adds 0-30% random jitter to prevent thundering herd.
+    Sequence with jitter: ~15-19s, ~30-39s, ~60-78s, ~120-156s, ...
+
+    The lower starting delay (15s vs 60s) allows faster recovery from
+    transient rate limits, while jitter prevents synchronized retries
+    when multiple agents hit limits simultaneously.
+
+    Args:
+        retries: Number of consecutive rate limit retries (0-indexed)
+
+    Returns:
+        Delay in seconds. The base is clamped to 1-3600 before jitter is added,
+        so the returned value can reach about 4680.
+    """
+    base = int(min(max(15 * (2 ** retries), 1), 3600))
+    jitter = random.uniform(0, base * 0.3)
+    return int(base + jitter)
+
+
+def calculate_error_backoff(retries: int) -> int:
+    """
+    Calculate linear backoff for non-rate-limit errors.
+
+    Formula: min(30 * retries, 300) - caps at 5 minutes
+    Sequence: 30s, 60s, 90s, 120s, ... 300s
+
+    Args:
+        retries: Number of consecutive error retries (1-indexed)
+
+    Returns:
+        Delay in seconds (clamped to 1-300 range)
+    """
+    return min(max(30 * retries, 1), 300)
+
+
+def clamp_retry_delay(delay_seconds: int) -> int:
+    """
+    Clamp a retry delay to a safe range (1-3600 seconds).
+
+    Args:
+        delay_seconds: The raw delay value
+
+    Returns:
+        Delay clamped to 1-3600 seconds
+    """
+    return min(max(delay_seconds, 1), 3600)
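
Review note: a minimal sketch of how the helpers in `rate_limit_utils.py` might be combined by a caller. The diff does not show the call site in `agent.py`, so `next_delay` is a hypothetical function, and the two helpers are re-implemented here in condensed form (fewer patterns than the real module) only so the sketch runs on its own. The formulas are the ones stated in the docstrings above.

```python
import random
import re
from typing import Optional

# Condensed subset of RATE_LIMIT_REGEX_PATTERNS (simplified for this sketch).
_RATE_LIMIT = re.compile(
    r"\brate[_\s]?limit|\btoo\s+many\s+requests|\b(?:http|status|error)\s*429\b",
    re.IGNORECASE,
)


def is_rate_limit_error(message: str) -> bool:
    return bool(_RATE_LIMIT.search(message))


def parse_retry_after(message: str) -> Optional[int]:
    # Accept a seconds unit, or no unit at end of string/sentence.
    match = re.search(
        r"retry.?after[:\s]+(\d+)\s*(?:seconds?|s\b|$|[,.])", message, re.IGNORECASE
    )
    return int(match.group(1)) if match else None


def next_delay(message: str, rate_limit_retries: int, error_retries: int) -> int:
    """Hypothetical caller: pick a wait time for the next retry."""
    if is_rate_limit_error(message):
        hinted = parse_retry_after(message)
        if hinted is not None:
            # Same range as clamp_retry_delay.
            return min(max(hinted, 1), 3600)
        # Exponential base clamped to 1-3600, then 0-30% jitter on top.
        base = int(min(max(15 * (2 ** rate_limit_retries), 1), 3600))
        return int(base + random.uniform(0, base * 0.3))
    # Linear backoff for everything else, capped at 300 seconds.
    return min(max(30 * error_retries, 1), 300)
```

Note that the jitter is applied after the clamp, so a long run of rate-limit retries can wait longer than 3600 seconds. A message such as "PR #429 merged" falls through to the linear path because none of the patterns match a bare "429".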
--- a/registry.py
+++ b/registry.py
@@ -3,7 +3,7 @@ Project Registry Module
 =======================

 Cross-platform project registry for storing project name to path mappings.
-Uses SQLite database stored at ~/.autocoder/registry.db.
+Uses SQLite database stored at ~/.autoforge/registry.db.
 """

 import logging
@@ -16,14 +16,29 @@ from datetime import datetime
 from pathlib import Path
 from typing import Any

-from sqlalchemy import Column, DateTime, String, create_engine
-from sqlalchemy.ext.declarative import declarative_base
-from sqlalchemy.orm import sessionmaker
+from sqlalchemy import Column, DateTime, Integer, String, create_engine, text
+from sqlalchemy.orm import DeclarativeBase, sessionmaker

 # Module logger
 logger = logging.getLogger(__name__)


+def _migrate_registry_dir() -> None:
+    """Migrate ~/.autocoder/ to ~/.autoforge/ if needed.
+
+    Provides backward compatibility by automatically renaming the old
+    config directory to the new location on first access.
+    """
+    old_dir = Path.home() / ".autocoder"
+    new_dir = Path.home() / ".autoforge"
+    if old_dir.exists() and not new_dir.exists():
+        try:
+            old_dir.rename(new_dir)
+            logger.info("Migrated registry directory: ~/.autocoder/ -> ~/.autoforge/")
+        except Exception:
+            logger.warning("Failed to migrate ~/.autocoder/ to ~/.autoforge/", exc_info=True)
+
+
 # =============================================================================
 # Model Configuration (Single Source of Truth)
 # =============================================================================
@@ -31,15 +46,31 @@ logger = logging.getLogger(__name__)
 # Available models with display names
 # To add a new model: add an entry here with {"id": "model-id", "name": "Display Name"}
 AVAILABLE_MODELS = [
-    {"id": "claude-opus-4-5-20251101", "name": "Claude Opus 4.5"},
-    {"id": "claude-sonnet-4-5-20250929", "name": "Claude Sonnet 4.5"},
+    {"id": "claude-opus-4-6", "name": "Claude Opus"},
+    {"id": "claude-sonnet-4-5-20250929", "name": "Claude Sonnet"},
 ]

+# Map legacy model IDs to their current replacements.
+# Used by get_all_settings() to auto-migrate stale values on first read after upgrade.
+LEGACY_MODEL_MAP = {
+    "claude-opus-4-5-20251101": "claude-opus-4-6",
+}
+
 # List of valid model IDs (derived from AVAILABLE_MODELS)
 VALID_MODELS = [m["id"] for m in AVAILABLE_MODELS]

 # Default model and settings
-DEFAULT_MODEL = "claude-opus-4-5-20251101"
+# Respect ANTHROPIC_DEFAULT_OPUS_MODEL env var for Foundry/custom deployments
+# Guard against empty/whitespace values by trimming and falling back when blank
+_env_default_model = os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL")
+if _env_default_model is not None:
+    _env_default_model = _env_default_model.strip()
+DEFAULT_MODEL = _env_default_model or "claude-opus-4-6"
+
+# Ensure env-provided DEFAULT_MODEL is in VALID_MODELS for validation consistency
+# (idempotent: only adds if missing, doesn't alter AVAILABLE_MODELS semantics)
+if DEFAULT_MODEL and DEFAULT_MODEL not in VALID_MODELS:
+    VALID_MODELS.append(DEFAULT_MODEL)
 DEFAULT_YOLO_MODE = False

 # SQLite connection settings
@@ -75,7 +106,9 @@ class RegistryPermissionDenied(RegistryError):
 # SQLAlchemy Model
 # =============================================================================

-Base = declarative_base()
+class Base(DeclarativeBase):
+    """SQLAlchemy 2.0 style declarative base."""
+    pass


 class Project(Base):
@@ -85,6 +118,7 @@ class Project(Base):
    name = Column(String(50), primary_key=True, index=True)
    path = Column(String, nullable=False)  # POSIX format for cross-platform
    created_at = Column(DateTime, nullable=False)
+    default_concurrency = Column(Integer, nullable=False, default=3)


 class Settings(Base):
@@ -108,12 +142,15 @@ _engine_lock = threading.Lock()

 def get_config_dir() -> Path:
    """
-    Get the config directory: ~/.autocoder/
+    Get the config directory: ~/.autoforge/
+
+    Automatically migrates from ~/.autocoder/ if needed.

    Returns:
-        Path to ~/.autocoder/ (created if it doesn't exist)
+        Path to ~/.autoforge/ (created if it doesn't exist)
    """
-    config_dir = Path.home() / ".autocoder"
+    _migrate_registry_dir()
+    config_dir = Path.home() / ".autoforge"
    config_dir.mkdir(parents=True, exist_ok=True)
    return config_dir

@@ -146,12 +183,26 @@ def _get_engine():
                    }
                )
                Base.metadata.create_all(bind=_engine)
+                _migrate_add_default_concurrency(_engine)
                _SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=_engine)
                logger.debug("Initialized registry database at: %s", db_path)

    return _engine, _SessionLocal


+def _migrate_add_default_concurrency(engine) -> None:
+    """Add default_concurrency column if missing (for existing databases)."""
+    with engine.connect() as conn:
+        result = conn.execute(text("PRAGMA table_info(projects)"))
+        columns = [row[1] for row in result.fetchall()]
+        if "default_concurrency" not in columns:
+            conn.execute(text(
+                "ALTER TABLE projects ADD COLUMN default_concurrency INTEGER DEFAULT 3"
+            ))
+            conn.commit()
+            logger.info("Migrated projects table: added default_concurrency column")
+
+
 @contextmanager
 def _get_session():
    """
@@ -307,7 +358,8 @@ def list_registered_projects() -> dict[str, dict[str, Any]]:
        return {
            p.name: {
                "path": p.path,
-                "created_at": p.created_at.isoformat() if p.created_at else None
+                "created_at": p.created_at.isoformat() if p.created_at else None,
+                "default_concurrency": getattr(p, 'default_concurrency', 3) or 3
            }
            for p in projects
        }
@@ -333,7 +385,8 @@ def get_project_info(name: str) -> dict[str, Any] | None:
            return None
        return {
            "path": project.path,
-            "created_at": project.created_at.isoformat() if project.created_at else None
+            "created_at": project.created_at.isoformat() if project.created_at else None,
+            "default_concurrency": getattr(project, 'default_concurrency', 3) or 3
        }
    finally:
        session.close()
@@ -362,6 +415,55 @@ def update_project_path(name: str, new_path: Path) -> bool:
    return True


+def get_project_concurrency(name: str) -> int:
+    """
+    Get project's default concurrency (1-5).
+
+    Args:
+        name: The project name.
+
+    Returns:
+        The default concurrency value (defaults to 3 if not set or project not found).
+    """
+    _, SessionLocal = _get_engine()
+    session = SessionLocal()
+    try:
+        project = session.query(Project).filter(Project.name == name).first()
+        if project is None:
+            return 3
+        return getattr(project, 'default_concurrency', 3) or 3
+    finally:
+        session.close()
+
+
+def set_project_concurrency(name: str, concurrency: int) -> bool:
+    """
+    Set project's default concurrency (1-5).
+
+    Args:
+        name: The project name.
+        concurrency: The concurrency value (1-5).
+
+    Returns:
+        True if updated, False if project wasn't found.
+
+    Raises:
+        ValueError: If concurrency is not between 1 and 5.
+    """
+    if concurrency < 1 or concurrency > 5:
+        raise ValueError("concurrency must be between 1 and 5")
+
+    with _get_session() as session:
+        project = session.query(Project).filter(Project.name == name).first()
+        if not project:
+            return False
+
+        project.default_concurrency = concurrency
+
+    logger.info("Set project '%s' default_concurrency to %d", name, concurrency)
+    return True
+
+
 # =============================================================================
 # Validation Functions
 # =============================================================================
@@ -502,6 +604,9 @@ def get_all_settings() -> dict[str, str]:
    """
    Get all settings as a dictionary.

+    Automatically migrates legacy model IDs (e.g. claude-opus-4-5-20251101 -> claude-opus-4-6)
+    on first read after upgrade. This is a one-time silent migration.
+
    Returns:
        Dictionary mapping setting keys to values.
    """
@@ -510,9 +615,171 @@ def get_all_settings() -> dict[str, str]:
        session = SessionLocal()
        try:
            settings = session.query(Settings).all()
-            return {s.key: s.value for s in settings}
+            result = {s.key: s.value for s in settings}
+
+            # Auto-migrate legacy model IDs
+            migrated = False
+            for key in ("model", "api_model"):
+                old_id = result.get(key)
+                if old_id and old_id in LEGACY_MODEL_MAP:
+                    new_id = LEGACY_MODEL_MAP[old_id]
+                    setting = session.query(Settings).filter(Settings.key == key).first()
+                    if setting:
+                        setting.value = new_id
+                        setting.updated_at = datetime.now()
+                        result[key] = new_id
+                        migrated = True
+                        logger.info("Migrated setting '%s': %s -> %s", key, old_id, new_id)
+
+            if migrated:
+                session.commit()
+
+            return result
        finally:
            session.close()
    except Exception as e:
        logger.warning("Failed to read settings: %s", e)
        return {}
+
+
+# =============================================================================
+# API Provider Definitions
+# =============================================================================
+
+API_PROVIDERS: dict[str, dict[str, Any]] = {
+    "claude": {
+        "name": "Claude (Anthropic)",
+        "base_url": None,
+        "requires_auth": False,
+        "models": [
+            {"id": "claude-opus-4-6", "name": "Claude Opus"},
+            {"id": "claude-sonnet-4-5-20250929", "name": "Claude Sonnet"},
+        ],
+        "default_model": "claude-opus-4-6",
+    },
+    "kimi": {
+        "name": "Kimi K2.5 (Moonshot)",
+        "base_url": "https://api.kimi.com/coding/",
+        "requires_auth": True,
+        "auth_env_var": "ANTHROPIC_API_KEY",
+        "models": [{"id": "kimi-k2.5", "name": "Kimi K2.5"}],
+        "default_model": "kimi-k2.5",
+    },
+    "glm": {
+        "name": "GLM (Zhipu AI)",
+        "base_url": "https://api.z.ai/api/anthropic",
+        "requires_auth": True,
+        "auth_env_var": "ANTHROPIC_AUTH_TOKEN",
+        "models": [
+            {"id": "glm-4.7", "name": "GLM 4.7"},
+            {"id": "glm-4.5-air", "name": "GLM 4.5 Air"},
+        ],
+        "default_model": "glm-4.7",
+    },
+    "azure": {
+        "name": "Azure Anthropic (Claude)",
+        "base_url": "",
+        "requires_auth": True,
+        "auth_env_var": "ANTHROPIC_API_KEY",
+        "models": [
+            {"id": "claude-opus-4-6", "name": "Claude Opus"},
+            {"id": "claude-sonnet-4-5", "name": "Claude Sonnet"},
+            {"id": "claude-haiku-4-5", "name": "Claude Haiku"},
+        ],
+        "default_model": "claude-opus-4-6",
+    },
+    "ollama": {
+        "name": "Ollama (Local)",
+        "base_url": "http://localhost:11434",
+        "requires_auth": False,
+        "models": [
+            {"id": "qwen3-coder", "name": "Qwen3 Coder"},
+            {"id": "deepseek-coder-v2", "name": "DeepSeek Coder V2"},
+        ],
+        "default_model": "qwen3-coder",
+    },
+    "custom": {
+        "name": "Custom Provider",
+        "base_url": "",
+        "requires_auth": True,
+        "auth_env_var": "ANTHROPIC_AUTH_TOKEN",
+        "models": [],
+        "default_model": "",
+    },
+}
+
+
+def get_effective_sdk_env() -> dict[str, str]:
+    """Build environment variable dict for Claude SDK based on current API provider settings.
+
+    When api_provider is "claude" (or unset), falls back to existing env vars (current behavior).
+    For other providers, builds env dict from stored settings (api_base_url, api_auth_token, api_model).
+
+    Returns:
+        Dict ready to merge into subprocess env or pass to SDK.
+    """
+    all_settings = get_all_settings()
+    provider_id = all_settings.get("api_provider", "claude")
+
+    if provider_id == "claude":
+        # Default behavior: forward existing env vars
+        from env_constants import API_ENV_VARS
+        sdk_env: dict[str, str] = {}
+        for var in API_ENV_VARS:
+            value = os.getenv(var)
+            if value:
+                sdk_env[var] = value
+        return sdk_env
+
+    # Alternative provider: build env from settings
+    provider = API_PROVIDERS.get(provider_id)
+    if not provider:
+        logger.warning("Unknown API provider '%s', falling back to claude", provider_id)
+        from env_constants import API_ENV_VARS
+        sdk_env = {}
+        for var in API_ENV_VARS:
+            value = os.getenv(var)
+            if value:
+                sdk_env[var] = value
+        return sdk_env
+
+    sdk_env = {}
+
+    # Explicitly clear credentials that could leak from the server process env.
+    # For providers using ANTHROPIC_AUTH_TOKEN (GLM, Custom), clear ANTHROPIC_API_KEY.
+    # For providers using ANTHROPIC_API_KEY (Kimi, Azure), clear ANTHROPIC_AUTH_TOKEN.
+    # This prevents the Claude CLI from using the wrong credentials.
+    auth_env_var = provider.get("auth_env_var", "ANTHROPIC_AUTH_TOKEN")
+    if auth_env_var == "ANTHROPIC_AUTH_TOKEN":
+        sdk_env["ANTHROPIC_API_KEY"] = ""
+    elif auth_env_var == "ANTHROPIC_API_KEY":
+        sdk_env["ANTHROPIC_AUTH_TOKEN"] = ""
+
+    # Clear Vertex AI vars when using non-Vertex alternative providers
+    sdk_env["CLAUDE_CODE_USE_VERTEX"] = ""
+    sdk_env["CLOUD_ML_REGION"] = ""
+    sdk_env["ANTHROPIC_VERTEX_PROJECT_ID"] = ""
+
+    # Base URL
+    base_url = all_settings.get("api_base_url") or provider.get("base_url")
+    if base_url:
+        sdk_env["ANTHROPIC_BASE_URL"] = base_url
+
+    # Auth token
+    auth_token = all_settings.get("api_auth_token")
+    if auth_token:
+        sdk_env[auth_env_var] = auth_token
+
+    # Model - set all three tier overrides to the same model
+    model = all_settings.get("api_model") or provider.get("default_model")
+    if model:
+        sdk_env["ANTHROPIC_DEFAULT_OPUS_MODEL"] = model
+        sdk_env["ANTHROPIC_DEFAULT_SONNET_MODEL"] = model
+        sdk_env["ANTHROPIC_DEFAULT_HAIKU_MODEL"] = model
+
+    # Timeout
+    timeout = all_settings.get("api_timeout_ms")
+    if timeout:
+        sdk_env["API_TIMEOUT_MS"] = timeout
+
+    return sdk_env
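
Review note: the alternative-provider branch of `get_effective_sdk_env()` is easier to reason about as a pure function of a provider entry and the stored settings. The sketch below mirrors that branch under simplifying assumptions: `build_provider_env` and `GLM_PROVIDER` are hypothetical names, the token is a placeholder, and the Vertex-clearing and timeout steps are left out for brevity.

```python
from typing import Any

# Trimmed copy of the "glm" entry from API_PROVIDERS.
GLM_PROVIDER: dict[str, Any] = {
    "base_url": "https://api.z.ai/api/anthropic",
    "auth_env_var": "ANTHROPIC_AUTH_TOKEN",
    "default_model": "glm-4.7",
}


def build_provider_env(provider: dict[str, Any], settings: dict[str, str]) -> dict[str, str]:
    """Hypothetical pure-function version of the non-Claude branch."""
    env: dict[str, str] = {}

    # Blank out the credential variable this provider does not use.
    auth_var = provider.get("auth_env_var", "ANTHROPIC_AUTH_TOKEN")
    if auth_var == "ANTHROPIC_AUTH_TOKEN":
        env["ANTHROPIC_API_KEY"] = ""
    elif auth_var == "ANTHROPIC_API_KEY":
        env["ANTHROPIC_AUTH_TOKEN"] = ""

    # Stored settings take precedence over the provider defaults.
    base_url = settings.get("api_base_url") or provider.get("base_url")
    if base_url:
        env["ANTHROPIC_BASE_URL"] = base_url

    token = settings.get("api_auth_token")
    if token:
        env[auth_var] = token

    # All three model tiers point at the same model.
    model = settings.get("api_model") or provider.get("default_model")
    if model:
        for tier in ("OPUS", "SONNET", "HAIKU"):
            env[f"ANTHROPIC_DEFAULT_{tier}_MODEL"] = model

    return env


env = build_provider_env(GLM_PROVIDER, {"api_auth_token": "placeholder-token"})
```

Setting the unused credential to an empty string, rather than omitting it, is what keeps a value from the server's own environment from being inherited when the dict is merged into a subprocess environment.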
--- a/requirements-prod.txt
+++ b/requirements-prod.txt
@@ -0,0 +1,14 @@
+# Production runtime dependencies only
+# For development, use requirements.txt (includes ruff, mypy, pytest)
+claude-agent-sdk>=0.1.0,<0.2.0
+python-dotenv>=1.0.0
+sqlalchemy>=2.0.0
+fastapi>=0.115.0
+uvicorn[standard]>=0.32.0
+websockets>=13.0
+python-multipart>=0.0.17
+psutil>=6.0.0
+aiofiles>=24.0.0
+apscheduler>=3.10.0,<4.0.0
+pywinpty>=2.0.0; sys_platform == "win32"
+pyyaml>=6.0.0
--- a/requirements.txt
+++ b/requirements.txt
@@ -15,3 +15,4 @@ pyyaml>=6.0.0
 ruff>=0.8.0
 mypy>=1.13.0
 pytest>=8.0.0
+types-PyYAML>=6.0.0
--- a/security.py
+++ b/security.py
@@ -6,13 +6,22 @@ Pre-tool-use hooks that validate bash commands for security.
 Uses an allowlist approach - only explicitly permitted commands can run.
 """

+import logging
 import os
+import re
 import shlex
 from pathlib import Path
 from typing import Optional

 import yaml

+# Logger for security-related events (fallback parsing, validation failures, etc.)
+logger = logging.getLogger(__name__)
+
+# Regex pattern for valid pkill process names (no regex metacharacters allowed)
+# Matches alphanumeric names with dots, underscores, and hyphens
+VALID_PROCESS_NAME_PATTERN = re.compile(r"^[A-Za-z0-9._-]+$")
+
 # Allowed commands for development tasks
 # Minimal set needed for the autonomous coding demo
 ALLOWED_COMMANDS = {
@@ -57,10 +66,12 @@ ALLOWED_COMMANDS = {
    "bash",
    # Script execution
    "init.sh",  # Init scripts; validated separately
+    # Browser automation
+    "playwright-cli",  # Playwright CLI for browser testing; validated separately
 }

 # Commands that need additional validation even when in the allowlist
-COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh"}
+COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh", "playwright-cli"}

 # Commands that are NEVER allowed, even with user approval
 # These commands can cause permanent system damage or security breaches
@@ -88,6 +99,31 @@ BLOCKED_COMMANDS = {
    "ufw",
 }

+# Sensitive directories and credential files (relative to home) that should never be exposed.
+# Used by both the EXTRA_READ_PATHS validator (client.py) and the filesystem
+# browser API (server/routers/filesystem.py) to block credential/key directories.
+# This is the single source of truth -- import from here in both places.
+#
+# SENSITIVE_DIRECTORIES is the union of the previous filesystem browser blocklist
+# (filesystem.py) and the previous EXTRA_READ_PATHS blocklist (client.py).
+# Some entries are new to each consumer -- this is intentional for defense-in-depth.
+SENSITIVE_DIRECTORIES = {
+    ".ssh",
+    ".aws",
+    ".azure",
+    ".kube",
+    ".gnupg",
+    ".gpg",
+    ".password-store",
+    ".docker",
+    ".config/gcloud",
+    ".config/gh",
+    ".npmrc",
+    ".pypirc",
+    ".netrc",
+    ".terraform",
+}
+
 # Commands that trigger emphatic warnings but CAN be approved (Phase 3)
 # For now, these are blocked like BLOCKED_COMMANDS until Phase 3 implements approval
 DANGEROUS_COMMANDS = {
@@ -135,6 +171,45 @@ def split_command_segments(command_string: str) -> list[str]:
    return result


+def _extract_primary_command(segment: str) -> Optional[str]:
+    """
+    Fallback command extraction when shlex fails.
+
+    Extracts the first word that looks like a command, handling cases
+    like complex docker exec commands with nested quotes.
+
+    Args:
+        segment: The command segment to parse
+
+    Returns:
+        The primary command name, or None if extraction fails
+    """
+    # Remove leading whitespace
+    segment = segment.lstrip()
+
+    if not segment:
+        return None
+
+    # Skip env var assignments at start (VAR=value cmd)
+    words = segment.split()
+    while words and "=" in words[0] and not words[0].startswith("="):
+        words = words[1:]
+
+    if not words:
+        return None
+
+    # Extract first token (the command)
+    first_word = words[0]
+
+    # Match valid command characters (alphanumeric, dots, underscores, hyphens, slashes)
+    match = re.match(r"^([a-zA-Z0-9_./-]+)", first_word)
+    if match:
+        cmd = match.group(1)
+        return os.path.basename(cmd)
+
+    return None
+
+
 def extract_commands(command_string: str) -> list[str]:
    """
    Extract command names from a shell command string.
@@ -151,7 +226,6 @@ def extract_commands(command_string: str) -> list[str]:
    commands = []

    # shlex doesn't treat ; as a separator, so we need to pre-process
-    import re

    # Split on semicolons that aren't inside quotes (simple heuristic)
    # This handles common cases like "echo hello; ls"
@@ -166,8 +240,21 @@ def extract_commands(command_string: str) -> list[str]:
            tokens = shlex.split(segment)
        except ValueError:
            # Malformed command (unclosed quotes, etc.)
-            # Return empty to trigger block (fail-safe)
-            return []
+            # Try fallback extraction instead of blocking entirely
+            fallback_cmd = _extract_primary_command(segment)
+            if fallback_cmd:
+                logger.debug(
+                    "shlex fallback used: segment=%r -> command=%r",
+                    segment,
+                    fallback_cmd,
+                )
+                commands.append(fallback_cmd)
+            else:
+                logger.debug(
+                    "shlex fallback failed: segment=%r (no command extracted)",
+                    segment,
+                )
+            continue

        if not tokens:
            continue
@@ -219,23 +306,37 @@ def extract_commands(command_string: str) -> list[str]:
    return commands


-def validate_pkill_command(command_string: str) -> tuple[bool, str]:
+# Default pkill process names (hardcoded baseline, always available)
+DEFAULT_PKILL_PROCESSES = {
+    "node",
+    "npm",
+    "npx",
+    "vite",
+    "next",
+}
+
+
+def validate_pkill_command(
+    command_string: str,
+    extra_processes: Optional[set[str]] = None
+) -> tuple[bool, str]:
    """
    Validate pkill commands - only allow killing dev-related processes.

    Uses shlex to parse the command, avoiding regex bypass vulnerabilities.

+    Args:
+        command_string: The pkill command to validate
+        extra_processes: Optional set of additional process names to allow
+                        (from org/project config pkill_processes)
+
    Returns:
        Tuple of (is_allowed, reason_if_blocked)
    """
-    # Allowed process names for pkill
-    allowed_process_names = {
-        "node",
-        "npm",
-        "npx",
-        "vite",
-        "next",
-    }
+    # Merge default processes with any extra configured processes
+    allowed_process_names = DEFAULT_PKILL_PROCESSES.copy()
+    if extra_processes:
+        allowed_process_names |= extra_processes

    try:
        tokens = shlex.split(command_string)
@@ -254,17 +355,19 @@ def validate_pkill_command(command_string: str) -> tuple[bool, str]:
    if not args:
        return False, "pkill requires a process name"

-    # The target is typically the last non-flag argument
-    target = args[-1]
+    # Validate every non-flag argument (pkill accepts multiple patterns on BSD)
+    # This defensively ensures no disallowed process can be targeted
+    targets = []
+    for arg in args:
+        # For -f flag (full command line match), take the first word as process name
+        # e.g., "pkill -f 'node server.js'" -> target is "node server.js", process is "node"
+        t = (arg.split() or [arg])[0] if " " in arg else arg
+        targets.append(t)

-    # For -f flag (full command line match), extract the first word as process name
-    # e.g., "pkill -f 'node server.js'" -> target is "node server.js", process is "node"
-    if " " in target:
-        target = target.split()[0]
-
-    if target in allowed_process_names:
+    disallowed = [t for t in targets if t not in allowed_process_names]
+    if not disallowed:
        return True, ""
-    return False, f"pkill only allowed for dev processes: {allowed_process_names}"
+    return False, f"pkill only allowed for processes: {sorted(allowed_process_names)}"


 def validate_chmod_command(command_string: str) -> tuple[bool, str]:
@@ -337,22 +440,35 @@ def validate_init_script(command_string: str) -> tuple[bool, str]:
    return False, f"Only ./init.sh is allowed, got: {script}"


-def get_command_for_validation(cmd: str, segments: list[str]) -> str:
+def validate_playwright_command(command_string: str) -> tuple[bool, str]:
    """
-    Find the specific command segment that contains the given command.
+    Validate playwright-cli commands - block dangerous subcommands.

-    Args:
-        cmd: The command name to find
-        segments: List of command segments
+    Blocks `run-code` (arbitrary Node.js execution) and `eval` (arbitrary JS
+    evaluation) which bypass the security sandbox.

    Returns:
-        The segment containing the command, or empty string if not found
+        Tuple of (is_allowed, reason_if_blocked)
    """
-    for segment in segments:
-        segment_commands = extract_commands(segment)
-        if cmd in segment_commands:
-            return segment
-    return ""
+    try:
+        tokens = shlex.split(command_string)
+    except ValueError:
+        return False, "Could not parse playwright-cli command"
+
+    if not tokens:
+        return False, "Empty command"
+
+    BLOCKED_SUBCOMMANDS = {"run-code", "eval"}
+
+    # Find the subcommand: first non-flag token after 'playwright-cli'
+    for token in tokens[1:]:
+        if token.startswith("-"):
+            continue  # skip flags like -s=agent-1
+        if token in BLOCKED_SUBCOMMANDS:
+            return False, f"playwright-cli '{token}' is not allowed"
+        break  # first non-flag token is the subcommand
+
+    return True, ""


 def matches_pattern(command: str, pattern: str) -> bool:
@@ -396,19 +512,97 @@ def matches_pattern(command: str, pattern: str) -> bool:
    return False


+def _validate_command_list(commands: list, config_path: Path, field_name: str) -> bool:
+    """
+    Validate a list of command entries from a YAML config.
+
+    Each entry must be a dict with a non-empty string 'name' field.
+    Used by both load_org_config() and load_project_commands() to avoid
+    duplicating the same validation logic.
+
+    Args:
+        commands: List of command entries to validate
+        config_path: Path to the config file (for log messages)
+        field_name: Name of the YAML field being validated (e.g., 'allowed_commands', 'commands')
+
+    Returns:
+        True if all entries are valid, False otherwise
+    """
+    if not isinstance(commands, list):
+        logger.warning(f"Config at {config_path}: '{field_name}' must be a list")
+        return False
+    for i, cmd in enumerate(commands):
+        if not isinstance(cmd, dict):
+            logger.warning(f"Config at {config_path}: {field_name}[{i}] must be a dict")
+            return False
+        if "name" not in cmd:
+            logger.warning(f"Config at {config_path}: {field_name}[{i}] missing 'name'")
+            return False
+        if not isinstance(cmd["name"], str) or cmd["name"].strip() == "":
+            logger.warning(f"Config at {config_path}: {field_name}[{i}] has invalid 'name'")
+            return False
+    return True
+
+
+def _validate_pkill_processes(config: dict, config_path: Path) -> Optional[list[str]]:
+    """
+    Validate and normalize pkill_processes from a YAML config.
+
+    Each entry must be a non-empty string matching VALID_PROCESS_NAME_PATTERN
+    (alphanumeric, dots, underscores, hyphens only -- no regex metacharacters other than '.').
+    Used by both load_org_config() and load_project_commands().
+
+    Args:
+        config: Parsed YAML config dict that may contain 'pkill_processes'
+        config_path: Path to the config file (for log messages)
+
+    Returns:
+        Normalized list of process names, or None if validation fails.
+        Returns an empty list if 'pkill_processes' is not present.
+    """
+    if "pkill_processes" not in config:
+        return []
+
+    processes = config["pkill_processes"]
+    if not isinstance(processes, list):
+        logger.warning(f"Config at {config_path}: 'pkill_processes' must be a list")
+        return None
+
+    normalized = []
+    for i, proc in enumerate(processes):
+        if not isinstance(proc, str):
+            logger.warning(f"Config at {config_path}: pkill_processes[{i}] must be a string")
+            return None
+        proc = proc.strip()
+        if not proc or not VALID_PROCESS_NAME_PATTERN.fullmatch(proc):
+            logger.warning(f"Config at {config_path}: pkill_processes[{i}] has invalid value '{proc}'")
+            return None
+        normalized.append(proc)
+    return normalized
+
+
 def get_org_config_path() -> Path:
    """
    Get the organization-level config file path.

    Returns:
-        Path to ~/.autocoder/config.yaml
+        Path to ~/.autoforge/config.yaml (falls back to ~/.autocoder/config.yaml)
    """
-    return Path.home() / ".autocoder" / "config.yaml"
+    new_path = Path.home() / ".autoforge" / "config.yaml"
+    if new_path.exists():
+        return new_path
+    # Backward compatibility: check old location
+    old_path = Path.home() / ".autocoder" / "config.yaml"
+    if old_path.exists():
+        return old_path
+    return new_path


 def load_org_config() -> Optional[dict]:
    """
-    Load organization-level config from ~/.autocoder/config.yaml.
+    Load organization-level config from ~/.autoforge/config.yaml.
+
+    Falls back to ~/.autocoder/config.yaml for backward compatibility.

    Returns:
        Dict with parsed org config, or None if file doesn't exist or is invalid
@@ -423,41 +617,48 @@ def load_org_config() -> Optional[dict]:
            config = yaml.safe_load(f)

        if not config:
+            logger.warning(f"Org config at {config_path} is empty")
            return None

        # Validate structure
        if not isinstance(config, dict):
+            logger.warning(f"Org config at {config_path} must be a YAML dictionary")
            return None

        if "version" not in config:
+            logger.warning(f"Org config at {config_path} missing required 'version' field")
            return None

        # Validate allowed_commands if present
        if "allowed_commands" in config:
-            allowed = config["allowed_commands"]
-            if not isinstance(allowed, list):
+            if not _validate_command_list(config["allowed_commands"], config_path, "allowed_commands"):
                return None
-            for cmd in allowed:
-                if not isinstance(cmd, dict):
-                    return None
-                if "name" not in cmd:
-                    return None
-                # Validate that name is a non-empty string
-                if not isinstance(cmd["name"], str) or cmd["name"].strip() == "":
-                    return None

        # Validate blocked_commands if present
        if "blocked_commands" in config:
            blocked = config["blocked_commands"]
            if not isinstance(blocked, list):
+                logger.warning(f"Org config at {config_path}: 'blocked_commands' must be a list")
                return None
-            for cmd in blocked:
+            for i, cmd in enumerate(blocked):
                if not isinstance(cmd, str):
+                    logger.warning(f"Org config at {config_path}: blocked_commands[{i}] must be a string")
                    return None

+        # Validate pkill_processes if present
+        normalized = _validate_pkill_processes(config, config_path)
+        if normalized is None:
+            return None
+        if normalized:
+            config["pkill_processes"] = normalized
+
        return config

-    except (yaml.YAMLError, IOError, OSError):
+    except yaml.YAMLError as e:
+        logger.warning(f"Failed to parse org config at {config_path}: {e}")
+        return None
+    except (IOError, OSError) as e:
+        logger.warning(f"Failed to read org config at {config_path}: {e}")
        return None


@@ -471,7 +672,10 @@ def load_project_commands(project_dir: Path) -> Optional[dict]:
    Returns:
        Dict with parsed YAML config, or None if file doesn't exist or is invalid
    """
-    config_path = project_dir / ".autocoder" / "allowed_commands.yaml"
+    # Check new location first, fall back to old for backward compatibility
+    config_path = project_dir.resolve() / ".autoforge" / "allowed_commands.yaml"
+    if not config_path.exists():
+        config_path = project_dir.resolve() / ".autocoder" / "allowed_commands.yaml"

    if not config_path.exists():
        return None
@@ -481,36 +685,43 @@ def load_project_commands(project_dir: Path) -> Optional[dict]:
            config = yaml.safe_load(f)

        if not config:
+            logger.warning(f"Project config at {config_path} is empty")
            return None

        # Validate structure
        if not isinstance(config, dict):
+            logger.warning(f"Project config at {config_path} must be a YAML dictionary")
            return None

        if "version" not in config:
+            logger.warning(f"Project config at {config_path} missing required 'version' field")
            return None

        commands = config.get("commands", [])
-        if not isinstance(commands, list):
-            return None

        # Enforce 100 command limit
-        if len(commands) > 100:
+        if isinstance(commands, list) and len(commands) > 100:
+            logger.warning(f"Project config at {config_path} exceeds 100 command limit ({len(commands)} commands)")
            return None

-        # Validate each command entry
-        for cmd in commands:
-            if not isinstance(cmd, dict):
-                return None
-            if "name" not in cmd:
-                return None
-            # Validate name is a string
-            if not isinstance(cmd["name"], str):
-                return None
+        # Validate each command entry using shared helper
+        if not _validate_command_list(commands, config_path, "commands"):
+            return None
+
+        # Validate pkill_processes if present
+        normalized = _validate_pkill_processes(config, config_path)
+        if normalized is None:
+            return None
+        if normalized:
+            config["pkill_processes"] = normalized

        return config

-    except (yaml.YAMLError, IOError, OSError):
+    except yaml.YAMLError as e:
+        logger.warning(f"Failed to parse project config at {config_path}: {e}")
+        return None
+    except (IOError, OSError) as e:
+        logger.warning(f"Failed to read project config at {config_path}: {e}")
        return None


@@ -518,8 +729,12 @@ def validate_project_command(cmd_config: dict) -> tuple[bool, str]:
    """
    Validate a single command entry from project config.

+    Checks that the command has a valid name and is not in any blocklist.
+    Called during hierarchy resolution to gate each project command before
+    it is added to the effective allowed set.
+
    Args:
-        cmd_config: Dict with command configuration (name, description, args)
+        cmd_config: Dict with command configuration (name, description)

    Returns:
        Tuple of (is_valid, error_message)
@@ -549,15 +764,6 @@ def validate_project_command(cmd_config: dict) -> tuple[bool, str]:
    if "description" in cmd_config and not isinstance(cmd_config["description"], str):
        return False, "Description must be a string"

-    # Args validation (Phase 1 - just check structure)
-    if "args" in cmd_config:
-        args = cmd_config["args"]
-        if not isinstance(args, list):
-            return False, "Args must be a list"
-        for arg in args:
-            if not isinstance(arg, str):
-                return False, "Each arg must be a string"
-
    return True, ""


@@ -628,6 +834,42 @@ def get_project_allowed_commands(project_dir: Optional[Path]) -> set[str]:
    return allowed


+def get_effective_pkill_processes(project_dir: Optional[Path]) -> set[str]:
+    """
+    Get effective pkill process names after hierarchy resolution.
+
+    Merges processes from:
+    1. DEFAULT_PKILL_PROCESSES (hardcoded baseline)
+    2. Org config pkill_processes
+    3. Project config pkill_processes
+
+    Args:
+        project_dir: Path to the project directory, or None
+
+    Returns:
+        Set of allowed process names for pkill
+    """
+    # Start with default processes
+    processes = DEFAULT_PKILL_PROCESSES.copy()
+
+    # Add org-level pkill_processes
+    org_config = load_org_config()
+    if org_config:
+        org_processes = org_config.get("pkill_processes", [])
+        if isinstance(org_processes, list):
+            processes |= {p for p in org_processes if isinstance(p, str) and p.strip()}
+
+    # Add project-level pkill_processes
+    if project_dir:
+        project_config = load_project_commands(project_dir)
+        if project_config:
+            proj_processes = project_config.get("pkill_processes", [])
+            if isinstance(proj_processes, list):
+                processes |= {p for p in proj_processes if isinstance(p, str) and p.strip()}
+
+    return processes
+
+
 def is_command_allowed(command: str, allowed_commands: set[str]) -> bool:
    """
    Check if a command is allowed (supports patterns).
@@ -692,6 +934,9 @@ async def bash_security_hook(input_data, tool_use_id=None, context=None):
    # Get effective commands using hierarchy resolution
    allowed_commands, blocked_commands = get_effective_commands(project_dir)

+    # Get effective pkill processes (includes org/project config)
+    pkill_processes = get_effective_pkill_processes(project_dir)
+
    # Split into segments for per-command validation
    segments = split_command_segments(command)

@@ -709,7 +954,7 @@ async def bash_security_hook(input_data, tool_use_id=None, context=None):
            # Provide helpful error message with config hint
            error_msg = f"Command '{cmd}' is not allowed.\n"
            error_msg += "To allow this command:\n"
-            error_msg += "  1. Add to .autocoder/allowed_commands.yaml for this project, OR\n"
+            error_msg += "  1. Add to .autoforge/allowed_commands.yaml for this project, OR\n"
            error_msg += "  2. Request mid-session approval (the agent can ask)\n"
            error_msg += "Note: Some commands are blocked at org-level and cannot be overridden."
            return {
@@ -719,13 +964,20 @@ async def bash_security_hook(input_data, tool_use_id=None, context=None):

        # Additional validation for sensitive commands
        if cmd in COMMANDS_NEEDING_EXTRA_VALIDATION:
-            # Find the specific segment containing this command
-            cmd_segment = get_command_for_validation(cmd, segments)
+            # Find the specific segment containing this command by searching
+            # each segment's extracted commands for a match
+            cmd_segment = ""
+            for segment in segments:
+                if cmd in extract_commands(segment):
+                    cmd_segment = segment
+                    break
            if not cmd_segment:
                cmd_segment = command  # Fallback to full command

            if cmd == "pkill":
-                allowed, reason = validate_pkill_command(cmd_segment)
+                # Pass configured extra processes (beyond defaults)
+                extra_procs = pkill_processes - DEFAULT_PKILL_PROCESSES
+                allowed, reason = validate_pkill_command(cmd_segment, extra_procs if extra_procs else None)
                if not allowed:
                    return {"decision": "block", "reason": reason}
            elif cmd == "chmod":
@@ -736,5 +988,9 @@ async def bash_security_hook(input_data, tool_use_id=None, context=None):
                allowed, reason = validate_init_script(cmd_segment)
                if not allowed:
                    return {"decision": "block", "reason": reason}
+            elif cmd == "playwright-cli":
+                allowed, reason = validate_playwright_command(cmd_segment)
+                if not allowed:
+                    return {"decision": "block", "reason": reason}

    return {}
--- a/server/main.py
+++ b/server/main.py
@@ -7,6 +7,7 @@ Provides REST API, WebSocket, and static file serving.
 """

 import asyncio
+import logging
 import os
 import shutil
 import sys
@@ -42,6 +43,7 @@ from .routers import (
 )
 from .schemas import SetupStatus
 from .services.assistant_chat_session import cleanup_all_sessions as cleanup_assistant_sessions
+from .services.chat_constants import ROOT_DIR
 from .services.dev_server_manager import (
    cleanup_all_devservers,
    cleanup_orphaned_devserver_locks,
@@ -53,13 +55,23 @@ from .services.terminal_manager import cleanup_all_terminals
 from .websocket import project_websocket

 # Paths
-ROOT_DIR = Path(__file__).parent.parent
 UI_DIST_DIR = ROOT_DIR / "ui" / "dist"


@asynccontextmanager
 async def lifespan(app: FastAPI):
    """Lifespan context manager for startup and shutdown."""
+    # Startup - clean up stale temp files (Playwright profiles, .node cache, etc.)
+    try:
+        from temp_cleanup import cleanup_stale_temp
+        stats = cleanup_stale_temp()
+        if stats["dirs_deleted"] > 0 or stats["files_deleted"] > 0:
+            mb_freed = stats["bytes_freed"] / (1024 * 1024)
+            logger.info("Startup temp cleanup: %d dirs, %d files, %.1f MB freed",
+                        stats["dirs_deleted"], stats["files_deleted"], mb_freed)
+    except Exception as e:
+        logger.warning("Startup temp cleanup failed (non-fatal): %s", e)
+
    # Startup - clean up orphaned lock files from previous runs
    cleanup_orphaned_locks()
    cleanup_orphaned_devserver_locks()
@@ -88,35 +100,58 @@ app = FastAPI(
    lifespan=lifespan,
 )

-# CORS - allow only localhost origins for security
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=[
-        "http://localhost:5173",      # Vite dev server
-        "http://127.0.0.1:5173",
-        "http://localhost:8888",      # Production
-        "http://127.0.0.1:8888",
-    ],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
+# Module logger
+logger = logging.getLogger(__name__)
+
+# Check if remote access is enabled via environment variable
+# Set by start_ui.py when --host is not 127.0.0.1
+ALLOW_REMOTE = os.environ.get("AUTOFORGE_ALLOW_REMOTE", "").lower() in ("1", "true", "yes")
+
+if ALLOW_REMOTE:
+    logger.warning(
+        "ALLOW_REMOTE is enabled. Terminal WebSocket is exposed without sandboxing. "
+        "Only use this in trusted network environments."
+    )
+
+# CORS - allow all origins when remote access is enabled, otherwise localhost only
+if ALLOW_REMOTE:
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=["*"],  # Allow all origins for remote access
+        allow_credentials=True,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )
+else:
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=[
+            "http://localhost:5173",      # Vite dev server
+            "http://127.0.0.1:5173",
+            "http://localhost:8888",      # Production
+            "http://127.0.0.1:8888",
+        ],
+        allow_credentials=True,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )


 # ============================================================================
 # Security Middleware
 # ============================================================================

-@app.middleware("http")
-async def require_localhost(request: Request, call_next):
-    """Only allow requests from localhost."""
-    client_host = request.client.host if request.client else None
+if not ALLOW_REMOTE:
+    @app.middleware("http")
+    async def require_localhost(request: Request, call_next):
+        """Only allow requests from localhost (disabled when AUTOFORGE_ALLOW_REMOTE=1)."""
+        client_host = request.client.host if request.client else None

-    # Allow localhost connections
-    if client_host not in ("127.0.0.1", "::1", "localhost", None):
-        raise HTTPException(status_code=403, detail="Localhost access only")
+        # Allow localhost connections
+        if client_host not in ("127.0.0.1", "::1", "localhost", None):
+            raise HTTPException(status_code=403, detail="Localhost access only")

-    return await call_next(request)
+        return await call_next(request)


 # ============================================================================
@@ -208,7 +243,14 @@ if UI_DIST_DIR.exists():
            raise HTTPException(status_code=404)

        # Try to serve the file directly
-        file_path = UI_DIST_DIR / path
+        file_path = (UI_DIST_DIR / path).resolve()
+
+        # Ensure resolved path is within UI_DIST_DIR (prevent path traversal)
+        try:
+            file_path.relative_to(UI_DIST_DIR.resolve())
+        except ValueError:
+            raise HTTPException(status_code=404)
+
        if file_path.exists() and file_path.is_file():
            return FileResponse(file_path)

--- a/server/routers/agent.py
+++ b/server/routers/agent.py
@@ -6,31 +6,22 @@ API endpoints for agent control (start/stop/pause/resume).
 Uses project registry for path lookups.
 """

-import re
 from pathlib import Path

 from fastapi import APIRouter, HTTPException

 from ..schemas import AgentActionResponse, AgentStartRequest, AgentStatus
+from ..services.chat_constants import ROOT_DIR
 from ..services.process_manager import get_manager
+from ..utils.project_helpers import get_project_path as _get_project_path
+from ..utils.validation import validate_project_name


-def _get_project_path(project_name: str) -> Path:
-    """Get project path from registry."""
-    import sys
-    root = Path(__file__).parent.parent.parent
-    if str(root) not in sys.path:
-        sys.path.insert(0, str(root))
-
-    from registry import get_project_path
-    return get_project_path(project_name)
-
-
-def _get_settings_defaults() -> tuple[bool, str, int]:
+def _get_settings_defaults() -> tuple[bool, str, int, bool, int]:
    """Get defaults from global settings.

    Returns:
-        Tuple of (yolo_mode, model, testing_agent_ratio)
+        Tuple of (yolo_mode, model, testing_agent_ratio, playwright_headless, batch_size)
    """
    import sys
    root = Path(__file__).parent.parent.parent
@@ -41,7 +32,7 @@ def _get_settings_defaults() -> tuple[bool, str, int]:

    settings = get_all_settings()
    yolo_mode = (settings.get("yolo_mode") or "false").lower() == "true"
-    model = settings.get("model", DEFAULT_MODEL)
+    model = settings.get("api_model") or settings.get("model", DEFAULT_MODEL)

    # Parse testing agent settings with defaults
    try:
@@ -49,24 +40,18 @@ def _get_settings_defaults() -> tuple[bool, str, int]:
    except (ValueError, TypeError):
        testing_agent_ratio = 1

-    return yolo_mode, model, testing_agent_ratio
+    playwright_headless = (settings.get("playwright_headless") or "true").lower() == "true"
+
+    try:
+        batch_size = int(settings.get("batch_size", "3"))
+    except (ValueError, TypeError):
+        batch_size = 3
+
+    return yolo_mode, model, testing_agent_ratio, playwright_headless, batch_size


 router = APIRouter(prefix="/api/projects/{project_name}/agent", tags=["agent"])

-# Root directory for process manager
-ROOT_DIR = Path(__file__).parent.parent.parent
-
-
-def validate_project_name(name: str) -> str:
-    """Validate and sanitize project name to prevent path traversal."""
-    if not re.match(r'^[a-zA-Z0-9_-]{1,50}$', name):
-        raise HTTPException(
-            status_code=400,
-            detail="Invalid project name"
-        )
-    return name
-

 def get_project_manager(project_name: str):
    """Get the process manager for a project."""
@@ -93,7 +78,7 @@ async def get_agent_status(project_name: str):
    return AgentStatus(
        status=manager.status,
        pid=manager.pid,
-        started_at=manager.started_at,
+        started_at=manager.started_at.isoformat() if manager.started_at else None,
        yolo_mode=manager.yolo_mode,
        model=manager.model,
        parallel_mode=manager.parallel_mode,
@@ -111,18 +96,22 @@ async def start_agent(
    manager = get_project_manager(project_name)

    # Get defaults from global settings if not provided in request
-    default_yolo, default_model, default_testing_ratio = _get_settings_defaults()
+    default_yolo, default_model, default_testing_ratio, playwright_headless, default_batch_size = _get_settings_defaults()

    yolo_mode = request.yolo_mode if request.yolo_mode is not None else default_yolo
    model = request.model if request.model else default_model
    max_concurrency = request.max_concurrency or 1
    testing_agent_ratio = request.testing_agent_ratio if request.testing_agent_ratio is not None else default_testing_ratio

+    batch_size = default_batch_size
+
    success, message = await manager.start(
        yolo_mode=yolo_mode,
        model=model,
        max_concurrency=max_concurrency,
        testing_agent_ratio=testing_agent_ratio,
+        playwright_headless=playwright_headless,
+        batch_size=batch_size,
    )

    # Notify scheduler of manual start (to prevent auto-stop during scheduled window)
@@ -186,3 +175,31 @@ async def resume_agent(project_name: str):
        status=manager.status,
        message=message,
    )
+
+
+@router.post("/graceful-pause", response_model=AgentActionResponse)
+async def graceful_pause_agent(project_name: str):
+    """Request a graceful pause (drain mode) - finish current work then pause."""
+    manager = get_project_manager(project_name)
+
+    success, message = await manager.graceful_pause()
+
+    return AgentActionResponse(
+        success=success,
+        status=manager.status,
+        message=message,
+    )
+
+
+@router.post("/graceful-resume", response_model=AgentActionResponse)
+async def graceful_resume_agent(project_name: str):
+    """Resume from a graceful pause."""
+    manager = get_project_manager(project_name)
+
+    success, message = await manager.graceful_resume()
+
+    return AgentActionResponse(
+        success=success,
+        status=manager.status,
+        message=message,
+    )
--- a/server/routers/assistant_chat.py
+++ b/server/routers/assistant_chat.py
@@ -7,8 +7,6 @@ WebSocket and REST endpoints for the read-only project assistant.

 import json
 import logging
-import re
-from pathlib import Path
 from typing import Optional

 from fastapi import APIRouter, HTTPException, WebSocket, WebSocketDisconnect
@@ -27,30 +25,13 @@ from ..services.assistant_database import (
    get_conversation,
    get_conversations,
 )
+from ..utils.project_helpers import get_project_path as _get_project_path
+from ..utils.validation import validate_project_name

 logger = logging.getLogger(__name__)

 router = APIRouter(prefix="/api/assistant", tags=["assistant-chat"])

-# Root directory
-ROOT_DIR = Path(__file__).parent.parent.parent
-
-
-def _get_project_path(project_name: str) -> Optional[Path]:
-    """Get project path from registry."""
-    import sys
-    root = Path(__file__).parent.parent.parent
-    if str(root) not in sys.path:
-        sys.path.insert(0, str(root))
-
-    from registry import get_project_path
-    return get_project_path(project_name)
-
-
-def validate_project_name(name: str) -> bool:
-    """Validate project name to prevent path traversal."""
-    return bool(re.match(r'^[a-zA-Z0-9_-]{1,50}$', name))
-

 # ============================================================================
 # Pydantic Models
@@ -145,9 +126,9 @@ async def create_project_conversation(project_name: str):

    conversation = create_conversation(project_dir, project_name)
    return ConversationSummary(
-        id=conversation.id,
-        project_name=conversation.project_name,
-        title=conversation.title,
+        id=int(conversation.id),
+        project_name=str(conversation.project_name),
+        title=str(conversation.title) if conversation.title else None,
        created_at=conversation.created_at.isoformat() if conversation.created_at else None,
        updated_at=conversation.updated_at.isoformat() if conversation.updated_at else None,
        message_count=0,
@@ -226,30 +207,38 @@ async def assistant_chat_websocket(websocket: WebSocket, project_name: str):
    Client -> Server:
    - {"type": "start", "conversation_id": int | null} - Start/resume session
    - {"type": "message", "content": "..."} - Send user message
+    - {"type": "answer", "answers": {...}} - Answer to structured questions
    - {"type": "ping"} - Keep-alive ping

    Server -> Client:
    - {"type": "conversation_created", "conversation_id": int} - New conversation created
    - {"type": "text", "content": "..."} - Text chunk from Claude
    - {"type": "tool_call", "tool": "...", "input": {...}} - Tool being called
+    - {"type": "question", "questions": [...]} - Structured questions for user
    - {"type": "response_done"} - Response complete
    - {"type": "error", "content": "..."} - Error message
    - {"type": "pong"} - Keep-alive pong
    """
-    if not validate_project_name(project_name):
+    # Always accept WebSocket first to avoid opaque 403 errors
+    await websocket.accept()
+
+    try:
+        project_name = validate_project_name(project_name)
+    except HTTPException:
+        await websocket.send_json({"type": "error", "content": "Invalid project name"})
        await websocket.close(code=4000, reason="Invalid project name")
        return

    project_dir = _get_project_path(project_name)
    if not project_dir:
+        await websocket.send_json({"type": "error", "content": "Project not found in registry"})
        await websocket.close(code=4004, reason="Project not found in registry")
        return

    if not project_dir.exists():
+        await websocket.send_json({"type": "error", "content": "Project directory not found"})
        await websocket.close(code=4004, reason="Project directory not found")
        return
-
-    await websocket.accept()
    logger.info(f"Assistant WebSocket connected for project: {project_name}")

    session: Optional[AssistantChatSession] = None
@@ -316,6 +305,34 @@ async def assistant_chat_websocket(websocket: WebSocket, project_name: str):
                    async for chunk in session.send_message(user_content):
                        await websocket.send_json(chunk)

+                elif msg_type == "answer":
+                    # User answered a structured question
+                    if not session:
+                        session = get_session(project_name)
+                        if not session:
+                            await websocket.send_json({
+                                "type": "error",
+                                "content": "No active session. Send 'start' first."
+                            })
+                            continue
+
+                    # Format the answers as a natural response
+                    answers = message.get("answers", {})
+                    if isinstance(answers, dict):
+                        response_parts = []
+                        for question_idx, answer_value in answers.items():
+                            if isinstance(answer_value, list):
+                                response_parts.append(", ".join(str(v) for v in answer_value))
+                            else:
+                                response_parts.append(str(answer_value))
+                        user_response = "; ".join(response_parts) if response_parts else "OK"
+                    else:
+                        user_response = str(answers)
+
+                    # Stream Claude's response
+                    async for chunk in session.send_message(user_response):
+                        await websocket.send_json(chunk)
+
                else:
                    await websocket.send_json({
                        "type": "error",
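Reviewer sketch (not part of the patch): the `answer` branch above flattens structured answers into one line of text before forwarding it to the session. The standalone helper below mirrors that logic so it can be exercised in isolation. The name `format_answers` is hypothetical and does not exist in the repository, and the `str()` conversion of list items is an assumption about how non-string selections should be handled.

```python
def format_answers(answers: object) -> str:
    """Flatten structured answers into one line of text (illustrative helper)."""
    if not isinstance(answers, dict):
        # Non-dict payloads are passed through as their string form
        return str(answers)
    parts = []
    for value in answers.values():
        if isinstance(value, list):
            # str() each item so non-string selections do not break join()
            parts.append(", ".join(str(v) for v in value))
        else:
            parts.append(str(value))
    # An empty dict falls back to a neutral acknowledgement
    return "; ".join(parts) if parts else "OK"
```

Multi-select answers are joined with commas and separate questions with semicolons, in the order the client sent them.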
--- a/server/routers/devserver.py
+++ b/server/routers/devserver.py
@@ -6,7 +6,8 @@ API endpoints for dev server control (start/stop) and configuration.
 Uses project registry for path lookups and project_config for command detection.
 """

-import re
+import logging
+import shlex
 import sys
 from pathlib import Path

@@ -26,38 +27,22 @@ from ..services.project_config import (
    get_project_config,
    set_dev_command,
 )
+from ..utils.project_helpers import get_project_path as _get_project_path
+from ..utils.validation import validate_project_name

-# Add root to path for registry import
+# Add root to path for security module import
 _root = Path(__file__).parent.parent.parent
 if str(_root) not in sys.path:
    sys.path.insert(0, str(_root))

-from registry import get_project_path as registry_get_project_path
+from security import extract_commands, get_effective_commands, is_command_allowed

-
-def _get_project_path(project_name: str) -> Path | None:
-    """Get project path from registry."""
-    return registry_get_project_path(project_name)
+logger = logging.getLogger(__name__)


 router = APIRouter(prefix="/api/projects/{project_name}/devserver", tags=["devserver"])


-# ============================================================================
-# Helper Functions
-# ============================================================================
-
-
-def validate_project_name(name: str) -> str:
-    """Validate and sanitize project name to prevent path traversal."""
-    if not re.match(r'^[a-zA-Z0-9_-]{1,50}$', name):
-        raise HTTPException(
-            status_code=400,
-            detail="Invalid project name"
-        )
-    return name
-
-
 def get_project_dir(project_name: str) -> Path:
    """
    Get the validated project directory for a project name.
@@ -88,6 +73,116 @@ def get_project_dir(project_name: str) -> Path:

    return project_dir

+ALLOWED_RUNNERS = {
+    "npm", "pnpm", "yarn", "npx",
+    "uvicorn", "python", "python3",
+    "flask", "poetry",
+    "cargo", "go",
+}
+
+ALLOWED_NPM_SCRIPTS = {"dev", "start", "serve", "develop", "server", "preview"}
+
+# Allowed Python -m modules for dev servers
+ALLOWED_PYTHON_MODULES = {"uvicorn", "flask", "gunicorn", "http.server"}
+
+BLOCKED_SHELLS = {"sh", "bash", "zsh", "cmd", "powershell", "pwsh", "cmd.exe"}
+
+
+def validate_custom_command_strict(cmd: str) -> None:
+    """
+    Strict allowlist validation for dev server commands.
+    Prevents arbitrary command execution (no sh -c, no cmd /c, no python -c, etc.)
+    """
+    if not isinstance(cmd, str) or not cmd.strip():
+        raise ValueError("custom_command cannot be empty")
+
+    argv = shlex.split(cmd, posix=(sys.platform != "win32"))
+    if not argv:
+        raise ValueError("custom_command could not be parsed")
+
+    base = Path(argv[0]).name.lower()
+
+    # Block direct shells / interpreters commonly used for command injection
+    if base in BLOCKED_SHELLS:
+        raise ValueError(f"custom_command runner not allowed: {base}")
+
+    if base not in ALLOWED_RUNNERS:
+        raise ValueError(
+            f"custom_command runner not allowed: {base}. "
+            f"Allowed: {', '.join(sorted(ALLOWED_RUNNERS))}"
+        )
+
+    # Block one-liner execution for python
+    lowered = [a.lower() for a in argv]
+    if base in {"python", "python3"}:
+        if "-c" in lowered:
+            raise ValueError("python -c is not allowed")
+        if len(argv) >= 3 and argv[1] == "-m":
+            # Allow: python -m <allowed_module> ...
+            if argv[2] not in ALLOWED_PYTHON_MODULES:
+                raise ValueError(
+                    f"python -m {argv[2]} is not allowed. "
+                    f"Allowed modules: {', '.join(sorted(ALLOWED_PYTHON_MODULES))}"
+                )
+        elif len(argv) >= 2 and argv[1].endswith(".py"):
+            # Allow: python manage.py runserver, python app.py, etc.
+            pass
+        else:
+            raise ValueError(
+                "Python commands must use 'python -m <module> ...' or 'python <script>.py ...'"
+            )
+
+    if base == "flask":
+        # Allow: flask run [--host ...] [--port ...]
+        if len(argv) < 2 or argv[1] != "run":
+            raise ValueError("flask custom_command must be 'flask run [options]'")
+
+    if base == "poetry":
+        # Allow: poetry run <subcmd> ...
+        if len(argv) < 3 or argv[1] != "run":
+            raise ValueError("poetry custom_command must be 'poetry run <command> ...'")
+
+    if base == "uvicorn":
+        if len(argv) < 2 or ":" not in argv[1]:
+            raise ValueError("uvicorn must specify an app like module:app")
+
+        allowed_flags = {"--host", "--port", "--reload", "--log-level", "--workers"}
+        for a in argv[2:]:
+            if a.startswith("-"):
+                # Handle --flag=value syntax
+                flag_key = a.split("=", 1)[0]
+                if flag_key not in allowed_flags:
+                    raise ValueError(f"uvicorn flag not allowed: {flag_key}")
+
+    if base in {"npm", "pnpm", "yarn"}:
+        # Allow only known safe scripts (no arbitrary exec)
+        if base == "npm":
+            if len(argv) < 3 or argv[1] != "run" or argv[2] not in ALLOWED_NPM_SCRIPTS:
+                raise ValueError(
+                    f"npm custom_command must be 'npm run <script>' where script is one of: "
+                    f"{', '.join(sorted(ALLOWED_NPM_SCRIPTS))}"
+                )
+        elif base == "pnpm":
+            ok = (
+                (len(argv) >= 2 and argv[1] in ALLOWED_NPM_SCRIPTS)
+                or (len(argv) >= 3 and argv[1] == "run" and argv[2] in ALLOWED_NPM_SCRIPTS)
+            )
+            if not ok:
+                raise ValueError(
+                    f"pnpm custom_command must use a known script: "
+                    f"{', '.join(sorted(ALLOWED_NPM_SCRIPTS))}"
+                )
+        elif base == "yarn":
+            ok = (
+                (len(argv) >= 2 and argv[1] in ALLOWED_NPM_SCRIPTS)
+                or (len(argv) >= 3 and argv[1] == "run" and argv[2] in ALLOWED_NPM_SCRIPTS)
+            )
+            if not ok:
+                raise ValueError(
+                    f"yarn custom_command must use a known script: "
+                    f"{', '.join(sorted(ALLOWED_NPM_SCRIPTS))}"
+                )
+

 def get_project_devserver_manager(project_name: str):
    """
@@ -106,6 +201,45 @@ def get_project_devserver_manager(project_name: str):
    return get_devserver_manager(project_name, project_dir)


+def validate_dev_command(command: str, project_dir: Path) -> None:
+    """
+    Validate a dev server command against the security allowlist.
+
+    Extracts all commands from the shell string and checks each against
+    the effective allowlist (global + org + project). Raises HTTPException
+    if any command is blocked or not allowed.
+
+    Args:
+        command: The shell command string to validate
+        project_dir: Project directory for loading project-level allowlists
+
+    Raises:
+        HTTPException 400: If the command fails validation
+    """
+    commands = extract_commands(command)
+    if not commands:
+        raise HTTPException(
+            status_code=400,
+            detail="Could not parse command for security validation"
+        )
+
+    allowed_commands, blocked_commands = get_effective_commands(project_dir)
+
+    for cmd in commands:
+        if cmd in blocked_commands:
+            logger.warning("Blocked dev server command '%s' (in blocklist) for project dir %s", cmd, project_dir)
+            raise HTTPException(
+                status_code=400,
+                detail=f"Command '{cmd}' is blocked and cannot be used as a dev server command"
+            )
+        if not is_command_allowed(cmd, allowed_commands):
+            logger.warning("Rejected dev server command '%s' (not in allowlist) for project dir %s", cmd, project_dir)
+            raise HTTPException(
+                status_code=400,
+                detail=f"Command '{cmd}' is not in the allowed commands list"
+            )
+
+
 # ============================================================================
 # Endpoints
 # ============================================================================
@@ -129,7 +263,7 @@ async def get_devserver_status(project_name: str) -> DevServerStatus:
        pid=manager.pid,
        url=manager.detected_url,
        command=manager._command,
-        started_at=manager.started_at,
+        started_at=manager.started_at.isoformat() if manager.started_at else None,
    )


@@ -157,9 +291,12 @@ async def start_devserver(
    # Determine which command to use
    command: str | None
    if request.command:
-        command = request.command
-    else:
-        command = get_dev_command(project_dir)
+        raise HTTPException(
+            status_code=400,
+            detail="Direct command execution is disabled. Use /config to set a safe custom_command."
+        )
+
+    command = get_dev_command(project_dir)

    if not command:
        raise HTTPException(
@@ -167,7 +304,17 @@ async def start_devserver(
            detail="No dev command available. Configure a custom command or ensure project type can be detected."
        )

-    # Now command is definitely str
+    # Validate command against security allowlist before execution
+    validate_dev_command(command, project_dir)
+
+    # Defense-in-depth: also run strict structural validation at execution time
+    # (catches config file tampering that bypasses the /config endpoint)
+    try:
+        validate_custom_command_strict(command)
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+
+    # Now command is definitely str and validated
    success, message = await manager.start(command)

    return DevServerActionResponse(
@@ -258,6 +405,15 @@ async def update_devserver_config(
        except ValueError as e:
            raise HTTPException(status_code=400, detail=str(e))
    else:
+        # Strict structural validation first (most specific errors)
+        try:
+            validate_custom_command_strict(update.custom_command)
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e))
+
+        # Then validate against security allowlist
+        validate_dev_command(update.custom_command, project_dir)
+
        # Set the custom command
        try:
            set_dev_command(project_dir, update.custom_command)
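Reviewer sketch (not part of the patch): the strict validation added in this file tokenizes the command, takes the basename of the first token, and rejects anything outside a runner allowlist. The trimmed-down version below shows the same idea under simplified assumptions. The sets `RUNNERS`, `NPM_SCRIPTS`, and `SHELLS` and the functions `check_command` and `is_allowed` are hypothetical, cover far fewer cases than `validate_custom_command_strict`, and always use POSIX tokenization.

```python
import shlex
from pathlib import Path

# Hypothetical, reduced allowlists for illustration only
RUNNERS = {"npm", "python", "uvicorn"}
NPM_SCRIPTS = {"dev", "start"}
SHELLS = {"sh", "bash", "cmd", "powershell"}


def check_command(cmd: str) -> None:
    """Raise ValueError unless cmd matches the simplified allowlist."""
    argv = shlex.split(cmd)
    if not argv:
        raise ValueError("empty command")
    # Compare on the basename so /usr/bin/npm is treated like npm
    base = Path(argv[0]).name.lower()
    if base in SHELLS or base not in RUNNERS:
        raise ValueError(f"runner not allowed: {base}")
    if base == "python" and "-c" in argv:
        raise ValueError("python -c is not allowed")
    if base == "npm" and (
        len(argv) < 3 or argv[1] != "run" or argv[2] not in NPM_SCRIPTS
    ):
        raise ValueError("npm must be 'npm run <known script>'")


def is_allowed(cmd: str) -> bool:
    """Boolean wrapper around check_command for quick checks."""
    try:
        check_command(cmd)
    except ValueError:
        return False
    return True
```

An allowlist keyed on the runner plus a per-runner argument check fails closed: an unknown binary or an unexpected subcommand is rejected without having to enumerate every dangerous form.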
--- a/server/routers/expand_project.py
+++ b/server/routers/expand_project.py
@@ -8,7 +8,6 @@ Allows adding multiple features to existing projects via natural language.

 import json
 import logging
-from pathlib import Path
 from typing import Optional

 from fastapi import APIRouter, HTTPException, WebSocket, WebSocketDisconnect
@@ -22,27 +21,13 @@ from ..services.expand_chat_session import (
    list_expand_sessions,
    remove_expand_session,
 )
+from ..utils.project_helpers import get_project_path as _get_project_path
 from ..utils.validation import validate_project_name

 logger = logging.getLogger(__name__)

 router = APIRouter(prefix="/api/expand", tags=["expand-project"])

-# Root directory
-ROOT_DIR = Path(__file__).parent.parent.parent
-
-
-def _get_project_path(project_name: str) -> Path:
-    """Get project path from registry."""
-    import sys
-    root = Path(__file__).parent.parent.parent
-    if str(root) not in sys.path:
-        sys.path.insert(0, str(root))
-
-    from registry import get_project_path
-    return get_project_path(project_name)
-
-


 # ============================================================================
@@ -119,30 +104,37 @@ async def expand_project_websocket(websocket: WebSocket, project_name: str):
    - {"type": "error", "content": "..."} - Error message
    - {"type": "pong"} - Keep-alive pong
    """
+    # Always accept the WebSocket first to avoid opaque 403 errors.
+    # Starlette returns 403 if we close before accepting.
+    await websocket.accept()
+
    try:
        project_name = validate_project_name(project_name)
    except HTTPException:
+        await websocket.send_json({"type": "error", "content": "Invalid project name"})
        await websocket.close(code=4000, reason="Invalid project name")
        return

    # Look up project directory from registry
    project_dir = _get_project_path(project_name)
    if not project_dir:
+        await websocket.send_json({"type": "error", "content": "Project not found in registry"})
        await websocket.close(code=4004, reason="Project not found in registry")
        return

    if not project_dir.exists():
+        await websocket.send_json({"type": "error", "content": "Project directory not found"})
        await websocket.close(code=4004, reason="Project directory not found")
        return

    # Verify project has app_spec.txt
-    spec_path = project_dir / "prompts" / "app_spec.txt"
+    from autoforge_paths import get_prompts_dir
+    spec_path = get_prompts_dir(project_dir) / "app_spec.txt"
    if not spec_path.exists():
+        await websocket.send_json({"type": "error", "content": "Project has no spec. Create a spec first before expanding."})
        await websocket.close(code=4004, reason="Project has no spec. Create spec first.")
        return

-    await websocket.accept()
-
    session: Optional[ExpandChatSession] = None

    try:
--- a/server/routers/features.py
+++ b/server/routers/features.py
@@ -8,10 +8,12 @@ API endpoints for feature/test case management.
 import logging
 from contextlib import contextmanager
 from pathlib import Path
+from typing import Literal

 from fastapi import APIRouter, HTTPException

 from ..schemas import (
+    DependencyGraphEdge,
    DependencyGraphNode,
    DependencyGraphResponse,
    DependencyUpdate,
@@ -21,7 +23,9 @@ from ..schemas import (
    FeatureListResponse,
    FeatureResponse,
    FeatureUpdate,
+    HumanInputResponse,
 )
+from ..utils.project_helpers import get_project_path as _get_project_path
 from ..utils.validation import validate_project_name

 # Lazy imports to avoid circular dependencies
@@ -31,17 +35,6 @@ _Feature = None
 logger = logging.getLogger(__name__)


-def _get_project_path(project_name: str) -> Path:
-    """Get project path from registry."""
-    import sys
-    root = Path(__file__).parent.parent.parent
-    if str(root) not in sys.path:
-        sys.path.insert(0, str(root))
-
-    from registry import get_project_path
-    return get_project_path(project_name)
-
-
 def _get_db_classes():
    """Lazy import of database classes."""
    global _create_database, _Feature
@@ -71,6 +64,9 @@ def get_db_session(project_dir: Path):
    session = SessionLocal()
    try:
        yield session
+    except Exception:
+        session.rollback()
+        raise
    finally:
        session.close()

@@ -109,6 +105,9 @@ def feature_to_response(f, passing_ids: set[int] | None = None) -> FeatureRespon
        in_progress=f.in_progress if f.in_progress is not None else False,
        blocked=blocked,
        blocking_dependencies=blocking,
+        needs_human_input=getattr(f, 'needs_human_input', False) or False,
+        human_input_request=getattr(f, 'human_input_request', None),
+        human_input_response=getattr(f, 'human_input_response', None),
    )


@@ -131,7 +130,8 @@ async def list_features(project_name: str):
    if not project_dir.exists():
        raise HTTPException(status_code=404, detail="Project directory not found")

-    db_file = project_dir / "features.db"
+    from autoforge_paths import get_features_db_path
+    db_file = get_features_db_path(project_dir)
    if not db_file.exists():
        return FeatureListResponse(pending=[], in_progress=[], done=[])

@@ -147,11 +147,14 @@ async def list_features(project_name: str):
            pending = []
            in_progress = []
            done = []
+            needs_human_input_list = []

            for f in all_features:
                feature_response = feature_to_response(f, passing_ids)
                if f.passes:
                    done.append(feature_response)
+                elif getattr(f, 'needs_human_input', False):
+                    needs_human_input_list.append(feature_response)
                elif f.in_progress:
                    in_progress.append(feature_response)
                else:
@@ -161,6 +164,7 @@ async def list_features(project_name: str):
                pending=pending,
                in_progress=in_progress,
                done=done,
+                needs_human_input=needs_human_input_list,
            )
    except HTTPException:
        raise
@@ -326,7 +330,8 @@ async def get_dependency_graph(project_name: str):
    if not project_dir.exists():
        raise HTTPException(status_code=404, detail="Project directory not found")

-    db_file = project_dir / "features.db"
+    from autoforge_paths import get_features_db_path
+    db_file = get_features_db_path(project_dir)
    if not db_file.exists():
        return DependencyGraphResponse(nodes=[], edges=[])

@@ -344,8 +349,11 @@ async def get_dependency_graph(project_name: str):
                deps = f.dependencies or []
                blocking = [d for d in deps if d not in passing_ids]

+                status: Literal["pending", "in_progress", "done", "blocked", "needs_human_input"]
                if f.passes:
                    status = "done"
+                elif getattr(f, 'needs_human_input', False):
+                    status = "needs_human_input"
                elif blocking:
                    status = "blocked"
                elif f.in_progress:
@@ -363,7 +371,7 @@ async def get_dependency_graph(project_name: str):
                ))

                for dep_id in deps:
-                    edges.append({"source": dep_id, "target": f.id})
+                    edges.append(DependencyGraphEdge(source=dep_id, target=f.id))

            return DependencyGraphResponse(nodes=nodes, edges=edges)
    except HTTPException:
@@ -390,7 +398,8 @@ async def get_feature(project_name: str, feature_id: int):
    if not project_dir.exists():
        raise HTTPException(status_code=404, detail="Project directory not found")

-    db_file = project_dir / "features.db"
+    from autoforge_paths import get_features_db_path
+    db_file = get_features_db_path(project_dir)
    if not db_file.exists():
        raise HTTPException(status_code=404, detail="No features database found")

@@ -551,9 +560,9 @@ async def skip_feature(project_name: str, feature_id: int):
            if not feature:
                raise HTTPException(status_code=404, detail=f"Feature {feature_id} not found")

-            # Set priority to max + 1000 to push to end
+            # Set priority to max + 1 to push to end (consistent with MCP server)
            max_priority = session.query(Feature).order_by(Feature.priority.desc()).first()
-            feature.priority = (max_priority.priority if max_priority else 0) + 1000
+            feature.priority = (max_priority.priority + 1) if max_priority else 1

            session.commit()

@@ -565,6 +574,71 @@ async def skip_feature(project_name: str, feature_id: int):
        raise HTTPException(status_code=500, detail="Failed to skip feature")


+@router.post("/{feature_id}/resolve-human-input", response_model=FeatureResponse)
+async def resolve_human_input(project_name: str, feature_id: int, response: HumanInputResponse):
+    """Resolve a human input request for a feature.
+
+    Validates all required fields have values, stores the response,
+    and returns the feature to the pending queue for agents to pick up.
+    """
+    project_name = validate_project_name(project_name)
+    project_dir = _get_project_path(project_name)
+
+    if not project_dir:
+        raise HTTPException(status_code=404, detail=f"Project '{project_name}' not found in registry")
+
+    if not project_dir.exists():
+        raise HTTPException(status_code=404, detail="Project directory not found")
+
+    _, Feature = _get_db_classes()
+
+    try:
+        with get_db_session(project_dir) as session:
+            feature = session.query(Feature).filter(Feature.id == feature_id).first()
+
+            if not feature:
+                raise HTTPException(status_code=404, detail=f"Feature {feature_id} not found")
+
+            if not getattr(feature, 'needs_human_input', False):
+                raise HTTPException(status_code=400, detail="Feature is not waiting for human input")
+
+            # Validate required fields
+            request_data = feature.human_input_request
+            if request_data and isinstance(request_data, dict):
+                for field_def in request_data.get("fields", []):
+                    if field_def.get("required", True):
+                        field_id = field_def.get("id")
+                        if field_id not in response.fields or response.fields[field_id] in (None, ""):
+                            raise HTTPException(
+                                status_code=400,
+                                detail=f"Required field '{field_def.get('label', field_id)}' is missing"
+                            )
+
+            # Store response and return to pending queue
+            from datetime import datetime, timezone
+            response_data = {
+                "fields": dict(response.fields),
+                "responded_at": datetime.now(timezone.utc).isoformat(),
+            }
+            feature.human_input_response = response_data
+            feature.needs_human_input = False
+            # Keep in_progress=False, passes=False so it returns to pending
+
+            session.commit()
+            session.refresh(feature)
+
+            # Compute passing IDs for response
+            all_features = session.query(Feature).all()
+            passing_ids = {f.id for f in all_features if f.passes}
+
+            return feature_to_response(feature, passing_ids)
+    except HTTPException:
+        raise
+    except Exception:
+        logger.exception("Failed to resolve human input")
+        raise HTTPException(status_code=500, detail="Failed to resolve human input")
+
+
 # ============================================================================
 # Dependency Management Endpoints
 # ============================================================================
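Reviewer sketch (not part of the patch): the required-field check in `resolve_human_input` treats a field as required unless it says otherwise, and counts `None` and the empty string as missing. The helper below reproduces that rule on plain dictionaries so the edge cases are easy to see. `missing_required` and the sample field definitions are hypothetical, and unlike the endpoint it collects every missing label instead of stopping at the first one.

```python
def missing_required(request: dict, fields: dict) -> list[str]:
    """Return labels of required fields that have no usable value."""
    missing = []
    for field_def in request.get("fields", []):
        # Fields are required by default, matching the endpoint's behavior
        if field_def.get("required", True):
            field_id = field_def.get("id")
            if field_id not in fields or fields[field_id] in (None, ""):
                missing.append(field_def.get("label", field_id))
    return missing


# Hypothetical request with one required and one optional field
request = {
    "fields": [
        {"id": "api_key", "label": "API key"},
        {"id": "notes", "label": "Notes", "required": False},
    ]
}
```

Values such as `0` or `False` are accepted, because only `None` and `""` are compared against.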
--- a/server/routers/filesystem.py
+++ b/server/routers/filesystem.py
@@ -6,6 +6,7 @@ API endpoints for browsing the filesystem for project folder selection.
 Provides cross-platform support for Windows, macOS, and Linux.
 """

+import functools
 import logging
 import os
 import re
@@ -14,6 +15,8 @@ from pathlib import Path

 from fastapi import APIRouter, HTTPException, Query

+from security import SENSITIVE_DIRECTORIES
+
 # Module logger
 logger = logging.getLogger(__name__)

@@ -77,17 +80,10 @@ LINUX_BLOCKED = {
    "/opt",
 }

-# Universal blocked paths (relative to home directory)
-UNIVERSAL_BLOCKED_RELATIVE = {
-    ".ssh",
-    ".aws",
-    ".gnupg",
-    ".config/gh",
-    ".netrc",
-    ".docker",
-    ".kube",
-    ".terraform",
-}
+# Universal blocked paths (relative to home directory).
+# Delegates to the canonical SENSITIVE_DIRECTORIES set in security.py so that
+# the filesystem browser and the EXTRA_READ_PATHS validator share one source of truth.
+UNIVERSAL_BLOCKED_RELATIVE = SENSITIVE_DIRECTORIES

 # Patterns for files that should not be shown
 HIDDEN_PATTERNS = [
@@ -99,8 +95,14 @@ HIDDEN_PATTERNS = [
 ]


-def get_blocked_paths() -> set[Path]:
-    """Get the set of blocked paths for the current platform."""
+@functools.lru_cache(maxsize=1)
+def get_blocked_paths() -> frozenset[Path]:
+    """
+    Get the set of blocked paths for the current platform.
+
+    Cached because the platform and home directory do not change at runtime,
+    and this function is called once per directory entry in list_directory().
+    """
    home = Path.home()
    blocked = set()

@@ -119,7 +121,7 @@ def get_blocked_paths() -> set[Path]:
    for rel in UNIVERSAL_BLOCKED_RELATIVE:
        blocked.add((home / rel).resolve())

-    return blocked
+    return frozenset(blocked)


 def is_path_blocked(path: Path) -> bool:
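Reviewer sketch (not part of the patch): `get_blocked_paths` is now wrapped in `functools.lru_cache(maxsize=1)` and returns a `frozenset`, so repeated calls share one immutable object. The example below demonstrates that behavior with a call counter. The function name `blocked_paths`, the `CALLS` counter, and the `/home/example` directory are assumptions made for illustration.

```python
import functools
from pathlib import Path

CALLS = 0  # counts how often the function body actually runs


@functools.lru_cache(maxsize=1)
def blocked_paths() -> frozenset[Path]:
    """Build the blocked set once, then serve it from the cache."""
    global CALLS
    CALLS += 1
    home = Path("/home/example")  # hypothetical home directory
    return frozenset(home / rel for rel in (".ssh", ".aws"))


first = blocked_paths()
second = blocked_paths()
```

Returning a `frozenset` matters here: a cached mutable `set` could be modified by one caller and silently change the result for every later caller.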
--- a/server/routers/projects.py
+++ b/server/routers/projects.py
@@ -10,6 +10,7 @@ import re
 import shutil
 import sys
 from pathlib import Path
+from typing import Any, Callable

 from fastapi import APIRouter, HTTPException

@@ -18,16 +19,18 @@ from ..schemas import (
    ProjectDetail,
    ProjectPrompts,
    ProjectPromptsUpdate,
+    ProjectSettingsUpdate,
    ProjectStats,
    ProjectSummary,
 )

 # Lazy imports to avoid circular dependencies
+# These are initialized by _init_imports() before first use.
 _imports_initialized = False
-_check_spec_exists = None
-_scaffold_project_prompts = None
-_get_project_prompts_dir = None
-_count_passing_tests = None
+_check_spec_exists: Callable[..., Any] | None = None
+_scaffold_project_prompts: Callable[..., Any] | None = None
+_get_project_prompts_dir: Callable[..., Any] | None = None
+_count_passing_tests: Callable[..., Any] | None = None


 def _init_imports():
@@ -63,13 +66,23 @@ def _get_registry_functions():
        sys.path.insert(0, str(root))

    from registry import (
+        get_project_concurrency,
        get_project_path,
        list_registered_projects,
        register_project,
+        set_project_concurrency,
        unregister_project,
        validate_project_path,
    )
-    return register_project, unregister_project, get_project_path, list_registered_projects, validate_project_path
+    return (
+        register_project,
+        unregister_project,
+        get_project_path,
+        list_registered_projects,
+        validate_project_path,
+        get_project_concurrency,
+        set_project_concurrency,
+    )


 router = APIRouter(prefix="/api/projects", tags=["projects"])
@@ -88,7 +101,8 @@ def validate_project_name(name: str) -> str:
 def get_project_stats(project_dir: Path) -> ProjectStats:
    """Get statistics for a project."""
    _init_imports()
-    passing, in_progress, total = _count_passing_tests(project_dir)
+    assert _count_passing_tests is not None  # guaranteed by _init_imports()
+    passing, in_progress, total, _needs_human_input = _count_passing_tests(project_dir)
    percentage = (passing / total * 100) if total > 0 else 0.0
    return ProjectStats(
        passing=passing,
@@ -102,7 +116,9 @@ def get_project_stats(project_dir: Path) -> ProjectStats:
 async def list_projects():
    """List all registered projects."""
    _init_imports()
-    _, _, _, list_registered_projects, validate_project_path = _get_registry_functions()
+    assert _check_spec_exists is not None  # guaranteed by _init_imports()
+    (_, _, _, list_registered_projects, validate_project_path,
+     get_project_concurrency, _) = _get_registry_functions()

    projects = list_registered_projects()
    result = []
@@ -123,6 +139,7 @@ async def list_projects():
            path=info["path"],
            has_spec=has_spec,
            stats=stats,
+            default_concurrency=info.get("default_concurrency", 3),
        ))

    return result
@@ -132,7 +149,9 @@ async def list_projects():
 async def create_project(project: ProjectCreate):
    """Create a new project at the specified path."""
    _init_imports()
-    register_project, _, get_project_path, list_registered_projects, _ = _get_registry_functions()
+    assert _scaffold_project_prompts is not None  # guaranteed by _init_imports()
+    (register_project, _, get_project_path, list_registered_projects,
+     _, _, _) = _get_registry_functions()

    name = validate_project_name(project.name)
    project_path = Path(project.path).resolve()
@@ -203,6 +222,7 @@ async def create_project(project: ProjectCreate):
        path=project_path.as_posix(),
        has_spec=False,  # Just created, no spec yet
        stats=ProjectStats(passing=0, total=0, percentage=0.0),
+        default_concurrency=3,
    )


@@ -210,7 +230,9 @@ async def create_project(project: ProjectCreate):
 async def get_project(name: str):
    """Get detailed information about a project."""
    _init_imports()
-    _, _, get_project_path, _, _ = _get_registry_functions()
+    assert _check_spec_exists is not None  # guaranteed by _init_imports()
+    assert _get_project_prompts_dir is not None  # guaranteed by _init_imports()
+    (_, _, get_project_path, _, _, get_project_concurrency, _) = _get_registry_functions()

    name = validate_project_name(name)
    project_dir = get_project_path(name)
@@ -231,6 +253,7 @@ async def get_project(name: str):
        has_spec=has_spec,
        stats=stats,
        prompts_dir=str(prompts_dir),
+        default_concurrency=get_project_concurrency(name),
    )


@@ -244,7 +267,7 @@ async def delete_project(name: str, delete_files: bool = False):
        delete_files: If True, also delete the project directory and files
    """
    _init_imports()
-    _, unregister_project, get_project_path, _, _ = _get_registry_functions()
+    (_, unregister_project, get_project_path, _, _, _, _) = _get_registry_functions()

    name = validate_project_name(name)
    project_dir = get_project_path(name)
@@ -253,8 +276,8 @@ async def delete_project(name: str, delete_files: bool = False):
        raise HTTPException(status_code=404, detail=f"Project '{name}' not found")

    # Check if agent is running
-    lock_file = project_dir / ".agent.lock"
-    if lock_file.exists():
+    from autoforge_paths import has_agent_running
+    if has_agent_running(project_dir):
        raise HTTPException(
            status_code=409,
            detail="Cannot delete project while agent is running. Stop the agent first."
@@ -280,7 +303,8 @@ async def delete_project(name: str, delete_files: bool = False):
 async def get_project_prompts(name: str):
    """Get the content of project prompt files."""
    _init_imports()
-    _, _, get_project_path, _, _ = _get_registry_functions()
+    assert _get_project_prompts_dir is not None  # guaranteed by _init_imports()
+    (_, _, get_project_path, _, _, _, _) = _get_registry_functions()

    name = validate_project_name(name)
    project_dir = get_project_path(name)
@@ -291,7 +315,7 @@ async def get_project_prompts(name: str):
    if not project_dir.exists():
        raise HTTPException(status_code=404, detail="Project directory not found")

-    prompts_dir = _get_project_prompts_dir(project_dir)
+    prompts_dir: Path = _get_project_prompts_dir(project_dir)

    def read_file(filename: str) -> str:
        filepath = prompts_dir / filename
@@ -313,7 +337,8 @@ async def get_project_prompts(name: str):
 async def update_project_prompts(name: str, prompts: ProjectPromptsUpdate):
    """Update project prompt files."""
    _init_imports()
-    _, _, get_project_path, _, _ = _get_registry_functions()
+    assert _get_project_prompts_dir is not None  # guaranteed by _init_imports()
+    (_, _, get_project_path, _, _, _, _) = _get_registry_functions()

    name = validate_project_name(name)
    project_dir = get_project_path(name)
@@ -343,7 +368,7 @@ async def update_project_prompts(name: str, prompts: ProjectPromptsUpdate):
 async def get_project_stats_endpoint(name: str):
    """Get current progress statistics for a project."""
    _init_imports()
-    _, _, get_project_path, _, _ = _get_registry_functions()
+    (_, _, get_project_path, _, _, _, _) = _get_registry_functions()

    name = validate_project_name(name)
    project_dir = get_project_path(name)
@@ -355,3 +380,145 @@ async def get_project_stats_endpoint(name: str):
        raise HTTPException(status_code=404, detail="Project directory not found")

    return get_project_stats(project_dir)
+
+
+@router.post("/{name}/reset")
+async def reset_project(name: str, full_reset: bool = False):
+    """
+    Reset a project to its initial state.
+
+    Args:
+        name: Project name to reset
+        full_reset: If True, also delete the prompts directory in both its current and legacy locations (triggers the setup wizard)
+
+    Returns:
+        Dictionary with list of deleted files and reset type
+    """
+    _init_imports()
+    (_, _, get_project_path, _, _, _, _) = _get_registry_functions()
+
+    name = validate_project_name(name)
+    project_dir = get_project_path(name)
+
+    if not project_dir:
+        raise HTTPException(status_code=404, detail=f"Project '{name}' not found")
+
+    if not project_dir.exists():
+        raise HTTPException(status_code=404, detail="Project directory not found")
+
+    # Check if agent is running
+    from autoforge_paths import has_agent_running
+    if has_agent_running(project_dir):
+        raise HTTPException(
+            status_code=409,
+            detail="Cannot reset project while agent is running. Stop the agent first."
+        )
+
+    # Dispose of database engines to release file locks (required on Windows)
+    # Import here to avoid circular imports
+    from api.database import dispose_engine as dispose_features_engine
+    from server.services.assistant_database import dispose_engine as dispose_assistant_engine
+
+    dispose_features_engine(project_dir)
+    dispose_assistant_engine(project_dir)
+
+    deleted_files: list[str] = []
+
+    from autoforge_paths import (
+        get_assistant_db_path,
+        get_claude_assistant_settings_path,
+        get_claude_settings_path,
+        get_features_db_path,
+    )
+
+    # Build the list of files to delete. Path helpers resolve each file's current location;
+    # explicit root-level paths cover the legacy layout for backward compatibility.
+    db_path = get_features_db_path(project_dir)
+    asst_path = get_assistant_db_path(project_dir)
+    reset_files: list[Path] = [
+        db_path,
+        db_path.with_suffix(".db-wal"),
+        db_path.with_suffix(".db-shm"),
+        asst_path,
+        asst_path.with_suffix(".db-wal"),
+        asst_path.with_suffix(".db-shm"),
+        get_claude_settings_path(project_dir),
+        get_claude_assistant_settings_path(project_dir),
+        # Also clean old root-level locations if they exist
+        project_dir / "features.db",
+        project_dir / "features.db-wal",
+        project_dir / "features.db-shm",
+        project_dir / "assistant.db",
+        project_dir / "assistant.db-wal",
+        project_dir / "assistant.db-shm",
+        project_dir / ".claude_settings.json",
+        project_dir / ".claude_assistant_settings.json",
+    ]
+
+    for file_path in reset_files:
+        if file_path.exists():
+            try:
+                relative = file_path.relative_to(project_dir)
+                file_path.unlink()
+                deleted_files.append(str(relative))
+            except Exception as e:
+                raise HTTPException(status_code=500, detail=f"Failed to delete {file_path.name}: {e}") from e
+
+    # Full reset: also delete prompts directory
+    if full_reset:
+        from autoforge_paths import get_prompts_dir
+        # Delete prompts from both possible locations
+        for prompts_dir in [get_prompts_dir(project_dir), project_dir / "prompts"]:
+            if prompts_dir.exists():
+                try:
+                    relative = prompts_dir.relative_to(project_dir)
+                    shutil.rmtree(prompts_dir)
+                    deleted_files.append(f"{relative}/")
+                except Exception as e:
+                    raise HTTPException(status_code=500, detail=f"Failed to delete prompts: {e}") from e
+
+    return {
+        "success": True,
+        "reset_type": "full" if full_reset else "quick",
+        "deleted_files": deleted_files,
+        "message": f"Project '{name}' has been reset" + (" (full reset)" if full_reset else " (quick reset)")
+    }
+
+
+@router.patch("/{name}/settings", response_model=ProjectDetail)
+async def update_project_settings(name: str, settings: ProjectSettingsUpdate):
+    """Update project-level settings (concurrency, etc.)."""
+    _init_imports()
+    assert _check_spec_exists is not None  # guaranteed by _init_imports()
+    assert _get_project_prompts_dir is not None  # guaranteed by _init_imports()
+    (_, _, get_project_path, _, _, get_project_concurrency,
+     set_project_concurrency) = _get_registry_functions()
+
+    name = validate_project_name(name)
+    project_dir = get_project_path(name)
+
+    if not project_dir:
+        raise HTTPException(status_code=404, detail=f"Project '{name}' not found")
+
+    if not project_dir.exists():
+        raise HTTPException(status_code=404, detail="Project directory not found")
+
+    # Update concurrency if provided
+    if settings.default_concurrency is not None:
+        success = set_project_concurrency(name, settings.default_concurrency)
+        if not success:
+            raise HTTPException(status_code=500, detail="Failed to update concurrency")
+
+    # Return updated project details
+    has_spec = _check_spec_exists(project_dir)
+    stats = get_project_stats(project_dir)
+    prompts_dir = _get_project_prompts_dir(project_dir)
+
+    return ProjectDetail(
+        name=name,
+        path=project_dir.as_posix(),
+        has_spec=has_spec,
+        stats=stats,
+        prompts_dir=str(prompts_dir),
+        default_concurrency=get_project_concurrency(name),
+    )
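Review note on `reset_project`: the reset list derives each SQLite sidecar file from the database path with `with_suffix`, then deletes only the paths that exist. A minimal sketch of that pattern follows. The helper names `sidecar_paths` and `delete_existing` are hypothetical and do not appear in the patch.

```python
from pathlib import Path


def sidecar_paths(db_path: Path) -> list[Path]:
    """Return a SQLite database path plus its WAL and shared-memory sidecars.

    Hypothetical helper: with_suffix() replaces the final suffix, so
    "features.db" becomes "features.db-wal" and "features.db-shm".
    """
    return [db_path, db_path.with_suffix(".db-wal"), db_path.with_suffix(".db-shm")]


def delete_existing(paths: list[Path], base: Path) -> list[str]:
    """Delete every path that exists and report it relative to base."""
    deleted: list[str] = []
    for path in paths:
        # Paths listed twice, or already removed, are skipped by this check.
        if path.exists():
            deleted.append(path.relative_to(base).as_posix())
            path.unlink()
    return deleted
```

Because each path is checked with `exists()` immediately before deletion, listing the same file under both the current and the legacy location is harmless: the second occurrence is skipped.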
--- a/server/routers/schedules.py
+++ b/server/routers/schedules.py
@@ -6,12 +6,10 @@ API endpoints for managing agent schedules.
 Provides CRUD operations for time-based schedule configuration.
 """

-import re
-import sys
 from contextlib import contextmanager
 from datetime import datetime, timedelta, timezone
 from pathlib import Path
-from typing import Generator, Tuple
+from typing import TYPE_CHECKING, Generator, Tuple

 from fastapi import APIRouter, HTTPException
 from sqlalchemy.orm import Session
@@ -26,17 +24,21 @@ from ..schemas import (
    ScheduleResponse,
    ScheduleUpdate,
 )
+from ..utils.project_helpers import get_project_path as _get_project_path
+from ..utils.validation import validate_project_name
+
+if TYPE_CHECKING:
+    from api.database import Schedule as ScheduleModel


-def _get_project_path(project_name: str) -> Path:
-    """Get project path from registry."""
-    root = Path(__file__).parent.parent.parent
-    if str(root) not in sys.path:
-        sys.path.insert(0, str(root))
-
-    from registry import get_project_path
-    return get_project_path(project_name)
+def _schedule_to_response(schedule: "ScheduleModel") -> ScheduleResponse:
+    """Convert a Schedule ORM object to a ScheduleResponse Pydantic model.

+    SQLAlchemy Column descriptors resolve to Python types at instance access time,
+    but mypy sees the Column[T] descriptor type. Using model_validate with
+    from_attributes handles this conversion correctly.
+    """
+    return ScheduleResponse.model_validate(schedule, from_attributes=True)

 router = APIRouter(
    prefix="/api/projects/{project_name}/schedules",
@@ -44,16 +46,6 @@ router = APIRouter(
 )


-def validate_project_name(name: str) -> str:
-    """Validate and sanitize project name to prevent path traversal."""
-    if not re.match(r'^[a-zA-Z0-9_-]{1,50}$', name):
-        raise HTTPException(
-            status_code=400,
-            detail="Invalid project name"
-        )
-    return name
-
-
 @contextmanager
 def _get_db_session(project_name: str) -> Generator[Tuple[Session, Path], None, None]:
    """Get database session for a project as a context manager.
@@ -84,6 +76,9 @@ def _get_db_session(project_name: str) -> Generator[Tuple[Session, Path], None,
    db = SessionLocal()
    try:
        yield db, project_path
+    except Exception:
+        db.rollback()
+        raise
    finally:
        db.close()

@@ -99,21 +94,7 @@ async def list_schedules(project_name: str):
        ).order_by(Schedule.start_time).all()

        return ScheduleListResponse(
-            schedules=[
-                ScheduleResponse(
-                    id=s.id,
-                    project_name=s.project_name,
-                    start_time=s.start_time,
-                    duration_minutes=s.duration_minutes,
-                    days_of_week=s.days_of_week,
-                    enabled=s.enabled,
-                    yolo_mode=s.yolo_mode,
-                    model=s.model,
-                    crash_count=s.crash_count,
-                    created_at=s.created_at,
-                )
-                for s in schedules
-            ]
+            schedules=[_schedule_to_response(s) for s in schedules]
        )


@@ -187,18 +168,7 @@ async def create_schedule(project_name: str, data: ScheduleCreate):
                    except Exception as e:
                        logger.error(f"Failed to start agent for schedule {schedule.id}: {e}", exc_info=True)

-        return ScheduleResponse(
-            id=schedule.id,
-            project_name=schedule.project_name,
-            start_time=schedule.start_time,
-            duration_minutes=schedule.duration_minutes,
-            days_of_week=schedule.days_of_week,
-            enabled=schedule.enabled,
-            yolo_mode=schedule.yolo_mode,
-            model=schedule.model,
-            crash_count=schedule.crash_count,
-            created_at=schedule.created_at,
-        )
+        return _schedule_to_response(schedule)


 @router.get("/next", response_model=NextRunResponse)
@@ -277,18 +247,7 @@ async def get_schedule(project_name: str, schedule_id: int):
        if not schedule:
            raise HTTPException(status_code=404, detail="Schedule not found")

-        return ScheduleResponse(
-            id=schedule.id,
-            project_name=schedule.project_name,
-            start_time=schedule.start_time,
-            duration_minutes=schedule.duration_minutes,
-            days_of_week=schedule.days_of_week,
-            enabled=schedule.enabled,
-            yolo_mode=schedule.yolo_mode,
-            model=schedule.model,
-            crash_count=schedule.crash_count,
-            created_at=schedule.created_at,
-        )
+        return _schedule_to_response(schedule)


 @router.patch("/{schedule_id}", response_model=ScheduleResponse)
@@ -331,18 +290,7 @@ async def update_schedule(
            # Was enabled, now disabled - remove jobs
            scheduler.remove_schedule(schedule_id)

-        return ScheduleResponse(
-            id=schedule.id,
-            project_name=schedule.project_name,
-            start_time=schedule.start_time,
-            duration_minutes=schedule.duration_minutes,
-            days_of_week=schedule.days_of_week,
-            enabled=schedule.enabled,
-            yolo_mode=schedule.yolo_mode,
-            model=schedule.model,
-            crash_count=schedule.crash_count,
-            created_at=schedule.created_at,
-        )
+        return _schedule_to_response(schedule)


@router.delete("/{schedule_id}", status_code=204)
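Review note on `_get_db_session`: the added `except` branch rolls back before the `finally` block closes the session. The sketch below reproduces that ordering with a stand-in session object. `FakeSession` and `session_scope` are hypothetical names used only for illustration; the real code yields a SQLAlchemy `Session`.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeSession:
    """Stand-in for a database session that records the calls made to it."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def rollback(self) -> None:
        self.calls.append("rollback")

    def close(self) -> None:
        self.calls.append("close")


@contextmanager
def session_scope(db: FakeSession) -> Iterator[FakeSession]:
    """Yield the session, roll back on any exception, and always close."""
    try:
        yield db
    except Exception:
        # Undo any partial work before the error propagates to the caller.
        db.rollback()
        raise
    finally:
        db.close()
```

On the success path only `close` runs; on failure the order is `rollback`, then `close`, and the original exception is re-raised unchanged.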
--- a/server/routers/settings.py
+++ b/server/routers/settings.py
@@ -7,26 +7,26 @@ Settings are stored in the registry database and shared across all projects.
 """

 import mimetypes
-import os
 import sys
-from pathlib import Path

 from fastapi import APIRouter

-from ..schemas import ModelInfo, ModelsResponse, SettingsResponse, SettingsUpdate
+from ..schemas import ModelInfo, ModelsResponse, ProviderInfo, ProvidersResponse, SettingsResponse, SettingsUpdate
+from ..services.chat_constants import ROOT_DIR

 # Mimetype fix for Windows - must run before StaticFiles is mounted
 mimetypes.add_type("text/javascript", ".js", True)

-# Add root to path for registry import
-ROOT_DIR = Path(__file__).parent.parent.parent
+# Ensure root is on sys.path for registry import
 if str(ROOT_DIR) not in sys.path:
    sys.path.insert(0, str(ROOT_DIR))

 from registry import (
+    API_PROVIDERS,
    AVAILABLE_MODELS,
    DEFAULT_MODEL,
    get_all_settings,
+    get_setting,
    set_setting,
 )

@@ -38,18 +38,40 @@ def _parse_yolo_mode(value: str | None) -> bool:
    return (value or "false").lower() == "true"


-def _is_glm_mode() -> bool:
-    """Check if GLM API is configured via environment variables."""
-    return bool(os.getenv("ANTHROPIC_BASE_URL"))
+@router.get("/providers", response_model=ProvidersResponse)
+async def get_available_providers():
+    """Get list of available API providers."""
+    current = get_setting("api_provider", "claude") or "claude"
+    providers = []
+    for pid, pdata in API_PROVIDERS.items():
+        providers.append(ProviderInfo(
+            id=pid,
+            name=pdata["name"],
+            base_url=pdata.get("base_url"),
+            models=[ModelInfo(id=m["id"], name=m["name"]) for m in pdata.get("models", [])],
+            default_model=pdata.get("default_model", ""),
+            requires_auth=pdata.get("requires_auth", False),
+        ))
+    return ProvidersResponse(providers=providers, current=current)


 @router.get("/models", response_model=ModelsResponse)
 async def get_available_models():
    """Get list of available models.

-    Frontend should call this to get the current list of models
-    instead of hardcoding them.
+    Returns models for the currently selected API provider.
    """
+    current_provider = get_setting("api_provider", "claude") or "claude"
+    provider = API_PROVIDERS.get(current_provider)
+
+    if provider and current_provider != "claude":
+        provider_models = provider.get("models", [])
+        return ModelsResponse(
+            models=[ModelInfo(id=m["id"], name=m["name"]) for m in provider_models],
+            default=provider.get("default_model", ""),
+        )
+
+    # Default: return Claude models
    return ModelsResponse(
        models=[ModelInfo(id=m["id"], name=m["name"]) for m in AVAILABLE_MODELS],
        default=DEFAULT_MODEL,
@@ -78,11 +100,23 @@ async def get_settings():
    """Get current global settings."""
    all_settings = get_all_settings()

+    api_provider = all_settings.get("api_provider", "claude")
+
+    glm_mode = api_provider == "glm"
+    ollama_mode = api_provider == "ollama"
+
    return SettingsResponse(
        yolo_mode=_parse_yolo_mode(all_settings.get("yolo_mode")),
        model=all_settings.get("model", DEFAULT_MODEL),
-        glm_mode=_is_glm_mode(),
+        glm_mode=glm_mode,
+        ollama_mode=ollama_mode,
        testing_agent_ratio=_parse_int(all_settings.get("testing_agent_ratio"), 1),
+        playwright_headless=_parse_bool(all_settings.get("playwright_headless"), default=True),
+        batch_size=_parse_int(all_settings.get("batch_size"), 3),
+        api_provider=api_provider,
+        api_base_url=all_settings.get("api_base_url"),
+        api_has_auth_token=bool(all_settings.get("api_auth_token")),
+        api_model=all_settings.get("api_model"),
    )


@@ -98,11 +132,53 @@ async def update_settings(update: SettingsUpdate):
    if update.testing_agent_ratio is not None:
        set_setting("testing_agent_ratio", str(update.testing_agent_ratio))

+    if update.playwright_headless is not None:
+        set_setting("playwright_headless", "true" if update.playwright_headless else "false")
+
+    if update.batch_size is not None:
+        set_setting("batch_size", str(update.batch_size))
+
+    # API provider settings
+    if update.api_provider is not None:
+        old_provider = get_setting("api_provider", "claude")
+        set_setting("api_provider", update.api_provider)
+
+        # When provider changes, auto-set defaults for the new provider
+        if update.api_provider != old_provider:
+            provider = API_PROVIDERS.get(update.api_provider)
+            if provider:
+                # Auto-set base URL from provider definition
+                if provider.get("base_url"):
+                    set_setting("api_base_url", provider["base_url"])
+                # Auto-set model to provider's default
+                if provider.get("default_model") and update.api_model is None:
+                    set_setting("api_model", provider["default_model"])
+
+    if update.api_base_url is not None:
+        set_setting("api_base_url", update.api_base_url)
+
+    if update.api_auth_token is not None:
+        set_setting("api_auth_token", update.api_auth_token)
+
+    if update.api_model is not None:
+        set_setting("api_model", update.api_model)
+
    # Return updated settings
    all_settings = get_all_settings()
+    api_provider = all_settings.get("api_provider", "claude")
+    glm_mode = api_provider == "glm"
+    ollama_mode = api_provider == "ollama"
+
    return SettingsResponse(
        yolo_mode=_parse_yolo_mode(all_settings.get("yolo_mode")),
        model=all_settings.get("model", DEFAULT_MODEL),
-        glm_mode=_is_glm_mode(),
+        glm_mode=glm_mode,
+        ollama_mode=ollama_mode,
        testing_agent_ratio=_parse_int(all_settings.get("testing_agent_ratio"), 1),
+        playwright_headless=_parse_bool(all_settings.get("playwright_headless"), default=True),
+        batch_size=_parse_int(all_settings.get("batch_size"), 3),
+        api_provider=api_provider,
+        api_base_url=all_settings.get("api_base_url"),
+        api_has_auth_token=bool(all_settings.get("api_auth_token")),
+        api_model=all_settings.get("api_model"),
    )
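Review note on `update_settings`: when the provider changes, the handler copies the new provider's `base_url` and `default_model` into the settings store, and an explicit `api_model` in the same request takes precedence. The sketch below mirrors that control flow over a plain dict. The `PROVIDERS` table and `apply_provider_update` are hypothetical stand-ins; the real `API_PROVIDERS` lives in `registry.py` and its contents are not shown in this diff.

```python
from typing import Any, Optional

# Hypothetical provider table, for illustration only.
PROVIDERS: dict[str, dict[str, Any]] = {
    "claude": {"name": "Claude"},
    "example": {
        "name": "Example",
        "base_url": "https://example.invalid/v1",
        "default_model": "example-model",
    },
}


def apply_provider_update(
    settings: dict[str, str],
    api_provider: str,
    api_model: Optional[str] = None,
) -> None:
    """Mirror the provider-switch branch of update_settings on a dict store."""
    old_provider = settings.get("api_provider", "claude")
    settings["api_provider"] = api_provider

    if api_provider != old_provider:
        provider = PROVIDERS.get(api_provider)
        if provider:
            if provider.get("base_url"):
                settings["api_base_url"] = provider["base_url"]
            # Only fall back to the default when no model was requested.
            if provider.get("default_model") and api_model is None:
                settings["api_model"] = provider["default_model"]

    if api_model is not None:
        settings["api_model"] = api_model
```

One consequence visible in this sketch: switching to a provider entry that defines no `base_url` leaves the previously stored `api_base_url` in place. Whether that can happen in the real code depends on the actual `API_PROVIDERS` entries.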
--- a/server/routers/spec_creation.py
+++ b/server/routers/spec_creation.py
@@ -7,8 +7,6 @@ WebSocket and REST endpoints for interactive spec creation with Claude.

 import json
 import logging
-import re
-from pathlib import Path
 from typing import Optional

 from fastapi import APIRouter, HTTPException, WebSocket, WebSocketDisconnect
@@ -22,30 +20,13 @@ from ..services.spec_chat_session import (
    list_sessions,
    remove_session,
 )
+from ..utils.project_helpers import get_project_path as _get_project_path
+from ..utils.validation import is_valid_project_name, validate_project_name

 logger = logging.getLogger(__name__)

 router = APIRouter(prefix="/api/spec", tags=["spec-creation"])

-# Root directory
-ROOT_DIR = Path(__file__).parent.parent.parent
-
-
-def _get_project_path(project_name: str) -> Path:
-    """Get project path from registry."""
-    import sys
-    root = Path(__file__).parent.parent.parent
-    if str(root) not in sys.path:
-        sys.path.insert(0, str(root))
-
-    from registry import get_project_path
-    return get_project_path(project_name)
-
-
-def validate_project_name(name: str) -> bool:
-    """Validate project name to prevent path traversal."""
-    return bool(re.match(r'^[a-zA-Z0-9_-]{1,50}$', name))
-

 # ============================================================================
 # REST Endpoints
@@ -68,7 +49,7 @@ async def list_spec_sessions():
 @router.get("/sessions/{project_name}", response_model=SpecSessionStatus)
 async def get_session_status(project_name: str):
    """Get status of a spec creation session."""
-    if not validate_project_name(project_name):
+    if not is_valid_project_name(project_name):
        raise HTTPException(status_code=400, detail="Invalid project name")

    session = get_session(project_name)
@@ -86,7 +67,7 @@ async def get_session_status(project_name: str):
 @router.delete("/sessions/{project_name}")
 async def cancel_session(project_name: str):
    """Cancel and remove a spec creation session."""
-    if not validate_project_name(project_name):
+    if not is_valid_project_name(project_name):
        raise HTTPException(status_code=400, detail="Invalid project name")

    session = get_session(project_name)
@@ -114,7 +95,7 @@ async def get_spec_file_status(project_name: str):
    This is used for polling to detect when Claude has finished writing spec files.
    Claude writes this status file as the final step after completing all spec work.
    """
-    if not validate_project_name(project_name):
+    if not is_valid_project_name(project_name):
        raise HTTPException(status_code=400, detail="Invalid project name")

    project_dir = _get_project_path(project_name)
@@ -124,7 +105,8 @@ async def get_spec_file_status(project_name: str):
    if not project_dir.exists():
        raise HTTPException(status_code=404, detail="Project directory not found")

-    status_file = project_dir / "prompts" / ".spec_status.json"
+    from autoforge_paths import get_prompts_dir
+    status_file = get_prompts_dir(project_dir) / ".spec_status.json"

    if not status_file.exists():
        return SpecFileStatus(
@@ -184,22 +166,28 @@ async def spec_chat_websocket(websocket: WebSocket, project_name: str):
    - {"type": "error", "content": "..."} - Error message
    - {"type": "pong"} - Keep-alive pong
    """
-    if not validate_project_name(project_name):
+    # Accept first: closing an unaccepted WebSocket reaches the client as an opaque HTTP 403
+    await websocket.accept()
+
+    try:
+        project_name = validate_project_name(project_name)
+    except HTTPException:
+        await websocket.send_json({"type": "error", "content": "Invalid project name"})
        await websocket.close(code=4000, reason="Invalid project name")
        return

    # Look up project directory from registry
    project_dir = _get_project_path(project_name)
    if not project_dir:
+        await websocket.send_json({"type": "error", "content": "Project not found in registry"})
        await websocket.close(code=4004, reason="Project not found in registry")
        return

    if not project_dir.exists():
+        await websocket.send_json({"type": "error", "content": "Project directory not found"})
        await websocket.close(code=4004, reason="Project directory not found")
        return

-    await websocket.accept()
-
    session: Optional[SpecChatSession] = None

    try:
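Review note on `spec_chat_websocket`: the handler now accepts the connection, then validates, and on failure sends a JSON error before closing with an application close code. The sketch below exercises that ordering against a recording stand-in. `FakeWebSocket` and `handle` are hypothetical, and the regex is an assumption copied from the removed local `validate_project_name`; the body of `is_valid_project_name` in `utils/validation.py` is not shown in this diff.

```python
import asyncio
import re
from typing import Any


class FakeWebSocket:
    """Stand-in that records the calls a handler makes, in order."""

    def __init__(self) -> None:
        self.events: list[tuple[Any, ...]] = []

    async def accept(self) -> None:
        self.events.append(("accept",))

    async def send_json(self, data: dict[str, str]) -> None:
        self.events.append(("send_json", data))

    async def close(self, code: int = 1000, reason: str = "") -> None:
        self.events.append(("close", code, reason))


def is_valid_project_name(name: str) -> bool:
    # Assumed pattern, taken from the validator this patch removes.
    return bool(re.match(r"^[a-zA-Z0-9_-]{1,50}$", name))


async def handle(websocket: FakeWebSocket, project_name: str) -> bool:
    """Accept, then validate; report failures as JSON before closing."""
    await websocket.accept()
    if not is_valid_project_name(project_name):
        await websocket.send_json({"type": "error", "content": "Invalid project name"})
        await websocket.close(code=4000, reason="Invalid project name")
        return False
    return True
```

For an invalid name the recorded sequence is `accept`, `send_json`, `close`, so the client gets a readable error message and close code 4000 rather than a rejected handshake.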
--- a/server/routers/terminal.py
+++ b/server/routers/terminal.py
@@ -12,8 +12,6 @@ import base64
 import json
 import logging
 import re
-import sys
-from pathlib import Path

 from fastapi import APIRouter, HTTPException, WebSocket, WebSocketDisconnect
 from pydantic import BaseModel
@@ -27,13 +25,8 @@ from ..services.terminal_manager import (
    rename_terminal,
    stop_terminal_session,
 )
-
-# Add project root to path for registry import
-_root = Path(__file__).parent.parent.parent
-if str(_root) not in sys.path:
-    sys.path.insert(0, str(_root))
-
-from registry import get_project_path as registry_get_project_path
+from ..utils.project_helpers import get_project_path as _get_project_path
+from ..utils.validation import is_valid_project_name

 logger = logging.getLogger(__name__)

@@ -48,27 +41,6 @@ class TerminalCloseCode:
    FAILED_TO_START = 4500


-def _get_project_path(project_name: str) -> Path | None:
-    """Get project path from registry."""
-    return registry_get_project_path(project_name)
-
-
-def validate_project_name(name: str) -> bool:
-    """
-    Validate project name to prevent path traversal attacks.
-
-    Allows only alphanumeric characters, underscores, and hyphens.
-    Maximum length of 50 characters.
-
-    Args:
-        name: The project name to validate
-
-    Returns:
-        True if valid, False otherwise
-    """
-    return bool(re.match(r"^[a-zA-Z0-9_-]{1,50}$", name))
-
-
 def validate_terminal_id(terminal_id: str) -> bool:
    """
    Validate terminal ID format.
@@ -117,7 +89,7 @@ async def list_project_terminals(project_name: str) -> list[TerminalInfoResponse
    Returns:
        List of terminal info objects
    """
-    if not validate_project_name(project_name):
+    if not is_valid_project_name(project_name):
        raise HTTPException(status_code=400, detail="Invalid project name")

    project_dir = _get_project_path(project_name)
@@ -150,7 +122,7 @@ async def create_project_terminal(
    Returns:
        The created terminal info
    """
-    if not validate_project_name(project_name):
+    if not is_valid_project_name(project_name):
        raise HTTPException(status_code=400, detail="Invalid project name")

    project_dir = _get_project_path(project_name)
@@ -176,7 +148,7 @@ async def rename_project_terminal(
    Returns:
        The updated terminal info
    """
-    if not validate_project_name(project_name):
+    if not is_valid_project_name(project_name):
        raise HTTPException(status_code=400, detail="Invalid project name")

    if not validate_terminal_id(terminal_id):
@@ -208,7 +180,7 @@ async def delete_project_terminal(project_name: str, terminal_id: str) -> dict:
    Returns:
        Success message
    """
-    if not validate_project_name(project_name):
+    if not is_valid_project_name(project_name):
        raise HTTPException(status_code=400, detail="Invalid project name")

    if not validate_terminal_id(terminal_id):
@@ -249,8 +221,12 @@ async def terminal_websocket(websocket: WebSocket, project_name: str, terminal_i
    - {"type": "pong"} - Keep-alive response
    - {"type": "error", "message": "..."} - Error message
    """
+    # Accept first: closing an unaccepted WebSocket reaches the client as an opaque HTTP 403
+    await websocket.accept()
+
    # Validate project name
-    if not validate_project_name(project_name):
+    if not is_valid_project_name(project_name):
+        await websocket.send_json({"type": "error", "message": "Invalid project name"})
        await websocket.close(
            code=TerminalCloseCode.INVALID_PROJECT_NAME, reason="Invalid project name"
        )
@@ -258,6 +234,7 @@ async def terminal_websocket(websocket: WebSocket, project_name: str, terminal_i

    # Validate terminal ID
    if not validate_terminal_id(terminal_id):
+        await websocket.send_json({"type": "error", "message": "Invalid terminal ID"})
        await websocket.close(
            code=TerminalCloseCode.INVALID_PROJECT_NAME, reason="Invalid terminal ID"
        )
@@ -266,6 +243,7 @@ async def terminal_websocket(websocket: WebSocket, project_name: str, terminal_i
    # Look up project directory from registry
    project_dir = _get_project_path(project_name)
    if not project_dir:
+        await websocket.send_json({"type": "error", "message": "Project not found in registry"})
        await websocket.close(
            code=TerminalCloseCode.PROJECT_NOT_FOUND,
            reason="Project not found in registry",
@@ -273,6 +251,7 @@ async def terminal_websocket(websocket: WebSocket, project_name: str, terminal_i
        return

    if not project_dir.exists():
+        await websocket.send_json({"type": "error", "message": "Project directory not found"})
        await websocket.close(
            code=TerminalCloseCode.PROJECT_NOT_FOUND,
            reason="Project directory not found",
@@ -282,14 +261,13 @@ async def terminal_websocket(websocket: WebSocket, project_name: str, terminal_i
    # Verify terminal exists in metadata
    terminal_info = get_terminal_info(project_name, terminal_id)
    if not terminal_info:
+        await websocket.send_json({"type": "error", "message": "Terminal not found"})
        await websocket.close(
            code=TerminalCloseCode.PROJECT_NOT_FOUND,
            reason="Terminal not found",
        )
        return

-    await websocket.accept()
-
    # Get or create terminal session for this project/terminal
    session = get_terminal_session(project_name, project_dir, terminal_id)
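The accept-first ordering introduced in `terminal_websocket` above can be illustrated in isolation. This is a minimal sketch, not the router's code: `FakeWebSocket`, `reject`, `handle`, and the numeric close code are hypothetical stand-ins (the real codes live in `TerminalCloseCode`). It only shows that, once the socket is accepted, a client can receive a readable error frame before the close.

```python
import asyncio


class FakeWebSocket:
    """Hypothetical stand-in for a server-side WebSocket; records calls in order."""

    def __init__(self):
        self.events = []

    async def accept(self):
        self.events.append(("accept",))

    async def send_json(self, payload):
        self.events.append(("send_json", payload))

    async def close(self, code, reason):
        self.events.append(("close", code, reason))


# Assumed value for illustration only; the real code is defined in TerminalCloseCode.
INVALID_PROJECT_NAME = 4000


async def reject(websocket, code, message):
    """Send a readable error frame, then close with an application close code."""
    await websocket.send_json({"type": "error", "message": message})
    await websocket.close(code=code, reason=message)


async def handle(websocket, project_name, is_valid):
    # Accept before validating so a failure is reported over the open socket
    # instead of surfacing as a rejected handshake.
    await websocket.accept()
    if not is_valid(project_name):
        await reject(websocket, INVALID_PROJECT_NAME, "Invalid project name")
        return False
    return True


ws = FakeWebSocket()
ok = asyncio.run(handle(ws, "../etc", lambda name: name.isalnum()))
```

Factoring the send-then-close pair into one helper is a design choice of this sketch; the diff repeats the two calls inline at each validation step.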

--- a/server/schemas.py
+++ b/server/schemas.py
@@ -45,6 +45,7 @@ class ProjectSummary(BaseModel):
    path: str
    has_spec: bool
    stats: ProjectStats
+    default_concurrency: int = 3


 class ProjectDetail(BaseModel):
@@ -54,6 +55,7 @@ class ProjectDetail(BaseModel):
    has_spec: bool
    stats: ProjectStats
    prompts_dir: str
+    default_concurrency: int = 3


 class ProjectPrompts(BaseModel):
@@ -70,6 +72,18 @@ class ProjectPromptsUpdate(BaseModel):
    coding_prompt: str | None = None


+class ProjectSettingsUpdate(BaseModel):
+    """Request schema for updating project-level settings."""
+    default_concurrency: int | None = None
+
+    @field_validator('default_concurrency')
+    @classmethod
+    def validate_concurrency(cls, v: int | None) -> int | None:
+        if v is not None and (v < 1 or v > 5):
+            raise ValueError("default_concurrency must be between 1 and 5")
+        return v
+
+
 # ============================================================================
 # Feature Schemas
 # ============================================================================
@@ -106,16 +120,41 @@ class FeatureResponse(FeatureBase):
    in_progress: bool
    blocked: bool = False  # Computed: has unmet dependencies
    blocking_dependencies: list[int] = Field(default_factory=list)  # Computed
+    needs_human_input: bool = False
+    human_input_request: dict | None = None
+    human_input_response: dict | None = None

    class Config:
        from_attributes = True


+class HumanInputField(BaseModel):
+    """Schema for a single human input field."""
+    id: str
+    label: str
+    type: Literal["text", "textarea", "select", "boolean"] = "text"
+    required: bool = True
+    placeholder: str | None = None
+    options: list[dict] | None = None  # For select: [{value, label}]
+
+
+class HumanInputRequest(BaseModel):
+    """Schema for an agent's human input request."""
+    prompt: str
+    fields: list[HumanInputField]
+
+
+class HumanInputResponse(BaseModel):
+    """Schema for a human's response to an input request."""
+    fields: dict[str, str | bool | list[str]]
+
+
 class FeatureListResponse(BaseModel):
    """Response containing list of features organized by status."""
    pending: list[FeatureResponse]
    in_progress: list[FeatureResponse]
    done: list[FeatureResponse]
+    needs_human_input: list[FeatureResponse] = Field(default_factory=list)


 class FeatureBulkCreate(BaseModel):
@@ -139,7 +178,7 @@ class DependencyGraphNode(BaseModel):
    id: int
    name: str
    category: str
-    status: Literal["pending", "in_progress", "done", "blocked"]
+    status: Literal["pending", "in_progress", "done", "blocked", "needs_human_input"]
    priority: int
    dependencies: list[int]

@@ -176,9 +215,12 @@ class AgentStartRequest(BaseModel):
    @field_validator('model')
    @classmethod
    def validate_model(cls, v: str | None) -> str | None:
-        """Validate model is in the allowed list."""
+        """Validate model is in the allowed list (Claude) or allow any model for alternative providers."""
        if v is not None and v not in VALID_MODELS:
-            raise ValueError(f"Invalid model. Must be one of: {VALID_MODELS}")
+            from registry import get_all_settings
+            settings = get_all_settings()
+            if settings.get("api_provider", "claude") == "claude":
+                raise ValueError(f"Invalid model. Must be one of: {VALID_MODELS}")
        return v

    @field_validator('max_concurrency')
@@ -200,7 +242,7 @@ class AgentStartRequest(BaseModel):

 class AgentStatus(BaseModel):
    """Current agent status."""
-    status: Literal["stopped", "running", "paused", "crashed"]
+    status: Literal["stopped", "running", "paused", "crashed", "pausing", "paused_graceful"]
    pid: int | None = None
    started_at: datetime | None = None
    yolo_mode: bool = False
@@ -240,6 +282,7 @@ class WSProgressMessage(BaseModel):
    in_progress: int
    total: int
    percentage: float
+    needs_human_input: int = 0


 class WSFeatureUpdateMessage(BaseModel):
@@ -377,12 +420,35 @@ class ModelInfo(BaseModel):
    name: str


+class ProviderInfo(BaseModel):
+    """Information about an API provider."""
+    id: str
+    name: str
+    base_url: str | None = None
+    models: list[ModelInfo]
+    default_model: str
+    requires_auth: bool = False
+
+
+class ProvidersResponse(BaseModel):
+    """Response schema for available providers list."""
+    providers: list[ProviderInfo]
+    current: str
+
+
 class SettingsResponse(BaseModel):
    """Response schema for global settings."""
    yolo_mode: bool = False
    model: str = DEFAULT_MODEL
-    glm_mode: bool = False  # True if GLM API is configured via .env
+    glm_mode: bool = False  # True when api_provider is "glm"
+    ollama_mode: bool = False  # True when api_provider is "ollama"
    testing_agent_ratio: int = 1  # Regression testing agents (0-3)
+    playwright_headless: bool = True
+    batch_size: int = 3  # Features per coding agent batch (1-3)
+    api_provider: str = "claude"
+    api_base_url: str | None = None
+    api_has_auth_token: bool = False  # Never expose actual token
+    api_model: str | None = None


 class ModelsResponse(BaseModel):
@@ -396,12 +462,32 @@ class SettingsUpdate(BaseModel):
    yolo_mode: bool | None = None
    model: str | None = None
    testing_agent_ratio: int | None = None  # 0-3
+    playwright_headless: bool | None = None
+    batch_size: int | None = None  # Features per agent batch (1-3)
+    api_provider: str | None = None
+    api_base_url: str | None = Field(None, max_length=500)
+    api_auth_token: str | None = Field(None, max_length=500)  # Write-only, never returned
+    api_model: str | None = Field(None, max_length=200)
+
+    @field_validator('api_base_url')
+    @classmethod
+    def validate_api_base_url(cls, v: str | None) -> str | None:
+        if v is not None and v.strip():
+            v = v.strip()
+            if not v.startswith(("http://", "https://")):
+                raise ValueError("api_base_url must start with http:// or https://")
+        return v

    @field_validator('model')
    @classmethod
-    def validate_model(cls, v: str | None) -> str | None:
-        if v is not None and v not in VALID_MODELS:
-            raise ValueError(f"Invalid model. Must be one of: {VALID_MODELS}")
+    def validate_model(cls, v: str | None, info) -> str | None:  # type: ignore[override]
+        if v is not None:
+            # Skip VALID_MODELS check when using an alternative API provider
+            api_provider = info.data.get("api_provider")
+            if api_provider and api_provider != "claude":
+                return v
+            if v not in VALID_MODELS:
+                raise ValueError(f"Invalid model. Must be one of: {VALID_MODELS}")
        return v

    @field_validator('testing_agent_ratio')
@@ -411,6 +497,13 @@ class SettingsUpdate(BaseModel):
            raise ValueError("testing_agent_ratio must be between 0 and 3")
        return v

+    @field_validator('batch_size')
+    @classmethod
+    def validate_batch_size(cls, v: int | None) -> int | None:
+        if v is not None and (v < 1 or v > 3):
+            raise ValueError("batch_size must be between 1 and 3")
+        return v
+

 # ============================================================================
 # Dev Server Schemas
@@ -507,9 +600,12 @@ class ScheduleCreate(BaseModel):
    @field_validator('model')
    @classmethod
    def validate_model(cls, v: str | None) -> str | None:
-        """Validate model is in the allowed list."""
+        """Validate model is in the allowed list (Claude) or allow any model for alternative providers."""
        if v is not None and v not in VALID_MODELS:
-            raise ValueError(f"Invalid model. Must be one of: {VALID_MODELS}")
+            from registry import get_all_settings
+            settings = get_all_settings()
+            if settings.get("api_provider", "claude") == "claude":
+                raise ValueError(f"Invalid model. Must be one of: {VALID_MODELS}")
        return v


@@ -529,9 +625,12 @@ class ScheduleUpdate(BaseModel):
    @field_validator('model')
    @classmethod
    def validate_model(cls, v: str | None) -> str | None:
-        """Validate model is in the allowed list."""
+        """Validate model is in the allowed list (Claude) or allow any model for alternative providers."""
        if v is not None and v not in VALID_MODELS:
-            raise ValueError(f"Invalid model. Must be one of: {VALID_MODELS}")
+            from registry import get_all_settings
+            settings = get_all_settings()
+            if settings.get("api_provider", "claude") == "claude":
+                raise ValueError(f"Invalid model. Must be one of: {VALID_MODELS}")
        return v
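The commit message lists the MCP-side checks for human input requests: a field type enum, options required for `select`, and unique non-empty field IDs and labels. The sketch below shows one way such checks could look using plain dicts shaped like `HumanInputField`. The function name `validate_fields` and the error wording are assumptions for illustration; the actual validation lives in the MCP server and is not shown in this section.

```python
ALLOWED_TYPES = {"text", "textarea", "select", "boolean"}


def validate_fields(fields):
    """Return a list of problems; an empty list means the request is acceptable."""
    errors = []
    seen_ids, seen_labels = set(), set()
    for index, field in enumerate(fields):
        field_id = str(field.get("id") or "").strip()
        label = str(field.get("label") or "").strip()
        field_type = field.get("type", "text")

        # IDs must be present and unique across the request.
        if not field_id:
            errors.append(f"field {index}: id must be non-empty")
        elif field_id in seen_ids:
            errors.append(f"field {index}: duplicate id {field_id!r}")
        seen_ids.add(field_id)

        # Labels follow the same rule as IDs.
        if not label:
            errors.append(f"field {index}: label must be non-empty")
        elif label in seen_labels:
            errors.append(f"field {index}: duplicate label {label!r}")
        seen_labels.add(label)

        # Only the four schema types are accepted; select needs options.
        if field_type not in ALLOWED_TYPES:
            errors.append(f"field {index}: unsupported type {field_type!r}")
        elif field_type == "select" and not field.get("options"):
            errors.append(f"field {index}: select requires options")
    return errors


good = [
    {"id": "db", "label": "Database", "type": "select",
     "options": [{"value": "pg", "label": "PostgreSQL"}]},
    {"id": "notes", "label": "Notes", "type": "textarea"},
]
bad = [
    {"id": "db", "label": "Database", "type": "select"},
    {"id": "db", "label": "Database", "type": "radio"},
]
good_errors = validate_fields(good)
bad_errors = validate_fields(bad)
```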


--- a/server/services/assistant_chat_session.py
+++ b/server/services/assistant_chat_session.py
@@ -25,25 +25,13 @@ from .assistant_database import (
    create_conversation,
    get_messages,
 )
+from .chat_constants import ROOT_DIR

 # Load environment variables from .env file if present
 load_dotenv()

 logger = logging.getLogger(__name__)

-# Root directory of the project
-ROOT_DIR = Path(__file__).parent.parent.parent
-
-# Environment variables to pass through to Claude CLI for API configuration
-API_ENV_VARS = [
-    "ANTHROPIC_BASE_URL",
-    "ANTHROPIC_AUTH_TOKEN",
-    "API_TIMEOUT_MS",
-    "ANTHROPIC_DEFAULT_SONNET_MODEL",
-    "ANTHROPIC_DEFAULT_OPUS_MODEL",
-    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
-]
-
 # Read-only feature MCP tools
 READONLY_FEATURE_MCP_TOOLS = [
    "mcp__features__feature_get_stats",
@@ -59,8 +47,13 @@ FEATURE_MANAGEMENT_TOOLS = [
    "mcp__features__feature_skip",
 ]

+# Interactive tools
+INTERACTIVE_TOOLS = [
+    "mcp__features__ask_user",
+]
+
 # Combined list for assistant
-ASSISTANT_FEATURE_TOOLS = READONLY_FEATURE_MCP_TOOLS + FEATURE_MANAGEMENT_TOOLS
+ASSISTANT_FEATURE_TOOLS = READONLY_FEATURE_MCP_TOOLS + FEATURE_MANAGEMENT_TOOLS + INTERACTIVE_TOOLS

 # Read-only built-in tools (no Write, Edit, Bash)
 READONLY_BUILTIN_TOOLS = [
@@ -76,7 +69,8 @@ def get_system_prompt(project_name: str, project_dir: Path) -> str:
    """Generate the system prompt for the assistant with project context."""
    # Try to load app_spec.txt for context
    app_spec_content = ""
-    app_spec_path = project_dir / "prompts" / "app_spec.txt"
+    from autoforge_paths import get_prompts_dir
+    app_spec_path = get_prompts_dir(project_dir) / "app_spec.txt"
    if app_spec_path.exists():
        try:
            app_spec_content = app_spec_path.read_text(encoding="utf-8")
@@ -90,6 +84,8 @@ def get_system_prompt(project_name: str, project_dir: Path) -> str:

 Your role is to help users understand the codebase, answer questions about features, and manage the project backlog. You can READ files and CREATE/MANAGE features, but you cannot modify source code.

+You have MCP tools available for feature management. Use them directly by calling the tool -- do not suggest CLI commands, bash commands, or curl commands to the user. You can create features yourself using the feature_create and feature_create_bulk tools.
+
 ## What You CAN Do

 **Codebase Analysis (Read-Only):**
@@ -132,19 +128,26 @@ If the user asks you to modify code, explain that you're a project assistant and
 - **feature_create_bulk**: Create multiple features at once
 - **feature_skip**: Move a feature to the end of the queue

+**Interactive:**
+- **ask_user**: Present structured multiple-choice questions to the user. Use this when you need to clarify requirements, offer design choices, or guide a decision. The user sees clickable option buttons and their selection is returned as your next message.
+
 ## Creating Features

-When a user asks to add a feature, gather the following information:
-1. **Category**: A grouping like "Authentication", "API", "UI", "Database"
-2. **Name**: A concise, descriptive name
-3. **Description**: What the feature should do
-4. **Steps**: How to verify/implement the feature (as a list)
+When a user asks to add a feature, use the `feature_create` or `feature_create_bulk` MCP tools directly:
+
+For a **single feature**, call `feature_create` with:
+- category: A grouping like "Authentication", "API", "UI", "Database"
+- name: A concise, descriptive name
+- description: What the feature should do
+- steps: List of verification/implementation steps
+
+For **multiple features**, call `feature_create_bulk` with an array of feature objects.

 You can ask clarifying questions if the user's request is vague, or make reasonable assumptions for simple requests.

 **Example interaction:**
 User: "Add a feature for S3 sync"
-You: I'll create that feature. Let me add it to the backlog...
+You: I'll create that feature now.
 [calls feature_create with appropriate parameters]
 You: Done! I've added "S3 Sync Integration" to your backlog. It's now visible on the kanban board.

@@ -162,7 +165,7 @@ class AssistantChatSession:
    """
    Manages a read-only assistant conversation for a project.

-    Uses Claude Opus 4.5 with only read-only tools enabled.
+    Uses Claude Opus with read-only built-in tools plus feature-management MCP tools.
    Persists conversation history to SQLite.
    """

@@ -208,7 +211,7 @@ class AssistantChatSession:
        # Create a new conversation if we don't have one
        if is_new_conversation:
            conv = create_conversation(self.project_dir, self.project_name)
-            self.conversation_id = conv.id
+            self.conversation_id = int(conv.id)  # type coercion: Column[int] -> int
            yield {"type": "conversation_created", "conversation_id": self.conversation_id}

        # Build permissions list for assistant access (read + feature management)
@@ -229,7 +232,9 @@ class AssistantChatSession:
                "allow": permissions_list,
            },
        }
-        settings_file = self.project_dir / ".claude_assistant_settings.json"
+        from autoforge_paths import get_claude_assistant_settings_path
+        settings_file = get_claude_assistant_settings_path(self.project_dir)
+        settings_file.parent.mkdir(parents=True, exist_ok=True)
        with open(settings_file, "w") as f:
            json.dump(security_settings, f, indent=2)

@@ -261,11 +266,11 @@ class AssistantChatSession:
        system_cli = shutil.which("claude")

        # Build environment overrides for API configuration
-        sdk_env = {var: os.getenv(var) for var in API_ENV_VARS if os.getenv(var)}
+        from registry import DEFAULT_MODEL, get_effective_sdk_env
+        sdk_env = get_effective_sdk_env()

-        # Determine model from environment or use default
-        # This allows using alternative APIs (e.g., GLM via z.ai) that may not support Claude model names
-        model = os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL", "claude-opus-4-5-20251101")
+        # Determine model from SDK env (provider-aware) or fallback to env/default
+        model = sdk_env.get("ANTHROPIC_DEFAULT_OPUS_MODEL") or os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL", DEFAULT_MODEL)

        try:
            logger.info("Creating ClaudeSDKClient...")
@@ -277,7 +282,7 @@ class AssistantChatSession:
                    # This avoids Windows command line length limit (~8191 chars)
                    setting_sources=["project"],
                    allowed_tools=[*READONLY_BUILTIN_TOOLS, *ASSISTANT_FEATURE_TOOLS],
-                    mcp_servers=mcp_servers,
+                    mcp_servers=mcp_servers,  # type: ignore[arg-type]  # SDK accepts dict config at runtime
                    permission_mode="bypassPermissions",
                    max_turns=100,
                    cwd=str(self.project_dir.resolve()),
@@ -303,6 +308,8 @@ class AssistantChatSession:
                greeting = f"Hello! I'm your project assistant for **{self.project_name}**. I can help you understand the codebase, explain features, and answer questions about the project. What would you like to know?"

                # Store the greeting in the database
+                # conversation_id is guaranteed non-None here (it is assigned before the greeting is stored)
+                assert self.conversation_id is not None
                add_message(self.project_dir, self.conversation_id, "assistant", greeting)

                yield {"type": "text", "content": greeting}
@@ -403,6 +410,17 @@ class AssistantChatSession:
                    elif block_type == "ToolUseBlock" and hasattr(block, "name"):
                        tool_name = block.name
                        tool_input = getattr(block, "input", {})
+
+                        # Intercept ask_user tool calls -> yield as question message
+                        if tool_name == "mcp__features__ask_user":
+                            questions = tool_input.get("questions", [])
+                            if questions:
+                                yield {
+                                    "type": "question",
+                                    "questions": questions,
+                                }
+                                continue
+
                        yield {
                            "type": "tool_call",
                            "tool": tool_name,
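The `ask_user` interception above can be sketched as a standalone generator. This is an illustration under assumptions: tool-use blocks are modelled as plain dicts rather than SDK objects, and the `"input"` key on the outgoing `tool_call` message is hypothetical because the rest of that dict is cut off in this hunk. As in the diff, an `ask_user` call with no questions falls through and is reported as an ordinary tool call.

```python
from typing import Iterator

ASK_USER_TOOL = "mcp__features__ask_user"


def translate_tool_calls(blocks: list) -> Iterator[dict]:
    """Turn tool-use blocks into UI messages, routing ask_user to a question message."""
    for block in blocks:
        tool_name = block.get("name")
        tool_input = block.get("input") or {}

        if tool_name == ASK_USER_TOOL:
            questions = tool_input.get("questions", [])
            if questions:
                yield {"type": "question", "questions": questions}
                continue  # handled; do not also emit a tool_call

        yield {"type": "tool_call", "tool": tool_name, "input": tool_input}


blocks = [
    {"name": "mcp__features__feature_get_stats", "input": {}},
    {"name": ASK_USER_TOOL, "input": {"questions": [{"question": "Which database?"}]}},
    {"name": ASK_USER_TOOL, "input": {"questions": []}},
]
messages = list(translate_tool_calls(blocks))
```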
--- a/server/services/assistant_database.py
+++ b/server/services/assistant_database.py
@@ -7,20 +7,28 @@ Each project has its own assistant.db file in the project directory.
 """

 import logging
+import threading
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Optional

 from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, Text, create_engine, func
-from sqlalchemy.orm import declarative_base, relationship, sessionmaker
+from sqlalchemy.engine import Engine
+from sqlalchemy.orm import DeclarativeBase, relationship, sessionmaker

 logger = logging.getLogger(__name__)

-Base = declarative_base()
+class Base(DeclarativeBase):
+    """SQLAlchemy 2.0 style declarative base."""
+    pass

 # Engine cache to avoid creating new engines for each request
 # Key: project directory path (as posix string), Value: SQLAlchemy engine
-_engine_cache: dict[str, object] = {}
+_engine_cache: dict[str, Engine] = {}
+
+# Lock for thread-safe access to the engine cache
+# Prevents race conditions when multiple threads create engines simultaneously
+_cache_lock = threading.Lock()


 def _utc_now() -> datetime:
@@ -56,7 +64,8 @@ class ConversationMessage(Base):

 def get_db_path(project_dir: Path) -> Path:
    """Get the path to the assistant database for a project."""
-    return project_dir / "assistant.db"
+    from autoforge_paths import get_assistant_db_path
+    return get_assistant_db_path(project_dir)


 def get_engine(project_dir: Path):
@@ -64,21 +73,57 @@ def get_engine(project_dir: Path):

    Uses a cache to avoid creating new engines for each request, which improves
    performance by reusing database connections.
+
+    Thread-safe: Uses a lock to prevent race conditions when multiple threads
+    try to create engines simultaneously for the same project.
    """
    cache_key = project_dir.as_posix()

-    if cache_key not in _engine_cache:
-        db_path = get_db_path(project_dir)
-        # Use as_posix() for cross-platform compatibility with SQLite connection strings
-        db_url = f"sqlite:///{db_path.as_posix()}"
-        engine = create_engine(db_url, echo=False)
-        Base.metadata.create_all(engine)
-        _engine_cache[cache_key] = engine
-        logger.debug(f"Created new database engine for {cache_key}")
+    # Double-checked locking for thread safety and performance
+    if cache_key in _engine_cache:
+        return _engine_cache[cache_key]
+
+    with _cache_lock:
+        # Check again inside the lock in case another thread created it
+        if cache_key not in _engine_cache:
+            db_path = get_db_path(project_dir)
+            # Use as_posix() for cross-platform compatibility with SQLite connection strings
+            db_url = f"sqlite:///{db_path.as_posix()}"
+            engine = create_engine(
+                db_url,
+                echo=False,
+                connect_args={
+                    "check_same_thread": False,
+                    "timeout": 30,  # Wait up to 30s for locks
+                }
+            )
+            Base.metadata.create_all(engine)
+            _engine_cache[cache_key] = engine
+            logger.debug(f"Created new database engine for {cache_key}")

    return _engine_cache[cache_key]


+def dispose_engine(project_dir: Path) -> bool:
+    """Dispose of and remove the cached engine for a project.
+
+    This closes all database connections, releasing file locks on Windows.
+    Should be called before deleting the database file.
+
+    Returns:
+        True if an engine was disposed, False if no engine was cached.
+    """
+    cache_key = project_dir.as_posix()
+
+    with _cache_lock:  # hold the cache lock so removal cannot race with get_engine()
+        engine = _engine_cache.pop(cache_key, None)
+    if engine is None:
+        return False
+    engine.dispose()
+    logger.debug(f"Disposed database engine for {cache_key}")
+    return True
+
+
 def get_session(project_dir: Path):
    """Get a new database session for a project."""
    engine = get_engine(project_dir)
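The double-checked locking used by `get_engine` can be exercised without SQLAlchemy. The sketch below is a generic illustration: `ResourceCache` and `FakeEngine` are hypothetical names, and `close()` stands in for `Engine.dispose()`. Popping under the same lock in `dispose` is a choice made in this sketch.

```python
import threading


class ResourceCache:
    """Per-key cache that creates each resource at most once, even under contention."""

    def __init__(self, factory):
        self._factory = factory
        self._items = {}
        self._lock = threading.Lock()

    def get(self, key):
        # Fast path: no lock needed once the resource exists.
        item = self._items.get(key)
        if item is not None:
            return item
        with self._lock:
            # Re-check inside the lock: another thread may have created it meanwhile.
            if key not in self._items:
                self._items[key] = self._factory(key)
            return self._items[key]

    def dispose(self, key):
        # Remove under the lock, then release the resource outside it.
        with self._lock:
            item = self._items.pop(key, None)
        if item is None:
            return False
        item.close()
        return True


class FakeEngine:
    created = 0

    def __init__(self, key):
        FakeEngine.created += 1  # only runs while the cache lock is held
        self.key = key
        self.closed = False

    def close(self):
        self.closed = True


cache = ResourceCache(FakeEngine)
threads = [threading.Thread(target=cache.get, args=("proj",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
engine = cache.get("proj")
```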
--- a/server/services/chat_constants.py
+++ b/server/services/chat_constants.py
@@ -0,0 +1,57 @@
+"""
+Chat Session Constants
+======================
+
+Shared constants for all chat session types (assistant, spec, expand).
+
+The canonical ``API_ENV_VARS`` list lives in ``env_constants.py`` at the
+project root and is re-exported here for convenience so that existing
+imports (``from .chat_constants import API_ENV_VARS``) continue to work.
+"""
+
+import sys
+from pathlib import Path
+from typing import AsyncGenerator
+
+# -------------------------------------------------------------------
+# Root directory of the autoforge project (repository root).
+# Used throughout the server package whenever the repo root is needed.
+# -------------------------------------------------------------------
+ROOT_DIR = Path(__file__).parent.parent.parent
+
+# Ensure the project root is on sys.path so we can import env_constants
+# from the root-level module without requiring a package install.
+_root_str = str(ROOT_DIR)
+if _root_str not in sys.path:
+    sys.path.insert(0, _root_str)
+
+# -------------------------------------------------------------------
+# Environment variables forwarded to Claude CLI subprocesses.
+# Single source of truth lives in env_constants.py at the project root.
+# Re-exported here so existing ``from .chat_constants import API_ENV_VARS``
+# imports continue to work unchanged.
+# -------------------------------------------------------------------
+from env_constants import API_ENV_VARS  # noqa: E402, F401
+
+
+async def make_multimodal_message(content_blocks: list[dict]) -> AsyncGenerator[dict, None]:
+    """Yield a single multimodal user message in Claude Agent SDK format.
+
+    The Claude Agent SDK's ``query()`` method accepts either a plain string
+    or an ``AsyncIterable[dict]`` for custom message formats.  This helper
+    wraps a list of content blocks (text and/or images) in the expected
+    envelope.
+
+    Args:
+        content_blocks: List of content-block dicts, e.g.
+            ``[{"type": "text", "text": "..."}, {"type": "image", ...}]``.
+
+    Yields:
+        A single dict representing the user message.
+    """
+    yield {
+        "type": "user",
+        "message": {"role": "user", "content": content_blocks},
+        "parent_tool_use_id": None,
+        "session_id": "default",
+    }
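A short usage sketch for `make_multimodal_message`: the helper is reproduced here so the example runs on its own, and the result is collected with an async comprehension instead of being passed to the SDK's `query()`. The image block layout (base64 `source` with `media_type` and `data`) and the sample payload are assumptions for illustration.

```python
import asyncio
from typing import AsyncGenerator


async def make_multimodal_message(content_blocks: list) -> AsyncGenerator[dict, None]:
    """Yield a single user message wrapping the given content blocks."""
    yield {
        "type": "user",
        "message": {"role": "user", "content": content_blocks},
        "parent_tool_use_id": None,
        "session_id": "default",
    }


async def collect(content_blocks: list) -> list:
    # An async generator is consumed with `async for`; here we gather its output.
    return [message async for message in make_multimodal_message(content_blocks)]


blocks = [
    {"type": "text", "text": "What does this screenshot show?"},
    {
        "type": "image",
        # Assumed block shape and placeholder data, for illustration only.
        "source": {"type": "base64", "media_type": "image/png", "data": "iVBORw0KGgo="},
    },
]
messages = asyncio.run(collect(blocks))
```

Using an async generator rather than returning a dict keeps the helper compatible with callers that expect an `AsyncIterable[dict]`, as described in the docstring above.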
--- a/server/services/dev_server_manager.py
+++ b/server/services/dev_server_manager.py
@@ -14,10 +14,11 @@ This is a simplified version of AgentProcessManager, tailored for dev servers:
 import asyncio
 import logging
 import re
+import shlex
 import subprocess
 import sys
 import threading
-from datetime import datetime
+from datetime import datetime, timezone
 from pathlib import Path
 from typing import Awaitable, Callable, Literal, Set

@@ -114,7 +115,8 @@ class DevServerProcessManager:
        self._callbacks_lock = threading.Lock()

        # Lock file to prevent multiple instances (stored in project directory)
-        self.lock_file = self.project_dir / ".devserver.lock"
+        from autoforge_paths import get_devserver_lock_path
+        self.lock_file = get_devserver_lock_path(self.project_dir)

    @property
    def status(self) -> Literal["stopped", "running", "crashed"]:
@@ -289,39 +291,54 @@ class DevServerProcessManager:
        Start the dev server as a subprocess.

        Args:
-            command: The shell command to run (e.g., "npm run dev")
+            command: The command to run (e.g., "npm run dev")

        Returns:
            Tuple of (success, message)
        """
-        if self.status == "running":
+        # Already running?
+        if self.process and self.status == "running":
            return False, "Dev server is already running"

+        # Lock check (prevents double-start)
        if not self._check_lock():
-            return False, "Another dev server instance is already running for this project"
+            return False, "Dev server already running (lock file present)"

-        # Validate that project directory exists
-        if not self.project_dir.exists():
-            return False, f"Project directory does not exist: {self.project_dir}"
+        command = (command or "").strip()
+        if not command:
+            return False, "Empty dev server command"

-        self._command = command
-        self._detected_url = None  # Reset URL detection
+        # SECURITY: block shell operators/metacharacters (defense-in-depth)
+        # NOTE: On Windows, .cmd/.bat files are executed via cmd.exe even with
+        # shell=False (CPython limitation), so metacharacter blocking is critical.
+        # Single & is a cmd.exe command separator, ^ is cmd escape, % enables
+        # environment variable expansion, > < enable redirection.
+        dangerous_ops = ["&&", "||", ";", "|", "`", "$(", "&", ">", "<", "^", "%"]
+        if any(op in command for op in dangerous_ops):
+            return False, "Shell operators are not allowed in dev server command"
+        # Block newline injection (cmd.exe interprets newlines as command separators)
+        if "\n" in command or "\r" in command:
+            return False, "Newlines are not allowed in dev server command"
+
+        # Parse into argv and execute without shell
+        argv = shlex.split(command, posix=(sys.platform != "win32"))
+        if not argv:
+            return False, "Empty dev server command"
+
+        base = Path(argv[0]).name.lower()
+
+        # Defense-in-depth: reject direct shells/interpreters commonly used for injection
+        if base in {"sh", "bash", "zsh", "cmd", "powershell", "pwsh"}:
+            return False, f"Shell runner '{base}' is not allowed for dev server commands"
+
+        # Windows: use .cmd shims for Node package managers
+        if sys.platform == "win32" and base in {"npm", "pnpm", "yarn", "npx"} and not argv[0].lower().endswith(".cmd"):
+            argv[0] = argv[0] + ".cmd"

        try:
-            # Determine shell based on platform
-            if sys.platform == "win32":
-                # On Windows, use cmd.exe
-                shell_cmd = ["cmd", "/c", command]
-            else:
-                # On Unix-like systems, use sh
-                shell_cmd = ["sh", "-c", command]
-
-            # Start subprocess with piped stdout/stderr
-            # stdin=DEVNULL prevents interactive dev servers from blocking on stdin
-            # On Windows, use CREATE_NO_WINDOW to prevent console window from flashing
            if sys.platform == "win32":
                self.process = subprocess.Popen(
-                    shell_cmd,
+                    argv,
                    stdin=subprocess.DEVNULL,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
@@ -330,23 +347,33 @@ class DevServerProcessManager:
                )
            else:
                self.process = subprocess.Popen(
-                    shell_cmd,
+                    argv,
                    stdin=subprocess.DEVNULL,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    cwd=str(self.project_dir),
                )

-            self._create_lock()
-            self.started_at = datetime.now()
-            self.status = "running"
+            self._command = command
+            self.started_at = datetime.now(timezone.utc)
+            self._detected_url = None

-            # Start output streaming task
+            # Create lock once we have a PID
+            self._create_lock()
+
+            # Start output streaming
+            self.status = "running"
            self._output_task = asyncio.create_task(self._stream_output())

-            return True, f"Dev server started with PID {self.process.pid}"
+            return True, "Dev server started"
+
+        except FileNotFoundError:
+            self.status = "stopped"
+            self.process = None
+            return False, f"Command not found: {argv[0]}"
        except Exception as e:
-            logger.exception("Failed to start dev server")
+            self.status = "stopped"
+            self.process = None
            return False, f"Failed to start dev server: {e}"

    async def stop(self) -> tuple[bool, str]:
@@ -428,7 +455,9 @@ class DevServerProcessManager:


 # Global registry of dev server managers per project with thread safety
-_managers: dict[str, DevServerProcessManager] = {}
+# Key is (project_name, resolved_project_dir) to prevent cross-project contamination
+# when different projects share the same name but have different paths
+_managers: dict[tuple[str, str], DevServerProcessManager] = {}
 _managers_lock = threading.Lock()


@@ -444,18 +473,11 @@ def get_devserver_manager(project_name: str, project_dir: Path) -> DevServerProc
        DevServerProcessManager instance for the project
    """
    with _managers_lock:
-        if project_name in _managers:
-            manager = _managers[project_name]
-            # Update project_dir in case project was moved
-            if manager.project_dir.resolve() != project_dir.resolve():
-                logger.info(
-                    f"Project {project_name} path updated: {manager.project_dir} -> {project_dir}"
-                )
-                manager.project_dir = project_dir
-                manager.lock_file = project_dir / ".devserver.lock"
-            return manager
-        _managers[project_name] = DevServerProcessManager(project_name, project_dir)
-        return _managers[project_name]
+        # Use composite key to prevent cross-project UI contamination (#71)
+        key = (project_name, str(project_dir.resolve()))
+        if key not in _managers:
+            _managers[key] = DevServerProcessManager(project_name, project_dir)
+        return _managers[key]


 async def cleanup_all_devservers() -> None:
@@ -492,8 +514,18 @@ def cleanup_orphaned_devserver_locks() -> int:
            if not project_path.exists():
                continue

-            lock_file = project_path / ".devserver.lock"
-            if not lock_file.exists():
+            # Check both legacy and new locations for lock files
+            from autoforge_paths import get_autoforge_dir
+            lock_locations = [
+                project_path / ".devserver.lock",
+                get_autoforge_dir(project_path) / ".devserver.lock",
+            ]
+            lock_file = None
+            for candidate in lock_locations:
+                if candidate.exists():
+                    lock_file = candidate
+                    break
+            if lock_file is None:
                continue

            try:
--- a/server/services/expand_chat_session.py
+++ b/server/services/expand_chat_session.py
@@ -10,60 +10,41 @@ import asyncio
 import json
 import logging
 import os
-import re
 import shutil
+import sys
 import threading
 import uuid
 from datetime import datetime
 from pathlib import Path
-from typing import AsyncGenerator, Optional
+from typing import Any, AsyncGenerator, Optional

 from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
 from dotenv import load_dotenv

 from ..schemas import ImageAttachment
+from .chat_constants import ROOT_DIR, make_multimodal_message

 # Load environment variables from .env file if present
 load_dotenv()

 logger = logging.getLogger(__name__)

-# Environment variables to pass through to Claude CLI for API configuration
-API_ENV_VARS = [
-    "ANTHROPIC_BASE_URL",
-    "ANTHROPIC_AUTH_TOKEN",
-    "API_TIMEOUT_MS",
-    "ANTHROPIC_DEFAULT_SONNET_MODEL",
-    "ANTHROPIC_DEFAULT_OPUS_MODEL",
-    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
+# Feature MCP tools needed for expand session
+EXPAND_FEATURE_TOOLS = [
+    "mcp__features__feature_create",
+    "mcp__features__feature_create_bulk",
+    "mcp__features__feature_get_stats",
 ]


-async def _make_multimodal_message(content_blocks: list[dict]) -> AsyncGenerator[dict, None]:
-    """
-    Create an async generator that yields a properly formatted multimodal message.
-    """
-    yield {
-        "type": "user",
-        "message": {"role": "user", "content": content_blocks},
-        "parent_tool_use_id": None,
-        "session_id": "default",
-    }
-
-
-# Root directory of the project
-ROOT_DIR = Path(__file__).parent.parent.parent
-
-
 class ExpandChatSession:
    """
    Manages a project expansion conversation.

    Unlike SpecChatSession which writes spec files, this session:
    1. Reads existing app_spec.txt for context
-    2. Parses feature definitions from Claude's output
-    3. Creates features via REST API
-    4. Tracks which features were created during the session
+    2. Chats with the user to define new features
+    3. Claude creates features via the feature_create_bulk MCP tool
    """

    def __init__(self, project_name: str, project_dir: Path):
@@ -122,7 +103,8 @@ class ExpandChatSession:
            return

        # Verify project has existing spec
-        spec_path = self.project_dir / "prompts" / "app_spec.txt"
+        from autoforge_paths import get_prompts_dir
+        spec_path = get_prompts_dir(self.project_dir) / "app_spec.txt"
        if not spec_path.exists():
            yield {
                "type": "error",
@@ -145,17 +127,24 @@ class ExpandChatSession:
            return

        # Create temporary security settings file (unique per session to avoid conflicts)
+        # Note: permission_mode="bypassPermissions" is safe here because:
+        # 1. Only read-only file tools are allowed (no Write/Edit)
+        # 2. MCP tools are restricted to feature creation only
+        # 3. No Bash access - cannot execute arbitrary commands
        security_settings = {
            "sandbox": {"enabled": True},
            "permissions": {
-                "defaultMode": "acceptEdits",
+                "defaultMode": "bypassPermissions",
                "allow": [
                    "Read(./**)",
                    "Glob(./**)",
+                    *EXPAND_FEATURE_TOOLS,
                ],
            },
        }
-        settings_file = self.project_dir / f".claude_settings.expand.{uuid.uuid4().hex}.json"
+        from autoforge_paths import get_expand_settings_path
+        settings_file = get_expand_settings_path(self.project_dir, uuid.uuid4().hex)
+        settings_file.parent.mkdir(parents=True, exist_ok=True)
        self._settings_file = settings_file
        with open(settings_file, "w", encoding="utf-8") as f:
            json.dump(security_settings, f, indent=2)
@@ -165,11 +154,23 @@ class ExpandChatSession:
        system_prompt = skill_content.replace("$ARGUMENTS", project_path)

        # Build environment overrides for API configuration
-        sdk_env = {var: os.getenv(var) for var in API_ENV_VARS if os.getenv(var)}
+        from registry import DEFAULT_MODEL, get_effective_sdk_env
+        sdk_env = get_effective_sdk_env()

-        # Determine model from environment or use default
-        # This allows using alternative APIs (e.g., GLM via z.ai) that may not support Claude model names
-        model = os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL", "claude-opus-4-5-20251101")
+        # Determine model from SDK env (provider-aware) or fallback to env/default
+        model = sdk_env.get("ANTHROPIC_DEFAULT_OPUS_MODEL") or os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL", DEFAULT_MODEL)
+
+        # Build MCP servers config for feature creation
+        mcp_servers = {
+            "features": {
+                "command": sys.executable,
+                "args": ["-m", "mcp_server.feature_mcp"],
+                "env": {
+                    "PROJECT_DIR": str(self.project_dir.resolve()),
+                    "PYTHONPATH": str(ROOT_DIR.resolve()),
+                },
+            },
+        }

        # Create Claude SDK client
        try:
@@ -181,8 +182,13 @@ class ExpandChatSession:
                    allowed_tools=[
                        "Read",
                        "Glob",
+                        "Grep",
+                        "WebFetch",
+                        "WebSearch",
+                        *EXPAND_FEATURE_TOOLS,
                    ],
-                    permission_mode="acceptEdits",
+                    mcp_servers=mcp_servers,  # type: ignore[arg-type]  # SDK accepts dict config at runtime
+                    permission_mode="bypassPermissions",
                    max_turns=100,
                    cwd=str(self.project_dir.resolve()),
                    settings=str(settings_file.resolve()),
@@ -267,14 +273,15 @@ class ExpandChatSession:
        """
        Internal method to query Claude and stream responses.

-        Handles text responses and detects feature creation blocks.
+        Feature creation is handled by Claude calling the feature_create_bulk
+        MCP tool directly -- no text parsing needed.
        """
        if not self.client:
            return

        # Build the message content
        if attachments and len(attachments) > 0:
-            content_blocks = []
+            content_blocks: list[dict[str, Any]] = []
            if message:
                content_blocks.append({"type": "text", "text": message})
            for att in attachments:
@@ -286,14 +293,11 @@ class ExpandChatSession:
                        "data": att.base64Data,
                    }
                })
-            await self.client.query(_make_multimodal_message(content_blocks))
+            await self.client.query(make_multimodal_message(content_blocks))
            logger.info(f"Sent multimodal message with {len(attachments)} image(s)")
        else:
            await self.client.query(message)

-        # Accumulate full response to detect feature blocks
-        full_response = ""
-
        # Stream the response
        async for msg in self.client.receive_response():
            msg_type = type(msg).__name__
@@ -305,7 +309,6 @@ class ExpandChatSession:
                    if block_type == "TextBlock" and hasattr(block, "text"):
                        text = block.text
                        if text:
-                            full_response += text
                            yield {"type": "text", "content": text}

                            self.messages.append({
@@ -314,123 +317,6 @@ class ExpandChatSession:
                                "timestamp": datetime.now().isoformat()
                            })

-        # Check for feature creation blocks in full response (handle multiple blocks)
-        features_matches = re.findall(
-            r'<features_to_create>\s*(\[[\s\S]*?\])\s*</features_to_create>',
-            full_response
-        )
-
-        if features_matches:
-            # Collect all features from all blocks, deduplicating by name
-            all_features: list[dict] = []
-            seen_names: set[str] = set()
-
-            for features_json in features_matches:
-                try:
-                    features_data = json.loads(features_json)
-
-                    if features_data and isinstance(features_data, list):
-                        for feature in features_data:
-                            name = feature.get("name", "")
-                            if name and name not in seen_names:
-                                seen_names.add(name)
-                                all_features.append(feature)
-                except json.JSONDecodeError as e:
-                    logger.error(f"Failed to parse features JSON block: {e}")
-                    # Continue processing other blocks
-
-            if all_features:
-                try:
-                    # Create all deduplicated features
-                    created = await self._create_features_bulk(all_features)
-
-                    if created:
-                        self.features_created += len(created)
-                        self.created_feature_ids.extend([f["id"] for f in created])
-
-                        yield {
-                            "type": "features_created",
-                            "count": len(created),
-                            "features": created
-                        }
-
-                        logger.info(f"Created {len(created)} features for {self.project_name}")
-                except Exception:
-                    logger.exception("Failed to create features")
-                    yield {
-                        "type": "error",
-                        "content": "Failed to create features"
-                    }
-
-    async def _create_features_bulk(self, features: list[dict]) -> list[dict]:
-        """
-        Create features directly in the database.
-
-        Args:
-            features: List of feature dictionaries with category, name, description, steps
-
-        Returns:
-            List of created feature dictionaries with IDs
-
-        Note:
-            Uses flush() to get IDs immediately without re-querying by priority range,
-            which could pick up rows from concurrent writers.
-        """
-        # Import database classes
-        import sys
-        root = Path(__file__).parent.parent.parent
-        if str(root) not in sys.path:
-            sys.path.insert(0, str(root))
-
-        from api.database import Feature, create_database
-
-        # Get database session
-        _, SessionLocal = create_database(self.project_dir)
-        session = SessionLocal()
-
-        try:
-            # Determine starting priority
-            max_priority_feature = session.query(Feature).order_by(Feature.priority.desc()).first()
-            current_priority = (max_priority_feature.priority + 1) if max_priority_feature else 1
-
-            created_rows: list = []
-
-            for f in features:
-                db_feature = Feature(
-                    priority=current_priority,
-                    category=f.get("category", "functional"),
-                    name=f.get("name", "Unnamed feature"),
-                    description=f.get("description", ""),
-                    steps=f.get("steps", []),
-                    passes=False,
-                    in_progress=False,
-                )
-                session.add(db_feature)
-                created_rows.append(db_feature)
-                current_priority += 1
-
-            # Flush to get IDs without relying on priority range query
-            session.flush()
-
-            # Build result from the flushed objects (IDs are now populated)
-            created_features = [
-                {
-                    "id": db_feature.id,
-                    "name": db_feature.name,
-                    "category": db_feature.category,
-                }
-                for db_feature in created_rows
-            ]
-
-            session.commit()
-            return created_features
-
-        except Exception:
-            session.rollback()
-            raise
-        finally:
-            session.close()
-
    def get_features_created(self) -> int:
        """Get the total number of features created in this session."""
        return self.features_created
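Review note on the expand session settings: the sketch below shows how the permission allow-list from the diff could be assembled and written to a per-session file. It is an illustration under assumptions. `build_expand_settings` and `write_settings` are hypothetical helper names, and the file name used here is invented; the real path comes from `get_expand_settings_path`, whose layout is not shown in this diff.

```python
import json
import uuid
from pathlib import Path

# Tool names copied from the diff
EXPAND_FEATURE_TOOLS = [
    "mcp__features__feature_create",
    "mcp__features__feature_create_bulk",
    "mcp__features__feature_get_stats",
]


def build_expand_settings(feature_tools: list[str]) -> dict:
    """Build the settings dict: read-only file access plus the feature MCP tools."""
    return {
        "sandbox": {"enabled": True},
        "permissions": {
            "defaultMode": "bypassPermissions",
            "allow": ["Read(./**)", "Glob(./**)", *feature_tools],
        },
    }


def write_settings(settings_dir: Path, settings: dict) -> Path:
    """Write settings to a uniquely named file (hypothetical naming scheme)."""
    settings_dir.mkdir(parents=True, exist_ok=True)
    path = settings_dir / f"expand_settings.{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return path
```

Keeping the allow-list in one place makes it easier to check that no write or shell tool slips in alongside `bypassPermissions`.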
--- a/server/services/process_manager.py
+++ b/server/services/process_manager.py
@@ -15,7 +15,7 @@ import sys
 import threading
 from datetime import datetime
 from pathlib import Path
-from typing import Awaitable, Callable, Literal, Set
+from typing import Any, Awaitable, Callable, Literal, Set

 import psutil

@@ -77,7 +77,7 @@ class AgentProcessManager:
        self.project_dir = project_dir
        self.root_dir = root_dir
        self.process: subprocess.Popen | None = None
-        self._status: Literal["stopped", "running", "paused", "crashed"] = "stopped"
+        self._status: Literal["stopped", "running", "paused", "crashed", "pausing", "paused_graceful"] = "stopped"
        self.started_at: datetime | None = None
        self._output_task: asyncio.Task | None = None
        self.yolo_mode: bool = False  # YOLO mode for rapid prototyping
@@ -92,14 +92,15 @@ class AgentProcessManager:
        self._callbacks_lock = threading.Lock()

        # Lock file to prevent multiple instances (stored in project directory)
-        self.lock_file = self.project_dir / ".agent.lock"
+        from autoforge_paths import get_agent_lock_path
+        self.lock_file = get_agent_lock_path(self.project_dir)

    @property
-    def status(self) -> Literal["stopped", "running", "paused", "crashed"]:
+    def status(self) -> Literal["stopped", "running", "paused", "crashed", "pausing", "paused_graceful"]:
        return self._status

    @status.setter
-    def status(self, value: Literal["stopped", "running", "paused", "crashed"]):
+    def status(self, value: Literal["stopped", "running", "paused", "crashed", "pausing", "paused_graceful"]):
        old_status = self._status
        self._status = value
        if old_status != value:
@@ -226,6 +227,68 @@ class AgentProcessManager:
        """Remove lock file."""
        self.lock_file.unlink(missing_ok=True)

+    def _apply_playwright_headless(self, headless: bool) -> None:
+        """Update .playwright/cli.config.json with the current headless setting.
+
+        playwright-cli reads this config file on each ``open`` command, so
+        updating it before the agent starts is sufficient.
+        """
+        config_file = self.project_dir / ".playwright" / "cli.config.json"
+        if not config_file.exists():
+            return
+        try:
+            import json
+            config = json.loads(config_file.read_text(encoding="utf-8"))
+            launch_opts = config.get("browser", {}).get("launchOptions", {})
+            if launch_opts.get("headless") == headless:
+                return  # already correct
+            launch_opts["headless"] = headless
+            config.setdefault("browser", {})["launchOptions"] = launch_opts
+            config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
+            logger.info("Set playwright headless=%s for %s", headless, self.project_name)
+        except Exception:
+            logger.warning("Failed to update playwright config", exc_info=True)
+
+    def _cleanup_stale_features(self) -> None:
+        """Clear in_progress flag for all features when agent stops/crashes.
+
+        When the agent process exits (normally or crash), any features left
+        with in_progress=True were being worked on and didn't complete.
+        Reset them so they can be picked up on next agent start.
+        """
+        try:
+            from autoforge_paths import get_features_db_path
+            features_db = get_features_db_path(self.project_dir)
+            if not features_db.exists():
+                return
+
+            from sqlalchemy import create_engine
+            from sqlalchemy.orm import sessionmaker
+
+            from api.database import Feature
+
+            engine = create_engine(f"sqlite:///{features_db}")
+            Session = sessionmaker(bind=engine)
+            session = Session()
+            try:
+                stuck = session.query(Feature).filter(
+                    Feature.in_progress == True,  # noqa: E712
+                    Feature.passes == False,  # noqa: E712
+                ).all()
+                if stuck:
+                    for f in stuck:
+                        f.in_progress = False  # type: ignore[assignment]
+                    session.commit()
+                    logger.info(
+                        "Cleaned up %d stuck feature(s) for %s",
+                        len(stuck), self.project_name,
+                    )
+            finally:
+                session.close()
+                engine.dispose()
+        except Exception as e:
+            logger.warning("Failed to cleanup features for %s: %s", self.project_name, e)
+
    async def _broadcast_output(self, line: str) -> None:
        """Broadcast output line to all registered callbacks."""
        with self._callbacks_lock:
@@ -267,6 +330,12 @@ class AgentProcessManager:
                    for help_line in AUTH_ERROR_HELP.strip().split('\n'):
                        await self._broadcast_output(help_line)

+                # Detect graceful pause status transitions from orchestrator output
+                if "All agents drained - paused." in decoded:
+                    self.status = "paused_graceful"
+                elif "Resuming from graceful pause..." in decoded:
+                    self.status = "running"
+
                await self._broadcast_output(sanitized)

        except asyncio.CancelledError:
@@ -277,7 +346,7 @@ class AgentProcessManager:
            # Check if process ended
            if self.process and self.process.poll() is not None:
                exit_code = self.process.returncode
-                if exit_code != 0 and self.status == "running":
+                if exit_code != 0 and self.status in ("running", "pausing", "paused_graceful"):
                    # Check buffered output for auth errors if we haven't detected one yet
                    if not auth_error_detected:
                        combined_output = '\n'.join(output_buffer)
@@ -285,9 +354,16 @@ class AgentProcessManager:
                            for help_line in AUTH_ERROR_HELP.strip().split('\n'):
                                await self._broadcast_output(help_line)
                    self.status = "crashed"
-                elif self.status == "running":
+                elif self.status in ("running", "pausing", "paused_graceful"):
                    self.status = "stopped"
+                self._cleanup_stale_features()
                self._remove_lock()
+                # Clean up drain signal file if present
+                try:
+                    from autoforge_paths import get_pause_drain_path
+                    get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
+                except Exception:
+                    pass

    async def start(
        self,
@@ -296,26 +372,42 @@ class AgentProcessManager:
        parallel_mode: bool = False,
        max_concurrency: int | None = None,
        testing_agent_ratio: int = 1,
+        playwright_headless: bool = True,
+        batch_size: int = 3,
    ) -> tuple[bool, str]:
        """
        Start the agent as a subprocess.

        Args:
            yolo_mode: If True, run in YOLO mode (skip testing agents)
-            model: Model to use (e.g., claude-opus-4-5-20251101)
+            model: Model to use (e.g., claude-opus-4-6)
            parallel_mode: DEPRECATED - ignored, always uses unified orchestrator
            max_concurrency: Max concurrent coding agents (1-5, default 1)
            testing_agent_ratio: Number of regression testing agents (0-3, default 1)
+            playwright_headless: If True, run browser in headless mode
+            batch_size: Value passed to the orchestrator's --batch-size flag (default 3)

        Returns:
            Tuple of (success, message)
        """
-        if self.status in ("running", "paused"):
+        if self.status in ("running", "paused", "pausing", "paused_graceful"):
            return False, f"Agent is already {self.status}"

        if not self._check_lock():
            return False, "Another agent instance is already running for this project"

+        # Clean up stale browser daemons from previous runs
+        try:
+            subprocess.run(
+                ["playwright-cli", "kill-all"],
+                timeout=5, capture_output=True,
+            )
+        except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+            pass
+
+        # Clean up features stuck from a previous crash/stop
+        self._cleanup_stale_features()
+
        # Store for status queries
        self.yolo_mode = yolo_mode
        self.model = model
@@ -346,17 +437,41 @@ class AgentProcessManager:
        # Add testing agent configuration
        cmd.extend(["--testing-ratio", str(testing_agent_ratio)])

+        # Add --batch-size flag for multi-feature batching
+        cmd.extend(["--batch-size", str(batch_size)])
+
+        # Apply headless setting to .playwright/cli.config.json so playwright-cli
+        # picks it up (the only mechanism it supports for headless control)
+        self._apply_playwright_headless(playwright_headless)
+
        try:
            # Start subprocess with piped stdout/stderr
            # Use project_dir as cwd so Claude SDK sandbox allows access to project files
-            # IMPORTANT: Set PYTHONUNBUFFERED to ensure output isn't delayed
-            self.process = subprocess.Popen(
-                cmd,
-                stdout=subprocess.PIPE,
-                stderr=subprocess.STDOUT,
-                cwd=str(self.project_dir),
-                env={**os.environ, "PYTHONUNBUFFERED": "1"},
-            )
+            # stdin=DEVNULL prevents blocking if Claude CLI or child process tries to read stdin
+            # CREATE_NO_WINDOW on Windows prevents console window pop-ups
+            # PYTHONUNBUFFERED ensures output isn't delayed
+            # Build subprocess environment with API provider settings
+            from registry import get_effective_sdk_env
+            api_env = get_effective_sdk_env()
+            subprocess_env = {
+                **os.environ,
+                "PYTHONUNBUFFERED": "1",
+                "PLAYWRIGHT_CLI_SESSION": f"agent-{self.project_name}-{os.getpid()}",
+                "NODE_COMPILE_CACHE": "",  # Disable V8 compile caching to prevent .node file accumulation in %TEMP%
+                **api_env,
+            }
+
+            popen_kwargs: dict[str, Any] = {
+                "stdin": subprocess.DEVNULL,
+                "stdout": subprocess.PIPE,
+                "stderr": subprocess.STDOUT,
+                "cwd": str(self.project_dir),
+                "env": subprocess_env,
+            }
+            if sys.platform == "win32":
+                popen_kwargs["creationflags"] = subprocess.CREATE_NO_WINDOW
+
+            self.process = subprocess.Popen(cmd, **popen_kwargs)

            # Atomic lock creation - if it fails, another process beat us
            if not self._create_lock():
@@ -401,6 +516,15 @@ class AgentProcessManager:
                except asyncio.CancelledError:
                    pass

+            # Kill browser daemons before stopping agent
+            try:
+                subprocess.run(
+                    ["playwright-cli", "kill-all"],
+                    timeout=5, capture_output=True,
+                )
+            except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+                pass
+
            # CRITICAL: Kill entire process tree, not just orchestrator
            # This ensures all spawned coding/testing agents are also terminated
            proc = self.process  # Capture reference before async call
@@ -412,7 +536,14 @@ class AgentProcessManager:
                result.children_terminated, result.children_killed
            )

+            self._cleanup_stale_features()
            self._remove_lock()
+            # Clean up drain signal file if present
+            try:
+                from autoforge_paths import get_pause_drain_path
+                get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
+            except Exception:
+                pass
            self.status = "stopped"
            self.process = None
            self.started_at = None
@@ -473,6 +604,47 @@ class AgentProcessManager:
            logger.exception("Failed to resume agent")
            return False, f"Failed to resume agent: {e}"

+    async def graceful_pause(self) -> tuple[bool, str]:
+        """Request a graceful pause (drain mode).
+
+        Creates a signal file that the orchestrator polls. Running agents
+        finish their current work before the orchestrator enters a paused state.
+
+        Returns:
+            Tuple of (success, message)
+        """
+        if not self.process or self.status not in ("running",):
+            return False, "Agent is not running"
+
+        try:
+            from autoforge_paths import get_pause_drain_path
+            drain_path = get_pause_drain_path(self.project_dir)
+            drain_path.parent.mkdir(parents=True, exist_ok=True)
+            drain_path.write_text(str(self.process.pid))
+            self.status = "pausing"
+            return True, "Graceful pause requested"
+        except Exception as e:
+            logger.exception("Failed to request graceful pause")
+            return False, f"Failed to request graceful pause: {e}"
+
+    async def graceful_resume(self) -> tuple[bool, str]:
+        """Resume from a graceful pause by removing the drain signal file.
+
+        Returns:
+            Tuple of (success, message)
+        """
+        if not self.process or self.status not in ("pausing", "paused_graceful"):
+            return False, "Agent is not in a graceful pause state"
+
+        try:
+            from autoforge_paths import get_pause_drain_path
+            get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
+            self.status = "running"
+            return True, "Agent resumed from graceful pause"
+        except Exception as e:
+            logger.exception("Failed to resume from graceful pause")
+            return False, f"Failed to resume: {e}"
+
    async def healthcheck(self) -> bool:
        """
        Check if the agent process is still alive.
@@ -488,7 +660,14 @@ class AgentProcessManager:
        poll = self.process.poll()
        if poll is not None:
            # Process has terminated
-            if self.status in ("running", "paused"):
+            if self.status in ("running", "paused", "pausing", "paused_graceful"):
+                self._cleanup_stale_features()
+                # Clean up drain signal file if present
+                try:
+                    from autoforge_paths import get_pause_drain_path
+                    get_pause_drain_path(self.project_dir).unlink(missing_ok=True)
+                except Exception:
+                    pass
                self.status = "crashed"
                self._remove_lock()
            return False
@@ -510,7 +689,9 @@ class AgentProcessManager:


 # Global registry of process managers per project with thread safety
-_managers: dict[str, AgentProcessManager] = {}
+# Key is (project_name, resolved_project_dir) to prevent cross-project contamination
+# when different projects share the same name but have different paths
+_managers: dict[tuple[str, str], AgentProcessManager] = {}
 _managers_lock = threading.Lock()


@@ -523,9 +704,11 @@ def get_manager(project_name: str, project_dir: Path, root_dir: Path) -> AgentPr
        root_dir: Root directory of the autonomous-coding-ui project
    """
    with _managers_lock:
-        if project_name not in _managers:
-            _managers[project_name] = AgentProcessManager(project_name, project_dir, root_dir)
-        return _managers[project_name]
+        # Use composite key to prevent cross-project UI contamination (#71)
+        key = (project_name, str(project_dir.resolve()))
+        if key not in _managers:
+            _managers[key] = AgentProcessManager(project_name, project_dir, root_dir)
+        return _managers[key]


 async def cleanup_all_managers() -> None:
@@ -569,8 +752,24 @@ def cleanup_orphaned_locks() -> int:
            if not project_path.exists():
                continue

-            lock_file = project_path / ".agent.lock"
-            if not lock_file.exists():
+            # Clean up stale drain signal files
+            from autoforge_paths import get_autoforge_dir, get_pause_drain_path
+            drain_file = get_pause_drain_path(project_path)
+            if drain_file.exists():
+                drain_file.unlink(missing_ok=True)
+                logger.info("Removed stale drain signal file for project '%s'", name)
+
+            # Check both legacy and new locations for lock files
+            lock_locations = [
+                project_path / ".agent.lock",
+                get_autoforge_dir(project_path) / ".agent.lock",
+            ]
+            lock_file = None
+            for candidate in lock_locations:
+                if candidate.exists():
+                    lock_file = candidate
+                    break
+            if lock_file is None:
                continue

            try:
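Review note on the graceful pause added to `AgentProcessManager`: the drain signal is a plain file that one side creates and the other side polls. The sketch below isolates that handshake. `PauseSignal` is a hypothetical helper, not part of the codebase, and the orchestrator's polling side is not shown in this diff, so `is_requested` is an assumption about how it might check.

```python
from pathlib import Path


class PauseSignal:
    """Hypothetical wrapper around the drain signal file."""

    def __init__(self, drain_path: Path) -> None:
        self.drain_path = drain_path

    def request(self, pid: int) -> None:
        """Create the signal file, recording the requesting process ID."""
        self.drain_path.parent.mkdir(parents=True, exist_ok=True)
        self.drain_path.write_text(str(pid))

    def is_requested(self) -> bool:
        """Polling side (assumed): a pause is pending while the file exists."""
        return self.drain_path.exists()

    def clear(self) -> None:
        """Remove the signal file; safe to call when it is already gone."""
        self.drain_path.unlink(missing_ok=True)
```

Because `clear` tolerates a missing file, it can be called from every exit path (stop, crash detection, startup cleanup) without extra checks, which matches how the diff calls `unlink(missing_ok=True)` in several places.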
--- a/server/services/project_config.py
+++ b/server/services/project_config.py
@@ -6,7 +6,7 @@ Handles project type detection and dev command configuration.
 Detects project types by scanning for configuration files and provides
 default or custom dev commands for each project.

-Configuration is stored in {project_dir}/.autocoder/config.json.
+Configuration is stored in {project_dir}/.autoforge/config.json.
 """

 import json
@@ -88,13 +88,22 @@ def _get_config_path(project_dir: Path) -> Path:
    """
    Get the path to the project config file.

+    Checks the new .autoforge/ location first, falls back to .autocoder/
+    for backward compatibility.
+
    Args:
        project_dir: Path to the project directory.

    Returns:
-        Path to the .autocoder/config.json file.
+        Path to the config.json file in the appropriate directory.
    """
-    return project_dir / ".autocoder" / "config.json"
+    new_path = project_dir / ".autoforge" / "config.json"
+    if new_path.exists():
+        return new_path
+    old_path = project_dir / ".autocoder" / "config.json"
+    if old_path.exists():
+        return old_path
+    return new_path


 def _load_config(project_dir: Path) -> dict:
@@ -137,7 +146,7 @@ def _save_config(project_dir: Path, config: dict) -> None:
    """
    Save the project configuration to disk.

-    Creates the .autocoder directory if it doesn't exist.
+    Creates the .autoforge directory if it doesn't exist.

    Args:
        project_dir: Path to the project directory.
@@ -148,7 +157,7 @@ def _save_config(project_dir: Path, config: dict) -> None:
    """
    config_path = _get_config_path(project_dir)

-    # Ensure the .autocoder directory exists
+    # Ensure the config directory exists (.autoforge, or legacy .autocoder)
    config_path.parent.mkdir(parents=True, exist_ok=True)

    try:
@@ -408,11 +417,11 @@ def clear_dev_command(project_dir: Path) -> None:
            config_path.unlink(missing_ok=True)
            logger.info("Removed empty config file for %s", project_dir.name)

-            # Also remove .autocoder directory if empty
-            autocoder_dir = config_path.parent
-            if autocoder_dir.exists() and not any(autocoder_dir.iterdir()):
-                autocoder_dir.rmdir()
-                logger.debug("Removed empty .autocoder directory for %s", project_dir.name)
+            # Also remove .autoforge directory if empty
+            autoforge_dir = config_path.parent
+            if autoforge_dir.exists() and not any(autoforge_dir.iterdir()):
+                autoforge_dir.rmdir()
+                logger.debug("Removed empty %s directory for %s", autoforge_dir.name, project_dir.name)
        except OSError as e:
            logger.warning("Failed to clean up config for %s: %s", project_dir.name, e)
    else:
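Review note (illustrative, not part of the patch): the lookup order in `_get_config_path` can be exercised in isolation. The sketch below assumes a hypothetical stand-in, `resolve_config_path`, that copies the three-step logic from the hunk above; `demo` is likewise a made-up helper that runs it against a temporary directory.

```python
import tempfile
from pathlib import Path


def resolve_config_path(project_dir: Path) -> Path:
    """Hypothetical stand-in that mirrors the lookup order of _get_config_path."""
    new_path = project_dir / ".autoforge" / "config.json"
    if new_path.exists():
        return new_path
    old_path = project_dir / ".autocoder" / "config.json"
    if old_path.exists():
        return old_path
    # Neither file exists yet: new configs are created under .autoforge.
    return new_path


def demo() -> list[str]:
    """Return the directory name chosen in three successive scenarios."""
    chosen = []
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)

        # 1. Fresh project: no config anywhere.
        chosen.append(resolve_config_path(project).parent.name)

        # 2. Only the legacy config exists.
        legacy_dir = project / ".autocoder"
        legacy_dir.mkdir()
        (legacy_dir / "config.json").write_text("{}")
        chosen.append(resolve_config_path(project).parent.name)

        # 3. Both configs exist: the new location takes precedence.
        new_dir = project / ".autoforge"
        new_dir.mkdir()
        (new_dir / "config.json").write_text("{}")
        chosen.append(resolve_config_path(project).parent.name)
    return chosen
```

One consequence worth keeping in mind: while only a legacy config exists, anything that writes through this path keeps writing to `.autocoder/` until the file is migrated.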
--- a/server/services/scheduler_service.py
+++ b/server/services/scheduler_service.py
@@ -92,8 +92,9 @@ class SchedulerService:
    async def _load_project_schedules(self, project_name: str, project_dir: Path) -> int:
        """Load schedules for a single project. Returns count of schedules loaded."""
        from api.database import Schedule, create_database
+        from autoforge_paths import get_features_db_path

-        db_path = project_dir / "features.db"
+        db_path = get_features_db_path(project_dir)
        if not db_path.exists():
            return 0

@@ -567,8 +568,9 @@ class SchedulerService:
    ):
        """Check if a project should be started on server startup."""
        from api.database import Schedule, ScheduleOverride, create_database
+        from autoforge_paths import get_features_db_path

-        db_path = project_dir / "features.db"
+        db_path = get_features_db_path(project_dir)
        if not db_path.exists():
            return

--- a/server/services/spec_chat_session.py
+++ b/server/services/spec_chat_session.py
@@ -13,49 +13,19 @@ import shutil
 import threading
 from datetime import datetime
 from pathlib import Path
-from typing import AsyncGenerator, Optional
+from typing import Any, AsyncGenerator, Optional

 from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
 from dotenv import load_dotenv

 from ..schemas import ImageAttachment
+from .chat_constants import ROOT_DIR, make_multimodal_message

 # Load environment variables from .env file if present
 load_dotenv()

 logger = logging.getLogger(__name__)

-# Environment variables to pass through to Claude CLI for API configuration
-API_ENV_VARS = [
-    "ANTHROPIC_BASE_URL",
-    "ANTHROPIC_AUTH_TOKEN",
-    "API_TIMEOUT_MS",
-    "ANTHROPIC_DEFAULT_SONNET_MODEL",
-    "ANTHROPIC_DEFAULT_OPUS_MODEL",
-    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
-]
-
-
-async def _make_multimodal_message(content_blocks: list[dict]) -> AsyncGenerator[dict, None]:
-    """
-    Create an async generator that yields a properly formatted multimodal message.
-
-    The Claude Agent SDK's query() method accepts either:
-    - A string (simple text)
-    - An AsyncIterable[dict] (for custom message formats)
-
-    This function wraps content blocks in the expected message format.
-    """
-    yield {
-        "type": "user",
-        "message": {"role": "user", "content": content_blocks},
-        "parent_tool_use_id": None,
-        "session_id": "default",
-    }
-
-# Root directory of the project
-ROOT_DIR = Path(__file__).parent.parent.parent
-

 class SpecChatSession:
    """
@@ -125,7 +95,8 @@ class SpecChatSession:
        # Delete app_spec.txt so Claude can create it fresh
        # The SDK requires reading existing files before writing, but app_spec.txt is created new
        # Note: We keep initializer_prompt.md so Claude can read and update the template
-        prompts_dir = self.project_dir / "prompts"
+        from autoforge_paths import get_prompts_dir
+        prompts_dir = get_prompts_dir(self.project_dir)
        app_spec_path = prompts_dir / "app_spec.txt"
        if app_spec_path.exists():
            app_spec_path.unlink()
@@ -145,7 +116,9 @@ class SpecChatSession:
                ],
            },
        }
-        settings_file = self.project_dir / ".claude_settings.json"
+        from autoforge_paths import get_claude_settings_path
+        settings_file = get_claude_settings_path(self.project_dir)
+        settings_file.parent.mkdir(parents=True, exist_ok=True)
        with open(settings_file, "w") as f:
            json.dump(security_settings, f, indent=2)

@@ -167,11 +140,11 @@ class SpecChatSession:
        system_cli = shutil.which("claude")

        # Build environment overrides for API configuration
-        sdk_env = {var: os.getenv(var) for var in API_ENV_VARS if os.getenv(var)}
+        from registry import DEFAULT_MODEL, get_effective_sdk_env
+        sdk_env = get_effective_sdk_env()

-        # Determine model from environment or use default
-        # This allows using alternative APIs (e.g., GLM via z.ai) that may not support Claude model names
-        model = os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL", "claude-opus-4-5-20251101")
+        # Determine model from SDK env (provider-aware) or fallback to env/default
+        model = sdk_env.get("ANTHROPIC_DEFAULT_OPUS_MODEL") or os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL", DEFAULT_MODEL)

        try:
            self.client = ClaudeSDKClient(
@@ -289,7 +262,7 @@ class SpecChatSession:
        # Build the message content
        if attachments and len(attachments) > 0:
            # Multimodal message: build content blocks array
-            content_blocks = []
+            content_blocks: list[dict[str, Any]] = []

            # Add text block if there's text
            if message:
@@ -308,7 +281,7 @@ class SpecChatSession:

            # Send multimodal content to Claude using async generator format
            # The SDK's query() accepts AsyncIterable[dict] for custom message formats
-            await self.client.query(_make_multimodal_message(content_blocks))
+            await self.client.query(make_multimodal_message(content_blocks))
            logger.info(f"Sent multimodal message with {len(attachments)} image(s)")
        else:
            # Text-only message: use string format
@@ -317,7 +290,7 @@ class SpecChatSession:
        current_text = ""

        # Track pending writes for BOTH required files
-        pending_writes = {
+        pending_writes: dict[str, dict[str, Any] | None] = {
            "app_spec": None,      # {"tool_id": ..., "path": ...}
            "initializer": None,   # {"tool_id": ..., "path": ...}
        }
@@ -392,7 +365,8 @@ class SpecChatSession:
                            logger.warning(f"Tool error: {content}")
                            # Clear any pending writes that failed
                            for key in pending_writes:
-                                if pending_writes[key] and tool_use_id == pending_writes[key].get("tool_id"):
+                                pending_write = pending_writes[key]
+                                if pending_write is not None and tool_use_id == pending_write.get("tool_id"):
                                    logger.error(f"{key} write failed: {content}")
                                    pending_writes[key] = None
                        else:
--- a/server/services/terminal_manager.py
+++ b/server/services/terminal_manager.py
@@ -371,7 +371,7 @@ class TerminalSession:
            # Reap zombie if not already reaped
            if self._child_pid is not None:
                try:
-                    os.waitpid(self._child_pid, os.WNOHANG)
+                    os.waitpid(self._child_pid, os.WNOHANG)  # type: ignore[attr-defined]  # os.WNOHANG is Unix-only; guarded by runtime platform selection
                except ChildProcessError:
                    pass
                except Exception:
@@ -736,7 +736,7 @@ async def cleanup_all_terminals() -> None:
    Called on server shutdown to ensure all PTY processes are terminated.
    """
    with _sessions_lock:
-        all_sessions = []
+        all_sessions: list[TerminalSession] = []
        for project_sessions in _sessions.values():
            all_sessions.extend(project_sessions.values())

--- a/server/utils/project_helpers.py
+++ b/server/utils/project_helpers.py
@@ -0,0 +1,32 @@
+"""
+Project Helper Utilities
+========================
+
+Shared project path lookup used across all server routers and websocket handlers.
+Consolidates the previously duplicated _get_project_path() function.
+"""
+
+import sys
+from pathlib import Path
+
+# Ensure the project root is on sys.path so `registry` can be imported.
+# This is necessary because `registry.py` lives at the repository root,
+# outside the `server` package.
+_root = Path(__file__).parent.parent.parent
+if str(_root) not in sys.path:
+    sys.path.insert(0, str(_root))
+
+from registry import get_project_path as _registry_get_project_path
+
+
+def get_project_path(project_name: str) -> Path | None:
+    """Look up a project's filesystem path from the global registry.
+
+    Args:
+        project_name: The registered name of the project.
+
+    Returns:
+        The resolved ``Path`` to the project directory, or ``None`` if the
+        project is not found in the registry.
+    """
+    return _registry_get_project_path(project_name)
--- a/server/utils/validation.py
+++ b/server/utils/validation.py
@@ -1,26 +1,52 @@
 """
-Shared validation utilities for the server.
+Shared Validation Utilities
+============================
+
+Project name validation used across REST endpoints and WebSocket handlers.
+Two variants are provided:
+
+* ``is_valid_project_name`` -- returns ``bool``, suitable for WebSocket
+  handlers where raising an HTTPException is not appropriate.
+* ``validate_project_name`` -- raises ``HTTPException(400)`` on failure,
+  suitable for REST endpoint handlers.
 """

 import re

 from fastapi import HTTPException

+# Compiled once; reused by both variants.
+_PROJECT_NAME_RE = re.compile(r'^[a-zA-Z0-9_-]{1,50}\Z')  # \Z also rejects a trailing newline, which $ would accept
+
+
+def is_valid_project_name(name: str) -> bool:
+    """Check whether *name* is a valid project name.
+
+    Allows only ASCII letters, digits, hyphens, and underscores (1-50 chars).
+    Returns ``True`` if valid, ``False`` otherwise.
+
+    Use this in WebSocket handlers where you need to close the socket
+    yourself rather than raise an HTTP error.
+    """
+    return bool(_PROJECT_NAME_RE.match(name))
+

 def validate_project_name(name: str) -> str:
-    """
-    Validate and sanitize project name to prevent path traversal.
+    """Validate and return *name*, or raise ``HTTPException(400)``.
+
+    Suitable for REST endpoint handlers where FastAPI will convert the
+    exception into an HTTP 400 response automatically.

    Args:
-        name: Project name to validate
+        name: Project name to validate.

    Returns:
-        The validated project name
+        The validated project name (unchanged).

    Raises:
-        HTTPException: If name is invalid
+        HTTPException: If *name* is invalid.
    """
-    if not re.match(r'^[a-zA-Z0-9_-]{1,50}$', name):
+    if not _PROJECT_NAME_RE.match(name):
        raise HTTPException(
            status_code=400,
            detail="Invalid project name. Use only letters, numbers, hyphens, and underscores (1-50 chars)."
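Review note (illustrative, not part of the patch): a minimal sketch of the boolean variant, useful for checking which names pass. It assumes the same character class and length bounds as the diff, but uses `re.fullmatch` as a choice of this sketch so that the whole string must match. `check_names` is a hypothetical helper added only for the demonstration.

```python
import re

# Same character class and 1-50 length bound as the pattern in the diff.
# fullmatch() is used here (an assumption of this sketch) so the entire
# string must match, including any trailing newline.
_NAME_RE = re.compile(r'[a-zA-Z0-9_-]{1,50}')


def is_valid_project_name(name: str) -> bool:
    """Return True only for 1-50 ASCII letters, digits, hyphens, underscores."""
    return _NAME_RE.fullmatch(name) is not None


def check_names(names: list[str]) -> dict[str, bool]:
    """Hypothetical helper: map each candidate name to its validation result."""
    return {name: is_valid_project_name(name) for name in names}
```

Names containing path separators or dots, such as `../etc`, fail the check, which is what keeps a project name from being used for path traversal.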
--- a/server/websocket.py
+++ b/server/websocket.py
@@ -16,8 +16,11 @@ from typing import Set
 from fastapi import WebSocket, WebSocketDisconnect

 from .schemas import AGENT_MASCOTS
+from .services.chat_constants import ROOT_DIR
 from .services.dev_server_manager import get_devserver_manager
 from .services.process_manager import get_manager
+from .utils.project_helpers import get_project_path as _get_project_path
+from .utils.validation import is_valid_project_name as validate_project_name

 # Lazy imports
 _count_passing_tests = None
@@ -36,6 +39,14 @@ TESTING_AGENT_START_PATTERN = re.compile(r'Started testing agent for feature #(\
 # Matches: "Feature #123 testing completed" or "Feature #123 testing failed"
 TESTING_AGENT_COMPLETE_PATTERN = re.compile(r'Feature #(\d+) testing (completed|failed)')

+# Pattern to detect batch coding agent start message
+# Matches: "Started coding agent for features #5, #8, #12"
+BATCH_CODING_AGENT_START_PATTERN = re.compile(r'Started coding agent for features (#\d+(?:,\s*#\d+)*)')
+
+# Pattern to detect batch completion
+# Matches: "Features #5, #8, #12 completed" or "Features #5, #8, #12 failed"
+BATCH_FEATURES_COMPLETE_PATTERN = re.compile(r'Features (#\d+(?:,\s*#\d+)*)\s+(completed|failed)')
+
 # Patterns for detecting agent activity and thoughts
 THOUGHT_PATTERNS = [
    # Claude's tool usage patterns (actual format: [Tool: name])
@@ -50,7 +61,7 @@ THOUGHT_PATTERNS = [
    (re.compile(r'(?:Testing|Verifying|Running tests|Validating)\s+(.+)', re.I), 'testing'),
    (re.compile(r'(?:Error|Failed|Cannot|Unable to|Exception)\s+(.+)', re.I), 'struggling'),
    # Test results
-    (re.compile(r'(?:PASS|passed|success)', re.I), 'success'),
+    (re.compile(r'(?:PASS|passed|success)', re.I), 'testing'),
    (re.compile(r'(?:FAIL|failed|error)', re.I), 'struggling'),
 ]

@@ -61,12 +72,15 @@ ORCHESTRATOR_PATTERNS = {
    'capacity_check': re.compile(r'\[DEBUG\] Spawning loop: (\d+) ready, (\d+) slots'),
    'at_capacity': re.compile(r'At max capacity|at max testing agents|At max total agents'),
    'feature_start': re.compile(r'Starting feature \d+/\d+: #(\d+) - (.+)'),
-    'coding_spawn': re.compile(r'Started coding agent for feature #(\d+)'),
+    'coding_spawn': re.compile(r'Started coding agent for features? #(\d+)'),
    'testing_spawn': re.compile(r'Started testing agent for feature #(\d+)'),
-    'coding_complete': re.compile(r'Feature #(\d+) (completed|failed)'),
+    'coding_complete': re.compile(r'Features? #(\d+)(?:,\s*#\d+)* (completed|failed)'),
    'testing_complete': re.compile(r'Feature #(\d+) testing (completed|failed)'),
    'all_complete': re.compile(r'All features complete'),
    'blocked_features': re.compile(r'(\d+) blocked by dependencies'),
+    'drain_start': re.compile(r'Graceful pause requested'),
+    'drain_complete': re.compile(r'All agents drained'),
+    'drain_resume': re.compile(r'Resuming from graceful pause'),
 }


@@ -93,14 +107,26 @@ class AgentTracker:
        # Check for orchestrator status messages first
        # These don't have [Feature #X] prefix

-        # Coding agent start: "Started coding agent for feature #X"
-        if line.startswith("Started coding agent for feature #"):
+        # Batch coding agent start: "Started coding agent for features #5, #8, #12"
+        batch_start_match = BATCH_CODING_AGENT_START_PATTERN.match(line)
+        if batch_start_match:
            try:
-                feature_id = int(re.search(r'#(\d+)', line).group(1))
-                return await self._handle_agent_start(feature_id, line, agent_type="coding")
-            except (AttributeError, ValueError):
+                feature_ids = [int(x.strip().lstrip('#')) for x in batch_start_match.group(1).split(',')]
+                if feature_ids:
+                    return await self._handle_batch_agent_start(feature_ids, "coding")
+            except ValueError:
                pass

+        # Single coding agent start: "Started coding agent for feature #X"
+        if line.startswith("Started coding agent for feature #"):
+            m = re.search(r'#(\d+)', line)
+            if m:
+                try:
+                    feature_id = int(m.group(1))
+                    return await self._handle_agent_start(feature_id, line, agent_type="coding")
+                except ValueError:
+                    pass
+
        # Testing agent start: "Started testing agent for feature #X (PID xxx)"
        testing_start_match = TESTING_AGENT_START_PATTERN.match(line)
        if testing_start_match:
@@ -114,14 +140,27 @@ class AgentTracker:
            is_success = testing_complete_match.group(2) == "completed"
            return await self._handle_agent_complete(feature_id, is_success, agent_type="testing")

+        # Batch features complete: "Features #5, #8, #12 completed/failed"
+        batch_complete_match = BATCH_FEATURES_COMPLETE_PATTERN.match(line)
+        if batch_complete_match:
+            try:
+                feature_ids = [int(x.strip().lstrip('#')) for x in batch_complete_match.group(1).split(',')]
+                is_success = batch_complete_match.group(2) == "completed"
+                if feature_ids:
+                    return await self._handle_batch_agent_complete(feature_ids, is_success, "coding")
+            except ValueError:
+                pass
+
        # Coding agent complete: "Feature #X completed/failed" (without "testing" keyword)
        if line.startswith("Feature #") and ("completed" in line or "failed" in line) and "testing" not in line:
-            try:
-                feature_id = int(re.search(r'#(\d+)', line).group(1))
-                is_success = "completed" in line
-                return await self._handle_agent_complete(feature_id, is_success, agent_type="coding")
-            except (AttributeError, ValueError):
-                pass
+            m = re.search(r'#(\d+)', line)
+            if m:
+                try:
+                    feature_id = int(m.group(1))
+                    is_success = "completed" in line
+                    return await self._handle_agent_complete(feature_id, is_success, agent_type="coding")
+                except ValueError:
+                    pass

        # Check for feature-specific output lines: [Feature #X] content
        # Both coding and testing agents use this format now
@@ -151,6 +190,7 @@ class AgentTracker:
                    'name': AGENT_MASCOTS[agent_index % len(AGENT_MASCOTS)],
                    'agent_index': agent_index,
                    'agent_type': 'coding',
+                    'feature_ids': [feature_id],
                    'state': 'thinking',
                    'feature_name': f'Feature #{feature_id}',
                    'last_thought': None,
@@ -158,6 +198,10 @@ class AgentTracker:

            agent = self.active_agents[key]

+            # Update current_feature_id for batch agents when output comes from a different feature
+            if 'current_feature_id' in agent and feature_id in agent.get('feature_ids', []):
+                agent['current_feature_id'] = feature_id
+
            # Detect state and thought from content
            state = 'working'
            thought = None
@@ -181,6 +225,7 @@ class AgentTracker:
                    'agentName': agent['name'],
                    'agentType': agent['agent_type'],
                    'featureId': feature_id,
+                    'featureIds': agent.get('feature_ids', [feature_id]),
                    'featureName': agent['feature_name'],
                    'state': state,
                    'thought': thought,
@@ -237,6 +282,7 @@ class AgentTracker:
                'name': AGENT_MASCOTS[agent_index % len(AGENT_MASCOTS)],
                'agent_index': agent_index,
                'agent_type': agent_type,
+                'feature_ids': [feature_id],
                'state': 'thinking',
                'feature_name': feature_name,
                'last_thought': 'Starting work...',
@@ -248,12 +294,55 @@ class AgentTracker:
                'agentName': AGENT_MASCOTS[agent_index % len(AGENT_MASCOTS)],
                'agentType': agent_type,
                'featureId': feature_id,
+                'featureIds': [feature_id],
                'featureName': feature_name,
                'state': 'thinking',
                'thought': 'Starting work...',
                'timestamp': datetime.now().isoformat(),
            }

+    async def _handle_batch_agent_start(self, feature_ids: list[int], agent_type: str = "coding") -> dict | None:
+        """Handle batch agent start message from orchestrator."""
+        if not feature_ids:
+            return None
+        primary_id = feature_ids[0]
+        async with self._lock:
+            key = (primary_id, agent_type)
+            agent_index = self._next_agent_index
+            self._next_agent_index += 1
+
+            feature_name = f'Features {", ".join(f"#{fid}" for fid in feature_ids)}'
+
+            self.active_agents[key] = {
+                'name': AGENT_MASCOTS[agent_index % len(AGENT_MASCOTS)],
+                'agent_index': agent_index,
+                'agent_type': agent_type,
+                'feature_ids': list(feature_ids),
+                'current_feature_id': primary_id,
+                'state': 'thinking',
+                'feature_name': feature_name,
+                'last_thought': 'Starting batch work...',
+            }
+
+            # Register all feature IDs so output lines can find this agent
+            for fid in feature_ids:
+                secondary_key = (fid, agent_type)
+                if secondary_key != key:
+                    self.active_agents[secondary_key] = self.active_agents[key]
+
+            return {
+                'type': 'agent_update',
+                'agentIndex': agent_index,
+                'agentName': AGENT_MASCOTS[agent_index % len(AGENT_MASCOTS)],
+                'agentType': agent_type,
+                'featureId': primary_id,
+                'featureIds': list(feature_ids),
+                'featureName': feature_name,
+                'state': 'thinking',
+                'thought': 'Starting batch work...',
+                'timestamp': datetime.now().isoformat(),
+            }
+
    async def _handle_agent_complete(self, feature_id: int, is_success: bool, agent_type: str = "coding") -> dict | None:
        """Handle agent completion - ALWAYS emits a message, even if agent wasn't tracked.

@@ -275,6 +364,7 @@ class AgentTracker:
                    'agentName': agent['name'],
                    'agentType': agent.get('agent_type', agent_type),
                    'featureId': feature_id,
+                    'featureIds': agent.get('feature_ids', [feature_id]),
                    'featureName': agent['feature_name'],
                    'state': state,
                    'thought': 'Completed successfully!' if is_success else 'Failed to complete',
@@ -291,6 +381,7 @@ class AgentTracker:
                    'agentName': 'Unknown',
                    'agentType': agent_type,
                    'featureId': feature_id,
+                    'featureIds': [feature_id],
                    'featureName': f'Feature #{feature_id}',
                    'state': state,
                    'thought': 'Completed successfully!' if is_success else 'Failed to complete',
@@ -298,6 +389,49 @@ class AgentTracker:
                    'synthetic': True,
                }

+    async def _handle_batch_agent_complete(self, feature_ids: list[int], is_success: bool, agent_type: str = "coding") -> dict | None:
+        """Handle batch agent completion."""
+        if not feature_ids:
+            return None
+        primary_id = feature_ids[0]
+        async with self._lock:
+            state = 'success' if is_success else 'error'
+            key = (primary_id, agent_type)
+
+            if key in self.active_agents:
+                agent = self.active_agents[key]
+                result = {
+                    'type': 'agent_update',
+                    'agentIndex': agent['agent_index'],
+                    'agentName': agent['name'],
+                    'agentType': agent.get('agent_type', agent_type),
+                    'featureId': primary_id,
+                    'featureIds': agent.get('feature_ids', list(feature_ids)),
+                    'featureName': agent['feature_name'],
+                    'state': state,
+                    'thought': 'Batch completed successfully!' if is_success else 'Batch failed to complete',
+                    'timestamp': datetime.now().isoformat(),
+                }
+                # Clean up all keys for this batch
+                for fid in feature_ids:
+                    self.active_agents.pop((fid, agent_type), None)
+                return result
+            else:
+                # Synthetic completion
+                return {
+                    'type': 'agent_update',
+                    'agentIndex': -1,
+                    'agentName': 'Unknown',
+                    'agentType': agent_type,
+                    'featureId': primary_id,
+                    'featureIds': list(feature_ids),
+                    'featureName': f'Features {", ".join(f"#{fid}" for fid in feature_ids)}',
+                    'state': state,
+                    'thought': 'Batch completed successfully!' if is_success else 'Batch failed to complete',
+                    'timestamp': datetime.now().isoformat(),
+                    'synthetic': True,
+                }
+

 class OrchestratorTracker:
    """Tracks orchestrator state for Mission Control observability.
@@ -431,6 +565,30 @@ class OrchestratorTracker:
                    'All features complete!'
                )

+            # Graceful pause (drain mode) events
+            elif ORCHESTRATOR_PATTERNS['drain_start'].search(line):
+                self.state = 'draining'
+                update = self._create_update(
+                    'drain_start',
+                    'Draining active agents...'
+                )
+
+            elif ORCHESTRATOR_PATTERNS['drain_complete'].search(line):
+                self.state = 'paused'
+                self.coding_agents = 0
+                self.testing_agents = 0
+                update = self._create_update(
+                    'drain_complete',
+                    'All agents drained. Paused.'
+                )
+
+            elif ORCHESTRATOR_PATTERNS['drain_resume'].search(line):
+                self.state = 'scheduling'
+                update = self._create_update(
+                    'drain_resume',
+                    'Resuming feature scheduling'
+                )
+
            return update

    def _create_update(
@@ -444,7 +602,7 @@ class OrchestratorTracker:
        timestamp = datetime.now().isoformat()

        # Add to recent events (keep last 5)
-        event = {
+        event: dict[str, str | int] = {
            'eventType': event_type,
            'message': message,
            'timestamp': timestamp,
@@ -487,17 +645,6 @@ class OrchestratorTracker:
            self.recent_events.clear()


-def _get_project_path(project_name: str) -> Path:
-    """Get project path from registry."""
-    import sys
-    root = Path(__file__).parent.parent
-    if str(root) not in sys.path:
-        sys.path.insert(0, str(root))
-
-    from registry import get_project_path
-    return get_project_path(project_name)
-
-
 def _get_count_passing_tests():
    """Lazy import of count_passing_tests."""
    global _count_passing_tests
@@ -520,9 +667,7 @@ class ConnectionManager:
        self._lock = asyncio.Lock()

    async def connect(self, websocket: WebSocket, project_name: str):
-        """Accept a WebSocket connection for a project."""
-        await websocket.accept()
-
+        """Register a WebSocket connection for a project (must already be accepted)."""
        async with self._lock:
            if project_name not in self.active_connections:
                self.active_connections[project_name] = set()
@@ -564,15 +709,6 @@ class ConnectionManager:
 # Global connection manager
 manager = ConnectionManager()

-# Root directory
-ROOT_DIR = Path(__file__).parent.parent
-
-
-def validate_project_name(name: str) -> bool:
-    """Validate project name to prevent path traversal."""
-    return bool(re.match(r'^[a-zA-Z0-9_-]{1,50}$', name))
-
-
 async def poll_progress(websocket: WebSocket, project_name: str, project_dir: Path):
    """Poll database for progress changes and send updates."""
    count_passing_tests = _get_count_passing_tests()
@@ -580,15 +716,19 @@ async def poll_progress(websocket: WebSocket, project_name: str, project_dir: Pa
    last_in_progress = -1
    last_total = -1

+    last_needs_human_input = -1
+
    while True:
        try:
-            passing, in_progress, total = count_passing_tests(project_dir)
+            passing, in_progress, total, needs_human_input = count_passing_tests(project_dir)

            # Only send if changed
-            if passing != last_passing or in_progress != last_in_progress or total != last_total:
+            if (passing != last_passing or in_progress != last_in_progress
+                    or total != last_total or needs_human_input != last_needs_human_input):
                last_passing = passing
                last_in_progress = in_progress
                last_total = total
+                last_needs_human_input = needs_human_input
                percentage = (passing / total * 100) if total > 0 else 0

                await websocket.send_json({
@@ -597,6 +737,7 @@ async def poll_progress(websocket: WebSocket, project_name: str, project_dir: Pa
                    "in_progress": in_progress,
                    "total": total,
                    "percentage": round(percentage, 1),
+                    "needs_human_input": needs_human_input,
                })

            await asyncio.sleep(2)  # Poll every 2 seconds
@@ -616,16 +757,22 @@ async def project_websocket(websocket: WebSocket, project_name: str):
    - Agent status changes
    - Agent stdout/stderr lines
    """
+    # Always accept WebSocket first to avoid opaque 403 errors
+    await websocket.accept()
+
    if not validate_project_name(project_name):
+        await websocket.send_json({"type": "error", "content": "Invalid project name"})
        await websocket.close(code=4000, reason="Invalid project name")
        return

    project_dir = _get_project_path(project_name)
    if not project_dir:
+        await websocket.send_json({"type": "error", "content": "Project not found in registry"})
        await websocket.close(code=4004, reason="Project not found in registry")
        return

    if not project_dir.exists():
+        await websocket.send_json({"type": "error", "content": "Project directory not found"})
        await websocket.close(code=4004, reason="Project directory not found")
        return

@@ -652,7 +799,7 @@ async def project_websocket(websocket: WebSocket, project_name: str):
                agent_index, _ = await agent_tracker.get_agent_info(feature_id)

            # Send the raw log line with optional feature/agent attribution
-            log_msg = {
+            log_msg: dict[str, str | int] = {
                "type": "log",
                "line": line,
                "timestamp": datetime.now().isoformat(),
@@ -743,7 +890,7 @@ async def project_websocket(websocket: WebSocket, project_name: str):

        # Send initial progress
        count_passing_tests = _get_count_passing_tests()
-        passing, in_progress, total = count_passing_tests(project_dir)
+        passing, in_progress, total, needs_human_input = count_passing_tests(project_dir)
        percentage = (passing / total * 100) if total > 0 else 0
        await websocket.send_json({
            "type": "progress",
@@ -751,6 +898,7 @@ async def project_websocket(websocket: WebSocket, project_name: str):
            "in_progress": in_progress,
            "total": total,
            "percentage": round(percentage, 1),
+            "needs_human_input": needs_human_input,
        })

        # Keep connection alive and handle incoming messages
@@ -768,8 +916,7 @@ async def project_websocket(websocket: WebSocket, project_name: str):
                break
            except json.JSONDecodeError:
                logger.warning(f"Invalid JSON from WebSocket: {data[:100] if data else 'empty'}")
-            except Exception as e:
-                logger.warning(f"WebSocket error: {e}")
+            except Exception:
                break

    finally:
--- a/start.bat
+++ b/start.bat
@@ -3,7 +3,7 @@ cd /d "%~dp0"

 echo.
 echo ========================================
-echo   Autonomous Coding Agent
+echo   AutoForge - Autonomous Coding Agent
 echo ========================================
 echo.

@@ -54,5 +54,15 @@ REM Install dependencies
 echo Installing dependencies...
 pip install -r requirements.txt --quiet

+REM Ensure playwright-cli is available for browser automation
+where playwright-cli >nul 2>&1
+if %ERRORLEVEL% neq 0 (
+    echo Installing playwright-cli for browser automation...
+    call npm install -g @playwright/cli >nul 2>&1
+    if errorlevel 1 (
+        echo Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli
+    )
+)
+
 REM Run the app
 python start.py
--- a/start.py
+++ b/start.py
@@ -82,7 +82,7 @@ def get_existing_projects() -> list[tuple[str, Path]]:
 def display_menu(projects: list[tuple[str, Path]]) -> None:
    """Display the main menu."""
    print("\n" + "=" * 50)
-    print("  Autonomous Coding Agent Launcher")
+    print("  AutoForge - Autonomous Coding Agent")
    print("=" * 50)
    print("\n[1] Create new project")

@@ -390,8 +390,11 @@ def run_agent(project_name: str, project_dir: Path) -> None:
    print(f"Location: {project_dir}")
    print("-" * 50)

-    # Build the command - pass absolute path
-    cmd = [sys.executable, "autonomous_agent_demo.py", "--project-dir", str(project_dir.resolve())]
+    # Build the command - pass absolute path and model from settings
+    from registry import DEFAULT_MODEL, get_all_settings
+    settings = get_all_settings()
+    model = settings.get("api_model") or settings.get("model", DEFAULT_MODEL)
+    cmd = [sys.executable, "autonomous_agent_demo.py", "--project-dir", str(project_dir.resolve()), "--model", model]

    # Run the agent with stderr capture to detect auth errors
    # stdout goes directly to terminal for real-time output
--- a/start.sh
+++ b/start.sh
@@ -3,7 +3,7 @@ cd "$(dirname "$0")"

 echo ""
 echo "========================================"
-echo "  Autonomous Coding Agent"
+echo "  AutoForge - Autonomous Coding Agent"
 echo "========================================"
 echo ""

@@ -74,5 +74,14 @@ fi
 echo "Installing dependencies..."
 pip install -r requirements.txt --quiet

+# Ensure playwright-cli is available for browser automation
+if ! command -v playwright-cli &> /dev/null; then
+    echo "Installing playwright-cli for browser automation..."
+    npm install -g @playwright/cli --quiet 2>/dev/null
+    if [ $? -ne 0 ]; then
+        echo "Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli"
+    fi
+fi
+
 # Run the app
 python start.py
--- a/start_ui.bat
+++ b/start_ui.bat
@@ -1,11 +1,11 @@
@echo off
 cd /d "%~dp0"
-REM AutoCoder UI Launcher for Windows
+REM AutoForge UI Launcher for Windows
 REM This script launches the web UI for the autonomous coding agent.

 echo.
 echo ====================================
-echo   AutoCoder UI
+echo   AutoForge UI
 echo ====================================
 echo.

@@ -37,7 +37,15 @@ REM Install dependencies
 echo Installing dependencies...
 pip install -r requirements.txt --quiet

+REM Ensure playwright-cli is available for browser automation
+where playwright-cli >nul 2>&1
+if %ERRORLEVEL% neq 0 (
+    echo Installing playwright-cli for browser automation...
+    call npm install -g @playwright/cli >nul 2>&1
+    if errorlevel 1 (
+        echo Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli
+    )
+)
+
 REM Run the Python launcher
 python "%~dp0start_ui.py" %*
-
-pause
--- a/start_ui.py
+++ b/start_ui.py
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-AutoCoder UI Launcher
+AutoForge UI Launcher
 =====================

 Automated launcher that handles all setup:
@@ -13,12 +13,16 @@ Automated launcher that handles all setup:
 7. Opens browser to the UI

 Usage:
-    python start_ui.py [--dev]
+    python start_ui.py [--dev] [--host HOST] [--port PORT]

 Options:
-    --dev    Run in development mode with Vite hot reload
+    --dev           Run in development mode with Vite hot reload
+    --host HOST     Host to bind to (default: 127.0.0.1)
+                    Use 0.0.0.0 for remote access (security warning will be shown)
+    --port PORT     Port to bind to (default: auto-detect from 8888)
 """

+import argparse
 import asyncio
 import os
 import shutil
@@ -133,10 +137,25 @@ def check_node() -> bool:


 def install_npm_deps() -> bool:
-    """Install npm dependencies if node_modules doesn't exist."""
+    """Install npm dependencies if node_modules doesn't exist or is stale."""
    node_modules = UI_DIR / "node_modules"
+    package_json = UI_DIR / "package.json"
+    package_lock = UI_DIR / "package-lock.json"

-    if node_modules.exists():
+    # Check if npm install is needed
+    needs_install = False
+
+    if not node_modules.exists():
+        needs_install = True
+    elif package_json.exists():
+        # If package.json or package-lock.json is newer than node_modules, reinstall
+        node_modules_mtime = node_modules.stat().st_mtime
+        if package_json.stat().st_mtime > node_modules_mtime:
+            needs_install = True
+        elif package_lock.exists() and package_lock.stat().st_mtime > node_modules_mtime:
+            needs_install = True
+
+    if not needs_install:
        print("  npm dependencies already installed")
        return True

@@ -183,7 +202,7 @@ def build_frontend() -> bool:
        trigger_file = "dist/ directory missing"
    elif src_dir.exists():
        # Find the newest file in dist/ directory
-        newest_dist_mtime = 0
+        newest_dist_mtime: float = 0
        for dist_file in dist_dir.rglob("*"):
            try:
                if dist_file.is_file():
@@ -235,26 +254,31 @@ def build_frontend() -> bool:
    return run_command([npm_cmd, "run", "build"], cwd=UI_DIR)


-def start_dev_server(port: int) -> tuple:
+def start_dev_server(port: int, host: str = "127.0.0.1") -> tuple:
    """Start both Vite and FastAPI in development mode."""
    venv_python = get_venv_python()

    print("\n  Starting development servers...")
-    print(f"  - FastAPI backend: http://127.0.0.1:{port}")
+    print(f"  - FastAPI backend: http://{host}:{port}")
    print("  - Vite frontend:   http://127.0.0.1:5173")

+    # Set environment for remote access if needed
+    env = os.environ.copy()
+    if host != "127.0.0.1":
+        env["AUTOFORGE_ALLOW_REMOTE"] = "1"
+
    # Start FastAPI
    backend = subprocess.Popen([
        str(venv_python), "-m", "uvicorn",
        "server.main:app",
-        "--host", "127.0.0.1",
+        "--host", host,
        "--port", str(port),
        "--reload"
-    ], cwd=str(ROOT))
+    ], cwd=str(ROOT), env=env)

    # Start Vite with API port env var for proxy configuration
    npm_cmd = "npm.cmd" if sys.platform == "win32" else "npm"
-    vite_env = os.environ.copy()
+    vite_env = env.copy()
    vite_env["VITE_API_PORT"] = str(port)
    frontend = subprocess.Popen([
        npm_cmd, "run", "dev"
@@ -263,15 +287,18 @@ def start_dev_server(port: int) -> tuple:
    return backend, frontend


-def start_production_server(port: int):
-    """Start FastAPI server in production mode with hot reload."""
+def start_production_server(port: int, host: str = "127.0.0.1"):
+    """Start FastAPI server in production mode."""
    venv_python = get_venv_python()

-    print(f"\n  Starting server at http://127.0.0.1:{port} (with hot reload)")
+    print(f"\n  Starting server at http://{host}:{port}")

-    # Set PYTHONASYNCIODEBUG to help with Windows subprocess issues
    env = os.environ.copy()

+    # Enable remote access in server if not localhost
+    if host != "127.0.0.1":
+        env["AUTOFORGE_ALLOW_REMOTE"] = "1"
+
    # NOTE: --reload is NOT used because on Windows it breaks asyncio subprocess
    # support (uvicorn's reload worker doesn't inherit the ProactorEventLoop policy).
    # This affects Claude SDK which uses asyncio.create_subprocess_exec.
@@ -279,17 +306,37 @@ def start_production_server(port: int):
    return subprocess.Popen([
        str(venv_python), "-m", "uvicorn",
        "server.main:app",
-        "--host", "127.0.0.1",
+        "--host", host,
        "--port", str(port),
    ], cwd=str(ROOT), env=env)


 def main() -> None:
    """Main entry point."""
-    dev_mode = "--dev" in sys.argv
+    parser = argparse.ArgumentParser(description="AutoForge UI Launcher")
+    parser.add_argument("--dev", action="store_true", help="Run in development mode with Vite hot reload")
+    parser.add_argument("--host", default="127.0.0.1", help="Host to bind to (default: 127.0.0.1)")
+    parser.add_argument("--port", type=int, default=None, help="Port to bind to (default: auto-detect from 8888)")
+    args = parser.parse_args()
+
+    dev_mode = args.dev
+    host = args.host
+
+    # Security warning for remote access
+    if host != "127.0.0.1":
+        print("\n" + "!" * 50)
+        print("  SECURITY WARNING")
+        print("!" * 50)
+        print(f"  Remote access enabled on host: {host}")
+        print("  The AutoForge UI will be accessible from other machines.")
+        print("  Ensure you understand the security implications:")
+        print("  - The agent has file system access to project directories")
+        print("  - The API can start/stop agents and modify files")
+        print("  - Consider using a firewall or VPN for protection")
+        print("!" * 50 + "\n")

    print("=" * 50)
-    print("  AutoCoder UI Setup")
+    print("  AutoForge UI Setup")
    print("=" * 50)

    total_steps = 6 if not dev_mode else 5
@@ -335,18 +382,20 @@ def main() -> None:
    step = 5 if dev_mode else 6
    print_step(step, total_steps, "Starting server")

-    port = find_available_port()
+    port = args.port if args.port else find_available_port()

    try:
        if dev_mode:
-            backend, frontend = start_dev_server(port)
+            backend, frontend = start_dev_server(port, host)

-            # Open browser to Vite dev server
+            # Open browser to Vite dev server (always localhost for Vite)
            time.sleep(3)
            webbrowser.open("http://127.0.0.1:5173")

            print("\n" + "=" * 50)
            print("  Development mode active")
+            if host != "127.0.0.1":
+                print(f"  Backend accessible at: http://{host}:{port}")
            print("  Press Ctrl+C to stop")
            print("=" * 50)

@@ -362,14 +411,15 @@ def main() -> None:
                backend.wait()
                frontend.wait()
        else:
-            server = start_production_server(port)
+            server = start_production_server(port, host)

-            # Open browser
+            # Open browser (only if localhost)
            time.sleep(2)
-            webbrowser.open(f"http://127.0.0.1:{port}")
+            if host == "127.0.0.1":
+                webbrowser.open(f"http://127.0.0.1:{port}")

            print("\n" + "=" * 50)
-            print(f"  Server running at http://127.0.0.1:{port}")
+            print(f"  Server running at http://{host}:{port}")
            print("  Press Ctrl+C to stop")
            print("=" * 50)

--- a/start_ui.sh
+++ b/start_ui.sh
@@ -1,11 +1,11 @@
 #!/bin/bash
 cd "$(dirname "$0")"
-# AutoCoder UI Launcher for Unix/Linux/macOS
+# AutoForge UI Launcher for Unix/Linux/macOS
 # This script launches the web UI for the autonomous coding agent.

 echo ""
 echo "===================================="
-echo "  AutoCoder UI"
+echo "  AutoForge UI"
 echo "===================================="
 echo ""

@@ -80,5 +80,14 @@ fi
 echo "Installing dependencies..."
 pip install -r requirements.txt --quiet

+# Ensure playwright-cli is available for browser automation
+if ! command -v playwright-cli &> /dev/null; then
+    echo "Installing playwright-cli for browser automation..."
+    npm install -g @playwright/cli --quiet 2>/dev/null
+    if [ $? -ne 0 ]; then
+        echo "Note: Could not install playwright-cli. Install manually: npm install -g @playwright/cli"
+    fi
+fi
+
 # Run the Python launcher
 python start_ui.py "$@"
--- a/temp_cleanup.py
+++ b/temp_cleanup.py
@@ -0,0 +1,221 @@
+"""
+Temp Cleanup Module
+===================
+
+Cleans up stale temporary files and directories created by AutoForge agents,
+Playwright, Node.js, and other development tools.
+
+Called at Maestro (orchestrator) startup to prevent temp folder bloat.
+
+Why this exists:
+- Playwright creates browser profiles and artifacts in the system temp directory (%TEMP% on Windows)
+- Node.js creates .node cache files (~7MB each, can accumulate to GBs)
+- MongoDB Memory Server downloads binaries to temp
+- These are never cleaned up automatically
+
+When cleanup runs:
+- At Maestro startup (when you click Play or auto-restart after rate limits)
+- Only files/folders older than 1 hour are deleted (safe for running processes)
+"""
+
+import logging
+import shutil
+import tempfile
+import time
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Max age in seconds before a temp item is considered stale (1 hour)
+MAX_AGE_SECONDS = 3600
+
+# Directory patterns to clean up (glob patterns)
+DIR_PATTERNS = [
+    "playwright_firefoxdev_profile-*",  # Playwright Firefox profiles
+    "playwright-artifacts-*",           # Playwright test artifacts
+    "playwright-transform-cache",       # Playwright transform cache
+    "mongodb-memory-server*",           # MongoDB Memory Server binaries
+    "ng-*",                             # Angular CLI temp directories
+    "scoped_dir*",                      # Chrome/Chromium temp directories
+    "node-compile-cache",               # Node.js V8 compile cache directory
+]
+
+# File patterns to clean up (glob patterns)
+FILE_PATTERNS = [
+    ".[0-9a-f]*.node",  # Node.js/V8 compile cache files (~7MB each, varying hex prefixes)
+    "claude-*-cwd",     # Claude CLI working directory temp files
+    "mat-debug-*.log",  # Material/Angular debug logs
+]
+
+
+def cleanup_stale_temp(max_age_seconds: int = MAX_AGE_SECONDS) -> dict:
+    """
+    Clean up stale temporary files and directories.
+
+    Only deletes items older than max_age_seconds to avoid
+    interfering with currently running processes.
+
+    Args:
+        max_age_seconds: Maximum age in seconds before an item is deleted.
+                        Defaults to 1 hour (3600 seconds).
+
+    Returns:
+        Dictionary with cleanup statistics:
+        - dirs_deleted: Number of directories deleted
+        - files_deleted: Number of files deleted
+        - bytes_freed: Approximate bytes freed
+        - errors: List of error messages (for debugging, not fatal)
+    """
+    temp_dir = Path(tempfile.gettempdir())
+    cutoff_time = time.time() - max_age_seconds
+
+    stats: dict = {
+        "dirs_deleted": 0,
+        "files_deleted": 0,
+        "bytes_freed": 0,
+        "errors": [],
+    }
+
+    # Clean up directories
+    for pattern in DIR_PATTERNS:
+        for item in temp_dir.glob(pattern):
+            if not item.is_dir():
+                continue
+            try:
+                mtime = item.stat().st_mtime
+                if mtime < cutoff_time:
+                    size = _get_dir_size(item)
+                    shutil.rmtree(item, ignore_errors=True)
+                    if not item.exists():
+                        stats["dirs_deleted"] += 1
+                        stats["bytes_freed"] += size
+                        logger.debug(f"Deleted temp directory: {item}")
+            except Exception as e:
+                stats["errors"].append(f"Failed to delete {item}: {e}")
+                logger.debug(f"Failed to delete {item}: {e}")
+
+    # Clean up files
+    for pattern in FILE_PATTERNS:
+        for item in temp_dir.glob(pattern):
+            if not item.is_file():
+                continue
+            try:
+                mtime = item.stat().st_mtime
+                if mtime < cutoff_time:
+                    size = item.stat().st_size
+                    item.unlink(missing_ok=True)
+                    if not item.exists():
+                        stats["files_deleted"] += 1
+                        stats["bytes_freed"] += size
+                        logger.debug(f"Deleted temp file: {item}")
+            except Exception as e:
+                stats["errors"].append(f"Failed to delete {item}: {e}")
+                logger.debug(f"Failed to delete {item}: {e}")
+
+    # Log summary if anything was cleaned
+    if stats["dirs_deleted"] > 0 or stats["files_deleted"] > 0:
+        mb_freed = stats["bytes_freed"] / (1024 * 1024)
+        logger.info(
+            f"Temp cleanup: {stats['dirs_deleted']} dirs, "
+            f"{stats['files_deleted']} files, {mb_freed:.1f} MB freed"
+        )
+
+    return stats
+
+
+def cleanup_project_screenshots(project_dir: Path, max_age_seconds: int = 300) -> dict:
+    """
+    Clean up stale Playwright CLI artifacts from the project.
+
+    The Playwright CLI daemon saves screenshots, snapshots, and other artifacts
+    to `{project_dir}/.playwright-cli/`. This removes them after they've aged
+    out (default 5 minutes).
+
+    Also cleans up legacy screenshot patterns from the project root (from the
+    old Playwright MCP server approach).
+
+    Args:
+        project_dir: Path to the project directory.
+        max_age_seconds: Maximum age in seconds before an artifact is deleted.
+                        Defaults to 5 minutes (300 seconds).
+
+    Returns:
+        Dictionary with cleanup statistics (files_deleted, bytes_freed, errors).
+    """
+    cutoff_time = time.time() - max_age_seconds
+    stats: dict = {"files_deleted": 0, "bytes_freed": 0, "errors": []}
+
+    # Clean up .playwright-cli/ directory (new CLI approach)
+    playwright_cli_dir = project_dir / ".playwright-cli"
+    if playwright_cli_dir.exists():
+        for item in playwright_cli_dir.iterdir():
+            if not item.is_file():
+                continue
+            try:
+                mtime = item.stat().st_mtime
+                if mtime < cutoff_time:
+                    size = item.stat().st_size
+                    item.unlink(missing_ok=True)
+                    if not item.exists():
+                        stats["files_deleted"] += 1
+                        stats["bytes_freed"] += size
+                        logger.debug(f"Deleted playwright-cli artifact: {item}")
+            except Exception as e:
+                stats["errors"].append(f"Failed to delete {item}: {e}")
+                logger.debug(f"Failed to delete artifact {item}: {e}")
+
+    # Legacy cleanup: root-level screenshot patterns (from old MCP server approach)
+    legacy_patterns = [
+        "feature*-*.png",
+        "screenshot-*.png",
+        "step-*.png",
+    ]
+
+    for pattern in legacy_patterns:
+        for item in project_dir.glob(pattern):
+            if not item.is_file():
+                continue
+            try:
+                mtime = item.stat().st_mtime
+                if mtime < cutoff_time:
+                    size = item.stat().st_size
+                    item.unlink(missing_ok=True)
+                    if not item.exists():
+                        stats["files_deleted"] += 1
+                        stats["bytes_freed"] += size
+                        logger.debug(f"Deleted legacy screenshot: {item}")
+            except Exception as e:
+                stats["errors"].append(f"Failed to delete {item}: {e}")
+                logger.debug(f"Failed to delete screenshot {item}: {e}")
+
+    if stats["files_deleted"] > 0:
+        mb_freed = stats["bytes_freed"] / (1024 * 1024)
+        logger.info(f"Artifact cleanup: {stats['files_deleted']} files, {mb_freed:.1f} MB freed")
+
+    return stats
+
+
+def _get_dir_size(path: Path) -> int:
+    """Get total size of a directory in bytes."""
+    total = 0
+    try:
+        for item in path.rglob("*"):
+            if item.is_file():
+                try:
+                    total += item.stat().st_size
+                except (OSError, PermissionError):
+                    pass
+    except (OSError, PermissionError):
+        pass
+    return total
+
+
+if __name__ == "__main__":
+    # Allow running directly for testing/manual cleanup
+    logging.basicConfig(level=logging.DEBUG)
+    print("Running temp cleanup...")
+    stats = cleanup_stale_temp()
+    mb_freed = stats["bytes_freed"] / (1024 * 1024)
+    print(f"Cleanup complete: {stats['dirs_deleted']} dirs, {stats['files_deleted']} files, {mb_freed:.1f} MB freed")
+    if stats["errors"]:
+        print(f"Errors (non-fatal): {len(stats['errors'])}")
--- a/test_client.py
+++ b/test_client.py
@@ -0,0 +1,272 @@
+#!/usr/bin/env python3
+"""
+Client Utility Tests
+====================
+
+Tests for the client module utility functions.
+Run with: python test_client.py
+"""
+
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+from client import (
+    EXTRA_READ_PATHS_BLOCKLIST,
+    EXTRA_READ_PATHS_VAR,
+    convert_model_for_vertex,
+    get_extra_read_paths,
+)
+
+
+class TestConvertModelForVertex(unittest.TestCase):
+    """Tests for convert_model_for_vertex function."""
+
+    def setUp(self):
+        """Save original env state."""
+        self._orig_vertex = os.environ.get("CLAUDE_CODE_USE_VERTEX")
+
+    def tearDown(self):
+        """Restore original env state."""
+        if self._orig_vertex is None:
+            os.environ.pop("CLAUDE_CODE_USE_VERTEX", None)
+        else:
+            os.environ["CLAUDE_CODE_USE_VERTEX"] = self._orig_vertex
+
+    # --- Vertex AI disabled (default) ---
+
+    def test_returns_model_unchanged_when_vertex_disabled(self):
+        os.environ.pop("CLAUDE_CODE_USE_VERTEX", None)
+        self.assertEqual(
+            convert_model_for_vertex("claude-opus-4-6"),
+            "claude-opus-4-6",
+        )
+
+    def test_returns_model_unchanged_when_vertex_set_to_zero(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "0"
+        self.assertEqual(
+            convert_model_for_vertex("claude-opus-4-6"),
+            "claude-opus-4-6",
+        )
+
+    def test_returns_model_unchanged_when_vertex_set_to_empty(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = ""
+        self.assertEqual(
+            convert_model_for_vertex("claude-sonnet-4-5-20250929"),
+            "claude-sonnet-4-5-20250929",
+        )
+
+    # --- Vertex AI enabled: standard conversions ---
+
+    def test_converts_legacy_opus_model(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "1"
+        self.assertEqual(
+            convert_model_for_vertex("claude-opus-4-5-20251101"),
+            "claude-opus-4-5@20251101",
+        )
+
+    def test_opus_4_6_passthrough_on_vertex(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "1"
+        self.assertEqual(
+            convert_model_for_vertex("claude-opus-4-6"),
+            "claude-opus-4-6",
+        )
+
+    def test_converts_sonnet_model(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "1"
+        self.assertEqual(
+            convert_model_for_vertex("claude-sonnet-4-5-20250929"),
+            "claude-sonnet-4-5@20250929",
+        )
+
+    def test_converts_haiku_model(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "1"
+        self.assertEqual(
+            convert_model_for_vertex("claude-3-5-haiku-20241022"),
+            "claude-3-5-haiku@20241022",
+        )
+
+    # --- Vertex AI enabled: already converted or non-matching ---
+
+    def test_already_vertex_format_unchanged(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "1"
+        self.assertEqual(
+            convert_model_for_vertex("claude-sonnet-4-5@20250929"),
+            "claude-sonnet-4-5@20250929",
+        )
+
+    def test_non_claude_model_unchanged(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "1"
+        self.assertEqual(
+            convert_model_for_vertex("gpt-4o"),
+            "gpt-4o",
+        )
+
+    def test_model_without_date_suffix_unchanged(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "1"
+        self.assertEqual(
+            convert_model_for_vertex("claude-opus-4-6"),
+            "claude-opus-4-6",
+        )
+
+    def test_empty_string_unchanged(self):
+        os.environ["CLAUDE_CODE_USE_VERTEX"] = "1"
+        self.assertEqual(convert_model_for_vertex(""), "")
+
+
+class TestExtraReadPathsBlocklist(unittest.TestCase):
+    """Tests for EXTRA_READ_PATHS sensitive directory blocking in get_extra_read_paths()."""
+
+    def setUp(self):
+        """Save original environment and home directory state."""
+        self._orig_extra_read = os.environ.get(EXTRA_READ_PATHS_VAR)
+        self._orig_home = os.environ.get("HOME")
+        self._orig_userprofile = os.environ.get("USERPROFILE")
+        self._orig_homedrive = os.environ.get("HOMEDRIVE")
+        self._orig_homepath = os.environ.get("HOMEPATH")
+
+    def tearDown(self):
+        """Restore original environment state."""
+        restore_map = {
+            EXTRA_READ_PATHS_VAR: self._orig_extra_read,
+            "HOME": self._orig_home,
+            "USERPROFILE": self._orig_userprofile,
+            "HOMEDRIVE": self._orig_homedrive,
+            "HOMEPATH": self._orig_homepath,
+        }
+        for key, value in restore_map.items():
+            if value is None:
+                os.environ.pop(key, None)
+            else:
+                os.environ[key] = value
+
+    def _set_home(self, home_path: str):
+        """Set the home directory for both Unix and Windows."""
+        os.environ["HOME"] = home_path
+        if sys.platform == "win32":
+            os.environ["USERPROFILE"] = home_path
+            drive, path = os.path.splitdrive(home_path)
+            if drive:
+                os.environ["HOMEDRIVE"] = drive
+                os.environ["HOMEPATH"] = path
+
+    def test_sensitive_directory_is_blocked(self):
+        """Path that IS a sensitive directory (e.g., ~/.ssh) should be blocked."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            self._set_home(tmpdir)
+            # Create the sensitive directory so it exists
+            ssh_dir = Path(tmpdir) / ".ssh"
+            ssh_dir.mkdir()
+
+            os.environ[EXTRA_READ_PATHS_VAR] = str(ssh_dir)
+            result = get_extra_read_paths()
+            self.assertEqual(result, [], "Path that IS ~/.ssh should be blocked")
+
+    def test_path_inside_sensitive_directory_is_blocked(self):
+        """Path INSIDE a sensitive directory (e.g., ~/.ssh/keys) should be blocked."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            self._set_home(tmpdir)
+            ssh_dir = Path(tmpdir) / ".ssh"
+            keys_dir = ssh_dir / "keys"
+            keys_dir.mkdir(parents=True)
+
+            os.environ[EXTRA_READ_PATHS_VAR] = str(keys_dir)
+            result = get_extra_read_paths()
+            self.assertEqual(result, [], "Path inside ~/.ssh should be blocked")
+
+    def test_path_containing_sensitive_directory_is_blocked(self):
+        """Path that contains a sensitive directory inside it should be blocked.
+
+        For example, if the extra read path is the user's home directory, and
+        ~/.ssh exists inside it, the path should be blocked because granting
+        read access to the parent would expose the sensitive subdirectory.
+        """
+        with tempfile.TemporaryDirectory() as tmpdir:
+            self._set_home(tmpdir)
+            # Create a sensitive dir inside the home so it triggers the
+            # "sensitive dir is inside the requested path" check
+            ssh_dir = Path(tmpdir) / ".ssh"
+            ssh_dir.mkdir()
+
+            os.environ[EXTRA_READ_PATHS_VAR] = tmpdir
+            result = get_extra_read_paths()
+            self.assertEqual(result, [], "Home dir containing .ssh should be blocked")
+
+    def test_valid_non_sensitive_path_is_allowed(self):
+        """A valid directory that is NOT sensitive should be allowed."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            self._set_home(tmpdir)
+            # Create a non-sensitive directory under home
+            docs_dir = Path(tmpdir) / "Documents" / "myproject"
+            docs_dir.mkdir(parents=True)
+
+            os.environ[EXTRA_READ_PATHS_VAR] = str(docs_dir)
+            result = get_extra_read_paths()
+            self.assertEqual(len(result), 1, "Non-sensitive path should be allowed")
+            self.assertEqual(result[0], docs_dir.resolve())
+
+    def test_all_blocklist_entries_are_checked(self):
+        """Every directory in EXTRA_READ_PATHS_BLOCKLIST should actually be blocked."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            self._set_home(tmpdir)
+
+            for sensitive_name in sorted(EXTRA_READ_PATHS_BLOCKLIST):
+                sensitive_dir = Path(tmpdir) / sensitive_name
+                sensitive_dir.mkdir(parents=True, exist_ok=True)
+
+                os.environ[EXTRA_READ_PATHS_VAR] = str(sensitive_dir)
+                result = get_extra_read_paths()
+                self.assertEqual(
+                    result, [],
+                    f"Blocklist entry '{sensitive_name}' should be blocked"
+                )
+
+    def test_multiple_paths_mixed_sensitive_and_valid(self):
+        """When given multiple paths, only non-sensitive ones should pass."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            self._set_home(tmpdir)
+
+            # Create one sensitive and one valid directory
+            ssh_dir = Path(tmpdir) / ".ssh"
+            ssh_dir.mkdir()
+            valid_dir = Path(tmpdir) / "projects"
+            valid_dir.mkdir()
+
+            os.environ[EXTRA_READ_PATHS_VAR] = f"{ssh_dir},{valid_dir}"
+            result = get_extra_read_paths()
+            self.assertEqual(len(result), 1, "Only the non-sensitive path should be returned")
+            self.assertEqual(result[0], valid_dir.resolve())
+
+    def test_empty_extra_read_paths_returns_empty(self):
+        """Empty EXTRA_READ_PATHS should return empty list."""
+        os.environ[EXTRA_READ_PATHS_VAR] = ""
+        result = get_extra_read_paths()
+        self.assertEqual(result, [])
+
+    def test_unset_extra_read_paths_returns_empty(self):
+        """Unset EXTRA_READ_PATHS should return empty list."""
+        os.environ.pop(EXTRA_READ_PATHS_VAR, None)
+        result = get_extra_read_paths()
+        self.assertEqual(result, [])
+
+    def test_nonexistent_path_is_skipped(self):
+        """A path that does not exist should be skipped."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            self._set_home(tmpdir)
+            nonexistent = Path(tmpdir) / "does_not_exist"
+
+            os.environ[EXTRA_READ_PATHS_VAR] = str(nonexistent)
+            result = get_extra_read_paths()
+            self.assertEqual(result, [])
+
+    def test_relative_path_is_skipped(self):
+        """A relative path should be skipped."""
+        os.environ[EXTRA_READ_PATHS_VAR] = "relative/path"
+        result = get_extra_read_paths()
+        self.assertEqual(result, [])
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/test_dependency_resolver.py
+++ b/test_dependency_resolver.py
@@ -0,0 +1,426 @@
+#!/usr/bin/env python3
+"""
+Dependency Resolver Tests
+=========================
+
+Tests for the dependency resolver functions including cycle detection.
+Run with: python test_dependency_resolver.py
+"""
+
+import sys
+import time
+from concurrent.futures import ThreadPoolExecutor
+from concurrent.futures import TimeoutError as FuturesTimeoutError
+
+from api.dependency_resolver import (
+    are_dependencies_satisfied,
+    compute_scheduling_scores,
+    get_blocked_features,
+    get_blocking_dependencies,
+    get_ready_features,
+    resolve_dependencies,
+    would_create_circular_dependency,
+)
+
+
+def test_compute_scheduling_scores_simple_chain():
+    """Test scheduling scores for a simple linear dependency chain."""
+    print("\nTesting compute_scheduling_scores with simple chain:")
+
+    features = [
+        {"id": 1, "priority": 1, "dependencies": []},
+        {"id": 2, "priority": 2, "dependencies": [1]},
+        {"id": 3, "priority": 3, "dependencies": [2]},
+    ]
+
+    scores = compute_scheduling_scores(features)
+
+    # All features should have scores
+    passed = True
+    for f in features:
+        if f["id"] not in scores:
+            print(f"  FAIL: Feature {f['id']} missing from scores")
+            passed = False
+
+    if passed:
+        # Root feature (1) should have highest score (unblocks most)
+        if scores[1] > scores[2] > scores[3]:
+            print("  PASS: Root feature has highest score, leaf has lowest")
+        else:
+            print(f"  FAIL: Expected scores[1] > scores[2] > scores[3], got {scores}")
+            passed = False
+
+    return passed
+
+
+def test_compute_scheduling_scores_with_cycle():
+    """Test that compute_scheduling_scores handles circular dependencies without hanging."""
+    print("\nTesting compute_scheduling_scores with circular dependencies:")
+
+    # Create a cycle: 1 -> 3 -> 2 -> 1 (arrows mean "depends on")
+    features = [
+        {"id": 1, "priority": 1, "dependencies": [3]},
+        {"id": 2, "priority": 2, "dependencies": [1]},
+        {"id": 3, "priority": 3, "dependencies": [2]},
+    ]
+
+    # Use timeout to flag an infinite loop (the executor still joins the worker on exit)
+    def compute_with_timeout():
+        return compute_scheduling_scores(features)
+
+    start = time.time()
+    try:
+        with ThreadPoolExecutor(max_workers=1) as executor:
+            future = executor.submit(compute_with_timeout)
+            scores = future.result(timeout=5.0)  # 5 second timeout
+
+        elapsed = time.time() - start
+
+        # Should complete quickly (< 1 second for 3 features)
+        if elapsed > 1.0:
+            print(f"  FAIL: Took {elapsed:.2f}s (expected < 1s)")
+            return False
+
+        # All features should have scores (even cyclic ones)
+        if len(scores) == 3:
+            print(f"  PASS: Completed in {elapsed:.3f}s with {len(scores)} scores")
+            return True
+        else:
+            print(f"  FAIL: Expected 3 scores, got {len(scores)}")
+            return False
+
+    except FuturesTimeoutError:
+        print("  FAIL: Infinite loop detected (timed out after 5s)")
+        return False
+
+
+def test_compute_scheduling_scores_self_reference():
+    """Test scheduling scores with self-referencing dependency."""
+    print("\nTesting compute_scheduling_scores with self-reference:")
+
+    features = [
+        {"id": 1, "priority": 1, "dependencies": [1]},  # Self-reference
+        {"id": 2, "priority": 2, "dependencies": []},
+    ]
+
+    start = time.time()
+    try:
+        with ThreadPoolExecutor(max_workers=1) as executor:
+            future = executor.submit(lambda: compute_scheduling_scores(features))
+            scores = future.result(timeout=5.0)
+
+        elapsed = time.time() - start
+
+        if elapsed > 1.0:
+            print(f"  FAIL: Took {elapsed:.2f}s (expected < 1s)")
+            return False
+
+        if len(scores) == 2:
+            print(f"  PASS: Completed in {elapsed:.3f}s with {len(scores)} scores")
+            return True
+        else:
+            print(f"  FAIL: Expected 2 scores, got {len(scores)}")
+            return False
+
+    except FuturesTimeoutError:
+        print("  FAIL: Infinite loop detected (timed out after 5s)")
+        return False
+
+
+def test_compute_scheduling_scores_complex_cycle():
+    """Test scheduling scores with complex circular dependencies."""
+    print("\nTesting compute_scheduling_scores with complex cycle:")
+
+    # Features 1-3 form a cycle, feature 4 depends on 1
+    features = [
+        {"id": 1, "priority": 1, "dependencies": [3]},
+        {"id": 2, "priority": 2, "dependencies": [1]},
+        {"id": 3, "priority": 3, "dependencies": [2]},
+        {"id": 4, "priority": 4, "dependencies": [1]},  # Outside cycle
+    ]
+
+    start = time.time()
+    try:
+        with ThreadPoolExecutor(max_workers=1) as executor:
+            future = executor.submit(lambda: compute_scheduling_scores(features))
+            scores = future.result(timeout=5.0)
+
+        elapsed = time.time() - start
+
+        if elapsed > 1.0:
+            print(f"  FAIL: Took {elapsed:.2f}s (expected < 1s)")
+            return False
+
+        if len(scores) == 4:
+            print(f"  PASS: Completed in {elapsed:.3f}s with {len(scores)} scores")
+            return True
+        else:
+            print(f"  FAIL: Expected 4 scores, got {len(scores)}")
+            return False
+
+    except FuturesTimeoutError:
+        print("  FAIL: Infinite loop detected (timed out after 5s)")
+        return False
+
+
+def test_compute_scheduling_scores_diamond():
+    """Test scheduling scores with diamond dependency pattern."""
+    print("\nTesting compute_scheduling_scores with diamond pattern:")
+
+    #     1
+    #    / \
+    #   2   3
+    #    \ /
+    #     4
+    features = [
+        {"id": 1, "priority": 1, "dependencies": []},
+        {"id": 2, "priority": 2, "dependencies": [1]},
+        {"id": 3, "priority": 3, "dependencies": [1]},
+        {"id": 4, "priority": 4, "dependencies": [2, 3]},
+    ]
+
+    scores = compute_scheduling_scores(features)
+
+    # Feature 1 should have highest score (unblocks 2, 3, and transitively 4)
+    if scores[1] > scores[2] and scores[1] > scores[3] and scores[1] > scores[4]:
+        # Feature 4 should have lowest score (leaf, unblocks nothing)
+        if scores[4] < scores[2] and scores[4] < scores[3]:
+            print("  PASS: Root has highest score, leaf has lowest")
+            return True
+        else:
+            print(f"  FAIL: Leaf should have lowest score. Scores: {scores}")
+            return False
+    else:
+        print(f"  FAIL: Root should have highest score. Scores: {scores}")
+        return False
+
+
+def test_compute_scheduling_scores_empty():
+    """Test scheduling scores with empty feature list."""
+    print("\nTesting compute_scheduling_scores with empty list:")
+
+    scores = compute_scheduling_scores([])
+
+    if scores == {}:
+        print("  PASS: Returns empty dict for empty input")
+        return True
+    else:
+        print(f"  FAIL: Expected empty dict, got {scores}")
+        return False
+
+
+def test_would_create_circular_dependency():
+    """Test cycle detection for new dependencies."""
+    print("\nTesting would_create_circular_dependency:")
+
+    # Current dependencies: 2 depends on 1, 3 depends on 2
+    # Dependency chain: 3 -> 2 -> 1 (arrows mean "depends on")
+    features = [
+        {"id": 1, "priority": 1, "dependencies": []},
+        {"id": 2, "priority": 2, "dependencies": [1]},
+        {"id": 3, "priority": 3, "dependencies": [2]},
+    ]
+
+    passed = True
+
+    # source_id gains dependency on target_id
+    # Adding "1 depends on 3" would create cycle: 1 -> 3 -> 2 -> 1
+    if would_create_circular_dependency(features, 1, 3):
+        print("  PASS: Detected cycle when adding 1 depends on 3")
+    else:
+        print("  FAIL: Should detect cycle when adding 1 depends on 3")
+        passed = False
+
+    # Adding "3 depends on 1" would NOT create cycle (redundant but not circular)
+    if not would_create_circular_dependency(features, 3, 1):
+        print("  PASS: No false positive for 3 depends on 1")
+    else:
+        print("  FAIL: False positive for 3 depends on 1")
+        passed = False
+
+    # Self-reference should be detected
+    if would_create_circular_dependency(features, 1, 1):
+        print("  PASS: Detected self-reference")
+    else:
+        print("  FAIL: Should detect self-reference")
+        passed = False
+
+    return passed
+
+
+def test_resolve_dependencies_with_cycle():
+    """Test resolve_dependencies detects and reports cycles."""
+    print("\nTesting resolve_dependencies with cycle:")
+
+    # Create a cycle: 1 -> 3 -> 2 -> 1 (arrows mean "depends on")
+    features = [
+        {"id": 1, "priority": 1, "dependencies": [3]},
+        {"id": 2, "priority": 2, "dependencies": [1]},
+        {"id": 3, "priority": 3, "dependencies": [2]},
+    ]
+
+    result = resolve_dependencies(features)
+
+    # Should report circular dependencies
+    if result["circular_dependencies"]:
+        print(f"  PASS: Detected cycle: {result['circular_dependencies']}")
+        return True
+    else:
+        print("  FAIL: Should report circular dependencies")
+        return False
+
+
+def test_are_dependencies_satisfied():
+    """Test dependency satisfaction checking."""
+    print("\nTesting are_dependencies_satisfied:")
+
+    features = [
+        {"id": 1, "priority": 1, "dependencies": [], "passes": True},
+        {"id": 2, "priority": 2, "dependencies": [1], "passes": False},
+        {"id": 3, "priority": 3, "dependencies": [2], "passes": False},
+    ]
+
+    passed = True
+
+    # Feature 1 has no deps, should be satisfied
+    if are_dependencies_satisfied(features[0], features):
+        print("  PASS: Feature 1 (no deps) is satisfied")
+    else:
+        print("  FAIL: Feature 1 should be satisfied")
+        passed = False
+
+    # Feature 2 depends on 1 which passes, should be satisfied
+    if are_dependencies_satisfied(features[1], features):
+        print("  PASS: Feature 2 (dep on passing) is satisfied")
+    else:
+        print("  FAIL: Feature 2 should be satisfied")
+        passed = False
+
+    # Feature 3 depends on 2 which doesn't pass, should NOT be satisfied
+    if not are_dependencies_satisfied(features[2], features):
+        print("  PASS: Feature 3 (dep on non-passing) is not satisfied")
+    else:
+        print("  FAIL: Feature 3 should not be satisfied")
+        passed = False
+
+    return passed
+
+
+def test_get_blocking_dependencies():
+    """Test getting blocking dependency IDs."""
+    print("\nTesting get_blocking_dependencies:")
+
+    features = [
+        {"id": 1, "priority": 1, "dependencies": [], "passes": True},
+        {"id": 2, "priority": 2, "dependencies": [], "passes": False},
+        {"id": 3, "priority": 3, "dependencies": [1, 2], "passes": False},
+    ]
+
+    blocking = get_blocking_dependencies(features[2], features)
+
+    # Only feature 2 should be blocking (1 passes)
+    if blocking == [2]:
+        print("  PASS: Correctly identified blocking dependency")
+        return True
+    else:
+        print(f"  FAIL: Expected [2], got {blocking}")
+        return False
+
+
+def test_get_ready_features():
+    """Test getting ready features."""
+    print("\nTesting get_ready_features:")
+
+    features = [
+        {"id": 1, "priority": 1, "dependencies": [], "passes": True},
+        {"id": 2, "priority": 2, "dependencies": [], "passes": False, "in_progress": False},
+        {"id": 3, "priority": 3, "dependencies": [1], "passes": False, "in_progress": False},
+        {"id": 4, "priority": 4, "dependencies": [2], "passes": False, "in_progress": False},
+    ]
+
+    ready = get_ready_features(features)
+
+    # Features 2 and 3 should be ready
+    # Feature 1 passes, feature 4 blocked by 2
+    ready_ids = [f["id"] for f in ready]
+
+    if 2 in ready_ids and 3 in ready_ids:
+        if 1 not in ready_ids and 4 not in ready_ids:
+            print(f"  PASS: Ready features: {ready_ids}")
+            return True
+        else:
+            print(f"  FAIL: Should not include passing/blocked. Got: {ready_ids}")
+            return False
+    else:
+        print(f"  FAIL: Should include 2 and 3. Got: {ready_ids}")
+        return False
+
+
+def test_get_blocked_features():
+    """Test getting blocked features."""
+    print("\nTesting get_blocked_features:")
+
+    features = [
+        {"id": 1, "priority": 1, "dependencies": [], "passes": False},
+        {"id": 2, "priority": 2, "dependencies": [1], "passes": False},
+    ]
+
+    blocked = get_blocked_features(features)
+
+    # Feature 2 should be blocked by 1
+    if len(blocked) == 1 and blocked[0]["id"] == 2:
+        if blocked[0]["blocked_by"] == [1]:
+            print("  PASS: Correctly identified blocked feature")
+            return True
+        else:
+            print(f"  FAIL: Wrong blocked_by: {blocked[0]['blocked_by']}")
+            return False
+    else:
+        print(f"  FAIL: Expected feature 2 blocked, got: {blocked}")
+        return False
+
+
+def run_all_tests():
+    """Run all tests and report results."""
+    print("=" * 60)
+    print("Dependency Resolver Tests")
+    print("=" * 60)
+
+    tests = [
+        test_compute_scheduling_scores_simple_chain,
+        test_compute_scheduling_scores_with_cycle,
+        test_compute_scheduling_scores_self_reference,
+        test_compute_scheduling_scores_complex_cycle,
+        test_compute_scheduling_scores_diamond,
+        test_compute_scheduling_scores_empty,
+        test_would_create_circular_dependency,
+        test_resolve_dependencies_with_cycle,
+        test_are_dependencies_satisfied,
+        test_get_blocking_dependencies,
+        test_get_ready_features,
+        test_get_blocked_features,
+    ]
+
+    passed = 0
+    failed = 0
+
+    for test in tests:
+        try:
+            if test():
+                passed += 1
+            else:
+                failed += 1
+        except Exception as e:
+            print(f"  ERROR: {e}")
+            failed += 1
+
+    print("\n" + "=" * 60)
+    print(f"Results: {passed} passed, {failed} failed")
+    print("=" * 60)
+
+    return failed == 0
+
+
+if __name__ == "__main__":
+    success = run_all_tests()
+    sys.exit(0 if success else 1)
--- a/test_devserver_security.py
+++ b/test_devserver_security.py
@@ -0,0 +1,319 @@
+#!/usr/bin/env python3
+"""
+Dev Server Security Tests
+=========================
+
+Tests for dev server command validation and security hardening.
+Run with: python -m pytest test_devserver_security.py -v
+"""
+
+import sys
+from pathlib import Path
+
+import pytest
+
+# Add project root to path
+sys.path.insert(0, str(Path(__file__).parent))
+
+from server.routers.devserver import (
+    ALLOWED_NPM_SCRIPTS,
+    ALLOWED_PYTHON_MODULES,
+    ALLOWED_RUNNERS,
+    BLOCKED_SHELLS,
+    validate_custom_command_strict,
+)
+
+# =============================================================================
+# validate_custom_command_strict - Valid commands
+# =============================================================================
+
+
+class TestValidCommands:
+    """Commands that should pass validation."""
+
+    def test_npm_run_dev(self):
+        validate_custom_command_strict("npm run dev")
+
+    def test_npm_run_start(self):
+        validate_custom_command_strict("npm run start")
+
+    def test_npm_run_serve(self):
+        validate_custom_command_strict("npm run serve")
+
+    def test_npm_run_preview(self):
+        validate_custom_command_strict("npm run preview")
+
+    def test_pnpm_dev(self):
+        validate_custom_command_strict("pnpm dev")
+
+    def test_pnpm_run_dev(self):
+        validate_custom_command_strict("pnpm run dev")
+
+    def test_yarn_start(self):
+        validate_custom_command_strict("yarn start")
+
+    def test_yarn_run_serve(self):
+        validate_custom_command_strict("yarn run serve")
+
+    def test_uvicorn_basic(self):
+        validate_custom_command_strict("uvicorn main:app")
+
+    def test_uvicorn_with_flags(self):
+        validate_custom_command_strict("uvicorn main:app --host 0.0.0.0 --port 8000 --reload")
+
+    def test_uvicorn_flag_equals_syntax(self):
+        validate_custom_command_strict("uvicorn main:app --port=8000 --host=0.0.0.0")
+
+    def test_python_m_uvicorn(self):
+        validate_custom_command_strict("python -m uvicorn main:app --reload")
+
+    def test_python3_m_uvicorn(self):
+        validate_custom_command_strict("python3 -m uvicorn main:app")
+
+    def test_python_m_flask(self):
+        validate_custom_command_strict("python -m flask run")
+
+    def test_python_m_gunicorn(self):
+        validate_custom_command_strict("python -m gunicorn main:app")
+
+    def test_python_m_http_server(self):
+        validate_custom_command_strict("python -m http.server 8000")
+
+    def test_python_script(self):
+        validate_custom_command_strict("python app.py")
+
+    def test_python_manage_py_runserver(self):
+        validate_custom_command_strict("python manage.py runserver")
+
+    def test_python_manage_py_runserver_with_port(self):
+        validate_custom_command_strict("python manage.py runserver 0.0.0.0:8000")
+
+    def test_flask_run(self):
+        validate_custom_command_strict("flask run")
+
+    def test_flask_run_with_options(self):
+        validate_custom_command_strict("flask run --host 0.0.0.0 --port 5000")
+
+    def test_poetry_run_command(self):
+        validate_custom_command_strict("poetry run python app.py")
+
+    def test_cargo_run(self):
+        # cargo is allowed but has no special sub-validation
+        validate_custom_command_strict("cargo run")
+
+    def test_go_run(self):
+        # go is allowed but has no special sub-validation
+        validate_custom_command_strict("go run .")
+
+
+# =============================================================================
+# validate_custom_command_strict - Blocked shells
+# =============================================================================
+
+
+class TestBlockedShells:
+    """Shell interpreters that must be rejected."""
+
+    @pytest.mark.parametrize("shell", ["sh", "bash", "zsh", "cmd", "powershell", "pwsh", "cmd.exe"])
+    def test_blocked_shell(self, shell):
+        with pytest.raises(ValueError, match="runner not allowed"):
+            validate_custom_command_strict(f"{shell} -c 'echo hacked'")
+
+
+# =============================================================================
+# validate_custom_command_strict - Blocked commands
+# =============================================================================
+
+
+class TestBlockedCommands:
+    """Commands that should be rejected."""
+
+    def test_empty_command(self):
+        with pytest.raises(ValueError, match="cannot be empty"):
+            validate_custom_command_strict("")
+
+    def test_whitespace_only(self):
+        with pytest.raises(ValueError, match="cannot be empty"):
+            validate_custom_command_strict("   ")
+
+    def test_python_dash_c(self):
+        with pytest.raises(ValueError, match="python -c is not allowed"):
+            validate_custom_command_strict("python -c 'import os; os.system(\"rm -rf /\")'")
+
+    def test_python3_dash_c(self):
+        with pytest.raises(ValueError, match="python -c is not allowed"):
+            validate_custom_command_strict("python3 -c 'print(1)'")
+
+    def test_python_no_script_or_module(self):
+        with pytest.raises(ValueError, match="must use"):
+            validate_custom_command_strict("python --version")
+
+    def test_python_m_disallowed_module(self):
+        with pytest.raises(ValueError, match="not allowed"):
+            validate_custom_command_strict("python -m pip install something")
+
+    def test_unknown_runner(self):
+        with pytest.raises(ValueError, match="runner not allowed"):
+            validate_custom_command_strict("curl http://evil.com")
+
+    def test_rm_rf(self):
+        with pytest.raises(ValueError, match="runner not allowed"):
+            validate_custom_command_strict("rm -rf /")
+
+    def test_npm_arbitrary_script(self):
+        with pytest.raises(ValueError, match="npm custom_command"):
+            validate_custom_command_strict("npm run postinstall")
+
+    def test_npm_exec(self):
+        with pytest.raises(ValueError, match="npm custom_command"):
+            validate_custom_command_strict("npm exec evil-package")
+
+    def test_pnpm_arbitrary_script(self):
+        with pytest.raises(ValueError, match="pnpm custom_command"):
+            validate_custom_command_strict("pnpm run postinstall")
+
+    def test_yarn_arbitrary_script(self):
+        with pytest.raises(ValueError, match="yarn custom_command"):
+            validate_custom_command_strict("yarn run postinstall")
+
+    def test_uvicorn_no_app(self):
+        with pytest.raises(ValueError, match="must specify an app"):
+            validate_custom_command_strict("uvicorn --reload")
+
+    def test_uvicorn_disallowed_flag(self):
+        with pytest.raises(ValueError, match="flag not allowed"):
+            validate_custom_command_strict("uvicorn main:app --factory")
+
+    def test_flask_no_run(self):
+        with pytest.raises(ValueError, match="flask custom_command"):
+            validate_custom_command_strict("flask shell")
+
+    def test_poetry_no_run(self):
+        with pytest.raises(ValueError, match="poetry custom_command"):
+            validate_custom_command_strict("poetry install")
+
+
+# =============================================================================
+# validate_custom_command_strict - Injection attempts
+# =============================================================================
+
+
+class TestInjectionAttempts:
+    """Adversarial inputs that attempt to bypass validation."""
+
+    def test_shell_via_absolute_path(self):
+        with pytest.raises(ValueError, match="runner not allowed"):
+            validate_custom_command_strict("/bin/sh -c 'echo hacked'")
+
+    def test_shell_via_relative_path(self):
+        with pytest.raises(ValueError, match="runner not allowed"):
+            validate_custom_command_strict("../../bin/bash -c whoami")
+
+    def test_none_input(self):
+        with pytest.raises(ValueError, match="cannot be empty"):
+            validate_custom_command_strict(None)  # type: ignore[arg-type]
+
+    def test_integer_input(self):
+        with pytest.raises(ValueError, match="cannot be empty"):
+            validate_custom_command_strict(123)  # type: ignore[arg-type]
+
+    def test_python_dash_c_uppercase(self):
+        with pytest.raises(ValueError, match="python -c is not allowed"):
+            validate_custom_command_strict("python -C 'exec(evil)'")
+
+    def test_powershell_via_path(self):
+        with pytest.raises(ValueError, match="runner not allowed"):
+            validate_custom_command_strict("C:\\Windows\\System32\\powershell.exe -c Get-Process")
+
+
+# =============================================================================
+# dev_server_manager.py - dangerous_ops blocking
+# =============================================================================
+
+
+class TestDangerousOpsBlocking:
+    """Test the metacharacter blocking in dev_server_manager.start()."""
+
+    @pytest.fixture
+    def manager(self, tmp_path):
+        from server.services.dev_server_manager import DevServerProcessManager
+        return DevServerProcessManager("test-project", tmp_path)
+
+    @pytest.mark.asyncio
+    @pytest.mark.parametrize("cmd,desc", [
+        ("npm run dev && curl evil.com", "double ampersand"),
+        ("npm run dev & curl evil.com", "single ampersand"),
+        ("npm run dev || curl evil.com", "double pipe"),
+        ("npm run dev | curl evil.com", "single pipe"),
+        ("npm run dev ; curl evil.com", "semicolon"),
+        ("npm run dev `curl evil.com`", "backtick"),
+        ("npm run dev $(curl evil.com)", "dollar paren"),
+        ("npm run dev > /etc/passwd", "output redirect"),
+        ("npm run dev < /etc/passwd", "input redirect"),
+        ("npm run dev ^& calc", "caret escape"),
+        ("npm run %COMSPEC%", "percent env expansion"),
+    ])
+    async def test_blocks_shell_operator(self, manager, cmd, desc):
+        success, message = await manager.start(cmd)
+        assert not success, f"Should block {desc}: {cmd}"
+        assert "not allowed" in message.lower()
+
+    @pytest.mark.asyncio
+    async def test_blocks_newline_injection(self, manager):
+        success, message = await manager.start("npm run dev\ncurl evil.com")
+        assert not success
+        assert "newline" in message.lower()
+
+    @pytest.mark.asyncio
+    async def test_blocks_carriage_return(self, manager):
+        success, message = await manager.start("npm run dev\r\ncurl evil.com")
+        assert not success
+        assert "newline" in message.lower()
+
+    @pytest.mark.asyncio
+    @pytest.mark.parametrize("shell", ["sh", "bash", "zsh", "cmd", "powershell", "pwsh"])
+    async def test_blocks_shell_runners(self, manager, shell):
+        success, message = await manager.start(f"{shell} -c 'echo hacked'")
+        assert not success
+        assert "not allowed" in message.lower()
+
+    @pytest.mark.asyncio
+    async def test_blocks_empty_command(self, manager):
+        success, message = await manager.start("")
+        assert not success
+        assert "empty" in message.lower()
+
+    @pytest.mark.asyncio
+    async def test_blocks_whitespace_command(self, manager):
+        success, message = await manager.start("   ")
+        assert not success
+        assert "empty" in message.lower()
+
+
+# =============================================================================
+# Constants validation
+# =============================================================================
+
+
+class TestConstants:
+    """Verify security constants are properly defined."""
+
+    def test_all_common_shells_blocked(self):
+        for shell in ["sh", "bash", "zsh", "cmd", "powershell", "pwsh", "cmd.exe"]:
+            assert shell in BLOCKED_SHELLS, f"{shell} should be in BLOCKED_SHELLS"
+
+    def test_common_npm_scripts_allowed(self):
+        for script in ["dev", "start", "serve", "preview"]:
+            assert script in ALLOWED_NPM_SCRIPTS, f"{script} should be in ALLOWED_NPM_SCRIPTS"
+
+    def test_common_python_modules_allowed(self):
+        for mod in ["uvicorn", "flask", "gunicorn"]:
+            assert mod in ALLOWED_PYTHON_MODULES, f"{mod} should be in ALLOWED_PYTHON_MODULES"
+
+    def test_common_runners_allowed(self):
+        for runner in ["npm", "pnpm", "yarn", "python", "python3", "uvicorn", "flask", "cargo", "go"]:
+            assert runner in ALLOWED_RUNNERS, f"{runner} should be in ALLOWED_RUNNERS"
+
+
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])
--- a/test_rate_limit_utils.py
+++ b/test_rate_limit_utils.py
@@ -0,0 +1,205 @@
+"""
+Unit tests for rate limit handling functions.
+
+Tests the parse_retry_after(), is_rate_limit_error(), and backoff calculation
+functions from rate_limit_utils.py (shared module).
+"""
+
+import unittest
+
+from rate_limit_utils import (
+    calculate_error_backoff,
+    calculate_rate_limit_backoff,
+    clamp_retry_delay,
+    is_rate_limit_error,
+    parse_retry_after,
+)
+
+
+class TestParseRetryAfter(unittest.TestCase):
+    """Tests for parse_retry_after() function."""
+
+    def test_retry_after_colon_format(self):
+        """Test 'Retry-After: 60' format."""
+        assert parse_retry_after("Retry-After: 60") == 60
+        assert parse_retry_after("retry-after: 120") == 120
+        assert parse_retry_after("retry after: 30 seconds") == 30
+
+    def test_retry_after_space_format(self):
+        """Test 'retry after 60 seconds' format."""
+        assert parse_retry_after("retry after 60 seconds") == 60
+        assert parse_retry_after("Please retry after 120 seconds") == 120
+        assert parse_retry_after("Retry after 30") == 30
+
+    def test_try_again_in_format(self):
+        """Test 'try again in X seconds' format."""
+        assert parse_retry_after("try again in 120 seconds") == 120
+        assert parse_retry_after("Please try again in 60s") == 60
+        assert parse_retry_after("Try again in 30 seconds") == 30
+
+    def test_seconds_remaining_format(self):
+        """Test 'X seconds remaining' format."""
+        assert parse_retry_after("30 seconds remaining") == 30
+        assert parse_retry_after("60 seconds left") == 60
+        assert parse_retry_after("120 seconds until reset") == 120
+
+    def test_retry_after_zero(self):
+        """Test 'Retry-After: 0' returns 0 (not None)."""
+        assert parse_retry_after("Retry-After: 0") == 0
+        assert parse_retry_after("retry after 0 seconds") == 0
+
+    def test_no_match(self):
+        """Test messages that don't contain retry-after info."""
+        assert parse_retry_after("no match here") is None
+        assert parse_retry_after("Connection refused") is None
+        assert parse_retry_after("Internal server error") is None
+        assert parse_retry_after("") is None
+
+    def test_minutes_not_supported(self):
+        """Test that minute and hour units are not parsed (by design)."""
+        # We only support seconds to avoid complexity
+        # These patterns should NOT match when followed by minute/hour units
+        assert parse_retry_after("wait 5 minutes") is None
+        assert parse_retry_after("try again in 2 minutes") is None
+        assert parse_retry_after("retry after 5 minutes") is None
+        assert parse_retry_after("retry after 1 hour") is None
+        assert parse_retry_after("try again in 30 min") is None
+
+
+class TestIsRateLimitError(unittest.TestCase):
+    """Tests for is_rate_limit_error() function."""
+
+    def test_rate_limit_patterns(self):
+        """Test various rate limit error messages."""
+        assert is_rate_limit_error("Rate limit exceeded") is True
+        assert is_rate_limit_error("rate_limit_exceeded") is True
+        assert is_rate_limit_error("Too many requests") is True
+        assert is_rate_limit_error("HTTP 429 Too Many Requests") is True
+        assert is_rate_limit_error("API quota exceeded") is True
+        assert is_rate_limit_error("Server is overloaded") is True
+
+    def test_specific_429_patterns(self):
+        """Test that 429 is detected with proper context."""
+        assert is_rate_limit_error("http 429") is True
+        assert is_rate_limit_error("HTTP429") is True
+        assert is_rate_limit_error("status 429") is True
+        assert is_rate_limit_error("error 429") is True
+        assert is_rate_limit_error("429 too many requests") is True
+
+    def test_case_insensitive(self):
+        """Test that detection is case-insensitive."""
+        assert is_rate_limit_error("RATE LIMIT") is True
+        assert is_rate_limit_error("Rate Limit") is True
+        assert is_rate_limit_error("rate limit") is True
+        assert is_rate_limit_error("RaTe LiMiT") is True
+
+    def test_non_rate_limit_errors(self):
+        """Test non-rate-limit error messages."""
+        assert is_rate_limit_error("Connection refused") is False
+        assert is_rate_limit_error("Authentication failed") is False
+        assert is_rate_limit_error("Invalid API key") is False
+        assert is_rate_limit_error("Internal server error") is False
+        assert is_rate_limit_error("Network timeout") is False
+        assert is_rate_limit_error("") is False
+
+
+class TestFalsePositives(unittest.TestCase):
+    """Verify non-rate-limit messages don't trigger detection."""
+
+    def test_version_numbers_with_429(self):
+        """Version numbers should not trigger."""
+        assert is_rate_limit_error("Node v14.29.0") is False
+        assert is_rate_limit_error("Python 3.12.429") is False
+        assert is_rate_limit_error("Version 2.429 released") is False
+
+    def test_issue_and_pr_numbers(self):
+        """Issue/PR numbers should not trigger."""
+        assert is_rate_limit_error("See PR #429") is False
+        assert is_rate_limit_error("Fixed in issue 429") is False
+        assert is_rate_limit_error("Closes #429") is False
+
+    def test_line_numbers(self):
+        """Line numbers in errors should not trigger."""
+        assert is_rate_limit_error("Error at line 429") is False
+        assert is_rate_limit_error("See file.py:429") is False
+
+    def test_port_numbers(self):
+        """Port numbers should not trigger."""
+        assert is_rate_limit_error("port 4293") is False
+        assert is_rate_limit_error("localhost:4290") is False
+
+    def test_legitimate_wait_messages(self):
+        """Legitimate wait instructions should not trigger."""
+        # These would fail if the "please wait" pattern still existed
+        assert is_rate_limit_error("Please wait for the build to complete") is False
+        assert is_rate_limit_error("Please wait while I analyze this") is False
+
+    def test_retry_discussion_messages(self):
+        """Messages discussing retry logic should not trigger."""
+        # These would fail if the "try again later" pattern still existed
+        assert is_rate_limit_error("Try again later after maintenance") is False
+        assert is_rate_limit_error("The user should try again later") is False
+
+    def test_limit_discussion_messages(self):
+        """Messages discussing limits should not trigger (removed pattern)."""
+        # These would fail if the "limit reached" pattern still existed
+        assert is_rate_limit_error("File size limit reached") is False
+        assert is_rate_limit_error("Memory limit reached, consider optimization") is False
+
+    def test_overloaded_in_programming_context(self):
+        """Method/operator overloading discussions should not trigger."""
+        assert is_rate_limit_error("I will create an overloaded constructor") is False
+        assert is_rate_limit_error("The + operator is overloaded") is False
+        assert is_rate_limit_error("Here is the overloaded version of the function") is False
+        assert is_rate_limit_error("The method is overloaded to accept different types") is False
+        # But actual API overload messages should still match
+        assert is_rate_limit_error("Server is overloaded") is True
+        assert is_rate_limit_error("API overloaded") is True
+        assert is_rate_limit_error("system is overloaded") is True
+
+
+class TestBackoffFunctions(unittest.TestCase):
+    """Test backoff calculation functions from rate_limit_utils."""
+
+    def test_rate_limit_backoff_sequence(self):
+        """Test that rate limit backoff follows expected exponential sequence with jitter.
+
+        Base formula: 15 * 2^retries, capped at 3600, with 0-30% jitter.
+        Base values: 15, 30, 60, 120, 240, 480, 960, 1920, 3600, 3600
+        With jitter the result should be in [base, base * 1.3].
+        """
+        base_values = [15, 30, 60, 120, 240, 480, 960, 1920, 3600, 3600]
+        for retries, base in enumerate(base_values):
+            delay = calculate_rate_limit_backoff(retries)
+            # Delay must be at least the base value (jitter is non-negative)
+            assert delay >= base, f"Retry {retries}: {delay} < base {base}"
+            # Delay must not exceed base + 30% jitter (int truncation means <= base * 1.3)
+            max_with_jitter = int(base * 1.3)
+            assert delay <= max_with_jitter, f"Retry {retries}: {delay} > max {max_with_jitter}"
+
+    def test_error_backoff_sequence(self):
+        """Test that error backoff follows expected linear sequence."""
+        expected = [30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 300]  # Caps at 300
+        for retries in range(1, len(expected) + 1):
+            delay = calculate_error_backoff(retries)
+            expected_delay = expected[retries - 1]
+            assert delay == expected_delay, f"Retry {retries}: expected {expected_delay}, got {delay}"
+
+    def test_clamp_retry_delay(self):
+        """Test that retry delay is clamped to valid range."""
+        # Values within range stay the same
+        assert clamp_retry_delay(60) == 60
+        assert clamp_retry_delay(1800) == 1800
+        assert clamp_retry_delay(3600) == 3600
+
+        # Values below minimum get clamped to 1
+        assert clamp_retry_delay(0) == 1
+        assert clamp_retry_delay(-10) == 1
+
+        # Values above maximum get clamped to 3600
+        assert clamp_retry_delay(7200) == 3600
+        assert clamp_retry_delay(86400) == 3600
+
+
+if __name__ == "__main__":
+    unittest.main()
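
The backoff tests above pin down the expected behaviour closely enough to sketch functions that would satisfy them. The block below is a minimal sketch inferred only from those assertions, not the project's actual `rate_limit_utils` code. The `MAX_DELAY` constant, the use of `random.uniform` for jitter, and applying the cap before the jitter are all assumptions.

```python
import random

# Assumed cap in seconds, matching the clamped values the tests expect.
MAX_DELAY = 3600


def calculate_rate_limit_backoff(retries: int) -> int:
    """Exponential backoff: 15 * 2^retries, capped at MAX_DELAY, plus 0-30% jitter."""
    base = min(15 * (2 ** retries), MAX_DELAY)
    # Jitter is non-negative, so the result is never below the base value.
    jitter = random.uniform(0, base * 0.3)
    return int(base + jitter)


def calculate_error_backoff(retries: int) -> int:
    """Linear backoff: 30 seconds per retry, capped at 300 seconds."""
    return min(30 * retries, 300)


def clamp_retry_delay(delay_seconds: int) -> int:
    """Clamp a server-suggested delay into the range [1, MAX_DELAY]."""
    return max(1, min(delay_seconds, MAX_DELAY))
```

Because the jitter is only ever added, a test can assert a lower bound of `base` and an upper bound of `int(base * 1.3)` without seeding the random generator, which is the approach `test_rate_limit_backoff_sequence` takes.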
--- a/test_security.py
+++ b/test_security.py
@@ -18,11 +18,14 @@ from security import (
    bash_security_hook,
    extract_commands,
    get_effective_commands,
+    get_effective_pkill_processes,
    load_org_config,
    load_project_commands,
    matches_pattern,
    validate_chmod_command,
    validate_init_script,
+    validate_pkill_command,
+    validate_playwright_command,
    validate_project_command,
 )

@@ -105,6 +108,8 @@ def test_extract_commands():
        ("/usr/bin/node script.js", ["node"]),
        ("VAR=value ls", ["ls"]),
        ("git status || git init", ["git", "git"]),
+        # Fallback parser test: complex nested quotes that break shlex
+        ('docker exec container php -r "echo \\"test\\";"', ["docker"]),
    ]

    for cmd, expected in test_cases:
@@ -269,11 +274,11 @@ def test_yaml_loading():

    with tempfile.TemporaryDirectory() as tmpdir:
        project_dir = Path(tmpdir)
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()

        # Test 1: Valid YAML
-        config_path = autocoder_dir / "allowed_commands.yaml"
+        config_path = autoforge_dir / "allowed_commands.yaml"
        config_path.write_text("""version: 1
 commands:
  - name: swift
@@ -293,7 +298,7 @@ commands:
            failed += 1

        # Test 2: Missing file returns None
-        (project_dir / ".autocoder" / "allowed_commands.yaml").unlink()
+        (project_dir / ".autoforge" / "allowed_commands.yaml").unlink()
        config = load_project_commands(project_dir)
        if config is None:
            print("  PASS: Missing file returns None")
@@ -403,11 +408,11 @@ def test_project_commands():

    with tempfile.TemporaryDirectory() as tmpdir:
        project_dir = Path(tmpdir)
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()

        # Create a config with Swift commands
-        config_path = autocoder_dir / "allowed_commands.yaml"
+        config_path = autoforge_dir / "allowed_commands.yaml"
        config_path.write_text("""version: 1
 commands:
  - name: swift
@@ -451,6 +456,21 @@ commands:
            print("  FAIL: Non-allowed command 'rustc' should be blocked")
            failed += 1

+        # Test 4: Empty command name is rejected
+        config_path.write_text("""version: 1
+commands:
+  - name: ""
+    description: Empty name should be rejected
+""")
+        result = load_project_commands(project_dir)
+        if result is None:
+            print("  PASS: Empty command name rejected in project config")
+            passed += 1
+        else:
+            print("  FAIL: Empty command name should be rejected in project config")
+            print(f"         Got: {result}")
+            failed += 1
+
    return passed, failed


@@ -463,7 +483,7 @@ def test_org_config_loading():
    with tempfile.TemporaryDirectory() as tmpdir:
        # Use temporary_home for cross-platform compatibility
        with temporary_home(tmpdir):
-            org_dir = Path(tmpdir) / ".autocoder"
+            org_dir = Path(tmpdir) / ".autoforge"
            org_dir.mkdir()
            org_config_path = org_dir / "config.yaml"

@@ -557,7 +577,7 @@ def test_hierarchy_resolution():
        with tempfile.TemporaryDirectory() as tmpproject:
            # Use temporary_home for cross-platform compatibility
            with temporary_home(tmphome):
-                org_dir = Path(tmphome) / ".autocoder"
+                org_dir = Path(tmphome) / ".autoforge"
                org_dir.mkdir()
                org_config_path = org_dir / "config.yaml"

@@ -574,9 +594,9 @@ blocked_commands:
 """)

                project_dir = Path(tmpproject)
-                project_autocoder = project_dir / ".autocoder"
-                project_autocoder.mkdir()
-                project_config = project_autocoder / "allowed_commands.yaml"
+                project_autoforge = project_dir / ".autoforge"
+                project_autoforge.mkdir()
+                project_config = project_autoforge / "allowed_commands.yaml"

                # Create project config
                project_config.write_text("""version: 1
@@ -641,7 +661,7 @@ def test_org_blocklist_enforcement():
        with tempfile.TemporaryDirectory() as tmpproject:
            # Use temporary_home for cross-platform compatibility
            with temporary_home(tmphome):
-                org_dir = Path(tmphome) / ".autocoder"
+                org_dir = Path(tmphome) / ".autoforge"
                org_dir.mkdir()
                org_config_path = org_dir / "config.yaml"

@@ -652,8 +672,8 @@ blocked_commands:
 """)

                project_dir = Path(tmpproject)
-                project_autocoder = project_dir / ".autocoder"
-                project_autocoder.mkdir()
+                project_autoforge = project_dir / ".autoforge"
+                project_autoforge.mkdir()

                # Try to use terraform (should be blocked)
                input_data = {"tool_name": "Bash", "tool_input": {"command": "terraform apply"}}
@@ -670,6 +690,304 @@ blocked_commands:
    return passed, failed


+def test_pkill_extensibility():
+    """Test that pkill processes can be extended via config."""
+    print("\nTesting pkill process extensibility:\n")
+    passed = 0
+    failed = 0
+
+    # Test 1: Default processes work without config
+    allowed, reason = validate_pkill_command("pkill node")
+    if allowed:
+        print("  PASS: Default process 'node' allowed")
+        passed += 1
+    else:
+        print(f"  FAIL: Default process 'node' should be allowed: {reason}")
+        failed += 1
+
+    # Test 2: Non-default process blocked without config
+    allowed, reason = validate_pkill_command("pkill python")
+    if not allowed:
+        print("  PASS: Non-default process 'python' blocked without config")
+        passed += 1
+    else:
+        print("  FAIL: Non-default process 'python' should be blocked without config")
+        failed += 1
+
+    # Test 3: Extra processes allowed when passed
+    allowed, reason = validate_pkill_command("pkill python", extra_processes={"python"})
+    if allowed:
+        print("  PASS: Extra process 'python' allowed when configured")
+        passed += 1
+    else:
+        print(f"  FAIL: Extra process 'python' should be allowed when configured: {reason}")
+        failed += 1
+
+    # Test 4: Default processes still work with extra processes
+    allowed, reason = validate_pkill_command("pkill npm", extra_processes={"python"})
+    if allowed:
+        print("  PASS: Default process 'npm' still works with extra processes")
+        passed += 1
+    else:
+        print(f"  FAIL: Default process should still work: {reason}")
+        failed += 1
+
+    # Test 5: Test get_effective_pkill_processes with org config
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            with temporary_home(tmphome):
+                org_dir = Path(tmphome) / ".autoforge"
+                org_dir.mkdir()
+                org_config_path = org_dir / "config.yaml"
+
+                # Create org config with extra pkill processes
+                org_config_path.write_text("""version: 1
+pkill_processes:
+  - python
+  - uvicorn
+""")
+
+                project_dir = Path(tmpproject)
+                processes = get_effective_pkill_processes(project_dir)
+
+                # Should include defaults + org processes
+                if "node" in processes and "python" in processes and "uvicorn" in processes:
+                    print("  PASS: Org pkill_processes merged with defaults")
+                    passed += 1
+                else:
+                    print(f"  FAIL: Expected node, python, uvicorn in {processes}")
+                    failed += 1
+
+    # Test 6: Test get_effective_pkill_processes with project config
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            with temporary_home(tmphome):
+                project_dir = Path(tmpproject)
+                project_autoforge = project_dir / ".autoforge"
+                project_autoforge.mkdir()
+                project_config = project_autoforge / "allowed_commands.yaml"
+
+                # Create project config with extra pkill processes
+                project_config.write_text("""version: 1
+commands: []
+pkill_processes:
+  - gunicorn
+  - flask
+""")
+
+                processes = get_effective_pkill_processes(project_dir)
+
+                # Should include defaults + project processes
+                if "node" in processes and "gunicorn" in processes and "flask" in processes:
+                    print("  PASS: Project pkill_processes merged with defaults")
+                    passed += 1
+                else:
+                    print(f"  FAIL: Expected node, gunicorn, flask in {processes}")
+                    failed += 1
+
+    # Test 7: Integration test - pkill python blocked by default
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            with temporary_home(tmphome):
+                project_dir = Path(tmpproject)
+                input_data = {"tool_name": "Bash", "tool_input": {"command": "pkill python"}}
+                context = {"project_dir": str(project_dir)}
+                result = asyncio.run(bash_security_hook(input_data, context=context))
+
+                if result.get("decision") == "block":
+                    print("  PASS: pkill python blocked without config")
+                    passed += 1
+                else:
+                    print("  FAIL: pkill python should be blocked without config")
+                    failed += 1
+
+    # Test 8: Integration test - pkill python allowed with org config
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            with temporary_home(tmphome):
+                org_dir = Path(tmphome) / ".autoforge"
+                org_dir.mkdir()
+                org_config_path = org_dir / "config.yaml"
+
+                org_config_path.write_text("""version: 1
+pkill_processes:
+  - python
+""")
+
+                project_dir = Path(tmpproject)
+                input_data = {"tool_name": "Bash", "tool_input": {"command": "pkill python"}}
+                context = {"project_dir": str(project_dir)}
+                result = asyncio.run(bash_security_hook(input_data, context=context))
+
+                if result.get("decision") != "block":
+                    print("  PASS: pkill python allowed with org config")
+                    passed += 1
+                else:
+                    print(f"  FAIL: pkill python should be allowed with org config: {result}")
+                    failed += 1
+
+    # Test 9: Regex metacharacters should be rejected in pkill_processes
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            with temporary_home(tmphome):
+                org_dir = Path(tmphome) / ".autoforge"
+                org_dir.mkdir()
+                org_config_path = org_dir / "config.yaml"
+
+                # Try to register a regex pattern (should be rejected)
+                org_config_path.write_text("""version: 1
+pkill_processes:
+  - ".*"
+""")
+
+                config = load_org_config()
+                if config is None:
+                    print("  PASS: Regex pattern '.*' rejected in pkill_processes")
+                    passed += 1
+                else:
+                    print("  FAIL: Regex pattern '.*' should be rejected")
+                    failed += 1
+
+    # Test 10: Valid process names with dots/underscores/hyphens should be accepted
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            with temporary_home(tmphome):
+                org_dir = Path(tmphome) / ".autoforge"
+                org_dir.mkdir()
+                org_config_path = org_dir / "config.yaml"
+
+                # Valid names with special chars
+                org_config_path.write_text("""version: 1
+pkill_processes:
+  - my-app
+  - app_server
+  - node.js
+""")
+
+                config = load_org_config()
+                if config is not None and config.get("pkill_processes") == ["my-app", "app_server", "node.js"]:
+                    print("  PASS: Valid process names with dots/underscores/hyphens accepted")
+                    passed += 1
+                else:
+                    print(f"  FAIL: Valid process names should be accepted: {config}")
+                    failed += 1
+
+    # Test 11: Names with spaces should be rejected
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            with temporary_home(tmphome):
+                org_dir = Path(tmphome) / ".autoforge"
+                org_dir.mkdir()
+                org_config_path = org_dir / "config.yaml"
+
+                org_config_path.write_text("""version: 1
+pkill_processes:
+  - "my app"
+""")
+
+                config = load_org_config()
+                if config is None:
+                    print("  PASS: Process name with space rejected")
+                    passed += 1
+                else:
+                    print("  FAIL: Process name with space should be rejected")
+                    failed += 1
+
+    # Test 12: Multiple patterns - all must be allowed (BSD behavior)
+    # On BSD, "pkill node sshd" would kill both, so we must validate all patterns
+    allowed, reason = validate_pkill_command("pkill node npm")
+    if allowed:
+        print("  PASS: Multiple allowed patterns accepted")
+        passed += 1
+    else:
+        print(f"  FAIL: Multiple allowed patterns should be accepted: {reason}")
+        failed += 1
+
+    # Test 13: Multiple patterns - block if any is disallowed
+    allowed, reason = validate_pkill_command("pkill node sshd")
+    if not allowed:
+        print("  PASS: Multiple patterns blocked when one is disallowed")
+        passed += 1
+    else:
+        print("  FAIL: Should block when any pattern is disallowed")
+        failed += 1
+
+    # Test 14: Multiple patterns - only first allowed, second disallowed
+    allowed, reason = validate_pkill_command("pkill npm python")
+    if not allowed:
+        print("  PASS: Multiple patterns blocked (first allowed, second not)")
+        passed += 1
+    else:
+        print("  FAIL: Should block when second pattern is disallowed")
+        failed += 1
+
+    return passed, failed
+
+
+def test_playwright_cli_validation():
+    """Test playwright-cli subcommand validation."""
+    print("\nTesting playwright-cli validation:\n")
+    passed = 0
+    failed = 0
+
+    # Test cases: (command, should_be_allowed, description)
+    test_cases = [
+        # Allowed cases
+        ("playwright-cli screenshot", True, "screenshot allowed"),
+        ("playwright-cli snapshot", True, "snapshot allowed"),
+        ("playwright-cli click e5", True, "click with ref"),
+        ("playwright-cli open http://localhost:3000", True, "open URL"),
+        ("playwright-cli -s=agent-1 click e5", True, "session flag with click"),
+        ("playwright-cli close", True, "close browser"),
+        ("playwright-cli goto http://localhost:3000/page", True, "goto URL"),
+        ("playwright-cli fill e3 'test value'", True, "fill form field"),
+        ("playwright-cli console", True, "console messages"),
+        # Blocked cases
+        ("playwright-cli run-code 'await page.evaluate(() => {})'", False, "run-code blocked"),
+        ("playwright-cli eval 'document.title'", False, "eval blocked"),
+        ("playwright-cli -s=test eval 'document.title'", False, "eval with session flag blocked"),
+    ]
+
+    for cmd, should_allow, description in test_cases:
+        allowed, reason = validate_playwright_command(cmd)
+        if allowed == should_allow:
+            print(f"  PASS: {cmd!r} ({description})")
+            passed += 1
+        else:
+            expected = "allowed" if should_allow else "blocked"
+            actual = "allowed" if allowed else "blocked"
+            print(f"  FAIL: {cmd!r} ({description})")
+            print(f"         Expected: {expected}, Got: {actual}")
+            if reason:
+                print(f"         Reason: {reason}")
+            failed += 1
+
+    # Integration test: verify through the security hook
+    print("\n  Integration tests (via security hook):\n")
+
+    # playwright-cli screenshot should be allowed
+    input_data = {"tool_name": "Bash", "tool_input": {"command": "playwright-cli screenshot"}}
+    result = asyncio.run(bash_security_hook(input_data))
+    if result.get("decision") != "block":
+        print("  PASS: playwright-cli screenshot allowed via hook")
+        passed += 1
+    else:
+        print(f"  FAIL: playwright-cli screenshot should be allowed: {result.get('reason')}")
+        failed += 1
+
+    # playwright-cli run-code should be blocked
+    input_data = {"tool_name": "Bash", "tool_input": {"command": "playwright-cli run-code 'code'"}}
+    result = asyncio.run(bash_security_hook(input_data))
+    if result.get("decision") == "block":
+        print("  PASS: playwright-cli run-code blocked via hook")
+        passed += 1
+    else:
+        print("  FAIL: playwright-cli run-code should be blocked via hook")
+        failed += 1
+
+    return passed, failed
+
+
 def main():
    print("=" * 70)
    print("  SECURITY HOOK TESTS")
@@ -733,32 +1051,40 @@ def main():
    passed += org_block_passed
    failed += org_block_failed

+    # Test pkill process extensibility
+    pkill_passed, pkill_failed = test_pkill_extensibility()
+    passed += pkill_passed
+    failed += pkill_failed
+
+    # Test playwright-cli validation
+    pw_passed, pw_failed = test_playwright_cli_validation()
+    passed += pw_passed
+    failed += pw_failed
+
    # Commands that SHOULD be blocked
+    # Note: blocklisted commands (sudo, shutdown, dd, aws) are tested in
+    # test_blocklist_enforcement(). chmod validation is tested in
+    # test_validate_chmod(). init.sh validation is tested in
+    # test_validate_init_script(). pkill validation is tested in
+    # test_pkill_extensibility(). The entries below focus on scenarios
+    # NOT covered by those dedicated tests.
    print("\nCommands that should be BLOCKED:\n")
    dangerous = [
        # Not in allowlist - dangerous system commands
-        "shutdown now",
        "reboot",
-        "dd if=/dev/zero of=/dev/sda",
        # Not in allowlist - common commands excluded from minimal set
        "wget https://example.com",
        "python app.py",
        "killall node",
-        # pkill with non-dev processes
+        # pkill with non-dev processes (pkill python tested in test_pkill_extensibility)
        "pkill bash",
        "pkill chrome",
-        "pkill python",
        # Shell injection attempts
        "$(echo pkill) node",
        'eval "pkill node"',
-        # chmod with disallowed modes
-        "chmod 777 file.sh",
-        "chmod 755 file.sh",
-        "chmod +w file.sh",
-        "chmod -R +x dir/",
-        # Non-init.sh scripts
-        "./setup.sh",
-        "./malicious.sh",
+        # playwright-cli dangerous subcommands
+        "playwright-cli run-code 'await page.goto(\"http://evil.com\")'",
+        "playwright-cli eval 'document.cookie'",
    ]

    for cmd in dangerous:
@@ -768,6 +1094,10 @@ def main():
            failed += 1

    # Commands that SHOULD be allowed
+    # Note: chmod +x variants are tested in test_validate_chmod().
+    # init.sh variants are tested in test_validate_init_script().
+    # The combined "chmod +x init.sh && ./init.sh" below serves as the
+    # integration test verifying the hook routes to both validators correctly.
    print("\nCommands that should be ALLOWED:\n")
    safe = [
        # File inspection
@@ -818,17 +1148,14 @@ def main():
        "ls | grep test",
        # Full paths
        "/usr/local/bin/node app.js",
-        # chmod +x (allowed)
-        "chmod +x init.sh",
-        "chmod +x script.sh",
-        "chmod u+x init.sh",
-        "chmod a+x init.sh",
-        # init.sh execution (allowed)
-        "./init.sh",
-        "./init.sh --production",
-        "/path/to/init.sh",
-        # Combined chmod and init.sh
+        # Combined chmod and init.sh (integration test for both validators)
        "chmod +x init.sh && ./init.sh",
+        # Playwright CLI allowed commands
+        "playwright-cli open http://localhost:3000",
+        "playwright-cli screenshot",
+        "playwright-cli snapshot",
+        "playwright-cli click e5",
+        "playwright-cli -s=agent-1 close",
    ]

    for cmd in safe:
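
The pkill tests in `test_pkill_extensibility()` describe two rules: configured process names must be plain names rather than regex patterns, and every pattern on the command line must be allowed. The block below is a minimal sketch of those two rules under stated assumptions, not the code in `security.py`. The names `is_valid_process_name` and `validate_pkill_sketch`, the default set, and the name regex are hypothetical; the tests only confirm that `node` and `npm` are defaults.

```python
import re
import shlex
from typing import Optional, Set, Tuple

# Assumed defaults for illustration; the tests only confirm "node" and "npm".
DEFAULT_PKILL_PROCESSES = {"node", "npm"}

# Assumed rule: letters, digits, dots, underscores, and hyphens only.
_PROCESS_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_process_name(name: str) -> bool:
    """Reject regex metacharacters and spaces such as '.*' or 'my app'."""
    return _PROCESS_NAME_RE.fullmatch(name) is not None


def validate_pkill_sketch(
    command: str, extra_processes: Optional[Set[str]] = None
) -> Tuple[bool, str]:
    """Allow pkill only when every target pattern is in the allowed set."""
    allowed = DEFAULT_PKILL_PROCESSES | (extra_processes or set())
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False, "Could not parse pkill command"
    if not tokens or tokens[0] != "pkill":
        return False, "Not a pkill command"
    # Simplification: treat every non-flag token as a pattern. Flags that
    # take their own argument are not handled in this sketch.
    targets = [t for t in tokens[1:] if not t.startswith("-")]
    if not targets:
        return False, "pkill requires a process name"
    disallowed = [t for t in targets if t not in allowed]
    if disallowed:
        return False, "pkill not allowed for: " + ", ".join(disallowed)
    return True, ""
```

Checking every target rather than only the last one matters because, as the test comment notes, BSD `pkill node sshd` would act on both patterns.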
--- a/test_security_integration.py
+++ b/test_security_integration.py
@@ -79,9 +79,9 @@ def test_blocked_command_via_hook():
        project_dir = Path(tmpdir)

        # Create minimal project structure
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
-        (autocoder_dir / "allowed_commands.yaml").write_text(
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()
+        (autoforge_dir / "allowed_commands.yaml").write_text(
            "version: 1\ncommands: []"
        )

@@ -114,9 +114,9 @@ def test_allowed_command_via_hook():
        project_dir = Path(tmpdir)

        # Create minimal project structure
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
-        (autocoder_dir / "allowed_commands.yaml").write_text(
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()
+        (autoforge_dir / "allowed_commands.yaml").write_text(
            "version: 1\ncommands: []"
        )

@@ -145,9 +145,9 @@ def test_non_allowed_command_via_hook():
        project_dir = Path(tmpdir)

        # Create minimal project structure
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
-        (autocoder_dir / "allowed_commands.yaml").write_text(
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()
+        (autoforge_dir / "allowed_commands.yaml").write_text(
            "version: 1\ncommands: []"
        )

@@ -179,9 +179,9 @@ def test_project_config_allows_command():
        project_dir = Path(tmpdir)

        # Create project config with swift allowed
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
-        (autocoder_dir / "allowed_commands.yaml").write_text("""version: 1
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()
+        (autoforge_dir / "allowed_commands.yaml").write_text("""version: 1
 commands:
  - name: swift
    description: Swift compiler
@@ -214,9 +214,9 @@ def test_pattern_matching():
        project_dir = Path(tmpdir)

        # Create project config with swift* pattern
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
-        (autocoder_dir / "allowed_commands.yaml").write_text("""version: 1
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()
+        (autoforge_dir / "allowed_commands.yaml").write_text("""version: 1
 commands:
  - name: swift*
    description: All Swift tools
@@ -247,7 +247,7 @@ def test_org_blocklist_enforcement():
        with tempfile.TemporaryDirectory() as tmpproject:
            # Use context manager to safely set and restore HOME
            with temporary_home(tmphome):
-                org_dir = Path(tmphome) / ".autocoder"
+                org_dir = Path(tmphome) / ".autoforge"
                org_dir.mkdir()
                (org_dir / "config.yaml").write_text("""version: 1
 allowed_commands: []
@@ -257,11 +257,11 @@ blocked_commands:
 """)

                project_dir = Path(tmpproject)
-                autocoder_dir = project_dir / ".autocoder"
-                autocoder_dir.mkdir()
+                autoforge_dir = project_dir / ".autoforge"
+                autoforge_dir.mkdir()

                # Try to allow terraform in project config (should fail - org blocked)
-                (autocoder_dir / "allowed_commands.yaml").write_text("""version: 1
+                (autoforge_dir / "allowed_commands.yaml").write_text("""version: 1
 commands:
  - name: terraform
    description: Infrastructure as code
@@ -295,7 +295,7 @@ def test_org_allowlist_inheritance():
        with tempfile.TemporaryDirectory() as tmpproject:
            # Use context manager to safely set and restore HOME
            with temporary_home(tmphome):
-                org_dir = Path(tmphome) / ".autocoder"
+                org_dir = Path(tmphome) / ".autoforge"
                org_dir.mkdir()
                (org_dir / "config.yaml").write_text("""version: 1
 allowed_commands:
@@ -305,9 +305,9 @@ blocked_commands: []
 """)

                project_dir = Path(tmpproject)
-                autocoder_dir = project_dir / ".autocoder"
-                autocoder_dir.mkdir()
-                (autocoder_dir / "allowed_commands.yaml").write_text(
+                autoforge_dir = project_dir / ".autoforge"
+                autoforge_dir.mkdir()
+                (autoforge_dir / "allowed_commands.yaml").write_text(
                    "version: 1\ncommands: []"
                )

@@ -336,9 +336,9 @@ def test_invalid_yaml_ignored():
        project_dir = Path(tmpdir)

        # Create invalid YAML
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
-        (autocoder_dir / "allowed_commands.yaml").write_text("invalid: yaml: content:")
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()
+        (autoforge_dir / "allowed_commands.yaml").write_text("invalid: yaml: content:")

        # Try to run ls (should still work - falls back to defaults)
        input_data = {"tool_name": "Bash", "tool_input": {"command": "ls"}}
@@ -365,13 +365,13 @@ def test_100_command_limit():
        project_dir = Path(tmpdir)

        # Create config with 101 commands
-        autocoder_dir = project_dir / ".autocoder"
-        autocoder_dir.mkdir()
+        autoforge_dir = project_dir / ".autoforge"
+        autoforge_dir.mkdir()

        commands = [
            f"  - name: cmd{i}\n    description: Command {i}" for i in range(101)
        ]
-        (autocoder_dir / "allowed_commands.yaml").write_text(
+        (autoforge_dir / "allowed_commands.yaml").write_text(
            "version: 1\ncommands:\n" + "\n".join(commands)
        )

--- a/ui/components.json
+++ b/ui/components.json
@@ -0,0 +1,22 @@
+{
+  "$schema": "https://ui.shadcn.com/schema.json",
+  "style": "new-york",
+  "rsc": false,
+  "tsx": true,
+  "tailwind": {
+    "config": "",
+    "css": "src/styles/globals.css",
+    "baseColor": "neutral",
+    "cssVariables": true,
+    "prefix": ""
+  },
+  "iconLibrary": "lucide",
+  "aliases": {
+    "components": "@/components",
+    "utils": "@/lib/utils",
+    "ui": "@/components/ui",
+    "lib": "@/lib",
+    "hooks": "@/hooks"
+  },
+  "registries": {}
+}
--- a/ui/e2e/tooltip.spec.ts
+++ b/ui/e2e/tooltip.spec.ts
@@ -0,0 +1,47 @@
+import { test, expect } from '@playwright/test'
+
+/**
+ * E2E tooltip tests for header icon buttons.
+ *
+ * Run tests:
+ *   cd ui && npm run test:e2e
+ *   cd ui && npm run test:e2e -- tooltip.spec.ts
+ */
+test.describe('Header tooltips', () => {
+  test.setTimeout(30000)
+
+  test.beforeEach(async ({ page }) => {
+    await page.goto('/')
+    await page.waitForSelector('button:has-text("Select Project")', { timeout: 10000 })
+  })
+
+  async function selectProject(page: import('@playwright/test').Page) {
+    const projectSelector = page.locator('button:has-text("Select Project")')
+    if (await projectSelector.isVisible()) {
+      await projectSelector.click()
+      const items = page.locator('.neo-dropdown-item')
+      const itemCount = await items.count()
+      if (itemCount === 0) return false
+      await items.first().click()
+      await expect(projectSelector).not.toBeVisible({ timeout: 5000 }).catch(() => {})
+      return true
+    }
+    return false
+  }
+
+  test('Settings tooltip shows on hover', async ({ page }) => {
+    const hasProject = await selectProject(page)
+    if (!hasProject) {
+      test.skip(true, 'No projects available')
+      return
+    }
+
+    const settingsButton = page.locator('button[aria-label="Open Settings"]')
+    await expect(settingsButton).toBeVisible()
+
+    await settingsButton.hover()
+
+    const tooltip = page.locator('[data-slot="tooltip-content"]', { hasText: 'Settings' })
+    await expect(tooltip).toBeVisible({ timeout: 2000 })
+  })
+})
--- a/ui/index.html
+++ b/ui/index.html
@@ -2,12 +2,12 @@
 <html lang="en">
  <head>
    <meta charset="UTF-8" />
-    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+    <link rel="icon" type="image/png" href="/logo.png" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <title>AutoCoder</title>
+    <title>AutoForge</title>
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
-    <link href="https://fonts.googleapis.com/css2?family=Archivo+Black&family=Work+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600&display=swap" rel="stylesheet">
+    <link href="https://fonts.googleapis.com/css2?family=Archivo+Black&family=Work+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600&family=DM+Sans:wght@400;500;700&family=Space+Mono:wght@400;700&family=Outfit:wght@400;500;600;700&family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
  </head>
  <body>
    <div id="root"></div>
--- a/ui/package-lock.json
+++ b/ui/package-lock.json
--- a/ui/package.json
+++ b/ui/package.json
@@ -1,5 +1,5 @@
 {
-  "name": "autocoder",
+  "name": "autoforge-ui",
  "private": true,
  "version": "1.0.0",
  "type": "module",
@@ -12,37 +12,49 @@
    "test:e2e:ui": "playwright test --ui"
  },
  "dependencies": {
-    "@radix-ui/react-dialog": "^1.1.2",
-    "@radix-ui/react-dropdown-menu": "^2.1.2",
-    "@radix-ui/react-tooltip": "^1.1.3",
-    "@tanstack/react-query": "^5.60.0",
+    "@radix-ui/react-checkbox": "^1.3.3",
+    "@radix-ui/react-dialog": "^1.1.15",
+    "@radix-ui/react-dropdown-menu": "^2.1.16",
+    "@radix-ui/react-label": "^2.1.8",
+    "@radix-ui/react-separator": "^1.1.8",
+    "@radix-ui/react-slot": "^1.2.4",
+    "@radix-ui/react-switch": "^1.2.6",
+    "@radix-ui/react-tooltip": "^1.2.8",
+    "@tanstack/react-query": "^5.72.0",
    "@xterm/addon-fit": "^0.11.0",
    "@xterm/addon-web-links": "^0.12.0",
    "@xterm/xterm": "^6.0.0",
    "@xyflow/react": "^12.10.0",
+    "autoforge-ai": "file:..",
    "canvas-confetti": "^1.9.4",
+    "class-variance-authority": "^0.7.1",
    "clsx": "^2.1.1",
    "dagre": "^0.8.5",
-    "lucide-react": "^0.460.0",
-    "react": "^18.3.1",
-    "react-dom": "^18.3.1"
+    "lucide-react": "^0.475.0",
+    "react": "^19.0.0",
+    "react-dom": "^19.0.0",
+    "react-markdown": "^10.1.0",
+    "remark-gfm": "^4.0.1",
+    "tailwind-merge": "^3.4.0"
  },
  "devDependencies": {
-    "@eslint/js": "^9.13.0",
+    "@eslint/js": "^9.19.0",
    "@playwright/test": "^1.57.0",
-    "@tailwindcss/vite": "^4.0.0-beta.4",
+    "@tailwindcss/vite": "^4.1.0",
    "@types/canvas-confetti": "^1.9.0",
    "@types/dagre": "^0.7.53",
-    "@types/react": "^18.3.12",
-    "@types/react-dom": "^18.3.1",
-    "@vitejs/plugin-react": "^4.3.3",
-    "eslint": "^9.13.0",
-    "eslint-plugin-react-hooks": "^5.0.0",
-    "eslint-plugin-react-refresh": "^0.4.14",
-    "globals": "^15.11.0",
-    "tailwindcss": "^4.0.0-beta.4",
-    "typescript": "~5.6.2",
-    "typescript-eslint": "^8.11.0",
-    "vite": "^5.4.10"
+    "@types/node": "^22.12.0",
+    "@types/react": "^19.0.0",
+    "@types/react-dom": "^19.0.0",
+    "@vitejs/plugin-react": "^4.4.0",
+    "eslint": "^9.19.0",
+    "eslint-plugin-react-hooks": "^5.1.0",
+    "eslint-plugin-react-refresh": "^0.4.19",
+    "globals": "^15.14.0",
+    "tailwindcss": "^4.1.0",
+    "tw-animate-css": "^1.4.0",
+    "typescript": "~5.7.3",
+    "typescript-eslint": "^8.23.0",
+    "vite": "^7.3.0"
  }
 }
--- a/ui/public/logo.png
+++ b/ui/public/logo.png
--- a/ui/public/ollama.png
+++ b/ui/public/ollama.png
--- a/ui/src/App.tsx
+++ b/ui/src/App.tsx
@@ -4,6 +4,7 @@ import { useProjects, useFeatures, useAgentStatus, useSettings } from './hooks/u
 import { useProjectWebSocket } from './hooks/useWebSocket'
 import { useFeatureSound } from './hooks/useFeatureSound'
 import { useCelebration } from './hooks/useCelebration'
+import { useTheme } from './hooks/useTheme'
 import { ProjectSelector } from './components/ProjectSelector'
 import { KanbanBoard } from './components/KanbanBoard'
 import { AgentControl } from './components/AgentControl'
@@ -12,7 +13,6 @@ import { SetupWizard } from './components/SetupWizard'
 import { AddFeatureForm } from './components/AddFeatureForm'
 import { FeatureModal } from './components/FeatureModal'
 import { DebugLogViewer, type TabType } from './components/DebugLogViewer'
-import { AgentThought } from './components/AgentThought'
 import { AgentMissionControl } from './components/AgentMissionControl'
 import { CelebrationOverlay } from './components/CelebrationOverlay'
 import { AssistantFAB } from './components/AssistantFAB'
@@ -24,13 +24,24 @@ import { DevServerControl } from './components/DevServerControl'
 import { ViewToggle, type ViewMode } from './components/ViewToggle'
 import { DependencyGraph } from './components/DependencyGraph'
 import { KeyboardShortcutsHelp } from './components/KeyboardShortcutsHelp'
-import { getDependencyGraph } from './lib/api'
-import { Loader2, Settings, Moon, Sun } from 'lucide-react'
+import { ThemeSelector } from './components/ThemeSelector'
+import { ResetProjectModal } from './components/ResetProjectModal'
+import { ProjectSetupRequired } from './components/ProjectSetupRequired'
+import { getDependencyGraph, startAgent } from './lib/api'
+import { Loader2, Settings, Moon, Sun, RotateCcw, BookOpen } from 'lucide-react'
 import type { Feature } from './lib/types'
+import { Button } from '@/components/ui/button'
+import { Card, CardContent } from '@/components/ui/card'
+import { Badge } from '@/components/ui/badge'
+import { TooltipProvider, Tooltip, TooltipTrigger, TooltipContent } from '@/components/ui/tooltip'

-const STORAGE_KEY = 'autocoder-selected-project'
-const DARK_MODE_KEY = 'autocoder-dark-mode'
-const VIEW_MODE_KEY = 'autocoder-view-mode'
+const STORAGE_KEY = 'autoforge-selected-project'
+const VIEW_MODE_KEY = 'autoforge-view-mode'
+
+// Bottom padding for main content when debug panel is collapsed (40px header + 8px margin)
+const COLLAPSED_DEBUG_PANEL_CLEARANCE = 48
+
+type InitializerStatus = 'idle' | 'starting' | 'error'

 function App() {
  // Initialize selected project from localStorage
@@ -52,14 +63,10 @@ function App() {
  const [showSettings, setShowSettings] = useState(false)
  const [showKeyboardHelp, setShowKeyboardHelp] = useState(false)
  const [isSpecCreating, setIsSpecCreating] = useState(false)
+  const [showResetModal, setShowResetModal] = useState(false)
  const [showSpecChat, setShowSpecChat] = useState(false)  // For "Create Spec" button in empty kanban
-  const [darkMode, setDarkMode] = useState(() => {
-    try {
-      return localStorage.getItem(DARK_MODE_KEY) === 'true'
-    } catch {
-      return false
-    }
-  })
+  const [specInitializerStatus, setSpecInitializerStatus] = useState<InitializerStatus>('idle')
+  const [specInitializerError, setSpecInitializerError] = useState<string | null>(null)
  const [viewMode, setViewMode] = useState<ViewMode>(() => {
    try {
      const stored = localStorage.getItem(VIEW_MODE_KEY)
@@ -75,6 +82,7 @@ function App() {
  const { data: settings } = useSettings()
  useAgentStatus(selectedProject) // Keep polling for status updates
  const wsState = useProjectWebSocket(selectedProject)
+  const { theme, setTheme, darkMode, toggleDarkMode, themes } = useTheme()

  // Get has_spec from the selected project
  const selectedProjectData = projects?.find(p => p.name === selectedProject)
@@ -88,20 +96,6 @@ function App() {
    refetchInterval: 5000, // Refresh every 5 seconds
  })

-  // Apply dark mode class to document
-  useEffect(() => {
-    if (darkMode) {
-      document.documentElement.classList.add('dark')
-    } else {
-      document.documentElement.classList.remove('dark')
-    }
-    try {
-      localStorage.setItem(DARK_MODE_KEY, String(darkMode))
-    } catch {
-      // localStorage not available
-    }
-  }, [darkMode])
-
  // Persist view mode to localStorage
  useEffect(() => {
    try {
@@ -136,7 +130,8 @@ function App() {
    const allFeatures = [
      ...(features?.pending ?? []),
      ...(features?.in_progress ?? []),
-      ...(features?.done ?? [])
+      ...(features?.done ?? []),
+      ...(features?.needs_human_input ?? [])
    ]
    const feature = allFeatures.find(f => f.id === nodeId)
    if (feature) setSelectedFeature(feature)
@@ -185,9 +180,9 @@ function App() {
        setShowAddFeature(true)
      }

-      // E : Expand project with AI (when project selected and has features)
-      if ((e.key === 'e' || e.key === 'E') && selectedProject && features &&
-          (features.pending.length + features.in_progress.length + features.done.length) > 0) {
+      // E : Expand project with AI (when project selected, has spec and has features)
+      if ((e.key === 'e' || e.key === 'E') && selectedProject && hasSpec && features &&
+          (features.pending.length + features.in_progress.length + features.done.length + (features.needs_human_input?.length || 0)) > 0) {
        e.preventDefault()
        setShowExpandProject(true)
      }
@@ -216,10 +211,18 @@ function App() {
        setShowKeyboardHelp(true)
      }

+      // R : Open reset modal (when project selected and agent not running/draining)
+      if ((e.key === 'r' || e.key === 'R') && selectedProject && !['running', 'pausing', 'paused_graceful'].includes(wsState.agentStatus)) {
+        e.preventDefault()
+        setShowResetModal(true)
+      }
+
      // Escape : Close modals
      if (e.key === 'Escape') {
        if (showKeyboardHelp) {
          setShowKeyboardHelp(false)
+        } else if (showResetModal) {
+          setShowResetModal(false)
        } else if (showExpandProject) {
          setShowExpandProject(false)
        } else if (showSettings) {
@@ -238,12 +241,12 @@ function App() {

    window.addEventListener('keydown', handleKeyDown)
    return () => window.removeEventListener('keydown', handleKeyDown)
-  }, [selectedProject, showAddFeature, showExpandProject, selectedFeature, debugOpen, debugActiveTab, assistantOpen, features, showSettings, showKeyboardHelp, isSpecCreating, viewMode])
+  }, [selectedProject, showAddFeature, showExpandProject, selectedFeature, debugOpen, debugActiveTab, assistantOpen, features, showSettings, showKeyboardHelp, isSpecCreating, viewMode, showResetModal, wsState.agentStatus, hasSpec])

  // Combine WebSocket progress with feature data
  const progress = wsState.progress.total > 0 ? wsState.progress : {
    passing: features?.done.length ?? 0,
-    total: (features?.pending.length ?? 0) + (features?.in_progress.length ?? 0) + (features?.done.length ?? 0),
+    total: (features?.pending.length ?? 0) + (features?.in_progress.length ?? 0) + (features?.done.length ?? 0) + (features?.needs_human_input?.length ?? 0),
    percentage: 0,
  }

@@ -256,18 +259,22 @@ function App() {
  }

  return (
-    <div className="min-h-screen bg-neo-bg">
+    <div className="min-h-screen bg-background">
      {/* Header */}
-      <header className="bg-neo-card text-neo-text border-b-4 border-neo-border">
-        <div className="max-w-7xl mx-auto px-4 py-4">
-          <div className="flex items-center justify-between">
-            {/* Logo and Title */}
-            <h1 className="font-display text-2xl font-bold tracking-tight uppercase">
-              AutoCoder
-            </h1>
+      <header className="sticky top-0 z-50 bg-card/80 backdrop-blur-md text-foreground border-b-2 border-border">
+        <div className="max-w-7xl mx-auto px-4 py-3">
+          <TooltipProvider>
+            {/* Row 1: Branding + Project + Utility icons */}
+            <div className="flex items-center gap-3">
+              {/* Logo and Title */}
+              <div className="flex items-center gap-2 shrink-0">
+                <img src="/logo.png" alt="AutoForge" className="h-9 w-9 rounded-full" />
+                <h1 className="font-display text-2xl font-bold tracking-tight uppercase hidden md:block">
+                  AutoForge
+                </h1>
+              </div>

-            {/* Controls */}
-            <div className="flex items-center gap-4">
+              {/* Project selector */}
              <ProjectSelector
                projects={projects ?? []}
                selectedProject={selectedProject}
@@ -276,68 +283,141 @@ function App() {
                onSpecCreatingChange={setIsSpecCreating}
              />

-              {selectedProject && (
-                <>
-                  <AgentControl
-                    projectName={selectedProject}
-                    status={wsState.agentStatus}
-                  />
+              {/* Spacer */}
+              <div className="flex-1" />

-                  <DevServerControl
-                    projectName={selectedProject}
-                    status={wsState.devServerStatus}
-                    url={wsState.devServerUrl}
-                  />
-
-                  <button
-                    onClick={() => setShowSettings(true)}
-                    className="neo-btn text-sm py-2 px-3"
-                    title="Settings (,)"
-                    aria-label="Open Settings"
-                  >
-                    <Settings size={18} />
-                  </button>
-
-                  {/* GLM Mode Badge */}
-                  {settings?.glm_mode && (
-                    <span
-                      className="px-2 py-1 text-xs font-bold bg-[var(--color-neo-glm)] text-white rounded border-2 border-neo-border shadow-neo-sm"
-                      title="Using GLM API (configured via .env)"
-                    >
-                      GLM
-                    </span>
-                  )}
-                </>
+              {/* Ollama Mode Indicator */}
+              {selectedProject && settings?.ollama_mode && (
+                <div
+                  className="hidden sm:flex items-center gap-1.5 px-2 py-1 bg-card rounded border-2 border-border shadow-sm"
+                  title="Using Ollama local models"
+                >
+                  <img src="/ollama.png" alt="Ollama" className="w-5 h-5" />
+                  <span className="text-xs font-bold text-foreground">Ollama</span>
+                </div>
              )}

-              {/* Dark mode toggle - always visible */}
-              <button
-                onClick={() => setDarkMode(!darkMode)}
-                className="neo-btn text-sm py-2 px-3"
-                title="Toggle dark mode"
-                aria-label="Toggle dark mode"
-              >
-                {darkMode ? <Sun size={18} /> : <Moon size={18} />}
-              </button>
+              {/* GLM Mode Badge */}
+              {selectedProject && settings?.glm_mode && (
+                <Badge
+                  className="hidden sm:inline-flex bg-purple-500 text-white hover:bg-purple-600"
+                  title="Using GLM API"
+                >
+                  GLM
+                </Badge>
+              )}
+
+              {/* Utility icons - always visible */}
+              <Tooltip>
+                <TooltipTrigger asChild>
+                  <Button
+                    onClick={() => window.open('https://autoforge.cc', '_blank')}
+                    variant="outline"
+                    size="sm"
+                    aria-label="Open Documentation"
+                  >
+                    <BookOpen size={18} />
+                  </Button>
+                </TooltipTrigger>
+                <TooltipContent>Docs</TooltipContent>
+              </Tooltip>
+
+              <ThemeSelector
+                themes={themes}
+                currentTheme={theme}
+                onThemeChange={setTheme}
+              />
+
+              <Tooltip>
+                <TooltipTrigger asChild>
+                  <Button
+                    onClick={toggleDarkMode}
+                    variant="outline"
+                    size="sm"
+                    aria-label="Toggle dark mode"
+                  >
+                    {darkMode ? <Sun size={18} /> : <Moon size={18} />}
+                  </Button>
+                </TooltipTrigger>
+                <TooltipContent>Toggle dark mode</TooltipContent>
+              </Tooltip>
            </div>
-          </div>
+
+            {/* Row 2: Project controls - only when a project is selected */}
+            {selectedProject && (
+              <div className="flex items-center gap-3 mt-2 pt-2 border-t border-border/50">
+                <AgentControl
+                  projectName={selectedProject}
+                  status={wsState.agentStatus}
+                  defaultConcurrency={selectedProjectData?.default_concurrency}
+                />
+
+                <DevServerControl
+                  projectName={selectedProject}
+                  status={wsState.devServerStatus}
+                  url={wsState.devServerUrl}
+                />
+
+                <div className="flex-1" />
+
+                <Tooltip>
+                  <TooltipTrigger asChild>
+                    <Button
+                      onClick={() => setShowSettings(true)}
+                      variant="outline"
+                      size="sm"
+                      aria-label="Open Settings"
+                    >
+                      <Settings size={18} />
+                    </Button>
+                  </TooltipTrigger>
+                  <TooltipContent>Settings (,)</TooltipContent>
+                </Tooltip>
+
+                <Tooltip>
+                  <TooltipTrigger asChild>
+                    <Button
+                      onClick={() => setShowResetModal(true)}
+                      variant="outline"
+                      size="sm"
+                      aria-label="Reset Project"
+                      disabled={['running', 'pausing', 'paused_graceful'].includes(wsState.agentStatus)}
+                    >
+                      <RotateCcw size={18} />
+                    </Button>
+                  </TooltipTrigger>
+                  <TooltipContent>Reset (R)</TooltipContent>
+                </Tooltip>
+              </div>
+            )}
+          </TooltipProvider>
        </div>
      </header>

      {/* Main Content */}
      <main
        className="max-w-7xl mx-auto px-4 py-8"
-        style={{ paddingBottom: debugOpen ? debugPanelHeight + 32 : undefined }}
+        style={{ paddingBottom: debugOpen ? debugPanelHeight + 32 : COLLAPSED_DEBUG_PANEL_CLEARANCE }}
      >
        {!selectedProject ? (
-          <div className="neo-empty-state mt-12">
+          <div className="text-center mt-12">
            <h2 className="font-display text-2xl font-bold mb-2">
-              Welcome to AutoCoder
+              Welcome to AutoForge
            </h2>
-            <p className="text-neo-text-secondary mb-4">
+            <p className="text-muted-foreground mb-4">
              Select a project from the dropdown above or create a new one to get started.
            </p>
          </div>
+        ) : !hasSpec ? (
+          <ProjectSetupRequired
+            projectName={selectedProject}
+            projectPath={selectedProjectData?.path}
+            onCreateWithClaude={() => setShowSpecChat(true)}
+            onEditManually={() => {
+              // Open debug panel for the user to see the project path
+              setDebugOpen(true)
+            }}
+          />
        ) : (
          <div className="space-y-8">
            {/* Progress Dashboard */}
@@ -346,6 +426,8 @@ function App() {
              total={progress.total}
              percentage={progress.percentage}
              isConnected={wsState.isConnected}
+              logs={wsState.activeAgents.length === 0 ? wsState.logs : undefined}
+              agentStatus={wsState.activeAgents.length === 0 ? wsState.agentStatus : undefined}
            />

            {/* Agent Mission Control - shows orchestrator status and active agents in parallel mode */}
@@ -356,33 +438,29 @@ function App() {
              getAgentLogs={wsState.getAgentLogs}
            />

-            {/* Agent Thought - shows latest agent narrative (single agent mode) */}
-            {wsState.activeAgents.length === 0 && (
-              <AgentThought
-                logs={wsState.logs}
-                agentStatus={wsState.agentStatus}
-              />
-            )}

            {/* Initializing Features State - show when agent is running but no features yet */}
            {features &&
             features.pending.length === 0 &&
             features.in_progress.length === 0 &&
             features.done.length === 0 &&
+             (features.needs_human_input?.length || 0) === 0 &&
             wsState.agentStatus === 'running' && (
-              <div className="neo-card p-8 text-center">
-                <Loader2 size={32} className="animate-spin mx-auto mb-4 text-neo-progress" />
-                <h3 className="font-display font-bold text-xl mb-2">
-                  Initializing Features...
-                </h3>
-                <p className="text-neo-text-secondary">
-                  The agent is reading your spec and creating features. This may take a moment.
-                </p>
-              </div>
+              <Card className="p-8 text-center">
+                <CardContent className="p-0">
+                  <Loader2 size={32} className="animate-spin mx-auto mb-4 text-primary" />
+                  <h3 className="font-display font-bold text-xl mb-2">
+                    Initializing Features...
+                  </h3>
+                  <p className="text-muted-foreground">
+                    The agent is reading your spec and creating features. This may take a moment.
+                  </p>
+                </CardContent>
+              </Card>
            )}

            {/* View Toggle - only show when there are features */}
-            {features && (features.pending.length + features.in_progress.length + features.done.length) > 0 && (
+            {features && (features.pending.length + features.in_progress.length + features.done.length + (features.needs_human_input?.length || 0)) > 0 && (
              <div className="flex justify-center">
                <ViewToggle viewMode={viewMode} onViewModeChange={setViewMode} />
              </div>
@@ -400,7 +478,7 @@ function App() {
                hasSpec={hasSpec}
              />
            ) : (
-              <div className="neo-card overflow-hidden" style={{ height: '600px' }}>
+              <Card className="overflow-hidden" style={{ height: '600px' }}>
                {graphData ? (
                  <DependencyGraph
                    graphData={graphData}
@@ -409,10 +487,10 @@ function App() {
                  />
                ) : (
                  <div className="h-full flex items-center justify-center">
-                    <Loader2 size={32} className="animate-spin text-neo-progress" />
+                    <Loader2 size={32} className="animate-spin text-primary" />
                  </div>
                )}
-              </div>
+              </Card>
            )}
          </div>
        )}
@@ -436,7 +514,7 @@ function App() {
      )}

      {/* Expand Project Modal - AI-powered bulk feature creation */}
-      {showExpandProject && selectedProject && (
+      {showExpandProject && selectedProject && hasSpec && (
        <ExpandProjectModal
          isOpen={showExpandProject}
          projectName={selectedProject}
@@ -450,17 +528,34 @@ function App() {

      {/* Spec Creation Chat - for creating spec from empty kanban */}
      {showSpecChat && selectedProject && (
-        <div className="fixed inset-0 z-50 bg-[var(--color-neo-bg)]">
+        <div className="fixed inset-0 z-50 bg-background">
          <SpecCreationChat
            projectName={selectedProject}
-            onComplete={() => {
-              setShowSpecChat(false)
-              // Refresh projects to update has_spec
-              queryClient.invalidateQueries({ queryKey: ['projects'] })
-              queryClient.invalidateQueries({ queryKey: ['features', selectedProject] })
+            onComplete={async (_specPath, yoloMode) => {
+              setSpecInitializerStatus('starting')
+              try {
+                await startAgent(selectedProject, {
+                  yoloMode: yoloMode ?? false,
+                  maxConcurrency: 3,
+                })
+                // Success — close chat and refresh
+                setShowSpecChat(false)
+                setSpecInitializerStatus('idle')
+                queryClient.invalidateQueries({ queryKey: ['projects'] })
+                queryClient.invalidateQueries({ queryKey: ['features', selectedProject] })
+              } catch (err) {
+                setSpecInitializerStatus('error')
+                setSpecInitializerError(err instanceof Error ? err.message : 'Failed to start agent')
+              }
+            }}
+            onCancel={() => { setShowSpecChat(false); setSpecInitializerStatus('idle') }}
+            onExitToProject={() => { setShowSpecChat(false); setSpecInitializerStatus('idle') }}
+            initializerStatus={specInitializerStatus}
+            initializerError={specInitializerError}
+            onRetryInitializer={() => {
+              setSpecInitializerError(null)
+              setSpecInitializerStatus('idle')
            }}
-            onCancel={() => setShowSpecChat(false)}
-            onExitToProject={() => setShowSpecChat(false)}
          />
        </div>
      )}
@@ -497,13 +592,24 @@ function App() {
      )}

      {/* Settings Modal */}
-      {showSettings && (
-        <SettingsModal onClose={() => setShowSettings(false)} />
-      )}
+      <SettingsModal isOpen={showSettings} onClose={() => setShowSettings(false)} />

      {/* Keyboard Shortcuts Help */}
-      {showKeyboardHelp && (
-        <KeyboardShortcutsHelp onClose={() => setShowKeyboardHelp(false)} />
+      <KeyboardShortcutsHelp isOpen={showKeyboardHelp} onClose={() => setShowKeyboardHelp(false)} />
+
+      {/* Reset Project Modal */}
+      {showResetModal && selectedProject && (
+        <ResetProjectModal
+          isOpen={showResetModal}
+          projectName={selectedProject}
+          onClose={() => setShowResetModal(false)}
+          onResetComplete={(wasFullReset) => {
+            // If full reset, the spec was deleted - show spec creation chat
+            if (wasFullReset) {
+              setShowSpecChat(true)
+            }
+          }}
+        />
      )}

      {/* Celebration Overlay - shows when a feature is completed by an agent */}
--- a/ui/src/components/ActivityFeed.tsx
+++ b/ui/src/components/ActivityFeed.tsx
@@ -1,6 +1,7 @@
 import { Activity } from 'lucide-react'
 import { AgentAvatar } from './AgentAvatar'
 import type { AgentMascot } from '../lib/types'
+import { Card, CardContent } from '@/components/ui/card'

 interface ActivityItem {
  agentName: string
@@ -38,8 +39,8 @@ export function ActivityFeed({ activities, maxItems = 5, showHeader = true }: Ac
    <div>
      {showHeader && (
        <div className="flex items-center gap-2 mb-2">
-          <Activity size={14} className="text-neo-text-secondary" />
-          <span className="text-xs font-bold text-neo-text-secondary uppercase tracking-wide">
+          <Activity size={14} className="text-muted-foreground" />
+          <span className="text-xs font-semibold text-muted-foreground uppercase tracking-wide">
            Recent Activity
          </span>
        </div>
@@ -47,34 +48,36 @@ export function ActivityFeed({ activities, maxItems = 5, showHeader = true }: Ac

      <div className="space-y-2">
        {displayedActivities.map((activity) => (
-          <div
+          <Card
            key={`${activity.featureId}-${activity.timestamp}-${activity.thought.slice(0, 20)}`}
-            className="flex items-start gap-2 py-1.5 px-2 rounded bg-[var(--color-neo-bg)] border border-neo-border/20"
+            className="py-1.5"
          >
-            <AgentAvatar
-              name={activity.agentName as AgentMascot}
-              state="working"
-              size="sm"
-            />
-            <div className="flex-1 min-w-0">
-              <div className="flex items-center gap-2">
-                <span className="text-xs font-bold" style={{
-                  color: getMascotColor(activity.agentName as AgentMascot)
-                }}>
-                  {activity.agentName}
-                </span>
-                <span className="text-[10px] text-neo-text-muted">
-                  #{activity.featureId}
-                </span>
-                <span className="text-[10px] text-neo-text-muted ml-auto">
-                  {formatTimestamp(activity.timestamp)}
-                </span>
+            <CardContent className="p-2 flex items-start gap-2">
+              <AgentAvatar
+                name={activity.agentName as AgentMascot}
+                state="working"
+                size="sm"
+              />
+              <div className="flex-1 min-w-0">
+                <div className="flex items-center gap-2">
+                  <span className="text-xs font-semibold" style={{
+                    color: getMascotColor(activity.agentName as AgentMascot)
+                  }}>
+                    {activity.agentName}
+                  </span>
+                  <span className="text-[10px] text-muted-foreground">
+                    #{activity.featureId}
+                  </span>
+                  <span className="text-[10px] text-muted-foreground ml-auto">
+                    {formatTimestamp(activity.timestamp)}
+                  </span>
+                </div>
+                <p className="text-xs text-muted-foreground truncate" title={activity.thought}>
+                  {activity.thought}
+                </p>
              </div>
-              <p className="text-xs text-neo-text-secondary truncate" title={activity.thought}>
-                {activity.thought}
-              </p>
-            </div>
-          </div>
+            </CardContent>
+          </Card>
        ))}
      </div>
    </div>
--- a/ui/src/components/AddFeatureForm.tsx
+++ b/ui/src/components/AddFeatureForm.tsx
@@ -1,6 +1,18 @@
 import { useState, useId } from 'react'
 import { X, Plus, Trash2, Loader2, AlertCircle } from 'lucide-react'
 import { useCreateFeature } from '../hooks/useProjects'
+import {
+  Dialog,
+  DialogContent,
+  DialogHeader,
+  DialogTitle,
+  DialogFooter,
+} from '@/components/ui/dialog'
+import { Button } from '@/components/ui/button'
+import { Input } from '@/components/ui/input'
+import { Textarea } from '@/components/ui/textarea'
+import { Label } from '@/components/ui/label'
+import { Alert, AlertDescription } from '@/components/ui/alert'

 interface Step {
  id: string
@@ -65,149 +77,135 @@ export function AddFeatureForm({ projectName, onClose }: AddFeatureFormProps) {
  const isValid = category.trim() && name.trim() && description.trim()

  return (
-    <div className="neo-modal-backdrop" onClick={onClose}>
-      <div
-        className="neo-modal w-full max-w-2xl"
-        onClick={(e) => e.stopPropagation()}
-      >
-        {/* Header */}
-        <div className="flex items-center justify-between p-6 border-b-3 border-[var(--color-neo-border)]">
-          <h2 className="font-display text-2xl font-bold">
-            Add Feature
-          </h2>
-          <button
-            onClick={onClose}
-            className="neo-btn neo-btn-ghost p-2"
-          >
-            <X size={24} />
-          </button>
-        </div>
+    <Dialog open={true} onOpenChange={(open) => !open && onClose()}>
+      <DialogContent className="sm:max-w-2xl">
+        <DialogHeader>
+          <DialogTitle>Add Feature</DialogTitle>
+        </DialogHeader>

-        {/* Form */}
-        <form onSubmit={handleSubmit} className="p-6 space-y-4">
+        <form onSubmit={handleSubmit} className="space-y-4">
          {/* Error Message */}
          {error && (
-            <div className="flex items-center gap-3 p-4 bg-[var(--color-neo-error-bg)] text-[var(--color-neo-error-text)] border-3 border-[var(--color-neo-error-border)]">
-              <AlertCircle size={20} />
-              <span>{error}</span>
-              <button
-                type="button"
-                onClick={() => setError(null)}
-                className="ml-auto hover:opacity-70 transition-opacity"
-              >
-                <X size={16} />
-              </button>
-            </div>
+            <Alert variant="destructive">
+              <AlertCircle className="h-4 w-4" />
+              <AlertDescription className="flex items-center justify-between">
+                <span>{error}</span>
+                <Button
+                  type="button"
+                  variant="ghost"
+                  size="icon-xs"
+                  onClick={() => setError(null)}
+                >
+                  <X size={14} />
+                </Button>
+              </AlertDescription>
+            </Alert>
          )}

          {/* Category & Priority Row */}
          <div className="flex gap-4">
-            <div className="flex-1">
-              <label className="block font-display font-bold mb-2 uppercase text-sm">
-                Category
-              </label>
-              <input
+            <div className="flex-1 space-y-2">
+              <Label htmlFor="category">Category</Label>
+              <Input
+                id="category"
                type="text"
                value={category}
                onChange={(e) => setCategory(e.target.value)}
                placeholder="e.g., Authentication, UI, API"
-                className="neo-input"
                required
              />
            </div>
-            <div className="w-32">
-              <label className="block font-display font-bold mb-2 uppercase text-sm">
-                Priority
-              </label>
-              <input
+            <div className="w-32 space-y-2">
+              <Label htmlFor="priority">Priority</Label>
+              <Input
+                id="priority"
                type="number"
                value={priority}
                onChange={(e) => setPriority(e.target.value)}
                placeholder="Auto"
                min="1"
-                className="neo-input"
              />
            </div>
          </div>

          {/* Name */}
-          <div>
-            <label className="block font-display font-bold mb-2 uppercase text-sm">
-              Feature Name
-            </label>
-            <input
+          <div className="space-y-2">
+            <Label htmlFor="name">Feature Name</Label>
+            <Input
+              id="name"
              type="text"
              value={name}
              onChange={(e) => setName(e.target.value)}
              placeholder="e.g., User login form"
-              className="neo-input"
              required
            />
          </div>

          {/* Description */}
-          <div>
-            <label className="block font-display font-bold mb-2 uppercase text-sm">
-              Description
-            </label>
-            <textarea
+          <div className="space-y-2">
+            <Label htmlFor="description">Description</Label>
+            <Textarea
+              id="description"
              value={description}
              onChange={(e) => setDescription(e.target.value)}
              placeholder="Describe what this feature should do..."
-              className="neo-input min-h-[100px] resize-y"
+              className="min-h-[100px] resize-y"
              required
            />
          </div>

          {/* Steps */}
-          <div>
-            <label className="block font-display font-bold mb-2 uppercase text-sm">
-              Test Steps (Optional)
-            </label>
+          <div className="space-y-2">
+            <Label>Test Steps (Optional)</Label>
            <div className="space-y-2">
              {steps.map((step, index) => (
                <div key={step.id} className="flex gap-2 items-center">
-                  <span
-                    className="w-10 h-10 flex-shrink-0 flex items-center justify-center font-mono font-bold text-sm border-3 border-[var(--color-neo-border)] bg-[var(--color-neo-bg)] text-[var(--color-neo-text-secondary)]"
-                    style={{ boxShadow: 'var(--shadow-neo-sm)' }}
-                  >
+                  <span className="w-10 h-10 flex-shrink-0 flex items-center justify-center font-mono font-semibold text-sm border rounded-md bg-muted text-muted-foreground">
                    {index + 1}
                  </span>
-                  <input
+                  <Input
                    type="text"
                    value={step.value}
                    onChange={(e) => handleStepChange(step.id, e.target.value)}
                    placeholder="Describe this step..."
-                    className="neo-input flex-1"
+                    className="flex-1"
                  />
                  {steps.length > 1 && (
-                    <button
+                    <Button
                      type="button"
+                      variant="ghost"
+                      size="icon"
                      onClick={() => handleRemoveStep(step.id)}
-                      className="neo-btn neo-btn-ghost p-2"
                    >
                      <Trash2 size={18} />
-                    </button>
+                    </Button>
                  )}
                </div>
              ))}
            </div>
-            <button
+            <Button
              type="button"
+              variant="ghost"
+              size="sm"
              onClick={handleAddStep}
-              className="neo-btn neo-btn-ghost mt-2 text-sm"
            >
              <Plus size={16} />
              Add Step
-            </button>
+            </Button>
          </div>

          {/* Actions */}
-          <div className="flex gap-3 pt-4 border-t-3 border-[var(--color-neo-border)]">
-            <button
+          <DialogFooter className="pt-4 border-t">
+            <Button
+              type="button"
+              variant="outline"
+              onClick={onClose}
+            >
+              Cancel
+            </Button>
+            <Button
              type="submit"
              disabled={!isValid || createFeature.isPending}
-              className="neo-btn neo-btn-success flex-1"
            >
              {createFeature.isPending ? (
                <Loader2 size={18} className="animate-spin" />
@@ -217,17 +215,10 @@ export function AddFeatureForm({ projectName, onClose }: AddFeatureFormProps) {
                  Create Feature
                </>
              )}
-            </button>
-            <button
-              type="button"
-              onClick={onClose}
-              className="neo-btn neo-btn-ghost"
-            >
-              Cancel
-            </button>
-          </div>
+            </Button>
+          </DialogFooter>
        </form>
-      </div>
-    </div>
+      </DialogContent>
+    </Dialog>
  )
 }
--- a/ui/src/components/AgentAvatar.tsx
+++ b/ui/src/components/AgentAvatar.tsx
@@ -1,4 +1,10 @@
 import { type AgentMascot, type AgentState } from '../lib/types'
+import {
+  AVATAR_COLORS,
+  UNKNOWN_COLORS,
+  MASCOT_SVGS,
+  UnknownMascotSVG,
+} from './mascotData'

 interface AgentAvatarProps {
  name: AgentMascot | 'Unknown'
@@ -7,515 +13,12 @@ interface AgentAvatarProps {
  showName?: boolean
 }

-// Fallback colors for unknown agents (neutral gray)
-const UNKNOWN_COLORS = { primary: '#6B7280', secondary: '#9CA3AF', accent: '#F3F4F6' }
-
-const AVATAR_COLORS: Record<AgentMascot, { primary: string; secondary: string; accent: string }> = {
-  // Original 5
-  Spark: { primary: '#3B82F6', secondary: '#60A5FA', accent: '#DBEAFE' },  // Blue robot
-  Fizz: { primary: '#F97316', secondary: '#FB923C', accent: '#FFEDD5' },   // Orange fox
-  Octo: { primary: '#8B5CF6', secondary: '#A78BFA', accent: '#EDE9FE' },   // Purple octopus
-  Hoot: { primary: '#22C55E', secondary: '#4ADE80', accent: '#DCFCE7' },   // Green owl
-  Buzz: { primary: '#EAB308', secondary: '#FACC15', accent: '#FEF9C3' },   // Yellow bee
-  // Tech-inspired
-  Pixel: { primary: '#EC4899', secondary: '#F472B6', accent: '#FCE7F3' },  // Pink
-  Byte: { primary: '#06B6D4', secondary: '#22D3EE', accent: '#CFFAFE' },   // Cyan
-  Nova: { primary: '#F43F5E', secondary: '#FB7185', accent: '#FFE4E6' },   // Rose
-  Chip: { primary: '#84CC16', secondary: '#A3E635', accent: '#ECFCCB' },   // Lime
-  Bolt: { primary: '#FBBF24', secondary: '#FCD34D', accent: '#FEF3C7' },   // Amber
-  // Energetic
-  Dash: { primary: '#14B8A6', secondary: '#2DD4BF', accent: '#CCFBF1' },   // Teal
-  Zap: { primary: '#A855F7', secondary: '#C084FC', accent: '#F3E8FF' },    // Violet
-  Gizmo: { primary: '#64748B', secondary: '#94A3B8', accent: '#F1F5F9' },  // Slate
-  Turbo: { primary: '#EF4444', secondary: '#F87171', accent: '#FEE2E2' },  // Red
-  Blip: { primary: '#10B981', secondary: '#34D399', accent: '#D1FAE5' },   // Emerald
-  // Playful
-  Neon: { primary: '#D946EF', secondary: '#E879F9', accent: '#FAE8FF' },   // Fuchsia
-  Widget: { primary: '#6366F1', secondary: '#818CF8', accent: '#E0E7FF' }, // Indigo
-  Zippy: { primary: '#F59E0B', secondary: '#FBBF24', accent: '#FEF3C7' },  // Orange-yellow
-  Quirk: { primary: '#0EA5E9', secondary: '#38BDF8', accent: '#E0F2FE' },  // Sky
-  Flux: { primary: '#7C3AED', secondary: '#8B5CF6', accent: '#EDE9FE' },   // Purple
-}
-
 const SIZES = {
  sm: { svg: 32, font: 'text-xs' },
  md: { svg: 48, font: 'text-sm' },
  lg: { svg: 64, font: 'text-base' },
 }

-// SVG mascot definitions - simple cute characters
-function SparkSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Spark; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Robot body */}
-      <rect x="16" y="20" width="32" height="28" rx="4" fill={colors.primary} />
-      {/* Robot head */}
-      <rect x="12" y="8" width="40" height="24" rx="4" fill={colors.secondary} />
-      {/* Antenna */}
-      <circle cx="32" cy="4" r="4" fill={colors.primary} className="animate-pulse" />
-      <rect x="30" y="4" width="4" height="8" fill={colors.primary} />
-      {/* Eyes */}
-      <circle cx="24" cy="18" r="4" fill="white" />
-      <circle cx="40" cy="18" r="4" fill="white" />
-      <circle cx="25" cy="18" r="2" fill={colors.primary} />
-      <circle cx="41" cy="18" r="2" fill={colors.primary} />
-      {/* Mouth */}
-      <rect x="26" y="24" width="12" height="2" rx="1" fill="white" />
-      {/* Arms */}
-      <rect x="6" y="24" width="8" height="4" rx="2" fill={colors.primary} />
-      <rect x="50" y="24" width="8" height="4" rx="2" fill={colors.primary} />
-    </svg>
-  )
-}
-
-function FizzSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Fizz; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Ears */}
-      <polygon points="12,12 20,28 4,28" fill={colors.primary} />
-      <polygon points="52,12 60,28 44,28" fill={colors.primary} />
-      <polygon points="14,14 18,26 8,26" fill={colors.accent} />
-      <polygon points="50,14 56,26 44,26" fill={colors.accent} />
-      {/* Head */}
-      <ellipse cx="32" cy="36" rx="24" ry="22" fill={colors.primary} />
-      {/* Face */}
-      <ellipse cx="32" cy="40" rx="18" ry="14" fill={colors.accent} />
-      {/* Eyes */}
-      <ellipse cx="24" cy="32" rx="4" ry="5" fill="white" />
-      <ellipse cx="40" cy="32" rx="4" ry="5" fill="white" />
-      <circle cx="25" cy="33" r="2" fill="#1a1a1a" />
-      <circle cx="41" cy="33" r="2" fill="#1a1a1a" />
-      {/* Nose */}
-      <ellipse cx="32" cy="42" rx="4" ry="3" fill={colors.primary} />
-      {/* Whiskers */}
-      <line x1="8" y1="38" x2="18" y2="40" stroke={colors.primary} strokeWidth="2" />
-      <line x1="8" y1="44" x2="18" y2="44" stroke={colors.primary} strokeWidth="2" />
-      <line x1="46" y1="40" x2="56" y2="38" stroke={colors.primary} strokeWidth="2" />
-      <line x1="46" y1="44" x2="56" y2="44" stroke={colors.primary} strokeWidth="2" />
-    </svg>
-  )
-}
-
-function OctoSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Octo; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Tentacles */}
-      <path d="M12,48 Q8,56 12,60 Q16,64 20,58" fill={colors.secondary} />
-      <path d="M22,50 Q20,58 24,62" fill={colors.secondary} />
-      <path d="M32,52 Q32,60 36,62" fill={colors.secondary} />
-      <path d="M42,50 Q44,58 40,62" fill={colors.secondary} />
-      <path d="M52,48 Q56,56 52,60 Q48,64 44,58" fill={colors.secondary} />
-      {/* Head */}
-      <ellipse cx="32" cy="32" rx="22" ry="24" fill={colors.primary} />
-      {/* Eyes */}
-      <ellipse cx="24" cy="28" rx="6" ry="8" fill="white" />
-      <ellipse cx="40" cy="28" rx="6" ry="8" fill="white" />
-      <ellipse cx="25" cy="30" rx="3" ry="4" fill={colors.primary} />
-      <ellipse cx="41" cy="30" rx="3" ry="4" fill={colors.primary} />
-      {/* Smile */}
-      <path d="M24,42 Q32,48 40,42" stroke={colors.accent} strokeWidth="2" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-function HootSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Hoot; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Ear tufts */}
-      <polygon points="14,8 22,24 6,20" fill={colors.primary} />
-      <polygon points="50,8 58,20 42,24" fill={colors.primary} />
-      {/* Body */}
-      <ellipse cx="32" cy="40" rx="20" ry="18" fill={colors.primary} />
-      {/* Head */}
-      <circle cx="32" cy="28" r="20" fill={colors.secondary} />
-      {/* Eye circles */}
-      <circle cx="24" cy="26" r="10" fill={colors.accent} />
-      <circle cx="40" cy="26" r="10" fill={colors.accent} />
-      {/* Eyes */}
-      <circle cx="24" cy="26" r="6" fill="white" />
-      <circle cx="40" cy="26" r="6" fill="white" />
-      <circle cx="25" cy="27" r="3" fill="#1a1a1a" />
-      <circle cx="41" cy="27" r="3" fill="#1a1a1a" />
-      {/* Beak */}
-      <polygon points="32,32 28,40 36,40" fill="#F97316" />
-      {/* Belly */}
-      <ellipse cx="32" cy="46" rx="10" ry="8" fill={colors.accent} />
-    </svg>
-  )
-}
-
-function BuzzSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Buzz; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Wings */}
-      <ellipse cx="14" cy="32" rx="10" ry="14" fill={colors.accent} opacity="0.8" className="animate-pulse" />
-      <ellipse cx="50" cy="32" rx="10" ry="14" fill={colors.accent} opacity="0.8" className="animate-pulse" />
-      {/* Body stripes */}
-      <ellipse cx="32" cy="36" rx="14" ry="20" fill={colors.primary} />
-      <ellipse cx="32" cy="30" rx="12" ry="6" fill="#1a1a1a" />
-      <ellipse cx="32" cy="44" rx="12" ry="6" fill="#1a1a1a" />
-      {/* Head */}
-      <circle cx="32" cy="16" r="12" fill={colors.primary} />
-      {/* Antennae */}
-      <line x1="26" y1="8" x2="22" y2="2" stroke="#1a1a1a" strokeWidth="2" />
-      <line x1="38" y1="8" x2="42" y2="2" stroke="#1a1a1a" strokeWidth="2" />
-      <circle cx="22" cy="2" r="2" fill="#1a1a1a" />
-      <circle cx="42" cy="2" r="2" fill="#1a1a1a" />
-      {/* Eyes */}
-      <circle cx="28" cy="14" r="4" fill="white" />
-      <circle cx="36" cy="14" r="4" fill="white" />
-      <circle cx="29" cy="15" r="2" fill="#1a1a1a" />
-      <circle cx="37" cy="15" r="2" fill="#1a1a1a" />
-      {/* Smile */}
-      <path d="M28,20 Q32,24 36,20" stroke="#1a1a1a" strokeWidth="1.5" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-// Pixel - cute pixel art style character
-function PixelSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Pixel; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Blocky body */}
-      <rect x="20" y="28" width="24" height="28" fill={colors.primary} />
-      <rect x="16" y="32" width="8" height="20" fill={colors.secondary} />
-      <rect x="40" y="32" width="8" height="20" fill={colors.secondary} />
-      {/* Head */}
-      <rect x="16" y="8" width="32" height="24" fill={colors.primary} />
-      {/* Eyes */}
-      <rect x="20" y="14" width="8" height="8" fill="white" />
-      <rect x="36" y="14" width="8" height="8" fill="white" />
-      <rect x="24" y="16" width="4" height="4" fill="#1a1a1a" />
-      <rect x="38" y="16" width="4" height="4" fill="#1a1a1a" />
-      {/* Mouth */}
-      <rect x="26" y="26" width="12" height="4" fill={colors.accent} />
-    </svg>
-  )
-}
-
-// Byte - data cube character
-function ByteSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Byte; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* 3D cube body */}
-      <polygon points="32,8 56,20 56,44 32,56 8,44 8,20" fill={colors.primary} />
-      <polygon points="32,8 56,20 32,32 8,20" fill={colors.secondary} />
-      <polygon points="32,32 56,20 56,44 32,56" fill={colors.accent} opacity="0.6" />
-      {/* Face */}
-      <circle cx="24" cy="28" r="4" fill="white" />
-      <circle cx="40" cy="28" r="4" fill="white" />
-      <circle cx="25" cy="29" r="2" fill="#1a1a1a" />
-      <circle cx="41" cy="29" r="2" fill="#1a1a1a" />
-      <path d="M26,38 Q32,42 38,38" stroke="white" strokeWidth="2" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-// Nova - star character
-function NovaSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Nova; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Star points */}
-      <polygon points="32,2 38,22 58,22 42,36 48,56 32,44 16,56 22,36 6,22 26,22" fill={colors.primary} />
-      <circle cx="32" cy="32" r="14" fill={colors.secondary} />
-      {/* Face */}
-      <circle cx="27" cy="30" r="3" fill="white" />
-      <circle cx="37" cy="30" r="3" fill="white" />
-      <circle cx="28" cy="31" r="1.5" fill="#1a1a1a" />
-      <circle cx="38" cy="31" r="1.5" fill="#1a1a1a" />
-      <path d="M28,37 Q32,40 36,37" stroke="#1a1a1a" strokeWidth="1.5" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-// Chip - circuit board character
-function ChipSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Chip; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Chip body */}
-      <rect x="16" y="16" width="32" height="32" rx="4" fill={colors.primary} />
-      {/* Pins */}
-      <rect x="20" y="10" width="4" height="8" fill={colors.secondary} />
-      <rect x="30" y="10" width="4" height="8" fill={colors.secondary} />
-      <rect x="40" y="10" width="4" height="8" fill={colors.secondary} />
-      <rect x="20" y="46" width="4" height="8" fill={colors.secondary} />
-      <rect x="30" y="46" width="4" height="8" fill={colors.secondary} />
-      <rect x="40" y="46" width="4" height="8" fill={colors.secondary} />
-      {/* Face */}
-      <circle cx="26" cy="28" r="4" fill={colors.accent} />
-      <circle cx="38" cy="28" r="4" fill={colors.accent} />
-      <circle cx="26" cy="28" r="2" fill="#1a1a1a" />
-      <circle cx="38" cy="28" r="2" fill="#1a1a1a" />
-      <rect x="26" y="38" width="12" height="3" rx="1" fill={colors.accent} />
-    </svg>
-  )
-}
-
-// Bolt - lightning character
-function BoltSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Bolt; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Lightning bolt body */}
-      <polygon points="36,4 20,28 30,28 24,60 48,32 36,32 44,4" fill={colors.primary} />
-      <polygon points="34,8 24,26 32,26 28,52 42,34 34,34 40,8" fill={colors.secondary} />
-      {/* Face */}
-      <circle cx="30" cy="30" r="3" fill="white" />
-      <circle cx="38" cy="26" r="3" fill="white" />
-      <circle cx="31" cy="31" r="1.5" fill="#1a1a1a" />
-      <circle cx="39" cy="27" r="1.5" fill="#1a1a1a" />
-    </svg>
-  )
-}
-
-// Dash - speedy character
-function DashSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Dash; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Speed lines */}
-      <rect x="4" y="28" width="12" height="3" rx="1" fill={colors.accent} opacity="0.6" />
-      <rect x="8" y="34" width="10" height="3" rx="1" fill={colors.accent} opacity="0.4" />
-      {/* Aerodynamic body */}
-      <ellipse cx="36" cy="32" rx="20" ry="16" fill={colors.primary} />
-      <ellipse cx="40" cy="32" rx="14" ry="12" fill={colors.secondary} />
-      {/* Face */}
-      <circle cx="38" cy="28" r="4" fill="white" />
-      <circle cx="48" cy="28" r="4" fill="white" />
-      <circle cx="39" cy="29" r="2" fill="#1a1a1a" />
-      <circle cx="49" cy="29" r="2" fill="#1a1a1a" />
-      <path d="M40,36 Q44,39 48,36" stroke="#1a1a1a" strokeWidth="1.5" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-// Zap - electric orb
-function ZapSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Zap; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Electric sparks */}
-      <path d="M12,32 L20,28 L16,32 L22,30" stroke={colors.secondary} strokeWidth="2" className="animate-pulse" />
-      <path d="M52,32 L44,28 L48,32 L42,30" stroke={colors.secondary} strokeWidth="2" className="animate-pulse" />
-      {/* Orb */}
-      <circle cx="32" cy="32" r="18" fill={colors.primary} />
-      <circle cx="32" cy="32" r="14" fill={colors.secondary} />
-      {/* Face */}
-      <circle cx="26" cy="30" r="4" fill="white" />
-      <circle cx="38" cy="30" r="4" fill="white" />
-      <circle cx="27" cy="31" r="2" fill={colors.primary} />
-      <circle cx="39" cy="31" r="2" fill={colors.primary} />
-      <path d="M28,40 Q32,44 36,40" stroke="white" strokeWidth="2" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-// Gizmo - gear character
-function GizmoSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Gizmo; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Gear teeth */}
-      <rect x="28" y="4" width="8" height="8" fill={colors.primary} />
-      <rect x="28" y="52" width="8" height="8" fill={colors.primary} />
-      <rect x="4" y="28" width="8" height="8" fill={colors.primary} />
-      <rect x="52" y="28" width="8" height="8" fill={colors.primary} />
-      {/* Gear body */}
-      <circle cx="32" cy="32" r="20" fill={colors.primary} />
-      <circle cx="32" cy="32" r="14" fill={colors.secondary} />
-      {/* Face */}
-      <circle cx="26" cy="30" r="4" fill="white" />
-      <circle cx="38" cy="30" r="4" fill="white" />
-      <circle cx="27" cy="31" r="2" fill="#1a1a1a" />
-      <circle cx="39" cy="31" r="2" fill="#1a1a1a" />
-      <path d="M28,40 Q32,43 36,40" stroke="#1a1a1a" strokeWidth="2" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-// Turbo - rocket character
-function TurboSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Turbo; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Flames */}
-      <ellipse cx="32" cy="58" rx="8" ry="6" fill="#FBBF24" className="animate-pulse" />
-      <ellipse cx="32" cy="56" rx="5" ry="4" fill="#FCD34D" />
-      {/* Rocket body */}
-      <ellipse cx="32" cy="32" rx="14" ry="24" fill={colors.primary} />
-      {/* Nose cone */}
-      <ellipse cx="32" cy="12" rx="8" ry="10" fill={colors.secondary} />
-      {/* Fins */}
-      <polygon points="18,44 10,56 18,52" fill={colors.secondary} />
-      <polygon points="46,44 54,56 46,52" fill={colors.secondary} />
-      {/* Window/Face */}
-      <circle cx="32" cy="28" r="8" fill={colors.accent} />
-      <circle cx="29" cy="27" r="2" fill="#1a1a1a" />
-      <circle cx="35" cy="27" r="2" fill="#1a1a1a" />
-      <path d="M29,32 Q32,34 35,32" stroke="#1a1a1a" strokeWidth="1" fill="none" />
-    </svg>
-  )
-}
-
-// Blip - radar dot character
-function BlipSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Blip; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Radar rings */}
-      <circle cx="32" cy="32" r="28" stroke={colors.accent} strokeWidth="2" fill="none" opacity="0.3" />
-      <circle cx="32" cy="32" r="22" stroke={colors.accent} strokeWidth="2" fill="none" opacity="0.5" />
-      {/* Main dot */}
-      <circle cx="32" cy="32" r="14" fill={colors.primary} />
-      <circle cx="32" cy="32" r="10" fill={colors.secondary} />
-      {/* Face */}
-      <circle cx="28" cy="30" r="3" fill="white" />
-      <circle cx="36" cy="30" r="3" fill="white" />
-      <circle cx="29" cy="31" r="1.5" fill="#1a1a1a" />
-      <circle cx="37" cy="31" r="1.5" fill="#1a1a1a" />
-      <path d="M29,37 Q32,40 35,37" stroke="white" strokeWidth="1.5" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-// Neon - glowing character
-function NeonSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Neon; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Glow effect */}
-      <circle cx="32" cy="32" r="26" fill={colors.accent} opacity="0.3" />
-      <circle cx="32" cy="32" r="22" fill={colors.accent} opacity="0.5" />
-      {/* Body */}
-      <circle cx="32" cy="32" r="18" fill={colors.primary} />
-      {/* Inner glow */}
-      <circle cx="32" cy="32" r="12" fill={colors.secondary} />
-      {/* Face */}
-      <circle cx="27" cy="30" r="4" fill="white" />
-      <circle cx="37" cy="30" r="4" fill="white" />
-      <circle cx="28" cy="31" r="2" fill={colors.primary} />
-      <circle cx="38" cy="31" r="2" fill={colors.primary} />
-      <path d="M28,38 Q32,42 36,38" stroke="white" strokeWidth="2" fill="none" strokeLinecap="round" />
-    </svg>
-  )
-}
-
-// Widget - UI component character
-function WidgetSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Widget; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Window frame */}
-      <rect x="8" y="12" width="48" height="40" rx="4" fill={colors.primary} />
-      {/* Title bar */}
-      <rect x="8" y="12" width="48" height="10" rx="4" fill={colors.secondary} />
-      <circle cx="16" cy="17" r="2" fill="#EF4444" />
-      <circle cx="24" cy="17" r="2" fill="#FBBF24" />
-      <circle cx="32" cy="17" r="2" fill="#22C55E" />
-      {/* Content area / Face */}
-      <rect x="12" y="26" width="40" height="22" rx="2" fill={colors.accent} />
-      <circle cx="24" cy="34" r="4" fill="white" />
-      <circle cx="40" cy="34" r="4" fill="white" />
-      <circle cx="25" cy="35" r="2" fill={colors.primary} />
-      <circle cx="41" cy="35" r="2" fill={colors.primary} />
-      <rect x="28" y="42" width="8" height="3" rx="1" fill={colors.primary} />
-    </svg>
-  )
-}
-
-// Zippy - fast bunny-like character
-function ZippySVG({ colors, size }: { colors: typeof AVATAR_COLORS.Zippy; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Ears */}
-      <ellipse cx="22" cy="14" rx="6" ry="14" fill={colors.primary} />
-      <ellipse cx="42" cy="14" rx="6" ry="14" fill={colors.primary} />
-      <ellipse cx="22" cy="14" rx="3" ry="10" fill={colors.accent} />
-      <ellipse cx="42" cy="14" rx="3" ry="10" fill={colors.accent} />
-      {/* Head */}
-      <circle cx="32" cy="38" r="20" fill={colors.primary} />
-      {/* Face */}
-      <circle cx="24" cy="34" r="5" fill="white" />
-      <circle cx="40" cy="34" r="5" fill="white" />
-      <circle cx="25" cy="35" r="2.5" fill="#1a1a1a" />
-      <circle cx="41" cy="35" r="2.5" fill="#1a1a1a" />
-      {/* Nose and mouth */}
-      <ellipse cx="32" cy="44" rx="3" ry="2" fill={colors.secondary} />
-      <path d="M32,46 L32,50 M28,52 Q32,56 36,52" stroke="#1a1a1a" strokeWidth="1.5" fill="none" />
-    </svg>
-  )
-}
-
-// Quirk - question mark character
-function QuirkSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Quirk; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Question mark body */}
-      <path d="M24,20 Q24,8 32,8 Q44,8 44,20 Q44,28 32,32 L32,40"
-            stroke={colors.primary} strokeWidth="8" fill="none" strokeLinecap="round" />
-      <circle cx="32" cy="52" r="6" fill={colors.primary} />
-      {/* Face on the dot */}
-      <circle cx="29" cy="51" r="1.5" fill="white" />
-      <circle cx="35" cy="51" r="1.5" fill="white" />
-      <circle cx="29" cy="51" r="0.75" fill="#1a1a1a" />
-      <circle cx="35" cy="51" r="0.75" fill="#1a1a1a" />
-      {/* Decorative swirl */}
-      <circle cx="32" cy="20" r="4" fill={colors.secondary} />
-    </svg>
-  )
-}
-
-// Flux - flowing wave character
-function FluxSVG({ colors, size }: { colors: typeof AVATAR_COLORS.Flux; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none">
-      {/* Wave body */}
-      <path d="M8,32 Q16,16 32,32 Q48,48 56,32" stroke={colors.primary} strokeWidth="16" fill="none" strokeLinecap="round" />
-      <path d="M8,32 Q16,16 32,32 Q48,48 56,32" stroke={colors.secondary} strokeWidth="10" fill="none" strokeLinecap="round" />
-      {/* Face */}
-      <circle cx="28" cy="28" r="4" fill="white" />
-      <circle cx="40" cy="36" r="4" fill="white" />
-      <circle cx="29" cy="29" r="2" fill="#1a1a1a" />
-      <circle cx="41" cy="37" r="2" fill="#1a1a1a" />
-      {/* Sparkles */}
-      <circle cx="16" cy="24" r="2" fill={colors.accent} className="animate-pulse" />
-      <circle cx="48" cy="40" r="2" fill={colors.accent} className="animate-pulse" />
-    </svg>
-  )
-}
-
-// Unknown agent fallback - simple question mark icon
-function UnknownSVG({ colors, size }: { colors: typeof UNKNOWN_COLORS; size: number }) {
-  return (
-    <svg width={size} height={size} viewBox="0 0 64 64" fill="none" xmlns="http://www.w3.org/2000/svg">
-      {/* Circle background */}
-      <circle cx="32" cy="32" r="28" fill={colors.primary} />
-      <circle cx="32" cy="32" r="24" fill={colors.secondary} />
-      {/* Question mark */}
-      <text x="32" y="44" textAnchor="middle" fontSize="32" fontWeight="bold" fill="white">?</text>
-    </svg>
-  )
-}
-
-const MASCOT_SVGS: Record<AgentMascot, typeof SparkSVG> = {
-  // Original 5
-  Spark: SparkSVG,
-  Fizz: FizzSVG,
-  Octo: OctoSVG,
-  Hoot: HootSVG,
-  Buzz: BuzzSVG,
-  // Tech-inspired
-  Pixel: PixelSVG,
-  Byte: ByteSVG,
-  Nova: NovaSVG,
-  Chip: ChipSVG,
-  Bolt: BoltSVG,
-  // Energetic
-  Dash: DashSVG,
-  Zap: ZapSVG,
-  Gizmo: GizmoSVG,
-  Turbo: TurboSVG,
-  Blip: BlipSVG,
-  // Playful
-  Neon: NeonSVG,
-  Widget: WidgetSVG,
-  Zippy: ZippySVG,
-  Quirk: QuirkSVG,
-  Flux: FluxSVG,
-}
-
 // Animation classes based on state
 function getStateAnimation(state: AgentState): string {
  switch (state) {
@@ -581,7 +84,7 @@ export function AgentAvatar({ name, state, size = 'md', showName = false }: Agen
  const isUnknown = name === 'Unknown'
  const colors = isUnknown ? UNKNOWN_COLORS : AVATAR_COLORS[name]
  const { svg: svgSize, font } = SIZES[size]
-  const SvgComponent = isUnknown ? UnknownSVG : MASCOT_SVGS[name]
+  const SvgComponent = isUnknown ? UnknownMascotSVG : MASCOT_SVGS[name]
  const stateDesc = getStateDescription(state)
  const ariaLabel = `Agent ${name} is ${stateDesc}`

@@ -606,7 +109,7 @@ export function AgentAvatar({ name, state, size = 'md', showName = false }: Agen
        <SvgComponent colors={colors} size={svgSize} />
      </div>
      {showName && (
-        <span className={`${font} font-bold text-neo-text`} style={{ color: colors.primary }}>
+        <span className={`${font} font-bold text-foreground`} style={{ color: colors.primary }}>
          {name}
        </span>
      )}
--- a/ui/src/components/AgentCard.tsx
+++ b/ui/src/components/AgentCard.tsx
@@ -3,6 +3,9 @@ import { useState } from 'react'
 import { createPortal } from 'react-dom'
 import { AgentAvatar } from './AgentAvatar'
 import type { ActiveAgent, AgentLogEntry, AgentType } from '../lib/types'
+import { Card, CardContent } from '@/components/ui/card'
+import { Button } from '@/components/ui/button'
+import { Badge } from '@/components/ui/badge'

 interface AgentCardProps {
  agent: ActiveAgent
@@ -31,22 +34,22 @@ function getStateText(state: ActiveAgent['state']): string {
  }
 }

-// Get state color
+// Get state color class
 function getStateColor(state: ActiveAgent['state']): string {
  switch (state) {
    case 'success':
-      return 'text-neo-done'
+      return 'text-primary'
    case 'error':
-      return 'text-neo-pending'  // Yellow - just pivoting, not a real error
+      return 'text-yellow-600'
    case 'struggling':
-      return 'text-orange-500'   // Orange - working hard, being persistent
+      return 'text-orange-500'
    case 'working':
    case 'testing':
-      return 'text-neo-progress'
+      return 'text-primary'
    case 'thinking':
-      return 'text-neo-pending'
+      return 'text-yellow-600'
    default:
-      return 'text-neo-text-secondary'
+      return 'text-muted-foreground'
  }
 }

@@ -55,14 +58,13 @@ function getAgentTypeBadge(agentType: AgentType): { label: string; className: st
  if (agentType === 'testing') {
    return {
      label: 'TEST',
-      className: 'bg-purple-100 text-purple-700 border-purple-300',
+      className: 'bg-purple-100 text-purple-700 dark:bg-purple-900/30 dark:text-purple-300',
      icon: FlaskConical,
    }
  }
-  // Default to coding
  return {
    label: 'CODE',
-    className: 'bg-blue-100 text-blue-700 border-blue-300',
+    className: 'bg-blue-100 text-blue-700 dark:bg-blue-900/30 dark:text-blue-300',
    icon: Code,
  }
 }
@@ -74,75 +76,79 @@ export function AgentCard({ agent, onShowLogs }: AgentCardProps) {
  const TypeIcon = typeBadge.icon

  return (
-    <div
-      className={`
-        neo-card p-3 min-w-[180px] max-w-[220px]
-        ${isActive ? 'animate-pulse-neo' : ''}
-        transition-all duration-300
-      `}
-    >
-      {/* Agent type badge */}
-      <div className="flex justify-end mb-1">
-        <span
-          className={`
-            inline-flex items-center gap-1 px-1.5 py-0.5 text-[10px] font-bold
-            uppercase tracking-wide rounded border
-            ${typeBadge.className}
-          `}
-        >
-          <TypeIcon size={10} />
-          {typeBadge.label}
-        </span>
-      </div>
+    <Card className={`min-w-[180px] max-w-[220px] py-3 ${isActive ? 'animate-pulse' : ''}`}>
+      <CardContent className="p-3 space-y-2">
+        {/* Agent type badge */}
+        <div className="flex justify-end">
+          <Badge variant="outline" className={`text-[10px] ${typeBadge.className}`}>
+            <TypeIcon size={10} />
+            {typeBadge.label}
+          </Badge>
+        </div>

-      {/* Header with avatar and name */}
-      <div className="flex items-center gap-2 mb-2">
-        <AgentAvatar name={agent.agentName} state={agent.state} size="sm" />
-        <div className="flex-1 min-w-0">
-          <div className="font-display font-bold text-sm truncate">
-            {agent.agentName}
+        {/* Header with avatar and name */}
+        <div className="flex items-center gap-2">
+          <AgentAvatar name={agent.agentName} state={agent.state} size="sm" />
+          <div className="flex-1 min-w-0">
+            <div className="font-semibold text-sm truncate">
+              {agent.agentName}
+            </div>
+            <div className={`text-xs ${getStateColor(agent.state)}`}>
+              {getStateText(agent.state)}
+            </div>
          </div>
-          <div className={`text-xs ${getStateColor(agent.state)}`}>
-            {getStateText(agent.state)}
-          </div>
-        </div>
-        {/* Log button */}
-        {hasLogs && onShowLogs && (
-          <button
-            onClick={() => onShowLogs(agent.agentIndex)}
-            className="p-1 hover:bg-neo-bg-secondary rounded transition-colors"
-            title={`View logs (${agent.logs?.length || 0} entries)`}
-          >
-            <ScrollText size={14} className="text-neo-text-secondary" />
-          </button>
-        )}
-      </div>
-
-      {/* Feature info */}
-      <div className="mb-2">
-        <div className="text-xs text-neo-text-secondary mb-0.5">
-          Feature #{agent.featureId}
-        </div>
-        <div className="text-sm font-medium truncate" title={agent.featureName}>
-          {agent.featureName}
-        </div>
-      </div>
-
-      {/* Thought bubble */}
-      {agent.thought && (
-        <div className="relative mt-2 pt-2 border-t-2 border-neo-border/30">
-          <div className="flex items-start gap-1.5">
-            <MessageCircle size={14} className="text-neo-progress shrink-0 mt-0.5" />
-            <p
-              className="text-xs text-neo-text-secondary line-clamp-2 italic"
-              title={agent.thought}
+          {/* Log button */}
+          {hasLogs && onShowLogs && (
+            <Button
+              variant="ghost"
+              size="icon-xs"
+              onClick={() => onShowLogs(agent.agentIndex)}
+              title={`View logs (${agent.logs?.length || 0} entries)`}
            >
-              {agent.thought}
-            </p>
-          </div>
+              <ScrollText size={14} className="text-muted-foreground" />
+            </Button>
+          )}
        </div>
-      )}
-    </div>
+
+        {/* Feature info */}
+        <div>
+          {agent.featureIds && agent.featureIds.length > 1 ? (
+            <>
+              <div className="text-xs text-muted-foreground mb-0.5">
+                Batch: {agent.featureIds.map(id => `#${id}`).join(', ')}
+              </div>
+              <div className="text-sm font-bold truncate">
+                Active: Feature #{agent.featureId}
+              </div>
+            </>
+          ) : (
+            <>
+              <div className="text-xs text-muted-foreground mb-0.5">
+                Feature #{agent.featureId}
+              </div>
+              <div className="text-sm font-medium truncate" title={agent.featureName}>
+                {agent.featureName}
+              </div>
+            </>
+          )}
+        </div>
+
+        {/* Thought bubble */}
+        {agent.thought && (
+          <div className="pt-2 border-t border-border/50">
+            <div className="flex items-start gap-1.5">
+              <MessageCircle size={14} className="text-primary shrink-0 mt-0.5" />
+              <p
+                className="text-xs text-muted-foreground line-clamp-2 italic"
+                title={agent.thought}
+              >
+                {agent.thought}
+              </p>
+            </div>
+          </div>
+        )}
+      </CardContent>
+    </Card>
  )
 }

@@ -170,91 +176,79 @@ export function AgentLogModal({ agent, logs, onClose }: AgentLogModalProps) {
  const getLogColor = (type: AgentLogEntry['type']) => {
    switch (type) {
      case 'error':
-        return 'text-neo-danger'
+        return 'text-destructive'
      case 'state_change':
-        return 'text-neo-progress'
+        return 'text-primary'
      default:
-        return 'text-neo-text'
+        return 'text-foreground'
    }
  }

-  // Use portal to render modal at document body level (avoids overflow:hidden issues)
  return createPortal(
    <div
      className="fixed inset-0 flex items-center justify-center p-4 bg-black/50"
      style={{ zIndex: 9999 }}
      onClick={(e) => {
-        // Close when clicking backdrop
        if (e.target === e.currentTarget) onClose()
      }}
    >
-      <div className="neo-card w-full max-w-4xl max-h-[80vh] flex flex-col bg-neo-bg">
+      <Card className="w-full max-w-4xl max-h-[80vh] flex flex-col py-0">
        {/* Header */}
-        <div className="flex items-center justify-between p-4 border-b-3 border-neo-border">
+        <div className="flex items-center justify-between p-4 border-b">
          <div className="flex items-center gap-3">
            <AgentAvatar name={agent.agentName} state={agent.state} size="sm" />
            <div>
              <div className="flex items-center gap-2">
-                <h2 className="font-display font-bold text-lg">
+                <h2 className="font-semibold text-lg">
                  {agent.agentName} Logs
                </h2>
-                <span
-                  className={`
-                    inline-flex items-center gap-1 px-1.5 py-0.5 text-[10px] font-bold
-                    uppercase tracking-wide rounded border
-                    ${typeBadge.className}
-                  `}
-                >
+                <Badge variant="outline" className={`text-[10px] ${typeBadge.className}`}>
                  <TypeIcon size={10} />
                  {typeBadge.label}
-                </span>
+                </Badge>
              </div>
-              <p className="text-sm text-neo-text-secondary">
-                Feature #{agent.featureId}: {agent.featureName}
+              <p className="text-sm text-muted-foreground">
+                {agent.featureIds && agent.featureIds.length > 1
+                  ? `Batch: ${agent.featureIds.map(id => `#${id}`).join(', ')}`
+                  : `Feature #${agent.featureId}: ${agent.featureName}`
+                }
              </p>
            </div>
          </div>
          <div className="flex items-center gap-2">
-            <button
-              onClick={handleCopy}
-              className="neo-button neo-button-sm flex items-center gap-1"
-              title="Copy all logs"
-            >
+            <Button variant="outline" size="sm" onClick={handleCopy}>
              {copied ? <Check size={14} /> : <Copy size={14} />}
              {copied ? 'Copied!' : 'Copy'}
-            </button>
-            <button
-              onClick={onClose}
-              className="p-2 hover:bg-neo-bg-secondary rounded transition-colors"
-            >
+            </Button>
+            <Button variant="ghost" size="icon-sm" onClick={onClose}>
              <X size={20} />
-            </button>
+            </Button>
          </div>
        </div>

        {/* Log content */}
-        <div className="flex-1 overflow-auto p-4 bg-neo-bg-secondary font-mono text-xs">
-          {logs.length === 0 ? (
-            <p className="text-neo-text-secondary italic">No logs available</p>
-          ) : (
-            <div className="space-y-1">
-              {logs.map((log, idx) => (
+        <div className="flex-1 min-h-0 overflow-y-auto p-4 bg-muted/50">
+          <div className="font-mono text-xs space-y-1">
+            {logs.length === 0 ? (
+              <p className="text-muted-foreground italic">No logs available</p>
+            ) : (
+              logs.map((log, idx) => (
                <div key={idx} className={`${getLogColor(log.type)} whitespace-pre-wrap break-all`}>
-                  <span className="text-neo-muted">
+                  <span className="text-muted-foreground">
                    [{new Date(log.timestamp).toLocaleTimeString()}]
                  </span>{' '}
                  {log.line}
                </div>
-              ))}
-            </div>
-          )}
+              ))
+            )}
+          </div>
        </div>

        {/* Footer */}
-        <div className="p-3 border-t-2 border-neo-border/30 text-xs text-neo-text-secondary">
+        <div className="p-3 border-t text-xs text-muted-foreground">
          {logs.length} log entries
        </div>
-      </div>
+      </Card>
    </div>,
    document.body
  )